As banks explore the potential of blockchains, they’ve been quick to surmise that the technology, as it was originally designed, does not provide robust privacy.

When Satoshi Nakamoto invented bitcoin in 2009, he (or she or they) provided a way for multiple participants, who have no reason to trust each other, to work together in maintaining a canonical, tamperproof history of transactions and digital messages. But the design required that all activity be exposed for anyone to see.

“He had to sacrifice confidentiality. And he tried to preserve it by saying you can have multiple different payment addresses,” said Zooko Wilcox, one of the founders of a new anonymous digital currency called Zcash. “But that’s not a very strong defense of confidentiality at all.”

This is unfortunate for financial institutions, which are eager to benefit from the organizational cost-cutting promises of a shared ledger, but are compelled by law and by their own competitive natures to keep the bulk of their activities confidential.

“The privacy requirements for blockchain won’t be any different to current regulations applied to any other technology in financial markets,” said Edward Budd, the chief digital officer of global transaction banking at Deutsche Bank. “Therefore, any potential adoption of distributed ledger technology must ensure that the high industry security standards which are applied to financial services are met.”

Over the past year, software developers and crypto-engineers have devoted an immense amount of time and resources to devising schemes to protect the privacy of people making transactions on blockchains.

Last November, the research arm of R3, a consortium of banks pursuing blockchain-enabled financial applications, released to its members a study making sense of the most promising solutions. The study, which has not previously been made public, provides a breakdown of the level of privacy granted by each approach while examining the trade-offs that inevitably attend them.

The permissioned approach

Banks have funneled their energy (both on their own and as members of consortia like R3) into building permissioned ledgers, a strategy which limits the participants in a blockchain to known entities. This is partly driven by banks’ reluctance to rely on anonymous actors to validate transactions, not to mention anti-money-laundering and know-your-customer regulations, which require them to extensively vet their counterparties. But another benefit of restricting who can participate is that it means restricting who can read the ledger and see the transactions.

The authors of the R3 paper describe the restriction of read access as a “low-tech” option for privacy. Permissioned ledgers also allow for faster processing by getting rid of mining, which in bitcoin and Ethereum is needed to determine the order of transactions, but which is extremely costly in both time and energy.

However, permissioned ledgers alone may not be enough to keep participants in compliance with antitrust and insider-trading laws, which require confidentiality even between different departments in the same financial institution.

“If I’m at Goldman Sachs and I called you up and I’m like, ‘We’re doing a lot of transactions with JPMorgan,’ that probably would land me into some sort of legal trouble. And that’s exactly the sort of information you get if you have visibility within these permissioned networks, even if you don’t know what’s going on inside of them,” said Jared Harwayne-Gidansky, the deputy global head of emerging business and technology at Bank of New York Mellon. “It is an issue, for legal and regulatory reasons and also for just competitive reasons.”

Off-chain approaches

Sidechains, state channels and off-chain messaging are all ways to further sequester data from the main blockchain. They differ in the extent to which the blockchain is retained as a definitive record. In systems like JPMorgan’s Quorum, private messages are relayed off-chain while their cryptographic fingerprints, or hashes, are included in the blockchain as verification that the events occurred.

A hash function is a one-way scrambling function. If all you have is the hash, a random-looking string of letters and numbers, it should be infeasibly time-consuming to work out the original data, yet that data will produce the same hash every time. Even the tiniest modification to the data will completely change the hash, as shown in the table below:

Input data: The quick brown fox jumps over the lazy dog
Hash using SHA-256 algorithm: d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

Input data: The quick fox jumps over the lazy brown dog
Hash using SHA-256 algorithm: 109d51daea4988dbbcf10113bd7de272d5df5af1739844f4e3a0fb0f4b4567db

Input data: The quick fox jumps over the lazy brown dog.
Hash using SHA-256 algorithm: 90894b449198193133b3acd96561d61d677e48fe760071e0277ea70b900bf5c1

Input data: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.
Hash using SHA-256 algorithm: 57fda799521f01c9f1a2c320cd37dc1e2882790ba59729ee7357e5b236736871

Input data: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour or reputation. Everyone has the right to the protection of the law against such interference or attacks.
Hash using SHA-256 algorithm: e4998f47c86fb13f4107729ae2a589b857c867f0b8093b562250316c8bef65d5
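
For readers who want to try this themselves, the following is a minimal sketch in Python using the standard hashlib module. It computes SHA-256 fingerprints and then mimics, in a very simplified way, how an off-chain message could be checked against a fingerprint recorded on a ledger. The names fingerprint, on_chain_hashes and verify are invented for this illustration and do not come from Quorum or any other real system.

```python
import hashlib

def fingerprint(message: str) -> str:
    """Return the SHA-256 hash of a message as a hexadecimal string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

original = "The quick brown fox jumps over the lazy dog"
altered = original + "."  # a single added character

# Hypothetical ledger: only the fingerprint is published, never the message.
on_chain_hashes = {fingerprint(original)}

def verify(message: str) -> bool:
    """Check a privately exchanged message against the published fingerprints."""
    return fingerprint(message) in on_chain_hashes

print(fingerprint(original))
print(fingerprint(altered))   # bears no visible resemblance to the line above
print(verify(original), verify(altered))
```

Anyone holding the original message can confirm that it matches the ledger entry, while someone who sees only the ledger learns nothing readable about the message.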

Sidechains and state channels allow parties to conduct transactions on parallel, privately controlled, chains with assets that are represented, and therefore reconcilable, on a more public chain.

While solving many of the privacy concerns, these off-chain approaches abandon one of the main features of open blockchain technology, which is the resilience that comes from having the same data duplicated on multiple computers.

“Because blockchains are a distributed technology, it means that you don’t have a single point of failure,” said Jack Gavigan, the lead author of the R3 study and another Zcash founder. “For example, if Facebook goes down, you can’t access Facebook. But if a single blockchain node goes down, that doesn’t mean you can’t access the blockchain. It just means that you end up connecting to a different node.”

When transactions occur off the main, universally shared blockchain, they do not benefit from this key feature.

Mixing

The bitcoin blockchain stores the complete transactional history of every coin, who owned it (what address it was assigned to) and when it was spent. Mixing combines coins from multiple users, shuffles them, breaks them into smaller amounts, and then redistributes the money to the intended recipients so as to randomize the transactional history.

Early mixing services were run by third parties that users had to temporarily entrust with their money. Now there are decentralized alternatives, such as CoinJoin, that allow multiple people to jointly sign a single transaction to multiple recipients. The result is more secure than relying on a third party (who might retain a record of the intended premixed transactions — or abscond with the funds). MimbleWimble, a proposal outlined by an anonymous author in a paper last summer, would go even further to mix together all transactions in every new block that is created.
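
The joint-transaction idea can be sketched in a few lines of Python. This is an illustration of the general shape of a CoinJoin-style transaction, not the actual protocol: the Participant class, the fixed DENOMINATION and the approves check are all assumptions made for the example, and real signatures are replaced by a simple yes-or-no approval.

```python
import random
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    input_address: str   # address currently holding the coins
    output_address: str  # fresh address that should receive them

# Assumption: everyone mixes the same amount so outputs look identical.
DENOMINATION = 1.0

def build_joint_transaction(participants):
    """Combine all inputs and outputs into one transaction, shuffling outputs."""
    inputs = [(p.input_address, DENOMINATION) for p in participants]
    outputs = [(p.output_address, DENOMINATION) for p in participants]
    random.shuffle(outputs)  # output order no longer mirrors input order
    return {"inputs": inputs, "outputs": outputs}

def approves(participant, tx):
    """Stand-in for signing: approve only if one's own output is paid in full."""
    return (participant.output_address, DENOMINATION) in tx["outputs"]

participants = [
    Participant("alice", "in_A", "out_A"),
    Participant("bob", "in_B", "out_B"),
    Participant("carol", "in_C", "out_C"),
]

tx = build_joint_transaction(participants)
approvals = [p.name for p in participants if approves(p, tx)]
tx_is_valid = len(approvals) == len(participants)  # every party must approve
print(tx, tx_is_valid)
```

Because each participant checks the transaction before approving it, no one has to hand their coins to an intermediary, which is the advantage over the early third-party mixers.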

An ideal mixing service merges many random transactions and distributes funds after a delay, meaning that it’s slow and requires coordination.

Furthermore, mixing is not likely to sit well with regulators of the financial industry.

“If you’re saying to your regulator, ‘Oh yeah, we’re achieving privacy and confidentiality for our customers by mixing their transactions with a whole bunch of other transactions,’ then the regulator is probably going to raise their eyebrows a little bit,” Gavigan said. “But for a different use case it might be perfectly valid, where the provenance of the assets isn’t so much of a big deal.”

Ring signatures

Ring signatures, a cryptographic technique first described in 2001, are used in the CryptoNote protocol (which is implemented in the digital currency Monero) to hide the sending address of a transaction. A ring signature attributes a transaction to the public keys of multiple possible senders, only one of whom is the real author. It is impossible to discern, simply by looking at a ring signature, which address initiated the transaction and ultimately signed it.
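
To make the idea concrete, here is a toy Schnorr-style ring signature in Python. It is a sketch under loud assumptions: the modulus P, the base G and every function name are chosen for illustration, the numbers are far too small to be secure, and this is not Monero's actual construction, which works on elliptic curves and includes additional machinery.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption; insecure in practice)
G = 3            # toy base
ORDER = P - 1    # exponents can be reduced modulo P - 1

def challenge(message: bytes, point: int) -> int:
    """Hash the message together with a group element into an exponent."""
    digest = hashlib.sha256(message + point.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % ORDER

def keygen():
    secret = secrets.randbelow(ORDER - 1) + 1
    return secret, pow(G, secret, P)

def ring_sign(message, public_keys, signer_index, signer_secret):
    n = len(public_keys)
    e, s = [0] * n, [0] * n
    k = secrets.randbelow(ORDER - 1) + 1
    i = (signer_index + 1) % n
    e[i] = challenge(message, pow(G, k, P))
    # Walk around the ring, faking a response for every other member.
    while i != signer_index:
        s[i] = secrets.randbelow(ORDER)
        point = pow(G, s[i], P) * pow(public_keys[i], e[i], P) % P
        e[(i + 1) % n] = challenge(message, point)
        i = (i + 1) % n
    # Only the real secret key can close the loop.
    s[signer_index] = (k - signer_secret * e[signer_index]) % ORDER
    return e[0], s

def ring_verify(message, public_keys, signature):
    e0, s = signature
    e = e0
    for y, s_i in zip(public_keys, s):
        e = challenge(message, pow(G, s_i, P) * pow(y, e, P) % P)
    return e == e0  # the chain of challenges must return to its start

keys = [keygen() for _ in range(4)]
ring = [public for _, public in keys]
signature = ring_sign(b"pay 5 coins", ring, 2, keys[2][0])
print(ring_verify(b"pay 5 coins", ring, signature))
```

Nothing in the returned signature marks which index belonged to the signer; a verifier learns only that one of the ring's secret keys closed the loop.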

But other observations could increase an attacker’s chance of guessing. For example, looking at previous transactions from each of the addresses might reveal times of day when transactions were more likely to occur or other data that could be used in a transaction graph analysis.

Wilcox, who worked with Gavigan on the report, calls this a strategy of “hiding in a crowd,” and argues that its success depends on how big the crowd is and how random the people in it are. (Monero, it should be noted, has a fierce rivalry with his project, Zcash.)

Monero selects decoy addresses with a method called triangular distribution, which favors coins that have been used frequently in recent transactions and are therefore more likely to look authentic than addresses where coins have been sitting idle.
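
A rough feel for this kind of sampling can be had with Python's built-in random.triangular function. The sketch below is an assumption-laden illustration, not Monero's selection code: outputs are simply numbered from oldest (0) to newest, and the function name pick_decoys and its parameters are invented for the example.

```python
import random

def pick_decoys(num_outputs, real_index, ring_size, rng=random):
    """Return sorted ring member indices: the real output plus decoys.

    Outputs are indexed 0 (oldest) to num_outputs - 1 (newest). Decoys are
    drawn from a triangular distribution that peaks at the newest output.
    """
    if ring_size > num_outputs:
        raise ValueError("not enough outputs to fill the ring")
    decoys = set()
    while len(decoys) < ring_size - 1:
        # Mode equals the upper bound, so recent outputs are the most likely.
        candidate = int(rng.triangular(0, num_outputs, num_outputs))
        candidate = min(candidate, num_outputs - 1)
        if candidate != real_index:
            decoys.add(candidate)
    return sorted(decoys | {real_index})

ring_members = pick_decoys(num_outputs=1000, real_index=10, ring_size=5)
print(ring_members)
```

Run repeatedly, the decoy indices cluster toward the high end of the range, which is the "recent outputs" bias the paragraph above describes.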

"Every ring signature includes at least one very recent output, either because a user wants to spend a recent output, or because they don’t but we always want a recent output in," said Riccardo Spagni, the lead maintainer of the Monero source code. "There’s no way of knowing, which means that analysis is back to assuming every output in the ring is a candidate."

Pedersen commitments

Pedersen commitments are the key ingredient in bitcoin core developer Greg Maxwell’s Confidential Transactions and are a planned addition to Monero.

The technique allows a sender to commit to a transaction amount without revealing it to the general public by broadcasting it on a blockchain as a hash. The user can then reveal the amount to the recipient, or anyone else who might need to know it (such as regulators), by reproducing the hash stored on the blockchain as proof.

A Pedersen commitment is also transferable, such that the person who receives it can spend it again elsewhere without unmasking the amount. This is because the hashes chosen are homomorphic, meaning that you can run simple arithmetic functions on them, like subtraction and addition, without decrypting the data. This feature enables miners and validating nodes to check that the inputs and outputs of a transaction cancel each other out and that no one is spending coins they don’t own.
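
The arithmetic behind that balance check can be shown with a toy example in Python. Everything here is a simplified assumption: the modulus P and the bases G and H are small illustrative values, real systems use elliptic curves, and the function names are invented. The sketch also ignores the extra proofs a production system would likely need to rule out nonsensical amounts.

```python
import secrets

P = 2**127 - 1  # toy prime modulus (assumption; insecure in practice)
G = 3           # toy base that encodes the amount
H = 7           # toy base for the blinding factor; in a real system nobody
                # may know how H relates to G
ORDER = P - 1   # exponents can be reduced modulo P - 1

def commit(amount: int, blinding: int) -> int:
    """Hide an amount behind a random blinding factor."""
    return pow(G, amount, P) * pow(H, blinding, P) % P

def random_blinding() -> int:
    return secrets.randbelow(ORDER)

# A sender splits a hidden input of 12 into hidden outputs of 5 and 7.
r_out1, r_out2 = random_blinding(), random_blinding()
r_in = (r_out1 + r_out2) % ORDER  # blinding factors must balance too

c_in = commit(12, r_in)
c_out1 = commit(5, r_out1)
c_out2 = commit(7, r_out2)

# A validator sees only the three commitments, never 12, 5 or 7.
balanced = c_in == c_out1 * c_out2 % P
print(balanced)

# Selective disclosure: the sender hands (amount, blinding) to an auditor,
# who recomputes the commitment and compares it with the published one.
auditor_accepts = commit(5, r_out1) == c_out1
print(auditor_accepts)
```

Multiplying two commitments produces a commitment to the sum of the hidden amounts, which is why a validator can confirm that inputs equal outputs without learning any of the figures.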

Zero-knowledge proofs

Zero-knowledge proofs are the key feature of Zcash. The remarkable feature of a zero-knowledge proof is that you can use it to prove statements about a set of data without having to reveal the content of the data. In the case of Zcash, zero-knowledge proofs are used to cryptographically perform validation on encrypted transaction data such that the sender and amount sent can be proven to be legitimate even as they remain private.
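
The flavor of such a proof can be conveyed with a much simpler relative of the proofs Zcash uses. The Python sketch below shows a Schnorr-style proof that someone knows a secret number behind a public value, without revealing the secret. It is an illustration only: the toy modulus P, the base G and the function names are assumptions, and Zcash's own proofs are a different and far more elaborate construction.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption; insecure in practice)
G = 3            # toy base
ORDER = P - 1    # exponents can be reduced modulo P - 1

def challenge(*values: int) -> int:
    """Derive the verifier's challenge by hashing the public values."""
    data = b"".join(v.to_bytes(16, "big") for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER

def prove(secret: int):
    """Prove knowledge of `secret` such that public == G**secret mod P."""
    public = pow(G, secret, P)
    k = secrets.randbelow(ORDER)         # one-time random mask
    t = pow(G, k, P)                     # commitment to the mask
    c = challenge(public, t)
    s = (k + c * secret) % ORDER         # response; the mask hides the secret
    return public, (t, s)

def verify(public: int, proof) -> bool:
    t, s = proof
    c = challenge(public, t)
    return pow(G, s, P) == t * pow(public, c, P) % P

secret = secrets.randbelow(ORDER - 1) + 1
public, proof = prove(secret)
print(verify(public, proof))
```

The verifier is convinced that the prover holds the secret, yet the proof itself consists only of two numbers that are blinded by a fresh random mask.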

The authors of the R3-commissioned study, two of whom founded and continue to run the company that oversees Zcash development, note that slowness is one downside of zero-knowledge proofs. The computations take about 48 seconds to complete. Until performance improves, zero-knowledge proofs might not be suitable for applications such as high-throughput trading, which demand fast results.

There’s one more big problem with zero-knowledge proofs. In order to implement them in a cryptocurrency like Zcash, developers have to brew up some other cryptographic elements, called parameters, and inject them into the system. The process yields a dangerous byproduct, a private key, which can be used to create counterfeit coins. Anyone using Zcash must trust that the parameters were created and the counterfeit-enabling key was dutifully destroyed — and for hardcore decentralists, that was a dealbreaker, despite the elaborate steps taken to mitigate the problem.

The Zcash team used a decentralized parameter generation procedure that was designed to ensure that a full copy of the private key never came into existence. To that end, they created six separate key fragments that were isolated at stations spread around the world. The participants then used their key shards to collaborate in a round robin of computations that resulted in a complete set of parameters but which never required the individual key shards to be shared with the group. At the end of the process, each station destroyed its own part of the key. In order to compromise this process, an attacker would have to get all six key shards and put them together.
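
The round-robin idea can be illustrated with a heavily simplified Python sketch. It is not the Zcash procedure: the Station class, the toy modulus P, the base G and the single exponentiation step are all assumptions made to show how a shared result can depend on every shard while no shard is ever disclosed.

```python
import secrets

P = 2**127 - 1     # toy prime modulus (assumption; insecure in practice)
G = 3              # toy base
ORDER = P - 1      # exponents can be reduced modulo P - 1
NUM_STATIONS = 6   # the article describes six key fragments

class Station:
    """One isolated participant holding a private key shard."""

    def __init__(self):
        self._shard = secrets.randbelow(ORDER - 2) + 2

    def contribute(self, running_value: int) -> int:
        # Fold the shard into the running value without revealing the shard.
        return pow(running_value, self._shard, P)

    def destroy_shard(self):
        self._shard = None

def run_ceremony(stations) -> int:
    """Pass a value from station to station; each applies its own shard."""
    value = G
    for station in stations:
        value = station.contribute(value)
    return value

stations = [Station() for _ in range(NUM_STATIONS)]
public_parameter = run_ceremony(stations)

# Afterwards every station wipes its shard. The combined secret (here, the
# product of all six shards) was never assembled in any one place.
for station in stations:
    station.destroy_shard()

print(public_parameter)
```

In this toy version, reconstructing the combined secret would require all six shards, so the result stays safe as long as at least one station really destroys its piece.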

Stealth addresses

As originally designed, bitcoin requires people who want to receive the currency to communicate an address to the sender. Stealth addresses reverse this process. The sender can instead create an address and fill it with a transaction. Even though the address is new, the sender is certain that the receiver has a corresponding key that will open it.

Stealth addresses provide a way for both parties to a transaction to agree on a destination address without broadcasting the information to everyone else in the system and without sharing other addresses that might be under the recipient’s control.
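
One common way to achieve this is with a key exchange between sender and receiver. The Python sketch below is a simplified illustration of that approach, not the scheme used by any particular currency: the toy modulus P, the base G and the function names are assumptions, and real implementations work on elliptic curves.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption; insecure in practice)
G = 3            # toy base
ORDER = P - 1    # exponents can be reduced modulo P - 1

def hash_to_exponent(value: int) -> int:
    digest = hashlib.sha256(value.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % ORDER

# The receiver publishes one long-term public key, once.
receiver_secret = secrets.randbelow(ORDER - 1) + 1
receiver_public = pow(G, receiver_secret, P)

def sender_create_address(receiver_public: int):
    """Sender derives a fresh one-time address the receiver never announced."""
    r = secrets.randbelow(ORDER - 1) + 1          # ephemeral secret
    ephemeral_public = pow(G, r, P)               # published with the payment
    shared = hash_to_exponent(pow(receiver_public, r, P))
    one_time_address = pow(G, shared, P) * receiver_public % P
    return ephemeral_public, one_time_address

def receiver_recover_key(receiver_secret: int, ephemeral_public: int) -> int:
    """Receiver rebuilds the private key that controls the one-time address."""
    shared = hash_to_exponent(pow(ephemeral_public, receiver_secret, P))
    return (shared + receiver_secret) % ORDER

ephemeral_public, address = sender_create_address(receiver_public)
spend_key = receiver_recover_key(receiver_secret, ephemeral_public)
print(pow(G, spend_key, P) == address)
```

Each payment lands at a different address, so an observer cannot link two payments to the same recipient, yet only the recipient can compute the key that spends them.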

Mix and match

Each of these technologies solves a fraction of the puzzle, but we are beginning to see how they can be stitched together to deliver more complete privacy.

“A lot of the implementations take multiple different technologies and combine them together to achieve an additive result. It may well be that a combination of different technologies emerges as being a better solution,” Gavigan said.

At the same time, he said, there is no way to generalize the needs of companies building on blockchain technology, as each one is trying to achieve something completely different.

“In some respects, blockchain technology is a bit like the three blind men describing an elephant,” Gavigan said. “One person feels the leg and it’s a tree trunk. Another person feels the side of the elephant and it’s a wall. And a third person feels the trunk and it’s a snake.”