bfdb_tree_colorResearchers have been studying searchable encryption for years, looking for the breakthrough that would make it practical enough to be widely adopted. It is well understood that the ability to search encrypted data without decrypting it and without the keys being present would be revolutionary for data security. Unfortunately, previous methods simply haven't been fast enough to be practical and/or leaked too much information about the encrypted data, allowing attackers to learn things they shouldn't.

Fully Homomorphic Encryption

Craig Gentry published a paper in 2009 about something called fully homomorphic encryption, allowing arbitrary operations to be performed on encrypted data without decrypting it. Unfortunately, this technique was orders of magnitude too slow to be practical. Since then speed improvements have been announced every couple of years and some companies are even claiming a few specialized use cases it could be fast enough for. We see a bright future for fully homomorphic encryption in secure computation, but don't believe it will ever be a practical answer for searchable encryption. Search performance will always be dependent on efficient indexing.

Deterministic Encryption

Another method of searchable encryption that is really simple is called deterministic encryption. Simply take a key/value pair and encrypt the key field using something like AES-256 in electronic code book (ECB) mode. Then store in a map like a NoSQL database. To query, simply encrypt the key the same way again and look up the associated value. Although this method is much more secure than leaving your NoSQL data in plaintext (AES-256 can't currently be broken), researchers have shown that with a-priori knowledge of the data distribution they can infer significant information about the encrypted data. Thus, the mantra in cryptography is that deterministic encryption is not secure. Many firms have policies against using AES-256 in ECB mode for this very reason, even though it is computationally intractable to break. In other words, you may not use deterministic encryption in the key field in a NoSQL database because an attacker could steal your encrypted data and run a statistical analysis and infer information from your encrypted data. Instead, you must leave it in plaintext. This is quite amazing and illustrates a substantial disconnect between academia and industry. "Secure" is simply not binary.

Leaking Ordering Information

There is another important problem with deterministic encryption, as simple key/value lookups have limited usefulness. How about range query or even temporal or spatial queries? Using deterministic encryption for indexing and searching sorted/ordered data is even less secure that the key/value case because it leaks ordering data. This makes it trivial to infer the range of a given value. At least one company we've seen is marketing searchable encryption that is equivalent to a simple substitution cipher. These approaches are really insecure, but much better than nothing.

Theoretical Limits

Deterministic encryption is not considered secure because the same value is always encrypted to the same ciphertext every time. This makes it vulnerable to statistical analysis. To be really clear, none of this means that a cipher like AES-256 can be cracked! We are only talking about information leakage. The ultimate answer to the problem of information leakage is adding randomness to the encryption so that the encryption is different every time. Several AES-256 modes allow the use of a random initialization vector for this purpose. If we do that, how can we possibly store encrypted data and then create a query that can find that encrypted data if we don't know (and can't know) the randomness that was added? It feels like there are hard theoretical limits or trade-offs between search and security in terms of information leakage. Cryptographers use concepts such as semantic security and ciphertext indistinguishability under chosen-plaintext attack (IND-CPA) to understand encryption systems with respect to these limits or trade-offs.

Conclusion

So this is the state of the art? Fully homomorphic encryption that is likely always going to be impractical for search? Deterministic encryption which is not secure (but much better than nothing)? Oblivous RAMs and other techniques that aren't very fast? Is there an approach that is fast enough, secure enough, and useful enough to be widely adopted, dramatically improving information security? We believe we have that approach...