In the information age, everything about everyone is just one click away. Though we may not always readily admit it, we have all Googled ourselves – checking our online presence to ensure that we are putting only our most marketable selves out into the world, and keeping the rest for our eyes only.
But do the metaphorical walls we put up work? And what about the outside enterprises that we trust with our most sensitive files?
Consider the following use cases: a local healthcare service provider wants to store medical records and guarantee the privacy of individual patient records, and a security firm wants to maintain a database to store information about a user’s daily whereabouts in his or her home.
While the former may use the stored information to come up with a personalised treatment plan, the latter may use the data to anticipate a user’s heating and lighting preferences.
Both circumstances are examples of statistical databases that are routinely queried by practitioners and third-party analysis firms. The databases ought to be secure enough that a notion of indistinguishability between users is maintained, thereby preserving privacy.
By standard definition, a privacy-preserving statistical database enables enterprises to learn the properties of a population without compromising the privacy of any individual. If the database is a representative sample of an underlying population, the goal is to let users learn properties of the population as a whole while protecting the privacy of the individuals in the sample.
But, specifically, how is data ‘privatised’? Information is stored in databases, so guaranteeing the database’s privacy is paramount.
Various methods are in common practice, from simple fixes like removing columns containing PII (personally identifiable information) to more advanced treatments. But all are vulnerable to attack in one way or another. The question is not if but when an attack will occur.
However, in the past few years, a new method has emerged that shows promise and is surviving scrutiny: differential privacy.
Unlike earlier methods, differential privacy operates from a solid mathematical foundation, making it possible to provide strong theoretical guarantees on the privacy and utility of released data.
In theoretical cryptography, it is well known that auxiliary information (i.e. information available to an adversary beyond access to the statistical database itself) can enable an eavesdropper to deduce information in the database.
Enter differential privacy, which says that the risk to one’s privacy should not greatly increase as a result of participating in a statistical database. In other words, an adversary should not be able to glean information about any individual in a database that it could not learn had the participant opted out. If one can guarantee this with some confidence, then there is a low risk of an individual’s data being compromised from the database.
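This intuition has a standard formalisation, due to the theoretical literature: a randomised mechanism $M$ is $\varepsilon$-differentially private if, for any two databases $D$ and $D'$ that differ in a single individual's record, and any set $S$ of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].$$

A small $\varepsilon$ means the output distributions with and without any one person are nearly indistinguishable, so an adversary learns almost nothing extra from that person's participation.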
Differential privacy is the only approach thus far that provides provable bounds on how much privacy users can lose. This is the essential component, as past techniques, such as the routinely used anonymisation trick of hashing identifiers, proved incapable of coming anywhere close to guaranteeing privacy.
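Unlike hashing, which merely disguises identifiers, a differentially private release adds carefully calibrated random noise to query answers. As a minimal illustrative sketch (not any particular production system), here is the classic Laplace mechanism applied to a counting query, where adding or removing one person changes the true answer by at most 1:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-transform sampling of a uniform variate."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Answer 'how many records satisfy predicate?' with
    epsilon-differential privacy. A counting query has
    sensitivity 1, so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: how many patients are over 40?
ages = [23, 45, 67, 31, 52, 48, 29, 60]
noisy_answer = private_count(ages, lambda a: a > 40, epsilon=0.5)
```

The smaller the chosen epsilon, the larger the noise and the stronger the privacy guarantee – the price of privacy is paid in accuracy.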
With current methods considered, the open question remains: how private is private enough? Recognising the importance of differential privacy, Microsoft released PINQ, a declarative query language and execution platform that enables privacy-integrated queries and satisfies differential privacy.
Similarly, MapReduce frameworks, such as Airavat, provide strong privacy guarantees, primarily by integrating mandatory access control with differential privacy so that MapReduce computations can adhere to stringent privacy requirements without the need to audit untrusted code.
Such frameworks have huge potential: to a data curator, the ability to guarantee privacy to database participants without additional tweaking of source code is priceless. It would not be surprising to see a plethora of start-ups in Silicon Valley jumping on the bandwagon.
Despite this promise, differential privacy is currently used sparingly in practice and remains largely confined to theoretical settings. The concept is still in its infancy, and many practical details remain to be fleshed out.