Sunday, 20 December 2009

Can Anonymisation Still Work?

The concept of anonymising data is a simple one to grasp. Real data is taken and a process is applied to it that removes or obscures any part deemed too sensitive for release; the result is anonymised data. Most commonly this is done to create test data, so that development teams can work with data similar to their live environments without the security constraints applied to live systems. Sometimes anonymised data is released, either to academic groups or to the public at large. As Paul Ohm has pointed out in an article on the Social Science Research Network and in discussion on his blog, anonymising data from the internet raises major complexity problems.
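As a rough illustration of the process described above, here is a minimal Python sketch. The field names, the salt and the choice of a truncated SHA-256 digest as a pseudonym are all assumptions made for the example, not a recommended scheme.

```python
import hashlib

# Hypothetical field names, chosen purely for illustration.
DROP_FIELDS = {"name", "email"}          # removed outright
PSEUDONYMISE_FIELDS = {"customer_id"}    # obscured, but still consistent per person


def anonymise(record, salt):
    """Return a copy of `record` with sensitive fields removed or obscured."""
    result = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        if key in PSEUDONYMISE_FIELDS:
            # Salted hash, truncated to 12 hex characters to act as a pseudonym.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            result[key] = digest[:12]
        else:
            result[key] = value
    return result


live_row = {
    "customer_id": 1042,
    "name": "Jane Example",
    "email": "jane@example.com",
    "postcode": "AB1 2CD",
    "order_total": 59.99,
}
test_row = anonymise(live_row, salt="per-dataset-secret")
```

Note that the postcode and order total pass through untouched. Fields like these, left in because they did not look sensitive on their own, are exactly what can later be combined with other data.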

Data can be combined in unanticipated ways to generate new information. If anonymised travel data were released (for example from Google Latitude), a fairly simple algorithm could work out when houses will be empty. Given suitable incentives, additional data can be sourced even if it isn't currently available, as the DARPA Red Balloon challenge showed.
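To show how simple such an algorithm could be, here is a sketch in Python. The record layout (a pseudonym, a weekday, and the hours at which the traveller left and returned home on the same day) is invented for the example and is not the format of any real service.

```python
# Hypothetical anonymised journey records:
# (pseudonym, weekday, hour_left_home, hour_returned_home)
journeys = [
    ("u1", "Mon", 8, 18),
    ("u1", "Mon", 9, 18),
    ("u1", "Tue", 8, 17),
    ("u2", "Mon", 7, 19),
]


def usual_absences(journeys):
    """Map (pseudonym, weekday) to the hours the traveller was away every time."""
    away = {}
    for pseudonym, weekday, left, returned in journeys:
        hours = set(range(left, returned))  # hours spent away on this journey
        key = (pseudonym, weekday)
        # Keep only the hours that are common to every observation so far.
        away[key] = away[key] & hours if key in away else hours
    # The intersection of contiguous ranges is contiguous, so report (start, end).
    return {key: (min(h), max(h) + 1) for key, h in away.items() if h}


absences = usual_absences(journeys)
```

No names appear anywhere in the input. Once a pseudonym's home location is known or guessed, though, the output is effectively a timetable of when that home is unoccupied.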

All of this sidesteps the fact that some companies own huge quantities of non-anonymised data. This is a problem on two fronts. Firstly, as an excellent article on The Register points out, personal data is a valuable resource which companies will do their best to monetise. Secondly, the storage of personal data is a problem in itself, as stored data is eventually released to the wider world, either by accident or through intentional sharing. Anonymisation's days may be numbered.
