Chapter 4. Informational Privacy in EU Law: Challenges in Data Protection and Privacy
A. Challenges to EU Data Protection Law
I. The Limits of Anonymisation
2. The status of pseudonymised data
The limitations of anonymisation are problematic because they dispel the notion that the protection of personal data and the extraction of value from data can co-exist without conflict. The goal of anonymisation was to create data sets which could be used and mined free of protective restrictions and limitations437. This idea of “release-and-forget anonymization”, in which data is anonymised and then released into the wild with no further oversight, cannot be fully effective, as data can often be re-identified, and there is no way to know with certainty whether re-identification is possible438.
433 Barbaro, M. & Zeller, T. (2006), A Face Is Exposed for AOL Searcher No. 4417749, N.Y. TIMES, Aug. 9, 2006, at A1.
434 Narayanan, A. & Shmatikov, V. (2008), Robust De-Anonymization of Large Sparse Datasets (2008 IEEE Symp. on Sec. and Privacy 111, Feb. 5, 2008), available at http://www.cs.utexas.edu/~shmat/shmatoak08netflix.pdf.
435 Ibid
436 Sweeney, L. (2000), Simple Demographics Often Identify People Uniquely, Carnegie Mellon Univ., Sch. of Computer Sci., Data Privacy Lab., Working Paper No. 3, 2000.
If anonymisation fails, then data sets used in certain sectors, including healthcare, cannot be used to create useful information, as the data is too sensitive to justify the risk of re-identification. As long as data created by individuals is involved, identification will be a risk: the only way to be perfectly sure that the data will be protected is not to collect it at all. This means that either an alternative to anonymisation needs to be found (another process which can protect data while allowing it to be used to create value), or a balance needs to be struck between the right to data protection and the responsibilities imposed on data controllers.
An alternative has been proposed, taking into account the fact that the main factor in re-identification is the combination of “anonymised” data with other data439. This alternative, “pseudonymised data”, has been proposed primarily in fields where studying data is necessary but where the data is also very sensitive, in particular healthcare440. Pseudonymised data is data which, on its own, is anonymous (personal identifiers having been removed) but which may be re-identified if it is linked to other pieces of data that are kept separately. It is fitting that this concept is prominent in the medical field, as it is akin to isolating a patient in a germ-free room: the data is safe and protected as long as it is not “infected” by “foreign agents”, that is, other pieces of data441. Recital 26 of the GDPR describes it in these terms: “Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.”442
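The mechanism described above can be illustrated with a minimal sketch. It is not drawn from the GDPR or from any cited source: the function names, the field names and the sample records are hypothetical, and the use of a random token as the pseudonym is an assumption made for illustration only. The sketch removes the direct identifiers from each record and stores them in a separate key table, which plays the role of the “additional information” referred to in Recital 26.

```python
import secrets


def pseudonymise(records, identifier_fields=("name",)):
    """Split records into a pseudonymised data set and a separate key table.

    Hypothetical helper for illustration; not a legally sufficient procedure.
    """
    pseudonymised = []
    key_table = {}  # the "additional information"; meant to be stored apart from the data
    for record in records:
        pseudonym = secrets.token_hex(8)  # random token, carries no information about the person
        key_table[pseudonym] = {field: record[field] for field in identifier_fields}
        stripped = {k: v for k, v in record.items() if k not in identifier_fields}
        stripped["pseudonym"] = pseudonym
        pseudonymised.append(stripped)
    return pseudonymised, key_table


def re_identify(pseudonymised_record, key_table):
    """Re-attach the direct identifiers; only possible for the holder of the key table."""
    return {**key_table[pseudonymised_record["pseudonym"]], **pseudonymised_record}


# Fictional sample data.
patients = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
]
data, keys = pseudonymise(patients)
```

On its own, `data` contains no names. Anyone who also holds `keys` can restore them with `re_identify`, which is why the pseudonymised set remains information on an identifiable natural person.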
The ICO’s “Code of Practice on Anonymisation”443 makes a distinction between data aggregation exercises which result in non-individualised data (and which is as such anonymised under the DPD’s definition) and processes which remove certain identifiers from person-specific data but leave individual-level information (carrying higher risks)444. The latter include pseudonymisation, which is defined as “distinguishing individuals in a dataset by using a unique identifier which does not reveal their ‘real world’ identity.”445 What the ICO envisions as a means to turn pseudonymised data into anonymised data is not specified.446
437 Ohm (n.429)
438 Stalla-Bourdillon, S., and Knight, A. (2016), Anonymous data v. Personal data–A false debate: An EU perspective on anonymisation, pseudonymisation and personal data. Wis. Int’l LJ.
439 Ohm (n.429)
440 Ibid
441 Tsakalakis, N., Stalla-Bourdillon, S., & O'Hara, K. (2016). What's in a name: the conflicting views of pseudonymisation under eIDAS and the General Data Protection Regulation.
442 Recital 26, GDPR
443 Information Commissioner’s Office, Code of Practice on Anonymisation: Managing Data Protection Risk, (2012).
The General Data Protection Regulation gives some leeway to pseudonymised data in order to incentivise the practice. Nevertheless, as established, pseudonymisation is not enough to exempt a data processing operation from the GDPR: pseudonymised data is often still personal data447.
This shows that European regulators are aware of the evolution of the data protection landscape, and that the binary approach of the Data Protection Directive is gaining some shades of grey, with the “means likely to be used” emerging as a prominent criterion for deciding whether protection is sufficient.
The limitation of pseudonymised data is that it is only protected as long as it stays isolated from “contamination” by other data. As such, pseudonymisation is of limited use, especially in the wider commercial online context, where data is frequently sold and combined with other data without the protective apparatus found in fields like healthcare.
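The “contamination” described above can be sketched as a simple linkage exercise. The example is illustrative only: the data sets, the field names and the choice of postcode, birth year and sex as linking attributes are all assumptions, and real re-identification studies use far larger and noisier data. The sketch matches each pseudonymised record against an external register and treats a record as re-identified when exactly one person in the register shares its attributes.

```python
def link(pseudonymised, external, quasi_identifiers):
    """Return {pseudonym: name} for records matching exactly one external entry.

    Hypothetical helper for illustration only.
    """
    # Index the external register by the combination of linking attributes.
    index = {}
    for person in external:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for record in pseudonymised:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match singles out one individual
            matches[record["pseudonym"]] = candidates[0]
    return matches


# Fictional sample data.
health_data = [
    {"pseudonym": "p1", "postcode": "AB1 2CD", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"pseudonym": "p2", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1980, "sex": "F"},
    {"name": "Carol Example", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M"},
    {"name": "Dan Example", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M"},
]
re_identified = link(health_data, public_register, ("postcode", "birth_year", "sex"))
```

In this fictional case the record `p1` is linked to a single name, while `p2` remains ambiguous because two people in the register share its attributes. The protection of any given record therefore depends on what other data happens to be available, which the controller cannot fully know or control.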
In conclusion, anonymisation has not been successful in creating “privacy-free” data. Data can never be truly free of the possibility of endangering informational privacy, whatever techniques are used, because unexpected information can always be derived from it. Pseudonymisation only confirms this: its protection, while promising, is threatened by contact with external data. As long as complete non-identification is the requirement, anonymisation will fail to achieve it.