The status of pseudonymised data - The Limits of Anonymisation

Chapter 4. Informational Privacy in EU Law: Challenges in Data Protection and Privacy

A. Challenges to EU Data Protection Law

I. The Limits of Anonymisation

2. The status of pseudonymised data

The limitations of anonymisation are problematic because they dispel the conception that personal data and data value can co-exist without conflict. The goal of anonymisation was to create sets of data which could be left to be used and mined free of protective restrictions and limitations.437 This idea of “release-and-forget anonymization”, in which data is anonymised and then released into the wild with no further oversight, cannot be fully effective, as data can often be re-identified and there is no way to know with certainty whether re-identification is possible.438

If anonymisation fails, then data sets used in certain sectors, including healthcare, cannot be used to create useful information, because the data is too sensitive to justify the risk of re-identification. As long as data created by individuals is involved, identification will remain a risk: the only way to be certain that the data is protected is not to collect it at all. This means that either an alternative to anonymisation needs to be found (another process which can protect data while still allowing it to be used to create value), or a balance needs to be struck between the right to data protection and the responsibilities imposed on data controllers.

An alternative has been proposed, taking into account the fact that the main factor in re-identification is the combination of “anonymized” data with other data.439 This alternative, “pseudonymised data”, has been proposed primarily in fields where studying data is necessary but where the data is also very sensitive, in particular healthcare.440 Pseudonymised data is data which, on its own, is anonymised (by removing personal identifiers) but which may be re-identified if it is linked to other pieces of data, kept separately. It is fitting that this concept is prominent in the medical field, as it is akin to isolating a patient in a germ-free room: the data is safe and protected as long as it is not “infected” by “foreign agents”, that is, other pieces of data.441 Recital 26 of the GDPR describes it in these terms: “Personal data which has undergone pseudonymisation, which could be attributed to a natural person by the use of additional information, should be considered as information on an identifiable natural person.”442
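By way of illustration only, the separation described above can be sketched in a few lines of Python. This is a minimal sketch, not a method prescribed by the GDPR or the ICO: the function names `pseudonymise` and `reidentify`, the field names and the sample patients are all hypothetical. It assumes that direct identifiers are replaced by a random pseudonym and that the table linking pseudonyms to identities is held separately from the research data.

```python
import secrets


def pseudonymise(records, identifier_fields):
    """Split records into pseudonymised rows and a separate key table.

    Hypothetical helper: the key table is the "additional information"
    that must be stored apart from the pseudonymised rows.
    """
    pseudonymised = []
    key_table = {}  # pseudonym -> the direct identifiers that were removed
    for record in records:
        pseudonym = secrets.token_hex(8)  # random, carries no meaning
        key_table[pseudonym] = {f: record[f] for f in identifier_fields}
        row = {k: v for k, v in record.items() if k not in identifier_fields}
        row["pseudonym"] = pseudonym
        pseudonymised.append(row)
    return pseudonymised, key_table


def reidentify(row, key_table):
    """Re-attach identifiers; only possible for whoever holds the key table."""
    rest = {k: v for k, v in row.items() if k != "pseudonym"}
    return {**key_table[row["pseudonym"]], **rest}


# Invented sample data, for illustration only.
patients = [
    {"name": "Alice Example", "patient_id": "0001", "diagnosis": "asthma"},
    {"name": "Bob Example", "patient_id": "0002", "diagnosis": "diabetes"},
]

research_data, key_table = pseudonymise(patients, ["name", "patient_id"])
restored = reidentify(research_data[0], key_table)
```

In this sketch `research_data` contains no names or patient numbers, yet anyone who also holds `key_table` can restore the original record, which is why such data remains information on an identifiable person.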

The ICO’s “Code of Practice on Anonymisation”443 makes a distinction between data aggregation exercises which result in non-individualised data (and as such anonymised under the definition in the Data Protection Directive) and processes which remove certain identifiers from person-specific data but leave individual-level information (carrying higher risks).444 The latter includes pseudonymised data, which is defined as “distinguishing individuals in a dataset by using a unique identifier which does not reveal their ‘real world’ identity.”445 What the ICO envisions as a means to turn this pseudonymised data into anonymised data is not specified.446

437 Ohm (n.429).

438 Stalla-Bourdillon, S. and Knight, A. (2016), Anonymous data v. Personal data – A false debate: An EU perspective on anonymisation, pseudonymisation and personal data, Wis. Int’l L.J.

439 Ohm (n.429).

440 Ibid.

441 Tsakalakis, N., Stalla-Bourdillon, S. and O'Hara, K. (2016), What's in a name: the conflicting views of pseudonymisation under eIDAS and the General Data Protection Regulation.

442 Recital 26, GDPR.

443 Information Commissioner’s Office, Code of Practice on Anonymisation: Managing Data Protection Risk (2012).

The General Data Protection Regulation gives some leeway to pseudonymised data, in order to incentivise the practice. Nevertheless, as established, pseudonymisation is not enough to exempt a data processing operation from the GDPR: pseudonymised data is still often personal data.447

This shows that European regulators are aware of the evolution of the data protection landscape, and that the binary approach of the Data Protection Directive is gaining some shades of grey, with the “means likely to be used” emerging as a prominent criterion for deciding whether protection is sufficient.

The limitation of pseudonymised data is that it only protects data as long as it stays isolated from “contamination” by other data. As such, pseudonymisation is of limited use, especially in the wider commercial online context, where data is frequently sold and combined with other data without the protective apparatus inherent in fields like healthcare.
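The “contamination” risk can be made concrete with a small linkage sketch in Python. It is offered only as an illustration under stated assumptions: the function `link`, the choice of postcode, birth year and sex as quasi-identifiers, and both sample datasets are hypothetical and are not taken from any of the studies cited. The sketch assumes that a pseudonymised dataset retains some indirect attributes and that an external dataset lists the same attributes alongside names.

```python
def link(pseudonymised, external, quasi_identifiers):
    """Return pseudonym -> name for rows matching exactly one external person.

    Hypothetical helper: joins the two datasets on shared indirect attributes.
    """
    index = {}
    for person in external:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for row in pseudonymised:
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique combination singles the person out
            matches[row["pseudonym"]] = candidates[0]
    return matches


# Invented sample data, for illustration only.
pseudonymised = [
    {"pseudonym": "p1", "postcode": "AB1 2CD", "birth_year": 1980, "sex": "F",
     "diagnosis": "asthma"},
    {"pseudonym": "p2", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M",
     "diagnosis": "diabetes"},
]
external = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M"},
    {"name": "Carl Example", "postcode": "EF3 4GH", "birth_year": 1975, "sex": "M"},
]

matches = link(pseudonymised, external, ["postcode", "birth_year", "sex"])
```

Here the first record is re-identified without any access to the pseudonymisation key, simply because its combination of attributes is unique in the external data. The second record stays ambiguous only because two people happen to share the same attributes.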

In conclusion, anonymisation has not been successful in creating “privacy-free” data. Data can never be truly free of the possibility of endangering informational privacy, no matter what techniques are used, because unexpected information can always be derived from it. Pseudonymisation only reinforces this point: its protection, while promising, is threatened by contact with external data. As long as complete non-identification is the requirement, anonymisation will fail to achieve it.

3. The Risk-Based Approach to Identification

In document The information / guarantees balance - protecting informational privacy interests within the European data protection framework. (Page 119-121)