Chapter 3. Information in the Age of Big Data
B. Profiling and Group Profiling: A New Way of Creating Knowledge and Predicting
I. Defining Profiling
We will now examine a few of the distinctions drawn within profiling, in order to contrast earlier forms of profiling with the ways profiling has changed and now challenges traditional understandings of data. The GDPR defines profiling as “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person”.398 A more thorough definition was outlined in “Defining Profiling”399 along these lines: “Profiling is a technique to automatically process personal and non-personal data, aimed at developing predictive knowledge from the data in the form of constructing profiles that can subsequently be applied as a basis for decision-making. A profile is a set of correlated data that represents a (human or non-human, individual or group) subject. Constructing profiles is the process of discovering unexpected patterns between data in large data sets that can be used to create profiles. Applying profiles is the process of identifying and representing a specific subject or to identify a subject as a member of a specific group or category and taking some form of decision based on this identification and representation.”400 This definition provides further elements, which we will develop in order to assess the impact of profiling on informational privacy.
Organic profiling is profiling performed by non-human organisms, and it developed as a survival trait: an animal that recognises another creature as predator or prey from the information its senses provide is engaged in organic profiling. Human profiling differs because it is reflective: we can understand the conclusions we draw, and study and refine them401. Machine profiling, finally, is driven not by mechanisms shaped by evolutionary survival but by a man-made software architecture402. A vital difference between human and machine profiling therefore already appears in how decisions are made and in who is accountable for them: a human may understand and correct their biases, whereas a computer or an animal cannot.

398 Ferraris, V., Bosco, D., Cafiero, G., D'Angelo, E., & Suloyeva, Y. (2013, December 11). Defining Profiling. Available at SSRN: http://ssrn.com/abstract=2366564 or http://dx.doi.org/10.2139/ssrn.2366564
399 Ibid
400 Ibid
A second distinction concerns precisely this factor: the reasoning involved in the decision. Animals and machines perform automated profiling, that is, the aggregation and processing of data in which no decision is based on outside reflection403. Autonomic profiling, meanwhile, is a process involving human intervention and reflection, but only at a low level: the machine makes all of the decisions, and the human role is minimized. Finally, non-automatic profiling is profiling in which machines take no part in the decision-making process404. As this study is concerned with Big Data and its implications, it focuses on autonomic profiling.
A third important distinction is that between group profiling and personalised profiling. As mentioned above, profiling establishes correlations between pieces of data. Once the data has been mined and the correlations established, a set of assumptions is created. These assumptions, however, can concern two types of entity: an individual or a group. For example, if data correlates incomes with shopping patterns, one can derive either an assumption such as “Individuals with incomes of X or higher buy more of product Y”, or one such as “Particular individual A, with an income of X, buys a lot of product Y”. Both profiles, the group one and the individual one, can be built on the same data, yet they have different implications.
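The distinction can be illustrated with a small Python sketch. It assumes a hypothetical dataset of (person, yearly income, units of product Y bought) records and an arbitrary income threshold standing in for “X”; the record values and the function names `group_profile` and `individual_profile` are illustrative only and are not drawn from any actual profiling system. Both profiles are derived from the same records.

```python
from statistics import mean

# Hypothetical records: (person_id, yearly income, units of product Y bought).
RECORDS = [
    ("A", 85_000, 12),
    ("B", 90_000, 9),
    ("C", 30_000, 2),
    ("D", 40_000, 4),
    ("E", 75_000, 1),
]

# Stands in for "an income of X" in the text; the value is an assumption.
INCOME_THRESHOLD = 70_000


def group_profile(records, threshold):
    """Group profile: average purchases of Y above vs. below the income threshold."""
    high = [units for _, income, units in records if income >= threshold]
    low = [units for _, income, units in records if income < threshold]
    return {"high_income_avg": mean(high), "low_income_avg": mean(low)}


def individual_profile(records, person_id, threshold):
    """Individual profile: a statement about one named person, from the same data."""
    for pid, income, units in records:
        if pid == person_id:
            return {
                "person": pid,
                "high_income": income >= threshold,
                "units_of_Y": units,
            }
    raise KeyError(person_id)


if __name__ == "__main__":
    print(group_profile(RECORDS, INCOME_THRESHOLD))
    print(individual_profile(RECORDS, "A", INCOME_THRESHOLD))
```

In this toy data, person E falls into the high-income group yet buys almost none of product Y, so the group profile, if applied to E, would misdescribe them even though it is accurate on average.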
To understand the implications of group profiling, a distinction must be made between distributive and non-distributive profiles. A distributive profile identifies a group in which all members share all of the attributes correlated in the profile. Such profiles carry particular implications, because every member of the group is assumed to share the profiled characteristics, whereas groups of this kind are in fact rare405. For example, one could assume that all individuals in an extremely poor neighborhood are poor, apply a group profile to them, and later use it for further constructions such as targeted advertising. In practice, however, most profiles are non-distributive, which means they are probabilistic: each member of the group is likely, but not certain, to share the profiled characteristics406.

401 Ibid
402 Van der Hof, S., & Prins, C. (2008). Personalisation and its influence on identities, behaviour and social values. In Profiling the European Citizen (pp. 111-127). Springer, Dordrecht.
403 Ibid
404 Ibid
405 Ibid
This phenomenon is the source of many instances of data inaccuracy. On the one hand, because most profiles are non-distributive, blanket assumptions applied to every individual in a group will be wrong for at least some of them. On the other hand, data controllers will still want to use such profiles, even knowing that they will sometimes fail to apply. When the probabilities are miscalculated, or the underlying assumptions are themselves wrong, the consequences for informational privacy can be significant.
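A second sketch, under the same caveats, models the neighborhood example above as a hypothetical list of residents, each flagged as poor or not. It checks whether the group profile is distributive (every member shares the attribute), computes the share of members who actually match it, and lists the members that a blanket application of the profile would misdescribe. The `Member` type, the resident names and the function names are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    is_poor: bool  # the attribute the group profile associates with membership


def profile_share(members):
    """Fraction of group members who actually have the profiled attribute."""
    return sum(m.is_poor for m in members) / len(members)


def is_distributive(members):
    """A profile is distributive only if every member shares the attribute."""
    return all(m.is_poor for m in members)


def blanket_errors(members):
    """Members misdescribed when the profile is applied to everyone in the group."""
    return [m.name for m in members if not m.is_poor]


# Hypothetical residents of the neighborhood in the example.
NEIGHBORHOOD = [
    Member("resident_1", True),
    Member("resident_2", True),
    Member("resident_3", False),
    Member("resident_4", True),
]

if __name__ == "__main__":
    print(profile_share(NEIGHBORHOOD), is_distributive(NEIGHBORHOOD))
    print(blanket_errors(NEIGHBORHOOD))
```

With three of the four residents matching, the profile holds for a randomly chosen member with a probability of 0.75, so it is non-distributive; treating it as if it were distributive misdescribes resident_3.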
II. Profiling in the Law
Profiling was not addressed directly in the Data Protection Directive, mainly because the Directive was drafted in a context where profiling and its implications in conjunction with Big Data were not a major concern. The GDPR, however, now contains provisions specifically addressing it in Article 22407. Under this Article, data subjects do not necessarily have a right to avoid profiling as such, but rather a right not to be “subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”408 Recital 71 gives as examples the “automatic refusal of an online credit application or e-recruiting practices without any human intervention.”409

Article 4(4) of the GDPR completes the definition quoted at the start of this section by specifying the personal aspects concerned: profiling serves “in particular to analyse or predict aspects concerning that natural person's performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.”

These provisions have important implications. One concern about automated decision-making lies in the field of discrimination. Unthinking machines, meaning machines performing the automated profiling described above, use no reasoning to reach their conclusions; unless they are programmed to avoid discrimination, they may therefore breach the right to non-discrimination (protected, among other instruments, by Article 14 of the European Convention on Human Rights410). Specifically, GDPR Recital 71 states that data controllers should implement technical and organisational measures that prevent, “inter alia, discriminatory effects” on natural persons on the basis of sensitive data (“sensitive data” being data considered especially personal, such as racial or ethnic origin or religion)411.

406 Ibid
407 GDPR, Article 22
408 GDPR, Article 22(1)
409 GDPR, Recital 71
This focus on “sensitive data” in both Article 22 and Recital 71 is significant, and it can be interpreted in two ways. On a first reading, the extra protections apply only to sensitive data, which would involve carefully piecing together the data points that make up the processing and determining whether any of them are sensitive412. On a second reading, any data processing operation that contains some sensitive data requires this level of protection. Either way, the sensitive data being examined must be pinpointed, and this can be difficult: the larger the dataset, the more complex and harder to trace the correlations it produces, and the more likely it is both that some sensitive data is present and that this data is hard to identify. Paradoxically, then, the bigger the data, the harder it is to find specific information within it.
The ability to create profiles about individuals is a new tool for creating Information. All of this new Information has been assumed to increase and improve human knowledge, but that is not necessarily the case413. In fact, what distinguishes data from Information is the human element, and with it comes the potential for human error414.