Profiling and Group Profiling: A New Way of Creating Knowledge and Predicting

Chapter 3. Information in the Age of Big Data

B. Profiling and Group Profiling: A New Way of Creating Knowledge and Predicting

I. Defining Profiling

We will now study a few distinctions within profiling, in order to contrast earlier forms of profiling with the ways in which profiling has changed and now challenges traditional understandings of data. The GDPR defines profiling as “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person”.[398] A more thorough definition was outlined in “Defining Profiling”[399] along these lines: “Profiling is a technique to automatically process personal and non-personal data, aimed at developing predictive knowledge from the data in the form of constructing profiles that can subsequently be applied as a basis for decision-making. A profile is a set of correlated data that represents a (human or non-human, individual or group) subject. Constructing profiles is the process of discovering unexpected patterns between data in large data sets that can be used to create profiles. Applying profiles is the process of identifying and representing a specific subject or to identify a subject as a member of a specific group or category and taking some form of decision based on this identification and representation.”[400] This definition gives us further elements, which we will develop to assess the impact of profiling on informational privacy.

Organic profiling is profiling performed by non-human entities, developed as a survival trait. An animal that judges, from the input of its senses, whether another creature is predator or prey is performing organic profiling. Human profiling, by contrast, is reflective: we can understand the conclusions we draw, and study and refine them.[401] Finally, machine profiling rests not on mechanisms shaped by evolution and survival, but on the input of a man-made software architecture.[402] As we can already see, a vital difference between human and machine profiling lies in how decisions are made and in accountability for them: a human may understand and rectify their biases, whereas a computer or an animal cannot.

[398] Ferraris, V., Bosco, D., Cafiero, G., D'Angelo, E. and Suloyeva, Y., Defining Profiling (December 11, 2013). Available at SSRN: http://ssrn.com/abstract=2366564 or http://dx.doi.org/10.2139/ssrn.2366564

[399] Ibid.

[400] Ibid.

Another distinction concerns the reasoning involved in the decision. Animals and machines perform automated profiling: the aggregation and processing of data where no decisions are made on the basis of outside reflection.[403] Autonomic profiling, meanwhile, is a process in which there is human intervention and reflection, but at a low level: the machine makes the decisions, and the human role is minimized. Finally, non-automatic profiling is profiling in which machines are not involved in the decision-making process.[404] As our concern is Big Data and its implications, this study focuses on autonomic profiling.

Finally, an important distinction when it comes to profiling is the difference between group profiling and personalised profiling. As mentioned before, profiling creates a correlation between pieces of data. Once the data is mined and the correlations established, a set of assumptions is created. These assumptions, however, can be made about two types of entities: an individual or a group. For example, if one finds data correlating incomes with shopping patterns, one can create either a group assumption such as “Individuals with incomes of X or higher buy more of product Y”, or an individual assumption such as “Particular individual A with income of X buys a lot of product Y”. Both profiles - the group one and the individual one - can be built on the same data, but have different implications.

To understand the implications of group profiling, a distinction needs to be made between distributive and non-distributive profiles. A distributive profile identifies a group in which all members share all of the attributes correlated in the profile. Such profiles have particular implications, because every member of the group will be assumed to display the same characteristics, whereas in practice this is rarely the case.[405] For example, one could assume that all individuals in an extremely poor neighborhood are poor and apply a group profile to them, later using it for further purposes such as targeted advertising. In fact, most profiles are non-distributive, which means they are probabilistic: each member of the group is likely, but not certain, to share certain characteristics.[406]

[401] Ibid.

[402] Van der Hof, S., & Prins, C. (2008). Personalisation and its influence on identities, behaviour and social values. In Profiling the European Citizen (pp. 111-127). Springer, Dordrecht.

[403] Ibid.

[404] Ibid.

[405] Ibid.

This phenomenon is the source of many instances of data inaccuracy. On the one hand, most profiles are non-distributive, which means that blanket assumptions made about individuals will be wrong at least some of the time. On the other hand, data controllers will want to use these profiles even when there is a chance that they will not apply to a given individual. When the probabilities are wrong, or when the underlying assumptions themselves are wrong, the consequences for informational privacy can be significant.
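The difference between the two kinds of profile can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the `Person` record, the `low_income` attribute, the neighbourhood label and the four invented residents do not come from any real dataset. The sketch simply checks whether a group profile is distributive and, if it is not, which members a blanket assumption would misdescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record, invented for this illustration.
    name: str
    neighbourhood: str
    low_income: bool


def share_with_attribute(group):
    """Fraction of group members who actually have the profiled attribute."""
    return sum(p.low_income for p in group) / len(group)


def is_distributive(group):
    """A group profile is distributive only if every member has the attribute."""
    return all(p.low_income for p in group)


# Invented residents of a single neighbourhood "N1".
residents = [
    Person("A", "N1", True),
    Person("B", "N1", True),
    Person("C", "N1", True),
    Person("D", "N1", False),
]

# Share of residents for whom the group profile "low income" is true.
share = share_with_attribute(residents)

# Members who would be wrongly described if the profile were applied to everyone.
misprofiled = [p.name for p in residents if not p.low_income]

print(f"distributive: {is_distributive(residents)}")
print(f"share matching the profile: {share:.2f}")
print(f"wrongly profiled if applied to all: {misprofiled}")
```

In this invented group the profile holds for three residents out of four, so it is non-distributive: applying it to every resident would attach an inaccurate attribute to resident D.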

II. Profiling in the Law

Profiling was not addressed directly in the Data Protection Directive, mainly because the Directive was created in a context where profiling and its implications in conjunction with Big Data were not a major concern. However, Article 22 of the GDPR now contains provisions specifically addressing it.[407] Under this Article, data subjects do not necessarily have a right to avoid profiling itself, but rather a right to avoid being “subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”[408] Recital 58 provides as examples the “automatic refusal of an on-line credit application or e-recruiting practices without any human intervention.”[409]

In the GDPR, profiling is defined in the context of data protection as “any form of automated processing of personal data, intended to analyze or predict the personality or certain personal aspects relating to a natural person, in particular the analysis and prediction of the person’s health, economic situation, performance at work, personal preferences or interests, reliability or behaviour, location or movements.”

This Article has some important implications. One concern about the automation of decision-making relates to discrimination. Unthinking machines, that is, machines using the automated profiling explained above, use no reasoning to reach their conclusions. If they are not programmed to avoid discrimination, the right to non-discrimination (protected, amongst others, by Article 14 of the European Convention on Human Rights[410]) may be breached. Specifically, GDPR Recital 71 states that data controllers need to “implement appropriate technical and organizational measures” that “prevents, inter alia, discriminatory effects” on the basis of processing sensitive data (“sensitive data” being data considered especially personal, such as race or religion).[411]

[406] Ibid.

[407] GDPR, Article 22.

[408] GDPR, Article 22(1).

[409] GDPR, Recital 58.

The focus on “sensitive data” in both Article 22 and Recital 71 is relevant, and it can be interpreted in two ways. On a first interpretation, these extra protections apply only to sensitive data. This would involve carefully piecing together the data points which make up the data processing and finding whether some of them are sensitive.[412] However, a second interpretation is that any data processing operation containing some sensitive data needs to have this level of protection. In either case, the sensitive data being examined must be pinpointed, which can be difficult: the bigger the dataset becomes, the more complex and harder to identify the correlations created by the data become, the more likely it is that some sensitive data appears, and the harder that sensitive data is to assess. Overall, paradoxically, the bigger the data, the harder it is to find specific information within it.

The ability to create profiles about individuals is a new tool in creating Information. All of this new Information has been considered to be increasing and improving human knowledge, but that is not necessarily the case.[413] In fact, what makes data different from Information is the human element, which leads to the potential for human error.[414]

In document The information / guarantees balance - protecting informational privacy interests within the European data protection framework. (Page 107-110)