5.3 Creating the Meta-Model
5.3.1 Visual Representation
Unified Modelling Language (UML) diagramming is a universal visual language that is used to capture and represent concepts and the relationships between them (Rumbaugh et al. 2004). Thus, in the explanations that follow, UML diagrams have been used to illustrate each phase, giving readers an easy point of reference and a better overview of how the elements relate within a phase (Fowler 2004).
5.3.2 Validation of Meta-Model
For validation, the model will be aligned with Nissenbaum’s validation approach, as explained above, and with the strategic themes created in Chapter 4, Section 4.5 (Figure 4.5). This will involve providing worked examples for each concept as the model is conceptualised, to illustrate how it might be applied by a public body. The examples will apply the concepts to a theoretical public body, ‘PB’, wishing to assess the privacy risk of publishing a dataset in open format, and relate this to the strategic themes and codes derived from the analysis of the guidance documentation conducted in Chapter 4 (see Figure 4.5). Then, to demonstrate how the meta-model might be applied in practice, a complete worked example will be provided in Section 5.5, applying the meta-model to a hypothetical PB, a library.
5.3.3 CI Guiding Principles
Within the explanation of how the framework works, two guiding sets of principles have been identified as being core to applying CI in practice:
1. The three key elements: explanation, evaluation and prescription (page 190);
2. The nine decision heuristics (page 182).
Thus, these principles have been used as the basis for creating a working meta-model of CI, dealing with each principle in turn.
5.3.4 Key Elements
The meta-model is intended as a practical, useful tool that will guide practitioners through CI and the privacy assessment process, culminating in a decision: to publish or not to publish.
In her book, Nissenbaum explains that the framework has three key elements: explanation, evaluation and prescription (p. 190). These elements provided a logical group of overarching categories that frame the meta-model as a progression of understandable, logical steps. To this end, it is necessary to be clear about what information is being gathered (explanation) and assessed (evaluation) in order to make a decision (prescription), so that concrete actions and processes can be provided for practitioners to follow.
In the meta-model these phases have been renamed Explanation, Risk Assessment and Decision (see Figure 5.1) to better align with terminology that practitioners will relate to and to aid the flow of the staged approach that the meta-model will be asking practitioners to follow. The reasoning behind this change in terminology is discussed in more detail in Sections 5.4.2 and 5.4.3.
Figure 5.1: Meta-Model - Phases
5.3.5 Decision Heuristics
Beneath the key elements (Phases), Nissenbaum proposes nine decision heuristics (DH) that should be considered in relation to both existing and proposed new information flows. This will help establish whether privacy is likely to be, or has been, breached by a proposed new flow of information. These decision heuristics have been used to expand on the detail of information to be captured within each phase. The nine decision heuristics are described by Nissenbaum as follows:
1. “Describe the proposed new practice in terms of information flows.
2. Identify the existing context.
3. Identify the actors, i.e. the information subjects, senders, and recipients.
4. Identify the transmission principles.
5. Describe existing entrenched informational norms and any significant points of departure.
6. Prima facie assessment.
7. Evaluation I: Establish whether there are any political or moral factors that could support/discourage publication e.g. what might be the potential effects or implications for justice, power structures, democracy etc.
8. Evaluation II: Establish how the system or practices could directly impinge on any values, goals, and ends of the context.
9. On the basis of these findings, contextual integrity recommends for or against the proposed new practices” (Nissenbaum 2010).
5.4 Meta-model
A public body will have a number of datasets that they need to assess for privacy risks. To do so they need to consider each dataset separately to establish what risks might be associated with publishing that dataset in open format. The meta-model asks that these risks are considered in stages (phases) to ensure that all risks are captured in light of the prevailing and future contexts. The first phase, Explanation, therefore captures what data is included in the dataset, who has been involved in handling the data and what the prevailing context of the data collection and processing is.
5.4.1 Phase 1 - Explanation
The explanation element refers to the practice or system to be assessed. This should be assessed in view of any “context-relative informational norms” that may be breached. The assessment should cover the key “actors”, i.e. the people that are or could be affected, and their “roles” as “data subjects; data senders or data recipients”. It should also consider the “attributes”, i.e. the information itself (the data), how this information is transmitted (“transmission principles”), and whether any changes to these elements potentially violate the existing or proposed new information flow.
Figure 5.2: Explanation - Meta-Model
Explanation is the governing context used to determine whether any fundamental roles have been or will be affected by changes in transmission principles. Thus, the explanation element forms the first phase of the meta-model. This phase involves establishing details about the dataset itself, the actors that handle the data or that the data is about, and the context in which the data has been collected, shared and transmitted. In this phase, information about existing informational norms is also collected.
The relationship between the explanation and the composite classes is one to many as there are multiple actors, contexts and transmission principles. However, the open dataset under review will have a one to one relationship with the explanation as there will only be one dataset under review for each explanation (see Figure 5.2).
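The multiplicities described above can be sketched in code. The following is a minimal illustration only, assuming hypothetical class and field names that are not taken from the meta-model diagrams; the library example values are likewise invented. It shows one explanation holding exactly one dataset and any number of actors, contexts and transmission principles.

```python
from dataclasses import dataclass, field
from typing import List


# All class and field names below are illustrative assumptions.
@dataclass
class OpenDataset:
    name: str


@dataclass
class Actor:
    name: str


@dataclass
class Context:
    description: str


@dataclass
class TransmissionPrinciple:
    description: str


@dataclass
class Explanation:
    # One-to-one: exactly one dataset is under review per explanation.
    dataset: OpenDataset
    # One-to-many: an explanation may hold several of each of these.
    actors: List[Actor] = field(default_factory=list)
    contexts: List[Context] = field(default_factory=list)
    transmission_principles: List[TransmissionPrinciple] = field(default_factory=list)


# Hypothetical values for a library dataset.
explanation = Explanation(dataset=OpenDataset("library loans"))
explanation.actors.append(Actor("data controller"))
explanation.actors.append(Actor("library member"))
explanation.contexts.append(Context("library service provision"))
explanation.transmission_principles.append(
    TransmissionPrinciple("held internally by PB")
)
```

Making the dataset a required single field, rather than a list, is one way of enforcing that each explanation covers one dataset only.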
To incorporate the decision heuristics (DH), it was determined that the first four DHs relate to gathering a more detailed overview of the data, the people (actors), the existing informational norms and the transmission principles. The first decision heuristic (DH1) concerns the data itself and how it is proposed that the data be transmitted. The second (DH2) asks us to consider the existing context of the situation and environment surrounding the data and the people involved. The third (DH3) concerns the people involved with the data, and the fourth (DH4) seeks to establish how the data is currently transmitted. Thus, these four DHs were used to depict the explanation elements and how they relate. At a more detailed level, explanation therefore requires that details are collected about the data itself, the people involved and their roles, how the data is transmitted and the context. Thus, the explanation is the superclass with each of the elements below depicted as subclasses (see Figure 5.3).
Figure 5.3: Explanation - Class Relationships
Figure 5.3 shows the relationship between the subclasses. These relationships can be explained as follows:
Open Dataset: Each dataset needs to be considered separately to avoid overcomplicating the decision-making process and to ensure that all elements are thoroughly considered and CI is maintained throughout the assessment of each dataset. If we look at relations between the dataset and the other subclasses, there will only ever be one dataset. Thus, the dataset will always be one in relation to each of the other subclasses. The attributes within the open dataset have been grouped by attribute type as follows:
Personal Identifiers (PI), i.e. directly identifying personal information;
Quasi Identifiers (QI), i.e. attributes that could, if linked with other QIs, create a PI and thus allow re-identification to occur (Henriksen-Bulmer and Jeary 2016);
Sensitive Attributes (SA), i.e. attributes that contain person-specific but non-identifying information, such as disease or salary;
Non-sensitive Attributes (NA), i.e. attributes that contain no directly or indirectly identifying personal information (see Section 2.4.3 for more detailed descriptions).
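As a minimal sketch of how the four attribute types above might be recorded for a dataset, the following groups column names by type. The column names and their classifications are hypothetical and chosen purely for illustration; in practice the classification of any attribute would depend on the dataset under review.

```python
from enum import Enum
from typing import Dict, List


class AttributeType(Enum):
    PI = "Personal Identifier"
    QI = "Quasi Identifier"
    SA = "Sensitive Attribute"
    NA = "Non-sensitive Attribute"


# Hypothetical columns and classifications (assumptions for illustration).
columns: Dict[str, AttributeType] = {
    "full_name": AttributeType.PI,
    "postcode": AttributeType.QI,
    "birth_year": AttributeType.QI,
    "salary": AttributeType.SA,
    "item_category": AttributeType.NA,
}


def group_by_type(cols: Dict[str, AttributeType]) -> Dict[AttributeType, List[str]]:
    """Return the column names grouped under each attribute type."""
    groups: Dict[AttributeType, List[str]] = {t: [] for t in AttributeType}
    for name, attr_type in cols.items():
        groups[attr_type].append(name)
    return groups


groups = group_by_type(columns)
```

Grouping in this way makes it easy to see at a glance which columns are quasi identifiers and may therefore need to be considered together for re-identification risk.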
Actors: Each actor will act in one or more capacities. At data level, the actor will perform a data transmission role: sender, receiver or subject of the data being transmitted. It is also possible that the sender or the data subject may download the data and thus also become the receiver. Beyond the data transmission role, however, the actor will also perform multiple relationship and/or work roles. For example, some actors will collate the data and may also know one or more of the data subjects. These actors may be one and the same or they may be different. Therefore, to allow for these nuances to be taken into account, the roles have been separated out as a class of their own.
Roles: Any actor will act in the capacity of one or more roles in relation to the data, and within each role the actor will have had different inputs. Therefore, each actor will be associated with multiple roles. A role may be context based, such as how the actor is related to or interacts with another role, or it may be function based and defined by work roles or duties. Thus, for this subclass, the relationship between actors and roles will be many to many.
Relation: This group will contain details on what relationship the actors have to other actors, depicting both personal and professional relationships, for example professional, family etc.;
Interaction: These attributes will contain information about how the actor interacts with the other actors, such as citizen to professional or friend to citizen;
Data Handling: This will capture information about what input or output the actor has in handling the data. They may have handled or processed the data, or they may be the data controller who makes decisions around how the data is transmitted;
Work: This refers to the occupation of the actor;
Data Originating: This group seeks to capture information about the role of the data originator; it is possible that the data has come from a third party and thus the role of that third party needs to be considered as well.
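The separation of actors from their roles described above can be sketched as follows. This is an illustrative assumption about how the classes might be expressed in code, not the author's implementation; the names and the librarian example are hypothetical. An actor holds a set of data transmission roles and, separately, any number of roles drawn from the five role groups.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class TransmissionRole(Enum):
    SENDER = "sender"
    RECEIVER = "receiver"
    SUBJECT = "subject"


class RoleCategory(Enum):
    RELATION = "relation"
    INTERACTION = "interaction"
    DATA_HANDLING = "data handling"
    WORK = "work"
    DATA_ORIGINATING = "data originating"


@dataclass(frozen=True)
class Role:
    category: RoleCategory
    description: str


@dataclass
class Actor:
    name: str
    # An actor may hold more than one transmission role at once.
    transmission_roles: Set[TransmissionRole] = field(default_factory=set)
    # Roles are kept as a separate class so one actor can hold many.
    roles: List[Role] = field(default_factory=list)


def categories_held(actor: Actor) -> Set[RoleCategory]:
    """Return the role groups in which the actor holds at least one role."""
    return {role.category for role in actor.roles}


# Hypothetical actor: a sender who also downloads the data (receiver).
librarian = Actor(name="librarian")
librarian.transmission_roles.add(TransmissionRole.SENDER)
librarian.transmission_roles.add(TransmissionRole.RECEIVER)
librarian.roles.append(Role(RoleCategory.WORK, "librarian"))
librarian.roles.append(Role(RoleCategory.DATA_HANDLING, "collates loan records"))
librarian.roles.append(Role(RoleCategory.RELATION, "knows some data subjects"))
```

Because `Role` is a separate, immutable value, the same role could in principle be shared by several actors, which is in keeping with the many-to-many relationship between actors and roles.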
Transmission Principles: The transmission principles govern how information flows between actors. There are two sets of transmission principles, the existing and the proposed new flow of information; these are depicted as the data flow. In addition, the data type and format may influence how the data can be transmitted and these have therefore been added as considerations in this category as well. Whilst there will always be at least one transmission principle applicable when considering the subclasses of open dataset and context, for the actors subclass there may be no transmission principle applicable, as not all actors will be involved in transmitting the data. Thus, the relationship between transmission principles and actors will be zero or more to many.
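To illustrate the distinction between existing and proposed flows, and the possibility that some actors have no transmission principle attached, a minimal sketch follows. The class, field and function names are assumptions made for this example, as are the example flows and actors.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Flow(Enum):
    EXISTING = "existing"
    PROPOSED = "proposed"


@dataclass
class TransmissionPrinciple:
    description: str
    flow: Flow
    data_type: str
    data_format: str
    # May be empty: not every actor is involved in transmitting the data.
    actors: List[str] = field(default_factory=list)


def actors_without_principle(
    all_actors: List[str], principles: List[TransmissionPrinciple]
) -> List[str]:
    """Return the actors that no transmission principle refers to."""
    involved = {actor for p in principles for actor in p.actors}
    return [actor for actor in all_actors if actor not in involved]


# Hypothetical flows for illustration only.
principles = [
    TransmissionPrinciple(
        "shared internally on request", Flow.EXISTING,
        "tabular", "spreadsheet", ["data controller", "data processor"],
    ),
    TransmissionPrinciple(
        "published as open data", Flow.PROPOSED,
        "tabular", "CSV", ["data controller"],
    ),
]
all_actors = ["data controller", "data processor", "data subject"]

unlinked = actors_without_principle(all_actors, principles)
proposed = [p for p in principles if p.flow is Flow.PROPOSED]
```

Keeping the flow kind on each principle allows the existing and proposed flows to be compared side by side when the risks of the new flow are assessed.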
Context: There are many contexts that require consideration in relation to the other subclasses. Therefore, the relationship between this class and the other classes will be many to many. In the explanation phase, what needs to be captured, so that the full context can be considered in phase two (risk assessment), is what Nissenbaum refers to as the “prevailing context” (page 182). Therefore, in this phase, it is necessary to collect information about the context of how and why the data was captured in the first place. These contexts are:
Purpose: This seeks to capture the original purpose of why the data was collected;
Social: Capturing the social context in which the data was collected. For example, the data might have been collected from a school and thus, in an educational context;
Consent: Where a dataset contains personal data or potentially sensitive data, the issue of consent will also need to be considered. Consent traditionally is considered from a legal perspective, discussing whether or not it is valid. However, consent also needs to be considered in relation to the data. Thus, in the meta-model at the explanation phase, what needs to be established is whether consent has been given and, if so, to what extent?
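The three context elements above could be recorded along the following lines. This is a sketch under stated assumptions: the class names, the free-text `extent` field and the helper function are hypothetical and are not part of the meta-model as published. The helper simply flags cases where a dataset holds personal or sensitive data but no consent details have yet been recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Consent:
    given: bool
    # Free-text note on how far the consent extends (assumed field).
    extent: Optional[str] = None


@dataclass
class PrevailingContext:
    purpose: str
    social: str
    consent: Optional[Consent] = None


def consent_needs_recording(
    context: PrevailingContext, has_personal_or_sensitive_data: bool
) -> bool:
    """True when consent must be considered but has not been recorded."""
    return has_personal_or_sensitive_data and context.consent is None


# Hypothetical example: data collected from a school.
context = PrevailingContext(
    purpose="administering pupil records", social="educational"
)
before = consent_needs_recording(context, True)

context.consent = Consent(given=True, extent="use within the school only")
after = consent_needs_recording(context, True)
```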
Validation - Explanation
In terms of the hypothetical PB and the strategic themes, the explanation phase is where they will record the dataset objectives of ‘what’, ‘who’ and ‘where’, and part of the ‘how’ from the actions (see Figure 4.5), to include:
1. What - here the PB will outline details about what attribute types are in the dataset to be assessed and what information each of these contains (see Section 2.4.3);
2. Who - this will involve identifying the people within PB that are or have been involved in handling the data (the actors) and what role(s) they played in interacting with the data (data controller, processor and/or subject, see Appendix 6.2, Section 6.3.2). For instance, was the data collected by a data controller from an internal department within the PB or by a data controller from a partner organisation?;
3. Where - here information about where the data was originally collected will be recorded (see Appendix 6.2, Section 6.3.2);
4. How - this involves recording how the data currently flows between stakeholders and what the proposed new flow will be if the data is released as open data (transmission principles). It also covers the context in which the data was collected for each actor; for example, did one of the data processors (data sender) know the data subject personally (governance)?
Once PB has captured details of the data, actors, roles, transmission principles and the prevailing context, risks can be identified in light of legal obligations, established norms and values. To this end, PB will need to determine how publishing the data in open format might affect the transmission flow and what privacy risks might be associated with this new flow of data. This is captured in phase two, the risk assessment.