By investigating approaches for web engineering, systems for IdM and languages for describing identities, we obtained an overview of technologies related not only to the management of personal data, but also to the control and protection of such data.
Focusing on the reusability of building blocks, CBWE approaches offer stronger support for evolving web systems than approaches oriented toward other aspects. In particular, WebComposition in association with WAM satisfies the collective requirements by considering the life cycle of web systems and by providing conciseness, review and simplicity when designing and making architectural changes to web systems. WAM makes it possible to describe the composition of federated web systems, the components involved and their relations at a high level of abstraction. It is easy to apply during design thanks to its graphical notation and allows for automatic model verification during system evolution (A. Heil, 2012). However, it lacks capabilities for consistently describing and detailing the semantics of the entities involved, which would assist in modeling web systems at different levels of granularity and improve the interoperability of models and their use by machines.
With regard to IdM, the open silo model is superior to alternative models because it fosters control, ownership, responsibility and self-determination while ensuring accessibility of personal data. WebID is a promising representative of a decentralized IdMS implementing this model. It 1) empowers individual entities rather than authorities, 2) supports domain-independent authentication involving certificates that offer high cryptographic strength, 3) utilizes flexible, extensible and machine-interpretable identity descriptions, and 4) provides application-independent identification and linkability of identities. Here, identities are not restricted to representing characteristics of human entities. Moreover, WebID facilitates consolidating personal data and thus eases controlling access and privacy, with the option of both local and global revocation of identities or identity proofs, respectively.
Because they ease interpretation and inference by machines, semantic languages are well suited for conceptual modeling and domain-specific descriptions of identities in an accurate, expressive and portable way. RDF-based vocabularies are application-neutral, standards-based and web-compliant. They make it possible to create adaptable, distributed, extensible, interlinked and reusable descriptions of heterogeneous facts, which human and non-human entities can then integrate, process or utilize in a dynamic, scalable and seamless manner.
Now that technologies suitable for meeting the challenges outlined in Chapter 2 have been identified, the next chapter describes how we make use of them as part of the proposed solution.
4 Enhanced Security in Managing Personal Data
To holistically address the challenges outlined in Chapter 2 with respect to the suitability of the technologies analyzed in Chapter 3, this chapter describes an approach to enhance security in managing personal data by web systems. Section 4.1 outlines the design of the proposed solution. Using the design, Section 4.2 then specifies the solution architecture and process to enable reusability and show possible integration points. Section 4.3 introduces three key components that extend the fundamental solution architecture in order to meet the remaining challenges. Section 4.4 describes the proof-of-concept platform, which both manifests the proposed solution architecture and process and establishes a basis for integrating the key components. Summarizing our proposal for solving the central problem stated in Section 1.2, Section 4.5 concludes this chapter.
4.1 Design
For designing the solution, we involve three reusable artifacts that build upon each other. Each artifact represents a certain state of the design through a particular model, i.e., conceptual model, logical model, and physical model. Taking into account the principles of web engineering and WebComposition in particular (cf. Subsection 3.3.1), we facilitate both adaptability and reusability by postponing the technical implementation until a sound security foundation has been established. As a starting point, Subsection 4.1.1 describes the conceptual model. To establish a common understanding of the matter, this model only denotes significant entities and the relationships between them, and creates a generalized formal structure. Subsection 4.1.2 then outlines the logical model, which details the conceptual model by putting the concepts into context, yet without considering the physical representation. Finally, Subsection 4.1.3 specifies the physical model, which defines the basis for the technical implementation of the logical model.
4.1.1 Conceptual Model
In order to model the concepts relevant for enhancing security in managing personal data by web systems, we have to properly take account of all entities involved, i.e., particularly persons, web applications and web services. For consistently defining these entities, we make use of semantic vocabularies, as they all share RDF as a common metamodel and thus offer advantages with regard to universal linkage, discovery, accessibility and arbitrarily detailed data (cf. Subsection 3.3.3). We distinguish between entity classes, entities and identities. While entity classes define the general concepts of entities, identities characterize specific aspects of entities within defined contexts (cf. Subsection 2.1.2 on “Terms and Definitions”). For instance, Alice is an entity of entity class person and has a co-worker identity representing her in a business context.
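The distinction between entity classes, entities and identities can be illustrated with a small data structure. The following is a minimal sketch only: the class names `EntityClass`, `Entity` and `Identity`, their fields, and the example URI are assumptions made for illustration and are not taken from the vocabularies used in this work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityClass:
    """General concept of an entity, e.g. person or web service."""
    name: str


@dataclass
class Identity:
    """Characterizes specific aspects of an entity within one context."""
    uri: str       # hypothetical identifier of the identity
    context: str   # context in which the identity is used
    attributes: dict = field(default_factory=dict)


@dataclass
class Entity:
    """A concrete entity that belongs to exactly one entity class."""
    name: str
    entity_class: EntityClass
    identities: list = field(default_factory=list)

    def identity_for(self, context):
        """Return the first identity defined for the given context, if any."""
        for identity in self.identities:
            if identity.context == context:
                return identity
        return None


# Alice is an entity of entity class person with a co-worker identity.
person = EntityClass("person")
alice = Entity("Alice", person)
alice.identities.append(
    Identity("https://example.org/alice#work", "business", {"role": "co-worker"})
)
```

Here `alice.identity_for("business")` yields the co-worker identity, while a context without a matching identity yields `None`.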
In an effort to make architectural descriptions of web systems machine-readable and linkable, we use the WAM ontology proposed and extended in (A. Heil, 2012; Wild and Gaedke, 2014). Modeled using OWL, the WAM ontology defines not only the classes of web entities, legacy entities, and security realms, but also their associations. To reflect the organizational boundaries of control, both web and legacy entities can be contained within security realms. The subclasses identity (service) provider, application, and service inherit from the web entity class. They might invoke other web entities and maintain relationships to legacy entities, which are responsible for tasks like storage or processing. Figure 4.1 illustrates the WAM ontology.
[Figure: class diagram with the nodes Security Realm, Web Entity, Application, Service, Identity Provider, Legacy Entity, Data Unit and Process Unit, connected by edges labeled SubClassOf, Contains, Trusts, Invokes, Legacy Relationship and WebID URI]
Figure 4.1: WAM Ontology
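The class hierarchy and associations described above can be approximated in a few lines of code. This is a plain-Python stand-in rather than the OWL ontology itself, and it covers only the relations stated in the text. The CamelCase identifiers and the helper names `ancestors` and `is_allowed` are assumptions chosen for this sketch, not IRIs of the WAM ontology.

```python
# Subclass relations stated in the text: three subclasses of the web entity class.
SUBCLASS_OF = {
    "IdentityProvider": "WebEntity",
    "Application": "WebEntity",
    "Service": "WebEntity",
}

# Associations as (domain, relation, range); relation names mirror the
# edge labels of Figure 4.1, their spelling here is an assumption.
ASSOCIATIONS = {
    ("SecurityRealm", "contains", "WebEntity"),
    ("SecurityRealm", "contains", "LegacyEntity"),
    ("WebEntity", "invokes", "WebEntity"),
    ("WebEntity", "legacyRelationship", "LegacyEntity"),
}


def ancestors(cls):
    """Return the class itself followed by all of its superclasses."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def is_allowed(subject_cls, relation, object_cls):
    """Check whether an association is permitted, honoring inheritance."""
    return any(
        (s, relation, o) in ASSOCIATIONS
        for s in ancestors(subject_cls)
        for o in ancestors(object_cls)
    )
```

With these definitions, an application may invoke a service because both inherit from the web entity class, whereas a legacy entity invoking a web entity is not covered by any stated association.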
Having extended WAM (Meinecke and Gaedke, 2005) towards a semantically enriched architecture model for web systems, we also have to consider typical SOA implementations with SOAP and RESTful services. As in the model definition above, we detail the concepts present in such architectures using domain-specific semantic vocabularies. A descriptive approach using Linked Data facilitates systematic use by authorized services in later phases and also assists in controlling the evolution of web systems as proposed in (Wild and Gaedke, 2009). In addition to describing web applications and web services, this allows denoting the people involved as suggested in (Maamar et al., 2011). For modeling identities that denote entities of the class of web services, we rely on the vocabulary introduced with the WSDL 2.0 RDF mapping (Kopecký, 2007). Furthermore, we make use of existing vocabularies such as FOAF or the contact ontology to characterize identities that refer to entities of the class of persons. Although other ontologies make it possible to model the identities of the remaining entity classes referenced in the WAM ontology, these classes are of less relevance for this dissertation and are therefore left out of consideration.
To clearly identify relevant representations of entities in an architecture description of a web system, we need to detail essential concepts of identity management first. For IdM we rely on WebID (Sambra et al., 2014). Not only does WebID allow for identifying entities, but it also enables authentication and facilitates the creation of precise, extensible, interpretable, linkable and portable descriptions (cf. Subsection 3.3.2). WebID is a distributed identification approach that involves three underlying artifacts for various purposes, including recognition, characterization and legitimization. These artifacts are the WebID URI, the WebID profile and the WebID certificate. The following formalism of WebID extends the definitions of Subsection 2.1.2.
A WebID URI refers to an identity $i$ that represents entity $e$. Like a username in other IdMSs, a WebID URI $w \in W \subset U$ is a URI denoting an identity $i$, where $W$ is the set of all WebID URIs and $U$ is the set of all URIs. Dereferencing a WebID URI $w$ returns a set of RDF triples $T \subset \mathcal{T}$ that describes personal attributes of identity $i$ using Linked Data. While $\mathcal{T}$ is the set of all RDF triples, each triple $t \in T$ consists of subject $t_1$, predicate $t_2$, and object $t_3$. Equation (4.1) formalizes the dereferencing through function $\alpha(u)$, which yields $T$ if URI $u$ is a valid WebID URI.
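The dereferencing function can be sketched without network access by replacing the web with an in-memory mapping. The code below is an illustration under stated assumptions: the WebID URIs, the `PROFILES` mapping and the name `alpha` are hypothetical, and returning the empty set for URIs that are not WebID URIs is a choice made for this sketch, since the text only defines the result for valid WebID URIs.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "https://alice.example.org/profile#me"  # hypothetical WebID URI

# Hypothetical stand-in for the web: WebID URI -> triples of the profile.
PROFILES = {
    ALICE: {
        (ALICE, FOAF + "name", "Alice"),
        (ALICE, FOAF + "knows", "https://bob.example.org/profile#me"),
    }
}


def alpha(u):
    """Dereference URI u and return the set of triples T describing it.

    Assumption of this sketch: a URI that is not a known WebID URI
    yields the empty set.
    """
    return set(PROFILES.get(u, set()))


# Each returned triple consists of subject, predicate and object.
for subject, predicate, obj in sorted(alpha(ALICE)):
    print(subject, predicate, obj)
```

A real implementation would issue an HTTP request for the URI and parse the returned RDF document instead of consulting a dictionary.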
A WebID profile is a URI-addressable resource. It is available at WebID URI $w$ and contains a set of RDF triples $T$ describing identity $i$. As RDF is used for specifying all personal data, an identity’s attributes are expressive, extensible and machine-readable (Schreiber and Raimond, 2014). This is a major advantage over other IdMSs, which are restricted in assigning and exchanging user attributes. Such semantic descriptions facilitate controlled large-scale exploitation of profile data to optimize customer services and improve the user experience (Wild, Chudnovskyy, et al., 2013a). The RDF triples $T$ span a graph $G = (V, L)$ with $G \in \mathcal{G}$, where $\mathcal{G}$ is the set of all graphs. As graph $G$ thus refers to the set of triples describing identity $i$, we formalize this equivalence in Equation (4.2).
\[
T \sim G \;\Leftrightarrow\; \forall t = (t_1, t_2, t_3) \in T :\; t_1, t_2, t_3 \in V \,\land\, (t_1, t_2) \in L \,\land\, (t_2, t_3) \in L \tag{4.2}
\]
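The condition of Equation (4.2) translates directly into a membership check. The sketch below follows the equation as written, treating subject, predicate and object as vertices and linking each subject to its predicate and each predicate to its object. The function names `spans` and `graph_from` and the sample triple are assumptions introduced for illustration.

```python
def spans(triples, vertices, links):
    """Return True if every triple satisfies the condition of Equation (4.2)."""
    return all(
        {t1, t2, t3} <= vertices and (t1, t2) in links and (t2, t3) in links
        for (t1, t2, t3) in triples
    )


def graph_from(triples):
    """Build the smallest graph G = (V, L) that satisfies the condition."""
    vertices, links = set(), set()
    for t1, t2, t3 in triples:
        vertices |= {t1, t2, t3}
        links |= {(t1, t2), (t2, t3)}
    return vertices, links


# Hypothetical sample: one triple stating that alice knows bob.
T = {("alice", "knows", "bob")}
V, L = graph_from(T)
```

For this sample, `spans(T, V, L)` holds, while removing the link from the predicate to the object makes the check fail.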