CHAPTER 5: PRIVACY FLOW ANALYSIS
5.6 Implementation
Fig. 5.9 depicts the architecture of the proposed system. In our scenario, a client initiates a business process; a service agent then parses the abstract definitions of each of the collaborating services (S = {GenomWS, PharmaWS, DrugWS, ClinicWS, DemogWS}) and looks up the matching services to obtain the concrete BPEL composition (the concrete processes). Each process instance communicates with the composition engine when it receives an action and when it finally replies. Upon receiving a query, a process is instantiated and the invoked services are looked up in the service registry. The run-time environment executes the service logic by invoking other services (through SOAP and HTTP modules). Upon receiving an operation invocation, each service searches its data repository for the matching instances (e.g., RDF files). Data owners specify minimum input by informing the hosting service about their privacy preferences. Thus, each data instance indicates a privacy level for each data type property.
Figure 5.9: Composition with Private data flow
The annotator uses the initial privacy levels indicated on the data type properties of each data instance required for the process execution to generate the initial set of annotations. In our scenario, for instance, each of the variables gene, disease, medicine, gender, age, and name has an actual privacy level. Annotations are then added both to the variables involved in the process instance and to the service definitions in the WSDL files. The privacy flow analysis uses the annotated process instance to analyze the actions in the context of that instance and to infer privacy levels as they flow between services. Since data type properties flow as input or output variables throughout a process instance execution and throughout the entire composition, the analysis keeps track of the privacy levels needed for every process instance and uses them to propagate privacy level annotations to each subsequent activity in the process execution. An internal type checker relies on these annotations to enforce the private flow of data between services. In the following, we explain how we implemented each component.
5.6.1 The Privacy Flow Analysis
We implemented our framework in Java using the Crystal static dataflow analysis framework [2], which we extended to support our model. Crystal's dataflow analysis works on any control flow graph (CFG) and on the three-address code (TAC) intermediate representation of a language's abstract syntax tree (AST), on which it performs AST-walking analyses. We refactored the core classes of the Crystal framework to abstract away the concept of an ASTNode so that they work generically on any AST node, including both the BPEL4WS constructs and the WSDL constructs. We also extended the ControlFlowGraph interface to support the model generated from BPEL4WS and WSDL. We used the dataflow analysis infrastructure provided in Crystal to implement a forward analysis that is context-sensitive (it distinguishes between different invocation sites), flow-sensitive (the order of execution affects the result of the analysis), and branch-sensitive (to avoid loss of precision in handling conditional expressions). The PrivacyFlowAnalysis algorithm (Algorithm 10) runs an instance of the worklist algorithm implemented in Crystal (Algorithm 9) on each process expression that has not yet been analyzed. The core functionality of the analysis lies in the PAASTVisitor and the AnnotationBasedPATransferFunction.
Algorithm 10 AnnotatedPrivacyFlowAnalysis
Input: Proot
Output: result
worklist = createWorkList(Proot)
result = worklist.performAnalysis()
labeledResultsBefore = result.getLabeledResultsBefore()
labeledResultsAfter = result.getLabeledResultsAfter()
nodeMap = result.getNodeMap()
currentLattice = result.getLattice()
cfgStartNode = result.getCfgStartNode()
cfgEndNode = result.getCfgEndNode()
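To make the worklist step concrete, the following is a minimal, self-contained sketch of a forward worklist iteration over a small control flow graph. It does not use Crystal's API: the class WorklistSketch, the Node type, and the two-point lattice (L below H) are all hypothetical and chosen only for illustration. Each node assigns a privacy level to one variable, and states are joined pointwise where paths merge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a forward worklist analysis; this is not Crystal's API. */
final class WorklistSketch {

    /** Assumed two-point privacy lattice: L (low) lies below H (high). */
    enum Level {
        L, H;

        /** Least upper bound of two levels. */
        Level join(Level other) {
            return this.ordinal() >= other.ordinal() ? this : other;
        }
    }

    /** CFG node that assigns `target` either a constant level or the level of `source`. */
    static final class Node {
        final String target;
        final String source;    // null when the node seeds a constant level
        final Level constant;   // used only when source is null
        final List<Node> successors = new ArrayList<>();

        Node(String target, String source, Level constant) {
            this.target = target;
            this.source = source;
            this.constant = constant;
        }
    }

    /** Transfer function: copies the incoming state and applies the node's assignment. */
    static Map<String, Level> transfer(Node node, Map<String, Level> in) {
        Map<String, Level> out = new HashMap<>(in);
        Level value = (node.source == null)
                ? node.constant
                : in.getOrDefault(node.source, Level.L); // unknown variables default to L (assumption)
        out.put(node.target, value);
        return out;
    }

    /** Pointwise join of two states, used where control-flow paths merge. */
    static Map<String, Level> join(Map<String, Level> a, Map<String, Level> b) {
        Map<String, Level> out = new HashMap<>(a);
        b.forEach((name, level) -> out.merge(name, level, Level::join));
        return out;
    }

    /** Iterates to a fixed point and returns the state after each reachable node. */
    static Map<Node, Map<String, Level>> analyze(Node start) {
        Map<Node, Map<String, Level>> before = new HashMap<>();
        Map<Node, Map<String, Level>> after = new HashMap<>();
        Deque<Node> worklist = new ArrayDeque<>();
        before.put(start, new HashMap<>());
        worklist.add(start);
        while (!worklist.isEmpty()) {
            Node node = worklist.poll();
            Map<String, Level> out = transfer(node, before.get(node));
            if (out.equals(after.get(node))) {
                continue; // nothing changed, so successors need no update
            }
            after.put(node, out);
            for (Node next : node.successors) {
                Map<String, Level> old = before.get(next);
                before.put(next, old == null ? out : join(old, out));
                worklist.add(next);
            }
        }
        return after;
    }
}
```

In this sketch, a node that seeds gene with level H followed by a node that copies gene into disease yields H for disease after the second node. Because the lattice is finite and the join is monotone, the loop terminates even on cyclic graphs.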
We implemented several privacy flow transfer functions that are aware of the privacy level annotations added to each ASTNode expression. Based on those functions, the PAASTVisitor checks each variable. For instance, the transfer function for an Invoke expression invoke(x_S, ĩ, õ) takes as arguments the invocation expression invoke and the tuple lattice, which maps each variable to its abstract lattice value. The visitor checks whether the parameters of the operation invocation are safe by comparing their incoming actual privacy level annotations with the formal privacy level requirements on that operation's parameters. Given the operation binding, the visitor obtains a summary of the annotation information for that operation definition. It then looks up the annotation summary for the operation x_S and, for each parameter variable in ĩ, adds the variable and its corresponding lattice value to the lattice.
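The check performed for an Invoke expression can be sketched as follows. This is an illustration under stated assumptions, not the PAASTVisitor code: the class InvokeTransferSketch, its method names, and the two-point lattice are hypothetical, and the safety rule (an actual level must not exceed the formal level declared for the parameter) is an assumed stand-in for the flow rules of the framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an Invoke transfer function; names and rule are assumptions. */
final class InvokeTransferSketch {

    /** Assumed two-point privacy lattice, ordered L < H by declaration order. */
    enum Level { L, H }

    /**
     * Returns the input parameters whose actual level exceeds the formal level
     * required by the operation summary (assumed safety rule).
     */
    static List<String> unsafeInputs(List<String> inputs,
                                     Map<String, Level> formalSummary,
                                     Map<String, Level> lattice) {
        List<String> unsafe = new ArrayList<>();
        for (String input : inputs) {
            // Conservative defaults (assumption): unknown actual is H, undeclared formal is L.
            Level actual = lattice.getOrDefault(input, Level.H);
            Level required = formalSummary.getOrDefault(input, Level.L);
            if (actual.compareTo(required) > 0) {
                unsafe.add(input);
            }
        }
        return unsafe;
    }

    /** Returns a copy of the lattice in which every input parameter has an entry. */
    static Map<String, Level> transferInvoke(List<String> inputs, Map<String, Level> lattice) {
        Map<String, Level> out = new HashMap<>(lattice);
        for (String input : inputs) {
            out.put(input, lattice.getOrDefault(input, Level.H));
        }
        return out;
    }
}
```

With gene at actual level H, an operation summary that declares gene at L makes unsafeInputs return a list containing gene, whereas a summary that declares gene at H returns an empty list.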
5.6.2 The Privacy Analysis Type Checker
We implemented the type checker as a plugin to the Crystal framework. The type checker relies on an initially generated set of annotations, which is added to a process instance based on an initial set of privacy level annotations. The annotation generator starts by adding the first round of annotations based on the privacy preferences of the requested data in the concrete process instance that is being executed. It annotates every bound variable or service instance in the process instance with a privacy type. The annotation generation tool supports annotating process definitions as well as the WSDL interface definitions of external services referenced in the BPEL process.
Annotations on BPEL processes are added to operation invocations and to input and output variables. Annotations on WSDL interfaces are added to operation input and output variables. The annotation generator rewrites BPEL and WSDL ASTNode expressions with the annotations. For BPEL process instances, the annotation generator supplies the annotations as concrete privacy level types, whereas for BPEL and WSDL definitions it defines formal privacy level parameters. For the WSDL definition in our scenario, the annotator adds the formal privacy level parameters G and D, corresponding to the operation input and output variables, respectively. Upon receipt of the gene variable in the BPEL instance with actual privacy level H, the formal parameter G is bound to the actual value H.
<receive partnerLink="client" portType="GenomWSPT" variable="gene"
         privacyLevel="H" createInstance="yes"/>

<wsdl:definitions>
  <privacylevelparams>
    <privacylevelparam name="G"/>
    <privacylevelparam name="D"/>
  </privacylevelparams>
  <portType name="GenomWSPT">
    <operation>
      <input privacyLevel="G" message="gene"/>
      <output privacyLevel="D" message="disease"/>
    </operation>
  </portType>
</wsdl:definitions>
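The binding of formal privacy level parameters to actual levels can be sketched in a few lines. The class FormalBindingSketch and its bind method are hypothetical and are not part of the annotation generator; the sketch assumes that each message name maps to one formal parameter and that a formal parameter stays unbound until an actual level for its message is known.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of binding formal privacy level parameters to actual levels. */
final class FormalBindingSketch {

    /** Assumed two-point privacy lattice. */
    enum Level { L, H }

    /**
     * Binds each formal parameter to the actual level of the message it annotates.
     *
     * @param formalByMessage  message name to formal parameter name, e.g. gene to G
     * @param actualByVariable variable name to actual level, e.g. gene to H
     * @return formal parameter name to bound level; unbound formals are absent
     */
    static Map<String, Level> bind(Map<String, String> formalByMessage,
                                   Map<String, Level> actualByVariable) {
        Map<String, Level> bindings = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : formalByMessage.entrySet()) {
            Level actual = actualByVariable.get(entry.getKey());
            if (actual != null) {
                bindings.put(entry.getValue(), actual);
            }
        }
        return bindings;
    }
}
```

For the listing above, calling bind with the formals gene to G and disease to D and the single actual level gene to H yields a map in which G is bound to H and D is still unbound.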
The type checker then performs the analysis, taking the process instance as input. Type checking is performed on the annotations to guarantee that data flows between services only according to Table 5.2. The PAASTVisitor implements a method that the type checker uses to report either a warning or an error, depending on the severity of the privacy violation. The errors and warnings are displayed in the Eclipse Problems view.
5.6.3 Technology Implications
The proposed privacy flow analysis implementation can be incorporated into any service composition middleware. It could be integrated either into a composition development environment, to assist developers during process design by performing compile-time process validation, or into a run-time environment (composition engine), to perform run-time validation during process execution. For instance, it could be incorporated into the composition middleware provided by the WSO2 business process server (WSO2 BPS), which provides a comprehensive web-based console to manage, deploy, view, and execute processes within a single server instance. WSO2 BPS implements an Apache ODE-powered BPEL engine and provides extensible RESTful management APIs. WSO2 BPS supports the BPWS4J [34] implementation of the IBM, Microsoft, and BEA BPEL4WS specification. The BPWS4J platform is an Eclipse plug-in that consists of an engine and an editor. The BPWS4J engine takes the BPEL document for each process to be executed, a WSDL description of the interface that the process presents to external clients or service partners (without binding information), and several WSDL documents of the partner services with which the process may interact.