Theorem 7.22. The problem of testing whether the single-subset problem has a solution with cost smaller than a given bound is NP-complete when the public-closure forms an arbitrary subgraph.

This is the case even when both the number of attributes and the number of safe and UD-safe subsets of the individual modules are bounded by a (small) constant.

The hardness proof works by a reduction from 3SAT and is given in Appendix A.5.11.

The NP algorithm simply guesses a set of attributes and checks whether it forms a legal solution and has cost lower than the given bound. A corresponding EXPTIME algorithm that iterates over all subsets can be used to find the optimal solution.
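The exhaustive search over all subsets can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `cheapest_legal_subset`, `has_solution_below`, `cost` and `is_legal` are hypothetical, the cost of a hidden set is assumed to be the sum of per-attribute costs, and the legality check (safety and UD-safety of the chosen attributes) is abstracted into a caller-supplied predicate rather than implemented.

```python
from itertools import combinations


def cheapest_legal_subset(attributes, cost, is_legal):
    """Try every subset of `attributes` (2^n candidates) and keep the cheapest legal one.

    `cost` maps each attribute to a non-negative number (assumed additive);
    `is_legal` is a hypothetical predicate standing in for the legality check.
    Returns (total_cost, subset) or None if no subset is legal.
    """
    best = None
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            hidden = frozenset(subset)
            if not is_legal(hidden):
                continue
            total = sum(cost[a] for a in hidden)
            if best is None or total < best[0]:
                best = (total, hidden)
    return best


def has_solution_below(attributes, cost, is_legal, bound):
    """Decision version: is there a legal subset with cost strictly below `bound`?"""
    best = cheapest_legal_subset(attributes, cost, is_legal)
    return best is not None and best[0] < bound


# Toy instance (invented for illustration): hiding "a" alone is legal,
# and so is hiding "b" and "c" together.
toy_attributes = ["a", "b", "c"]
toy_cost = {"a": 3, "b": 1, "c": 1}


def toy_is_legal(hidden):
    return "a" in hidden or {"b", "c"} <= hidden


toy_best = cheapest_legal_subset(toy_attributes, toy_cost, toy_is_legal)
```

On the toy instance the search settles on hiding "b" and "c" at cost 2 rather than "a" at cost 3. The nondeterministic algorithm corresponds to guessing one `hidden` set and running only the body of the inner loop on it.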

The NP-completeness here is in n, the number of modules in the public closure. We note, however, that in practice the number of public modules that process the output of an individual private module is typically not that high. So the obtained solution to the optimum-view problem is still better than the naive one, which is exponential in the size of the full workflow.

7.5 General Workflows

The previous sections focused on single-predecessor workflows. In particular we presented a privacy theorem for such workflows and studied optimization w.r.t. this theorem. The following two observations highlight how this privacy theorem can be extended to general workflows. For the sake of brevity, the discussion is informal; full details are given in Appendix A.5.12.

Need for propagation through private modules. All examples in the previous sections that show the necessity of the single-predecessor assumption had another private module mk as a successor of the private module mi being considered. For instance, in Example 7.17, mi = m1 and mk = m4. If we had continued hiding output attributes of m4 in Example 7.17, we could obtain the required possible worlds leading to a non-trivial privacy guarantee Γ > 1. This implies that for general workflows, the propagation of attribute hiding should continue outside the public closure and through the descendant private modules.

D-safe suffices (instead of UD-safe). The proof of Lemma 7.12 shows that the UD-safe property of modules in the public-closure is needed only when some public module in the public-closure has a private successor whose output attributes are visible. If all modules in the public closure have no such private successor, then a downstream-safety property (called the D-safe property) is sufficient. More generally, if attribute hiding is propagated through private modules (as discussed above), then it suffices to require the hidden attributes to satisfy the D-safe property rather than the stronger UD-safe property.

The intuition from the above two observations is formalized in a privacy theorem for general workflows, analogous to Theorem 7.10. First, instead of public-closure, it uses downward-closure: for a private module mi and a set of hidden attributes hi, the downward-closure D(hi) consists of all modules mj (public or private) that are reachable from mi by a directed path. Second, instead of requiring the sets Hi of hidden attributes to ensure UD-safe, it requires them to only ensure D-safe.
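The reachability part of the downward-closure can be sketched as a breadth-first traversal. This is a simplified illustration, not the formal definition from Appendix A.5.12: it treats the workflow as a module-level directed graph given by a hypothetical `successors` mapping, ignores the dependence on the particular hidden attributes hi, and assumes that mi itself is not counted as part of its own closure.

```python
from collections import deque


def downward_closure(successors, start):
    """Return all modules reachable from `start` by a directed path of length >= 1.

    `successors` maps a module name to an iterable of its direct successors
    (public or private alike); modules with no entry have no successors.
    """
    seen = set()
    queue = deque(successors.get(start, ()))
    while queue:
        module = queue.popleft()
        if module in seen:
            continue
        seen.add(module)
        queue.extend(successors.get(module, ()))
    return seen


# Toy workflow (invented for illustration): m1 feeds m2 and m3, both feed m4,
# and m5 is an unrelated predecessor of m3.
toy_workflow = {
    "m1": ["m2", "m3"],
    "m2": ["m4"],
    "m3": ["m4"],
    "m5": ["m3"],
}
toy_closure = downward_closure(toy_workflow, "m1")
```

In the toy workflow, the closure of m1 contains m2, m3 and m4 regardless of whether they are public or private, which is the difference from the public-closure used for single-predecessor workflows.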

The proof of the revised theorem follows lines similar to that of Theorem 7.10, with an added complication due to the fact that, unlike in the previous case, here the Hi subsets are no longer guaranteed to be disjoint. This is resolved by proving that D-safe subsets are closed under union, allowing for the (possibly overlapping) Hi subsets computed for the individual private modules to be assembled together.

The hardness results from the previous section transfer to the case of general workflows. Since the subsets Hi in this case may overlap, the union of the optimal solutions Hi for the individual modules mi may not give the optimal solution for the workflow, and whether a non-trivial approximation exists is an interesting open problem.
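A small toy instance makes the effect of overlap concrete. The sketch below is a deliberate simplification and not the thesis's formal model: it assumes each private module comes with an explicit list of acceptable hidden sets (the hypothetical `toy_options`), that a workflow solution hides the union of one chosen set per module, and that cost is additive over attributes, with a shared attribute paid for only once.

```python
from itertools import product


def cost_of(hidden, cost):
    """Additive cost of a set of hidden attributes."""
    return sum(cost[a] for a in hidden)


def union_of_local_optima(options, cost):
    """Pick the cheapest acceptable set for each module separately, then take the union."""
    hidden = set()
    for choices in options.values():
        hidden |= min(choices, key=lambda s: cost_of(s, cost))
    return hidden


def global_optimum(options, cost):
    """Try every combination of one acceptable set per module and keep the cheapest union."""
    best = None
    for pick in product(*options.values()):
        hidden = set().union(*pick)
        if best is None or cost_of(hidden, cost) < cost_of(best, cost):
            best = hidden
    return best


# Invented numbers: attribute "c" is acceptable for both modules and is paid for once.
toy_attr_cost = {"a": 2, "b": 2, "c": 3}
toy_options = {
    "m1": [frozenset({"a"}), frozenset({"c"})],
    "m2": [frozenset({"b"}), frozenset({"c"})],
}
toy_local = union_of_local_optima(toy_options, toy_attr_cost)
toy_global = global_optimum(toy_options, toy_attr_cost)
```

Here each module on its own prefers its private attribute (cost 2 each, total 4), while hiding the shared attribute "c" serves both modules at cost 3. With disjoint Hi this gap cannot arise, which is why the per-module decomposition is lossless only in the single-predecessor setting.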

To conclude the discussion, note that for single-predecessor workflows, we now have two options to ensure workflow-privacy: (i) by considering public-closures and ensuring UD-safe properties for their modules (following the privacy theorem for single-predecessor workflows); or (ii) by considering downward-closures and ensuring D-safe properties for their modules (following the privacy theorem for general workflows). Observe that these two options are incomparable: Satisfying UD-safe properties may require hiding more attributes compared to what is needed for satisfying D-safe properties. On the other hand, the downward-closure includes more modules than the public-closure (for instance the reachable private modules), and additional attributes need to be hidden to satisfy their D-safe properties. One could therefore run both algorithms, and choose the lower cost solution.

7.6 Conclusion

In this chapter, we addressed the problem of preserving module privacy in public/private workflows (called workflow-privacy), by providing a view of provenance information in which the input to output mapping of private modules remains hidden. As several examples in this chapter show, the workflow-privacy of a module critically depends on the structure (connection patterns) of the workflow, the behavior/functionality of other modules in the workflow, and the selection of hidden attributes. We show that for an important class of workflows called single-predecessor workflows, workflow-privacy can be achieved via propagation through public modules only, provided we maintain an invariant on the propagating modules called the UD-safe property. On the other hand, for general workflows, we show that even though propagation through both public and private modules is necessary, a weaker invariant (called the D-safe property) on the propagating modules suffices. We also study related optimization problems.

Several interesting future research directions related to the application of differential privacy were discussed in Section 6.6. Another interesting problem is to develop PTIME approximation algorithms for module privacy (that can handle non-monotonicity of UD-safe and D-safe subsets) in single-predecessor and general workflows.
