Limiting Disclosure of Sensitive Attribute Values: An Active Approach


Junping Sun, Member, IAENG ∗

Abstract—As computer and other technologies advance and data and information of many kinds become pervasively available, the preservation of private information embedded in large volumes of data faces more challenges than ever before. Although database management systems provide mechanisms that limit access to sensitive and private information, inference techniques may still bypass these access control mechanisms and acquire the protected information without authorization. This paper proposes an active approach to limiting the disclosure of sensitive and private information.

Keywords: Preservation of Private and Sensitive Information, Limiting Disclosure of Sensitive Data

1 Introduction

As both the size and the number of databases grow exponentially, it is important not only to manage access control over sensitive and private data [4] [5], but also to protect privacy against data inference through obscure database queries [2] [6] [7]. The remainder of this section gives the generalized disclosure scenario with respect to inference rules and defines the related problem statement. Section 2 discusses the types of inference rules that can lead to disclosure of sensitive and private information. Section 3 gives the algorithms that detect these rules, together with the set of private attributes whose values might be disclosed. Section 4 presents a trigger mechanism, based on an active model, that takes appropriate actions when a query might gain access to private attribute values. Section 5 concludes the paper.

∗ Nova Southeastern University, Graduate School of Computer and Information Sciences, Fort Lauderdale, Florida USA 33314. Tel/Fax: 954-262-2082/3915. Email: [email protected]

1.1 Disclosure Scenario

Given a universal database schema $S(A_1, A_2, \ldots, A_n)$ with a set of attributes $A_i \in \mathcal{A} = \mathcal{A}_{PU} \cup \mathcal{A}_{PR}$, where $\mathcal{A}_{PU}$ is a set of publicly accessible attributes and $\mathcal{A}_{PR}$ is a set of private attributes accessible only to an authorized group of users, assume that there is a set of publicly known rules $R_i \in \mathcal{R}$ of the form:

$$P_i \rightarrow A_j$$

where $P_i \in \mathcal{P}_{PU}$, $1 \le i \le n$, and $A_j \in \mathcal{A}_{PR}$, $1 \le j \le m$. If some $P_i \in \mathcal{P}_{PU}$ can be satisfied by the query predicates $P_k \in \mathcal{P}_{PU}$, $1 \le i = k \le n$, of a given query $Q \in \mathcal{Q}$, then the value $V(A_j)$ of some attribute $A_j \in \mathcal{A}_{PR}$ will be disclosed.

Although a given database query $Q$ is only allowed to access $A_i \in \mathcal{A}_{PU}$, $1 \le i \ne j \le n$, the values $V(A_j)$ of private attributes might be inferred by unauthorized individuals from publicly known rules

$$R_i \in \mathcal{R} : P_i \rightarrow A_j,$$

that is,

$$Q = \{A_i \mid P_k\} \cup \mathcal{R} \models A_j$$

where $A_i \in \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $A_j \in \mathcal{A}_{PR}$, $1 \le i \ne j \le n$, and $1 \le j \le m$.
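The scenario can be made concrete with a small sketch. The table, the attribute names (`zip`, `age`, `diagnosis`), and the rule are hypothetical and are not taken from this paper; the sketch only illustrates that a query confined to public attributes, combined with a publicly known rule, can yield a private value.

```python
# Hypothetical data and rule; none of these names come from the paper.
PUBLIC = {"zip", "age"}        # A_PU: attributes any user may query
PRIVATE = {"diagnosis"}        # A_PR: attributes restricted to authorized users

# A publicly known rule R_i: P_i -> A_j, stored as
# (premise over public attributes, private attribute, value it implies).
RULES = [
    (lambda row: row["zip"] == "33314" and row["age"] >= 65,
     "diagnosis", "hypertension"),
]

def run_query(rows, attrs, predicate):
    """Q = {A_i | P_k}: select rows by predicate, project onto public attrs."""
    if not set(attrs) <= PUBLIC:
        raise PermissionError("query may only access public attributes")
    return [{a: row[a] for a in attrs} for row in rows if predicate(row)]

def infer_private(result_rows):
    """Apply the public rules to a query result (Q combined with R entails A_j)."""
    inferred = []
    for row in result_rows:
        for premise, attr, value in RULES:
            if premise(row):
                inferred.append((attr, value))
    return inferred

table = [
    {"zip": "33314", "age": 70, "diagnosis": "hypertension"},
    {"zip": "33314", "age": 30, "diagnosis": "none"},
]
result = run_query(table, ["zip", "age"], lambda r: r["zip"] == "33314")
print(result)                  # the projection contains no "diagnosis" column
print(infer_private(result))   # the rule still yields a diagnosis for one row
```

Encoding the rule with a fixed implied value is a simplification chosen for brevity; a real rule base would map premises to attribute values in whatever form the knowledge base uses.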

In order to address the issues from the inference queries, the following model is proposed.

1.2 Problem Statement

Given a database query $Q \in \mathcal{Q}$, an active model is proposed to analyze $Q$ and to discover whether there exist potential inferences on sensitive data in the database $r(S)$.

The active model is a 4-tuple defined as:

$$\mathcal{T} = \{\mathcal{E}, \mathcal{C}, \mathcal{KB}, \mathcal{AC}\}$$

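• $\mathcal{E} = \{\mathcal{Q}, \mathcal{I}, \mathcal{D}, \mathcal{U}\}$, where $\mathcal{Q}$ is a set of queries, $\mathcal{D}$ a set of deletions, $\mathcal{I}$ a set of insertions, and $\mathcal{U}$ a set of updates. Without loss of generality, all of $\mathcal{D}$, $\mathcal{I}$, $\mathcal{U}$ are treated as $\mathcal{Q}$ in this paper.

• $\mathcal{C}$ is a set of conditions that trigger the corresponding actions, which are discussed later in this paper.

• $\mathcal{KB} = \{\mathcal{R}, \mathcal{IC}\}$ is the knowledge base, consisting of a set of inference rules $\mathcal{R}$ and a set of integrity constraints $\mathcal{IC}$.

• $\mathcal{AC}$ is a set of actions, defined as:

$$\mathcal{E} \times \mathcal{C} \times \mathcal{KB} \rightarrow \mathcal{AC}$$

where $\mathcal{AC} = \{AC_b, AC_p\}$, $AC_b$ is the set of prior actions taken before an event $Q \in \mathcal{Q}$ can proceed, and $AC_p$ is the set of post actions taken after $Q \in \mathcal{Q}$ is processed.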

In order to take appropriate actions on inference queries, the following section discusses the types of inference rules that can be used to access sensitive and private attribute values.

2 Inference Rules

2.1 Functional Dependency as Inference Rule

Definition 2.1 Let $r(S)$ be a relation on the schema $S(A_1, \ldots, A_n)$, and let $X \subseteq S$ and $Y \subseteq S$ be sets of attributes $A_i$, $1 \le i \le n$. The relation $r(S)$ satisfies the functional dependency $X \rightarrow Y$, denoted $r(S) \models X \rightarrow Y$, if

$$\forall t, t' \in r(S): \; t[X] = t'[X] \Rightarrow t[Y] = t'[Y],$$

i.e., $\forall x \in D(X)$, $|\pi_Y(\sigma_{X=x}(r(S)))| \le 1$, where $D(X)$ is the domain of attribute $X$, $\pi$ is the projection operator, and $\sigma$ is the selection operator of the relational model.

The functional dependency $X \rightarrow Y$ states that for each value of $X$ there exists at most one value of $Y$.
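Definition 2.1 can be checked directly on a small relation. The following is a minimal sketch, assuming a relation is represented as a list of dictionaries; the function name `satisfies_fd` and the sample data are illustrative only.

```python
def satisfies_fd(rows, X, Y):
    """Return True if rows satisfy X -> Y (each X-value has at most one Y-value)."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # setdefault stores y_val the first time x_val is seen,
        # otherwise it returns the Y-value recorded earlier.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

r = [
    {"emp": "e1", "dept": "d1", "salary": 50},
    {"emp": "e2", "dept": "d1", "salary": 60},
    {"emp": "e1", "dept": "d1", "salary": 50},
]
print(satisfies_fd(r, ["emp"], ["salary"]))   # True
print(satisfies_fd(r, ["dept"], ["salary"]))  # False
```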

Definition 2.2 Let $r(S)_H$ be a relation in a historical database $H$ accessible to the public.

For given $r(S)$ and $r(S)_H$, suppose there exist two tuples $t \in r(S)$ and $t' \in r(S)_H$. For a given database query $Q$, assuming $Q$ is allowed to access only the set of attributes $A_i \in \mathcal{A}_{PU}$, $1 \le i \le n$, if there exists a functional dependency $X \rightarrow Y$ where $X \subseteq \bigcup A_i \subseteq \mathcal{A}_{PU}$ and $Y \subseteq \mathcal{A}_{PR}$, then the value of $Y$ can be inferred and will be disclosed, because $t[X] = t'[X]$ implies $t[Y] = t'[Y]$, and the values $t'[X]$, $t'[Y]$ in $r(S)_H$ are known. In the following discussion, the knowledge of $r(S)_H$ is assumed to be publicly available.

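That is, for a given query:

$$Q = \{A_1, \ldots, A_i \mid P_k\}$$

$$Q \cup (X \rightarrow Y) \models A_1, \ldots, A_i, \ldots, A_j$$

where $A_i \in X \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $A_j \in Y \subseteq \mathcal{A}_{PR}$, and $1 \le i \ne j \le n$.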
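The inference through the historical relation amounts to a lookup on the shared $X$ values. A minimal sketch follows, assuming relations are lists of dictionaries and that the functional dependency $X \rightarrow Y$ holds; the attribute names (`rank`, `salary`) and the function name are hypothetical.

```python
def infer_from_history(query_rows, history_rows, X, Y):
    """Extend each query tuple t with t'[Y] from r(S)_H where t'[X] == t[X]."""
    lookup = {tuple(h[a] for a in X): tuple(h[a] for a in Y)
              for h in history_rows}
    disclosed = []
    for t in query_rows:
        key = tuple(t[a] for a in X)
        if key in lookup:
            disclosed.append({**t, **dict(zip(Y, lookup[key]))})
    return disclosed

history = [{"rank": "captain", "salary": 90000}]   # r(S)_H, publicly available
answer = [{"name": "Lee", "rank": "captain"}]      # result of Q over public attributes
print(infer_from_history(answer, history, ["rank"], ["salary"]))
```

Because $X \rightarrow Y$ is assumed to hold, a single matching historical tuple is enough to fix the private $Y$ value of every current tuple with the same $X$ value.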

Given the Armstrong Inference Axioms (IA) as follows [3]:

IA1: Reflexive Rule

if $Y \subseteq X$, then $X \rightarrow Y$.

IA2: Augmentation Rule

if $X \rightarrow Y$, then $XZ \rightarrow YZ$.

IA3: Transitive Rule

if $X \rightarrow Y$ and $Y \rightarrow Z$, then $X \rightarrow Z$.

Some attribute values $V(\mathcal{A}_{PR})$ might be further inferred and disclosed by using some of the IA, for example IA3. The following discusses the inference chain of functional dependencies based on the closure and the Armstrong Inference Axioms.

Definition 2.3 A functional dependency $f : X \rightarrow Y$ is trivial if $Y \subseteq X$. If $Y \not\subseteq X$, then the functional dependency $f : X \rightarrow Y$ is non-trivial.

In the rest of this paper, only non-trivial functional dependencies are considered.

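Definition 2.4 Let $F$ be a set of functional dependencies, and for $f \in F$ let $(f) \subseteq S$ denote the set of attributes in $f$. The closure $F^{+}$ of $F$ is

$$F^{+} = \{\, f \mid (f) \subseteq \bigcup_{f' \in F} (f') \;\wedge\; F \models_{\text{Armstrong's rules}} f \,\}$$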

Definition 2.5 For $X \subseteq S(A_1, \ldots, A_n)$ and a set $F$ of functional dependencies over $S(A_1, \ldots, A_n)$,

$$\mathit{closure}_F(X) = \{\, Y \in S \mid X \rightarrow Y \in F^{+} \,\}$$

is called the closure of $X$.
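The closure of an attribute set can be computed without enumerating $F^{+}$, using the standard fixed-point procedure from the database literature. The sketch below is that textbook procedure, not code from this paper; functional dependencies are assumed to be given as pairs of attribute sets.

```python
def attribute_closure(X, fds):
    """Return closure_F(X) for fds given as (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side is already derivable.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(attribute_closure({"A"}, F)))       # ['A', 'B', 'C']
print(sorted(attribute_closure({"A", "D"}, F)))  # ['A', 'B', 'C', 'D', 'E']
```

Note that the result includes $X$ itself, which corresponds to the trivial dependencies; if any attribute of the closure lies in the private set, a query over $X$ is a candidate for disclosure.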

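For a given set of functional dependencies (FDs),

$$F = \{ f_i : X_i \rightarrow Y_i \mid 1 \le i \le n \},$$

if $X_i \rightarrow Y_i$ and $Y_i \rightarrow Y_{i+1}$, then we have $X_i \rightarrow Y_{i+1}$, denoted as $f_i \Rightarrow f_{i+1}$, by the transitivity property of Armstrong's rules [3].

Without loss of generality, if

$$f_1 \Rightarrow f_2, \; \ldots, \; f_i \Rightarrow f_{i+1}, \; \ldots, \; f_{n-1} \Rightarrow f_n,$$

denoted as $f_1 \overset{*}{\Rightarrow} f_n$, then $X_i \rightarrow Y_j$, $1 \le i < j \le n$.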

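For a given database query $Q$, although $Q$ is allowed to access only the set of attributes $X_i \subseteq \mathcal{A}_{PU}$, $1 \le i \le m$, if $f_1 \overset{*}{\Rightarrow} f_n$, then $X_i \rightarrow Y_j$, $1 \le i < j \le n$, so the value of $Y_j \subseteq \mathcal{A}_{PR}$ can be inferred and will be disclosed. That is, for a given query:

$$Q = \{X_1, \ldots, X_i \mid P_k\}$$

$$Q \cup (f_1 \overset{*}{\Rightarrow} f_n) \models X_1, \ldots, X_i, \ldots, Y_j$$

where $X_i \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $Y_j \subseteq \mathcal{A}_{PR}$, and $1 \le i \ne j \le n$.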

Given IA2 and IA3, the pseudotransitive rule can be derived:

IA4: Pseudotransitive Rule

if $X \rightarrow Y$ and $YZ \rightarrow W$, then $XZ \rightarrow W$.

The proof of IA4 follows from IA2 and IA3 [3].
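For completeness, the derivation is short enough to spell out. The steps below are the standard textbook argument and are not reproduced from [3]:

```latex
\begin{align*}
& X \rightarrow Y    && \text{(given)} \\
& XZ \rightarrow YZ  && \text{(IA2 applied to the first line, augmenting with } Z\text{)} \\
& YZ \rightarrow W   && \text{(given)} \\
& XZ \rightarrow W   && \text{(IA3 applied to the second and third lines)}
\end{align*}
```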

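For a given database query $Q \in \mathcal{Q}$ with access to attributes $X$, $Y$, $Z$, the value of $W$ can be inferred and will be disclosed by the fact:

$$X \rightarrow Y \;\&\; YZ \rightarrow W \models XZ \rightarrow W$$

That is, for a given query:

$$Q = \{X, Y, Z \mid P_k\}$$

$$Q \cup (X \rightarrow Y \;\&\; YZ \rightarrow W) \models W$$

where $X, Y, Z \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $W \subseteq \mathcal{A}_{PR}$.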

2.2 General Inference Rules

For a given set of rules $\mathcal{R} \subseteq \mathcal{KB}$, suppose there exists a rule $R : P_i \rightarrow P_j$, where $A_i \subseteq \mathcal{A}_{PU}$ is the set of attributes with respect to $P_i$ and $A_j \in \mathcal{A}_{PR}$ is the attribute with respect to $P_j$. If $A_i \subseteq \mathcal{A}_{PU}$ can be satisfied by a given query $Q$, then the value of $A_j \in \mathcal{A}_{PR}$ with respect to $P_j$ can be inferred and will be disclosed.

That is, for a given query:

$$Q = \{A_i \mid P_k\}$$

$$Q \cup (R : P_i \rightarrow P_j) \models A_j$$

where $A_i \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $R : P_i \rightarrow P_j \in \mathcal{KB}$, and $A_j \subseteq \mathcal{A}_{PR}$.

The type of each predicate $P_i$ in $R_i$, or of a matched query condition in $Q$, can be:

1. either a simple query predicate or constraint (SQP) such as:

$A_i \;\theta\; C$ or $A_i \;\theta\; A_j$, $i \ne j$,

where $\theta \in \{=, <, \le, >, \ge, \ne\}$.

2. or a compound query predicate or constraint (CQP) that consists of a set of SQPs and logical operators such as AND, OR, NOT.
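The two predicate types can be represented as small composable functions. This is only a sketch of one possible encoding; the helper names `sqp`, `AND`, `OR`, and `NOT` and the sample attributes are assumptions made for illustration.

```python
import operator

# The six comparison operators allowed for theta.
THETA = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
         ">": operator.gt, ">=": operator.ge, "!=": operator.ne}

def sqp(attr, theta, operand, operand_is_attr=False):
    """Simple query predicate: A_i theta C, or A_i theta A_j if operand_is_attr."""
    compare = THETA[theta]
    def evaluate(row):
        right = row[operand] if operand_is_attr else operand
        return compare(row[attr], right)
    return evaluate

# Compound query predicates are built from SQPs and logical operators.
def AND(*preds):
    return lambda row: all(p(row) for p in preds)

def OR(*preds):
    return lambda row: any(p(row) for p in preds)

def NOT(pred):
    return lambda row: not pred(row)

senior_local = AND(sqp("age", ">=", 65), NOT(sqp("zip", "=", "00000")))
print(senior_local({"age": 70, "zip": "33314"}))  # True
print(senior_local({"age": 40, "zip": "33314"}))  # False
```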

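For a given set of rules $\mathcal{R} \subseteq \mathcal{KB}$ such that:

$$\mathcal{R} = \{ R_i : P_i \rightarrow P_j \mid 1 \le i = j + 1 \le n \},$$

if $P_i \rightarrow P_j$ and $P_j \rightarrow P_k$, then we have $P_i \rightarrow P_k$, denoted as $R_i \Rightarrow R_{i+1}$, by modus ponens and hypothetical syllogism.

Without loss of generality, if

$$P_1 \rightarrow P_2, \; \ldots, \; P_i \rightarrow P_{i+1}, \; \ldots, \; P_n \rightarrow P_{n+1},$$

denoted as $P_1 \overset{*}{\rightarrow} P_n$, then

$$R_i \overset{*}{\rightarrow} R_j, \; 1 \le i = j + 1 \le n, \quad \text{or} \quad R_1 \overset{*}{\Rightarrow} R_n.$$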

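So for a given query:

$$Q = \{A_i \mid P_k\}$$

$$Q \cup (R : R_i \overset{*}{\Rightarrow} R_j) \models A_j$$

where $A_i \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $R : R_i \overset{*}{\Rightarrow} R_j \subseteq \mathcal{KB}$, and $A_j \subseteq \mathcal{A}_{PR}$.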

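Definition 2.6 Given predicates $P_i$, $P_j$, if $R_i : P_i \rightarrow P_j$ and $R_j : P_j \rightarrow P_i$, denoted as $P_i \leftrightarrow P_j$ or $P_j \leftrightarrow P_i$, then $P_i \leftrightarrow P_j$ is a mutual (circular) implication between $P_i$ and $P_j$, or $R_i \Leftrightarrow R_j$.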

Definition 2.7 Given $R_i : P_i \rightarrow P_{i+1}$ and $R_j : P_j \rightarrow P_{j+1}$, $R_i, R_j \in \mathcal{R}$, $1 \le i = j + 1 \le n$, if $P_i \overset{*}{\rightarrow} P_j$ and $P_i \overset{*}{\leftarrow} P_j$, denoted as $P_i \overset{*}{\leftrightarrow} P_j$, then $R_i \overset{*}{\Leftrightarrow} R_j$, $1 \le i = j + 1 \le n$. We call this a mutual implication ring, or simply a ring.

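For a given set of rules $\mathcal{R} \subseteq \mathcal{KB}$ such that:

$$R_i \overset{*}{\Leftrightarrow} R_j, \; 1 \le i = j + 1 \le n,$$

if there exists a rule $R : P_i \overset{*}{\leftrightarrow} P_j$, where $A_i \subseteq \mathcal{A}_{PU}$ is the set of attributes with respect to $P_i$ and $A_j \in \mathcal{A}_{PR}$ is the attribute with respect to $P_j$, and if $A_i \subseteq \mathcal{A}_{PU}$ can be satisfied by a given query $Q$, then the value of $A_j \in \mathcal{A}_{PR}$ with respect to $P_j$ can be inferred and will be disclosed.

So for a given query:

$$Q = \{A_i \mid P_k\}$$

$$Q \cup (R : R_i \overset{*}{\Leftrightarrow} R_j) \models A_j$$

where $A_i \subseteq \mathcal{A}_{PU}$, $P_k \in \mathcal{P}_{PU}$, $A_j \subseteq \mathcal{A}_{PR}$, and $R : R_i \overset{*}{\Leftrightarrow} R_j \subseteq \mathcal{R} \subseteq \mathcal{KB}$.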

3 Discovering Inference Chains and Rings

Theorem 3.1 For a given set of rules $\mathcal{R} \subseteq \mathcal{KB}$:

$$\mathcal{R} = \{ R_i : P_i \rightarrow P_j \mid 1 \le i < j \le n \}$$

with respect to a given schema $S(A_1, \ldots, A_n)$, let

$$LHS_{\mathcal{R}} = \{ P_i \mid P_i \in R_i, \; 1 \le i \le n \}$$

be the set of predicates on the left-hand side of each $R_i \in \mathcal{R}$, and

$$RHS_{\mathcal{R}} = \{ P_j \mid P_j \in R_i, \; 1 \le i < j \le n \}$$

be the set of predicates on the right-hand side. The set of rules $\mathcal{R}$ forms a mutual implication ring (MIR) if and only if $closure(P_i) \subseteq (LHS_{\mathcal{R}} \cap RHS_{\mathcal{R}})$.

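Proof:

Proof of the if-part: Assume the condition $closure(P_i) \subseteq (LHS_{\mathcal{R}} \cap RHS_{\mathcal{R}})$ is true, but $R : P_i \overset{*}{\leftrightarrow} P_j$ is not an MIR. For a given circular rule $R : P_i \overset{*}{\leftrightarrow} P_j$, both $P_i \rightarrow P_j$ and $P_j \rightarrow P_i$ must hold. Removal of either $P_i \rightarrow P_j$ or $P_j \rightarrow P_i$ makes $R : P_i \overset{*}{\leftrightarrow} P_j$ non-circular, which contradicts the given condition $closure(P_i) \subseteq (LHS_{\mathcal{R}} \cap RHS_{\mathcal{R}})$.

Proof of the only-if part: Assume $R : P_i \overset{*}{\leftrightarrow} P_j$ is an MIR, but the condition $closure(P_i) \subseteq (LHS_{\mathcal{R}} \cap RHS_{\mathcal{R}})$ is not true. For a given circular rule $R : P_i \overset{*}{\leftrightarrow} P_j$, we have both $P_i \rightarrow P_j$ and $P_j \rightarrow P_i$. For $P_i \in LHS_{\mathcal{R}}$ to hold while $P_i \notin RHS_{\mathcal{R}}$, $P_i$ must be removed from the right-hand side, which implies the removal of $P_j \rightarrow P_i$; this contradicts the fact that $R : P_i \leftrightarrow P_j$, and likewise $R : P_i \overset{*}{\leftrightarrow} P_j$, is an MIR.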

Algorithm 3.1 takes as input a given query $Q$, a given query predicate $P_i$, and a set of rules $\mathcal{R}$ with respect to the schema $S(A_1, A_2, \ldots, A_n)$. The algorithm computes an inference chain $P_i \overset{*}{\rightarrow} P_j$, $1 \le i = j + 1 \le m$, for a given $P_i$ with respect to a given query $Q$. The computation of the inference chain at Step 6c is based on the Armstrong Inference Axiom IA3, the transitive rule. Steps 6g to 6l compute the attributes $A_k \in \mathcal{A}$ implied by the predicates $P_j \in R_j$, together with those among them that are private, $A_k \in \mathcal{A}_{PR}$. Step 6m tests whether the derived inference chain forms a ring $P_i \overset{*}{\leftrightarrow} P_j$, $1 \le i = j + 1 \le m$. If a ring is found, then access to all the attributes in $A(P_i)$, $1 \le i \le m$, will be restricted in order to prevent any attribute values in the ring from disclosure. If a chain $P_i \overset{*}{\rightarrow} P_j$ is found, then the attribute $A_{j-1} \in \mathcal{A}_{PU}$ will be restricted to protect the attribute $A_j \in \mathcal{A}_{PR}$ with respect to the inference chain.


Algorithm 3.1 Computing Chain of P_i

Input: P_i, Q, and R with respect to S(A_1, ..., A_n)
Output: closure(P_i), chainclosure(P_i), A(P_i), A(P_i)_{PR}

begin
1. oldclosure(P_i) := ∅;
2. newclosure(P_i) := {P_i};
3. chainclosure(P_i) := ∅;
4. A(P_i) := ∅; A(P_i)_{PR} := ∅;
5. while oldclosure(P_i) ≠ newclosure(P_i) do
6. begin
   (a) oldclosure(P_i) := newclosure(P_i);
   (b) foreach R_j : P_j → P_k ∈ R do
   (c)   if (P_j ⊆ newclosure(P_i)) then
   (d)   begin
   (e)     newclosure(P_i) := {P_k} ∪ newclosure(P_i);
   (f)     chainclosure(P_i) := {P_j → P_k} ∪ chainclosure(P_i);
   (g)     if R_j : P_j → P_k ∈ R |= A_k then
   (h)     begin
   (i)       A(P_i) := A(P_i) ∪ A_k;
   (j)       if A_k ∈ A_{PR} then
   (k)         A(P_i)_{PR} := A(P_i)_{PR} ∪ A_k;
   (l)     end
   (m)     if (P_i == P_k) then
   (n)       P_i.count := P_i.count + 1;
   (o)   end
7. end
8. closure(P_i) := newclosure(P_i);
9. return closure(P_i), chainclosure(P_i);
10. return A(P_i), A(P_i)_{PR};
end

Figure 1: Algorithm for Computing the Chain of P_i
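The fixed-point loop of Algorithm 3.1 can be sketched in Python as follows. The encoding is an assumption made for illustration: predicates are hashable labels, rules are `(P_j, P_k)` pairs, and `attrs_of` maps a predicate to the attributes it implies. The counter `P_i.count` is replaced here by a boolean flag, which is a simplification rather than the paper's exact bookkeeping.

```python
def compute_chain(p_i, rules, attrs_of, private):
    """Sketch of Algorithm 3.1 for a start predicate p_i.

    rules    -- list of (P_j, P_k) pairs, each meaning P_j -> P_k
    attrs_of -- dict: predicate -> set of attributes it implies (assumed encoding)
    private  -- the set A_PR
    """
    old_closure, new_closure = set(), {p_i}
    chain = set()
    attrs, attrs_private = set(), set()
    while old_closure != new_closure:               # step 5
        old_closure = set(new_closure)              # step 6a
        for p_j, p_k in rules:                      # step 6b
            if p_j in new_closure:                  # step 6c
                new_closure.add(p_k)                # step 6e
                chain.add((p_j, p_k))               # step 6f
                implied = attrs_of.get(p_k, set())  # steps 6g-6l
                attrs |= implied
                attrs_private |= implied & private
    # Steps 6m-6n, simplified: a ring exists if some rule leads back to p_i.
    is_ring = any(p_k == p_i for _, p_k in chain)
    return new_closure, chain, attrs, attrs_private, is_ring

rules = [("P1", "P2"), ("P2", "P3")]
attrs_of = {"P1": {"zip"}, "P2": {"age"}, "P3": {"diagnosis"}}
closure, chain, attrs, private_hit, ring = compute_chain(
    "P1", rules, attrs_of, {"diagnosis"})
print(sorted(closure), sorted(private_hit), ring)
```

The loop terminates because the closure can only grow and is bounded by the predicates that appear in the rules.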

Algorithm 3.2 Find Implication Chains and Rings

Input: S(A_1, ..., A_n), Q, R
Output: set_MIR

begin
1. set_MIR := ∅; set_CH := ∅;
2. foreach R_i : P_i → P_j ∈ R do
3. begin
   (a) P_i.count := 0; P_i.order := 0;
   (b) compute closure(P_i) by Algorithm 3.1;
   (c) if (P_i ⊆ LHS_R ∩ RHS_R) then
   (d)   set_MIR := set_MIR ∪ chainclosure(P_i);
   (e) else
   (f)   set_CH := set_CH ∪ chainclosure(P_i);
   end
end
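A compact sketch of the classification performed by Algorithm 3.2 is given below. It assumes rules are `(P_j, P_k)` pairs over hashable predicate labels and, as a deliberate choice, applies the closure condition stated in Theorem 3.1, $closure(P_i) \subseteq LHS_{\mathcal{R}} \cap RHS_{\mathcal{R}}$, as the ring test. The function and variable names are illustrative.

```python
def find_rings_and_chains(rules):
    """Split the rules reachable from each left-hand predicate into rings and chains."""
    def chain_closure(start):
        # Predicates reachable from start, and the rules used to reach them.
        reached, edges = {start}, set()
        changed = True
        while changed:
            changed = False
            for p_j, p_k in rules:
                if p_j in reached and (p_j, p_k) not in edges:
                    edges.add((p_j, p_k))
                    reached.add(p_k)
                    changed = True
        return reached, edges

    lhs = {p_j for p_j, _ in rules}
    rhs = {p_k for _, p_k in rules}
    set_mir, set_ch = set(), set()
    for p_i in lhs:
        reached, edges = chain_closure(p_i)
        if reached <= (lhs & rhs):      # closure condition of Theorem 3.1
            set_mir |= edges
        else:
            set_ch |= edges
    return set_mir, set_ch

ring_rules = [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]
chain_rules = [("P1", "P2"), ("P2", "P3")]
print(find_rings_and_chains(ring_rules))
print(find_rings_and_chains(chain_rules))
```

Under this criterion a ring with a dangling tail (for example $P_1 \leftrightarrow P_2$ plus $P_2 \rightarrow P_3$) is classified as a chain, because $P_3$ never appears on a left-hand side.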


The results achieved from Algorithm 3.2 will be used for the corresponding prevention actions in Trigger 4.1, presented in Section 4.

4 The Active Approach

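For a given active-model-based system such as:

$$\mathcal{T} = \{\mathcal{E}, \mathcal{C}, \mathcal{KB}, \mathcal{AC}\}$$

$\mathcal{AC}$ is a set of actions, defined as:

$$\mathcal{Q} \times \mathcal{C} \times \mathcal{KB} \rightarrow \mathcal{AC}$$

where $\mathcal{AC} = \{AC_b, AC_p\}$, $AC_b$ is the set of prior actions taken before an event $Q \in \mathcal{Q} \subseteq \mathcal{E}$ can proceed, and $AC_p$ is the set of post actions taken after $Q \in \mathcal{Q} \subseteq \mathcal{E}$ is processed.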

4.1 The Before Action AC_b

When a query $Q$ is issued, the active model $\mathcal{T}$ takes a proactive action $AC \in AC_b$ before $Q$ can be processed. $AC \in AC_b$ invokes Algorithm 3.2 to detect mutual implication rings and inference chains. If a chain of inference rules is found such as:

$$P_1 \overset{*}{\rightarrow} P_n \models A_j \in \mathcal{A}_{PR},$$

where $1 \le i = j - 1 \le n$,

then access to the attributes $A_i \in \mathcal{A}$ with respect to $P_i \in R_i$ will be restricted in order to preserve the privacy of $\mathcal{A}_{PR}$.

If a ring of inference is found such as:

$$P_1 \overset{*}{\leftrightarrow} P_n \models A_j \in \mathcal{A}_{PR},$$

where $1 \le i < j \le n$,

then access to all the attributes $A_i \in \mathcal{A}$ with respect to $P_i$ will be restricted.

The active trigger, Trigger 4.1 in Figure 3, takes the initial action of checking the given query $Q$ at Step 3a, and processes $Q$ accordingly at Steps 3b, 3c, and 3d.

Trigger 4.1 Checking Possible Disclosure

Input: S(A_1, ..., A_n), Q, R
Output: set_MIR

begin
1. EVENT: Q on database r(S)
2. CONDITION: if ∃ A_i ∈ A_{PR} ⊆ S(A_1, ..., A_n), 1 ≤ i ≤ n
3. ACTION:
   (a) invoke Algorithm 3.2 to detect P_i →* P_j and P_i ↔* P_j, 1 ≤ i < j ≤ n;
   (b) if P_i →* P_j & |= A_j ∈ A_{PR} then
         restrict the access to A_i ∈ A with respect to P_i (where i = j − 1);
   (c) if P_i ↔* P_j & |= A_j ∈ A_{PR} then
         restrict the access to A_i ∈ A with respect to P_i, ∀ 1 ≤ i ≤ n;
   (d) if ∀ A_i ∈ Q are restricted, then reject Q;
4. process Q with the restrictions on A_i from 3b and 3c;
end
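The ACTION part of the trigger can be sketched as a function that turns detected chains and rings into a set of restricted attributes. The `detections` format below is an assumed stand-in for the output of the detection step, and all names are hypothetical; it is not the paper's data structure.

```python
def before_action(query_attrs, detections, private):
    """Sketch of the ACTION part of Trigger 4.1.

    query_attrs -- attributes requested by Q
    detections  -- assumed format: a list of dicts with keys
                   "kind" ("chain" or "ring"),
                   "attrs" (public attributes along the inference, in order),
                   "implies" (the attribute A_j the inference would disclose)
    private     -- the set A_PR
    Returns the attributes Q may still access, or None if Q is rejected.
    """
    restricted = set()
    for d in detections:
        if d["implies"] not in private:
            continue                          # nothing private is disclosed
        if d["kind"] == "chain":
            restricted.add(d["attrs"][-1])    # step 3b: only A_{j-1}
        else:
            restricted.update(d["attrs"])     # step 3c: every attribute in the ring
    allowed = set(query_attrs) - restricted
    if not allowed:                           # step 3d: all requested attributes restricted
        return None
    return allowed                            # step 4: process Q on what remains

chain = [{"kind": "chain", "attrs": ["zip", "age"], "implies": "diagnosis"}]
ring = [{"kind": "ring", "attrs": ["zip", "age"], "implies": "diagnosis"}]
print(before_action({"zip", "age"}, chain, {"diagnosis"}))  # {'zip'}
print(before_action({"zip", "age"}, ring, {"diagnosis"}))   # None
```

Restricting only the last public attribute of a chain breaks the final inference step while leaving the rest of the query usable, whereas a ring offers no such single cut point, so every attribute on it is withheld.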


4.2 The Post Action AC_p

Many actions in $AC_p$ can be taken after a query passes the testing in $AC_b$. One major $AC_p$ is to keep track of query patterns using data mining approaches. It is important to discover the associations and correlations between $\mathcal{A}_{PU}$ and $\mathcal{A}_{PR}$ in the set of observed queries [1]. A higher degree of association and correlation between $\mathcal{A}_{PU}$ and $\mathcal{A}_{PR}$ should trigger greater attention and further actions. Mining query patterns is a broad subject area and is left for future research.
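One simple way to quantify such associations is to compute support and confidence for each (public, private) attribute pair over a query log. The sketch below is only an illustration of that idea under assumed inputs: the log is a list of attribute sets, one per observed query, and the measure is the usual association-rule support and confidence, which this paper does not prescribe.

```python
from collections import Counter
from itertools import product

def association_stats(query_log, public, private):
    """Support and confidence of public -> private attribute co-occurrence."""
    total = len(query_log)
    if total == 0:
        return {}
    public_count = Counter()
    pair_count = Counter()
    for attrs in query_log:
        attrs = set(attrs)
        pub, priv = attrs & public, attrs & private
        public_count.update(pub)                  # queries touching each public attribute
        pair_count.update(product(pub, priv))     # queries touching both attributes
    return {
        pair: {"support": n / total,
               "confidence": n / public_count[pair[0]]}
        for pair, n in pair_count.items()
    }

log = [{"zip", "diagnosis"}, {"zip"}, {"zip", "age", "diagnosis"}, {"age"}]
stats = association_stats(log, {"zip", "age"}, {"diagnosis"})
print(stats[("age", "diagnosis")])   # {'support': 0.25, 'confidence': 0.5}
```

Pairs with high confidence would be natural candidates for the "greater attention and further actions" mentioned above; the threshold for what counts as high is left open here.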

5 Conclusions and Discussions

This paper has presented an active model for detecting possible disclosure of sensitive and private attribute values. The active approach mainly addresses single-rule inference, chain-type rule inference, and ring-type rule inference, considered from the perspectives of both functional dependency rules and generalized rules. The proposed algorithms detect the chains and rings, and the corresponding actions limit access to the private attribute values. Further, the post process will use data mining approaches to find patterns of query access to inform future access control policy making.

6 Acknowledgments

The author would like to express sincere thanks to Dean Dr. Edward Lieblein and to the faculty and staff of the Graduate School of Computer and Information Sciences, Nova Southeastern University.

References

[1] R. Agrawal, A. Evfimievski, J. Kiernan, and R. Velu, “Auditing Disclosure by Relevance Ranking,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 79-90, 2007.

[2] R. Agrawal, J. Kiernan, R. Srikant, and Y. R. Xu, “Hippocratic Databases,” in Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 143-154, 2002.

[3] W. W. Armstrong, “Dependency Structures of Data Base Relationships,” in Proceedings of IFIP Congress, pp. 580-583, 1974.

[4] E. Bertino and R. Sandhu, “Database Security - Concepts, Approaches, and Challenges,” IEEE Transactions on Dependable and Secure Computing, Vol. 2, No. 1, pp. 2-19, 2005.

[5] S. Castano, M. Fugini, G. Martella, and P. Samarati (Eds.), Database Security, ACM Press, 1994.

[6] K. LeFevre, R. Agrawal, V. Ercegovac, R. Ramakrishnan, Y. R. Xu, and D. DeWitt, “Limiting Disclosure in Hippocratic Databases,” in Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada, pp. 108-119, 2004.