
9.2.1 Categorization

Uncertainty in databases can generally be modeled as tuple uncertainty and attribute uncertainty. Under tuple uncertainty, each tuple is associated with a probability of appearing in the database. This characteristic is also called existential uncertainty. Attribute uncertainty means that a tuple has at least one uncertain attribute whose possible values are contained within a defined range. In the literature, probabilistic data models are classified into two types with respect to attribute uncertainty: the discrete uncertainty model (cf. Figure 9.1(a)) and the continuous uncertainty model (cf. Figure 9.1(b)).

In many real-world applications, uncertain objects are already given by discrete observations, in particular if the objects are derived from sensor signals. This type of representation is motivated by the fact that, in many cases, only discrete but ambiguous object information (as usually returned by common sensor devices) is available, e.g., discrete snapshots of continuously moving objects. Therefore, this part focuses on the discrete uncertainty model and adopts a prominent representative of it: the ULDB model or x-relation model [25], introduced in the Trio system [8], which is presented in the following.

The continuous uncertainty model will not be relevant in the context of this work, but for the sake of completeness, it will briefly be explained within a summary of related work on uncertain data in Chapter 10.

9.2.2 The X-Relation Model

The x-relation model extends the relational database model by incorporating uncertainty and lineage [25], and it supports existential uncertainty and attribute uncertainty. Relations in the x-relation model are called x-relations and contain uncertain tuples with alternative instances, which are called x-tuples. Each x-tuple T corresponds to a set of tuples. Each tuple t ∈ T is associated with a probability P(t), denoting the likelihood that it exists in T. This characteristic realizes existential uncertainty of tuples. The probabilities represent a discrete probability distribution of T, which realizes the attribute uncertainty of T; the constraint $\sum_{t \in T} P(t) \le 1$ holds. The condition $\sum_{t \in T} P(t) < 1$ implies existential uncertainty of x-tuples, meaning that the x-tuple may not exist at all.

(a) Tuples and x-tuples.

Tuples

Tuple  Location         Prob.
t1     Renzy’s Den      50%
t2     Waterhole        20%
t3     Hunting Grounds  30%
t4     Waterhole        10%
t5     Hunting Grounds  10%
t6     The Forest       20%

Tiger X-Relation

Name           X-Tuple
Renzy          {t1, t2, t3}
Unknown Tiger  ? {t4, t5, t6}

(b) Possible worlds.

Possible Worlds

World  Tuples      Prob.
W1     {t1}, {}    30%
W2     {t1}, {t4}  5%
W3     {t1}, {t5}  5%
W4     {t1}, {t6}  10%
W5     {t2}, {}    12%
W6     {t2}, {t4}  2%
W7     {t2}, {t5}  2%
W8     {t2}, {t6}  4%
W9     {t3}, {}    18%
W10    {t3}, {t4}  3%
W11    {t3}, {t5}  3%
W12    {t3}, {t6}  6%

Table 9.1: Tuples describing locations of tigers, an x-relation containing x-tuples with their possible locations and corresponding possible worlds with their probabilities.
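The constraint on the probabilities of an x-tuple in Section 9.2.2 can be checked mechanically. The following is a minimal Python sketch that uses the tiger data of Table 9.1; the names `Alternative`, `total_probability`, `is_valid_x_tuple` and `absence_probability` are hypothetical and are not taken from the ULDB model or the Trio system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """One tuple t of an x-tuple T together with its probability P(t)."""
    tuple_id: str
    location: str
    prob: float


def total_probability(x_tuple):
    """Sum of P(t) over all alternatives t of the x-tuple."""
    return sum(alt.prob for alt in x_tuple)


def is_valid_x_tuple(x_tuple, eps=1e-9):
    """Check 0 <= P(t) <= 1 for every t and sum of P(t) <= 1 (up to eps)."""
    in_range = all(0.0 <= alt.prob <= 1.0 for alt in x_tuple)
    return in_range and total_probability(x_tuple) <= 1.0 + eps


def absence_probability(x_tuple):
    """Probability that the x-tuple does not exist at all: 1 - sum of P(t)."""
    return max(0.0, 1.0 - total_probability(x_tuple))


# Data from Table 9.1 (illustrative encoding, an assumption of this sketch).
renzy = [
    Alternative("t1", "Renzy's Den", 0.5),
    Alternative("t2", "Waterhole", 0.2),
    Alternative("t3", "Hunting Grounds", 0.3),
]
unknown_tiger = [
    Alternative("t4", "Waterhole", 0.1),
    Alternative("t5", "Hunting Grounds", 0.1),
    Alternative("t6", "The Forest", 0.2),
]

for name, x_tuple in [("Renzy", renzy), ("Unknown Tiger", unknown_tiger)]:
    print(name, is_valid_x_tuple(x_tuple), round(absence_probability(x_tuple), 2))
```

An absence probability greater than zero corresponds to the "?" marker of the unknown tiger in Table 9.1.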

9.2.3 The Possible Worlds Semantics

In relational databases, a popular semantics for coping with the uncertainty of data has been introduced by adopting Saul Kripke’s Possible Worlds Semantics [145], e.g., as performed in [15]. Incorporating this semantics into the x-relation model, an uncertain database D is instantiated into a possible world as follows [44]:

Definition 9.1 (Possible Worlds) Let $D = \{T_1, \ldots, T_N\}$ be an uncertain database and let $W = \{t_1, \ldots, t_N\}$ be any (certain) database instance which corresponds to a subset of tuples $t_i$ appearing in $D$ such that $t_i \in T_i$, $i \in \{1, \ldots, N\}$. The probability of this database instance (world) $W$ to occur is $P(W) = \prod_{i=1}^{N} P(t_i)$. If $P(W) > 0$, $W$ is a possible world; the set of all possible worlds is denoted by $\mathcal{W}$.
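Definition 9.1 can be illustrated by enumerating all worlds of a small database. The following Python sketch is one possible reading of it, not a reference implementation. The function name `possible_worlds` and the dictionary encoding are hypothetical. An x-tuple whose probabilities sum to less than 1 receives an explicit "empty" alternative (encoded as `None`) that carries the remaining probability mass, as done in Example 9.1.

```python
from itertools import product
from math import prod


def possible_worlds(database):
    """Enumerate (world, probability) pairs for an x-relation.

    `database` maps an x-tuple name to a dict {tuple_id: P(t)}.
    A world picks exactly one alternative per x-tuple; `None` stands
    for the "empty" alternative (the x-tuple does not exist).
    """
    choices = []
    for alternatives in database.values():
        options = list(alternatives.items())
        missing = 1.0 - sum(alternatives.values())
        if missing > 1e-12:  # existential uncertainty of the x-tuple
            options.append((None, missing))
        choices.append(options)

    worlds = []
    for combo in product(*choices):
        # x-tuples are independent, so P(W) is the product of the P(t_i)
        p = prod(prob for _, prob in combo)
        if p > 0:  # only worlds with P(W) > 0 are possible worlds
            worlds.append((tuple(tid for tid, _ in combo), p))
    return worlds


# Data from Table 9.1 (encoding is an assumption of this sketch).
tigers = {
    "Renzy": {"t1": 0.5, "t2": 0.2, "t3": 0.3},
    "Unknown Tiger": {"t4": 0.1, "t5": 0.1, "t6": 0.2},
}
worlds = possible_worlds(tigers)
for world, p in worlds:
    print(world, round(p, 2))
```

Up to ordering, the enumerated worlds and their probabilities correspond to part (b) of Table 9.1. The number of worlds grows exponentially with the number of x-tuples, so such an enumeration is only practical for very small examples.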

The x-relation model is a special type of the possible worlds model that additionally assumes mutual independence among the x-tuples. Furthermore, the tuples t of an x-tuple T are assumed to be mutually exclusive, i.e., no more than one instance t of an x-tuple T can appear in a possible world at the same time. In the general model description, the possible worlds are constrained by rules that are defined on the tuples in order to incorporate correlations or dependencies between tuples [187].

The x-relation model will be used as a basic object model in the major part of the following chapters. To get an intuition of this model, an example, taken from [134], will be given below.

Example 9.1 Table 9.1 shows an x-relation that contains information about the possible positions of tigers in a wildlife sanctuary. Here, the first x-tuple describes the tiger named “Renzy”, who may be found at three possible (alternative) locations t1, t2 and t3. He may be in his cave with a probability of 50%, or located at the water hole or at the hunting grounds with a probability of 20% and 30%, respectively. This x-tuple logically yields three mutually exclusive, possible tuple instances, one for each alternative location. Now, assume that an unknown tiger may have entered the wildlife sanctuary with a probability of 40%. In this case, it is not certain that the unknown tiger exists at all, which is an existential uncertainty of the x-tuple, denoted by a “?” symbol [25]. To incorporate this existential uncertainty, an additional, “empty” tuple is added to the x-tuple of the unknown tiger, such that the probabilities of the possible worlds can be computed according to Definition 9.1. Taking into account the four alternatives (including the alternative of no unknown tiger) for the position of the unknown tiger, there are twelve possible instances (worlds) of the tiger x-relation. In general, the possible worlds of an x-relation R correspond to all combinations of alternatives for the x-tuples in R. In this model, the probability of the unknown tiger being at the water hole is not affected by the current position of Renzy, due to the independence assumption among x-tuples. Considering the general case with possible tuple dependencies, this example could be extended by the “natural” restriction that male tigers are territorial and that the position of a tiger may be affected by the presence of other tigers in its close vicinity. Thus, for example, world W6, in which both tigers are at the water hole, might be excluded by a rule.
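As a check on the independence claim, the world probabilities of Table 9.1 can be used to compute the probability of W6 and the conditional probability that the unknown tiger is at the water hole (tuple t4), given that Renzy is at the water hole (tuple t2). The conditioning event consists of the worlds W5 to W8:

```latex
\begin{align*}
P(W_6) &= P(t_2) \cdot P(t_4) = 0.2 \cdot 0.1 = 0.02, \\
P(t_4 \mid t_2)
  &= \frac{P(W_6)}{P(W_5) + P(W_6) + P(W_7) + P(W_8)}
   = \frac{0.02}{0.12 + 0.02 + 0.02 + 0.04}
   = \frac{0.02}{0.20} = 0.1 = P(t_4).
\end{align*}
```

The conditional probability equals the unconditional probability of t4, as expected under independence among x-tuples.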

9.2.4 Translation to Spatial Databases

For use in this work, the semantics of the x-relation model will be translated into a spatial context. Then, an uncertain database D consists of N uncertain objects with spatial attributes, where each object X corresponds to an x-tuple and each observation x ∈ X corresponds to a tuple. The attribute uncertainty of objects of a d-dimensional vector space $\mathbb{R}^d$ is called positional uncertainty. Then, objects do not have a unique position in $\mathbb{R}^d$, but have multiple positions, each associated with a probability value. Thereby, the probability value assigned to a position $x \in \mathbb{R}^d$ of an object X denotes the likelihood that X is located at the position x in the vector space. The existential dependency, which describes the rule that observations belonging to the same object are mutually exclusive, will be assumed for the rest of this part. This dependency realizes the main characteristic of uncertain data.

According to [43, 45], a formal definition of a positional uncertain object within a d-dimensional vector space is given as follows:

Definition 9.2 (Discrete Uncertain Object) A discrete uncertain object X corresponds to a finite set of observations (alternative positions) in a d-dimensional vector space, each associated with a confidence value, i.e., $X = \{(x, P(X = x))\}$, where $x \in \mathbb{R}^d$, and $P(X = x) \in [0,1]$ indicates the likelihood that object X is represented by observation x.
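Definition 9.2 maps naturally onto a small data structure. The following Python sketch is one possible encoding under stated assumptions. The class name `DiscreteUncertainObject` and its methods are hypothetical. The check that the confidence values sum to at most 1 is carried over from the x-tuple constraint of Section 9.2.2 and is not stated in Definition 9.2 itself.

```python
from dataclasses import dataclass
from typing import Tuple

Position = Tuple[float, ...]


@dataclass(frozen=True)
class DiscreteUncertainObject:
    """Finite set of observations (x, P(X = x)) with x in R^d."""
    observations: Tuple[Tuple[Position, float], ...]

    def __post_init__(self):
        if not self.observations:
            raise ValueError("at least one observation is required")
        if len({len(x) for x, _ in self.observations}) != 1:
            raise ValueError("all observations must share one dimension d")
        if any(not 0.0 <= p <= 1.0 for _, p in self.observations):
            raise ValueError("each P(X = x) must lie in [0, 1]")
        # Assumption carried over from the x-tuple constraint (Section 9.2.2).
        if sum(p for _, p in self.observations) > 1.0 + 1e-9:
            raise ValueError("confidence values must sum to at most 1")

    @property
    def dimension(self):
        """Dimension d of the vector space the observations live in."""
        return len(self.observations[0][0])

    def probability_at(self, x):
        """P(X = x); 0 if x is not an observation of the object."""
        return sum(p for pos, p in self.observations if pos == tuple(x))

    def existence_probability(self):
        """Probability that the object exists at all (sum over observations)."""
        return sum(p for _, p in self.observations)


# Illustrative 2-dimensional object with two alternative positions.
obj = DiscreteUncertainObject((((0.0, 0.0), 0.5), ((1.0, 2.0), 0.3)))
print(obj.dimension, obj.probability_at((1.0, 2.0)), obj.existence_probability())
```

Because the observations of one object are mutually exclusive, their probabilities can simply be added, which is what `existence_probability` relies on.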