Cleaning Methods - Cost-Conscious Cleaning

5.2 Cost-Conscious Cleaning

5.2.1 Cleaning Methods

Cleaning of RFID data can be seen as a classification problem. We take as input a set of features describing the current information about a particular tag and decide on its location. For example, a cleaning method may use the features (EPC, reader, timestamp, detected), where detected is a binary flag set to 1 if the reader detected the tag identified by EPC at time timestamp and to 0 otherwise, and classify the tag as present if the reader detects it and as not present otherwise. A more sophisticated classifier may require additional features, such as the past k readings of the tag, before deciding on its location.

Definition 5.2.1 A cleaning method is a classifier M that takes as input a tag case, which is a tuple of the form (⟨EPC, t⟩, ⟨f_1, f_2, ..., f_k⟩), where each f_i is a feature describing one characteristic of the tag identified by EPC or its environment at time t, and makes a prediction of the form (⟨EPC, t⟩ : loc, conf), where loc is the predicted location for the tag identified by EPC at time t, and conf is the confidence that the classifier has in the prediction.

Some cleaning methods may not provide a confidence value with their predictions; in this case, we can consider every prediction to have a confidence of 100%. We call such methods terminal methods, as the location they issue for each tag case is final, i.e., at runtime we cannot call another cleaning method based on confidence.
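As a minimal sketch of Definition 5.2.1, the tag case and prediction can be represented as small records, and the simple "present if detected" rule from the example above becomes a terminal method. The names `TagCase`, `Prediction`, and `detected_rule`, and the use of `None` for "not present", are illustrative assumptions, not part of the original text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TagCase:
    """A tag case (<EPC, t>, <f_1, ..., f_k>); field names are hypothetical."""
    epc: str
    t: int
    features: Dict[str, object]


@dataclass(frozen=True)
class Prediction:
    """A prediction (<EPC, t> : loc, conf)."""
    epc: str
    t: int
    loc: Optional[str]  # assumption: None encodes "not present"
    conf: float


def detected_rule(case: TagCase) -> Prediction:
    """Terminal method: the tag is at the reader iff the reader detected it.

    It reports no real confidence, so every prediction is treated as
    100% confident, as described for terminal methods.
    """
    detected = case.features.get("detected", 0) == 1
    loc = case.features.get("reader") if detected else None
    return Prediction(case.epc, case.t, loc, 1.0)
```

For instance, `detected_rule(TagCase("epc-1", 5, {"reader": "dock_door", "detected": 1}))` yields a prediction whose `loc` is `"dock_door"` and whose `conf` is `1.0`.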

We classify cleaning methods into three broad categories. The first category is user-defined rules that are provided by experts in the application domain and that can be used to quickly clean a large portion of an RFID data set. The second category is cleaning methods based on statistical models [13, 58, 29] that issue tag locations with a high probability of being correct. The third category is methods that use data mining techniques, such as a data warehouse, frequent patterns, or clustering, to classify tag cases. For example, we can use historic information on flow patterns [40] or group movements [42] to determine the correct location of a tag. Multiple individual cleaning methods can be combined to form a new method. For example, a new method can combine the window smoothing method, which detects false negatives, with the pattern matching method, which detects false positives.
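To make the window smoothing idea concrete, here is one simple variant, sketched under the assumption that readings arrive as a 0/1 sequence per tag and reader: the tag is considered present at a step if it was detected at any point in the last `window` steps. The function name and the fixed-size window are illustrative choices, not the specific smoothing method cited in the text.

```python
from typing import List


def window_smooth(detections: List[int], window: int) -> List[int]:
    """Fill in likely false negatives with a sliding window.

    Step i is marked present (1) if any of the last `window` raw
    readings, including reading i itself, detected the tag.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(detections)):
        start = max(0, i - window + 1)
        smoothed.append(1 if any(detections[start:i + 1]) else 0)
    return smoothed
```

With a window of 1 the readings are returned unchanged; larger windows bridge longer gaps of missed detections at the price of reporting a departed tag as present for longer.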

Cost model for a cleaning method. The cost of a cleaning method consists of the per-tuple cleaning cost and the error cost. The per-tuple cleaning cost is a function of two variables: (1) the amortized per-tuple training cost and (2) the cost, in terms of storage space and running time, of labeling a tag reading. The error cost is what we have to pay for each misclassified tag reading. The error cost can be a scalar value that simply penalizes issuing an incorrect tag location by a constant, or a matrix.¹
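The cost model can be illustrated with a short sketch. It makes two assumptions that the text does not state: the two components of the per-tuple cleaning cost are simply added, and a matrix-valued error cost is indexed by (true location, predicted location). All function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[str, str]  # (true location, predicted location)


def per_tuple_cleaning_cost(total_training_cost: float,
                            n_tuples: int,
                            labeling_cost: float) -> float:
    """Amortized training cost plus labeling cost (additive by assumption)."""
    if n_tuples <= 0:
        raise ValueError("n_tuples must be positive")
    return total_training_cost / n_tuples + labeling_cost


def average_error_cost(outcomes: Iterable[Pair],
                       scalar_penalty: float = 1.0,
                       cost_matrix: Optional[Dict[Pair, float]] = None) -> float:
    """Average error cost per tag reading.

    Correct predictions cost nothing. A wrong prediction costs
    `scalar_penalty`, or the matching `cost_matrix` entry if one is given.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    total = 0.0
    for true_loc, predicted_loc in outcomes:
        if true_loc == predicted_loc:
            continue
        if cost_matrix is not None:
            total += cost_matrix[(true_loc, predicted_loc)]
        else:
            total += scalar_penalty
    return total / len(outcomes)
```

Under these assumptions, the overall cost of a method per reading would be the sum of the two functions, which makes it possible to compare a cheap but error-prone method against an expensive but accurate one on the same scale.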

Features available to a cleaning method. The context in which tag readings take place defines the feature space. Intuitively, features are important information regarding a tag at the time of a reading. Features can be classified into four groups: (1) tag features, which describe characteristics of the tag, such as communications protocol, vendor, price, or history of recent tag detections; (2) reader features, which describe the reader, including the number of antennas, protocol, price, and vendor; (3) location features, which describe the location where the reading took place, including the type of area being monitored (e.g., door, shelf, or conveyor belt) or the sources of interference in the area; and (4) item features, which describe the item to which the tag is attached, including item composition (e.g., water or metal content), physical dimensions, or whether it is a container.

¹ In a more general setting the error may be a function of the distance of the correct location to the predicted location, the price of the item, and ...

A DBN-based view of cleaning

In this section we propose a new cleaning method based on Dynamic Bayesian Networks (DBNs) [79]. We assume there is a hidden process that determines the true location of a tag, but we only see noisy observations of the hidden process. The model maintains a belief state, which is the probability that the tag is present or not at a reader given the past observations. DBN-based cleaning has the advantage that it does not require us to remember recent tag readings and, as opposed to window smoothing, it gives more weight to recent readings (even within a window) than to older ones. A simple implementation of the model defines a single hidden variable X_t that is true if the tag is present at the reader's location at time t, and a single observation variable e_t, which is a noisy signal dependent on X_t. We can compute the posterior over the current state X_{t+1} given the observations e_{1:t+1}, and hence the most likely state, as:

$$P(X_{t+1} \mid e_{1:t+1}) \propto P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t}) \tag{5.1}$$

where P(e_{t+1} | X_{t+1}) is the known probability of observing a certain reading given a true state of the world, P(X_{t+1} | x_t) is the known probability of changing from one true state to another, and P(x_t | e_{1:t}) is our previous belief state. The observation and transition models can be learned from the data or be given by the user. For example, we can update our observation model with the average detection rate of recent tag readings. The belief state is recomputed sequentially as we receive new readings. Figure 5.2 presents the graphical representation of the DBN.

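[Figure 5.2: two time slices of the DBN, with hidden nodes present_{t-1} and present_t and observation nodes detect_{t-1} and detect_t. Each detect node depends on the present node of the same time step, and present_t depends on present_{t-1}.]

Transition Model
present_{t-1}    P(present_t)
true             0.9
false            0.1

Observation Model
present_t        P(detect_t)
true             0.8
false            0.0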

Figure 5.2: Structure of a simple DBN for cleaning
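The recursive update in Equation (5.1) can be sketched for the two-state model of Figure 5.2 as follows. The default parameters are the transition and observation probabilities shown in the figure; the function names and the initial belief of 0.5 are assumptions made for illustration.

```python
from typing import Iterable, List


def update_belief(prior_present: float,
                  detected: bool,
                  p_stay: float = 0.9,            # P(present_t | present_{t-1} = true)
                  p_arrive: float = 0.1,          # P(present_t | present_{t-1} = false)
                  p_detect_present: float = 0.8,  # P(detect_t | present_t = true)
                  p_detect_absent: float = 0.0    # P(detect_t | present_t = false)
                  ) -> float:
    """One step of Equation (5.1): returns P(present_{t+1} | e_{1:t+1})."""
    # Prediction: sum over the previous state x_t of
    # P(X_{t+1} | x_t) * P(x_t | e_{1:t}).
    predicted = p_stay * prior_present + p_arrive * (1.0 - prior_present)

    # Observation likelihood P(e_{t+1} | X_{t+1}) for each state.
    if detected:
        like_present, like_absent = p_detect_present, p_detect_absent
    else:
        like_present, like_absent = 1.0 - p_detect_present, 1.0 - p_detect_absent

    # Multiply and normalize (the proportionality in Equation (5.1)).
    num_present = like_present * predicted
    num_absent = like_absent * (1.0 - predicted)
    norm = num_present + num_absent
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return num_present / norm


def filter_readings(readings: Iterable[bool], initial_belief: float = 0.5) -> List[float]:
    """Run the update sequentially; only the latest belief is carried forward."""
    beliefs = []
    belief = initial_belief
    for detected in readings:
        belief = update_belief(belief, detected)
        beliefs.append(belief)
    return beliefs
```

Because the observation model in the figure assigns probability 0 to detecting an absent tag, a single detection drives the belief to 1. Each subsequent missed reading then lowers it gradually rather than dropping it at once, which is how the model weights recent readings more heavily without storing a window of past readings.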

In document Mining Massive Moving Object Datasets from RFID Data Flow Analysis to Traffic Mining (Page 64-66)