Training the Ensemble HMMs - Ensemble Stream Model for Data-Cleaning in Sensor Networks

A typical Markov model uses fixed, specified parameters, but in our case we want to learn them from a set of data samples. As described in the previous chapter, the learning is unsupervised because the stream is unlabeled. Hidden Markov models are especially known for their application in temporal pattern mining, where higher-order states are not directly observed by the system. A hidden Markov model can be considered a generalization of a mixture model in which the hidden (or latent) variables, which select the mixture component for each observation, are states of a Markov process rather than independent of each other. One such HMM is shown in Figure 8.2; it has two hidden states and three observed states. The proposed model applies a classification approach to unlabeled sensor streams to increase reliability. The INTEL sensor dataset (CSAIL, 2012) has 30% missing values, and at the same time the amount of mobile sensor data generated is increasing dramatically due to GPS-enabled phones. We extend some of the preliminary results obtained with spatio-temporal data-cleaning algorithms to mobility models by using hidden Markov models (HMMs). An ensemble of such streams can be studied using an ensemble of HMMs to learn further latent features from the observed datasets. For example, some data-cleaning algorithms work well with GPS-tagged data, as shown in Figure 8.4, because the data has context from the underlying map, so outliers can be inspected visually and removed during clustering.
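To make the difference from an independent mixture concrete, the following is a minimal sketch of sampling from an HMM with two hidden states and three observation symbols. The probability values in START, TRANS, and EMIT and the function name sample_hmm are hypothetical and chosen only for illustration; they are not taken from Figure 8.2.

```python
import random

# Hypothetical parameters for a 2-hidden-state, 3-symbol HMM (illustration only).
START = [0.6, 0.4]                      # P(first hidden state)
TRANS = [[0.7, 0.3],                    # TRANS[i][j] = P(next state j | state i)
         [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1],                # EMIT[i][k] = P(symbol k | state i)
        [0.1, 0.3, 0.6]]


def sample_hmm(length, seed=0):
    """Draw a hidden state path and the symbols it emits."""
    rng = random.Random(seed)
    states, symbols = [], []
    state = rng.choices([0, 1], weights=START)[0]
    for _ in range(length):
        states.append(state)
        # The emitted symbol depends only on the current hidden state.
        symbols.append(rng.choices([0, 1, 2], weights=EMIT[state])[0])
        # The next hidden state depends on the current one (Markov property),
        # which is what distinguishes an HMM from an independent mixture.
        state = rng.choices([0, 1], weights=TRANS[state])[0]
    return states, symbols


hidden, observed = sample_hmm(10)
print("hidden  :", hidden)
print("observed:", observed)
```

Only the observed list would be available to a learner; the hidden list is what training has to infer.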

The sensors allow all parameters to be aggregated online. Due to the inherent unreliability of a sensor and the limited amount of training data available, it is hard to measure and label stream quality online. Given an observation of stream data over time, the stream can be categorized using HEAPS. The number of hidden states is learned from the data by the HEAPS framework using a Baum-Welch (Duda

Figure 8.3: Trajectory Visualization from GEOLIFE Dataset Showing GPS States.

Figure 8.4: Hidden Markov model showing different activity recognition and its corresponding states.

not probabilities but unnormalized likelihoods. Bayes' rule is used, which is given by

$$\hat{y} = \arg\max_{w_j \in \Omega}$$

8.3 Training the Ensemble HMMs

We use the dataset trajectories, which contain GPS readings from the most frequently traveled routes in Beijing. The city is located near 39°55′N, 116°25′E, and each trajectory record holds the latitude, longitude, date, and time, recorded every few seconds. Out of the 17,621 trajectories from 182 users, 73 users have labeled their trajectories with modes such as walk, car, and bus. An ensemble of three HMMs, one each for walk, car, and bus, is trained using sufficient training data from two different users. Because the computational complexity grows exponentially with the number of states, we want to design an efficient HMM for the given dataset. A sample trajectory from the dataset (Li & Zheng, 2008), (Zheng, 2008a), (Zheng, 2008b) was plotted to find the temporal variations, as shown in Figure 8.4 with activity recognition. From the plot, the distribution of latitude and longitude varies between a minimum of two states, as shown in Figure 8.3, and a maximum of four states. When tested on new samples, the ensemble HMM detected the sequence 95% of the time. Because the trajectories share many common areas, the HEAPS framework can categorize the test samples with high accuracy provided a sufficient number of GPS coordinates is supplied during testing.
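The classification step of such an ensemble can be sketched as follows: each class-specific HMM scores the test sequence, and the label with the highest log-likelihood wins. This is a minimal sketch under stated assumptions, not the HEAPS implementation. The observations are assumed to be discretized into three symbols (for example 0 = slow, 1 = medium, 2 = fast), whereas the chapter works with latitude and longitude; the MODELS parameters are hand-set and hypothetical rather than trained with Baum-Welch; and the names log_likelihood and classify are illustrative.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")        # sequence impossible under this model
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


# Hypothetical (start, transition, emission) parameters, one two-state HMM per label.
MODELS = {
    "walk": ([0.9, 0.1], [[0.9, 0.1], [0.5, 0.5]],
             [[0.9, 0.09, 0.01], [0.6, 0.35, 0.05]]),
    "bus": ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]],
            [[0.6, 0.35, 0.05], [0.1, 0.6, 0.3]]),
    "car": ([0.3, 0.7], [[0.6, 0.4], [0.1, 0.9]],
            [[0.5, 0.4, 0.1], [0.02, 0.18, 0.8]]),
}


def classify(obs, models=MODELS):
    """Return the best-scoring label and the per-model log-likelihoods."""
    scores = {label: log_likelihood(obs, *params) for label, params in models.items()}
    return max(scores, key=scores.get), scores


label, scores = classify([2, 2, 1, 2, 2, 2])
print(label, scores)
```

Longer test sequences separate the per-model scores more clearly, which is consistent with the observation that accuracy improves when enough GPS coordinates are supplied.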

8.4 Summary

The ensemble model is well justified. We discussed quality labels using DC-Trees in Chapter 5, which deals with temporal streams. For sequential time-series data, we further need to distinguish the factors that affect the streams. The factor that affects the HMM is the window length of the time-series data. In this case, the length of the training data is statistically sufficient for the model and its hidden states. These factors are further analyzed to determine the QoD labels of time-series data, as they differ from the feature-based categories in the previous discussions.

Chapter 9 Effects of Noise Due to User Mobility

Mobility models play an important role in modern sensing applications. The presence of noise is an important design consideration for wireless networks deployed in harsh terrain. In this chapter we present results obtained with the OMNET++ simulator and capture the temporal variations in the time domain with a single mobile user and fixed base stations. Most of the work is from the research paper¹. The project was supported by the Korean Science Council and the National Institute of Mathematical Sciences. Section 9.3 introduces cognitive radio and how primary and secondary members could share a sparsely used licensed spectrum. Section 9.4 describes how to model mobility and compensate for fading effects. Section 9.5 explains new standards that co-exist with sensor networks and how they can minimize interference and enhance spectrum usage. For a thorough treatment of different noise types in sensor networks, please refer to the book² Distributed Sensor Networks, Second Edition: Image and Sensor Signal Processing.

¹ Vasanth Iyer, S. Sitharama Iyengar, Garmiela Rama Murthy, N. Parameswaran, Dhananjay Singh, and Mandalika B. Srinivas. Effects of channel SNR in Mobile Cognitive Radios and Co-existing Deployment of Cognitive Wireless Sensor Networks. IEEE IPCCC, 2010, pp. 294-300. Albuquerque, New Mexico, USA.

9.1 Introduction

The cognitive radio and sensor networks studied here are both considered static, with mobile primary licensed users. Mobile primary-user models are available as extensions for studying specific signal estimation techniques. Cognition on their part has two common modes of interference avoidance. The first approach uses overlay to exploit unused spectrum bandwidth, and the second uses underlay in the form of interference control. The history of cognitive radio can be traced to the thesis work of J. Mitola in 2000, where he coined the term "Cognitive Radio" for a form of radio that detects its environment and changes its performance accordingly. Using a mobility framework and cognitive radios, we find the trade-offs between minimum spectrum power allocation and channel rate when operating in frequencies that overlap with primary users.
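One common way to reason about the trade-off between power allocation and channel rate is the Shannon capacity bound, C = B log2(1 + SNR). The chapter does not state which rate model it uses, so treating the channel this way is an assumption, and the 2 MHz bandwidth and the function name shannon_rate_bps below are illustrative choices only.

```python
import math


def shannon_rate_bps(bandwidth_hz, snr_db):
    """Upper bound on the error-free bit rate of an AWGN channel (Shannon capacity)."""
    snr_linear = 10 ** (snr_db / 10)          # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


# Each extra 10 dB of SNR costs ten times the received power
# but adds only a roughly constant increment to the achievable rate.
for snr_db in (0, 10, 20, 30):
    rate = shannon_rate_bps(2e6, snr_db)      # assumed 2 MHz channel
    print(f"SNR {snr_db:2d} dB -> at most {rate / 1e6:.2f} Mbit/s")
```

The logarithmic growth is why a secondary user operating at low power gives up comparatively little rate per decibel of backoff at high SNR.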

We study the performance of deploying dense wireless sensor networks that use the ISM band with the IEEE 802.15.4 protocol in the context of cognitive networks. Among the recently emerging standards on interoperability, none addresses the distributed nature of the spectrum. Some deployments have adopted frequency reuse and orthogonal spectrum allocation to achieve the least interference and better use of the same spectrum. These implementations provide a realistic baseline and also take into consideration the non-linearity of radios in practice, which introduces errors during channel coding. In this chapter we extend radio interoperability and its specific power-aware requirements for extending the sensor network lifetime. We model interference as unlicensed users partially overlapping with the primary user, as shown in Figure 9.1, using a simulator as illustrated in Figures 9.3 and 9.1, which gives rise to co-channel interference. The varying parameters at the radio receivers are the interference due to the number of overlapping channels and the variation caused by mobility. Interference under mobility can be seen as a phase shift due to Doppler and a phase shift leading to delay spread due to frequency. The two-dimensional representation of interference, varying with wireless range from the primary signal and the interfering signals, is expressed using a covariance matrix. Correlated channels with a high signal-to-noise ratio (SNR) have better QoS, which leads to higher link quality and interference-free reception during spectrum usage.

Figure 9.1: Frequency and bandwidth of unlicensed user partially overlapping with primary user.
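The size of the Doppler contribution can be estimated with the standard relation f_d = v * f_c / c for a user moving directly toward or away from the transmitter. The sketch below is a worked example only; the 2.4 GHz carrier and the user speeds are assumed values, not parameters reported from the simulation.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def max_doppler_shift_hz(speed_m_s, carrier_hz):
    """Maximum Doppler shift for motion along the line of sight."""
    return speed_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


# Assumed scenarios in the 2.4 GHz ISM band: a pedestrian and a vehicle.
for name, speed in (("pedestrian", 1.5), ("vehicle", 30.0)):
    shift = max_doppler_shift_hz(speed, 2.4e9)
    print(f"{name}: {shift:.1f} Hz")
```

Faster users produce a larger shift, so the channel decorrelates more quickly and the receiver sees more variation caused by mobility.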
