Prototype Results of Step Quantization - Ensemble Stream Model for Data-Cleaning in Sensor Netw

This algorithm was implemented as C++ classes on a Linux cell phone with a 285 MHz ARM processor and a DSP audio buffer capable of handling 65,535 bytes per CD-quality stream. The low-level driver was adapted to play long PCM files (20 seconds, as shown in Table 4.4) using memory-management techniques in the real-time player buffer: the DSP hardware was used as a file handle, and end-of-play synchronization was handled by waiting on that hardware handle. The raw values after dynamic quantization are shown in Table 4.4.
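To show why a 20-second file has to pass through the 65,535-byte DSP buffer in pieces, the Python sketch below computes the size of CD-quality PCM (44.1 kHz, 16-bit, stereo) and splits it into frame-aligned chunks that each fit the buffer. The helper names `pcm_size_bytes` and `iter_dsp_chunks` are hypothetical and are not the original C++ classes; the actual write to the DSP device handle and the end-of-play wait are not shown.

```python
from typing import Iterator

# CD-quality PCM: 44.1 kHz, 16-bit samples, 2 channels.
CD_SAMPLE_RATE_HZ = 44_100
CD_CHANNELS = 2
CD_BYTES_PER_SAMPLE = 2
# Per-stream DSP buffer capacity quoted in the text.
DSP_BUFFER_BYTES = 65_535


def pcm_size_bytes(seconds: float,
                   rate_hz: int = CD_SAMPLE_RATE_HZ,
                   channels: int = CD_CHANNELS,
                   bytes_per_sample: int = CD_BYTES_PER_SAMPLE) -> int:
    """Bytes of raw PCM needed for `seconds` of audio."""
    return int(seconds * rate_hz) * channels * bytes_per_sample


def iter_dsp_chunks(pcm: bytes,
                    buffer_bytes: int = DSP_BUFFER_BYTES,
                    frame_bytes: int = CD_CHANNELS * CD_BYTES_PER_SAMPLE) -> Iterator[bytes]:
    """Yield slices of `pcm` that fit the DSP buffer.

    The chunk size is rounded down to a whole number of frames
    (65,535 -> 65,532 bytes for 16-bit stereo) so that no sample
    is split across two writes.
    """
    chunk = (buffer_bytes // frame_bytes) * frame_bytes
    if chunk == 0:
        raise ValueError("buffer is smaller than one frame")
    for start in range(0, len(pcm), chunk):
        yield pcm[start:start + chunk]


if __name__ == "__main__":
    size = pcm_size_bytes(20)                  # 20 s of CD-quality audio
    chunks = list(iter_dsp_chunks(bytes(size)))
    print(size, len(chunks), len(chunks[-1]))  # 3528000 54 54804
```

A 20-second clip is 3,528,000 bytes, so it needs 54 buffer-sized writes, the last one 54,804 bytes long. Rounding the chunk to whole frames is a design choice of this sketch, not something the text specifies.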

4.8 Summary

Because this family of codes allows up to 30 frames per second on a low-end processor, it meets the requirements not only of audio (our test case) but also of NTSC video (Kumar et al., 1985). Most hardware vendors of portable processors provide a built-in player that buffers streams according to their arrival time from an on-demand server. This real-time compression model can deliver the desired frame rate. All commercially available source formats, such as MPEG, AVI, and WMF, can be recompressed to nearly 70% of their original size, which gives a large bandwidth reduction during streaming. Because smart resources are shared heterogeneously, it is critical to match the device codec with a software decoder model that can handle up- and down-transcoding rates at the file level. This also enhances the indexing capability needed for the transcoded file, which is stored in compressed format using Golomb codes. In this chapter we have discussed lossless transcoding and quality-based rate distortion for very low bandwidth and memory requirements using the trans-repairing algorithm.
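Golomb codes give short codewords to small non-negative integers, which suits data whose values cluster near zero. The Python sketch below is a textbook Golomb coder (unary quotient, truncated-binary remainder), using strings of '0'/'1' for readability instead of packed bits. The zigzag mapping of signed values to non-negative integers and the parameter `m` are assumptions for illustration; the text does not say how the original coder chose its parameter or handled signs.

```python
from typing import List, Sequence, Tuple


def zigzag(v: int) -> int:
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(n: int) -> int:
    """Inverse of zigzag()."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def golomb_encode(n: int, m: int) -> str:
    """Golomb codeword for n >= 0 with parameter m >= 1.

    The quotient n // m is sent in unary (q ones, then a zero) and the
    remainder n % m in truncated binary.
    """
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"
    b = (m - 1).bit_length()          # ceil(log2(m))
    if b == 0:                        # m == 1: pure unary, no remainder bits
        return bits
    cutoff = (1 << b) - m             # remainders below this use b - 1 bits
    if r < cutoff:
        return bits + format(r, "0{}b".format(b - 1))
    return bits + format(r + cutoff, "0{}b".format(b))


def golomb_decode(bits: str, m: int, pos: int = 0) -> Tuple[int, int]:
    """Decode one codeword starting at bits[pos]; return (value, next_pos)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1                          # skip the terminating zero
    b = (m - 1).bit_length()
    if b == 0:
        return q * m, pos
    cutoff = (1 << b) - m
    x = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if x < cutoff:
        return q * m + x, pos
    x = (x << 1) | int(bits[pos])     # long form: read one extra bit
    pos += 1
    return q * m + x - cutoff, pos


def encode_sequence(values: Sequence[int], m: int) -> str:
    """Concatenate the codewords of zigzag-mapped signed values."""
    return "".join(golomb_encode(zigzag(v), m) for v in values)


def decode_sequence(bits: str, m: int, count: int) -> List[int]:
    """Decode `count` signed values from a concatenated bit string."""
    out: List[int] = []
    pos = 0
    for _ in range(count):
        n, pos = golomb_decode(bits, m, pos)
        out.append(unzigzag(n))
    return out


if __name__ == "__main__":
    values = [0, -1, 3, 2, -4, 1]
    code = encode_sequence(values, m=4)
    print(code, len(code))
    assert decode_sequence(code, m=4, count=len(values)) == values
```

With m = 4 the six example values take 21 bits, compared with 48 bits as raw 8-bit samples. Matching m to the spread of the data is what makes the code compact; with m a power of two the scheme reduces to Rice coding.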

Figure 4.5: PCM 8-bit to 16-bit decoding with variable volume selection [1..64].

Table 4.4: Adjusted PCM values for selected volumes (1 and 128).

Sample Index   Volume = 128          Volume = 1
               (Max. codec values)   (Min. codec values)
-128           -32247                -520
-127           -31995                -516
-126           -31744                -512
-125           -31492                -507
-124           -31240                -503
-123           -30988                -499
-122           -30736                -495
-121           -30484                -491
-120           -30232                -487
-119           -29980                -483
-118           -29728                -479
-117           -29476                -475
-116           -29224                -471
-115           -28972                -467
-114           -28720                -463
...            ...                   ...
-7             -1763                 -28
-6             -1511                 -24
-5             -1259                 -20
-4             -1007                 -16
-3             -755                  -12
-2             -503                  -8
-1             -251                  -4
0              0                     0
...            ...                   ...
+125           32000                 507
+126           32256                 -512
+127           32512                 -516
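At Volume = 128, the positive rows of Table 4.4 equal the 8-bit sample index multiplied by 256 (for example 125 × 256 = 32000), which is a left shift by 8 bits into the 16-bit range. The Python sketch below shows that expansion with a generic gain and clamping to the signed 16-bit range. The `gain` parameter and the helper names are hypothetical: the text does not state how the volume setting maps to a gain, and the negative rows of the table (e.g. -251 rather than -256 for index -1) show that the original quantizer did not use exactly this linear mapping.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def signed_from_unsigned8(u: int) -> int:
    """8-bit WAV PCM is unsigned with silence at 128; shift it to -128..127."""
    if not 0 <= u <= 255:
        raise ValueError("not an 8-bit value")
    return u - 128


def expand_pcm8_to_pcm16(sample: int, gain: int = 256) -> int:
    """Scale a signed 8-bit sample into the signed 16-bit range.

    gain=256 is a plain 8-bit left shift; any other gain is a hypothetical
    volume factor. The result is clamped to avoid int16 overflow.
    """
    if not -128 <= sample <= 127:
        raise ValueError("sample must be a signed 8-bit value")
    return max(INT16_MIN, min(INT16_MAX, sample * gain))


def expand_buffer(pcm8: bytes, gain: int = 256) -> array:
    """Convert unsigned 8-bit PCM bytes to an array of signed 16-bit samples."""
    return array("h", (expand_pcm8_to_pcm16(signed_from_unsigned8(u), gain)
                       for u in pcm8))


if __name__ == "__main__":
    print([expand_pcm8_to_pcm16(s) for s in (125, 126, 127)])  # [32000, 32256, 32512]
```

Clamping rather than wrapping keeps loud samples from flipping sign when a gain larger than 256 is used.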

Chapter 5 Sub-problems in Quality of Sensor Logs

By ranking the information extracted from sensor logs (as shown in Figure ??) and its features, data mining algorithms can efficiently predict interesting events from static and mobile streams. The content of this chapter is based on a published book chapter¹ from a machine learning text on sensor networks. The research methodologies and analysis were part of the Yahoo! Summer School on Information Retrieval (IR), held at the Indian Institute of Science, Bangalore, in 2011. The study used the WEKA data mining toolkit (University of Waikato, 2008), introduced by a Google architect, to explore how low-cost sensor streams can help data mining algorithms when the ground-truth data set is too small to be effective in practice. Section 5.10 examines sensor-domain temporal features in the context of forest fire events, and Section 5.18 describes how correlated temporal features are used to rank the training samples.

¹ Vasanth Iyer, S. S. Iyengar, and Niki Pissinou. “Using Event Log Performance and F-measure Attribute Selection.” In Intelligent Sensor Networks: Networks Signal Processing and Machine Learning. Taylor & Francis, 2013.

5.1 Supervised Classification of Skewed Sensor Logs

5.1.1 Introduction

During pre-processing of the event log (as shown in Figure ??), training samples are sorted by their area of fire damage and classified into large, medium, small, and accidental small fires. The pre-processing step allows us to study the probability distribution; in our case the samples are highly skewed, indicating that accidental small fires are more likely than large fires. Statistically, we are interested in the many factors that influence accidental small fires; for this category, the training samples follow a normal distribution. Plotted, a normal distribution takes the form of a bell-shaped curve described by the Gaussian function. It is a common first approximation for a real-valued random variable whose values tend to cluster around a single mean. The sensor model needs to learn the expected ranges of the baseline attributes being measured, and its density estimates improve as the sample count increases. The baseline discrete parameters capture only the sensor ranges, which makes the event-prediction function hard to train with a Gaussian density function without a specific temporal understanding of the data sets. Dynamic features present in a sequence of patterns are localized and used to predict events; a static data mining algorithm might otherwise not treat them as contributing attributes.
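The pre-processing step can be sketched in Python as follows: logged fires are sorted by burnt area and assigned to the four classes of Figure 5.1, the class counts expose the skew, and a Gaussian (mean, standard deviation) is fitted to one baseline attribute of the accidental class so its density can be evaluated. The sample records, the temperature attribute, and the decision to put exactly 50 ha in the Large class are hypothetical; the thresholds come from the figure's class labels, and this is not the WEKA pipeline used in the study.

```python
import math
import statistics
from collections import Counter
from typing import List, Tuple


def fire_class(burnt_area_ha: float) -> str:
    """Four-class label from the thresholds in Figure 5.1.

    Exactly 50 ha is put in "Large" (assumption; the figure leaves it open).
    """
    if burnt_area_ha < 0:
        raise ValueError("burnt area cannot be negative")
    if burnt_area_ha < 1:
        return "Accidental"
    if burnt_area_ha < 10:
        return "Small"
    if burnt_area_ha < 50:
        return "Medium"
    return "Large"


def fit_gaussian(values: List[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation (needs at least two values)."""
    return statistics.mean(values), statistics.stdev(values)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Normal density: exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi))."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Hypothetical log records: (burnt area in ha, air temperature in deg C).
    records = [(0.2, 18.0), (0.5, 21.5), (0.1, 19.0), (0.8, 22.0),
               (3.0, 25.0), (12.0, 27.5), (80.0, 30.0)]
    # Sort by area of fire damage, largest first, then label each record.
    records.sort(key=lambda r: r[0], reverse=True)
    labels = [fire_class(area) for area, _ in records]
    print(Counter(labels))  # class counts show the skew toward small fires
    # Fit a Gaussian to the temperature of the Accidental class only.
    temps = [t for (area, t), lab in zip(records, labels) if lab == "Accidental"]
    mu, sigma = fit_gaussian(temps)
    print(round(mu, 2), round(sigma, 2), round(gaussian_pdf(20.0, mu, sigma), 4))
```

Python's `statistics.NormalDist` offers the same density through its `pdf` method; the formula is written out here to make the Gaussian explicit.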

We have studied the spatial features, the baseline discrete sensor measurements, and all other available attributes, and found that they yield a high classification error, which can sometimes account for 50% of the errors in the accidental small fire category. We therefore investigate further factors that can cause such fires, including data on human-specific temporal attributes such as the number of visitors and the traffic patterns coming into the forest area, thus filtering events with local significance. Temporal attributes are a better estimator, given the type of training samples, which are difficult to calibrate and for which any approximation may induce false alarms. A relevance-based ranking function is suitable for ordering the higher-bound samples chosen by domain experts as ideal estimates while still maintaining the desired low false alarm rate. The ranking method uses a function of sensor precision and event relevance weights, which are linearly added to represent data from the fire activity logs. Sections 5.2 and 5.3 present related work and the state of the art in label-less learning versus ground-truth extreme event samples. Section 5.4 defines the sensor measurements and fire activity used to model the data, along with the algorithm's computational complexity.

Figure 5.1: Histograms of empirical samples. (a) Empirical log collection. (b) Four-class classification: Large (>50 ha), Medium (<50 ha), Small Fire (<10 ha), Accidental (<1 ha); axis: burnt area in ha.
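The linear ranking described above can be sketched in Python as a weighted sum of a sample's sensor precision and its event relevance, followed by a descending sort. The `LogSample` fields, the assumption that both quantities are normalised to [0, 1], and the default equal weights are hypothetical; the text does not give the actual weights or how they are derived.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LogSample:
    """One fire-activity log entry (hypothetical fields, both assumed in [0, 1])."""
    sample_id: str
    sensor_precision: float
    event_relevance: float


def rank_score(s: LogSample, w_precision: float = 0.5, w_relevance: float = 0.5) -> float:
    """Linear sum of the weighted sensor precision and event relevance."""
    return w_precision * s.sensor_precision + w_relevance * s.event_relevance


def rank_samples(samples: Sequence[LogSample],
                 w_precision: float = 0.5,
                 w_relevance: float = 0.5) -> List[LogSample]:
    """Order samples from highest to lowest score (ties keep input order)."""
    return sorted(samples,
                  key=lambda s: rank_score(s, w_precision, w_relevance),
                  reverse=True)


if __name__ == "__main__":
    logs = [LogSample("a", 0.9, 0.1), LogSample("b", 0.6, 0.8), LogSample("c", 0.2, 0.3)]
    print([s.sample_id for s in rank_samples(logs)])            # ['b', 'a', 'c']
    print([s.sample_id for s in rank_samples(logs, 1.0, 0.0)])  # ['a', 'b', 'c']
```

Shifting weight from relevance to precision changes the order, which is how such a function can trade expert-chosen relevance against sensor reliability to keep false alarms low.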

In document Ensemble Stream Model for Data-Cleaning in Sensor Networks (Page 77-81)