Research on Data Processing Technology Based on the Middleware of
the IoT
Liu Bingwu, Huang Kun, Li Juntao, Cheng Xiaolin
School of Information, Beijing Wuzi University, Beijing, China,
[email protected]
Abstract
Obtaining effective events from raw data is the most basic and most important step in Internet of Things (IoT) middleware. This paper first puts forward an IoT middleware framework model based on the application of IoT technology, and then designs a data processing module by studying the characteristics of IoT data, focusing on the handling of leakage reads, repeated reads and dirty reads. The core of this paper is the design of the filters and the specific filtering algorithms proposed for them.
Key words: Data Processing, Internet of Things, Middleware, RFID, Filter
1. Introduction
The concept of the Internet of Things originated in 1995, when Bill Gates wrote the book "The Road Ahead", but it attracted little attention at the time because wireless networks and other hardware and software were not yet sufficiently developed. The ITU Internet Report 2005: Internet of Things [1], released by the International Telecommunication Union at the WSIS held in Tunis on November 17, 2005, formally referred to the concept of the Internet of Things. The report pointed out that the era of ubiquitous "Internet of Things" communication was coming, and that all objects in the world, from tires to toothbrushes and from houses to tissues, could actively exchange information via the Internet. Radio frequency identification technology, sensor technology, nanotechnology and intelligent embedded technology will find ever wider application.
The Internet of Things, abbreviated as IoT, is "the Internet that connects material objects". It is a network that, according to agreed protocols, connects articles to the Internet through radio frequency identification (RFID), infrared sensors, global positioning systems, laser scanners and other information sensing devices, so that they can exchange information and communicate, in order to achieve intelligent identification, location, tracking, monitoring and management. This has two meanings [2]: first, the core and foundation of the Internet of Things is still the Internet, of which it is an extension and expansion; second, its client side extends to any goods and articles, which exchange information and communicate with one another.
The Internet of Things broke with traditional thinking. Traditionally, physical infrastructure and IT infrastructure were treated separately: airports, roads and buildings on the one hand, and data centers, personal computers and broadband on the other. In the "Internet of Things" era, reinforced concrete, cables, chips and broadband are integrated into a unified infrastructure. In this sense, the infrastructure is more like a new kind of ground on which the world runs, including economic management, production operations, social management and personal life [3].
2. Related research
Currently, Internet of Things technology includes RFID, sensors, ZigBee, GPS and so on. The system architecture of the Internet of Things consists of the perception layer, the middle layer and the application layer [4]. The perception layer collects the relevant data of all physical and virtual "objects". This information, characterized by its large volume and inconsistent formats, is the data source of the entire IoT system. Because mutual interference may occur between underlying perception devices, the collected data may sometimes be incomplete or wrong, and some readings may be missed altogether. In addition, these massive data contain a large amount of redundant information: for example, one sensing device may report the same data repeatedly within a short time, or several sensing devices may report the same data. For these reasons, every upper-layer application has to spend considerable effort screening and processing the data before it can provide its services and functions, which increases both workload and difficulty [5]. IoT researchers therefore moved the processing of the collected data from the traditional application layer to the middle layer, which lets upper-layer applications focus on service-level development and achieves loose coupling with the underlying hardware. This not only greatly reduces development difficulty but also enhances the reusability of upper-layer applications. As a result, IoT middleware emerged.
International Journal of Digital Content Technology and its Applications (JDCTA), Volume 7, Number 8, April 2013
Accordingly, IoT middleware must have at least three functions [6][7]: (1) hardware management: its main function is to couple the software system loosely with the hardware, including the configuration and management of physical IoT perception equipment, which generally refers to RFID readers (mobile and fixed), wireless sensor network equipment, positioning terminals, smart phones and other hardware systems; (2) data processing and transmission: its main function is to process the large number of meaningless, unordered and uncertain IoT events, produce high-level abstract events that carry semantics, and finally route them to the associated applications; (3) services: providing a unified interface for different applications and making operation transparent to users and applications. The hierarchical collaboration of these three functions constitutes the basic framework of IoT middleware, and data processing is the core of the architecture. According to the semantic level, existing data processing can be divided into two layers: data pre-processing and complex event processing [8]. To make our research more specific, we treat data preprocessing as a separate layer. In what follows we focus on the research of data processing.
Based on the above classification, the IoT middleware structure proposed in this paper is shown below:
Figure1. IoT Middleware Structure Diagram
The following is a brief overview of the main functions of each layer:
Device-aware layer: This is one of the functional modules of the IoT framework. It is mainly responsible for perceiving device information in the Internet of Things, where the devices include RFID readers (mobile and fixed), wireless sensor network devices, positioning terminals, smart mobile devices and other hardware systems. The device-aware layer is the basis of the IoT middleware framework: it shields the differences among hardware from different manufacturers and of different models, and provides accurate, real-time raw data to the data-processing layer. It exposes unified, extensible interfaces based on the internal communication format of the IoT middleware framework, so that a new piece of hardware, such as a reader, can be connected through simple configuration.
Data-processing layer: The data acquisition module sends the data read from tags to the format conversion module, which parses the original tag data into a unified format. The data filtering module applies its filters to the data and exchanges data with the data storage module; it then sends the data to the data packet module, which finally passes it to the upper layers of the IoT middleware until it reaches the application. As the diagram shows, data filtering is one of the most important links in the data processing module.
Event-processing layer: Rules are set through the rules interface and matched against the effective events produced by data processing. If the match succeeds, application-related events are generated and the corresponding operations are triggered. Events are then related to one another using event operators and pattern matching, according to business logic rules. These relationships can be expressed with the four semantic elements (what, where, when, why), and semantically complex events can also be synthesized from sensor information. Finally, business process events are formed according to the specific needs of the upper-layer application.
Information sharing service layer: This layer provides a unified interface for the different applications in the supply chain system, and users of the system can invoke services through the publish/subscribe model. Publish/subscribe is a loosely coupled communication paradigm oriented toward distributed computing environments. In a pub/sub system, publishers and subscribers are associated only through a topic, so they do not have to know where the other party is or be online at the same time. The model thus decouples the two sides in time, space and data communication.
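The topic-based association between publishers and subscribers can be illustrated with a minimal in-process sketch. The `Broker` class, its method names and the topic strings are hypothetical and are not part of the middleware described in this paper. The sketch shows only the space decoupling (the two sides share nothing but a topic name); decoupling in time would additionally require the broker to queue or persist events.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process topic-based publish/subscribe broker (illustrative)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every event on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to every subscriber of the topic.

        Returns the number of subscribers that received the event.
        """
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = Broker()
inbound_events = []
# The subscriber knows only the topic name, not who publishes to it.
broker.subscribe("goods/inbound", inbound_events.append)
delivered = broker.publish("goods/inbound", {"tag": "EPC-0001", "reader": "R1"})
# Nobody subscribed to this topic, so the event reaches no one.
ignored = broker.publish("goods/outbound", {"tag": "EPC-0002", "reader": "R2"})
```

Because the publisher addresses a topic rather than a receiver, applications can be added or removed without changing the publishing side.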
3. Research on data processing of IoT middleware
Compared with common data, the IoT data has the following features:
(1) Complexity. An IoT data acquisition system is composed of multiple subsystems, including radio frequency identification (RFID) systems, the global positioning system (GPS) and intelligent sensor systems (WSN), and different acquisition tools produce different kinds of information. For example, an RFID read produces EPC code information, while a GPS read produces position information. Handling these data is therefore very complicated.
(2) Semantic richness [9]. Observations carry information related to their context state and background knowledge. This information is implicit and closely related to the upper-layer application logic, and further information can be derived from it. For example, a product ID reveals the product's type, price and place of origin, and the position of a reader reveals the shelf on which goods are stored. The data uploaded from hardware equipment is low-level basic data, which must be raised to high-level business logic data and integrated with existing applications.
(3) Inaccuracy. Because of the reader equipment and external environmental factors, the data always contains a certain amount of error. Take RFID as an example: existing RFID readers still suffer from errors such as leakage (missed) data, repeated data and dirty data. In addition, one reader can identify many different kinds of objects, so RFID data may contain many different kinds of observations. For example, at a security entrance a reader can identify both the staff passing in and out and the items passing in and out. These are different kinds of objects, and the corresponding event semantics differ as well [10].
(4) Streaming, batch and massive characteristics. The data uploaded from hardware equipment is generated quickly and automatically in the form of a stream, and must be accumulated to support tracking and monitoring applications. It sometimes also arrives in batches, when many objects are observed at once; for example, a large amount of data is read when a container is registered. Large-scale deployment of RFID equipment produces unprecedented volumes of data. At present, a reader can capture 120 to 400 tag records per second. A medium-sized deployment with 100 readers can therefore produce 12,000 to 40,000 records per second; if each record takes 20 bytes, it will produce 1.6 GB to 60 GB every day. The quantity of data to be handled is therefore very large.
(5) Temporality, dynamics and relevance [11]. Data acquisition obtains product-related data: RFID, for instance, captures the inbound and outbound information of goods, and GPS obtains product position and status dynamically in real time. Data do not exist independently but are related to one another, and this relevance derives from their temporal and dynamic nature: temporal correlation expresses the timing relationships among events, spatial correlation expresses the track along which events develop, and temporal and spatial correlation together express the process of change of the events related to an object.
The characteristics described above show that many issues concerning IoT data remain to be resolved. Because of the inaccuracy and heterogeneity of IoT sensing technologies such as RFID and ZigBee, the data needs to be preprocessed before detection and processing. Current preprocessing is built around the inaccuracies of IoT data, which can usually be summed up in the following three points:
Repeated read
A tag is outside the reading range, but the reader still reads it. This is mainly caused by electromagnetic interference when several readers operate at the same time, and it is highly random. This kind of error is also called a false positive [12].
Dirty read
A tag is within the reading range and the reader senses its presence, but the EPC value that the reader reads is incorrect.
Leakage read
A tag is within the reading range, but the reader does not read it. This can happen when a reader reads many tags at the same time and some of them are missed. Surveys show that a reader captures only 60% to 70% of the tag data within its sensing range; that is, at least 30% of the tag data is missed. The leakage read phenomenon is therefore quite serious in RFID applications. This kind of error is also called a false negative [13].
Combining the above discussion with the structure of the IoT middleware, we set up the data processing layer framework shown below:
Figure2. Data Processing Layer Frame Diagram
In the figure above, the data acquisition module sends the data read from tags to the format conversion module. The format conversion module parses the original tag data into data triples, namely {TagID, ReaderID, ReadTime}. The data filtering module applies its filters to the data and exchanges data with the data storage module; it then sends the data to the data packet module, which finally passes it to the upper layers of the IoT middleware until it reaches the application. As the diagram shows, data filtering is one of the most important links in the data processing module.
4. Research on data filtering modules
Inaccurate data is mainly caused by three phenomena: repeated reads, dirty reads and leakage reads. We therefore study the processing of each of the three in turn [13].
After format conversion the data can be expressed as:
struct TagData {
    String TagID;     // the name of the tag
    String ReaderID;  // the name of the reader
    Date ReadTime;    // the time at which the data was read
}
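As a minimal sketch, the same triple can be written as a Python dataclass together with a format conversion step. The raw record layout (comma-separated fields with an ISO 8601 timestamp) and the helper `parse_raw` are assumptions made for illustration; the paper does not specify the reader's raw output format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TagData:
    tag_id: str          # TagID: name of the tag
    reader_id: str       # ReaderID: name of the reader
    read_time: datetime  # ReadTime: when the tag was read


def parse_raw(line: str) -> TagData:
    """Convert one raw record into a TagData triple.

    Assumed (hypothetical) raw format: "<tag>,<reader>,<ISO 8601 time>".
    """
    tag_id, reader_id, stamp = (field.strip() for field in line.split(","))
    return TagData(tag_id, reader_id, datetime.fromisoformat(stamp))


record = parse_raw("EPC-0001, R1, 2013-04-01T08:30:00")
```

Making the dataclass frozen keeps each triple immutable once it has left the format conversion module, so later filters can share records safely.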
4.1. Repeated read processing - redundancy filter
In practical applications an RFID reader reads very quickly and therefore produces a large amount of redundant data. For example, the CSL461 UHF reader can reach 1000 reads per second, and a large proportion of these reads are redundant because the same tag is read repeatedly [14]. The data must therefore be processed in order to reduce the burden on the upper-layer application system. Here we use a hash table as an intermediate table to filter the data. The hash table is keyed by TagID, and t denotes the time interval threshold. After filtering, the data is output to a new storage list. Each record is taken from the cache database and compared with the hash table; the redundancy filter algorithm is as follows:
(1) Check whether the TagID already exists in the hash table: if it does, go to (2); otherwise go to (3).
(2) Check whether the time elapsed since the ReadTime stored in the hash table is less than t: if it is, the record is redundant, so filter it out and update the ReadTime in the hash table; if it is not, the record is new, so update the ReadTime in the hash table and output the record. Return to (1) to process the next record.
(3) Insert the record into the hash table and output the tag data to the storage list. Return to (1) to process the next record.
Figure3. The Redundancy Filter Algorithm Flow Chart
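The three steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation: the function name, the use of plain (TagID, ReaderID, ReadTime) tuples, and read times expressed in seconds and arriving in non-decreasing order are all assumptions.

```python
def redundancy_filter(records, t):
    """Drop repeated reads of the same tag that arrive within t seconds.

    records: iterable of (tag_id, reader_id, read_time) triples, with
             read_time in seconds and in non-decreasing order (assumed).
    Returns the storage list of records that survive filtering.
    """
    last_read = {}  # hash table keyed by TagID -> most recent ReadTime
    storage = []    # storage list that receives the filtered output
    for tag_id, reader_id, read_time in records:
        previous = last_read.get(tag_id)
        # The stored ReadTime is refreshed in every branch of the algorithm.
        last_read[tag_id] = read_time
        if previous is not None and read_time - previous < t:
            continue  # step (2), interval < t: redundant read, filter it out
        # Step (2) with interval >= t, or step (3) for an unseen tag: output.
        storage.append((tag_id, reader_id, read_time))
    return storage


reads = [("A", "R1", 0.0), ("A", "R1", 0.5), ("A", "R1", 1.2),
         ("B", "R1", 1.3), ("A", "R1", 3.0)]
kept = redundancy_filter(reads, t=1.0)
```

Because the stored ReadTime is refreshed even for filtered reads, a tag that stays in the field and is read continuously is reported once and is reported again only after a gap of at least t.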
4.2. Dirty read processing - smooth filter
After redundancy processing has finished and the data is stored in the storage list, dirty read processing can begin. The filter smooths the tag data read by the reader and masks tags that appear only occasionally. It uses a time threshold t and a counter threshold n: only a tag that appears n times within time t is considered to be in a stable state. The choice of these two values directly determines the effect of the smoothing filter.
Three lists are designed in this filter module: the input list, the output list and the storage list. The storage list saves all of the tag data read by the reader, whether or not it meets the two threshold requirements of the smoothing filter. Each tag node is taken from the input list and compared with the storage list: if the node is already in the storage list, its appearNumber is incremented by 1; if not, the node is stored in the storage list. To avoid examining the same data twice, the input list is emptied once the comparison is complete. The filtering algorithm of this filter consists of the following steps.
(1) Put the tag data into the input list, and extend the original tag data structure with a field appearNumber, which counts the number of occurrences of the tag. It is initialized to 0, which denotes the first occurrence.
(2) Take all the items in the input list and compare them with the storage list: if a node is already in the storage list, its appearNumber is incremented by 1; if not, it is stored in the storage list.
(3) Take a tag node from the storage list. If its appearNumber has reached n within t and the item has not yet been added to the output list, add it to the output list and mark it as added. If the node has exceeded t while its appearNumber is still less than n, delete the entry. If its residence time is greater than the time threshold, delete the entry as well.
(4) Check whether all the items in the storage list have been examined. If not, return to step (3); if so, perform step (5).
(5) Send the data in the output list to the next module and empty the output list. The flow chart of the smoothing filter algorithm is shown in Figure 4.
Figure4. The Smoothing Filter Algorithm Flow Chart
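A compact sketch of the smoothing filter is given below. It is a simplification under stated assumptions rather than the authors' implementation: the input, storage and output lists are folded into one pass over a time-ordered stream, the counter starts at 1 for the first sighting (so that a tag is reported exactly when it has been seen n times), expired entries are discarded lazily when the tag is next read, and all names are hypothetical.

```python
def smoothing_filter(reads, t, n):
    """Report tags that are read at least n times within a time span of t.

    reads: iterable of (tag_id, read_time) pairs, with read_time in seconds
           and in non-decreasing order (assumed).
    Returns the output list of tag IDs judged stable, in order of detection.
    """
    storage = {}  # tag_id -> {"first_seen", "appear_number", "emitted"}
    output = []
    for tag_id, read_time in reads:
        entry = storage.get(tag_id)
        if entry is not None and read_time - entry["first_seen"] > t:
            # Residence time exceeded t: delete the entry and start over.
            del storage[tag_id]
            entry = None
        if entry is None:
            entry = {"first_seen": read_time, "appear_number": 0,
                     "emitted": False}
            storage[tag_id] = entry
        entry["appear_number"] += 1
        if entry["appear_number"] >= n and not entry["emitted"]:
            entry["emitted"] = True  # mark so the tag is output only once
            output.append(tag_id)
    return output


reads = [("A", 0.0), ("B", 0.1), ("A", 0.2), ("A", 0.4),
         ("C", 0.5), ("C", 2.0), ("C", 2.1)]
stable = smoothing_filter(reads, t=1.0, n=3)
```

In this run tag A is seen three times within one second and is reported, B is seen only once, and C is seen three times in total but never three times within a single span of t, so both are masked as occasional reads.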
4.3. Leakage read processing - window filter
The earliest algorithm for handling leakage data is the fixed-window smoothing algorithm, which fills in missed data using a window of fixed size. Because the window size is fixed, it cannot take into account the reader environment, the reader's read/write frequency, the tag recognition rate and the needs of upper-layer applications when sizing the window, and it is difficult to balance these factors to obtain satisfactory results. The adaptive-window smoothing algorithm was then developed as an improvement on the fixed-window algorithm, but its accuracy still cannot meet the demands of the system. Here we use an adaptive-window smoothing algorithm based on a probability model [15].
The model is as follows. For tag $i$, the number $N$ of times the tag is read within the smoothing window follows the binomial distribution $B(w_i, p_i^{avg})$, where $w_i$ is the number of reader polling cycles (repeated trials), which is used as the window size, and $p_i^{avg}$ is the average read frequency of the tag within the smoothing window, also called its occurrence probability. For confidence $\delta$, a sufficient condition for guaranteeing data completeness is:
$w_i \ge \ln(1/\delta) / p_i^{avg}$
When the window is too small, its size is expanded accordingly. The condition for detecting that the state of a tag has changed is:
$\left| |S_i| - w_i p_i^{avg} \right| > 2 \sqrt{w_i p_i^{avg} (1 - p_i^{avg})}$
Here $|S_i|$ is the number of times the tag is detected within the smoothing window. The window size is thus adjusted adaptively according to the current information about each tag, which makes the filled-in data more accurate.
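The two conditions of the probability model can be evaluated with a short sketch. It assumes the usual form of this model: the window must span at least ln(1/δ) divided by the average read probability polling cycles, and a tag is judged to have changed state when its observed read count deviates from the expected count by more than two binomial standard deviations. The function and parameter names are hypothetical.

```python
import math


def required_window(p_avg, delta):
    """Smallest whole number of polling cycles w with w >= ln(1/delta) / p_avg."""
    return math.ceil(math.log(1.0 / delta) / p_avg)


def tag_changed(observed, window, p_avg):
    """True if the observed read count is more than two binomial standard
    deviations away from the expected count window * p_avg."""
    expected = window * p_avg
    deviation = math.sqrt(window * p_avg * (1.0 - p_avg))
    return abs(observed - expected) > 2.0 * deviation


# A tag read in 70% of polling cycles, with delta = 0.05.
window = required_window(p_avg=0.7, delta=0.05)
# Only 1 read in a 10-cycle window where about 7 are expected.
left_field = tag_changed(observed=1, window=10, p_avg=0.7)
# 6 reads in the same window is within normal variation.
still_present = tag_changed(observed=6, window=10, p_avg=0.7)
```

A lower read probability or a stricter δ enlarges the window, which fills more gaps at the cost of reacting more slowly when a tag really leaves the field; the change test is what allows the window to be shrunk again in that case.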
5. Conclusion
This paper addresses the specific requirements of Internet of Things applications and proposes an architecture model based on data processing and complex events. In view of the inaccuracy of RFID data, it puts forward an RFID middleware data processing model in which the data processing module is divided into five modules: data acquisition, data format conversion, data filtering, data storage and data packet. The paper studies the data filtering module in detail and, in view of the uncertainty caused by repeated reads, dirty reads and leakage reads, designs three corresponding filters: the redundancy filter, the smoothing filter and the window filter. The next step is to study the rule-setting module, which sets rules according to the specific application, matches them against the effective events produced by the data processing module, and outputs complex events to serve upper-layer applications.
6. Acknowledgement
This paper is supported by the Funding Project for Technology Key Project of Municipal Education Commission of Beijing (grant number KM201210037037), the Funding Project for Program of Municipal Education Commission of Beijing (grant number SZ201110037018), the Funding Project for the National Key Technology Research and Development Program (2011BAH18B03), and the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (IDHT20130517).
7. References
[1] ITU Internet Report 2005: Internet of Things 2005.
[2] Shen Subin. Study on the architecture of the Internet and related technologies[J].Journal of Nanjing University of posts and telecommunications, 2009, 6.
[3] Zhang Yi, Tong Hong. Summary of the Internet of Things 2010,4,24-27.
[4] Shen Subing.Internet of Things architecture and related technologies[J].Nanjing University of Posts and Telecommunications,2009,(6).
[5] Wang Hui, Shi Xiaoying. Middleware services and their integration framework. Computer Engineering and Applications. 1998, 9,25-27.
[6] LiLi, ZhuQingXin, wang fang. EPC system of middleware study [J]. Computer engineering and design, 2006, (18) : 3360-3363.
[7] Miao Wu, Ting-Jie Lu, Fei-Yang Ling, Jing Sun, Hui-Ying Du. Research on the architecture of Internet of Things. In: Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference. 484-487.
[8] ShaoHuaGang, ChengJin, WangHui, etc. For things networking system and middleware design [J]. Computer engineering, 2010, 4 (17): 84-86.
[9] S.E.Sarma, S.A.Weis, D.W. Engels. Secure risks and challenges. RSA Laboratories Cryptobytes. 2003. 6(1): 2~9.
[10] Vibhor Rastogi, Dan Suciu, Evan Welbourne. Access Control over Uncertain Data. In: Auckland, New Zealand: VLDB. 2008.
[11] Miao Wu, Ting-Jie Lu, Fei-Yang Ling, Jing Sun, Hui-Ying Du. Research on the architecture of Internet of Things. In: Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference.484-487.
[12] Roozbeh Derakhshan, Maria E. Orlowska, Xue Li. RFID Data Management Challenges and Opportunities. In: IEEE First International Conference on RFID, Gaylord Texan Resort, Grapevine,Texas,USA. 2007.175~182.
[13] Bo Feng, JinTa Li, Ping Zhang. Study of RFID Middleware for Distributed Large-scale Systems. In: Information and Communication Technologies. ICTTA, 2006. 2754-2759.
[14] WuYuCai, YangJianJiang. RFID middleware data processing and its application in the information management system [J]. Computer CD software and applications. 2011 (14): 14-15.
[15] ZhangMingZhe, ZhangQiang, YuanWei, etc. Embedded RFID middleware data filtering model study [J]. Computer engineering and design, 2010, 31 (17): 3743-3746.