
3. Usage pattern discovery, which finds valuable patterns in the prepared usage data,

4. Usage pattern analysis and visualization, which analyzes and displays the discovered patterns to uncover hidden knowledge, and

5. Usage pattern applications, one of which is the smartphone data protection used in this research.

This chapter is organized as follows. The next section gives a background study covering three major subjects. Section III introduces our proposed system using handheld usage pattern matching. Two structural-similarity algorithms are used to check for possible unauthorized uses: (i) approximate usage string matching and (ii) usage finite automata. The two methods are explained in the following two sections. Section VI presents and discusses some experimental results. The last section gives the conclusion and some future directions.

BACKGROUND

This research includes three themes:

• Mobile handheld computing, which is computing on (smart) cellular phones.

• Handheld security, which detects abnormal handheld data accesses and protects the data from unauthorized uses.

• Approximate string matching, which finds the “best” match for a string among many strings.

Related research on these themes is discussed in this section.

Mobile Handheld Devices

Handheld devices such as smart cellular phones are a key component of mobile commerce transactions. People often have trouble understanding the technologies used by these devices because they involve various complicated disciplines, such as wireless and mobile networks and mobile operating systems. Figure 1 shows the system structure of a generic mobile handheld device, which includes five major components (Hu, Yeh, Chu, & Lee, 2005):

• Mobile operating system or environment: Mobile OSs, unlike desktop OSs, do not have a dominant brand. Popular mobile OSs or environments include (i) Android, (ii) BREW, (iii) iPhone OS, (iv) Java ME, (v) Palm OS, (vi) Symbian OS, and (vii) Windows Mobile.

• Mobile central processing unit: ARM-based CPUs are the most popular mobile CPUs. ARM Ltd. does not manufacture CPUs itself. Instead, it supplies mobile CPU designs to other chipmakers such as Intel and TI.

• Input and output components: There is only one major output component, the screen, but there are several popular input components, in particular keyboards and touch screens/writing areas that require the use of a stylus. Other I/O components include loudspeakers and microphones.

• Memory and storage: Three types of memory are usually employed by handheld devices: (i) random access memory (RAM), (ii) read-only memory (ROM), and (iii) flash memory. Hard drives are rarely used.

• Batteries: Rechargeable lithium-ion batteries are the most common batteries used by handheld devices. Fuel cells, a promising technology, are still in an early stage of development and will not be widely adopted in the near future.

Figure 1. A system structure of mobile handheld devices

Synchronization connects handheld devices to desktop computers, notebooks, or peripherals to transfer or synchronize data. Obviating the need for serial cables, many handheld devices now use either an infrared (IR) port or Bluetooth technology to send information to other devices.

The widespread availability of handheld mobile devices and the constantly improving technology that goes into them are opening up new approaches for mobile commerce, which is consequently becoming an increasingly attractive prospect for many businesses.

Handheld Security

The methods of handheld security can be classified into five categories: (i) password/keyword identification, (ii) human intervention, (iii) biometric-based identification, (iv) anomaly/behavior-based identification, and (v) other ad hoc methods. Details of each category are given as follows:

• Password/keyword authentication: This is a fundamental method of data protection, and most handheld devices include the option of password protection. However, device users are reluctant to use it because of the inconvenience of memorizing and entering passwords. Data encryption is another data protection method and can be found in many handheld devices. It is equally inconvenient, because reading data requires entering a key, and decryption takes time and extra work. Reviews of encryption algorithms and standards can be found in Kaliski (1993, December). Some related research is described as follows:

◦ Public keys are used to encrypt confidential information. However, the limited computational capabilities and power of handheld devices make them ill-suited for public key signatures. Ding et al. (2007) explore practical and conceptual implications of using Server-Aided Signatures (SAS) for handheld devices. SAS is a signature method that relies on partially trusted servers to generate (normally expensive) public key signatures for regular users.

◦ Argyroudis et al. (2004) present a performance analysis focused on three of the most commonly used security protocols for networking applications, namely SSL, S/MIME, and IPsec. Their results show that the time taken to perform cryptographic functions is small enough not to significantly impact real-time mobile transactions and that there is no obstacle to the use of quite sophisticated cryptographic protocols on handheld mobile devices.

◦ Digital watermarking is particularly valuable in the use and exchange of digital media on handheld devices. However, watermarking is computationally expensive and adds to the drain on the available energy in handheld devices. Kejariwal et al. (2006) present an approach in which they partition the watermark embedding and extraction algorithms and migrate some tasks to a proxy server. This leads to lower energy consumption on the handheld without compromising the security of the watermarking process. A survey of digital watermarking algorithms is given by Zheng, Liu, Zhao, & Saddik (2007, June).

• Human intervention: Several companies, such as the device manufacturer HP (2005) and the embedded database vendor Sybase (2006), propose practical handheld security methods; e.g., the owners of lost devices can call the service centers to lock down the devices remotely. These methods are normally workable but not innovative. Additionally, they are passive: by the time users find out that their devices are lost, it may be too late.

• Biometric-based identification: Advanced devices use biometric measurements such as fingerprint, retina, and voice recognition to identify the owners (Hazen, Weinstein, & Park, 2003; Weinstein, Ho, Heisele, Poggio, Steele, & Agarwal, 2002). This approach is not widely adopted because the methods are not yet practical. For example, an extra sensor may be required for fingerprint recognition. This method also has a reliability problem: if the owner’s finger is cut or the owner has a sore throat, the recognition result may be affected.

• Anomaly/behavior-based identification (Shyu, Sarinnapakorn, Kuruppu-Appuhamilage, Chen, Chang, & Goldring, 2005; Stolfo, Hershkop, Hu, Li, Nimeskern, & Wang, 2006): This is the approach used by this research. It protects handheld data by detecting unauthorized uses, comparing the current usage patterns to the stored patterns. The patterns include application usage, typing rhythm, etc. When the measured activities fall outside baseline parameters or clipping levels, a built-in protection mechanism triggers an action, such as requesting a password, before further operations are allowed to continue.

• Other ad hoc methods: Susilo (2002) identifies the risks and threats of connecting handheld devices to the Internet and proposes a personal firewall to protect against these threats. A method of transient authentication can lift the burden of authentication from users. It uses a wearable token to check the user’s presence continuously; when the user and the device are separated, the token and device lose contact and the device secures itself. Nicholson, Corner, & Noble (2006, November) explain how this authentication framework works and show that it can be done without inconveniencing users, while imposing minimal performance overhead. Shabtai, Kanonov, & Elovici (2010, August) propose a new approach for detecting previously unencountered malware targeting mobile devices. The method continuously monitors time-stamped security data within the target mobile device. The security data is then processed by the knowledge-based temporal abstraction (KBTA) methodology. The automatically generated temporal abstractions are then monitored to detect suspicious temporal patterns and to issue an alert.

Approximate String Matching

The longest common subsequence searching method is widely used for approximate searches.

This method, however, does not always reveal the degree of difference between two strings. This research proposes an approximate method for better characterizing the discrepancies between two strings. Three subjects, (i) longest common subsequences, (ii) string-to-string correction, and (iii) string matching, are related to the proposed string searching method.

Longest Common Subsequences

Finding a longest common subsequence (abbreviated LCS) is mainly used to measure the discrepancies between two strings. The LCS problem (Hirschberg, 1977) is, given two strings X and Y, to find a maximum-length common subsequence of X and Y. A subsequence of a given string is just the given string with some symbols (possibly none) left out. String Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. Hirschberg suggested two algorithms to solve the problem. The LCS problem is, in turn, a special case of the problem of computing edit distances (Masek & Paterson, 1980). The edit distance between two character strings can be defined as the minimum cost of a sequence of editing operations that transforms one string into the other.
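To make the link between LCS and edit distance concrete, here is a minimal Python sketch using the standard quadratic dynamic-programming table (not Hirschberg’s linear-space refinement). The names `lcs` and `indel_distance` are illustrative. When only insertions and deletions are allowed, the edit distance between X and Y equals |X| + |Y| − 2·|LCS(X, Y)|, which is one way to see that the LCS problem is a special case of edit distance.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y (O(len(x)*len(y)) DP)."""
    m, n = len(x), len(y)
    # table[i][j] = length of an LCS of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def indel_distance(x: str, y: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    return len(x) + len(y) - 2 * len(lcs(x, y))
```

For example, `lcs("ABCBDAB", "BDCABA")` returns a common subsequence of length 4, so the insertion/deletion distance between the two strings is 7 + 6 − 2·4 = 5.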

String-to-String Correction

The string-to-string correction problem, first suggested by Wagner and Fischer (1974), determines the distance between two strings as the minimum cost of a sequence of edit operations required to change the first string into the second.

The edit operations investigated are insertion, deletion, and change. Lowrance and Wagner (1975) proposed the extended string-to-string correction problem, which adds to the set of allowable edit operations the interchange of two adjacent symbols. A restricted version of this problem, allowing only deletion and swap operations, was proven NP-complete by Wagner in 1975.
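The Wagner-Fischer dynamic program computes this distance in O(mn) time. The Python sketch below assumes unit costs for every operation; the optional `allow_swap` flag adds adjacent transpositions in the restricted “optimal string alignment” form, which is simpler than, and not equivalent to, the general Lowrance-Wagner algorithm.

```python
def edit_distance(a: str, b: str, allow_swap: bool = False) -> int:
    """Unit-cost edit distance with insertion, deletion, and change.

    With allow_swap=True, swapping two adjacent symbols also costs 1
    (restricted "optimal string alignment" variant, an assumption here).
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # change (or match)
            )
            if (allow_swap and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]
```

For example, `edit_distance("kitten", "sitting")` is 3, and `edit_distance("ab", "ba")` is 2 with changes only but 1 when swaps are allowed.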

String Matching

The string matching problem, given strings P and X, examines the text X for an occurrence of the pattern P as a substring, namely, whether the text X can be written as X = YPY’, where Y and Y’ are strings. Several algorithms for this problem have appeared in the literature (Baeza-Yates & Gonnet, 1992). In some instances, however, the pattern and/or the text are not exact. For example, a name may be misspelled in the text. The approximate string matching problem reveals all substrings in X that are close to P under some measure of closeness. The most common measure of closeness is the edit distance, which determines whether X contains a substring P’ that resembles P within a certain edit distance. The editing operations, for example, may change one symbol of a string into another, delete a symbol from a string, or insert a symbol into a string. Some approximate string matching algorithms can be found in the literature (Wu & Manber, 1992).
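One classic way to solve the approximate problem is the dynamic-programming method usually attributed to Sellers: the edit-distance table from string-to-string correction, with its first row set to zero so that a match may start anywhere in X. The Python sketch below is that textbook technique, not necessarily the method used in this research; it keeps a single column of the table and reports every (0-based) end position in X at which some substring lies within edit distance k of P.

```python
def approximate_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every position in text where some
    substring ending there is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and a substring of
    # text ending at the current position. Before any text is read,
    # matching pattern[:i] against the empty string costs i.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # previous column's value of col[i - 1]
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(old + 1,           # extra text character
                         col[i - 1] + 1,    # pattern character skipped
                         prev_diag + cost)  # match or change
            prev_diag = old
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits
```

For instance, searching “xxabdxx” for “abc” with k = 1 reports end positions 3 and 4, corresponding to the substrings “ab” (one deletion) and “abd” (one change).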

SMARTPHONE DATA PROTECTION USING HANDHELD USAGE PATTERN MATCHING

This research applies user operation patterns to identify and prevent access by unlawful handheld users. This research proposes the following steps to protect sensitive data in a handheld device from unauthorized access (Hu, Yang, Lee, & Yeh, 2005):

1. Usage data collection,

2. Usage data preparation,

3. Usage pattern discovery,

4. Usage pattern analysis and visualization, and

5. Usage pattern applications for handheld data protection.

Figure 2 shows the steps and the data flows among them. If the system detects a usage pattern that differs from the stored patterns, it assumes the user is unlawful and blocks access. The user then needs to verify his/her identity, for example by entering a password or answering a question, in order to continue operating the device. Compared to other approaches such as password protection and fingerprint recognition, this approach has the advantages of convenience and strong protection.

Figure 2. The structure of the proposed system
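The decision rule just described can be sketched as follows. This is only an illustration under stated assumptions: each session is encoded as a string of one-letter event codes, similarity is measured with Python’s standard `difflib.SequenceMatcher.ratio()` rather than the matching methods this research actually uses, and the 0.6 cut-off and the function name `needs_verification` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off: a session less similar than this to every stored
# pattern is treated as coming from an unlawful user.
MIN_SIMILARITY = 0.6


def needs_verification(session: str, stored_patterns: list[str],
                       min_similarity: float = MIN_SIMILARITY) -> bool:
    """Return True if the session resembles none of the stored usage
    patterns closely enough, i.e., the user must verify his/her identity
    (password or question) before continuing."""
    best = max((SequenceMatcher(None, session, p).ratio()
                for p in stored_patterns), default=0.0)
    return best < min_similarity


stored = ["OMAIWSNF", "OMSF"]
print(needs_verification("OMAIWSF", stored))   # False: close to the first pattern
print(needs_verification("OXXXXXXF", stored))  # True: unfamiliar sequence
```

Using the best match over all stored patterns reflects the idea that a legitimate user follows several routines, so a session only needs to resemble one of them.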

Usage Data Collection

This stage focuses on collecting data of defined categories in order to construct a user usage profile.

Based on industry studies and our observations, each handheld user normally follows unique patterns when operating his/her device. Possible user pattern measurements include, but are not limited to, the following:

• Turn on/off frequency

◦ Measured by day and time using the mean and standard deviation

◦ Useful for detecting unauthorized users, who are likely to operate a handheld device during off-hours when the legitimate user is not expected to be using that device

• Location frequency

◦ Measures the frequency of operations of a handheld device at different locations

◦ Useful for detecting unauthorized users who operate from a location (e.g., a terminal) that a particular user rarely or never visits

• Elapsed time per session

◦ Resource measure of elapsed time per session

◦ Significant deviations might indicate a masquerader

• Quantity of output to location

◦ Quantity of output to terminal

◦ Excessive amounts of data transmitted to remote locations could signify leakage of sensitive data (e.g., the method used by an attack via
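The turn on/off measurement above is summarized by a mean and a standard deviation. A minimal Python sketch of how such a baseline might flag an unusual value follows; the three-standard-deviation limit, the function name `is_unusual`, and the sample hours are illustrative assumptions, and the sketch ignores the wrap-around of clock times at midnight.

```python
from statistics import mean, stdev


def is_unusual(value: float, history: list[float], z_limit: float = 3.0) -> bool:
    """Flag value if it lies more than z_limit sample standard deviations
    from the mean of history. Requires at least two past observations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Constant history: anything different from it is unusual.
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical turn-on times (hour of day) recorded on past mornings.
turn_on_hours = [7.0, 7.5, 8.0, 7.25, 7.75]
print(is_unusual(7.6, turn_on_hours))  # False
print(is_unusual(3.0, turn_on_hours))  # True: a 3 a.m. start is off-hours
```

The same check could be applied per day of the week, or to session length and output volume, each with its own baseline.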

The usage data should include the user’s unique characteristics of using that handheld device. Our research is based on the assumption that every user has a set of distinguishable and identifiable usage behaviors, which can separate this user from others. This assumption has been verified and applied in other information security applications, including intrusion detection. For example, a cell phone user may follow the patterns below to operate his/her phone first thing in the morning:

• Turn on the cellular phone.

• Check phone messages.

• Check address book and return/make phone calls.

• Check instant messages.

• Reply/write messages.

• Check schedule book.

• Write any notes.

• Turn off the cellular phone.

The above steps are an example of handheld usage patterns. The user has other patterns as well, and each user has his/her own unique usage patterns.
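To apply the string-matching techniques from the background section, a usage pattern such as the morning routine above can be written as a string with one symbol per event. The one-letter codes and the helper `encode` below are hypothetical; the chapter does not prescribe an encoding.

```python
# Hypothetical one-letter codes for the morning routine listed above.
EVENT_CODES = {
    "turn on": "O",
    "check phone messages": "M",
    "check address book and make/return calls": "A",
    "check instant messages": "I",
    "reply/write messages": "W",
    "check schedule book": "S",
    "write notes": "N",
    "turn off": "F",
}


def encode(events: list[str]) -> str:
    """Map a session's event names to a usage string, one symbol per event."""
    return "".join(EVENT_CODES[e] for e in events)


morning = [
    "turn on", "check phone messages",
    "check address book and make/return calls", "check instant messages",
    "reply/write messages", "check schedule book", "write notes", "turn off",
]
print(encode(morning))  # OMAIWSNF
```

Once sessions are strings, two of them can be compared with any of the measures discussed earlier, such as edit distance.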

To collect usage data, users click on the icon “Pattern” on the interface in Figure 3a to bring up the interface in Figure 3b, which asks users to enter the number of days of usage data collection. The collection duration could be a week or a month, depending on how frequently the device is used. The interface shown in Figure 3a is re-implemented so that when an application is clicked, the click is recorded and the application is then activated.

Figure 3. (a) The user interface of a device re-implemented to collect usage data and (b) user entry of data collection time
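The re-implemented launcher can be thought of as a thin wrapper that logs each click before starting the application. The Python sketch below is hypothetical (the class `UsageRecorder` and its `launch` method are not from the chapter); the collection window corresponds to the number of days entered in Figure 3b.

```python
import time
from typing import Callable


class UsageRecorder:
    """Log (timestamp, application) pairs during a fixed collection window."""

    def __init__(self, days: int) -> None:
        self.end_time = time.time() + days * 24 * 60 * 60
        self.log: list[tuple[float, str]] = []

    def launch(self, app_name: str, activate: Callable[[], None]) -> None:
        now = time.time()
        if now < self.end_time:  # still inside the collection period
            self.log.append((now, app_name))
        activate()               # the application starts either way


recorder = UsageRecorder(days=7)
recorder.launch("Phone", lambda: print("Phone started"))
print([name for _, name in recorder.log])  # ['Phone']
```

Recording before activation keeps the log complete even if the application itself crashes or is closed immediately.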

Usage Data Preparation

The data collected in the previous step is usually raw and therefore cannot be used effectively. For example, the usage patterns should not include an alarm-clock event if the user rarely uses the alarm clock. Data preparation may include the following tasks (Mobasher, Cooley, & Srivastava, 2000):

• Delete any event whose frequency is below a threshold value such as 5. For example, if the usage data is collected for a month, data synchronization can be ignored if it is performed only twice during that period.

• Remove an event if its duration is less than a threshold value such as 10 seconds. An event lasting less than 10 seconds is usually a mistake.

• Performing the same action repeatedly is treated as performing it once. For example, making three phone calls in a row is treated as making one call.
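The three preparation rules can be combined into a single pass over the collected sessions. In this Python sketch, each session is assumed to be a list of (event name, duration in seconds) pairs; the defaults mirror the example thresholds above (5 occurrences, 10 seconds), and the order in which the rules are applied is an assumption.

```python
from collections import Counter

Event = tuple[str, float]  # (event name, duration in seconds)


def prepare(sessions: list[list[Event]], min_count: int = 5,
            min_duration: float = 10.0) -> list[list[str]]:
    # Duration rule: drop very short events (usually mistakes).
    kept = [[name for name, dur in s if dur >= min_duration] for s in sessions]
    # Frequency rule: drop events seen fewer than min_count times overall.
    counts = Counter(name for s in kept for name in s)
    kept = [[name for name in s if counts[name] >= min_count] for s in kept]
    # Repetition rule: collapse consecutive repeats ("call, call" -> "call").
    result = []
    for s in kept:
        collapsed: list[str] = []
        for name in s:
            if not collapsed or collapsed[-1] != name:
                collapsed.append(name)
        result.append(collapsed)
    return result
```

Collapsing repeats last means that two identical events separated only by a discarded event (for example, a mistaken two-second tap) are also merged into one.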

The interface in Figure 4a allows users to decide whether or not to modify the default threshold values. If the user clicks the “Yes” button, the interface in Figure 4b allows him/her to enter two new threshold values.

After the raw usage data is prepared, a usage tree is created. Figure 5 shows a simplified sample usage tree, where the number inside the parentheses is the number of occurrences. For example, (20) means the event occurs 20 times. This usage tree is only a simplified example; an actual usage tree is much larger and more complicated. Ideally, a directed graph instead of a tree should be used to describe the usage data. However, a directed graph is more complicated and therefore difficult to process. Using a tree simplifies the processing, but it also creates duplicated nodes; e.g., the event “making phone calls” appears four times in the usage tree of Figure 5.
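One simple way to build such a tree is as a prefix tree over the prepared sessions: each path from the root is a sequence of events, and each node counts how many sessions pass through it. The chapter does not spell out its construction, so the Python sketch below (with hypothetical names `Node`, `build_usage_tree`, and `show`) is an assumption; note that, as in Figure 5, the same event can appear in several branches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0
    children: dict[str, "Node"] = field(default_factory=dict)


def build_usage_tree(sessions: list[list[str]]) -> Node:
    """Insert every session as a root-to-leaf path, counting visits per node."""
    root = Node()
    for session in sessions:
        node = root
        for event in session:
            node = node.children.setdefault(event, Node())
            node.count += 1
    return root


def show(node: Node, depth: int = 0) -> None:
    """Print the tree with the occurrence count in parentheses."""
    for event, child in node.children.items():
        print("  " * depth + f"{event} ({child.count})")
        show(child, depth + 1)


tree = build_usage_tree([
    ["turn on", "messages", "calls"],
    ["turn on", "messages", "notes"],
    ["turn on", "calls"],
])
show(tree)
# turn on (3)
#   messages (2)
#     calls (1)
#     notes (1)
#   calls (1)
```

The duplicated “calls” nodes in the output illustrate the trade-off mentioned above: the tree is easy to build and traverse, at the cost of repeating events that a directed graph would share.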

Usage Pattern Discovery, Analysis and Visualization, and Applications

The stage of usage pattern discovery focuses on identifying the desired usage patterns. Given the complexity and dynamic nature of user behaviors, identified usage patterns could be fuzzy and not readily apparent. Advanced AI techniques such as machine learning, decision trees, and other pattern matching and data mining techniques can be applied in this stage. Many data mining algorithms have been applied to usage pattern discovery. Most of them use the method of sequential pattern generation (Agrawal & Srikant, 1995), while the remaining methods tend to be rather ad hoc. The problem of discovering sequential patterns consists of finding inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp-ordered transaction set.
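The notion of a sequential pattern can be illustrated by counting support: the fraction of sessions in which one event is followed, not necessarily immediately, by another. The brute-force Python sketch below only enumerates two-event patterns; it illustrates the definition and is not the candidate-generation algorithms (such as AprioriAll) of Agrawal and Srikant (1995). The 0.6 minimum support and the function names are hypothetical.

```python
def contains(session: list[str], pattern: tuple[str, ...]) -> bool:
    """True if the events of pattern occur in session in order (gaps allowed)."""
    it = iter(session)
    # `p in it` advances the iterator past the first match of p,
    # so each later pattern event must appear after the earlier ones.
    return all(p in it for p in pattern)


def frequent_pairs(sessions: list[list[str]],
                   min_support: float = 0.6) -> dict[tuple[str, str], float]:
    """Return every ordered pair (a, b), with a followed later by b, whose
    support (fraction of sessions containing it) is at least min_support."""
    events = sorted({e for s in sessions for e in s})
    result = {}
    for a in events:
        for b in events:
            support = sum(contains(s, (a, b)) for s in sessions) / len(sessions)
            if support >= min_support:
                result[(a, b)] = support
    return result


sessions = [["on", "messages", "calls"], ["on", "calls"], ["messages", "on"]]
print(frequent_pairs(sessions))  # {('on', 'calls'): 0.6666666666666666}
```

Real sequential-pattern miners avoid this quadratic enumeration by extending only patterns that are already frequent, which matters once event vocabularies and pattern lengths grow.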

The major task of the pattern analysis and visualization step is to pick useful patterns from the discovered ones and display them. If the figures of the usage tree and the usage DFA in Section IV could be displayed on the device screen, they would greatly help mobile users to better manage the proposed methods. However, creating and displaying complicated figures takes much computation time and consumes valuable device resources such as memory. Therefore, this research allows users to check the usage data, even though the data may be too complicated for users to work with, but does not display the usage figures.

Figure 4. (a) Users deciding whether or not to modify the threshold values and (b) two input fields for threshold values

Usage pattern applications are the last but most important step in user recognition. This step applies the final patterns to handheld data protection. A key part of this task is to reduce false positives and false negatives when matching the actual observed data set against the pre-built user profiles. Usage patterns can be applied to various applications such as recommendation systems (Adda, Valtchev, Missaoui, & Djeraba, 2007) and Web page re-organization (Eirinaki & Vazirgiannis, 2003). This research uses handheld usage pattern identification to detect illegal uses of the device. Details of the pattern applications for handheld data protection are given in the next two sections.
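Reducing false positives (the owner is challenged) and false negatives (an intruder is not detected) usually comes down to choosing a decision threshold. The sketch below assumes each session already has a similarity score in [0, 1] against the stored profile; the function name `error_rates` and the scores are made-up illustration values, not experimental results.

```python
def error_rates(owner_scores: list[float], intruder_scores: list[float],
                threshold: float) -> tuple[float, float]:
    """Sessions scoring below threshold are flagged as unauthorized.

    Returns (false_positive_rate, false_negative_rate):
    FP = owner sessions wrongly flagged, FN = intruder sessions not flagged.
    """
    fp = sum(s < threshold for s in owner_scores) / len(owner_scores)
    fn = sum(s >= threshold for s in intruder_scores) / len(intruder_scores)
    return fp, fn


# Hypothetical similarity scores, for illustration only.
owner = [0.90, 0.80, 0.55, 0.95]
intruder = [0.20, 0.65, 0.10, 0.30]
for t in (0.5, 0.6, 0.7):
    print(t, error_rates(owner, intruder, t))
# 0.5 (0.0, 0.25)
# 0.6 (0.25, 0.25)
# 0.7 (0.25, 0.0)
```

Raising the threshold trades false negatives for false positives; which balance is acceptable depends on how costly an extra password prompt is compared with an undetected intruder.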

APPROXIMATE USAGE
