Behavior-based Malware Detection using Deep Learning for Improve Security of IoT Infrastructure


Vol. 28, No. 5, (2019), pp. 128-134


Hyun-Woo Kim¹, Eun-Ha Song*²

¹Dept. of Information Security, Baewha Women’s University, Seoul, Republic of Korea

*²College of Convergence & Liberal Arts, Wonkwang University, Iksan, Jeonbuk, Republic of Korea

¹[email protected], *²[email protected]

Abstract

Background/Objectives: Recent advances in Internet of Things (IoT) technologies require a new type of IoT security environment. Various heterogeneous smart devices can easily access the IoT environment, and as the number of users grows, they are exposed to threats such as malicious attacks on IoT devices and IoT infrastructure, and data tampering by malicious code. Malware detection in the IoT therefore requires data and models that support continuous learning as smart devices change.

Methods/Statistical analysis: To minimize these security threats, various malware detection techniques have been studied in the field of IoT security. Malware detection in the IoT environment depends on deriving data and building learning models that support continuous learning as smart devices change. The metadata for malware detection can be normalized into five values: device ID, time, behavior, location, and state. This paper proposes behavior-based malware detection using deep learning (BMD-DL).

Findings: BMD-DL collected metadata on malicious behavior and used deep learning to learn and detect malicious code. In addition, the learned model provides IoT security by disconnecting devices that exhibit malicious behavior in the IoT environment.

Improvements/Applications: BMD-DL collects behavioral data generated from multiple devices in the IoT and applies the results learned through deep learning to detect persistent malware.

Keywords: Internet of Things Infrastructure, IoT Security, Malware Detection, Malicious Behavior Detection, Deep Learning, Behavior-based Data Collection

1. Introduction

Recent developments in intelligent services have produced an IoT environment in which various devices are integrated through a network. IoT technology, used for everyday convenience and for many other purposes, has become a target of malicious users because it enables high-performance computing and the processing of large tasks [1].

It is important to defend against attacks such as spoofing, denial of service (DoS), jamming, and eavesdropping, to which many IoT-based services built for different purposes are vulnerable. Privacy protection is also needed to address vulnerabilities in the accessibility features that allow quick and easy access to miniaturized IoT devices. Beyond securing individual IoT devices, strong defenses are also required against malicious attacks aimed at whole environments such as home automation systems, smart grids, and smart cities. To minimize malicious attacks in the IoT environment, intelligent IoT security technology based on machine learning, including unsupervised learning and reinforcement learning, is needed [2,3,4].

The purpose-built, diverse IoT environment is exposed to malicious attacks that exploit a variety of external vulnerabilities, including those of individual devices, networks, firmware updates, and new user connections. Existing studies can detect malicious behavior on a single device or perform feature-based detection of malicious code that has already been analyzed.

However, it is very difficult to detect attacks that arise from the association of different behaviors occurring in various environments. Detecting such attacks requires a learning process that classifies ambiguous acts as either malicious or benign. In other words, we need to extract features that define malicious behavior and construct a model that can learn that behavior [5,6,7,8,9,10].

In this paper, we propose behavior-based malware detection using deep learning (BMD-DL) for IoT security. BMD-DL detects malicious code on various devices by defining and learning from the behavior data that different devices generate in the IoT environment. Because detection is based on behavior data, BMD-DL can proactively block devices that cause attack behavior in the IoT. In addition, continuous learning and updating of behavior data can help address the zero-day attack problem of existing malware detection techniques.

Existing technologies for detecting malicious attacks and malware in the IoT environment are as follows.

The work introducing the Long Short-Term Memory model-2 (LSTM-2) examined several malware detection techniques based on analyzing the OPcodes of IoT applications and found that deep learning techniques such as LSTM achieve higher malware detection rates than existing techniques such as SVM and decision trees [1]. However, because it relies on a statically collected set of malware samples, additional measures are needed to cope with constantly changing malware.

Work on malware injection attacks proposed a malware detection architecture for cloud users exposed to malicious attacks, but it focused on protection against attacks arising from network transmission and offered insufficient countermeasures against malware that has already infected a system [2].

Work on Android malware classification proposed classifying malware by analyzing the permissions and source code of applications on smart devices such as Android devices [6]. Because it uses static analysis, this method can increase the malware detection rate on a single device, but it is difficult to apply to malware that arises across various devices.

An intrusion detection system (IDS) approach proposed signaling-game-based malware detection strategies to suppress the spread of malware [7].

A survey of machine learning techniques for malware analysis reviewed approaches developed for the Windows environment [8].

System and network security design (SNSD) work surveyed various malware analysis and mitigation techniques, including machine learning [9]. With static analysis, only attacks matching the extracted features could be detected; with dynamic analysis, a separate test bed was required.

Work on the Internet of Battlefield Things (IoBT) proposed an OPcode-based malware detection technique [10] that learns malicious behavior by converting OPcodes into a vector space. It obtained significant results only for distinctive features, so an approach that generalizes to various datasets is still required.
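To illustrate what converting OPcodes into a vector space involves, here is a generic sketch that maps an OPcode sequence to an L2-normalized bigram frequency vector over a fixed vocabulary. It is not the eigenspace learning method of [10]; the function names, the trace, and the vocabulary are hypothetical.

```python
from collections import Counter
from math import sqrt


def opcode_ngrams(opcodes, n=2):
    """Return the consecutive n-grams (as tuples) of an OPcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def vectorize(opcodes, vocabulary, n=2):
    """Map an OPcode sequence to an L2-normalized n-gram frequency vector.

    Each position in the result corresponds to one n-gram in `vocabulary`;
    n-grams outside the vocabulary are ignored.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    vec = [float(counts[gram]) for gram in vocabulary]
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Hypothetical OPcode trace and bigram vocabulary, for illustration only.
trace = ["push", "mov", "call", "push", "mov", "ret"]
vocab = [("push", "mov"), ("mov", "call"), ("mov", "ret"), ("xor", "jmp")]
print(vectorize(trace, vocab))
```

A fixed vocabulary keeps every sample the same length, which is what a downstream classifier needs; normalization removes the effect of trace length.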

In this paper, we propose a new malware detection model that uses various behavior data generated by IoT devices.

2. BMD-DL Scheme

This section describes how, in the proposed BMD-DL IoT infrastructure, smart devices collect behavior-based data, the server learns to detect malware, and smart devices are blocked when malware is detected. BMD-DL configures the IoT infrastructure with nodes and servers. A node is a smart device in the IoT infrastructure, and the server is the BMD-DL Controller. The BMD-DL Controller is a cluster with sufficient computing capacity to operate BMD-DL.

BMD-DL is constructed to provide malware detection in the IoT infrastructure, as shown in Figure 1.

Figure 1. Overview of BMD-DL

Step 1. Every BMD-DL node stores behavior-based data on system calls, processes, network, memory, and the file system. When the amount of stored behavior data reaches a size proportional to the wireless network transmission rate, the node sends it to the server, so data is delivered periodically.
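Step 1 leaves the exact flush rule open. The following Python sketch assumes one interpretation: a node buffers JSON-encoded behavior records and hands a batch to the sender once the buffered size reaches rate × window bytes, where `rate_bytes_per_s` and `window_s` are hypothetical parameters. `BehaviorBuffer` is an illustrative name, not part of BMD-DL.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BehaviorBuffer:
    """Hypothetical node-side buffer for behavior records (Step 1).

    Assumption: the flush threshold is proportional to the wireless
    transmission rate, threshold = rate_bytes_per_s * window_s.
    """
    rate_bytes_per_s: float
    window_s: float = 1.0
    records: List[dict] = field(default_factory=list)
    size_bytes: int = 0

    @property
    def threshold_bytes(self) -> float:
        return self.rate_bytes_per_s * self.window_s

    def add(self, record: dict) -> Optional[List[dict]]:
        """Store one record; return the batch to send once the threshold is reached."""
        self.records.append(record)
        self.size_bytes += len(json.dumps(record).encode("utf-8"))
        if self.size_bytes >= self.threshold_bytes:
            # Hand the full batch to the sender and start a new, empty buffer.
            batch, self.records, self.size_bytes = self.records, [], 0
            return batch
        return None


buf = BehaviorBuffer(rate_bytes_per_s=100.0)
for _ in range(3):
    batch = buf.add({"type": "syscall", "name": "open"})
print(batch)
```

A real node would probably also flush on a timer so that data still reaches the server periodically when little behavior is recorded; that is omitted here.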

Step 2. The BMD-DL server composes a deep learning dataset from the learning data of the individual nodes. The learning data consist of the device ID, time, behavior, location, and state data reported by each node.

Step 3. The BMD-DL server delivers the trained malware detection model to the nodes, and each node updates its copy of the model.

Step 4. Each BMD-DL node detects malicious code running on it by using the trained detection model.
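Steps 3 and 4 can be sketched as a node that accepts a model from the server only if its version is newer and then classifies behavior feature vectors locally. The paper does not specify the model format or any versioning; `DetectionModel`, its linear scoring rule, and the `version` field are hypothetical placeholders standing in for the trained deep learning model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DetectionModel:
    """Hypothetical stand-in for the trained model sent by the server (Step 3).

    A linear score compared against a threshold replaces the deep model here.
    """
    version: int
    weights: Tuple[float, ...]
    bias: float
    threshold: float = 0.0

    def is_malicious(self, features: Sequence[float]) -> bool:
        score = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return score > self.threshold


class Node:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.model: Optional[DetectionModel] = None

    def update_model(self, model: DetectionModel) -> bool:
        """Step 3: install the model only if it is newer than the local copy."""
        if self.model is None or model.version > self.model.version:
            self.model = model
            return True
        return False

    def detect(self, features: Sequence[float]) -> bool:
        """Step 4: classify one behavior feature vector with the local model."""
        if self.model is None:
            raise RuntimeError("no detection model installed")
        return self.model.is_malicious(features)
```

Usage: `node = Node("pi-01"); node.update_model(DetectionModel(1, (1.0, -1.0), 0.0)); node.detect((2.0, 1.0))` classifies the vector as malicious because its score, 1.0, exceeds the threshold.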

Step 5. When malicious code appears on any BMD-DL node, the server identifies the node where it occurred. The IoT infrastructure is then protected by blocking the connection between that node and the server or by deleting the malicious data.
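A minimal sketch of Step 5 on the server side, assuming the controller keeps a set of connected node IDs and a per-node data store. `BMDController`, `report_detection`, and `purge_data` are hypothetical names, and blocking is modelled simply as refusing future connections from the reported node.

```python
class BMDController:
    """Hypothetical server-side handler for Step 5 (names are illustrative)."""

    def __init__(self) -> None:
        self.connected: set = set()
        self.blocked: set = set()
        self.data: dict = {}       # device_id -> list of behavior records received
        self.incidents: list = []  # (device_id, malware_name, timestamp)

    def connect(self, device_id: str) -> bool:
        """Accept a node unless it has been blocked."""
        if device_id in self.blocked:
            return False
        self.connected.add(device_id)
        self.data.setdefault(device_id, [])
        return True

    def report_detection(self, device_id: str, malware_name: str,
                         timestamp: float, purge_data: bool = False) -> None:
        """Record the incident, disconnect and block the node, and
        optionally delete the data it has sent."""
        self.incidents.append((device_id, malware_name, timestamp))
        self.connected.discard(device_id)
        self.blocked.add(device_id)
        if purge_data:
            self.data.pop(device_id, None)
```

The incident list corresponds roughly to what the User Interface later reports as malware name, count, and time of occurrence.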

Collecting behavior-based data from individual nodes requires normalized metadata.

The metadata consists of five fields: device ID, time, behavior, location, and state. The metadata of BMD-DL is shown in Table 1. The server continuously collects the metadata provided by the nodes, and the collected metadata becomes the learning dataset. The server then learns to detect various kinds of malicious code from this dataset.

Table 1. BMD-DL metadata

Component | Explanation
Device ID | The unique identification number of the IoT device.
Time | The time at which the IoT device generated the behavior data.
Behavior | The type and value of the IoT device's behavior data. Behavior data is classified into system call, process, network, memory, and file system.
Location | The connection location value of the IoT device.
State | The connection and operation status of the IoT device in the IoT infrastructure.
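Table 1 can be turned into a record type plus a normalization step that produces one dataset row per record. This is a sketch under stated assumptions: the behavior categories come from the paper, but the state vocabulary, the use of a Unix timestamp, the one-hot encoding of behavior, and the names `Metadata` and `normalize` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Behavior categories from Table 1; the state vocabulary is a hypothetical example.
BEHAVIOR_TYPES = ("system_call", "process", "network", "memory", "file_system")
STATES = ("connected", "running", "idle", "disconnected")


@dataclass(frozen=True)
class Metadata:
    device_id: str  # unique IoT device identifier
    time: datetime  # when the behavior data was generated (timezone-aware)
    behavior: str   # one of BEHAVIOR_TYPES
    location: str   # connection location value, e.g. an access point ID
    state: str      # one of STATES


def normalize(m: Metadata) -> dict:
    """Convert one metadata record into a flat dataset row."""
    if m.time.tzinfo is None:
        raise ValueError("time must be timezone-aware")
    if m.behavior not in BEHAVIOR_TYPES:
        raise ValueError(f"unknown behavior type: {m.behavior}")
    if m.state not in STATES:
        raise ValueError(f"unknown state: {m.state}")
    row = {
        "device_id": m.device_id,
        "timestamp": m.time.timestamp(),  # seconds since the Unix epoch
        "location": m.location,
        "state": STATES.index(m.state),   # integer state code
    }
    for b in BEHAVIOR_TYPES:              # one-hot behavior type
        row[f"behavior_{b}"] = 1 if m.behavior == b else 0
    return row


rec = Metadata("pi-01", datetime(2019, 1, 1, tzinfo=timezone.utc),
               "network", "ap-3", "running")
print(normalize(rec))
```

Requiring timezone-aware times avoids rows from nodes in different time zones being silently misordered on the server.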


The behavior data used for deep learning in BMD-DL is classified into system call, process, network, memory, and file system. Table 2 shows the meaning of the behavior data transmitted from a BMD-DL node to the server.

Table 2. Behavior data

Component | Explanation
System call | The type and name of the current system call, and the types and names of its variables.
Process | The PID that executed the process, and the fork, delete, and list values of the process.
Network | The server and node IP addresses, network type, port, message, and buffer address values.
Memory | Read/write operations of n bytes of data at a memory address location.
File System | The file size, file path, permissions, and read/write values.

3. Design of BMD-DL

BMD-DL consists of four functional modules: Node Manager, Server Manager, Deep Learning Manager, and User Interface. Figure 2 shows the structure of BMD-DL.

The Node Manager operates each IoT device in the IoT infrastructure. Its components are Node Connector, Malware Detector, Metadata Sender, and Metadata.

The Server Manager collects and learns from the behavior data of the IoT devices in the IoT infrastructure. Its components are Metadata Receiver, Malware Detector, Node Commander, Dataset Collector (DSC), Deep Learning Starter (DLS), and Learned Model (LM).

The Deep Learning Manager provides the deep learning functions. It consists of Deep Learning Algorithms Input/Output (DLAIO), Deep Learning List (DLL), and Deep Learning Selector (DLS).

The User Interface connects users and IoT devices. Its components are Restart (RS), Environment Setting (ES), Malware Status, and Environment.

Malware Status is divided into Malware Name (MN), Malware Count (MC), and Time of Occurrence (TO). Environment is divided into Device Name (DN), Device State (DS), and Algorithm Input / Output (AIO).


Figure 2. BMD-DL Architecture
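The Deep Learning Manager's list (DLL) and selector (DLS) could be organized as a simple registry of model factories, as in this hypothetical sketch. The class and method names, and the placeholder dictionaries standing in for models, are assumptions; the paper does not describe their interfaces.

```python
from typing import Callable, Dict, List, Optional


class DeepLearningManager:
    """Hypothetical registry: DLL holds named model factories, DLS picks one."""

    def __init__(self) -> None:
        self._dll: Dict[str, Callable[[], object]] = {}
        self.selected: Optional[str] = None

    def register(self, name: str, factory: Callable[[], object]) -> None:
        """Add an algorithm to the DLL; `factory` builds a fresh, untrained model."""
        if name in self._dll:
            raise ValueError(f"algorithm already registered: {name}")
        self._dll[name] = factory

    def list_algorithms(self) -> List[str]:
        return sorted(self._dll)

    def select(self, name: str) -> object:
        """DLS: choose an algorithm by name and return a new model instance."""
        if name not in self._dll:
            raise KeyError(f"unknown algorithm: {name}")
        self.selected = name
        return self._dll[name]()


mgr = DeepLearningManager()
mgr.register("lstm", lambda: {"type": "lstm"})  # placeholder model objects
mgr.register("cnn", lambda: {"type": "cnn"})
print(mgr.list_algorithms(), mgr.select("lstm"))
```

Storing factories rather than model instances lets the Server Manager start each training run from a fresh model.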

4. Performance Evaluation

The physical infrastructure of BMD-DL is as follows. The BMD-DL server uses two Xeon E5-2695 v4 CPUs (2.10 GHz) with a total of 72 logical cores (hardware threads), 123 GB of memory, and 200 GB of storage. Each BMD-DL node is a Raspberry Pi 2 with an ARM Cortex-A7 CPU and 1 GB of memory. To evaluate whether BMD-DL detects malware, we used malicious files from the Microsoft Malware Classification Challenge (BIG 2015) dataset [11].

Figure 3 compares the malware detection accuracy of nodes with BMD-DL against that of single devices without BMD-DL. As the number of epochs increases, accuracy improves.

The malware detection accuracy with BMD-DL was similar to that on a single device, which shows that applying BMD-DL preserves a level of detection accuracy comparable to single-device detection.

In addition, with 5 or fewer epochs, the accuracy of the single device was very unreliable. With fewer than 20 epochs, BMD-DL achieved higher malware detection accuracy than the single device, so BMD-DL performed better even with a small amount of training.


Figure 3. Comparison of malware detection accuracy between BMD-DL and single device
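Figure 3 tracks detection accuracy as a function of the number of training epochs. The sketch below shows one way such a curve is produced, assuming a toy setting: a logistic regression classifier (a deliberately simple stand-in for the paper's deep learning model) trained by full-batch gradient descent on eight hand-made two-feature samples, with accuracy recorded after every epoch. None of the numbers relate to the BIG 2015 experiment.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def accuracy(w, b, X, y) -> float:
    """Fraction of samples whose thresholded prediction (p >= 0.5) matches the label."""
    correct = sum(
        int((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == bool(t))
        for x, t in zip(X, y)
    )
    return correct / len(y)


def train(X, y, epochs: int, lr: float = 0.1):
    """Full-batch gradient descent on the logistic loss.

    history[0] is the untrained accuracy; history[k] is the accuracy after epoch k.
    """
    dim, n = len(X[0]), len(y)
    w, b = [0.0] * dim, 0.0
    history = [accuracy(w, b, X, y)]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
        history.append(accuracy(w, b, X, y))
    return w, b, history


# Toy data: label 1 = "malicious", label 0 = "benign" (hypothetical features).
X = [(-2.0, -1.0), (-1.0, -2.0), (-1.0, -1.0), (-2.0, -2.0),
     (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b, history = train(X, y, epochs=20)
print(history)
```

On this data, `history[0]` is 0.5 because the untrained model predicts every sample as malicious, and accuracy reaches 1.0 after the first epoch because the two classes are linearly separable. Real behavior data would give a noisier curve like the one in Figure 3.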

5. Conclusions

The use of diverse devices in IoT and cloud environments has given rise to various attacks, such as malware. As a result, malware detection technology has been discussed for IoT security, a network environment in which many devices are connected. The BMD-DL proposed in this paper collects behavior data generated by different IoT devices and detects persistent malware by applying the results learned through deep learning. With the proposed method, malicious code was detected with significant accuracy. In addition, malware detection made it possible to block smart devices attacking the IoT environment.

In future work, we will apply further deep learning models to detect various kinds of malicious code in BMD-DL. These models will include preprocessing that broadens the definition of risky behaviors for malware detection and normalizes them through correlation analysis. We will minimize false positives and false negatives by adding a layer that infers the malicious meaning of behavioral data. We will also break down the behavior of various malicious codes by node and study malicious code detection techniques for serverless IoT environments.

References

[1] Hamed HaddadPajouh, Ali Dehghantanha, Raouf Khayami, Kim-Kwang Raymond Choo, "A deep Recurrent Neural Network based approach for Internet of Things malware threat hunting," Future Generation Computer Systems, Vol. 85, pp. 88-96, Aug. (2018).

[2] Anav Bedi, Nitin Pandey, Sunil Kumar Khatri, "Analysis of Detection and Prevention of Malware in Cloud Computing Environment," 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates, 4-6, Feb. (2019), pp. 918-921.

[3] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, Patrick McDaniel, "Adversarial Examples for Malware Detection," European Symposium on Research in Computer Security, Luxembourg, 23-27, Sep. (2019), pp. 62-79.

[4] Liang Xiao, Xiaoyue Wan, Xiaozhen Lu, Yanyong Zhang, Di Wu, "IoT Security Techniques Based on Machine Learning: How Do IoT Devices Use AI to Enhance Security?" IEEE Signal Processing Magazine, Vol. 35, No. 5, pp. 41-49, Sep. (2018).

[5] Nickolaos Koroniotis, Nour Moustafa, Elena Sitnikova, "Forensics and Deep Learning Mechanisms for Botnets in Internet of Things: A Survey of Challenges and Solutions," IEEE Access, Vol. 7, pp. 61764-61785, May. (2019).

[6] Nikola Milosevic, Ali Dehghantanha, Kim-Kwang Raymond Choo, "Machine learning aided Android malware classification," Computers and Electrical Engineering, Vol. 61, pp. 266-274, Jul. (2017).

[7] Shigen Shen, Longjun Huang, Haiping Zhou, Shui Yu, En Fan, Qiying Cao, "Multistage Signaling Game-Based Optimal Detection Strategies for Suppressing Malware Diffusion in Fog-Cloud-Based IoT Networks," IEEE Internet of Things Journal, Vol. 5, No. 2, pp. 1043-1054, Apr. (2018).

[8] Daniele Ucci, Leonardo Aniello, Roberto Baldoni, "Survey of machine learning techniques for malware analysis," Computers & Security, Vol. 81, pp. 123-147, Mar. (2019).

[9] S. Sibi Chakkaravarthy, D. Sangeetha, V. Vaidehi, "A Survey on malware analysis and mitigation techniques," Computer Science Review, Vol. 32, pp. 1-23, May. (2019).

[10] Amin Azmoodeh, Ali Dehghantanha, Kim-Kwang Raymond Choo, "Robust Malware Detection for Internet of (Battlefield) Things Devices Using Deep Eigenspace Learning," IEEE Transactions on Sustainable Computing, Vol. 4, No. 1, pp. 88-95, Jan-Mar. (2019).

[11] Jungho Kang, Sejun Jang, Shuyu Li, Young-Sik Jeong, Yunsick Sung, "Long short-term memory-based Malware classification method for information security," Computers and Electrical Engineering, Vol. 77, pp. 366-375, Jul. (2019).

[12] A. K. Bhoi, K. S. Sherpa, B. Khandelwal, P. K. Mallick, "T Wave Analysis: Potential Marker of Arrhythmia and Ischemia Detection-A Review," in Cognitive Informatics and Soft Computing, Springer, Singapore, pp. 121-130, (2019).