Research Article, December 2017
Computer Science and Software Engineering
ISSN: 2277-128X (Volume-7, Issue-12)
Malware Analysis
Ujaliben Kalpesh Bavishi*, Bhavesh Madanlal Jain
Department of Computer Science, California State University, Sacramento, United States
Abstract: Malware, also known as malicious software, affects a user’s computer system or mobile device by exploiting the system’s vulnerabilities, and it is a major threat to the security of computer systems. The most commonly used types of malware include viruses, trojans, and worms. Malware is now widespread and allows its authors to obtain sensitive information such as bank details and contact information. Most malware spreads through the Internet, where it can propagate across networks and damage large systems. In this paper, we focus on tools that analyze malware in a restricted environment. Because many malware authors use self-modifying code and obfuscation, and because some malware can identify that it is being scanned and change its execution sequence, it is very difficult for traditional antivirus software to detect it. To address these shortcomings, we discuss several analysis tools that examine malware effectively and thereby help protect a system’s information.
Keywords: Treemap; Thread Graphs; Dynamic Translation; Obfuscation; Sandbox; Hardware Virtualization
I. INTRODUCTION
Malware[1] is a general term for programs containing malicious code that may pose a major threat to any user. Malware can contain the malicious code of viruses, worms, or Trojan horses, and it can also create a back door to leak personal information or take control of a system. Serious crimes can be committed through malware, which is why malware detection is necessary. Detection requires a definition of the malware, and creating that definition requires malware analysis. Malware analysis consists of examining different aspects[2] of a malware sample so that such malware can be detected. Malware definitions are also known as signatures; virus scanners, popularly known as antivirus software, use these signatures to detect malware.
Analyzing malware poses several problems[2]. Malware can be self-modifying or obfuscated, can contain hidden, encoded malicious code that is unpacked only at execution time, and can consist of separate blocks of code that each look like a normal program but are connected by jump statements to form malicious code. To analyze such behavior, two methods are used.
A. Static Analysis
Static analysis does not execute the code; instead, it examines the control and data flow of the program to determine its characteristics. Every possible execution of the code can be analyzed by backtracking through all the paths the code can take. Three static analysis techniques are widely used:
a. String Signature: The analyzer looks for specific types of malicious code statements to decide whether the program is malware.
b. Control Flow Graph: The control flow between code statements is checked to determine the malicious behavior of the program.
c. Semantic-Aware Analysis: The analyzer examines the semantics of the program, that is, what the code actually does, to ensure that it has no hidden meaning.
The limitation of static analysis is that it may fail to analyze self-modifying or obfuscated code, which can then pose a threat to the system.
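As a rough illustration of string-signature matching and of why obfuscation defeats it, the following Python sketch scans a byte buffer for known patterns. The signature names, the byte patterns, and the XOR key are hypothetical values chosen for the example, not real malware definitions.

```python
# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "demo_dropper": b"\x90\x90\xeb\xfe",
    "demo_keylogger": b"GetAsyncKeyState",
}

def match_signatures(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

sample = b"MZ\x00\x00...GetAsyncKeyState...\x00"
clean = b"MZ\x00\x00hello world"

# A trivially "obfuscated" copy of the sample: every byte XORed with a fixed key.
obfuscated = bytes(b ^ 0x5A for b in sample)

print(match_signatures(sample))      # the plain sample matches one signature
print(match_signatures(clean))       # the clean buffer matches nothing
print(match_signatures(obfuscated))  # the encoded copy no longer matches
```

The encoded copy carries the same payload once decoded at run time, yet the scanner no longer recognizes it, which is the weakness that motivates dynamic analysis.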
B. Dynamic Analysis
ISSN(E): 2277-128X, ISSN(P): 2277-6451, pp. 27-33
In dynamic analysis, the program is executed in a controlled environment so that the actual system is not harmed in the process. Using this method, even the characteristics of self-modifying code can be observed and used to create a signature. The dynamic strategies for analyzing a program are as follows:
1) Function Call Monitoring: A program invokes various function calls to access the services of the operating system. These calls are monitored and logged to a file so that suspicious or unauthorized calls can be analyzed.
2) Function Parameter Analysis: The function calls invoked by a program do not by themselves reveal everything about the actions performed. For example, it is difficult to differentiate between a file-open and a file-create API call without analyzing the parameters.
3) Information Flow Tracking: The data and information collected from the system is tracked to see where it goes, so that it does not fall into the wrong hands. This technique analyzes whether sensitive information stays in the right hands or is leaked.
4) Information Flow Trace: This extends the previous technique; the information flow is written to a log file to keep a trace of the information and of who holds it.
The limitation of dynamic analysis is that it is difficult to backtrack and to analyze all possible paths. Therefore, static and dynamic analysis should be used together to analyze the behavior of malware efficiently.
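The file-open versus file-create example can be made concrete with a small Python sketch. On Windows both actions go through the same CreateFile call and differ only in the dwCreationDisposition parameter, whose documented values are used below. The call log itself, including its field names and paths, is a hypothetical structure invented for this illustration.

```python
# Documented Win32 values of CreateFile's dwCreationDisposition parameter.
CREATE_NEW, CREATE_ALWAYS, OPEN_EXISTING, OPEN_ALWAYS, TRUNCATE_EXISTING = 1, 2, 3, 4, 5

def classify_create_file(disposition):
    """Translate the disposition parameter into the action it requests."""
    if disposition in (CREATE_NEW, CREATE_ALWAYS):
        return "create"
    if disposition in (OPEN_EXISTING, TRUNCATE_EXISTING):
        return "open"
    if disposition == OPEN_ALWAYS:
        return "open-or-create"
    return "unknown"

# Hypothetical monitored calls: the API name alone is identical in both entries.
call_log = [
    {"api": "CreateFileW", "path": r"C:\Windows\System32\evil.dll",
     "disposition": CREATE_ALWAYS},
    {"api": "CreateFileW", "path": r"C:\Users\demo\notes.txt",
     "disposition": OPEN_EXISTING},
]

actions = [(c["path"], classify_create_file(c["disposition"])) for c in call_log]
for path, action in actions:
    print(action, path)
```

Function call monitoring alone would record two identical CreateFileW entries; only the parameter shows that the first call drops a new file.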
This paper reviews the following techniques for analyzing malware:
The TTAnalyze tool[2], which uses Qemu (a PC emulator) to analyze the behavior of Windows executables by recording Windows API calls and native API calls, tracking the process through the page-directory base register. Only a single execution path is traversed.
Exploration of multiple execution paths using a modified TTAnalyze that creates a snapshot of the program state at each branching point and identifies program termination so that more paths can be explored.
Visualization of malware behavior using treemap and thread graph techniques.
Holography, a hardware virtualization tool whose Spy Satellite and Intelligence Agency components store all logs in a process matrix, which is then analyzed by a malware analysis engine.
II. TTANALYZE: A TOOL FOR ANALYZING MALWARE
TTAnalyze[2] is a tool for dynamically analyzing the behavior of Windows executables. The Windows program containing the malware is executed in an emulated operating system environment, where its behavior and actions are monitored. The tool records the Windows API calls and all the native API functions that the program executes. Its most prominent feature is that it does not modify the malicious code, which matters because some malicious code is self-modifying. Because the code runs in an unmodified Windows environment, it is difficult for the malicious code to detect that it is being emulated. These features make the tool very efficient at analyzing the behavior of unknown malware.
It has three important features:
a. It uses an emulation environment running an unmodified operating system.
b. It is comprehensive, since it monitors both native API calls and Windows API functions.
c. It performs function call injection, which allows custom code to be added to the program during the analysis.
The analysis process follows the steps shown in figure-1:
a. The malware program is executed in an emulated Windows environment and its actions are monitored.
b. The operating system services requested by the executable are monitored.
c. System service requests go through the native API interface, which is undocumented on the Windows platform, so monitoring them is not straightforward.
d. The malware process is tracked with the CR3 register (page-directory base register).
e. Windows assigns each process a unique page directory.
f. A kernel driver and a program are required to determine the physical address of the page directory.
g. The tool can change the flow of execution of the program by injecting some read instructions into the instruction stream.
h. The ability to add function calls to the instruction stream improves the efficiency of the analysis results.
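The CR3-based tracking in the steps above can be sketched in a few lines of Python. This is only a model of the idea, not TTAnalyze's implementation: the trace format, the CR3 values, and the API entry addresses are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceEvent:
    cr3: int  # page-directory base register value when the instruction ran
    pc: int   # program counter of the instruction

# Hypothetical values for illustration only.
MALWARE_CR3 = 0x0A3F2000
API_ENTRY_POINTS = {0x7C810000: "CreateFileW", 0x7C820000: "WriteFile"}

def calls_made_by(trace, target_cr3, entry_points):
    """Return the names of API functions entered while target_cr3 was loaded."""
    return [entry_points[e.pc] for e in trace
            if e.cr3 == target_cr3 and e.pc in entry_points]

trace = [
    TraceEvent(cr3=0x0A3F2000, pc=0x00401000),  # malware, ordinary instruction
    TraceEvent(cr3=0x0A3F2000, pc=0x7C810000),  # malware enters CreateFileW
    TraceEvent(cr3=0x0B000000, pc=0x7C820000),  # another process enters WriteFile
    TraceEvent(cr3=0x0A3F2000, pc=0x7C820000),  # malware enters WriteFile
]

malware_calls = calls_made_by(trace, MALWARE_CR3, API_ENTRY_POINTS)
print(malware_calls)
```

Because each process has its own page directory, comparing CR3 is enough to discard the call made by the unrelated process without modifying the guest operating system.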
The limitations of this technique are:
a. Only a single execution path is traversed during a specific analysis run.
b. It fails to examine behavior that is triggered only at a certain time.
III. EXPLORING MULTIPLE EXECUTION PATHS FOR MALWARE ANALYSIS[3]
This system extends the TTAnalyze tool so that multiple execution paths driven by the input can be traversed[3]. It keeps track of user input and creates snapshots so that the machine can be restored to a control flow decision point. At each branching point whose outcome depends on the input the program processes, it obtains a number of different execution paths, each with its own behavior.
Figure-2: Exploration of Multiple Paths[3]
Consider the example shown in figure-2, which illustrates how the extended TTAnalyze tool explores multiple paths in a program. The program checks the value of x every time it encounters a branching point; since the input received by a program must be stored in a variable, x is treated as a labeled variable. In the first step, the value 2 is stored in x and the branching condition x>0 is checked. Because the comparison is performed on labeled data, a snapshot of the current process is created, and execution then continues. Since the if-condition is satisfied, execution proceeds to the next step, where the condition x>2 is checked. Another snapshot is created, and the process continues.
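The snapshot-and-restore idea in this example can be sketched in Python as follows. The program representation (a list of "action" and "branch" steps) and the action names are hypothetical, and the sketch simply forces both outcomes of every branch; it does not model how a real system keeps the input value consistent with the branch it forces.

```python
import copy

def explore(steps, state):
    """Return every action trace reachable from steps, taking both outcomes
    of each branch by snapshotting the state at the branching point."""
    for i, step in enumerate(steps):
        if step[0] == "action":
            state["trace"].append(step[1])
        else:  # ("branch", condition_text, then_steps, else_steps)
            _, _condition, then_steps, else_steps = step
            rest = steps[i + 1:]
            snapshot = copy.deepcopy(state)                   # saved at the branch
            taken = explore(then_steps + rest, state)         # continue normally
            not_taken = explore(else_steps + rest, snapshot)  # restore and retry
            return taken + not_taken
    return [state["trace"]]

# Hypothetical program mirroring the x>0 and x>2 checks described above.
program = [
    ("branch", "x > 0",
     [("branch", "x > 2", [("action", "send_data")], [("action", "sleep")])],
     [("action", "exit")]),
]

paths = explore(program, {"trace": []})
print(paths)
```

A single run with x = 2 would only reach the "sleep" action; restoring the snapshots exposes the other two behaviors as well.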
A. Analysis Phase
Exploration of multiple paths relies on two mechanisms:
1) A mechanism to decide when the system analyzes both program paths. This relies on three components:
a. A set of Taint Sources: These assign labels to the data at memory addresses of interest.
b. Shadow Memory: This keeps track of the labels assigned to memory locations.
c. Extensions to Machine Instructions: These are required to propagate the taint labels whenever labeled data is manipulated.
2) A mechanism to save the current program state when a branching point is located.
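The interplay of taint sources, shadow memory, and label propagation can be sketched as follows. This is a simplified model under stated assumptions: memory is addressed by plain integers, every instruction is reduced to "destination computed from sources", and the class and method names are hypothetical.

```python
class ShadowMemory:
    """Maps memory addresses to the set of labels attached to their contents."""

    def __init__(self):
        self.labels = {}

    def mark_source(self, addr, label):
        """Taint source: attach a label to data of interest (e.g. a file read)."""
        self.labels[addr] = {label}

    def propagate(self, dst, *srcs):
        """Instruction extension: dst inherits the labels of every source."""
        merged = set()
        for src in srcs:
            merged |= self.labels.get(src, set())
        if merged:
            self.labels[dst] = merged
        else:
            self.labels.pop(dst, None)  # overwritten with unlabeled data

    def is_labeled(self, addr):
        """A branch on a labeled address is where a snapshot would be taken."""
        return addr in self.labels

shadow = ShadowMemory()
shadow.mark_source(0x1000, "file_input")
shadow.propagate(0x2000, 0x1000, 0x3000)  # e.g. [0x2000] = [0x1000] + [0x3000]
shadow.propagate(0x1000, 0x3000)          # e.g. [0x1000] = [0x3000], clears label
print(shadow.is_labeled(0x2000), shadow.is_labeled(0x1000))
```

Only comparisons whose operand is still labeled depend on input, so only those branches need to be explored in both directions.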
B. Implementation Phase:
The implementation is carried out in two phases:
1) Creating and restoring the program state
a. Saving the image of the complete virtual machine takes too much time, so a Qemu component was developed that saves only the image of the process.
Figure-3: Overview of Visualization System[4]
b. This component identifies the active memory pages of a process by analyzing the page table directory of the Windows OS.
c. When a snapshot is created, a copy of the virtual CPU registers and the shadow memory is stored.
2) Identification of program termination
a. Before reverting, the process is allowed to run until it exits normally or crashes. However, the system must not allow the process to actually terminate, since that would erase all process-related entries from the OS.
b. Therefore, the program counter of the emulated CPU is continually compared with the start address of the NtTerminateProcess function; if they are equal, the program state is reverted to a previous snapshot.
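The termination check can be illustrated with a short Python sketch. The address constant, the list of program-counter values, and the string placeholders standing in for saved process images are all hypothetical; the sketch assumes the most recent snapshot is the one restored.

```python
# Hypothetical address; the real one depends on the Windows build under analysis.
NT_TERMINATE_PROCESS = 0x7C90DE6E

def run_until_termination(program_counters, snapshots):
    """Consume program-counter values. On reaching the terminate call, pop and
    return the most recent snapshot instead of letting the process exit."""
    for executed, pc in enumerate(program_counters):
        if pc == NT_TERMINATE_PROCESS:
            return executed, (snapshots.pop() if snapshots else None)
    return len(program_counters), None

snapshots = ["state at x>0", "state at x>2"]  # placeholders for process images
executed, restored = run_until_termination(
    [0x401000, 0x401005, NT_TERMINATE_PROCESS, 0x401010], snapshots)
print(executed, restored, snapshots)
```

The instruction at the terminate address is never executed, so the process entries in the guest OS survive and the next unexplored branch can start from the restored state.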
C. Advantages
a. It is effective in detecting code that is executed only at a certain date or time.
b. It identifies actions that are triggered by commands.
c. It provides a complete behavioral picture when the malware checks for the existence of a file to determine whether it is already installed.
D. Limitations
a. Invalid handles are not avoided when the process creates external effects (for example, when writing to a file or sending data over the network).
b. A specially crafted malware program can mount a denial-of-service (DoS) attack on the analysis tool by performing many conditional branches on tainted data.
IV. VISUAL ANALYSIS OF MALWARE BEHAVIOR USING TREEMAP AND THREAD GRAPHS[4]
The work in [4] uses two techniques to analyze malware. The first, tree mapping, gives a visual description of the actions performed by a malware sample, represented as a set of nested rectangles. Because a treemap gives no information about sequencing, a second technique called thread graphs is used. These graphs display the behavior of all threads in a malware sample and can be regarded as a behavioral fingerprint of the sample[4].
As figure-3 shows, the process is divided into two phases:
1. In the first phase, malware samples are collected from a system that is under attack. A sandbox is used to study each sample in a controlled and restricted environment, and its behavior is analyzed.
2. In the second phase, the reports generated by the sandbox are abstracted based on the APIs that the logged calls belong to, and then they are visualized using treemaps or thread graphs.
A. Tree Mapping:
Here the reports generated by CWSandbox are converted into a standardized form that displays the information as rectangles. The dimensions of these rectangles depend on several factors, so a tiling algorithm is needed to compute the dimensions of each rectangle in the treemap from the API calls made by the sample.
Figure-4: Representation of AdultBrowser using Treemap[4]
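To show what a tiling algorithm does, the Python sketch below uses the simple slice-and-dice scheme, which splits a rectangle into side-by-side strips whose widths are proportional to the number of calls per API category. Which tiling algorithm [4] actually uses is not stated here, so this choice is an assumption, and the category names and counts are invented.

```python
def slice_tiles(counts, x, y, width, height):
    """Split the rectangle (x, y, width, height) into vertical strips whose
    areas are proportional to the call counts of each API category."""
    total = sum(counts.values())
    if total == 0:
        return {}
    tiles = {}
    offset = x
    for name, count in counts.items():
        strip_width = width * count / total
        tiles[name] = (offset, y, strip_width, height)
        offset += strip_width
    return tiles

# Hypothetical call counts per API category taken from a sandbox report.
api_calls = {"file": 50, "registry": 30, "network": 20}

tiles = slice_tiles(api_calls, 0, 0, 100, 40)
for name, rect in tiles.items():
    print(name, rect)
```

Nesting comes from applying the same function again inside each strip, for example to split the "file" rectangle by individual operation, alternating between horizontal and vertical cuts.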
B. Thread Graph:
This approach shows the chronological behavior of the malware sample. A graph is generated that shows the commands executed by the binary in the order in which the malware sample calls them. The X-axis represents time (the sequence of performed commands) and the Y-axis represents the operations that are performed.
Figure-5: Representation of AdultBrowser using Thread Graph[4]
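The data behind a thread graph can be sketched as one series of points per thread, with the position in the log as the X value and the operation as the Y value. The event format and the thread and operation names below are hypothetical, and the plotting step itself is left out.

```python
from collections import defaultdict

def thread_graph(events):
    """Group chronologically ordered (thread_id, operation) events into one
    series of (sequence_index, operation) points per thread."""
    series = defaultdict(list)
    for index, (thread_id, operation) in enumerate(events):
        series[thread_id].append((index, operation))
    return dict(series)

# Hypothetical sandbox log, already in the order the calls were made.
events = [
    ("T1", "file.create"),
    ("T1", "registry.set"),
    ("T2", "network.connect"),
    ("T1", "file.copy"),
    ("T2", "network.send"),
]

graph = thread_graph(events)
for thread_id, points in graph.items():
    print(thread_id, points)
```

Unlike the treemap, each series preserves the order of operations, which is what allows the graph to serve as a behavioral fingerprint.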
V. HOLOGRAPHY: A HARDWARE VIRTUALIZATION TOOL FOR MALWARE ANALYSIS[5]
Holography[5] is a virtualization-level dynamic analysis tool that runs the malware and produces a log, which the analyst uses to study the abnormal behavior of the sample. The tool logs not only the calls made to the operating system but also intermediate function calls, which helps reveal the intention of the malware program. It runs a Windows environment in the PC emulator Qemu, which mimics a whole personal computer and so makes it difficult for the malware to detect that it is being executed in an emulated environment.
Figure-6: Architecture of Holography[5]
Figure-6 shows the architecture of Holography. The malware sample is run in an emulated environment and monitored by two components, a Spy Satellite and an Intelligence Agency. The Spy Satellite generates the process matrix, a log that helps in extracting the abnormal behavior of the malware. The Intelligence Agency provides additional information about function parameters and operating system calls. The logs from the process matrix are then passed to the malware analysis engine, which maps them to a model of abnormal behavior to produce the results. The two main components are as follows:
A. Spy Satellite:
a. It determines the actions performed by the malware from the CPU instructions executed in the emulated environment.
b. It must differentiate between the malware process and normal operating system processes.
c. It uses the CR3 register, which contains the physical address of the page directory of the executing malware process, to distinguish between processes executing in tandem.
d. It also logs intermediate function calls.
e. It can also detect malware executed as a multithreaded program by comparing stack frames and stack pointer positions with each other, which identifies the relationship between operating system calls and threads.
B. Intelligence Agency:
Additional information is needed to map operating system calls to their names and function parameters.
a. Each OS call resides in a DLL that is mapped into the virtual address space, so the Intelligence Agency must collect the virtual address of each operating system call.
b. It is used in combination with the Spy Satellite to determine the function parameters of the invoked operating system calls, which can be obtained from the prototypes of the OS calls.
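The role of the Intelligence Agency can be sketched as a lookup table from call addresses to prototypes, used to turn a raw logged call into a named, annotated entry. The addresses and argument values below are hypothetical; the parameter names follow the documented Win32 prototype of CopyFileA. This is an illustration of the idea, not Holography's actual data format.

```python
# Hypothetical table: virtual address of an OS call -> (name, parameter names).
# The parameter names follow the documented CopyFileA prototype.
PROTOTYPES = {
    0x7C8286EE: ("CopyFileA",
                 ["lpExistingFileName", "lpNewFileName", "bFailIfExists"]),
}

def annotate_call(address, raw_args):
    """Combine a logged call address with its prototype to name each argument.
    Returns None when the address is not a known OS call."""
    entry = PROTOTYPES.get(address)
    if entry is None:
        return None
    name, params = entry
    return {"call": name, "args": dict(zip(params, raw_args))}

# Hypothetical raw entry as the Spy Satellite might log it: address plus values.
record = annotate_call(
    0x7C8286EE,
    [r"C:\temp\sample.exe", r"C:\Windows\System32\sample.exe", 0])
print(record)
```

Without the prototype, the log would contain only an address and three opaque values; with it, the entry reads as a file being copied into the system directory.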
Consider the malware sample ‘Padodor’, which was analyzed using Holography to produce a process matrix that stores the logs of the operating system calls invoked by the sample. The snippet of the process matrix in figure-7 shows how some files are created and copied to the system32 directory. The process matrix also logs that the sample enables autocomplete in Internet Explorer, which saves the sensitive information typed into the browser; the malware author can then use this information to compromise bank accounts, usernames, passwords, and so on.
VI. SUMMARY
We have discussed four techniques, each integrated in a tool for analyzing malware dynamically. The techniques used in the above-mentioned tools are:
Emulated environment with dynamic translation (for a single execution path).
Exploration of multiple execution paths of the malware.
Thread graphs and treemaps.
Visualization of malware behavior using a process matrix.
Since all of these tools use behavior-based analysis, they can cope with zero-day attacks and obfuscation, but malware can sometimes disable a tool with a DoS attack that prevents it from analyzing the sample. We therefore expect that combining code-based and behavior-based analysis of a malware sample will make the analysis more efficient and produce good results. These tools should also be placed outside the emulated environment in order to avoid DoS attacks mounted by the malware author.
VII. CONCLUSION
Behavior-based analysis is used to analyze unknown malware samples and depends on the abnormal behavior observed in the log file. We presented an overview of tools that can be very efficient in analyzing malware because they use an emulated environment (Qemu) that mimics an entire computer system and is difficult for the malware to detect during execution. Self-modifying malicious code can also be analyzed by exploring various execution paths and monitoring the log of operating system services requested by the malware.
REFERENCES
[1] http://en.wikipedia.org/wiki/Malware.
[2] U. Bayer, C. Kruegel, and E. Kirda. TTAnalyze: A Tool for Analyzing Malware. In 15th Annual Conference of the European Institute for Computer Antivirus Research (EICAR), 2006.
[3] A. Moser, C. Kruegel, and E. Kirda. Exploring Multiple Execution Paths for Malware Analysis. Secure Systems Lab, Technical University Vienna, 2007.
[4] P. Trinius, T. Holz, J. Gobel, and F. C. Freiling. Visual Analysis of Malware Behaviour Using Treemaps and Thread Graphs. Laboratory for Dependable Distributed Systems, University of Mannheim, Germany, 2009.