Leveraging the Cloud for Software Security Services
by
Jonathan Clarke Oberheide
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
(Computer Science and Engineering) in The University of Michigan
2012
Doctoral Committee:
Professor Farnam Jahanian, Chair
Professor Peter M. Chen
Associate Professor Zhuoqing Mao
Assistant Professor Eytan Adar
Assistant Research Scientist Michael Donald Bailey
Craig Partridge, BBN Technologies
© Jonathan Clarke Oberheide 2012
All Rights Reserved
ACKNOWLEDGEMENTS
First and foremost, I’d like to thank my advisor Farnam Jahanian. Throughout my undergrad, masters, and doctoral studies at the University of Michigan, Farnam has been a close advisor to me on personal, academic, and professional levels. I cannot thank Farnam enough for the opportunities enabled by his guidance over the past decade of my life.
I want to thank the rest of my doctoral committee, including Michael Bailey, Morley Mao, Peter Chen, Craig Partridge, and Eytan Adar, all of whom have provided guidance, feedback, and encouragement throughout the doctoral process. I’d like to thank Bailey, especially, who has acted as a great mentor and friend in the doctoral process and provided invaluable leadership and direction in our research group. Special thanks to Morley as well for allowing me to work with her research group as an undergrad, providing me an early glimpse into the academic process.
I also want to thank the many people and partners outside the University that made this dissertation possible. This includes our colleagues at Arbor Networks, who have collaborated with our research group on projects ranging from malware analysis to large-scale network traffic analysis. Thanks to Jose Nazario, Dug Song, Craig Labovitz, Rob Malan, and the rest of the Arbor crew. I’d like to thank the folks at Merit Network for their collaboration and material support throughout the years. Manish Karir, Bert Rossi, and Larry Blunk were instrumental in our cooperation with Merit and its member institutions. Special thanks to Manish, who introduced me to the wonderful world of academia and guided me through my first steps in the research process as an undergrad. Thanks also to the IT and security staff at the University, spanning groups such as ITSS, CITI, CAEN, and DCO. Folks including Jim Rees, Paul Howell, Matt Bing, Don Winsor, and Laura Fink have been a great help in providing infrastructure services, access to data, and operational advice.
I’d like to thank all of the software faculty at the University that I would have loved to have on my doctoral committee if there were no limit on size, including Peter Honeyman, Jason Flinn, Alex Halderman, Brian Noble, and Atul Prakash. And, of course, I’d like to thank all the fellow doctoral students at the University that have provided friendship and entertainment in our CSE office, including Evan Cooke, Sushant Sinha, Yunjing Xu, Timur Alperovich, Mona Attariyan, Kaushik Veeraraghavan, Jodie Su, Dan Peek, Eric Vander Weele, Kelsey Harris, and Kaustubh Nyalkalkar. In particular, Evan acted as a great mentor, sounding board, and friend during my early years in the doctoral program.
Finally, and most importantly, I wish to thank my family. This thesis is dedicated to my mother Julie, my father Clarke, my sister Kristin, and my girlfriend Heidi.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTERS
1 Introduction
1.1 Previous Approaches
1.1.1 The Evolution of Threats
1.1.2 Network-Centric Security Mechanisms
1.1.3 Host-Centric Security Mechanisms
1.2 Our Approach
1.2.1 The Evolution of Computing
1.2.2 Cloud-Centric Software Security Services
1.2.3 Properties of Cloud-Centric Services
1.3 Contributions
1.4 Structure of Thesis
2 N-Version Malware Detection in the Cloud
2.1 Limitations of Antivirus Software
2.1.1 Vulnerability Window
2.1.2 Antivirus Software Vulnerabilities
2.2 Approach
2.2.1 Deployment Environment
2.2.2 Cloud-Based Detection
2.2.3 N-Version Protection
2.3 Architecture
2.3.1 Client Software
2.3.2 Cloud Service
2.3.3 Archival and Forensics Service
2.4 CloudAV Implementation
2.4.1 Host Agent
2.4.2 Cloud Service
2.4.3 Management Interface
2.5 Evaluation
2.5.1 Malware Dataset Results
2.5.2 Deployment Results
2.6 Discussion and Limitations
2.6.1 User Context and Environment in Detection Engines
2.6.2 Disconnected Operation
2.6.3 Sources of Malicious Behavior
2.6.4 Detection Engine Licensing
2.6.5 Managing False Positives
2.6.6 Breaking Free of Vendor Lock-in
2.7 Related Work
2.8 Summary
2.8.1 Leveraging the Cloud
3 Protecting Mobile Devices with a Cloud Service
3.1 Mobile CloudAV
3.1.1 Mobile Agent
3.1.2 Mobile-Specific Behavioral Engine
3.1.3 Connectivity and Mobile Data Usage
3.1.4 Additional Security Services
3.2 Evaluation
3.2.1 Computational Resources
3.2.2 Power Consumption
3.2.3 Scale of Detection Algorithms
3.2.4 On-Device Software Complexity
3.3 Related Work
3.4 Summary
3.4.1 Leveraging the Cloud
4 The Dark Side of Cloud Services: Crimeware as a Service
4.1 AvP: Antivirus vs. Packers
4.1.1 Packer Classification
4.1.2 Antivirus Detection
4.2 The PolyPack Cloud Service
4.2.1 PolyPack Architecture
4.2.2 PolyPack Features
4.2.3 Future Capabilities
4.3 Evaluation
4.4 Related Work
4.5 Summary
4.5.1 Leveraging the Cloud
5 Large-Scale Analysis and Classification of Malicious Software
5.1 Understanding Malware Labeling
5.2 Properties of a Labeling System
5.3 Limitations of Antivirus Labeling
5.3.1 Consistency
5.3.2 Completeness
5.3.3 Conciseness
5.4 Behavior-based Malware Clustering
5.4.1 Defining and Generating Malware Behaviors
5.4.2 Clustering of Malware
5.4.3 Comparing Individual Malware Behaviors
5.4.4 Constructing Relationships Between Malware
5.4.5 Extracting Meaningful Groups
5.5 Evaluation
5.5.1 Performance and Parameterization
5.5.2 Comparing Antivirus Groupings and Behavioral Clustering
5.5.3 Measuring Completeness, Conciseness, and Consistency
5.5.4 Application of Clustering and Behavior Signatures
5.6 Related Work
5.7 Summary
5.7.1 Leveraging the Cloud
6 A Cloud-Centric Service for Robust and Resilient Threshold Signatures
6.1 Challenges in Cryptography and PKI
6.1.1 Private Key Secrecy
6.1.2 Failure of Revocation Mechanisms
6.2 CloudCard Architecture
6.2.1 Threshold Signatures with CloudCard
6.2.2 CloudCard Operation
6.2.3 Notable Features of CloudCard
6.3 Integration and Implementation
6.3.1 TC-RSA Library
6.3.2 PKCS#11 Module
6.3.3 ssh-keygen-tcrsa
6.3.4 Cloud Service
6.3.5 Mobile Application
6.3.6 Other Applications and Integrations
6.4 Deployment and Evaluation
6.4.1 Key Generation
6.4.2 Signature Generation
6.4.3 Signature Verification
6.4.4 End-to-End Performance
6.4.6 Limitations
6.5 Related Work
6.6 Summary
6.6.1 Leveraging the Cloud
7 Discussion and Conclusion
7.1 Summary of Contributions
7.2 Insights and Lessons
7.3 Future Work
LIST OF TABLES
Table
2.1 A distribution of the sources of 1,000 executables observed during the deployment of our host agent over a six-month period.
2.2 The percentage increase in detection coverage obtained when ClamAV, a truly free engine, is added to a deployment with only a single engine.
2.3 The number of false positives observed at each engine threshold, and the associated detection coverage over the full malware dataset.
3.1 An example of the increased detection coverage against a dataset of a recent month’s worth of malware samples when using multiple engines in parallel: ClamAV (CM), Symantec (SM), McAfee (MA), BitDefender (BD), and F-Secure (FS).
3.2 Comparison of the mobile agent with ClamAV in memory consumption and CPU jiffies on the Nokia N800.
3.3 Comparison of the mobile agent with Kaspersky Mobile Security on the Nokia N95.
3.4 The number of threats addressed in the signature database of various detection engines.
4.1 For each packer, we list the increase over the unpacked binaries of the total number of antivirus evasions across all binaries (out of 2,080) and the median/average number of evasions per binary (out of 10).
4.2 The number of occurrences a packer produced the optimal packing for each of the 208 distinct samples.
4.3 Parallels exist between the cloud computing models of legitimate services and crimeware services.
5.1 The number of unique labels provided by five antivirus engines is listed for each dataset.
5.2 The percentage of time two binaries classified as the same by one antivirus are classified the same by other antivirus products. Malware is inconsistently classified across antivirus vendors.
5.3 The percentage of malware samples detected across datasets and antivirus
5.4 The ways in which various antivirus products label and group malware. Antivirus labeling schemes vary widely in how concisely they represent the malware they classify.
5.5 10 unique malware samples. For each sample, the number of process, file, registry, and network behaviors observed and the classifications given by various antivirus vendors are listed.
5.6 A matrix of the NCD between each of the 10 malware samples in our example.
5.7 The clusters generated via our technique for the malware listed in Table 5.5.
5.8 The completeness, conciseness, and consistency of the clusters created with our algorithm on the large dataset as compared to various antivirus vendors.
5.9 The top five malware behaviors observed by type.
6.1 Timings of key generation across RSA schemes, key sizes, and CPU types.
6.2 Timings of signature generation across RSA schemes, key sizes (in bits), and CPU types.
6.3 Combined TC-RSA (3,3) timings for signature generation, signature combination, and the network communication overhead for a full CloudCard SSH login sequence in the pessimistic model of all operations occurring serially.
LIST OF FIGURES
Figure
2.1 Detection rate for 10 popular antivirus products as a function of the age of the malware samples.
2.2 Number of vulnerabilities reported in the National Vulnerability Database (NVD) for 10 antivirus vendors between 2005 and 2007.
2.3 Architectural approach for cloud-centric file analysis service.
2.4 Screen captures of the detection engine VM monitoring interface (a) and the web management portal which provides access to forensic data and threat reports (b).
2.5 The average detection coverage for the various datasets (a) and the continuous coverage over time (b) when a given number of engines are used in parallel.
2.6 Executable launches (a) and unique executable launches (b) per day over a one month period in a representative sample of 50 machines in the deployment.
4.1 The top 10 packer classes in our AML dataset as determined by PEiD and SigBuster.
4.2 The fraction of detected binaries for 23 antivirus products and 35 most popular packers.
4.3 Conceptual overview of the PolyPack architecture.
5.1 A Venn diagram of malware samples labeled as SDBot variants by three antivirus products.
5.2 On the left, a tree consisting of the malware from Table 5.5 has been clustered via a hierarchical clustering algorithm whose distance function is normalized compression distance. On the right, a dendrogram illustrating the distance between various subtrees.
5.3 The memory and runtime required for performing clustering based on the number of malware clustered (for a variety of different-sized malware behaviors).
5.4 On the left, the number of clusters generated for various values of the inconsistency parameter and depth. On the right, the trade-off between the number of clusters, the average cluster size, and the inconsistency value.
6.1 The evolution of past approaches designed to protect the secrecy of private key material.
6.2 Traditional secret sharing schemes have to combine the split key material in a single location in order to perform a cryptographic operation as illustrated in (a). Using threshold cryptography, partial signatures can be combined to generate a full signature without ever having to combine the split key material in a single location as illustrated in (b).
6.3 Our proposed CloudCard architecture uses (3,3) threshold RSA to generate a signature across a client’s host, a cloud service, and a mobile device.
6.4 A screenshot of the CloudCard mobile application displaying a signature
ABSTRACT
Leveraging the Cloud for Software Security Services
by
Jonathan Clarke Oberheide
Chair: Farnam Jahanian
This thesis seeks to leverage the advances in cloud computing in order to address modern security threats, allowing for completely novel architectures that provide dramatic improvements and asymmetric gains beyond what is possible using current approaches. Indeed, many of the critical security problems facing the Internet and its users are inadequately addressed by current security technologies. Current security measures are often deployed in an exclusively network-based or host-based model, limiting their efficacy against modern threats. However, advancements over the past decade in cloud computing and high-speed networking have ushered in a new era of software services. Software services that were previously deployed on-premise in organizations and enterprises are now being outsourced to the cloud, leading to fundamentally new models in how software services are sold, consumed, and managed.
This thesis focuses on how novel software security services can be deployed that leverage the cloud to scale elegantly in their capabilities, performance, and management. First, we introduce a novel architecture for malware detection in the cloud. Next, we propose a cloud service to protect modern mobile devices, an ever-increasing target for malicious attackers. Then, we discuss and demonstrate the ability for attackers to leverage the same benefits of cloud-centric services for malicious purposes. Next, we present new techniques for the large-scale analysis and classification of malicious software. Lastly, we present a threshold signature scheme that leverages the cloud for robustness and resiliency.
Thesis Statement: By leveraging properties inherent to cloud computing, it is possible to design new classes of cloud-centric software security services that scale elegantly in their capabilities, performance, and management.
CHAPTER 1
Introduction
Security threats have plagued Internet-connected devices for some time. In particular, malicious software has enabled attackers to achieve financial gains from botnets, credential theft, spam, denial-of-service, phishing, and other attacks. In recent years, the scale and sophistication of attacks on end users have increased dramatically. Therefore, it is vital to the overall health of the Internet and its users that we develop effective and efficient security mechanisms that are able to deter, detect, and defend against modern malicious threats.
1.1 Previous Approaches
1.1.1 The Evolution of Threats
Over the past decade, we’ve observed a distinct evolution of malicious threats. In the first half of the decade, we saw the explosion of network-based threats on the Internet. Early denial-of-service (DoS) attacks took down high-profile websites such as Yahoo, Amazon, and CNN with little difficulty by flooding them with excessive traffic and requests [146]. As attackers realized that taking control of an end host was more powerful than simply knocking it offline, the era of the flash worm began. Flash worms such as Code Red [131], Slammer [130], and Witty [160] targeted vulnerabilities in network-facing operating systems and services. These threats wreaked havoc on the availability and integrity of network infrastructure and vulnerable services exposed and listening on the Internet.
As the attack surface of network-facing services was minimized through proper isolation and access control, broad patching and mitigation of remote code execution vulnerabilities, and other mechanisms, the efficacy of network-based threats was reduced. In the second half of the decade, malicious threats evolved to target end hosts and users directly. Instead of exploiting network-facing services, attackers began to target the large host-based attack surface presented by client-side applications such as web browsers, PDF viewers, and office suites [182]. By simply tricking a user into visiting an untrusted link and exploiting a web browser vulnerability, the attacker could take full control of the user’s host by installing their own malicious software, also known as malware. Many attackers realized that persistent, stealthy control of a large number of compromised end hosts enabled powerful attacks and new monetization models. This realization led to the era of botnets [50, 19]. Even as recent efforts to reduce the client-side attack surface of end hosts have made progress, attackers have continued to evolve their host-based attacks to include social engineering in order to trick users into installing malicious software [178, 92].
With concern, we’ve also observed an evolution in the sophistication of the actors behind the malicious attacks. In the early 2000s, many attacks were perpetrated by amateurs looking to explore the bounds of computing and security. Bored teenagers, politically-motivated hacktivists, and feuding hacker groups were frequently responsible for website defacements and denial-of-service attacks. As bad actors realized that many types of attacks such as financial fraud, spam, and denial-of-service could be monetized, Internet-based attacks turned into a lucrative business opportunity. Today, it is well-understood that rich underground crimeware markets, organized cybercrime groups, and well-funded and sophisticated adversaries are active and responsible for many attacks on the Internet [80, 77]. With accusations of state-sponsored attacks and terms like cyberwar now being thrown around [72, 39], it is clear that the security landscape has drastically evolved in the past decade.
To address the increasing sophistication and scale of malicious threats, researchers have proposed a broad range of software security technologies over the past decade, including systems such as intrusion detection systems, intrusion prevention systems, firewalls, and antivirus software. These systems have become essential components in detecting malicious attacks and protecting end hosts and users.
While software security mechanisms can take many forms, a frequent goal for many such approaches is to observe network or host activity and distinguish between legitimate and malicious activity. We classify common software security mechanisms as network-centric or host-centric, depending on their deployment model and which type of activity they observe and inspect. For example, we would classify a sensor device placed on a network link to passively listen for malicious traffic destined for an enterprise network as a network-centric approach. On the other hand, we would classify antivirus software installed on every end host in an enterprise that inspects host activity to detect malicious software as a host-centric approach. As we will discuss, both models have pros and cons, depending on which type of threats they are designed to address.
1.1.2 Network-Centric Security Mechanisms
In a network-centric model, mechanisms such as network access control (NAC), intrusion detection (NIDS), and intrusion prevention (NIPS) are commonly deployed on an organization’s network to inspect traffic sourced from and destined to the end hosts to be protected. Network-based intrusion detection systems may employ both signature-based matching to detect known malicious attacks [37, 139] and anomaly detection algorithms to detect unknown attacks [93, 25]. Other detection systems may operate on a flow-based granularity using formats such as NetFlow [95] instead of operating on live network traffic. A network-centric approach typically offers a wide breadth of visibility across hosts on the network, but lacks depth of visibility because it can only observe the bits present on the network link and not what those bits represent when received and processed by an end host. Network-centric mechanisms were prevalent during the early 2000s when dealing with the propagation and detection of flash worms [131, 130, 160]. More recently, network-based sensors have focused on the detection of malicious traffic and botnet activity [51, 86, 85].
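The two detection styles named above can be contrasted with a minimal sketch. It is an illustration only, not the mechanism used by any of the cited systems: the byte patterns in `SIGNATURES`, the function names, and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical byte signatures; real NIDS rule sets are far richer than this.
SIGNATURES = {
    "example-worm-a": b"EVIL_PAYLOAD_A",
    "example-exploit-b": b"EVIL_PAYLOAD_B",
}


def match_signatures(payload: bytes) -> list[str]:
    """Signature-based matching: report every known pattern found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Anomaly detection: flag a per-interval flow count that deviates from the
    historical mean by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough history to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


if __name__ == "__main__":
    print(match_signatures(b"header EVIL_PAYLOAD_B trailer"))
    print(is_anomalous([10, 12, 11, 9, 10], 500))
```

The sketch shows the trade-off described in the text: the signature matcher only recognizes payloads it already has a pattern for, while the anomaly check needs no prior knowledge of the attack but sees only aggregate counts, not what the traffic means to the receiving host.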
However, as we’ve seen threats migrate from network attacks to host attacks, it has become increasingly difficult for network-centric approaches to maintain their efficacy. Inspecting network traffic delivered over SSL transports that are becoming more commonplace is just one of the challenges. As attackers increasingly target users via client-side applications such as web browsers, PDF viewers, and office suites, network-centric mechanisms must be aware of a wide variety of complex file formats and have deep inspection capabilities to detect malicious code [182]. For example, attackers may deliver a browser exploit via obfuscated JavaScript, making it extremely difficult for a network-based sensor to detect the presence of malicious intent. Lastly, it is common nowadays to see encrypted botnet command and control traffic [17], causing challenges in identifying infected end hosts for network-centric mechanisms.
1.1.3 Host-Centric Security Mechanisms
As malicious threats migrated more and more towards targeting end hosts in the second half of the decade, more emphasis was placed on researching and developing host-centric security mechanisms. In a host-centric model, software such as antivirus and host-based intrusion detection systems (HIDS) is deployed on end hosts to monitor host activity and block malicious attacks. The most common approach for protecting end hosts is traditional antivirus engines [48, 129]. Such antivirus engines commonly operate on a file-based granularity to scan for malicious software that may enter an end host via a number of vectors. HIDS can also monitor and restrict system-level activity at execution time to detect malicious activity. Such approaches have been extensively explored in academia, whether observing instruction and system-call level information [44], analyzing OS and application information flow [188], or monitoring host activity from a hypervisor perspective [101].
Host-centric mechanisms may be able to inspect deep information and activity on a single end host, but they lack the global network-wide visibility that network-centric systems can often provide [118, 49, 151]. While both network-centric and host-centric systems are commonly deployed to attempt to detect and defend against malicious threats, it is clear that Internet-connected devices and users remain under continued attack. Therefore, this thesis advocates for exploring new deployment models beyond host-centric and network-centric perspectives as threats continue to evolve in sophistication and scale.
1.2 Our Approach
1.2.1 The Evolution of Computing
While security threats have evolved considerably over the past decade, we’ve also witnessed significant advancements in the realm of computing. In particular, the introduction of cloud computing [172, 179, 126], bolstered by the popularization of x86 virtualization and increased availability of high-speed, low-latency networking [20, 156, 3], has ushered in a new era of software services. Cloud computing has enabled a wide range of new service models, including infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS), that change the way modern software services are sold, consumed, and managed [78, 144, 190, 42]. Cloud computing is succinctly defined in [14] as follows:
Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services.
The services themselves have long been referred to as Software as a Service (SaaS). The datacenter hardware and software is what we will call a Cloud. When a Cloud is made available in a pay-as-you-go manner to the general public, we call it a Public Cloud; the service being sold is Utility Computing.
Thus, Cloud Computing is the sum of SaaS and Utility Computing.
Software services that were previously deployed on-premise in organizations and enterprises in a network-centric or host-centric model are now being outsourced to the cloud. Cloud-delivered services are leveraging the beneficial properties of cloud computing, such as reliability, scalability, performance, management, and cost, to more effectively deliver software technologies [14].
1.2.2 Cloud-Centric Software Security Services
While the revolution in cloud computing presents interesting security challenges in itself [100, 105, 40], this thesis seeks to understand how cloud computing and a SaaS deployment model can enable more effective software security services. That is, rather than investigating how we can “protect the cloud”, this thesis seeks to understand how the cloud can protect us. While previous work has considered host-centric and network-centric approaches, this thesis focuses on how novel cloud-centric architectures can provide improved efficacy against modern security threats. While this thesis advocates for the exploration of cloud-centric services, it is not aimed at discounting existing approaches, just as host-centric services didn’t completely replace network-centric services as threats moved from the network to the end host. Rather, we see cloud-centric services as the next phase in the deployment model of security services that follows from the intersection of the evolution of threats and evolution of cloud computing capabilities observed over the past decade.
1.2.3 Properties of Cloud-Centric Services
This thesis argues that many of the properties inherent to cloud computing enable us to construct effective and efficient cloud-centric software security services. While most of the properties may provide beneficial gains for software security services, there may also be trade-offs or potential negative consequences that must be carefully considered. As this thesis is deeply rooted in practical issues, these properties may have significant impact on the design of software security services and on real-world efficacy, deployment, and management. Several of the properties of cloud computing that are relevant to the security realm include the following:
• Global visibility and network effects: The potential for global visibility offered by a cloud-hosted service can provide significant value to a security service. In many scenarios, the more users or devices participating in a cloud-centric security service, the more intelligence can be collected, which can result in a network effect that increases the efficacy of the security service [118, 49, 151]. The visibility and network effects of cloud-centric security services may span across hosts, networks, organizations, and even globally across the entire Internet. On the other hand, aggregating globally-scoped data in a centralized multi-tenant cloud environment may have an impact on cross-user and cross-organization privacy and confidentiality.
• Scalability and elasticity: Given the dramatic increase in malicious software and attacks [128, 127], it is imperative that any security mechanisms be rapidly scalable. In addition, such mechanisms need to be sufficiently elastic to handle highly variable security workloads. Failing to scale and adapt to heavy or variable loads, whether legitimately or intentionally generated, can potentially result in the failure of the security service. Cloud computing and public IaaS providers allow for the construction of highly scalable and elastic security services.
• Agility and flexibility: Cloud computing enables agility and flexibility through the rapid deployment of new functionality and software updates compared to the slow delivery model of traditional on-premise software [14, 82]. Cloud-centric security services can leverage such agility and flexibility in order to keep pace with an adaptive adversary. For example, updates or heuristics for the latest security threats may be deployed in real-time in a cloud-centric service, as opposed to pushing down software updates to an enterprise that will deploy them on-premise. In the field of security where threats change on a daily basis, this agility has very practical benefits. On the other hand, agility and flexibility may also empower attackers to build effective malicious cloud services themselves, which we will explore later in the thesis.
• Availability and survivability: Contrary to popular belief about the reliability of public cloud computing environments [113], the functionality offered by many IaaS providers and cloud computing platforms can enable the construction of highly available and survivable systems beyond what is typically feasible with fixed infrastructure [14]. Cloud-centric security services can leverage such functionality to achieve global accessibility and reliable connectivity. High availability is an absolute necessity when deploying security services that are often in the critical hot-path of
• Trustworthiness and host-proofing: Moving security functionality to a cloud-centric service can introduce risk to confidentiality and integrity if the cloud provider is untrusted. However, some classes of cloud-centric services can be designed and deployed in such a way that does not sacrifice trustworthiness, even in the face of a compromised or malicious IaaS provider [65, 185]. By host-proofing software security services [60], one may achieve guarantees about the confidentiality and integrity of data handled by the cloud service. While achieving full host-proofing may not be feasible for all types of security services, maintaining as much trustworthiness as possible is an important goal to keep in mind when constructing cloud-centric services.
By leveraging these advancements in cloud computing, we are able to develop and deploy novel security services in a cloud-centric architecture that can scale elegantly in terms of their capabilities, performance, and management.
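To make the host-proofing property listed above concrete, the following is a minimal sketch of one narrow slice of it: detecting tampering by an untrusted storage provider. It is an assumption-laden illustration, not a design taken from this thesis. `UntrustedStore`, `seal`, and `unseal` are hypothetical names, and the sketch covers integrity only; confidentiality would additionally require encrypting the data before upload.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


class UntrustedStore:
    """Hypothetical stand-in for a cloud provider that may modify stored data."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, name: str, blob: bytes) -> None:
        self.blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self.blobs[name]


def seal(key: bytes, data: bytes) -> bytes:
    """Client side: prepend an HMAC tag computed with a key the provider never sees."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def unseal(key: bytes, blob: bytes) -> bytes:
    """Client side: recompute the tag and reject the blob if it does not match."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: stored data was modified")
    return data
```

Because the key stays on the client, the provider can store and return the blob but cannot alter it without the client noticing on retrieval. This is the sense in which a service can retain some trustworthiness even when the infrastructure hosting it is not trusted.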
1.3 Contributions
The main contributions of this thesis are the following:
• N-Version Malware Detection in the Cloud
While it’s generally known in the security community that antivirus software is not a perfect solution, we provide novel quantification of the failure of commercial host-centric antivirus in terms of its detection coverage, attack surface, vulnerability window, and classification capabilities. To address these limitations, we introduce the CloudAV architecture, a core piece of research that moves the complexity of malware detection off of the end host and into the cloud. Deploying CloudAV security services in the cloud enables N-version protection, retrospective detection, forensic data collection, centralized management and policy definition, and a wide range of practical deployment benefits. Through a real-world implementation and evaluation, we find that migrating malware detection functionality from a host-centric to cloud-centric model is an effective approach.
• Protecting Mobile Devices with a Cloud Service
While our results indicate that a cloud-centric approach to malware detection is quite effective when deployed on traditional end hosts, offloading analysis to the cloud lends itself naturally to the resource-constrained mobile environment. As mobile devices and applications become an enticing target for attackers, protecting these devices in an efficient and scalable manner is vital. Such protection is complicated by the diversity of mobile platforms and the intricacies of each platform’s security model. Through implementations and evaluations on a range of modern mobile platforms, our research indicates that cloud-based malware detection is indeed an energy- and resource-efficient approach to mobile device security.
• The Dark Side of Cloud Services: Crimeware as a Service
While cloud-based security services have great potential for deploying protective mechanisms, it would be incomplete to ignore the fact that the positive attributes of cloud-centric services may also be abused to benefit attackers. To demonstrate the potential for malicious cloud services, we construct PolyPack, an example of an emerging model known as crimeware-as-a-service (CaaS). PolyPack is a cloud-based service for automated malware obfuscation and antivirus evasion that integrates a wide range of packers in front of CloudAV’s backend antivirus engines, allowing researchers to understand the impact of malware packers on antivirus efficacy. Based on our observations in the underground crimeware markets, we anticipate that CaaS will present an attractive approach for attackers and a concerning development for defenders in the near future.
• Large-Scale Analysis and Classification of Malicious Software
Given the staggering growth of malicious software [128, 127], malware classification has emerged as an important mechanism to understand and quantify the malware epidemic. To address the limitations of existing automated classification and analysis tools, we developed and evaluated a dynamic analysis approach, based on the execution of malware in virtualized environments that are hosted in a cloud-centric service and the causal tracing of the operating system objects affected during malware execution. The results of our analysis and clustering of runtime behaviors exhibit strong consistency, completeness, and conciseness. Furthermore, our research illustrates the necessity for scalable, high-performance cloud-centric architectures for heavyweight analysis of malicious software.
• A Cloud-Centric Service for Robust and Resilient Threshold Signatures
Lastly, to demonstrate that the benefits of a cloud-oriented architecture apply to software security services beyond those directly related to malicious software, we design a threshold signature scheme that leverages a cloud service. Threshold cryptography offers attractive properties for requiring a number of independent parties to cooperate in order to perform a cryptographic operation, such as generating an RSA signature across multiple devices. We introduce an architecture, called CloudCard, that employs a threshold signature scheme with key material split across a user’s host, a user’s mobile device, and a cloud service. Using this (3,3) threshold RSA scheme, our CloudCard approach enables improved secrecy of private key material, flexible and fast revocation, and out-of-band signature confirmation, all while maintaining full compatibility with existing PKCS#11-enabled applications and interfaces. CloudCard also demonstrates that trustworthiness can be maintained even when moving security functionality to a cloud-centric service.
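To make the idea of a (3,3) threshold RSA signature concrete, the following is a minimal sketch that assumes a simple additive split of the private exponent, so that all three parties must contribute a partial signature. The additive split, the toy parameters, and the function names are illustrative assumptions; the construction actually used by CloudCard is described in Chapter 6.

```python
import secrets

# Toy RSA parameters for illustration only (far too small for real use).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def split_exponent(d, phi, parties=3):
    """Split d into additive shares whose sum is congruent to d modulo phi."""
    shares = [secrets.randbelow(phi) for _ in range(parties - 1)]
    shares.append((d - sum(shares)) % phi)
    return shares

def partial_sign(m, share, n):
    """Each party raises the message representative to its own share."""
    return pow(m, share, n)

def combine(partials, n):
    """Multiply the partial signatures to obtain m^d mod n."""
    signature = 1
    for part in partials:
        signature = (signature * part) % n
    return signature

# Hypothetical holders of the three shares: host, mobile device, cloud service.
shares = split_exponent(d, phi)
m = 65                         # stands in for a padded message digest
partials = [partial_sign(m, share, n) for share in shares]
signature = combine(partials, n)

# Verification uses only the ordinary RSA public key (n, e).
print(pow(signature, e, n) == m)
```

Because the combined value is an ordinary RSA signature, a verifier needs no knowledge of the split, which is consistent with the compatibility goal stated above. Leaving out any one share yields a value that in general fails verification.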
1.4 Structure of Thesis
This thesis is organized into five primary chapters. Chapter 2 presents a novel architecture for the deployment of malware detection services in the cloud. In Chapter 3, we discuss leveraging a cloud service to protect modern mobile devices. Chapter 4 explores the dark side of cloud services and the potential for attackers to leverage the same benefits of a cloud-centric service for malicious purposes. In Chapter 5, we discuss new techniques for the large-scale analysis and classification of malicious software. In Chapter 6, we present a threshold signature scheme that leverages the cloud for robustness and resiliency. Finally, Chapter 7 concludes the thesis with a review of the lessons learned throughout this research and a look toward the future work in cloud-centric software security services.
CHAPTER 2
N-Version Malware Detection in the Cloud
In this chapter, we explore the problem of detecting malicious software using a cloud-centric architecture called CloudAV. As with most approaches of “enumerating badness” in security, detecting malicious software is a non-trivial problem. The vast, ever-increasing ecosystem of malicious software and tools presents a daunting challenge for network operators and IT administrators. Currently, antivirus is one of the most widely-used host-centric software security mechanisms for detecting and mitigating malicious software. However, the rising sophistication of modern malware means that it is increasingly challenging for any single vendor to develop signatures for each new threat.
To address the growing sophistication and scale of malware, we propose a new cloud-centric deployment model for the detection of malicious software that is a departure from the traditional host-centric model of antivirus software. This architectural shift is characterized by two key changes:
• Antivirus as a cloud service: First, we propose that the detection capabilities currently provided by host-based antivirus software can be more efficiently and effectively provided as a cloud-centric security service. Instead of running complex analysis software on every end host, we suggest that each end host run a lightweight process to detect new files, send them to a cloud service for analysis, and then permit access to or quarantine them based on a report returned by the cloud service.
• N-version protection: Second, the identification of malicious software should be determined by multiple, heterogeneous detection engines in parallel. Similar to the idea of N-version programming, we propose the notion of N-version protection and suggest that malware detection systems leverage the detection capabilities of multiple, heterogeneous detection engines to more effectively determine malicious files.
This new cloud-centric model provides several important and practical benefits:
• Better detection of malicious software: Antivirus engines have complementary detection capabilities, and a combination of many different engines can improve the overall identification of malicious software.
• Enhanced forensic capabilities: Information about which hosts accessed which files provides an incredibly rich database of information for forensics and intrusion analysis. Such information provides temporal relationships between file access events on the same or different hosts.
• Retrospective detection: When a new threat is identified, historical information can be used to identify exactly which hosts or users opened similar or identical files. For example, if a new bot infection is detected, a cloud-based antivirus service can use the execution history of hosts on a network to identify which hosts have been infected and notify administrators or even automatically quarantine infected hosts.
• Improved deployability and management: Moving detection off the host and into the network significantly simplifies host software, enabling deployment on a wider range of platforms and enabling administrators to centrally control signatures and enforce file access policies.
To explore and validate this new model of deploying anti-malware security software, we design a cloud-based architecture that consists of three major components: a lightweight host agent run on end hosts that identifies new files and sends them into the network for analysis; a cloud service that receives files from hosts and identifies malicious content; and an archival and forensics service that stores information about analyzed files and provides a management interface for operators.
We construct, deploy, and evaluate a production cloud antivirus system called CloudAV. CloudAV includes a lightweight, cross-platform host agent for Windows, Linux, and FreeBSD and a cloud service consisting of 10 antivirus engines and two behavioral detection engines. We provide a detailed evaluation of the system using a dataset of 7,220 malware samples collected in the wild over a period of a year [136] and a production deployment of our system on a campus network in computer labs spanning multiple departments for a period of over six months.
Using the malware dataset, we show how the CloudAV N-version protection approach provides 35% better detection coverage against recent threats compared to a single antivirus engine and 98% detection coverage of the entire dataset, compared to 83% with a single engine. In addition, we show how our architecture enables advanced functionality, such as retrospective detection, which can greatly mitigate the impact of the large window of vulnerability presented by antivirus products.
Finally, we analyze the performance and scalability of the system using deployment results and show that while the total number of executables run by all the systems in a computing lab is quite large (an average of 20,500 per day), the number of unique executables run per day is two orders of magnitude smaller (an average of 217 per day). This means that the caching mechanisms employed in the cloud service achieve a hit rate of over 99.8%, reducing the load on the network, and in the rare case of a cache miss, we show that the average time required to analyze a file using CloudAV’s detection engines is approximately 1.3 seconds.
2.1 Limitations of Antivirus Software
The ubiquitous deployment of antivirus software is closely tied to the ever-expanding ecosystem of malicious software and tools. The rise of botnets and targeted malware attacks for the purposes of spam, fraud, and identity theft presents an evolving challenge for antivirus companies. For example, the recent Storm worm demonstrated the use of encrypted peer-to-peer command and control and the rapid deployment of new variants to continually evade the signatures of antivirus software [17].
AV Vendor      Version      3 Months   1 Month   1 Week
Avast          4.7.1043     62.7%      45.8%     39.6%
AVG            7.5.503      83.8%      78.6%     72.2%
BitDefender    7.1.2559     83.9%      79.7%     78.5%
ClamAV         0.91.2       57.5%      48.8%     46.8%
CWSandbox      2.0          N/A        N/A       N/A
F-Prot         6.0.8.0      70.4%      49.6%     46.0%
F-Secure       8.00.101     80.9%      74.4%     60.3%
Kaspersky      7.0.0.125    89.2%      84.0%     78.5%
McAfee         8.5.0i       70.5%      56.7%     53.9%
Norman         1.8          N/A        N/A       N/A
Symantec       15.0.0.58    60.8%      38.8%     45.2%
Trend Micro    16.00        79.4%      74.6%     75.3%
(a)
[Plot: percent detected (y-axis, 20 to 100) versus recency of malware sample (x-axis: 1 year, 3 months, 1 month, 1 week, 1 day) for Avast, AVG, BitDefender, ClamAV, F-Prot, F-Secure, Kaspersky, McAfee, Symantec, and TrendMicro]
(b)
Figure 2.1: Detection rate for 10 popular antivirus products as a function of the age of the malware samples.
However, two important trends, which we detail in this section, call into question the long-term effectiveness of products from commercial antivirus vendors. The first is that antivirus software fails to detect a significant percentage of malware in the wild. Moreover, there is a significant vulnerability window between when a threat first appears and when antivirus vendors generate a signature or modify their software to detect the threat. This means that end systems with the latest antivirus software and signatures can still be vulnerable for long periods of time. The second important trend is that the increasing complexity of antivirus software and services has indirectly resulted in vulnerabilities that can be and are being exploited by malware. That is, malware is actually exploiting vulnerabilities in antivirus software as a means to infect systems.
2.1.1 Vulnerability Window
The sheer volume of new threats means that it is difficult for any single antivirus vendor to create signatures for all new threats. The ability of a vendor to create signatures for new threats is dependent on many factors, such as detection algorithms, collection methodology of malware samples, and response time to 0-day malware. The end result is that there is often a significant period of time between when a threat appears and when a signature is created by antivirus vendors. This period of time is known as the vulnerability window.
To quantify the vulnerability window, we analyzed the detection rate of multiple antivirus engines across malware samples that were collected over a one-year period. The dataset included 7,220 samples that were collected between November 11th, 2006, and November 10th, 2007. The malware dataset is described in further detail in our evaluation. The signatures used for the antivirus were updated the day after collection ended (November 11th, 2007) and stayed constant throughout the analysis.
In the first experiment, we analyzed the detection of recent malware. We created three groups of malware: one that included malware collected more recently than three months ago, one that included malware collected more recently than one month ago, and one that included malware collected more recently than one week ago. The antivirus engine and signature versions, along with their associated detection rates for each time period, are listed in Figure 2.1(a). The table clearly shows that the detection rate decreases as the malware becomes more recent. Specifically, the number of malware samples detected in the one week time period, arguably the most recent and important threats, is quite low.
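The recency grouping used in this experiment can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the thesis: the `Sample` record, its fields, the engine names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Sample:
    collected: date          # day the sample was captured
    detected_by: frozenset   # names of the engines that flagged the sample

def detection_rate(samples, engine, end, window_days):
    """Fraction of samples collected within window_days of end that engine flags."""
    cutoff = end - timedelta(days=window_days)
    recent = [s for s in samples if s.collected > cutoff]
    if not recent:
        return None  # no samples fall inside this window
    hits = sum(1 for s in recent if engine in s.detected_by)
    return hits / len(recent)

# Toy data: newer samples are detected less often than older ones.
end = date(2007, 11, 10)
samples = [
    Sample(date(2007, 11, 8), frozenset()),
    Sample(date(2007, 11, 5), frozenset({"EngineA"})),
    Sample(date(2007, 10, 20), frozenset({"EngineA"})),
    Sample(date(2007, 9, 1), frozenset({"EngineA", "EngineB"})),
]

for label, days in [("3 months", 90), ("1 month", 30), ("1 week", 7)]:
    print(label, detection_rate(samples, "EngineA", end, days))
```

Each window is a superset of the next smaller one, so a lower rate in the one-week window indicates that the newest samples are the ones being missed.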
In the second experiment, we extended this analysis across all the days in the year during which the malware samples were collected. Figure 2.1(b) shows significant degradation of antivirus engine detection rates as the age, or recency, of the malware sample is varied. As can be seen in the figure, detection rates can drop over 45% when one day’s worth of malware is compared to one year’s worth. As the plot shows, antivirus engines tend to be effective against malware that is a year old but much less useful in detecting more recent malware, which poses the greatest threat to end hosts.
2.1.2 Antivirus Software Vulnerabilities
A second major concern about the long-term viability of host-based antivirus software is that the complexity of antivirus software has resulted in an increased risk of security vulnerabilities. Indeed, severe vulnerabilities have been discovered in the antivirus engines of nearly every vendor. While local exploits are more common (ioctl vulnerabilities [57, 58], overflows in decompression routines [59], etc.), remote exploits in management interfaces have been observed in the wild [55]. Due to the common need for elevated privileges by antivirus software, many of these vulnerabilities result in a complete compromise of the affected end host.
[Bar chart: Severity of CVE/NVD Antivirus Vulnerabilities. Number of vulnerabilities (y-axis, 0 to 65) per AV vendor (ClamAV, McAfee, TrendMicro, Symantec, Kaspersky, F-Secure, Avast, BitDefender, AVG, F-Prot), broken down into low, medium, and high severity]
Figure 2.2: Number of vulnerabilities reported in the National Vulnerability Database (NVD) for 10 antivirus vendors between 2005 and 2007.
Figure 2.2 shows the number of vulnerabilities reported in the National Vulnerability Database [140] for 10 popular antivirus vendors between 2005 and 2007. This large number of reported vulnerabilities demonstrates not only the risk involved in deploying antivirus software but also an evolution in tactics, as attackers are now targeting vulnerabilities in antivirus software itself.
2.2 Approach
This thesis advocates a new model for the detection functionality currently performed by antivirus software. First, the detection capabilities currently provided by host-based antivirus software can be more efficiently and effectively provided as a cloud service. Second, the identification of malicious software should be determined by multiple, heterogeneous detection engines in parallel.
2.2.1 Deployment Environment
Before discussing the details of the approach, it is important to understand the environment in which such an architecture is most effective. While such an architecture can be very effective in a standalone deployment, the CloudAV system can also be deployed alongside existing antivirus engines and host-based intrusion detection systems. Some possible deployment environments include the following:
• Enterprise networks: Enterprise networks tend to be highly-controlled environments in which IT administrators control both desktop and server software. In addition, enterprises typically have good network connectivity with low latencies and high bandwidth between workstations and back-office systems.
• Government networks: Like enterprise networks, government networks tend to be highly controlled with strictly-enforced software and security practices. In addition, policy enforcement, access control, and forensic logging can be useful in tracking sensitive information.
Privacy implications: Shifting file analysis to a central location provides significant benefits but also has important privacy implications. It is critical that users of a cloud-based antivirus solution understand that their files may be transferred to another computer for analysis. There may be situations in which this might not be acceptable to users (e.g., many law firms and many consumer broadband customers). However, in controlled environments with explicit network access policies, like many enterprises, such issues are of less concern. Moreover, the amount of information that is collected can be carefully controlled depending on the environment. As we will discuss later, information about each file analyzed and which files are cached can be controlled, depending on the policies of the network.
2.2.2 Cloud-Based Detection
The core of the proposed approach is moving the detection of malicious files off of end hosts and into the cloud. By moving the complexity of the detection engines off of the end host, we significantly lower the complexity of host-based monitoring software. Clients no longer need to continually update their local signature database, reducing administrative cost. Simplifying the host software also decreases the chance that it could contain exploitable vulnerabilities [110, 55]. Finally, a lightweight host agent allows the service to be extended to resource-limited devices that lack sufficient processing power but remain an enticing target for malware.
While moving detection to the cloud has a number of benefits, the approach is not without trade-offs. Certainly, availability of and connectivity to the cloud service are of paramount concern when moving services from a host-centric to cloud-centric model. We believe that cloud computing and high-speed networking make the deployment of cloud-based detection more practical and worthy of exploration. Issues and policy surrounding disconnected operation are discussed in further detail later in this chapter.
2.2.3 N-Version Protection
Moving to a cloud-centric architecture also enables us to employ a set of heterogeneous detection engines that are used to provide analysis results on a file, which we call N-version protection. This approach is analogous to N-version programming, a paradigm in which multiple implementations of critical software are written by independent parties to increase the reliability of software by reducing the probability of concurrent failures [15]. While N-version programming uses multiple implementations to increase fault tolerance in complex software, N-version protection uses multiple independent implementations of detection engines in order to increase coverage against a highly complex and ever-evolving ecosystem of malicious software.
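The coverage benefit of N-version protection can be illustrated with a small sketch in which a sample counts as detected if at least one engine flags it. The engine names and sample identifiers below are hypothetical toy data, and the any-engine rule is only one possible policy.

```python
def coverage(detections, engines, total):
    """Fraction of total samples flagged by at least one of the given engines."""
    flagged = set()
    for name in engines:
        flagged |= detections[name]
    return len(flagged) / total

# Toy data: each engine maps to the set of sample IDs it detects (out of 10).
detections = {
    "EngineA": {1, 2, 3, 4},
    "EngineB": {3, 4, 5, 6},
    "EngineC": {7},
}
total = 10

print(coverage(detections, ["EngineA"], total))
print(coverage(detections, ["EngineA", "EngineB"], total))
print(coverage(detections, ["EngineA", "EngineB", "EngineC"], total))
```

Coverage can only stay the same or grow as engines are added, and it grows fastest when the engines miss different samples, which is why heterogeneity matters more than the raw number of engines.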
2.3 Architecture
In order to move the detection of malicious files off of end hosts and into the network, several important challenges must be overcome: (1) unlike existing antivirus software, files must be transported into the network for analysis; (2) an efficient analysis system must be constructed in order to handle the analysis of files from many different hosts using many different detection engines in parallel; and (3) the performance of the system must be similar to or better than existing detection systems, such as antivirus software.
[Diagram labels: End Host, Host Agent, file sources (P2P, Email, IM, HTTP, Media), File, Threat Report, Safe Files, Suspicious Files, Network Service, Analysis Engines, Forensics Archive]
Figure 2.3: Architectural approach for cloud-centric file analysis service.
To address these problems, we envision an architecture that includes three major components. The first is a lightweight host agent run on end hosts that identifies new files and sends them into the network for analysis. The second is a cloud service that receives files from the host agent, identifies malicious content, and indicates to hosts whether access to the files is safe. The third component is an archival and forensics service that stores information about what files were analyzed and provides a query and alerting interface for operators. Figure 2.3 shows the high-level architecture of the cloud-based approach.
2.3.1 Client Software
Malicious files can enter an organization from many sources. For example, USB drives, email attachments, downloads, and vulnerable network services are all common entry points. Due to the broad range of entry vectors, the proposed architecture uses a lightweight file acquisition agent run on each end system.
Just like existing antivirus software, the host agent runs on each end host and inspects each file on the system. Access to each file is trapped and diverted to a handling routine which begins by generating a unique identifier (UID) of the file and comparing that identifier against a cache of previously analyzed files. If a file UID is not present in the cache, then the file is sent to the cloud service for analysis.
To make the analysis process more efficient, the architecture provides a method for sending a file for analysis as soon as it is written on the end host’s filesystem (e.g., via file-copy, installation, or download). Doing so amortizes the transmission and analysis cost over the time elapsed between file creation and system- or user-initiated access.
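The write-time submission and access-time check described above can be sketched as follows. This is a simplified illustration, not the CloudAV host agent: the `HostAgent` class, its method names, and the `fake_cloud` stand-in for the cloud service are hypothetical, and SHA-256 is assumed as the UID function.

```python
import hashlib

class HostAgent:
    """Toy host agent: analyzes each distinct file once and caches the verdict."""

    def __init__(self, analyze):
        self.analyze = analyze   # callable(bytes) -> verdict; stands in for the cloud service
        self.cache = {}          # UID -> verdict for previously analyzed files

    @staticmethod
    def uid(data):
        return hashlib.sha256(data).hexdigest()

    def on_write(self, data):
        """Called when a file lands on disk: submit early so the verdict is ready later."""
        self._lookup(data)

    def on_access(self, data):
        """Called when a file is opened or executed: return the verdict."""
        return self._lookup(data)

    def _lookup(self, data):
        key = self.uid(data)
        if key not in self.cache:          # cache miss: send the file for analysis
            self.cache[key] = self.analyze(data)
        return self.cache[key]

submissions = []

def fake_cloud(data):
    submissions.append(data)               # record what was actually sent
    return "unsafe" if b"EVIL" in data else "safe"

agent = HostAgent(fake_cloud)
agent.on_write(b"installer bytes")         # analysis cost is paid at write time
print(agent.on_access(b"installer bytes")) # served from the cache
print(len(submissions))                    # the file was submitted only once
```

In this sketch the analysis call is synchronous; an actual agent would presumably submit asynchronously at write time so that the transfer overlaps with the delay before first access.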
2.3.1.1 Threat Model
The threat model for the host agent is similar to that of existing software protection mechanisms, such as antivirus, host-based firewalls, and host-based intrusion detection. As with these host-based systems, if an attacker has already achieved code execution privileges, it may be possible to evade or disable the host agent. As previously discussed, antivirus software contains many vulnerabilities that can be directly targeted by malware due to its complexity. By reducing the complexity of the host agent by moving detection into the network, it is possible to reduce the vulnerability footprint of host software that may lead to elevated privileges or code execution.
2.3.1.2 File Unique Identifiers
One of the core components of the host agent is the file unique identifier (UID) generator. The goal of the UID generator is to provide a compact summary of a file. That summary is transmitted over the network to determine if an identical file has already been analyzed by the cloud service. One of the simplest methods of generating a UID is using a cryptographic hash of a file, such as MD5 or SHA-1. Cryptographic hashes are fast and provide excellent resistance to collision attacks. However, the same collision resistance also means that changing a single byte in a file results in a completely different UID. To combat polymorphic threats, a more complex UID generator algorithm could be employed to avoid impacting cache performance. For example, a method such as locality-preserving hashing in multidimensional spaces [97] could be used to track differences between two files in a compact manner.
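A hash-based UID generator of the kind described here can be sketched in a few lines. The function name `file_uid` and its parameters are hypothetical. The sketch defaults to SHA-256 as an assumption, since practical collision attacks against MD5 and SHA-1 have since been demonstrated; either of the algorithms named in the text can be selected through the `algorithm` argument.

```python
import hashlib
import io

def file_uid(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Two toy "files" that differ in a single byte.
original = b"MZ" + b"\x00" * 1024
patched = b"MZ" + b"\x00" * 1023 + b"\x01"

uid_a = file_uid(io.BytesIO(original))
uid_b = file_uid(io.BytesIO(patched))

print(uid_a == file_uid(io.BytesIO(original)))  # identical content gives an identical UID
print(uid_a == uid_b)                           # a one-byte change gives a different UID
```

The second comparison shows why trivially mutated variants of the same malware miss the cache under an exact-match hash, which is the motivation for the similarity-preserving alternative mentioned above.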
2.3.1.3 User Interface
We envision three major modes of operation that affect how users interact with the host agent. These modes range from less to more interactive, depending on the policy and security requirements of the organization deploying CloudAV.
• Transparent mode: In this mode, the detection software is completely transparent to the end user. Files are sent into the cloud for analysis, but the execution or loading of a file is never blocked or interrupted. In this mode, end hosts can become infected by known malware, but administrators can use detection alerts and detailed forensic information to aid in cleaning up infected systems.
• Warning mode: In this mode, access to a file is blocked until an access directive has been returned to the host agent. If the file is classified as unsafe, then the user is presented with a warning about why the file is suspicious. The user is then allowed to choose whether to proceed in accessing the file.
• Blocking mode: In this mode, access to a file is blocked until an access directive has been returned to the host agent. If the file is classified as suspicious, then access to the file is denied and the user is informed about it by an error dialog.
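The three modes differ only in what happens once an access directive is available, which can be captured in a small decision function. The `Mode` enumeration, the `allow_access` function, and the `user_confirms` callback are hypothetical names used for illustration.

```python
from enum import Enum

class Mode(Enum):
    TRANSPARENT = "transparent"
    WARNING = "warning"
    BLOCKING = "blocking"

def allow_access(mode, file_is_unsafe, user_confirms=lambda: False):
    """Decide whether a file access proceeds once the directive has arrived.

    user_confirms is a callback standing in for the warning dialog; it is
    only consulted in warning mode and only for unsafe files.
    """
    if mode is Mode.TRANSPARENT:
        return True                    # never interrupt the user; rely on alerts and forensics
    if not file_is_unsafe:
        return True                    # safe files are released in every mode
    if mode is Mode.WARNING:
        return bool(user_confirms())   # the user may choose to proceed anyway
    return False                       # blocking mode: deny access outright

print(allow_access(Mode.TRANSPARENT, file_is_unsafe=True))
print(allow_access(Mode.WARNING, file_is_unsafe=True, user_confirms=lambda: True))
print(allow_access(Mode.BLOCKING, file_is_unsafe=True))
```

In warning and blocking modes the caller is assumed to hold the file access until the directive arrives, whereas in transparent mode the same function can be evaluated after the fact purely for alerting.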
2.3.1.4 Other File Acquisition Methods
While the host agent is the primary method of acquiring candidate files and transmitting them to the cloud service for analysis, other methods can also be employed to increase the performance and visibility of the system. For example, a network sensor or tap that monitors the traffic of a network may pull files directly out of a network stream using deep packet inspection (DPI) techniques. By identifying files and performing analysis before the file even reaches the destination host, the need to retransmit the file to the network service is alleviated and user-perceived latencies can be reduced. Clearly, this approach cannot completely replace the host agent, because network traffic can be encrypted, files may be encapsulated in unknown protocols, and the network is only one source of malicious content.
2.3.2 Cloud Service
The second major component of the architecture is the cloud service responsible for file analysis. The core task of the cloud service is to determine whether a file is malicious. Unlike existing systems, each file is analyzed by a collection of detection engines. That is, each file is analyzed by multiple detection engines in parallel and a final determination of whether a file is malicious is made by aggregating these individual results into a threat report.
2.3.2.1 Detection Engines
A cluster of servers can quickly analyze files using multiple detection techniques. Additional detection engines can easily be integrated into a cloud service, allowing for considerable extensibility. Such comprehensive analysis can significantly increase the detection coverage of malicious software. In addition, the use of engines from different vendors who use different detection techniques means that the overall result does not rely too heavily on a single vendor or detection technology.
A wide range of both lightweight and heavyweight detection techniques can be used in the backend. For example, lightweight detection systems like existing antivirus engines can be used to evaluate candidate files. In addition, more heavyweight detectors like behavioral analyzers can also be used. A behavioral system executes a suspicious file in a sandboxed environment (e.g., Norman Sandbox [141], CWSandbox [36]) or virtual machine and records host state changes and network activity. Such deep analysis is difficult or impossible to accomplish on resource-constrained devices but is possible when detection is moved to dedicated servers. In addition, instead of forcing signature updates to every host, detection engines can be kept up-to-date with the latest vendor signatures at a central source.
Finally, running multiple detection engines within the same service provides the ability to correlate information between engines. For example, if a detector finds that the behavior of an unknown file is similar to that of a file previously classified as malicious by antivirus engines, the unknown file can be marked as suspicious.
2.3.2.2 Result Aggregation
The results from the different detection engines must be combined to determine whether a file is safe to open, access, or execute. Several variables may impact this process.
First, results from the detection engines may reach the aggregator at different times. For example, if a detector fails, it may never return any results. In order to prevent a slow or failed detector from holding up a host, the aggregator can use a subset of results to determine whether a file is safe. Determining the size of such a quorum depends on the deployment scenario and variables like the number of detection engines, security policies, and latency requirements.
Second, the metadata returned by each detector may be different, so the detection results are wrapped in a container object that describes how the data should be interpreted. For example, behavioral analysis reports may not indicate whether a file is safe but can be attached to the final aggregation report to help users, operators, or external programs interpret the results.
Lastly, the threshold at which a candidate file is deemed unsafe or malicious may be defined by security policy. For example, some administrators may opt for a strict policy where a single engine is sufficient to deem a file malicious, while less security-conscious administrators may require multiple engines to agree to deem a file malicious. We explore the balance between coverage and confidence further in the discussion section.
The result of the aggregation process is a threat report that is sent to the host agent and can be cached on the server. A threat report can contain a variety of metadata and analysis results about a file. The specific contents of the report depend on the deployment scenario. Some possible report sections include: (1) an operation directive: a set of instructions indicating the action to be performed by the host agent, such as how the file should be accessed, opened, executed, or quarantined; (2) family/variant labels: a list of malware family/variant classification labels assigned to the file by the different detection engines; and (3) behavioral analysis: a list of host and network behaviors observed during simulation. This may include information about processes spawned, files and registry keys modified, network activity, or other state changes.
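The aggregation steps described in this section (a quorum of reporting engines, a container for heterogeneous results, and a policy-defined threshold) can be sketched together. The `EngineResult` container, the `aggregate` function, the directive strings, and the report keys are hypothetical; they illustrate one way the pieces could fit, not the CloudAV report format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EngineResult:
    """Container describing how one engine's output should be interpreted."""
    engine: str
    kind: str                          # "verdict" (e.g., antivirus) or "behavioral"
    malicious: Optional[bool] = None   # None when the engine gives no verdict
    label: Optional[str] = None        # family/variant label, if any
    details: dict = field(default_factory=dict)

def aggregate(results, quorum, threshold):
    """Build a threat report once at least quorum verdicts have arrived.

    threshold is the number of engines that must flag the file before it is
    deemed malicious. Returns None while the quorum has not been reached.
    """
    verdicts = [r for r in results if r.kind == "verdict" and r.malicious is not None]
    if len(verdicts) < quorum:
        return None
    flagged = [r for r in verdicts if r.malicious]
    return {
        "directive": "quarantine" if len(flagged) >= threshold else "allow",
        "labels": [r.label for r in flagged if r.label],
        "behavior": [r.details for r in results if r.kind == "behavioral"],
    }

results = [
    EngineResult("EngineA", "verdict", malicious=True, label="Trojan.Example"),
    EngineResult("EngineB", "verdict", malicious=False),
    EngineResult("EngineC", "verdict", malicious=False),
    EngineResult("Sandbox", "behavioral", details={"processes_spawned": 2}),
]

strict = aggregate(results, quorum=3, threshold=1)    # one engine suffices
lenient = aggregate(results, quorum=3, threshold=2)   # two engines must agree
print(strict["directive"], lenient["directive"])
```

The same set of engine results yields different directives under the strict and lenient policies, which is the coverage-versus-confidence trade-off noted above.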
2.3.2.3 Caching
Once a threat report has been generated for a candidate file, it can be stored in both a local cache on the host agent and in a shared remote cache on the server. This means that once a file has been analyzed, subsequent accesses to that file by the user can be determined locally without requiring network access. Moreover, once a single host in a network has accessed a file and sent it to the cloud service for analysis, any subsequent access of the same file by other hosts in the network can leverage the existing threat report in the shared remote cache on the server. Cached reports stored in the cloud service may also periodically be pushed to the host agent, to speed up future accesses, and invalidated when deemed necessary.
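The two cache tiers can be sketched as a lookup that tries the local cache, then the shared cache, and only then triggers analysis. The `ReportCache` class and its methods are hypothetical, and a plain dictionary stands in for the server-side shared cache.

```python
class ReportCache:
    """Toy two-tier cache: per-host local cache backed by a shared server cache."""

    def __init__(self, shared, analyze):
        self.local = {}          # threat reports cached on this host
        self.shared = shared     # dict standing in for the server-side cache
        self.analyze = analyze   # callable(bytes) -> report; stands in for the engines

    def lookup(self, uid, fetch_file):
        """Return (report, source); fetch_file is only called on a full miss."""
        if uid in self.local:
            return self.local[uid], "local"
        if uid in self.shared:
            self.local[uid] = self.shared[uid]   # remember it for next time
            return self.local[uid], "shared"
        report = self.analyze(fetch_file())      # full miss: upload and analyze
        self.shared[uid] = report
        self.local[uid] = report
        return report, "analyzed"

    def invalidate(self, uid):
        """Drop a report, e.g., after a signature update changes the verdict."""
        self.local.pop(uid, None)
        self.shared.pop(uid, None)

shared = {}
analyze = lambda data: {"directive": "allow"}
host1 = ReportCache(shared, analyze)
host2 = ReportCache(shared, analyze)

print(host1.lookup("uid-1", lambda: b"file bytes")[1])  # first access anywhere
print(host1.lookup("uid-1", lambda: b"file bytes")[1])  # repeat access on the same host
print(host2.lookup("uid-1", lambda: b"file bytes")[1])  # first access on another host
```

Only the first access across all hosts pays the upload and analysis cost. Note that this toy `invalidate` clears only the calling host's local cache plus the shared cache; pushing invalidations to other hosts is outside the sketch.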
2.3.3 Archival and Forensics Service
The third and final component of the architecture is a service that provides information on file usage across participating hosts, which can assist in post-infection forensic analysis. While some forensics tracking systems [106, 68] provide fine-grained details tracing back to the exact vulnerable processes and system objects involved in an infection, they are often accompanied by high storage requirements and performance degradation. Instead, we opt for a lightweight solution consisting of file access information sent by the host agent and stored securely by the cloud service, in addition to the behavioral profiles of malicious software generated by the behavioral detection engines. Depending on the privacy policy of an organization, a tunable amount of forensics information can be logged and sent to the archival service. For example, a more security conscious organization could specify that information about every executable launch be recorded and sent to the archival service. Another policy might specify that only accesses to unsafe files be archived without any personally identifiable information.
Archiving forensic and file usage information provides a rich information source for both security professionals and administrators. From a security perspective, tracking the system events leading up to an infection can assist in determining its cause, assessing the risk involved with the compromise, and aiding in any necessary disinfection and cleanup. In addition, threat reports from behavioral engines provide a valuable source of forensic data, because the exact operations performed by a piece of malicious software can be analyzed in detail. From a general administration perspective, knowledge about which applications and files are frequently in use can aid the placement of file caches and application servers, and can even be used to determine the optimal number of licenses needed for expensive applications.
Consider the outbreak of a 0-day exploit. An enterprise might receive a notice of a new malware attack and wonder how many of its systems were infected. In the past, this might require performing an inventory of all systems, determining which were running vulnerable software, and then manually inspecting each system. Using the forensics archival interface in the proposed architecture, an operator could search for the UID of the malicious file over the past few months and instantly find out where, when, and by whom the file was opened, and what malicious actions the file performed. The impacted machines could then immediately be quarantined.
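The operator query in this scenario can be sketched as a filter over archived access records. The AccessRecord fields and the find_exposures function are hypothetical; the actual schema of the forensics archive is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class AccessRecord:
    """One archived file access (assumed schema)."""
    file_uid: str
    host: str
    user: str
    when: datetime


def find_exposures(records: Iterable[AccessRecord],
                   file_uid: str,
                   since: datetime) -> List[AccessRecord]:
    """Return accesses to `file_uid` at or after `since`, oldest first."""
    matches = [r for r in records
               if r.file_uid == file_uid and r.when >= since]
    return sorted(matches, key=lambda r: r.when)
```

The distinct hosts in the result form the candidate set of machines to quarantine.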
The forensics archive also enables retrospective detection. This means that the complete archive of files that are transmitted to the cloud service may be re-scanned by the antivirus engines whenever a signature update occurs. Retrospective detection allows previously undetected malware that has infected a host to be identified and quarantined.
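A minimal sketch of retrospective detection follows, assuming the archive maps file UIDs to file contents and the access log maps UIDs to the hosts that opened them. The function name and both data layouts are assumptions made for illustration; scan stands for any engine callable that returns a label for malicious content and None otherwise.

```python
from typing import Callable, Dict, List, Optional, Tuple


def retrospective_scan(
    archive: Dict[str, bytes],
    access_log: Dict[str, List[str]],
    scan: Callable[[bytes], Optional[str]],
) -> Dict[str, Tuple[str, List[str]]]:
    """Re-scan every archived file with an updated engine.

    Returns, for each file now detected, its label and the sorted
    list of hosts that previously accessed it.
    """
    findings: Dict[str, Tuple[str, List[str]]] = {}
    for file_uid, contents in archive.items():
        label = scan(contents)
        if label is not None:
            hosts = sorted(set(access_log.get(file_uid, [])))
            findings[file_uid] = (label, hosts)
    return findings
```

Such a pass would be triggered after each signature update, and the hosts listed in the findings are those that may need to be quarantined.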
2.4 CloudAV Implementation
To explore and validate the proposed cloud-based architecture, we constructed a production-quality implementation called CloudAV. In this section, we describe how CloudAV implements each of the three main components of the architecture.
2.4.1 Host Agent
We implement the host agent for a variety of platforms including Windows 2000/XP/Vista, Linux 2.4/2.6, and FreeBSD 6.0+. The implementation of the host agent is designed to acquire executable files for analysis by the cloud service, as executables are a common source of malicious content. We discuss how the agent can be extended to acquire DLLs, documents, and other common malcode-bearing file types in the discussion section.
While the exact APIs are platform dependent (CreateProcess on Win32, execve syscall on Linux 2.4, LSM hooks on Linux 2.6, etc.), the host agent hooks and interposes on system events. This interposition is implemented via the MadCodeHook [152] package on the Win32 platform and via the Dazuko [147] framework for the other platforms. Process creation events are interposed upon by the host agent to acquire and process candidate executables before they are allowed to continue. In addition, filesystem events are captured in order to identify new files entering a host and preemptively transfer them to the cloud service before execution to eliminate any user-perceived latencies.
As motivating factors of our work include the complexity and security risks involved in running host-based antivirus software, the host agent was designed to be simple and lightweight, both in code size and resource requirements. The Win32 agent is approximately 1,500 lines of code, of which 60% is managed code, further reducing the vulnerability profile of the agent. The agent for the other platforms is written in Python and is under 300 lines of code.
While the host agent is primarily targeted at end hosts, our architecture is also effective in other deployment scenarios such as mail servers. To demonstrate this, we also implemented a Milter (mail filter) frontend for use with mail transfer agents (MTAs), such as Sendmail and Postfix. This Milter frontend allows us to scan all attachments on incoming emails. Using the Pymilter API, the Milter frontend weighs in at approximately 100 lines of code.
2.4.2 Cloud Service
The cloud service acts as a dispatch manager between the host agent and the backend analysis engines. Incoming candidate files are received and analyzed, and a threat report is returned to the host agent dictating the appropriate action to take. Communication between the host agent and the cloud service uses an HTTP wire protocol protected by mutually authenticated SSL/TLS. Between the components within the cloud service itself, communication is performed via a publish/subscribe bus to allow modularization and scalability.
Figure 2.4: Screen captures of the detection engine VM monitoring interface (a) and the web management portal which provides access to forensic data and threat reports (b).
The cloud service allows for various priorities to be assigned to analysis requests to aid latency-sensitive applications and penalize misbehaving hosts. For example, application scanning may take higher analysis priority than background analysis tasks such as retroactive detection and mail scanning. This also enables the system to penalize or temporarily suspend misbehaving hosts that may try to submit many analysis requests or otherwise flood the system.
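One way to realize such prioritization is a priority queue with a per-host limit on outstanding requests, sketched below. The AnalysisQueue class, the numeric priority values, the relative ordering of mail and retrospective tasks, and the per-host limit are all assumptions for illustration; only the precedence of application scanning over background tasks comes from the description above.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

# Lower value is served first. Only "application before background
# tasks" is stated in the text; the rest of the ordering is assumed.
PRIORITY = {"application": 0, "mail": 1, "retrospective": 2}


class AnalysisQueue:
    def __init__(self, max_pending_per_host: int = 100) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order
        self._pending: Dict[str, int] = {}
        self._max = max_pending_per_host

    def submit(self, host: str, kind: str, file_uid: str) -> bool:
        """Queue a request; refuse hosts that exceed their pending limit."""
        if self._pending.get(host, 0) >= self._max:
            return False
        self._pending[host] = self._pending.get(host, 0) + 1
        heapq.heappush(self._heap,
                       (PRIORITY[kind], next(self._order), host, file_uid))
        return True

    def next_request(self) -> Optional[Tuple[str, str]]:
        """Pop the highest-priority request, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, host, file_uid = heapq.heappop(self._heap)
        self._pending[host] -= 1
        return host, file_uid
```

Refusing submissions beyond the limit is the simplest form of penalty; a deployment could instead demote the priority of a flooding host or suspend it for a period of time.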
Each backend engine runs in a Xen virtualized container, which offers significant advantages in terms of isolation and scalability. Given the numerous vulnerabilities in existing antivirus software, isolation of the antivirus engines from the rest of the system is vital. If one of the antivirus engines in the backend is targeted and successfully exploited by a malicious candidate file, the virtualized container can simply be disposed of and immediately reverted to a clean snapshot. As for scalability, virtualized containers allow the network service to spin up multiple instances of a particular engine when demand for its services increases.
Our current implementation employs 12 engines: 10 traditional antivirus engines (Avast [5], AVG [6], BitDefender [7], ClamAV [168], F-Prot [8], F-Secure [9], Kaspersky [10], McAfee [11], Symantec [12], and Trend Micro [13]) and two behavioral engines (Norman Sandbox [141] and CWSandbox [36]). The exact version of each detection engine is listed in Figure 2.1(a). Nine of the backend engines run in a Windows XP environment using Xen’s HVM capabilities, while the other three run in a Gentoo Linux environment using Xen domU paravirtualization. Implementing each particular engine for the backend is a simple task, and extending the backend with additional engines in the future is equally simple. For reference, the amount of code required for each engine is 42 lines of Python code on average, with a median of 26 lines of code.
2.4.3 Management Interface
The third component of the CloudAV architecture is a management interface that provides access to the forensics archive, policy enforcement, alerting, and report generation. These interfaces are exposed to network administrators via a web-based management interface. The we