Limitations - Wide spectrum attribution: Using deception for attribution intelligence in cyber

In this section limitations of the research are identified and analysed. These limitations form the basis for some items of future work, detailed in the next section.

Real-world validation of AHFW

In this research, DICE was validated both in laboratory conditions and with real adversary attacks, while AHFW was only tested in laboratory conditions, due to time and scoping constraints. The next logical step is to test AHFW with real adversaries and in different environments that experience cyber attacks. An alternative approach, with less risk but more planning, would be to expand the laboratory conditions to include a team of penetration testers who are unaware that honeypots are included in the scenario, and task them with goals that should result in AHFW receiving a greater number and diversity of interactions than an equivalent non-AHFW environment. While penetration testers may not act in exactly the same way as adversaries, they often have similar goals, such as retrieving a trophy file from a particular system. These interactions should invoke the AHFW actor-dependent triggers in the same way that an adversary would.

Machine learning techniques for interpretation in AHFW

AHFW was designed around pluggable modules so that approaches such as machine learning can be added without much difficulty. Due to time and scoping constraints, no machine learning techniques were experimentally validated in the framework; instead, a policy approach was used to connect observations to adaptations. Research into machine learning techniques for interpretation is a logical next step. For example, techniques such as supervised learning, case-based reasoning and reinforcement learning could be added and compared to identify the most effective, using the quantity and diversity of interactions received as the measure.

Success metrics

This research proposes the quantity and diversity of interactions received as success metrics for adaptive honeypots. This is a simple initial measure of the effect of adaptive honeypots; further research should explore more subtle metrics. For example, the possible routes through a honeynet could be portrayed as attack trees or probability maps and displayed visually to an analyst, for instance as a maze graphic.
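As a minimal sketch of how quantity and diversity might be computed from a log of interactions, the example below counts interactions, counts distinct interaction types, and reports Shannon entropy as one possible diversity measure. The choice of entropy, the function name `interaction_metrics` and the interaction labels are assumptions made for illustration; they are not taken from the AHFW implementation.

```python
import math
from collections import Counter


def interaction_metrics(interactions):
    """Summarise a list of interaction labels (hypothetical format).

    Returns the total number of interactions, the number of distinct
    interaction types, and the Shannon entropy (in bits) of the type
    distribution as one possible diversity measure.
    """
    counts = Counter(interactions)
    quantity = len(interactions)
    entropy = 0.0
    for n in counts.values():
        p = n / quantity
        entropy -= p * math.log2(p)
    return {
        "quantity": quantity,
        "distinct_types": len(counts),
        "entropy_bits": entropy,
    }


# Hypothetical interaction logs for a static and an adaptive honeynet.
baseline = ["file_read", "file_read", "file_read", "login"]
adaptive = ["file_read", "login", "process_start",
            "service_query", "file_read", "login"]

baseline_metrics = interaction_metrics(baseline)
adaptive_metrics = interaction_metrics(adaptive)
```

Comparing `adaptive_metrics` against `baseline_metrics` gives a simple side-by-side view of the two environments; richer metrics such as attack trees would need a more structured log than the flat list assumed here.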

Anti-honeypot techniques

In the field of malware analysis, anti-analysis techniques aim to make analysis of malware difficult or impossible. Similarly, anti-honeypot techniques exist. This thesis uses VMI to reduce the likelihood of detection by physical inspection. In another scenario, an adversary could learn that the system is a honeypot through other means, e.g. an inside source. The adversary could then flood or poison the honeypot with “junk interactions” to frustrate automated or human-led interpretation. This is similar to setting off a fire alarm many times so that the operator disables it: operators might disable the honeypot if they receive junk information, or they might think that it is broken. Worse, an adversary could convince the honeypot to relay false information to the operator, implicating an innocent party. An adversary could also attempt to overload the system with interactions to cause a backlog of observations, crashing the system or bringing it to a temporary halt.

Therefore, techniques that could affect AHFW and DICE in this way should be explored further, as should countermeasures, i.e. anti-anti-honeypot techniques.

Applicability of AHFW to physical environments

The observation technique selected for AHFW, introspection, works only on virtualised environments and not on physical systems. Various properties can be used to distinguish virtual machines from physical systems, such as logical, resource and timing-based discrepancies (Garfinkel et al., 2007). However, this is now mostly a moot point: the use of VMs is so widespread in production infrastructure that VMs are as valid a target as physical systems. Detection of a virtual environment is no longer a reliable indicator of the presence of a honeypot, and in this respect the information has little meaning to an adversary. Garfinkel et al. (2007) emphasise this point:

Virtualization has made massive inroads into enterprise data centers, a trend that is expected to continue. Soon, malware that limits itself to non-virtualized platforms will be passing up a large percentage of commercial, military and institutional targets. To the degree that malware disables itself in the presence of VMs, VMs become even more attractive for production systems. In the long run, malware authors are motivated to operate regardless of the presence of a VMM.

Performance limitations

An open question is performance. Through experiments, this thesis has shown that AHFW performs at least as well as similar introspection techniques. However, the performance limits of AHFW need to be fully understood, and this was determined to be out of scope for this research. Once they are understood, techniques such as GPU processing (e.g. CUDA) could be considered to ease the performance burden. The burden could also be shared with complementary observation techniques, such as network monitoring. In doing so, AHFW becomes less susceptible to detection and better able to engage with adversaries.

7.4 Future Work

This section highlights areas for future work based on the outcomes and noted limitations of this research. Future research is divided into two categories: high-level themes in Section 7.4.1 and immediate work packages in Section 7.4.2. High-level themes are research ideas that warrant further investigation and need splitting into work packages. Work packages can begin immediately and have a shorter duration to completion.

7.4.1 Themes

Attribution prioritisation

Organisations, governments and militaries are regularly targeted by cyber attacks (Defence Management, 2011; U.S. DoD, 2012). Attributing every cyber attack is costly in both money and time, so attribution needs to be prioritised. A cyber attack against an industrial control system that causes loss of life or significant financial impact may warrant attribution more than a port scan.

Prioritisation should initially be based on the impact of the attack and should be specific to the victim's industry. Probabilistic models should be used to weigh impact based on multiple, sometimes conflicting, attribution intelligence sources. An appropriate approach is multiple-criteria decision analysis.
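To illustrate the idea, the following is a minimal sketch of a weighted-sum model, which is the simplest form of multiple-criteria decision analysis and only one of several possible methods. The criteria, weights, incident names and scores are all hypothetical and chosen purely for illustration; in practice they would need to be elicited per industry.

```python
def priority_score(scores, weights):
    """Weighted-sum priority for one incident.

    `scores` maps each criterion to a value in [0, 1]; `weights` maps
    each criterion to its relative importance. Weights are normalised
    so that they need not sum to one.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * scores.get(c, 0.0) for c, w in weights.items())
    return weighted / total_weight


# Hypothetical criteria and weights (assumptions, not from the thesis).
WEIGHTS = {
    "loss_of_life": 0.5,
    "financial_impact": 0.3,
    "source_confidence": 0.2,
}

# Hypothetical incidents scored against each criterion.
incidents = {
    "ics_attack": {"loss_of_life": 0.9, "financial_impact": 0.8,
                   "source_confidence": 0.6},
    "port_scan": {"loss_of_life": 0.0, "financial_impact": 0.1,
                  "source_confidence": 0.9},
}

# Rank incidents from highest to lowest attribution priority.
ranked = sorted(incidents,
                key=lambda name: priority_score(incidents[name], WEIGHTS),
                reverse=True)
```

Under these assumed weights the industrial control system attack ranks above the port scan. A weighted sum does not model uncertainty or conflict between intelligence sources, so a probabilistic treatment would have to extend it.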

Honeypot observation techniques

Cheng et al. (2009) identify monitoring as one of four important research questions. They highlight that quality of service degradation could outweigh benefits of adaptation and note that “more research on lightweight monitoring techniques is needed”. Introspection was selected as the honeypot observation technique for AHFW in Chapter 6. Researchers have identified a number of introspection challenges (Floeren, 2013) and therefore other techniques, such as hidden kernel modules and vampire network taps, may offer benefits that introspection does not. Section 7.3 identified possible techniques to reduce performance burden, such as GPU processing, and also complementary observation techniques, such as network monitoring.

Honeypot decision making

A simple policy-based decision making process was selected for AHFW for expediency and to maintain focus on the overall framework during this research. There are many options for computational decision making to control the interpretation of observations and the selection of adaptations. Once a number of decision making processes have been identified, they can be measured against each other and a taxonomy of honeypot decision making engines created. Research should explore the differences between decisions that favour the adversary, making the attack easier, and decisions that discourage the adversary, making it more difficult. Many interesting attributes can be used to drive the decision making process: the culture of the adversary, their location, the time at which they attack, the number of attempts that they make, their preferred tools, their failure and success rates, etc. AHFW has been designed as a framework for interpretation of observations, such that hooks are available for observations and adaptations.

Honeypot adaptation

Cheng et al. (2009) describe three dependencies that create change that in turn influences adaptations: actor-dependent, system-dependent and environment-dependent. Previous research has focused on environment-dependent triggers. AHFW focused on actor-dependent triggers. Cheng et al. (2009) note that “a fruitful avenue of research is a more comprehensive approach that leverages several adaptation techniques simultaneously”. Further research should investigate the use of all three dependencies.

The AHFW adaptation engine focuses on the modification of workstations, devices, etc. in the vicinity of the adversary. Other research has adapted the responses to adversary-entered commands (Wagener et al., 2009). Bilar and Saltaformaggio (2011) modify the workstation that the adversary, most commonly automated malware, is currently situated on.

Previous research in this area has focused on read-only techniques, as performing writable introspection can potentially disrupt kernel structures and put the virtual operating system into an unstable state. Researchers are beginning to explore writable introspection (Lin, 2013) and this provides many opportunities to continue this research.

In AHFW, honeypots in the nearby vicinity of the adversary are adapted, which avoids the need for writable introspection. Future work should adapt the honeypot that the adversary is currently interacting with, perhaps by keeping track of the things that the adversary has already interacted with. For example, an interactive map could visualise the parts of the system that have been interacted with and those that have not. A human operator or automated process could then decide which parts of the system to adapt. Examples would be identical to those demonstrated in this research, such as files, folders, user accounts, processes and services.

Cheng et al. (2009) also identify many dimensions that self-adaptive systems need to consider. For example, adaptive overheads should not affect overall system performance. In the context of this research, an adversary could overload the system with changes to create an overload of adaptations, thereby crashing and disabling the system. At this early stage of research these issues were considered, but not to any great depth. Further research should carefully scrutinise each of the dimensions identified by Cheng et al. (2009), when applied to adaptive honeypots.

AHFW was purposely designed to be easily accessible for future research. Machine learning techniques, e.g. supervised learning or case-based reasoning, should be used to autonomously control adaptations based on observations. Wagener et al. (2009) have already had some success in this area by using self-learning techniques for an adaptive honeypot.

Adaptive architectures

AHFW uses a single control loop for feedback. Multiple control loops present interesting challenges in adaptive honeypot research. Different architectures are possible: multiple control loops could monitor a single honeypot, each responsible for a separate aspect of the system, or each control loop could be responsible for precisely one honeypot in a honeynet. In both cases the communication between control loops, e.g. how they interact and how they ensure that adaptations remain relevant across loops, is an interesting area of research.

AHFW topology

In AHFW, adaptations take place on honeypots in the nearby vicinity of the adversary. Honeyfarm proposals such as Collapsar demonstrate ways to deploy and manage large swathes of honeypots (Jiang et al., 2006). This work could be combined with AHFW. Nested virtualisation, i.e. virtualisation within virtualisation, is becoming increasingly popular. This technology is useful for cloud customers who wish to use virtualisation. Beham et al. (2013) carried out initial performance experiments with VMI-based intrusion detection systems (IDS) and honeypots, particularly VMI-Honeymon. Similar experiments using AHFW should take place. XenBlanket is one approach to nested virtualisation that is a useful starting point (Williams et al., 2012).

7.4.2 Work packages

Work packages identified for immediate work are:

Work Package 1: Follow up studies

In this research two proof-of-concept systems, DICE and AHFW, have been designed and implemented to demonstrate the possibilities of two new approaches. Having shown these new techniques to be feasible, further research and experiments are called for, such as repeat, scaled-up and follow-on studies, including studies using different data sets and different environments. For example, Section 5.2.4 (page 113) noted that adding complementary techniques such as Evercookie to DICE in HTTP would be a useful starting point. Similarly, AHFW should be scaled up by adding more honeypots and different operating systems, e.g. Windows, Mac OS and Android. Volatility, the forensics toolkit that AHFW uses for interpreting memory collected by introspection, supports a wide array of operating systems (Walters, 2014).

Work Package 2: Further analysis of DICE data set

Work that can begin immediately based on the data set created in this research is the identification of i) adversaries that return and hide their origin and ii) adversaries that share their resources with fellow adversaries. One way to achieve this is to analyse timing channels in keyboard strokes in the data sets that were created. Authorship attribution techniques are also well suited to this task. Another approach is to use search engines to find the successful credential pairs. DICE creates unique credential pairs that are unlikely to be found elsewhere and therefore can be searched for on underground forums, Internet Relay Chat (IRC) chat rooms and content sharing web sites such as pastebin.com.

Work Package 3: Cross environment application

The approaches defined in this thesis are generic enough to be applied in other cyber environments. Sophisticated attacks against critical national infrastructure and in particular industrial control systems, have increased (Symantec, 2011). Unfortunately these systems are too often connected to the Internet, leaving them open to attack. Both approaches, DICE and AHFW, could be applied to this environment, to help with attribution of cyber attacks. For example, programmable logic controllers (PLCs) have authentication procedures using standard enterprise protocols such as FTP and HTTP. DICE could be implemented as a PLC and contribute to the emerging field of industrial control system honeypots. Adaptive industrial honeynet environments should also be created that implement the AHFW approach. Other researchers have already had limited success in deploying industrial control system honeypots (Wilhoit, 2013).

Work Package 4: Enhance AHFW policies

The AHFW configuration file schema allows various strings, operands and wildcards to be used to create flexible policies. This should be improved by adding functionality to handle operator-defined regular expressions. Regular expressions can be used to identify common formats, for example credit card numbers, website and e-mail addresses, region-specific addresses and phone numbers. This enhancement will enable operators to define generic policy rules and introduce variance to reduce the likelihood of deterministic profiles.
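A minimal sketch of how operator-defined regular expressions might drive policy rules is given below. The rule structure, the adaptation labels and the function `match_adaptations` are hypothetical and do not reflect the actual AHFW configuration schema; the patterns are deliberately simplified and would need hardening, e.g. checksum validation for card numbers, before operational use.

```python
import re

# Hypothetical policy rules: each pairs a compiled pattern with the
# label of an adaptation to trigger when the pattern is observed.
POLICY_RULES = [
    # Simplified e-mail address pattern.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
     "plant_email_honeytoken"),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"),
     "plant_card_honeytoken"),
    # Simplified website address pattern.
    (re.compile(r"https?://\S+"),
     "plant_bookmark_honeytoken"),
]


def match_adaptations(observation):
    """Return the adaptation labels whose pattern occurs in the observation."""
    return [action for pattern, action in POLICY_RULES
            if pattern.search(observation)]


# Hypothetical observations, e.g. commands seen through introspection.
email_hit = match_adaptations("grep alice@example.com inbox")
card_hit = match_adaptations("echo 4111 1111 1111 1111")
no_hit = match_adaptations("ls -la")
```

Because one pattern covers a whole class of values rather than a fixed string, a single rule of this kind can stand in for many literal or wildcard rules.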

Work Package 5: Collaborative adaptive honeynets

Research in the area of adaptive honeypots is increasing (Wagener et al., 2009; Pauna and Patriciu, 2014). Current proposals suggest systems that are singular and do not share the results of their learning process. These honeypots should collaborate, and multi-agent planning from artificial intelligence is a natural extension to AHFW.

A group of agents, i.e. honeypots, has a common goal, and each agent is assigned sub-goals that work towards achieving it. As a trivial example, a goal is to engage with an adversary for as long as possible. This is a strategic, high-level goal. Sub-goals could be tactical, such as agent a using technique x while agent b uses technique y to try to engage with the adversary for longer. The benefit of this approach is that the learning feedback loop is much shorter than that currently implemented in AHFW, as agents can share their learning outcomes. The approach could be extended further by allowing multiple, disparate organisations spread over geographic regions to collaborate.

7.5 Closing Remarks

This thesis explored solutions to help with the attribution problem. Two approaches were proposed, AHFW and DICE, that contribute to solving the problem and advance the field of honeynets. The contribution of this research is important for the attribution problem and in particular for the advancement of deception technologies for attribution. The suggested research themes highlight general research areas to be explored, while the work packages identify specific and immediate work to carry out. Both approaches lead the way for better attribution deception techniques that help to solve the attribution problem.

Glossary

AB ApacheBench (AB) is a benchmarking tool for the Apache HTTP web server.

Actor-dependent trigger The trigger that initiates an action on behalf of an adaptive honeypot/honeynet response to an adversary. Other triggers include system and environment-dependent.

Adversary An adversary is the conductor or origin of a cyber attack. Adversaries are most often categorised by their motives, e.g. cyber criminal, cyber terrorist, nation state actor and hobbyist.

AHFW Adaptive Honeynet Framework (AHFW) is a framework for adaptive honeynets that rely on actor-dependent triggers for adaptation to appear more realistic and appealing to an adversary.

API An Application Programming Interface (API) is a software interface that allows developers to build systems that communicate with other systems.

Attack actor See adversary.

Attribution artifacts Attribution artifacts are results yielded by attribution techniques. These are categorised as digital or physical. Digital artifacts include IP address, port numbers, e-mail address, usernames, passwords, browsing history, malware variants, etc. Physical artifacts include name, home address, geographic location, organisation or employer name, state of origin, etc.

AV Anti-Virus (AV) is software that identifies malicious files, most often by comparing signatures of suspicious files with known malicious signatures.

Bloom filter A space-efficient probabilistic data structure used in a number of traceback approaches.

Crimeware A class of malware designed to automate cyber crime. Two examples are Zeus and SpyEye.

Cyber An interactive domain made up of digital networks that is used to store, modify and communicate information (UK Government, 2014).

Cyber attack A cyber-attack consists of any action taken to undermine the functions of a computer network for a political or national security purpose (Hathaway et al., 2012).

Cyber-dependent crime Cyber-dependent crimes can only be committed using computers, computer networks or other forms of Information and Communications Technology (ICT), such as dissemination of malicious software and Distributed Denial-of-Service (DDoS).

Cyber-enabled crime Cyber-enabled crimes are extensions of traditional crimes that use an aspect of cyber, such as fraud and theft.

Darknet A purposefully deployed portion of routed, allocated IP space in which no active sensors or servers reside, with the intention of catching suspicious or malicious traffic.

DDoS A Distributed Denial-of-Service (DDoS) is a cyber attack that is identical to a Denial-of-Service (DoS), however, the attack originates from multiple, distributed points. These could be associates or zombie systems that are part of a botnet.

DICE Deception Inside Credential Engine (DICE) is a honeypot that identifies attack actors that use different source origins or share credentials with associates, using honeytokens and a reverse password policy for authentication.

DKSM Direct Kernel Structure Manipulation (DKSM) is a technique proposed by Bahram et al. (2010) that detects, disables and/or subverts SCI-based virtual machine introspection techniques.

DMZ Demilitarized zone (DMZ) is a physical or logical subnet that exposes systems and services to a wider and untrusted network, most often the Internet. This offers an additional layer of security; if a system in the DMZ is compromised, it is not in the same vicinity as other systems
