
Present and Future Work

The goal of our present and future research in this area is to combine model learning and mutation-based fuzzing in the following ways:

1. use fuzzing as a source of counterexamples during learning, and
2. use (intermediate) learning results to guide mutation-based fuzzing.

At this point in time, we have already put significant effort into (1). Most importantly, we have implemented a new equivalence oracle, AFLEQOracle, in LearnLib, which iteratively loads the traces that AFL marks as interesting and parses them as test queries for the learner. Unfortunately, we were unable to apply this new equivalence oracle to the RERS challenge due to time restrictions. The code for this project is available at https://github.com/praseodym/learning-fuzzing.
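The idea behind (1) can be illustrated with a minimal Python sketch. It is not the AFLEQOracle code: the `MealyMachine` class, the `find_counterexample` function and the toy system under learning are hypothetical names introduced only for this illustration, and the fuzzer output is assumed to be available as a plain list of input sequences. The sketch replays each fuzzer-generated sequence on both the hypothesis and the system, and reports the first sequence on which their outputs differ.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple


class MealyMachine:
    """Hypothetical hypothesis model: (state, input) -> (next state, output)."""

    def __init__(self, initial: int,
                 transitions: Dict[Tuple[int, str], Tuple[int, str]]):
        self.initial = initial
        self.transitions = transitions

    def run(self, inputs: Sequence[str]) -> List[str]:
        """Return the output sequence the hypothesis predicts for `inputs`."""
        state, outputs = self.initial, []
        for symbol in inputs:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return outputs


def find_counterexample(
    hypothesis: MealyMachine,
    sul: Callable[[Sequence[str]], List[str]],
    fuzzer_traces: Iterable[Sequence[str]],
) -> Optional[List[str]]:
    """Replay fuzzer traces; return the first one on which the hypothesis
    and the system under learning (SUL) disagree, or None if none does."""
    for trace in fuzzer_traces:
        if hypothesis.run(trace) != sul(trace):
            return list(trace)
    return None


def example_sul(inputs: Sequence[str]) -> List[str]:
    """Toy SUL (an assumption for this example): reports "error" from the
    second "a" onwards, and "ok" before that."""
    seen_a, outputs = 0, []
    for symbol in inputs:
        if symbol == "a":
            seen_a += 1
        outputs.append("error" if seen_a >= 2 else "ok")
    return outputs


# A one-state hypothesis that always answers "ok".
example_hypothesis = MealyMachine(0, {(0, "a"): (0, "ok"), (0, "b"): (0, "ok")})
example_traces = [["a"], ["b", "a"], ["a", "b", "a"]]
```

In this toy setting, `find_counterexample(example_hypothesis, example_sul, example_traces)` returns the third trace, since it is the first one that contains two occurrences of `a`. A learner would use such a trace to refine its hypothesis before the remaining fuzzer traces are replayed.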

In this section we give an overview of our current effort on using mutation-based fuzzing as a source of counterexamples during learning.

An overview of the architecture for combining AFL and LearnLib is shown in Figure 6.1. To establish this, we had to tackle the following main issues:

– As AFL is provided as a standalone tool, we have created a library, libafl, that the learner can communicate with.

– As LearnLib is written in Java, and AFL (and libafl) are written in C, we needed to bridge all communication between the two. For this purpose, we have used the Java Native Interface (JNI), which is part of the Java platform. JNI allows code running in the Java Virtual Machine (i.e. LearnLib) to interface with platform-specific native binaries or external libraries (i.e. libafl).

– We have added the possibility to embed the target program in AFL’s fork server. For each membership or test query, the fork server creates a new instance of the target process. This speeds up the execution of learning, independent of the technique used to find counterexamples.

Figure 6.1: Architecture for combining LearnLib and AFL. (Components shown: JVM, LearnLib, libafl, JNI, AFL fork server, target process; edge labels: queries, setup.)

There were some other issues that we had to address:

– AFL is designed such that it does not care about the target program’s output. Instead, only coverage data is used as a measure of test case relevancy. The learner, however, relies on output behaviour. Therefore, we have extended AFL to always save data from the target’s stdout into a shared memory buffer (shared between libafl and the fork server process). The content of this shared memory buffer is returned to LearnLib after a successful query.

– AFL runs the target program in a non-interactive manner, i.e. it provides the program with input once and then expects it to terminate and reset state. This is in contrast to the default behaviour of LearnLib, which expects a single-step system under learning that repeatedly accepts an input value and returns the associated output, and has an explicit option to reset. We initially simulated this behaviour in AFL by running the target program once for each prefix of an input sequence. For the RERS challenge, however, we could run each input sequence once, as it was easy to correlate individual inputs to their corresponding outputs.
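The prefix-based simulation mentioned in the last item can be sketched as follows. This is an illustrative Python sketch, not the libafl code. The function `run_once` is a hypothetical stand-in for one non-interactive execution of the target in a fresh process, assumed to return everything the target wrote to stdout. The sketch further assumes that the target is deterministic, so that the output for a prefix is itself a prefix of the output for any longer input sequence. The toy target is invented for the example.

```python
from typing import Callable, List, Sequence


def step_outputs(run_once: Callable[[Sequence[str]], str],
                 inputs: Sequence[str]) -> List[str]:
    """Recover one output per input by running the target once per prefix.

    The output attributed to the k-th input is whatever the run on the
    first k inputs printed beyond the run on the first k-1 inputs.
    """
    outputs: List[str] = []
    previous = ""
    for k in range(1, len(inputs) + 1):
        # Each call models a fresh process, so the target state is reset.
        current = run_once(inputs[:k])
        # Assumption: deterministic target, so earlier output is a prefix.
        if not current.startswith(previous):
            raise ValueError("target is not deterministic on a shared prefix")
        outputs.append(current[len(previous):])
        previous = current
    return outputs


def toy_target(inputs: Sequence[str]) -> str:
    """Toy non-interactive target (an assumption for this example):
    a turnstile that stays silent on "coin" and answers every "push"."""
    stdout, balance = "", 0
    for symbol in inputs:
        if symbol == "coin":
            balance += 1
        elif symbol == "push" and balance > 0:
            balance -= 1
            stdout += "open;"
        elif symbol == "push":
            stdout += "locked;"
    return stdout
```

For the sequence `["push", "coin", "push"]`, the sketch runs the toy target three times and attributes an empty output to the silent `coin` input. The price of this approach is one execution per prefix, so an input sequence of length n requires n runs; running each sequence once, as was possible for the RERS problems, avoids this overhead.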

We have performed some initial experiments with the setup described above. In these experiments, we compared different learning setups on their ability to find error states in the reachability problems of the RERS 2015 challenge. For these problems, the number of reachable error states is now known.

A selection of the results is shown in Table 6.6. In addition to the number of states learned and the number of reachable error states found, this table compares learning performance in terms of learning time and the number of queries needed (lower is better). In all cases, using fuzzing for equivalence testing delivers models with more states and more reachable error states found, in a shorter learning time. One remark here is that the learning time we report only includes the time the learning process ran, not the time that the fuzzer ran. We ran the AFL fuzzer on each problem for one day, and the test cases that were generated during that time were used for equivalence testing during the learning process.

Table 6.6: Results for the RERS 2015 challenge problems on an Intel Xeon CPU E5-2430 v2 @ 2.50GHz (virtualised server), with Oracle Java 8 JVM configured with 4GB heap.

problem  method         states  errors  time  queries
1        TTT, W-method  1 25    19/29   4s    7 342
1        L*, W-method   8 25    19/29   13h   2.46 × 10^8
1        TTT, fuzzing   334     29/29   21s   16 731
1        L*, fuzzing    1 027   29/29   44m   2.86 × 10^6
2        TTT, W-method  1 188   15/30   1h    8.15 × 10^6
2        L*, W-method   3 195   15/30   17h   2.39 × 10^7
2        TTT, fuzzing   2 985   24/30   13m   412 340
2        L*, fuzzing    3 281   24/30   13h   4.21 × 10^7
3        L*, W-method   1 798   16/32   110h  2.42 × 10^9
3        TTT, fuzzing   1 054   19/32   13m   698 409
3        L*, fuzzing    1 094   19/32   13h   2.34 × 10^7
4        TTT, W-method  7 21    1/23    4h    5.17 × 10^7
4        TTT, fuzzing   7 402   21/23   16m   458 763
5        L*, W-method   1 183   15/30   13h   2.20 × 10^6
5        TTT, fuzzing   3 376   24/30   8m    416 943
6        L*, W-method   1 671   16/32   93h   8.89 × 10^8
6        TTT, fuzzing   3 909   23/32   45m   2.80 × 10^6

6.6 Conclusion

An ongoing challenge for learning algorithms formulated in the Minimally Adequate Teacher framework is to efficiently obtain counterexamples. In this chapter we have compared and combined conformance testing and mutation-based fuzzing methods for obtaining counterexamples when learning finite state machine models for the reactive software systems of the RERS challenge. We have found that for the LTL problems of the challenge the fuzzer did not find any additional counterexamples for the learner, compared to those found by the tester. For the reachability problems of the challenge, however, the fuzzer discovered more reachable error states than the learner and tester, albeit in some cases the learner and tester found some that were not discovered by the fuzzer. This leads us to believe that in some applications, fuzzing is a viable technique for finding additional counterexamples for a learning setup.

Protocol Message Format Inference and its Applications in Security

Rick Smetsers, Joeri de Ruiter, Sicco Verwer, and Erik Poll

Abstract

A promising application of model learning is in the area of protocol inference. Protocol inference refers to some automated form of reverse engineering the workings of a communication protocol. This can be useful for security analysis in different ways. It can be used to reverse-engineer unknown protocols, to detect security flaws in implementations of known protocols, to fingerprint implementations, or to detect anomalies in protocol usage, for example.

A prerequisite for using model learning in this area is that the protocol’s message format (i.e. input format) is known. In this chapter we give an overview of tools and techniques for inferring the protocol message format, and their applications in security.

Chapter 7

7.1 Introduction

Protocols play a crucial role in modern-day IT systems. They are used between parties that communicate across a network (the prototypical example being TCP/IP), between different hardware components (e.g., USB), between processes on the same machine (for example OS services, such as a CUPS printer service), and between different components within one process (e.g., the protocols provided by APIs).

Protocols are of great importance for the security of these systems, as any interface at which a system can be attacked comes with an associated protocol. An attacker can try to exploit security flaws in the protocol itself or in a particular implementation of the protocol. For protocols that involve cryptography, such flaws may be of cryptographic nature, but more often than not they are more mundane implementation mistakes, such as the SSL Goto bug on Apple iOS or the Heartbleed bug in OpenSSL.

Even if different implementations of the same protocol do not contain exploitable flaws, they can (and often do) exhibit differences in behaviour. This is often the case because there is some form of freedom or ambiguity in the protocol’s specification (if such a specification exists at all), or simply because the implementations contain mistakes. As a result, an implementation may have unique characteristics that provide a fingerprint of that implementation. Such fingerprints can be interesting for attackers, as they leak information about the system being targeted.

The characteristics of the usage of a protocol may also be used as a basis for anomaly detection. Deviations from the normal protocol usage may indicate malicious intent, and can thus be used for intrusion detection.

Attackers not only try to attack the protocols that their victims use, but they may also use their own protocols as part of their attacks. The prime example here is that controlling a botnet requires some communications protocol between the bots and the command and control centre. Reverse engineering such protocols, and ideally finding security flaws, may be useful to take botnets down.

This importance of protocols has motivated a need for formalisms in which different protocol implementations and specifications can be described in a similar way. One way to achieve this is by viewing a protocol implementation not via its internal structure, but through the laws which govern its behaviour: which input messages does it accept at which point, and which messages does it produce in response? This way, implementations (and specifications) that exhibit completely dissimilar compositions, for example because they are written in different programming languages, can still be characterised and analysed through the same set of rules.

There are typically two levels at which one can formally describe the behaviour, or ‘language’, of a protocol: the protocol message format and the protocol state machine. The former describes the structure for individual valid messages in the protocol and the latter describes the temporal control-specific behaviour and data dependencies of messages that make up a protocol session.
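To make the distinction between these two levels concrete, the following Python sketch describes a small invented protocol at both levels. Everything in it is an assumption made for illustration: the three message types, the header layout (a one-byte type followed by a two-byte big-endian payload length), and the session rules do not correspond to any real protocol. The function `parse_message` captures the message format, and the table `STATE_MACHINE` captures which message is accepted in which state and which response it produces.

```python
import struct
from typing import Dict, List, Sequence, Tuple

# Level 1: message format (hypothetical layout: type byte, 16-bit length, payload).
MESSAGE_TYPES: Dict[int, str] = {1: "HELLO", 2: "AUTH", 3: "DATA"}


def encode(msg_type: int, payload: bytes = b"") -> bytes:
    """Build a message in the toy format."""
    return struct.pack(">BH", msg_type, len(payload)) + payload


def parse_message(raw: bytes) -> Tuple[str, bytes]:
    """Check a single message against the toy format and split it up."""
    if len(raw) < 3:
        raise ValueError("message shorter than header")
    msg_type, length = struct.unpack(">BH", raw[:3])
    payload = raw[3:]
    if msg_type not in MESSAGE_TYPES or length != len(payload):
        raise ValueError("malformed message")
    return MESSAGE_TYPES[msg_type], payload


# Level 2: state machine, as (state, message) -> (next state, response).
STATE_MACHINE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("start", "HELLO"): ("greeted", "HELLO_ACK"),
    ("greeted", "AUTH"): ("authenticated", "AUTH_OK"),
    ("authenticated", "DATA"): ("authenticated", "DATA_ACK"),
}


def run_session(messages: Sequence[bytes]) -> List[str]:
    """Feed well-formed messages through the state machine.

    A message that is valid on its own but not allowed in the current
    state is answered with "REJECT" and leaves the state unchanged.
    """
    state, responses = "start", []
    for raw in messages:
        name, _payload = parse_message(raw)
        state, response = STATE_MACHINE.get((state, name), (state, "REJECT"))
        responses.append(response)
    return responses
```

In this sketch, a `DATA` message sent before authentication is well-formed according to the message format, yet it is rejected by the state machine. Conversely, a message whose length field does not match its payload is rejected by `parse_message` regardless of the session state.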

In this chapter, we give an overview of tools and techniques for inferring the protocol message format and their applications in security, with the aim of bridging the gap between the academic and applied worlds and stimulating further research in the area. It was observed by Bossert and Guilhéry that there is a huge difference between the academic and the applied world in the field of protocol inference for security applications [24]. In the applied world on the one hand, “experts can be [seen] as fighters specialized in one-lines commands [that are] able to compute any CRC and format trans-coding by heart”. In the academic world on the other hand, papers emerge in different subfields of (software) engineering that use different terminology for similar problems related to protocol inference. This has resulted in a dissonance between researchers and security experts.

Recently, two other survey papers have appeared on the subject of protocol inference [106, 52]. Both of these surveys explicitly focus on the tools that have been proposed. Our survey contains such an overview as well, but in addition gives a comprehensive and cohesive overview of the techniques that these tools implement. We believe that this approach is more useful in bridging the gap between the academic and applied worlds, and in stimulating further research in the area.

Overview. Most of the work on protocol inference focuses on communications protocols (although the techniques can be applied elsewhere as well). Therefore, we first introduce the terminology of communications protocols in Section 7.2. Then, in Section 7.3, we give a general classification of protocol inference techniques. In Section 7.4 we describe the techniques that have been proposed for reverse engineering message formats. In Section 7.5 we give an overview of the applications of these approaches in security. Finally, in Section 7.6 we conclude our work.