False Positives and Negatives - Security-Pattern Recognition and Validation

10.3 Discussion

10.3.2 False Positives and Negatives

Our tool produces both false positives and false negatives, but false positives are considerably rarer (99% precision compared to 80% recall). False positives can give the analyst the false impression that certain security features are used when they are not; at the very least, they may puzzle the analyst. However, because the false-positive rate is comparatively low, this does not pose a threat to our approach.

False negatives are pattern instances that our tool misses. Nevertheless, the tool detects a significant portion of correct pattern instances in code (199 of 248), which saves time and effort compared to solely carrying out a manual analysis. Without the tool, an analyst must manually note down trace hierarchies. Otherwise, she would end up visiting the same code locations several times during the review, e.g., if the security-relevant code is scattered across several classes/methods (see also Section 9.2).
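The reported figures can be checked with a few lines of arithmetic. The following sketch recomputes the recall from the 199 detected instances out of 248 and, assuming that the 99% precision value is exact (in the text it is presumably rounded), estimates how many reported instances would be false positives. The function names are illustrative and not part of our tool.

```python
def recall(true_positives: int, actual_instances: int) -> float:
    """Share of the actual pattern instances that were detected."""
    return true_positives / actual_instances


def implied_false_positives(true_positives: int, precision: float) -> float:
    """Solve precision = TP / (TP + FP) for FP."""
    return true_positives * (1.0 - precision) / precision


detected = 199    # correctly detected pattern instances (from the text)
expected = 248    # pattern instances present in the code (from the text)
precision = 0.99  # reported precision; assumed exact here, likely rounded

print(f"recall: {recall(detected, expected):.3f}")
print(f"implied false positives: {implied_false_positives(detected, precision):.1f}")
```

Under these assumptions, the recall is roughly 0.80 and the precision corresponds to about two incorrectly reported instances, which matches the statement that false positives are rare compared to false negatives.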

According to the security experts from SAFECode, the fact that expected pattern instances are missing is important information for a security analyst. If she finds fewer pattern instances than expected, she may focus the security analysis on those particular security requirements. If it turns out that security features have been self-implemented (in general a bad practice), a more thorough manual assessment is required. For example, the many self-implemented Authentication Enforcer patterns, which do not use well-tested authentication features (such as the Android account manager API), must be analyzed in more detail.

10.4 Summary

The pattern-variant detection has been evaluated in a case study with 25 non-trivial Android applications and shows acceptable results for precision and recall. In addition, this benchmark of benign Android apps can be used by researchers in other projects. Our tool can be used for reviewing Android applications at earlier development stages and before the software is released. It also supports security audits by automatically detecting source code locations of implemented security features and tracing a security-relevant object’s configuration. Analysts can utilize the visual representation of connected objects to understand security-object interactions. We see our tool as an enhancement for reviewing Android applications rather than a replacement for common vulnerability detection tools, which only support highlighting of erroneous source code locations. In the end, this work can be regarded as a first step towards supporting security analysts with regard to program comprehension.

Part IV: Finale

11 Related Work

The research related to this thesis can be divided into several areas. First, works on security patterns are presented, covering the subtopics collection, classification, and description quality. Second, design and security pattern detection in software systems is described, followed by the detection of security concepts on the architectural level. Then, analysis techniques related to our COPG approach are described. Finally, as this work also focuses on analyzing Android applications (Chapters 8 and 9), comparable static analysis approaches tailored towards Android specifics, such as the app lifecycle simulation, are presented briefly in Section 11.5.

11.1 Security Patterns

This thesis tackles the security pattern topic in several ways, and related work on these topics has already been presented in several earlier sections. Thus, the next sections only briefly present the topics collection, classification, and description quality.

11.1.1 Collection

Security patterns are often scattered across several scientific works. For this reason, the SecurityPatterns.org website, for example, collects several security pattern publications to provide an overview of the available security patterns [439]. Moreover, three main works in book form have been released [144, 437, 464], and some other pattern collections have been published (Table 11.1). Heyman et al. [238] collected the largest number of security patterns (220). Unfortunately, they did not publish their catalog.

All catalogs only cover parts of all published security patterns since 1996. This thesis collects 528 published patterns from the years 1996 to 2016 and provides a holistic overview of these patterns.

Venue                        Covered Period of Time   Number of Collected Patterns
Kienzle et al. [282]         n/a                      29
Blakley et al. [50]          n/a                      13
Schumacher et al. [437]      n/a                      46
Steel et al. [464]           n/a                      22
Heyman et al. [238]          1996–2005                220
Yskout et al. [540]          n/a                      35
Dougherty et al. [115]       n/a                      15
SecurityPatterns.org [439]   1997–2009                180
Fernandez [144]              n/a                      62
Hafiz [205]                  1997–2013                97

Table 11.1 – Overview of venues collecting security patterns.

11.1.2 Classification

Security patterns can be used like design patterns to structure software. However, there are also security patterns that describe security requirements or processes for enterprises, e.g., the Enterprise Architecture Management Pattern [136]. Thus, many different classifications exist. They have already been presented and discussed extensively in Section 4.2. In conclusion, each of the discussed classifications considers only a few patterns.

In this thesis, two classification approaches for all collected security patterns with regard to “application” and “software reengineering” concerns have been presented (Section 4.4 and Section 4.5). The first classification scheme is based on all collected security patterns and shaped towards the selection by application domains, which is relevant for researchers and practitioners who are interested in security patterns. The second classification shows in detail which security and implementation forces the software-security patterns have.

11.1.3 Description Quality

With the increasing number of publications, the quality partially decreases [238]. Some works show that the quality of security pattern descriptions varies and is sometimes insufficient. Yoshioka et al. examine patterns according to their ease of use, effectiveness, and sufficiency in general and give some examples [539].

Halkidis et al. [213] inspect the patterns collected by Blakley et al. [50] according to the ten principles of building secure software [506], three software development problems with regard to security (buffer overflow, poor access control, and race condition), and the STRIDE model.

A more general approach that is not related to security has been presented by Laverdiere et al. [300]. They utilize the House of Quality evaluation framework to evaluate twelve security patterns. Moreover, they show that the inspected patterns have many undesired properties, such as a lack of generality and under- or over-specification.


The appropriateness and quality of security-pattern documentation have been examined by Heyman et al. [238]. They use a scoring system to measure the quality of a description element and show that core patterns have medium to high quality. In contrast, the guidelines, best practices, and process activity patterns have a documentation quality between low and medium.

Security pattern investigations with regard to quantity and in particular their quality have been conducted for only a few security patterns. In this thesis, the description quality has been assessed by inspecting the usage of UML figures and given code examples of all collected software-security patterns.

11.2 Pattern Detection

Design patterns are often used in software development to design software systems. Reengineering these patterns gives maintenance programmers and analysts specific information on implemented solutions. This section focuses on the two pattern types discussed in this thesis: design and security patterns.

11.2.1 Design Patterns

In the area of design-pattern detection, several approaches with different analysis styles exist. The detection techniques used span many areas, such as fuzzy sets, machine learning, constraint satisfaction, and graph theory. Comprehensive overviews of the available approaches are given by Dong et al. [112, 113], Rasool and Streitferdt [379], Ampatzoglou et al. [20], and Al-Obeidallah et al. [8]. Thus, only some approaches are listed in the following to give a brief overview:

Structural: Pattern instances are detected using static program information. Such analyses inspect inter-class relationships and method invocations [28, 30, 33, 47, 100, 141, 198–200, 278, 293, 315, 362, 453, 497, 509, 511, 548].

Behavioral and structural: These analyses consider behavioral program aspects in addition to the structural analysis information. They are extracted using static and dynamic analysis techniques [12, 24, 104, 237, 275, 301, 314, 350, 440, 466, 497, 498, 515, 524, 542, 543].

Structural and semantic: Enhancing structural analysis with semantic information aims to decrease the false-positive rate of the structural detection. Semantic information includes, for example, naming conventions, annotations, or metrics that are used to retrieve role information for pattern components [24, 380, 440, 489].

Structural, behavioral, and semantic: Analyses that use all three types aim to combine the benefits of each analysis type to reduce the number of incorrectly detected pattern instances [51, 111, 114, 445, 446].

Some of these approaches use analyses on graphs to detect patterns. The OPG is also a kind of graph, which we use to detect patterns. Thus, the next paragraphs will present approaches using a UML or graph representation of the source code in more detail.

In the approach of Seemann and von Gudenberg, a compiler collects method calls and inheritance hierarchies from Java code [440]. The resulting graph is then filtered to detect Strategy, Bridge, and Composite design patterns.

The Columbus tool analyzes C++ source code and uses an internal schema that captures the C++ language in a low-level representation (Abstract Semantic Graph (ASG)) enhanced with higher-level elements such as the semantics of types [141]. The main program information, such as classes, attributes, and their relationships, is extracted from the ASG, written in PROLOG format, and passed to another tool called Maisa. Maisa performs the automatic design pattern detection with its integrated pattern library.

Balanyi and Ferenc introduce an XML-based language (Design Pattern Markup Language (DPML)) that allows customizable pattern descriptions for the detection [30]. They use the Columbus tool to extract the main program information as described before. Then, the DPML-based pattern description file is loaded into an XML DOM tree. Finally, the detection algorithm matches the DOM tree to the ASG to identify implemented pattern instances.

A comparable approach has been presented by two other works [445, 446]. They use an Abstract Syntax Tree (AST) data structure built from static analysis information. From the AST, they extract data-flow and control-flow information to obtain static behavioral information for the pattern detection.
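To illustrate the kind of information that AST-based approaches work with, the following sketch uses Python's standard `ast` module on a small Python snippet. The cited works target other languages and use their own tooling, so this is only an analogy and not their implementation. The sketch collects inheritance relations and the method calls made inside each class, i.e., simple structural facts and a very coarse form of static behavioral information; a real data-flow or control-flow analysis is not attempted. The sample source and the function name `extract_facts` are made up for illustration.

```python
import ast

SAMPLE = '''
class Strategy:
    def run(self):
        pass

class Fast(Strategy):
    def run(self):
        self.prepare()

class Context:
    def __init__(self, strategy):
        self.strategy = strategy

    def execute(self):
        self.strategy.run()
'''


def extract_facts(source):
    """Return (inheritance, calls) extracted from the AST of `source`.

    inheritance: list of (subclass, superclass) pairs
    calls: list of (class, method, called_name) triples
    """
    tree = ast.parse(source)
    inheritance, calls = [], []
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        # Structural fact: direct superclasses given by simple names.
        for base in cls.bases:
            if isinstance(base, ast.Name):
                inheritance.append((cls.name, base.id))
        # Coarse behavioral fact: attribute-style calls inside each method.
        for method in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
            for node in ast.walk(method):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                    calls.append((cls.name, method.name, node.func.attr))
    return inheritance, calls


inheritance, calls = extract_facts(SAMPLE)
print(inheritance)
print(calls)
```

A detection step could then combine both fact lists, for example by checking whether a class delegates a call to a method that is declared in a superclass of another class.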

The approach presented by Tsantalis et al. [497] is based on the idea that a class diagram is a directed graph that can be mapped to a matrix. Which information is represented in the matrix depends on the characteristics of the pattern to be detected. The pattern to be detected is likewise represented as matrices. In the next step, inheritance trees are built from the inheritance hierarchies. Then the inspected software system is split into subsystems consisting of classes belonging to one or more hierarchies. Finally, an algorithm calculates the similarity between the subsystem matrices and the pattern matrices to extract the pattern instances within the subsystems.
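The matrix idea can be sketched as follows. The example encodes a tiny class diagram and a Composite-like pattern as one adjacency matrix per relationship type and then scores every assignment of classes to pattern roles. Note that the score used here, the fraction of pattern relationships that also exist in the system, is a naive stand-in chosen for brevity; it is not the similarity algorithm of Tsantalis et al., and the splitting into subsystems is omitted. All class names, role names, and function names are hypothetical.

```python
from itertools import permutations


def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix with m[i][j] = 1 if there is an edge nodes[i] -> nodes[j]."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def match_score(system, pattern, assignment):
    """Fraction of pattern relationships present in the system.

    assignment[i] is the index of the class that plays pattern role i.
    """
    hits = total = 0
    for relation, pattern_matrix in pattern.items():
        system_matrix = system[relation]
        for i, row in enumerate(pattern_matrix):
            for j, value in enumerate(row):
                if value:
                    total += 1
                    hits += system_matrix[assignment[i]][assignment[j]]
    return hits / total


def best_match(classes, system, roles, pattern):
    """Try every assignment of distinct classes to roles and keep the best one."""
    best_score, best_roles = -1.0, None
    for assignment in permutations(range(len(classes)), len(roles)):
        score = match_score(system, pattern, assignment)
        if score > best_score:
            best_score = score
            best_roles = {role: classes[c] for role, c in zip(roles, assignment)}
    return best_score, best_roles


classes = ["Shape", "Circle", "Group", "Editor"]
system = {
    "inherits": adjacency_matrix(classes, [("Circle", "Shape"), ("Group", "Shape")]),
    "associates": adjacency_matrix(classes, [("Group", "Shape"), ("Editor", "Shape")]),
}
roles = ["Component", "Leaf", "Composite"]
pattern = {
    "inherits": adjacency_matrix(roles, [("Leaf", "Component"), ("Composite", "Component")]),
    "associates": adjacency_matrix(roles, [("Composite", "Component")]),
}

score, assignment = best_match(classes, system, roles, pattern)
print(score, assignment)
```

The exhaustive search over all role assignments is only feasible for very small inputs, which hints at why restricting the comparison to subsystems built from inheritance hierarchies is useful in practice.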

Yu et al. present an approach based on sub-patterns [542, 543]. First, they create sub-patterns of GoF patterns. Then, they transform the source code into directed graphs. Such a graph consists of nodes representing classes, while the edges and their weights model the relationships between the classes. The pattern instances are identified by a subgraph discovery method in which the extracted subgraphs are matched with predefined structural features. Finally, behavioral characteristics of the GoF patterns are considered to obtain the final pattern instances.
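A weighted class graph and the search for a simple structural feature can be sketched as shown below. The encoding of relationship types as prime numbers that are multiplied into one edge weight is an assumption of this sketch, as are the class names and the chosen feature (a class that both inherits from and holds a reference to the same class). The sketch covers only the structural matching; the final behavioral check is left out.

```python
INHERITS, ASSOCIATES = 2, 3  # hypothetical prime codes for relationship types


def build_graph(relations):
    """Map (source, target) -> weight, where the weight is the product of
    the prime codes of all relationships from source to target."""
    graph = {}
    for src, dst, code in relations:
        graph[(src, dst)] = graph.get((src, dst), 1) * code
    return graph


def has_relation(graph, src, dst, code):
    """True if the edge src -> dst encodes the relationship `code`."""
    return graph.get((src, dst), 1) % code == 0


def find_inherit_and_associate(graph):
    """Find (child, parent) pairs where the child both inherits from and
    holds a reference to the parent."""
    return sorted(
        (src, dst)
        for (src, dst) in graph
        if has_relation(graph, src, dst, INHERITS)
        and has_relation(graph, src, dst, ASSOCIATES)
    )


graph = build_graph([
    ("Circle", "Shape", INHERITS),
    ("Group", "Shape", INHERITS),
    ("Group", "Shape", ASSOCIATES),
    ("Editor", "Shape", ASSOCIATES),
])

print(graph[("Group", "Shape")])
print(find_inherit_and_associate(graph))
```

Because each relationship type has its own prime factor, a single edge weight can carry several relationships at once, and a divisibility test recovers each of them.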

One of the latest approaches in this area focuses on detecting new pattern variants of the GoF patterns with four already available pattern detection tools [513]. As far as we know, no existing approach or similar technique uses OPGs for the detection or has been tested with Android apps.
