Modelling Protection Architectures - Generalized Vulnerability and Exploitation Pattern

4.3 Generalized Vulnerability and Exploitation Pattern

4.3.1 Modelling Protection Architectures

4.3.3 Breach Access Controls on Intermediated Level

4.3.4 Denial-of-Service a Shared Resource

4.3.5 Identification of Attack Surfaces in AMP Systems

4.3.6 Primary and Secondary Assets

4.3.7 Attack Objectives and Scenarios

"The essence of hacking is finding unintended or overlooked uses for the laws and properties of a given situation and then

applying them in new and inventive ways to solve a problem whatever it may be"

Jon Erickson - Hacking: The Art of Exploitation

Vulnerability assessment is an offensive method to determine in which ways a ToE is exploitable. Analysts should think as attackers would; in other words, they hack their own devices. Hacking is considered to be an art, sometimes referred to as esoteric engineering [40]. Truth be told, computer science and esoteric methods do not compare very well in the first place. However, a significant security research community contributes to the improvement of computer system protection through its work on newly discovered vulnerabilities. These researchers optimise the effectiveness of attack methods, degrade the performance of countermeasures or simply circumvent them. Either way, the aim is to stay one step ahead of adversaries, because this knowledge can be used to build a proper protection strategy.

One might ask why it makes sense to think about the exploitability of one's own system. Why do researchers spend effort in that direction and not the other way around? System architectures, particularly protection architectures, are built on assumptions. One example is the anticipated capability of an adversary, particularly when it comes to cryptographic protocols. What hackers mostly do is break the assumptions on which such systems are built, rather than break a security countermeasure directly [44]. Such countermeasures, cryptographic protocols for example, are publicly reviewed or even formally proven. Breaking the mathematics behind them requires a high investment of time and resources. In many cases, it is much easier to leak, for example, the private key material from a communication endpoint.

As a result, vulnerability assessment during the development of a system can be seen as a test of assumptions. This is particularly true for the AMP paradigm. Hardware was designed with traditional SMP hypervisor paradigms in mind. Even though modern MPSoCs are heterogeneous and highly integrated systems, it might be obvious that if an administrative software layer is removed in order to design an AMP system, some security issues arise.

Within the security cycle, the vulnerability assessment covers the inductively driven phase. It aims to discover specific instances of vulnerabilities and attacks.

In this chapter, the hypothesised attacks defined in Section 3.3.4 are conceptually analysed. The results on cache thrashing are based on the findings of Schnarz et al. in [144, 146]. This considers the hypothesised attack Hyp-Attack1: PE_i disrupts PE_j access to LLC, and is elaborated upon in Section 4.1. Furthermore, the hypothesised attack "PE's memory base is tampered with by adjacent PE" is examined in Section 4.2. The findings are based on the results of Schnarz et al. [143, 145]. Both misuse cases are evaluated following the concept introduced in Section 2.2.1. In Section 4.3, a generalised analysis of the findings is described.

4.1 Hyp-Attack1: PE_i disrupts PE_j access to LLC

4.1.1 Vulnerability Analysis

The inherent aim of Denial of Service (DoS) attacks is to interfere with the performance of the targeted entity. The attack degrades the performance to a level where particular functions cannot be executed with the intended quality. In practice, this might be the over-commitment of a certain capability of a service, such as the handling of messages or requests. The service is forced to process illicit data and is prevented from fulfilling its original functionality.

Cache thrashing, in general, is a phenomenon which appears due to inappropriate use of a processor's caching infrastructure. Two scenarios cause cache thrashing. The first is when the cache coherence protocol (such as Modified Exclusive Shared Invalid (MESI)¹) is forced to synchronise data between processors or main memory due to concurrent accesses. The second is when two or more entities (such as processors) exhaust the limited capacity of the shared cache and force the infrastructure to evict and re-fetch data inefficiently. This work is concerned with the latter case.

The result of cache thrashing is an increase in the latency of accesses to data in memory. In other words, the memory bandwidth and therefore the performance of the system degrades. Each time an entity (E1) accesses memory, data or instructions are loaded into a particular location of the cache. If another entity (E2) concurrently accesses the same location inside the cache, the cache management first writes the former values back to main memory to free the space for E2. The converse is also true: if E1 regains the cache location, E2's data are evicted. This causes the interference between the two entities.

¹ MESI is a protocol to enforce the coherence of data shared in multiprocessor systems.

Cache thrashing can appear on several system levels. On the application level, for example, two or more concurrent processes or threads could interfere with each other. In this case, both types of cache thrashing are possible, whereas the thrashing induced by the coherence protocol at this level impacts applications which frequently access common data structures. Good examples of this situation are sorting algorithms [104].

Attack Objective and Scenario

From an adversary's perspective, cache thrashing can be induced to mount a DoS attack on a target MC-domain. Particularly in AMP-based MC-systems, the second type of cache thrashing comes into focus. Whereas the cache coherence based thrashing applies to shared data, the eviction of data of concurrent processing elements does not rely on the existence of shared data. Under those circumstances, the adversary just needs to find a way to force the eviction of its target's data.

For simplicity, the concept is shown for the driver information system, considering two MC-domains. Each MC-domain runs a private PE, including a private L1 cache. Both PEs share an L2 cache of a certain size (compare with Section 3.1.2 for the detailed case study description). Figure 4.1 depicts the considered cache thrashing scenario. MC2 is considered to be the targeted domain, thus the victim of the attack. Hence, the attacker, who is assumed to reside in MC1, aims at deliberately degrading the performance of MC2 when it operates on the target frame denoted by t. It is assumed that t contains data which is critical to the overall mission of the function. In practice, this might be a critical value from a sensor in the vehicle.

Caching Terminology and Design

The general intention of the integration of caches into processors is to speed up access to frequently used data in memory. The memory in computing systems is hierarchically organised [65]. Caches usually consist of several storage levels which contain the data and instructions. A particular kind of cache-logic manages the data inside the cache storage. Typical functions of the cache-logic are the coherence protocol and the write-back strategy. Data in the cache is typically referenced via memory addresses. Two types of caches exist with respect to addressing: virtual caches, which are sometimes referred to as logical caches, and physical caches [160]. In physical caches, the requested addresses are first translated by an MMU. In logical caches, the data inside the cache is referenced by virtual addresses. In this work, the scope is on physical caches. In order to reference the data in the cache, the respective requested memory address is divided into a tag and a word section. This is elaborated upon in the later sections.

[Figure: physical memory with the memory partitions MC1 and MC2 and the target frame t; PE1 and PE2; shared LLC]

Fig. 4.1 Considered cache thrashing scenario.

Apart from the highest memory level, which is formed by the processor's registers, several cache levels can be implemented. The cache level closest to the processor is commonly referred to as L1. Further levels are denoted as L2 or Ln, respectively. The cache level closest to the main memory is commonly referred to as the LLC. In multi-core systems, some cache levels are private to a processor, and some are shared between multiple processors. The size of the caches usually increases from L1 towards the LLC.

Three important parameters of caches are the cache size, the cache-line size and the associativity [70]. Aside from the cache size, which defines the capacity, the smallest addressable entity within a cache is the Cache Line (CL). CLs have a fixed size, for example 64 bytes. Data which are loaded from the main memory into a particular CL are referred to as a Memory Line (ML).

The relationship between the memory and the LLC is referred to as the associativity scheme. It defines the number of possible locations in the cache to which a single ML can be loaded. The associativity degree is usually fixed and therefore immutable. The location to which an ML is loaded is denoted as CL_i. The mapping between the LLC and main memory can be fully associative or organised into way-sets. Fully associative means that each ML can be loaded to any CL position in the LLC. However, for the sake of efficiency, in the vast majority of LLC implementations, caches are divided into Way-Sets (WSs) [160]. A WS has a certain size, which is defined by the number of CLs it can contain. If a WS contains eight CLs, the cache is referred to as an 8-way set-associative cache. A commonly used associativity for LLCs is 16-way. In way-set caches, a specific ML is always loaded into a specific way-set. This principle is depicted in Figure 4.2.

[Figure: the first v blocks of main memory (B0 ... Bv-1) mapped to the cache-memory sets set0 ... setv-1, each holding the lines L0 ... Lk-1]

Fig. 4.2 Cache associativity mapping. [160, p. 132]

To put it another way, a fully associative cache consists of a single way-set. Referring to Stallings [160] and Hill et al. [70], the following applies for way-set caches.

$$m = v \times k$$

$$i = j \bmod v$$

where

i = way-set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
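As a small illustration of this mapping rule, the following Python sketch computes the way-set number for a range of main memory block numbers. The values v = 4 and k = 2 as well as the function name are assumptions chosen purely for illustration; they are not taken from the case study.

```python
def way_set_of_block(j: int, v: int) -> int:
    """Return the way-set number i of main memory block j (i = j mod v)."""
    return j % v


v = 4      # number of sets (illustrative value)
k = 2      # number of lines in each set (illustrative value)
m = v * k  # number of lines in the cache

# Blocks j, j + v, j + 2v, ... all compete for the k lines of the same set.
conflicting = [j for j in range(16) if way_set_of_block(j, v) == 1]

print("lines in cache:", m)
print("blocks mapped to way-set 1:", conflicting)
```

The list shows that every v-th block lands in the same way-set, which is the property an adversary relies on when selecting conflicting memory lines.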

The replacement algorithm determines the specific position within a WS (denoted by CL_i) into which an ML is loaded. Depending on the implementation, the position is selected by a specific algorithm or by a randomised scheme. The CL that gets replaced is written back (evicted) to the main memory. In the literature, the situation where requested data does not reside in the cache is referred to as a cache-miss. Conversely, when the data resides in the cache, a cache-hit occurs. Optimising the cache-hit rate of caches is the target of a wide range of research [70], whereas increasing the cache-miss rate is the target of the DoS attack. In Table 4.1, the key caching parameters are summarised.

Table 4.1 Caching terminology and parameters.

Sign          Description
WS            Way-Set
CL            Cache Line
ML            Memory Line
MB            Memory Block
CLSize        Size of a single cache line
CacheSize     Size of the cache
MLSize        Size of a single memory line; equals CLSize
WS_i          Identifies a specific set in the cache
CacheAssoc.   Cache associativity
CL_i          Identifies a specific CL in the cache
ML_i          Identifies a specific ML in the memory
v             Total number of way-sets in the cache
m             Number of CLs in the cache
k             Number of CLs in each way-set
b             Number of memory blocks in main memory

The number of way-sets, denoted by v, is calculated by:

$$v = \frac{\mathit{CacheSize}}{\mathit{CLSize} \cdot \mathit{CacheAssoc.}}$$
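The formula can be checked with a short calculation. The sketch below uses the 64-byte cache line and the 16-way associativity mentioned as examples in the text; the cache size of 1 MiB and the function name are assumptions made only for illustration.

```python
def number_of_way_sets(cache_size: int, cl_size: int, cache_assoc: int) -> int:
    """Return v = CacheSize / (CLSize * CacheAssoc.)."""
    set_bytes = cl_size * cache_assoc  # capacity of one way-set in bytes
    if cache_size % set_bytes != 0:
        raise ValueError("cache size is not a multiple of the way-set capacity")
    return cache_size // set_bytes


# Assumed example: 1 MiB LLC, 64-byte cache lines, 16-way set-associative.
v = number_of_way_sets(cache_size=1024 * 1024, cl_size=64, cache_assoc=16)
print("number of way-sets v =", v)
```

For these assumed values the cache is organised into 1024 way-sets of 16 cache lines each.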

Exploitation of the Cache Associativity

Given the terminology and key parameters of caches, an adversary who aims at degrading the computational performance needs to increase the cache-miss rate for its victim. In order to achieve this goal, the cache associativity is examined in more detail in the following. In general, the memory mapping which is applied by the MMU has no relation to the cache associativity. As a result, if the adversary aims at exploiting the associativity scheme, they need to flood the particular WS in which the targeted ML resides.

Preliminaries and threat capabilities are the parameters the attacker is assumed to know, such as the LLC size, the cache-line size, the associativity degree and the physical address of the targeted memory line. The former parameters are usually part of the technical documentation of the hardware, and it is therefore feasible that adversaries are aware of them. Determining the physical address of the target memory line is out of scope for this consideration. As a result, the adversary is presumed to be aware of the address.

The principle of sharing a k-way-cache by two processing elements working on two different main memory partitions is shown in Figure 4.3. For the sake of simplicity, the private L1 cache is omitted. The example shows PE1 and PE2, each working on two distinct memory lines. Furthermore, this example shows WS0 which is shared by an ML from both memory partitions. In general, this means that each ML belongs to a specific WS in the cache. Such as:

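$$ML_i \in MB_n \rightarrow CL_j \in WS_i$$

[Figure 4.3: physical memory with the memory blocks MB0 ... MBj ... MBb-1 and the memory lines ML0, MLj, MLv-1 in the memory partitions MC1 and MC2; PE1 and PE2; LLC with the way-sets WS0 ... WSj ... WSv-1]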

Attack Tree

Here, the subsequent actions to exploit a shared k-way-set are described. Figure 4.4 depicts the attack tree of the DoS attack. The root of the tree states the ultimate goal of the attack, which is to deny the service of a target MC-domain. Increasing the cache-miss rate for the target is the method for achieving this goal and is stated in the next level of the attack tree.

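[Figure: attack tree with the nodes "DoS target MC-domain", "increase cache-miss rate for target", "flood shared way-set", "determine shared way-set number" and "lookup target physical address", and the initial states "Cache Parameters" and "Target physical address", combined by AND and OR gates]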

Fig. 4.4 Cache DoS attack tree.

As the cache parameters are presumed to be known to the adversary, they appear as an initial state in the attack tree diagram. Those parameters are usually documented in the technical specification of the applied hardware architecture. More importantly, the target physical address is either known by the adversary or looked up in the main memory by an action which is out of scope for this consideration. The core actions are essentially the sub-goals of determining the shared way-set number and flooding the shared way-set. The latter causes the performance impact on the side of the victim, whereas the former seeks to set up the environment. In the following sections, these two actions are detailed.

Determination of Target Set (WS_v)

Given the targeted physical address, the target WS can be identified. In the following, this particular WS is referred to as the victim WS (WS_v). Essential preliminaries are the technical properties of the cache. In this case, these include the cache-line size (denoted by CLsize) and the WS associativity (denoted by k). Given those values, WS_v can be determined by the following equation.

$$WS_v = \frac{\mathit{TargetAddress}(PA)}{\mathit{CLsize}} \bmod v$$
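A minimal sketch of this calculation is given below. It assumes that the division is an integer division, so that the quotient is the number of the memory line containing the address. The target address 0x80012364, the cache geometry (64-byte lines, 1024 way-sets) and the function name are hypothetical values chosen for illustration.

```python
def victim_way_set(target_pa: int, cl_size: int, v: int) -> int:
    """Return WS_v = (TargetAddress(PA) / CLsize) mod v, using integer division."""
    memory_line_number = target_pa // cl_size  # assumption: integer division
    return memory_line_number % v


CL_SIZE = 64           # cache-line size in bytes (example value from the text)
V = 1024               # number of way-sets (assumed)
TARGET_PA = 0x80012364  # hypothetical physical address of the target

ws_v = victim_way_set(TARGET_PA, CL_SIZE, V)
print("victim way-set WS_v =", ws_v)
```

Any address that differs from the target by a multiple of v times CLsize yields the same result, which is what the flooding step exploits.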

Alternatively, WS_v can be directly derived from the physical memory address. The memory addresses for k-way-set caches consist of three portions. These components are tag, set, and word [160].

Tag      Set      Word
t bits   d bits   w bits

where

$$t = s - d \quad \text{(tag size)}$$

$$d = \log_2 v \quad \text{(set size)}$$

The d bits specify a particular set within the cache, which is in this case WS_v.
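The decomposition of an address into its three portions can be sketched with shifts and masks. The sketch assumes that both the cache-line size and the number of way-sets are powers of two, so that w and d are whole numbers of bits; the function name and the example address are hypothetical.

```python
def split_address(pa: int, cl_size: int, v: int) -> tuple[int, int, int]:
    """Split a physical address into its (tag, set, word) portions."""
    for value in (cl_size, v):
        if value <= 0 or value & (value - 1):
            raise ValueError("cl_size and v must be powers of two")
    w = cl_size.bit_length() - 1         # number of word bits
    d = v.bit_length() - 1               # number of set bits, d = log2(v)
    word = pa & (cl_size - 1)            # lowest w bits
    set_index = (pa >> w) & (v - 1)      # next d bits select the way-set
    tag = pa >> (w + d)                  # remaining upper bits
    return tag, set_index, word


# Hypothetical target address, 64-byte cache lines, 1024 way-sets.
tag, set_index, word = split_address(0x80012364, cl_size=64, v=1024)
print(f"tag={tag:#x} set={set_index} word={word}")
```

Under these assumptions the set portion equals the value obtained from the division and modulo form of the equation, since shifting by w bits corresponds to an integer division by the cache-line size.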

Flood Target Way-Set

Flooding a specific WS causes an increase in cache-misses in that particular WS. As has been shown previously, the WSs are the shared memory spaces within the LLC. To introduce a performance impact on the target MC-domain, the adversary just needs to over-commit the particular WS to which the targeted physical address is assigned. Flooding or over-committing is achieved by frequently fetching cache-lines into the WS in such a way that the cache management is forced to evict other cache-lines. If the attacked PE tries to access its data, it suffers from the higher access latency of the main memory due to a cache-miss. The procedure for forcing this cache-miss and reload penalty is detailed in the following. Sufficient memory lines need to be allocated and afterwards accessed very frequently in a loop.

The number of memory lines to be allocated equals the number of cache-lines in the WS (way-set size), which is given by k. An allocation procedure assigns k addresses which fulfil two conditions: they need to correspond to distinct Memory Blocks (MBs), and they all need to have the same MLID (offset) within those blocks. This concept is depicted in Figure 4.5. The assigned addresses are stored in a target address array. In terms of the tag, set, and word notation, the allocation procedure simply needs to allocate and assign data whose addresses have an equal d bit portion but distinct t bit portions.

[Figure: physical memory with the memory blocks MB0 ... MBt ... MBb-1 in the memory partitions MC1 and MC2; the memory line MLi is located at the same offset in several memory blocks]

Fig. 4.5 Target address array allocation principle.
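The allocation principle can be sketched as follows. The sketch assumes that a memory block spans v consecutive memory lines, as the figures suggest, and that the adversary's memory partition is physically contiguous and large enough to hold k blocks. The partition base address, the target address and the function name are hypothetical.

```python
def build_target_address_array(target_pa: int, cl_size: int, v: int, k: int,
                               partition_base: int) -> list[int]:
    """Return k addresses in distinct memory blocks that share the target's MLID."""
    mb_size = cl_size * v                     # bytes per memory block (assumed: v lines)
    ml_id = (target_pa // cl_size) % v        # line offset inside a block = victim way-set
    first_mb = -(-partition_base // mb_size)  # first block boundary at or above the base
    return [(first_mb + n) * mb_size + ml_id * cl_size for n in range(k)]


# Hypothetical values: 16-way LLC with 1024 way-sets and 64-byte cache lines.
addresses = build_target_address_array(target_pa=0x80012364, cl_size=64, v=1024,
                                       k=16, partition_base=0x40000000)
for address in addresses[:3]:
    print(hex(address))
```

All generated addresses carry the same set bits as the target but different tags, so they compete for the same k cache lines without requiring any data shared with the victim.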

The actual flooding is introduced by memory accesses to every entry in the target address array in a loop. Here, the term memory access abstracts either read or write operations on external data in main memory. For example, following the load-store principle of an ARM RISC architecture, those accesses are implemented by instructions such as LDM², STM³, PUSH⁴ and POP⁵.
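The effect of the flooding loop on a single way-set can be illustrated with a small simulation. The sketch assumes a least-recently-used (LRU) replacement policy, which is only one possible choice, since the actual policy may differ or be randomised. The class and function names are hypothetical, and the model ignores timing, private L1 caches and write-backs.

```python
from collections import OrderedDict


class WaySet:
    """One k-way set with LRU replacement (assumed policy)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.lines: OrderedDict = OrderedDict()  # ordered from oldest to newest

    def access(self, tag) -> bool:
        """Access a line; return True on a cache-hit and False on a cache-miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)          # mark as most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)       # evict the least recently used line
        self.lines[tag] = None
        return False


def victim_miss_rate(k: int, rounds: int, attacker_active: bool) -> float:
    """Victim accesses one line per round; the attacker floods the set in between."""
    way_set = WaySet(k)
    attacker_tags = [("adversary", n) for n in range(k)]  # k conflicting lines
    misses = 0
    for _ in range(rounds):
        if not way_set.access(("victim", 0)):
            misses += 1
        if attacker_active:
            for tag in attacker_tags:
                way_set.access(tag)
    return misses / rounds


print("miss rate without flooding:", victim_miss_rate(16, 100, False))
print("miss rate with flooding:   ", victim_miss_rate(16, 100, True))
```

In this simplified model, k conflicting lines accessed between two victim accesses are sufficient to evict the victim's line every time, whereas without flooding only the initial access misses.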

Security Problem Factorization

Timing Consideration

This section considers the timing and scheduling issues of memory accesses that cause the victim to stall. The consideration focuses on the times at which a cache-line is requested, loaded and evicted. The delta between a request and the actual load of a cache-line is the penalty the PE faces due to the cache-miss. Table 4.2 shows the symbols for the timings in the scheduling diagram.

² Load Multiple
³ Store Multiple
⁴ Store data at stack pointer position
⁵ Load data from stack pointer position

Table 4.2 Symbols for cache scheduling.

Symbol   Parameter       Description
r_i      request time    time at which a cache-line is to be loaded (fetched) into the way-set
e_i      eviction time   time at which a cache-line is written back to main memory
l_i      load time       time at which the cache-line is loaded into the way-set
p_i      miss penalty    time introduced due to a cache-miss

Figure 4.6 depicts the timing situation of the attack scenario. On the top, the content of the target WS is shown. The bottom part depicts the time-line of the two competing parties. In the initial state, the WS is filled with cache-lines from the adversary only. This is the flooded state. At the moment (r_t1) when the target requests a new cache-line (CL_t1), it suffers a penalty (p_1) for the first time due to a cache-miss. Now the CL of the target is resident in the cache. From this state the adversary seeks

In document Security patterns for AMP-based embedded systems (Page 117-152)