Next, we introduce the high-level components of a generic CA scheme in order to ground the subsequent discussion on the threat model and attack surfaces that we consider. After this, TEEs and SEs are explored as potential candidates towards addressing these threats.

3.3.1 High-level CA Components

The high-level procedure for acquiring raw measurements collected from sensing hardware, constructing and managing a CA behavioural model, and generating the authentication decision can be modelled in six stages:

1. Sensor data acquisition: Raw measurements are acquired from the relevant sensor I/O devices. This may include a vector of real values from a tri-axial accelerometer (representing x-, y- and z-axis values in three-dimensional space), latitude and longitude GPS co-ordinates, key presses, two-dimensional cursor co-ordinates, or even a waveform of desired length recorded from a microphone. The sampling rate at which data is polled from sensing hardware is scheme-dependent.

2. Feature extraction: Raw measurements are transformed into more meaningful or efficient representations for more effective classification. This may comprise standard dimensionality reduction algorithms, such as principal component analysis (PCA); noise reduction; normalisation; or computing simple statistical metrics, such as the arithmetic mean or standard deviation of sensor measurements.

3. Classification: In past work, authentication is often modelled as a binary classification problem with the classes ‘authenticated’ and ‘not authenticated’ [128], [129], [131]–[133], [135]–[137]. During the enrollment phase (see Figure 3.1), the classifier is trained on features collected over a period of monitoring the user’s regular behaviour, which serve as the authenticated examples, while unauthenticated examples are sourced from artificially perturbed values or from data samples of other users.

4. User Model: During enrollment, the saved classifier parameters act as the behavioural model. This may include, for example, learned network weights and node activation values for a neural network, or feature weights for logistic regression. In the authentication phase, the parameters are restored from file and used to classify future segments of sensor data.

5. Decision/post-processing: During the authentication stage, a segment of sensor data, e.g. using a sliding window of a given time-frame, is used as input to the classifier that outputs the corresponding authentication state.

6. Update device status: The classifier label may be transmitted to a remote authority over a secure channel or returned to a security monitor that modifies the system state to reflect the authentication state.
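The six stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a scheme taken from the cited literature: the nearest-centroid classifier, the function names, the window parameters and the toy accelerometer values are all hypothetical, chosen only to keep the example self-contained.

```python
import math
from statistics import mean, pstdev

def sliding_windows(samples, size, step):
    """Stages 1 and 5: cut a stream of (x, y, z) samples into windows."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def extract_features(window):
    """Stage 2: per-axis arithmetic mean and standard deviation."""
    features = []
    for axis in zip(*window):              # iterate over the x, y and z columns
        features.extend([mean(axis), pstdev(axis)])
    return features

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [mean(column) for column in zip(*vectors)]

def enroll(auth_features, other_features):
    """Stages 3 and 4: the two class centroids act as the stored user model."""
    return {"auth": centroid(auth_features), "other": centroid(other_features)}

def classify(model, features):
    """Stage 5: label a feature vector by its nearest class centroid."""
    d_auth = math.dist(features, model["auth"])
    d_other = math.dist(features, model["other"])
    return "authenticated" if d_auth <= d_other else "not authenticated"

def device_locked(label):
    """Stage 6: a hypothetical security monitor locks on a negative label."""
    return label != "authenticated"

# Toy tri-axial accelerometer traces (hypothetical values).
user = [(0.1, 0.0, 9.8), (0.2, 0.1, 9.7), (0.1, -0.1, 9.9), (0.0, 0.0, 9.8)]
other = [(3.0, 2.0, 5.0), (2.5, 2.2, 5.5), (3.2, 1.8, 4.9), (2.9, 2.1, 5.2)]

model = enroll(
    [extract_features(w) for w in sliding_windows(user, size=2, step=1)],
    [extract_features(w) for w in sliding_windows(other, size=2, step=1)],
)

genuine_label = classify(model, extract_features([(0.1, 0.0, 9.8), (0.2, 0.0, 9.8)]))
impostor_label = classify(model, extract_features([(3.0, 2.0, 5.0), (2.8, 2.1, 5.1)]))
```

In a real scheme the classifier, feature set and window length would be those of the chosen CA proposal; the point here is only how data flows from raw samples to a device state.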

3.3.2 Threat Model

In general, current CA schemes aim to provide a primary or second line of defence against the following adversaries:

a) a device thief who bypasses the initial lock-screen through shoulder-surfing or successful guessing, or who finds no such mechanism present, with the aim of accessing sensitive data; and b) a curious adversary, e.g. a co-worker or friend, who accesses the device to browse private data. In our work, the threat model is expanded to incorporate adversaries present in a practical deployment, and we specifically consider software threats that aim to influence or subvert the operation of the scheme on the device.

An on-device CA scheme and its constituent components – described in Section 3.3.1 – would typically execute as a part of a user-mode application (ring 3) or the host operating system (ring 0). We aim to address a strong adversary operating at either of these protection levels; that is, at worst, a root-level adversary with extensive control over the OS, including the CA code itself and run-time and persistent states accessible to the OS. The threats under consideration are now summarised:

• Observing run-time states, such as memory contents, including raw sensor data polled from hardware, the extracted features, classification decisions, and the user model under execution.

• Observing data structures contained within the device’s persistent storage, such as the trained user model stored at rest on the device’s filesystem.

• Modifying sensor measurements after they are polled from hardware, the extracted features, classification decisions, and the user model both at rest and under execution.

• Disrupting or modifying the intended CA scheme execution flow; for example, bypassing feature extraction in order to give rise to denial of service conditions.

In the above cases, the intentions of the attacker include learning the behaviour of the device user, thus violating user privacy, and generating inaccurate authentication states in order to access restricted device assets or deny legitimate accesses to the device. To be clear, we do not consider physical attacks on the sensing hardware itself to be in scope; for example, replacing sensors with ones that output adversely inaccurate measurements, or removing them completely to give rise to denial of service conditions. Moreover, defending against contextual adversaries that engineer the surrounding environmental conditions to influence classification decisions (see Shretha et al. [147]) is also beyond the scope of this work. However, we do place trust in the security assurances provided by a TEE or SE, which were discussed in greater detail in Chapter 2. Related to this, we consider threats outside the remit of TEEs and SEs to be out-of-scope in this work; for example, developer-induced TEE and SE programming errors, supply chain attacks and, for TEEs, the use of fault injections and other related attacks requiring significant expertise, as discussed in Sections 2.5.1 and 2.6 of Chapter 2. The reader is also referred back to Section 2.4.7 and [40] for a description of the protection scope of the GlobalPlatform TEE specifically; Section 2.4.6 and, in particular, Costan and Devadas [110] for Intel SGX; and Section 2.2.3 for SEs.

3.3.3 TEEs and SEs as Secure CA Candidates

As per Chapter 2, secure execution is the ability to maintain confidentiality and integrity of run-time states during code execution, while trusted execution typically includes extensions for attesting to the state of executed code. SEs and TEEs are two high-level constructs for preserving the security of CA schemes at run-time, which would be implemented as a trusted application (TA) and provisioned into the TEE or SE before deployment. This section contrasts these two candidates in order to protect the sensitive assets of a CA scheme – authentication decisions, feature extraction, user model and sensor data – and the code under execution. The focus on these standardised constructs minimises potential deployment overhead commercially.

The principle behind separating a CA scheme from the Rich OS is to minimise the Trusted Computing Base (TCB) – the set of hardware, software and firmware components critical to maintaining the security of a system. Conventional OSs, such as Windows and Android, are generally too large to formally verify their security properties; TEEs and SEs, meanwhile, are significantly smaller and more suitable for such analysis [148]. By relocating CA schemes to a TEE or SE, the TCB is reduced largely to that of the TEE/SE and the CA scheme’s application logic, thus reducing the attack surface significantly. A TEE or SE could be used to address the threat model by protecting the code under execution, including run-time states, and protecting the internal behavioural model from untrusted world adversaries. Using a TEE, the behavioural model can be protected at rest through secure storage: after use, i.e. at power-down, the model is encrypted on the untrusted world filesystem using a TEE-resident storage key, and it is subsequently restored on relaunch, e.g. upon booting the device. A TEE could also be used to host key material for signing authentication decisions, using ECDSA or RSA, before transmission to a remote authority.
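To illustrate how TEE-resident key material lets a remote authority detect tampered authentication decisions, the sketch below tags each decision before it leaves the TA. It is a simplification under stated assumptions: the text names ECDSA or RSA signatures, whereas this example substitutes HMAC-SHA256, a symmetric message authentication code, solely because it is available in the Python standard library. The function names and the counter field are hypothetical; the counter is included as something a verifier could use to reject replays, but that check is not implemented here.

```python
import hashlib
import hmac
import json

def protect_decision(key: bytes, label: str, counter: int) -> dict:
    """Inside the TA: serialise the decision and attach an HMAC-SHA256 tag."""
    payload = json.dumps({"label": label, "counter": counter}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_decision(key: bytes, message: dict) -> bool:
    """At the remote authority: recompute the tag and compare in constant time."""
    expected = hmac.new(key, message["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

# Hypothetical key; in a deployment it would never leave the TEE.
tee_key = b"example-key-held-inside-the-tee"

message = protect_decision(tee_key, "not authenticated", counter=7)

# A root-level adversary in the untrusted world rewrites the label in transit.
tampered = dict(message)
tampered["payload"] = message["payload"].replace("not authenticated", "authenticated")

genuine_ok = verify_decision(tee_key, message)
tampered_ok = verify_decision(tee_key, tampered)
```

With an asymmetric scheme such as ECDSA, the remote authority would hold only the public key, so a compromise of the verifier would not allow decisions to be forged; the symmetric stand-in used here does not have that property.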

The potential performance benefits of a TEE, i.e. its sharing of hardware with the REE, make it attractive in pursuit of our design goals, namely for computationally-intensive applications like those requiring machine learning. Our initial investigations using the GCU dataset [135] found that, on average, sensor data surpassed 0.5 MB per day per user – far exceeding the memory capacity of many commercial embedded SEs at the time of writing when more than three days of data are used, as suggested in related work [119], [127], [128], [149], [150]. This could be a deciding factor when considering that other data, such as payment and transportation credentials and applications, must reside simultaneously on one SE.
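The storage argument can be made concrete with a short calculation. The sketch below takes the 0.5 MB per day and three-day figures from the text; the SE capacities and the space reserved for other applets are hypothetical placeholders, not measurements or vendor figures, and decimal megabytes are assumed.

```python
MB = 10**6  # assumption: decimal megabytes

def enrollment_bytes(bytes_per_day: int, days: int) -> int:
    """Raw sensor data accumulated over the enrollment period."""
    return bytes_per_day * days

def fits_on_se(se_capacity_bytes: int, bytes_per_day: int, days: int,
               other_applets_bytes: int = 0) -> bool:
    """True if the enrollment data and co-resident applets fit on the SE."""
    needed = enrollment_bytes(bytes_per_day, days) + other_applets_bytes
    return needed <= se_capacity_bytes

# Lower bound implied by the text: 0.5 MB per day for three days.
lower_bound = enrollment_bytes(MB // 2, 3)

# Hypothetical SE capacities, used only to exercise the check.
fits_small_se = fits_on_se(1 * MB, MB // 2, 3)
fits_shared_se = fits_on_se(2 * MB, MB // 2, 3, other_applets_bytes=600_000)
```

Because the text reports that data surpassed 0.5 MB per day and that more than three days are used, the computed value is a lower bound on the storage a scheme would actually require.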

TEEs are also supported directly by the underlying device CPU/SoC, RAM and storage, thus offering greater speed and capacity than SEs. Moreover, many TEEs, such as Intel SGX, also offer native remote attestation, where third-parties may attest to the integrity of the enclave application. This property allows remote services reliant on authentication decisions to verify the state of the platform (post-boot) or applications themselves. As stated in Chapter 2, a TEE’s precise TCB depends primarily on the chosen platform, but for widespread TEEs, such as TrustZone and SGX, the hardware TCB generally comprises a trusted CPU chipset or SoC. The software TCB of a TEE comprises the trusted world containing a trusted OS, firmware and applications (for GlobalPlatform and TrustZone), or enclave logic for SGX [110].

TABLE 3.2: Summary of CA-relevant features.

Feature                  REE     SE         TEE
Isolated Execution       ✗       ✓          ✓
Processing Performance   Best    Poor       Good+
Memory Capacity          Great   Limited    Great+
HW Tamper-Resistance     ✗       ✓          †
Remote Attestation       ✗       ✗          ♦
Secure I/O               ✗       ✓          ✓^
Hardware TCB             All     SE MCU     CPU/SoC*
Software TCB             Large   Smallest   Small

TEE-dependent. Limited protection.

* Applicable for TrustZone and SGX. ^ Applicable for TrustZone.

+ Applicable for SGX and TrustZone-A. We note that TrustZone-M, marketed for microcontrollers, is significantly more limited.

FIGURE 3.3: Example CA attack vectors; this chapter proposes using a TEE to protect those areas in red.

While SEs are significantly less powerful than TEEs, their primary benefit is strong hardware tamper-resistance, which makes them an attractive solution for storing keys and other small auxiliary data. Note that TEEs and SEs are not mutually exclusive; it is possible for a TEE to rely upon an SE, or indeed a TPM, for certain operations such as cryptographic operations and key storage, as noted in GlobalPlatform guidelines [40]. We summarise the key properties of each architecture in Table 3.2, and compare these to a regular application running in user-mode in the Rich OS (implicit in existing proposals).