Dynamic Code Modification - Design and Implementation

5.3 Design and Implementation

5.3.2 Dynamic Code Modification

When enabling dynamic code modification, we must ensure that the constraints imposed by the NaCl sandbox are not violated. To preserve the NaCl sandbox’s guarantees, we introduce additional constraints on dynamic code modification that are checked by the verifier; Figure 5.3 lists all of them. Essentially, these constraints make instruction boundaries and the NaCl guard instructions in the modified code immutable. These immutability requirements significantly restrict the possible code modifications; however, in practice, we found that they do not limit the code modifications required by inline caching, the primary goal of self-modification in the ported language runtimes.

Machine code OLD is replaced with machine code NEW.

1. NEW must satisfy all NaCl safety verification constraints, as outlined in Section 5.1.

2. Both NEW and OLD must start at the same address, be of equal size, and lie within a single code region.

3. Any direct control-transfer instructions in NEW must target valid instruction boundaries in the same code region.

4. NEW and OLD must start and end at instruction boundaries, and all instruction boundaries in between must be identical.

5. No pseudo-instructions are added or removed: NEW may not introduce new pseudo-instructions, and all pseudo-instructions in OLD must occur in NEW with identical guard instructions.

Figure 5.3: Extended NaCl’s Constraints on Runtime Code Modification.

    // for an instruction pair OLDI and NEWI of length n
    if (all bytes that differ between OLDI and NEWI lie in one aligned qword) {
        // fast path
        atomic aligned qword write to update OLDI;
    } else {
        // slow path
        OLDI[0] = 0xf4;          // HLT instruction
        serialize();             // barrier
        OLDI[1:n] = NEWI[1:n];
        serialize();             // barrier
        OLDI[0] = NEWI[0];
    }

Figure 5.4: Pseudo Code for Safe Code Modification.
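Constraints 2 and 4 of Figure 5.3 concern only layout, so they can be checked with simple arithmetic once OLD and NEW have been decoded. The following is a minimal sketch, not NaCl's validator code: it assumes a hypothetical `decoded_span` type holding a span's start address, size, and the lengths of its consecutive instructions, with decoding starting at a known instruction boundary. Constraints 1, 3 and 5 need the full validator and are not modelled. Because both spans start at the same address, identical instruction boundaries amount to identical instruction-length sequences.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a code span. Decoding is assumed to start
   at a known instruction boundary. */
typedef struct {
    uintptr_t start;          /* address of the first byte */
    size_t size;              /* size of the span in bytes */
    const uint8_t *insn_lens; /* lengths of consecutive instructions */
    size_t n_insns;
} decoded_span;

/* Total number of bytes covered by the decoded instructions. */
static size_t decoded_length(const decoded_span *s) {
    size_t total = 0;
    for (size_t i = 0; i < s->n_insns; i++)
        total += s->insn_lens[i];
    return total;
}

/* Checks constraints 2 and 4 for OLD and NEW in the code region
   [region_lo, region_hi). */
static bool layout_constraints_hold(const decoded_span *old_s,
                                    const decoded_span *new_s,
                                    uintptr_t region_lo, uintptr_t region_hi) {
    /* 2: same start address, equal size, within a single code region. */
    if (old_s->start != new_s->start || old_s->size != new_s->size)
        return false;
    if (old_s->start < region_lo || old_s->start >= region_hi ||
        old_s->size > region_hi - old_s->start)
        return false;

    /* 4: both spans end on an instruction boundary ... */
    if (decoded_length(old_s) != old_s->size ||
        decoded_length(new_s) != new_s->size)
        return false;

    /* ... and all interior boundaries match, i.e. the instruction-length
       sequences are identical. */
    if (old_s->n_insns != new_s->n_insns)
        return false;
    for (size_t i = 0; i < old_s->n_insns; i++)
        if (old_s->insn_lens[i] != new_s->insn_lens[i])
            return false;
    return true;
}
```

With OLD decoded as instructions of 5, 2 and 1 bytes, a NEW with the same lengths passes, while a NEW of lengths 2, 5 and 1 fails even though its total size is equal, because its boundaries shift.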

Code validation of dynamically modified code is performed at bundle granularity and applies only to the bundles being modified. If a direct control-flow instruction in the modified code targets another bundle, the NaCl validator may also need to validate that bundle to ensure that the destination of the control-flow instruction is a valid instruction boundary.
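Selecting the bundles to revalidate is a matter of integer division. This sketch assumes NaCl's 32-byte bundles on x86; the function names are hypothetical, and bundles are identified by index (address divided by the bundle size).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed NaCl x86 bundle size in bytes. */
#define BUNDLE_SIZE 32u

/* Indices of the first and last bundles overlapped by the modification
   [addr, addr + len); requires len > 0. Only these bundles (plus any
   extra target bundles) are revalidated. */
static void bundles_touched(uintptr_t addr, size_t len,
                            uintptr_t *first, uintptr_t *last) {
    *first = addr / BUNDLE_SIZE;
    *last = (addr + len - 1) / BUNDLE_SIZE;
}

/* For a direct branch in the modified code, return the index of the
   bundle holding its target if that bundle must be validated in
   addition, or UINTPTR_MAX if it is already among [first, last]. */
static uintptr_t extra_bundle_for_target(uintptr_t target,
                                         uintptr_t first, uintptr_t last) {
    uintptr_t b = target / BUNDLE_SIZE;
    return (b >= first && b <= last) ? UINTPTR_MAX : b;
}
```

For a 32-byte modification at 0x1010, bundles 0x80 and 0x81 are revalidated; a branch from it to 0x1040 adds bundle 0x82, whereas a branch to 0x1020 adds nothing.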

When nacl_dyncode_modify is invoked, some untrusted threads may be concurrently executing the region being modified. Therefore, the service runtime must ensure that, while it is modifying code, an untrusted thread executes either the old or the new instruction, and never an instruction composed of a mix of bytes from the old and new instructions. Sundaresan et al. [128] showed that such corruption is possible, and we also confirmed it in our own experiments.
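To make the hazard concrete, this sketch copies a NEW instruction over OLD one byte at a time, as a naive memcpy might, and counts the intermediate images a concurrent thread could fetch that are neither OLD nor NEW. The instruction pair is an illustrative choice: OLD is `mov eax, 1` (B8 01 00 00 00) and NEW is `jmp` with a 32-bit displacement of 0x10 (E9 10 00 00 00). Both are 5 bytes long, so the pair satisfies constraint 4 of Figure 5.3.

```c
#include <stddef.h>
#include <string.h>

#define INSN_LEN 5

/* OLD: mov eax, 1 (opcode B8, imm32). NEW: jmp rel32 +0x10 (opcode E9). */
static const unsigned char OLD_INSN[INSN_LEN] = {0xB8, 0x01, 0x00, 0x00, 0x00};
static const unsigned char NEW_INSN[INSN_LEN] = {0xE9, 0x10, 0x00, 0x00, 0x00};

/* Overwrite OLD with NEW front to back, one byte per step, and count the
   intermediate images that match neither instruction. */
static int count_torn_states(void) {
    unsigned char buf[INSN_LEN];
    int torn = 0;
    memcpy(buf, OLD_INSN, INSN_LEN);
    for (size_t i = 0; i < INSN_LEN; i++) {
        buf[i] = NEW_INSN[i];
        if (memcmp(buf, OLD_INSN, INSN_LEN) != 0 &&
            memcmp(buf, NEW_INSN, INSN_LEN) != 0)
            torn++;
    }
    return torn;
}
```

The single torn image, E9 01 00 00 00, is a jump to one byte past the end of the instruction, that is, into the middle of the following instruction, which is exactly the kind of control transfer NaCl's instruction-boundary rules forbid.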

AMD and Intel processors support atomic code modification: according to the relevant documentation [55, 90]¹, an aligned 8-byte write is viewed atomically by the processor, and our own experiments confirm this behavior. Thus, we can safely modify one instruction at a time with a single write, provided that all of its modified bytes lie within one 8-byte-aligned quadword.
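The fast path of Figure 5.4 can be sketched as follows, assuming x86-64, the GCC/Clang `__atomic` builtins, and that only one writer modifies code at a time. The function name is hypothetical. It finds the range of bytes that differ, checks that the range falls within one aligned quadword, rebuilds that quadword (preserving neighbouring bytes, which may belong to other instructions) and publishes it with a single 8-byte store.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* insn points at the live len-byte instruction OLDI; new_insn holds NEWI.
   Returns true if the change was applied with one atomic 8-byte store,
   false (memory untouched) if the caller must take the slow path. */
static bool try_fast_path(uint8_t *insn, const uint8_t *new_insn, size_t len) {
    size_t lo = len, hi = 0;                 /* differing bytes: [lo, hi) */
    for (size_t i = 0; i < len; i++) {
        if (insn[i] != new_insn[i]) {
            if (lo == len) lo = i;
            hi = i + 1;
        }
    }
    if (lo == len)
        return true;                         /* OLDI == NEWI: nothing to do */

    uintptr_t first = (uintptr_t)insn + lo;
    uintptr_t last = (uintptr_t)insn + hi - 1;
    if (first / 8 != last / 8)
        return false;                        /* change straddles two quadwords */

    /* Rebuild the enclosing aligned quadword with the new bytes patched in. */
    uint64_t *word = (uint64_t *)(first & ~(uintptr_t)7);
    uint64_t val;
    memcpy(&val, word, sizeof val);
    for (size_t i = lo; i < hi; i++)
        ((uint8_t *)&val)[(uintptr_t)insn + i - (uintptr_t)word] = new_insn[i];

    __atomic_store_n(word, val, __ATOMIC_SEQ_CST);  /* single 8-byte store */
    return true;
}
```

Patching the immediate of `mov eax, 1` to 2, a typical inline-cache update, touches one byte and always takes this path; a change whose first and last modified bytes sit in different quadwords is rejected.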

¹ Page 8-8, Vol. 3A, in the Intel 64 and IA-32 Architectures Software Developer’s Manual [90]; page 48 in AMD64 Architecture

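If the modified bytes of an instruction do not all lie within a single 8-byte-aligned quadword (for example, because the instruction is not 8-byte-aligned or is longer than 8 bytes), it is still possible to modify it safely, but doing so requires an additional mechanism. The pseudocode in Figure 5.4 shows how we perform safe code modification. When an instruction cannot be modified via our fast path, we write a HLT byte (0xf4) to the start of the modified instruction, thus preventing its execution: HLT is a privileged instruction, so an untrusted thread that reaches it in user mode faults instead of executing a partially modified instruction.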

We then issue a serialization barrier to synchronize the instruction stream and the code memory view of all hardware threads, including those on other cores or processors, as required by the Intel 64 and IA-32 Architectures Software Developer’s Manual (page 8-4, Vol. 3A) [90]. Because the one-byte write is atomic, an untrusted thread may execute either the old instruction or the HLT instruction before the first serialization barrier. Between the barriers, it may execute only the HLT instruction. After the second serialization barrier, it may execute either the HLT instruction or the new instruction. The second serialization barrier guarantees that a concurrently executing thread will not observe an illegal instruction composed of the first byte of NEWI and the remaining bytes of OLDI.

One might think that the second serialization barrier is unnecessary because x86 is a total store order (TSO) architecture, in which each core observes other cores’ writes in order. However, the TSO guarantee applies only to data memory and does not account for instruction prefetching. The x86 architecture reference manual explicitly requires that, when one core modifies code, a serializing instruction be executed before another core attempts to execute the modified code (page 8-4, Vol. 3A) [90]. The second serialization barrier meets this requirement.

We omit a third serialization barrier after the final write in Figure 5.4. If an executing thread must observe the final modification in its instruction stream before executing the modified instruction, we rely on higher-level software to provide that synchronization. Omitting the third serialization barrier cannot lead to the execution of an illegal instruction, because a concurrent thread may execute only either the HLT instruction or the new instruction.
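The slow path of Figure 5.4 translates almost line for line into C. In this sketch the cross-processor barrier is passed in as a callback (a hypothetical `barrier_fn`), so the write sequence can be exercised on an ordinary buffer; `barrier_recorder` is a hypothetical test harness that snapshots the instruction at each barrier, making the states described above observable. As in the figure, there is no third barrier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HLT_OPCODE 0xF4

/* Cross-processor serialization barrier, supplied by the caller. */
typedef void (*barrier_fn)(void *ctx);

/* Overwrite the len-byte instruction OLDI at insn with NEWI. volatile keeps
   the compiler from merging or reordering the byte stores. */
static void slow_path_write(volatile uint8_t *insn, const uint8_t *new_insn,
                            size_t len, barrier_fn serialize, void *ctx) {
    insn[0] = HLT_OPCODE;       /* atomic 1-byte store: OLDI now faults */
    serialize(ctx);             /* every core sees HLT before the tail changes */
    for (size_t i = 1; i < len; i++)
        insn[i] = new_insn[i];  /* rewrite OLDI[1:n] while it cannot run */
    serialize(ctx);             /* every core sees the new tail */
    insn[0] = new_insn[0];      /* atomic 1-byte store: NEWI becomes live */
    /* No third barrier: a racing thread sees either HLT or NEWI. */
}

/* Hypothetical harness: records the instruction bytes at each barrier. */
typedef struct {
    const volatile uint8_t *insn;
    size_t len;
    uint8_t snaps[2][16];
    int n;
} barrier_recorder;

static void record_barrier(void *ctx) {
    barrier_recorder *r = ctx;
    if (r->n >= 2)
        return;
    for (size_t i = 0; i < r->len && i < 16; i++)
        r->snaps[r->n][i] = r->insn[i];
    r->n++;
}
```

Replacing `mov eax, 1` (B8 01 00 00 00) with `jmp +0x10` (E9 10 00 00 00), the first snapshot is F4 01 00 00 00 and the second F4 10 00 00 00; no snapshot ever starts with E9 while the old tail is still in place.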

Our technique relies on a serialization barrier primitive. The common approach on x86 processors is to execute a serializing instruction (such as cpuid) on every hardware thread that must observe the others’ prior writes. Because NaCl sandboxing is a user-mode mechanism, we require a serialization barrier that can be triggered from user mode. Conveniently, certain system calls serialize all processors as a side effect. We use the mprotect system call, which triggers inter-processor interrupts to remote hardware threads for a “TLB shoot-down”, serializing all processors. NaCl invokes mprotect to change the execute permission of a dummy memory page allocated specifically for this purpose. The kernel implementation of mprotect writes to the CR3 register and thus serializes all processors in the course of performing the TLB shoot-down.
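A user-mode barrier along these lines might look as follows. This is a minimal sketch assuming Linux on x86-64, not NaCl's actual code; `serialize_init` and `serialize_all` are hypothetical names. The dummy page is touched once so that a page-table entry exists, and each barrier grants and then revokes execute permission. Revoking a permission on a present page forces the kernel to flush stale TLB entries on the other CPUs running the process. Whether every core is actually interrupted, and therefore serialized, is kernel-dependent.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dummy page whose permissions are toggled only to force a TLB shoot-down. */
static unsigned char *barrier_page;
static size_t barrier_page_size;

/* Allocate the dummy page and fault it in. Returns 0 on success. */
static int serialize_init(void) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    barrier_page_size = (size_t)ps;
    void *p = mmap(NULL, barrier_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    barrier_page = p;
    barrier_page[0] = 1;  /* touch the page so a page-table entry exists */
    return mprotect(barrier_page, barrier_page_size, PROT_READ);
}

/* Barrier: add, then remove, execute permission on the dummy page.
   Returns 0 on success. */
static int serialize_all(void) {
    if (mprotect(barrier_page, barrier_page_size, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return mprotect(barrier_page, barrier_page_size, PROT_READ);
}
```

Systems that forbid executable anonymous memory will reject the first mprotect call. On Linux 4.16 and later, membarrier with MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (after the matching registration command) is a documented alternative intended for this purpose.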

In document Stronger secrecy for network-facing applications through privilege reduction (Page 89-91)