Capabilities for cross-layer micro-service security

[Figure 2.2 legend: Schedulable Unit; SD = Security Domain]

Figure 2.2: Co-scheduling Overview. The isolated environment is on the left. It consists of an isolated cache partition along with the processors assigned to that partition. Co-scheduling is used to group tasks belonging to the same security domain, and state-cleansing events occur when changing domains. Regular tasks are on the right in a separate cache partition. Tasks on the right, those in the shared region, have no scheduling restrictions.

2.5 IMPLEMENTATION

Partitioning the LLC and associating cores with each partition does not require changes to the kernel or the operating system. It can be done by a system administrator as part of the machine configuration. Here we focus on the implementation of our co-scheduling and selective-sharing mechanisms.
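As an illustration of this administrator-side configuration, the sketch below computes non-overlapping cache-way masks for an isolated and a shared LLC partition. It assumes Intel Cache Allocation Technology (CAT) with contiguous way masks and the Linux resctrl schemata syntax (`L3:<cache_id>=<hex mask>`). The source does not say which tool was used, and the helper names `contiguous_mask` and `partition_llc` are hypothetical.

```python
def contiguous_mask(start: int, count: int) -> int:
    """Bitmask with `count` consecutive bits set, beginning at bit `start`."""
    return ((1 << count) - 1) << start


def partition_llc(num_ways: int, isolated_ways: int, cache_id: int = 0) -> dict:
    """Split `num_ways` LLC ways into disjoint isolated and shared partitions.

    Returns resctrl-style schemata lines (assumed format: "L3:<id>=<hex mask>").
    """
    if not 0 < isolated_ways < num_ways:
        raise ValueError("each partition needs at least one cache way")
    shared_ways = num_ways - isolated_ways
    # Shared tasks get the low ways, the isolated region gets the high ways.
    shared = contiguous_mask(0, shared_ways)
    isolated = contiguous_mask(shared_ways, isolated_ways)
    assert shared & isolated == 0  # the partitions must not overlap
    width = (num_ways + 3) // 4  # hex digits needed to cover every way
    return {
        "isolated": f"L3:{cache_id}={isolated:0{width}x}",
        "shared": f"L3:{cache_id}={shared:0{width}x}",
    }


if __name__ == "__main__":
    # A hypothetical 12-way LLC with 4 ways reserved for the isolated region.
    for group, line in partition_llc(num_ways=12, isolated_ways=4).items():
        print(group, line)  # isolated L3:0=f00, then shared L3:0=0ff
```

Under resctrl, each line would be written to the `schemata` file of its own resource group under `/sys/fs/resctrl`, and the partition's cores listed in that group's `cpus_list` file. The default (root) group would also need to be limited to the shared mask; otherwise ordinary tasks could still fill the isolated ways.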

2.5.1 Capability Enforcement through Strict Co-Scheduling

[Figure 2.3 diagram: a shared cache and processor with Core 1 and Core 2 running threads of ORG1 and ORG2 over time t under the default scheduling policy.]

Figure 2.3: Limitations of Default Scheduling Policy

Default Scheduler: Consider containers from two security domains (organizations) with the thread configurations shown in Table 2.3, running on two cores. Simply executing on the cores requires allowing the processes to access the L1 and L2 caches belonging to those cores, along with the entire LLC. The defining characteristic in our example is the shared cache. For hyperthreaded cores (two hardware threads on one physical core), this is the L1, L2, and LLC; for two physical cores, the shared cache is only the LLC. Cores 1 and 2 in Figure 2.3 are not cores on two separate sockets of the same motherboard. Figure 2.3 shows an example schedule that might result from the default scheduler in Linux.
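The distinction between hyperthreaded and physical cores can be made concrete with a small topology model. The sketch below is a hypothetical simplification (the `LogicalCPU` type and `shared_cache_levels` helper are not from the source) that assumes private per-core L1 and L2 caches and one LLC per socket, which is common but not universal. On a real machine the topology would be read from the system, for example from the `shared_cpu_list` files under `/sys/devices/system/cpu/cpu*/cache/index*/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalCPU:
    socket: int  # physical package
    core: int    # physical core within the socket
    thread: int  # hardware thread (hyperthread) within the core


def shared_cache_levels(a: LogicalCPU, b: LogicalCPU) -> list:
    """Cache levels two distinct logical CPUs share, under a simplified topology.

    Assumes private per-core L1/L2 caches and one LLC per socket.
    """
    if a == b:
        raise ValueError("expected two distinct logical CPUs")
    if a.socket != b.socket:
        return []                    # separate sockets: no common cache level
    if a.core == b.core:
        return ["L1", "L2", "LLC"]   # hyperthread siblings share the hierarchy
    return ["LLC"]                   # separate physical cores share only the LLC


# Two hyperthreads of one core versus two physical cores on one socket.
print(shared_cache_levels(LogicalCPU(0, 0, 0), LogicalCPU(0, 0, 1)))
print(shared_cache_levels(LogicalCPU(0, 0, 0), LogicalCPU(0, 1, 0)))
```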

Table 2.3: Per-Domain Thread Allocations

Domain ID   Thread Count
ORG1        2
ORG2        3

Even if the scheduler flushes between scheduling different containers, the other organization can still carry out a cache-based side-channel attack, as shown in Figure 2.4. Consider the flushing event f₁ on Core₁ and the events f₂ and f₃ on Core₂. Despite the flushing event f₁, attacks can be carried out across containers belonging to different domains during Δt₁, because the flush does not prevent the other core, which shares the cache, from concurrently running a thread of a different domain. The same limitation recurs at flushing event f₃ for a period of Δt₂. Consequently, enabling hyperthreading or assigning multiple cores to a single isolated region poses an additional set of challenges to cache-access capability enforcement.

[Figure 2.4 diagram: Core 1 and Core 2 running threads of ORG1 and ORG2 over time t under the default scheduling policy, with flushing event f₁ and the vulnerable windows Δt₁ and Δt₂ marked.]

Figure 2.4: Limitations of Default Scheduling Policy + Flushing
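The vulnerable windows Δt₁ and Δt₂ can be found mechanically by intersecting the per-core schedules. The sketch below is a hypothetical helper (`cross_domain_windows` is not part of the described implementation), and the schedule in the usage lines is made up rather than taken from Figure 2.4.

```python
def cross_domain_windows(core_a: list, core_b: list) -> list:
    """Intervals where two cache-sharing cores run threads of different domains.

    Each schedule is a list of (start, end, domain) slices. A per-core flush at a
    slice boundary does not close these windows, because the other core keeps
    running (and filling the shared cache) concurrently.
    """
    windows = []
    for a_start, a_end, a_dom in core_a:
        for b_start, b_end, b_dom in core_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end and a_dom != b_dom:
                windows.append((start, end, a_dom, b_dom))
    return sorted(windows)


# Hypothetical schedule: Core 1 switches ORG1 -> ORG2 (and flushes) at t=4,
# but Core 2 keeps running ORG1 until t=6.
core1 = [(0, 4, "ORG1"), (4, 8, "ORG2")]
core2 = [(0, 6, "ORG1"), (6, 10, "ORG2")]
print(cross_domain_windows(core1, core2))  # [(4, 6, 'ORG2', 'ORG1')]
```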

The Completely Fair Scheduler (CFS) in Linux was born out of the need to reduce the impact on latency-sensitive jobs. We introduce a Strict-Co-Scheduling (SCS) algorithm for cache-access capability enforcement. Our SCS implementation aims to reduce the flushing cost of transferring the capability to access a given cache region, while remaining favorable to latency-sensitive tasks by using CFS within a security domain.

The terms outlined in Table 2.4 are used to describe the SCS algorithm we introduce.

As with the default Linux scheduler, the processor idles if no work is available. There are two main changes to the default scheduler class that are not shown here. First, when choosing a process to run, the scheduler always chooses from P_{PrivilegedDomain}. Second, whereas the default scheduler will not schedule a process for less than MinRuntime, SCS may preempt a thread after it has run for only a fraction of MinRuntime where necessary, so that the algorithm remains as work-conserving as possible. This is discussed further below.
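The two changes can be illustrated with a per-core selection function. This is a hypothetical user-space sketch, not the kernel scheduler class: the names `Task`, `pick_next`, and `transfer_at` are assumptions, a binary heap stands in for the CFS red-black tree, and we assume the shortened run arises from capping it at the next capability transfer.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    vruntime: float                      # only field used for ordering
    name: str = field(compare=False)
    domain: str = field(compare=False)


def pick_next(runqueues: dict, privileged: str, now: float,
              transfer_at: float, min_runtime: float):
    """Choose the next task for one core under strict co-scheduling.

    runqueues maps each domain to a heap of Tasks ordered by vruntime.
    Returns (task, granted_runtime), or (None, 0.0) if the core must idle.
    The popped task should be pushed back with an updated vruntime after it runs.
    """
    remaining = transfer_at - now        # time left before the capability moves
    queue = runqueues.get(privileged, [])
    if not queue or remaining <= 0:
        return None, 0.0                 # never borrow work from another domain
    task = heapq.heappop(queue)          # CFS-like choice: smallest vruntime
    # The granted run may be shorter than min_runtime: it ends at the transfer.
    return task, min(min_runtime, remaining)


rq = {"ORG1": [], "ORG2": []}
for t in (Task(5.0, "Thread2", "ORG2"), Task(3.0, "Thread3", "ORG2")):
    heapq.heappush(rq["ORG2"], t)
task, granted = pick_next(rq, "ORG2", now=8.0, transfer_at=10.0, min_runtime=3.0)
print(task.name, granted)                    # Thread3 2.0
print(pick_next(rq, "ORG1", 8.0, 10.0, 3.0))  # (None, 0.0)
```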

Algorithm 2.1 highlights how the SCS nextDomain function is implemented in Linux.

Once the next domain is chosen, cores schedule threads only from P_{PrivilegedDomain}, or idle until work is available. This approach adds little algorithmic complexity.

Because the O(1) scheduler in Linux, which is round-robin, performs worse for latency

for SCS. We mitigate this impact by transferring only the cache-access capability in a round-robin fashion, while using CFS to schedule processes from the domain holding the cache-access capability.
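The two-level structure (round-robin between domains, CFS-like ordering within a domain) can be sketched as a small simulation. It is a hypothetical model with assumed names (`co_schedule`, `slice_len`): it treats every thread as always runnable, does not skip domains with no runnable work (the role of nextDomain in Algorithm 2.1), and ignores MinRuntime.

```python
from collections import deque


def co_schedule(sdc_list: list, vruntimes: dict, num_cores: int,
                slice_len: float, rounds: int) -> list:
    """Simulate capability slices: round-robin across domains, CFS-like within.

    vruntimes maps domain -> {thread: vruntime}. In each slice the cache-access
    capability passes to the next domain in order (a flush would happen here),
    and each core runs one of that domain's threads with the smallest vruntime,
    or idles (None) if the domain has fewer threads than cores.
    """
    vr = {dom: dict(threads) for dom, threads in vruntimes.items()}  # copy input
    order = deque(sdc_list)
    schedule = []
    for _ in range(rounds):
        domain = order[0]
        order.rotate(-1)                       # round-robin capability hand-off
        ranked = sorted(vr[domain], key=lambda t: vr[domain][t])
        running = ranked[:num_cores]
        for t in running:
            vr[domain][t] += slice_len         # charge runtime, as CFS would
        schedule.append((domain, running + [None] * (num_cores - len(running))))
    return schedule


# Thread counts from Table 2.3; the starting vruntimes are made up.
vruntimes = {"ORG1": {"Thread1": 0.0, "Thread2": 1.0},
             "ORG2": {"Thread1": 0.0, "Thread2": 2.0, "Thread3": 1.0}}
for slot in co_schedule(["ORG1", "ORG2"], vruntimes, num_cores=2,
                        slice_len=4.0, rounds=4):
    print(slot)
```

In the fourth slice ORG2: Thread2, which was left out of ORG2's first slice, has the smallest vruntime and is scheduled first, which is the fairness property CFS contributes inside a domain.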

Table 2.4: Terms used in Scheduling Algorithm

Term               Definition
SDCList            A circularly linked list of security domains.
i                  The index offset into SDCList.
P                  The queue of runnable processes in the system, sorted from least to greatest vruntime.
P_{DOMAIN}         The queue of runnable processes belonging to DOMAIN, sorted from least to greatest vruntime.
PrivilegedDomain   The security domain currently holding the cache-access capability.
MinRuntime         The minimum runtime for which a thread should be scheduled.

Algorithm 2.1 Strict-Co-Scheduling (SCS) Domain Selection

function nextDomain
    i ← i + 1
    i ← i mod SDCList.size()
    domain ← SDCList[i]
    while size(P) > 0 AND size(P_{domain}) == 0 do
        i ← i + 1
        i ← i mod SDCList.size()
        domain ← SDCList[i]
    end while
    return domain
end function
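A transcription of Algorithm 2.1 into Python follows, as a user-space sketch rather than the kernel implementation. The class name, the dictionary of run queues, and the initial value of i are assumptions; a Python list indexed modulo its length stands in for the circular linked list. If any domain in SDCList has runnable work, the loop terminates within one pass over the list.

```python
class SCSDomainSelector:
    """Round-robin choice of the domain that receives the cache-access capability.

    sdc_list plays the role of SDCList and self.i is the index offset i.
    """

    def __init__(self, sdc_list: list):
        self.sdc_list = sdc_list
        self.i = 0  # assumed: the first domain holds the capability initially

    def next_domain(self, runqueues: dict) -> str:
        """runqueues maps each domain to its runnable processes (P_domain)."""
        # size(P): only domains in SDCList are counted, so the loop terminates.
        total = sum(len(runqueues.get(d, [])) for d in self.sdc_list)
        self.i = (self.i + 1) % len(self.sdc_list)
        domain = self.sdc_list[self.i]
        # Skip domains with nothing to run, unless the whole system is idle.
        while total > 0 and len(runqueues.get(domain, [])) == 0:
            self.i = (self.i + 1) % len(self.sdc_list)
            domain = self.sdc_list[self.i]
        return domain


selector = SCSDomainSelector(["ORG1", "ORG2", "ORG3"])
runnable = {"ORG1": ["t1"], "ORG2": [], "ORG3": ["t2"]}
print(selector.next_domain(runnable))  # ORG3 (ORG2 has no work and is skipped)
print(selector.next_domain(runnable))  # ORG1
```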

The downside of SCS stems from the system's inability to remain work-conserving in certain situations. In a work-conserving system, the processor never idles while there is work that can be done by any process in the system. By definition, we cannot consider all processes during the isolated run of a single group of tasks belonging to the same security domain. We highlight three cases that demonstrate this inability to remain work-conserving, using the configuration outlined in Table 2.3.
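The definition of work conservation can be phrased as a check on a scheduling state. The hypothetical helper below (`idle_despite_work` is not from the source) counts cores that SCS leaves idle while runnable threads wait in other domains; it looks at a single instant and ignores blocked threads.

```python
def idle_despite_work(num_cores: int, runnable: dict, privileged: str) -> int:
    """Cores left idle by strict co-scheduling although other domains have work.

    runnable maps each domain to its runnable threads. A work-conserving
    scheduler would always yield 0; under SCS a nonzero value appears when the
    privileged domain has fewer runnable threads than there are cores.
    """
    busy = min(num_cores, len(runnable.get(privileged, [])))
    waiting_elsewhere = sum(len(q) for dom, q in runnable.items() if dom != privileged)
    return min(num_cores - busy, waiting_elsewhere)


# Table 2.3's domains on two cores, after ORG1: Thread2 has finished.
state = {"ORG1": ["Thread1"], "ORG2": ["Thread1", "Thread2", "Thread3"]}
print(idle_despite_work(2, state, "ORG1"))  # 1
```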

Figure 2.5 shows the situation in which ORG1: Thread2 finishes before the MinRuntime of the PrivilegedDomain (ORG1) is up, leading to an underutilized time span Δu₁. The system must idle until the next capability transfer occurs, assuming work is available for the next domain. Once ORG1 is scheduled again, ORG1: Thread1 may be waiting on input for a period of Δu₂. Again, the system remains underutilized for a period of time, and ORG1: Thread1 does not receive its full MinRuntime.
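Spans such as Δu₁ and Δu₂ are idle core-time inside a capability slice. A hypothetical helper for measuring it from per-core busy intervals is sketched below; the interval representation is an assumption, and the numbers in the example are made up.

```python
def underutilized_time(slice_start: float, slice_end: float, busy_per_core: list) -> float:
    """Idle core-time (the Δu of the text) inside one capability slice.

    busy_per_core holds, for each core, non-overlapping (start, end) intervals
    during which a thread of the privileged domain was running.
    """
    slice_len = slice_end - slice_start
    idle = 0.0
    for intervals in busy_per_core:
        # Clip every busy interval to the slice before summing it.
        busy = sum(max(0.0, min(end, slice_end) - max(start, slice_start))
                   for start, end in intervals)
        idle += slice_len - busy
    return idle


# Slice [0, 10): Core 1 is busy throughout; Core 2's thread finishes at t=6.
print(underutilized_time(0, 10, [[(0, 10)], [(0, 6)]]))  # 4.0
```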

To mitigate situations of low utilization such as the one presented in Figure 2.5, we choose to violate the MinRuntime guarantee provided to processes in Linux. Consider the situation in which the number of processes in the PrivilegedDomain exceeds the number of cores, as is the case for ORG2. As shown in Figure 2.6, we can schedule ORG2: Thread3 if ORG2: Thread2 does not use its full MinRuntime. This means that ORG2: Thread2 will not be scheduled for the full MinRuntime, potentially introducing
