The main purpose of the toolkit is to orchestrate the execution of CE invocations (tasks) to fully exploit the available computing resources (local devices or remote nodes) while guaranteeing sequential consistency. To fulfill it, the runtime offers an API (Figure 4.3) with two different functions: executeTask and accessValue.
executeTask asks the runtime to execute a task asynchronously. The method takes four parameters: the name of the class containing the invoked method; the name of the method; a boolean indicating whether the method is invoked on an instance of the class (true) or is static (false); and the set of values corresponding to the invocation arguments. Besides the regular arguments of the method, these parameters may contain two more objects: the callee object, if it is an instance invocation, and a future object corresponding to the result of the invoked method.
/**
 * Generates a new task whose execution will be managed by the runtime.
 *
 * @param methodClass name of the class containing the method that has been invoked
 * @param methodName name of the invoked method
 * @param hasTarget the method has been invoked on a callee object
 * @param parameters parameter values
 */
void executeTask(String methodClass, String methodName, boolean hasTarget, Object... parameters);

/**
 * Registers an object access from the main code of the application.
 *
 * Should be invoked when any object is accessed.
 *
 * @param <T> Type of the object to be registered
 * @param o Accessed representative object
 * @param isWritter true if the access modifies the content of the object
 * @return The current object value
 */
<T> T accessValue(T o, boolean isWritter);
Figure 4.3: Definition of the interface to the Runtime toolkit.
To synchronize the value of the future object with the result of the task execution, the runtime system provides the second method, accessValue, which takes two input parameters. The first is an object (a File instance for files) that a preceding task may have accessed, and the second is a boolean indicating whether the access modifies the content of the object. The method checks whether any task has previously accessed that object. If none has, it returns the same instance. Otherwise, the runtime fetches the value from the node that ran its generator task and registers the access.
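The behaviour of accessValue described above can be sketched as follows. This is a minimal, single-process illustration and not the toolkit's implementation: the class AccessValueSketch, the helper registerTaskResult and the use of a CompletableFuture to stand in for fetching a value from a remote node are all assumptions made for the example.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the accessValue logic; not the real runtime. */
class AccessValueSketch {
    // Objects that some task has accessed, keyed by identity and mapped to
    // the (possibly still pending) value produced by the generator task.
    private final Map<Object, CompletableFuture<Object>> pending = new IdentityHashMap<>();
    // Accesses registered from the main code (kept only for illustration).
    int reads = 0;
    int writes = 0;

    /** Assumed helper: a task will produce the value represented by the given object. */
    synchronized void registerTaskResult(Object representative, CompletableFuture<Object> result) {
        pending.put(representative, result);
    }

    @SuppressWarnings("unchecked")
    <T> T accessValue(T o, boolean isWritter) {
        CompletableFuture<Object> generator;
        synchronized (this) {
            generator = pending.get(o);
            if (generator == null) {
                return o; // no task has accessed the object: return the same instance
            }
            // Register the access; a real runtime would also rename the datum on writes.
            if (isWritter) {
                writes++;
            } else {
                reads++;
            }
        }
        // Block until the generator task has produced the value, then return it.
        return (T) generator.join();
    }

    public static void main(String[] args) {
        AccessValueSketch runtime = new AccessValueSketch();
        StringBuilder result = new StringBuilder(); // representative (future) object
        runtime.registerTaskResult(result,
                CompletableFuture.<Object>completedFuture(new StringBuilder("computed")));
        System.out.println(runtime.accessValue(result, false));
    }
}
```

In this sketch the representative object is matched by identity rather than by equals, since two distinct objects may be equal while only one of them has been handed to a task.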
As is done for registers in out-of-order processors, the runtime assigns a unique ID to each datum version and renames the datum on each access, so that false dependencies (WaW and WaR accesses) do not reduce the potential parallelism of the application. The first time a task accesses a datum, the runtime assigns an ID to the value; for instance, data1version1. When a task or the main code of the application accesses the datum to update its content, the runtime assigns a new ID to the new version (for instance, data1version2) and preserves the value assigned to the previous ID. Thus, pending tasks reading the old version of the datum can fetch it by using the old ID, and tasks coming after the access will refer to the new ID to obtain the new value.
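The renaming technique can be illustrated with a small register that hands out version IDs. The sketch below is only an assumption of how such a register might look: the class RenamingSketch, its Access type and the short dNvM ID format are hypothetical, and the sketch tracks IDs only, leaving the storage of the preserved values out of scope.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical register that renames data versions on each access. */
class RenamingSketch {
    /** Version IDs involved in one access; writtenId is null for read-only accesses. */
    static final class Access {
        final String readId;
        final String writtenId;

        Access(String readId, String writtenId) {
            this.readId = readId;
            this.writtenId = writtenId;
        }
    }

    // Datum (by identity) -> data number, and data number -> current version.
    private final Map<Object, Integer> dataNumbers = new IdentityHashMap<>();
    private final Map<Integer, Integer> currentVersion = new HashMap<>();

    synchronized Access registerAccess(Object datum, boolean isWrite) {
        Integer data = dataNumbers.get(datum);
        if (data == null) {
            // First access to this datum: assign it the first version ID.
            data = dataNumbers.size() + 1;
            dataNumbers.put(datum, data);
            currentVersion.put(data, 1);
        }
        int version = currentVersion.get(data);
        String readId = "d" + data + "v" + version;
        if (!isWrite) {
            return new Access(readId, null);
        }
        // An update reads the current version and produces a new one; the old
        // ID stays valid so that pending readers can still fetch its value.
        currentVersion.put(data, version + 1);
        return new Access(readId, "d" + data + "v" + (version + 1));
    }

    public static void main(String[] args) {
        RenamingSketch register = new RenamingSketch();
        Object report = new Object();
        Access read = register.registerAccess(report, false);
        Access update = register.registerAccess(report, true);
        System.out.println(read.readId + " then " + update.readId + " -> " + update.writtenId);
    }
}
```

Because a reader registered before the update keeps referring to the old ID, a later writer does not have to wait for it, which is how the WaR and WaW dependencies disappear.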
Since several applications can share computing resources and data values, the runtime library consists of two parts, as the layout of the runtime architecture in Figure 4.4 depicts. On the one hand, the application-private part of the runtime controls those aspects of the execution related to the application. In other words, it is the entry point to the runtime; it creates new asynchronous tasks and monitors the private values they access (objects). On the other hand, the Orchestrator is in charge of handling all those aspects of the execution that might affect several applications; namely, accesses to shared data (files) and the management of the available computing devices. While each COMPSs-Mobile application instantiates its own application-private part of the runtime, there is a single instance of the Orchestrator on the mobile device, which runs in a separate process and is publicly available as an Android service.
[Figure 4.4 diagram labels: Mobile Device; App Process; App Code; Runtime Process; Access Analyzer; Task Executor; Offload Decision Engine; Private Data Register; Public Data Register; Data Manager (x2); Data Store (x2); CPU Platform; GPU Platform; Cloud Platform.]
Figure 4.4: Runtime system architecture with three available Computing Platforms: one for the CPU cores, one to offload tasks to the GPU and one gathering all the remote resources.
To achieve its purpose, the runtime toolkit relies on two components: the Access Analyzer and the Task Executor. The Access Analyzer is hosted partly on the application-private part of the runtime and partly on the Orchestrator service. As its name suggests, its goal is to monitor all accesses to data values in order to detect data dependencies when tasks are created and the data synchronizations required when a specific datum is accessed. The private data register, hosted in the application-private part of the runtime, is in charge of applying the renaming technique to all the private data values, such as objects, while the public data register, hosted in the Orchestrator service, does the same for shared data values, such as files.
The Access Analyzer fully implements the functionality of the accessValue method of the runtime API. For executeTask invocations, it only pre-processes the task to detect possible data dependencies. At the end of this pre-processing, the executeTask implementation creates a Task object that identifies the CE to execute and holds the list of arguments to pass to the method.
Following the example introduced in Section 3.4, when the execution reaches line 12 of the Main class on the second iteration (the call to the aggregation method), the API of the runtime receives an invocation of the executeTask method with the parameters “es.bsc.compss.sample.Report”, “aggregate”, false, and an object array with the current instances of globalReport and sreport. After the Access Analyzer pre-processing, a Task object represents the invocation. It has an attribute ceID holding the internal ID of the CE for the aggregate method in the es.bsc.compss.sample.Report class, and two parameters: an updating access to the value known as d3v2, which will become d3v3 at the end of the task, and a read access to the value known as d5v2.
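As an illustration, the Task object of this example could be modelled roughly as below. Only the ceID attribute and the version IDs d3v2, d3v3 and d5v2 come from the text; the class TaskSketch, its Parameter and Direction types, the helper methods and the concrete CE ID value are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical shape of a pre-processed task; not the toolkit's actual class. */
class TaskSketch {
    enum Direction { READ, UPDATE }

    /** One task argument, expressed as the data versions it reads and produces. */
    static final class Parameter {
        final Direction direction;
        final String readId;     // version consumed by the task
        final String writtenId;  // version produced by the task, or null

        Parameter(Direction direction, String readId, String writtenId) {
            this.direction = direction;
            this.readId = readId;
            this.writtenId = writtenId;
        }
    }

    final int ceId;
    final List<Parameter> parameters;

    TaskSketch(int ceId, Parameter... parameters) {
        this.ceId = ceId;
        this.parameters = Collections.unmodifiableList(Arrays.asList(parameters));
    }

    /** Versions that must exist before the task can run (its data dependencies). */
    List<String> requiredVersions() {
        List<String> ids = new ArrayList<>();
        for (Parameter p : parameters) {
            ids.add(p.readId);
        }
        return ids;
    }

    /** Versions that exist once the task has finished. */
    List<String> producedVersions() {
        List<String> ids = new ArrayList<>();
        for (Parameter p : parameters) {
            if (p.writtenId != null) {
                ids.add(p.writtenId);
            }
        }
        return ids;
    }

    /** The aggregate invocation of the example; the CE ID value is made up. */
    static TaskSketch aggregateExample() {
        int hypotheticalCeId = 7;
        return new TaskSketch(hypotheticalCeId,
                new Parameter(Direction.UPDATE, "d3v2", "d3v3"),
                new Parameter(Direction.READ, "d5v2", null));
    }

    public static void main(String[] args) {
        TaskSketch task = aggregateExample();
        System.out.println(task.requiredVersions() + " -> " + task.producedVersions());
    }
}
```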
Once the Access Analyzer has processed the API invocation, the Task object moves on to the second component of the runtime, the Task Executor, for its execution. To decide which resources should host the computation, the runtime relies on the concept of Computing Platform: a logical grouping of computing resources capable of running tasks. The resources represented by a platform can range from a single processor core to a set of virtual instances deployed in a multi-cloud platform. The implementation of the platform is responsible for monitoring the data dependencies of the task and for scheduling both the execution of the task on its resources (picking one of the available implementations for the corresponding CE) and the fetching and preparation of any required value. To fulfill these duties, each platform can adopt a different strategy: centralizing the management in the Orchestrator process, centralizing it in a remote resource or distributing it across multiple resources.
The Offload Decision Engine (ODE), a subcomponent of the Task Executor, decides which platform runs the task without knowing the actual computing devices that back each platform or the details of their interaction. The ODE polls each of the available platforms (configured by the user beforehand) for a forecast of the expected end time, energy consumption and economic cost of running the task. According to a configurable heuristic, the ODE picks the best platform to run the task and requests its execution.
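The polling scheme of the ODE can be sketched as follows. The names OdeSketch, ComputingPlatform, Forecast, Heuristic and fixedPlatform, the units of the forecast fields and the convention that a lower score is better are all assumptions made for the example; the text does not specify the actual interfaces or heuristics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the platform selection; not the toolkit's actual ODE. */
class OdeSketch {
    /** Forecast returned by a platform (units are assumed for illustration). */
    static final class Forecast {
        final double endTimeMs;
        final double energyJoules;
        final double cost;

        Forecast(double endTimeMs, double energyJoules, double cost) {
            this.endTimeMs = endTimeMs;
            this.energyJoules = energyJoules;
            this.cost = cost;
        }
    }

    /** The ODE only sees this interface, never the devices behind a platform. */
    interface ComputingPlatform {
        String name();
        Forecast forecast(String taskId);
    }

    /** Configurable heuristic: the platform with the lowest score wins. */
    interface Heuristic {
        double score(Forecast forecast);
    }

    /** Test double: a platform that always returns the same forecast. */
    static ComputingPlatform fixedPlatform(final String name, final Forecast forecast) {
        return new ComputingPlatform() {
            @Override public String name() { return name; }
            @Override public Forecast forecast(String taskId) { return forecast; }
        };
    }

    /** Polls every platform once and returns the best one, if any. */
    static Optional<ComputingPlatform> pick(List<ComputingPlatform> platforms,
                                            String taskId, Heuristic heuristic) {
        ComputingPlatform best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (ComputingPlatform platform : platforms) {
            double score = heuristic.score(platform.forecast(taskId));
            if (score < bestScore) {
                bestScore = score;
                best = platform;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<ComputingPlatform> platforms = Arrays.asList(
                fixedPlatform("cpu", new Forecast(100.0, 5.0, 0.0)),
                fixedPlatform("cloud", new Forecast(40.0, 1.0, 0.02)));
        // A heuristic that only cares about the expected end time.
        System.out.println(pick(platforms, "task-1", f -> f.endTimeMs).get().name());
    }
}
```

Swapping the lambda passed as the heuristic (for instance, one that weighs energy against cost) changes the decision without touching the platforms, which mirrors the separation described above.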
Each part of the runtime has access to the Data Manager (DM) component. The DM is a distributed key-value store that manages the value assigned to each datum ID. The DM is asynchronous: either the Computing Platforms or the Access Analyzer, on behalf of the application code, can subscribe to the existence or the value of a specific datum. When a new datum version is computed, either in the main code of the application or on any resource belonging to a Computing Platform, the generating element publishes its value to the DM, which notifies all the subscribers. The local instance of the DM is responsible for fetching requested values when they are located in a different process.
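The publish/subscribe behaviour of the DM can be sketched, for a single process, as a map from datum IDs to futures. The class DataManagerSketch and its subscribe and publish methods are hypothetical names chosen for the example, and the sketch deliberately omits the distribution and inter-process fetching that the real component handles.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-process sketch of the DM's asynchronous interface. */
class DataManagerSketch {
    // One future per datum version ID; it completes when the value is published.
    private final Map<String, CompletableFuture<Object>> values = new ConcurrentHashMap<>();

    private CompletableFuture<Object> slot(String dataId) {
        return values.computeIfAbsent(dataId, id -> new CompletableFuture<>());
    }

    /**
     * Subscribes to a datum version. The returned future completes as soon as
     * the value is published, or immediately if it already exists.
     */
    CompletableFuture<Object> subscribe(String dataId) {
        // Hand out a dependent future so subscribers cannot complete the slot.
        return slot(dataId).thenApply(value -> value);
    }

    /** Publishes a value; returns false if the version had already been published. */
    boolean publish(String dataId, Object value) {
        return slot(dataId).complete(value);
    }

    public static void main(String[] args) {
        DataManagerSketch dm = new DataManagerSketch();
        // A subscriber registers interest before the value exists.
        dm.subscribe("d3v3").thenAccept(v -> System.out.println("d3v3 is ready: " + v));
        // The element that generates the version publishes it later.
        dm.publish("d3v3", "aggregated report");
    }
}
```

Since each version ID is written exactly once under the renaming scheme, a write-once future per ID is enough here and no update notifications are needed.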