Even a superficial analysis of the behavior of the described platform reveals a deficiency that unnecessarily harms the performance of the runtime. Consider an application with a single task whose only input parameter is the content of a GUI text box and which generates a string to show to the user. When the application reaches the CE invocation, the runtime registers the content of the text box as d1v1 in the Data Manager instance hosted in the application-private part of the runtime. It then submits the asynchronous task to the Orchestrator and continues executing the main code until it reaches the instruction that sets the result string into the label, where the runtime halts the execution because of a value synchronization.
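The example application can be sketched in a few lines. This is only an illustration, not the runtime's actual API: the dictionary `data_manager`, the function `make_greeting` and the use of a thread pool in place of the Orchestrator are all assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the application-private Data Manager:
# maps (data id, version) to a value.
data_manager = {}


def make_greeting(text):
    """Stand-in for the CE method: builds the string shown to the user."""
    return "Hello, " + text + "!"


def run_application(text_box_content):
    # The runtime registers the input value as d1v1 before submitting the task.
    data_manager[("d1", 1)] = text_box_content
    with ThreadPoolExecutor(max_workers=1) as orchestrator:
        # Asynchronous submission: the main code keeps running afterwards.
        future = orchestrator.submit(make_greeting, data_manager[("d1", 1)])
        # ... the main code would continue here ...
        # Value synchronization: setting the label needs the result, so
        # the main code blocks until the task has produced it.
        label_text = future.result()
    return label_text
```

The call to `future.result()` plays the role of the value synchronization: the main code cannot proceed until the task output exists.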
Meanwhile, the Orchestrator receives the task, and the Offload Decision Engine forwards it to the CPU Platform. The Scheduler of the platform queries the Data Manager of the runtime process for the existence of d1v1 and receives the corresponding notification. At this point, the Scheduler tries to obtain the value of d1v1; however, the local Data Manager does not contain it and needs to fetch it from the Data Manager hosted in the application process. Transferring the value from one Data Manager to the other requires interprocess communication (IPC); hence, the source must serialize the value and pass it on, and the destination must deserialize it. At the end of the transfer, the Data Manager in the Orchestrator notifies the presence of the value so that the Scheduler forwards the task to the threads. When the task completes, the Scheduler stores the result value, d2v2, in the local Data Manager. At this point, the main code of the application notices the existence of the value and requests it from the Data Manager in the application-private part of the runtime, which fetches it from the Orchestrator process via IPC. For small objects of a few bytes, the overhead induced by IPC is negligible; however, for large objects, this mechanism may incur a significant overhead of several seconds.
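The dependence of the transfer cost on the object size can be illustrated with a minimal sketch. It uses Python's `pickle` module as a stand-in for whatever serialization mechanism the platform actually employs (an assumption), and it measures only the serialize/deserialize round trip, not the cost of moving the bytes between processes. The helper `ipc_round_trip` is hypothetical.

```python
import pickle
import time


def ipc_round_trip(value):
    """Serialize and deserialize a value, as an IPC transfer would.

    Returns the reconstructed value, the payload size in bytes and the
    elapsed time in seconds.
    """
    start = time.perf_counter()
    payload = pickle.dumps(value)      # source side: serialize
    restored = pickle.loads(payload)   # destination side: deserialize
    elapsed = time.perf_counter() - start
    return restored, len(payload), elapsed


small = "hello"                   # a few bytes
large = list(range(1_000_000))    # a large object

small_copy, small_bytes, small_time = ipc_round_trip(small)
large_copy, large_bytes, large_time = ipc_round_trip(large)
print(f"small: {small_bytes} bytes in {small_time:.6f} s")
print(f"large: {large_bytes} bytes in {large_time:.6f} s")
```

Absolute timings depend on the device and the serialization format. Note that in the flow described above this cost is paid twice, once for the input value and once for the result.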
One solution to avoid these value transfers consists in separating the management of the platform from the executing threads. The frontend of the platform, which computes the forecasts and schedules task executions on the cores, remains in the Orchestrator, while the actual execution of the task happens in the backend of the platform, hosted in the application process. After all, the management of the cores concerns all the applications, whereas the CE methods are a private part of the application. In this way, both the application and the execution threads query the same Data Manager instance for the data values they access; hence, transfers of data values are no longer necessary. Coordinating both parts of the platform still requires interprocess communication; however, commands follow a clear schema, are quickly serialized and deserialized, and take up few bytes.
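To give an idea of how compact such coordination commands can be, the following sketch encodes a command with a fixed binary layout. The layout, the opcodes and the function names are hypothetical; the text does not specify the actual command schema.

```python
import struct

# Hypothetical fixed layout in network byte order: opcode (1 byte),
# task id (4 bytes), data id (4 bytes), data version (2 bytes).
COMMAND_FORMAT = "!BIIH"

# Hypothetical opcodes for the frontend/backend coordination.
OP_FETCH_VALUE = 1
OP_RUN_TASK = 2
OP_STORE_VALUE = 3


def encode_command(opcode, task_id, data_id, version):
    """Pack a command into its fixed-size binary form."""
    return struct.pack(COMMAND_FORMAT, opcode, task_id, data_id, version)


def decode_command(payload):
    """Unpack a command into (opcode, task_id, data_id, version)."""
    return struct.unpack(COMMAND_FORMAT, payload)


message = encode_command(OP_RUN_TASK, task_id=7, data_id=1, version=1)
print(len(message), decode_command(message))
```

With this layout every command occupies 11 bytes regardless of the size of the data values it refers to, which is why coordinating the two parts stays cheap even when the values themselves are large.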
This division of the CPU Platform changes the flow followed by a task. Although, for the Scheduler, the stages of the process are the same – existence check, value obtaining, task execution and value storing – the location of the components with which it interacts is
different. After receiving the task from the Offload Decision Engine, the Scheduler queries the Data Manager hosted in the Orchestrator for the existence of all the input values; steps 1 and 2 remain intact. The process changes from step 3 on, since the data values no longer need to be in the local Data Manager but in the Data Manager of the application process. At this point, the Scheduler contacts the backend of the platform, which forwards the value request to the corresponding Data Manager. The application Data Manager acts exactly as the instance in the Orchestrator does in the original procedure: it checks whether the value is available in the process or whether it has to fetch the value from another process. Once the Data Manager has the value, it notifies the value presence to the Scheduler through the backend component, as in the original step 4. When the Scheduler notices the presence of all the input values of the task and decides to launch the task execution, it again contacts the CPU Backend (step 5) so that the execution threads contained in it run the task (step 6). At the end of the execution, the executing thread notifies the task completion, and the backend forwards the notification to the Scheduler (step 7), which eventually contacts the backend to store the output values in the Data Manager instance of the application process (step 8).
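The eight steps of the proxied flow can be sketched as a single-process simulation. All class and method names below are hypothetical, real IPC is replaced by direct method calls, and the Orchestrator-side existence check is reduced to a set of known data identifiers; the sketch only aims to show that the values themselves never leave the application-side Data Manager.

```python
class DataManager:
    """Hypothetical Data Manager: values keyed by (data id, version)."""

    def __init__(self):
        self.values = {}


class CPUBackend:
    """Backend hosted in the application process, next to its Data Manager."""

    def __init__(self, app_data_manager, log):
        self.dm = app_data_manager
        self.log = log
        self.pending = {}  # task id -> result not yet stored

    def request_value(self, key):
        self.log.append("3: scheduler asks the backend for the value")
        present = key in self.dm.values  # already local: no transfer needed
        self.log.append("4: backend notifies the value presence")
        return present

    def run_task(self, task_id, func, in_key):
        self.log.append("5: scheduler asks the backend to run the task")
        self.log.append("6: execution thread runs the task")
        self.pending[task_id] = func(self.dm.values[in_key])
        self.log.append("7: backend notifies the task completion")

    def store_value(self, task_id, out_key):
        self.log.append("8: scheduler asks the backend to store the output")
        self.dm.values[out_key] = self.pending.pop(task_id)


class Scheduler:
    """Frontend hosted in the Orchestrator; it only sees keys, not values."""

    def __init__(self, known_keys, backend, log):
        self.known_keys = known_keys  # stand-in for the existence check
        self.backend = backend
        self.log = log

    def execute(self, task_id, func, in_key, out_key):
        self.log.append("1: scheduler checks the existence of the input")
        if in_key not in self.known_keys:
            raise KeyError(in_key)
        self.log.append("2: existence is notified")
        if not self.backend.request_value(in_key):
            raise KeyError(in_key)
        self.backend.run_task(task_id, func, in_key)
        self.backend.store_value(task_id, out_key)


log = []
app_dm = DataManager()
app_dm.values[("d1", 1)] = "hello"  # d1v1 registered by the application
backend = CPUBackend(app_dm, log)
scheduler = Scheduler({("d1", 1)}, backend, log)
scheduler.execute(1, str.upper, ("d1", 1), ("d2", 2))
print("\n".join(log))
```

In this sketch the Scheduler exchanges only keys and notifications with the backend, while both the input and the output value stay in `app_dm`, where the main code of the application can read them directly.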
Figure 5.3 updates the diagram of Figure 5.1 with the architecture and the flow of a task when the platform uses proxied execution. Given the higher performance of the Proxied Execution compared to the flow described in Section 5.1, this is the default mechanism included in the final prototype to exploit the CPU of the mobile device. However, the user can configure the runtime toolkit to run the tasks within the Orchestrator process.
[Diagram labels: Mobile Device; Runtime Process (Task Executor, Data Manager, CPU Platform, Offload Decision Engine, Scheduler); Application Process (Data Manager, Thread Pool, CPU Backend); arrows numbered 1 to 8 mark the steps of the task flow.]
Figure 5.3: Architecture of the CPU platform with proxied execution illustrating the flow involving a task execution.