Real-time scheduling for 3D rendering on automotive embedded systems


Dissertation approved by the Faculty of Computer Science, Electrical Engineering and Information Technology of the Universität Stuttgart for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by

Stephan Schnitzer

from Stuttgart

Main referee: Prof. Dr. rer. nat. Dr. h. c. Kurt Rothermel

Co-referee: Prof. Dr.-Ing. habil. Roman Obermaisser

Date of the oral examination: 27 February 2019


Contents

Abstract
Zusammenfassung
Acknowledgements

1. Introduction
1.1. Overview
1.1.1. Multiple hardware platforms
1.1.2. Limitations on features
1.2. Goals and Problem Statements
1.2.1. Goals
1.2.2. Boundary conditions
1.2.3. Execution time prediction
1.2.4. GPU scheduler
1.3. Project ARAMiS
1.3.1. Background
1.3.2. Structure
1.3.3. Results
1.4. Contributions
1.4.1. Requirements analysis for graphics virtualization
1.4.2. Virtualized automotive graphics system
1.4.3. Execution time prediction for 3D rendering commands
1.4.4. 3D GPU scheduler
1.4.5. Further contributions
1.4.6. Related publications and contributors
1.5. Structure

2. Requirements and Architecture
2.1. Requirements
2.1.2. R2 – Restricted Window Creation and Positioning
2.1.3. R3 – Trusted Channel
2.1.4. R4 – Virtualized Graphics Rendering
2.1.5. R5 – Reconfiguration of Policies
2.1.6. R6 – Certifiability
2.1.7. R7 – System Monitoring
2.2. Architecture
2.2.1. Virtualization
2.2.2. Inter-VM communication
2.2.3. Integrity
2.2.4. Application interfaces
2.2.5. GPU Scheduler
2.3. Demonstrator
2.3.1. Hardware overview
2.3.2. Implementation
2.3.3. Evaluation
2.4. Related Work
2.5. Summary and Appraisal

3. Execution Time Prediction
3.1. Background
3.1.1. EGL
3.1.2. OpenGL ES 2.0
3.1.3. Machine Learning
3.1.4. Model analysis
3.2. System model
3.3. Prediction Architecture
3.3.1. OpenGL ES Context Monitor
3.3.2. Predictor
3.3.3. Execution Time Monitor
3.4. Prediction models for FLUSH, CLEAR, and SWAPBUFFERS
3.4.1. Prediction Model for FLUSH
3.4.2. Prediction Model for CLEAR
3.4.3. Prediction Model for SWAPBUFFERS
3.5. Prediction Models for DRAW
3.5.1. Fragment estimation heuristics
3.5.2. Shader model: based on profiling
3.5.3. Shader model: based on machine learning
3.6. Online adaption
3.7. Implementation
3.7.1. Architecture
3.7.2. Initialization of the shared library libETP
3.7.3. Prediction model creation
3.7.4. Used libraries and algorithms
3.7.5. Modes of operation
3.8. Evaluation
3.8.1. Setup
3.8.2. Coverage factor
3.8.3. Fragment Heuristics
3.8.4. Shader execution time
3.8.5. Command Group prediction
3.8.6. Prediction overhead
3.8.7. Evaluation conclusion and summary
3.9. Related Work
3.10. Summary and future work
3.10.1. Summary
3.10.2. Future work

4. GPU Scheduling
4.1. Requirements
4.2. System Model
4.3. Approach
4.3.1. System Architecture
4.3.2. Application-specific parameters for scheduling
4.3.3. Conceptual Design of the Scheduling Algorithm
4.3.4. Important Parameters, Variables, and Functions
4.3.5. Scheduling Algorithm
4.3.6. Reservation Concept and Schedulability
4.4. Implementation
4.4.1. Hardware platform and Operating System
4.4.2. Dispatching commands
4.4.3. Time measurement and prediction
4.4.4. GPU Scheduler interface
4.4.6. Concurrency
4.5. Evaluation
4.5.1. Setup
4.5.2. Effectiveness
4.5.3. GPU Utilization
4.5.4. Scheduler Efficiency
4.5.5. Evaluation conclusion and summary
4.6. Outlook on preemptive scheduling
4.7. Related Work
4.8. Summary and future work
4.8.1. Summary
4.8.2. Future work

5. Summary

Appendix

A. Appendix
A.1. Vivante GPU instruction set
A.2. libETP XML profiling data file
A.3. Additional results for scheduler effectiveness
A.3.1. Influence of MPCG on scheduler effectiveness
A.3.2. Scheduler effectiveness with huge ETP error
A.3.3. Scheduling timing

Glossary
Acronyms
Math Terms
Bibliography


List of Figures

1.1. Audi virtual cockpit screenshot
1.2. BMW 7 series self-parking surround-view
1.3. Dependencies of the ARAMiS subprojects
2.1. Architecture of a virtualized vehicular graphics system
2.2. Demonstrator front view with HMI devices
2.3. HTML5-based demonstrator control GUI
2.4. Setup of VCT-B and GPU scheduling, at final ARAMiS event
3.1. OpenGL ES 2.0 rendering pipeline
3.2. Example: Continuous piecewise linear regression model fitting half circle
3.3. Error of model for auxiliary fragment shader execution time
3.4. Example: Input values and MARS model for half sphere
3.5. Example of a feed-forward artificial neural network graph
3.6. Hardware and software components for 3D rendering with OpenGL ES 2.0
3.7. Execution Time Prediction Components and Models
3.8. OpenGL ES 2.0 rendering pipeline (concise)
3.9. Example of possible deviation of triangle size approximation
3.10. Average triangle samples depending on number of rendered triangles
3.11. Bounding box applied on a horse model
3.12. Execution time of vertex shader (VS) depending on the number of attributes
3.13. Error of submodel for auxiliary vertex shader execution time
3.14. Error of submodel for auxiliary fragment shader execution time
3.15. Error of submodel for vertex shader commands execution time
3.16. Error of submodel for fragment shader commands execution time
3.17. Error of submodel for texture lookup calls
3.19. Kernel latency distribution
3.20. Screenshots of evaluated applications
3.21. Accuracy of fragment heuristics, speedometer application
3.22. Accuracy of fragment heuristics, glmark2-es2 “build” benchmark
3.23. Accuracy of fragment heuristics, Quake 3 “demo four” application
3.24. Accuracy of shader execution time prediction concepts
3.25. Accuracy of Draw prediction, es2gears application
3.26. Accuracy of Draw prediction, glmark2-es2 “build” benchmark
3.27. Accuracy of Draw prediction, glmark2-es2 “shading” benchmark
3.28. Accuracy of Draw prediction, glmark2-es2 “texture” benchmark
3.29. Accuracy of Draw prediction, speedometer application
3.30. Accuracy of Draw prediction, quake3 “demo four” application
3.31. Accuracy of SwapBuffers prediction, glmark2-es2 “build” benchmark
3.32. Accuracy of Draw prediction assuming precise number of fragments, glmark2-es2 “build” benchmark
3.33. Accuracy of Draw prediction assuming precise number of fragments, glmark2-es2 “shading” benchmark
3.34. Accuracy of Draw prediction assuming precise number of fragments, glmark2-es2 “texture” benchmark
3.35. Initial CPU time overhead for loading libETP, compared to native execution
3.36. CPU time overhead of libETP prediction per frame, compared to native execution
3.37. CPU time overhead of libETP prediction per frame, without Draw optimization
4.1. 3D GPU scheduling system model
4.2. GPU scheduling architecture
4.3. Example for simple priority-based scheduling
4.4. Example for scheduling using etpf
4.5. Example for the effect of SDdelay_C using MPCG=1
4.6. Example for the effect of SDdelay_C using MPCG=2
4.7. GPU scheduling algorithm reservation example
4.8. GPU scheduling algorithm schedulability example
4.9. Scheduler interface callbacks
4.10. Scheduler thread concurrency synchronization
4.11. Effectiveness (homogeneous scenario), 60 FPS
4.14. Effectiveness (mixed scenario)
4.15. Effectiveness (mixed scenario), Quake 3 with 200 % predET
4.16. GPU utilization, mixed scenario
4.17. Average GPU utilization and required number of scheduler runs
4.18. Delay of the scheduling algorithm
4.19. Example for simple priority-based scheduling
A.1. Effectiveness (mixed scenario), MPCG=2
A.2. Effectiveness (mixed scenario), MPCG=10
A.3. Effectiveness (mixed scenario), Quake 3 with 25 % predET
A.4. Effectiveness (mixed scenario), Quake 3 with predET = ∞
A.5. Timing diagram of a short period, MPCG=1


List of Tables

3.1. Performance parameters provided by the GPU Profiler to the prediction models
3.2. Machine learning models provided for shader prediction
3.3. Nomenclature of shader profiling calculations
3.4. Nomenclature of MARS submodel terms
3.5. Comparison table of 3D applications used for evaluation
3.6. Comparison of the measured number of fragments with the area covered by bounding boxes
3.7. Prediction errors of Glmark2-es2 “build”
3.8. Influence of the fragment heuristic on the mean absolute error (MAE) of the predicted execution time
4.1. Application setup for mixed scenario

Listings

3.1. Execution time prediction for CGs
3.2. Record triangle samples data after Draw calls
3.3. Using triangle samples to predict the number of fragments
3.4. Code of the online_adaption() function
4.1. Brief sketch of scheduling algorithm
4.2. Code of submit(CG) function
4.3. Code of schedule_next() function
4.4. Code of schedulability function
A.1. Vivante GC2000 GPU shader instruction set
A.2. Vivante GC2000 GPU shader instruction set


Abstract

3D graphical functions in cars enjoy growing popularity. For instance, analog instruments of the instrument cluster are replaced by digital 3D displays as shown by Mercedes-Benz in the F125 prototype car. The trend to use 3D applications expands into two directions: towards more safety-relevant applications such as the speedometer and towards third-party applications, e.g., from an app store. Traditionally, to ensure isolation, new automotive functions are often implemented by adding further electronic control units (ECUs). However, in order to save cost, energy, and installation space, all 3D applications should share a single hardware platform and thus a single GPU. GPU sharing brings up the problem of providing real-time guarantees for rendering content of time-sensitive applications like the speedometer. This requires effective real-time GPU scheduling concepts to ensure safety and isolation for 3D rendering. Since current GPUs are not preemptive, a deadline-based scheduler must know the GPU execution time of GPU commands in advance. Unfortunately, existing scheduling concepts lack support for dynamic tasks, periodic real-time deadlines, or non-preemptive execution.

In this work, we present the requirements that apply to automotive HMI rendering. Based on these requirements, we propose a Virtualized Automotive Graphics System (VAGS), which uses a hypervisor providing isolation between different VMs, in particular for the head unit and for the instrument cluster.

Additionally, we present a novel framework to measure and predict the execution time of GPU commands using OpenGL ES 2.0. We propose prediction models for the GPU commands relevant for 3D rendering, such as Draw and SwapBuffers. For Draw, we present two heuristics to estimate the number of fragments, two concepts to estimate the shader execution time, and an optional online adaption mechanism. The number of fragments is estimated either from the bounding box of the rendered model, to which the vertex shader projection is applied, or from a subset of the triangles, which is used to estimate the average size of a triangle. To estimate the shader execution time, we either execute the shaders in a profiling environment with a dedicated OpenGL ES 2.0 context, or we use a MARS (multivariate adaptive regression splines) model trained offline. The implementation and evaluation of our framework demonstrate its feasibility and show that good prediction accuracy can be achieved. For instance, when rendering a popular 3D benchmark scene, less than 0.4 % of the samples were underestimated by more than 100 µs and less than 0.2 % of the samples were overestimated by more than 100 µs. The overhead introduced by our prediction is negligible in some scenarios and typically below 25 % in the long run. The application’s initial startup is delayed by only about 30 ms of CPU time when using the most efficient concept.

Moreover, we present a real-time 3D GPU scheduling framework that provides strong guarantees for critical applications while still giving as many GPU resources as possible to less important applications, thus ensuring high GPU utilization. The proposed concepts for execution time prediction are used to make good scheduling decisions and are required since current GPUs are not preemptive. Our implementation is based on an automotive embedded system running Linux, and our evaluations show the feasibility and effectiveness of our concepts. The GPU scheduler fulfills given real-time constraints for a dynamic set of applications submitting arbitrary sequences of GPU command batches. It achieves a high GPU utilization of 99 % in a challenging scenario with 17 applications and fulfills 99.9 % of the deadlines of the highest-priority application. Moreover, scheduling is performed highly efficiently in real time, with less than 9 µs latency.


Zusammenfassung (Summary)

The use of 3D graphics in the automotive domain enjoys growing popularity. For example, Mercedes-Benz showed in the F125 prototype car how the analog gauges of the instrument cluster are replaced by digital displays. The trend to use 3D applications goes in two directions: on the one hand towards more critical applications such as the speedometer, and on the other hand towards third-party applications that are obtained, for example, from an app store. To fulfill isolation requirements, new functions in the car are traditionally often implemented by adding new electronic control units. However, in order to save cost, energy, and installation space in the vehicle, all 3D applications should use a single hardware platform and thus also a single GPU as a shared resource. For time-sensitive applications such as the speedometer, this raises the challenge of guaranteeing rendering in real time. This requires effective concepts for real-time GPU scheduling that can guarantee safety and isolation for 3D rendering. Since current GPUs are not preemptive, a deadline-based scheduler must know the execution time of the GPU commands in advance. Unfortunately, existing scheduling concepts do not support dynamic tasks or periodic real-time deadlines, or they require preemptive execution.

This thesis describes the requirements relevant to HMI rendering in the automotive domain. Based on these requirements, the concept of the Virtualized Automotive Graphics System (VAGS) is presented, which uses a hypervisor to ensure isolation between different VMs, in particular for the head unit and the instrument cluster.

Furthermore, a novel framework is presented that measures the execution time of GPU commands and predicts it based on OpenGL ES 2.0. Prediction models are presented for the relevant GPU commands such as Draw and SwapBuffers. For Draw commands, two heuristics that estimate the number of fragments, two concepts that predict the execution time of the graphics shaders, and an optional online correction mechanism are presented. The number of fragments is estimated either by means of a bounding box of the rendered model, to which the projection of the vertex shader is applied, or from a subset of the rendered triangles, which is used to determine the average size of a triangle. To estimate the run time of a shader, it is either executed in a calibration environment in a separate OpenGL context, or a MARS model trained offline is used. The implementation and the evaluations of the framework show its feasibility and that good prediction accuracy can be achieved. For example, when rendering a scene of the well-known benchmark program Glmark2, less than 0.4 % of the samples were underestimated by more than 100 µs and less than 0.2 % of the samples were overestimated by more than 100 µs. In the long run, our implementation typically causes less than 25 % additional CPU time, and for some scenarios the overhead is even negligible. With the most efficient method, program startup is delayed by only about 30 ms.

In addition, a real-time-capable 3D GPU scheduling framework is presented that gives guarantees to critical applications and nevertheless makes the remaining GPU resources available to less critical applications, thereby achieving high GPU utilization. Since current GPUs are not preemptive, the presented concepts for execution time prediction are used to make priority-based scheduling decisions. The implementation is based on an automotive-grade embedded system running Linux. The evaluations performed on it show the feasibility and effectiveness of the presented concepts. The GPU scheduler fulfills the respective real-time constraints for a variable number of applications that generate different sequences of GPU commands. In a challenging scenario with 17 applications, a high GPU utilization of 99 % is achieved and 99.9 % of the deadlines of the highest-priority application are fulfilled. Furthermore, scheduling is performed efficiently in real time with less than 9 µs latency.


Acknowledgements

I especially thank my advisor, Professor Dr. Kurt Rothermel, who supported my research and my work in the distributed systems group. I would like to thank him for his continuous guidance, help, and trust. Additionally, my thanks go to Professor Dr. Roman Obermaisser for supporting my research and for reviewing this thesis. My special thanks go to Simon Gansel for our close collaboration within our complementary research. I appreciate that he spent a lot of time reviewing this thesis and was always helpful and constructive. I also thank my colleagues from the distributed systems group, who inspired and supported my research. To name but a few, I thank Dr. Frank Dürr for his support and his excellent feedback on our joint publications. I also thank Dr. Boris Koldehofe, Dr. Muhammad Adnan Tariq, Ruben Mayer, and Florian Berg for supporting my research and sharing ideas.

Moreover, I thank the German Federal Ministry of Education and Research (BMBF), which funded part of my research within the scope of the project ARAMiS under funding ID 01IS11035 and allowed me to present my research at international conferences.

I especially thank my wife for her love and her continuous support over the last years.

This work was only possible by the grace of my God and Father, who listened to my prayers and those of many friends (Bible, Psalm 66, verse 20).


1. Introduction

1.1. Overview

Innovations in cars are mainly driven by electronics and software today [EJ09, MGR+14]. In particular, graphical functions and applications enjoy growing popularity, as shown by the increasing number of displays integrated into cars. For instance, the head unit (HU) uses the center console screen to display the navigation system, or displays integrated into the headrests of the front seats to display multimedia content. Another recent trend in modern cars is to replace the analog instruments of the instrument cluster (IC) by digital 3D displays, for instance as shown in the Mercedes-Benz F125 prototype car [Mer11]. Although graphical output initially consisted mainly of 2D content such as movies or 2D maps, the amount of 3D graphics is steadily increasing [Nvi13]. For example, modern navigation systems display 3D city models [AUD15]. Also, the instruments of the vehicle are rendered as 3D objects with reflections and shadows to imitate physical instruments as closely as possible. Additionally, 3D rendering allows completely different forms of presentation, such as the speed indicator shown on the 3D display of the F125 prototype car [Mer11]. Likewise, a “bird’s eye view” with a virtual 3D model of the car and its surroundings supports the driver during parking [bmw15]. To render such complex scenes at high frame rates, graphics processing units (GPUs) are integrated into cars.

When 3D rendering is used in automotive scenarios, the GPU is typically accessed concurrently by multiple applications, which is quite different from its use in consumer products, where often a single application is rendered in full-screen mode. In an automotive environment, multiple 3D applications typically run in parallel and are constrained by ISO standards, automotive design guidelines, legal requirements, and demands specific to the original equipment manufacturer (OEM). 3D applications can be ranked according to the safety-criticality and importance of their rendering. A few examples of 3D applications are listed next, sorted from high importance to low importance.


• Safety-relevant IC applications such as the parking assistant or the display of instruments [Mer11, Nvi13] – stutter-free, latency-bound, high frame rates.

• OEM applications like the navigation system – decent quality is important, but low latency and high frame rates are less relevant.

• Third-party software such as a web browser executing WebGL or applications from an app store [For13, QNX13, Dai13] that are not quality-assured by the OEM – best effort, using remaining GPU resources.

In order to execute 3D applications with different requirements, the latest high-end cars use multiple hardware platforms to ensure physically isolated 3D rendering. Additionally, in order to save cost, many features like custom 3D games are not yet available, since they would require additional hardware platforms. Next, we describe these two state-of-the-art methods in more detail.

Figure 1.1.: Audi virtual cockpit screenshot¹

1.1.1. Multiple hardware platforms

Since some applications for the IC are typically certified with ASIL B² while the HU applications are QM², it is common practice to physically isolate the IC from the HU platform. For a few years now, high-end HU systems have been using 3D rendering, e.g., for 3D navigation or 3D menus. A relatively new trend is 3D rendering for the IC, e.g., in the latest Audi TT car [AUD15, AUD14]; a screenshot of the so-called “Audi virtual cockpit” is depicted in Fig. 1.1.

¹ Source: http://www.audi.de/content/dam/nemo/models/tt/tt-coupe/my-2017/1300x551-layer-header/1300x551_0005_ATT_D_151004_1.jpg

² The automotive standards [ISO11, ISO 26262] address functional safety for vehicles, which includes risk classification ranging from ASIL D (highest risk) to QM (no safety relevance).


Figure 1.2.: BMW 7 series self-parking surround-view³

Moreover, self-parking systems with a sophisticated 3D-rendered surround view, such as in the latest BMW 7 series [bmw15] (cf. Fig. 1.2), are also implemented using a physically separated hardware platform like [Fre14]. Thus, today’s high-end cars integrate three hardware platforms (for HU, IC, and parking), each of which potentially renders on a dedicated 3D GPU. Next, we describe the second state-of-the-art method used to fulfill the automotive requirements.

1.1.2. Limitations on features

When designing a system, the use of multiple hardware platforms limits flexibility. For instance, to display HU content such as navigation instructions on the IC display, current solutions use an LVDS channel of fixed resolution, whose content can only be displayed at a fixed position on the IC display. Thus, gaining more flexibility implies increased effort and hardware cost.

Furthermore, executing custom 3D applications from a user-selected app store would require either proper isolation from the rendering of the OEM’s applications on the HU or yet another hardware platform. For this reason, OEMs do not support this yet. Additionally, rear-seat entertainment displays typically show video streams transmitted by the HU. Therefore, they do not allow rear-seat passengers to run their own 3D applications: the HU cannot prevent such applications from affecting the applications displayed on the main HU display, and physically separated 3D-enabled platforms for each rear-seat display seem to be too expensive.

³ Source: http://www.bmw.com/_common/shared/newvehicles/7series/sedan/2015/showroom/driver_assistance/7-series-sedan-surround-view-01-en.jpg


1.2. Goals and Problem Statements

Unfortunately, separate hardware platforms increase cost, energy consumption, and space requirements. Therefore, there is a strong incentive to consolidate hardware and ultimately share a single GPU between several applications. Additionally, hardware consolidation provides unprecedented flexibility regarding the visibility of the applications’ graphical output, e.g., animations moving windows between the IC display and the HU display. Moreover, consolidated hardware with a shared GPU enables support for uncertified third-party applications installed by the user, thus increasing the number of available applications by orders of magnitude.

1.2.1. Goals

For future cars, a single hardware platform with a powerful 3D GPU shall be able to render the 3D content of different applications with quite different requirements and different importance. A key requirement for safe GPU sharing in automotive scenarios is to provide real-time guarantees for 3D rendering of safety-relevant applications. For instance, deterministic time bounds for presenting warning messages must be guaranteed and less important applications must not interfere with important applications. More precisely, the following goals must be fulfilled in order to run mixed-criticality 3D applications on a single shared GPU.

Concurrency: Typically, many 3D applications are running in parallel.

Flexibility: The set of running 3D applications is dynamic; applications can join or leave during run-time.

Prioritization: Concerning criticality of 3D rendering, some applications are more important than others.

Desired frame rates: Each application has specific requirements for a (uniformly distributed) frame rate. For instance, if an application needs to be rendered with 30 frames per second (FPS), a higher frame rate would waste valuable GPU time.

Isolation: 3D rendering of important applications must be guaranteed and not affected by less important applications.
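As a rough illustration of how these goals translate into per-application parameters, the following minimal sketch records a priority and a desired frame rate for each application and derives the frame period and the resulting sequence of frame deadlines. The names `RenderApp` and `frame_deadlines_ms`, the convention that a lower number means higher priority, and the example applications are assumptions made for this sketch; they are not taken from the scheduler presented in Chapter 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderApp:
    """Hypothetical per-application record (illustration only)."""
    name: str
    priority: int      # assumed convention: lower number = more important
    target_fps: float  # desired, uniformly distributed frame rate

    @property
    def period_ms(self) -> float:
        # One frame is due every 1000 / FPS milliseconds.
        return 1000.0 / self.target_fps


def frame_deadlines_ms(app: RenderApp, start_ms: float, count: int) -> list:
    """Deadlines of the next `count` frames, one per period."""
    return [start_ms + (k + 1) * app.period_ms for k in range(count)]


# A dynamic set of applications: entries may be added or removed at run-time.
apps = [
    RenderApp("speedometer", priority=0, target_fps=60.0),
    RenderApp("navigation", priority=1, target_fps=30.0),
    RenderApp("third_party_game", priority=2, target_fps=30.0),
]

for app in sorted(apps, key=lambda a: a.priority):
    print(f"{app.name}: one frame every {app.period_ms:.2f} ms")
```

Rendering an application faster than its `target_fps` would meet no additional deadline, which is why a higher frame rate only wastes GPU time in this model.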


1.2.2. Boundary conditions

Historically, the typical use case for 3D GPUs is running a single trusted 3D application (e.g., a game). While a CPU could technically also be used for 3D rendering (so-called “software rendering”), a 3D GPU performs this task orders of magnitude faster by using a highly optimized hardware architecture with many parallel computation units. The GPU renders 3D content using a rendering pipeline in which first the 3D vertex coordinates are calculated by a vertex shader and then the color of each pixel is calculated by a fragment shader (cf. Sec. 3.1.2). The parameters of the rendering pipeline and the shader programs are provided by the 3D applications.

The recent trend to use 3D rendering in web browsers via WebGL [Khrc] has drawn new attention to uncertified 3D applications, since allowing uncertified applications to use the 3D GPU for rendering can result in unresponsive graphics. Since current GPUs do not support sufficient preemption (i.e., no upper bound for the context switch latency is guaranteed), Khronos [Khrb] (the organization publishing the OpenGL standards) states:

“If a particular draw call takes a long time to execute, because it contains very many triangles, because the associated shaders are computationally expensive, or for any other reason, the user’s system may become unresponsive. This is a longstanding problem in the 3D graphics domain, and is one which has received renewed attention since WebGL has been released, because WebGL allows unknown and untrusted code to access the graphics processor.” [Khrd]

In such a case, the suggested solution is to reset the GPU:

“Solutions already exist to this problem on some operating systems. For example, Microsoft Windows Vista and later support a new driver model which will reset the graphics processor if it spends too long on any particular operation. The WebGL implementation can detect that the graphics card was reset, warn the user that WebGL content might have caused it, and prompt the user if they want to continue running the content.” [Khrd]

Unfortunately, for automotive scenarios, resetting the GPU is not an option, since it cannot happen without delay and would even require 3D applications to be restarted because their GPU context has become inconsistent. Consequently, automotive 3D rendering can neither use explicit GPU preemption nor reset the GPU to preempt it. For a GPU shader program, this implies that it must always terminate. For OpenGL ES 2.0, Appendix A.4 of the OpenGL ES Shading Language specification [Sim09] forbids while loops and allows only for loops that can be unrolled at compile time. The newer OpenGL Shading Language (GLSL) specification for OpenGL ES 3.0 [SKBR12] contains no such restriction. Since non-terminating loops can cause unwanted behavior and system malfunction, forbidding them is common. For instance, the most popular area where untrusted shader code is executed is WebGL [Khrc], which is based on OpenGL ES 2.0. To the best of our knowledge, all browsers supporting WebGL strictly follow the specification in forbidding loops that cannot be unrolled at compile time. Since many 3D GPU drivers would not actually reject non-terminating shader source code, the web browser implements a safety layer that filters out potentially non-terminating or extremely long-running code. Therefore, for automotive scenarios, only loops that can be unrolled by the shader compiler, and are thus guaranteed to terminate, can be supported.

1.2.3. Execution time prediction

Without preemption, we explicitly need to consider the execution time of rendering jobs to ensure that low-priority (non-safety-critical) rendering jobs do not prevent the timely execution of high-priority (safety-critical) jobs. Therefore, a non-preemptive scheduling approach is required. Non-technical approaches that determine the execution time through certification of the 3D software by a central authority like the OEM are not scalable, since many apps are not implemented by the OEM itself but by subcontractors or even by a large number of untrusted third-party developers of an app store. Consequently, the execution times of the GPU commands must be predicted prior to their execution.
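To illustrate why predicted execution times are indispensable on a non-preemptive GPU, the following minimal sketch checks whether a low-priority command batch may be dispatched without endangering a pending high-priority deadline. It is a simplified illustration under stated assumptions (one pending high-priority job, exact predictions, hypothetical function names), not the scheduling algorithm developed in Chapter 4.

```python
def latest_start_us(deadline_us: int, predicted_exec_us: int) -> int:
    """Latest point in time at which a job can start and still meet its deadline."""
    return deadline_us - predicted_exec_us


def may_dispatch_low_priority(now_us: int,
                              low_pred_us: int,
                              high_deadline_us: int,
                              high_pred_us: int) -> bool:
    """Decide whether a low-priority batch may be dispatched now.

    Once dispatched, the batch cannot be interrupted (assumption: no GPU
    preemption), so it must be predicted to finish before the high-priority
    job has to start.
    """
    finish_low_us = now_us + low_pred_us
    return finish_low_us <= latest_start_us(high_deadline_us, high_pred_us)


# High-priority frame: deadline at 16667 us (60 FPS), predicted to take 5000 us.
print(may_dispatch_low_priority(0, 3000, 16667, 5000))      # enough slack left
print(may_dispatch_low_priority(10000, 3000, 16667, 5000))  # would block the GPU too long
```

If the predicted time of the low-priority batch is underestimated, the check can pass although the deadline is later missed, which is why prediction accuracy directly affects the guarantees a scheduler can give.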

Existing concepts like [KLRI11] use history-based approaches in kernel space to predict the execution time. While such approaches are easy to implement, they are not aware of the rendering setup and the rendered scene. To this end, they cannot predict the first commands of an application. Additionally, this approach is based on the assumption that the same GPU commands result in the same GPU execution time, which is not always the case since the GPU-internal state depends on the OpenGL context and can differ [SGDR14].
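The limitation of history-based prediction can be made concrete with a minimal sketch. The class below is a generic moving-average predictor written for this illustration only; it is not the algorithm of [KLRI11], and the class name, the window size, and the per-application key are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class HistoryPredictor:
    """Predicts the next execution time as the mean of recent measurements."""

    def __init__(self, window: int = 8) -> None:
        self._history: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def record(self, app: str, measured_ms: float) -> None:
        self._history[app].append(measured_ms)

    def predict(self, app: str) -> Optional[float]:
        samples = self._history.get(app)
        if not samples:
            # No history yet: the first commands of an application
            # cannot be predicted.
            return None
        return sum(samples) / len(samples)
```

The predictor has no notion of the rendered scene, so a change of the scene or of the OpenGL context is only reflected after new measurements have been recorded.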

The execution time of a Draw command depends on many parameters. The most relevant parameters are the used shader programs and the respective number of instances. For instance, the input parameters of the shader programs can influence the positions of vertices during the vertex shader execution. This changes the number of fragments, which heavily affects the execution time.

The number of vertex shader instances is directly given by the application’s 3D API calls. Therefore, the main challenges when predicting a Draw command are to estimate

• the number of fragments generated by the vertex shader and the used attribute data and

• the execution time per shader instance.

Next, we address both challenges in more detail.

Number of fragments. The number of fragments is one of the most relevant factors of accurate prediction, since for each fragment one instance of a fragment shader must be executed on the GPU. Unfortunately, in order to determine the number of fragments accurately before execution on the GPU, the full vertex processing step of the OpenGL rendering pipeline would have to be emulated on the CPU. For medium or large 3D models, this is not feasible without severely affecting rendering performance, since a massive overhead would be introduced into the prediction. Consequently, a heuristic must be used, which inevitably introduces prediction errors (addressed in Sec. 3.5.1).
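One heuristic named later in this chapter emulates the vertex shader only on a bounding box of the 3D model. The following minimal sketch illustrates the idea under strong assumptions: the vertex shader is assumed to apply a single model-view-projection matrix `mvp`, and the fragment count is approximated by the viewport area covered by the projected box. The function name and all parameters are hypothetical, and overdraw, occlusion, and the actual triangle shapes are ignored.

```python
from itertools import product
from typing import Sequence

Vec3 = Sequence[float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix


def estimate_fragments(bbox_min: Vec3, bbox_max: Vec3, mvp: Mat4,
                       width: int, height: int) -> int:
    """Estimate fragments as the viewport area covered by the projected box."""
    xs, ys = [], []
    # The 8 corners are all combinations of min/max per axis.
    for corner in product(*zip(bbox_min, bbox_max)):
        point = (*corner, 1.0)
        x, y, _, w = (sum(m * p for m, p in zip(row, point)) for row in mvp)
        if w <= 0.0:
            # Corner at or behind the camera plane: fall back to the full viewport.
            return width * height
        # Perspective divide, then map normalized device coordinates to pixels.
        xs.append((x / w * 0.5 + 0.5) * width)
        ys.append((y / w * 0.5 + 0.5) * height)
    # Clamp the projected rectangle to the viewport.
    left, right = max(0.0, min(xs)), min(float(width), max(xs))
    bottom, top = max(0.0, min(ys)), min(float(height), max(ys))
    return int(max(0.0, right - left) * max(0.0, top - bottom))
```

Only eight points are transformed regardless of the model size, which keeps the prediction overhead small at the cost of the prediction errors mentioned above.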

Execution time per shader instance. An application provides the source code of the vertex shader and the fragment shader written in GLSL. GLSL supports if-statements, for-loops, and while-loops, but no non-structured commands such as “goto”. The source code is compiled by the user space GPU driver, which creates a shader binary with the target GPU instruction set. It performs typical compiler optimizations such as factoring out, loop unrolling, dead code elimination, or constant folding. The user space driver is typically proprietary, since it often contains intellectual property of the GPU manufacturer, which means that shader compilation is a black box. Therefore, heuristics must be used to predict the execution time per shader instance (addressed in Sec. 3.5.2 and Sec. 3.5.3).


1.2.4. GPU scheduler

The non-preemptive GPU scheduler is responsible for dispatching concurrently running 3D applications such that the goals of prioritization, desired frame rate, and isolation are fulfilled. Since the set of applications can change during runtime and the required execution time is determined by execution time prediction during runtime, scheduling algorithms for fixed sets of periodic tasks [Liu69, LL73] are insufficient. The Shortest Process Next (SPN) algorithm (cf. [TB14]) does not support given priorities and frame rates. Existing approaches for 3D GPU scheduling address just fairness [DWA08, BDC08] and optionally weighted fairness with priorities [KLRI11]. However, guaranteeing the latency until a frame is rendered requires a much more sophisticated scheduling algorithm. More precisely, the scheduling algorithm must keep track of the dynamic frame deadlines of each application. Unfortunately, the execution time required for an application to render its frame can change between different frames and is not available to the scheduler before the respective user space process has submitted all of its commands. Furthermore, a reservation policy for future frame periods is needed, since long-running commands of low-priority applications may not only affect the current period of higher-priority applications, but also their future periods. While fulfilling our goals is mandatory, the performance of our scheduling approach is also extremely important. The latency experienced by applications and the overhead introduced by the execution of the scheduling algorithm itself must therefore be very small. Furthermore, the scheduling decisions shall result in a high GPU utilization to ensure the available GPU resources are exploited. Our approach, presented in Chapter 4, addresses all these challenges, as shown by our evaluations in Section 4.5.
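The need for reserving future frame periods can be illustrated with a small sketch. It assumes a single high-priority application with a fixed frame period that needs `reserved_ms` of GPU time somewhere within each period, and it checks whether a non-preemptive low-priority command would leave too little time in any period it overlaps. The function and its parameters are hypothetical simplifications for this illustration and do not reproduce the scheduling concept of Chapter 4.

```python
def violates_reservation(start_ms: float, length_ms: float,
                         period_ms: float, reserved_ms: float) -> bool:
    """True if the command leaves less than reserved_ms free in some period."""
    end_ms = start_ms + length_ms
    k = int(start_ms // period_ms)  # index of the period in which the command starts
    while k * period_ms < end_ms:
        period_start = k * period_ms
        period_end = period_start + period_ms
        # Part of this period that is occupied by the low-priority command.
        overlap = min(end_ms, period_end) - max(start_ms, period_start)
        if period_ms - overlap < reserved_ms:
            return True
        k += 1
    return False
```

With a 16 ms period and 6 ms reserved per frame, a 20 ms command started at 12 ms leaves the current period intact but fully occupies the following one, which is exactly the effect on future periods described above.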


1.3. Project ARAMiS

The contributions of this work were supported by the project ARAMiS [ARA16] of the German Federal Ministry for Education and Research (BMBF) with funding ID 01IS11035. In this section, we provide a brief overview of the ARAMiS project and how its goals are related to the contributions of this work.

ARAMiS is short for “Automotive, Railway and Avionics Multicore Systems”. Its goal was to build a technological platform to further increase safety, efficiency, and comfort by using multicore technology in the automotive, avionics, and railway domains. ARAMiS ran from December 2011 to March 2015 and had a planned budget of 36 million Euro.

1.3.1. Background

Historically, single-core CPUs were prevailing in PCs, servers, and embedded systems. However, the development of new CPU generations by increasing the clock speed turned out to be slow and inefficient. In many scenarios, computation can be performed in parallel, which means that multicore CPUs can often provide a significant performance improvement compared to single-core CPUs. The rise of 3D rendering with its compute-intensive but also highly parallelizable workload brought up 3D GPUs as specialized multicore processing units. When using multicore systems for the automotive domain, many requirements must be fulfilled, such as real-time, availability, functional safety, and efficiency. Existing concepts typically do not fulfill them and are therefore not suitable. This motivates the goal of ARAMiS to find new architectures, methods, and concepts that allow multicore systems to be used for automotive platforms.

1.3.2. Structure

ARAMiS was organized in multiple subprojects, which are depicted in Fig. 1.3. Next, the subprojects are briefly described, focusing on the results relevant for this work.

TP0 provided the coordination and project management of the overall project.

TP1 defined scenarios consisting of use cases and requirements, including the

[Figure: boxes for TP0: Coordination of the overall project; TP1: Scenarios and Requirements; TP2: System design; TP3: Hardware; TP4: Software; TP5: Integrated methods and tools; TP6: Demonstrators; arranged between “Start of project” and “Project results”.]

Figure 1.3.: Dependencies of the ARAMiS subprojects

TP2 designed a system for a virtualized 3D rendering that fulfills the requirements of TP1.

TP3 developed hardware concepts, focusing on heterogeneous architectures, security, safety, certifiability, and virtualization. Since the GPU vendors neither grant access to the hardware layout, nor do programmable chips (e.g., FPGAs) provide sufficient performance for 3D rendering, TP3 was not in the focus of this work.

TP4 developed software concepts, which includes virtualization, compositing, 3D execution time prediction, and real-time 3D GPU scheduling, i.e., the main results of this work.

TP5 was a small subproject that examined tools supporting the design of multicore systems. For virtualized 3D graphics, no relevant tools are known.

TP6 built multiple demonstrators (2 for automotive, 2 for avionics, 1 for railway) covering most use cases of TP1.

1.3.3. Results

The results of ARAMiS comprise more than 120 documents, more than 70 scientific publications, and 5 demonstrators. This shows the relevance of multicore systems in the automotive, avionics, and railway domains in general, and the relevance of a virtualized graphics system with real-time 3D scheduling in particular. As part of this work, we built the automotive cockpit demonstrator VCT-B in collaboration with Daimler. It shows a prototype of virtualized 3D rendering of IC and HU.


1.4. Contributions

In this section, we describe our contributions to the goals presented in Sec. 1.2.

1.4.1. Requirements analysis for graphics virtualization

In Section 2.1, we thoroughly analyze relevant ISO standards and legal requirements and derive seven technical requirements for a virtualized automotive HMI system. Such requirements have been largely neglected by current virtualization efforts, which did not target automotive systems with their specific requirements, in particular, with respect to safety. For OEMs, the certifiability of automotive system functionalities is highly relevant. According to [ISO11, ISO 26262], for each functionality safety-criticality shall be identified and mapped to criticality-classes⁴. To fulfill the criticality-level, the severity and likelihood of failures must be determined using, for instance, failure mode and effects analysis (FMEA) [Sta03]. Moreover, certifiability also applies to custom third-party applications. For instance, [ISO02, ISO 15005] prohibits displaying movies to the driver while the vehicle is in motion.

1.4.2. Virtualized automotive graphics system

In Section 2.2, we present a concept for a Virtualized Automotive Graphics System (VAGS). To this end, we elaborate on the challenges that are due to the identified requirements caused by consolidation of mixed-criticality graphics electronic control units (ECUs) as used, in particular, by the HU and IC. Although virtualization is a mature technology for general resources like CPU or main memory, existing concepts do not provide sufficient isolation for accessing shared graphics hardware (GPU). Our proposed architecture uses a dedicated driver-VM, which is used as central instance by the other VMs to present content on the displays. In particular, the driver-VM manages real-time 3D GPU scheduling, display access permissions, and input events.

In Section 2.3, we describe our automotive cockpit demonstrator, which contains the major components of a Virtualized Automotive Graphics System (VAGS). It shows the feasibility of our concepts and how they can be implemented.

⁴ [ISO11, ISO 26262] specifies five safety requirement levels: Four ASIL (Automotive Safety Integrity Level) ranging from ASIL-A (low criticality) to ASIL-D (high criticality), and one no-criticality level QM (Quality Management).


1.4.3. Execution time prediction for 3D rendering commands

In Chapter 3, we present a framework for measurement and prediction of the execution time of GPU command batches. Prediction is performed in user space, which gives the huge benefit that context information can be determined much easier, since it can be inferred from the commands transmitted through a standardized API like OpenGL ES. The basic idea is to predict the individual execution time of graphics commands using models that are determined either during runtime or offline.

In particular, we propose models for the main commands relevant for 3D rendering, namely, Flush, Clear, Draw, and SwapBuffers, using the Open Graphics Library for Embedded Systems (OpenGL ES) standard [Khra]. Flush has constant execution time independent of the context. The execution time of Clear (if not integrated into SwapBuffers) essentially depends on the render buffer size. The Draw model is based on the number of vertices and the number of fragments (possible pixels of triangles). Therefore, to predict the Draw execution time, we estimate the number of fragments and the processing time per vertex and per fragment using the given shader program. We achieve this by emulating the vertex shader either on a bounding box of the 3D model, or on a representative subset of triangles. To profile the execution time of these commands on the specific GPU and to execute the emulation, we propose an online approach that instruments the GPU command groups in kernel space on the fly and an offline approach that uses machine learning models based on platform-specific training data. Furthermore, we present a fine-grained online correction to further improve prediction accuracy. We implemented our prediction framework and present evaluation results that compare our approaches with each other and with an existing history-based approach. We show that our prediction framework achieves unprecedented accuracy that is sufficient even for challenging scheduling scenarios.
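The structure of these per-command models can be sketched as follows. The sketch assumes that the Draw time is a linear combination of the vertex and fragment counts and that the Clear time scales linearly with the number of pixels; both are simplifying assumptions made for this illustration, and the `GpuProfile` type, its fields, and the function names are hypothetical. The actual models are presented in Chapter 3.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical platform-specific parameters obtained by profiling."""
    flush_us: float            # constant cost of Flush
    clear_us_per_pixel: float  # Clear cost per pixel of the render buffer
    vertex_us: float           # cost per vertex shader instance
    fragment_us: float         # cost per fragment shader instance


def predict_clear_us(p: GpuProfile, width: int, height: int) -> float:
    return p.clear_us_per_pixel * width * height


def predict_draw_us(p: GpuProfile, n_vertices: int, n_fragments: int) -> float:
    return n_vertices * p.vertex_us + n_fragments * p.fragment_us


def predict_batch_us(p: GpuProfile, width: int, height: int,
                     draws: Iterable[Tuple[int, int]]) -> float:
    """Flush + Clear + a sequence of Draw calls given as (n_vertices, n_fragments)."""
    total = p.flush_us + predict_clear_us(p, width, height)
    for n_vertices, n_fragments in draws:
        total += predict_draw_us(p, n_vertices, n_fragments)
    return total
```

In such a model, the vertex count is known from the API call, whereas the fragment count must itself be estimated, so errors in the fragment estimate translate directly into errors of the predicted Draw time.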

1.4.4. 3D GPU scheduler

In Chapter 4, we present a framework for real-time 3D GPU scheduling. Without preemption, we explicitly consider the execution time of GPU command batches to ensure that low priority (non-safety critical) GPU command batches do not prevent the timely execution of high priority (safety critical) jobs. In addition to the job execution time, our GPU scheduling algorithm considers several other parameters such as the priority of the rendering jobs, the screen refresh rate, and the target frame rate. In more detail, we make the following contributions for 3D GPU scheduling:

1. A system architecture and framework for 3D GPU scheduling that uses execution time prediction of GPU rendering jobs.

2. A priority-based real-time scheduling concept that specifically addresses desired frame rates of dynamic rendering jobs and bitblitting aligned to the vertical synchronization of the displays.

3. An implementation of the framework and the proposed 3D GPU scheduling concepts.

4. An evaluation showing the conformance of the implementation compared to the setup, a high GPU utilization of about 97 %, and less than 10 µs scheduling latency.

1.4.5. Further contributions

In Section 2.3, we present an automotive cockpit demonstrator (VCT-B) that was developed in collaboration with Daimler AG in Stuttgart. It uses a consolidated hardware platform for IC and HU, using a hypervisor for isolating the virtual machines that contain HU and IC functionality. The input buttons on the steering wheel and the push-and-rotary switch can be used to navigate through menus and change modes. We demonstrate the use cases for a VAGS, such as flexible display usage and isolation. To this demonstrator, the author has contributed concepts, code, and guidance. The concepts include an access control system for display areas, the inter-VM communication layer, efficient compositing concepts, and the virtualization layer.

1.4.6. Related publications and contributors

In this section, we present the scientific publications by the author that are related to this work. For each publication, we briefly describe the amount of the author’s contribution. The author was advisor of all diploma, master, bachelor, and study theses cited in this section. We also declare the contributions to this work that are beyond the scope of the referenced scientific publications. All publications have been written in collaboration with the other authors. In particular, Simon Gansel provided a lot of valuable feedback on both the concepts and the publication texts. The feedback and the discussions with Prof. Dr. Kurt Rothermel and Dr. Frank Dürr helped to tailor and improve the publications.

Focus of this work. In [GSD+13] the requirements for automotive graphics are expounded and the concept of a VAGS is presented. The author’s contribution to this publication was 45 %. In [SGDR14] we presented execution time prediction using the bounding box heuristic and profiling of shaders during runtime. The author’s contribution to this publication was 85 %, the implementation was sole work of the author. In [SGDR16] we presented the real-time scheduling for 3D GPU rendering. The author’s contribution to this publication was 85 %, the GPU scheduler implementation was sole work of the author.

The diploma and master theses of Fabian Römhild, Armin Cont, and Waqas Tanveer [Röm11, Con11, Tan13] gave insight into the scheduling capabilities of OpenGL and CUDA. The observed limitations justified our concept to do GPU scheduling in kernel space. The diploma thesis of Martin Thielefeld [Thi12] improved the understanding of how GPU execution time depends on the OpenGL ES 2.0 context, thus helping to build adequate prediction models. The study thesis of Felix Zehender [Zeh14] helped to better understand how the 3D GPU driver in user space (MESA, in particular) compiles and optimizes shader code. The master thesis of Hua Ma [Ma14] provided a better understanding of the Vivante GPU kernel driver, which helped to implement our Execution Time Monitor. The master thesis of Robin Keller [Kel16] helped to understand the limitations of linear regression regarding GPU execution time prediction. As a consequence, we used a non-linear model without online learning and only for the prediction of shader execution times.

Yaroslav Nalivayko worked as a student assistant on the execution time prediction. He implemented requested features such as saving the prediction parameters to XML and the triangle samples approach, and helped with debugging and creating training data.

Complementary to this work. Within the scope of the ARAMiS project, small parts of this work were published in [RAL+15], where virtualization concepts in the scope of ensuring safety and security in automotive systems are described.

The author contributed 10 % to each of the publications about automotive HMI access control concepts [GSGH+14, GSGH+15] and efficient compositing [GSC+15].


Ahmad Gilbeau-Hammoud contributed to [GSGH+14, GSGH+15] with his diploma thesis [GH13] and his subsequent work as a student assistant and research assistant. Riccardo Cecolin contributed to [GSGH+15] with his diploma thesis [Cec14]. The master thesis of Han Zhao [Zha15] proposes a 3D compositor for a VAGS that allows combining the 3D output of different applications. The depth information is used to determine visibility, and shader programs operating on the applications’ color and depth buffers are used for customized lighting effects. Thus, this thesis provides further motivation for a VAGS on a consolidated hardware architecture. The master thesis of Andrej Eisfeld [Eis14] improved inter-VM communication of the OpenGL ES 2.0 and EGL protocols, showing that efficient transmission of graphics data in a VAGS is possible.


1.5. Structure

The rest of this work is structured as follows. In Chapter 2 the relevant requirements and our architecture are presented. Section 2.1 presents the relevant automotive HMI requirements. The architecture of our proposed Virtualized Automotive Graphics System is described in Section 2.2. To demonstrate the scenarios of our automotive graphics virtualization, we created an automotive cockpit demonstrator, which is described in Section 2.3. Chapter 2 is complemented by related work in Section 2.4 and a summary in Section 2.5.

Our main contributions are the execution time prediction—presented in Chapter 3—and the real-time GPU scheduler—presented in Chapter 4.

Related to execution time prediction (ETP), we provide background information about EGL, OpenGL ES 2.0, and machine learning in Section 3.1. The system model is presented in Section 3.2. In Section 3.3, we describe the prediction architecture and how the prediction models are used.

The rather simple models for Flush, Clear, and SwapBuffers are described in Section 3.4. The challenging Draw command and its sub models to estimate fragments and shaders are presented in Section 3.5. This includes fragment estimation heuristics, performance parameter profiling, and machine-learning-based models. The optional Online Adaption, which corrects predictions that lean towards either overestimation or underestimation, is presented in Section 3.6.

The implementation is expounded in Section 3.7 and the evaluation results are presented and discussed in Section 3.8. The related work for execution time prediction in Section 3.9 is followed by a summary and an outlook on future work in Section 3.10.

For GPU scheduling, we explain the requirements in Section 4.1 and the system model in Section 4.2. The concepts are explained in Section 4.3, followed by a description of the implementation in Section 4.4. In Section 4.5, we present our evaluation results, which show feasibility, effectiveness, and performance of our GPU scheduler. The chapter is concluded by an outlook on preemptive GPU scheduling in Section 4.6, related work in Section 4.7, and the summary and future work in Section 4.8.

This work is concluded in Chapter 5.

2. Requirements and Architecture

In this chapter, we thoroughly analyze relevant ISO standards and legal requirements and derive seven technical requirements for a virtualized automotive HMI system. Such requirements have been largely neglected by current virtualization efforts, which did not target automotive systems with their specific requirements, in particular, with respect to safety. For OEMs, the certifiability of automotive system functionalities is highly relevant. According to [ISO11, ISO 26262], for each functionality safety-criticality shall be identified and mapped to criticality-classes¹. To fulfill the criticality-level, the severity and likelihood of failures must be determined using, for instance, failure mode and effects analysis (FMEA) [Sta03]. Moreover, certifiability also applies to custom third-party applications. For instance, [ISO02, ISO 15005] prohibits displaying movies to the driver while the vehicle is in motion. These specific regulations impose challenging technical requirements on virtualization. To this end, we elaborate on the challenges that are due to the identified requirements to consolidate mixed-criticality graphics ECUs as used, in particular, by the HU and IC. Although virtualization is a mature technology for general resources like CPU or main memory, existing concepts provide sufficient isolation neither for accessing shared graphics hardware (GPU) and input devices (e.g., steering wheel buttons) nor for implementing the flexible presentation of application windows.

This chapter is structured as follows. In Sec. 2.1, the requirements for automotive graphics systems are analyzed and seven technical requirements derived. In Sec. 2.2, we propose the architecture for virtualized automotive graphics. The automotive cockpit demonstrator VCT-B is explained in Sec. 2.3, followed by related work in Sec. 2.4, and a summary of this chapter in Sec. 2.5.

¹ [ISO11, ISO 26262] specifies five safety requirement levels: Four ASIL (Automotive Safety Integrity Level) ranging from ASIL-A (low criticality) to ASIL-D (high criticality), and one no-criticality level QM (Quality Management).


2.1. Requirements

In this section, we discuss requirements that are relevant for automotive HMI systems. Automotive application development is constrained by ISO standards, automotive design guidelines, legal requirements, and OEM specific demands. The design guidelines (e.g., [AAM06, AAM 2006], [ESO08, ESoP 2008], [JAM04, JAMA 2004]) in the automotive domain are almost completely derived from the following ISO standards.

• [ISO96, ISO 11428] Ergonomic requirements for the perception of visual danger signals.

• [ISO02, ISO 15005] Requirements to prevent impairment of the safe and effective operation of the moving vehicle.

• [ISO04, ISO 16951] Priority-based presentation of messages.

• [ISO10, ISO 2575] Symbols for controls and indicators.

• [ISO08, ISO 15408-2] Security in IT systems.

• [ISO11, ISO 26262] Risk-based assessment of potentially hazardous operational situations and of safety measures.

In the following, we propose seven technical requirements for automotive HMI systems. For each of them we added references to relevant sections of the mentioned ISO standards.

2.1.1. R1 – Input Event Handling

R1.1 – Restricted Access Control: For user input events access control is required and it shall not violate any of the following constraints [ISO02, ISO 15005]. Applications using dialogues shall not require to use input devices in a way that demands removal of both hands from the steering wheel while driving (5.2.2.2.2). Additionally, exiting a dialog or an application shall always be possible (5.3.3.2.1) unless legally required or traffic-situation-relevant (5.3.3.2.3).

R1.2 – Restricted Processing Time: A maximum processing time for input event handling shall be met. For instance, response to tactile user inputs shall not exceed 250 ms (5.2.4.2.3).


2.1.2. R2 – Restricted Window Creation and Positioning

R2.1 – Restricted Visibility of Windows: Usually, graphical applications use API functions to change the visibility of windows, e.g., to create, hide, or position them. This functionality must be restricted, and functions not intended to be used by the driver must be inaccessible to the driver [ISO02, ISO 15005] (5.2.2.2.4).

R2.2 – Priority-based Displaying of Windows: If multiple windows shall be displayed, the importance of each of them must be defined. Importance is represented by priorities, which can depend on safety requirements and software ergonomic aspects (5.2.4.2.4) that must be met by the system (5.2.4.3.3). Moreover, they can depend on urgency and criticality, which have to be defined [ISO04, ISO 16951] (3.5). Additionally, appropriate reactions (e.g., behavior in case of conflicts) shall be enforced [ISO04, ISO 16951] (Annex B). Furthermore, country-specific legal requirements constrain the definition of the priorities, e.g., German law requires the constant visibility of the speedometer while the vehicle is in motion (StVZO §57 [Jan11]). Additionally, visual information must be presented in a consistent way [ISO02, ISO 15005] (5.3.2.2.1).

R2.3 – Timing Constraints: An automotive HMI system shall enable applications to provide important information to the driver within given time constraints. This means that windows showing information shall be visible within given time constraints [ISO02, ISO 15005] (5.2.4.3.4). If applications require user interaction, e.g., if a user selects a radio channel, the flow of information must not adversely affect driving (5.2.4.2.1). Concretely, according to [AAM06, AAM 2006] Section 2.1, each glance shall not exceed 2 seconds. Hence, any kind of animation shall not run longer than 2 seconds.

2.1.3. R3 – Trusted Channel

R3.1 – Integrity and Confidentiality: In environments where applications run inside VMs, communication is inevitable. This holds for communication that previously used dedicated communication hardware and is now replaced by software-based inter-VM communication. According to [ISO08, ISO 15408-2], communication between applications and hardware must provide integrity and confidentiality for both user data (14.5.8.2) and software components providing relevant functionality (17.1.5.3). All applications that need trusted communication shall be able to use it (17.1.5.2).

R3.2 – Authentication and Non-Repudiation: Identification shall be assured even between distinct systems (17.1.5.1), which also applies to inter-VM communication. A trusted channel also requires non-repudiation of origin (8.1.1 and 8.1.6.1-3) and receipt (8.2.1 and 8.2.6.1-3). This requires authentication and may also involve cryptographic key management (9.1.1) and key access (9.1.7.1).

2.1.4. R4 – Virtualized Graphics Rendering

In our system, multiple VMs have shared access to a single GPU, and therefore the VMM has to provide isolation. That is, unintended interference between applications must not occur.

R4.1 – Priority Handling: Application windows must be assigned a priority, which determines how GPU commands are processed [ISO02, ISO 15005] (5.2.4.2.4 and 5.2.4.3.3), [ISO08, ISO 15408-2] (15.2.5.1-2 and 15.2.6.1-2). For instance, a rendered speedometer must have a high priority, since German law requires that it is visible while driving and displays the current speed (StVZO §57 [Jan11]).

R4.2 – Rendering Time Constraints: Not only comparative requirements (like priorities) but also absolute timing requirements have to be fulfilled. A response to a driver’s tactile input shall not exceed 250 ms [ISO02, ISO 15005] (5.2.4.2.3). Similarly, emergency signals may require constant redraw rates to represent flashing lights [ISO96, ISO 11428] (4.2.2). This requires appropriate CPU and GPU resources and imposes a minimum frame rate, since the delay between two consecutive frames is constrained by an upper bound. The upper bound must be known to determine the effectiveness of safety-critical messages [ISO04, ISO 16951] (Annex F) and also to allow for the definition of delays after which messages are displayed (Annex B). Additionally, OEMs (especially of premium brands) have demanding requirements for the rendering, e.g., that the speedometer shall be rendered stutter-free at 60 frames per second.
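The relation between a minimum frame rate and the upper bound on the delay between two consecutive frames is simple arithmetic, sketched below for the two figures named in R4.2. The function names are chosen for this illustration only.

```python
def max_frame_period_ms(min_fps: float) -> float:
    """Upper bound on the delay between two consecutive frames."""
    return 1000.0 / min_fps


def frames_within(budget_ms: float, fps: float) -> int:
    """Number of whole frame periods that fit into a time budget."""
    # Multiply before dividing to avoid rounding errors of the period length.
    return int(budget_ms * fps // 1000.0)
```

At 60 frames per second, each frame must be rendered within roughly 16.7 ms, and 15 frame periods fit into the 250 ms response limit for tactile input.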

R4.3 – GPU Resource Isolation: The GPU is a controlled resource according to [ISO08, ISO 15408-2]. To prevent unintended interference, it must be possible to provide guarantees to certain applications that they are provided sufficient GPU resources such as processing time. Therefore, it must be possible to control which GPU resources individual windows, graphical applications, or VMs are allowed to use (15.3.6.1 and 15.3.7.1-2).

2.1.5. R5 – Reconfiguration of Policies

A set of permissions that apply to user input events, application windows, and the related scheduling and isolation is called a policy. At each point in time, exactly one policy is active, though policies are dynamically switched during runtime depending on the system state.

R5.1 – Dynamic State Changes: In accordance with [ISO02, ISO 15005], a state change happens either on user request or automatically by system-defined rules. A state can depend on a current vehicle condition like “vehicle is in motion”, which could require the deactivation of applications that are not intended to be used by the driver while the vehicle is in motion (5.2.2.2.4). Otherwise, an automotive HMI system shall provide sufficient information and warnings to provide the driver with the intended purpose in a current state. For every state change, specified deadlines apply to determine a consistent and accurate transition between different states. The definition of states and system behavior is explained in more detail in [ISO04, ISO 16951] (3.3 and Annex E).

R5.2 – Dynamic Policy Changes: Authorized software components shall be enabled to apply changes to policies during runtime. This includes granting and revoking permissions on both currently active and currently inactive policies. As for R5.1, deadlines apply to dynamic policy changes. Where applicable and allowed, the driver shall be able to change the active policy to manipulate the flow of information [ISO02, ISO 15005] (5.3.3.2.3).

R5.3 – Presentation Enforcement: The system-defined rules shall enforce the presentation of legally required messages and traffic-situation-relevant messages. Presentation requires that those messages are visible and perceivable, in particular, if state changes require driver attention [ISO02, ISO 15005] (5.3.2.2.2). Furthermore, state-related information shall be displayed either continuously or upon request by the driver.
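The notion of a policy used in R5 can be sketched as a small data structure. The sketch assumes that a policy is a set of window and input permissions, that each system state maps to exactly one policy, and that windows required by law must be visible in every policy. All class, field, and state names are hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Policy:
    name: str
    visible_windows: FrozenSet[str]   # windows that may be shown
    input_receivers: FrozenSet[str]   # applications that may receive input events


class PolicyManager:
    """Keeps exactly one policy active and switches it on state changes."""

    def __init__(self, policies: Dict[str, Policy], state_to_policy: Dict[str, str],
                 mandatory_windows: FrozenSet[str], initial_state: str) -> None:
        # Presentation enforcement: no policy may hide a mandatory window.
        for policy in policies.values():
            missing = mandatory_windows - policy.visible_windows
            if missing:
                raise ValueError(
                    f"policy {policy.name!r} hides mandatory windows: {sorted(missing)}"
                )
        self._policies = policies
        self._state_to_policy = state_to_policy
        self.active = policies[state_to_policy[initial_state]]

    def on_state_change(self, state: str) -> None:
        self.active = self._policies[self._state_to_policy[state]]

    def may_show(self, window: str) -> bool:
        return window in self.active.visible_windows
```

For example, a policy for the state “vehicle is in motion” could omit a video window while keeping the speedometer visible, so that switching the state automatically revokes the permission to display movies.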


2.1.6. R6 – Certifiability

For an OEM, certifiability is an essential part of the software development process, e.g., by using methods like FMEA [Sta03]. The development process for certified software, in particular, for high criticality levels, is quite complex and expensive. A key indicator for complexity is the number of function points that correlates with the approximated number of software defects [EJ09]. Hence, a system shall be developed with respect to an easy certification according to [ISO11, ISO 26262].

2.1.7. R7 – System Monitoring

System Monitoring focuses on logging, detecting, and reacting to events that are potentially relevant to safety.

R7.1 – Secure Boot: Derived from [ISO08, ISO 15408-2], the system shall provide secure boot to ensure the integrity of the system. Compromising the system (14.6.9.1) or system devices or elements (14.6.9.2) by physical tampering shall be unambiguously detected.

R7.2 – Auditing: The auditing of all safety-critical events shall be guaranteed to ensure traceability of system activities in an automotive HMI system that potentially violate safety or security. Therefore, direct hardware access must not be permitted, so that auditing cannot be bypassed. For a potential violation analysis, a fixed set of rules shall be defined for a basic threshold detection [ISO08, ISO 15408-2] (7.3.2). To indicate any potential violation of the system-defined rules, the monitoring of audited events shall also be based on a set of rules (7.3.8.1) that must be enforced by the system either as an accumulation or a combination of a subset of defined auditable events that are known to threaten the system security (7.3.8.2). Similarly, all changes to policies initiated by applications shall be monitored and verified.

R7.3 – Supervision of Timing Requirements: It is a requirement to regulate the flow of information to ensure short and concise groups such that the driver can easily perceive the information with minimal distraction [ISO02, ISO 15005] (5.2.4.2.1). Therefore, specified time restrictions need to be verified. This also includes the auditing of driver tactile input and system response time, which shall not exceed 250 ms (5.2.4.2.3).

R7.4 – Detection of DoS Attacks: The occurrence of any event representing a significant threat such as a DoS attack shall be detectable by the system in real-time or during a post-collection batch-mode analysis [ISO08, ISO 15408-2] (7.3.2).

R7.5 – Perception of Visual Signals: For the perception of visual danger signals, visibility properties like fractions of luminances [ISO96, ISO 11428] (4.2.1.2) and colors of signal lights (4.3.2) have to be monitored. Monitoring is also required for certain safety-critical symbols defined in [ISO10, ISO 2575].

R7.6 – Software Fault Tolerance: [ISO08, ISO 15408-2] requires the detection of defined failures or service discontinuities and a recovery to return to a consistent and secure state (14.7.8.1) by using automated procedures (14.7.9.2). A list of potential failures and service discontinuities has to be supervised by a watchdog to detect the entering of failure states. Furthermore, for a defined subset of functions that are required to complete successfully, failure scenarios shall be specified that ensure recovery (14.7.11.1).

R7.7 – System Integrity: In case of unrecoverable failures, the system shall be able to switch to degraded operation mode to preserve system integrity. A list of failure types shall be defined, for which no disturbance of the operation of the system can take place [ISO08, ISO 15408-2] (15.1.7.1). Moreover, the system shall ensure the operation of a set of capabilities for predefined failure types (15.1.6.1). This includes the handling of DoS attacks and detection of illegitimate policy changes. Some events have to be maintained in an internal representation to indicate if any violations take or took place. This includes the behavior of system activities for the identification of potential violations (7.3.10.2-3) like state changes (7.3.10.1).
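
The combination of timing supervision (R7.3) and basic threshold detection (R7.2) can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `ResponseTimeAuditor`, the sliding window, and the violation threshold are hypothetical; only the 250 ms limit is taken from R7.3.

```python
from collections import deque

RESPONSE_LIMIT_MS = 250.0  # limit on system response time stated in R7.3


class ResponseTimeAuditor:
    """Logs response times and flags a potential violation by a simple threshold rule."""

    def __init__(self, max_violations, window):
        self.max_violations = max_violations  # hypothetical threshold parameter
        self.recent = deque(maxlen=window)    # True marks an exceeded time limit
        self.log = []                         # audit trail of all recorded events

    def record(self, event, response_ms):
        """Audit one event; return True if the threshold rule is triggered."""
        exceeded = response_ms > RESPONSE_LIMIT_MS
        self.recent.append(exceeded)
        self.log.append((event, response_ms, exceeded))
        # Threshold rule: too many exceeded limits within the last `window` events.
        return sum(self.recent) > self.max_violations


auditor = ResponseTimeAuditor(max_violations=1, window=3)
alarms = [auditor.record("touch", ms) for ms in (100.0, 300.0, 260.0, 100.0, 100.0)]
```

With these assumed parameters, the rule triggers once two of the last three responses exceed the limit and clears again when the earlier violations leave the window. Every event stays in the log regardless of the outcome, which reflects the traceability aspect of R7.2.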

2.2. Architecture

In this section, we briefly describe the architecture of a VAGS that addresses the identified requirements and is depicted in Fig. 2.1. While Certifiability (R6) applies to the complete development process, all other requirements can be fulfilled by the functionalities of the components of our architecture.

[Block diagram. Labeled components: VM (Instrument Cluster) with OS, Speedometer, Tachometer; VM (Head Unit) with OS, Navigation, Media, TV; VM (custom apps) with OS, App 1, App 2, ...; VM (Virtualization Manager) with OS, GPU Scheduler, Window Manager, Input Manager, Permission and Policy Management, Authentication Manager, System Monitor, Auditing, Watchdog; Isolated Communication Channel; Microkernel-based VMM; Hardware with GPU, Input Devices, Display 1, Display 2, ...]

Figure 2.1.: Architecture of a virtualized vehicular graphics system

2.2.1. Virtualization

The consolidation of graphics hardware is of high relevance in modern cars. An increasing number of automotive functionalities and applications require highly sophisticated graphical representations in 2D or 3D based on hardware acceleration. For instance, the HU uses displays integrated into the back of the front seats and the center console to display multimedia content, while displays connected to the IC show car-specific information like current vehicle speed or warnings. HU and IC thus both require substantial CPU and GPU resources, which makes them good candidates for hardware consolidation. Each virtualized ECU runs in a dedicated virtual machine (VM), and a virtual machine monitor (VMM) acts as middleware between VMs and hardware. Besides the already mentioned general benefits, the virtualization of IC and HU provides advantages such as the flexible placement of graphical output on previously separated displays, which is a matter of software implementation only. Moreover, virtualization enables OEMs to deploy custom applications inside a dedicated VM that is isolated from HU and IC.

With respect to certifiability, we follow the approach of a microkernel-based VMM where drivers run in user space rather than kernel space. As a result, the kernel code base is very small and thus easier to certify [EJ09]. If driver code crashes, this does not affect the VMM. The Virtualization Manager runs in a dedicated VM and exclusively manages shared resources. It contains relevant drivers, e.g., for GPU and input devices. This ensures that access to all shared resources is controlled by a single trustworthy VM. Indirect hardware access by VMs facilitates Virtualized Graphics Rendering (R4) and System Monitoring (R7). Additionally, the Virtualization Manager contains multiple software components ensuring that every hardware access by VMs is in compliance with our requirements. Note that our architecture shows only four exemplary VMs. However, we do not restrict the number of VMs, so additional VMs can be deployed if needed.
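
The idea that every hardware access passes through a single trusted instance can be sketched in a few lines of Python. This is only an illustration: the class `ResourceMediator`, its permission table, and the VM and resource names are hypothetical and do not describe the actual implementation of the Virtualization Manager.

```python
class ResourceMediator:
    """Single point of control: checks and logs every request for a shared resource."""

    def __init__(self, permissions):
        # Hypothetical permission table: VM name -> set of resources it may use.
        self.permissions = permissions
        self.audit_log = []

    def request(self, vm, resource):
        """Return True if the VM may access the resource; log the request either way."""
        granted = resource in self.permissions.get(vm, set())
        self.audit_log.append((vm, resource, granted))
        return granted


mediator = ResourceMediator({
    "instrument_cluster": {"gpu", "display_1"},
    "head_unit": {"gpu", "display_2", "input_devices"},
})

ic_gpu = mediator.request("instrument_cluster", "gpu")
ic_input = mediator.request("instrument_cluster", "input_devices")
unknown_vm = mediator.request("custom_apps", "gpu")
```

Because the VMs in this sketch have no other path to the resources, denied requests are logged as well, so the audit trail cannot be bypassed.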

2.2.2. Inter-VM communication

In order to access hardware, the HU and IC VMs communicate with the Virtualization Manager VM. For this bidirectional communication, a Trusted Channel (R3) is required to support secure communication between the different virtual machines. A trusted channel is provided by the cooperation of the Isolated Communication Channel and the Authentication Manager. The Isolated Communication Channel provides integrity and isolation for communication (R3.1) between applications and the Virtualization Manager. To initiate a connection, applications first have to provide valid