Oracle-Guided Design and Analysis of Learning-Based Cyber-Physical Systems


UC Berkeley Electronic Theses and Dissertations

Permalink: https://escholarship.org/uc/item/0tm3q8b5
Author: Ghosh, Shromona
Publication Date: 2019
Peer reviewed | Thesis/dissertation


Shromona Ghosh

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy in

Engineering - Electrical Engineering and Computer Science in the

Graduate Division of the

University of California, Berkeley

Committee in charge:

Professor Alberto Sangiovanni-Vincentelli, Co-chair
Professor Sanjit A. Seshia, Co-chair
Professor Claire J. Tomlin
Professor Francesco Borrelli


Abstract

We live in a world where autonomous systems, such as self-driving cars, surgical robots, and robotic manipulators, are becoming a reality. Such systems are considered safety-critical since they interact with humans on a regular basis. Hence, before such systems can be integrated into our day-to-day life, we need to guarantee their safety. Recent success in machine learning (ML) and artificial intelligence (AI) has led to an increase in their use in real-world robotic systems, for example in complex perception modules in self-driving cars and deep reinforcement learning controllers in robotic manipulators. Although powerful, these components introduce an additional level of complexity when it comes to the formal analysis of autonomous systems. In this thesis, such systems are designated as Learning-Based Cyber-Physical Systems (LB-CPS).

In this thesis, we take inspiration from the Oracle-Guided Inductive Synthesis (OGIS) paradigm to develop frameworks which can aid in achieving formal guarantees in different stages of an autonomous system design and analysis pipeline. Furthermore, we show that to guarantee the safety of LB-CPS, the design (synthesis) and analysis (verification) must each consider feedback from the other. We consider five important parts of the design and analysis process and show a strong coupling among them, namely (i) Robust Control Synthesis from High Level Safety Specifications; (ii) Diagnosis and Repair of Safety Requirements for Control Synthesis; (iii) Counter-example Guided Data Augmentation for training high-accuracy ML models; (iv) Simulation-Guided Falsification and Verification against Adversarial Environments; and (v) Bridging Model and Real-World Gap. Finally, we introduce a software toolkit, VerifAI, for the design and analysis of AI-based systems, which was developed to provide a common formal platform to implement design and analysis frameworks for LB-CPS.


4.3 Running Example
4.4 Problem Formulation
4.5 Solution Approach
4.6 Monolithic specifications
4.6.1 Diagnosis
4.6.2 Repair
4.7 Contract specifications
4.7.1 Non-Adversarial Environment
4.7.2 Adversarial Environment
4.8 Evaluation
4.8.1 Autonomous Driving
4.8.2 Quadrotor Control
4.8.3 Aircraft Electric Power System

5 Counter-Example Guided Data Augmentation
5.1 Introduction
5.1.1 Data Augmentation
5.3 Solution Approach
5.4 Design of Image Generator γ
5.4.1 Modification Space
5.4.2 Picture Concretization
5.4.3 Annotation Tool
5.4.4 Image Generators as Simulation Engines
5.5 Sampling Methods
5.5.1 Non-active Sampling
5.5.2 Active Sampling
5.5.3 Cross-Entropy Sampling
5.5.4 Bayesian Optimization
5.6 Error Tables
5.7.1 Augmentation Methods Comparison
5.7.2 Random vs Low-discrepancy Sampling
5.7.3 Random vs Diversity Augmentation
5.7.4 Augmentation Loop
5.7.5 Error Table-Guided Sampling

6 Simulation Guided Falsification and Verification
6.1 Introduction
6.1.1 Verification and Falsification
6.2 Preliminaries
6.2.1 Mathematical Preliminaries
6.2.2 Gaussian Process
6.2.3 Bayesian Optimization
6.3 Problem Formulation
6.4 Solution Approach
6.5 Active-Learning for Falsification
6.5.1 Theoretical Results
6.6.1 Modeling Smooth vs Non-Smooth Functions
6.6.2 Collision Avoidance with High Dimensional Uncertainty
6.6.3 OpenAI Gym Environments
6.8 Proofs
6.8.1 Convergence proof

7 Specification-centric Simulation Metric
7.1 Introduction
7.1.1 Quantifying system and model mismatches
7.3 Running Example
7.4 Problem Formulation
7.5 Solution Approach
7.5.1 Computing approximate safe sets using M and simulation metric
7.5.2 Specification-Centric Simulation Metric (SPEC)
7.6 Distance Metric Computation
7.7 Running Example: Distance Computation
7.8.2 Webots: Lane Keeping
7.9 Practicality of SPEC

8 VerifAI: A toolkit for Design and Analysis of Artificial Intelligence-Based Systems
8.1 Introduction
8.2 VerifAI Structure
8.2.1 Inputs and Outputs
8.2.2 Tool Components
8.3 Features and Evaluation
8.3.1 Falsification
8.3.2 Fuzz-Testing
8.3.3 Data Augmentation and Error Table analysis
8.3.4 Model Robustness and Hyper-parameter Tuning
9 Conclusion and Future Work


List of Figures

1.1 Correct-by-construction design
1.2 System Analysis and Verification
1.3 OGIS framework $I = (L, O)$
1.4 Thesis contribution in overall CPS design.
3.1 The ego car is attempting to reach the goal while avoiding collisions with the other cars.
3.2 Naive CEGIS: $C_N = (L_{C_N}, O_{C_N})$. The oracle $O_{C_N}$ (learner $L_{C_N}$) is shown in blue (yellow). The oracle and learner together play a zero-sum game. The size of the counter-example set $E_{CE}$ grows every iteration. Hence, over time the optimization problem solved by $L_{C_N}$ grows while that solved by $O_{C_N}$ remains the same.
3.3 Single CE CEGIS: $C_S = (L_{C_S}, O_{C_S})$. The oracle $O_{C_S}$ (learner $L_{C_S}$) is shown in blue (yellow). The oracle and learner together play a zero-sum game. The oracle $O_{C_S}$ is the same as $O_{C_N}$. However, the learner $L_{C_S}$ solves an optimization problem with the most recent counter-example $e^*$ returned by the oracle $O_{C_S}$. This is the same optimization problem as in (3.2) where $E_{CE} = e^*$ is a singleton set, with the most recent counter-example. This ensures the size of the optimization problem solved by the learner $L_{C_S}$ does not grow over the iterations. Although this circumvents the scalability issue of $C_N$, it suffers from oscillating indefinitely among the same set of counter-examples as shown in Example 2.
3.4 Satisfiability CEGIS: $C_B = (L_{C_B}, O_{C_B})$. The oracle $O_{C_B}$ (learner $L_{C_B}$) is shown in blue (yellow). The oracle and learner together play a zero-sum game. Here, unlike $C_N$ and $C_S$, the oracle and learner solve satisfiability problems as opposed to optimization problems. Hence, the strategy returned by either one of them is a dominant strategy and not an optimal one.
3.5 Dominant strategy, optimal counter-example CEGIS: $C_D = (L_{C_B}, O_{C_N})$. The oracle $O_{C_N}$ (learner $L_{C_B}$) is shown in blue (yellow). The oracle $O_{C_N}$ solves an optimization problem to propose optimal counter-examples. The learner $L_{C_B}$ solves a satisfiability problem and proposes dominant control strategies as opposed to optimal control strategies.

3.7 Dominant strategy, optimal counter-example CEGIS: $C_D = (L_{C_H}, O_{C_N})$. The oracle $O_{C_N}$ (learner $L_{C_H}$) is shown in blue (yellow). The oracle $O_{C_N}$ solves an optimization problem to propose optimal counter-examples. The learner $L_{C_H}$ solves a sequence of SMT queries to compute the refuted balls, and then solves a MILP problem to find an $\epsilon$-robust control strategy.
3.8 Generalized continuous RPS.
3.9 Continuous RPS with $n$ counterexamples
3.10 The x-axis shows the dimension of the input space, while the y-axis shows the time taken for the CEGIS loop to converge. milp ce 1 refers to $C_H^{k=1}$, milp ce 2 refers to $C_H^{k=2}$, milp ce inf refers to $C_N$, and smt ce inf refers to $C_D$.

4.1 Vehicles crossing an intersection. The red car is the ego vehicle, while the black car is part of the environment.
4.2 OGIS for Problem 1: $I_{\phi_M} = (L_{I_{\phi_M}}, O_{I_{\phi_M}})$. The oracle $O_{I_{\phi_M}}$ (learner $L_{I_{\phi_M}}$) is shown in blue (yellow). The oracle $O_{I_{\phi_M}}$ diagnoses the infeasible control synthesis ($\nexists u$) problem being solved by the learner. It first extracts the Irreducibly Inconsistent System (IIS) from the optimization problem $M$ ($I_C = \mathrm{IIS}(M)$). It then extracts the set of diagnosed atomic predicates $D' = \mathrm{ExtractPredicates}(I_C)$ and associated constraints $I' = \mathrm{ExtractConstraints}(I_C)$ and returns them to the learner. The learner updates the MILP by introducing "slack" variables to the constraints pertaining to the diagnosed predicates $I'$. It then solves the updated optimization problem. If this is feasible, the learner returns the updated specification $\phi'$ and synthesized control strategy $u$. If not, it sends the updated specification to the oracle and the loop continues. The oracle returns a set of diagnosed predicates $D'$ to the user at every iteration. By introducing the slack variables to the atomic predicates, we are guaranteed that the OGIS loop will terminate with a repaired specification $\phi'$.
4.3 Parse tree of $\psi \equiv \psi_e \rightarrow \psi_s$ used in Examples 6 and 9.
4.4 Hierarchical CEGIS for Problem 2: $C_{\phi_C} = (L_{C_{\phi_C}}, O_{C_{\phi_C}})$. The oracle $O_{C_{\phi_C}}$ (learner $L_{C_{\phi_C}}$) is shown in blue (yellow). The oracle $O_{C_{\phi_C}}$ implements a CEGIS framework $C_E = (L_{C_e}, O_{C_e})$. The learner $L_{C_{\phi_C}}$ repairs the adversarial environment assumption $\phi_w$. The loop terminates when $\phi_w$ becomes empty (trivially true) or $O_{C_e}$ does not find a counter-example to the candidate control strategy $u^*$ proposed by $L_{C_e}$.
4.5 Changing lane is infeasible at $t = 1.2$ s in (a) and is repaired in (b).
4.6 Left turn becomes infeasible at time $t = 2.1$ s in (a) and is repaired in (b).

of the quadrotor, represented as a green rectangle. The black square marks the initial position while the red square marks the goal. As shown in Fig. (a) and (c), the original specification becomes infeasible at time $t = 0.675$, which is marked by a green square along the trajectory, when the quadrotor hits the boundary represented by the dotted red line. After updating the specification, controller synthesis becomes feasible, as shown in Fig. (b) and (d), where the quadrotor reaches the final position, at the cost of passing through the red dotted line.
4.8 Simplified model of an aircraft electric power system (left) and counterexample trajectory (right). The blue, green and red lines represent environment, state, and controller variables, respectively, for a 380-ms run.
5.1 Counter-example guided data augmentation (CEGDA): $C_{da} = (L_{C_{da}}, O_{C_{da}})$. The oracle $O_{C_{da}}$ (learner $L_{C_{da}}$) is shown in blue (yellow). The oracle $O_{C_{da}}$ generates an augmentation set $\mathcal{A}$ and an error table $\mathcal{E}$. It then extracts features from the error table $\mathcal{E}$ to explain the cause of failure of the trained model $f$, which is used to generate $\mathcal{A}$. Moreover, it analyzes $\mathcal{E}$ to provide feedback to the user. The learner $L_{C_{da}}$ then augments the training data $X$ to generate $\tilde{X}$, which is used to train the model $f$.
5.2 The low dimension (3D) modification space $\mathcal{M}$ on the right can be projected to the high dimensional feature (image) space $\mathcal{X}$ on the left through the image generator function $\gamma$.
5.3 Car re-sizing and displacement using vanishing point and lines.
5.4 Distance over modification space used to measure visual diversity of concretized images. $d(m^{(1)}, m^{(2)}) = 0.48$, $d(m^{(1)}, m^{(3)}) = 2.0$, $d(m^{(2)}, m^{(3)}) = 2.48$.
5.5 Sample basic images. The image at the top shows a road scenario sample, and the images at the bottom show car model samples.
5.6 Annotation trapezoid. User adjusts the four corners that represent the valid sampling subspace of $x$ and $z$. The size of the car scales according to how close it is to the vanishing point.
6.1 OGIS for Falsification: $I_f = (L_f, O_f)$. The oracle $O_f$ (learner $L_f$) is shown in blue (yellow). The oracle $O_f$ is a composition of two individual oracles: (i) the first is an active-learning framework to propose a candidate environment $e$ for simulation; and (ii) a black-box simulation engine which generates finite-horizon trajectories $\xi_S(\cdot; e)$ of the closed-loop system $S$ in a given environment configuration $e$. The learner $L_f$ first builds a parse tree $T_\phi$ corresponding to the safety specification whose leaf nodes are GPs modeling the predicates. At every iteration, the learner uses the system trajectory returned by the oracle to update the GP models of the predicates. It then sends the updated tree with the updated GP models $T_\phi$

6.2 Equivalent parse tree $T_\phi$ for $\rho_\phi$ in (6.6) to the function (6.7). We replace the predicates $\rho_{\mu_i}$ with their corresponding pessimistic GP predictions to obtain a lower bound on $\rho_\phi(e)$.
6.3 The dashed orange line in Fig. 6.3a represents the true, non-smooth optimization function in (6.9), while the green and blue lines represent $\sin(e)$ and $\cos(e)$ respectively. Modeling this function directly as a GP leads to the model errors in Fig. 6.3b, where the 95% confidence interval of the GP (blue shaded) with mean estimate (blue line) does not capture the true function $\rho_\phi(e)$ in orange. In fact, the minimum (red star) is not contained within the shaded region, causing the optimization to diverge. BO converges to the green dot, where $\rho_\phi(e) > 0$, which is not a counterexample. Instead, modeling the two predicates individually and combining them with the parse tree leads to the model in Fig. 6.3c. Here, the true function is completely captured in the confidence interval. As a consequence, BO converges to the global minimum (the red star and green dot converge).
6.4 The orange and blue lines in Fig. 6.4a and Fig. 6.4b show the evolution of samples returned over the BO iterations when (6.9) is modeled as a single GP and multiple GPs respectively, for two different initializations. We see that when modeling as a single GP, it takes longer to stabilize to $e^*$ and in some cases (Fig. 6.4b) does not stabilize to $e^*$.
6.5 The red, blue and green bars show the average number of counterexamples found using random sampling, applying BO on the reduced input space, and applying BO on the original input space respectively for the example in Sec 6.6.2. The black lines show the standard deviation across the experiments.
6.6 The green, blue and red bars show the number of counterexamples generated when modeling $\rho_{\mu_1}$, $\rho_{\mu_2}$ as separate GPs, modeling $\rho_\phi$ as a single GP, and random testing respectively for the reacher example (Sec 6.6.3.1). Our modeling paradigm finds more counterexamples compared to the other two methods.
6.7 The green, blue and red bars show the number of counterexamples generated when modeling $\rho_{\mu_1}$, $\rho_{\mu_2}$, $\rho_{\mu_3}$ as separate GPs, modeling $\rho_\phi$ as a GP, and random testing respectively for the mountain car example (Sec 6.6.3.2). While our modeling paradigm finds orders of magnitude more counterexamples compared to the other two methods, we notice that modeling $\rho_\phi$ as a single GP performs much worse than random sampling for the controller trained with PPO (Fig. 6.7a) and comparably for the controller trained with DDPG (Fig. 6.7b).
7.1 The avoid set is expanded and the reach set is contracted with the simulation metric $d_{sim}$. If the abstraction trajectory ($\xi_M$) stays clear of the expanded avoid set and reaches the contracted reach set, the system trajectory ($\xi_S$) also stays clear of the original avoid set and reaches the original reach set.

blue (yellow). The learner $L_{SIM}$ synthesizes controllers for $S$ using the model $M$. To do so it modifies the specification for the model $\phi(e; d_{SPEC})$. The oracle $O_{SIM}$ first verifies if the synthesized controller is safe for the system. If yes, it terminates and outputs the synthesized controller. If not, it computes a high confidence estimate $\hat{d}$ of SPEC $d_{SPEC}$. To compute $\hat{d}$, we use Scenario Optimization, which is formalized as an OGIS framework $I_{SPEC} = (L_{SPEC}, O_{SPEC})$ where $O_{SPEC}$ is a black-box physics simulator and $L_{SPEC}$ implements scenario optimization.
7.3 Different reachable sets when the quadrotor abstraction is conservative. The distance metric $d_{SPEC}$ only considers the distance between trajectories that violate the specification on the system and satisfy it on the abstraction, leading to a less conservative estimate of the distance, and a better approximation of $E_S$.
7.4 Different reachable sets when the quadrotor abstraction is overly optimistic. The distance metric $d_{SPEC}$ achieves a far less conservative under-approximation of $E_S$ compared to the other distance metrics.
7.5 Different reachable sets when the quadrotor abstraction is overly optimistic. Here we compute the exact reach sets corresponding to $E_M$, $E_S$, $E_\phi(d_{SPEC})$ and $E_\phi(d_{sim})$. The distance metric $d_{SPEC}$ achieves a far less conservative under-approximation of $E_S$ compared to $d_{sim}$.
7.6 Hybrid controller for lane keeping. $lane$ means a lane is detected by the perception system. The dashed line represents the transitions taken on initialization based on the value of $lane$. To closely follow the center of the lane, we synthesize an LQR controller in each mode.
7.7 The lane detection fails for (a) and (b) and the car $S$ tries to slow down. When the lane is correctly detected (c), the LQR controller tries to follow the lane.
7.8 The green lines represent the boundaries of the original reach set. The yellow region is the contracted reach set for the model computed using $\hat{d}$. The model's trajectory shown in blue is entirely contained within the yellow region. Consequently, the system's trajectory (shown in dotted red) leaves the yellow region but is contained within the original reach set at all times.
7.9 An example of the environment scenario that contributes to the distance between the model and the system. The environment samples used for computing SPEC can be used to identify the reasons behind the violation of the safety specification by the system.
8.1 VerifAI tool overview.
8.2 The red car and green car are the AV car and broken car respectively. The distance $d$ captures the distance between the AV car and the cones. The AV car has to safely maneuver around the broken or disabled car.
8.3 Hybrid controller for AV to safely maneuver around the broken car.

8.4 Scenes generated by Scenic. The orange oval marks the placement of the AV car, broken car and cones.
8.5 A falsifying scene automatically discovered by VerifAI. The neural network misclassifies the traffic cones because of the orange vehicle in the background, leading to a crash. Left: bird's-eye view. Right: dash-cam view, as processed by the neural network.
8.6 The NN incorrectly detects the orange car as an orange cone. Hence the distance $d$ is incorrectly estimated to be 14.5m even when the distance is 30m.
8.7 Accident scenario breakup: 1) initial scene sampled from the program; 2) the red car begins its turn, unable to see the green car; 3) the resulting collision.
8.8 This image generated by our renderer was misclassified by the NN. The network reported detecting only one car when there were two.
8.9 The green dots represent model parameters for which the cart-pole controller behaved correctly, while the red dots indicate specification violations. Out of 1000 randomly-sampled model parameters, the controller failed to satisfy the specification 38 times.


List of Tables

3.1 Summarizes the algorithms studied in this paper. "Optimal" means maximal with respect to a reward function capturing satisfaction of the high-level specification. "Memory" refers to the number of counterexamples the candidate oracle takes into account before proposing a candidate strategy. "Terminates" denotes whether the algorithm terminates.
3.2 Experiment 1 run times in seconds.
4.1 Slack values over a single horizon, for $\Delta t = 0.2$ and $H = 10$.
4.2 Slack variables used in Examples 6 and 9.
5.1 Example of error table providing information about counterexamples. The first row describes Fig. 5.6. Implicit unordered features: car model, environment; explicit ordered features: brightness, x, z car coordinates; explicit unordered feature: background ID.
5.2 Comparison of augmentation techniques. Precisions (top) and recalls (bottom) are reported. $\mathcal{T}_T$: set generated with sampling method $T$; $f_{X_T}$: model $f$ trained on $X$ augmented with technique $T \in \{S, R, H, C, D, M\}$; $S$: standard, $R$: uniform random, $H$: low-discrepancy Halton, $C$: cross-entropy, $D$: uniform random with distance constraint, $M$: mix of all methods.
5.3 Random vs Distance augmentation.
5.4 Augmentation loop. For the best (highlighted) model, a test set $\mathcal{C}_T^{[i]}$ and augmented training set $\mathcal{X}_r^{[i+1]}$ are generated. $r$ is the ratio of counterexamples to the original training set.


Acknowledgments

The last six and a half years would not have been possible without the support of a large number of people in my life. I owe each and every one of them immense gratitude and love for being with me every step of the way.

First and foremost, I would like to thank my advisors Sanjit A. Seshia and Alberto Sangiovanni-Vincentelli. I cannot complete this list without Claire J. Tomlin, who has for all purposes been an unofficial advisor. I have been very fortunate to have had three pioneers as my advisors, each of whom has helped shape a different side of this thesis. Alberto trusted me enough to take me in as a student when Robert Brayton (Bob) first suggested I work with him. He has given me an immense amount of independence and shown me a lot of trust when it came to my research. It is because of him that I always ask myself, before doing anything, what the bigger picture is and how what I do will achieve it. Sanjit has had a strong influence on my PhD since my first year. Since my first semester, he has provided me with unique insights and advice on all my projects. He was always willing to sit down and work out the rough edges in my research. He also pushed me to step outside my comfort zone and try different things while always helping me come back when I strayed too much. Claire has been an inspiration for me. Although I started working with her in my third year, I felt that was three years too late. With her I could discuss even the silliest of ideas I had. She always encouraged me to try everything before making a decision. It was comforting and extremely important for me to know that I had that unconditional support from her. I would also like to express my gratitude to Francesco Borrelli for being part of my qualifying exam and thesis committee. Without your guidance and push to find an application for my research, I would have never found my love for self-driving cars. I would also like to thank Anca Dragan for always being available to discuss new ideas and ventures for my work in the domain of human-robot interaction. Although we never got around to collaborating during my time at Berkeley, I hope to get a chance to work with her in the future.

I would also like to thank Bob for accepting me as a student and helping me transition to Alberto's group when I decided I wanted to work in a different area. Without you, I would not be here. I still remember the day Jaijeet Roychowdhury interviewed me in Bangalore for a Berkeley PhD; I am deeply indebted to you.

I would like to express a special thanks to George Pappas and Rajeev Alur, who have always kept in touch with my research and provided me with very useful advice whenever they could. My one regret is that I have never had a chance to formally work with them.

A huge thank you to Shirley Salanio for being available at all times of the day to answer all my administrative questions and for just chatting with me about life in general. Thanks for your patience and for always going that extra step to sort out all my administrative messes. A special thanks to Jessica Gamble and Mary Stewart, who have made day-to-day life in Berkeley seamless.

Throughout my PhD I have been very fortunate to intern with leading pioneers in industry. For my time at Microsoft Research (MSR), I owe my gratitude to Ashish Kapoor and Shaz Qadeer. Ashish has been a constant source of motivation and advice throughout the last


Ranade, who during her time at MSR was a mentor both professionally and personally. For my time at SRI, I owe my gratitude to Susmit Jha and Natarajan Shankar. Susmit was really willing to get down to the nitty-gritty details and had an immense amount of patience to sit through the long math sessions with me. Shankar really brought perspective to my work. He'd always find a way to relate my work to other fields, which really pushed me to explore. I have also been so fortunate to collaborate with a great many people during my time at Berkeley. This thesis would not have been possible without Tommaso Dreossi, who has been my longest collaborator and whom I am fortunate enough to call a close friend; Somil Bansal, a friend who grew to be an amazing collaborator; Marcell Vazquez-Chanlatte, who taught me how to be practical in my research; Dorsa Sadigh, who pushed me when I needed it and taught me the perfect work-life balance. I would also like to thank all my other co-authors and collaborators, especially the people who helped with the work of this dissertation: Vasumathi Raman, Alexandre Donze, Pierluigi Nuzzo, Shankar Sastry, Xiangyu Yue, Felix Berkenkamp, Daniel Fremont, Kurt Keutzer, Hadi Ravanbakhsh and Edward Kim.

I owe a huge thanks to Antonio Iannopollo, whom I met on my first day at Berkeley and who has been a constant in my PhD since that day, till the day I left Berkeley. I thank you for your friendship and your support every step of the way. From listening to me vent, to offering useful advice, and now helping me with administrative stuff, I cannot express how fortunate I am to have you in my life. My life in the DOP center would not be complete without Inigo Incer and Baihong Jin, two great friends I have shared numerous conversations with and who are willing to run around to help me even today. Finally, to Marten Lohstroh, Fabio Cremona, Eric Kim, Sylvia Herbert, Anayo Akametalu and Jaime Fernandez, who have been very supportive and great friends over the last 6 years. Thank you to Branden Ghena and Joshua Adkins for being amazing co-GSIs in EECS 149: Introduction to Embedded Systems. Everyone says having a good support system outside of work is very important, and I have had one of the best. Words are not enough to express my gratitude for Anurag Khandelwal, without whom my last six years would have been impossible to get by, who has supported me through every decision and every difficult time. I am thankful to have had you in my life. A huge thanks to Aishwarya Parasuram, Moitrayee Bhattacharyya, Tathagata Das and Nitesh Mor for being my constants the last 6 years, and being the ones I turn to at every step. To Lauren Iannopollo and Rusi-Ko Mchedlishvili for the amazing get-togethers and for accepting me as one of your own. To James Devine and Alyssa Morrow for making my time at MSR memorable and amazing. To Sreeta Gorripaty, Nikunj Bajaj, Radhika Mittal, Saurabh Gupta, Shubham Tulsiani, Bharat Hariharan, Neeraja Yadwadkar and Vivek Chawda for your great friendships at Berkeley.
To Madhumitha Sridhara, for believing in me and pushing me to apply for Berkeley, for motivating me to stay when I wanted to quit, and for live streaming my graduation just to watch me graduate; I would not be here without you. To Aditi Vutukuri, Priyanka Ganapathi, Sneha Abhyankar and Kritika Mehta, who will always go that extra mile, however far, just to make sure I am okay: thank you for supporting me even at the worst of times.


Finally, I owe everything I am today to my constants throughout my life, my ma Ruby Ghosh and my baba Sankar Prasad Ghosh. For sacrificing so much for me and never letting me feel I missed out on anything. For taking care of all my whims and motivating me when I’m down. For being my number one fan and celebrating the happy moments and grieving the sad ones with me, for always being there. It is now my turn to take care of you.

I dedicate this thesis to each and every one of you and to all the other people I have had the good fortune to meet along the way. To the ones who have brought sunshine to my life, you make me happy when the skies are gray. You have all impacted my life for the better.


Chapter 1 Introduction

1.1 Motivation

Today, we are in an era where autonomous systems are becoming a reality. Hence, much of the recent research in robotics and control theory has focused on developing complex autonomous systems such as robotic manipulators, autonomous vehicles and surgical robots. Since such systems are expected to interact and share the world with humans, we consider them to be safety-critical. Before such systems can be deployed into the real world it is important to design and analyze systems to satisfy safety objectives or guarantees.

In general, the design pipeline for Cyber-Physical Systems (CPS) has two highly interacting components: (i) synthesis or design of the overall system; and (ii) verification or analysis of the system. Correct-by-construction design (Figure 1.1) has emerged as a new design paradigm for synthesis. In this framework, the high-level system safety requirement is directly incorporated into the synthesis process, which guarantees that the resulting system satisfies the safety requirements. To achieve this, the system designer must first be able to express (or design) the safety specification (requirement), and then develop synthesis algorithms which are capable of incorporating the specification in the design. In practice, this often requires many simplifying assumptions, such as simplified system models and environments. To counteract this, we follow synthesis with verification. The goal of verification is to mathematically prove that the designed system indeed satisfies the high-level safety specification. The complexity of the verification process is highly dependent on the synthesized module, which limits its usability to simple systems. Falsification/testing has risen as a more general and scalable analysis technique, where one searches for system behaviors which falsify the safety specification. Independent of the actual algorithms or frameworks used for the design and analysis, a common consensus in the design of CPS is the tight coupling between synthesis and verification. It is important to go through multiple rounds of each before we can consider the design a success.
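To make the idea of falsification concrete, the following is a minimal sketch of a search for a system behavior that violates a safety specification. It is an illustration only and not a method from this thesis: the toy car-following model, the braking controller, the parameter ranges, and the function names `simulate`, `robustness` and `falsify` are all assumptions made for this example, and uniform random sampling stands in for the more sophisticated search strategies a real falsifier would use.

```python
import random


def simulate(initial_gap, lead_brake, dt=0.1, horizon=50):
    """Toy closed-loop system (assumed for illustration): an ego car follows
    a lead car that brakes at `lead_brake` m/s^2. Returns the gap (in meters)
    between the two cars at every time step."""
    ego_v, lead_v, gap = 10.0, 10.0, initial_gap
    gaps = []
    for _ in range(horizon):
        # Hypothetical controller: brake at 4 m/s^2 once the gap is below 8 m.
        ego_acc = -4.0 if gap < 8.0 else 0.0
        ego_v = max(0.0, ego_v + ego_acc * dt)
        lead_v = max(0.0, lead_v - lead_brake * dt)
        gap += (lead_v - ego_v) * dt
        gaps.append(gap)
    return gaps


def robustness(gaps, min_gap=2.0):
    """Quantitative satisfaction of 'always gap >= min_gap'.
    The value is negative exactly when the specification is violated."""
    return min(g - min_gap for g in gaps)


def falsify(num_samples=200, seed=0):
    """Sample environment parameters uniformly at random and return the first
    environment (initial_gap, lead_brake) that falsifies the specification,
    together with its robustness value, or (None, None) if none is found."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        env = (rng.uniform(5.0, 20.0), rng.uniform(0.0, 8.0))
        rho = robustness(simulate(*env))
        if rho < 0:
            return env, rho
    return None, None


if __name__ == "__main__":
    env, rho = falsify()
    print("counterexample environment:", env, "robustness:", rho)
```

A counterexample returned by such a search is exactly the kind of feedback that can be passed back to synthesis, for instance to redesign the controller or to tighten the assumptions on the environment.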

Recent successes in machine learning (ML) and artificial intelligence (AI) have motivated an increased use of such techniques in the design of complex perception modules and controllers to achieve complex tasks for CPS.

Figure 1.1. Correct-by-construction design.

Figure 1.2. System Analysis and Verification.

In real-world robotic systems, ML-based techniques have been shown to far outperform classical computer vision techniques for designing high-fidelity perception modules. Models produced by machine learning algorithms, especially deep neural networks (NN), are being deployed in domains where trustworthiness is a big concern, creating the need for higher accuracy and assurance [130, 135]. However, learning high-accuracy models using deep learning is limited by the need for large amounts of data, and, even further, by the need for labor-intensive labeling. Hence, designing formal frameworks that can analyze and automatically generate data that can be used for re-training is of vital importance.

To achieve a rich set of maneuvers in complex robotic systems, reinforcement learning (RL) [149], optimal control (OC) [146] and model predictive control (MPC) [112] techniques have been developed. RL based techniques have been shown to achieve a range of complex maneuvers, from flying a quadrotor [83] to complex tasks on robots [96]. While these controllers can handle some degree of uncertainty [115, 125], and have been successful in synthesizing high fidelity controllers, they fail to provide any formal guarantee of safety and merely measure performance in expectation. On the other hand, model based control synthesis techniques like reachability [38, 109] and MPC [112, 127, 128] provide strong safety guarantees but often make simplifying assumptions on the system. Hence, there is a need to develop formal frameworks to provide similar safety guarantees for RL based controllers. Moreover, even while using a model for synthesis, it is not clear what safety specifications


challenges while designing synthesis and verification frameworks, which now need to consider the added complexity introduced by learning. Today’s systems are composed of complex ML modules which interact with the controller and the physical system. Monolithic synthesis algorithms that consider the synthesis of the entire system cannot simultaneously synthesize all the components. Even if one could define the correctness of the overall system, to ensure correctness of each sub-component, we must be able to define the correctness of each component. Mathematically capturing or defining the correctness of ML components is hard; for example, how can one define the correctness of a vision system? In spite of the recent successes in designing high fidelity vision systems in the ML community, their correctness is not guaranteed. As a result, there is a large body of work which focuses on the (robustness) analysis of such systems [46]. During controller design, one must now synthesize controllers which are robust to the errors of the ML modules. In [135], the authors present a comprehensive analysis of the challenges introduced by ML components in formal verification. To start, one needs to capture the formal correctness of such systems. Then one must be able to mathematically define the environment or the domain of inputs for them. Finally, one must design computational engines which are able to reason about the correctness of such systems; this generally involves a search over the entire input space.

This has motivated us to study and develop frameworks that can be used to design and analyze high fidelity robotic systems composed of complex controllers and ML components interacting with the physical system.

1.2 Thesis Approach

In this thesis, we develop formal design and analysis frameworks that can be used in various parts of the design and analysis pipeline to improve safety and hence design high fidelity robotic systems with formal safety guarantees. In particular, we recognize five important parts of the pipeline (detailed in Section 1.3) and propose frameworks that can be used to design them, taking inspiration from control theory and formal verification. The proposed frameworks are inspired by the oracle-guided inductive synthesis (OGIS) framework introduced in [80]. An instance of OGIS, I, consists of a learner L and an oracle O (Figure 1.3). The learner attempts to learn or synthesize a concept (e.g. control policy, deep learning network, verification proof) by querying an oracle (e.g. verifier, optimization engine, simulation engine). The framework does not know the correct concept a priori and tries to learn it while minimizing the number of queries made to the oracle. This makes it particularly attractive for the design and analysis of robotic systems, since it models or learns the smallest concept necessary without knowledge of the overall system or its sub-components. Moreover, the OGIS framework helps us provide strong safety guarantees based on the learner and oracle we choose. In some chapters we use a specific instance of the OGIS framework, Counter-Example Guided Inductive Synthesis (CEGIS). In each chapter, we show how an instance of the OGIS framework helps us decouple a complex design problem into simpler building blocks, which reduces it to designing a simple learner (or oracle) and the interaction between them, e.g., for robust controller synthesis (Figure 3.2).

Figure 1.3. OGIS framework I = (L, O).

1.3 Thesis Contribution

In this thesis, we take five parts of the CPS design and analysis pipeline, reformulate each as an instance of an OGIS framework, and detail the learner and oracle design and the interaction between them.

For correct-by-construction design, we consider the drawbacks of the current framework (Figure 1.1) and propose a unified framework that introduces feedback between robust controller synthesis (Chapter 3) and specification design (Chapter 4). As outlined in Figure 1.1, correct-by-construction synthesis requires designing the specification and the synthesis algorithm. In general, it is hard for a designer to write a synthesizable specification for a system without understanding the synthesis algorithm. Hence, there is a tight coupling between the specification and the synthesis process. While Figure 1.1 shows a one-way flow of information from specification design to synthesis, we propose that the specification design and the synthesis process are dependent on each other. One can refine (or repair) a specification based on the results of synthesis. In return, one has to utilize the updated specification and environment models for consecutive rounds of synthesis. In this thesis, we argue that there needs to be strong feedback between these two parts of the pipeline. In Chapter 3, we study the synthesis of robust control strategies from high-level specifications [155]. Today the requirements for robotic systems include not only safety requirements but also performance (liveness) requirements. Moreover, to account for modeling error or interaction with other agents, our synthesis process must be robust to environment disturbances. Rather than designing controllers and verifying after the fact that they satisfy high-level requirements, there has been a paradigm shift towards correct-by-construction controller synthesis. This has been commonly observed in optimal control [109] for safety requirements. Recently, synthesis tools like TuLiP [159] and LTLMoP [73] have been developed to synthesize controllers from a high-level temporal logic specification in non-adversarial settings. In this thesis, we look into the problem of synthesizing robust controllers in adversarial settings


infeasible to implement in practice. In this chapter, we reformulate the game in the OGIS framework, where the interaction between the two players is captured as an interaction between the learner and the oracle. We provide approximate but sound implementations for the oracle and learner, and use them to provide correctness guarantees for the overall synthesis procedure. In Chapter 4, we look at the problem of design (through diagnosis and repair) of specifications for controller synthesis [64]. To synthesize or verify controllers, one needs to first mathematically capture the requirements of the system. However, specification design is a hard problem even for experienced designers. Recently, several controller synthesis methods have been proposed for expressive temporal logics and a variety of system dynamics. However, a major challenge to the adoption of these methods in practice is the difficulty of writing the requisite formal specifications beforehand. Specifications that are poorly stated, incomplete, or inconsistent can produce synthesis problems that are unrealizable (no controller exists for the provided specification), intractable (synthesis is computationally too hard), or lead to solutions that fail to capture the designer’s intent. In this chapter, we reformulate the diagnosis and repair of specifications as an interaction between the learner, who repairs the specification, and the oracle, who diagnoses the specification. We prove that the proposed frameworks and algorithms minimally modify or repair specifications for infeasible controller synthesis problems. For the special case of specifications involving environment assumptions, we show that the oracle relies on a synthesis engine which can compute robust controllers. As shown in Chapter 3, we can build such a synthesis engine by instantiating an OGIS instance which solves a game. As a result, we develop a hierarchical OGIS framework to diagnose and repair environment assumptions by using the frameworks presented in Chapter 3. This further exemplifies the tight correlation between the specification design and the synthesis algorithms.

While designing high fidelity autonomous systems, the overall safety of the system is highly dependent on the interaction between the designed controller and other components in the system. In most real-world autonomous systems, these components are ML or AI based perception or decision modules. Such modules are used in conjunction with controllers to perceive or sense the surrounding environment, e.g., perception modules, and rely on rich models like neural networks. Hence, we need to ensure that these modules are built correctly in order to guarantee the correctness or safety of the overall system. To train deep neural networks to provide high accuracy results, we need rich training sets with large amounts of data and labor-intensive labeling. If the dataset is not representative of the environments in which the model is expected to operate, then the trained model will perform very poorly. A key issue while choosing the training data set is deciding on the diversity of the inputs. One cannot detect a priori which inputs are not represented in the training set. So one has to analyze (or test) the model to detect where the network is failing. In this thesis, we show that synthesizing rich data sets requires a model analysis (or verification) step in the loop, and propose a framework that uses a tight coupling between model synthesis and falsification to design data sets for training. Data augmentation overcomes the lack of data by inflating training sets with label-preserving transformations, i.e., transformations which do not alter the label. Traditional data augmentation schemes [51, 138, 29, 28, 91] involve geometric transformations which alter the geometry of the image (e.g., rotation, scaling, cropping or flipping), and photometric transformations which vary color channels. However, these schemes add data without taking into account what kind of features the model has already learned. To overcome this, in Chapter 5, we propose a counter-example guided data augmentation scheme [46] that analyzes the network to find images where it performs poorly and augments the training set with them. This ensures that we populate the training set with images which the network could not capture originally, and it provides high-level explanations of network failure. In this chapter, we formulate the overall training and data set design as an interaction between a learner, which is responsible for training the ML module, and an oracle, which analyzes the trained model to find counterexamples that are then added to the training data set. Hence, the learner here is a synthesizer and the oracle is a falsifier. We show that by going over this loop multiple times, we are able to synthesize high accuracy image detection models using deep learning.

When one does not consider uncertainty in the controller synthesis process, or relies on RL for synthesizing controllers, we generally have to perform a verification and analysis step after the synthesis. Even for model based control synthesis techniques, since we make simplifying assumptions during synthesis, we might have to verify that the synthesized control is robust to errors in the actual system. However, formal verification requires that the system, environment and specification be formally defined. For complex dynamical systems and ML based controllers, this is often hard to do. To this end, simulation-guided falsification has been suggested to find failure cases of such controllers [50, 8, 39, 163, 37]. However, these techniques are far from providing verification guarantees. In Chapter 6, we look into simulation-based falsification for systems with learning based components (like controllers) which have some smoothness properties. In this chapter, we formulate simulation-guided falsification as an interaction between a learner, which attempts to learn the cause of the failure using Gaussian Processes (GP), and an oracle, which uses Bayesian Optimization (BO) to search for likely counterexamples to simulate on the system. While this scheme has been shown to find counterexamples more quickly, we also study the assumptions and conditions on the system under which we can provide probabilistic verification guarantees even when the underlying system is unknown.

In Chapters 3, 4, 5 and 6, we study design and analysis in a purely model or simulation world. Most of the work in verification and synthesis relies on simpler models or simulations of the real system. Even if we are able to formally synthesize or verify that these systems satisfy our safety requirements, we are far from proving their correctness on the actual physical system. This mismatch between the model world and the real world occurs because we consider simplified models. Even if we have access to a high fidelity simulator of the actual system, we may still miss certain environmental effects, e.g., wear and tear, or sensor and actuator noise. Hence it is important to develop a framework which decides under what circumstances the analysis of the model holds for the real system. In Chapter 7, we study a specific problem of how we transfer model level verification guarantees to the actual system for reach-avoid control problems. To this end, we define a specification-centric simulation metric, SPEC, by taking inspiration from the simulation metric [5, 66, 9] that captures the mismatch between the model and the system. Unlike the simulation metric, SPEC considers the underlying safety specification and hence is less conservative. Like the simulation metric, SPEC retains the properties needed to synthesize, using the model, robust controllers that are guaranteed to be safe for the system. In this chapter, we formulate the synthesis of safe controllers for the actual system as an interaction between a learner, which synthesizes (proposes) controllers for the model using SPEC, and an oracle, which attempts to verify the controller on the actual system and computes SPEC. We further propose a sampling based technique to compute SPEC which can be formulated as an instance of OGIS. Hence, the overall framework becomes an instance of hierarchical OGIS.

Figure 1.4. Thesis contribution in overall CPS design.

The stages mentioned above are important parts of an autonomous system design pipeline. While each may tackle a different facet of the design process, to ensure the safety and correctness of the overall design, we need to be able to provide strong safety guarantees at each stage. Moreover, we would like to expose the vulnerabilities of each process to the others, so that they can be considered while designing the next step. This leads to a tight coupling (and feedback) among the design and analysis steps. For example, if we could analyze the ML modules to determine when the models fail, the results could be used in the control synthesis process to design controllers that are robust to their failure. Figure 1.4 shows how our contributions are placed in the overall design and analysis framework. The blue boxes represent design-centric frameworks, while the yellow boxes represent analysis-centric frameworks. There is a tight coupling between the stages, suggesting that design is tightly coupled with analysis and vice versa.


design for the analysis of AI based systems. The toolkit incorporates many of the algorithms and frameworks described in this thesis. The toolkit takes as input an overall system as a simulator, a description of environment configurations and system level specifications. By relying on simulators for system descriptions, VerifAI can be used with very general systems which may not have well-defined models. By considering system level specifications, we overcome the need to define component level mathematical specifications for ML components, which are often hard to write. Moreover, component level requirements do not capture the interaction among components, e.g., controllers and perception modules. By considering system level requirements, the ML components are analyzed in the context in which they are used in the system, which gives us more realistic analysis data. We have incorporated the analysis techniques from Chapters 5 and 6. Finally, we have shown results using a range of robotic simulators like OpenAI [17] and Webots [140]. The toolkit is the first to analyze CPS with ML components against system level specifications.

1.4 Thesis Outline

This thesis includes and revises content from several of my previously published papers. I gratefully acknowledge and thank my advisors, Alberto Sangiovanni-Vincentelli and Sanjit A. Seshia, who have played an important role in shaping the contributions in all these papers. Chapter 3 is based on our paper [155], which is joint work with Marcell Vazquez-Chanlatte. I would also like to thank Vasumathi Raman for help in developing the theory of the CEGIS loop for robust control synthesis. Chapter 4 revises the material from [64], which is joint work with Dorsa Sadigh. I thank Pierluigi Nuzzo and Vasumathi Raman for help with the proofs in the chapter. Chapter 5 revises our papers [48, 47], which are joint work with Tommaso Dreossi and Xiangyu Yue. In Chapter 6 we revise the material from our paper [65]. I thank Ashish Kapoor, Shaz Qadeer and Gireeja Ranade for their guidance and advice. I thank Felix Berkenkamp for explaining the theory of Gaussian Processes and Bayesian Optimization and developing the proofs with me. In Chapter 7 we extend the work from our paper [63], which is joint work with Somil Bansal. I thank Claire Tomlin for her advice and unique insights into the problem. In Chapter 8 we present a new toolkit, VerifAI, from our paper [49], which is joint work with Tommaso Dreossi and Daniel J. Fremont. I would like to thank Hadi Ravanbaksh, Edward Kim and Marcell Vazquez-Chanlatte for their feedback and help in developing and integrating additional functionalities into the toolkit.


Chapter 2 Mathematical Preliminaries

In this chapter we summarize the key mathematical concepts used in this thesis.

2.1 Oracle-Guided Inductive Synthesis

Oracle-Guided Inductive Synthesis (OGIS) was introduced by Jha and Seshia [80] as a framework that captures a family of synthesizers that operate by iteratively querying an oracle. An instance of the OGIS framework I = (L, O) is defined by the tuple consisting of a learner (or synthesizer) L and an oracle O. The learner L attempts to infer or synthesize a “concept” or an “artifact”, from a domain of possible artifacts, which satisfies a high-level specification ϕ by iteratively querying the oracle O on examples selected from a domain of examples E or on candidate concepts. This domain is problem dependent and will be explained in more detail in the individual chapters. In this chapter we summarize the key notations and definitions required to set up an instance of OGIS I = (L, O). For more details refer to [80].

Definition 1 (Artifact Class). An artifact (concept) class C is the domain of artifacts from which the learner L searches for (synthesizes) an artifact using the queries exchanged with the oracle O.

The concept class may either be specified in the original synthesis problem or arise as a result of a structure hypothesis that restricts the space of candidate concepts. Formally, we can imagine each concept to be a set of examples. Hence, $C \subseteq 2^{E}$. Depending on the specific instance, the domain of examples E and the concept class C can be finite or infinite.

We define the specification or the requirement by ϕ. The format of ϕ depends on the synthesis problem. In this thesis we focus on specifications defined over finite horizon system trajectories.

OGIS comprises two key components: an inductive learning engine (also sometimes referred to as a “Learner”) and an oracle (also referred to as a “Teacher”). The interaction between the learner and the oracle is in the form of a dialogue comprising queries and responses. The oracle is defined by the types of queries that it can answer, and the properties of its responses. Synthesis is thus an iterative process: at each step, the learner formulates and sends a query to the oracle, and the oracle sends its response. The oracle may be tasked with determining whether the learner has found a correct target concept. In this case, the oracle implicitly or explicitly maintains the specification ϕ and can report to the learner when it has terminated with a correct concept or artifact.

The oracle O is defined by the type of queries it can accept. Let Q be the domain of queries (input to O, or conversely output of the learner L), and R be the corresponding set of responses (output of O, or conversely input to the learner L). A valid dialogue d of O is a query-response pair (q, r) such that q ∈ Q and r ∈ R. A valid dialogue sequence is a sequence of valid dialogue pairs; we denote the set of such sequences by D∗.

Definition 2. An oracle is a (potentially non-deterministic) mapping O : D∗ × Q → R. A learner is a (potentially non-deterministic) mapping L : D∗ → Q × C.

The learner observes a valid dialogue sequence δ ∈ D∗ and proposes a candidate concept c ∈ C and a corresponding query q ∈ Q to the oracle. The oracle O observes the dialogue sequence δ ∈ D∗ and the current query q and produces a response r ∈ R.

An OGIS procedure is defined by properties of the learner and the oracle. Relevant properties of the learner include (i) its inductive bias, which restricts its search to a particular family of concepts and a search strategy over this space, and (ii) resource constraints, such as finite or infinite memory. Relevant properties of the oracle include the types of queries it supports and of the responses it generates. We now discuss some common queries we use throughout this thesis:

1. Membership query (q_mem(x)): The learner selects an example x ∈ E and queries the oracle on whether the example satisfies the specification or not.

2. Simulation query (q_sim(x)): The learner selects an example x ∈ E and queries the oracle for a simulation behavior.

3. Counter-example query (q_ce(c)): The learner proposes a candidate concept c ∈ C and asks the oracle for counter-examples, i.e., e ∈ c such that e does not satisfy ϕ. If the oracle cannot find such an example, it returns ⊥.

4. Verification query (q_ver(c)): The learner proposes a candidate concept c ∈ C and asks the oracle if this concept satisfies ϕ.

This set of queries is not exhaustive and we discuss individual queries in the different chapters. Formally, the learner L has to synthesize or learn a concept c ∈ C such that all the examples making up c satisfy the specification ϕ, by querying the oracle O at different examples x ∈ E or with candidate concepts c ∈ C. In each iteration, the queries made by the learner depend on the dialogue sequence δ ∈ D∗ observed so far.


A special case of the OGIS framework is Counter-Example Guided Inductive Synthesis (CEGIS), introduced in [141]. An instance of CEGIS is defined similarly to OGIS, as C = (L, O). In CEGIS, the oracle O accepts only a subset of queries: counter-example queries and positive witness queries (where the learner asks the oracle for positive examples). In this thesis, we use CEGIS frameworks with only counter-example queries. Most of the frameworks presented in the thesis are instances of CEGIS.
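The counter-example dialogue described above can be sketched as a small loop. The following Python sketch is illustrative only: the concept class (integer thresholds over a finite example domain), the specification, and the names `spec`, `concept`, `oracle`, `learner` and `cegis` are all assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

# Domain of examples E (an assumption made for this illustration).
EXAMPLES = range(0, 21)


def spec(x: int) -> bool:
    """Hypothetical specification phi on a single example: x*x >= 50."""
    return x * x >= 50


def concept(c: int) -> List[int]:
    """A candidate concept c denotes the set of examples {x in E : x >= c}."""
    return [x for x in EXAMPLES if x >= c]


def oracle(c: int) -> Optional[int]:
    """Counter-example query q_ce(c): some e in c violating phi, or None (bottom)."""
    for e in concept(c):
        if not spec(e):
            return e
    return None


def learner(counterexamples: List[int]) -> int:
    """Propose the smallest threshold that excludes every counterexample seen so far."""
    return max(counterexamples) + 1 if counterexamples else 0


def cegis(max_iters: int = 100) -> Tuple[int, List[int]]:
    """Alternate learner proposals and oracle responses until the oracle returns bottom."""
    counterexamples: List[int] = []
    for _ in range(max_iters):
        candidate = learner(counterexamples)
        response = oracle(candidate)
        if response is None:
            return candidate, counterexamples
        counterexamples.append(response)
    raise RuntimeError("no concept found within the iteration budget")


threshold, dialogue = cegis()
```

In this toy instance the learner only ever sees the counterexamples returned so far, which mirrors the dependence of each query on the observed dialogue sequence; richer learners and oracles would replace `learner` and `oracle` without changing the loop.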

2.2 Hybrid Dynamical Systems

In this work we consider a continuous-time hybrid dynamical system,

$$\dot{x} = f(x, u, e) \tag{2.1}$$

where $x \in \mathcal{X} \subseteq (\mathbb{R}^{n_c} \times \{0,1\}^{n_l})$ represents the hybrid (continuous and logical) states, $u \in \mathcal{U} \subseteq (\mathbb{R}^{m_c} \times \{0,1\}^{m_l})$ are the hybrid control inputs and $e \in \mathcal{E} \subseteq (\mathbb{R}^{e_c} \times \{0,1\}^{e_l})$ are the hybrid external inputs, including disturbances and other adversarial inputs from the environment. For the purposes of this work, we assume that the state of the system is fully observable. Using a sampling period $\Delta > 0$, the continuous-time system in (2.1) lends itself to the discrete-time approximation,

$$x_{t+1} = f_d(x_t, u_t, e_t) \tag{2.2}$$

where $x_t \in \mathcal{X}$, $u_t \in \mathcal{U}$ and $e_t \in \mathcal{E}$.

Given an initial state $x_0 \in \mathcal{X}$, a finite horizon $H$ control sequence $\mathbf{u} = (u_0, \ldots, u_{H-1})$ and an environment (disturbance) sequence $\mathbf{e} = (e_0, \ldots, e_{H-1})$, the finite horizon trajectory (or behavior) of the system $S$ modeled by the dynamics in (2.2) is uniquely expressed as $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) = \{(x_0, u_0, e_0), \ldots, (x_{H-1}, u_{H-1}, e_{H-1})\}$. We denote by $\xi_S(t; x_0, \mathbf{u}, \mathbf{e})$ the trajectory of the system $S$ at time $t$. Moreover, let $\Xi$ denote the set of all finite horizon trajectories of the system.
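As a minimal sketch of how a finite horizon trajectory is obtained from (2.2), the following Python snippet rolls out a discrete-time map over given control and environment sequences. The function names `rollout` and `integrator`, the scalar inputs, and the particular two-state dynamics are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Step = Callable[[State, float, float], State]


def rollout(f_d: Step, x0: State, u: Sequence[float],
            e: Sequence[float]) -> List[Tuple[State, float, float]]:
    """Return [(x_0,u_0,e_0), ..., (x_{H-1},u_{H-1},e_{H-1})] for x_{t+1} = f_d(x_t,u_t,e_t)."""
    if len(u) != len(e):
        raise ValueError("control and environment sequences must share the horizon H")
    trajectory = []
    x = x0
    for u_t, e_t in zip(u, e):
        trajectory.append((x, u_t, e_t))
        x = f_d(x, u_t, e_t)  # advance the state by one sampling period
    return trajectory


def integrator(x: State, u_t: float, e_t: float) -> State:
    """Hypothetical two-state dynamics: both states are shifted by u_t + e_t."""
    y, z = x
    return (y + u_t + e_t, z + u_t + e_t)


xi = rollout(integrator, (0.0, 1.0), u=[1.0, 0.5, -1.0], e=[0.0, 0.5, 0.0])
```

Because the rollout is deterministic, the trajectory is fully determined by the initial state and the two input sequences, which is what later allows quantities defined on trajectories to be written as functions of the sequences alone.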

We make a further simplifying assumption that $\mathcal{U} = [-1,1]^{n_u}$ and $\mathcal{E} = [-1,1]^{n_e}$, where $n_u$ and $n_e$ are the dimensions of $\mathcal{U}$ and $\mathcal{E}$ respectively. This is not a limiting assumption, as one could always scale the dynamics to modify the control and environment domain. Further, since $u_t \in \mathcal{U}$, we have the finite-horizon control sequence $\mathbf{u} \in \mathcal{U}^H$. For ease of notation, we simply say $\mathbf{u} \in \mathcal{U}$ to imply $u_0, \ldots, u_{H-1} \in \mathcal{U}$. Similarly, we say $\mathbf{e} \in \mathcal{E}$ to imply $e_0, \ldots, e_{H-1} \in \mathcal{E}$.
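The remark that the dynamics can always be scaled can be illustrated with an affine change of variables. The sketch below assumes box-shaped physical input bounds; the function name `scale_input` and the example bounds are hypothetical and serve only to show one way a normalized input could be mapped back before it is applied to the dynamics.

```python
from typing import List, Sequence


def scale_input(u_norm: Sequence[float], lower: Sequence[float],
                upper: Sequence[float]) -> List[float]:
    """Map a normalized input in [-1, 1]^n affinely onto the box [lower, upper]."""
    scaled = []
    for v, lo, hi in zip(u_norm, lower, upper):
        if not -1.0 <= v <= 1.0:
            raise ValueError("normalized input must lie in [-1, 1]")
        # -1 maps to lo, +1 maps to hi, 0 maps to the midpoint of the box.
        scaled.append(lo + (v + 1.0) * (hi - lo) / 2.0)
    return scaled


physical = scale_input([-1.0, 0.0, 1.0],
                       lower=[0.0, -2.0, 10.0],
                       upper=[4.0, 2.0, 20.0])
```

Composing such a map with the dynamics gives a system whose inputs range over the normalized box, so the assumption on the input domains costs no generality for box-constrained inputs.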

2.3 Safety Specification

We denote the safety specification by ϕ. It is defined on finite-length trajectories ξS(·;x0,u,e) of the system that can be obtained by rolling out the dynamics in (2.2) over


the system that satisfy the system-level safety specification; $\phi \subseteq \Xi$. For example, ϕ can capture temporal properties of the system expressed in, e.g., Signal Temporal Logic (STL) [104]. We say the system behavior satisfies the specification ϕ, i.e., $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) \models \phi$, if and only if $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) \in \phi$.

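We further assume that we have access to the quantitative semantics of ϕ, represented by $\rho_\phi : \Xi \to \mathbb{R}$, such that:

$$\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) \models \phi \leftrightarrow \rho_\phi(\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e})) > 0$$

$$\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) \not\models \phi \leftrightarrow \rho_\phi(\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e})) < 0$$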

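Since the trajectory $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e})$ is deterministic given $\mathbf{u}$, $\mathbf{e}$ and a fixed $x_0$, we will write $\rho_\phi(\mathbf{u}, \mathbf{e})$ instead of $\rho_\phi(\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}))$, and $\phi(\mathbf{u}, \mathbf{e})$ to represent $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e}) \models \phi$, in the rest of the chapter.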

By evaluating $\rho_\phi$ on the system behavior $\xi_S(\cdot; x_0, \mathbf{u}, \mathbf{e})$ in a given environment $\mathbf{e}$, we can comment on the satisfaction of the system behavior and hence on the safety of the corresponding environment $\mathbf{e}$. Typically, $\rho_\phi(\xi) = 0$ is considered to be an unknown behavior, and hence we cannot comment on the satisfaction of ξ. One would then have to evaluate the Boolean satisfaction by checking whether ξ ∈ ϕ. In this work, we take a pessimistic approach and consider $\rho_\phi(\xi) = 0$ to imply unsatisfactory behavior. This admits only behaviors whose robustness is strictly positive, which is a valid assumption to make while evaluating the safety of the system.

We assume that $\rho_\phi$ is Lipschitz continuous in $\mathbf{u}$, i.e., there exists a constant $L_{\rho_\phi}$ such that for all $\mathbf{e} \in \mathcal{E}$:

$$\forall\, \mathbf{u}, \mathbf{u}' \in \mathcal{U}: \quad |\rho_\phi(\mathbf{u}, \mathbf{e}) - \rho_\phi(\mathbf{u}', \mathbf{e})| \le L_{\rho_\phi} |\mathbf{u} - \mathbf{u}'| \tag{2.3}$$

We further require $\rho_\phi$ to induce a total ordering on Ξ, i.e., if $\rho_\phi(\xi_1) > \rho_\phi(\xi_2)$ then $\xi_1$ is said to be safer than $\xi_2$, and hence is a more desirable behavior. This allows for an ordering among the control strategies $\mathbf{u}$. If $\xi_1$ ($\xi_2$) was generated by $\mathbf{u}_1$ ($\mathbf{u}_2$), then we say that $\mathbf{u}_1$ is more “robust” than $\mathbf{u}_2$. Larger values offer a higher degree of satisfaction while lower values offer a lower degree of satisfaction.

Example 1. Consider the discrete-time system with two states $x = [y, z]^T$ and dynamics

$$f_d = \begin{bmatrix} y_t + u_t + e_t \\ z_t + u_t + e_t \end{bmatrix}$$

where $u_t, e_t \in [-1, 1]$. Consider the specification “For the next 30 seconds, y > 2 implies that z will be less than 4 within two seconds”. Consider the following syntactically generated quantitative semantics over the state, following the quantitative semantics for STL defined in [104]:

$$\rho_\phi(\mathbf{u}, \mathbf{e}) = \max_{t \in \{0,\ldots,30\}} \left( y_t - 2,\ \min_{t' \in \{t,\ldots,t+3\}} (4 - z_{t'}) \right)$$

Since $u_t$ and $e_t$ are both bounded and $\rho_\phi$ is smooth, substituting the dynamics equations into


Temporal logic has risen as a popular language for capturing and defining mathematical properties on trajectories. Linear temporal logic (LTL) was first introduced in [123] to capture behaviors of sequential programs. Since then, LTL has been extended to capture properties over dense time in Metric Temporal Logic (MTL) [90]. A more popular extension of MTL, used to define properties over the behaviors of dynamical systems, is Signal Temporal Logic (STL).

2.4.1 Signal Temporal Logic

Signal Temporal Logic (STL) was first introduced as an extension of Metric Temporal Logic (MTL) to reason about the behavior of real-valued dense-time signals [104]. STL has been largely applied to specify and monitor real-time properties of hybrid systems [42]. We use the robust, quantitative interpretation for the satisfaction of a temporal formula [41, 40], as further detailed below. In this setting our safety specification ϕ is represented as an STL formula evaluated on the system trajectory $\xi_S$ at some time $t$. We say $(\xi_S, t) \models \phi$ when ϕ evaluates to true for $\xi_S$ at time $t$. We instead write $\xi_S \models \phi$ if $\xi_S$ satisfies ϕ at time 0. The atomic predicates of STL are defined by inequalities of the form $\mu(\xi_S(t)) > 0$, where µ is some function of the trajectory $\xi_S$ at time $t$. Specifically, µ is used to denote both the function of $\xi_S(t)$ and the predicate. Any STL formula ϕ consists of Boolean and temporal operations on such predicates. The syntax of STL formulae is defined recursively as follows:

$$\phi ::= \mu \mid \neg\mu \mid \phi \wedge \psi \mid G_{[a,b]}\,\psi \mid F_{[a,b]}\,\psi \mid \phi\, U_{[a,b]}\, \psi \tag{2.4}$$

where ψ and ϕ are STL formulae, G is the globally operator, F is the finally operator and U is the until operator. Intuitively, $\xi_S \models G_{[a,b]}\psi$ specifies that ψ must hold for the trajectory $\xi_S$ at all times of the given interval, $t \in [a, b]$. Similarly, $\xi_S \models F_{[a,b]}\psi$ specifies that ψ must hold at some time $t'$ of the given interval. Finally, $\xi_S \models \phi\, U_{[a,b]}\, \psi$ specifies that ϕ must hold starting from time 0 until a specific time $t \in [a, b]$ at which ψ becomes true. Formally, the satisfaction of a formula ϕ for a trajectory $\xi_S$ at time $t$ is defined as:
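For reference, the standard Boolean semantics of STL can be written so that each clause matches, by sign, the corresponding clause of the quantitative semantics in (2.6). The display below is the textbook form and is offered as a sketch under that assumption; it is not a verbatim reproduction of the thesis's own display.

```latex
\begin{aligned}
(\xi_S, t) \models \mu &\iff \mu(\xi_S(t)) > 0 \\
(\xi_S, t) \models \neg\mu &\iff \neg\big((\xi_S, t) \models \mu\big) \\
(\xi_S, t) \models \phi \wedge \psi &\iff (\xi_S, t) \models \phi \ \wedge\ (\xi_S, t) \models \psi \\
(\xi_S, t) \models G_{[a,b]}\,\phi &\iff \forall t' \in [t+a, t+b],\ (\xi_S, t') \models \phi \\
(\xi_S, t) \models F_{[a,b]}\,\phi &\iff \exists t' \in [t+a, t+b],\ (\xi_S, t') \models \phi \\
(\xi_S, t) \models \phi\, U_{[a,b]}\, \psi &\iff \exists t' \in [t+a, t+b] \text{ s.t. } (\xi_S, t') \models \psi \ \wedge\ \forall t'' \in [t, t'],\ (\xi_S, t'') \models \phi
\end{aligned}
```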

Similar definitions to the ones in (2.4) and (2.5) can also be provided when the intervals of the temporal operators are open, such as (a, b], [a, b), or (a, b), or unbounded, such as [a, +∞). The bound of an STL formula is defined as the maximum over the sums of all nested upper bounds on the temporal operators of the STL formula. For instance, given $\psi = G_{[0,20]} F_{[1,6]} \phi_1 \wedge F_{[2,25]} \phi_2$, the bound can be calculated as max(6 + 20, 25) = 26. An STL formula ϕ is bounded-time if it contains no unbounded operators.
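The bound computation can be sketched as a short recursion over the formula structure. The nested-tuple encoding of formulae and the function name `bound` are assumptions made for this illustration; the treatment of the until operator (its upper bound plus the larger bound of its two operands) is likewise an assumption consistent with the "sum of nested upper bounds" description.

```python
def bound(formula) -> float:
    """Maximum over the sums of nested upper bounds of the temporal operators.

    Formulae are encoded (hypothetically) as nested tuples:
      ("pred",), ("not", f), ("and", f, g),
      ("G", a, b, f), ("F", a, b, f), ("U", a, b, f, g).
    """
    op = formula[0]
    if op == "pred":
        return 0.0
    if op == "not":
        return bound(formula[1])
    if op == "and":
        return max(bound(formula[1]), bound(formula[2]))
    if op in ("G", "F"):
        _, _a, b, sub = formula
        return b + bound(sub)
    if op == "U":
        _, _a, b, left, right = formula
        return b + max(bound(left), bound(right))
    raise ValueError(f"unknown operator: {op}")


phi1 = ("pred",)
phi2 = ("pred",)
# psi = G[0,20] F[1,6] phi1  AND  F[2,25] phi2
psi = ("and", ("G", 0, 20, ("F", 1, 6, phi1)), ("F", 2, 25, phi2))
psi_bound = bound(psi)
```

For the formula ψ from the text, the recursion evaluates max(20 + 6, 25), which agrees with the value 26 computed above.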

Robust Satisfaction. A quantitative or robust semantics is defined for an STL formula ϕ by associating it with a real-valued function $\rho_\phi$ of the trajectory $\xi_S$ and time $t$, which provides a “measure” of the margin by which ϕ is satisfied. Specifically, we require $(\xi_S, t) \models \phi$ if and only if $\rho_\phi(\xi_S, t) > 0$. The magnitude of $\rho_\phi(\xi_S, t)$ can then be interpreted as an estimate of the “distance” of $\xi_S$ from the set of signals satisfying or violating ϕ.

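We define the quantitative semantics as follows:

$$\begin{aligned}
\rho_{\mu}(\xi_S, t) &= \mu(\xi_S(t)) \\
\rho_{\neg\mu}(\xi_S, t) &= -\mu(\xi_S(t)) \\
\rho_{\phi \wedge \psi}(\xi_S, t) &= \min(\rho_\phi(\xi_S, t), \rho_\psi(\xi_S, t)) \\
\rho_{G_{[a,b]}\phi}(\xi_S, t) &= \min_{t' \in [t+a, t+b]} \rho_\phi(\xi_S, t') \\
\rho_{F_{[a,b]}\phi}(\xi_S, t) &= \max_{t' \in [t+a, t+b]} \rho_\phi(\xi_S, t') \\
\rho_{\phi\, U_{[a,b]}\, \psi}(\xi_S, t) &= \max_{t' \in [t+a, t+b]} \Big( \min\big( \rho_\psi(\xi_S, t'),\ \min_{t'' \in [t, t']} \rho_\phi(\xi_S, t'') \big) \Big)
\end{aligned} \tag{2.6}$$

Using the above definitions, the robustness value can be computed recursively for any STL formula. For brevity, in this chapter we use $\rho_\phi(\xi_S) = \rho_\phi(\xi_S, 0)$.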
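The recursive computation in (2.6) can be sketched for discrete-time signals as follows. This is a minimal illustration under several assumptions: time is indexed by integer steps, temporal windows are clipped to the end of the signal, and formulae use a hypothetical nested-tuple encoding in which a predicate carries its function µ. The names `window` and `rho` are not from the thesis.

```python
from typing import Sequence


def window(xi: Sequence, lo: int, hi: int) -> range:
    """Time steps in [lo, hi], clipped to the end of the signal (an assumption)."""
    steps = range(lo, min(hi, len(xi) - 1) + 1)
    if len(steps) == 0:
        raise ValueError("temporal window lies outside the signal")
    return steps


def rho(formula, xi: Sequence, t: int = 0) -> float:
    """Robustness of `formula` on the sampled signal `xi` at step `t`, following (2.6)."""
    op = formula[0]
    if op == "pred":                      # ("pred", mu): rho = mu(xi(t))
        return formula[1](xi[t])
    if op == "not":                       # the syntax in (2.4) only negates predicates
        return -rho(formula[1], xi, t)
    if op == "and":
        return min(rho(formula[1], xi, t), rho(formula[2], xi, t))
    if op in ("G", "F"):
        _, a, b, sub = formula
        values = [rho(sub, xi, tp) for tp in window(xi, t + a, t + b)]
        return min(values) if op == "G" else max(values)
    if op == "U":
        _, a, b, left, right = formula
        return max(
            min(rho(right, xi, tp),
                min(rho(left, xi, tpp) for tpp in range(t, tp + 1)))
            for tp in window(xi, t + a, t + b)
        )
    raise ValueError(f"unknown operator: {op}")


signal = [1.0, 3.0, 2.0, 0.5]             # a scalar signal sampled at four steps
above = ("pred", lambda x: x - 1.5)       # x > 1.5
below = ("pred", lambda x: 4.0 - x)       # x < 4
high = ("pred", lambda x: x - 2.5)        # x > 2.5

always_val = rho(("G", 0, 3, above), signal)
eventually_val = rho(("F", 0, 3, above), signal)
until_val = rho(("U", 0, 3, below, high), signal)
```

A positive value indicates satisfaction with that margin and a negative value indicates violation; for instance, the signal dips below 1.5 at its last step, so the globally formula has negative robustness while the finally formula is satisfied.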


Chapter 3 Synthesizing Robust Control Strategies

3.1 Introduction

A key step in the design of Cyber-Physical Systems (CPS) or robotic systems is the synthesis of robust control strategies from high-level safety specifications. One would like to synthesize controllers for safety-critical systems which can control such systems to satisfy high-level safety requirements. In this chapter, we address the problem of synthesizing robust controllers for linear systems from high-level temporal specifications. Specifically, we propose correct-by-construction controller synthesis algorithms based on the Counter-Example Guided Inductive Synthesis (CEGIS) paradigm.

Correct-by-construction controller synthesis from high-level formal specifications offers a promising means of raising the level of abstraction for implementation. In particular, reactive synthesis from temporal logic generates programs or controllers which maintain an ongoing interaction with their (possibly adversarial) environments. Reactive synthesis from linear temporal logic using automata-theoretic methods has been demonstrated for synthesizing high-level controllers for robotics. However, for embedded or robotic systems, reactive synthesis becomes much more challenging for several reasons. First, the specification languages go from discrete-time, propositional temporal logics to metric-time temporal logics over both continuous and discrete signals, so the previous automata-theoretic methods do not easily extend. Second, even for simple classes of dynamical systems and metric-time temporal logics, verification is itself undecidable, let alone synthesis. Third, the state of the art for solving games over infinite state spaces, as required for metric or quantitative temporal objectives, is far less developed than that for finite games.

To address these challenges, researchers have resorted to various simplifications. One simplification is to consider the control problem over a finite horizon instead of an infinite horizon. This ensures verification is decidable for many interesting and practical systems and specification languages, like metric temporal logic (MTL) [90] and signal temporal logic (STL) [104]. Reachability-based techniques [109] offer a suite of well-developed and mature tools for handling finite-horizon games, whose constraints amount to simple safety invariants such as obstacle avoidance. However, direct extensions to temporal constraints are not straightforward. Existing approaches model the environment as a bounded, non-deterministic disturbance [38], which is often unrealistic, leading to infeasibility in applications like autonomous driving for even very mundane cases. Instead, one may have a more complicated model of the environment that leverages either data-driven techniques or known aspects of the behavior of the other agents, such as their formal specifications.

In this chapter, we build a synthesis engine that solves the reactive-synthesis problem by solving a series of finite-horizon robust control problems. This technique, known as Receding Horizon Control (RHC), was solved using a Counter-Example Guided Inductive Synthesis (CEGIS) [141] framework in [128]. However, the practical implementation of the CEGIS framework provided there was unsound. We first provide empirical results which show that naively implementing the CEGIS framework leads to scaling issues. To overcome the scalability issue and ensure soundness of our algorithm, we provide two variants of CEGIS: (i) hybrid CEGIS, which limits problem growth across iterations; and (ii) dominant CEGIS, which builds dominant strategies using Satisfiability Modulo Theories (SMT) [13], as opposed to optimally robust strategies. Finally, we provide a theoretical analysis of our algorithms along with a worst-case convergence characterization. The results in this chapter are adapted from [155].

3.2 Preliminaries

We first introduce the flavor of reactive synthesis we consider in this chapter, Receding Horizon Control, and describe how the CEGIS framework can be instantiated to solve it effectively.

3.2.1 Receding Horizon Control and Counter-Example Guided Inductive Synthesis

A promising and scalable approach to synthesizing reactive strategies is to formulate reactive synthesis over a finite, receding horizon as a series of zero-sum games between the system and the environment, where the environment assumptions and system objectives are systematically encoded into rewards [128]. This form of controller, known as Receding Horizon Control (RHC), offers a (limited) form of reactivity. Each game is solved using a Counter-Example Guided Inductive Synthesis (CEGIS) [141, 6] scheme to search for a dominant strategy.
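The structure of a CEGIS loop for one such game can be sketched on a toy problem. The sketch below is an illustration under strong simplifying assumptions and does not reproduce the implementation in [128] or the algorithms of this chapter: the control and disturbance sets are small finite lists searched by enumeration, the game has a single step, and a strategy is treated as dominant when its reward is positive against every disturbance. The names `cegis`, `reward`, `controls`, and `disturbances` are hypothetical.

```python
def cegis(controls, disturbances, reward, max_iters=100):
    """Return a control with positive reward against every disturbance, or None."""
    # Start from a single candidate disturbance and grow the set lazily.
    candidates = [disturbances[0]]
    for _ in range(max_iters):
        # Synthesizer: best control against the counterexamples seen so far.
        u = max(controls, key=lambda c: min(reward(c, w) for w in candidates))
        if min(reward(u, w) for w in candidates) <= 0:
            return None  # no control beats even the known counterexamples
        # Verifier: search for the disturbance that hurts this control most.
        w_cex = min(disturbances, key=lambda w: reward(u, w))
        if reward(u, w_cex) > 0:
            return u  # no counterexample exists; u wins the game
        candidates.append(w_cex)
    return None


# Toy one-step game: x' = x0 + u + w, reward is the margin of |x'| < 1.
x0 = 0.0
reward = lambda u, w: 1.0 - abs(x0 + u + w)
controls = [-1.0, -0.5, 0.0, 0.5, 1.0]
disturbances = [0.5, 0.0, -0.5]

print(cegis(controls, disturbances, reward))  # 0.0
```

In this example the first candidate control, u = -0.5, is refuted by the disturbance w = -0.5. Adding that counterexample leads the synthesizer to u = 0.0, which the verifier cannot refute. Because each counterexample is new and the disturbance list is finite, the loop terminates in this finite setting.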

Model Predictive Control (MPC), or Receding Horizon Control (RHC), is a well-studied hybrid system control method [58, 112]. In RHC, at each time step, the state of the system is observed and a finite-horizon optimization problem is solved, given a set of constraints and a cost function J. Given the system dynamics f, we locally linearize f at each MPC step to solve the optimization problem and generate an optimal control sequence u∗. For example,
