Problem Statement - Reinforcement learning for the control of traffic flow on highways

The problem considered in this dissertation is to investigate to what extent suitable reinforcement learning algorithms are able to identify high-quality traffic control policies for a portion of highway, taking into account known and novel control measures for various scenarios of traffic flow. As a concept demonstrator testbed, the reinforcement learning algorithms are to be implemented in a detailed agent-based microscopic simulation model of a traffic environment under investigation.

1.3 Dissertation Objectives

The following twelve objectives are pursued in this dissertation:

I To conduct a thorough survey of the literature related to:

(f) simulation principles and guidelines, with a specific focus on microscopic traffic simulation modelling.

II To create a suitable microscopic agent-based simulation model for use as a benchmark for evaluating the effectiveness of highway traffic control measures within the context of a simple highway network. This model should be able to facilitate the implementation of the highway traffic control measures researched in pursuit of Objectives I(c), I(d) and I(e) and should be informed by the research conducted in pursuit of Objective I(f).

III To identify a number of reinforcement learning algorithms capable of successfully altering traffic control policies by changing the variables of various highway traffic control measures.

IV To implement the reinforcement learning algorithms of Objective III in the context of the simulation model of Objective II with a view to identifying high-quality highway traffic control policies, taking into account the subsequent improvements in the traffic flow along the highway made possible by changing the control policies associated with various existing highway traffic control measures.

V To develop and implement a novel highway traffic control measure in the simulation model of Objective II, based on the assumption that instructions may be given by reinforcement learning agents to varying percentages of autonomous vehicles with a view to improving the traffic flow along a stretch of highway.

VI To verify and validate the model and algorithmic implementations of Objectives II–V according to generally accepted modelling guidelines.

VII To compare statistically the relative effectiveness of various reinforcement learning algorithms with that of existing highway traffic control strategies in the context of the benchmark model of Objective II, taking variations in traffic demand along the stretch of highway into account.

VIII To compare statistically the relative effectiveness of the novel highway traffic control measure of Objective V with that of the best-performing existing highway traffic control measures identified in Objective VII, in the context of the benchmark model of Objective II, taking variations in traffic demand along the stretch of highway into account.

IX To apply the concept demonstrator implementations of Objectives IV and V to a special case study involving realistic traffic data for a specified stretch of a real highway.

X To evaluate the effectiveness of the associated reinforcement learning algorithms of Objective III in terms of their capability of identifying high-quality highway traffic control policies in the context of the case study of Objective IX.

XI To compare statistically the relative effectiveness of the novel highway traffic control measure of Objective V with that of the best-performing existing highway traffic control measures identified in Objective X, in the context of the case study.

XII To recommend sensible follow-up work related to the work in this dissertation which may be pursued in future.

1.4 Dissertation Scope

Due to the complexities involved in the highway traffic control problem, the scope in this dissertation is limited to the following control methods:

Ramp metering is the concept of controlling highway utilisation by effectively limiting the inflows of traffic onto the highway. This is achieved by changing traffic light phases at on-ramps, thereby controlling when vehicles are allowed to enter certain sections of the highway, ensuring that the highway capacity is fully utilised, preventing highway over-utilisation, and thus reducing congestion due to over-utilisation [115].

Variable speed limits are another method of controlling the flow of traffic on a certain section of highway [53]. By reducing the speed limit for a certain section of highway, the flow characteristics of that section are altered. As a result, the outflow out of that section may be reduced, thereby allowing a congested section upstream more time to resolve the congestion before further vehicles arrive. Furthermore, variable speed limits may lead to homogenisation of traffic flow, as the differences between the speeds of vehicles are reduced. This may result in a more stable traffic flow which may, in turn, lead to higher throughput and subsequently to a reduction in travel times [53].
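As a rough illustration of the two control measures above, the following Python sketch treats a ramp meter as a cap on the on-ramp inflow and uses the fundamental relation of traffic flow (flow = density × speed) to show why a lower speed limit reduces the outflow of a section at a given density. The function names, the default saturation flow of 1 800 veh/h and the fixed-density simplification are assumptions made purely for illustration; they are not taken from the models developed in this dissertation.

```python
def metered_inflow(demand_veh_h, green_fraction, saturation_flow_veh_h=1800.0):
    """Inflow (veh/h) admitted from an on-ramp by a ramp meter.

    green_fraction is the share of the signal cycle during which the
    on-ramp light is green. The saturation flow is a hypothetical value.
    """
    if not 0.0 <= green_fraction <= 1.0:
        raise ValueError("green_fraction must lie in [0, 1]")
    # The meter can never admit more vehicles than actually arrive.
    return min(demand_veh_h, green_fraction * saturation_flow_veh_h)


def section_flow(density_veh_km, speed_km_h):
    """Flow (veh/h) through a section: flow = density * speed."""
    return density_veh_km * speed_km_h


# A quarter of the cycle on green admits at most 450 veh/h.
admitted = metered_inflow(demand_veh_h=900.0, green_fraction=0.25)

# At an unchanged density, lowering the speed limit lowers the outflow.
flow_at_100 = section_flow(density_veh_km=30.0, speed_km_h=100.0)
flow_at_80 = section_flow(density_veh_km=30.0, speed_km_h=80.0)
```

In a real traffic stream the density responds to the speed limit as well, so the second function only captures the immediate effect described in the text, not the resulting dynamics.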

The following traffic control measures are acknowledged, but are not implemented in this dissertation:

Dynamic lane assignments. In many cases, different lanes of a stretch of highway are not used effectively, resulting in over-utilisation of certain lanes, while other lanes may remain relatively under-utilised. One method of resolving this imbalance is to assign vehicles to specific lanes, thereby increasing the overall lane utilisation and hence increasing throughput on the highway. Dynamic lane assignment seems especially useful in a traffic paradigm where autonomous vehicles are present in the traffic flow, since direct and very detailed kinematic instructions may be given to such vehicles [137]. The focus in this dissertation is, however, specifically on the period of mixed traffic flow with limited numbers of autonomous vehicles. Dynamic lane assignments are therefore not expected to be effective, because lane-changing instructions may only be given to a limited number of vehicles, and are considered beyond the scope of this dissertation.

Variable message signs provide a manner of conveying information about upcoming traffic conditions to drivers through roadside infrastructure [164]. Due to the difficulty of measuring the effectiveness of these messages and their influence on driver behaviour, however, variable message signs are excluded as a control measure from the scope of this dissertation.

Platooning is the result of cooperative driving in the form of automated vehicles manoeuvring to achieve short inter-vehicle distances. Platooning may be facilitated by means of inter-vehicular communication, allowing vehicles to perform safe and efficient passing, lane changing and merging at close range. Platooning movement patterns have typically been modelled on the movement of wild geese and dolphins [68]. The benefits of platooning are, however, only expected to be fully exploitable once the traffic composition consists mainly of autonomous vehicles, and since the focus in this dissertation is on the transitional period during which limited numbers of autonomous vehicles are present in the traffic flow, platooning is beyond the scope of this dissertation.

of the prevailing techniques in fulfilment of Objective I(a). Thereafter, reinforcement learning algorithms, in particular, are studied in fulfilment of Objective I(b). This approach is followed in order to understand different machine learning techniques deemed suitable for solving the online traffic control problem described in §1.2 with a specific focus on reinforcement learning. In pursuit of Objective I(c), the study includes a review of multiple existing techniques for highway traffic control, with a specific focus on existing models and strategies for implementing the well-known control measures of ramp metering and variable speed limits. The aim here is to identify suitable techniques that have been implemented successfully within these contexts and may be adapted for implementation in this dissertation. Thereafter, in pursuit of Objective I(d), the focus shifts to previous attempts at controlling the traffic flow along a highway by providing autonomous vehicles with specific instructions. Finally, the literature study concludes with a review of previous attempts at implementing machine learning in a highway traffic control context, as well as a review of microscopic traffic simulation modelling and model validation guidelines, in fulfilment of Objectives I(e) and I(f).

The second stage of the study is the development stage. During this stage, Objectives II, III and IV of §1.3 are pursued. Initially, the simple benchmark microscopic agent-based traffic simulation model of Objective II is established within a suitable software environment. The highway traffic control measures identified in the literature pertaining to Objective I(c) are incorporated into this simulation model in order to be able to effectively assess the ability of reinforcement learning algorithms to identify high-quality traffic control policies. This stage also includes the formulation of the highway traffic control problem as reinforcement learning problems in order to facilitate the implementation of the various reinforcement learning algorithms deemed suitable, in fulfilment of Objective III. Finally, this stage culminates in the development of a novel highway traffic control measure in pursuit of Objective V, informed by the literature reviewed in fulfilment of Objective I(d) on the application of autonomous vehicles for controlling highway traffic flow.

The next stage is the implementation stage. Objectives IV and V of §1.3 are pursued during this stage. This entails the implementation of the reinforcement learning algorithms for the existing and the novel highway traffic control measures within the context of the benchmark simulation model of Objective II. This implementation serves the purpose of a testbed for evaluating the traffic control protocols identified by reinforcement learning algorithms according to the improvements achievable in respect of the traffic flow along the highway.

The fourth and final stage of this study is the verification and evaluation stage. During this stage, Objectives VI to XI are pursued. The first step is to research appropriate, generally accepted modelling guidelines according to which a meaningful validation and verification of not only the simulation model implementation, but also the implementation of the reinforcement learning algorithms and highway traffic control measures within the simulation model may be carried out, in fulfilment of Objective VI. This is followed by a thorough statistical comparison of the relative performances of the various algorithms implemented within the context of the benchmark simulation model of Objective II for each of the highway traffic control measures, in fulfilment of Objectives VII and VIII. Thereafter, a case study of a specific instance of the highway traffic control problem is conducted in fulfilment of Objective IX. In this case study, the aforementioned reinforcement learning implementations for the existing and novel highway traffic control measures are put to the test in the context of a realistic traffic data set, for a specified stretch of highway. This is again followed by a thorough statistical comparison of the relative performances of the various algorithmic implementations in fulfilment of Objectives X and XI. Finally, after having conducted a critical evaluation of the relative performances of the reinforcement learning algorithms in the context of the existing and novel highway traffic control measures, a summary is presented of what has been achieved in the dissertation, and suitable follow-up work and possible improvements are suggested for pursuit in the future, in fulfilment of Objective XII.

1.6 Dissertation Organisation

Apart from this introductory chapter, this dissertation consists of a further thirteen chapters, partitioned into four distinct parts. The first part, comprising Chapters 2–4, contains a literature review of material relevant to the work in this dissertation. More specifically, Chapter 2 is devoted to a literature review of machine learning, with a particular focus on reinforcement learning and a variety of solution techniques for the reinforcement learning problem. In Chapter 3, the focus shifts to the existing literature on highway traffic control measures, such as ramp metering and variable speed limits, and how machine learning has been implemented in these contexts. Furthermore, the existing literature on the implementation of autonomous vehicles for controlling highway traffic flow is reviewed. Part I is concluded in Chapter 4 with a comprehensive review of the literature pertaining to the principles of computer simulation with a particular focus on traffic simulation modelling.

The second part of the dissertation, comprising Chapters 5–10, is concerned with the development and implementation of the benchmark and case study microscopic highway traffic simulation models, as well as the implementation of the various reinforcement learning algorithms for the existing highway traffic control measures within the context of the benchmark and case study simulation models. In Chapter 5, a detailed description is provided of the simulation environment which acts as a testbed for the evaluation of the effectiveness of the machine learning algorithms. The implementation of the reinforcement learning algorithms is documented in Chapter 6 within the context of ramp metering, while a similar description of the implementation of reinforcement learning for solving the variable speed limit problem is provided in Chapter 7. Thereafter, a multi-agent approach to solving the ramp metering and variable speed limit problems simultaneously is presented in Chapter 8. Chapter 9 contains a description of the microscopic agent-based traffic simulation model developed for the purpose of the real-world case study. The ability of the reinforcement learning algorithms to identify high-quality highway traffic control policies in a real-world scenario is evaluated in Chapter 10, where a statistical evaluation of the relative algorithmic performances is performed.

The third part of the dissertation, comprising Chapters 11 and 12, is devoted to the development, implementation and evaluation of a novel highway traffic control measure. The focus thus shifts from existing technologies to future technologies involving fully autonomous vehicles. The concepts on which the novel highway traffic control measure is based, its formulation as a reinforcement learning problem and the solution by reinforcement learning algorithms within the context of the benchmark simulation model are detailed in Chapter 11. A statistical performance comparison of the novel highway traffic control measure with the best-performing existing highway traffic control measures is furthermore conducted. The ability of the reinforcement learning algorithms to identify high-quality highway traffic control policies within the context of the novel highway traffic control measure in a real-world scenario is evaluated in Chapter 12, where a sta-

Part I

Literature Review

CHAPTER 2 Machine Learning

Contents

2.1 Machine Learning in General
2.2 Reinforcement Learning
2.2.1 Evaluative Feedback
2.2.2 The Reinforcement Learning Problem
2.2.3 Reinforcement Learning Solution Approaches
2.3 Reinforcement Learning with Function Approximation
2.3.1 k-Nearest Neighbours Weighted Average
2.3.2 Multi-layer Perceptron Neural Networks
2.4 Chapter Summary

This chapter serves as an introduction to the field of machine learning, with a specific focus on reinforcement learning. In §2.1, the notion of machine learning is described in general and the different machine learning paradigms are discussed. Thereafter, the focus shifts in §2.2 to reinforcement learning in particular, introducing the key concepts of the reinforcement learning problem, together with a number of common solution approaches for this problem. In §2.3, two function approximation methodologies are reviewed which may be employed in order to extend the applicability of reinforcement learning to problems with continuous state and action spaces. The chapter finally closes in §2.4 with a brief summary of the material included.

2.1 Machine Learning in General

A scientific field is often best defined by the central question studied. Mitchell [103] states the central question of machine learning as follows:

“How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?”

This central question covers a broad range of learning tasks, such as how to mine medical data records in order to determine which patients are likely to respond best to which treatments, how to design autonomous robots that are capable of navigating based on their own past experience, and how to build search engines that automatically take a user’s needs into account and then customise themselves accordingly. More specifically, Mitchell [102] states that a machine is said to learn with respect to a particular class of tasks T and a performance measure P, if it reliably improves its performance P at tasks in T following a gain in experience. Naturally then, three features have to be defined in order to have a well-defined learning problem: the class of tasks, the measure of performance which is to be improved upon, and the source of experience. Marsland [93] classifies machine learning algorithms into four categories, according to the manner in which these algorithms find answers:

Supervised learning. In supervised learning, the algorithm is provided with a training set of examples for which the correct responses (targets) are known. Then, based on this training set, the algorithm aims to generalise in order to respond correctly to all possible inputs. This is also sometimes called learning from exemplars.

Unsupervised learning. In unsupervised learning, the correct responses are not known beforehand, but the algorithm aims to identify similarities between various inputs, such that those inputs which have something in common can be categorised together.

Reinforcement learning. Reinforcement learning falls somewhere between supervised and unsupervised learning, since the algorithm receives a signal if the answer is incorrect, but does not receive an indication as to how to correct it. The algorithm therefore learns by trial-and-error until the best answer is found. Reinforcement learning is sometimes referred to as learning with a critic because of this monitor which associates a score with each answer, but provides no suggestions as to how to improve it.

Evolutionary learning. Biological evolution may be interpreted as a learning process: biological organisms adapt in order to improve their own survival rate and the chance to produce offspring in their environment. This process is replicated in evolutionary learning, where each answer (or set of answers) is associated with a level of fitness, which provides an indication as to how good the current solution is.
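The notion of learning with a critic described above may be sketched as follows. In this minimal Python example a learner repeatedly chooses one of three answers, receives only a score for the answer chosen, and keeps a running average of the scores per answer, occasionally trying a random answer (an epsilon-greedy rule). The scores, the function name and all parameter values are hypothetical and serve only to illustrate trial-and-error learning from evaluative feedback; they are not the algorithms implemented in this dissertation.

```python
import random


def learn_with_critic(scores, steps=1000, epsilon=0.2, seed=0):
    """Estimate the score of each answer by trial and error.

    The critic reveals only the score of the chosen answer,
    never which answer would have been better.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(scores)
    counts = [0] * len(scores)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an answer at random.
            action = rng.randrange(len(scores))
        else:
            # Exploit: choose the answer with the best estimate so far.
            action = max(range(len(scores)), key=lambda a: estimates[a])
        reward = scores[action]  # the critic's score for this answer
        counts[action] += 1
        # Incremental sample average of the scores seen for this answer.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


HYPOTHETICAL_SCORES = [0.2, 0.8, 0.5]  # illustrative values only
estimates = learn_with_critic(HYPOTHETICAL_SCORES)
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The scores are deterministic here to keep the sketch short; with noisy scores the same running average would converge more slowly towards the mean score of each answer.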

Given the online nature of the highway traffic control problem, the potentially large number of variables that need to be taken into account, and the fact that until now no perfect control method, or combination of methods, has been found (which significantly complicates the learning process within the paradigm of supervised learning), it is reinforcement learning that has drawn the attention of the author for further investigation and implementation in this dissertation. This is due to the expectation that the performance measures of §1.3 are easily defined and measured within a simulation environment, which allows them to be translated into effective reward functions in order to provide high-quality feedback to a learning agent, thus allowing the performance of different control policies found during a trial-and-error search to be evaluated accurately in search of near-optimal policies.
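To illustrate how a performance measure observed in a simulation may be translated into a reward, the following Python sketch rewards a learning agent with the negative of the total time spent by vehicles in the network during one control interval. The choice of performance measure, the function name and its parameters are assumptions made for illustration only; the reward functions actually employed are not specified in this chapter.

```python
def travel_time_reward(vehicles_in_network, step_seconds):
    """Reward for one control interval, in negative vehicle-hours.

    Fewer vehicles lingering in the network over the interval means
    less total time spent, and therefore a larger (less negative) reward.
    """
    return -(vehicles_in_network * step_seconds) / 3600.0


# 120 vehicles present throughout a 30-second interval cost one vehicle-hour.
congested = travel_time_reward(vehicles_in_network=120, step_seconds=30)
free_flowing = travel_time_reward(vehicles_in_network=60, step_seconds=30)
```

Because the reward is computed directly from quantities the simulation already tracks, every policy tried during the trial-and-error search can be scored consistently.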
