Reinforcement Learning for Variable Speed Limits

3.4 Machine Learning in Highway Traffic Control

3.4.2 Reinforcement Learning for Variable Speed Limits

One of the first applications of reinforcement learning to the VSL problem is due to Zhu and Ukkusuri [179]. In their formulation of the VSL problem as an MDP, the state space is characterised and discretised according to the level of congestion. Four such levels of congestion are defined as follows:

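\[
u_{\text{limit}} =
\begin{cases}
1 & \text{if } 0 < \rho_i(t) \le 0.25\,\rho_{\text{jam}}, \\
2 & \text{if } 0.25\,\rho_{\text{jam}} < \rho_i(t) \le 0.5\,\rho_{\text{jam}}, \\
3 & \text{if } 0.5\,\rho_{\text{jam}} < \rho_i(t) \le 0.75\,\rho_{\text{jam}}, \\
4 & \text{if } 0.75\,\rho_{\text{jam}} < \rho_i(t) \le \rho_{\text{jam}},
\end{cases}
\tag{3.37}
\]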

where level 1 is characteristic of a free-flow state, level 2 is characteristic of a state of slight congestion, level 3 is characteristic of a state of moderate congestion, and finally, level 4 is characteristic of a state of heavy congestion. The action space comprises a discretised function of speed limits which may be employed by the learning agent on each controlled link i, given by

\[
V_i(t) = V_0 + a_i(t)\,I, \tag{3.38}
\]

where a_i(t) ∈ {1, 2, . . . , A} denotes the action selected from the A actions available to the learning agent on controlled link i, I denotes the increment between consecutive speed limits, V_0 denotes the minimum allowable speed limit, and V_0 + AI denotes the maximum allowable speed limit. The objective to be minimised in this study is the total travel time spent by vehicles in the system. As in the formulation of Rezaee et al. [132], the reward function employed in order to achieve this goal of minimising the total travel time is defined as

where N(i) represents the average number of vehicles present on the controlled link i during the current control period. This formulation of the reinforcement learning problem was employed in a link-based dynamic network loading model, which is a second-order macroscopic traffic simulation model. The RMART reinforcement learning algorithm (Algorithm 2.5) was employed as the solution technique for the reinforcement learning problem.
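The state discretisation of (3.37) and the action-to-speed-limit mapping of (3.38) may be sketched as follows. This is only an illustrative sketch and not the implementation of Zhu and Ukkusuri [179]; the function names, argument names and the numerical values in the usage note are assumptions made here for illustration.

```python
import bisect

# Upper bounds of congestion levels 1-3 as fractions of the jam density (Eq. 3.37).
LEVEL_BOUNDS = (0.25, 0.5, 0.75)


def congestion_level(density, jam_density):
    """Return the congestion level (1-4) of a link according to Eq. (3.37)."""
    if not 0 < density <= jam_density:
        raise ValueError("density must lie in (0, jam_density]")
    ratio = density / jam_density
    # bisect_left keeps a ratio equal to a bound in the lower level,
    # matching the "<=" on the right-hand side of each interval.
    return bisect.bisect_left(LEVEL_BOUNDS, ratio) + 1


def speed_limit(action, v_min, increment, num_actions):
    """Return the speed limit V_0 + a * I of Eq. (3.38) for action a."""
    if not 1 <= action <= num_actions:
        raise ValueError("action must lie in {1, ..., num_actions}")
    return v_min + action * increment
```

For example, with an assumed jam density of 180 vehicles per unit length, a density of 45 falls in level 1 and a density of 46 falls in level 2.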

Walraven et al. [166] demonstrated another application of reinforcement learning to the VSL problem, once again using METANET as the underlying macroscopic traffic modelling tool. The state space is again defined in such a manner as to provide a representation of the current traffic flow conditions on the highway. The state of the highway is given by

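\[
s_t = \left( \frac{a_{t-1}}{u_f},\; s_{t-1}(0),\; \frac{u_1(t)}{u_f}, \ldots, \frac{u_N(t)}{u_f},\; \frac{\rho_1(t)}{\rho_{\text{jam}}}, \ldots, \frac{\rho_N(t)}{\rho_{\text{jam}}} \right), \tag{3.40}
\]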

where the first and second state variables represent the current and previous speed limits assigned to the highway [166]. The remaining state variables represent the current speeds and densities for the N highway sections during time period t. The current speed u_n and density ρ_n for each section n ∈ {1, . . . , N} are normalised with respect to the free-flow speed u_f and jam density ρ_jam, respectively. The speed and density information for all highway sections is included in order to allow the learning agent to detect an oncoming traffic jam in one of the sections under consideration. The action space A = {60, 80, 100, 120} contains a number of discrete speed limits (in km/h) which may be applied by the learning agent. In order to smooth the increase and decrease of the speed limits, this action space may also be defined in a state-specific manner, thus allowing only certain speed limits to be selected in relation to the currently applied speed limit. For example, if a current speed limit of 120 km/h is enforced, the action space may be reduced to A(s_t) = {80, 100}, thus allowing the agent to choose only between a new speed limit of 80 km/h or 100 km/h, excluding the speed limit of 60 km/h from the available action space [166]. Finally, the reward function employed is

\[
r_t =
\begin{cases}
0 & \text{if } \min\{u_i(t+1) \mid i = 1, \ldots, N\} > u, \\
-h(t, t+1) & \text{otherwise},
\end{cases}
\tag{3.41}
\]

where u is a pre-specified threshold speed, and h(t, t + 1) is a function denoting the number of vehicle hours accumulated during the time interval from time t to time t + 1. As may be seen from the definition of the reward function, the objective to be achieved by the learning agent is once again to minimise the total time spent in the system by the vehicles. In order to solve the reinforcement learning problem, Walraven et al. [166] employed Q-learning, with a neural network trained by means of the backpropagation algorithm serving as function approximator.
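The state vector of (3.40) and the reward of (3.41) may be sketched as follows. This is a minimal illustration and not the code of Walraven et al. [166]; the function and argument names are hypothetical, and the handling of the very first time step (where no previous state exists) is an assumption made here.

```python
def build_state(prev_limit, prev_state, speeds, densities,
                free_flow_speed, jam_density):
    """Assemble the normalised state vector of Eq. (3.40).

    prev_limit -- speed limit a_{t-1} applied during the previous step
    prev_state -- state vector s_{t-1}, or None at the first time step
    speeds     -- section speeds u_1(t), ..., u_N(t)
    densities  -- section densities rho_1(t), ..., rho_N(t)
    """
    limit_now = prev_limit / free_flow_speed
    # s_{t-1}(0) is the first entry of the previous state. Assumption:
    # at the first step the current value is simply repeated.
    limit_before = prev_state[0] if prev_state is not None else limit_now
    state = [limit_now, limit_before]
    state += [u / free_flow_speed for u in speeds]
    state += [rho / jam_density for rho in densities]
    return state


def reward(next_speeds, vehicle_hours, threshold_speed):
    """Reward of Eq. (3.41): zero while every section is above the
    threshold speed, otherwise the negated vehicle hours h(t, t + 1)."""
    if min(next_speeds) > threshold_speed:
        return 0.0
    return -vehicle_hours
```

The reward is therefore only non-zero once at least one section slows to the threshold speed or below, so that the agent is penalised in proportion to the time vehicles spend in congested conditions.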

Another implementation of Q-learning for solving the VSL control problem is due to Li et al. [85]. The same basic highway network, consisting of a dual carriageway with a single on-ramp, as previously employed by Hegyi et al. [53] and Müller et al. [105], was implemented within a macroscopic cell transmission simulation model. The state space comprised three variables, namely the density upstream of the bottleneck location at the lane merge, the density directly downstream of the lane merge and the traffic density at the on-ramp. This state space was discretised in intervals of magnitude 5, between 5 and 80 vehicles/mile/lane on the mainline, while intervals of magnitude 3, between 3 and 30 vehicles/mile/lane, were employed for the on-ramp density. This discretisation was employed because a table-based implementation of Q-learning was adopted [85]. The action space consisted of three actions: to reduce the current speed limit by 5 miles per hour, to maintain the current speed limit, or to increase the speed limit by 5 miles per hour. As a result, the speed limit was adjusted incrementally in order to avoid introducing major disturbances in the traffic

where R(s) denotes the reward achieved when in state s, µ denotes the parameter used to scale the magnitude of the reward, and λ is the Poisson parameter. The value of µ was taken as 1 × 10^4, while the parameter λ was set to the critical density at the bottleneck. In order to increase the convergence speed of the Q-learning algorithm, an additional incentive of 200 was added to the reward function when the agent found itself in the two states closest to the critical density, while a penalty of 400 was subtracted for severely congested states (i.e. those states with a bottleneck density above 40 veh/mile/ln). This implementation was finally evaluated in the context of a real-world case study involving a section of the Interstate 880 highway in California.
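A table-based Q-learning scheme over a discretised state space of this kind may be sketched as follows. This is a generic illustration of tabular Q-learning and not the implementation of Li et al. [85]; the binning convention, the learning rate alpha, the discount factor gamma and all function names are assumptions made here, and the reward is simply passed in as a number.

```python
from collections import defaultdict

# Changes to the current speed limit (in miles per hour).
ACTIONS = (-5, 0, 5)


def discretise(value, low, high, width):
    """Index of the interval of the given width containing value,
    after clipping value to the range [low, high]."""
    clipped = min(max(value, low), high)
    return int((clipped - low) // width)


def encode_state(upstream, downstream, ramp):
    """Map the three densities (veh/mile/lane) to a discrete state."""
    return (
        discretise(upstream, 5, 80, 5),    # mainline, upstream of the merge
        discretise(downstream, 5, 80, 5),  # mainline, downstream of the merge
        discretise(ramp, 3, 30, 3),        # on-ramp
    )


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update to the table q and
    return the updated entry."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]
```

The table may be initialised as `q = defaultdict(float)`, so that unseen state-action pairs default to a value of zero. Because each entry of the table corresponds to one discrete state-action pair, the coarseness of the intervals directly determines the size of the table.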

3.5 Chapter Summary

This chapter contained reviews of traffic flow theory and specific highway traffic control measures. In §3.1, the two basic traffic flow modelling paradigms, namely macroscopic and microscopic traffic flow theory, as well as some of the basic notions within each of these paradigms, were introduced. Thereafter, the focus shifted in §3.2 to the control of traffic on a highway, with a review of RM as a means of controlling the number of vehicles allowed onto the highway in §3.2.1. Dynamic speed limits, which may be employed so as to control the flow of traffic already on the highway, were next reviewed in §3.2.2. In §3.2.3, the notion of LA was reviewed, which may be employed to improve the utilisation of the available space on the highway in the most efficient manner. This was followed in §3.3 by a review of various techniques which have been employed in order to improve the traffic flow along a highway in the presence of varying percentages of autonomous vehicles. Finally, applications of machine learning, and more specifically, reinforcement learning to these highway traffic control methodologies, were reviewed in §3.4.

CHAPTER 4 Computer Simulation Modelling

Contents

4.1 Simulation Modelling Concepts
4.2 Prevailing Simulation Modelling Paradigms
    4.2.1 Agent-based Modelling
    4.2.2 Discrete-event Modelling
    4.2.3 System Dynamics Modelling
    4.2.4 Dynamic Systems Modelling
4.3 Typical Steps in a Simulation Study
4.4 Verification and Validation of a Simulation Model
    4.4.1 Verification of a Simulation Model
    4.4.2 Validation of a Simulation Model
4.5 Some Advantages and Drawbacks of Simulation Modelling
4.6 Traffic Simulation Modelling Paradigms
    4.6.1 Macroscopic Traffic Simulation
    4.6.2 Microscopic Traffic Simulation
    4.6.3 Mesoscopic Traffic Simulation
4.7 Chapter Summary

This chapter serves as a brief introduction to the extensive field of computer simulation modelling. In §4.1, simulation modelling itself, as well as a few key concepts pertaining to simulation modelling, are defined. This is followed in §4.2 by an introduction to the four major simulation modelling paradigms found in the literature. Thereafter, twelve generic steps that are typically followed during the completion of a simulation study are briefly discussed in §4.3. In §4.4, more detail is provided on some of the various methods suggested in the literature for verification and validation of a simulation model. Before the three currently prevailing traffic simulation modelling paradigms are introduced in §4.6, some of the advantages and disadvantages of simulation modelling are mentioned in §4.5. The chapter finally closes in §4.7 with a brief summary of the material included.

4.1 Simulation Modelling Concepts

Several interpretations and definitions of the notion of simulation have been proposed in the literature. Perhaps most famously, Banks et al. [9] defined simulation as “the imitation of the operation of a real-world process over time.” Law and Kelton [81] defined simulation as “a broad collection of methods and applications to mimic the behaviour of real systems, usually on a computer with appropriate software.” Simulation may, as a result, be seen as a process of experimentation, using a model of a real-world system, with the aim of studying the behaviour of the underlying system, given certain starting conditions. In order to achieve this, the behaviour of the model has to be a sufficient predictor of the behaviour of the real-world system, so that specific “what-if” questions may be answered using the simulation model [119].

While several different modelling paradigms exist within the broader concept of simulation, there are a number of key concepts that are common to all of these paradigms, as they form the basis of most simulation models. The system, model, events, entities, attributes, activities, resources and system state variables are these common concepts on which the notion of a simulation model is built. This section serves as a brief introduction to these concepts.

A system may be defined as a set of interrelated objects, or entities, which cooperate in order to achieve a common goal [175]. A model was defined by Shannon [147] as the representation of an entity/object in a form other than itself. This representation usually comes paired with a number of assumptions, and is used in order to predict the behaviour of the real-world system under various conditions.

System state variables are the collection of all the information required to sufficiently describe the current system status at any given point in time [10]. This collection of variables used to provide a snapshot description of the system is known as the system state [175]. In the case of a highway traffic simulation, for example, the state of the system may be defined according to the various traffic densities, speeds, and flows on the particular stretch of highway under consideration. Events are specific occurrences which have the potential to change the system state variables, and therefore also the state in which the system finds itself.

Entities are objects, such as persons or vehicles, which possess the ability to cause changes in the system state variables [63]. They may either be dynamic (i.e. possess the ability to move through a system), or they may be static (i.e. remain stationary and serve other entities in the system) [9, 10]. All entities possess a number of unique characteristics, called attributes, which are used to describe the performance, as well as the functions of these entities [10, 63]. Events are created by the interaction of entities with activities [9, 10, 63]. Activities are the processes and the logic which govern the execution of the simulation. Within the context of a simulation, there are three major types of activities, namely delays, queues and logic [63]. A resource is a special type of entity, which is typically of a static and capacity-restricted nature, and provides a service to other dynamic entities [10, 63]. Examples of resources are bank tellers, order windows or packaging machines. Resources may find themselves in one of a number of given states, such as idle, busy, blocked or failed.

Banks et al. [9] stated that simulation models themselves may be classified as being either static or dynamic, deterministic or stochastic, and discrete or continuous. A static simulation model, commonly referred to as a Monte Carlo simulation model, is a model that is independent of time, and thus only describes a system at a specific instant in time [9, 175]. A simulation of the process of rolling a die is a good example of a static simulation model. Dynamic simulation models, on the other hand, attempt to capture the behaviour of real-world systems as they evolve over time [175]. The simulation of a bank teller from the time when the bank opens at 09:00 until the bank closes at 15:30 is an example of a dynamic simulation model [9].
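The die-rolling example of a static simulation model may be sketched as follows. This is a minimal illustration only; the function name, the fixed seed and the choice of estimating the mean outcome are assumptions made here. Time plays no role in the model, since each roll is an independent sample.

```python
import random


def estimate_die_mean(num_rolls, seed=0):
    """Estimate the expected outcome of a fair six-sided die by
    averaging num_rolls independent simulated rolls."""
    rng = random.Random(seed)  # seeded for reproducibility
    total = sum(rng.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls
```

As the number of rolls increases, the estimate is expected to approach the theoretical mean of 3.5.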

A simulation model that is devoid of randomness is called a deterministic simulation model. As a result, such models assume certainty with respect to every aspect of the model. An example of a deterministic model would be that of a dentist’s office if all patients were to arrive at their

In discrete simulation modelling, all system state variables are updated at discrete or countable points in time. In contrast, the system state variables are updated continuously as time progresses in a continuous simulation model [175]. An example of a discrete simulation model is that describing the events in a banking hall, where customers arrive at specific points in time, whereas an example of a continuous simulation model is that of the temperature distribution within an engine block as the engine runs for an extended period of time.
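The banking hall example of a discrete simulation model may be sketched as follows for a single teller. This is an illustrative sketch only; the function name, the fixed service time and the use of a priority queue as event list are assumptions made here. The system state (whether the teller is busy, and the number of customers waiting) changes only at the arrival and departure events.

```python
import heapq


def simulate_teller(arrival_times, service_time):
    """Simulate a single teller serving customers in order of arrival.

    Returns a list of (time, event, customers_waiting) tuples, one for
    each event at which the system state is updated.
    """
    # Each event is (time, priority, kind). Departures have priority 0,
    # so that they are processed before arrivals at the same instant.
    events = [(t, 1, "arrival") for t in arrival_times]
    heapq.heapify(events)
    waiting = 0   # customers queueing for the teller
    busy = False  # state of the teller resource
    log = []
    while events:
        time, _, kind = heapq.heappop(events)
        if kind == "arrival":
            if busy:
                waiting += 1
            else:
                busy = True
                heapq.heappush(events, (time + service_time, 0, "departure"))
        else:
            if waiting:
                # The next customer in the queue starts service immediately.
                waiting -= 1
                heapq.heappush(events, (time + service_time, 0, "departure"))
            else:
                busy = False
        log.append((time, kind, waiting))
    return log
```

Between consecutive events nothing in the model changes, which is why the simulation clock may jump directly from one event time to the next.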
