Model Framework - Reinforcement learning for the control of traffic flow on highways

5.1.2 The Benchmark Model
5.1.3 The Generation of Vehicles
5.1.4 Model Output Data
5.2 Model Verification and Validation
5.2.1 Verification of the Traffic Simulation Model
5.2.2 Validation of the Traffic Simulation Model
5.3 Experimental Design
5.3.1 The Simulation Warm-up Period
5.3.2 General Specifications of the Simulation Framework
5.3.3 Types of Statistical Analysis to be Performed on Model Output Data
5.4 Chapter Summary

This chapter is devoted to a detailed description of the microscopic (agent-based) traffic simulation model designed and implemented as a test-bed environment for the experiments conducted in this dissertation. The simulation model was implemented in the AnyLogic 7.3.5 [5] software suite, making specific use of its built-in Road Traffic Library. The chapter opens in §5.1 with a detailed description of the modelling framework, with a specific focus on the process of building the road network, implementing highway intersections, and generating individual vehicles as well as recording model output data. This is followed by a description of the verification and validation techniques employed throughout the model building process in §5.2. Thereafter, the experimental design employed in later chapters of the dissertation for the purpose of comparing highway traffic control policies is described in §5.3. Finally, the chapter closes in §5.4 with a brief summary of the work included in the chapter.

5.1 Model Framework

An agent-based, microscopic highway traffic simulation model was designed and implemented as a test-bed environment for the evaluation of the effectiveness of various highway traffic control measures under the guidance of policies provided by reinforcement learning algorithms.

This model has been built in such a manner that it is able to represent a section of highway, together with intersections, consisting of both on- and off-ramps, with sufficient accuracy so as to be able to conduct a thorough evaluation of the effectiveness of highway control policies proposed by reinforcement learning agents. The simulation model developed is stochastic in nature, since Monte Carlo methods are utilised, and Poisson, exponential and uniform distributions are employed when attributes are assigned to the various model entities. Furthermore, the model is continuous as well as dynamic, as its state variables are updated continually throughout model execution.

The static entities of the simulation model comprise road mark-up elements. These entities are roads, intersections, traffic signals, and stop lines. The only dynamic entities in the simulation model are vehicles, as they are the only entities that physically move within the simulated environment during execution of the simulation model. The traffic signals implemented in order to enforce ramp metering at on-ramps are a special type of entity, since they may also be classified as a resource, allocating green time to vehicles and thereby controlling the vehicle flow.

Each of the aforementioned entities possesses a number of unique attributes. For vehicles, these attributes include speed, acceleration, deceleration, colour, length, arrival rate, arrival location, destination, position, as well as travel time and distance travelled. Some of these attributes are assigned random values through the use of built-in probability distributions. The attributes unique to road segments include length and the number of lanes in each direction, while intersection attributes include the roads connected by the intersection, as well as the manoeuvres that the vehicles are allowed to perform as they pass through the intersection. The current phase, the elapsed time during the current phase, the time remaining for the current phase, as well as phase durations and sequences are the attributes specific to each traffic signal. Finally, the attributes associated with a stop line include its position along a specific road segment, as well as the type of traffic sign associated with the stop line. In the case of a speed limit sign, the value of the speed limit is also an attribute of the stop line.

The events occurring during the execution of the simulation model may either be endogenous (internal) or exogenous (external). Endogenous events include vehicle manoeuvres, the changing of traffic signal phases, and changes in vehicle speeds, while exogenous events include vehicle arrivals into the system and vehicle exits from the system.

5.1.1 Constructing the Road Network

One of the most important aspects to consider when constructing any simulation model is the requirement that the model has to be an accurate representation of the real-world system. In the case of a traffic simulation model, the road network implemented within the simulation model should accurately represent the corresponding real-world network. In order to facilitate the accurate construction of road networks in terms of scale and shape, AnyLogic [5] offers a built-in geographic information system (GIS) function which allows access to the OpenStreetMap (OSM) [108] server. The OSM server provides a readily available global map of road networks. Within the so-called gisMap in AnyLogic, specific gisPoints may then be specified, and routes between these points may be generated, based on existing roads between the two points. An example of this is shown in Figure 5.1. These GIS routes may then be converted to road mark-up elements such as roads and intersections, which form part of AnyLogic’s built-in Road Traffic Library. The advantage of employing this approach is that the scale and underlying shape of the automatically generated road sections are an accurate representation of the real-world equivalent. Alternatively, the user may manually trace the road network over an image of a map. When this approach is adopted, however, the choice of the appropriate scale is of primary importance,

Figure 5.1: A screenshot illustrating the GIS routing capabilities within AnyLogic, with an automatically generated route (indicated by the black, dashed line) between Cape Town and Durbanville in the Western Cape province of South Africa.

Roads are arguably the most important components of the space mark-up of a road network. Within the AnyLogic Road Traffic Library, such roads may comprise straight or curved segments and possess a number of properties as specified by the user, including whether the road is a one-way or two-way road, the number of lanes in the “forward” direction as well as the number of lanes in the “backward” direction. Lanes in opposite directions are separated using a so-called lane divider of a user-specified width. Roads also allow the user to access the number of vehicles, as well as a list of the individual vehicles travelling on a specific road section at any point in time during execution of the simulation model. Through the use of this list, attributes specific to the vehicles may then be accessed and varied. Certain properties are applicable not only to individual roads, but also to the entire road network. These properties are the traffic flow direction, lane width and the road appearance in the visualisation animation of the simulation model.

Intersections are employed to connect various sections of road to one another. This may include intersections that control traffic flows from multiple directions, the gradual increase from an n-lane road into an m-lane road where m > n, or the gradual decrease from an n-lane road to an m-lane road where m < n. The movement of vehicles through an intersection is governed by so-called lane connectors which specify paths which may be followed by the vehicles as they travel through the intersection.

Stop lines are another method of controlling traffic flow within the simulation model. Stop lines may be placed at any location along a section of road. These entities may be employed in order to introduce road signs, thereby enforcing traffic rules such as the indication of a stop street, a speed limit, the end of the scope of a speed limit, or a yield sign. Finally, stop lines may be used to trigger a specific portion of code which is to be executed every time a vehicle passes over the stop line.

5.1.2 The Benchmark Model

In order to demonstrate the working of the process of reinforcement learning as applied to the highway traffic control problem, and to evaluate the performance of the policies proposed by a reinforcement learning agent, a simple benchmark simulation network is introduced in this section. This benchmark network consists of a hypothetical highway section following the general layout shown in Figure 5.2.

Figure 5.2: The benchmark highway network considered in this study.

As may be seen in the figure, the network has two demand nodes, denoted by O1 and O2, which occur in the mainline and at a single on-ramp, respectively. The stretch of highway before the on-ramp consists of four sections, denoted by S1.1–S1.4, which are all 1 km in length. After the on-ramp there are two further 1 km sections of highway, denoted by S2.1 and S2.2, which lead to a single destination node, denoted by D1. All highway sections have two lanes in the forward direction, while the on-ramp has only a single lane joining into the highway stream.
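The layout described above may be captured in a small data structure. The following is a minimal Python sketch only and does not form part of the AnyLogic model; the names Section, MAINLINE and route_sections are hypothetical, and the length of the on-ramp is omitted because it is not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One highway section of the benchmark network (illustrative only)."""
    name: str
    length_km: float
    lanes: int


# Four 1 km sections before the on-ramp and two 1 km sections after it,
# all with two lanes in the forward direction, as described in the text.
MAINLINE = (
    [Section(f"S1.{i}", 1.0, 2) for i in range(1, 5)]
    + [Section(f"S2.{i}", 1.0, 2) for i in range(1, 3)]
)
ON_RAMP_LANES = 1  # the on-ramp joins the highway between S1.4 and S2.1


def route_sections(origin):
    """Return the highway sections traversed from a demand node to D1."""
    if origin == "O1":  # mainline demand traverses every section
        return list(MAINLINE)
    if origin == "O2":  # on-ramp demand joins the highway at S2.1
        return [s for s in MAINLINE if s.name.startswith("S2.")]
    raise ValueError(f"unknown demand node: {origin}")


for origin in ("O1", "O2"):
    sections = route_sections(origin)
    print(origin, [s.name for s in sections], sum(s.length_km for s in sections))
```

Under these assumptions, a vehicle entering at O1 travels 6 km of highway before reaching D1, while a vehicle entering at O2 travels 2 km of highway after leaving the on-ramp.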

A more detailed representation of the on-ramp implementation in the benchmark network is given in Figure 5.3. As may be seen in the figure, the vehicles entering the main stream from the on-ramp are given a lateral lane space of 110 metres in order to join the traffic flow on the highway. StopLine1 and StopLine3, positioned as indicated in the figure, are used to display speed limits, while StopLine2 is used for the placement of a traffic signal in the case where ramp metering is applied. StopLine4 is employed so as to display a warning sign regarding the lane merge ahead.

Figure 5.3: A screenshot from the simulation environment showing the highway intersection as the on-ramp joins the highway in the benchmark network. The direction of travel is from left to right.

join the highway traffic flow, rather than specifying the exact point at which the vehicles should enter the highway traffic flow.

5.1.3 The Generation of Vehicles

Vehicles are generated and removed from a simulation run by means of a number of state chart blocks included in the Process Modelling and Road Traffic Libraries. These blocks include a source block, which is used to generate vehicles, a queue block, which acts as a buffer in the case where vehicles that have been generated have to wait before entering the simulated road network (such as when congestion spill back reaches past the boundaries of the simulated environment), a carEnterRoadNetwork block, where vehicle attributes are specified, a carMoveTo block, which is used to define the destination of vehicles, and finally a carDispose block, which removes vehicles from the simulation once they have reached their destination. An example of such a configuration, specifically for the benchmark network described in §5.1.2, is given in Figure 5.4.

Figure 5.4: A number of connected blocks in the simulation model for the benchmark network, indicating that 104 vehicles have been generated at O1, none of which are waiting in a queue to enter the simulated road network. Similarly, 31 vehicles have been generated at O2, which have, again, all entered the road network. A total of 126 vehicles remain within the simulation, while 9 vehicles have reached their destination and have thus been removed from the network.

In the situation where the entry point to the road network has multiple lanes in the forward direction (such as at the demand point O1), the vehicles generated may enter the network in either a user-defined or randomly allocated lane. If, however, the entry road consists of a single lane, the vehicle will appear in the single lane once it enters the road network.

Vehicle generation may be performed in a number of different ways. Vehicles may be generated according to an arrival rate following a Poisson distribution with an input mean corresponding to the desired traffic volume. Alternatively, the desired vehicle interarrival times may be specified either explicitly or through the use of a suitable probability distribution (e.g. an exponential distribution, a normal distribution or a uniform distribution). Finally, vehicle arrivals may take place according to a deterministic arrival schedule, in which case the vehicles are generated at exact times following a user-specified schedule, or vehicles may be generated by calling a so-called vehicle inject function.
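The relationship between Poisson arrivals and exponentially distributed interarrival times may be illustrated outside of AnyLogic. The following Python sketch is an illustration only and is not the AnyLogic source block; the function names and the demand values of 3000 and 900 vehicles per hour are hypothetical assumptions. It relies on the standard result that a Poisson arrival process with a given rate has exponentially distributed interarrival times with mean equal to the reciprocal of that rate.

```python
import random


def poisson_arrival_times(rate_per_hour, horizon_s, rng):
    """Arrival times (s) of a Poisson process over [0, horizon_s).

    The process is built by accumulating exponential interarrival gaps.
    """
    if rate_per_hour <= 0:
        return []
    rate_per_s = rate_per_hour / 3600.0
    times = []
    t = rng.expovariate(rate_per_s)
    while t < horizon_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def scheduled_arrival_times(schedule_s, horizon_s):
    """Deterministic arrivals: the user-specified times inside the horizon."""
    return sorted(t for t in schedule_s if 0.0 <= t < horizon_s)


rng = random.Random(42)  # fixed seed so that a run is reproducible
mainline = poisson_arrival_times(3000, 3600.0, rng)  # hypothetical demand at O1
ramp = poisson_arrival_times(900, 3600.0, rng)       # hypothetical demand at O2
print(len(mainline), len(ramp))
```

The number of vehicles generated in one hour varies from run to run around the specified mean, which is one source of the stochasticity of the model.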

Once a vehicle has been generated, and there is sufficient space available on the road network at the arrival location to accommodate the vehicle, a number of attributes are simultaneously assigned to it. Among these attributes are its length, initial speed, preferred speed, maximum acceleration and deceleration, as well as its entry point in the simulation model. These attributes are assigned to the vehicle when it passes through the carEnterRoadNetwork block. If, however, sufficient space to enter the road network is not available, the vehicle waits in the queue block until space becomes available, at which point the vehicle may enter the road network. The destination of the vehicle is only assigned to it once it reaches the carMoveTo block.
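The order of events described above (waiting in the queue, entering when space allows, and only thereafter receiving a destination) may be sketched as follows. This is a simplified Python illustration and not the AnyLogic implementation; the Vehicle class, the functions try_enter and assign_destination, the vehicle length of 5 metres and the sequence of free-space values are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vehicle:
    vehicle_id: int
    length_m: float
    entered: bool = False
    destination: Optional[str] = None


def try_enter(queue, free_space_m):
    """Move the head-of-queue vehicle onto the road if it fits.

    Returns the vehicle that entered, or None if it must keep waiting.
    """
    if queue and free_space_m >= queue[0].length_m:
        vehicle = queue.popleft()
        vehicle.entered = True  # stands in for attribute assignment on entry
        return vehicle
    return None


def assign_destination(vehicle, destination):
    """Stands in for the destination assignment made at the move-to step."""
    vehicle.destination = destination
    return vehicle


queue = deque(Vehicle(i, length_m=5.0) for i in range(3))
released = []
for free_space_m in [2.0, 10.0, 3.0, 12.0]:  # hypothetical space at the entry
    vehicle = try_enter(queue, free_space_m)
    if vehicle is not None:
        released.append(assign_destination(vehicle, "D1"))
print([v.vehicle_id for v in released], len(queue))
```

Vehicles leave the queue in first-in-first-out order, and a vehicle that has not yet entered the network has no destination assigned to it.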

In the simulation environment, all vehicles obey all traffic laws. As a result, the model is unable to account for vehicles that perform illegal manoeuvres such as running red signals or exceeding the imposed speed limit. Furthermore, vehicles maintain a suitable following distance which is stochastically calculated based on the vehicles’ deceleration abilities. The vehicle following distance is, however, always of such a magnitude that if both vehicles were to decelerate at the maximum deceleration rate, a collision of the vehicles would be avoided. The gaps between stationary vehicles are uniformly distributed distances ranging from 1 to 3 metres in length.
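The collision-avoidance condition may be illustrated by a simple kinematic bound. The following Python sketch is not the car-following rule implemented in AnyLogic, which is calculated stochastically and is not specified here. It assumes, for simplicity, that both vehicles begin braking at the same instant and share the same constant maximum deceleration, in which case the braking distance of a vehicle travelling at speed v is v²/(2b). The function names are hypothetical.

```python
import random


def min_safe_gap_m(v_follower, v_leader, max_decel):
    """Smallest gap (m) that avoids a collision under the stated assumptions.

    Both vehicles brake simultaneously at the same rate max_decel (m/s^2);
    speeds are given in m/s. The gap must cover the difference between the
    two braking distances.
    """
    if max_decel <= 0:
        raise ValueError("max_decel must be positive")
    return max(0.0, (v_follower**2 - v_leader**2) / (2.0 * max_decel))


def stationary_gap_m(rng):
    """Gap between stationary vehicles, uniform on [1, 3] metres."""
    return rng.uniform(1.0, 3.0)


rng = random.Random(0)
print(min_safe_gap_m(30.0, 20.0, 5.0))  # follower faster than leader
print(min_safe_gap_m(20.0, 30.0, 5.0))  # follower slower: no extra gap needed
print(stationary_gap_m(rng))
```

If the two vehicles had different maximum deceleration rates, comparing braking distances alone would no longer be sufficient, since the minimum gap may then occur before either vehicle has come to a standstill.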

5.1.4 Model Output Data

Performance data recorded throughout the execution of each simulation run are saved and written to an Excel file at the end of each simulation run. These data may be partitioned into three major classes of performance measure indicators (PMIs), based on which the relative performance of the different control policies, as determined by the various reinforcement learning algorithms, may be evaluated.

The first of these PMIs is the total time spent in the system by the vehicles (TTS), which is simply the sum total of the times spent in the system by all vehicles. This PMI is then broken down into two further PMIs, namely the total time spent in the system by vehicles travelling along the highway (TTSHW) only, and the total time spent in the system by vehicles that join the network from the on-ramp (TTSOR). The reason for this breakdown is that it is expected that there may be an increase in the total time spent in the system by vehicles that join the network from the on-ramp due to ramp metering, which may not be reflected sufficiently in the single total time in the system measure.

The second of the PMI classes is the mean vehicle travel time. This is again broken down into the mean travel time of vehicles travelling along the highway (TISHW) only, and the mean travel time of vehicles joining the highway from the on-ramp (TISOR). During the data collection process, the maximum travel time achieved by a vehicle travelling along the highway only, as well as the maximum travel time of a vehicle joining the highway from the on-ramp, is also recorded. This is due to the fact that road users may not only be interested in how long it would take them to travel the same distance on average, but also what their travel time would be in a worst-case scenario. These values constitute the third PMI class.

For all of the aforementioned output data generated by the simulation model, further information is also recorded in addition to the explicit values taken as PMIs. These include the corresponding minimum values, maximum values, standard deviations and confidence intervals, as well as the number of sample points included in these calculations.
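The calculation of these PMIs from a list of recorded vehicle travel times may be sketched as follows. This Python sketch is illustrative only and is not the data recording code of the simulation model; the record format, the function names and the sample values are hypothetical. The confidence interval shown is a normal-approximation interval at the 95% level, which is an assumption, as the method used to compute the recorded confidence intervals is not stated in this section.

```python
import math
import statistics


def summarise(times_s, z=1.96):
    """Summary statistics for a list of travel times (s).

    The confidence interval uses a normal approximation (an assumption).
    """
    n = len(times_s)
    mean = statistics.fmean(times_s)
    std = statistics.stdev(times_s) if n > 1 else 0.0
    half_width = z * std / math.sqrt(n)
    return {
        "n": n,
        "total": sum(times_s),
        "mean": mean,
        "min": min(times_s),
        "max": max(times_s),
        "std": std,
        "ci": (mean - half_width, mean + half_width),
    }


def compute_pmis(records):
    """records: list of (origin, travel_time_s) pairs, origin in {"O1", "O2"}."""
    highway = [t for origin, t in records if origin == "O1"]
    on_ramp = [t for origin, t in records if origin == "O2"]
    return {
        "TTS": sum(highway) + sum(on_ramp),  # total time spent in the system
        "highway": summarise(highway),       # vehicles travelling along the highway only
        "on_ramp": summarise(on_ramp),       # vehicles joining from the on-ramp
    }


# Hypothetical travel times, in seconds, for five vehicles.
records = [("O1", 200.0), ("O1", 220.0), ("O1", 240.0), ("O2", 260.0), ("O2", 300.0)]
pmis = compute_pmis(records)
print(pmis["TTS"], pmis["highway"]["mean"], pmis["on_ramp"]["max"])
```

The "total", "mean" and "max" entries of each group correspond to the three PMI classes, while the remaining entries correspond to the additional information recorded alongside them.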
