6.2 From raw GPS data to spatial information
6.2.1 Tracking trucks in space and time
Since April 2016, each vehicle of more than 3.5 tons driving within Belgium has
to pay a road tax for the use of the 6,500 kilometres of road network. For each
truck, the tax is calculated according to the number of kilometres travelled. To this end, trucks have to be equipped with an On Board Unit (OBU) that calculates the value of the tax in real time (http://www.viapass.be/fr/). This
OBU is a GPS tracker used for taxation that sends a GPS point every 30 seconds while the truck is moving.
The positions of the trucks were recorded on every Belgian road for one week, from Monday 14 November 2016 to Sunday 20 November 2016; we chose to exclude the data collected during the weekend.
The dataset contains more than 799,000 distinct truck IDs, registered in
63 countries and emitting more than 270 million GPS points. Each of these GPS points is characterised by the ID of the truck, the coordinates of the GPS point, the time (timestamp) at which the GPS point was sent, the instantaneous velocity of the truck, the direction of the truck, the country where the truck is registered, the Eurovalue (a pollution class reference in Europe), and finally the MTM, which corresponds to the maximum weight that a given truck can carry. This Chapter focuses only on trucks that are “really” driving between
places: only trucks that emit more than 10 GPS points per day are
considered in this contribution. This choice excludes trucks that travel only very short distances, such as moves within a parking area, which are not relevant to our objective.
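The record structure and the two filters described above (weekend exclusion and the threshold of more than 10 GPS points per day) can be sketched as follows. This is a minimal illustration rather than the processing chain actually used: the field names, the units in the comments and the function name `filter_points` are hypothetical, and the sketch assumes that the threshold is applied per truck and per day.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GpsPoint:
    """One GPS record; field names are illustrative, not the dataset's own."""
    truck_id: str
    lon: float
    lat: float
    timestamp: datetime
    speed: float      # instantaneous velocity (unit assumed: km/h)
    heading: float    # direction of the truck (unit assumed: degrees)
    country: str      # country of registration
    euro_class: str   # Eurovalue (pollution class)
    mtm: float        # maximum weight (unit assumed: tonnes)


MIN_POINTS_PER_DAY = 10  # from the text: strictly more than 10 points per day


def filter_points(points):
    """Drop weekend points, then keep truck-days with more than 10 points."""
    # weekday() returns 0 for Monday ... 6 for Sunday
    weekday_points = [p for p in points if p.timestamp.weekday() < 5]
    # Count the points emitted by each truck on each calendar day
    counts = Counter((p.truck_id, p.timestamp.date()) for p in weekday_points)
    return [
        p for p in weekday_points
        if counts[(p.truck_id, p.timestamp.date())] > MIN_POINTS_PER_DAY
    ]
```

With 270 million points, such a filter would in practice run in a database or a dataframe library; the list-based version only makes the logic explicit.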
The data used here were initially collected for taxation purposes and hence not for scientific research. The main objective of this taxation was to internalise the negative externalities (pollution, road damage, congestion,
etc.) of the road transportation of goods. After some cleaning and
transformation steps, Belgium has at its disposal an unprecedented source of spatial and temporal information, available at the individual scale and covering the totality of truck movements. In terms of both quantity and quality, the data formerly available (samples from local counting points, surveys, etc.) were
more limited. Furthermore, local counting points located on specific road
segments only allow one-off measurements of the traffic (the flow of trucks circulating on this specific road) and cannot be generalised to a larger scale. No further information about the kinds of vehicles, the origin or the destination, or the goods transported was available. Similarly, surveys sacrifice the exhaustiveness (or worse, the representativeness)
of the phenomena because of temporal or financial constraints. Inevitably, a
sample of logistics companies has to be drawn, and the low response rate leads to very small samples from which it is difficult to
generalise (see Lombard, 1999, for the case of road transportation in
France). However, these questions of data quality and quantity are not fully addressed in this contribution, as explained in the following Section.
Analysing traces left by objects is not a novel approach: pedestrians,
cyclists and cabs have already been studied in spatio-temporal analyses in
tracking of trucks
2012; Thomopoulos et al., 2015). Other works explore the road freight system using different data sources: sensors were either placed on the road (counting areas, surveillance cameras) or installed on board (transponders, tachographs,
GPS trackers, smartphones, etc.). A synthesis is presented by Antoniou et al.
(2011). Research focused on real-time tracking is mainly based on data
provided by a single GPS system maker, or on a sample of transport
companies. These studies often address questions related to the efficiency of the
road network (Flaskou et al., 2015) within a specific geographical area: some
American states or emerging countries such as China or South Africa (Joubert and Meintjes, 2015; Kuppam et al., 2014; Ma et al., 2011).
The main finding of this literature review is the lack of converging criteria, thresholds or filtering operations for moving from a raw GPS data structure to constructed, reliable and usable geographical information. The common step in many publications is the identification of the trips and the origin-destination (OD) segments within
space and time. Shen and Stopher (2014) point out this step as a main
“challenge” because of the numerous issues linked with the loss of GPS signal or the amount of noise recorded in the database. To correctly determine how
the different trips made by a given truck are organised in space and time, Shen
and Stopher (2014) determine various thresholds (temporal or spatio-temporal) that split the succession of GPS points into distinct trips. Temporal thresholds varying from a few seconds to several minutes are cited without
any consensus (Shen and Batty, 2018; Thakur et al., 2015; Zanjani et al., 2015),
especially since their meaning depends on the geographical context, the objectives of the research, or the data structure (continuous sampling or not, variable time gaps, recording only while objects are moving or not, etc.). The limited attention paid to exhaustiveness and representativeness, as well as the absence of a validation process, are the main criticisms of these works.
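To make the notion of a temporal threshold concrete, the sketch below splits the timestamps of a single truck into trips whenever the gap between two consecutive GPS points exceeds a given value. It is an illustration only: the function name `split_into_trips` is hypothetical, and the default of 5 minutes is an arbitrary assumption, not a value taken from the cited works, which report no consensus.

```python
from datetime import datetime, timedelta


def split_into_trips(timestamps, max_gap=timedelta(minutes=5)):
    """Split one truck's timestamps into trips wherever the gap exceeds max_gap.

    max_gap is an assumed, illustrative threshold; it should be tuned to the
    geographical context, the research objective and the sampling scheme.
    """
    trips = []
    for ts in sorted(timestamps):
        if trips and ts - trips[-1][-1] <= max_gap:
            # Gap small enough: the point continues the current trip
            trips[-1].append(ts)
        else:
            # First point, or gap too large: a new trip starts here
            trips.append([ts])
    return trips


# Usage: three points 30 seconds apart, a pause of about 19 minutes, two more points
start = datetime(2016, 11, 14, 8, 0, 0)
stamps = [start + timedelta(seconds=s) for s in (0, 30, 60, 1200, 1230)]
trips = split_into_trips(stamps)
```

Because the OBU only emits while the truck is moving, a long gap between two consecutive points can indicate either a stop or a loss of signal; a purely temporal rule cannot tell the two apart, which is one reason why spatio-temporal thresholds are also used.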