Variance in arrival time - Applying machine learning on the data of a controltower in a retail

This section explains the variance. In this research, several variables are ranked by their influence on the on-time arrival of a truck. One way to improve the on-time arrival rate is to add slack time to the planning, so that every ride gets enough extra time to always be on time. However, this is not a realistic solution. The following variables have the most influence on the on-time arrival rate: distance, deviation in volume, return volume, time of day, weather, and earlier disruptions during the day.

The variable distance is the distance a truck has to travel from its origin to its destination. In the area of DC Pijnacker, trucks that travel a short distance tend to be late more often than trucks that travel a long distance.

The deviation in volume is the difference between the planned volume and the realized volume, measured in load carriers. Figure 7.3 shows the relation between the deviation in arrival time and the deviation in volume: as the deviation in volume decreases, the chance that a truck will be on time increases. This can be explained by the fact that a truck gets a predetermined unloading/loading time at a store based on the volume forecast. So, if the realized volume deviates from the planned volume, the planned time for unloading/loading might not be sufficient. To solve this, the volume forecast should be improved.
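The reasoning about insufficient unloading/loading time can be made concrete with a small arithmetic sketch. The linear time model and its parameters (`FIXED_MINUTES`, `MINUTES_PER_CARRIER`) are assumptions for illustration only; the source does not state how the planned unloading/loading time is derived from the volume forecast.

```python
# Hypothetical parameters, not taken from the source.
FIXED_MINUTES = 10         # assumed fixed handling time per stop
MINUTES_PER_CARRIER = 1.5  # assumed handling time per load carrier


def handling_minutes(load_carriers: int) -> float:
    """Handling time at a store under the assumed linear model."""
    return FIXED_MINUTES + MINUTES_PER_CARRIER * load_carriers


def stop_shortfall(planned_volume: int, realized_volume: int) -> dict:
    """Compare the time planned from the forecast with the time needed."""
    planned = handling_minutes(planned_volume)
    needed = handling_minutes(realized_volume)
    return {
        "volume_deviation": realized_volume - planned_volume,
        "planned_minutes": planned,
        "needed_minutes": needed,
        # A positive shortfall means the stop overruns its planned time slot.
        "shortfall_minutes": max(0.0, needed - planned),
    }


example = stop_shortfall(planned_volume=20, realized_volume=26)
```

Under these assumed parameters, six extra load carriers turn a 40-minute slot into 49 minutes of work, a delay that carries over to the next stop.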

The return volume is the number of load carriers a store returns to the DC. In the current situation, the return volume is not known in advance, so the truck drivers have to deal with the returns when they arrive at a store. This can be improved by adding the load carriers that must be returned to the planning and creating return orders for them. The optimal number of load carriers to return can then be planned, and both the stores and the drivers will know the return volume beforehand. Furthermore, the unloading/loading time at a store can be planned more accurately.

The variable time of day indicates the starting hour of a given ride. The later the starting hour, the higher the chance that a truck will be late, so more trucks are late later in the day. This can be explained by the fact that in most cases a driver is not able to make up for lost time, so the next ride for this driver or trailer will probably not start on time. This effect is related to the variables on earlier disruptions during the day. These variables show that delay accumulated during the day is very important for on-time arrivals: the deviation in the arrival of a truck's previous ride and the deviation in the arrival at previous stops are both strongly correlated with the on-time arrival. This insight identifies a problem Albert Heijn needs to focus on. A possible way to handle delays during the day is to adjust the planning in real time during execution.
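An "earlier disruption" variable of the kind described here could be derived as follows. This is a minimal sketch, assuming each ride record carries a truck identifier and planned and actual arrival times; the key names (`truck_id`, `planned_arrival`, `actual_arrival`) and the example rides are hypothetical and not taken from the control tower data.

```python
from datetime import datetime


def add_previous_deviation(rides):
    """Add each truck's previous-ride arrival deviation (minutes) to its rides."""
    last_deviation = {}  # truck_id -> deviation of that truck's most recent ride
    enriched = []
    for ride in sorted(rides, key=lambda r: r["planned_arrival"]):
        delta = ride["actual_arrival"] - ride["planned_arrival"]
        deviation = delta.total_seconds() / 60
        enriched.append({
            **ride,
            "deviation_min": deviation,
            # None marks a truck's first ride: no earlier disruption exists yet.
            "previous_deviation_min": last_deviation.get(ride["truck_id"]),
        })
        last_deviation[ride["truck_id"]] = deviation
    return enriched


# Made-up rides for illustration.
day = datetime(2019, 5, 2)
rides = [
    {"truck_id": "T1", "planned_arrival": day.replace(hour=6),
     "actual_arrival": day.replace(hour=6, minute=20)},
    {"truck_id": "T1", "planned_arrival": day.replace(hour=10),
     "actual_arrival": day.replace(hour=10, minute=35)},
    {"truck_id": "T2", "planned_arrival": day.replace(hour=7),
     "actual_arrival": day.replace(hour=6, minute=55)},
]
features = add_previous_deviation(rides)
```

In this example the second ride of truck T1 inherits a 20-minute deviation from its first ride, which is the kind of carried-over delay the text describes.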

Weather is a combined name for the variables rainfall, temperature and wind speed, all of which are related to the on-time arrival rate. The weather conditions that lead to more late arrivals are rainfall, low temperatures and high wind speed. These insights could be used to identify several weather profiles and to make specific plans for them. Examples are adjusting the drive times, keeping extra back-up capacity to deal with disruptions caused by extreme weather, or delaying or cancelling deliveries during extreme weather.

In summary, the variance in on-time arrivals cannot be explained by a single variable or cause. It has several causes, which makes it difficult to solve. So, to improve the on-time arrival rate in the area of DC Pijnacker, Albert Heijn needs to solve multiple problems, such as improving the volume prediction and creating orders for the return volume.

Discussion

This chapter discusses this research and its contribution. Section 9.1 discusses the results of this research. Section 9.2 discusses the theoretical contribution. Section 9.3 discusses the identified challenges. Section 9.4 discusses the limitations of this research. Section 9.5 provides recommendations for Albert Heijn. Lastly, section 9.6 discusses future research.

9.1 Results

To deal with the practical challenges from the AH case study, this research proposes a model to explain the variance in the arrival time of trucks in retail distribution. The model was built with the dataset from the Simacan/AH control tower. The case study shows that the proposed model can explain the variance in arrival times by ranking the variables by importance, which enables the business to focus on the right problems. The collected dataset was further processed and extended with external sources. This extended dataset can be valuable for: (i) explaining the variance in on-time arrivals by identifying important variables, (ii) predicting and classifying trucks that will not be on time in the near future, and (iii) data analytics based on the constructed dataset.

This section elaborates on some interesting insights discovered during this research. One of them is the influence of traffic congestion. In the area of DC Pijnacker, congestion caused by heavy traffic is not the main factor behind the low service level for on-time deliveries. The share of late trucks is almost the same for trucks that drove into heavy traffic as for trucks that did not face any congestion. If the congestion is caused by an accident or road work, however, the on-time rate is lower than average. In summary, predictable traffic jams caused by heavy traffic did not cause variance: the drive time estimations in Albert Heijn's transport planning correctly account for heavy traffic in the area of DC Pijnacker.

The other insights were discussed in section 8.2. They relate to the variables with the highest influence on the on-time arrival rate and are mostly straightforward. Examples are the influence of the weather, the return volume and disruptions earlier in the day. These were prior assumptions shared by the transportation experts of Albert Heijn and by us, and this research confirmed them for the Albert Heijn case. These outcomes show where the business (Albert Heijn) should focus to achieve the highest impact.

The model proposed in chapter 6 gives insights and helps to explain the variance in the on-time arrival rate. It can be implemented by constructing a dataset with the correct variables as explained in chapters 4 and 5, training and testing the Random Forest classifier, and using the feature ranking to identify the most important features. These features can be further analysed with data analytics. The outcomes were described in section 8.2.
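The implementation steps named here (build a dataset, train a tree ensemble, rank the features) can be sketched in a few lines. This is not the model from chapter 6: to stay dependency-free it uses bagged one-split trees (decision stumps) with a random feature subset per tree, a heavily simplified stand-in for a Random Forest, and ranks features by their summed Gini impurity decrease. The feature names, the synthetic data and the labelling rule are all made up for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gain(rows, labels, feature):
    """Largest impurity decrease achievable by one threshold split on a feature."""
    parent, n, best = gini(labels), len(labels), 0.0
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if left and right:
            child = len(left) / n * gini(left) + len(right) / n * gini(right)
            best = max(best, parent - child)
    return best


def rank_features(rows, labels, names, n_trees=30, seed=0):
    """Rank features by summed impurity decrease over bagged decision stumps."""
    rng = random.Random(seed)
    n, k = len(rows), len(names)
    subset_size = max(1, int(k ** 0.5))
    importance = {name: 0.0 for name in names}
    for _ in range(n_trees):
        # Bootstrap sample: draw n rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Each stump may only choose among a random subset of the features.
        candidates = rng.sample(range(k), subset_size)
        gains = {f: best_gain(sample_rows, sample_labels, f) for f in candidates}
        winner = max(gains, key=gains.get)
        importance[names[winner]] += gains[winner]
    total = sum(importance.values()) or 1.0
    ranked = [(name, score / total) for name, score in importance.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Synthetic data: by construction only the previous delay determines lateness.
names = ["previous_delay_min", "distance_km", "rainfall_mm", "start_hour"]
data_rng = random.Random(1)
rows = [[data_rng.uniform(0, 30), data_rng.uniform(5, 80),
         data_rng.uniform(0, 10), data_rng.uniform(4, 20)] for _ in range(120)]
labels = [1 if row[0] > 15 else 0 for row in rows]
ranking = rank_features(rows, labels, names)
```

Because the synthetic label depends only on `previous_delay_min`, that feature ends up at the top of `ranking`. In practice a library implementation of Random Forest would be used; the source does not state in this section which implementation was applied.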

The Random Forest model proposed in this research is, in our opinion, applicable to the other Albert Heijn DCs and to other control towers in retail distribution. The Random Forest is a powerful model because it combines many weak learners (decision trees) into one model. Because each tree is trained on a different random sample of the data and of the features, the trees are only weakly correlated, which is the main reason why a Random Forest works so well. Other advantages of Random Forest are:

• They are parallelizable: the trees can be trained independently, so the work can be split across multiple machines, which reduces computation time.

• Each tree is cheap to train because only a random subset of the features is considered at each split, so the model can easily work with hundreds of features. Prediction is significantly faster than training, and a trained forest can be saved for future use.

• They handle non-linear relationships between the features and the target.

• They offer methods for balancing the error on data sets with unbalanced class populations.

• The working of a Random Forest is easy to explain to the business.
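One common way to balance the error on unbalanced classes, mentioned in the list above, is to weight each class inversely to its frequency. The sketch below shows that heuristic only; whether this research used class weights, resampling or another method is not stated here, and the label vector is made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up label vector: 8 on-time rides and 2 late rides.
labels = ["on_time"] * 8 + ["late"] * 2
weights = balanced_class_weights(labels)
```

With these weights a misclassified late ride counts four times as much as a misclassified on-time ride, so the minority class is not simply ignored during training.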

However, Random Forest has drawbacks. One drawback is that for very large datasets the trees can take up a lot of memory, so machines with more memory, such as large virtual machines, are needed. Another drawback is that a Random Forest can overfit. To counter this, one should use cross-validation and tune the parameters.
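The mechanics of k-fold cross-validation can be sketched without any library. The example below is a minimal illustration, assuming contiguous folds without shuffling and a model passed in as `fit` and `predict` callables (both hypothetical interfaces). The majority-class model and the data are placeholders, not the classifier or data of this research.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(fit, predict, X, y, k=5):
    """Average held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Placeholder model: remember the most frequent training label."""
    return max(set(y), key=y.count)


def predict_majority(model, x):
    """Placeholder prediction: always return the remembered label."""
    return model


# Made-up data: 7 on-time rides (0) followed by 3 late rides (1).
X = list(range(10))
y = [0] * 7 + [1] * 3
score = cross_val_accuracy(fit_majority, predict_majority, X, y, k=5)
```

Every observation is used for testing exactly once, so the averaged score reflects performance on data the model did not see during training. For real data the rows would normally be shuffled or split by date before folding.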

In summary, the Random Forest is a powerful model to apply to control tower data in retail distribution. In our opinion, it can be applied to other DCs or companies. However, this can lead to different results, since the context changes. In the Albert Heijn case, we expect that the results for DC Zaandam will be comparable to those for DC Pijnacker, since both DCs are located in urban areas.

9.2 Contribution

The major contribution of this thesis is presenting a successful case study of machine learning on the data of a control tower in a retail distribution landscape. To the best of our knowledge, applications of machine learning in this area have not been documented before. We describe a method to extend the control tower data with open data on weather and traffic, and apply machine learning on the extended dataset.
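The idea of extending control tower data with open weather data can be illustrated with a simple join on the hour of departure. This is a sketch under stated assumptions, not the method of this thesis: it assumes hourly observations from a single weather station, and all field names (`start`, `time`, `rainfall_mm`, `temperature_c`, `wind_speed_ms`) and values are hypothetical.

```python
from datetime import datetime

WEATHER_FIELDS = ("rainfall_mm", "temperature_c", "wind_speed_ms")


def hour_key(moment):
    """Truncate a timestamp to the full hour so rides and observations align."""
    return moment.replace(minute=0, second=0, microsecond=0)


def add_weather(rides, observations):
    """Attach the hourly weather observation that matches each ride's start."""
    by_hour = {hour_key(obs["time"]): obs for obs in observations}
    enriched = []
    for ride in rides:
        obs = by_hour.get(hour_key(ride["start"]))
        # Missing observations stay None rather than being guessed.
        weather = {field: (obs[field] if obs else None) for field in WEATHER_FIELDS}
        enriched.append({**ride, **weather})
    return enriched


# Made-up rides and a single made-up observation.
rides = [
    {"ride_id": "R1", "start": datetime(2019, 5, 2, 6, 40)},
    {"ride_id": "R2", "start": datetime(2019, 5, 2, 9, 5)},
]
observations = [
    {"time": datetime(2019, 5, 2, 6, 0), "rainfall_mm": 1.2,
     "temperature_c": 8.0, "wind_speed_ms": 6.5},
]
enriched = add_weather(rides, observations)
```

Ride R1 receives the 06:00 observation, while R2 keeps empty weather fields because no observation exists for its hour.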

9.3 Challenges

Technologies such as artificial intelligence and machine learning can play a valuable part in retail distribution within a control tower environment. They can enhance performance and help to better understand the practical challenges. Such machine learning algorithms require enough high-quality data.

During this research, several challenges occurred when applying machine learning to the data from the AH/Simacan control tower. The following challenges were identified:

• Technology innovation in the transportation industry can be challenging due to the conservative and reactive nature of the industry. Process changes can be costly due to process revalidation and because partners differ in how readily they embrace such changes.

• Integrating the planning and execution between organizations requires significant work to sort through information across multiple systems.

• Within an integration platform between different parties, in this case the control tower, it is difficult to ensure perfect data quality. Since the data is combined from multiple systems from multiple parties, there will be errors in the data.

• The data from a control tower should be extended with external data before it can lead to insights. Open data on the weather and traffic congestion is especially valuable.

Altogether, if these challenges are solved, machine learning can transform supply chain management. The role of the supply chain professional of the future will shift from managing exceptions to creating more strategic value through new ways of working.

9.4 Limitations

Like any other research, this research has limitations. First, it considered a subset of the problem: the use case covers only one regional DC, and the set-up was a cooperation between one company and its external stakeholders. So, it is difficult to generalize the proposed model. However, this research did show the application of machine learning in the area of retail distribution. A second limitation relates to the selection and tuning of the machine learning algorithms, specifically the method used to determine their parameters. In most cases, a step-by-step process was used to determine the parameters. The disadvantage of such a method is that interaction effects between the parameters are not fully taken into account, which means that the performance might be lower than what is achievable.
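The effect of ignoring interactions between parameters can be shown with a toy example. The score function below is synthetic and deliberately built so that the two parameters only pay off when they increase together; the parameters `a` and `b`, the grid and the scores are hypothetical and unrelated to the models tuned in this research.

```python
from itertools import product

GRID = range(4)  # candidate values 0..3 for both hypothetical parameters


def toy_score(a, b):
    """Synthetic score with an interaction term: high only when a and b match."""
    return -10 * (a - b) ** 2 + (a + b)


def step_by_step_search(start=(0, 0)):
    """Tune one parameter at a time while the other is kept fixed."""
    a, b = start
    a = max(GRID, key=lambda value: toy_score(value, b))
    b = max(GRID, key=lambda value: toy_score(a, value))
    return (a, b), toy_score(a, b)


def joint_grid_search():
    """Evaluate every combination, so interaction effects are taken into account."""
    best = max(product(GRID, GRID), key=lambda pair: toy_score(*pair))
    return best, toy_score(*best)


sequential_result = step_by_step_search()
joint_result = joint_grid_search()
```

Starting from (0, 0), the step-by-step search never leaves its starting point because changing a single parameter always lowers the score, whereas the joint search finds the better combination (3, 3). The price of the joint search is that the number of evaluations grows with the product of the grid sizes.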

9.5 Recommendations

In this section, we give recommendations for Albert Heijn. These recommendations result from the case study that we performed on DC Pijnacker. However, they are not limited to DC Pijnacker; they are also applicable to the other DCs. We make the following recommendations:

• We suggest that Albert Heijn applies this research and its method to its other regional DCs. We expect this to lead to valuable insights that might differ from the results for DC Pijnacker, since the DCs serve different parts of the Netherlands.

• The next recommendation relates to the data in the control tower. The route actually traveled by a truck should be saved, since this valuable information is currently missing in the control tower. In this research, we used the recommended routes from HERE Maps instead, which is not preferable. Furthermore, the data in the control tower should be automatically extended with open data on the weather and the traffic conditions. This open data leads to valuable results and insights, but in the current way of working adding it is an intensive manual task.

• This recommendation relates to the volume. The volume forecast should be improved. This will help to minimize the deviation between the actual and the planned volume and so improve the on-time arrival rate. Next, the return volume should be added to the planning as return orders. The return volume is then known beforehand and can be taken into account in the planning.

• Albert Heijn should investigate the possibilities to change from a static way of planning to a more dynamic transport planning, to be able to adjust the planning during the operation when disruptions occur.

Furthermore, section 7.2 discussed interesting business possibilities of using a control tower. In our opinion, the most promising possibilities are (i) using the historical realization data from the control tower so that route planning software can learn from it with the help of artificial intelligence, and (ii) replacing several administration steps that are mostly done on paper, for example the administration of the ride of a truck. With the control tower data, a ride can be tracked and all the information necessary for the administration can be stored in one central database. A central database can also help to guarantee higher-quality data, and external data, such as traffic and weather data, can be added more easily.

9.6 Future research

This work can be extended in several ways. Firstly, future research can focus on the entire logistics network and not only on a subset. Secondly, human factors, such as the commitment of the drivers to the planning and the work descriptions, could be taken into account. For example, will drivers take the proposed route? Future research can examine and test the influence of such human factors. Thirdly, the shortage of drivers was not examined in this research due to limited data availability. Future research can extend this research with data on the shortage of drivers.

