On the Design of a Cost-efficient
Resource Management Framework
for Low Latency Applications
Binxu Yang
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy of
University College London.
Department of Electronic and Electrical Engineering University College London
I, Binxu Yang, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work.
Abstract
The ability to offer low latency communications is one of the critical design requirements for the upcoming 5G era. The current practice for achieving low latency is to overprovision network resources (e.g., bandwidth and computing resources). However, this approach is not cost-efficient, and cannot be applied at large scale. To solve this, more cost-efficient resource management is required to dynamically and efficiently exploit network resources to guarantee low latencies. The advent of network virtualization provides novel opportunities for achieving cost-efficient low latency communications. It decouples network resources from physical machines through virtualization, and groups resources in the form of virtual machines (VMs). By doing so, network resources can be flexibly increased at any network location through VM auto-scaling to alleviate network delays due to lack of resources. At the same time, the operational cost (e.g., energy) can be largely reduced by shutting down low-utilized VMs. Also, network virtualization enables the emerging concept of mobile edge-computing, whereby VMs can be utilized to host low latency applications at the network edge to shorten communication latency. Despite these advantages provided by virtualization, a key challenge is the optimal resource management of different physical and virtual resources for low latency communications.
This thesis addresses the challenge by developing a novel cost-efficient resource management framework that aims to solve the cost-efficient design of 1) low latency communication infrastructures; 2) dynamic resource management for low latency applications; and 3) fault-tolerant resource management.
Compared to current practices, the proposed framework achieves an 80% reduction in deployment cost for the design of low latency communication infrastructures; continuously saves up to 33% of operational cost through dynamic resource management while always achieving low latencies; and succeeds in providing fault tolerance to low latency communications with a guaranteed operational cost.
Acknowledgements
First of all, I would like to thank my supervisor, Professor George Pavlou, for giving me the opportunity to pursue a Ph.D., and for constantly providing me with his mentoring, advice, encouragement and support. Also, I would like to thank Dr. Wei Koong Chai for his countless support, advice and insightful discussions, which made this thesis possible. Most importantly, I would like to thank both of them for teaching me not only how to do research, but also how to solve every complex problem with critical thinking, which I can benefit from for the rest of my life.
I also benefited from the advice and help provided by many mentors that I encountered along my Ph.D. In particular, I would like to thank Dr. Zichuan Xu and Dr. Daphne Tuncer, who guided me into the world of optimization, and provided me insightful comments for different research questions tackled in this thesis. In addition, I would like to thank Dr. Konstantinos V. Katsaros for his tremendous help in the initial stage of my Ph.D. Last, I would also like to thank Dr. Truong Khoa Phan, Dr. Onur Ascigil, and all the colleagues of the NSRL group for the encouragement and advice throughout my Ph.D.
I would also like to thank the external and internal examiners of my Ph.D., Professor Kun Yang (University of Essex) and Professor Miguel Rio, for all their valuable comments and feedback concerning this thesis.
Finally, I would like to thank my family and friends. In particular, I am deeply grateful to my parents, Hong Yang and Dr. Yanling Sun. They provided me with constant encouragement, support and endless love throughout my life, which has been a major source of inspiration for all my achievements.
Contents
1 Introduction
1.1 Context and Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Thesis Outline
2 Related Work and Background
2.1 Introduction
2.2 Background
2.2.1 Low Latency Applications
2.2.2 End-to-End Latency Decomposition
2.2.3 Enabling Technologies
2.3 Resource Management in Computer Networks
2.3.1 Resource Types
2.3.2 Cost Definition
2.3.3 Correlation Between Latency, Resources, and Costs
2.3.4 Relevant Optimization Models
2.3.5 Optimization Solutions
2.4 Problem and Design Space
2.4.1 Static Network Planning
2.4.2 Dynamic End-to-End Resource Management
3 Cost-efficient Network Planning for Low Latency Applications
3.1 Introduction
3.2 Background and Problem Statement
3.2.1 Delay-Sensitive Synchrophasor Monitoring Applications
3.2.2 Deployment Cost Minimization Problem
3.3 Analysis of End-to-End Delay Impact Factors
3.3.1 Bandwidth
3.3.2 Topology
3.3.3 Flow Synchronization
3.4 Design of Network Planning Algorithms
3.4.1 Algorithm Based on Path Length Constraint
3.4.2 Algorithm Based on Application-Level Betweenness and Path Length Constraint
3.4.3 Algorithm Based on Flow Interference and Bandwidth Constraint
3.5 Performance Evaluation
3.6 Conclusion
4 Dynamic Resource Management for Low Latency Applications
4.1 Introduction
4.2 System Model and Problem Formulation
4.2.1 System Model
4.2.2 Operational Cost Minimization Problem
4.3 Cost-efficient Dynamic Resource Allocation Framework
4.3.1 Overview
4.3.2 Heuristic-Based Incremental Allocation Mechanism
4.3.3 Global Optimal Reoptimization Algorithm
4.3.4 Algorithm Analysis
4.4 Performance Evaluation
4.4.1 Service Latency and Operational Costs
4.5 Conclusion
5 Fault-Tolerant End-to-End Resource Management
5.1 Introduction
5.2 Fault-Tolerant Virtual Network Function Placement Problem
5.2.1 System Model
5.2.2 Requests for Service Chains
5.2.3 Stateful Active and Stand-by Instances
5.2.4 Cost Model
5.2.5 Problem Definition
5.3 Fast Heuristic Solution for Fault-Tolerant Placement Problem
5.3.1 Algorithm
5.3.2 Algorithm Complexity
5.4 Approximate Solution for a Special Problem Instance
5.4.1 Overview
5.4.2 A (2, 4 + ε) Bicriteria Approximation Algorithm
5.4.3 Algorithm Analysis
5.5 Evaluations
5.5.1 Experiment Settings
5.5.2 Performance Evaluation
5.6 Conclusion
6 General Conclusions
6.1 Summary
6.2 Future Research Directions
List of Figures
2.1 End-to-end communication example.
2.2 End-to-end routing and resource allocation.
2.3 Trade-off between latency and cost. Allocating more resources reduces latency, but results in high costs.
2.4 Illustration of upper and lower bounds for minimization problems.
2.5 End-to-end resource management framework.
3.1 Medium voltage power grid.
3.2 (CDF) Te2e of PMU flows with PLC and optical fiber.
3.3 Hybrid communication infrastructure.
3.4 Impact of topology and technology choices on PMU application performance.
3.5 Example of path Pj' (flow j') joining path Pj (flow j) at node ui.
3.6 CDF of Te2e for 500Kbps PLC links: PLeC algorithm.
3.7 CDF of Te2e for 500Kbps PLC links: AB-PLeC and FIB algorithm.
3.8 Each point corresponds to one flow and denotes the length of the path traversed towards the PDC and the number of times the flow may encounter synchronization delays.
4.1 Hierarchical MEC system model.
4.2 Example of MEC operational cost minimization problem.
4.3 Dynamic resource allocation framework overview.
4.4 Response time.
4.6 Cost efficiency gap to OPTLB.
4.7 Heuristic+Reoptimization’s average cost for each latency constraint over different network sizes.
4.8 Algorithm running time comparison.
5.1 An example of fault-tolerant placement problem in G with a set DC = {DC1, DC2, DC3} connected by a set V = {v2, v3, v5} of switches.
5.2 An example of the auxiliary graph G' = (V', E') constructed from network G with a set DC = {DC1, DC2, DC3} of DCs that are connected by a set V = {v2, v3, v5} of switches. R = {rj, rj+1, rj+2}.
5.3 Performance of algorithms Heuristic and Greedy.
5.4 Performance of algorithms Approximation and GreedynoBW.
List of Tables
3.1 Summary of real MV grid topological properties of a large European DNO.
3.2 Real MV grid topological properties per area.
3.3 Summary of resulting topologies.
4.1 Notations for dynamic resource allocation problem.
Abbreviations
AB-PLeC Application-Level Betweenness and Path Length Constraint
ADN Active Distribution Network
ALB Auto-Scaling and Load Balancing
AP Access Point
AR Augmented Reality
CAPEX Capital Expenditure
CSCP Capacitated Set Covering Problem
CVD Capacity Violation Detection
DC Data Center
DER Distributed Renewable Energy Sources
DNO Distribution Network Operator
DR Data Rate
EV Electric Vehicle
FIB Flow Interference and Bandwidth Constraint
HV High Voltage
ILP Integer Linear Programming
ISP Internet Service Provider
IoT Internet of Things
LP Linear Programming
MAC Medium Access Control
MEC Mobile Edge-Cloud
MV Medium Voltage
NFV Network Function Virtualization
NLCG Network Latency Constraint Greedy
OPEX Operating Expenditure
OS Operating System
PDC Phasor Data Concentrator
PLC Power Line Communication
PLeC Path Length Constraint
PMU Phasor Measurement Unit
P-SS Primary-Substations
RIN Route Interference Number
RTSE Real-Time State Estimation
SCPA Set Cover Partition Approximation
SDN Software-Defined Networking
SFC Service Function Chaining
SP Service Provider
S-SS Secondary-Substations
TCAM Ternary Content-Addressable Memory
VALB Virtual Auto-Scaling and Load Balancing
VM Virtual Machine
VNF Virtual Network Function
WAN Wide Area Network
Publications
• Yang, B., Xu, Z., Chai, W.K., Liang, W., Tuncer, D., Galis, A. and Pavlou, G., Algorithms for Fault-Tolerant Placement of Stateful Virtualized Network Functions, In Communications (ICC), IEEE International Conference on, May 2018.
• Yang, B., Chai, W.K., Xu, Z., Katsaros, K.V. and Pavlou, G., Cost-Efficient NFV-Enabled Mobile Edge-Cloud for Low Latency Mobile Applications, IEEE Transactions on Network and Service Management, DOI: 10.1109/TNSM.2018.2790081, January 2018.
• Yang, B., Chai, W.K., Pavlou, G. and Katsaros, K.V., Seamless support of low latency mobile applications with NFV-enabled mobile edge-cloud, In Cloud Networking (Cloudnet), IEEE International Conference on, October 2016.
• Yang, B., Katsaros, K.V., Chai, W.K. and Pavlou, G., Cost-efficient low latency communication infrastructure for synchrophasor applications in smart grids, IEEE Systems Journal, DOI: 10.1109/JSYST.2016.2556420, May 2016.
• Chai, W.K., Wang, N., Katsaros, K.V., Kamel, G., Pavlou, G., Melis, S., Hoefling, M., Vieira, B., Romano, P., Sarri, S., Tesfay, T.T., Yang, B. et al., An information-centric communication infrastructure for real-time state estimation of active distribution networks, IEEE Transactions on Smart Grid, 6(4), February 2015.
• Katsaros, K.V., Yang, B., Chai, W.K. and Pavlou, G., Low latency communication infrastructure for synchrophasor applications in distribution networks, In Smart Grid Communications (SmartGridComm), IEEE International Conference on, November 2014.
1 Introduction
1.1 Context and Motivation
With the advancement of communication and computing technologies, ultra-low latency applications such as augmented reality (AR) [1], autonomous car control [2] and remote surgery [3] are expected to take off in the upcoming 5G era [4]. These applications require a low latency network to support fast interactive communications between users and servers. Designing novel communication networks that support low latency applications in a cost-efficient way has therefore become a key research challenge.
Despite strong motivation to commercialize low latency applications, only a limited number of low latency networks have been deployed due to long wide area network (WAN) latencies. According to [5], average WAN latencies (measured as round-trip time) range from 50ms to 83ms, which makes Internet Service Providers (ISPs) unable to provide low latency services (e.g., 10ms). The long WAN latencies are intrinsic to the current Internet communication paradigm, in which a client sends a service request across the WAN to access services located at end servers [6]. In such a scenario, a request traverses several pieces of network equipment (e.g., servers, middleboxes [7], routers) along the communication path, where different types of delay might be incurred. For instance, a request might suffer from transmission delay due to low bandwidth resources on an intermediate network link. Further, a request might experience long queueing delay at an intermediate middlebox (e.g., a firewall) [8] due to a large volume of inbound traffic and a lack of computing resources (e.g., CPU, memory). Similarly, long processing delay might be encountered at end servers due to inefficient allocation and scheduling of computing resources.
Existing low latency networks adopt overprovisioning to achieve ultra-low latency at the cost of significant capital expenditure (CAPEX) and operating expenditure (OPEX). For instance, dedicated networks for financial trading and military communications adopt a full optical fiber deployment to achieve single-purpose low latency communications [9]. However, such a solution cannot be applied to large-scale commercial networks such as the Internet of Things (IoT) [10] and 5G cellular networks, because it would require a large number of costly high-bandwidth network links. Similarly, deploying more network equipment (e.g., servers) can increase the overall computing resources [11], thereby enabling fast data processing. As such, latencies can be guaranteed provided that computational congestion (e.g., queueing delays at network equipment such as servers) is avoided in the face of peak workloads. However, the large number of deployed servers not only results in substantial deployment cost but also consumes considerable energy during operation. According to [12], energy consumption represents a major part of ISPs’ operating expenses, and it is expected to further increase by 10-12% per year [13]. Therefore, in order to make low latency applications accessible to the general public, low latency communication networks need to be carefully designed so that both CAPEX and OPEX can be minimized.
The advent of network function virtualization (NFV) [8] enables novel opportunities to achieve low latency, while substantially reducing the CAPEX and OPEX of networks. The fundamental and original idea of NFV is to exploit high-volume commodity servers at different network locations (e.g., base station, aggregation point, access router, core router, etc.) to provide virtualized resources (e.g., bandwidth resources and computing resources) in the form of virtual machines (VMs) [7]. As such, network resources are decoupled from dedicated hardware, and any commodity server can instantiate any network function (i.e., middlebox) from its VMs. NFV has been further extended to use its VMs to support more general service processing such as video transcoding [14]. By doing so, all the network equipment (e.g., middleboxes, routers, and servers) on the entire end-to-end communication path can now be virtualized. One direct benefit is the long-term CAPEX reduction. For instance, rather than purchasing new expensive dedicated hardware for new network functions or services, new functions/services can be installed as plain software instances on existing high-volume commodity servers, which also simplifies the service deployment process. In addition, resource usage efficiency and energy consumption can be substantially improved (i.e., OPEX reduction) [15] by NFV. This is achieved by dynamically reallocating or migrating instantiated functions/services (e.g., hosted in VMs) to fewer commodity servers (e.g., shutting down low-utilized servers) during non-peak traffic time. At the same time, network functions and services can be instantiated at network locations that best serve end users in terms of access latency. For example, VM instances of end services can be brought from datacenters (DCs) to access points (APs) to shorten communication latencies, if APs are equipped with commodity servers.
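The off-peak consolidation described above, in which VMs are migrated onto fewer servers so that the rest can be shut down, can be pictured as a bin-packing problem. The following is a minimal sketch using the generic first-fit decreasing heuristic; it is not the algorithm developed in this thesis, and the VM names, load figures and single-dimension capacity are illustrative assumptions.

```python
from typing import Dict, List


def consolidate(vm_loads: Dict[str, int], server_capacity: int) -> List[Dict[str, int]]:
    """Pack VMs onto servers with first-fit decreasing (a generic heuristic).

    vm_loads maps a hypothetical VM name to its load, in the same unit as
    server_capacity (here: percent of one server). Returns one dict of
    VM -> load per server that must stay powered on.
    """
    servers: List[Dict[str, int]] = []
    # Place the largest VMs first; this is the "decreasing" part of the heuristic.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > server_capacity:
            raise ValueError(f"{vm} does not fit on any server")
        for server in servers:
            if sum(server.values()) + load <= server_capacity:
                server[vm] = load  # first active server with enough room
                break
        else:
            # No active server has room, so one more server stays on.
            servers.append({vm: load})
    return servers


# Assumed off-peak loads for four VMs that each ran on their own server at peak.
off_peak = {"firewall": 30, "nat": 20, "transcoder": 50, "cache": 40}
active = consolidate(off_peak, server_capacity=100)
print(len(active), "of", len(off_peak), "servers stay on")
```

A real consolidation decision would also have to account for migration cost and for the latency impact of moving a VM away from its users, which this sketch ignores.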
Despite the novel opportunities enabled by NFV, the design of resource management approaches for cost-efficient low latency communications still raises many engineering challenges. For instance, existing underlying communication infrastructures (e.g., backhaul networks [16], smart grids [17]) often lack high-bandwidth network links, which is the main reason they fail to deliver low latency. To this end, the cost-efficient (CAPEX) deployment/upgrade of underlying communication infrastructures with costly high-bandwidth links (e.g., WiMAX [18], optical fiber) is required. Next, in order to continuously maintain cost efficiency and low latency in an NFV-enabled network, virtualized resources on end-to-end communication paths need to be dynamically optimized in the face of varying network conditions (e.g., due to user mobility, varying workload), so that the allocated resources are always efficiently utilized. This involves dynamically finding 1) the optimal placement of VM instances for intermediate network functions and end services, 2) the optimal VM capacity and 3) the optimal end-to-end routing paths that go through the instantiated VMs. In particular, due to the capacity limitations of commodity servers, the VM placement, capacity and routing paths need to be jointly determined to optimally utilize different network resources to achieve the required latency. In addition, the resource allocation approach needs to ensure that low latency services can be continuously delivered even under extreme network conditions such as network failures. That is, when certain network equipment or VMs are unavailable due to faulty hardware or software, back-up resources need to be in place to guarantee low latency.
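To see why placement and capacity have to be chosen together, consider the toy search below. It is only a sketch under stated assumptions and not the formulation used later in this thesis: the three candidate sites, their per-VM costs and network delays, the per-VM service rate, and the M/M/1-style processing delay are all hypothetical, and routing is reduced to a fixed delay per site.

```python
from itertools import product

# Hypothetical candidate sites: (cost per VM, one-way network delay to users in ms).
SITES = {
    "access_point": (5.0, 1.0),
    "aggregation": (3.0, 4.0),
    "datacenter": (1.0, 20.0),
}
SERVICE_RATE = 100.0  # requests/s one VM can process (assumed)


def response_time_ms(site, n_vms, arrival_rate):
    """Round-trip network delay plus an M/M/1-style processing delay per VM."""
    _, net_delay = SITES[site]
    per_vm_rate = arrival_rate / n_vms  # load split evenly across the VMs
    if per_vm_rate >= SERVICE_RATE:
        return float("inf")  # the VMs are overloaded
    return 2 * net_delay + 1000.0 / (SERVICE_RATE - per_vm_rate)


def cheapest_plan(arrival_rate, latency_budget_ms, max_vms=8):
    """Exhaustively search (site, number of VMs) for the cheapest feasible plan."""
    best = None
    for site, n_vms in product(SITES, range(1, max_vms + 1)):
        if response_time_ms(site, n_vms, arrival_rate) > latency_budget_ms:
            continue
        cost = SITES[site][0] * n_vms
        if best is None or cost < best[0]:
            best = (cost, site, n_vms)
    return best  # None when no plan meets the budget


print(cheapest_plan(150.0, 100.0))  # loose budget
print(cheapest_plan(150.0, 30.0))   # tight budget
```

With these assumed numbers, a loose budget is met most cheaply by a few VMs in the datacenter, whereas a tight budget forces the service closer to the users and requires more capacity there. Fixing the site first and sizing the VMs afterwards (or the reverse) can therefore miss the cheapest feasible combination.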
1.2 Problem Statement
Given the abovementioned design challenges in cost-efficient resource management, this thesis addresses the following questions.
1. How to plan network capacities (e.g., link capacities) for underlying communication infrastructures so that the deployment cost (CAPEX) is minimized and low latency is achieved.
2. How to design online end-to-end resource management algorithms, so that different end-to-end virtualized resources can be dynamically and jointly managed in the face of varying network traffic, while always achieving low latencies and maintaining low operational costs (OPEX).
3. How to further guarantee low latencies and low operational costs in the face of network failures (e.g., hardware failures in the underlying communication infrastructure, software failures in VMs).
These open questions motivate the design of a resource management framework that can effectively address the deployment costs (CAPEX) and operational costs (OPEX) of low latency communications. Specifically, the framework design questions will be answered through the analysis of realistic and specific low latency applications.
1.3 Contributions
Designing a resource management framework consists of devising optimization algorithms that derive the best trade-off between the amount of allocated resources and the resulting end-to-end latencies. Specifically, algorithms that target different resource management scenarios, namely static network planning, online resource allocation, and fault-tolerant resource allocation, are studied. The contributions are detailed in the following.
Design of cost-efficient low latency communication infrastructures [19, 20]: Given the high deployment costs of high-bandwidth technologies (e.g., optical fiber), the deployment/upgrade of low latency communication infrastructures needs to be optimized so that the required number of high-bandwidth network links is minimized. To this end, three different static network planning algorithms for cost-efficient low latency communication infrastructures were developed, aimed at minimizing deployment costs at the network planning stage. These algorithms determine the minimum amount of end-to-end network resources needed to achieve the required low latency, and derive the network locations and capacities at which to deploy high-bandwidth communication links. Specifically, the proposed algorithms consider the characteristics of low latency applications (e.g., data rate, packet size) together with topological characteristics (e.g., path length and betweenness [21]) to identify network locations where network capacities are insufficient. In particular, one algorithm based on network calculus [22] provides worst-case guarantees on end-to-end latencies for deterministic workloads. Based on the proposed algorithms, a realistic case of upgrading smart grid communication networks [17] to support mission-critical applications is studied. The solution achieves an 80% reduction in deployment cost compared to conventional approaches for a large set of real power grid topologies.
Dynamic cost-efficient resource management for low latency communications [23, 24]: The underlying communication infrastructures can be fully virtualized with the latest network virtualization technology [7], whereby operational costs (OPEX) can be tuned by dynamic resource allocation (e.g., shutting down VMs during low workloads). In order to make full use of the virtualized infrastructure to minimize operational costs, VM placement (e.g., the locations of VMs hosting end services), VM capacity and routing (e.g., network paths between users and services) need to be dynamically determined in the face of varying network traffic. Conventional approaches [15, 25, 26] focus either on optimizing network link resources (e.g., bandwidth) or on optimizing network node resources (e.g., CPU), and assume a set of predefined network locations to host VMs. In contrast, the dynamic resource management approach proposed in this thesis jointly optimizes different end-to-end resources (e.g., bandwidth and computing resources) to further improve cost efficiency. Specifically, the proposed approach first applies a fast heuristic-based incremental allocation mechanism that dynamically increases the allocated resources in the network area where user traffic is heavy. Later, a reoptimization algorithm periodically adjusts the allocated resources to maintain a near-optimal operational cost over time. Mathematical analysis shows that the reoptimization algorithm provides a worst-case operational cost guarantee in polynomial time. Further, experiments under realistic network settings demonstrate that the dynamic resource management succeeds in achieving cost-efficient low latency communications. In particular, the dynamic approach continuously saves up to 33% of operational cost compared to current approaches, while keeping the cost within 20% of the lower bound of the optimal solution, regardless of network sizes, services’ latency requirements and server capacities.
Fault-tolerant cost-efficient resource management [27]: The proposed design of underlying communication infrastructures and the online resource management algorithms achieve near-optimal cost efficiency while satisfying low latency requirements. However, both designs are vulnerable to hardware and software failures, which can lead to unavailable resources and application latency violations. To enhance the fault tolerance of online resource management, a stateful fault-tolerant resource management problem is considered, whereby user states associated with VMs need to be transferred to the corresponding back-up VMs upon failures. In this problem, cost-efficient routing, VM placement, back-up VM placement and state transfer paths need to be jointly optimized. To this end, an efficient heuristic algorithm and a bicriteria approximation algorithm with performance (e.g., cost) guarantees are proposed. Specifically, the approximation algorithm adopts an auxiliary graph approach [28] to jointly consider all the on-path resources in an end-to-end manner. Simulations with large-scale networks show that the proposed algorithms largely outperform conventional approaches in which the resources of network nodes and links are considered separately. Last, it must be stressed that the proposed solution is a general approach which is also valid for stateless fault-tolerant resource management.
1.4 Thesis Outline
The rest of this thesis is organized as follows. Chapter 2 reviews the state of the art in the area of cost-efficient resource management and low latency communications. Chapter 3 introduces the design of cost-efficient low latency communication infrastructures with a focus on reducing network deployment costs. Chapter 4 presents a dynamic resource management framework for low latency applications, aimed at achieving a trade-off between systems’ operational costs and low latencies. Chapter 5 focuses on the fault-tolerant aspects of dynamic cost-efficient resource management. The proposed approach enhances networks’ fault tolerance while simultaneously achieving cost efficiency and low latency.
2 Related Work and Background
2.1 Introduction
Low latency communications have been extensively studied in dedicated single-purpose networks during the last decade. Conventional approaches adopt high-bandwidth technologies such as optical fiber to reduce latencies. However, these approaches raise cost-efficiency concerns and are not applicable to large-scale commercial networks due to considerable deployment costs. Alternatively, effective routing, resource allocation and scheduling approaches can prioritize packets with low latency requirements, so that the overall ratio of admitted packets that meet the latency requirements is improved. This chapter looks into existing approaches to supporting low latency communications and cost-efficient resource management.
The remainder of this chapter is organized as follows. In Sec. 2.2, the background on low latency applications is presented. Specifically, the characteristics of different low latency applications are investigated, followed by the introduction of end-to-end delay decomposition. Also, the latest technology enablers for low latency applications are discussed. In Sec. 2.3, the classic resource management models and optimization methods are presented. In Sec. 2.4, different types of network costs and resources considered in this thesis are presented. Then, the correlation between cost efficiency, low latency and resource allocation is discussed to further shed light on the problem and design space of cost-efficient resource management. Further, the related work on cost-efficient resource management is broken down into different subsections according to each subproblem: the design of cost-efficient low latency communication infrastructures; the design of cost-efficient dynamic resource management; and the design of fault-tolerant resource management.
2.2 Background
2.2.1 Low Latency Applications
Low latency applications have received significant attention in the last decade. The application scenarios range from time-critical sensor-based monitoring applications to the latest smartphone applications such as face recognition and AR. In the following, three applications (AR, on-demand gaming, and smart grid monitoring) are reviewed as representative examples.
• Mobile augmented reality: exploits computer vision to display relevant information as an overlay above a live view captured by a smartphone camera [1, 29]. For instance, additional information such as street names, restaurant ratings and the number of parking places in a building could be added onto camera views to enhance user experience. However, such user experience enhancement relies on instant responses either from smartphones or from remote clouds [30]. In the case of exploiting computing resources from remote clouds, smartphones’ upload flows consume significant bandwidth, which may lead to network bottlenecks. Given that humans are sensitive to delays and that this enhancement is a real-time functionality, the response time requirement for mobile AR is strict, and is on the order of hundreds of milliseconds [1, 29, 30].
• On-demand gaming: also known as cloud gaming [31], refers to online video gaming where gamers do not execute games at user terminals, but exploit computing resources at the server/cloud side to perform computationally expensive tasks (e.g., video transcoding). Specifically, rather than exploiting computing resources from resource-constrained terminals, on-demand gaming performs the intensive part of gaming computation (e.g., game graphics generation) remotely in clouds, with the processed output streamed back to end users. Such a shift from conventional gaming terminals to clouds/servers frees users from dedicated gaming hardware, but it requires the support of interactive low latency communications between users and gaming instances hosted in networks. Previous studies [31] showed that players begin to notice a quality degradation when the RTT is more than 80ms.
• Smart grid monitoring applications: are of great importance to the next generation power grid [17]. In such networks, system dynamics such as voltage variation need to be monitored with a high-frequency sampling rate [32]. By doing so, the real-time global network status is known to the power grid controller, which can then perform prompt control actions to protect power grids. However, these monitoring and control actions have to rely on low latency communication networks. In particular, data from time-critical applications running on communication networks needs to be delivered within very stringent latency constraints, as information exchanged between grid components is useful/valid only within a predefined time window as small as 3ms. Any failure to meet such a delay requirement could result in cascading failures and large-scale blackouts [33].
As the above shows, each application has its own characteristics that require a specific approach to reduce end-to-end delays. For instance, for time-critical smart grid applications, the delays on network links are the components that need to be reduced, due to the applications' high bandwidth consumption. In contrast, for gaming and mobile AR, processing delays at end servers have to be considered along with network delays because these applications require intensive CPU and GPU processing.
2.2.2 End-to-End Latency Decomposition
In order to shed light on potential methods to achieve low latency, we identify the various components of end-to-end delay, denoted as Te2e. Further, Te2e can be decomposed into processing delay, propagation delay, transmission delay and queueing delay.
Figure 2.1: End-to-end communication example (a terminal and an end server connected across Hop1 to Hop4; the whole path corresponds to Te2e).
• Processing delay: denoted as tproc, the time used for operations such as medium adaptation, (de)coding, switching, routing, and message authentication code generation/verification.
• Propagation delay: denoted as tprop, depends on the transmission medium and the distance traveled by the signal.
• Transmission delay: denoted as ttrans, the time required to transmit the data, which is subject to the bandwidth of the underlying transmission technology.
• Queuing delay: denoted as tqueue, the time spent by data waiting for transmission and processing at the transmitting devices. For instance, a lack of computing resources causes computing congestion at a device, which in turn results in queuing delay at that device.
To clarify how end-to-end delay is composed of these delay components, a detailed example is given in Fig. 2.1. We see that an end-to-end communication is composed of per-hop communication at each network node (e.g., hop1, hop2, etc.), which can be further decomposed into the abovementioned delay components. For instance, an end-to-end communication path between the two communication terminals is shown by a red bold line across routers. A packet in this example travels from the sender machine to the next intermediate router, experiencing processing, propagation, transmission and queuing delays. Depending on the network protocols and the specific functionality, processing delay varies at each hop (e.g., processing delay in the terminals is higher than that of intermediate routers as application-level processing is involved). Furthermore, queuing delay is not necessarily experienced at each hop; this depends on network conditions (e.g., network congestion), as well as on workloads at the processing device.
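The decomposition above can be made concrete with a short numerical sketch. It assumes a simple additive model in which Te2e is the sum of the four delay components over all hops; the `Hop` structure, its field names, the propagation speed and all numbers are illustrative assumptions rather than values taken from this thesis.

```python
from dataclasses import dataclass

PROPAGATION_SPEED = 2e8  # m/s, assumed signal speed in the medium


@dataclass
class Hop:
    """One hop of an end-to-end path (all fields are illustrative)."""
    t_proc: float         # processing delay at the node, in seconds
    t_queue: float        # queuing delay at the node, in seconds
    distance_m: float     # length of the outgoing link, in metres
    bandwidth_bps: float  # capacity of the outgoing link, in bits per second


def hop_delay(hop, packet_bits):
    """Per-hop delay: t_proc + t_prop + t_trans + t_queue."""
    t_prop = hop.distance_m / PROPAGATION_SPEED
    t_trans = packet_bits / hop.bandwidth_bps
    return hop.t_proc + t_prop + t_trans + hop.t_queue


def end_to_end_delay(path, packet_bits):
    """Te2e as the sum of the per-hop delays along the path."""
    return sum(hop_delay(hop, packet_bits) for hop in path)


# A three-hop path: the terminal (higher processing delay) and two routers,
# one of which is congested and therefore adds queuing delay.
path = [
    Hop(t_proc=1e-3, t_queue=0.0, distance_m=2_000, bandwidth_bps=100e6),
    Hop(t_proc=50e-6, t_queue=200e-6, distance_m=100_000, bandwidth_bps=1e9),
    Hop(t_proc=50e-6, t_queue=0.0, distance_m=100_000, bandwidth_bps=1e9),
]
print(f"Te2e = {end_to_end_delay(path, packet_bits=12_000) * 1e3:.3f} ms")
```

In this sketch, queuing delay is simply set to zero on uncongested hops, which mirrors the observation that queuing is not necessarily experienced at every hop.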
2.2.3
Enabling Technologies
Virtualization technology [34] plays a key role in enabling cost-efficient low latency communications. By definition, virtualization refers to the act of creating a virtual version of computing resources (e.g., CPU, RAM, storage, etc.), operating systems (OSs) and virtual networks. Such virtualization technology facilitates the management of physical resources through resource abstraction and a virtual resource manager [7]. Also, the resulting monetary and operational costs can be dynamically optimized through resizable and migratable VMs. In the following, the related virtualization concepts, namely cloud computing, mobile edge computing, NFV and software-defined networking (SDN), are presented together with their applicability to cost-efficient low latency communications.
Cloud Computing: has become the predominant technology for hosting Internet applications and services in the last decade, where virtualized resources are grouped in the form of VMs and utilized to support computational tasks (e.g., transcoding) that used to run on physical machines. One major advantage of cloud technology is the intrinsic benefit brought by DC consolidation [30, 35], whereby high resource utilization and concentration exploit economies of scale and lower the marginal cost of system administration and operations. The other major advantage is the elastic control of virtual resources, where the allocated virtualized resources can be elastically adjusted in the face of dynamic traffic. By doing so, cloud users can significantly reduce their CAPEX and OPEX, as resources are consumed and charged only based on actual consumption (i.e., no resource wastage).
Mobile Edge Computing [36]: Unlike cloud computing where virtualized resources are located at remote DCs, mobile edge computing exploits mobile edge-clouds (MECs) (e.g., micro-edge-clouds) that are installed at network locations (e.g., APs, aggregation points) close to end users. As such, mobile users offload computationally expensive tasks to MECs' VMs where task processing takes place. By doing so, communication delays can be largely reduced compared to offloading tasks to remote DCs. However, in order to achieve low latencies, edge computing requires the deployment of a large number of micro-clouds, which breaks DC consolidation and incurs significant operational costs [30]. To this end, cost-efficient resource management approaches for edge resources are required.
Network Function Virtualization: on the other hand, provides virtualized network functions (VNFs) that used to be embedded in dedicated network appliances such as firewalls, load balancers and deep packet inspection (also referred to as middleboxes) [8]. It aims to transform the way that network operators architect their networks by evolving standard IT virtualization technology to consolidate many network equipment types onto high volume commodity servers [8]. Such a transformation towards VNFs enables efficient sharing of network resources, whereas cloud computing and mobile edge computing enable the sharing of computing resources. Therefore, the joint use of NFV and cloud-related technologies is the key enabler for end-to-end cost-efficient resource management. In addition, when VNFs are interconnected, a service function chain (SFC) [37] can be built, which provides network processing as a chain of network functions. An important feature of SFC is the ordering of different VNFs, that is, network flows need to traverse VNFs in a specific order defined by ISPs or service providers (SPs) before they reach the end service.
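The ordering requirement of an SFC can be illustrated with a minimal sketch. It assumes that a route is represented as a list of node names and a chain as a list of VNF names; the helper `follows_chain` and the node labels are hypothetical and only serve the illustration.

```python
def follows_chain(route, chain):
    """Return True if the VNFs in `chain` appear along `route` in order.

    Other nodes (routers, switches) may sit between consecutive VNFs.
    """
    remaining = iter(route)
    # `vnf in remaining` advances the iterator, so each VNF must be found
    # after the position where the previous one was found.
    return all(vnf in remaining for vnf in chain)


sfc1 = ["VNF1", "VNF2"]
print(follows_chain(["AP1", "VNF1", "Router", "VNF2", "MEC1"], sfc1))
print(follows_chain(["AP1", "VNF2", "Router", "VNF1", "MEC1"], sfc1))
```

The first route respects the chain order and the second does not, since it reaches VNF2 before VNF1.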
Software-Defined Networking: was proposed to facilitate network configuration by decoupling the data plane from the control plane [38]. In SDN, network states such as network devices' utilization are continuously monitored by software-based controllers, which can further implement network intelligence for routing configuration decisions. Such configuration decisions are sent back via a programmable open interface [39] to virtualized network devices (e.g., virtual switch [40]) to perform optimized packet forwarding. In contrast, traditional network devices have a combined data plane and control plane, which makes the implementation of new network routing a difficult task. In the context of dynamic resource management, the role of SDN is to enable dynamic network routing, so that network resources connecting different virtual network components (e.g., clouds, MECs, virtual switches and VNFs) can be efficiently managed.
The abovementioned virtualization technologies provide virtualization at different network locations to enable intelligent end-to-end resource management approaches. For instance, cloud computing and mobile edge computing provide end service virtualization, whereas NFV provides virtualization for intermediate network equipment. On the other hand, SDN provides the routing control between each virtualized network entity.
Figure 2.2: End-to-end routing and resource allocation.
Fig. 2.2 shows an example of end-to-end communications enabled by different virtualization technologies. Two end-to-end communication paths are presented, whereby Path 1 goes through SFC1 (e.g., composed of VNF1 and VNF2) to reach MEC1, and Path 2 goes through SFC2 (e.g., composed of VNF1, VNF2, and VNF3) to reach the remote DC. In these examples, the computing resources and bandwidth resources can be managed (e.g., scaled up or down) in an on-demand manner following the variation of traffic.
2.3
Resource Management in Computer Networks
Given the underlying communication infrastructure and the upper layer virtualization, different network resources can be managed in a dynamic manner. In particular, managing resources involves deriving the location and amount of network resources to be allocated. In this section, we first introduce different types of network resources and costs considered in this thesis. Then, the correlation between latency, resource and cost is discussed to shed light on potential resource management models and solutions.
2.3.1
Resource Types
• Computing resources refer to resources such as CPU, GPU and RAM. When a service is executed in a computer system, a number of processes are initialized, which occupy CPU cycles and RAM for computation. Furthermore, the utilized CPU and RAM consume energy (i.e., an operational cost), and this consumption is highly correlated with resource utilization.
• Storage resources refer to resources such as disk space of computer systems, which are used to store data generated during service operation. For instance, certain services may record system logs and user-related data, which is stored in systems’ disks.
• Bandwidth resources are allocated through network interface cards, which determine the communication throughput.
2.3.2
Cost Definition
In the following, the definitions of costs considered in this thesis are presented.
• Deployment costs: referred to as CAPEX, represent expenses on equipment such as network cables, routers, APs, gateways, middleboxes and servers, as well as expenses incurred in equipment deployment [41]. Specifically, the overall deployment costs highly depend on the network scale, which can be interpreted as the number of pieces of the aforementioned network equipment. As such, the deployment plans (e.g., static network planning) strongly affect the resulting deployment costs, and a careful design of communication infrastructures that minimizes the amount of network equipment can effectively reduce deployment costs.
• Operational costs: refer to expenses incurred during network operation, which include staffing costs, energy consumption at network equipment (e.g., servers, APs, routers, etc.) and network management costs. For instance, according to [42], a DC consumes as much energy as 25,000 households, representing the most significant part of ISPs' OPEX. On the other hand, network management overheads (i.e., the number of management messages, the time required to configure a network, bandwidth consumption) also lead to operational costs. However, these costs can only be interpreted implicitly, as a monetary impact on network operators' revenues. For example, considerable management overheads could result in network congestion, which in turn affects the network performance and operators' revenues. To address these cost issues, dynamic approaches such as temporarily shutting down low-utilized physical machines or changing routing paths could largely reduce operational costs.
Figure 2.3: Trade-off between latency and cost. Allocating more resources reduces latency, but results in higher costs.
2.3.3
Correlation Between Latency, Resources, and Costs
Having specified different types of costs and delay components, the correlation between these two factors is introduced from a resource allocation perspective. First, the correlation between costs and provisioned resources is straightforward: the more resources (both computing and bandwidth resources) are allocated to a service, the more cost it incurs (see Fig. 2.3). For example, data-intensive low latency applications need considerable computing resources to achieve low processing delay, but considerable computing resources usually result in high operational costs due to high energy consumption.
The correlation between provisioned resources and resulting latency is less obvious, but it follows the same principle as the correlation between costs and resource provisioning. Specifically, the more resources are provisioned, the less latency a network packet experiences (see Fig. 2.3). For instance, processing any packet requires computing resources, and the speed of processing a packet is proportional to the allocated resources. In other words, the more CPUs are allocated, the faster a task will be computed. Similarly, exploiting high capacity bandwidth technology such as optical fiber, as opposed to low bandwidth technology, enables low transmission delay and low propagation delay.
Given the aforementioned correlation between latency, provisioned resources, and costs, it is now clear that provisioning more resources on the communication path can improve the resulting end-to-end latency, but leads to concerns in terms of costs. Given such a trade-off, the problem of supporting cost-efficient low latency communication can be transformed into a cost-efficient resource management problem; that is, how to allocate network resources (e.g., computational resources, bandwidth resources) in a cost-efficient manner while guaranteeing low latency requirements.
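The trade-off can be sketched numerically under a deliberately simple assumption taken from the discussion above: processing speed is proportional to the number of allocated CPUs (i.e., the task parallelizes perfectly), and cost grows linearly with the CPU count. The function names, the workload and the price per CPU are illustrative assumptions only.

```python
import math


def processing_delay(workload_cycles, n_cpus, cpu_hz):
    """Processing delay in seconds, assuming perfect parallelization."""
    return workload_cycles / (n_cpus * cpu_hz)


def min_cpus_for_deadline(workload_cycles, cpu_hz, deadline_s):
    """Smallest CPU count whose processing delay meets the deadline."""
    return max(1, math.ceil(workload_cycles / (cpu_hz * deadline_s)))


def hourly_cost(n_cpus, cost_per_cpu_hour):
    """Operational cost, assumed linear in the number of allocated CPUs."""
    return n_cpus * cost_per_cpu_hour


# Tightening the deadline raises the required resources and hence the cost.
workload, cpu_hz = 6e9, 2e9  # illustrative: 6 Gcycles on 2 GHz CPUs
for deadline in (3.0, 1.0, 0.5):
    n = min_cpus_for_deadline(workload, cpu_hz, deadline)
    delay = processing_delay(workload, n, cpu_hz)
    print(f"deadline={deadline}s cpus={n} delay={delay}s cost={hourly_cost(n, 0.05):.2f}/h")
```

The cost-efficient allocation in this sketch is the smallest resource amount that still satisfies the latency requirement; anything beyond it corresponds to overprovisioning.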
2.3.4
Relevant Optimization Models
Optimization models provide the fundamental formulations for conventional resource management problems, based on which advanced resource management problems (e.g., with more constraints) can be formulated. Originally, these models were designed to solve operational research problems, whereby the locations and capacities of warehouses need to be determined so that commodities of different sizes can be transported to warehouses with minimum costs [43]. Later, optimization models were widely adopted in the design of telecommunication networks and computer networks [44]. For instance, the locations of end servers and the selection of routing paths can be represented by a set of integer decision variables. The optimization algorithm then determines a set of server locations and routing paths based on the problem inputs and objective. A typical problem input is a set of user requests, which need to be routed and processed in the considered network. A typical objective is the minimization of maximum bandwidth utilization, that is, minimizing the level of network congestion.
• Facility location: is one of the most common models in both static and dynamic network resource allocation [43]. It considers a set of potential locations for warehouses with fixed costs and capacities, and a set of customers with demands for goods supplied from these warehouses. The transportation cost per unit for goods supplied from warehouses to all customers is given. The problem is to derive the locations of a subset of warehouses that minimize the total costs so that all customers can be satisfied without violating the capacity constraints of warehouses. At the same time, the problem needs to find the assignment of customers to facilities, which can be referred to as a transportation problem [45]. In order to adapt the facility location model to computer networks, a few changes need to be made. First, warehouses need to be replaced by servers. Then, customers need to be replaced by users, and transportation costs need to be represented by communication costs between servers and users.
• Set covering location: is an extension of the well-known set covering model [46]. It finds the minimum number of facility locations that cover all customers within a distance constraint. A variant of the set covering problem, the capacitated set covering problem (CSCP) [47], which takes the set capacities into account, can be applied to many network planning scenarios. For instance, the minimum number of servers required to support a certain number of users can be formulated with CSCP, whereby each server has a capacity constraint, and each user has certain request demands that need to be served within a latency constraint.
• Multi-commodity flow: considers a routing problem whereby a certain number of commodities need to be transported from a set of source nodes to a set of destinations without violating any link capacity [48]. Specifically, the routing decision variables can be either integer or continuous variables, which correspond to non-splittable routing or splittable routing in a network. The multi-commodity flow model provides the basis for more advanced routing models to which additional constraints (e.g., latency constraint, single source) can be added. For instance, [49] extended this model with an additional condition, whereby flows need to be processed by intermediate nodes in the network. As such, the placement and resource allocation of in-network processing (e.g., middleboxes, SFC) can be taken into account.
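As an illustration of how such a model maps to computer networks, the following sketch solves a tiny capacitated facility location instance by exhaustive enumeration, with warehouses replaced by servers and customers by users. It assumes single-source assignment (each user is served by exactly one server); the function name, the data layout and the numbers are hypothetical, and exhaustive enumeration is only viable for very small instances.

```python
from itertools import product


def solve_cfl(fixed_cost, capacity, demand, comm_cost):
    """Exhaustively solve a small capacitated facility location instance.

    fixed_cost[s]   : cost of opening server s
    capacity[s]     : maximum total demand that server s can serve
    demand[u]       : demand of user u
    comm_cost[u][s] : communication cost per unit of demand from u to s
    Returns (best_cost, assignment), where assignment[u] is u's server.
    """
    n_servers, n_users = len(fixed_cost), len(demand)
    best_cost, best_assignment = float("inf"), None
    for assignment in product(range(n_servers), repeat=n_users):
        # Capacity constraint: total demand assigned to each server.
        load = [0] * n_servers
        for u, s in enumerate(assignment):
            load[s] += demand[u]
        if any(load[s] > capacity[s] for s in range(n_servers)):
            continue
        # Objective: fixed cost of opened servers plus communication costs.
        cost = sum(fixed_cost[s] for s in set(assignment))
        cost += sum(demand[u] * comm_cost[u][s] for u, s in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


fixed_cost = [10, 4]
demand = [1, 1, 1]
comm_cost = [[1, 2], [1, 2], [2, 1]]
print(solve_cfl(fixed_cost, [3, 2], demand, comm_cost))  # ample capacity at server 0
print(solve_cfl(fixed_cost, [2, 2], demand, comm_cost))  # tighter capacity at server 0
```

With ample capacity the cheapest plan consolidates all users on one server; once that server's capacity is reduced, a second server has to be opened and the total cost increases.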
2.3.5
Optimization Solutions
Once resource management problems are formulated, a decision space consisting of a set of potential decisions is created. For instance, an optimal routing path might need to be derived between two nodes in a network in order to achieve the lowest communication latency. To solve this, all possible paths between the two nodes are first enumerated to create a solution space, from which the optimal path is then selected. However, the solution space in computer network resource management problems can become extremely large given the large problem input (e.g., a city can have millions of users and a very complex computer network topology). As such, exhaustive search (e.g., brute-force [50]) is often neither effective nor feasible for the formulated problems due to its complexity. To this end, efficient optimization strategies such as relaxation, meta heuristics and approximation can be applied to either reduce the problem complexity or accelerate the search process. The major difference between these approaches lies in the achieved optimality and the optimization running time, that is, how far the derived solution is from the optimum and how quickly the solution can be obtained. In the following, two important concepts, lower bound and upper bound [50], are first introduced, which can be used to classify the abovementioned optimization solutions (see Fig. 2.4).
Figure 2.4: Illustration of upper and lower bounds for minimization problems.
• Lower bound: refers to solutions whose overall objective value is no greater than the optimum when the optimization problem is a minimization problem (see Fig. 2.4). However, such solutions are derived by omitting or relaxing certain constraints (e.g., relaxing integer constraints to linear constraints). As a result, the obtained solutions are generally not feasible solutions to the original problem. Typical approaches in constraint relaxation include linear programming (LP) relaxation and Lagrangian relaxation [51], which will later be discussed in more detail.
• Upper bound: refers to solutions whose overall objective value is no smaller than the optimum when the optimization problem is a minimization problem (see Fig. 2.4). Unlike lower bounds, where the solutions are generally not feasible, upper bound solutions are found by searching through the feasible solution space that satisfies all constraints of the original problem.
Fig. 2.4 provides an overview of the mapping between each optimization solution and the achieved optimization performance. Clearly, most of the existing solutions (e.g., heuristic, meta heuristic and approximation) look for a solution in the feasible solution space. In contrast, relaxation-related solutions achieve an objective value at least as good as the optimum, but cannot guarantee the feasibility of the obtained solutions. In the following, each optimization solution is discussed with its advantages and disadvantages.
• Relaxation: refers to methods that solve a simplified version of the original problem (e.g., by relaxing certain constraints). By doing so, a complex combinatorial optimization problem can be quickly solved. However, the obtained solution is generally not a feasible solution to the original problem. In order to obtain feasible solutions, the relaxed solutions need to be adjusted (e.g., in the case of the LP relaxation of an integer linear programming (ILP) problem, the obtained fractional solutions need to be rounded up or down) [50]. In addition, relaxation with rounding techniques is widely used in deriving exact [52] and approximate solutions [53].
• Exact solution: solves the formulated problem in an optimal way. It either adopts an existing solver (e.g., CPLEX [54]) or enumeration-based approaches (e.g., branch-and-bound, branch-and-cut, branch-and-price) to derive the set of optimal decisions [50]. However, exact solutions can only be applied to small problem instances, as most network resource management problems are formulated as ILPs, which are NP-hard in general [50].
• Meta heuristic: Given the problem input size and the resulting combinatorial solution space, the search space might be extremely large, which motivates intelligent ways of searching for optimal solutions in the problem space. In this sense, meta heuristic solutions are devised to find near-optimal solutions by iteratively improving a candidate solution with regard to a given measure of quality (e.g., objective value). Existing meta heuristics include simulated annealing, genetic algorithms, ant colony optimization and tabu search [55].
• Heuristic: is a technique to quickly solve a complex optimization problem. The low execution time is achieved by sacrificing the optimality of solutions. Heuristics are a common approach to solving resource management problems due to the complexity of these problems.
• Approximation algorithm: refers to algorithms that provide a performance guarantee in polynomial time. Specifically, the resulting objective value is guaranteed to be within a constant factor of the optimum. However, it is challenging to prove that an approximation algorithm has such a performance guarantee.
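The relationship between lower bounds, the optimum and upper bounds can be illustrated on a toy minimization problem that is not taken from this thesis: select servers, each with a cost and a capacity, so that the total capacity covers a given demand at minimum cost. The sketch below assumes the instance is feasible (the total capacity covers the demand); for this particular problem the LP relaxation can be solved greedily by cost per unit of capacity, and rounding the fractional server up yields a feasible solution. All names and numbers are illustrative.

```python
from itertools import combinations


def by_unit_cost(servers):
    """Sort (cost, capacity) pairs by cost per unit of capacity."""
    return sorted(servers, key=lambda s: s[0] / s[1])


def lp_relaxation_bound(servers, demand):
    """Lower bound: a server may be used fractionally (relaxed constraint)."""
    remaining, cost = demand, 0.0
    for c, cap in by_unit_cost(servers):
        if remaining <= 0:
            break
        fraction = min(1.0, remaining / cap)
        cost += fraction * c
        remaining -= fraction * cap
    return cost


def rounded_upper_bound(servers, demand):
    """Upper bound: round the fractional server up to obtain a feasible plan."""
    remaining, cost = demand, 0
    for c, cap in by_unit_cost(servers):
        if remaining <= 0:
            break
        cost += c
        remaining -= cap
    return cost


def exact_optimum(servers, demand):
    """Exact solution by enumerating every subset of servers."""
    best = float("inf")
    for k in range(len(servers) + 1):
        for subset in combinations(servers, k):
            if sum(cap for _, cap in subset) >= demand:
                best = min(best, sum(c for c, _ in subset))
    return best


servers = [(6, 10), (5, 6), (4, 4)]  # (cost, capacity), illustrative
demand = 12
lb = lp_relaxation_bound(servers, demand)
opt = exact_optimum(servers, demand)
ub = rounded_upper_bound(servers, demand)
print(lb, opt, ub)
```

For this instance the relaxed bound is roughly 7.67, the optimum is 10 and the rounded solution costs 11, so the bound ordering of Fig. 2.4 holds; the relaxed solution itself is infeasible because it uses one third of a server.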
In this thesis, exact solutions, heuristics and approximation algorithms are thoroughly investigated for the design of resource allocation algorithms. In particular, we aim to provide performance guarantees for ISPs with approximation algorithms, so that the worst-case network costs can be taken into account for ISPs' networks.
2.4
Problem and Design Space
Having specified different optimization models and solutions, we now investigate the design challenges of a cost-efficient resource management framework. Essentially, this consists in finding the optimal allocation of different resources (e.g., computing resources, storage resources, and bandwidth resources) to achieve a certain required latency.
Fig. 2.5 shows an example of the different subproblems considered in this thesis. For instance, the lower part of Fig. 2.5 illustrates the design problem of cost-efficient underlying low latency networks, whereby the network capacities and the locations to install/deploy different network equipment (e.g., server, MEC, DC, etc.) need to be optimally derived. Furthermore, the upper part of Fig. 2.5 provides an intuitive example of the dynamic end-to-end resource management and the fault-tolerant network resource management for low latency communications. For the dynamic end-to-end resource management problem, network resources (bandwidth and computing resources) on two end-to-end communication paths need to be dynamically allocated (see the red path to the MEC and the blue path to the DC in Fig. 2.5). Problems of this category not only consist in deriving the locations to instantiate virtualized resources, but also in finding the amount of resources required to guarantee low latency requirements (e.g., the portion of shared computing/bandwidth resources in Fig. 2.5). Moreover, back-up resources need to be allocated to prevent violations of low latency requirements during network failures (e.g., the network links and node in green in Fig. 2.5). In particular, the resulting costs of back-up resources need to be optimized.
Figure 2.5: End-to-end resource management framework.
2.4.1
Static Network Planning
The first step towards a cost-efficient resource management framework is to provide low latency communication infrastructures for ISPs. This consists of finding the optimal placement of physical network equipment such as routers, servers and network links (see the lower part of Fig. 2.5). Conventional approaches to solving the low latency network planning problem (i.e., guaranteeing worst-case latency) adopt overprovisioning [56], whereby the capacity of network equipment is provisioned according to the predicted peak workload. However, ISPs' networks are not fully utilized during off-peak times, as the average level of workload is much smaller than the peak workload [57]. Consequently, overprovisioning approaches result in resource wastage and are not cost-efficient. To solve this, a trade-off between cost efficiency and low latency needs to be made in the design of the underlying communication infrastructures.
To achieve the abovementioned trade-off, advanced techniques from operational research [58] and graph theory [59] have been applied at different stages of network planning. First, when networks need to be designed from scratch, facility location, set cover location and transportation models are widely adopted to model the decision-making problem with respect to the locations of network links and servers [44]. In addition, capacitated models such as capacitated facility location [43] and capacitated set cover location [47] are adopted to formulate not only the locations but also the capacities of network links and servers. Since network planning takes place at the design stage, there is no strict requirement in terms of the optimization algorithms' running time. That is, network planning optimization is offline optimization, and can, therefore, afford the long running time incurred by exact solutions. Second, when existing networks need to increase their capacity to accommodate higher traffic [60], decisions such as where to deploy additional network links to increase bandwidth resources need to be made in an efficient way, such that the network upgrade costs (e.g., required additional network equipment) can be minimized. To solve this decision-making problem, a network performance analysis with respect to network congestion locations is first required to understand the demanded capacity of the considered network. Further, an analysis of the expected network traffic after the network upgrade needs to be performed in order to determine the required capacities.
An extensive amount of work has been carried out to solve network planning problems (e.g., design from scratch) [61, 62, 63, 64]. These studies adopted exact solutions (e.g., brute-force approach) with the CPLEX optimizer to derive the locations and capacities of network equipment in the contexts of smart grid, wireless sensor network, DC network and mobile edge network, respectively. However, exact solutions can only solve small-scale complex planning problems or simple network planning problems in practice. As such, large-scale complex network planning problems (e.g., IoT) lead to prohibitively long optimization running times if exact solutions are adopted. Compared to network planning problems from scratch, the network upgrade problem has received little attention, and is considered to be more complex due to the additional constraints imposed by existing network topologies. In this sense, analysis with advanced graph theory and complex network theory [65] is required to first understand the problems faced by existing networks.
In Sec. 3, we propose a novel network upgrade approach specifically targeting cost-efficient upgrade problems for low latency communication networks.
2.4.2
Dynamic End-to-End Resource Management
Given the underlying communication infrastructures and the latest advancements in virtualization, different network resources (e.g., bandwidth resources, computing resources and storage resources) on end-to-end paths (see the upper layer of Fig. 2.5) can be jointly optimized to achieve cost-efficient low latency networks. Specifically, such an optimization process involves simultaneously determining resource allocation on end servers and routing paths between source network nodes and destination network nodes. In the following, resource allocation problems are classified based on the locations where allocation takes place. First, resource allocation at remote DCs (e.g., cloud servers) is reviewed (see Fig. 2.5). Second, resource allocation in the context of mobile edge computing is reviewed, whereby MECs are located at network edges (see Fig. 2.5). Last, routing algorithms that derive the paths between users and end services are reviewed (see Fig. 2.5).
Cost-efficient Resource Allocation in Cloud Computing: Cost efficiency [15, 66, 67, 68, 69, 70, 71, 72, 73, 74] has been extensively studied in the context of cloud computing over the last years. Most of the work [15, 69, 71, 72] in this domain focused on achieving cloud consolidation by dynamically allocating/reallocating VMs that host end services, thereby minimizing the number of active servers. The other direction in improving cloud cost efficiency considers monetary costs for cloud users [66, 68, 70, 73, 75]. In the first category, [15, 69, 71, 72, 74] studied the energy consumption minimization problem in a single DC. The objective of these problems is either the minimization of the number of active servers [69, 71, 74] or the minimization of the resulting power consumption [15, 72]. Unfortunately, work in DC consolidation focused on the reduction of energy consumption, and did not consider computational tasks' deadlines (this will be discussed later), which are a key requirement for low latency services.
In contrast to the abovementioned studies, work in deadline-constrained auto-scaling and scheduling [56, 68, 73] focused on meeting latency requirements of cloud services via elastic resource allocation and scheduling. This class of work considered task completion deadlines, and aimed to derive the minimum allocated computing resources and the optimal sequence of task processing (e.g., prioritizing the processing of packets with lower latency requirements) to efficiently meet these deadlines. However, since network delays dominate the end-to-end delay, the latency savings achieved in DCs with scheduling techniques are limited compared to the latency savings achieved by MECs, which largely reduce network delays.
Resource Allocation in Edge/Fog Computing: Unlike providing services from remote DCs, edge/fog computing aims to bring services closer to end users by exploiting virtualized resources in micro-clouds located at network edges. By doing so, the communication latency can be largely reduced, but this raises concerns about cost efficiency due to the distributed nature of micro-clouds and the break of DC consolidation. Limiting the number of distributed clouds can resolve the cost issue, but would result in the violation of low latency requirements. In this sense, a trade-off between the amount of allocated resources and delays needs to be addressed (i.e., the more resources are allocated to a service, the faster the processing will be).
Most of the recent work in MEC resource allocation [25, 76, 20, 77, 78] considered that end services have already been placed/instantiated in VMs from MECs, and that the uploading of application logic is not required. As such, mobile users can simply upload traffic to MECs to be processed. These studies investigated the service placement, network planning, dynamic resource allocation and user admission problems. Compared to similar problems in the context of DC-based cloud computing, the two distinguishing features of edge computing systems, resource limitation and strict latency constraints, need to be carefully considered. Existing work such as [78] considered distributed service placement for multiple services in resource-constrained MECs. The authors adopted a mixed integer program to formulate the problem, and aimed to find the service placement that minimizes admission failures. Then, they solved it with heuristics inspired by caching content placement heuristics [79]. [80] studied a MEC planning problem, formulated as a slightly different facility location problem in which a predetermined number K of MECs is to be placed. In this problem, both the locations of MECs and the routing paths need to be determined such that the average communication delay is minimized. [81] considered an admission control problem in computing resource-constrained MECs. In particular, they adopted the distribution of arriving mobile users and the average MEC service rate to model the gained utility of admitting a mobile user with a semi-Markov decision process. Next, they integrated the Markov decision model into an LP problem formulation, and solved the problem with an existing LP solver. [82] considered load balancing in MEC, whereby they devised two dynamic workload-to-MEC assignment algorithms (a heuristic and a genetic algorithm) to minimize the maximum average response time across all MECs.
Most of the existing work in edge computing focused on minimizing end-to-end latencies with fixed-location micro-clouds. However, the trade-off between the achieved latency and the cost efficiency of edge cloud resources has not been studied. At the same time, the potential of jointly exploiting dynamic routing and dynamic resource allocation on end-to-end communication paths has not been explored. To this end, we review existing routing approaches in the following.
Dynamic Request Routing: Routing aims to find network paths between pairs of endpoints that minimize the accumulated metric of the traversed links. Depending on the optimization objective, this metric can take different forms, such as congestion-caused monetary loss, link latency or the amount of available link bandwidth. Conventional routing algorithms such as distance vector algorithms and link-state algorithms adopt different search methods to find shortest paths between a set of paired locations [26]. However, in order to apply routing policies at routers, network operators need to separately configure each on-path router with low-level and often vendor-specific commands [7], which is difficult to achieve in the current Internet paradigm (e.g., with equipment from different vendors).
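As an illustration of shortest-path routing with an additive metric, the sketch below runs Dijkstra's algorithm (the search used by link-state protocols) with link latency as the metric. The topology, node names and latency values are hypothetical and chosen only for illustration.

```python
import heapq

def lowest_latency_path(graph, src, dst):
    """Dijkstra over graph[u] = {v: latency}; returns (total_latency, path).

    Assumes non-negative link latencies. Returns (inf, []) if dst is
    unreachable from src.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Hypothetical directed topology with link latencies in ms.
topology = {
    "a": {"b": 5, "c": 1},
    "c": {"b": 2},
    "b": {"d": 3},
    "d": {},
}
print(lowest_latency_path(topology, "a", "d"))
```

Replacing the latency values with another additive metric (e.g., a congestion-related cost) changes the selected paths without changing the algorithm.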
SDN decouples the data plane from the control plane to facilitate the implementation of complex routing decisions. As such, SDN can optimize the use of network resources more intelligently and dynamically than conventional approaches. Most of the work in this regard focused on optimal routing approaches that optimize certain objective functions (e.g., min-max link utilization, maximum request admission rate, min-max server utilization). For instance, [83, 84] considered a routing optimization problem in SDN, whereby they aimed to find routing paths that maximize the number of admitted flows while conforming to the SDN forwarding table size. Specifically, the limited capacity of Ternary Content-Addressable Memory (TCAM), a scarce and expensive resource, was taken into account. As a result, the number of flows that can go through a TCAM-based router is constrained by the forwarding table size. To solve the problem, [83] adopted a randomized rounding approach that provides a performance guarantee on the overall number of admitted flows. [84] adopted a graph theory approach that constructs an auxiliary graph, converting network node capacity constraints (e.g., the forwarding table size constraint) to link constraints (e.g., using the capacity of an added virtual link to represent the node capacity).
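The conversion of node capacity constraints into link constraints can be sketched with the standard node-splitting construction. This is an illustrative assumption about how such an auxiliary graph may be built, not the exact construction of [84]. Each capacitated node n is replaced by n_in and n_out, joined by a virtual link whose capacity equals the node capacity. The function name and the example values are hypothetical.

```python
def split_node_capacities(links, node_capacity):
    """Build an auxiliary graph in which node capacities become link capacities.

    links: {(u, v): link_capacity} for directed links.
    node_capacity: {node: capacity}, e.g. a forwarding-table size.
    Each capacitated node n becomes n_in -> n_out joined by a virtual link.
    """
    def tail(n):  # endpoint from which links leaving n depart
        return f"{n}_out" if n in node_capacity else n

    def head(n):  # endpoint at which links entering n arrive
        return f"{n}_in" if n in node_capacity else n

    aux = {}
    for n, cap in node_capacity.items():
        aux[(f"{n}_in", f"{n}_out")] = cap  # virtual link carries node capacity
    for (u, v), cap in links.items():
        aux[(tail(u), head(v))] = cap
    return aux

# Hypothetical example: router r can hold 4 flow entries.
links = {("s", "r"): 10, ("r", "t"): 10}
aux_graph = split_node_capacities(links, {"r": 4})
print(aux_graph)
```

Any flow through r in the auxiliary graph must traverse the virtual link (r_in, r_out), so a formulation that only handles link capacities also enforces the node limit. In practice, the table-size limit counts flow entries while link capacities measure bandwidth, so the two capacities would need consistent units in a real formulation.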
Summary of dynamic end-to-end resource management: Clearly, resource allocation in DCs does not entirely resolve latency issues for low latency applications (i.e., extremely low latency requirements cannot be met). In contrast, edge computing addresses latency issues, but faces issues in cost efficiency. This calls for a joint optimization of dynamic routing and dynamic resource allocation at MECs. Existing work on routing with TCAM constraints first introduced the concept of jointly optimizing resources on network nodes and links. However, the joint optimization of resources from end cloud servers and network links has not yet been addressed. As such, it might occur that an abundant amount of on-path network bandwidth is allocated to an end-to-end communication path, but the end cloud server does not possess enough computing resources to process the routed traffic [85]. As a result, the provisioned on-path resources are not efficiently utilized (i.e., resource wastage) due to the bottleneck at the end server, which would lead to latency violations.
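The bottleneck effect described above can be illustrated with a small calculation. This is a simplified sketch that assumes link bandwidth and server processing capacity are expressed in the same hypothetical unit (e.g., Mbit/s of traffic carried or processed). The function name and the numbers are illustrative only.

```python
def end_to_end_rate(link_bandwidths, server_capacity):
    """Return (usable_rate, wasted_bandwidth) for one path and its end server.

    usable_rate is limited by the tightest on-path link and by the server.
    wasted_bandwidth is the path bandwidth the server cannot make use of.
    """
    path_rate = min(link_bandwidths)          # tightest on-path link
    usable = min(path_rate, server_capacity)  # server may be the bottleneck
    return usable, path_rate - usable

# Hypothetical path with three links and an under-provisioned end server.
usable, wasted = end_to_end_rate([100, 80, 120], server_capacity=50)
print(usable, wasted)
```

In this example the path could carry 80 units, but the server processes only 50, so 30 units of provisioned bandwidth remain unused. This is the kind of mismatch that a joint optimization of network and computing resources is meant to avoid.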
In Sec. 4, we solve the online cost-efficient resource management problem for ISPs by adopting MECs. Specifically, we aim to dynamically minimize the operational cost of different network resources while always achieving the required low latencies by optimally exploiting these resources.
2.4.3 Fault-Tolerant End-to-End Resource Management
Faulty hardware and software can severely affect communication latencies. Therefore, we consider a specific scenario in cost-efficient end-to-end resource management, namely resource allocation in SFC. In this scenario, different VNFs such as deep packet inspection, firewall, and load balancer are chained together to provide more complex network services. As such, for services that need to go through SFCs, the end-to-end latency depends heavily on the placement of VNFs, the request-to-VNF assignment and the routing between VNFs.
[86] considered an ordered VNF placement and routing problem in a network resource-constrained environment. In this problem, the authors considered three objectives when optimizing the VNF placement: balancing the resulting load on network links, minimizing the number of network nodes used for hosting VNF instances, and minimizing the end-to-end latencies of the created paths. [87] considered an unordered VNF placement and routing problem for operational cost minimization, whereby they adopted a facility location model and a general assignment model to respectively formulate the VNF placement and the VNF request assignment. To solve it, they proposed an approximation-based algorithm that leveraged linear programming relaxation and rounding techniques to find near-optimal solutions. [88] aimed to find both the optimal VNF placement and the assignment of requests to VNF chains. They first considered the problem of minimizing the maximum network link utilization, and then considered an energy minimization problem. A