We have described delegation through the infrastructure service interface and distributed coordination through the DCP interfaces and protocols. The infrastructure service interface is a clear point of interoperability between infrastructure service users and providers that warrants standardisation, and we have described a possible approach to structuring an appropriate standard. The interface itself and the data model passed over it can be separated to some extent by using an abstract representation in the interface, but tighter alignment and agreement on data models is far more beneficial than agreement on the interface alone. Standardisation of this interface through data model and interface specification is currently being proposed by CloNe to the European Telecommunications Standards Institute (ETSI).
The DCP includes a variety of interfaces and protocols, as it encompasses technology-specific interaction between administrative domains (such as BGP) and higher-level management coordination between infrastructure services (such as resource life cycle coordination through management functions). Some of these are public standards and some are proprietary. We have identified MOM and REST as suitable generic technologies to be supported in the DCP. As the management functions develop, we will continue to re-examine the DCP to identify any future areas for standardisation.
5 Cloud Network Management
The CloNe management architecture is based on three management functions: goal translation, fault management and resource management. These management functions are in general focused on challenges concerning scalability, efficiency, manageability, adaptability and reliability, in order to cope with, for example, increasing network complexity, equipment heterogeneity, and fast reconfiguration in cloud networking. This section gives an overview of the initial management architecture. Section 5.1 presents the management concepts. Section 5.2 presents the management architecture, the distributed management functions and relevant interfaces. Sections 5.2.5 and 5.2.6 present general management processes and collaboration workflows between the management functions, followed by an overview of the initial approaches addressing cloud network management concepts in Section 5.3.
5.1 Management Concepts
As efficient and flexible usage of resources is one of the most important driving forces for the adoption of cloud networking solutions, management solutions should be designed to be highly efficient, scalable, adaptive and autonomous. Cloud network management should provide functionality for efficient and effective management of computational, storage and network resources, facilitate management of legacy networks, and utilise legacy management capabilities where needed. To be practically deployable, all cloud networking solutions must ensure reliability and controllability to a very high degree. Flexibility, reliability, and resilience can be achieved through decentralised, collaborative management solutions that to a high degree autonomously adapt to the dynamic conditions in the cloud network.
For the overall management of FNSs, we identify and present three important management concepts: goal translation, fault management, and resource management, which cover the most critical aspects of cloud network management, such as transformation of high-level objectives to low- level resource configuration objectives, security goals (further described in Section 7), configuration, monitoring, flexible resource allocation, and efficient optimisation.
Goal translation (GT) is needed for expressing high-level objectives in terms of low-level resource parameters for the configuration of FNSs, and facilitates dynamic optimisation and reconfiguration of the service infrastructure. Management via goals enables users with different backgrounds and agendas to request and manage services without having to deal with low-level aspects of service composition and configuration. Competing service providers can delegate control of a service to their users without disclosing the service infrastructure or business-sensitive information. For robust self-management capabilities of cloud network management functions, uncertainty can be encoded into the goals, taking into account the volatile service infrastructure environment.
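As a minimal illustration of the idea, the sketch below maps one high-level goal onto per-link parameters. It is not the CloNe goal translation function: the `Goal` and `LinkConfig` schemas, the equal split of the latency budget across hops, and the `headroom` rule that scales reserved bandwidth with the requested confidence are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Goal:
    """High-level objective as a user might state it (hypothetical schema)."""
    max_latency_ms: float      # end-to-end latency bound
    min_bandwidth_mbps: float  # bandwidth the service needs
    confidence: float          # how strongly the bounds must hold, in (0, 1]


@dataclass(frozen=True)
class LinkConfig:
    """Low-level parameters for a single link (hypothetical schema)."""
    latency_budget_ms: float
    reserved_mbps: float


def translate(goal: Goal, hops: int, headroom: float = 0.5) -> List[LinkConfig]:
    """Translate one goal into a configuration for each of `hops` links.

    Assumed rules, for illustration only:
    - the latency bound is split equally across the hops;
    - uncertainty is handled by over-reserving bandwidth in proportion
      to the requested confidence.
    """
    if hops <= 0:
        raise ValueError("hops must be positive")
    if not 0.0 < goal.confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    per_hop_latency = goal.max_latency_ms / hops
    reserved = goal.min_bandwidth_mbps * (1.0 + headroom * goal.confidence)
    return [LinkConfig(per_hop_latency, reserved) for _ in range(hops)]
```

The user states only the goal; the number of hops and the reservation rule stay on the provider side, which is what allows control to be delegated without exposing the infrastructure.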
Fault management (FM) is critical for providing resilience to faults and performance degradations, and for ensuring reliability in cloud network services. Fast, accurate detection and reconfiguration are thus essential for maintaining service, connectivity and QoS. Fault management solutions need to be both reactive and proactive with respect to network failures, so that the root cause is identified and the fault handled quickly, preferably before the problem causes a noticeable degradation of the service or a violation of a high-level objective. Scalable and autonomous fault management solutions are necessary to handle both growing network complexity and volatile network environments. Quick adaptation to changes in the network is crucial, and collaborative detection and localisation of faults and disturbances must be efficient to reduce communication overhead. It is therefore required that fault management solutions operate to a large extent in a decentralised manner.
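The combination of reactive and proactive behaviour in a decentralised setting can be sketched with a monitor that each node runs locally for one neighbour. This is an illustrative assumption, not the CloNe fault management algorithm: the class name, the smoothed-latency warning threshold (`warn_fraction`) and the missed-probe limit (`max_missed`) are hypothetical.

```python
from typing import Optional


class NeighbourMonitor:
    """Local verdict about one neighbour, based only on this node's probes."""

    def __init__(self, objective_ms: float, warn_fraction: float = 0.8,
                 alpha: float = 0.5, max_missed: int = 3) -> None:
        self.objective_ms = objective_ms    # latency bound from the high-level objective
        self.warn_fraction = warn_fraction  # warn before the bound is actually violated
        self.alpha = alpha                  # weight of the newest sample in the average
        self.max_missed = max_missed        # consecutive lost probes that mean "faulty"
        self.ewma_ms: Optional[float] = None
        self.missed = 0

    def observe(self, latency_ms: Optional[float]) -> str:
        """Feed one probe result (None means no reply) and return the verdict."""
        if latency_ms is None:
            # Reactive path: count consecutive lost probes.
            self.missed += 1
            return "faulty" if self.missed >= self.max_missed else "suspect"
        self.missed = 0
        # Proactive path: track an exponentially weighted moving average.
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms
        if self.ewma_ms >= self.warn_fraction * self.objective_ms:
            return "degraded"
        return "ok"
```

Because each verdict is computed from local probes only, nodes would need to exchange messages only when a neighbour is reported as `degraded` or `faulty`, which keeps communication overhead low in the normal case.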
The resource management (RM) concept is the dynamic localisation and allocation of necessary resources in an efficient and timely manner. Scalable and efficient resource scheduling mechanisms enable fast location and prioritisation of the resources available at a given time, ensuring short reaction times for FNS creation and adaptation in order to minimise disruptions in FNS operations. Mechanisms for quick adaptation to equipment joining and leaving the pool of resources are essential for effective and dynamic allocation of resources. In large systems that may span multiple domains controlled by individual stakeholders exposing more or less limited capabilities and resource information, a homogeneous abstraction layer is required to hide potential heterogeneities and allow flexible FNS management. The unpredictability that follows from complex and dynamic network environments must be taken into account in order to avoid, for example, resource starvation. As asynchronous and concurrent creation of many FNSs prohibits the use of commonly used resource-blocking protocols, it is essential to design efficient algorithms that obtain a consistent snapshot of the resource situation while avoiding excessive temporary reservation of resources. To ensure that the resource allocation is optimised at all times, efficient live migration mechanisms that operate transparently to the application and user are also necessary.
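One way to allocate without blocking protocols is optimistic concurrency control: read a versioned snapshot, plan against it, and commit only if nothing has changed in the meantime. The sketch below uses that technique as a stand-in and is not the algorithm proposed by CloNe; `ResourcePool`, `allocate` and the single-process lock are assumptions made to keep the example self-contained.

```python
import threading
from typing import Dict, Optional, Tuple


class ResourcePool:
    """Free capacity per node, tagged with a version counter (hypothetical)."""

    def __init__(self, free: Dict[str, int]) -> None:
        self._free = dict(free)
        self._version = 0
        self._lock = threading.Lock()  # held only for the short read or commit step

    def snapshot(self) -> Tuple[int, Dict[str, int]]:
        """Return a consistent (version, copy of free capacity) pair."""
        with self._lock:
            return self._version, dict(self._free)

    def try_commit(self, seen_version: int, node: str, amount: int) -> bool:
        """Apply the allocation only if the pool is unchanged since the snapshot."""
        with self._lock:
            if seen_version != self._version or self._free.get(node, 0) < amount:
                return False
            self._free[node] -= amount
            self._version += 1
            return True


def allocate(pool: ResourcePool, amount: int, max_retries: int = 5) -> Optional[str]:
    """Pick a node for `amount` units; nothing is reserved while planning."""
    for _ in range(max_retries):
        version, free = pool.snapshot()
        candidates = [n for n, cap in free.items() if cap >= amount]
        if not candidates:
            return None
        node = max(candidates, key=lambda n: free[n])  # node with most free capacity
        if pool.try_commit(version, node, amount):
            return node
        # Another allocation committed first: take a fresh snapshot and retry.
    return None
```

Resources are never held during planning, so a slow or failed FNS creation cannot starve concurrent requests; the cost is that a planner may have to retry under contention.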
The combination of goal translation, autonomous fault management, and efficient resource management has the potential to provide the foundation for a highly efficient management plane, in which resources are easily configured, managed, and monitored for malfunction.