This section presents an overview of optical grid networks and of their dimensioning and scheduling problems. An optical grid network is an emerging type of network that combines WDM optical networks with grid computing. In grid networks, distributed resources (computing or storage elements as well as scientific instruments) are interconnected to support compute-intensive and data-intensive applications [61]. Nowadays, most critical scientific applications, multimedia applications, and business grids need to exchange huge amounts of data between distributed sites. Optical networks provide the high-bandwidth optical fibers and lightpaths needed for data transfer between interconnected grid resources. A grid upgraded in this way is called an Optical Grid [70].
The term “grid” comes from the electrical “power grid”: the idea is that accessing the computing power and storage of computers connected through some type of network is similar to accessing electrical power from an electrical grid [2]. Consumers of electricity do not care which power station provides it. Similarly, the users of an optical grid network do not need to worry about where a given job will be executed. Hundreds of computer grids are available around the world; they are used in different areas of research, such as biological science, earth science, high energy physics, and engineering. Currently, only a few service providers commercially offer grid resources on demand, such as Amazon’s cloud computing service “Elastic Compute Cloud” [68].
Recall that an optical grid network corresponds to geographically spread resources in different locations, connected through an optical transport network that consists of core and access networks. The core network is built from Optical Cross Connects (OXCs) and optical fibers; in the access networks, each site is connected to the OXCs through optical fibers or any other medium. A site comprises users and computing resources. Each optical fiber carries a limited, technology-dependent number of wavelengths, and each wavelength has a technology-dependent data rate (bandwidth) [65]. An optical grid network may consist of homogeneous or heterogeneous resources. With homogeneous resources, all server nodes have the same functionality, i.e., each server node offers a similar type of service. For example, Figure 2.5 shows server nodes that all offer data-intensive services. In contrast, a grid network with heterogeneous resources offers different types of services, as shown in Figure 2.6. In this particular case, one node offers video services only, another node offers information services, and a third node offers two services: application (computing) and data-intensive services.

Figure 2.5: Homogeneous optical grid network (three Database Server nodes).
Figure 2.6: Heterogeneous optical grid network (a Video Server node, an Application & Database Servers node, and an Information Server node).

In terms of traffic volume, it is expected that by 2016, global data center traffic could reach 6.6 zettabytes (1 ZB = 10^21 bytes), and nearly two thirds thereof will be cloud traffic [1]. This growing traffic demand requires a reliable and high-bandwidth communication medium, i.e., optical fibers. These fibers can be used efficiently by applying Dense Wavelength Division Multiplexing (DWDM) technology, i.e., running multiple wavelength carriers simultaneously over the same physical fiber to provide large bandwidth and thus a cost-effective solution for network providers. Given the continually rising bandwidth demands, today’s solutions can run 100 Gb/s per wavelength, with 40 to 80 wavelengths on each fiber pair using DWDM. Currently, flexible grid networks are being considered: the flexible grid relies on adaptive transceivers and intelligent nodes, allowing service providers to increase the bandwidth without overhauling the network [25]. This new paradigm is called Elastic Optical Networking (EON).
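The DWDM figures quoted above imply an aggregate capacity of roughly 4 to 8 Tb/s per fiber pair. The following minimal Python sketch works through that arithmetic. The function names and the sample demand of 2500 Gb/s are illustrative assumptions, not values taken from the cited references.

```python
import math


def fiber_capacity_gbps(wavelengths: int, rate_gbps: float = 100.0) -> float:
    """Aggregate capacity of one fiber pair: wavelengths times per-wavelength rate."""
    return wavelengths * rate_gbps


def wavelengths_needed(demand_gbps: float, rate_gbps: float = 100.0) -> int:
    """Smallest number of wavelengths whose combined rate covers the demand."""
    return math.ceil(demand_gbps / rate_gbps)


if __name__ == "__main__":
    # 40 and 80 wavelengths at 100 Gb/s each, as quoted in the text.
    for w in (40, 80):
        print(f"{w} wavelengths: {fiber_capacity_gbps(w) / 1000:.1f} Tb/s per fiber pair")
    # Hypothetical demand of 2500 Gb/s between two sites.
    print("wavelengths needed:", wavelengths_needed(2500.0))
```

The same ceiling computation is the basic building block of link dimensioning: a traffic volume is converted into a whole number of wavelengths.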
2.2.1 Advance Reservation
Advance reservation (AR) is also important in some applications of optical grid networks, including data-intensive applications and video conferencing for surgery. For example, if a surgeon is assisting a colleague in performing surgery at a remote site, AR ensures that the required bandwidth is available at the specified time [23].
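One way to picture advance reservation is as an admission test against a per-link bandwidth calendar: a request for a future time window is accepted only if the link can carry it throughout that window. The sketch below is a simplified illustration under stated assumptions (a single link, integer time slots, fixed-rate requests). The class and method names are hypothetical and do not come from [23].

```python
from dataclasses import dataclass, field


@dataclass
class Reservation:
    start: int    # first time slot (inclusive)
    end: int      # last time slot (exclusive)
    gbps: float   # bandwidth held during [start, end)


@dataclass
class LinkCalendar:
    capacity_gbps: float
    booked: list = field(default_factory=list)

    def peak_load(self, start: int, end: int) -> float:
        """Highest booked bandwidth at any slot in [start, end)."""
        # The load only rises where a reservation starts, so it is enough to
        # check the window start and every reservation start inside the window.
        points = {start} | {r.start for r in self.booked if start < r.start < end}
        return max(
            sum(r.gbps for r in self.booked if r.start <= t < r.end)
            for t in points
        )

    def reserve(self, start: int, end: int, gbps: float) -> bool:
        """Admit the request only if capacity holds over the whole window."""
        if start >= end:
            raise ValueError("empty reservation window")
        if self.peak_load(start, end) + gbps > self.capacity_gbps:
            return False
        self.booked.append(Reservation(start, end, gbps))
        return True


if __name__ == "__main__":
    link = LinkCalendar(capacity_gbps=100.0)
    print(link.reserve(10, 20, 60.0))  # surgery video session booked ahead
    print(link.reserve(15, 25, 50.0))  # overlaps and would exceed capacity
    print(link.reserve(20, 30, 50.0))  # starts once the first session ends
```

A real AR scheduler would also select a route and a wavelength for each request; the sketch covers only the time dimension.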
2.2.2 Anycast Routing and Wavelength Assignment (AC-RWA)
Optical networks are a prominent candidate for high-data-rate communications, being reliable and economical compared with the alternatives. In traditional optical networks, users have fixed destinations at which their jobs execute, whereas in an optical grid network a user does not care where the job is executed; this is known as anycast routing, also referred to as location transparency [50]. Because of this major difference, optical grid networks require a flexible optical-layer architecture as well as suitable routing, wavelength assignment, dimensioning, and task scheduling strategies [20].
Given the amount of traffic, determining the required resources (number of servers and link capacity) in an optical grid is referred to as the dimensioning problem. Dimensioning an optical grid network differs from dimensioning a classical optical network in two ways [21]. First, a suitable destination must be found: optical grids use anycast routing, where only the source is known and the destination can be any node able to execute the requested job or task. Second, a task can also be lost because of a lack of execution resources.
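Both differences can be illustrated with a small Python sketch: the destination is chosen among all servers that still have free capacity, and the job is lost when none is reachable. Taking "best" to mean the nearest server by shortest-path distance is an assumption made for this example, as are the topology, the node names, and the function names. Wavelength assignment is not modeled.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm over a dict-of-dicts weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def anycast_select(graph, source, free_slots):
    """Pick the nearest server with a free slot, or None if the job is lost."""
    dist = shortest_distances(graph, source)
    candidates = [
        (dist[s], s) for s, free in free_slots.items() if free > 0 and s in dist
    ]
    if not candidates:
        return None  # no execution resources left: the job is lost
    _, server = min(candidates)
    free_slots[server] -= 1  # occupy one slot on the chosen server
    return server


if __name__ == "__main__":
    # Hypothetical topology: user site "u", OXCs "x1"/"x2", servers "s1"/"s2".
    graph = {
        "u": {"x1": 1},
        "x1": {"u": 1, "x2": 2, "s1": 1},
        "x2": {"x1": 2, "s2": 1},
        "s1": {"x1": 1},
        "s2": {"x2": 1},
    }
    free_slots = {"s1": 1, "s2": 1}
    for job in range(3):
        print(job, anycast_select(graph, "u", free_slots))
```

A dimensioning study would repeat such a selection over a whole traffic pattern and count how many servers and wavelengths are needed to keep the job loss acceptable.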
A key problem in optical grid networks is how to efficiently manage the available infrastructure in order to satisfy user requirements and maximize resource utilization. This depends in large part on how tasks are routed and scheduled [63], which motivates the development of efficient routing and scheduling strategies.
2.2.3 Fault Tolerance in Optical Grid Network Survivability
In an optical grid network, the survivability techniques of WDM mesh optical networks can be used. As in traditional optical networks, faults can occur in optical grids, caused by the failure of a link, a node, or server resources. In a grid environment, users need not be concerned with such faults, thanks to the anycast principle: destinations are not fixed, so if a resource fails on the primary server, a submitted job is diverted to a backup server. Different schemes are used for the backup server, but in optical grid networks, resources are pre-computed for backup [47]. In addition to these hardware faults, software faults can occur in applications, operating systems, protocols, and other components. Common software faults include unhandled exceptions (run-time errors), division by zero, and memory leaks.
Two recovery strategies exist for providing fault tolerance in an optical grid network: job check-pointing and replication. Job check-pointing periodically stores an image of a job, which can be restored in case of failure. In replication, a job is sent to the primary server as well as to the replication (secondary) server. If the primary server fails, the replication server continues the execution of the job [14]. For a recent survey on strategies for fault tolerance in optical grid networks, see [12].
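The trade-off between the two strategies can be sketched with a toy Python model. It assumes a job made of discrete steps, at most one failure on the primary server, and zero cost for saving or restoring a checkpoint. These simplifications and the function names are illustrative assumptions, not taken from [14] or [12].

```python
def run_with_checkpoints(total_steps, interval, fail_at=None):
    """Return (steps executed, steps redone) for a check-pointed job.

    A checkpoint is saved every `interval` completed steps. If the server
    fails when `fail_at` steps are done, the job restarts from the last
    saved checkpoint.
    """
    checkpoint = 0
    step = 0
    executed = 0
    failed = False
    while step < total_steps:
        if fail_at is not None and not failed and step == fail_at:
            failed = True
            step = checkpoint  # restore the last stored job image
            continue
        step += 1
        executed += 1
        if step % interval == 0:
            checkpoint = step  # periodically store the job image
    return executed, executed - total_steps


def run_replicated(total_steps, primary_fail_at=None):
    """Return (finishing server, total server steps) for a replicated job.

    Both servers run the job in parallel, so no work is redone, but
    resources are consumed on two servers.
    """
    if primary_fail_at is None:
        return "primary", 2 * total_steps
    return "secondary", primary_fail_at + total_steps


if __name__ == "__main__":
    print(run_with_checkpoints(10, 4, fail_at=7))
    print(run_replicated(10, primary_fail_at=7))
```

In this model, check-pointing redoes only the work done since the last checkpoint, whereas replication redoes nothing but occupies two servers for the duration of the job.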