2.5 Review of replication techniques usage
2.5.7 Replication for High Availability
Disaster Recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. Disaster recovery is a subset of business continuity. While business continuity involves planning for keeping all aspects of a business functioning in the midst of disruptive events, disaster recovery focuses on the IT or technology systems that support business functions.
Data replication tools presented in this subsection are used to implement disaster recovery solutions.
A system fulfills High Availability demands when it ensures a certain degree of operational continuity during a given period of time. Availability refers to the ability of users to access the system, whether they are submitting new work, changing work previously done, or gathering the results of previous work.
If a user cannot access the system, it is said to be unavailable [81]. Generally, the term downtime is used to refer to periods when a system is unavailable.
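As a rough illustration of how downtime relates to availability, the sketch below expresses availability as the fraction of an observation period in which the system was accessible. The function names and the sample figures are assumptions made for this example and are not taken from [81].

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(period_hours: float, downtime_hours: float) -> float:
    """Return the fraction of the period in which the system was accessible."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must lie within the period")
    return (period_hours - downtime_hours) / period_hours


def allowed_downtime_hours(period_hours: float, target: float) -> float:
    """Return the maximum downtime that still meets the availability target."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return period_hours * (1.0 - target)
```

Under these definitions, a target of 99.9% availability over one year leaves about 8.76 hours of permitted downtime.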
Data replication for ensuring High Availability is typically realized as one-way replication, with source data disks and destination data disks in different locations. Only data in the source database can be modified (read and write operations are allowed), while the destination data is either not accessible at all or accessible only for read operations. In systems with relational databases, data replication based on Hot Standby technology is used to ensure High Availability. Examples of Hot Standby implementations are Oracle Data Guard [70] and IBM DB2 High Availability Disaster Recovery (HADR) [10].
When the primary database fails, the secondary instance starts to provide service for clients. The secondary database is activated either by a Switchover operation (performed on the user's demand) or by a Failover operation (performed automatically after a crash). The main problem in such configurations is that the resources of the secondary database are not utilized while the primary database is operating.
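The roles described above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions (synchronous shipping of every write, a manually set `alive` flag instead of real crash detection). The class and method names are hypothetical and do not correspond to the Oracle Data Guard or IBM DB2 HADR interfaces.

```python
class Database:
    """Toy database instance that is either a primary or a standby."""

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role      # "primary" or "standby"
        self.alive = True     # assumption: set to False to simulate a crash
        self.data = {}

    def write(self, key, value):
        # Clients may modify data only on the primary.
        if self.role != "primary":
            raise PermissionError(f"{self.name} is a read-only standby")
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

    def apply_replicated(self, key, value):
        # Only the replication stream may change a standby.
        self.data[key] = value


class HotStandbyPair:
    """One-way replication from a primary to a standby instance."""

    def __init__(self) -> None:
        self.primary = Database("db1", "primary")
        self.standby = Database("db2", "standby")

    def write(self, key, value):
        if not self.primary.alive:
            raise ConnectionError("primary is down; failover required")
        self.primary.write(key, value)
        if self.standby.alive:
            self.standby.apply_replicated(key, value)

    def _swap_roles(self) -> None:
        self.primary.role, self.standby.role = "standby", "primary"
        self.primary, self.standby = self.standby, self.primary

    def switchover(self) -> None:
        """Planned role change, requested by the operator."""
        if not (self.primary.alive and self.standby.alive):
            raise RuntimeError("switchover needs both instances running")
        self._swap_roles()

    def failover(self) -> None:
        """Promotion of the standby after the primary has crashed."""
        if self.primary.alive:
            raise RuntimeError("primary is still running; use switchover")
        self._swap_roles()
```

In this model the standby does no client work until it is promoted, which mirrors the resource-utilization drawback noted above.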
Products such as EMC Symmetrix Remote Data Facility (SRDF) [23] or Veritas Volume Replicator by Symantec [106] provide remote replication for disaster recovery and business continuity. Replication performed at the data block level provides host-independent data replication to one or more physically separate systems, which allows companies to deliver 7 x 24 x 365 data availability.
Veritas Volume Replicator and EMC Symmetrix Remote Data Facility are available for many different operating systems (Oracle Solaris, IBM AIX, HP-UX, Linux, etc.). DRBD [33] is open source replication software implemented for Linux-based operating systems.
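The idea of replication at the data block level can be illustrated with a minimal in-memory sketch: every block write on the source volume is forwarded, as a block number and raw bytes, to each remote volume, so the replicas need no knowledge of the file system or database above them. The names `Volume` and `ReplicatedVolume` and the block size are assumptions for this example. The sketch does not reflect how SRDF, Veritas Volume Replicator or DRBD are implemented.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


class Volume:
    """Toy block device: a fixed number of equally sized blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, index: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block long")
        self.blocks[index] = payload

    def read_block(self, index: int) -> bytes:
        return self.blocks[index]


class ReplicatedVolume:
    """Forwards every block write on the source to all remote targets."""

    def __init__(self, source: Volume, targets: list) -> None:
        self.source = source
        self.targets = targets

    def write_block(self, index: int, payload: bytes) -> None:
        self.source.write_block(index, payload)
        # Only the block number and raw bytes are shipped, so the
        # replicas are independent of the host's file system or database.
        for target in self.targets:
            target.write_block(index, payload)
```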
2.6 Summary
The majority of the approaches presented in the literature for data replication in complex, multi-node environments (for instance multi-agent systems) focus mainly on fault-tolerance problems within a system and on the security of the system. These approaches address the problems related to the communication and interaction between agents, as well as the coordination of the agents.
Data synchronization in distributed, multi-node systems is performed between a large number of remote nodes; thus, an appropriate level of scalability must be ensured for the whole system with data replication. Since data in distributed systems can be stored in different database management systems, running on a variety of operating systems and hardware platforms, it is necessary to enable cooperation between replication nodes in such heterogeneous systems.
Table 2.2 presents an analysis of the data replication approaches with respect to their possible usage in various data management systems, and provides a review of existing implementations of these approaches. Most research related to data replication in distributed multi-node systems focuses on ensuring fault tolerance within a multi-agent system, or addresses the difficulties of building reliable mobile agents, which are more exposed to security problems. Thus, the main goal of this research is to propose a replication method for the class of distributed systems with a large number of nodes that will ensure a high level of scalability.
Aspect | System example | Replication approaches | Practical implementation
Transactional Real Time Systems: replication approaches based on RC-COS algorithm,
Table 2.2: Aspects of data replication usage
The following requirements are defined to achieve the desired efficiency and availability of the designed replication method in distributed multi-node systems:
• reduction or elimination of locks between remote replicas, so that distributed transactions are not required,
• minimization of the number of messages exchanged between replication nodes,
• parallel transaction processing,
• a high level of scalability,
• portability and ease of use in heterogeneous environments.
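To make the message-minimization requirement more concrete, the sketch below shows one generic technique: buffering local updates and sending them as a single batch, keeping only the latest value per key. It is an illustrative assumption and not the replication method proposed in this work. The class name, the `send` callback and the batch size are hypothetical.

```python
class BatchingReplicator:
    """Buffers updates and ships them to a remote node in batches."""

    def __init__(self, send, batch_size: int = 3) -> None:
        self.send = send              # hypothetical transport callback
        self.batch_size = batch_size  # distinct keys that trigger a flush
        self.pending = {}             # key -> latest value only

    def record_update(self, key, value) -> None:
        # A later update to the same key overwrites the earlier one,
        # so only the final value is transmitted.
        self.pending[key] = value
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One message carries the whole batch; nothing is sent when
        # there are no pending updates.
        if self.pending:
            self.send(sorted(self.pending.items()))
            self.pending = {}
```

With a batch size of two, three local updates that touch two keys result in a single message instead of three.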
The aim of this research is to propose a replication method for the class of distributed systems with a large number of nodes that will overcome the limitations of the currently available approaches and will be suitable for systems in which a high level of scalability is a key requirement. The approach is to be suitable for systems working in heterogeneous environments based on various platforms, operating systems or database vendors. In particular, the designed approach should be applicable to the IBIS multi-agent system, as a representative of the above-mentioned class, to support communication and exchange of information among remote agents.