Application and Research of Read-write Separation Technology in Active-active Database


2018 International Conference on Information, Electronic and Communication Engineering (IECE 2018) ISBN: 978-1-60595-585-8


Xue-jun LI¹, Qian-jun WU²,*, Dan JIN¹ and Qing-quan DONG²

¹ State Grid Gansu Electric Power Company, Gansu, China

² Information System Integration Company, NARI Group Corporation, Nanjing City, Jiangsu Province, China

*Corresponding author

Keywords: Information system, Disaster recovery, Read-write separation cluster, Active-active data center.

Abstract. An interruption of an information system can unexpectedly affect data security, business continuity, and performance, so an active-active disaster recovery system is needed to restore service quickly. To this end, this paper studies read-write separation technology, analyzes its advantages in relieving the performance bottlenecks of high-concurrency business systems, and applies redo-based read-write separation to the data layer between two data centers. A complete active-active architecture is designed for an information system to realize dual-active data center disaster recovery. An OMS business system was then upgraded using this design. The upgraded system achieves disaster recovery of data, service continuity after a disaster, and zero data loss. In addition, thanks to read-write separation under normal operation, its response speed increased by more than 50%, which improves the overall business processing capability of the system.

Introduction

With the successful construction of the SG186 project, the State Grid Corporation has built an information support platform based on an enterprise-class information integration platform, eight business applications, and six security systems [1-3]. The dependence of the company and its users on data and network applications is increasing. The longer a system operates, the more likely it is to encounter irresistible disasters, such as earthquakes, tsunamis, and other destructive events, as well as human factors. Such events can severely affect the data and business of the entire information system and can even deal a devastating blow to the company. Therefore, it is indispensable for the State Grid Corporation to keep its information system business secure and reliable over long-term, sustained operation, and a disaster-tolerant system [4-6] is desirable. In this paper, we study the existing disaster recovery architecture in order to identify the key characteristics of an intelligent dispatching management system. An active-active disaster recovery system with a read/write separation function is then designed by integrating redo-based read/write separation clustering technology. The design is applied as the disaster recovery system of an existing intelligent dispatching management system (OMS system), and the resulting system performance is demonstrated.


The OMS system is a high-concurrency transactional system. When write transactions are relatively less intensive than read transactions, the read-write split cluster technology of the Da Meng 7 database [15] can be used. This paper describes the design and presents the results of our implementation in such an OMS system.

Design For Active-Active Data Center

The Architecture Design


As shown in Figure 1, a data center includes a Global Server Load Balancing (GSLB) component and multiple Server Load Balancing (SLB) components in the Access Layer, through which users reach the available applications. SLB/GSLB provides load balancing across multiple servers in the Application Layer. In the Application Layer, distributed resource scheduling and intelligent service discovery provide flexible access to the data in the Data Layer. Meanwhile, the Data Layer maintains data synchronization and replication, so that data remain available, redundant, and reliable on the lowest layer, the Storage Layer. Furthermore, the architecture in Figure 1 should provide "active-active" capability so that services continue across the entire data center. This is achieved by designing "active-active" technologies for the Access Layer, Application Layer, Data Layer, and Storage Layer, as described in the following subsections.

Figure 1. The architecture design of active-active system.

Active-active Design for Access Layer

The Access Layer is the channel through which users access the business system from the Internet. A global load balancer is used at this layer, and users are allocated to a data center through multiple clusters by using the global DNS (Domain Name System) and the nearest-access feature of the load balancer. In this process, the DNS of the primary data center resolves the root domain name. In our design, two NS records are configured on the DNS of the primary data center so that load is balanced over the two clusters. The domain name is resolved by the intelligent DNS, which directs each resolved request to the primary data center or the secondary data center according to the proximity strategy. In "active-active" mode, to make full use of the resources of both centers, user traffic must be properly guided at the network layer and the network load must be balanced between the two centers.
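The proximity strategy can be illustrated with a minimal Python sketch. It is not the configuration of the deployed load balancer: the region names, the proximity table, the health flag, and the addresses (taken from documentation ranges) are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    name: str
    vip: str              # hypothetical virtual IP of the center's SLB cluster
    healthy: bool = True  # hypothetical health flag


# Hypothetical proximity table: a smaller value means the center is closer
# to clients in that region.
PROXIMITY = {
    ("north", "primary"): 1, ("north", "secondary"): 5,
    ("south", "primary"): 5, ("south", "secondary"): 1,
}


def resolve(client_region, centers):
    """Return the VIP of the nearest healthy data center for a client region."""
    candidates = [c for c in centers if c.healthy]
    if not candidates:
        raise RuntimeError("no healthy data center available")
    nearest = min(candidates, key=lambda c: PROXIMITY[(client_region, c.name)])
    return nearest.vip


centers = [
    DataCenter("primary", "192.0.2.10"),
    DataCenter("secondary", "198.51.100.10"),
]
```

Under this table, clients in the hypothetical "north" region are sent to the primary center and clients in "south" to the secondary center, so both centers carry traffic in normal operation.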

Active-active Design for Application Layer


implementing server clusters and middleware clusters. Thus, application-level "active-active" operation can be realized by deploying the application on different application servers.

Active-active Design for Data Layer

The database is the core of the business system and is used to organize and manage business data [10]. An "active-active" database requires the databases of both centers to run online at the same time and to provide a unified service that supports the same application load. When the database in one center fails, service switches over quickly and automatically and remains continuous during the switch. Compared with a disaster recovery center in standby mode, the active-active database avoids the impact of the handover process on the foreground application and increases database capacity through load balancing.

Database "active-active" can be achieved through database clustering technology, which provides transparent data services to applications by constructing a virtual single logical database image, so that the cluster behaves like a single database system. To realize cross-regional database clustering, Layer 2 network connectivity between the two centers must be ensured and storage virtualization must be realized first. Then either the database cluster in the existing production center is expanded, or the existing database servers of the cluster are placed in the two data centers.

The redo-based read/write separation technology is deployed on the database cluster to implement read/write separation. The basic idea of the read-write split cluster is as follows. When the host split ratio is zero, all operations are sent to the standby database first, exploiting the fact that the standby database provides read-only service and cannot modify data. If an error occurs while a statement is executed on the standby database, the statement is sent back to the main database and executed again. Through this "trial and error" on the standby database, read-only operations are naturally offloaded to the standby database. The "trial and error" is completed automatically by the interface layer and is transparent to applications.

When the host split ratio is greater than zero, transactions are automatically allocated between the host and the standby according to that ratio. The host directly executes the transactions assigned to it, while the transactions assigned to the standby continue to use the "trial and error" mode described above.
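The effect of the host split ratio can be sketched as a small deterministic router in Python. This is only an illustration under stated assumptions: the class name `SplitRouter`, the integer percentage parameter, and the counter-based allocation rule are hypothetical and are not taken from the database interface itself.

```python
class SplitRouter:
    """Decide where each new transaction is sent first (hypothetical sketch)."""

    def __init__(self, host_percent):
        if not 0 <= host_percent <= 100:
            raise ValueError("host_percent must be between 0 and 100")
        self.host_percent = host_percent
        self.total = 0    # transactions routed so far
        self.on_host = 0  # transactions sent directly to the host

    def route(self):
        """Return 'host' or 'standby' for the next transaction."""
        self.total += 1
        # Integer comparison: send to the host only while its share of the
        # routed transactions is below the configured percentage.
        if self.on_host * 100 < self.host_percent * self.total:
            self.on_host += 1
            return "host"
        # Everything else goes to the standby and relies on trial and error.
        return "standby"
```

With `host_percent=0` every transaction is first tried on the standby, which matches the zero-ratio case described above. With a positive value, the host receives its configured share and the remainder still goes through the standby.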


The connection process of the read-write split cluster database is shown in Figure 2 and proceeds as follows.

Figure 2. Read and write split connection creation.

Step 1. The user initiates a database connection request.

Step 2. Interfaces (JDBC, DPI, etc.) are configured to log in to the main database based on the service name.

Step 3. The main database selects the IP address and port of a valid standby database and returns them to the interface.

Step 4. The interface initiates a connection request to the standby database using the returned IP address and port.

Step 5. The standby database returns connection success information.
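The five connection steps can be traced with a minimal in-memory Python sketch. It does not use JDBC or DPI. The class names, the service directory, the addresses, and the port are hypothetical, and the fallback to a main-only connection when no standby is valid is an assumption that the steps above do not cover.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyEndpoint:
    host: str
    port: int
    valid: bool = True

    def accept(self):
        # Step 5: the standby confirms that the connection succeeded.
        return f"connected to standby {self.host}:{self.port}"


@dataclass
class MainDatabase:
    standbys: list = field(default_factory=list)

    def login(self):
        # Step 3: pick the first standby endpoint that is currently valid.
        for standby in self.standbys:
            if standby.valid:
                return standby
        return None


def connect(service_name, directory):
    """Steps 1-5: log in to the main database, then connect to its standby."""
    main = directory[service_name]   # Step 2: service name -> main database
    standby = main.login()           # Step 3: main returns a standby endpoint
    if standby is None:
        # Assumption: without a valid standby, only the main connection is used.
        return main, None, "connected to main only"
    return main, standby, standby.accept()  # Steps 4-5


directory = {
    "oms": MainDatabase([
        StandbyEndpoint("192.0.2.21", 5236, valid=False),
        StandbyEndpoint("192.0.2.22", 5236),
    ])
}
```

Calling `connect("oms", directory)` skips the invalid endpoint and reports a connection to the second standby, while the interface keeps its session with the main database for statements that later fail on the standby.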


After the connection is created, the read-write split cluster distributes statements as shown in Figure 3:

Figure 3. Read and Write Separation Cluster Statement Distribution Process.

Step 1. The interface receives the user's request.

Step 2. The interface sends SQL to the standby database first.

Step 3. The standby database returns the execution result. If the interface receives an execution success message, go to Step 6. If the interface receives an execution failure message, go to Step 4.

Step 4. The failed SQL is sent back to the main database and executed again. Once a statement has failed on the standby database in Step 3, all subsequent operations of the same transaction (including read-only operations) are sent directly to the main database for execution.

Step 5. The main database executes the statement and returns the execution result to the interface. Once the write transaction executed on the main database is committed, processing starts again from Step 1 for the next request.

Step 6. The interface responds to the user and returns the execution result to the user.
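The distribution steps above can be sketched in Python as follows. This is an illustration only, not the interface layer of the database. The class names are hypothetical, and the toy standby simply rejects every statement that does not start with SELECT, which is an assumption standing in for the real read-only check.

```python
class ReadOnlyError(Exception):
    """Raised by the toy standby when a statement would modify data."""


class StandbyDB:
    def execute(self, sql):
        # Assumption: anything that is not a SELECT counts as a modification.
        if not sql.lstrip().lower().startswith("select"):
            raise ReadOnlyError(sql)
        return f"standby:{sql}"


class MainDB:
    def execute(self, sql):
        return f"main:{sql}"


class Interface:
    def __init__(self, main, standby):
        self.main = main
        self.standby = standby
        # True once a statement of the current transaction failed on the standby.
        self.pinned_to_main = False

    def execute(self, sql):
        if not self.pinned_to_main:
            try:
                return self.standby.execute(sql)  # Steps 2-3: try the standby first
            except ReadOnlyError:
                self.pinned_to_main = True        # Step 4: rest of transaction -> main
        return self.main.execute(sql)             # Step 5: execute on the main database

    def commit(self):
        if self.pinned_to_main:
            self.main.execute("COMMIT")
        self.pinned_to_main = False               # next request starts from Step 1
```

In this sketch a SELECT issued before the first UPDATE of a transaction is served by the standby, whereas a SELECT issued after it goes to the main database until `commit()` is called. The order of reads and writes inside a transaction therefore decides how much load the standby can take.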

Given this distribution process, better performance can be obtained by planning the business logic of the application appropriately, as shown in Figure 4.

Figure 4. Read-write transactions are proportionally separated.

1. Plan transactions as either purely read-only or purely modifying as far as possible, to avoid futile trial and error on the standby database.

2. Place read operations before write operations, so that the readable standby database shares the load of the system.

Active-active Design for Storage Layer


1. It realizes real-time synchronization of data between two data centers, so as to guarantee zero data loss under abnormal conditions;

2. It provides shared storage volumes for simultaneous access by the hosts of the two data centers, to implement cross-site deployment of host application clusters. This ensures that applications can switch over automatically when an exception occurs.

Experiments

Operations in the Regular Situations

After the active-active design of the access layer, application layer, data layer, and storage layer of the primary and standby systems, an active-active disaster recovery system with a read/write separation function has been built.


The successful implementation for the intelligent dispatching management system (OMS system) is taken as an example. Its main system is called the provincial deployment, while the standby system is called the standby database. Under normal circumstances, user access requests are directed to the provincial deployment or the standby by domain name and proximity strategy, which realizes active-active operation of the main and standby systems. The primary database handles all writes and 35% of the reads, the standby database handles the remaining 65% of the reads, and data are replicated between the primary and standby databases within seconds, as shown in Figure 5.

Figure 5. Operations in the Regular Situations.


Table 1. Comparison of page response times of the active-active and the original system.

| Serial | Functional | Operation | Response time, legacy (s) | Response time, active-active (s) | Percentage of improvement |
|---|---|---|---|---|---|
| 1 | Plan | Login system to the main console is refreshed | 2.99 | 2.07 | 44% |
| | | The power outage planning management / View the flow chart | 1.28 | 0.83 | 54% |
| | | The power outage planning management / show log | 1.9 | 0.57 | 200% |
| | | Follow-up personnel selection | 2.1 | 1.22 | 72% |
| 2 | Regulation | Login system to the main | | | |
| | | The user opens the scheduling page until the refresh is completed | 5.59 | 2.31 | 104% |
| | | Select "Scheduling Staff" | 1.23 | 0.86 | 43% |
| 3 | System | Login system to the main | | | |
| | | From the fixed value list, enter the task | 1.18 | 0.68 | 74% |
| 4 | Relay | Login system to the main | | | |
| | | Task page loading completed | 0.95 | 0.78 | 21% |
| 5 | Automation | Login system to the main | | | |

From the data in Table 1, it can be seen that the new active-active system has significantly shorter page response times, and the response times of some operation pages have been reduced by a factor of two. According to user feedback, the new system operates more smoothly than the old system, and slow responses have barely appeared. More importantly, the system is available almost 24 hours a day, 7 days a week. The old situation in which the system could not be used because of a single failure no longer occurs.
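How the improvement percentages relate to the measured times can be checked with a short Python sketch. The formula below, the relative speed-up of the new system over the legacy one, is an assumption, since the paper does not state how the percentage column was computed. The function name is hypothetical.

```python
def improvement_percent(legacy_s, new_s):
    """Relative speed-up of the new system over the legacy one, in percent."""
    if new_s <= 0:
        raise ValueError("response time must be positive")
    # (legacy - new) / new, expressed as a rounded percentage.
    return round((legacy_s - new_s) / new_s * 100)
```

Under this assumption, the first two rows of Table 1 (2.99 s to 2.07 s, and 1.28 s to 0.83 s) give 44 and 54, which agrees with the listed values.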

Operation in the Disaster Situations

If a disaster occurs in the primary database, the standby database takes over the database service within 10 seconds and undertakes both the read and the write tasks. Apart from the database, all other hardware and software resources of the provincial deployment continue to provide services, which improves resource utilization. In this case, the user is almost unaware of the switchover, as shown in Figure 6.


Similarly, when the OMS server is unavailable, the roles of the primary and standby databases do not change. When the global load balancer performs domain name resolution, it issues an expiration time for the resolved domain name. Once that expiration time has passed, a user accessing the domain name is given the address of the standby server and its load balancer, and the standby server provides the service. The entire active-active system thus continues to provide services for users.
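The role of the expiration time in this failover can be sketched in Python. The sketch is a simplified model and not the behavior of a specific load balancer: the class names, the addresses (from documentation ranges), the 30-second expiration time, and the explicit `now` clock argument are all assumptions.

```python
class GslbResolver:
    """Toy global load balancer that answers with an address and a TTL."""

    def __init__(self, primary, standby, ttl_seconds):
        self.primary = primary
        self.standby = standby
        self.ttl_seconds = ttl_seconds
        self.primary_up = True  # set to False when the OMS server is unavailable

    def resolve(self):
        address = self.primary if self.primary_up else self.standby
        return address, self.ttl_seconds


class CachingClient:
    """Client that reuses a resolved address until its expiration time."""

    def __init__(self, resolver):
        self.resolver = resolver
        self.cached = None
        self.expires_at = 0.0

    def lookup(self, now):
        # Re-resolve only when nothing is cached or the cached entry has expired.
        if self.cached is None or now >= self.expires_at:
            self.cached, ttl = self.resolver.resolve()
            self.expires_at = now + ttl
        return self.cached
```

In this model a client that resolved the name before the failure keeps using the primary address until its cached entry expires, and only then receives the standby address. The expiration time therefore bounds how long users may still be directed to the unavailable server.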

Acknowledgment

This research was financially supported by the Science and Technology Project of the State Grid Corporation of China (No. 500409081).

References

[1] Liu Xin, Zhu Kaijin. Research on Data Replication Mode of Data Backup Center of State Grid Corporation of China [J]. Silicon Valley, 2012(14): 94-95.

[2] Wu Xiang, Ouyang Hong, Dong Lijuan et al. Research and Application of Marketing Business Application System of State Grid Corporation of China [J]. Electric Power Informatics, 2011, 9(2): 49-54.

[3] Cheng Zhihua, Li Hongfa, Su Zhuo et al. Design and Research of Disaster Recovery Center of Centralized Information System [J]. Electric Power Informatization, 2012, 9(2): 77-80.

[4] Li Chunyan, Sun Yuanzhang, Chen Xiangyi et al. A preliminary analysis of the "11.4" blackout accident in Western Europe and the measures to prevent large-scale blackouts in China [J]. Grid Technology, 2006, 30(24): 16-21.

[5] Zhang Xiaoyun, Jiang Xudong, Zhou Xiaoyu et al. Application of Disaster Recovery System in Electric Power Enterprises [J]. Guangdong Electric Power, 2006, 19(2): 67-69.

[6] Hu Yonghua, Chen Yutao. Yunnan Power Grid Corporation Disaster Recovery System Research and Construction [J]. Electric Power Information, 2010, 8(5): 29-31.

[7] Zhang Bin, Ji Yutian, Zhang Yonglian. An Overview of Shanghai Disaster Recovery Center Construction of State Grid Corporation of China [J]. Supply and Use, 2011, 28(2): 10-12.

[8] Yao Feng, Zhang Huafeng. Research and Application of Disaster Recovery Technology for Power Grid Enterprise Information System [J]. Power Informatization, 2012, 10(1): 76-79.

[9] LU Xiao-qiang. Dual live data center network architecture [J]. Financial Technology Time, 2013(7): 63-65.

[10] MA Ji-bo. Analysis on the transformation of data center to "double live" cloud data center [J]. Shandong Social Sciences, 2013(S1): 244-245.

[11] Shi Chenyang, Shou Hongyu. Future-oriented city network architecture of the twin city [J]. Financial Computerizing, 2013(11): 52-54.

[12] Xie Pengyu, Zeng Mingxuan, Hang Cong et al. Discussion on the Application of Enterprise-level Application of Double-living in the Same City of Guangxi Electric Power Grid [J]. Guangxi Electric Power, 2018, 41(1): 51-55.

[13] Jia Bo, Zhang Jisheng, Yang Fei et al. Research and design of electric marketing business application system in the same city [J]. Electric Power Information and Communication Technology, 2017, 15(2): 66-70.

[14] Du Congqiang, Sun Bingtang. Application of DB2 Database Read/Write Detachment Technology in Bank Core System [J]. Information Technology and Informatics, 2017, 12: 51-53.