SQL Database Design for Distributed Processing System

The database is the core of the distributed processing data management system; it holds the business rules, cluster parameter data, and calculation data. It is part of the cluster controller system and therefore resides on the dedicated cluster controller server. Most of the database entities are designed to support different types of calculation cluster configurations, and they can be extended if required in the future. Because the database is relational, its objects are connected through defined relationships, and new types of database object can be added when needed. Table 4.9 lists SQL database table types and their roles.

Table 4.9: SQL database table types and their roles

Table Type | Description
Calculation Node Reference Table | Maintains the details of every calculation node workstation. This is the main driver table, and its data is used for collecting various parameters in the cluster.
Parameter Table | Maintains the system parameters and metadata for the distributed processing system’s operations.
Temporary Tables | A number of temporary data-holding tables used for collecting each calculation node’s status data during the calculation node checking process. The cluster controller uses this data to monitor cluster status and operations.
Computer System Parameter Tables | Hold each calculation node’s system-related data, which is provided by the Windows operating system’s WMI classes. The cluster controller uses the collected data for monitoring.
Event Log Table | Captures critical events during cluster operations; the captured event data is used to analyse and monitor various events during cluster operations.
Historical Data Tables | Capture historical data, which is used for monitoring and analysing cluster performance and for applying effective scheduling algorithms.
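As an illustration of the event log role described in Table 4.9, the following is a minimal sketch using Python's built-in sqlite3 module rather than the system's actual SQL server. The column names, the severity levels, and the helper functions log_event and error_events are assumptions made for this example; the text names the table (tbl_EVENT_LOG) but does not specify its layout.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical column layout: the source names the table but not its columns.
EVENT_LOG_SCHEMA = """
CREATE TABLE tbl_EVENT_LOG (
    event_id  INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,   -- ISO 8601 timestamp in UTC
    node_name TEXT NOT NULL,   -- calculation node that raised the event
    severity  TEXT NOT NULL CHECK (severity IN ('INFO', 'WARNING', 'ERROR')),
    message   TEXT NOT NULL
);
"""


def log_event(conn, node_name, severity, message):
    """Insert one event row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO tbl_EVENT_LOG (logged_at, node_name, severity, message) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), node_name, severity, message),
    )
    conn.commit()
    return cur.lastrowid


def error_events(conn):
    """Return (node_name, message) for every ERROR event, oldest first."""
    return conn.execute(
        "SELECT node_name, message FROM tbl_EVENT_LOG "
        "WHERE severity = 'ERROR' ORDER BY event_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(EVENT_LOG_SCHEMA)
    log_event(conn, "NODE-01", "INFO", "calculation started")
    log_event(conn, "NODE-02", "ERROR", "calculation failed")
    print(error_events(conn))
```

The CHECK constraint restricts the severity column to the three assumed levels, so a monitoring query can filter on it reliably.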

4.11.1 SQL Database Table Design

Each table is designed to represent one entity, and the logical relationships between entities correspond to the relationships between SQL tables. System data tables are populated using WMI classes, which capture the appropriate data from each calculation node workstation. The main table (tbl_NW_COMPUTER) holds most of the calculation node workstation-related data, and some of its data is maintained manually for monitoring purposes. Table 4.10 shows SQL table names and their descriptions.

Table 4.10: SQL table names and their descriptions

Table Name | Description
tbl_APPLICATION | Application-related data
tbl_BATCH_DATA | Batch processing-related data
tbl_BATCH_ID | Batch ID-related data
tbl_COMPUTER_TEMP | Workstation parameters
tbl_CPU | Workstation CPU-related data
tbl_CPU_CORE | Workstation CPU core-related data
tbl_CPU_TEMP | CPU parameters collected by WMI classes
tbl_CPU_USAGE | Workstation CPU usage data
tbl_DISK_DRIVE_TEMP | Workstation HD-related data
tbl_EVENT_LOG | Event/Error-related data for the whole system
tbl_MEMORY | Workstation memory-related data
tbl_MEMORY_TEMP | Memory parameters collected by WMI classes
tbl_MEMORY_USAGE | Workstation memory usage data
tbl_NETWORK_ADAPTER_TEMP | Workstation network card-related data
tbl_NW_COMPUTER | Workstation data
tbl_NW_COMPUTER_SCAN | Workstation status monitoring data
tbl_NW_DP_RISK_DATA_TEST | Risk calculation test data
tbl_PARAM_DATA | Distributed processing system parameters

The SQL database is mainly used to manage the following:
• Calculation node-related parameters
• Cluster controller-related parameters
• Historical performance data
• Calculation-related data
• Batch processing-related data

The following parameters are collected on demand using WMI classes:
• Workstation hardware and software data
• Memory capacity data
• CPU, CPU-core, and CPU speed data
• Storage devices data
• Network card data
• Network connection speed data
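The on-demand collection listed above can be sketched as follows. The sketch assumes that the temporary tables of Table 4.10 are filled from the standard WMI classes Win32_Processor, Win32_PhysicalMemory, Win32_DiskDrive, and Win32_NetworkAdapter; the text does not name the classes or the table columns, so this mapping, the column choices, and the function names are illustrative only. The WMI query itself is passed in as a callable (query_wmi), so the storage logic can be exercised without a Windows host, and SQLite stands in for the actual database.

```python
import sqlite3

# Assumed mapping: staging table -> (WMI class, properties to keep).
# Only the table names come from Table 4.10; the rest is illustrative.
WMI_SOURCES = {
    "tbl_CPU_TEMP": ("Win32_Processor", ["Name", "NumberOfCores", "MaxClockSpeed"]),
    "tbl_MEMORY_TEMP": ("Win32_PhysicalMemory", ["Capacity", "Speed"]),
    "tbl_DISK_DRIVE_TEMP": ("Win32_DiskDrive", ["Model", "Size"]),
    "tbl_NETWORK_ADAPTER_TEMP": ("Win32_NetworkAdapter", ["Name", "Speed"]),
}


def create_staging_tables(conn):
    """Create one untyped staging table per WMI source."""
    for table, (_wmi_class, props) in WMI_SOURCES.items():
        columns = ", ".join(props)
        conn.execute(f"CREATE TABLE {table} (node_name TEXT NOT NULL, {columns})")


def collect_snapshot(conn, node_name, query_wmi):
    """Replace the staging rows for one node and return the rows written.

    query_wmi(class_name) must return a list of dicts keyed by property name.
    """
    written = 0
    for table, (wmi_class, props) in WMI_SOURCES.items():
        # On-demand collection overwrites the previous snapshot for this node.
        conn.execute(f"DELETE FROM {table} WHERE node_name = ?", (node_name,))
        placeholders = ", ".join("?" for _ in range(len(props) + 1))
        for item in query_wmi(wmi_class):
            values = [node_name] + [item.get(prop) for prop in props]
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
            written += 1
    conn.commit()
    return written
```

On a Windows calculation node, query_wmi could wrap whichever WMI client the system uses; keeping it as a parameter separates data capture from data storage.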

The following parameters are continuously collected using WMI classes:
• Workstation availability related data
• Memory usage data
• CPU usage data
• Calculation start and finish times
• Processing events, warnings, and errors
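Continuous collection of usage data can be sketched as a simple polling loop. The column names, the polling interval, and the functions record_sample, poll, and average_cpu are assumptions for illustration; only the table names tbl_CPU_USAGE and tbl_MEMORY_USAGE come from Table 4.10. The reading itself is supplied by a caller-provided function (read_usage), which stands in for the WMI query on a real calculation node.

```python
import sqlite3
import time


def create_usage_tables(conn):
    """Create the two usage tables with a hypothetical column layout."""
    conn.executescript(
        """
        CREATE TABLE tbl_CPU_USAGE (
            node_name TEXT NOT NULL, sampled_at REAL NOT NULL, cpu_percent REAL NOT NULL
        );
        CREATE TABLE tbl_MEMORY_USAGE (
            node_name TEXT NOT NULL, sampled_at REAL NOT NULL, used_mb REAL NOT NULL
        );
        """
    )


def record_sample(conn, node_name, cpu_percent, used_mb, now=None):
    """Store one CPU reading and one memory reading with a shared timestamp."""
    ts = time.time() if now is None else now
    conn.execute("INSERT INTO tbl_CPU_USAGE VALUES (?, ?, ?)", (node_name, ts, cpu_percent))
    conn.execute("INSERT INTO tbl_MEMORY_USAGE VALUES (?, ?, ?)", (node_name, ts, used_mb))
    conn.commit()


def poll(conn, node_name, read_usage, samples, interval_s=1.0, sleep=time.sleep):
    """Take a fixed number of samples, pausing interval_s between them.

    read_usage() must return a (cpu_percent, used_mb) tuple.
    """
    for i in range(samples):
        cpu_percent, used_mb = read_usage()
        record_sample(conn, node_name, cpu_percent, used_mb)
        if i < samples - 1:
            sleep(interval_s)


def average_cpu(conn, node_name):
    """Return the mean recorded CPU usage for a node, or None if no samples exist."""
    row = conn.execute(
        "SELECT AVG(cpu_percent) FROM tbl_CPU_USAGE WHERE node_name = ?", (node_name,)
    ).fetchone()
    return row[0]
```

Aggregates such as average_cpu are the kind of summary a controller could derive from the stored samples; the sketch does not reproduce the scheduling use discussed in Chapter 5.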

Collected data is used by the cluster management controller to allocate tasks to each calculation node so that a distributed calculation is performed efficiently. How the collected data is used by the scheduling and load balancing algorithms is discussed in detail in Chapter 5. The SQL RDBMS database has sets of rules with which the data must comply for efficient database operations; for the distributed process management database, the rules are split into two categories: hardware-based rules and software-based rules. Figure 4.15 shows the hardware-related entity relationship, and Figure 4.16 shows the software-related entity relationship.

The hardware-related entity relationship logical rules are as follows:
• Each management controller can have many clusters
• Each cluster can have many calculation nodes
• Each calculation node can have many CPU units
• Each calculation node can have many memory units
• Each calculation node can have many network cards
• Each calculation node can have many storage devices
• Each CPU can have many cores
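The hardware-related rules above are all one-to-many relationships, which a relational schema typically expresses by giving each child table a foreign key to its parent. The following is a minimal sketch in SQLite; the table and column names are assumptions chosen for readability and are not the names used in Table 4.10.

```python
import sqlite3

# Illustrative names only: the one-to-many rules come from the text,
# the tables and columns do not.
HARDWARE_SCHEMA = """
CREATE TABLE controller (
    controller_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE cluster (
    cluster_id INTEGER PRIMARY KEY,
    controller_id INTEGER NOT NULL REFERENCES controller(controller_id),
    name TEXT NOT NULL
);
CREATE TABLE calculation_node (
    node_id INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(cluster_id),
    name TEXT NOT NULL
);
CREATE TABLE cpu_unit (
    cpu_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL REFERENCES calculation_node(node_id)
);
CREATE TABLE cpu_core (
    core_id INTEGER PRIMARY KEY,
    cpu_id INTEGER NOT NULL REFERENCES cpu_unit(cpu_id)
);
CREATE TABLE memory_unit (
    memory_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL REFERENCES calculation_node(node_id)
);
CREATE TABLE network_card (
    nic_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL REFERENCES calculation_node(node_id)
);
CREATE TABLE storage_device (
    storage_id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL REFERENCES calculation_node(node_id)
);
"""


def open_hardware_db(path=":memory:"):
    """Open a database with the sketch schema and foreign keys enforced."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(HARDWARE_SCHEMA)
    return conn


def cores_per_node(conn):
    """Return {node name: total CPU cores} by joining node -> CPU -> core."""
    rows = conn.execute(
        "SELECT n.name, COUNT(c.core_id) "
        "FROM calculation_node n "
        "JOIN cpu_unit u ON u.node_id = n.node_id "
        "JOIN cpu_core c ON c.cpu_id = u.cpu_id "
        "GROUP BY n.node_id, n.name"
    )
    return dict(rows)
```

With enforcement enabled, a CPU row that points to a non-existent calculation node is rejected, which is how the "each calculation node can have many CPU units" rule is kept consistent.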

The software-related entity relationship logical rules are as follows:
• Each application can have many programs
• Each program can have many calculations
• Each calculation can have many processes
• Each process can have many events
• Each event can have many time-slots
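The software-related rules form a single chain of one-to-many relationships, so any lower-level record can be traced back to its application by joining down the chain. The sketch below builds such a chain in SQLite and retrieves every time-slot belonging to one application. The table names, the label column, and the helper functions are hypothetical and serve only to illustrate the rules.

```python
import sqlite3

# Entity chain from the software-related rules: each level has many of the next.
CHAIN = ["application", "program", "calculation", "process", "event", "time_slot"]


def create_chain(conn):
    """Create one table per entity, each child referencing its parent."""
    conn.execute("PRAGMA foreign_keys = ON")
    parent = None
    for table in CHAIN:
        fk = ""
        if parent is not None:
            fk = f", {parent}_id INTEGER NOT NULL REFERENCES {parent}({parent}_id)"
        conn.execute(
            f"CREATE TABLE {table} ({table}_id INTEGER PRIMARY KEY, label TEXT NOT NULL{fk})"
        )
        parent = table


def add(conn, table, label, parent_id=None):
    """Insert a row into one level of the chain and return its id."""
    level = CHAIN.index(table)
    if level == 0:
        cur = conn.execute(f"INSERT INTO {table} (label) VALUES (?)", (label,))
    else:
        parent = CHAIN[level - 1]
        cur = conn.execute(
            f"INSERT INTO {table} (label, {parent}_id) VALUES (?, ?)", (label, parent_id)
        )
    return cur.lastrowid


def time_slots_for_application(conn, application_id):
    """Return the labels of all time-slots that descend from one application."""
    joins = " ".join(
        f"JOIN {child} ON {child}.{parent}_id = {parent}.{parent}_id"
        for parent, child in zip(CHAIN, CHAIN[1:])
    )
    sql = (
        f"SELECT time_slot.label FROM application {joins} "
        "WHERE application.application_id = ? ORDER BY time_slot.time_slot_id"
    )
    return [row[0] for row in conn.execute(sql, (application_id,))]
```

Generating the tables from the CHAIN list keeps the sketch short; a production schema would normally declare each table explicitly with its own attributes.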

Figure 4.15: Hardware-related entity relationship
[Diagram entities: Controller, Cluster, CPU Unit, CPU-Core, Memory Unit, Storage Device, NIC]

Figure 4.16: Software-related entity relationship
[Diagram entities: Application, Program, Calculation, Process, Event, Time Slot]

In document Smart Distributed Processing Technologies For Hedge Fund Management (Page 75-79)