Tungsten Replicator 2.2

(1)

(2)

Continuent

Copyright © 2013 and beyond Continuent, Inc.

Abstract

This manual documents Tungsten Replicator 2.2.

Build date: 2014-04-08, Revision: 952

(3)

Preface . . . xii

1. Legal Notice . . . xii

2. Conventions . . . xii

1. Introduction . . . 14

1.1. Tungsten Replicator . . . 14

1.1.1. Extractor . . . 14

1.1.2. Appliers . . . 15

1.1.3. Transaction History Log (THL) . . . 15

1.1.4. Filtering . . . 15

2. Deployment . . . 16

2.1. Requirements . . . 18

2.1.1. Operating Systems Support . . . 18

2.1.2. Database Support . . . 18

2.1.3. RAM Requirements . . . 18

2.1.4. Disk Requirements . . . 18

2.1.5. Java Requirements . . . 19

2.1.6. Cloud Deployment Requirements . . . 19

2.2. Deployment Sources . . . 19

2.2.1. Using the TAR/GZipped files . . . 20

2.2.2. Using the RPM and DEB package files . . . 20

2.3. Deploying a Master/Slave Topology . . . 21

2.3.1. Monitoring a Master/Slave Dataservice . . . 22

2.4. Deploying a Multi-master Topology . . . 24

2.4.1. Management and Monitoring . . . 26

2.4.2. Alternative Multimaster Deployments . . . 28

2.5. Deploying a Fan-In Topology . . . 28

2.5.1. Management and Monitoring Fan-in Deployments . . . 30

2.6. Deploying a Star Topology . . . 32

2.7. Deploying a Multi-site (SOR) Topology . . . 36

2.7.1. Shard Configuration and Management . . . 36

2.8. Deploying Oracle Replication . . . 36

2.8.1. How Oracle Replication Works . . . 38

2.8.2. Data Type Differences and Limitations . . . 38

2.8.3. Creating a MySQL to Oracle Deployment . . . 40

2.8.3.1. Configure the MySQL database . . . 40

2.8.3.2. Configure the Oracle database . . . 40

2.8.3.3. Install the Master replicator service . . . 41

2.8.3.4. Create the Destination Schema . . . 42

2.8.3.5. Install Slave Replicator . . . 42

2.8.4. Creating an Oracle to MySQL Deployment . . . 42

2.8.4.1. Creating the Oracle Environment . . . 43

2.8.4.2. Creating the MySQL Environment . . . 44

2.8.4.3. Creating the Master Replicator . . . 44

2.8.4.4. Creating the Destination Schema . . . 45

2.8.4.5. Creating the Slave Replicator . . . 46

2.8.5. Creating an Oracle to Oracle Deployment . . . 46

2.8.6. Troubleshooting Oracle Deployments . . . 46

2.9. Deploying MySQL to MongoDB Replication . . . 46

2.9.1. Preparing Hosts . . . 47

2.9.2. Installing MongoDB Replication . . . 48

2.10. Deploying MySQL to Amazon RDS Replication . . . 52

2.10.2. Installing Amazon RDS Replication . . . 53

2.10.4. Changing Amazon RDS Instance Configurations . . . 56

2.11. Deploying MySQL to Vertica Replication . . . 57

2.11.1. Preparing Hosts for Vertica Deployments . . . 57

2.11.2. Installing Vertica Replication . . . 59

2.12. Deploying Infobright Replication . . . 63

(4)

2.13. Deploying InfiniDB Replication . . . 64

2.13.2. Installing InfiniDB Replication . . . 65

2.14. Deploying PostgreSQL Replication . . . 65

2.14.2. Installing PostgreSQL Replication . . . 66

2.15. Additional Configuration and Deployment Options . . . 66

2.15.1. Deploying Multiple Replicators on a Single Host . . . 66

2.16. Replicating Data Into an Existing Dataservice . . . 67

2.17. Starting and Stopping Tungsten Replicator . . . 69

2.18. Configuring Startup on Boot . . . 70

2.19. Upgrading Tungsten Replicator . . . 70

2.19.1. Upgrading Installations using update . . . 70

2.19.2. Upgrading Tungsten Replicator to use tpm . . . 71

2.19.3. Upgrading Tungsten Replicator using tpm . . . 72

2.19.4. Installing an Upgraded JAR Patch . . . 74

3. Advanced Deployments . . . 75

3.1. Migrating and Seeding Data . . . 75

3.1.1. Migrating from MySQL Native Replication 'In-Place' . . . 75

3.1.2. Seeding Data through Oracle . . . 76

3.2. Deploying Parallel Replication . . . 76

3.2.1. Application Prerequisites for Parallel Replication . . . 76

3.2.2. Enabling Parallel Apply . . . 77

3.2.3. Channels . . . 77

3.2.4. Disk vs. Memory Parallel Queues . . . 78

3.2.5. Parallel Replication and Offline Operation . . . 78

3.2.5.1. Clean Offline Operation . . . 78

3.2.5.2. Tuning the Time to Go Offline Cleanly . . . 78

3.2.5.3. Unclean Offline . . . 79

3.2.6. Adjusting Parallel Replication After Installation . . . 79

3.2.6.1. How to Change Channels Safely . . . 79

3.2.6.2. How to Switch Parallel Queue Types Safely . . . 79

3.2.7. Monitoring Parallel Replication . . . 79

3.2.7.1. Useful Commands for Parallel Monitoring Replication . . . 79

3.2.7.2. Parallel Replication and Applied Latency On Slaves . . . 80

3.2.7.3. Relative Latency . . . 80

3.2.7.4. Serialization Count . . . 81

3.2.7.5. Maximum Offline Interval . . . 81

3.2.7.6. Workload Distribution . . . 81

3.2.8. Controlling Assignment of Shards to Channels . . . 82

3.3. Batch Loading for Data Warehouses . . . 83

3.3.1. How It Works . . . 83

3.3.2. Important Limitations . . . 84

3.3.3. Batch Applier Setup . . . 84

3.3.4. Connect and Merge Scripts . . . 85

3.3.5. Staging Tables . . . 86

3.3.5.1. Staging Table Names . . . 86

3.3.5.2. Whole Record Staging . . . 87

3.3.5.3. Delete Key Staging . . . 87

3.3.5.4. Staging Table Generation . . . 87

3.3.6. Character Sets . . . 87

3.3.7. Time Zones . . . 88

3.4. Deploying SSL Secured Replication and Administration . . . 88

3.4.1. Creating the Truststore and Keystore . . . 88

3.4.1.1. Creating Your Own Client and Server Certificates . . . 89

3.4.1.2. Creating a Custom Certificate and Getting it Signed . . . 90

3.4.1.3. Using an existing Certificate . . . 91

3.4.1.4. Converting SSL Certificates for keytool . . . 92

3.4.2. SSL and Administration Authentication . . . 93

3.4.3. Configuring the Secure Service through tpm . . . 93

4. Operations Guide . . . 96

4.1. Checking Replication Status . . . 96

4.1.1. Understanding Replicator States . . . 98

(5)

4.1.3. Changing Replicator States . . . 99

4.2. Managing Transaction Failures . . . 100

4.2.1. Identifying a Transaction Mismatch . . . 100

4.2.2. Skipping Transactions . . . 102

4.3. Provision or Reprovision a host . . . 103

4.4. Creating a Backup . . . 103

4.4.1. Using a Different Backup Tool . . . 104

4.4.2. Backup a Different Host . . . 104

4.5. Restoring a Backup . . . 104

4.5.1. Restoring a Backup to a Different Node . . . 105

4.6. Switching Master Hosts . . . 105

4.7. Configuring Parallel Replication . . . 107

4.8. Performing Database or OS Maintenance . . . 108

4.8.1. Performing Maintenance on a Single Slave . . . 108

4.8.2. Performing Maintenance on a Master . . . 109

4.8.3. Performing Maintenance on an Entire Dataservice . . . 109

4.9. Making Online Schema Changes . . . 110

5. Command-line Tools . . . 111

5.1. The ddlscan Command . . . 111

5.2. The thl Command . . . 111 5.2.1. thl list Command . . . 111 5.2.2. thl index Command . . . 113 5.2.3. thl purge Command . . . 113 5.2.4. thl info Command . . . 114 5.2.5. thl help Command . . . 114 5.3. The tpm Command . . . 114

5.3.1. Comparing Staging and INI tpm Methods . . . 115

5.3.2. Processing Installs and Upgrades . . . 118

5.3.3. tpm Command-line Configuration . . . 119

5.3.3.1. Configuring default options for all services . . . 119

5.3.3.2. Configuring a single service . . . 120

5.3.3.3. Configuring a single host . . . 120

5.3.3.4. Reviewing the current configuration . . . 120

5.3.3.5. Installation . . . 120

5.3.3.6. Upgrades and Updates . . . 121

5.3.3.7. Making configuration changes . . . 121

5.3.4. tpm INI File Configuration . . . 122

5.3.4.1. Creating an INI file . . . 122

5.3.4.2. Installation with INI File . . . 122

5.3.4.3. Upgrades with INI File . . . 122

5.3.4.4. Making configuration changes . . . 122

5.3.5. tpm Commands . . . 123 5.3.5.1. tpm configure Command . . . 124 5.3.5.2. tpm diag Command . . . 124 5.3.5.3. tpm fetch Command . . . 124 5.3.5.4. tpm firewall Command . . . 124 5.3.5.5. tpm help Command . . . 124 5.3.5.6. tpm install Command . . . 125 5.3.5.7. tpm mysql Command . . . 125 5.3.5.8. tpm query Command . . . 125 5.3.5.9. tpm reset Command . . . 127 5.3.5.10. tpm reset-thl Command . . . 127 5.3.5.11. tpm restart Command . . . 128 5.3.5.12. tpm reverse Command . . . 128 5.3.5.13. tpm start Command . . . 128 5.3.5.14. tpm stop Command . . . 128 5.3.5.15. tpm update Command . . . 128 5.3.5.16. tpm validate Command . . . 129 5.3.5.17. tpm validate-update Command . . . 129 5.3.6. tpm Configuration Options . . . 129 5.3.7. Troubleshooting . . . 172

5.4. The trepctl Command . . . 172

5.4.1. trepctl Options . . . 173

5.4.2. trepctl Global Commands . . . 174

5.4.2.1. trepctl kill Command . . . 174

5.4.2.2. trepctl services Command . . . 174

(6)

5.4.2.4. trepctl version Command . . . 176

5.4.3. trepctl Service Commands . . . 176

5.4.3.1. trepctl backup Command . . . 177

5.4.3.2. trepctl capabilities Command . . . 178

5.4.3.3. trepctl check Command . . . 178

5.4.3.4. trepctl clear Command . . . 179

5.4.3.5. trepctl clients Command . . . 179

5.4.3.6. trepctl flush Command . . . 179

5.4.3.7. trepctl heartbeat Command . . . 180

5.4.3.8. trepctl load Command . . . 181

5.4.3.9. trepctl offline Command . . . 181

5.4.3.10. trepctl offline-deferred Command . . . 182

5.4.3.11. trepctl online Command . . . 183

5.4.3.12. trepctl properties Command . . . 185

5.4.3.13. trepctl purge Command . . . 186

5.4.3.14. trepctl reset Command . . . 187

5.4.3.15. trepctl restore Command . . . 187

5.4.3.16. trepctl setrole Command . . . 187

5.4.3.17. trepctl shard Command . . . 188

5.4.3.18. trepctl start Command . . . 189

5.4.3.19. trepctl status Command . . . 189

5.4.3.20. trepctl stop Command . . . 195

5.4.3.21. trepctl unload Command . . . 196

5.4.3.22. trepctl wait Command . . . 196

5.5. The multi_trepctl Command . . . 197

5.5.1. multi_trepctl Options . . . 197

5.5.2. multi_trepctl Commands . . . 199

5.5.2.1. multi_trepctl list Command . . . 199

5.5.2.2. multi_trepctl run Command . . . 200

5.6. The setupCDC.sh Command . . . 200

5.7. The tungsten_provision_slave Script . . . 204

5.8. The tungsten_read_master_events Script . . . 205

5.9. The tungsten_set_position Script . . . 205

5.10. The updateCDC.sh Command . . . 206

6. Using the Cookbook . . . 208

7. Replication Filters . . . 209

7.1. Enabling/Disabling Filters . . . 210

7.2. Enabling Additional Filters . . . 211

7.3. Filter Status . . . 211 7.4. Filter Reference . . . 212 7.4.1. BidiRemoteSlaveFilter . . . 214 7.4.2. BuildAuditTable . . . 214 7.4.3. BuildIndexTable . . . 214 7.4.4. CaseMappingFilter . . . 215 7.4.5. CDCMetadataFilter . . . 215 7.4.6. ColumnNameFilter . . . 215 7.4.7. ConsistencyCheckFilter . . . 217 7.4.8. DatabaseTransformFilter . . . 217 7.4.9. DummyFilter . . . 217 7.4.10. EnumToStringFilter . . . 218 7.4.11. EventMetadataFilter . . . 219 7.4.12. HeartbeatFilter . . . 219 7.4.13. LoggingFilter . . . 219 7.4.14. MySQLSessionSupportFilter . . . 220 7.4.15. OptimizeUpdatesFilter . . . 220 7.4.16. PrimaryKeyFilter . . . 220 7.4.17. PrintEventFilter . . . 221 7.4.18. RenameFilter . . . 221

7.4.18.1. Rename Filter Examples . . . 223

7.4.19. ReplicateColumnsFilter . . . 224 7.4.20. ReplicateFilter . . . 224 7.4.21. SetToStringFilter . . . 225 7.4.22. ShardFilter . . . 226 7.4.23. TimeDelayFilter . . . 227 7.5. JavaScript Filters . . . 227

7.5.1. Writing JavaScript Filters . . . 228

(7)

7.5.1.2. Getting Configuration Parameters . . . 229

7.5.1.3. Logging Information and Exceptions . . . 229

7.5.1.4. Exposed Data Structures . . . 230

7.5.2. JavaScript Filter Reference . . . 235

7.5.2.1. ansiquotes.js Filter . . . 235 7.5.2.2. breadcrumbs.js Filter . . . 236 7.5.2.3. dbrename.js Filter . . . 237 7.5.2.4. dbselector.js Filter . . . 238 7.5.2.5. dbupper.js Filter . . . 238 7.5.2.6. dropcolumn.js Filter . . . 239 7.5.2.7. dropcomments.js Filter . . . 240 7.5.2.8. dropmetadata.js Filter . . . 240 7.5.2.9. dropstatementdata.js Filter . . . 241 7.5.2.10. foreignkeychecks.js Filter . . . 241 7.5.2.11. insertsonly.js Filter . . . 242 7.5.2.12. nocreatedbifnotexists.js Filter . . . 242 7.5.2.13. noonlykeywords.js Filter . . . 243 7.5.2.14. pgddl.js Filter . . . 243 7.5.2.15. shardbyseqno.js Filter . . . 244 7.5.2.16. shardbytable.js Filter . . . 244 7.5.2.17. tosingledb.js Filter . . . 245 7.5.2.18. truncatetext.js Filter . . . 245 7.5.2.19. zerodate2null.js Filter . . . 246

8. Performance and Tuning . . . 247

8.1. Block Commit . . . 247

8.1.1. Monitoring Block Commit Status . . . 247

9. Configuration Files and Format . . . 249

A. Troubleshooting . . . 250

A.1. Contacting Support . . . 250

A.2. Error/Cause/Solution . . . 251

A.2.1. Too many open processes or files . . . 251

A.2.2. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements cor-rectly on the slave. . . . 251

A.2.3. Unable to update the configuration of an installed directory . . . 252

A.3. Known Issues . . . 252

A.3.1. Triggers . . . 252

A.4. Troubleshooting Timeouts . . . 252

A.5. Troubleshooting Backups . . . 252

A.6. Running Out of Diskspace . . . 252

A.7. Troubleshooting Data Differences . . . 253

A.8. Comparing Table Data . . . 253

A.9. Troubleshooting Memory Usage . . . 253

B. Release Notes . . . 254

B.1. Tungsten Replicator 2.2.0 GA (23 December 2013) . . . 254

C. Prerequisites . . . 259

C.1. Staging Host Configuration . . . 259

C.2. Host Configuration . . . 260

C.2.1. Creating the User Environment . . . 260

C.2.2. Configuring Network and SSH Environment . . . 261

C.2.2.1. Network Ports . . . 262

C.2.2.2. SSH Configuration . . . 262

C.2.3. Directory Locations and Configuration . . . 263

C.2.4. Configure Software . . . 263

C.2.5. sudo Configuration . . . 264

C.3. MySQL Database Setup . . . 264

C.3.1. MySQL Configuration . . . 264

C.3.2. MySQL User Configuration . . . 267

C.4. Oracle Database Setup . . . 267

C.5. PostgreSQL Database Setup . . . 267

D. Terminology Reference . . . 268

D.1. Transaction History Log (THL) . . . 268

D.1.1. THL Format . . . 268

D.2. Generated Field Reference . . . 271

D.2.1. Terminology: Fields activeConnectionsCount . . . 271

D.2.2. Terminology: Fields alertMessage . . . 271

D.2.3. Terminology: Fields alertStatus . . . 272

(8)

D.2.5. Terminology: Fields appliedLastEventId . . . 272

D.2.6. Terminology: Fields appliedLastSeqno . . . 272

D.2.7. Terminology: Fields appliedLatency . . . 272

D.2.8. Terminology: Fields callableStatementsCreatedCount . . . 272

D.2.9. Terminology: Fields channels . . . 272

D.2.10. Terminology: Fields clusterName . . . 272

D.2.11. Terminology: Fields connectionsCreatedCount . . . 272

D.2.12. Terminology: Fields currentEventId . . . 272

D.2.13. Terminology: Fields currentTimeMillis . . . 272

D.2.14. Terminology: Fields dataServerHost . . . 273

D.2.15. Terminology: Fields dataServiceName . . . 273

D.2.16. Terminology: Fields driver . . . 273

D.2.17. Terminology: Fields extractCount . . . 273

D.2.18. Terminology: Fields extensions . . . 273

D.2.19. Terminology: Fields extractTime . . . 273

D.2.20. Terminology: Fields highWater . . . 273

D.2.21. Terminology: Fields host . . . 273

D.2.22. Terminology: Fields isAvailable . . . 273

D.2.23. Terminology: Fields isComposite . . . 273

D.2.24. Terminology: Fields lastCommittedBlockSize . . . 273

D.2.25. Terminology: Fields lastCommittedBlockTime . . . 273

D.2.26. Terminology: Fields lastError . . . 273

D.2.27. Terminology: Fields lastShunReason . . . 273

D.2.28. Terminology: Fields latestEpochNumber . . . 273

D.2.29. Terminology: Fields masterConnectUri . . . 273

D.2.30. Terminology: Fields masterListenUri . . . 274

D.2.31. Terminology: Fields maximumStoredSeqNo . . . 274

D.2.32. Terminology: Fields minimumStoredSeqNo . . . 274

D.2.33. Terminology: Fields name . . . 274

D.2.34. Terminology: Fields offlineRequests . . . 274

D.2.35. Terminology: Fields pendingError . . . 274

D.2.36. Terminology: Fields pendingErrorCode . . . 274

D.2.37. Terminology: Fields pendingErrorEventId . . . 274

D.2.38. Terminology: Fields pendingErrorSeqno . . . 274

D.2.39. Terminology: Fields pendingExceptionMessage . . . 274

D.2.40. Terminology: Fields pipelineSource . . . 274

D.2.41. Terminology: Fields precedence . . . 274

D.2.42. Terminology: Fields preparedStatementsCreatedCount . . . 275

D.2.43. Terminology: Fields relativeLatency . . . 275

D.2.44. Terminology: Fields resourcePrecedence . . . 275

D.2.45. Terminology: Fields rmiPort . . . 275

D.2.46. Terminology: Fields role . . . 275

D.2.47. Terminology: Fields seqnoType . . . 275

D.2.48. Terminology: Fields sequence . . . 275

D.2.49. Terminology: Fields serviceName . . . 275

D.2.50. Terminology: Fields serviceType . . . 275

D.2.51. Terminology: Fields simpleServiceName . . . 275

D.2.52. Terminology: Fields siteName . . . 275

D.2.53. Terminology: Fields sourceId . . . 275

D.2.54. Terminology: Fields state . . . 275

D.2.55. Terminology: Fields statementsCreatedCount . . . 275

D.2.56. Terminology: Fields timeInStateSeconds . . . 275

D.2.57. Terminology: Fields transitioningTo . . . 275

D.2.58. Terminology: Fields uptimeSeconds . . . 276

D.2.59. Terminology: Fields url . . . 276

D.2.60. Terminology: Fields vendor . . . 276

D.2.61. Terminology: Fields version . . . 276

D.2.62. Terminology: Fields vipAddress . . . 276

D.2.63. Terminology: Fields vipInterface . . . 276

D.2.64. Terminology: Fields vipIsBound . . . 276

E. Files and Directories . . . 277

E.1. The Tungsten Replicator Install Directory . . . 277

E.1.1. The backups Directory . . . 277

E.1.1.1. Purging Backup Files . . . 277

E.1.1.2. Copying Backup Files . . . 278

E.1.1.3. Relocating Backup Storage . . . 278

(9)

E.1.3. The releases Directory . . . 279

E.1.4. The service_logs Directory . . . 279

E.1.5. The share Directory . . . 279

E.1.6. The thl Directory . . . 279

E.1.6.1. Purging THL Log Information . . . 280

E.1.6.2. Moving the THL File Location . . . 281

E.1.6.3. Changing the THL Retention Times . . . 282

F. Internals . . . 283

F.1. Extending Backup and Restore Behavior . . . 283

F.1.1. Backup Behavior . . . 283

F.1.2. Restore Behavior . . . 283

F.1.3. Writing a Custom Backup/Restore Script . . . 283

F.1.4. Enabling a Custom Backup Script . . . 285

F.2. Memory Tuning and Performance . . . 285

G. Frequently Asked Questions (FAQ) . . . 286

H. Ecosystem Support . . . 287

H.1. Managing Log Files with logrotate . . . 287

(10)

2.1. Topologies: Component Architecture . . . 16

2.2. Topologies: Master/Slave . . . 21

2.3. Topologies: Multiple-masters . . . 24

2.4. Topologies: Fan-in . . . 29

2.5. Topologies: Star . . . 32

2.6. Topologies: MySQL to Oracle . . . 37

2.7. Topologies: Oracle to MySQL . . . 37

2.8. Topologies: Oracle to Oracle . . . 38

2.9. Topologies: MySQL to MongoDB . . . 46

2.10. Topologies: MySQL to Amazon RDS . . . 52

2.11. Topologies: MySQL to Vertica . . . 57

2.12. Topologies: MySQL to Infobright . . . 63

2.13. Topologies: MySQL to InfiniDB . . . 64

2.14. Topologies: MySQL to PostgreSQL . . . 65

2.15. Topologies: Replicating into a Dataservice . . . 67

5.1. tpm Staging Based Deployment . . . 116

5.2. tpm INI Based Deployment . . . 117

7.1. Filters: Pipeline Stages on Masters . . . 209

7.2. Filters: Pipeline Stages on Slaves . . . 210

C.1. Tungsten Deployment . . . 259

(11)

2.1. Key Terminology . . . 17

2.2. Data Type differences when replicating data from MySQL to Oracle . . . 38

2.3. Data Type Differences when Replicating from Oracle to MySQL or Oracle . . . 39

2.4. setupCDC.conf Configuration File Parameters . . . 43

4.1. Node States . . . 98 5.1. thl Options . . . 111 5.2. TPM Deployment Methods . . . 117 5.3. tpm Common Options . . . 119 5.4. tpm Core Options . . . 123 5.5. tpm Commands . . . 123 5.6. tpm Configuration Options . . . 130

5.7. trepctl Command-line Options . . . 173

5.8. trepctl Replicator Wide Commands . . . 174

5.9. trepctl Service Commands . . . 176

5.10. trepctl backup Command Options . . . 177

5.11. trepctl clients Command Options . . . 179

5.12. trepctl offline-deferred Command Options . . . 182

5.13. trepctl online Command Options . . . 183

5.14. trepctl purge Command Options . . . 186

5.15. trepctl reset Command Options . . . 187

5.16. trepctl setrole Command Options . . . 187

5.17. trepctl shard Command Options . . . 188

5.18. trepctl status Command Options . . . 189

5.19. trepctl wait Command Options . . . 196

5.20. multi_trepctl Command-line Options . . . 197

5.21. multi_trepctl--output Option . . . 198

5.22. multi_trepctl Commands . . . 199

5.23. setupCDC.sh Configuration Options . . . 200

5.24. tungsten_provision_slave Command-line Options . . . 204

5.25. tungsten_read_master_events Command-line Options . . . 205

5.26. tungsten_set_position Command-line Options . . . 206

D.1. THL Event Format . . . 269

(12)

1. Legal Notice

The trademarks, logos, and service marks in this Document are the property of Continuent or other third parties. You are not permit-ted to use these Marks without the prior written consent of Continuent or such appropriate third party. Continuent, Tungsten, uni/clus-ter, m/clusuni/clus-ter, p/clusuni/clus-ter, uc/connector, and the Continuent logo are trademarks or registered trademarks of Continuent in the United States, France, Finland and other countries.

All Materials on this Document are (and shall continue to be) owned exclusively by Continuent or other respective third party own-ers and are protected under applicable copyrights, patents, trademarks, trade dress and/or other proprietary rights. Under no circum-stances will you acquire any ownership rights or other interest in any Materials by or through your access or use of the Materials. All right, title and interest not expressly granted is reserved to Continuent.

2. Conventions

This documentation uses a number of text and style conventions to indicate and differentiate between different types of information: • Text in this style is used to show an important element or piece of information. It may be used and combined with other text styles as

appropriate to the context.

• Text in this style is used to show a section heading, table heading, or particularly important emphasis of some kind.

• Program or configuration options are formatted using this style. Options are also automatically linked to their respective

documen-tation page when this is known. For example, tpm --hosts links automatically to the corresponding reference page. • Parameters or information explicitly used to set values to commands or options is formatted using this style.

• Option values, for example on the command-line are marked up using this format: --help. Where possible, all option values are

direct-ly linked to the reference information for that option.

• Commands, including sub-commands to a command-line tool are formatted using Text in this style. Commands are also automatically linked to their respective documentation page when this is known. For example, tpm links automatically to the corresponding refer-ence page.

• Text in this style indicates literal or character sequence text used to show a specific value.

• Filenames, directories or paths are shown like this /etc/passwd. Filenames and paths are automatically linked to the corresponding

ref-erence page if available.

Bulleted lists are used to show lists, or detailed information for a list of items. Where this information is optional, a magnifying glass symbol enables you to expand, or collapse, the detailed instructions.

Code listings are used to show sample programs, code, configuration files and other elements. These can include both user input and re-placeable values:

shell> cd /opt/staging

shell> unzip tungsten-replicator-2.2.0-288.zip

In the above example command-lines to be entered into a shell are prefixed using shell. This shell is typically sh, ksh, or bash on Linux

and Unix platforms, or Cmd.exe or PowerShell on Windows.

If commands are to be executed using administrator privileges, each line will be prefixed with root-shell, for example:

root-shell> vi /etc/passwd

To make the selection of text easier for copy/pasting, ignorable text, such as shell> are ignored during selection. This allows multi-line

instructions to be copied without modification, for example:

mysql> create database test_selection;

mysql> drop database test_selection;

Lines prefixed with mysql> should be entered within the mysql command-line.

If a command-line or program listing entry contains lines that are two wide to be displayed within the documentation, they are marked using the » character:

(13)

the first line has been extended by using a » continuation line

They should be adjusted to be entered on a single line.

Text marked up with this style is information that is entered by the user (as opposed to generated by the system). Text formatted

us-ing this style should be replaced with the appropriate file, version number or other variable information according to the operation

being performed.

In the HTML versions of the manual, blocks or examples that can be userinput can be easily copied from the program listing. Where there are multiple entries or steps, use the 'Show copy-friendly text' link at the end of each section. This provides a copy of all the user-enterable text.

(14)

1.1. Tungsten Replicator

Tungsten Replicator is an open source high performance replication engine that works with a number of different source and target databases to provide high-performance and improved replication functionality over the native solution. With MySQL replication, for ex-ample, the enhanced functionality and information provided by Tungsten Replicator allows for global transaction IDs, advanced topolo-gy support such as multi-master, star, and fan-in, and enhanced latency identification.

In addition to providing enhanced functionality Tungsten Replicator is also capable of heterogeneous replication by enabling the repli-cated information to be transformed after it has been read from the data server to match the functionality or structure in the target server. This functionality allows for replication between MySQL, Oracle, PostgreSQL, MongoDB and Vertica, among others.

Understanding the Tungsten Replicator works requires looking at the overall replicator structure. In the diagram below is the top-level overview of the structure of a replication service.

At this level, there are three major components in the system that provide the core of the replication functionality: • Extractor

The extractor component reads data from a data server, such as MySQL or Oracle, and writes that information into the Transaction History Log (THL). The role of the extractor is to read the information from a suitable source of change information and write it into the THL in the native ro devined format, either as SQL statements or row-based information.

For example, within MySQL, information is read directly from the binary log that MySQL produces for native replication; in Oracle, the Change Data Capture (CDC) information is used as the information source.

• Applier

Appliers within Tungsten Replicator convert the THL information and apply it to a destination data server. The role of the applier is to read the THL information and apply that to the data server.

The applier works a number of different target databases, and is responsible for writing the information to the database. Because the transactional data in the THL is stored either as SQL statements or row-based information, the applier has the flexibility to reformat the information to match the target data server. Row-based data can be reconstructed to match different database formats, for ex-ample, converting row-based information into an Oracle-specific table row, or a MongoDB document.

• Transaction History Log (THL)

The THL contains the information extracted from a data server. Information within the THL is divided up by transactions, either im-plied or explicit, based on the data extracted from the data server. The THL structure, format, and content provides a significant pro-portion of the functionality and operational flexibility within Tungsten Replicator.

As the THL data is stored additional information, such as the metadata and options in place when the statement or row data was ex-tracted are recorded. Each transaction is also recorded with an incremental global transaction ID. This ID enables individual transac-tions within the THL to be identified, for example to retrieve their content, or to determine whether different appliers within a repli-cation topology have written a specific transaction to a data server.

These components will be examined in more detail as different aspects of the system are described with respect to the different sys-tems, features, and functionality that each system provides.

From this basic overview and structure of Tungsten Replicator, the replicator allows for a number of different topologies and solutions that replicate information between different services. Straightforward replication topologies, such as master/slave are easy to under-stand with the basic concepts described above. More complex topologies use the same core components. For example, multi-master topologies make use of the global transaction ID to prevent the same statement or row data being applied to a data server multiple times. Fan-in topologies allow the data from multiple data servers to be combined into one data server.

1.1.1. Extractor

Extractors exist for reading information from the following sources: • MySQL

• Oracle • PostgreSQL

(15)

1.1.2. Appliers

The replicator commits transactions using block commit meaning it only commits on x transactions. This imporves performance but when using a non-transactional engine it can cause the problems you have seen. By default this is set to 10 (The value is

replicator.global.buffer.size in replicator.properties). It is possible to set this to 1 which will remove the problem with MyISAM tables but it will impact the performance of the replicators

Available appliers include: • MongoDB

• MySQL • Oracle • PostgreSQL • Vertica

1.1.3. Transaction History Log (THL)

Tungsten Replicator operates by reading information from the source database (MySQL, PostgreSQL, Oracle) and transferring that in-formation to the Tungsten History Log (THL).

Each transaction within the THL includes the SQL statement or the row-based data written to the database. The information also in-cludes where possible transaction specific option and metadata, such as character set data, SQL modes and other information that may affect how the information is written when the data is applied. The combination of the metadata and the global transaction ID also en-able more complex data replication scenarios to be supported, such as multi-master, without fear of duplicating statement or row data application becuase the source and global transaction ID can be compared.

In addition to all this information, the THL also includes a timestamp and a record of when the information was written into the data-base before the change was extracted. Using a combination of the global transaction ID and this timing information provides informa-tion on the latency and how up to date an a dataserver is compared to the original datasource.

Depending on the underlying storage of the data, the information can be reformatted and applied to different data servers. When deal-ing with row-based data, this can be applied to a different type of data server, or completely reformatted and applied to non-table based services such as MongoDB.

THL information is stored for each replicator service, and can also be exchanged over the network between different replicator in-stances. This enables transaction data to be exchanged between different hosts within the same network or across wide-area-networks.

1.1.4. Filtering

(16)

Tungsten Replicator creates a unique replication interface between two databases. Because Tungsten Replicator is independent of the dataserver it affords a number of different advantages, including more flexible replication strategies, filtering, and easier control to pause, restart, and skip statements between hosts.

Replication is supported from, and to, different dataservers using different technologies through a series of extractor and applier com-ponents which independently read data from, and write data to, the dataservers in question. A basic overview is provided in Figure 2.1, “Topologies: Component Architecture”.

Figure 2.1. Topologies: Component Architecture

Although connectivity is possible through these different combinations, only certain combinations are officially certified, as shown in this table. Replication sources are shown in the first column; replication destinations are shown in the remaining columns for each dataserver type

Source/Destination MySQL PostgreSQL Oracle Amazon RDS

Vertica InfiniDB Infobright MongoDB

(17)

Source/Destination MySQL PostgreSQL Oracle Amazon RDS

Vertica InfiniDB Infobright MongoDB

PostgreSQL Yes Yes Yes - - - -

-Oracle Yes Yes Yes - - - -

-Amazon RDS - - -

-Vertica - - -

-InfiniDB - - -

-Infobright - - -

-MongoDB - - -

-Different deployments also support different topologies according to the available extractor and applier mechanisms that are available. For example, MySQL replication supports master slave, multi-master, star, fan-in, and multi-site/multi-master solutions for widescale de-ployments.

Database/Topology Master/Slave Multi-master Fan-In Star Multi-Site Multi-Site/Multi-Master

MySQL Yes Yes Yes Yes Yes

PostgreSQL Yes - - -Oracle Yes - - -Amazon RDS Yes - - -Vertica Yes - - -InfiniDB Yes - - -Infobright Yes - - -MongoDB Yes - -

-The replication process is made possible by reading the binary log on each host. -The information from the binary log is written into the Tungsten Replicator Transaction History Log (THL), and the THL is then transferred between hosts and then applied to each slave host. More information can be found in Chapter 1, Introduction.

Before covering the basics of creating different dataservices, there are some key terms that will be used throughout the setup and in-stallation process that identify different components of the system. these are summarised in Table 2.1, “Key Terminology”.

Table 2.1. Key Terminology

Tungsten Term Traditional Term Description

dataserver Database The database on a host. Datasources include MySQL, PostgreSQL or Oracle. datasource Host or Node One member of a dataservice and the associated Tungsten components. staging host - The machine (and directory) from which Tungsten Replicator is installed and

configured. The machine does not need to be the same as any of the existing hosts in the cluster.

staging directory - The directory where the installation files are located and the installer is execut-ed. Further configuration and updates must be performed from this directory. Before attempting installation, there are a number of prerequisite tasks which must be completed to set up your hosts, database, and Tungsten Replicator service:

1. Setup a staging host from which you will configure and manage your installation. 2. Configure each host that will be used within your dataservice.

3. Depending on the database or environment you are using, you may need to perform additional configuration steps for the dataserver:

• Configure your MySQL installation, so that Tungsten Replicator can work with the database. • Configure your Oracle installation, so that Tungsten Replicator can work with the database. • Configure your PostgreSQL installation, so that Tungsten Replicator can work with the database.

The following sections provide guidance and instructions for creating a number of different deployment scenarios using Tungsten Repli-cator.

(18)

2.1. Requirements

2.1.1. Operating Systems Support

Operating

Sys-tem

Variant Status Notes Linux RedHat/CentOS Primary

plat-form

RHEL 4 and 5 as well as CentOS 5.x versions are fully supported. Linux Ubuntu Primary

plat-form

Ubuntu 9.x/10.x versions are fully supported. Linux Debian/Suse/Other Secondary

Platform

Other Linux platforms are supported but are not regularly tested. We will fix any bugs reported by customers.

Solaris Secondary

Platform

Solaris 10 is fully supported. OpenSolaris is not supported at this time.

Mac OS X Secondary

platform

Mac OS/X Leopard and Snow Leopard are used for development at Continuent but not certified. We will fix any bugs reported by customers.

Windows Limited

Sup-port

Tungsten 1.3 and above will support Windows platforms for connectivity (sten Connector and SQL Router) but may require manual configuration. Tung-sten clusters do not run on Windows.

BSD Limited

Sup-port

Tungsten 1.3 and above will support BSD for connectivity (Tungsten Connector and SQL Router) but may require manual configuration. Tungsten clusters do not run on BSD.

2.1.2. Database Support

Database Version Support Status Notes MySQL 5.0, 5.1, 5.5, 5.6 Primary

plat-form

Statement and row based replication is supported. MyISAM and InnoDB table types are fully supported; InnoDB tables are recommended.

Percona 5.5, 5.6 Primary plat-form MariaDB 5.5 Primary

plat-form Oracle 10g Release 2

(10.2.0.5), 11g

Primary Plat-form

Synchronous CDC is supported on Standard Edition only; Synchronous and Asyn-chronous are supported on Eneterprise Editions

PostgreSQL 8.2, 8.3, 8.4, 9.0 Primary plat-form

Warm standby clustering is supported for PostgreSQL 8.2-8.4. PostgreSQL 9 Streaming Replication is supported.

Drizzle Secondary

Platform

Experimental support for Drizzle is available. Drizzle replication is not tested.

2.1.3. RAM Requirements

RAM requirements are dependent on the workload being used and applied, but the following provide some guidance on the basic RAM requirements:

• Tungsten Replicator requires 2GB of VM space for the Java execution, including the shared libraries, with approximate 1GB of Java VM heapspace. This can be adjusted as required, for example, to handle larger transactions or bigger commit blocks and large pack-ets.

Performance can be improved within the Tungsten Replicator if there is a 2-3GB available in the OS Page Cache. Replicators work best when pages written to replicator log files remain memory-resident for a period of time, so that there is no file system I/O required to read that data back within the replicator. This is the biggest potential point of contention between replicators and DBMS servers.

2.1.4. Disk Requirements

Disk space usage is based on the space used by the core application, the staging directory used for installation, and the space used for the THL files:

• The staging directory containing the core installation is approximately 150MB. When performing a staging-directory based installa-tion, this space requirement will be used once. When using a INI-file based deployment, this space will be required on each server. For more information on the different methods, see Section 5.3.1, “Comparing Staging and INI tpm Methods”.

(19)

• Deployment of a live installation also requires approximately 150MB.

• The THL files required for installation are based on the size of the binary logs generated by MySQL. THL size is typically twice the size of the binary log. This space will be required on each machine in the cluster. The retention times and rotation of THL data can be con-trolled, see Section E.1.6, “The thl Directory” for more information, including how to change the retention time and move files during

operation.

When replicating from Oracle, the size of the THL will depend on the quantity of Change Data Capture (CDC) information generated. This can be managed by altering the intervals used to check for and extract the information.

Because the replicator reads and writes information using buffered I/O in a serial fashion, spinning disk and Network Attached Storage (NAS) is suitable for storing THL, as there is no random-access or seeking.

2.1.5. Java Requirements

Tungsten Replicator is known to work with Java 1.6. and Java 1.7 and using the following JVMs:

2.1.6. Cloud Deployment Requirements

Cloud deployments require a different ste of considerations over and above the general requirements. The following is a guide only, and where specific cloud environment requirements are known, they are explicitly included:

Instance Types/Configuration

Attribute Guidance Amazon Example

Instance Type Instance sizes and types are dependent on the workload. m1.xlarge or better

Instance Boot Volume Use block, not ephemeral storage. EBS Instance Deployment Use standard Linux distributions and bases. For ease of

deploy-ment and configuration, use Puppet.

Amazon Linux AMIs

Development/QA nodes should always match the expected production environment. AWS/EC2 Deployments

• Use Virtual Private Cloud (VPC) deployments, as these provide consistent IP address support.

• Multiple EBS-optimized volumes for data, using Provisioned IOPS for the EBS volumes depending on workload: Parameter tpm Option tpm Value MySQL my.cnf Option MySQL Value

/ (root)

MySQL Data datasource-mysql-data-di-rectory [143]

/volumes/mysql/data datadir /volumes/mysql/data

MySQL Binary Logs

datasource-log-directo-ry [142]

/volumes/mysql/binlogs log-bin /volumes/mysql/bin-logs/mysql-bin

Transaction History Logs (THL)

thl-directory [170] /volumes/mysql/thl

Recommended Replication Formats

• MIXED is recommended for MySQL master/slave topologies (e.g., either single clusters or primary/data-recovery setups).

• ROW is strongly recommended for multi-master setups. Without ROW, data drift is a possible problem when using MIXED or STATEMENT.

Even with ROW there are still cases where drift is possible but the window is far smaller.

• ROW is required for heterogenous replication.

2.2. Deployment Sources

Tungsten Replicator is available in a number of different distribution types, and the methods for configuration available for these differ-ent packages differs.

Deployment Type/Package TAR/GZip RPM/DEB

tpm Command-line Configuration Yes Yes

(20)

Deployment Type/Package TAR/GZip RPM/DEB

Deploy Entire Cluster Yes No

Deploy Per Machine No Yes

Two primary deployment sources are available: • Tar/GZip

Using the TAR/GZip package creates a local directory that enables you to perform installs and updates from the extracted 'staging' di-rectory, or use the INI file format.

• RPM/DEB Packages

Using the RPM/DEB package format is more suited to using the INI file format, as hosts can be installed and upgraded to the latest RPM/DEB package independently of each other.

All packages are named according to the product, version number, build release and extension. For example:

tungsten-replicator-2.2.0-288.tar.gz

The version number is 2.2.0 and build number 288. Build numbers indicate which build a particular release version is based on, and may

be useful when installing patches provided by support.

2.2.1. Using the TAR/GZipped files

To use the TAR/GZipped packages, download the files to your machine and unpack them:

shell> tar zxf tungsten-replicator-2.2.0-288.tar.gz

This will create a directory matching the downloaded package name, version, and build number from which you can perform an install using either the INI file or command-line configuration. To use, you will need to use the tpm command within the tools directory of the

extracted package:

shell> cd tungsten-replicator-2.2.0-288

Before completing configuration, you must have completed all the pre-requisite steps described in Appendix C, Prerequisites.

2.2.2. Using the RPM and DEB package files

The RPM and DEB packages can be used for installation, but are primarily designed to be in combination with the INI configuration file. Installation

Installing the RPM or DEB package will do the following: 1. Create the tungsten system user if it doesn't exist

2. Make the tungsten system user part of the mysql group if it exists

3. Create the /opt/continuent/software directory

4. Unpack the software into /opt/continuent/software

5. Define the $CONTINUENT_PROFILES and $REPLICATOR_PROFILES environment variables

6. Update the profile script to include the /opt/continuent/share/env.sh script

7. Create the /etc/tungsten directory

8. Run tpm install if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exist

Although the RPM/DEB packages complete a number of the pre-requisite steps required to configure your cluster, there are additional steps, such as configuring ssh, that you still need to complete. For more information, see Appendix C, Prerequisites.

By using the package files you are able to setup a new server by creating the /etc/tungsten.ini file and then installing the package. Any

output from the tpm command will go to /opt/continuent/service_logs/rpm.output.

To obtain the package files, you can use one of the following methods: • Download from an existing download page

• For yum platforms (RHEL/CentOS/Amazon Linux), add the package source to your yum configuration. For the current stable (GA) re-lease packages:

(21)

For nightly builds:

root-shell> rpm -i http://releases.continuent.com.s3.amazonaws.com/replicator-release-nightly-0.0-1.x86_64.rpm

• For Ubuntu/Debian packages:

root-shell> echo "deb http://apt.tungsten-replicator.org/ stable main" \ >/etc/apt/sources.list.d/tungsten_stable.list

Nightly builds are also available:

root-shell> echo "deb http://apt-nightly.tungsten-replicator.org/ nightly main" \ >/etc/apt/sources.list.d/tungsten_nightly.list

Then update your apt repository:

root-shell> apt-get update

Once an INI file has been created and the packages are available, the installation can be completed using: • On RHEL/CentOS/Amazon Linux:

root-shell> yum install tungsten-replicator

• On Ubuntu/Debian:

root-shell> apt-get install tungsten-replicator

For more information, see Section 5.3.4, “tpm INI File Configuration”. Upgrades

If you upgrade to a new version of the RPM or DEB package it will do the following: 1. Unpack the software into /opt/continuent/software

2. Run tpm update if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exist

The tpm update will restart all Continuent Tungsten services so you do not need to do anything after upgrading the package file.

2.3. Deploying a Master/Slave Topology

Master/slave is the simplest and most straightforward of all replication scenarios, and also the basis of all other types of topology. The fundamental basis for the master/slave topology is that changes in the master are distributed and applied to the each of the configured slaves.

(22)

tpm includes a specific topology structure for the basic master/slave configuration, using the list of hosts and the master host definition to define the master/slave relationship. Before starting the installation, the prerequisites must have been completed (see Appendix C, Prerequisites). To create a master/slave using tpm:

shell> ./tools/tpm install alpha\ --topology=master-slave \ --master=host1 \ --replication-user=tungsten \ --replication-password=password \ --home-directory=/opt/continuent \ --members=host1,host2,host3 \ --start

The description of each of the options is shown below; click the icon to hide this detail: Click the icon to show a detailed description of each argument.

• tpm install

Executes tpm in install mode to create the service alpha.

• --master=host1 [154]

Specifies which host will be the master. • --replication-user=tungsten [163]

The user name that will be used to apply replication changes to the database on slaves. • --replication-password=password [163]

The password that will be used to apply replication changes to the database on slaves. • --home-directory=/opt/continuent [152]

Directory where Tungsten Replicator will be installed. • --members=host1,host2,host3 [155]

List of all the hosts within the cluster, including the master host. Hosts in this list that do not appear in the --master [154] option

will be configured as slaves. • --start [166]

Starts the service once installation is complete.

If the MySQL configuration file cannot be located, the --datasource-mysql-conf [143] option can be used to specify it's location:

shell> ./tools/tpm install alpha\ --topology=master-slave \ --master=host1 \ --replication-user=tungsten \ --replication-password=password \ --datasource-mysql-conf=/etc/mysql/my.cnf \ --home-directory=/opt/continuent \ --members=host1,host2,host3 \ --start

Once the installation has been completed, the service will be started and ready to use. For information on checking the running service, see Section 2.3.1, “Monitoring a Master/Slave Dataservice”.

For information on starting and stopping Tungsten Replicator see Section 2.17, “Starting and Stopping Tungsten Replicator”; configur-ing init scripts to startup and shutdown when the system boots and shuts down, see Section 2.18, “Configurconfigur-ing Startup on Boot”.

2.3.1. Monitoring a Master/Slave Dataservice

Once the service has been started, a quick view of the service status can be determined using trepctl:

shell> trepctl services

Processing services command... NAME VALUE ---- ---appliedLastSeqno: 3593 appliedLatency : 1.074 role : master serviceName : alpha serviceType : local started : true

(23)

state : ONLINE Finished services command...

The key fields are:

• appliedLastSeqno and appliedLatency indicate the global transaction ID and latency of the host. These are important when monitoring

the status of the cluster to determine how up to date a host is and whether a specific transaction has been applied. • role indicates the current role of the host within the scope of this dataservice.

• state shows the current status of the host within the scope of this dataservice.

More detailed status information can also be obtained. On the master:

shell> trepctl status

Processing status command... NAME VALUE ---- ---appliedLastEventId : mysql-bin.000009:0000000000001033;0 appliedLastSeqno : 3593 appliedLatency : 1.074 channels : 1 clusterName : default currentEventId : mysql-bin.000009:0000000000001033 currentTimeMillis : 1373615598598 dataServerHost : host1 extensions : latestEpochNumber : 3589 masterConnectUri : masterListenUri : thl://host1:2112/ maximumStoredSeqNo : 3593 minimumStoredSeqNo : 0 offlineRequests : NONE pendingError : NONE pendingErrorCode : NONE pendingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: NONE pipelineSource : jdbc:mysql:thin://host1:3306/ relativeLatency : 604904.598 resourcePrecedence : 99 rmiPort : 10000 role : master seqnoType : java.lang.Long serviceName : alpha serviceType : local simpleServiceName : alpha siteName : default sourceId : host1 state : ONLINE timeInStateSeconds : 604903.621 transitioningTo : uptimeSeconds : 1202137.328

version : Tungsten Replicator 2.2.0 build 288 Finished status command...

Checking a remote slave:

shell> trepctl -host host2 status

Processing status command... NAME VALUE ---- ---appliedLastEventId : mysql-bin.000009:0000000000001033;0 appliedLastSeqno : 3593 appliedLatency : 605002.401 channels : 5 clusterName : default currentEventId : NONE currentTimeMillis : 1373615698912 dataServerHost : host2 extensions : latestEpochNumber : 3589 masterConnectUri : thl://host1:2112/ masterListenUri : thl://host2:2112/ maximumStoredSeqNo : 3593 minimumStoredSeqNo : 0 offlineRequests : NONE pendingError : NONE pendingErrorCode : NONE pendingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: NONE pipelineSource : thl://host1:2112/

(24)

relativeLatency : 605004.912 resourcePrecedence : 99 rmiPort : 10000 role : slave seqnoType : java.lang.Long serviceName : alpha serviceType : local simpleServiceName : alpha siteName : default sourceId : host2 state : ONLINE timeInStateSeconds : 2.944 transitioningTo : uptimeSeconds : 1202243.752

For more information on using trepctl, see Section 5.4, “The trepctl Command”.

Definitions of the individual field descriptions in the above example output can be found in Section D.2, “Generated Field Reference”. For more information on management and operational detailed for managing your cluster installation, see Chapter 4, Operations Guide.

2.4. Deploying a Multi-master Topology

When configuring a multi-master topology, tpm automatically creates a number of individual services that are used to define a mas-ter/slave topology between each group of hosts. In a three-node multimaster setup, three different services are created, each service creates a master/slave relationship between a primary host and the slaves. A change on any individual host will be replicated to the oth-er databases in the topology creating the multi-mastoth-er configuration.

For example, with three hosts, HostA, HostB, and HostC, three separate configurations are created: • HostA is the master, and HostB and HostC are slaves of HostA (Service Alpha, yellow)

• HostB is the master, and HostA and HostC are slaves of HostB (Service Beta, green) • HostC is the master, and HostA and HostB are slaves of HostC (Service Gamma, red)

Figure 2.3, “Topologies: Multiple-masters” shows the structure of the configuration replication.

(25)

These three individual services, one for each host and two slave scenario, effrectively create a multi-master topology, since a change on any single master will be replicated to the slaves.

Some considerations must be taken into account for any multi-master scenario:

• For tables that use auto-increment, collisions are possible if two hosts select the same auto-increment number. You can reduce the ef-fects by configuring each MySQL host with a different auto-increment settings, changing the offset and the increment values. For ex-ample, adding the following lines to your my.cnf file:

auto-increment-offset = 1 auto-increment-increment = 4

In this way, the increments can be staggered on each machine and collisions are unlikely to occur.

• Use row-based replication. Statement-based replication will work in many instances, but if you are using inline calculations within your statements, for example, extending strings, or calculating new values based on existing column data, statement-based replication may lead to significant data drift from the original values as the calculation is computed individually on each master. Update your con-figuration file to explicitly use row-based replication by adding the following to your my.cnf file:

binlog-format = row

• Beware of triggers. Triggers can cause problems during replication because if they are applied on the slave as well as the master you can get data corruption and invalid data. Tungsten Replicator cannot prevent triggers from executing on a slave, and in a multi-mas-ter topology there is no sensible way to disable triggers. Instead, check at the trigger level whether you are executing on a masmulti-mas-ter or slave. For more information, see Section A.3.1, “Triggers”.

• Ensure that the server-id for each MySQL configuration has been modified and is different on each host. This will help to prevent the

application of data originating on the a server being re-applied if the transaction is replicated again from another master after the ini-tial replication. Tungsten Replicator is designed not to replicate these statements, and uses the server ID as part of the identification process.

To create the configuration use tpm can set the entire configuration with just one command. Before starting the installation, the pre-requisites must have been completed (see Appendix C, Prepre-requisites). This takes the list of hosts, and a list of master services that will be configured, and then creates each service automatically:

shell> ./tools/tpm install epsilon \ --topology=all-masters \ --home-directory=/opt/continuent \ --replication-user=tungsten \ --replication-password=secret \ --master=rep-db1,rep-db2,rep-db3 \ --members=rep-db1,rep-db2,rep-db3 \ --master-services=alpha,beta,gamma \ --start

Host and service information is extracted in corresponding sequence as provided in the command-line options. The description of each of the options is shown below; click the icon to hide this detail:

Click the icon to show a detailed description of each argument.

• Creates a service, alpha, with rep-db1 as master and the other hosts as slaves.

• Creates a service, beta, with rep-db2 as master and the other hosts as slaves.

• Creates a service, gamma, with rep-db3 as master and the other hosts as slaves.

The different options set the values and configuration for the system as follows:

Different options set the configuration for the system for different deployment types; click the icon to hide this detail:

Click the icon to show a detailed description of the different options set the configuration for the system for different deployment types:

• --topology=all-masters [171]

Configures the topology type, in this case, all-masters indicates that a multi-master topology is required.

• --home-directory=/opt/continuent [152]

Set the installation directory for Tungsten Replicator. • --replication-user=tungsten [163]

(26)

Set the user to be used by Tungsten Replicator when applying data to a database. • --replication-password=secret [163]

Set the password to be used by Tungsten Replicator when applying data to a database. • --master=rep-db1,rep-db2,rep-db3 [154]

Sets the list of master hosts. As we are configuring a multi-master topology, all three hosts in the cluster are listed as masters. • --members=rep-db1,rep-db2,rep-db3 [155]

Sets the list of member hosts of the dataservice. As we are configuring a multi-master topology, all three hosts in the cluster are list-ed as members.

• --master-services=alpha,beta,gamma [155]

Specifies the list of service names to be used to identify each individual master/slave service. • --start [166]

Indicates that the services should be started once the configuration and installation has been completed. Once tpm has completed, the service will be started and the replication will be enabled between hosts.

2.4.1. Management and Monitoring

To check the configured services use the services parameter to trepctl: shell> trepctl services

Processing services command... NAME VALUE ---- ---appliedLastSeqno: 44 appliedLatency : 0.692 role : master serviceName : alpha serviceType : local started : true state : ONLINE NAME VALUE ---- ---appliedLastSeqno: 40 appliedLatency : 0.57 role : slave serviceName : beta serviceType : remote started : true state : ONLINE NAME VALUE ---- ---appliedLastSeqno: 41 appliedLatency : 0.06 role : slave serviceName : gamma serviceType : remote started : true state : ONLINE Finished services command...

The output shows the three individual services created in the multimaster configuration, alpha, beta, and gamma, and information about

the current latency, status and role of the current host. This gives you an overview of the service state for this host.

To get detailed information about dataservices, each individual dataservice must be checked individually, and explicitly stated on the command-line to trepctl as there are now multiple dataservices configured. To check the dataservice status the current host will be dis-played, in the example below, rep-db1:

shell> trepctl -service alpha status

Processing status command... NAME VALUE ---- ---appliedLastEventId : mysql-bin.000011:0000000000006905;0 appliedLastSeqno : 44 appliedLatency : 0.692 channels : 1 clusterName : alpha

(27)

currentEventId : mysql-bin.000011:0000000000006905 currentTimeMillis : 1373891837668 dataServerHost : rep-db1 extensions : latestEpochNumber : 28 masterConnectUri : thl://localhost:/ masterListenUri : thl://rep-db1:2112/ maximumStoredSeqNo : 44 minimumStoredSeqNo : 0 offlineRequests : NONE pendingError : NONE pendingErrorCode : NONE pendingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: NONE pipelineSource : jdbc:mysql:thin://rep-db1:13306/ relativeLatency : 254295.667 resourcePrecedence : 99 rmiPort : 10000 role : master seqnoType : java.lang.Long serviceName : alpha serviceType : local simpleServiceName : alpha siteName : default sourceId : rep-db1 state : ONLINE timeInStateSeconds : 254530.987 transitioningTo : uptimeSeconds : 254532.724

In the above example, the alpha dataservice is explicitly requested (a failure to specify a service will return an error, as multiple services

are configured).

To get information about a specific host, use the -host [173] option. This can be used with the trepctl services command:

shell> trepctl -host rep-db3 services

Processing services command... NAME VALUE ---- ---appliedLastSeqno: 44 appliedLatency : 1.171 role : slave serviceName : alpha serviceType : remote started : true state : ONLINE NAME VALUE ---- ---appliedLastSeqno: 40 appliedLatency : 1.658 role : slave serviceName : beta serviceType : remote started : true state : ONLINE NAME VALUE ---- ---appliedLastSeqno: 41 appliedLatency : 0.398 role : master serviceName : gamma serviceType : local started : true state : ONLINE Finished services command...

In the above output, you can see that this host is the master for the dataservice gamma, but a slave for the other two services.

Other important fields in this output:

• appliedLastSeqno and appliedLatency indicate the global transaction ID and latency of the host. These are important when monitoring

the status of the cluster to determine how up to date a host is and whether a specific transaction has been applied. • role indicates the current role of the host within the scope of the corresponding dataservice.

• state shows the current status of the host within the scope of the corrresponding dataservice.

(28)

shell> trepctl -host rep-db3 -service alpha status

Processing status command... NAME VALUE ---- ---appliedLastEventId : mysql-bin.000011:0000000000006905;0 appliedLastSeqno : 44 appliedLatency : 1.171 channels : 1 clusterName : alpha currentEventId : NONE currentTimeMillis : 1373894128902 dataServerHost : rep-db3 extensions : latestEpochNumber : 28 masterConnectUri : thl://rep-db1:2112/ masterListenUri : thl://rep-db3:2112/ maximumStoredSeqNo : 44 minimumStoredSeqNo : 0 offlineRequests : NONE pendingError : NONE pendingErrorCode : NONE pendingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: NONE pipelineSource : thl://rep-db1:2112/ relativeLatency : 256586.902 resourcePrecedence : 99 rmiPort : 10000 role : slave seqnoType : java.lang.Long serviceName : alpha serviceType : remote simpleServiceName : alpha siteName : default sourceId : rep-db3 state : ONLINE timeInStateSeconds : 256820.611 transitioningTo : uptimeSeconds : 256820.779

The following sequence number combinations should match between the different hosts on each service:

Master Service Master Host Slave Host

alpha rep-db1 rep-db2,rep-db3

beta rep-db2 rep-db1,rep-db3

gamma rep-db3 rep-db1,rep-db2

The sequence numbers on corresponding services should match across all hosts. For more information on using trepctl, see Section 5.4, “The trepctl Command”.

Definitions of the individual field descriptions in the above example output can be found in Section D.2, “Generated Field Reference”. For more information on management and operational detailed for managing your cluster installation, see Chapter 4, Operations Guide.

2.4.2. Alternative Multimaster Deployments

The multimaster deployment can be used for a wide range of different scenarios, and using any number of hosts. The tpm command used could, for example, be expanded to four or five hosts by adding them to the list of members and master hosts in the configuration command.

The basis fo the multimaster deployment can also be used in multiple site configurations. For more information on multisite/multimas-ter deploymnts, see Section 2.7, “Deploying a Multi-site (SOR) Topology”.

2.5. Deploying a Fan-In Topology

The fan-in topology is the logical opposite of a master/slave topology. In a fan-in topology, the data from two masters is combined to-gether on one slave. Fan-in topologies are often in situations where you have satellite databases, maybe for sales or retail operations, and need to combine that information together in a single database for processing.

(29)

• HostA is the master replicating to HostC • HostB is the master replicating to HostC

Figure 2.4. Topologies: Fan-in

Some additional considerations need to be made when using fan-in topologies:

• If the same tables from each each machine are being merged together, it is possible to get collisions in the data where auto increment is used. The effects can be minimised by using increment offsets within the MySQL configuration:

auto-increment-offset = 1 auto-increment-increment = 4

• Fan-in can work more effectively, and be less prone to problems with the corresponding data by configuring specific tables at differ-ent sites. For example, with two sites in New York and San Jose databases and tables can be prefixed with the site name, i.e. sjc_sales

and nyc_sales.

Alternatively, a filter can be configured to rename the database sales dynamically to the corresponding location based tables. See

Section 7.4.18, “RenameFilter” for more information.

• Statement-based replication will work for most instances, but where your statements are updating data dynamically within the state-ment, in fan-in the information may get increased according to the name of fan-in masters. Update your configuration file to explicitly use row-based replication by adding the following to your my.cnf file:

binlog-format = row

• Triggers can cause problems during fan-in replication if two different statements from each master and replicated to the slave and cause the operations to be triggered multiple times. Tungsten Replicator cannot prevent triggers from executing on the concentrator host and there is no way to selectively disable triggers. Check at the trigger level whether you are executing on a master or slave. For more information, see Section A.3.1, “Triggers”.

To create the configuration the masters and services must be specified, the topology specification takes care of the actual configura-tion:

shell> ./tools/tpm install epsilon \ --replication-user=tungsten \ --replication-password=password \ --home-directory=/opt/continuent \ --masters=rep-db1,rep-db2 \ --members=rep-db1,rep-db2,rep-db3 \ --master-services=alpha,beta \