Logical Replication - BLOCKCHAIN DEEP DIVE

For years, PostgreSQL lacked a reasonable option for replication. The best you could do was take the “write-ahead logs”, binary files that described transactions and provided part of PostgreSQL’s legendary stability, and copy them to another server. Over time, this became a standard way to have a slave server, until several years ago when you could stream these write-ahead log (WAL) files to another server. Master-slave replication thus became a standard PostgreSQL feature, one used by many organizations around the world—both to distribute the load across multiple servers and to provide for a backup in the case of server failure. One machine (the master) would handle both read and write queries, while one or

AT THE FORGE

more other (slave) machines would handle read-only queries.

Although streaming WALs certainly worked, it was limited in a number of ways.

It required that both master and slave use the same version of PostgreSQL, and that the entire server’s contents be replicated on the slave. For reasons of performance, privacy, security and maintenance, those things deterred many places from using PostgreSQL’s master-slave streaming.

So it was with great fanfare that “logical replication” was included in PostgreSQL 10. The idea behind logical replication is that a server can broadcast (“publish”) the changes that are made not using binary files, but rather a protocol that describes changes in the publishing database. Moreover, details can be published about a subset of the database; it’s not necessary to send absolutely everything from the master to every single slave.

In order to get this to work, the publishing server must create a “publication”. This describes what will be sent to subscribing servers. You can use the new _CREATE PUBLICATION command to do this.

As I wrote above, replication of the WAL files meant that the entire database server (or “cluster”, in PostgreSQL terminology) needed to be replicated. In the case of logical replication, the replication is done on a per-database basis. You then can decide to create a publication that serves all tables:

CREATE PUBLICATION mydbpub FOR ALL TABLES;

Note that when you say FOR ALL TABLES, you’re indicating that you want to publish not only all of the tables that currently exist in this database, but also tables that you will create in the future. PostgreSQL is smart enough to add tables to the publication when they are created. However, the subscriber won’t know about them automatically (more on that to come).

If you want to restrict things, so that only a specific table is replicated, you can do

AT THE FORGE

so with this:

CREATE PUBLICATION MyPeoplePub FOR TABLE People;

You also can replicate more than one table:

CREATE PUBLICATION MyPeopleFooPub FOR TABLE People, Foo;

If you are publishing one or more specific tables, the tables must already exist at the time you create the publication.

The default is to publish all actions that take place on the published tables. However, a publication can specify that it’s going to publish only inserts, updates and/or deletes.

All of this is configurable when the publication is created and can be updated with the ALTER PUBLICATION command later.

If you’re using the interactive “psql” shell, you can take a look at current publications with _\dRp, which is short for “describe replication publications”. It’s not the easiest command to remember, but they long ago ran out of logical candidates for single-letter commands. This command will show you which publications have been defined and also what permissions they have (more on that in a moment). If you want to know which tables are included in a publication, you can use _\dRp+.

Once you’ve set up the publication, you can set up a subscription with (not

surprisingly) the CREATE SUBSCRIPTION command. Here, things are a bit trickier, because the data is actually arriving into the subscriber’s database, which means there might be conflicts or issues.

First and foremost, creating a subscription requires that you have a valid login (user name and password) on the publisher’s system. With that in hand, you can say:

CREATE SUBSCRIPTION mysub CONNECTION 'host=mydb user=myuser' ↪PUBLICATION MyPeoplePub;

AT THE FORGE

Notice that you use a standard PostgreSQL “connecting string” to connect to the server. You can use additional options if you want, including setting the port number and the connection timeout. Because a database might have multiple publications, you have to indicate the publication name to which you want to subscribe, as indicated here. Also note that the user indicated in this connection string must have “replication” privileges in the database.

Once the subscription has been created, the data will be replicated from its current state on the publisher.

I’ve already mentioned that uSing the FOR ALL TABLES option with _CREATE PUBLISHER means that even if and when new tables are added, they will be included as well. However, that’s not quite true for the subscriber. On the subscriber’s side, you need to indicate that there have been changes in the publisher and that you want to refresh your subscription:

ALTER SUBSCRIPTION testsub REFRESH PUBLICATION;

If you’ve done any binary replication in previous PostgreSQL versions, you already can see what an improvement this is. You don’t have to worry about WALS, or about them being erased, or about getting the subscribing server up to speed and so forth.

Now, it’s all well and good to talk about replication, but there’s always the possibility that problems will arise. For example, what happens if the incoming data violates one or more constraints? Under such circumstances, the replication will stop.

There are also a number of caveats regarding what objects are actually

replicated—for example, only tables are replicated, such objects as views and sequences are not.

AT THE FORGE

In document BLOCKCHAIN DEEP DIVE (Page 55-59)