Table Partitioning - BLOCKCHAIN DEEP DIVE

Let’s say you’re using PostgreSQL to keep track of invoices. You might want to have an “invoices” table, which you can query by customer ID, date, price or other factors.

That’s fine, but what happens if your business becomes extremely popular, and you’re suddenly handling not dozens of customers a month, but thousands or even millions?

Keeping all of that invoicing data in a single database table is going to cause problems.

Not only are many of the older invoices taking up space on your primary filesystem, but your queries against the table are going to take longer than necessary, because these older rows are being scanned.

A standard solution to this problem in the database world is partitioning. You divide the table into one or more sub-tables, known as “partitions”. Each partition can exist on a different filesystem. You get the benefits of having a single table on a single database, but you also enjoy the benefits of working with smaller tables.

Unfortunately, such partitioning was available in previous versions of PostgreSQL—

and although it worked, it was difficult to install, configure and maintain. PostgreSQL 10 added “declarative partitioning”, allowing you to indicate that a table should be broken into separate partitions—meaning that when you insert data into a partitioned table, PostgreSQL looks for the appropriate partition and inserts it there.

PostgreSQL supports two types of partitioning schemes. In both cases, you have to indicate one or more columns on which the partitioning will be done. You can partition according to “range”, in which case each partition will contain data from a range of values. A typical use case for this kind of partition would be dates, such as the invoices example above.

But, you also can partition over a “list” value, which means that you divide things according to values. For example, you might want to have a separate partition for each state in the US or perhaps just for different regions. Either way, the list will determine which partition receives the data.

For example, you can implement the date invoice example from above as follows.

AT THE FORGE

First, create an Invoices table:

postgres=# CREATE TABLE Invoices ( id SERIAL,

issued_at TIMESTAMP NOT NULL, customer_name TEXT NOT NULL, amount INTEGER NOT NULL, product_bought TEXT NOT NULL ) partition by range (issued_at);

CREATE TABLE

(And yes, in an actual invoice system, you would be using foreign keys to keep track of customers and products.)

Notice that at the conclusion of the CREATE TABLE command, I’ve added a

“partition by range” statement, which indicates that partitions of this table will work according to ranges on _{issued_at}, a timestamp.

But perhaps even more interesting is the fact that _id, the _SERIAL (that is,

sequence) value, is not defined as a primary key. That’s because you cannot have a primary key on a partitioned table; that would require checking a constraint across the various partitions, which PostgreSQL cannot guarantee.

With the partitioned table in place, you now can create the individual partitions:

postgres=# CREATE TABLE issued_at_y2018m01 PARTITION OF Invoices FOR VALUES FROM ('2018-jan-01') to ('2018-jan-31');

CREATE TABLE

postgres=# CREATE TABLE issued_at_y2018m02 PARTITION OF Invoices postgres-# FOR VALUES FROM ('2018-feb-01') to ('2018-feb-28');

CREATE TABLE

AT THE FORGE

Notice that these partitions don’t have any column definition. That’s because the columns are dictated by the partitioned table. In _psql, I can ask for a description of the first partition. See Table 1 for an example of what this would look like.

Table 1. public.issued_at_y2018m01

Column Type Collation Nullable Default

id integer not null nextval('invoices_

id_seq'::regclass)

issued_at

timestamp without time zone

not null

customer_name text not null

amount integer not null

product_bought text not null

Partition of: invoices FOR VALUES FROM ('2018-01-01 00:00:00') ↪TO ('2018-01-31 00:00:00')

You can see from the example shown in Table 1 not only that the partition acts like a regular table, but also that it knows very well what its range of values is. See what happens if I now insert rows into the parent “invoices” table:

postgres=# insert into invoices (issued_at , customer_name, ↪amount, product_bought)

postgres-# values ('2018-jan-15', 'Jane January', 100, 'Book');

INSERT 0 1

AT THE FORGE

postgres=# insert into invoices (issued_at , customer_name, ↪amount, product_bought)

values ('2018-jan-20', 'Jane January', 200, 'Another book');

INSERT 0 1

postgres=# insert into invoices (issued_at , customer_name, ↪amount, product_bought)

values ('2018-feb-3', 'Fred February', 70, 'Fancy pen');

INSERT 0 1

postgres=# insert into invoices (issued_at , customer_name, ↪amount, product_bought)

values ('2018-feb-15', 'Fred February', 60, 'Book');

INSERT 0 1

So far, so good. But, now how about a query on “invoices”:

postgres=# select * from invoices;

4 | 2018-02-15 00:00:00 | Fred February | 60 | Book 1 | 2018-01-15 00:00:00 | Jane January | 100 | Book

2 | 2018-01-20 00:00:00 | Jane January | 200 | Another book (4 rows)

I also can , if I want, query one of the partitions directly:

postgres=# select * from issued_at_y2018m01 ;

2 | 2018-01-20 00:00:00 | Jane January | 200 | Another book (2 rows)

AT THE FORGE

Although you don’t have to do so, it’s probably a good idea to set an index on the partition key on each of the individual partitions:

postgres=# create index on issued_at_y2018m01(issued_at);

CREATE INDEX

postgres=# create index on issued_at_y2018m02(issued_at);

CREATE INDEX

That will help PostgreSQL find and update the appropriate partition.

Not everything is automatic or magical here; you’ll have to add partitions, and you even can remove them when they’re no longer needed. But this is so much easier than used to be the case, and it offers more flexibility as well. It’s no surprise that this is one of the features most touted in PostgreSQL 10.

Conclusion

I’ve personally been using PostgreSQL for about 20 years—and for so many years people said, “Really? That’s your preferred open-source database?” But, now a large and growing number of people are adopting and using PostgreSQL. It already was full of great features, but there’s always room to improve—and with PostgreSQL 10, there are even more reasons to prefer it over the alternatives.

Resources

To learn more about PostgreSQL, download the code, read the documentation and sign up for the community e-mail lists, go to https://www.postgresql.org. ◾

Send comments or feedback

via http://www.linuxjournal.com/contact or email [email protected].

Cryptocurrency

In document BLOCKCHAIN DEEP DIVE (Page 59-64)