inner workings of our database.
COLUMNS
36 / NOVEMBER 2014 / WWW.LINUXJOURNAL.COM
AT THE FORGE
and open-source project, is set to release version 9.4. This new version, like all other PostgreSQL versions, contains a number of optimizations, improvements and usability features.
But two of the most intriguing
features to me are HStore and JSONB, features that actually turn PostgreSQL into a NoSQL database.
Fine, perhaps I’m exaggerating a bit here. PostgreSQL was and always will be relational and transactional, and adding these new data types hasn’t changed that. But having a key-value store within PostgreSQL opens many new possibilities for developers. JSONB, a binary version of JSON storage that supports
indexing and a large number of operators, turns PostgreSQL into a document database, albeit one with a few other features in it besides.
In this article, I introduce these NoSQL features that are included in PostgreSQL 9.4, which likely will be released
before this issue of Linux Journal gets to you. Although not every application needs these features, they can be useful—and with this latest release of PostgreSQL, the performance also is significantly improved.
HStore
One of the most interesting new
developments in PostgreSQL is that of
HStore, which provides a key-value store within the PostgreSQL
environment. Contrary to what I originally thought, this doesn’t mean that PostgreSQL treats a
particular table as a key-value store.
Rather, HStore is a data type, akin to INTEGER, TEXT and XML. Thus, any column—or set of columns—within a table may be defined to be of type HSTORE. For example:
CREATE TABLE People ( id SERIAL,
info HSTORE, PRIMARY KEY(id) );
Once I have done that, I can ask PostgreSQL to show me the definition of the table:
COLUMNS
WWW.LINUXJOURNAL.COM / NOVEMBER 2014 / 37
AT THE FORGE
As you can see, the type of my
“info” column is hstore. What I have effectively created is a (database) table of hash tables. Each row in the “people” table will have its own hash table, with any keys and values. It’s typical in such a situation for every row to have the same key names, or at least some minimum number of overlapping key names, but you can, of course, use any keys and values you like.
Both the keys and the values in an HStore column are text strings. You can assign a hash table to an HStore column with the following syntax:
INSERT INTO people(info) VALUES ('foo=>1, bar=>abc, baz=>stuff');
Notice that although this example inserts three key-value pairs into the HStore column, they are stored together, converted automatically into an HStore, splitting the pairs where there is a comma, and each pair where there is a => sign.
So far, you won’t see any difference between an HStore and a TEXT column, other than (perhaps) the fact that you cannot use text functions and operators on that column. For example, you cannot use the || operator, which normally concatenates text strings, on the HStore:
UPDATE People SET info = info || 'abc';
ERROR: XX000: Unexpected end of string
LINE 1: UPDATE People SET info = info || 'abc';
^
PostgreSQL tries to apply the ||
operator to the HStore on the left, but cannot find a key-value pair in the string on the right, producing an error message. However, you can add a pair, which will work:
UPDATE People SET info = info || 'abc=>def';
As with all hash tables, HStore is designed for you to use the keys to retrieve the values. That is, each key exists only once in each HStore value, although values may be
repeated. The only way to retrieve a value is via the key. You do this with the following syntax:
SELECT info->'bar' FROM People;
---
Notice several things here. First, the name of the column remains without any quotes, just as you do when you’re retrieving the full
COLUMNS
38 / NOVEMBER 2014 / WWW.LINUXJOURNAL.COM
AT THE FORGE
contents of the column. Second, you put the name of the key after the -> arrow, which is different from the
=> (“hashrocket”) arrow used to delineate key-value pairs within the HStore. Finally, the returned value always will be of type TEXT. This means if you say:
SELECT info->'foo' || 'a' FROM People;
---
| ?column? | | ---
| 1a | | --- (1 row)
Notice that ||, which works on text values, has done its job here.
However, this also means that if you try to multiply your value, you will get an error message:
SELECT info->'foo' * 5 FROM People;
info->'foo' * 5 from people;
^ Time: 5.041 ms
If you want to retrieve info->'foo'
as an integer, you must cast that value:
SELECT (info->'foo')::integer * 5 from people;
---
| ?column? | | ---
| 5 | | --- (1 row)
Now, why is HStore so exciting? In particular, if you’re a database person who values normalization, you might be wondering why someone even would want this sort of data store, rather than a nicely normalized table or set of tables.
The answer, of course, is that there are many different uses for a database, and some of them can be more appropriate for an HStore. I never would suggest storing serious data in such a thing, but perhaps you want to keep track of user session information, without keeping it inside of a binary object.
Now, HStore is not new to
PostgreSQL. The big news in version 9.4 is that GiN and GIST indexes now