• No results found

Date/Time Proc Date/Time Processing and Que essing and Querrying ying

5 Data Types5 Data Types

5.3 PostgreSQL Data Types PostgreSQL Data Types

5.3.11 Date/Time Proc Date/Time Processing and Que essing and Querrying ying

5.3.11 Date/Time Proc Date/Time Processing and Que essing and Querrying ying

Once the application's data, or rather the user data is properly stored as timestamp with time zone, PostgreSQL allows implementing all the processing you need to.

As an example data set this time we're playing with git git history. The PostgreSQL and pgloader project history have been loaded into the commitlogcommitlog table thanks to the git git loglog command, with a custom format, and some post-processing — properly splitting up the commit's subjects and escaping its content. Here's for example the most recent commit registered in our localcommitlogcommitlog table:

select

select project, hashhash, author, ats, committer, cts, subject

from

from commitlog

where

where project = 'postgresql'

order

order byby ats descdesc limit limit 1;

The column namesatsats and ctscts respectively stand for author commit timestampauthor commit timestamp and committer commit timestamp

committer commit timestamp, and thesubject subject is the first line of the commit message, as per the git log git log format%s%s.

To get the most recent entry from a table weorder by order by dates indescendingdescending order thenlimilimi the result set to a single entry, and we get a single line of output:

─[ RECORD 1 ]─────────────── ────────────────── ─────────────────── ────────────

project │ postgresql

hash │ b1c2d76a2fcef812 af0be3343082414d40 1909c8 author │ Tom Lane

ats │ 2017-08-19 19:39:37+02 committer │ Tom Lane

cts │ 2017-08-19 19:39:51+02

subject │ Fix possible core dump in parallel restore when using a TOC list.

With timestamps, we can compute time-based reporting, such as how many commits each project received each year in their whole history:

select

select extract(year fromfrom ats) asas year,

count(*) filterfilter(wherewhere project = 'postgresql') asas postgresql, count(*) filterfilter(wherewhere project = 'pgloader') asas pgloader

from

As we have only loaded two projects in ourcommitlogcommitlog table, the output is better with a pivo pivo query. We can see more than 20 years of sustained activity for the PostgreSQL project, and a less active project for pgloader:

year │ postgresql │ pgloader

══════╪════════════╪══════════

We can also build a reporting on the repartition of commits by weekday from the beginning of the project, in order to guess if contributors are working on the project on the job only, or mostly during their free time (weekend).

select

select extract(isodow fromfrom ats) asas dow, to_char(ats, 'Day') asas day, count(*) asas commits,

round(100.0*count(*)/sum(count(*)) overover(), 2) asas pct, repeat('■', (100*count(*)/sum(count(*)) overover())::int) asas hist

from

It seems that our PostgreSQL committers tend to work whenever they feel like it, but less so on the weekend. The project's lucky enough to have a solid team of committers being paid to work on PostgreSQL: commit timestamp. Those are different, but by how much?

with

percentile_cont(array[0.5, 0.9, 0.95, 0.99]) within groupgroup(orderorder byby cts-ats) asas parr

from

from

from perc_arrays;

Here's a detailed output of the time difference statistics, per project:

─[ RECORD 1 ]────────────────────────────── ─────

project │ pgloader

average │ @ 4 days 22 hours 7 mins 41.18 secs median │ @ 5 mins 21.5 secs

%90th │ @ 1 day 20 hours 49 mins 49.2 secs

%95th │ @ 25 days 15 hours 53 mins 48.15 secs

%99th │ @ 169 days 24 hours 33 mins 26.18 secs

═[ RECORD 2 ]══════════════════════════════ ═════

project │ postgres

average │ @ 1 day 10 hours 15 mins 9.706809 secs median │ @ 2 mins 4 secs

%90th │ @ 1 hour 46 mins 13.5 secs

%95th │ @ 1 day 17 hours 58 mins 7.5 secs

%99th │ @ 40 days 20 hours 36 mins 43.1 secs

Reporting is a strong use case for SQL. Application will also send more classic queries. We can show the commits for the PostgreSQL project for the 1st of June 2017:

\setset day '2017-06-01'

select

select ats::time,

substring(hashhash fromfrom 1 forfor 8) asashashhash,

substring(subject fromfrom 1 forfor 40) || '…' asas subject

from

from commitlog

where

where project = 'postgresql'

and

and ats >= date :'day'

and

and ats < date :'day' +interval '1 day'

order order byby ats;

It's tempting to use thebetweenbetween SQL operator, but we would then have to remember that between

between includes both its lower and upper bound and we would then have to compute the upper bound as the very last instant of the day. Using explicit greater greater than or than or equalequal andlessless than

than operators makes it possible to always compute the very first time of the day, which is easier, and well supported by PostgreSQL.

Also, using explicit bound checks allows ys to use a single date literal in the query, so that's a single parameter to send from the application.

ats hash subject

══════════╪════════ ══╪═══════════════ ══════════════════ ══════════

01:39:27 │ 3d79013b │ Make ALTER SEQUENCE, including RESTART, … 02:03:10 │ 66510455 │ Modify sequence catalog tuple before inv…

04:35:33 │ de492c17 │ doc: Add note that DROP SUBSCRIPTION dro…

19:32:55 │ e9a3c047 │ Always use -fPIC, not -fpic, when buildi…

23:45:53 │ f112f175 │ Fix typo…

(5 rows)

Manydata typ e formatting fun ctions are available in PostgreSQL. In the previous querym although we chose tocast cast our timestamp with time zone entry down to atimetime value, we could have chosen another representation thanks to theto_char to_char function:

set

set lc_time toto 'fr_FR';

select

select to_char(ats, 'TMDay TMDD TMMonth, HHam' ) asas time, substring(hashhash fromfrom 1 forfor 8) asashashhash,

substring(subject fromfrom 1 forfor 40) || '…' asas subject

from

from commitlog

where

where project = 'postgresql'

and

and ats >= date :'day'

and

and ats < date :'day' +interval '1 day'

order order byby ats;

And this time we have a French localized output for the time value:

time hash subject

═══════════════════ ══╪══════════╪════ ══════════════════ ═══════════════════ ══

Jeudi 01 Juin, 01am │ 3d79013b │ Make ALTER SEQUENCE, including RESTART, … Jeudi 01 Juin, 02am │ 66510455 │ Modify sequence catalog tuple before inv…

Jeudi 01 Juin, 04am │ de492c17 │ doc: Add note that DROP SUBSCRIPTION dro…

Jeudi 01 Juin, 07pm │ e9a3c047 │ Always use -fPIC, not -fpic, when buildi…

Jeudi 01 Juin, 11pm │ f112f175 │ Fix typo…

(5 rows)

Take some time to familiarize yourself with the time and date support that PostgreSQL comes with out of the box. Some very useful functions such asdate_trunc()date_trunc() are not shown here, and you also will find more gems.

While most programming languages nowadays include the same kind of feature set, having this processing feature set right in PostgreSQL makes sense in several use cases:

It makes sense when the SQL logic or filtering you want to implement depends on the result of the processing (e.g. grouping by week).

When you have several applications using the same logic, it's often easier to share a SQL query than to set up a distributed service API offering the same result in XML or JSON (a data format you then have to parse).

When you want to reduce your run-time dependencies, it's a good idea to understand how much each architecture layer is able to support in your implementation.