PostgreSQL Crosstab Example
April 13, 2006 on 10:32 pm | In PostgreSQL |Here is a bit of usage and demonstration description for the PostgreSQL Crosstab Query I put together in pursuit of something else completely…
This example uses the following table definition. We have table ct3 that contains a primary key field id, a quality representing the field we want to organize our data around, and the attr field which is one of several attributes for a given quality.
Table "public.ct3" Column | Type | Modifiers ---------+--------------------------+-------------------------------------------------- id | integer | not null default nextval('ct3_id_seq'::regclass) quality | integer | attr | integer | value | timestamp with time zone | | |
The table is populated with a semi-random assortment of data. You may note that the times are all sequentially increasing — this isn’t particularly important and is mostly an artifact of insertion.
id | quality | attr | value ----+---------+------+---------------------------- 1 | 1 | 1 | 2006-04-13 16:01:49.937+02 2 | 1 | 2 | 2006-04-13 16:01:54.774+02 5 | 2 | 1 | 2006-04-13 16:02:15.223+02 6 | 2 | 2 | 2006-04-13 16:02:19.75+02 7 | 3 | 1 | 2006-04-13 16:02:24.516+02 8 | 3 | 2 | 2006-04-13 16:02:28.472+02 9 | 1 | 1 | 2006-04-13 20:11:21.194+02 10 | 3 | 1 | 2006-04-13 20:11:56.505+02 11 | 1 | 2 | 2006-04-13 20:13:01.899+02 12 | 2 | 1 | 2006-04-13 20:13:49.758+02 (10 rows) | |
We use the ‘attr’ column value ‘1′ to signify our “start date”, and the value ‘2′ to signify the ‘finish’ date. As you can see, there are three different ‘quality’ values we are tracking (examples might be inventory item id’s, customers, or other such data sources).
The goal of this function is to rotate these ’start’ and ‘finish’ times into a single record for each discrete pairing. Our core query is:
SELECT * FROM crosstab ( 'SELECT id, quality, attr, value FROM ct3 ORDER BY 2,1,3', 'SELECT DISTINCT attr FROM ct3 ORDER BY 1 ASC' ) AS ( id integer, quality integer, startdate timestamp, finishdate timestamp ); | |
This selects the id, quality, attr, and value fields from the table, ordering each of them in turn, and specifies that the ‘attr’ field is to be the column headings. The result set is four columns, the original id number (which we persist to keep the result set from prematurily collapsing and not showing multiple start/finish pairs), the quality to identify which set of pairs we’re refering to, and of course the start and finish timestamps.
The following data is returned. Note how while there is always a startdate, the finish date is not guaranteed. Also note that there are gaps on the appropriate columns. Our goal is to collapse those gaps, like donkykong.
id | quality | startdate | finishdate ----+---------+-------------------------+------------------------- 1 | 1 | 2006-04-13 16:01:49.937 | 2 | 1 | | 2006-04-13 16:01:54.774 9 | 1 | 2006-04-13 20:11:21.194 | 11 | 1 | | 2006-04-13 20:13:01.899 5 | 2 | 2006-04-13 16:02:15.223 | 6 | 2 | | 2006-04-13 16:02:19.75 12 | 2 | 2006-04-13 20:13:49.758 | 7 | 3 | 2006-04-13 16:02:24.516 | 8 | 3 | | 2006-04-13 16:02:28.472 10 | 3 | 2006-04-13 20:11:56.505 | (10 rows) | |
Our end result looks something like this:
qual | start | finish ------+-------------------------+------------------------- 1 | 2006-04-13 16:01:49.937 | 2006-04-13 16:01:54.774 1 | 2006-04-13 20:11:21.194 | 2006-04-13 20:13:01.899 2 | 2006-04-13 16:02:15.223 | 2006-04-13 16:02:19.75 2 | 2006-04-13 20:13:49.758 | 3 | 2006-04-13 16:02:24.516 | 2006-04-13 16:02:28.472 (5 rows) | |
As you can see, the start and finish times have been collapsed into a row for each “set”, as ordered by their chronological times.
Here is the actual function. We iterate over the records we received from the query (see above), and compare the ‘quality’ to identify if we’re starting a new set, or completing a set. We save the appropriate value for the next iteration, or complete the set by pushing the data into the ‘push_record’, and placing it on the return stack.
CREATE OR REPLACE FUNCTION collapse_rows() RETURNS SETOF RECORD AS $$ DECLARE push_record RECORD; cur_record RECORD; qual int; start_ts timestamp; BEGIN FOR cur_record IN SELECT * FROM crosstab ( 'SELECT id, quality, attr, value FROM ct3 ORDER BY 2,1,3', 'SELECT DISTINCT attr FROM ct3 ORDER BY 1 ASC' ) AS ( id integer, quality integer, startdate timestamp, finishdate timestamp ) LOOP RAISE NOTICE 'new record value is %', textin(record_out(cur_record)); IF qual = cur_record.quality THEN SELECT INTO push_record qual, start_ts, cur_record.finishdate; RETURN NEXT push_record; qual := NULL; start_ts := NULL; ELSE IF NOT qual IS NULL THEN SELECT INTO push_record qual, start_ts, NULL::timestamp; RETURN NEXT push_record; END IF; qual := cur_record.quality; start_ts := cur_record.startdate; END IF; END LOOP; END; $$ LANGUAGE plpgsql; | |
And here is the necessary code to execute the query itself:
SELECT * FROM collapse_rows() AS (qual INT, start TIMESTAMP, finish TIMESTAMP); | |
No Comments yet
Sorry, the comment form is closed at this time.
Entries and comments feeds.
^Top^
18 queries. 0.313 seconds.