Aggregating (x,y) coordinate point clouds in PostgreSQL

I have a a PostgreSQL database table with the following simplified structure:

  • Device Id varchar
  • Pos_X (int)
  • Pos_Y (int)

Basically this table contains a lot of two dimensional waypoint data for devices. Now I want to design a query which reduces the number of coordinates in the output. It should aggregate nearby coordinates (for a certain x,y threshold) An example:

row 1: DEVICE1;603;1205

row 2: DEVICE1;604;1204

If the threshold is 5, these two rows should be aggregated since the variance is smaller than 5. Any idea how to do this in PostgreSQL or SQL in general?


Use the often overlooked built-in function width_bucket() in combination with your aggregation:

If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this:

SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct -- or any other aggregate
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;

To minimize the error you could GROUP BY the grid as demonstrated, but save actual average coordinates:

SELECT device_id
     , avg(pos_x)::int AS pos_x   -- save actual averages to minimize error
     , avg(pos_y)::int AS pos_y   -- cast if you need to
     , count(*)        AS ct      -- or any other aggregate
FROM   tbl
GROUP  BY
       device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10  -- aggregate by grid
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER  BY 1,2,3;

sqlfiddle demonstrating both alongside.

Well, this particular case could be simpler:

...
GROUP  BY
       device_id
     , (pos_x / 10) * 10          -- truncates last digit of an integer
     , (pos_y / 10) * 10
...

But that's just because the demo grid size of 10 conveniently matches the decimal system. Try the same with a grid size of 17 or something ...


Expand to timestamps

You can expand this approach to cover date and timestamp values by converting them to unix epoch (number of seconds since '1970-1-1') with extract().

SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);

When you are done, convert the result back to timestamp with time zone:

SELECT timestamptz 'epoch' + 1349118398 * interval '1s';

Or simply to_timestamp():

SELECT to_timestamp(1349118398);

select [some aggregates] group by (pos_x/5, pos_y/5); 

Where instead of 5 you can have any number depending how much aggregation you need/