Including citizen science data as well
And it all fits nicely in the definition of big data
SELECT ST_AsText(
ST_ClosestPoint(
ST_GeomFromText('POLYGON((175 150, 20 40, 50 60, 125 100, 175 150))'),
ST_Buffer(ST_GeomFromText('POINT(110 170)'), 20)));
------------------------------------------
POINT(140.752120669087 125.695053378061)
CREATE FUNCTION add_two(integer) RETURNS integer AS 'select $1 + 2;' LANGUAGE SQL IMMUTABLE RETURNS NULL ON NULL INPUT;
select my_column, add_two(my_column) from my_table where my_column < 144 my_column | add_two ------------------- 3 | 5 143 | 145
-- Function that breaks lines at long segments -- Works with simple LineStrings, outputs either LineStrings or MultiLineStrings CREATE OR REPLACE FUNCTION break_lines(line geometry, distance integer) RETURNS geometry AS $BODY$ DECLARE segment geometry; line_1 geometry; started boolean; multi boolean; multi_started boolean; multi_line geometry; length float; i int; count int; BEGIN count := ST_NPoints(line); IF count IS NULL THEN RETURN NULL; END IF; IF count < 2 THEN RETURN NULL; END IF; started := false; multi := false; multi_started := false; FOR i IN 1..count LOOP segment := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1)); length := ST_Length(segment, false); IF length IS NULL THEN CONTINUE; END IF; IF length < distance THEN IF NOT started THEN line_1 := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1)); started := true; ELSE line_1 := ST_MakeLine(line_1, ST_PointN(line, i + 1)); END IF; ELSE RAISE NOTICE 'Long segment!'; RAISE NOTICE 'Length: %', length; IF line_1 IS NULL THEN CONTINUE; END IF; IF NOT multi_started THEN multi_line := line_1; multi_started := true; started := false; multi := true; ELSE multi_line := ST_Union(multi_line, line_1); multi := true; started := false; END IF; END IF; END LOOP; IF NOT multi THEN RETURN line_1; ELSIF multi AND line_1 IS NOT NULL THEN RETURN ST_Union(multi_line, line_1); ELSE RETURN multi_line; END IF; END $BODY$ LANGUAGE 'plpgsql';
ST_NPoints(), ST_PointN(), ST_Length(), ST_Union(), ST_MakeLine()
CREATE TABLE split_lines_1000m AS
SELECT row_number() OVER (ORDER BY data.geom ASC) as id, some_other_important_column,
break_lines(data.geom, 1000) as geom,
FROM
(SELECT * FROM mytable) AS data
Radius of gyration is a measure to quantify the spread of locations around their center of mass
Looks complicated?
It's not.
All we need is to find the center of mass and measure distances
-- Function to compute radius of gyration CREATE OR REPLACE FUNCTION comp_gyration(usr text, tab character varying) RETURNS SETOF double precision AS $BODY$ DECLARE rec geometry; mass_center geometry; line geometry; count int; count_points int; count_geometries int; points_num int; geometry_num int; point geometry; tmp int; sum float; gyration float; BEGIN -- radius of gyration = sqrt(1/n * sum( distance(point - center of mass)^2) -- n: count, sum: sum -- Calculate center of mass -- Use subquery. ST_Dump can't be nested with ST_Collect EXECUTE 'SELECT ST_Centroid(ST_Collect(foo.geom2)) FROM (SELECT (ST_DumpPoints(geom)).geom as geom2 FROM ' || tab || ' WHERE key = ' || quote_literal(usr) || ')as foo;' INTO mass_center; sum := 0; count := 0; -- For each geometry that belongs to a user FOR rec in EXECUTE 'SELECT geom FROM ' || tab || ' where key =' || quote_literal(usr) || ';' LOOP -- Handle multilines CASE ST_GeometryType(rec) WHEN 'ST_MultiLineString' THEN -- Extract each linestrings geometry_num := ST_NumGeometries(rec); count_geometries := 1; LOOP count_points := 1; line := ST_GeometryN(rec, count_geometries); points_num := ST_NPoints(line); LOOP -- calculate distance from each point to the center of mass. add to sum sum := sum + ST_Distance(ST_PointN(line, count_points), mass_center, false)^2; count_points := count_points+ 1; -- get next point, exit at last count := count + 1; IF count_points > points_num THEN EXIT; END IF; END LOOP; -- move to next linestring, exit at last count_geometries := count_geometries + 1; IF count_geometries > geometry_num THEN EXIT; END IF; END LOOP; -- If geometry is not multi, loop thrugh all points WHEN 'ST_LineString' THEN points_num := ST_NPoints(rec); count_points:= 1; LOOP sum := sum + ST_Distance(ST_PointN(rec, count_points), mass_center, false)^2; count_points := count_points+ 1; count := count + 1; IF count_points = points_num THEN EXIT; END IF; END LOOP; -- If geometry is polygon WHEN 'ST_Polygon' THEN -- Extract boundary as line, then use method above rec := ST_Boundary(rec); IF rec IS NOT NULL THEN points_num := ST_NPoints(rec); count_points:= 1; LOOP -- exit before last point (to avoid double counting of first point) IF count_points = points_num THEN EXIT; END IF; sum := sum + ST_Distance(ST_PointN(rec, count_points), mass_center, false)^2; count_points := count_points + 1; count := count + 1; END LOOP; ELSE count := 0; END IF; -- Finally if geometry is point, just calculate sum of square distances ELSE sum := sum + ST_Distance(rec, mass_center, false)^2; count := count + 1; END CASE; END LOOP; -- Return radius of gyration for the user IF count = 0 THEN gyration := 0; ELSE gyration := sqrt(CAST(1 AS float)/CAST(count AS FLOAT) * sum); END IF; -- RAISE NOTICE 'gyration: %', gyration; RETURN NEXT gyration; END $BODY$ LANGUAGE plpgsql
SELECT comp_gyration('some_user_key', 'some_table_with_geometry'); -- Another approach: ALTER TABLE some_table ADD COLUMN rad_gyr float; UPDATE some_table SET rad_gyr = comp_gyr(key, 'data_table');
Pretty straightforward spatio-temporal overlap between 2 data sources
Keep your database statistics updated
VACUUM ANALYZE;
Check on AUTOVACUUM
Monitor your queries and see what's expensive
EXPLAIN ANALYZE
SELECT id, version, minor, visible, user_id, user_name, valid_from, valid_to, tags, z_order, ST_AsText(geom) FROM southeurope_line
WHERE
(valid_to between '2015-12-20'::timestamp and '2015-12-20'::timestamp + interval '25' hour or valid_from between '2015-12-20'::timestamp and '2015-12-20'::timestamp + interval '25' hour)
and geom && ST_MakeEnvelope(2.124825,41.352072,2.219582,41.407716)
QUERY PLAN
"Bitmap Heap Scan on southeurope_line (cost=8.92..33.04 rows=6 width=662) (actual time=32.294..34.359 rows=134 loops=1)"
" Recheck Cond: (((geom && '010300000001000000050000001DC9E53FA4FF00402DEBFEB110AD44401DC9E53FA4FF00402907B30930B444407A522635B4C101402907B30930B444407A522635B4C101402DEBFEB110AD44401DC9E53FA4FF00402DEBFEB110AD4440'::geometry) AND (valid_to >= '2015-12-20 (...)"
" Heap Blocks: exact=38"
" -> BitmapOr (cost=8.92..8.92 rows=6 width=0) (actual time=32.231..32.231 rows=0 loops=1)"
" -> Bitmap Index Scan on southeurope_line_geom_and_time_index (cost=0.00..4.46 rows=3 width=0) (actual time=19.530..19.530 rows=75 loops=1)"
" Index Cond: ((geom && '010300000001000000050000001DC9E53FA4FF00402DEBFEB110AD44401DC9E53FA4FF00402907B30930B444407A522635B4C101402907B30930B444407A522635B4C101402DEBFEB110AD44401DC9E53FA4FF00402DEBFEB110AD4440'::geometry) AND (valid_to >= '20 (...)"
" -> Bitmap Index Scan on southeurope_line_geom_and_time_index (cost=0.00..4.46 rows=3 width=0) (actual time=12.699..12.699 rows=79 loops=1)"
" Index Cond: ((geom && '010300000001000000050000001DC9E53FA4FF00402DEBFEB110AD44401DC9E53FA4FF00402907B30930B444407A522635B4C101402907B30930B444407A522635B4C101402DEBFEB110AD44401DC9E53FA4FF00402DEBFEB110AD4440'::geometry) AND (valid_from >= ' (...)"
"Planning time: 0.403 ms"
"Execution time: 34.488 ms"
Create smaller tables
Go deeper if that's not enough
Indexes are a common way to enhance database performance. An index allows the database server to find and retrieve specific rows much faster than it could do without an index.
Your database literally checks every single record to see if it should be returned
Your database effectively cuts a portion of your data and performs checks on a much smaller set
Think of them as a Table of Contents
That your database engine always opens when you need something
So it doesn't have to read through the whole book but only the page where your records are located
A spatial index index usually uses the bounding box of features as a proxy. A light read on the process can be found here
They are really important to have!
CREATE INDEX [indexname] ON [tablename] USING GIST ([geometrycolumn]);
Really, don't miss creating this! Did I mention they're important?
To check 5 billion records
In 70+ different tables
If they overlap with tens of thousands irregular areas
From all around the world
At a given point of time?
Which is reasonable.
You don't have to be a computer scientist to use PostgreSQL/PostGIS ...
... If you're willing to learn about it.
But that's what scientists do
We extend horizons and navigate to previously unknown areas
Just ask!
levente.juhasz@ufl.edu
Social media example
Photos from:Questions:
Setup
Students of SUR5365 Digital Mapping
Students evaulated photos in areas they're familiar with
For each photo we acquired the photographer's position from students
So we were able to compare:
How did PostgreSQL/PostGIS help in this study?
Data collection
DB connected to frontend to display photos for students
Multiple geometries stored for each photo