Source Datasets and Data model.
map5topo is composed of multiple (open) datasets. The aim is to fit these all into a single model, (feature) classification and DB-schema.
This is an ongoing process. The name of the (PostGIS) schema is
Tables in this schema are filled with data from other schemas/tables that
contain the various (vector) sources like OpenStreetMap, BRT, BRK, BGT, BAG etc.
SQL scripts are used for this transformation.
Some similar work, common data schemas from various source datasets:
- OpenMapTiles Schema OMT
- PDOK Vector Tiles BGT BRT - Pilot
- Basemap DE Data Model
- OVerture Maps Data Schema
Most approaches apply hierarchical feature classification: OMT uses
subclass, PDOK Vector Tiles Pilot uses
lod2 etc. We like the latter convention as it allows an endless
hierarchy (and is short to type!).
Although reclassification has been exercised in many mapping projects like the ones mentioned above, the extra challenge here is that completely disjunct datasets are used: Dutch Key Registries (BAG, BGT, BRT etc) and OpenStreetMap. Even among Dutch Key registries very different feature classification schemes are in place. For example 'Terrain' (Dutch: 'Terrein') among Top10NL and BGT and even Top50NL.
An additional challenge is that part of the map include neighbouring countries, for which only OSM and maybe later local datasets, is/are available. The aim is to work with completely integrated data tables/layers, not seperate layers/styles for neighbouring countries.
This is the list of feature sets. Criteria/guidelines:
- Each feature set will be a table.
- Some tables may expand to multiple Layers.
- Each feature set/table will always have a single geometry type.
- Some, like housenumber, which are in effect labels, are separate feature sets because of the sheer set's size.
- For some names the geometry type is implicit, like 'poi'.
- No plural naming like
- Multiple geometry simplifications/generalizations of the same feature may appear in single table
- Hillshade is not table-based, but GeoTIFF raster
The list below is not fixed, subject to change based on new insights, or data items that do not fit in any set.
- aeroway - polygon - aerodromes, aprons, helipads (aeroway lines part of transport_line)
- area_label - point - any that has a name (except house numbers, transport, water)
- border - multipolygon - administrative borders
- contour_line - line - height lines (derived from DEM)
- grid - line - grid lines in map
- housenumber - point - clear, also house names
- landcover - polygon - mostly ground level earth covering ("aardbedekking")
- landuse - polygon - functional use of land, like military areas, graveyards, parks
- parcel - line - borders of cadastral parcels
- pitch - polygon - mainly sports pitches - special case of landuse, special handling with overlay SVG
- place - point - names of cities, towns up to hamlets
- poi - point - Points of Interest
- structure - polygon - anything human-built from buildings/houses up to civil tech structures
- structure_line - line - barriers, fences, powerlines
- transport - line - transportation infrastructure: roads, railways, etc
- transport_area - polygon - transportation infrastructure areas
- transport_label - point - road names and symbols
- water - multipolygon - water polygons
- water_label - line - water names formed/curved by shapes of waterbodies.
- waterway - line - water lines and names
aerowaylike aerodromes (polygon) is always a separate feature set, why? Could be in
- to add to this: aeroway lines are part of
- think labels can be bundled
The table-SQL for the ETL can be found here: https://github.com/map5nl/map5topo/tree/main/tools/etl/sql/map5/tables.
Each table in the map5 schema has a similar setup, i.e. columns:
CREATE TABLE map5.xyz ( -- table-specific columns, usually classifications, area, population etc (lod1-lod3) .. .. .. -- COMMON COLUMNS -- -- Relative height z_index INTEGER DEFAULT 0, -- min and maxzoom in Dutch RD -- when to show an object on the map rdz_min INTEGER, rdz_max INTEGER, -- Where the data record originates from src_schema TEXT, src_table TEXT, src_idref TEXT, -- Is the object in NL or outside (abroad)? abroad BOOLEAN DEFAULT FALSE, -- Geometry in Dutch EPSG geom GEOMETRY(POINT|POLYGON|LINESTRING, 28992) );
Many tables will contain a classification, like for Landcover or POIs. For example:
CREATE TABLE map5.landcover ( lod1 TEXT, lod2 TEXT, lod3 TEXT, area BIGINT DEFAULT 0, -- Common columns z_index INTEGER DEFAULT 0, rdz_min INTEGER DEFAULT -1, rdz_max INTEGER DEFAULT 13, src_schema TEXT, src_table TEXT, src_idref TEXT, abroad BOOLEAN DEFAULT FALSE, geom GEOMETRY(POLYGON, 28992) );
lod3 provide an hierarchical classification of the
feature. This is defined within the project. Each source dataset-specific
classification is mapped. Usually
lod2 is sufficient.
lod3 is usually
the source-specific value like
naaldbos, mainly for debugging.
The image below summarizes this table design. (Click image to enlarge).
Each table may contain multiple geometry generalizations (simplifications)
for the same object. Per record the zoomlevel range
is specified with
rdz_max. Mapnik always provides a
when accessing a
Layer. Via the SQL Function
rdz() this scaledenominator is
converted to an RD Zoomlevel (range 1-13, equal to WebMerc 6-18) that is used in the query on that Layer.
This way only the relevant records for that zoomlevel are selected.
Many zoom-ranges also have VIEWs, for example for analysis. If needed, for performance,
PostgreSQL materialized VIEWs may be applied. But at least
data for a single feature type, usually a layer, is not spread over multiple tables now.
For example a VIEW for low-zoom Terrain:
CREATE VIEW map5.landcover_z0_z3 AS SELECT lc.* FROM map5.landcover lc WHERE lc.rdz_min >= 0 AND lc.rdz_max <= 3;
This also works for Web Mercator tiles as scale-ranges are shared, see e.g.
. . <!ENTITY maxscale_zoom15_rd10 "<MaxScaleDenominator>20000</MaxScaleDenominator>"> <!ENTITY minscale_zoom15_rd10 "<MinScaleDenominator>10000</MinScaleDenominator>"> <!ENTITY maxscale_zoom16_rd11 "<MaxScaleDenominator>10000</MaxScaleDenominator>"> <!ENTITY minscale_zoom16_rd11 "<MinScaleDenominator>5000</MinScaleDenominator>"> . .
Using these common ranges allows parallel zoomlevels: for example RD level 11 is 16 in Web Mercator. RD level 13 is WM level 18, the highest level etc. Usually zoom WM = Zoom RD +5.
The map area includes data from countries both bordering (Germany, Belgium) and in vicinity
(Northern France) of The Netherlands. By intersecting a (multi)polygon that contains The Netherlands' borders,
in each OSM-record can be marked as located abroad or not This is realized through a simple boolean column,
abroad. For data from Dutch Key Registries, the fixed
abroad value is
OSM Source Data
OSM data is downloaded from geofabrik.de, and comprises:
# The Netherlands https://download.geofabrik.de/europe/netherlands-latest.osm.pbf # Abroad http://download.geofabrik.de/europe/belgium-latest.osm.pbf http://download.geofabrik.de/europe/germany/niedersachsen-latest.osm.pbf http://download.geofabrik.de/europe/germany/nordrhein-westfalen-latest.osm.pbf http://download.geofabrik.de/europe/germany/rheinland-pfalz-latest.osm.pbf http://download.geofabrik.de/europe/france/nord-pas-de-calais-latest.osm.pbf # Sea https://osmdata.openstreetmap.de/download/simplified-water-polygons-split-3857.zip https://osmdata.openstreetmap.de/download/water-polygons-split-3857.zip
"The Netherlands" and "Abroad" and stitched together using Osmosis. "Water" data is basically
sea, with exception of Oosterschelde. These are directly converted, using GDAL
ogr2ogr, to PostgreSQL
files for quick reuse.
The image below shows an example for the table
map5.landcover ("Landcover" and "Landuse" are separated).
The figure shows the table structure and the hierarchical classification:
lod2 and their values.
This table is filled from various source tables like BGT and BRT (all "terrain"-related) but also OSM.
Also an excerpt from the
map5.landcover.sql ETL is shown, for mapping records from BRT Top50NL.
This mapping is an ongoing process, as the lod-hierarchy needs to be established and mapped
from very different source datasets. Even Dutch Topographic datasets like BRT (smallscale) and BGT (largescale) have
very different feature classification schemes. Also none of the source datasets seems to get the difference
between "Landcover" and "Landuse" right. For example, "orchard" or "graveyard": are these Landcover or Landuse?
This is also an ongoing discussion in the OSM Community, as
this recent Landcover proposal shows.
Each record within tables in the
map5 database schema contains its source schema/table and visibility zoom-range.
This allows for generating metadata such that this information can be made available per zoomlevel, but also other
statistical information like the number of records (at that zoom-level) from a particular source table.
This is exactly what the
map5.metadata table beholds. As part of the ETL (SQL) that fills
map5. tables from thir source
schemas/tables, a utility SQL function will also insert (replace) records in
map5.metadata. The image below shows the
Below an example query: "Show me all information for zoomlevel RD 12 that is not abroad".
Below is the SQL function called at the end of each
map5. table generation, that extracts the relevant metadata
records from that table.