Overview
If you want to read or write .shp files in Python, you are probably going to use pyshp (or maybe Fiona).
I tried to find a Python package on PyPI that complements the pyshp library and can run queries against the .dbf database file.
Given the Spanish census section shapefile (INE Cartografía digitalizada) we can visualize it with Mapshaper:
The goal is to select parts of a shapefile map and save them as another collection of .shp, .dbf and .shx files. For example:
- select from the Spanish census sections:
  - province: Madrid
  - autonomous community: Castilla y León, Galicia
Shapefile
The shapefile is a geospatial vector data format. Actually, it is a collection of three files:
- .shp: binary shapes (polygon…), the geometry itself.
- .dbf: data of shapes or records, in dBase format.
- .shx: shape index format (not mandatory) for quicker indexing.
Other files like .prj, .shp.xml, .sbn … may be included.
Query snippet
The .dbf file has the following fields for each shape:
For example, NPRO is the name of the province, NMUN is the name of the municipal area and CUMUN is the municipal code.
We would like a single query to select on multiple fields, each with multiple values, for example NPRO = Madrid, Sevilla and NMUN = Barcelona.
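As a minimal sketch of what such a query means, the snippet below filters plain dictionaries rather than a real .dbf file. It assumes union semantics (a record is kept if any listed field holds any of its listed values), which is what the examples in this post suggest. The `matches_any` helper and the sample records are hypothetical and only for illustration.

```python
def matches_any(record, criteria):
    """Return True if any field in criteria holds one of its allowed values."""
    return any(record.get(field) in values for field, values in criteria.items())


# Hypothetical sample records using the field names described above.
records = [
    {"NPRO": "Madrid", "NMUN": "Madrid"},
    {"NPRO": "Sevilla", "NMUN": "Dos Hermanas"},
    {"NPRO": "Barcelona", "NMUN": "Barcelona"},
    {"NPRO": "Barcelona", "NMUN": "Badalona"},
]

# NPRO = Madrid, Sevilla and NMUN = Barcelona, combined as a union.
criteria = {"NPRO": {"Madrid", "Sevilla"}, "NMUN": {"Barcelona"}}
selected = [r for r in records if matches_any(r, criteria)]
selected_municipalities = [r["NMUN"] for r in selected]
```

With these sample records, Badalona is the only municipality left out, because neither its province nor its name is listed.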
First we load the shapefile from the Spanish Statistical Institute into our brand new ShapeFileUtils object, which extends the shapefile.Reader class:
Then we run a query that selects several provinces in the north of Spain (leaving out Asturias), one autonomous region, and the most populated municipalities of Asturias:
It returns a shapefile.Writer object that we can save and visualize in Mapshaper:
We can also run two queries, save two shapefiles and display them together, for example Madrid province and the Madrid municipal area:
View of the Retiro census section inside the Madrid municipal area, inside Madrid province:
Next steps
Making queries this way is kind of odd. It would be nice to make it similar to how the Django ORM does it. For example:
sf.objects.filter(NPRO='Barcelona').exclude(NMUN='Prat de Llobregat, El')…
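To make the idea a little more concrete, here is a minimal sketch of a chainable interface in that style. It works on plain dictionaries instead of a shapefile, and the `RecordQuery` class is hypothetical. It borrows Django's convention that several keyword arguments in one call are combined with AND.

```python
class RecordQuery:
    """Hypothetical chainable query over attribute records (plain dicts)."""

    def __init__(self, records):
        self._records = list(records)

    def filter(self, **conditions):
        # Keep records where every given field equals the given value.
        return RecordQuery(
            r for r in self._records
            if all(r.get(name) == value for name, value in conditions.items())
        )

    def exclude(self, **conditions):
        # Drop records where every given field equals the given value.
        return RecordQuery(
            r for r in self._records
            if not all(r.get(name) == value for name, value in conditions.items())
        )

    def __iter__(self):
        return iter(self._records)


# Hypothetical sample records.
records = [
    {"NPRO": "Barcelona", "NMUN": "Barcelona"},
    {"NPRO": "Barcelona", "NMUN": "Prat de Llobregat, El"},
    {"NPRO": "Madrid", "NMUN": "Madrid"},
]

query = RecordQuery(records).filter(NPRO="Barcelona").exclude(NMUN="Prat de Llobregat, El")
result = list(query)
```

Each call returns a new query object, so the original record list is never modified and calls can be chained in any order.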