Software Insights: osm2pgsql with Jochen Topf

Thunderforest has been sponsoring the osm2pgsql software project since early 2020. It’s a key piece of open-source technology that we rely on, and we’re very happy to support the team behind it. We use osm2pgsql to process and load OpenStreetMap data into our PostGIS databases, which we then use for creating our whole range of maps. It might seem at first like loading this data into the databases would be a reasonably straightforward task. But when you consider handling complex features like bicycle, hiking and public transport route relations, or assembling huge forest polygons with tens of thousands of corners from hundreds of individual OpenStreetMap ways, and then add on top the challenges of processing the incremental “diff updates” that we pull in from OpenStreetMap every few hours, then there’s more complexity to osm2pgsql than initially meets the eye.

Jochen Topf is one of the lead developers of osm2pgsql, and we worked together on the initial concept for what became osm2pgsql’s “flex backend” in 2019. More recently, in early 2022, Jochen worked on speeding up our “diff update” processing, including by changing the way we process geometries for our vector tiles. Some ideas from that work will be heading upstream to osm2pgsql in the near future.

Last month Jochen and I met up at the annual OpenStreetMap State of the Map conference. It was great to see the continuing interest in osm2pgsql, particularly in the Birds of a Feather session, and to have a chance to catch up with Jochen in person too.

Andy: Do you remember when you first got involved in osm2pgsql? What was it that you worked on back then?

Jochen: Osm2pgsql has a long history of different contributors. Looking into the commit log, my first contribution was a typo fix in 2007. So I have known and used osm2pgsql for a long time. But I started with serious contributions only in November 2019, first lots of code and build system clean-ups to get more familiar with the code and then implementing the flex output.

Andy: What have you been working on more recently?

Jochen: Because osm2pgsql is an old project, there were lots of corners where we needed to modernize and clean up the code. That’s the background task I keep working on. This led to some huge performance improvements in the non-slim import code, for instance, and a lot more robust code with fewer bugs. And I continue to work on the flex output and make it more flexible still. Lately the work has been around refactoring a lot of the geometry processing code and making it accessible from the Lua configuration.

Andy: Other than loading data into PostGIS for map rendering, what else can osm2pgsql be used for?

Jochen: Because a PostGIS database is such a versatile tool, there are endless possibilities here. Rendering into raster or vector maps is just the most visible use case (in more than one sense). The other important use case that comes to mind is geocoding. The Nominatim geocoder used, for instance, on openstreetmap.org uses osm2pgsql for data imports. You can also analyze the OSM data in many ways once you have it in a PostGIS database, for instance to calculate road lengths, analyze the public transport network or figure out where to put up a wind turbine that’s far away from residential buildings but close to existing power lines. It is also a great tool to do (ad-hoc) queries for finding bad or inconsistent data in OSM.

Andy: When you sit down to work on osm2pgsql, how do you prioritise where to put your efforts, for example between routine code maintenance or developing new features?

Jochen: Important bug fixes usually come first, of course, and I also try to stay on top of reported issues and participate in discussions on GitHub. Helping users with their small problems often isn’t much effort but can have a huge impact on their use of and happiness with osm2pgsql. Often this leads to me finding some small thing in the code that could be improved and, if the effort isn’t too large, I’ll just do that. Not just fixing the immediate problem, but making the code or the manual a bit better each time you look at it, makes for huge progress in the long run. Serious development needs more time though, so that will usually happen every few weeks when there is time in my schedule. There are no hard and fast rules on what gets priority, but an important consideration is how many users will probably benefit (and whether a paying customer does). Often something gets priority that doesn’t seem like an important thing, just because I know it will enable lots of other improvements down the line.

Andy: What’s next in your plans for osm2pgsql?

Jochen: I am still working on improving the geometry processing facilities of osm2pgsql. This means that more geometry processing can be done on the fly while the data is imported into the database, instead of having to do it afterwards. I am also looking into adding more functionality around supporting geometry generalization with the help of PostGIS. This should make it easier for users to generate simplified data for rendering on lower zoom levels.

You can follow regular updates on the development work and new version releases on the osm2pgsql website. You can also consider joining Thunderforest in sponsoring the development of osm2pgsql.