Python Leiden (NL) meetup: Rendering spatial data in 3d using zarr, jax, babylon.js and DuckDB - Jesse K.V.

Tags: pun, python

(One of my summaries of the second Python Leiden (NL) meetup in Leiden, NL.)

He’s working with civil engineering, hydrology and weather data. And… he wanted to toy with 3D models. His example site: https://topography.jessekv.com/ . You can click almost anywhere in the world and get the upstream catchment. (I checked it: yes, it seems to work pretty well!)

He runs all requests through a single asyncio Python thread. As a personal challenge he wanted it to handle the heavy load of a Hacker News post. In the end, it worked fine: one async Python thread was more than enough.
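To illustrate why one asyncio thread can be enough for I/O-bound requests, here is a minimal sketch. It is not the code behind the site: handle_request, serve_many and run_demo are hypothetical names, and the 50 ms sleep merely stands in for something like a database lookup.

```python
import asyncio
import threading
import time


async def handle_request(request_id: int) -> tuple[int, int]:
    """Pretend to serve one request (hypothetical handler)."""
    # Stand-in for an I/O-bound step such as a database lookup.
    # While this coroutine waits, the event loop serves other requests.
    await asyncio.sleep(0.05)
    return request_id, threading.get_ident()


async def serve_many(n: int) -> tuple[list[tuple[int, int]], float]:
    """Run n requests concurrently on the current (single) thread."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n)))
    return list(results), time.perf_counter() - start


def run_demo(n: int = 200) -> tuple[list[tuple[int, int]], float]:
    return asyncio.run(serve_many(n))


if __name__ == "__main__":
    results, elapsed = run_demo()
    thread_ids = {thread_id for _, thread_id in results}
    print(f"{len(results)} requests, {len(thread_ids)} thread, {elapsed:.2f}s")
```

All requests report the same thread id, and the total time stays far below the sum of the individual waits, because the waiting overlaps. This only holds as long as the handlers do not block the event loop with heavy computation.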

One of the tricks he used was to preprocess as much as reasonable so that most clicks are direct lookups in a database (vector data). Depending on the size of the selected area, he uses more detailed rasters for small areas and coarser ones for big areas.
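The "detailed rasters for small areas, coarser ones for big areas" idea could look roughly like the sketch below. The thresholds, the cell sizes and the name pick_cell_size are all made up for illustration; the talk did not mention actual numbers.

```python
# Hypothetical resolution levels: (maximum catchment area in km2, cell size in metres).
# These numbers are assumptions for illustration, not values from the talk.
RESOLUTION_LEVELS = [
    (100.0, 30),
    (10_000.0, 90),
    (1_000_000.0, 500),
]
COARSEST_CELL_SIZE_M = 1000


def pick_cell_size(area_km2: float) -> int:
    """Return the raster cell size to use for a selected area."""
    if area_km2 < 0:
        raise ValueError("area must be non-negative")
    # Levels are ordered from small to large areas: take the first that fits.
    for max_area_km2, cell_size_m in RESOLUTION_LEVELS:
        if area_km2 <= max_area_km2:
            return cell_size_m
    # Anything larger than the last threshold gets the coarsest raster.
    return COARSEST_CELL_SIZE_M
```

This keeps the number of cells per response roughly bounded: a huge catchment is served from a coarse raster instead of a huge detailed one.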

He wanted a mostly-working prototype quickly, so he experimented with LLMs. Generating math code was mostly bad, but the UI code was OK.

He used duckdb with its spatial extension, which uses GDAL vector routines. This is what he used to pre-process the catchment areas on his M1 mac laptop. Afterwards, he exported the result to postgres, which is much more optimised for actual production use.

Duckdb doesn’t always work perfectly, but if you can split your workload into (parallelised) pieces that stay within the limits of your memory, you can get really good performance out of it.

Duckdb’s file-based approach is also handy. Just like sqlite’s files. Easy for experimenting.

zarr is what he used for pre-processing the 3D landscape. Zarr is efficient for storing large arrays of gridded data. Zarr is way nicer than NetCDF. It is designed to leverage the Linux page cache. And you store compressed data in memory. Storing on S3 is also well-integrated.

jax is an easy way to make numpy/scipy-style code run in parallel. JAX-metal is a jax backend that runs on his M1 macbook’s GPU. Processing is aligned to chunks for more efficient reads and writes.
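Aligning the processing to chunks means that every read or write covers whole stored chunks instead of cutting through several of them. A small sketch of how such windows could be generated, using only the standard library; chunk_aligned_windows is a hypothetical helper, not part of zarr or jax.

```python
from typing import Iterator


def chunk_aligned_windows(
    shape: tuple[int, int], chunks: tuple[int, int]
) -> Iterator[tuple[slice, slice]]:
    """Yield (row_slice, col_slice) windows that each match one stored chunk."""
    rows, cols = shape
    chunk_rows, chunk_cols = chunks
    for row_start in range(0, rows, chunk_rows):
        for col_start in range(0, cols, chunk_cols):
            # The last chunk along an axis may be smaller than the chunk size.
            yield (
                slice(row_start, min(row_start + chunk_rows, rows)),
                slice(col_start, min(col_start + chunk_cols, cols)),
            )
```

Each window can then be used to index a chunked array (for instance array[row_slice, col_slice]), so that one processing step touches exactly one chunk.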

For landcover, he used jax.scipy.stats.mode and for elevation jax.numpy.nanmean. (NaN means not-a-number: elevation models are made with radar technology and water areas reflect radar, resulting in NaN.)
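To make clear what those two reductions do when a fine raster is turned into a coarser one, here is a pure-Python sketch without jax. It only mimics the idea (most common class per block for landcover, NaN-ignoring mean per block for elevation). The helper names, the block-wise downsampling and the tie-breaking rule are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def nan_mean(values: Sequence[float]) -> float:
    """Mean of the values that are not NaN; NaN if nothing valid is left."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


def most_common(values: Sequence[int]) -> int:
    """Most frequent value; ties go to the smallest value (an assumption)."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, count in counts.items() if count == highest)


def downsample(grid: Sequence[Sequence], factor: int, reducer: Callable) -> list[list]:
    """Reduce each factor x factor block of the grid to a single value."""
    rows, cols = len(grid), len(grid[0])
    result = []
    # Incomplete blocks at the right and bottom edges are dropped.
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(reducer(block))
        result.append(out_row)
    return result
```

For example, an elevation block containing 1.0, NaN, 3.0 and 5.0 becomes 3.0 because the NaN (water) cell is ignored, and a landcover block with classes 1, 1, 2 and 3 becomes class 1.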

A useful trick he used was to introduce a bit of wait time for some expensive operations to make sure his service wouldn’t get flooded with requests. Simply waiting a few seconds and then popping up a dialog “you’re going to download 400MB, y/n?” already helped.
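The wait-then-confirm trick could be sketched as follows. This is an assumption about the general shape, not his implementation (on the real site the dialog presumably lives in the browser). guarded_download, the confirm callback, the 100 MB threshold and the 3 second delay are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def guarded_download(
    size_mb: int,
    confirm: Callable[[str], Awaitable[bool]],
    delay_s: float = 3.0,
    threshold_mb: int = 100,
) -> bool:
    """Return True if the (expensive) download may start."""
    # Small downloads are cheap: no delay, no question.
    if size_mb < threshold_mb:
        return True
    # A short pause slows down accidental or automated bursts of requests.
    await asyncio.sleep(delay_s)
    # Asking explicitly filters out users who did not really want the data.
    return await confirm(f"You're going to download {size_mb} MB, y/n?")
```

Because the pause is an await, it costs the server next to nothing: the single event loop keeps handling other requests while this one waits.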

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
