Djangocon EU: digitising historical caving data with Python and Django - Andrew Northall
(One of my summaries of the 2026 Djangocon EU in Athens).
Andrew’s hobby is caving: exploring wild cave systems. It is a niche hobby, but he likes it a lot. One of the nicest things is that you can be the first human being standing in a specific spot on earth.
Cavers achieve high standards, without formal organisation. Sounds like Django :-) With Django, we have exposure and visibility and bug trackers: we’re transparent. So our high standards are verifiable. Would the same be possible for caving? The safety record is really good, but the public image is bad. Can the safety record be made more visible?
There’s an (American) organisation that has historical incident reports. But… mostly in printed text. Image recognition was hard. And something like “fall 1985” isn’t really a Postgres data format. Several volunteers put a lot of work into it by manually entering incidents in a Django website that Andrew built. It was slow going.
Nowadays they have an LLM pipeline for it that is actually really good. Extraction: docling. Splitting into separate incidents: LLM. Formatting/structuring/checking: LLM. Normalisation: mostly with the help of Django.
Docling (https://github.com/docling-project/docling) is a great project for extracting usable text data out of various sources. Including detecting paragraphs that start on one page and end on the next.
Normalisation was a problem. Locations are stored as a tree structure using django-treebeard. You get several precision levels. If you know the town, reference that. If you only know the state, do that. Handy with data that’s not always that precise. Incomplete or seasonal dates (“spring 1972”) are handled with a custom Django model field.
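To make the incomplete-dates idea a bit more concrete, here is a minimal sketch of the kind of value such a field could store. This is not Andrew's code: the talk didn't show the implementation as far as I noted, so `ApproximateDate`, `parse_approximate_date` and the season-to-month mapping are all hypothetical names and assumptions of mine. It is plain Python without Django, so it only illustrates the "date plus precision" idea that a custom model field could wrap.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: northern-hemisphere meteorological seasons, mapped to the
# month in which they start. "winter 1985" is ambiguous in real reports;
# this sketch simply picks December of that year.
SEASON_START_MONTH = {"spring": 3, "summer": 6, "fall": 9, "autumn": 9, "winter": 12}


@dataclass(frozen=True)
class ApproximateDate:
    """A date that may only be known to the year, season, month or day."""

    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    season: Optional[str] = None

    @property
    def precision(self) -> str:
        """Return the finest level of detail that is actually known."""
        if self.day is not None:
            return "day"
        if self.month is not None:
            return "month"
        if self.season is not None:
            return "season"
        return "year"

    def sort_key(self) -> date:
        """Earliest date the value could refer to, handy for ordering."""
        month = self.month or SEASON_START_MONTH.get(self.season, 1)
        return date(self.year, month, self.day or 1)


def parse_approximate_date(text: str) -> ApproximateDate:
    """Parse 'YYYY-MM-DD', 'YYYY-MM', '<season> YYYY' or 'YYYY'."""
    text = text.strip().lower()

    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match:
        year, month, day = map(int, match.groups())
        date(year, month, day)  # raises ValueError for impossible dates
        return ApproximateDate(year, month, day)

    match = re.fullmatch(r"(\d{4})-(\d{2})", text)
    if match:
        year, month = map(int, match.groups())
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month in {text!r}")
        return ApproximateDate(year, month)

    match = re.fullmatch(r"(spring|summer|fall|autumn|winter)\s+(\d{4})", text)
    if match:
        return ApproximateDate(int(match.group(2)), season=match.group(1))

    if re.fullmatch(r"\d{4}", text):
        return ApproximateDate(int(text))

    raise ValueError(f"unrecognised date: {text!r}")
```

Storing the precision next to the value means “fall 1985” doesn't have to pretend to be the first of September: you can still sort on `sort_key()`, but display and filtering can respect what was actually known.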
He generated the frontend with Claude. He was surprised that, on a page that shows a single incident, Claude added a section for the AI summary of the incident…
Nice: the dataset is already being used for actual research on how to reduce incidents.
A note about the volunteer work that went into the original manual data entry: it was used as test cases and for verification of the LLM work.
Unrelated photo explanation: a trip in November to the Mosel+Eifel region in Germany. Disused railway station of Gillenfeld on the former Daun-Wittlich line.