PyGrunn: JSON freedom or chaos, how to trust your data - Bart Dorlandt

Tags: python, pygrunn

(One of my summaries of the 2026 one-day PyGrunn conference in Groningen, NL).

Subtitle: a real-world journey from chaos to confidence using Pydantic and Pytest.

Ideally, you’d have perfect JSON files with a fixed format, rigorous validation, and ideally generated automatically. But in a customer project, the other programmers weren’t too happy about that. They had massive JSON files, partially manually crafted. Some were just one single line and others were vertically aligned. And perhaps someone depended on the specific format for some “sed” or “awk” hacking… So whatever happens: it works, don’t touch it.

The freedom trap. No schema means no contract. No contract means no trust. Fields accumulate, nobody removes them: “someone might be using it”. Multi-team challenges: not everyone has the same skillset.

He wanted a different future: a trusted future. Validated and tested and formatted.

Pydantic is a Python library for data validation using Python type annotations. You define a data model with type hints and it will automatically validate and parse data according to that model:

from ipaddress import IPv4Address
from pydantic import BaseModel

class Server(BaseModel):
    hostname: str
    ip: IPv4Address
    ...
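As a minimal sketch of what such a model buys you (the field values here are made up for illustration): valid input is parsed into real Python objects, and invalid input raises a ValidationError with a message pointing at the offending field.

```python
from ipaddress import IPv4Address

from pydantic import BaseModel, ValidationError


class Server(BaseModel):
    hostname: str
    ip: IPv4Address


# Valid data: the string is parsed into a real IPv4Address object.
server = Server(hostname="web1", ip="192.168.1.10")
print(type(server.ip))  # <class 'ipaddress.IPv4Address'>

# Invalid data: pydantic raises with a clear, field-level error message.
try:
    Server(hostname="web2", ip="not-an-ip")
except ValidationError as error:
    print(error)
```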

Make sure to look at pydantic-extra-types, they have lots of handy types like “two-character country code”.

There’s AfterValidator, which you can use to add a second validator to a field. So first the declared type validates the value, then afterwards your custom check runs on the already-parsed result.
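A small sketch of that layering (the “must be a private address” rule is my own example, not from the talk): the IPv4Address type runs first, so the custom validator can safely assume it receives a parsed address object.

```python
from ipaddress import IPv4Address
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def must_be_private(ip: IPv4Address) -> IPv4Address:
    # Runs *after* the IPv4Address parsing has already succeeded.
    if not ip.is_private:
        raise ValueError(f"{ip} is not a private address")
    return ip


class Server(BaseModel):
    hostname: str
    ip: Annotated[IPv4Address, AfterValidator(must_be_private)]


Server(hostname="web1", ip="10.0.0.1")  # fine: valid and private

try:
    Server(hostname="web2", ip="8.8.8.8")  # valid IP, but not private
except ValidationError as error:
    print(error)
```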

Understanding the data is important. Split it up in smaller pieces and try to understand/model/validate those. Especially in a corporate setting, splitting up the problem is handy: you have some small success you can mention at the standup :-)

Do it iteratively. One piece at a time. If you find a problem, create a ticket for it. It might not get fixed, but at least you end up with a list you can slowly tackle with the rest of the organisation.

A good tip: if you discover an error in the data, provide a good, clear error message that your colleague can understand.

When you export the data, use model_dump(exclude_none=True) to leave out all the unset optional fields instead of emitting them as my_field: None.
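A quick sketch of the difference (model and field names invented for illustration; in current Pydantic the keyword for this is exclude_none):

```python
from typing import Optional

from pydantic import BaseModel


class Server(BaseModel):
    hostname: str
    comment: Optional[str] = None


server = Server(hostname="web1")

# Default dump keeps the unset optional field as None.
print(server.model_dump())  # {'hostname': 'web1', 'comment': None}

# With exclude_none=True, None-valued fields are dropped entirely.
print(server.model_dump(exclude_none=True))  # {'hostname': 'web1'}
```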

Bonus: you can call YourModel.model_json_schema() to generate a JSON schema for the Pydantic model. You can then use the JSON schema in vscode when you manually edit your JSON.
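For example, a sketch of generating that schema (reusing the Server model from earlier; the output is a standard JSON schema dict you can write to a file and point your editor at):

```python
import json
from ipaddress import IPv4Address

from pydantic import BaseModel


class Server(BaseModel):
    hostname: str
    ip: IPv4Address


schema = Server.model_json_schema()
# A plain dict following the JSON schema spec; dump it to a file
# and reference it from your editor's JSON settings.
print(json.dumps(schema, indent=2))
```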

Pydantic is great at validating individual fields and structures, but not at validating things that span the entire document, like making sure that all hostnames are unique. He used Pytest for that: he wrote such validation checks as pytest functions! You can even use Pytest test parametrization to run the same test on multiple directories.
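A sketch of what such a cross-document check could look like; the directory names and file layout here are made up for illustration, not from the talk:

```python
import json
from collections import Counter
from pathlib import Path

import pytest


def duplicate_hostnames(servers: list[dict]) -> list[str]:
    """Return the hostnames that occur more than once."""
    counts = Counter(server["hostname"] for server in servers)
    return [name for name, count in counts.items() if count > 1]


# Hypothetical directories, each holding JSON files with lists of servers.
@pytest.mark.parametrize("directory", ["config/staging", "config/production"])
def test_hostnames_are_unique(directory):
    servers = []
    for path in Path(directory).glob("*.json"):
        servers.extend(json.loads(path.read_text()))
    duplicates = duplicate_hostnames(servers)
    # A clear failure message your colleague can act on.
    assert not duplicates, f"Duplicate hostnames in {directory}: {duplicates}"
```

Each parametrized directory shows up as its own test in the pytest output, so a failure immediately tells you which environment has the problem.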

https://reinout.vanrees.org/images/2026/lac-de-kruth5.jpg

Unrelated photo: the “lac de Kruth-Wildenstein” reservoir during a family holiday in France in 2006.