PyUtrecht: The Earley Lark parses more: from structured text to data classes - Dan Jones

Tags: pun, python

(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).

Take your average “Advent of Code” text parsing problem. First try: some basic string.split() stuff. Not very readable and a bit brittle. Second try: a feast of regexes. More elegant, but much less readable.
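To make the two "tries" concrete, here is a small sketch using only the standard library. The input line and its field layout are made up for illustration; the talk's actual puzzle input isn't in my notes.

```python
import re

# Hypothetical input line (not from the talk).
line = "transfer checking savings 25.00"

# First try: positional splitting. Short, but it silently depends on
# the exact number and order of fields.
_, source, target, amount = line.split()
split_result = (source, target, float(amount))

# Second try: a regex with named groups. Stricter, but dense to read.
pattern = re.compile(
    r"^transfer\s+(?P<source>\w+)\s+(?P<target>\w+)\s+"
    r"(?P<amount>-?\d+(?:\.\d+)?)$"
)
match = pattern.match(line)
regex_result = (match["source"], match["target"], float(match["amount"]))

print(split_result)
print(regex_result)
```

Both give the same tuple here, but neither scales nicely once the input gets multiple sections or nested structure.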

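A regex can only parse regular text. If the text isn’t regular (nested or recursive structure, for instance), you need a proper grammar, typically written in “EBNF”, extended Backus-Naur form. Parsing text according to such a grammar is what lark can help you with.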

In lark you define a grammar. You can import pre-defined parts of vocabulary like “signed number” or “quoted string”. You can define custom types/variables. And you can group stuff. So first some number of “account statements”, followed by “whitespace”, followed by “transaction statements”, for instance.

After parsing with lark, you can then process the resulting parse tree. A useful thing you can do is to convert everything to Python dataclasses. When you pair it with pydantic, you get even easier type coercion and validation.

A nice quote by Miriam Forner: writing code is communicating to your future self and other developers. So: readability matters. And Lark can help.

 
Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
