(One of my summaries of the Dutch PyUtrecht meetup in Utrecht, NL).
Take your average “advent of code” text parsing problem. First try: some basic string.split() stuff. Not very readable and a bit brittle. Second try: a feast of regexes. More elegant, but much less readable.
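To make the two attempts concrete, here is a minimal sketch. The input line is a made-up puzzle-style instruction, not an example from the talk, and the variable names are only for illustration.

```python
import re

# Hypothetical puzzle-style input line (an assumption for illustration).
line = "move 3 from 1 to 2"

# First try: split on whitespace and pick fields by position.
# Brittle: any extra word shifts the indexes.
parts = line.split()
split_result = (int(parts[1]), int(parts[3]), int(parts[5]))

# Second try: a regex with named groups.
# Stricter about the format, but the pattern itself is harder to read.
pattern = re.compile(r"move (?P<count>\d+) from (?P<src>\d+) to (?P<dst>\d+)")
match = pattern.fullmatch(line)
regex_result = (int(match["count"]), int(match["src"]), int(match["dst"]))

print(split_result, regex_result)
```

Both approaches work for one simple line format. They get awkward quickly once the input mixes several kinds of statements.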
A regex can parse regular text. If the text isn’t regular (nested or recursive structures, for instance), you need a real grammar, typically written in “EBNF”, extended Backus-Naur form. This is what lark can help you parse.
In lark you define a grammar. You can import pre-defined parts of vocabulary like “signed number” or “quoted string”. You can define custom types/variables. And you can group stuff. So first some number of “account statements”, followed by “whitespace”, followed by “transaction statements”, for instance.
After parsing with lark, you can then process the results. A useful thing you can do is to convert everything to python dataclasses. When you pair it with pydantic, you get even easier type coercion and validation.
A nice quote by Miriam Forner: writing code is communicating with your future self and other developers. So: readability matters. And lark can help.
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.