Getting ChatGPT to output valid JSON can be a chore:
> extract xxxx, output as json
> extract xxxx, output as json list
> extract xxxx, output as json with this schema
> extract xxxx, output as json, aargh JSON I BEG YOU
Apparently they solved the JSON problem last Monday. But he had the same problem when trying to get ChatGPT to output only English and not Dutch. So the underlying problem is still there: you have to beg it to output in a certain way and hope it listens.
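A sketch of a defensive workaround (my own illustration, not from the talk): instead of hoping the reply is clean JSON, strip a possible markdown fence, grab the outermost braces, and only then call `json.loads()`:

```python
import json

def parse_json_reply(reply: str) -> dict:
    """Try to recover a JSON object from a model reply that may
    include extra prose or a markdown fence around it."""
    text = reply.strip()
    if text.startswith("```"):
        # Strip a markdown fence like ```json ... ```
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[4:]
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(text[start:end + 1])

print(parse_json_reply('Sure! Here it is:\n```json\n{"name": "Ada"}\n```'))
# {'name': 'Ada'}
```

It doesn't make the model behave, but it makes your own code fail loudly instead of silently when the model ignores you.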
Some other problems: hallucinations, where ChatGPT tells you something with complete confidence even though it is wrong. Biases. And it is not really a chatbot, as it doesn't ask questions back. Unparseable output. Lack of explainability. Privacy issues, as you're sending your data to servers in the USA.
And… what are the data sources ChatGPT used? We don't know. They're called "OpenAI", but they're definitely not open.
When to use LLMs and when not to use them. Some good use cases:
Zero/few-shot learning: a quick way to get a simple minimum viable product or proof of concept.
Data format transformation, HTML to JSON for instance.
You can use it to gather training data for easy bootstrapping.
Some bad use cases:
Structured classification tasks. You really want proper, neat output. Especially when you have lots of classes or a big context. For small personal projects it might be OK, but not for production.
Non-text classification: a large language model of course won't help you with that.
When costs or energy consumption are important. Scaling is an issue.
When it is unclear who is responsible for the output. A chatbot generating "of course, you can get a refund" can be problematic if the customer then insists on a refund it should not get…
When you really want to be sure you get the right answer.
What are some ideas you can look at?
gzip plus nearest-neighbor analysis. Compress texts and see how similar they are. It is not perfect, but it is a neat trick.
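A minimal sketch of the gzip trick (my illustration; the example texts and labels are made up): compute a normalized compression distance between texts, then classify a new text with the label of its nearest labeled example. Texts about the same topic share vocabulary, so their concatenation compresses better:

```python
import gzip

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings:
    how much extra it costs to compress them together."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text: str, labeled_examples: list[tuple[str, str]]) -> str:
    """Nearest neighbor: return the label of the most similar example."""
    return min(labeled_examples, key=lambda ex: ncd(text, ex[0]))[1]

examples = [
    ("the stock market rallied on strong earnings", "finance"),
    ("central banks raised interest rates again", "finance"),
    ("the team scored in the final minute", "sports"),
    ("the striker signed a new contract with the club", "sports"),
]
print(classify("interest rates and the bond market", examples))
```

No training, no model weights, just the standard library. For short texts the distances are noisy, which is why it's a neat trick rather than a production classifier.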
“Bag of words” plus “random forest” (a classifier from scikit-learn).
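A sketch of that combination with scikit-learn (my illustration; the tiny training set is made up): turn each text into a vector of word counts, then train a random forest on those vectors:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy training data, invented for illustration.
texts = [
    "refund my order please",
    "where is my package",
    "I love this product",
    "great service, thank you",
]
labels = ["complaint", "complaint", "praise", "praise"]

# Bag of words: each text becomes a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["please refund my package"]))[0])
```

Cheap to train, fast to run, and the output is always one of your own classes instead of free-form text you have to parse.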
Embeddings and a classifier. An LLM is used to annotate a dataset once; you can then compute embeddings and train a regular classifier on the interesting data.
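A sketch of the classifier half (my illustration): here the embeddings are hypothetical hand-written 3-dimensional vectors standing in for what a real embedding model would give you, and the labels stand in for LLM annotations. A plain logistic regression is trained on top:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pretend embeddings: in practice these would come from an embedding
# model; here they are hypothetical 3-d vectors for illustration.
X = np.array([
    [0.9, 0.1, 0.0],   # e.g. "refund my order"       -> complaint
    [0.8, 0.2, 0.1],   # e.g. "package never arrived" -> complaint
    [0.1, 0.9, 0.0],   # e.g. "love this product"     -> praise
    [0.0, 0.8, 0.2],   # e.g. "great service"         -> praise
])
y = ["complaint", "complaint", "praise", "praise"]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.15, 0.05]])[0])
```

The expensive LLM runs once to label (or embed) the data; the cheap classifier handles every request after that.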
What he thinks is important: keep humans in the loop. Prevent unwanted consequences. Add a preview step before sending stuff out into the world. Make classifications visible and allow corrections. Ask the user to label something if it is unclear. And don’t forget to audit the automatic classifications.
When all you have is an LLM, everything might start to look like a generative task. But don't think like that. Who is going to use it? What is the actual problem? Spend some time thinking about it.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.