The IQ of AI: measuring intelligence in AI models - Jodie Burchell

(One of my summaries of the 2023 Dutch pythonconferentie python meeting in Utrecht, NL).

Jodie wants to look at large language models (like ChatGPT), which went into full-on hype mode this year. Let’s look at some historical examples of what people have thought to be artificial intelligence.

At the first demonstration of a remote-controlled boat, people didn’t understand what was happening. Was this artificial intelligence?
In 1997, Kasparov lost the chess match to Deep Blue. The basic problem was that the computer made a totally unexpected move (“real intelligence”) in the first game, which threw Kasparov off track and caused him to make mistakes in his confusion.
Years later, a Google engineer thought an AI had come to life: it responded like a nine-year-old. He wanted legal protection for the AI. (He was fired.)

Now ChatGPT: people think it displays real artificial general intelligence. But what is the reality? Can we look at it more scientifically? A well-known article (“sparks of artificial general intelligence”) claims to use categories from an older article to rank it: reasoning, planning, problem solving, abstract thinking, comprehending complex ideas, learning quickly and from experience.

Only… Jodie has lots of experience in psychology and those are not categories that are used to gauge intelligence. And the older article also couldn’t be found.

A common problem with artificial intelligence is that it is only considered intelligent until you can explain it. When we know how a machine does something intelligent, it ceases to be regarded as intelligent.

Another problem is that artificial intelligence is often very focused and goal-oriented. It performs impressively on one specific task and poorly on others. Don’t give a math problem to ChatGPT…

There are several levels of intelligence:

No generalisation: solving tic tac toe.
Local generalisation: pattern recognition.
Broad generalisation: self-driving cars.
Extreme generalisation: human-like capability.
Universality: better than what we can do. Beyond us.

As a human, you’re generally intelligent. You can learn several broad abilities, which in turn allow you to accomplish specific tasks. So there are several levels here as well: general intelligence, broad abilities and tasks.

General intelligence maps to the extreme generalisation level. Broad maps to broad. No/local generalisation to tasks. This is a good way to think about AI, too.

So if you look at tasks: generalisation is difficult. How many ways are there of solving the task? How many examples are there? How much experience do you need? How high is the value of achieving intelligence? That can be a way of determining the intelligence at the task.

Learning in AI is often done through brute force: lots and lots of examples. That breaks down if a problem is too far outside of the original training set… OpenAI hasn’t exactly disclosed what ChatGPT was trained on, but it is at least material from 2021 and earlier. When asked to solve programming puzzles that were available on the internet in 2021, ChatGPT had a 100% score. “So we don’t need programmers anymore”.

But when asked to solve puzzles from 2022, it failed miserably… When asked, ChatGPT even said where in the pre-2021 data it had gotten the answers from.
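The comparison above boils down to scoring a model separately on puzzles published before and after its training cutoff. A minimal sketch of that idea follows; the cutoff date, the `score_by_cutoff` helper and the example results are all made up for illustration and are not data from the talk.

```python
from datetime import date

# Hypothetical training cutoff; the talk only says "2021 and earlier".
TRAINING_CUTOFF = date(2021, 12, 31)


def score_by_cutoff(results, cutoff=TRAINING_CUTOFF):
    """Split (published, solved) results at the cutoff and return the pass rate per group."""
    groups = {"seen_period": [], "after_cutoff": []}
    for published, solved in results:
        key = "seen_period" if published <= cutoff else "after_cutoff"
        groups[key].append(solved)
    # A group without any puzzles gets None instead of a misleading 0.0.
    return {
        key: (sum(outcomes) / len(outcomes) if outcomes else None)
        for key, outcomes in groups.items()
    }


# Made-up results purely for illustration, not measurements from the talk.
example_results = [
    (date(2020, 5, 1), True),
    (date(2021, 3, 15), True),
    (date(2022, 2, 1), False),
    (date(2022, 9, 30), False),
]

print(score_by_cutoff(example_results))
```

A large gap between the two groups hints that the model is recalling answers it has seen rather than generalising to new problems.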

A better question than “is this real general AI?” is “where can this realistically be used?” And AI tools for programmers, like Copilot, are actually one of the better use cases. The added benefit is that there’s quite some extra validation you can do on the output (code syntax checkers, the interpreter, etc.)
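As a small illustration of that extra validation: generated Python code can be run through the standard library’s `ast.parse` before anyone trusts it. This is only a sketch of the syntax-check step; the `check_generated_code` helper and the two sample snippets are hypothetical and were not shown in the talk.

```python
import ast


def check_generated_code(source: str) -> list[str]:
    """Return a list of problems found in generated Python source (empty if none)."""
    try:
        # ast.parse only checks syntax; it does not execute the code.
        ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    return []


# Hypothetical tool output: one valid snippet, one with a missing colon.
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"

print(check_generated_code(good))
print(check_generated_code(bad))
```

A syntax check only catches the cheapest class of mistakes. Running the tests or a linter on the generated code would be the obvious next step.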

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
