Levels of Analysis in Machine Learning

I work in machine learning, but my background is in cognitive neuroscience, and my understanding of intelligence in artificial systems is heavily influenced by my understanding of neural systems. The two disciplines approach intelligence from opposite directions. Neuroscience studies systems that work very well in practice but are hard to probe and understand. AI is the reverse: it starts from ideas that make sense in theory but often don’t work as well as we initially hoped.

I hope to introduce machine learners to ideas from cognitive science, partly because I think those ideas are useful, and partly because I think a greater diversity of mechanisms helps with alignment. So I want to tell you about an idea that has always been very useful to me, but that I think is missing from the toolkit of ML researchers with a computer science background. I also want to tell you why it will be useful to you.

Introduction to Levels of Analysis

Often called Marr’s Levels of Analysis, these are three different levels at which we can understand any mechanism proposed as a source of intelligent behavior.

The three levels are:

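  • Computational: what problem is being solved? What are the inputs and outputs?
  • Algorithmic: what procedure, and what representations, turn those inputs into outputs?
  • Implementational: the physical details of how that procedure is carried out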

I’m going to include this delightful graphic from a published paper on the topic:

Source

David Marr was working in neuroscience, so most of his examples come from the brain, but there are equally good examples in machine learning. I hope to modernize the concept by situating it within the field as we understand it today.

1) Computational

When a company sets out to use artificial intelligence, its first thought is about the business need it has. It may want, for example, to predict the future cost of materials, or to find the product a user is most likely to buy. An engineer can find a way to fill that need, but first they should figure out exactly what is needed.

  • Is there a dataset of inputs and outputs, with an implicit function that can be learned? If so, what is the dimensionality of the data? What else can we say about the data?
  • Is there a value function that the business wants to maximize? If so, does the value come from a black box, or can we describe the function that yields it?
  • Are we working with sequence data, visual data, or data that’s structured in some other way?
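Some of these questions can be checked directly. The sketch below is a hypothetical helper (`describe_dataset` is our name, not a library function) that answers three Computational-level questions about a list of input/output pairs: how many examples there are, what the input dimensionality is, and whether the pairs are even consistent with a function. If the same input appears with two different outputs, no deterministic function can fit the data exactly, which may point to label noise or to missing input features.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def describe_dataset(inputs: Sequence[tuple], outputs: Sequence[Hashable]) -> dict:
    """Hypothetical helper: summarize a supervised dataset at the Computational level."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    # Collect every output observed for each distinct input.
    outputs_seen = defaultdict(set)
    for x, y in zip(inputs, outputs):
        outputs_seen[tuple(x)].add(y)
    # An input paired with more than one output rules out an exact deterministic function.
    conflicts = [x for x, ys in outputs_seen.items() if len(ys) > 1]
    return {
        "n_examples": len(inputs),
        "input_dims": sorted({len(x) for x in inputs}),
        "is_function": not conflicts,
        "conflicting_inputs": conflicts,
    }


# Usage: the same (made-up) customer features appear with two different outcomes.
X = [(0, 1), (1, 1), (0, 1)]
y = ["buy", "skip", "skip"]
print(describe_dataset(X, y))
# {'n_examples': 3, 'input_dims': [2], 'is_function': False, 'conflicting_inputs': [(0, 1)]}
```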

In Functional Decomposition of AGI, I point out that human-like machine intelligence needs to solve many different functions within the same mind, and the same is true for many successful AI systems such as DreamerV3. Even within a large language model, the multi-layer perceptron sublayers approximate a different function than the attention sublayers.
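To make the difference between the sublayers of a transformer block concrete, here is a toy sketch in plain Python, assuming a heavily simplified block: the attention sublayer uses a single head with identity query, key, and value projections and no causal mask, and the MLP weights are made up. Real models use learned projections, multiple heads, residual connections, and normalization. The point is structural: the MLP sublayer maps each position independently, while attention mixes information across positions. The comparison looks at the last position, which sees every token whether or not a causal mask is used.

```python
import math

Vector = list[float]


def mlp_sublayer(tokens: list[Vector]) -> list[Vector]:
    """Position-wise MLP with toy weights: output i depends only on token i."""
    def mlp(x: Vector) -> Vector:
        hidden = [max(0.0, x[0] + x[1]), max(0.0, x[0] - x[1])]  # ReLU hidden layer
        return [hidden[0] - hidden[1], hidden[0] + hidden[1]]      # linear readout
    return [mlp(x) for x in tokens]


def attention_sublayer(tokens: list[Vector]) -> list[Vector]:
    """Single-head self-attention with identity Q/K/V projections: output i mixes all tokens."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        # Scaled dot-product scores against every token, then a numerically stable softmax.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) / total for j in range(d)])
    return outputs


seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edited = [[3.0, -1.0], [0.0, 1.0], [0.5, 0.5]]  # only the first token changes
print(mlp_sublayer(seq)[-1] == mlp_sublayer(edited)[-1])              # True
print(attention_sublayer(seq)[-1] == attention_sublayer(edited)[-1])  # False
```

Editing only the first token leaves the MLP output at the last position untouched, but changes the attention output there.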

2) Algorithmic

The Computational level can tell us what needs to be accomplished when we are engineering a system, or what appears to be happening when we are studying an existing organism. The Algorithmic level is where we investigate how it is done.

In an algorithms class, students are taught what it means for a list to be sorted, but what’s more interesting is how the list gets sorted. If you ignore compute time and memory requirements, insertion sort and quicksort do the same thing. A large number of algorithms take in a list and return a new list with the same elements in sorted order.
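A minimal sketch of that point: two sorting algorithms that satisfy the same Computational-level specification (same elements, sorted order) but differ at the Algorithmic level. The comparison counts are illustrative only. The quicksort here is a simple out-of-place, middle-pivot variant rather than the in-place textbook version, and its counter charges one comparison per element per partition step.

```python
import random


def insertion_sort(xs: list[int]) -> tuple[list[int], int]:
    """Return a sorted copy of xs and the number of element comparisons made."""
    out, comparisons = list(xs), 0
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        # Shift larger elements one slot right until key's position is found.
        while j >= 0:
            comparisons += 1
            if out[j] <= key:
                break
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out, comparisons


def quicksort(xs: list[int]) -> tuple[list[int], int]:
    """Out-of-place quicksort with a middle pivot; counts one comparison per element per partition."""
    if len(xs) <= 1:
        return list(xs), 0
    pivot = xs[len(xs) // 2]
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    left, c_left = quicksort(less)
    right, c_right = quicksort(greater)
    return left + equal + right, len(xs) + c_left + c_right


data = random.Random(0).sample(range(10_000), 1_000)
a, insertion_comparisons = insertion_sort(data)
b, quick_comparisons = quicksort(data)
assert a == b == sorted(data)  # identical behavior at the Computational level
print(insertion_comparisons, quick_comparisons)  # insertion sort needs many more on random input
```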

In classical AI, the same can be said for pathfinding algorithms. A* search and breadth-first search (BFS) on a quadtree can both be used to find a route through a planar map, but they have different performance characteristics.
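A small sketch, assuming a uniform 4-connected grid rather than a quadtree to keep it short: BFS and A* (with a Manhattan-distance heuristic, which never overestimates on such a grid) return the same shortest path length, but A* expands far fewer cells on an open grid. The grid, the `expanded` counter, and the tie-breaking rule (prefer deeper nodes when estimates tie) are our choices for illustration.

```python
import heapq
from collections import deque

Cell = tuple[int, int]


def neighbors(grid: list[str], cell: Cell):
    """4-connected moves onto in-bounds cells that are not walls ('#')."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def bfs(grid: list[str], start: Cell, goal: Cell):
    """Return (shortest path length, cells expanded); explores in rings of equal distance."""
    dist, queue, expanded = {start: 0}, deque([start]), 0
    while queue:
        cell = queue.popleft()
        expanded += 1
        if cell == goal:
            return dist[cell], expanded
        for nb in neighbors(grid, cell):
            if nb not in dist:
                dist[nb] = dist[cell] + 1
                queue.append(nb)
    return None, expanded


def astar(grid: list[str], start: Cell, goal: Cell):
    """Return (shortest path length, cells expanded) using a Manhattan-distance heuristic."""
    def h(cell: Cell) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    # Heap entries: (f = g + h, -g, cell); -g breaks ties in favor of deeper nodes.
    frontier = [(h(start), 0, start)]
    closed, expanded = set(), 0
    while frontier:
        _, neg_g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale entry; a cheaper route was already expanded
        closed.add(cell)
        expanded += 1
        g = -neg_g
        if cell == goal:
            return g, expanded
        for nb in neighbors(grid, cell):
            if g + 1 < best_g.get(nb, float("inf")):
                best_g[nb] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nb), -(g + 1), nb))
    return None, expanded


open_grid = ["." * 12 for _ in range(12)]
print(bfs(open_grid, (0, 0), (11, 11)))    # same length, every cell expanded
print(astar(open_grid, (0, 0), (11, 11)))  # same length, far fewer cells expanded
```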

Likewise, in machine learning, there may be multiple algorithms that solve your problem.

  • Are you classifying data? You could use a neural network with softmax output, or a random forest model, or a support vector machine
  • Are you fitting a function? Linear or logistic regression will work well for simple data, but a neural network often works better for high-dimensional data. In some cases, k-nearest neighbors (KNN) will work best
  • Making an agent that plays a game? Reinforcement learning will work well, but have you considered using a decision transformer?

In each of these cases, the algorithm matters a lot! The choice of algorithm affects not only the compute demands but also how well the ML system performs. Logistic regression and KNN both fit a function, but there are cases where one will perform much better than the other.
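A toy illustration, using a made-up XOR-style dataset where the label is 1 when exactly one coordinate is high. Both models are minimal pure-Python sketches (full-batch gradient-descent logistic regression and 1-nearest-neighbor), not library implementations. A linear decision boundary cannot separate XOR, so logistic regression on the raw features cannot classify all four test points correctly, while KNN can.

```python
import math

# Made-up XOR-style data: label 1 when exactly one coordinate is high.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
train_y = [0, 1, 1, 0, 0, 1, 1, 0]
test_X = [(0.2, 0.2), (0.2, 0.8), (0.8, 0.2), (0.8, 0.8)]
test_y = [0, 1, 1, 0]


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression by full-batch gradient descent; returns a 0/1 predictor."""
    w1 = w2 = b = 0.0
    n = len(X)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            g1 += (p - t) * x1
            g2 += (p - t) * x2
            gb += p - t
        w1, w2, b = w1 - lr * g1 / n, w2 - lr * g2 / n, b - lr * gb / n
    return lambda x: int(w1 * x[0] + w2 * x[1] + b > 0)


def fit_knn(X, y, k=1):
    """k-nearest neighbors with squared Euclidean distance and majority vote."""
    def predict(x):
        nearest = sorted(range(len(X)),
                         key=lambda i: (X[i][0] - x[0]) ** 2 + (X[i][1] - x[1]) ** 2)[:k]
        return int(2 * sum(y[i] for i in nearest) > k)
    return predict


def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


print("logistic regression:", accuracy(fit_logistic(train_X, train_y), test_X, test_y))
print("1-nearest neighbor: ", accuracy(fit_knn(train_X, train_y), test_X, test_y))
```

Adding the product feature x1·x2 as a third input would let logistic regression solve this problem too, which is another way of saying that the best algorithm depends on how the data is represented.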

But if you have misunderstood the problem at the Computational level, then tinkering with the Algorithmic level won’t help you very much; you’ll be solving the wrong problem.

3) Implementational

The final level concerns the details of how the algorithm is actually computed. Marr was studying biological systems, where these details are often extremely opaque. For example, it is not clear how the brain could easily represent a number, such as the orientation of the head or the speed of forward movement. Fortunately, we’re engineering with computers, which have sensible ways to represent numbers, but even so it’s not always straightforward.
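Even on a computer, the choice of representation matters. A sketch, assuming a heading angle in degrees as the number to represent (the helper names are ours): treated as a plain number, 359° and 1° look far apart, while encoding the angle as a point (cos θ, sin θ) on the unit circle keeps nearby headings nearby. This is a common engineering choice in ML, not a claim about how the brain does it.

```python
import math


def raw_distance(a_deg: float, b_deg: float) -> float:
    """Naive: treat the heading as an ordinary number on a line."""
    return abs(a_deg - b_deg)


def encode(angle_deg: float) -> tuple[float, float]:
    """Represent a heading as a point on the unit circle, so 0° and 360° coincide."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))


def decode(vec: tuple[float, float]) -> float:
    """Recover the heading in [0, 360) from its unit-circle encoding."""
    return math.degrees(math.atan2(vec[1], vec[0])) % 360.0


def circle_distance(a_deg: float, b_deg: float) -> float:
    """Chord length between two encoded headings; small when headings are close."""
    (ax, ay), (bx, by) = encode(a_deg), encode(b_deg)
    return math.hypot(ax - bx, ay - by)


print(raw_distance(359, 1))     # 358: looks like nearly opposite headings
print(circle_distance(359, 1))  # about 0.035: correctly treated as close
print(decode(encode(359)))      # about 359.0, up to floating-point rounding
```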

In machine learning, we face decisions like:

  • What framework are we using? PyTorch? JAX? TensorFlow?
  • Is there a natural way to vectorize the data? Where did the tokenizer come from?
  • Is the data being batched? Is the training distributed?
  • Can the data be quantized within the network?

These decisions are important to get right, and an existing system often won’t make sense unless you understand them. For example, quantization can significantly reduce a system’s memory footprint at some cost in numerical precision.
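A minimal sketch of one common scheme, symmetric per-tensor int8 quantization, using only the standard library: each weight is divided by a single scale factor, rounded to an 8-bit integer, and multiplied back when needed. Storage drops from 4 bytes to 1 byte per weight, and the round-trip error stays within about half the scale. Real frameworks offer more elaborate schemes (per-channel scales, zero points, calibration), which this sketch does not attempt; the weight values are made up.

```python
from array import array


def quantize_int8(weights: array) -> tuple[array, float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale with q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for all-zero weights
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> array:
    """Map int8 codes back to approximate float32 weights."""
    return array("f", (v * scale for v in q))


weights = array("f", [0.8, -1.27, 0.003, 0.5, -0.25])
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(len(weights) * weights.itemsize, "bytes as float32")
print(len(q) * q.itemsize, "bytes as int8")
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```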

As an engineer, you can only do so much to optimize the Implementational level until you’ve figured out the Computational and Algorithmic levels. In neuroscience, it often feels like the opposite: we can only begin to understand the Algorithmic level once we have built up an understanding of the Implementational level.

Why This is Useful

As an engineer, you will often approach problems like this:

  1. What does the company need? (Computational)
  2. What is the best algorithm to solve it? (Algorithmic)
  3. Spend six months trying to get it to work (Implementational)

And that’s a great process if the needs of your business are well defined. As of publication, following it can earn you a quite respectable salary.

But I know why you really became an AI engineer: you want to create AGI, an agent that transcends the narrow intelligence of regression. To do this, we must colocate multiple Computational functions within the same system. For them to coexist, they must use compatible data structures, which means sorting through multiple possibilities at the Algorithmic level. And to make the whole thing run in a semi-performant way, you must understand the Implementational details well enough to let those different algorithms share data in a workable way.

The next time you read a paper, think:

  • What are the inputs and outputs? What is the black box function that is being solved?
  • What algorithm are they proposing to solve it?
  • Do they discuss the implementation? Was it difficult to get the algorithm to run in a performant way?
