François Chollet: The ARC Prize & How We Get to AGI

Y Combinator Startup Podcast


Summary

This podcast episode features François Chollet discussing his evolving definition of AI intelligence, emphasizing the critical distinction between memorized skills and fluid intelligence. He introduces the Abstraction and Reasoning Corpus (ARC) benchmarks to test AI's adaptive capabilities and advocates for a new paradigm that combines deep learning with discrete program search to achieve AGI capable of genuine invention and accelerated scientific discovery.

Key Points

  • Early AI progress, driven by scaling up Large Language Models (LLMs) and their training data, fostered the misconception that general intelligence would emerge spontaneously from scale alone, a view that conflates memorized skills with true intelligence.
  • The Abstraction and Reasoning Corpus (ARC-1) benchmark, designed in 2019, showed that even vastly scaled LLMs reached only about 10% accuracy versus 95%+ for humans, demonstrating that data scaling alone does not produce fluid intelligence, the ability to make sense of novel problems.
  • A significant shift in AI research occurred in 2024 with the adoption of test-time adaptation (TTA), which lets models adapt dynamically during inference; this drove substantial progress on ARC-1 and showed the first genuine signs of fluid intelligence beyond preloaded knowledge.
  • Chollet defines intelligence as the efficiency with which an entity can operationalize past information to deal with novel, uncertain future situations, contrasting this with the Minsky-style view that defines AI by its ability to perform human tasks, arguing that the latter leads only to automation, not invention.
  • While test-time adaptation advanced AI, ARC-1 became saturated, necessitating ARC-2, a more sophisticated benchmark designed to probe compositional generalization; current AI models still score near 0% on ARC-2, indicating that further advancements beyond current TTA techniques are required for human-level fluid intelligence.
  • The path to Artificial General Intelligence (AGI) requires integrating two types of abstraction: value-centric (perception, intuition, pattern recognition) and program-centric (reasoning, planning, exact structural matching via discrete search), as human intelligence effectively combines both.
  • Chollet's new research lab, Ndea, aims to develop a "programmer-like meta-learner" that leverages deep learning-guided program search to synthesize new models and expand a library of reusable abstractions, accelerating scientific discovery through independent invention and self-improvement (a minimal sketch of guided program search follows this list).
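
To make the deep-learning-guided program search idea concrete, here is a minimal, hypothetical sketch in Python. The mini-DSL, the `neural_prior` stand-in, and `guided_search` are illustrative assumptions, not Ndea's actual system: a learned model ranks a handful of grid-transformation primitives, and a discrete search enumerates compositions of them until one reproduces every demonstration pair.

```python
from itertools import product

# Hypothetical mini-DSL of grid transformations (real ARC solvers use far richer ones).
def flip_h(g):    return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]
def transpose(g): return [list(row) for row in zip(*g)]
def identity(g):  return g

PRIMITIVES = [identity, flip_h, flip_v, transpose]

def neural_prior(task_pairs):
    """Stand-in for a learned model that scores how promising each primitive
    looks for this task. Here it returns uniform scores; a real system would
    use a deep net conditioned on the demonstration pairs."""
    return {p: 1.0 for p in PRIMITIVES}

def guided_search(task_pairs, max_depth=2):
    """Enumerate primitive compositions, most promising first, and return the
    first program consistent with every demonstration pair."""
    scores = neural_prior(task_pairs)
    ranked = sorted(PRIMITIVES, key=scores.get, reverse=True)
    for depth in range(1, max_depth + 1):
        for program in product(ranked, repeat=depth):
            def run(g, steps=program):
                for step in steps:
                    g = step(g)
                return g
            if all(run(inp) == out for inp, out in task_pairs):
                return program
    return None

# Toy "task": the hidden rule is a horizontal flip.
pairs = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
program = guided_search(pairs)
print([f.__name__ for f in program])  # ['flip_h']
```

The point of the neural component is tractability: brute-force enumeration blows up combinatorially with DSL size and program depth, so a learned prior that ranks candidates well is what keeps the discrete search feasible.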

Conclusion

AI development must move beyond benchmarks that measure static task-specific skills or memorization and instead focus on rigorous tests of fluid intelligence, such as ARC, which assess an AI's ability to adapt and invent in novel situations.

True Artificial General Intelligence (AGI) will emerge from systems that can efficiently acquire and recombine abstractions, integrating both intuitive deep learning and rigorous discrete program search for effective problem-solving and invention.

The ultimate goal of current research, particularly at Ndea, is to create AI that accelerates scientific progress by expanding the frontiers of knowledge through autonomous invention and discovery, empowering human researchers.

Discussion Topics

  • How might the pursuit of "fluid intelligence" in AI, as opposed to mere automation, impact different industries and daily life in the coming decade?
  • What are the ethical implications of developing AI systems capable of "autonomous invention," and how should society prepare for such a future?
  • Considering the challenges presented by ARC-2 and ARC-3, what specific types of human intelligence do you believe are still the most difficult for AI to replicate, and why?

Key Terms

AGI
Artificial General Intelligence: A hypothetical AI with human-like cognitive abilities across domains, able to understand, learn, and apply intelligence to any intellectual task a human can.
LLM
Large Language Model: A type of artificial intelligence program designed to understand, generate, and process human language based on vast amounts of text data.
GPU
Graphics Processing Unit: A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images, also highly effective for parallel processing tasks essential for training AI models.
Abstraction and Reasoning Corpus (ARC)
A benchmark developed by François Chollet to measure an AI's fluid intelligence—its ability to solve novel problems by applying general reasoning principles rather than memorized skills.
Test-Time Adaptation (TTA)
An AI paradigm where a pre-trained model modifies its internal state or behavior dynamically during inference to adapt to new data or tasks it encounters (a minimal sketch follows this glossary).
Deep Learning
A subset of machine learning that employs multi-layered artificial neural networks to learn and make decisions from data, excelling in pattern recognition and perception tasks.
Program Synthesis
The automatic generation of a computer program from a high-level specification or example, typically involving searching through a space of possible programs.
Compositional Generalization
The ability of an AI system to understand and generate novel combinations of familiar components, allowing it to solve problems that are structurally different from its training data.
Meta-learner
An AI system that is designed to learn how to learn, capable of adapting its learning process based on experience to become more efficient at acquiring new skills or knowledge.
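
As a concrete illustration of the Test-Time Adaptation entry above, here is a minimal, generic sketch in Python with PyTorch. The model, task, and hyperparameters are assumptions for illustration only; it shows the paradigm (briefly fine-tuning a throwaway copy of a model on a new task's demonstration pairs at inference time), not any specific ARC entry.

```python
import copy
import torch
import torch.nn as nn

def adapt_and_predict(model, demo_x, demo_y, test_x, steps=100, lr=0.05):
    """Fine-tune a throwaway copy of the model on the task's demonstration
    pairs at inference time, then predict on the held-out test input."""
    adapted = copy.deepcopy(model)              # never mutate the base model
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):                      # brief task-specific adaptation
        opt.zero_grad()
        loss = loss_fn(adapted(demo_x), demo_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return adapted(test_x)

# Toy usage: a linear model adapting to a task whose hidden rule is y = 2x.
base = nn.Linear(1, 1)
xs = torch.tensor([[1.0], [2.0], [3.0]])
ys = 2 * xs
print(adapt_and_predict(base, xs, ys, torch.tensor([[4.0]])))  # ~8.0
```

The base model's weights are untouched; all task-specific state lives in the discarded copy, which is what lets one pre-trained model adapt to an open-ended stream of novel tasks.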

Timeline

00:00:28

The dominant AI paradigm became scaling up LLM training, with many extrapolating that more scale was all that was needed to solve everything and achieve AGI.

00:01:02

François released ARC-1 in 2019 to highlight the difference between memorized skills and general intelligence, noting that even with a 50,000x scale-up, LLMs only reached about 10% accuracy on ARC-1, whereas humans score above 95%.

00:01:30

In 2024, the AI research community pivoted to test-time adaptation, enabling models to change their state at test time and adapt to new situations, leading to significant progress on ARC.

00:02:49

Chollet contrasts the Minsky-style view of AI (performing human tasks) with the McCarthy view (handling problems the machine was not prepared for), defining intelligence as a process for dealing with new situations, and cautions against the "shortcut rule" in AI benchmarking, where optimizing for a metric misses the true goal.

00:07:29

ARC-1 is now saturating even though AI remains below human-level fluid intelligence, prompting the development of ARC-2, which targets compositional generalization and on which current advanced LLMs still score near 0%, indicating a continued gap in genuine intelligence.

00:13:08

Chollet explains two types of abstraction: value-centric (continuous domain, perception) and program-centric (discrete domain, structural matching, reasoning), arguing that AGI requires combining both, leveraging deep learning for intuition and discrete program search for invention (a small illustration follows).
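
A small, hypothetical Python illustration of the contrast (not from the episode): value-centric abstraction compares things by distance in a continuous space, so matches are graded and approximate, while program-centric abstraction compares discrete structures, so a match is exact or it is nothing.

```python
import math

# Value-centric: compare by distance in a continuous embedding space
# (what deep learning does well -- perception, intuition, fuzzy matching).
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.9, 0.1], [1.0, 1.0, 0.0]))  # ~0.996: "close enough"

# Program-centric: compare by exact discrete structure
# (what program search does well -- reasoning, planning, exact matching).
expr_a = ("add", ("mul", "x", 2), 1)   # x * 2 + 1
expr_b = ("add", ("mul", "x", 2), 1)
expr_c = ("add", ("mul", "x", 3), 1)   # x * 3 + 1

print(expr_a == expr_b)  # True: identical structure, an exact match
print(expr_a == expr_c)  # False: one symbol differs, so no match at all
```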

00:17:13

Ndea, Chollet's new research lab, focuses on deep learning-guided program search to build programmer-like meta-learners capable of independent invention and discovery, with ARC-3 as the next milestone, testing agency and interactive learning efficiency.

Episode Details

Podcast
Y Combinator Startup Podcast
Episode
François Chollet: The ARC Prize & How We Get to AGI
Published
July 9, 2025