François Chollet: The ARC Prize & How We Get to AGI

Y Combinator Startup Podcast


Summary

This podcast episode features François Chollet discussing his evolving definition of AI intelligence, emphasizing the critical distinction between memorized skills and fluid intelligence. He introduces the Abstraction and Reasoning Corpus (ARC) benchmarks to test AI's adaptive capabilities and advocates for a new paradigm that combines deep learning with discrete program search to achieve AGI capable of genuine invention and accelerated scientific discovery.

Key Points

  • Early AI progress, driven by scaling up Large Language Models (LLMs) and their training data, fostered the misconception that general intelligence would emerge spontaneously from scale alone, a view that conflates memorized skills with true intelligence.
  • The Abstraction and Reasoning Corpus (ARC-1) benchmark, designed in 2019, showed that even vastly scaled LLMs reached only about 10% accuracy versus 95%+ for humans, demonstrating that data scaling alone does not produce fluid intelligence, the ability to make sense of novel problems.
  • A significant shift in AI research occurred in 2024 with the adoption of test-time adaptation (TTA), which lets models adapt dynamically during inference; this drove substantial progress on ARC-1 and showed the first genuine signs of fluid intelligence beyond preloaded knowledge.
  • Chollet defines intelligence as the efficiency with which an entity can operationalize past information to deal with novel, uncertain future situations, contrasting this with the Minsky-style view that defines AI by its ability to perform human tasks, arguing that the latter leads only to automation, not invention.
  • While test-time adaptation advanced AI, ARC-1 became saturated, necessitating ARC-2, a more sophisticated benchmark designed to probe compositional generalization; current AI models still score near 0% on ARC-2, indicating that further advancements beyond current TTA techniques are required for human-level fluid intelligence.
  • The path to Artificial General Intelligence (AGI) requires integrating two types of abstraction: value-centric (perception, intuition, pattern recognition) and program-centric (reasoning, planning, exact structural matching via discrete search), as human intelligence effectively combines both.
  • Chollet's new research lab, Ndea, aims to develop a "programmer-like meta-learner" that leverages deep learning-guided program search to synthesize new models and expand a library of reusable abstractions, accelerating scientific discovery through independent invention and self-improvement (a minimal sketch of guided program search follows this list).
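
To make the deep-learning-guided program search idea concrete, here is a minimal, hypothetical sketch in Python. The mini-DSL, the `neural_prior` stand-in, and `guided_search` are illustrative assumptions, not Ndea's actual system: a learned model ranks a handful of grid-transformation primitives, and a discrete search enumerates compositions of them until one reproduces every demonstration pair.

```python
from itertools import product

# Hypothetical mini-DSL of grid transformations (real ARC solvers use far richer ones).
def flip_h(g):    return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]
def transpose(g): return [list(row) for row in zip(*g)]
def identity(g):  return g

PRIMITIVES = [identity, flip_h, flip_v, transpose]

def neural_prior(task_pairs):
    """Stand-in for a learned model that scores how promising each primitive
    looks for this task. Here it returns uniform scores; a real system would
    use a deep net conditioned on the demonstration pairs."""
    return {p: 1.0 for p in PRIMITIVES}

def guided_search(task_pairs, max_depth=2):
    """Enumerate primitive compositions, most promising first, and return the
    first program consistent with every demonstration pair."""
    scores = neural_prior(task_pairs)
    ranked = sorted(PRIMITIVES, key=scores.get, reverse=True)
    for depth in range(1, max_depth + 1):
        for program in product(ranked, repeat=depth):
            def run(g, steps=program):
                for step in steps:
                    g = step(g)
                return g
            if all(run(inp) == out for inp, out in task_pairs):
                return program
    return None

# Toy "task": the hidden rule is a horizontal flip.
pairs = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
program = guided_search(pairs)
print([f.__name__ for f in program])  # ['flip_h']
```

The point of the neural component is tractability: brute-force enumeration blows up combinatorially with DSL size and program depth, so a learned prior that ranks candidates well is what keeps the discrete search feasible.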

Conclusion

AI development must move beyond benchmarks that measure static task-specific skills or memorization and instead focus on rigorous tests of fluid intelligence, such as ARC, which assess an AI's ability to adapt and invent in novel situations.

True Artificial General Intelligence (AGI) will emerge from systems that can efficiently acquire and recombine abstractions, integrating both intuitive deep learning and rigorous discrete program search for effective problem-solving and invention.

The ultimate goal of current research, particularly at Ndea, is to create AI that accelerates scientific progress by expanding the frontiers of knowledge through autonomous invention and discovery, empowering human researchers.

Discussion Topics

  • How might the pursuit of "fluid intelligence" in AI, as opposed to mere automation, impact different industries and daily life in the coming decade?
  • What are the ethical implications of developing AI systems capable of "autonomous invention," and how should society prepare for such a future?
  • Considering the challenges presented by ARC-2 and ARC-3, what specific types of human intelligence do you believe are still the most difficult for AI to replicate, and why?

Key Terms

AGI
Artificial General Intelligence: A hypothetical AI with human-like cognitive abilities across domains, able to understand, learn, and apply intelligence to any intellectual task a human can.
LLM
Large Language Model: A type of artificial intelligence program designed to understand, generate, and process human language based on vast amounts of text data.
GPU
Graphics Processing Unit: A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images, also highly effective for parallel processing tasks essential for training AI models.
Abstraction and Reasoning Corpus (ARC)
A benchmark developed by François Chollet to measure an AI's fluid intelligence—its ability to solve novel problems by applying general reasoning principles rather than memorized skills.
Test-Time Adaptation (TTA)
An AI paradigm where a pre-trained model modifies its internal state or behavior dynamically during inference to adapt to new data or tasks it encounters (a minimal sketch follows this glossary).
Deep Learning
A subset of machine learning that employs multi-layered artificial neural networks to learn and make decisions from data, excelling in pattern recognition and perception tasks.
Program Synthesis
The automatic generation of a computer program from a high-level specification or example, typically involving searching through a space of possible programs.
Compositional Generalization
The ability of an AI system to understand and generate novel combinations of familiar components, allowing it to solve problems that are structurally different from its training data.
Meta-learner
An AI system that is designed to learn how to learn, capable of adapting its learning process based on experience to become more efficient at acquiring new skills or knowledge.
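
As a concrete illustration of the Test-Time Adaptation entry above, here is a minimal, generic sketch in Python with PyTorch. The model, task, and hyperparameters are assumptions for illustration only; it shows the paradigm (briefly fine-tuning a throwaway copy of a model on a new task's demonstration pairs at inference time), not any specific ARC entry.

```python
import copy
import torch
import torch.nn as nn

def adapt_and_predict(model, demo_x, demo_y, test_x, steps=100, lr=0.05):
    """Fine-tune a throwaway copy of the model on the task's demonstration
    pairs at inference time, then predict on the held-out test input."""
    adapted = copy.deepcopy(model)              # never mutate the base model
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):                      # brief task-specific adaptation
        opt.zero_grad()
        loss = loss_fn(adapted(demo_x), demo_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return adapted(test_x)

# Toy usage: a linear model adapting to a task whose hidden rule is y = 2x.
base = nn.Linear(1, 1)
xs = torch.tensor([[1.0], [2.0], [3.0]])
ys = 2 * xs
print(adapt_and_predict(base, xs, ys, torch.tensor([[4.0]])))  # ~8.0
```

The base model's weights are untouched; all task-specific state lives in the discarded copy, which is what lets one pre-trained model adapt to an open-ended stream of novel tasks.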

Timeline

00:00:28

The dominant AI paradigm became scaling up LLM training, with many extrapolating that more scale was all that was needed to solve everything and achieve AGI.

00:01:02

François released ARC-1 in 2019 to highlight the difference between memorized skills and general intelligence, noting that even with a 50,000x scale-up, LLMs only reached about 10% accuracy on ARC-1, whereas humans score above 95%.

00:01:30

In 2024, the AI research community pivoted to test-time adaptation, enabling models to change their state at test time and adapt to new situations, leading to significant progress on ARC.

00:02:49

Chollet contrasts the Minsky-style view of AI (performing human tasks) with the McCarthy view (handling problems the machine was not prepared for), defining intelligence as a process for dealing with new situations, and cautions against the "shortcut rule" in AI benchmarking, where optimizing for a metric misses the true goal.

00:07:29

ARC-1 is now saturating even though AI remains below human-level fluid intelligence, prompting the development of ARC-2, which targets compositional generalization and on which current advanced LLMs still score near 0%, indicating a continued gap in genuine intelligence.

00:13:08

Chollet explains two types of abstraction: value-centric (continuous domain, perception) and program-centric (discrete domain, structural matching, reasoning), arguing that AGI requires combining both, leveraging deep learning for intuition and discrete program search for invention (a small illustration follows).
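
A small, hypothetical Python illustration of the contrast (not from the episode): value-centric abstraction compares things by distance in a continuous space, so matches are graded and approximate, while program-centric abstraction compares discrete structures, so a match is exact or it is nothing.

```python
import math

# Value-centric: compare by distance in a continuous embedding space
# (what deep learning does well -- perception, intuition, fuzzy matching).
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.9, 0.1], [1.0, 1.0, 0.0]))  # ~0.996: "close enough"

# Program-centric: compare by exact discrete structure
# (what program search does well -- reasoning, planning, exact matching).
expr_a = ("add", ("mul", "x", 2), 1)   # x * 2 + 1
expr_b = ("add", ("mul", "x", 2), 1)
expr_c = ("add", ("mul", "x", 3), 1)   # x * 3 + 1

print(expr_a == expr_b)  # True: identical structure, an exact match
print(expr_a == expr_c)  # False: one symbol differs, so no match at all
```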

00:17:13

Ndea, Chollet's new research lab, focuses on deep learning-guided program search to build programmer-like meta-learners capable of independent invention and discovery, with ARC-3 as the next milestone, testing agency and interactive learning efficiency.

Episode Details

Podcast
Y Combinator Startup Podcast
Episode
François Chollet: The ARC Prize & How We Get to AGI
Published
July 9, 2025