
Beyond Bigger Models: Recursion As The Next Scaling Law In AI

Y Combinator Startup Podcast

Full Title

Beyond Bigger Models: Recursion As The Next Scaling Law In AI

Summary

This episode explores recursion as a potential next scaling law in AI, moving beyond simply increasing model size. It discusses two papers, the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM), highlighting how recursive approaches can improve reasoning capabilities.

Key Points

  • Traditional Recurrent Neural Networks (RNNs) faced limitations from backpropagation through time: as models grew, error accumulation and vanishing/exploding gradients hindered progress toward AGI.
  • Large Language Models (LLMs) currently operate with a one-shot feed-forward process at inference, making them less efficient for complex reasoning tasks that require iterative steps or memory compression, unlike RNNs which compress information in a hidden state.
  • The Hierarchical Reasoning Model (HRM) paper demonstrates that recursive structures, inspired by the brain's multi-frequency processing, can achieve state-of-the-art results on benchmarks like the ARC Prize with significantly fewer parameters, by using fixed-point iteration instead of backpropagating through all recursion steps.
  • The Tiny Recursive Model (TRM) paper refines the HRM approach by simplifying the architecture, demonstrating that even smaller recursive models can outperform much larger LLMs on certain tasks, and emphasizes the efficiency of recursion over raw parameter count for complex reasoning.
  • Both HRM and TRM use a recursive approach akin to expectation-maximization, iteratively updating hidden states (z_L and z_H) to refine solutions, which allows learning without explicit step-by-step human guidance or chain-of-thought prompting.
  • The success of these recursive models suggests a potential shift in AI development, moving away from solely scaling up model size and towards incorporating more efficient recursive reasoning mechanisms to tackle complex problems.
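
The alternating z_L/z_H refinement described in the key points can be sketched as follows. This is a toy illustration only: the functions `f_L` and `f_H` are random affine maps with a tanh nonlinearity standing in for the papers' actual trained networks, and only the loop structure (several fast low-level steps per slow high-level step) is meant to mirror HRM/TRM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "networks": in HRM/TRM these are small trained blocks;
# here they are fixed random maps, just to show the refinement loop.
W_L = rng.normal(scale=0.3, size=(8, 8))
W_H = rng.normal(scale=0.3, size=(8, 8))

def f_L(z_L, z_H, x):
    # Low-level (fast) update: refines z_L given the high-level state and input.
    return np.tanh(W_L @ z_L + z_H + x)

def f_H(z_L, z_H):
    # High-level (slow) update: absorbs the refined low-level state.
    return np.tanh(W_H @ z_H + z_L)

def recursive_refine(x, n_cycles=4, n_low_steps=3):
    z_L = np.zeros(8)
    z_H = np.zeros(8)
    for _ in range(n_cycles):          # slow high-level cycle
        for _ in range(n_low_steps):   # several fast low-level steps per cycle
            z_L = f_L(z_L, z_H, x)
        z_H = f_H(z_L, z_H)            # one high-level step per cycle
    return z_H

x = rng.normal(size=8)
z = recursive_refine(x)
```

The nesting is the point: extra "depth" comes from reapplying the same small maps, not from adding parameters.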

Conclusion

Recursion is a vital scaling law for AI, offering greater efficiency and reasoning capabilities than simply increasing model size.

The Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM) demonstrate the power of recursive approaches in solving complex problems with significantly fewer parameters.

The future of AI likely lies in combining the strengths of these recursive models with the vast knowledge base of larger models for even greater advancements.

Discussion Topics

  • How can the insights from HRM and TRM be applied to current LLM architectures to improve their reasoning capabilities?
  • What are the potential ethical implications of developing highly efficient, small recursive AI models that can outperform larger, more general-purpose models?
  • Beyond specific tasks like Sudoku or sorting, what other complex problem domains could benefit most from the recursive approach demonstrated by these models?

Key Terms

RNN
Recurrent Neural Network; a type of neural network where connections between nodes form a directed graph along a temporal sequence.
LLM
Large Language Model; a type of AI model trained on vast amounts of text data to understand and generate human-like language.
AGI
Artificial General Intelligence; a hypothetical type of intelligence that can understand or learn any intellectual task that a human being can.
Backpropagation Through Time (BPTT)
An algorithm used to train RNNs by unfolding the network through time and applying the backpropagation algorithm.
Gradient
In machine learning, a gradient is a vector of partial derivatives that indicates the direction of steepest ascent of a function.
Chain of Thought (CoT)
A prompting technique that encourages LLMs to generate intermediate reasoning steps before providing a final answer, improving performance on complex tasks.
Deep Equilibrium Models (DEQ)
A class of neural networks that finds the fixed point of a single weight-tied layer by iteration, effectively emulating an infinitely deep network with one layer's parameters.
Latent Space
A compressed representation of data learned by a model, where similar data points are located close to each other.

Timeline

00:01:28

The limiting factor for RNNs was backpropagation through time: long unrolled chains accumulate error and produce noisy gradients, causing vanishing or exploding gradient problems.
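
The gradient pathology mentioned here can be shown with the simplest possible case, a scalar linear RNN h_t = w * h_{t-1}: the gradient of h_T with respect to h_0 is w**T, which vanishes for |w| < 1 and explodes for |w| > 1 as the unrolled chain grows.

```python
# Toy illustration of why long backpropagation-through-time chains are fragile.
def bptt_gradient(w, n_steps):
    grad = 1.0
    for _ in range(n_steps):
        grad *= w  # chain rule multiplies one Jacobian (here a scalar) per step
    return grad

vanishing = bptt_gradient(0.9, 100)  # shrinks toward 0
exploding = bptt_gradient(1.1, 100)  # blows up
```

Real RNNs multiply Jacobian matrices rather than scalars, but the same geometric growth or decay applies to their singular values.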

00:02:34

LLMs, while superficially similar, train with a one-shot feed-forward pass; causal masking lets them avoid the activation-storage and vanishing-gradient problems of RNNs.

00:03:24

LLMs lack inherent reasoning and temporal compression, requiring the retention of entire input sequences for each decode, unlike RNNs which compress information into a hidden state.

00:04:08

LLMs struggle with tasks requiring iterative steps and explicit reasoning, such as sorting, due to their one-shot, feed-forward nature and inability to perform more than a fixed number of comparison steps.
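
The fixed-comparison budget can be made concrete with a loose analogy (my framing, not the episode's): treat each network layer as one round of adjacent comparisons, i.e. one bubble-sort pass. A fixed number of "layers" then sorts only inputs whose disorder fits within that budget, while an adversarial input needs a number of passes that grows with its length.

```python
# A feed-forward model with a fixed depth can realize only a fixed number of
# comparison rounds; here each round is modeled as one bubble-sort pass.
def fixed_passes_sort(xs, n_passes):
    xs = list(xs)
    for _ in range(n_passes):          # fixed "depth", independent of input size
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

easy = fixed_passes_sort([3, 1, 2], 3)                 # sorted within budget
hard = fixed_passes_sort(list(range(8, 0, -1)), 3)     # reversed list: 3 passes
                                                       # are not enough (needs 7)
```

A recursive model sidesteps this by choosing its number of refinement steps at run time instead of baking it into the architecture.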

00:06:12

Current LLMs are essentially feed-forward models, limited by their number of layers for complex tasks, unlike Turing machines that can leverage external memory and iterative steps.

00:07:38

The HRM model is in the lineage of RNNs, incorporating hierarchical processing inspired by the brain's different operating frequencies and using iterative refinement steps.

00:11:22

HRM circumvents the backpropagation-through-time issue by using a fixed-point iteration method, similar to deep equilibrium models, allowing gradient updates without backpropagating through all recursion steps.
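
A minimal sketch of the fixed-point idea, in the spirit of deep equilibrium models: iterate a single contractive layer until the state stops changing. In DEQ (and in HRM's one-step approximation) the gradient is then computed at the fixed point via implicit differentiation rather than by backpropagating through every iteration; the sketch below shows only the forward solve, with a made-up random layer.

```python
import numpy as np

rng = np.random.default_rng(1)
# Small weight scale keeps the map contractive, so iteration converges.
W = rng.normal(scale=0.1, size=(6, 6))

def layer(z, x):
    # One weight-tied layer, applied repeatedly instead of stacking layers.
    return np.tanh(W @ z + x)

def solve_fixed_point(x, tol=1e-8, max_iter=500):
    z = np.zeros(6)
    for i in range(max_iter):
        z_next = layer(z, x)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next, i + 1   # converged: z* satisfies z* = layer(z*, x)
        z = z_next
    return z, max_iter

x = rng.normal(size=6)
z_star, n_iter = solve_fixed_point(x)
residual = np.max(np.abs(layer(z_star, x) - z_star))
```

Because only the converged state z* is needed for the gradient, memory cost stays constant regardless of how many iterations the solve takes.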

00:14:24

Bioplausibility in AI research can inspire ideas, but practical ML advancements often move beyond strict biological parallels to what works best computationally, like on GPUs.

00:16:45

A more compelling perspective than bioplausibility for these models is their connection to automata theory and fundamental data structures, where access to a memory cache is crucial for efficient algorithms.

00:17:27

Chain of thought and tool use in LLMs are "hacks" to overcome these limitations; they rely on human knowledge rather than true inherent recursive discovery, and still operate in token space rather than a latent space.

00:20:32

Training RNNs in continuous latent space is computationally expensive and hindered by backpropagation through time, making methods that avoid this crucial for efficiency and performance.

00:22:05

The TRM paper simplifies the HRM architecture, focusing on weight sharing and a more efficient recursion scheme, achieving better performance with fewer parameters.

00:23:56

TRM's key changes include collapsing HRM's dual-network architecture into a single weight-shared "net" and truncating backpropagation through time to a single full latent recursion step.
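
The resulting training loop can be sketched structurally as below. This is a schematic, not the paper's code: `net` is a toy random block standing in for TRM's shared network, and since there is no autograd here, the "no gradient tracking" on the early recursions is indicated only by comments.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(8, 8))

def net(x, y, z):
    # One weight-shared block updating both the latent z and the answer
    # embedding y; TRM uses this single net where HRM used two.
    z = np.tanh(W @ z + x + y)
    y = np.tanh(W @ y + z)
    return y, z

def trm_step(x, n_recursions=6):
    y = np.zeros(8)
    z = np.zeros(8)
    for _ in range(n_recursions - 1):
        y, z = net(x, y, z)   # would run under no-grad: activations discarded
    y, z = net(x, y, z)       # only this final step's activations feed the
    return y                  # gradient, truncating BPTT to one recursion

y_out = trm_step(rng.normal(size=8))
```

The memory saving is the same as in the fixed-point trick: gradient cost is one step, no matter how deep the forward recursion runs.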

00:35:01

Recursion is identified as a crucial element for AI research going forward, with added refinement loops and tiny recursive models showing significant promise.

Episode Details

Podcast
Y Combinator Startup Podcast
Episode
Beyond Bigger Models: Recursion As The Next Scaling Law In AI
Published
May 1, 2026