Columbia CS Professor: Why LLMs Can’t Discover New Science

a16z Podcast

Summary

The episode features Columbia professor Vishal Misra discussing his formal models of Large Language Models (LLMs), explaining their capabilities and limitations, particularly their inability to create new science.

Misra argues that current LLMs are adept at navigating existing knowledge, which he models as Bayesian manifolds, but cannot create entirely new scientific paradigms, a capability he regards as the hallmark of Artificial General Intelligence (AGI).

Key Points

  • LLMs, through their training and architecture, reduce complex information into geometric manifolds, allowing them to navigate and predict within these learned spaces.
  • The "entropy" of an LLM's next-token distribution indicates its confidence; low entropy means a narrow set of predictable outcomes, while high entropy suggests more varied possibilities.
  • Prompts with higher "information entropy" (more specific and rare) lead to lower "prediction entropy" (more focused outcomes) in LLMs, guiding their responses; a minimal sketch of this entropy calculation follows this list.
  • LLMs can exhibit "few-shot" or "in-context learning" by applying their underlying Bayesian reasoning to new examples provided in a prompt, rather than requiring retraining; a toy Bayesian sketch also appears after this list.
  • Retrieval Augmented Generation (RAG) emerged as an accidental solution for querying complex databases in natural language, bridging the gap between user queries and structured data.
  • Current LLMs are sophisticated Bayesian reasoners over their training data but cannot generate fundamentally new knowledge or science, which requires going beyond existing paradigms.
  • True AGI, as Misra defines it, involves creating new manifolds, new science, and new paradigms, exemplified by discoveries like the theory of relativity or quantum mechanics.
  • Recursive self-improvement for LLMs is limited because they primarily operate within the boundaries of their training data; generating truly novel information requires architectural advancements beyond simply processing more data.
  • The current AI research landscape is often too empirical, focusing on tweaking prompts and observing results rather than developing formal models to understand the underlying mechanisms.
  • While LLMs excel at connecting known concepts, they do not invent new mathematics or axioms; creating new scientific understanding requires an architectural leap.
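To make the entropy points above concrete, here is a minimal Python sketch. The two example distributions are invented for illustration and are not from the episode; in a real system the probabilities would come from the softmax over a model's next-token logits.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
# A vague prompt spreads probability mass widely (high prediction entropy);
# a specific, information-rich prompt concentrates it (low prediction entropy).
vague_prompt    = [0.25, 0.25, 0.25, 0.25]
specific_prompt = [0.97, 0.01, 0.01, 0.01]

print(shannon_entropy(vague_prompt))     # 2.00 bits -> model is uncertain
print(shannon_entropy(specific_prompt))  # ~0.24 bits -> model is confident
```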
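Misra's Bayesian framing of in-context learning can likewise be sketched with a toy model. The candidate tasks, uniform prior, and 0/1 likelihood below are illustrative assumptions rather than his actual formalism; the point is that each in-context example acts as evidence that sharpens a posterior over tasks, with no retraining involved.

```python
# Toy Bayesian view of in-context learning: the model holds a prior over
# candidate tasks, and each demonstrated (input, output) pair updates it.
tasks = {
    "uppercase": lambda s: s.upper(),
    "reverse":   lambda s: s[::-1],
    "identity":  lambda s: s,
}
prior = {name: 1 / len(tasks) for name in tasks}

def update(posterior, x, y):
    """Bayes' rule with a 0/1 likelihood: a candidate task keeps its
    probability mass only if it reproduces the demonstrated pair."""
    post = {t: p * (1.0 if tasks[t](x) == y else 0.0)
            for t, p in posterior.items()}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

# A single in-context demonstration collapses the posterior onto "reverse":
posterior = update(prior, "abc", "cba")
print(posterior)  # {'uppercase': 0.0, 'reverse': 1.0, 'identity': 0.0}
```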

Conclusion

Current LLMs are powerful tools for navigating and interpolating within existing knowledge, but they are not capable of true scientific discovery or AGI without significant architectural advancements.

The focus on empirical observation and prompt engineering in AI research often overshadows the need for formal models to understand the fundamental limitations and capabilities of LLMs.

The path to AGI likely requires new architectures that can generate novel paradigms and scientific understanding, rather than simply scaling up existing LLMs with more data and compute.

Discussion Topics

  • How can we move beyond the current empirical approach in AI research to develop more formal models that understand LLM capabilities and limitations?
  • What are the key architectural advancements needed for LLMs to transition from navigating existing knowledge to creating genuinely new scientific paradigms?
  • Given the limitations of current LLMs in true scientific discovery, what are the most promising research directions for achieving AGI?

Key Terms

Large Language Models (LLMs)
AI models trained on vast amounts of text data, capable of understanding and generating human-like text.
Artificial General Intelligence (AGI)
A hypothetical type of AI that possesses human-level cognitive abilities across a wide range of tasks.
Bayesian manifold
In Misra's framing, the learned geometric structure over which an LLM performs Bayesian reasoning; navigating this manifold corresponds to predicting within existing knowledge rather than creating new knowledge.
Entropy
In this context, it refers to Shannon entropy, a measure of uncertainty or randomness in a probability distribution, H(p) = −Σᵢ pᵢ log₂ pᵢ. High entropy means more uncertainty; low entropy means more predictability.
Retrieval Augmented Generation (RAG)
A technique that enhances LLM responses by retrieving relevant information from external knowledge bases before generating an answer (a minimal pipeline sketch follows this list).
DSL (Domain-Specific Language)
A computer language specialized for a particular application domain.
In-context learning
The ability of LLMs to learn new tasks or adapt their behavior based on examples provided within the prompt itself, without explicit retraining.
Recursive self-improvement
The theoretical concept of an AI system improving its own intelligence or capabilities iteratively.
Multimodal data
Data that combines different types of information, such as text, images, audio, and video.
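As a companion to the RAG definition above, here is a minimal retrieval pipeline sketch. The bag-of-letters embed() is a deliberately crude stand-in for a real embedding model, and the corpus is invented; only the shape of the pipeline (embed, rank by similarity, prepend retrieved context to the prompt) reflects how RAG works.

```python
import math

def embed(text):
    """Toy embedding: normalized letter-frequency vector. A real RAG
    system would call an embedding model here instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))

corpus = [
    "Shannon entropy measures uncertainty in a distribution.",
    "Bayes' rule updates beliefs given new evidence.",
    "Manifolds are low-dimensional structures in high-dimensional space.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -cosine(q, pair[1]))
    return [doc for doc, _ in ranked[:k]]

# Retrieved passages are prepended to the prompt before generation:
question = "How does entropy relate to uncertainty?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```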

Timeline

00:00:00

LLMs reduce complex information into geometric manifolds for navigation and prediction.

00:04:50

Entropy of an LLM's next-token distribution indicates confidence and predictability.

00:06:08

Prompts with high information entropy lead to low prediction entropy in LLMs.

00:13:37

RAG was an accidental solution to query complex databases using natural language.

00:19:44

LLMs are sophisticated Bayesian reasoners but cannot create new scientific paradigms.

00:28:21

Recursive self-improvement in LLMs is limited by their training data; true novelty requires architectural advances.

00:33:45

AGI is defined by creating new manifolds and scientific discoveries, not just navigating existing ones.

00:44:55

Coding gives LLMs their largest and most structured training data, yet they still hallucinate.

00:47:13

LLMs are becoming better at navigating existing manifolds but not creating new ones.

00:48:10

Future research focuses on the architectural leaps needed to create new manifolds and utilize multimodal data.

Episode Details

Podcast
a16z Podcast
Episode
Columbia CS Professor: Why LLMs Can’t Discover New Science
Published
October 13, 2025