Columbia CS Professor: Why LLMs Can’t Discover New Science

a16z Podcast

Summary

The episode features Columbia professor Vishal Misra discussing his formal models of Large Language Models (LLMs), explaining their capabilities and limitations, particularly their inability to create new science.

Misra argues that current LLMs are adept at navigating existing knowledge, which he models as Bayesian manifolds, but cannot create entirely new scientific paradigms, a capability he regards as the hallmark of Artificial General Intelligence (AGI).

Key Points

  • LLMs, through their training and architecture, reduce complex information into geometric manifolds, allowing them to navigate and predict within these learned spaces.
  • The "entropy" of an LLM's next-token distribution indicates its confidence; low entropy means a narrow set of predictable outcomes, while high entropy suggests more varied possibilities.
  • Prompts with higher "information entropy" (more specific and rare) lead to lower "prediction entropy" (more focused outcomes) in LLMs, guiding their responses; a minimal sketch of this entropy calculation follows this list.
  • LLMs can exhibit "few-shot" or "in-context learning" by applying their underlying Bayesian reasoning to new examples provided in a prompt, rather than requiring retraining; a toy Bayesian sketch also appears after this list.
  • Retrieval Augmented Generation (RAG) emerged as an accidental solution for querying complex databases in natural language, bridging the gap between user queries and structured data.
  • Current LLMs are sophisticated Bayesian reasoners over their training data but cannot generate fundamentally new knowledge or science, which requires going beyond existing paradigms.
  • True AGI, as Misra defines it, involves creating new manifolds, new science, and new paradigms, exemplified by discoveries like the theory of relativity or quantum mechanics.
  • Recursive self-improvement for LLMs is limited because they primarily operate within the boundaries of their training data; generating truly novel information requires architectural advancements beyond simply processing more data.
  • The current AI research landscape is often too empirical, focusing on tweaking prompts and observing results rather than developing formal models to understand the underlying mechanisms.
  • While LLMs excel at connecting known concepts, they do not invent new mathematics or axioms; creating new scientific understanding requires an architectural leap.
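To make the entropy points above concrete, here is a minimal Python sketch. The two example distributions are invented for illustration and are not from the episode; in a real system the probabilities would come from the softmax over a model's next-token logits.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
# A vague prompt spreads probability mass widely (high prediction entropy);
# a specific, information-rich prompt concentrates it (low prediction entropy).
vague_prompt    = [0.25, 0.25, 0.25, 0.25]
specific_prompt = [0.97, 0.01, 0.01, 0.01]

print(shannon_entropy(vague_prompt))     # 2.00 bits -> model is uncertain
print(shannon_entropy(specific_prompt))  # ~0.24 bits -> model is confident
```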
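Misra's Bayesian framing of in-context learning can likewise be sketched with a toy model. The candidate tasks, uniform prior, and 0/1 likelihood below are illustrative assumptions rather than his actual formalism; the point is that each in-context example acts as evidence that sharpens a posterior over tasks, with no retraining involved.

```python
# Toy Bayesian view of in-context learning: the model holds a prior over
# candidate tasks, and each demonstrated (input, output) pair updates it.
tasks = {
    "uppercase": lambda s: s.upper(),
    "reverse":   lambda s: s[::-1],
    "identity":  lambda s: s,
}
prior = {name: 1 / len(tasks) for name in tasks}

def update(posterior, x, y):
    """Bayes' rule with a 0/1 likelihood: a candidate task keeps its
    probability mass only if it reproduces the demonstrated pair."""
    post = {t: p * (1.0 if tasks[t](x) == y else 0.0)
            for t, p in posterior.items()}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

# A single in-context demonstration collapses the posterior onto "reverse":
posterior = update(prior, "abc", "cba")
print(posterior)  # {'uppercase': 0.0, 'reverse': 1.0, 'identity': 0.0}
```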

Conclusion

Current LLMs are powerful tools for navigating and interpolating within existing knowledge, but they are not capable of true scientific discovery or AGI without significant architectural advancements.

The focus on empirical observation and prompt engineering in AI research often overshadows the need for formal models to understand the fundamental limitations and capabilities of LLMs.

The path to AGI likely requires new architectures that can generate novel paradigms and scientific understanding, rather than simply scaling up existing LLMs with more data and compute.

Discussion Topics

  • How can we move beyond the current empirical approach in AI research to develop more formal models that understand LLM capabilities and limitations?
  • What are the key architectural advancements needed for LLMs to transition from navigating existing knowledge to creating genuinely new scientific paradigms?
  • Given the limitations of current LLMs in true scientific discovery, what are the most promising research directions for achieving AGI?

Key Terms

Large Language Models (LLMs)
AI models trained on vast amounts of text data, capable of understanding and generating human-like text.
Artificial General Intelligence (AGI)
A hypothetical type of AI that possesses human-level cognitive abilities across a wide range of tasks.
Bayesian manifold
In Misra's framing, the learned geometric structure over which an LLM performs Bayesian reasoning; navigating this manifold corresponds to predicting within existing knowledge rather than creating new knowledge.
Entropy
In this context, it refers to Shannon entropy, a measure of uncertainty or randomness in a probability distribution, H(p) = −Σᵢ pᵢ log₂ pᵢ. High entropy means more uncertainty; low entropy means more predictability.
Retrieval Augmented Generation (RAG)
A technique that enhances LLM responses by retrieving relevant information from external knowledge bases before generating an answer (a minimal pipeline sketch follows this list).
DSL (Domain-Specific Language)
A computer language specialized for a particular application domain.
In-context learning
The ability of LLMs to learn new tasks or adapt their behavior based on examples provided within the prompt itself, without explicit retraining.
Recursive self-improvement
The theoretical concept of an AI system improving its own intelligence or capabilities iteratively.
Multimodal data
Data that combines different types of information, such as text, images, audio, and video.
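As a companion to the RAG definition above, here is a minimal retrieval pipeline sketch. The bag-of-letters embed() is a deliberately crude stand-in for a real embedding model, and the corpus is invented; only the shape of the pipeline (embed, rank by similarity, prepend retrieved context to the prompt) reflects how RAG works.

```python
import math

def embed(text):
    """Toy embedding: normalized letter-frequency vector. A real RAG
    system would call an embedding model here instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))

corpus = [
    "Shannon entropy measures uncertainty in a distribution.",
    "Bayes' rule updates beliefs given new evidence.",
    "Manifolds are low-dimensional structures in high-dimensional space.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -cosine(q, pair[1]))
    return [doc for doc, _ in ranked[:k]]

# Retrieved passages are prepended to the prompt before generation:
question = "How does entropy relate to uncertainty?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```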

Timeline

00:00:00

LLMs reduce complex information into geometric manifolds for navigation and prediction.

00:04:50

Entropy of an LLM's next-token distribution indicates confidence and predictability.

00:06:08

Prompts with high information entropy lead to low prediction entropy in LLMs.

00:13:37

RAG was an accidental solution to query complex databases using natural language.

00:19:44

LLMs are sophisticated Bayesian reasoners but cannot create new scientific paradigms.

00:28:21

Recursive self-improvement in LLMs is limited by their training data; true novelty requires architectural advances.

00:33:45

AGI is defined by creating new manifolds and scientific discoveries, not just navigating existing ones.

00:44:55

Coding gives LLMs their largest and most structured training data, yet they still hallucinate.

00:47:13

LLMs are becoming better at navigating existing manifolds but not creating new ones.

00:48:10

Future research focuses on the architectural leaps needed to create new manifolds and utilize multimodal data.

Episode Details

Podcast
a16z Podcast
Episode
Columbia CS Professor: Why LLMs Can’t Discover New Science
Published
October 13, 2025