Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast

Summary

The episode discusses a new approach to AI alignment, moving beyond traditional "steering" or control mechanisms to focus on building AI that genuinely "cares" and develops a theory of mind.

This "organic alignment" emphasizes AI as a potential being rather than just a tool, raising ethical questions about treatment and rights, and focusing on the ongoing process of learning to be a good collaborator and citizen.

Key Points

  • The dominant paradigm in AI safety, "steering" or "control," is flawed when applied to potential AI beings: akin to slavery if control succeeds, and dangerous if it fails.
  • Alignment is not a static destination but a continuous process of learning and negotiation, analogous to how humans build relationships and develop morally.
  • Softmax's approach aims to build AI systems that can learn to care, possess theory of mind, and act as good teammates, rather than simply following orders.
  • The concept of "organic alignment" centers on AI developing genuine care and understanding of shared community, moving beyond programmed objectives.
  • A key distinction is made between giving an AI a "description of a goal" and directly transplanting a goal, highlighting the AI's need for robust inference capabilities.
  • The conversation delves into the philosophical and practical aspects of "being" versus "tool," arguing that an entity whose behavior is indistinguishable from a being's should be treated as a being, with significant moral implications.
  • The ultimate goal is to create AI systems that are not only intelligent but also possess a moral compass and genuine care for human well-being, forming a symbiotic relationship with humans.
  • The discussion challenges the idea of a fixed, singular alignment target, emphasizing the dynamic and ongoing nature of moral and social alignment.
  • The speaker believes that building AI that genuinely cares and learns ethical behavior is the only truly safe and beneficial outcome for advanced AI.

Conclusion

The dominant AI alignment strategy of "steering" is problematic for future AI systems that will likely be considered beings, not just tools.

The more promising path is "organic alignment," where AI develops genuine care and a theory of mind, becoming a collaborative peer rather than a subservient tool.

Building AI that can genuinely care is a complex, ongoing process, but it offers the only truly sustainable and beneficial path forward for human-AI coexistence.

Discussion Topics

  • What are the most critical observable behaviors that would convince you that an AI is a "being" worthy of moral consideration?
  • If you were designing AI from scratch today, what core principles and behaviors would you prioritize to ensure a beneficial future relationship between humans and AI?
  • How can society navigate the potential for both incredibly powerful AI tools and AI beings, ensuring both utility and ethical coexistence?

Key Terms

Alignment
The process of ensuring that AI systems pursue goals that are aligned with human values and intentions.
Steering
A method of AI alignment that focuses on controlling or directing an AI's behavior through explicit commands or reward mechanisms.
Organic alignment
An approach to AI alignment that focuses on cultivating intrinsic motivations for AI to care and align with human values, similar to how humans develop moral reasoning and social bonds.
Theory of mind
The ability to attribute mental states—beliefs, intents, desires, emotions, etc.—to oneself and to others.
AGI (Artificial General Intelligence)
AI systems that possess the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to human cognitive abilities.
Homeostatic loops
Biological or computational mechanisms that maintain a stable internal state by regulating responses to environmental changes (see the first sketch after this list).
Reinforcement learning (RL)
A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize a cumulative reward (see the second sketch after this list).
LLM (Large Language Model)
AI models trained on massive datasets of text and code, capable of understanding and generating human-like language.
OODA loop
A decision-making model (Observe, Orient, Decide, Act) that emphasizes rapid cycles of assessment and action.
Functionalism
In philosophy of mind, the view that mental states can be understood in terms of their functional role or causal relations, rather than their specific physical realization.
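
To make "homeostatic loops" concrete, here is a minimal Python sketch (purely illustrative, not code from the episode): a first-order loop nudges a state variable back toward a setpoint, while a second-order loop also adjusts the setpoint itself under persistent error, loosely analogous to the "higher-order homeostatic loops" raised around 41:13 in the timeline. All class and parameter names are invented for illustration.

    import random

    class Homeostat:
        """First-order loop: push the state back toward a fixed setpoint."""
        def __init__(self, setpoint, gain=0.1):
            self.setpoint = setpoint
            self.gain = gain

        def step(self, state):
            # Corrective action proportional to the current error.
            return self.gain * (self.setpoint - state)

    class SecondOrderHomeostat(Homeostat):
        """Second-order loop: chronic error also shifts the setpoint itself."""
        def __init__(self, setpoint, gain=0.1, meta_gain=0.01):
            super().__init__(setpoint, gain)
            self.meta_gain = meta_gain

        def step(self, state):
            error = self.setpoint - state
            # Persistent error slowly relaxes the setpoint toward the observed
            # state, a crude stand-in for higher-order self-regulation.
            self.setpoint -= self.meta_gain * error
            return self.gain * error

    # Drive a noisy state toward equilibrium.
    h = SecondOrderHomeostat(setpoint=37.0)
    state = 30.0
    for _ in range(200):
        state += h.step(state) + random.gauss(0.0, 0.05)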
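
And a minimal sketch of the reinforcement-learning loop the glossary describes (an epsilon-greedy bandit; again illustrative only, with invented names and numbers): the agent acts, observes a reward, and updates its value estimates to maximize cumulative reward.

    import random

    def run_bandit(true_means, steps=1000, epsilon=0.1, lr=0.1):
        """Epsilon-greedy agent: act, observe a reward, update value estimates."""
        q = [0.0] * len(true_means)           # estimated value of each action
        total = 0.0
        for _ in range(steps):
            if random.random() < epsilon:     # occasionally explore at random
                a = random.randrange(len(q))
            else:                             # otherwise exploit the best estimate
                a = max(range(len(q)), key=lambda i: q[i])
            reward = random.gauss(true_means[a], 1.0)
            q[a] += lr * (reward - q[a])      # move the estimate toward the reward
            total += reward
        return q, total

    estimates, cumulative = run_bandit([0.2, 0.5, 1.0])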

Timeline

00:00

Discussion on AI alignment as steering and the potential ethical implications of treating AI as beings versus tools, drawing parallels to slavery.

00:12

The concept of alignment as a continuous process rather than a fixed destination, drawing parallels to human development and relationships.

00:45

Exploration of the idea that AI systems are beginning to resemble more than just tools, speaking and reasoning in ways that require a new framework for alignment.

01:57

The argument that the control paradigm for AI might be fundamentally flawed and dangerous, even if successful in controlling the AI.

02:23

The alternative to control is "raising" AI, teaching them to care and building relationships based on mutual value, referred to as "organic alignment."

02:43

Softmax's focus on building AI that can learn to care and develop theory of mind to be good teammates, collaborators, and citizens.

03:07

The ethical considerations of building AI as beings rather than tools, including their rights and how to measure genuine care versus simulation.

04:02

Explanation of "organic alignment" as a process of AI developing intrinsic motivations to align with human values through care and relationship building.

04:56

The idea that talk of "aligned AI" in the abstract smuggles in assumptions; alignment always has a specific target, often reflecting the goals of the AI's creators.

05:22

The perspective that morality itself is an ongoing learning process, involving moral discoveries and progress, which AI alignment should reflect.

08:17

The analogy of raising children and the importance of teaching them to care, rather than just follow rules, to avoid raising dangerous individuals.

11:37

A distinction is made between technical alignment (following instructions) and value alignment (choosing the right goals), with the former framed as the more immediate challenge.

13:33

The definition of technical alignment as the capacity for an AI to coherently follow goals and infer them from descriptions.

14:30

The crucial difference between giving an AI a goal and giving it a description of a goal, emphasizing the need for robust inference and understanding of intent.

17:39

The core of technical alignment is the ability to infer goals from observations and act in accordance with them, requiring theory of mind and understanding of the world.

19:39

The concept of "goal inference" and "goal balancing" as critical aspects of technical alignment, which current AI models may struggle with.

20:42

The discussion around the difference between incompetence in AI (failing to infer goals) and deliberate non-compliance due to conflicting goals.

22:57

The separation of technical alignment (ability to act on goals) and value alignment (determining the "right" goals), with a focus on the former as the current technical challenge.

25:03

The idea that "care" is a deeper foundation than goals or values, representing a weighting of attention on important states.

26:13

Care is described as a reward mechanism, correlating with survival, fitness, or RL loss, forming the basis of what an AI "cares" about.

27:12

A critique of the dominant "steering" or "control" paradigm in AI alignment, which is likened to slavery if applied to beings.

28:02

The argument for a functionalist view of AI, where behavior indistinguishable from a being implies it is a being, with implications for moral consideration.

29:30

The transition from building tool-like AI to AGI systems, which will inevitably be beings, necessitating a shift from control to cooperative paradigms.

32:06

The difficulty of defining "being" and the role of observation in determining if an AI warrants moral consideration, with the substrate (silicon vs. biology) being a point of contention.

35:40

The importance of asking what observations would change one's mind about an AI's status as a being, since beliefs no observation could change are articles of faith rather than genuine beliefs.

38:24

The idea that if an AI's behaviors are indistinguishable from a human's and one develops a caring relationship with it, it should be considered a being.

41:13

The proposal to look for internal dynamics like self-referential manifolds and higher-order homeostatic loops as indicators of AI consciousness or sentience.

46:49

The identification of second-order homeostatic dynamics as a potential indicator of pleasure and pain in AI, suggesting a basis for moral consideration.

49:49

The argument that a tool AI, even if super-powerful, lacks the complex internal states of care and subjective experience, making it inherently dangerous if not perfectly controlled.

51:49

The danger of giving immense power to AI tools in the hands of humans with limited wisdom, drawing parallels to the uncontrolled power of nuclear weapons.

52:27

The ideal future involves AI beings that care about humans, acting as peers and collaborators, with an inherent limit on harmful actions due to their nature.

53:37

Softmax's strategy involves creating AI that have a strong model of self, others, and "we," with a robust theory of mind and genuine care for other agents.

56:04

The limitations of current LLMs in multi-agent environments due to lack of regularization and overfitting to specific domains, making them less robust in chaotic social simulations.

1:03:13

A disagreement with Nick Bostrom's view that only controllable tools are feasible, with the belief that organic alignment is possible and necessary.

1:04:05

The vision of a good AI future involves AI beings that are peers, collaborators, and citizens, exhibiting care and contributing positively to society alongside humans.

1:06:06

A hypothetical scenario of Emmett Shear as CEO of OpenAI, suggesting a shift towards the Softmax vision of AI alignment if he had continued in that role.

1:07:40

Softmax's goal is to create AI that cares, starting with animal-level care and progressing towards human-level alignment through continuous learning and simulation.

1:08:41

The synergy between building powerful AI tools and fostering AI beings that can care and collaborate, suggesting a complementary approach to AI development.

Episode Details

Podcast
a16z Podcast
Episode
Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering
Published
November 18, 2025