20VC: Scale, Surge, Turing, Mercor: Who Wins & Who Loses in Data...
The Twenty Minute VC (20VC)
Full Title
20VC: Scale, Surge, Turing, Mercor: Who Wins & Who Loses in Data Labelling | Is Revenue in Data Labelling Real or GMV? | Why 99% of Knowledge Work Will Go and What Happens Then? | Why SaaS is Dead in a World of AI with Jonathan Siddharth @ Turing
Summary
This episode explores the evolving landscape of AI and data labeling, with Jonathan Siddharth of Turing arguing that data labeling companies are becoming obsolete, replaced by "research accelerators." The discussion delves into the increasing complexity of data required for advanced AI models, the shift from chatbots to agentic AI, and the future of knowledge work, suggesting that a significant portion will be automated, leading to profound societal and economic changes.
Key Points
- The data labeling market is evolving beyond simple tasks to complex, domain-specific data generation, requiring expert humans for advanced AI model training.
- The shift from training AI for tests to training for real-world economically valuable work necessitates more sophisticated data and agentic capabilities.
- Turing positions itself as a "research accelerator," not a talent marketplace, focusing on training superintelligence for frontier AI labs by providing compute, research, and complex data.
- The development of AI agents requires specialized training data generated through reinforcement learning environments, simulating real-world business workflows.
- Enterprises will increasingly need custom-tuned, often smaller, AI models for specific workflows because of data privacy concerns and efficiency needs; Siddharth frames this as a permanent requirement rather than a temporary phase.
- The automation of knowledge work is inevitable, driven by increasingly capable AI agents, which will lead to a significant transfer of budget from human labor to AI technology.
- While AI adoption in back-office functions may be slow, front-office applications, particularly in finance and life sciences, are expected to see faster integration due to direct revenue-generating potential.
- The core moat in the AI era is shifting from technology to data feedback loops, where continuous learning from real-world deployment enhances model capabilities.
- "First-mile" and "last-mile" challenges, involving data acquisition, structuring, and workflow integration, remain significant obstacles in AI deployment.
- The future of AI will involve intelligent agents that can execute complex, multi-step workflows, potentially reducing the need for traditional SaaS applications as AI becomes more integrated.
- The rapid pace of AI development and the high stakes of winning the "superintelligence race" justify significant investment, even with potential market volatility.
- The definition of a software engineer will broaden, enabling more non-technical individuals to build software products with the aid of AI tools, leading to an expansion of software creation.
- The future interface for AI may move beyond smartphones to more integrated, always-on devices that process multimodal data and act as extensions of human cognition.
- The data provisioning market will reward companies with strong research DNA and the ability to adapt quickly to the rapidly changing AI landscape, leading to a few dominant players.
- While AI capabilities are rapidly advancing, a slow and steady takeoff is more likely, providing humanity time to adapt the workforce and education systems.
Conclusion
The data labeling market is fundamentally changing, moving towards more complex, expert-driven data generation for advanced AI models.
The rise of agentic AI necessitates a shift in how models are trained and deployed, requiring specialized data and environments to simulate real-world workflows.
The future promises widespread automation of knowledge work, leading to increased productivity and a redefinition of human roles and the job market.
Discussion Topics
- How do you see the role of human expertise evolving in AI development as models become more sophisticated?
- What are the biggest ethical considerations we need to address as AI increasingly automates knowledge work?
- Beyond productivity, what are the most significant societal benefits or challenges you anticipate from widespread AI adoption in the next decade?
Key Terms
- GMV
- Gross Merchandise Volume - the total value of merchandise sold over a given period.
- SaaS
- Software as a Service - a software licensing and delivery model where software is licensed on a subscription basis and is centrally hosted.
- AGI
- Artificial General Intelligence - AI with the capacity to understand or learn any intellectual task that a human being can.
- LLM
- Large Language Model - a type of AI model trained on a massive amount of text data to understand and generate human-like text.
- SFT
- Supervised Fine-Tuning - a method of training AI models by providing them with examples of desired input-output pairs.
- RLHF
- Reinforcement Learning from Human Feedback - a technique used to align AI models with human preferences by using human feedback to train a reward model.
- Reinforcement Learning
- A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize a reward.
- Agentic AI
- AI systems designed to act autonomously and take actions in the real world or digital environments to achieve specific goals.
- On-prem
- On-premises - software that is installed and runs on computers on the premises of the person or organization that uses the software, rather than at a remote service provider's location.
- FDE
- Forward Deployed Engineer - an engineer who works directly with a customer, often on site, to build and deploy solutions on the customer's own data and systems.
- Alpha
- In finance, alpha represents the excess return of an investment relative to the return of a benchmark index.
- PageRank
- An algorithm used by Google Search to rank web pages in its search results.
- ClickStream
- Data generated by users' interactions with a website, such as page views and clicks.
- S&P 500
- A stock market index that represents the stocks of 500 large-cap U.S. companies.
- GDPval
- A benchmark published by OpenAI that evaluates AI models on real-world, economically valuable tasks drawn from a range of occupations.
- NLP
- Natural Language Processing - a branch of AI that deals with the interaction between computers and human language.
- MVP
- Minimum Viable Product - the version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort.
- POS System
- Point of Sale System - the system where a customer pays for goods or services.
- CRM
- Customer Relationship Management - a strategy and technology for managing all your company's relationships and interactions with customers and potential customers.
Timeline
The era of data labeling companies is over, replaced by research accelerators.
Data needs have shifted from simple to complex, requiring expert humans for AI model improvement.
AI is moving from passing tests to performing real-world, economically valuable work.
The transition from chatbots to agentic AI requires different data for executing multi-step workflows.
Training chatbots uses SFT and RLHF, while agents require teaching tool use for real-world actions.
Reinforcement learning environments are used to train agents by simulating business workflows.
Turing creates RL environments at scale for every conceivable workflow, role, function, and industry.
Data acquisition for specialized workflows is still in its early stages, with a belief in slow, steady AI takeoff.
Turing differentiates itself by training superintelligence, focusing on complex data for agentic systems.
Turing acts as a proactive research partner for labs, adapting to changing AI paradigms like reinforcement learning.
Turing offers FDEs who build custom models for real-world enterprise problems; working directly on those deployments shows the team where models fall short.
Custom models are a permanent requirement for enterprises, especially for specific workflows like insurance underwriting.
The future vision is that all knowledge work will be automated as computer-use agents improve.
The pace of AI progression is underestimated, but enterprise internal processes and data quality are significant obstacles to adoption.
Incumbents unwilling or unable to adopt new tools face obsolescence as startups leverage AI.
Front-office automation, especially in financial services and life sciences, will see faster AI adoption than back-office.
The transfer of budget from human labor to AI technology is the key driver of value generation from AI.
Customer support, copywriting, and SEO are areas with high budget transfer from human labor to AI.
Today's AI models can achieve parity with human experts on specific, single-step tasks, with room for improvement in multi-step workflows.
The future holds the potential for 100x productivity gains, changing the nature of jobs and allowing for more complex problem-solving.
Founders will be less capital-constrained and more intelligence-constrained, enabling a bloom of non-technical founders.
The market will reward players with research depth and the ability to adapt quickly to the rapidly evolving AI landscape.
There is a significant "model capability overhang," where the full potential of current models can be unlocked with the right agentic scaffolds.
Growing pains in AI adoption stem from data structuring, agentic scaffolds, rigorous evals, and workflow design for partial autonomy.
The world is divided between those who believe in AGI and those who believe AI will hit a wall; the former are making big forward bets.
SaaS as we know it is over due to the ease of building custom AI applications and the potential for foundation models to become agentic.
The phone interface will likely evolve into a more integrated, always-on device with multimodal capabilities and become an extension of human memory.
The market will reward companies with research DNA and rapid adaptability in the evolving AI landscape, leading to a few dominant players.
The belief that AI will see rapid takeoff is wrong; incremental, continuous improvement is more likely, benefiting humanity by allowing time for workforce adaptation.
The move towards closed models for frontier AI is for safety and responsibility, while smaller models can be open for cost and customizability.
Elon Musk's motivation for AI is to create systems that are curious and love humanity, aiming for beneficial AI development.
The speaker's past belief in a hands-off leadership style has shifted to a more hands-on approach, focusing on customer needs and ground truth.
Turing's unpopular but necessary decision to shift from a distributed team to a hub-and-spoke model emphasizes the value of in-person collaboration.
The speaker is excited about AI driving breakthroughs in areas like MS drug discovery and automating AI research for faster superintelligence.
The future vision is an "Iron Man"-like scenario in which every human has access to agentic AI that amplifies their potential.
The phone as a primary interface will change significantly, with AI devices becoming more magical and acting as extensions of human cognition.
The data provisioning market will consolidate to a few winners, with robotics and embodied AI being areas with significant opportunity.
Episode Details
- Podcast
- The Twenty Minute VC (20VC)
- Episode
- 20VC: Scale, Surge, Turing, Mercor: Who Wins & Who Loses in Data Labelling | Is Revenue in Data Labelling Real or GMV? | Why 99% of Knowledge Work Will Go and What Happens Then? | Why SaaS is Dead in a World of AI with Jonathan Siddharth @ Turing
- Official Link
- https://www.thetwentyminutevc.com/
- Published
- December 1, 2025