AI Agents and the Fight for Customer Data

a16z Podcast

Summary

The episode discusses how the rise of AI agents is fundamentally changing the need for centralized data infrastructure, shifting its purpose from business intelligence to providing context for AI.

It also explores the pushback from some SaaS vendors attempting to lock down customer data and the broader implications for data access and the future of enterprise software.

Key Points

  • The core need for AI agents is context, driving a renewed emphasis on consolidating data from various sources into a single location.
  • Historically, companies centralized data for business intelligence and reporting, but now the same infrastructure is being repurposed for AI agents.
  • Some SaaS vendors are reacting to AI by restricting data access, aiming to force users into their proprietary AI tools, as exemplified by SAP's policy changes.
  • The concern for SaaS companies is that AI agents could bypass their interfaces, reducing the perceived value of their applications, though this is debated.
  • The historical precedent of open APIs in the 90s suggests that attempts to lock down data may ultimately be unsuccessful and detrimental to vendors.
  • Data gravity, the idea that large data sets are too expensive to move, is challenged as an outdated concept because efficient replication keeps data movement costs low.
  • The emergence of AI agents is creating new opportunities for data infrastructure companies and necessitates a rethinking of enterprise software architecture.
  • The merger of Fivetran and DBT is presented as a strategic move to provide a more comprehensive data solution, combining data ingestion with data organization and modeling.
  • The development of AI agents is leading to new ways of managing workflows, with discussions around agents having their own identities and integrating into teams.
  • The future of AI agents will likely involve more direct API usage rather than solely relying on user interfaces, with specialized tools and platforms emerging to support this.
  • There's a debate about whether AI will commoditize infrastructure or lead to more complex software creation, with the current trend suggesting increased demand for infrastructure.

Conclusion

AI agents are fundamentally changing data infrastructure needs, shifting the focus to context and enabling new use cases beyond traditional BI.

Vendors attempting to lock down data access will likely face pushback from customers who need open access for AI and business intelligence.

The integration of data ingestion (Fivetran) and data transformation (DBT) creates a powerful, comprehensive platform for managing data in the AI era.

Discussion Topics

  • How are AI agents changing the fundamental requirements for data infrastructure, and what new opportunities does this create?
  • What are the long-term implications of vendors attempting to restrict customer data access in the age of AI, and how can companies best navigate this?
  • With the rise of AI coding agents, how will the development and maintenance of software, particularly data pipelines and SaaS applications, evolve?

Key Terms

AI Agents
Software programs that can perform tasks autonomously using artificial intelligence, often requiring context from data.
SaaS
Software as a Service, a software distribution model where a third-party provider hosts applications and makes them available to customers over the Internet.
API
Application Programming Interface, a set of rules and protocols that allows different software applications to communicate with each other.
Data Lake
A centralized repository that allows you to store all your structured and unstructured data at any scale.
Data Gravity
The concept that large data sets become increasingly difficult and expensive to move, influencing where data processing and analytics occur.
Egress Charges
Fees charged by cloud providers for data transferred out of their network.
Change Data Capture (CDC)
A set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data.
OLTP
Online Transaction Processing, a type of data processing that supports transaction-oriented applications, typically involving many short online transactions.
Postgres
PostgreSQL, a powerful, open-source object-relational database system.
Technical Debt
The implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer.
M&A
Mergers and Acquisitions, the consolidation of companies or assets through various types of financial transactions.
DBT
dbt (data build tool), an open-source command-line tool that enables data analysts and engineers to transform data in their warehouse using SQL.
Databricks
A unified data analytics platform that offers services for data engineering, machine learning, and data warehousing.
Snowflake
A cloud data platform company that offers a fully managed data warehouse for data storage and analysis.
BigQuery
Google Cloud's serverless, highly scalable data warehouse for analytics.
Iceberg
Apache Iceberg, an open table format for large analytic datasets.
MCP
Model Context Protocol, an open protocol that standardizes how AI models and agents connect to external tools and data sources.
LLM
Large Language Model, a type of AI model that can understand and generate human-like text.
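
The Change Data Capture and Data Gravity entries above can be made concrete with a small sketch. It assumes a hypothetical in-memory table keyed by row ID; the names `Change`, `apply_changes`, and `full_reload_rows` are illustrative only and do not reflect how Fivetran or any specific product implements replication. The point is simply that a change log carries far fewer rows than a full re-copy while leaving the replica identical to the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical change log."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


def apply_changes(table: dict, changes: list) -> None:
    """Apply an ordered change log to a table in place."""
    for change in changes:
        if change.op == "delete":
            table.pop(change.key, None)
        else:
            # Inserts and updates are both upserts on the target.
            table[change.key] = change.row


def full_reload_rows(source: dict) -> int:
    """A naive pipeline re-copies every source row on every sync."""
    return len(source)


# Assumed source table with 1,000 rows, and a replica built from a snapshot.
source = {i: {"id": i, "status": "open"} for i in range(1000)}
replica = dict(source)

# Between two syncs, only three rows change at the source.
changes = [
    Change("update", 7, {"id": 7, "status": "closed"}),
    Change("insert", 1000, {"id": 1000, "status": "open"}),
    Change("delete", 42),
]
apply_changes(source, changes)   # the source system evolves
apply_changes(replica, changes)  # CDC ships only the change log

print("rows moved by full reload:", full_reload_rows(source))
print("rows moved by CDC:", len(changes))
print("replica matches source:", replica == source)
```

In this toy setup the full reload moves every row in the table on each sync, while the change log moves only the rows that actually changed, which is the intuition behind the episode's argument that egress costs stay small when replication is incremental.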

Timeline

00:00:05

AI agents require context, making centralized data crucial for their effectiveness.

00:00:20

Discussion on the potential "SaaS apocalypse" and the threat of AI-native companies.

00:00:54

Companies are now building data infrastructure for AI agents, not just business intelligence.

00:01:20

Introduction of George Fraser, CEO of Fivetran, and the Fivetran/DBT merger announcement.

00:01:45

Fivetran's core function: consolidating data from various systems into one place.

00:02:35

The emergence of AI agents as a new, significant driver for centralized data.

00:03:05

Industry shifts and vendor approaches to data in the context of AI.

00:03:49

Historical need for data centralization vs. the new reality of AI agents.

00:04:09

The reaction of some SaaS vendors to AI, including locking down data access.

00:04:37

SAP's recent API policy banning AI agent access highlights vendor concerns.

00:05:08

Clarification on data access for agents: context for business questions, not for training models.

00:05:17

Concern that agents could devalue SaaS interfaces by accessing data directly.

00:06:00

Discussion on whether agents will truly make SaaS systems less valuable or if it's an overblown concern.

00:06:26

AI agents may require fewer individual user identities, focusing on roles.

00:06:58

The historical importance of open APIs for SaaS longevity and the current challenge AI agents pose.

00:07:40

Argument that agents are not fundamentally different from custom software in disintermediating SaaS.

00:08:01

The possibility that the current AI agent threat is "much ado about nothing."

00:08:14

Reiteration that many AI-driven threats are not new, echoing past debates about APIs.

00:08:48

The value of SaaS interfaces may persist regardless of whether humans or agents are consuming them.

00:09:00

Software costs are often immaterial compared to overall business spend, so reducing seat counts is unlikely to be the primary motivation for adopting AI.

00:09:40

AI labs themselves still rely on SaaS tools, suggesting their long-term utility.

00:10:09

The negative impact of API lockdown policies on customers.

00:10:27

Recommendation for companies to manage data access challenges from vendors.

00:10:38

Why API restrictions are bad for customers: hindering reporting and AI agent functionality.

00:11:09

Analogy to early ChatGPT: AI without real-time business data is like having a knowledge cut-off.

00:11:18

The critical importance of companies creating their own data platforms.

00:11:42

Fivetran's "Open Data Infrastructure" benchmark for evaluating vendor data access policies.

00:12:36

Discussion on the worst offenders for restrictive data access policies.

00:13:19

SAP's historical issues with data access, with some internal shifts noted.

00:13:21

Salesforce's generally good data access policies, with exceptions like Slack.

00:13:34

Hope that vendors will move back towards data openness.

00:13:50

Prediction that the trend will revert to greater data openness, similar to past API evolutions.

00:15:11

Discussion on the concept of "data gravity" and whether it's overrated or not real.

00:15:37

Argument that data gravity is "completely fake" due to efficient data replication.

00:15:47

Definition of data gravity: the idea that data is too expensive to move due to egress charges.

00:16:03

Evidence against data gravity from Fivetran's low egress costs despite replicating large datasets.

00:17:04

The role of change data capture (CDC) in minimizing data movement costs.

00:17:13

How "dumb" data pipelines contribute to the perception of data gravity.

00:18:03

Advice for CIOs: insist on owning a copy of company data in a controlled data lake.

00:18:41

The importance of fighting for data access and embedding it in vendor contracts.

00:19:14

Model language for MSAs is available on opendatainfrastructure.com.

00:19:32

Discussion on the evolution of AI agents: from software to personal agents.

00:20:11

The trend towards agents having their own identities, emails, and phone numbers.

00:20:50

The idea of an "HR for AI" to onboard and manage AI agents within a company.

00:21:13

In this future, agents might represent more "seats" or consumption of software.

00:21:28

Questioning whether treating agents like humans is the right approach.

00:21:32

The "intermediate form" of agents integrating into existing workflows.

00:22:03

Fivetran's internal AI agent for responding to support tickets.

00:22:37

The possibility of a "closed loop" system where agents operate autonomously.

00:22:44

The rationale for using dedicated hardware (like a Mac Mini) for agents, enabling specific functionalities.

00:23:04

The shift towards "headless" operations vs. the continued utility of traditional UIs.

00:23:19

Browsers offer full functionality for agents trained on human data, despite potential inefficiencies.

00:24:00

The limitations of browser automation (speed, token consumption) and the benefits of direct API/CLI usage.

00:24:36

Fivetran's development of a Salesforce administration agent using its CLI.

00:24:51

The Salesforce CLI is comprehensive and agents can already use it effectively.

00:25:10

Prediction that most user agents in five years will use existing human interfaces due to solved integration challenges.

00:25:30

The future of agent interaction will likely lean towards API usage over traditional UIs.

00:25:37

Discussion on mediation technologies like MCPs for AI agents.

00:26:01

MCPs solve important problems like authentication and discoverability for AI agent tool usage.

00:26:32

The ecosystem of AI tools is growing around MCPs, making them practically necessary.

00:27:04

The possibility that smarter models could eventually build their own tools, bypassing intermediate layers.

00:27:33

The massive investment in foundation models suggests they could become the primary drivers of tool creation.

00:27:53

Nanobot's approach to AI agent development: forkable, customizable, and with separation of concerns.

00:28:33

The ongoing challenge of complexity in AI agent development.

00:28:49

Satya Nadella's prediction of a "collapse of SaaS" and the potential for AI to drive this shift.

00:29:07

Debate on whether a massive SaaS shift driven by AI agents is imminent or overstated.

00:29:23

The public markets accurately reflect increased uncertainty for SaaS companies due to AI.

00:29:46

The primary threat is not necessarily the collapse of SaaS categories, but AI-native companies.

00:30:00

AI-native companies can more easily build and potentially improve upon existing SaaS offerings.

00:30:12

Evidence suggests traditional companies are accelerating, not slowing down, due to AI adoption.

00:30:41

The threat of AI is real, but companies that adapt will thrive.

00:31:08

The long-term trend of net dollar retention is a key indicator for SaaS viability.

00:31:37

The idea that AI might enable DIY solutions for data replication, posing a threat to Fivetran.

00:32:00

Fivetran is leveraging AI to improve its core business of data replication connectors.

00:32:44

AI's current limitations in handling the full complexity of data replication.

00:33:09

The opportunity for AI to significantly improve data replication quality and reliability.

00:34:50

AI-native companies like OpenAI and Anthropic use Fivetran similarly to traditional enterprises.

00:35:11

AI companies use Fivetran for data replication into data lakes for analytics and AI workflows.

00:35:51

A crucial message: existing modern data platforms are suitable foundations for AI.

00:36:11

Examples of suitable data platforms include Snowflake, Databricks, and BigQuery.

00:36:25

The idea that AI commoditizes infrastructure is questioned; instead, it increases demand.

00:37:10

Building robust software is hard and not yet a strength of AI, though how capable it will become is still unknown.

00:37:54

AI is creating more demand for infrastructure rather than commoditizing it.

00:38:10

The consumption layer of software is most threatened by AI's ability to navigate lower levels.

00:38:34

AI's ability to handle more complex infrastructure may reduce the need for highly user-friendly layers.

00:39:10

Discussion on Fivetran's acquisition strategy, including Census and DBT Labs.

00:40:03

Fivetran's acquisition strategy is driven by strong, unique reasons rather than a dedicated corp dev function.

00:40:53

The Fivetran and DBT merger is a natural fit, as they are historically used together.

00:41:06

Fivetran ingests data, and DBT organizes and models it for data consumers.

00:41:17

DBT is expected to benefit significantly from AI coding agents, leading to more widespread use.

00:41:45

SQL queries in DBT projects are a form of human communication and executable documentation of business rules.

00:42:24

George Fraser's personal experimentation with AI and coding agents.

00:43:33

A personal project: a data lake catalog aiming for invisibility.

00:43:53

An experimental OLTP SQL database built with AI coding, using S3 as a backend.

00:45:00

A controversial take: Postgres is outdated and not a good database due to technical debt.

00:45:42

The potential for undergraduate projects to create better database storage engines than Postgres.

00:46:11

The allure and potential distraction of deep technical exploration with AI.

00:47:00

Fraser's regular mental exercise: pretending to be a new CEO brought in to fix Fivetran.

00:48:05

The DBT merger was a clear decision based on that CEO simulation framework.

00:48:27

Fraser's biggest worry: AI coding agents becoming so good that customers build their own data connectors (DIY).

00:49:14

The biggest opportunity: AI enabling new and greater needs for data consolidation and organization.

00:49:37

Fivetran and DBT are well-positioned to provide the tools needed on the other side of data platforms.

Episode Details

Podcast
a16z Podcast
Episode
AI Agents and the Fight for Customer Data
Published
June 5, 2026