Apple Is Paying Google $1B/Year to Make Siri an AI Agent. It Still Only Works Two-Thirds of the Time.
Apple just signed a $1B/year deal with Google to power Siri with Gemini. Their own tests show it works 66% of the time. Here's what that means for everyone building agents.

Apple just signed a multi-year deal worth roughly $1 billion per year to bring Google Gemini into Siri. The goal is to transform Siri from a voice assistant that sets timers and reads weather forecasts into a real AI agent -- one that can take actions, chain tasks together, and operate across apps on your behalf.
There is just one problem. Apple's own internal testing shows these agentic Siri features work properly about 66% of the time.
Let that number sit for a second. A trillion-dollar company, with the best hardware team on earth, partnering with the company that arguably leads in foundation models, spending a billion dollars a year on the integration -- and the result fails one out of every three attempts.
I am not writing this to dunk on Apple. I am writing this because the Siri situation reveals something important about the state of AI agents that most people are not talking about.
The billion-dollar gap
Apple has been chasing the agent dream since it acquired Siri in 2010. Sixteen years later, Siri still cannot reliably book a restaurant, manage a multi-step workflow, or handle anything that requires real context awareness across apps. The Gemini deal is an acknowledgment that Apple's in-house models were not going to close that gap on their own.
The deal structure tells you how serious they are. A billion dollars annually is not an experiment. That is a bet-the-product commitment. Apple is essentially outsourcing Siri's brain to Google while keeping the interface, the privacy layer, and the on-device execution on their side.
But the 66% reliability number -- reportedly from Apple's own internal benchmarks -- exposes the fundamental challenge. It is not a model problem. Gemini is an excellent foundation model. It is an architecture problem. Siri is trying to be everything to everyone. It needs to handle voice commands, app integrations across the entire iOS ecosystem, on-device processing, cloud inference, privacy constraints, and multi-step agentic workflows. All at once. For a billion users.
That scope is the enemy of reliability.

Why general-purpose agents keep failing
This is not unique to Apple. Every major tech company building a general-purpose AI assistant is hitting the same wall. The pattern is consistent:
Google Assistant went through its own agent pivot and still struggles with multi-step tasks outside of Google's own apps. Alexa had a reported $10 billion cumulative loss before Amazon restructured the division. Cortana was quietly retired entirely. Meta AI keeps pivoting its assistant strategy every six months.
The common thread is not bad engineering. These are some of the best engineering teams in the world. The common thread is that building a single AI agent that does everything well is a fundamentally harder problem than building focused agents that each do one thing reliably.
General-purpose assistants are trying to solve an almost infinite surface area of tasks. Every new app integration, every new capability, every new edge case in natural language adds combinatorial complexity. At 66% reliability, the math works out to roughly a 29% chance that a three-step workflow completes successfully (0.66 cubed). A five-step workflow drops to about 13%.
That is not an agent. That is a coin flip with extra steps.
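The compounding math above is worth making concrete. Assuming each step succeeds independently at the same per-step reliability (a simplification, since real agent steps are rarely fully independent), the end-to-end success rate is just the per-step rate raised to the number of steps:

```python
def workflow_success_rate(step_reliability: float, steps: int) -> float:
    """End-to-end success probability when every step in a workflow
    must succeed, each independently at the same per-step reliability."""
    return step_reliability ** steps

# A general assistant at 66% per step collapses fast:
print(round(workflow_success_rate(0.66, 3), 2))  # 0.29
print(round(workflow_success_rate(0.66, 5), 2))  # 0.13

# A focused agent at 95% per step keeps multi-step workflows viable:
print(round(workflow_success_rate(0.95, 5), 2))  # 0.77
```

The 95% figure is illustrative rather than a measured benchmark, but it shows why per-step reliability dominates: small gains per step compound into large gains across a workflow.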
The focused agent advantage
The teams getting real results with AI agents right now are not trying to build the next Siri. They are building narrow, purpose-built agents that do specific jobs well.
A freelancer running a lead qualification agent on Telegram that monitors incoming messages and flags high-intent prospects. An agency with a client reporting agent that pulls data from three sources and sends weekly summaries. A recruiter with an agent that screens resumes against job requirements and surfaces the top candidates.
These agents work because their scope is constrained. They do not need to handle every possible user request across every possible app. They need to do one workflow, consistently, without breaking.
This is the approach behind OpenClaw -- the open-source agent framework that has become the default for building personal AI agents. Instead of one monolithic assistant, you deploy small, focused agents that each own a specific task. When an agent only needs to handle one workflow, reliability goes from 66% to something that actually works in production.

The infrastructure gap nobody talks about
There is another angle to the Apple story that matters for anyone building agents. Apple can afford to throw a billion dollars a year at this problem and iterate until the reliability improves. Most teams cannot.
If you are a small team or solo operator trying to deploy AI agents, you are not choosing between Apple's approach and Google's approach. You are choosing between spending weeks on infrastructure setup -- servers, containers, model hosting, webhook configuration, monitoring -- or actually building the agent logic that creates value.
This is where something like RapidClaw changes the equation. It runs the full OpenClaw framework as a managed service starting at $29/month. You get always-on agents, Telegram integration out of the box, and managed upgrades. No infrastructure to maintain. No DevOps overhead. The time you would spend fighting Docker configurations goes into defining the agent behavior that actually matters.
The point is not that $29/month competes with a billion-dollar Gemini integration. The point is that a focused agent running on managed infrastructure can outperform a general-purpose assistant on any specific task -- because focus wins over breadth every single time.
What happens next
Apple will improve Siri's reliability. They have the resources, the talent, and now the Gemini partnership to push that 66% number higher. By this time next year, it might be 80%. Maybe 85%. For a general-purpose assistant serving a billion users, that would actually be impressive.
But here is what will not change: the fundamental tradeoff between breadth and reliability. Every new capability Siri gains adds another axis of potential failure. The more it tries to do, the harder it gets to keep everything working.
Meanwhile, the teams running focused agents will keep compounding their advantage. Not because their technology is better than Apple's -- it is not, and pretending otherwise would be dishonest. But because a narrow agent with a 95% success rate on its one job will always outperform a general assistant with a 66% success rate across a thousand jobs.
The billion-dollar lesson from the Apple-Google deal is simple: if you want an AI agent that actually works, stop trying to build one that does everything. Build one that does your thing. Then build another one for the next thing. Repeat.
The future of AI agents is not one super-agent. It is a squad of focused ones.
Frequently asked questions
Why does Siri only work 66% of the time with Gemini?
The reliability problem is not about model quality -- Gemini is an excellent foundation model. The issue is architectural complexity. Siri needs to handle voice commands, app integrations across all of iOS, on-device processing, cloud inference, privacy constraints, and multi-step workflows simultaneously. That combinatorial scope makes consistent reliability extremely difficult, even with a billion-dollar budget.
Are focused AI agents better than general-purpose assistants like Siri?
For any specific task, yes. A focused agent that handles one workflow -- like lead qualification or client reporting -- can achieve 95%+ reliability because its scope is constrained. A general-purpose assistant like Siri tries to cover thousands of use cases, which means each individual task gets lower reliability. The math favors specialization: a 66% success rate per step means a three-step workflow only completes 29% of the time.
How much does Apple pay Google for Gemini in Siri?
Apple signed a multi-year deal worth roughly $1 billion per year to integrate Google Gemini into Siri. The deal gives Apple access to Gemini as the AI backbone for agentic Siri features, while Apple retains control of the interface, the privacy layer, and on-device execution. This is one of the largest AI licensing agreements publicly reported.