[#80] Google's UCP & AP2: Moving agentic commerce from just "plausible" to "scalable"
With MCP, ACP, Google's AP2, and now UCP, the focus needs to be on laying foundations, org-wide and ecosystem-wide, for ease of agentic execution across workflows, e-commerce and payments
A few months ago, I explored the rise of agentic AI payment protocols, outlining the two dominant approaches emerging in the market and the gaps they aim to address.
What is clear is that the ecosystem is betting big on this. Fully autonomous payments may still face risk and user adoption barriers, but some part of these flows will be agent-first. Shopping for sure.
But to take several steps back, what is an agent exactly?
An agent is like a really smart assistant that can do your shopping and pay for things by talking to different apps and websites for you.
This is an “Explain it like I’m 5” definition, but at its core, that’s what an agent is: a piece of software that you’ve given some pre-defined authority to, and that can then execute tasks on your behalf.
If I had to break down agentic commerce, I’d start by splitting it into two distinct parts. The first is the shopping experience itself, which is browsing a catalogue, comparing products and prices, and building a cart. The second is agentic payments, which is where much of the current attention is focused: how an agent actually completes a transaction once the decision has been made. You can read more about this in part 1 of this article I wrote a few months ago.
At its core, all these different protocols are coming in BECAUSE it looks like the ecosystem is moving towards fully agentic or agent assisted execution. Now, in this shopping example, imagine if every e-commerce website had a different way to say “I’m ready to pay.”
Store A says “Payment is authorized,”
Store B says “Payment requires action,”
Store C says “Payment is processing,” and
Store D says “Mandate is pending.”
The agent would need to learn a completely different “language” for every single store. If your agent shops at 10 stores a day, that’s 10 different languages to speak, 10 different ways to check if payment worked, and 10 different sets of error messages to understand; with 100 stores, it’s 100. One mistake means a broken payment, and at millions of transactions, the system collapses.
So how do you make the entire flow easy for agents? As with most things in technology, making something easy usually starts with standardizing protocols. What we’re seeing now is orchestration and standardization being layered across every step of the commerce journey.
Continuing with the previous example, these protocols ensure that all e-commerce websites speak ONE language (“incomplete” means the payment needs your PIN, “processing” means the payment is happening, “completed” means done), so your agent learns ONCE and works EVERYWHERE. Without standards, agents break when they see payment states they don’t recognize. With standards, agents can handle millions of transactions across thousands of merchants using the same simple rules. Complexity breaks at scale, and standards prevent that.
Before diving into those layers, though, it’s worth stepping back and mapping how agentic commerce originally emerged, and what the flow looked like before these standards existed.
2022: Launch of ChatGPT by OpenAI
It took everyone by storm. Everyone and their aunt was on OpenAI, using it for research, playing around with it, and writing LinkedIn posts with it. And naturally, with this new interface creating a critical mass of users, the next question immediately was: how will shopping work here?
The TLDR: two broad structures were emerging in how these protocols handle the authentication and payment flow.
2024: Enter the Model Context Protocol
Launched and open-sourced by Anthropic, it was designed to standardize how AI tools exchange data and talk to each other. It was quickly taken up by almost every LLM, with each building its own MCP integration. At its core, MCP provided a standardized way for apps, merchants, and any third-party websites to talk to LLMs, so that the LLM could pull details such as catalogues and prices, and eventually initiate payment. MCP connected customer ↔ LLM chat ↔ merchant, bringing the shopping experience into the LLM itself.
This agentic payments idea was taken up by almost every merchant x payment player, each trying to solve for an end-to-end autonomous flow, and each in a different way.
2025 - present : Protocols for agentic commerce, Mastercard, OpenAI, Visa, Stripe, Google, orgs building “AI first”
All the big network players debuted their own protocols for agent-led commerce and payments. Built on top of LLMs and MCP, these protocols focus on how agents are given authority to access stored user credentials to initiate a payment, and on whether stakeholders recognize that authority. These approaches broadly fall into 2 buckets:
1. Mandate-first approaches:
Razorpay x OpenAI (UPI Reserve Pay) : Pre-authorized payment blocks at the merchant level
Google AP2: A trust layer that stores mandates and validates payment instructions against pre-set rules
Alipay AI Pay: A trust layer that stores mandates and sits on Alipay servers. Banks have been onboarded on Alipay’s Agentic Commerce Trust (ACT) protocol, where they recognize the authority of this mandate. The user pre-sets these rules, and at the time of transaction, the agent retrieves it from the ACT and banks authorize it. This recently hit 120M transactions in one week!
Note: Alipay is a little different: it operates as the issuer, the acquirer AND the network, so it has fewer stakeholders to manage
Alipay as the ‘issuer’ for the end user: Authentication & PIN
When a user “onboards” onto Alipay, they create an ‘identity’ on Alipay.
Authentication: Alipay handles all the “hard” security. Whether it’s a FaceID scan on your phone or a voice-print on your smart glasses, the merchant never sees your PIN or password.
The “Mandate” Handshake: Because you trust the Alipay app, you feel safe setting a rule like “Let my AI glasses buy coffee under $10.” You are authenticating the Policy, not the individual cup of coffee.
Alipay as the ‘acquirer’ for the merchant
Merchants don’t just “accept cards”; they plug into the Alipay Ecosystem.
When Merchant X onboards, they use Alipay’s ACT Protocol APIs.
This means they are ready to receive “Agentic” requests. Instead of waiting for a human to scan a QR code, their system can talk directly to an AI agent because both speak the “Alipay language.”
Alipay as the ‘network’ via the “Single Ledger”
This is the most important part. Because Alipay is both the Issuer and the Acquirer, the “movement” of money is often just a row change in their own database.
Standard Way: Money moves from Bank A → Network → Bank B (takes 1-3 days).
Alipay Way: Money moves from User’s Alipay Balance → Merchant’s Alipay Balance.
This happens in milliseconds. Because Alipay “sees” both sides of the transaction, they can guarantee the money to the merchant instantly. They don’t have to wait for a clearing house because they are the clearing house.
Note: This is essentially how Alipay works in China. When a user / merchant onboards on Alipay, their respective banks see Alipay as a ‘trusted proxy’ i.e. any future request from Alipay is treated as valid by the bank. And that is how this agentic protocol is able to get buy-in so quickly: it is a closed-loop model.
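To make the mandate idea concrete, here is a minimal sketch of the “buy coffee under $10” rule as a policy check. This is my own illustration, not Alipay’s ACT API: the `Mandate` class, its fields, and `is_within_mandate` are hypothetical names, and amounts are held in cents purely for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    """A user-approved spending rule (hypothetical structure)."""
    category: str          # e.g. "coffee"
    max_amount_cents: int  # per-transaction limit, in cents


def is_within_mandate(mandate: Mandate, category: str, amount_cents: int) -> bool:
    """Return True if the agent's purchase request fits the pre-set policy."""
    return category == mandate.category and 0 < amount_cents < mandate.max_amount_cents


# "Let my AI glasses buy coffee under $10"
coffee_rule = Mandate(category="coffee", max_amount_cents=1000)

print(is_within_mandate(coffee_rule, "coffee", 450))      # True
print(is_within_mandate(coffee_rule, "coffee", 1200))     # False: over the limit
print(is_within_mandate(coffee_rule, "headphones", 800))  # False: wrong category
```

The point is that the user authenticates the rule once, and every later request is checked against it rather than re-approved by hand.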
2. Non-mandate, agentic approaches: either pre-authorizing an agent to execute certain actions OR creating a single-use token that defines a narrow context (for security)
OpenAI x Stripe ACP: Delegated payment tokens created on-the-fly for specific transactions.
Visa TAP: Trusted agent protocol using stored tokenized credentials
My view then, and now, remains that in payments, friction is a feature, not a flaw. That extra layer of authentication builds trust. Users want to know their money isn’t moving invisibly.
But there have been two significant updates since that piece that are worth unpacking:
Mastercard’s AgentPay protocol has entered the autonomous/agentic camp
Google’s Universal Commerce Protocol (UCP) launched, fundamentally changing the infrastructure layer
While the first article focused on WHAT these protocols do differently, this piece is about how the pipes are changing to enable an agent-first commerce world.
I’ll first talk about Mastercard AgentPay, purely because it has similarities to OpenAI and Stripe’s ACP.
Mastercard AgentPay - Mastercard coming in with an OpenAI x Stripe-esque flow
Mastercard announced AgentPay in early 2025, and at first glance, it looks similar to the OpenAI x Stripe protocol for agentic payments using tokenized credentials.
Extends Mastercard’s existing tokenization infrastructure into the agent space
Introduces Agentic Tokens, which are essentially tokens with embedded agent identity and authorization. When the agent is invoked, it passes the checkout context as a single-use token, usually built on an already-tokenized card. The token carries constraints (merchant, basket size, and so on) so that it can be used ONLY for that specific payment. The network / issuer then authorizes the payment, subject to the risk rules that are also running; if risk demands it, a final human step can come in. Four things are happening here:
✅ Card authentication: It’s a stored card that has been tokenized. This happens even in payments without an agent in the loop
✅ Standard risk authentication: Evaluates each transaction based on past trends and patterns. This also happens today
👉 Agent authentication (this is new): As far as I understand, this is registered beforehand by a merchant / platform that allows agentic payments. So it would be the merchant agent ID that is being authenticated here.
👉 Transaction authentication (this is new): Happens via the single-use, contextual Agentic Token
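The two new checks can be sketched roughly as below. To be clear, this is not Mastercard’s API: `AgenticToken`, `REGISTERED_AGENTS`, `authorize`, and all field names are hypothetical, and the card and risk checks that already exist today are left out.

```python
from dataclasses import dataclass


@dataclass
class AgenticToken:
    """Single-use token scoped to one checkout (hypothetical fields)."""
    agent_id: str
    merchant_id: str
    max_amount_cents: int
    used: bool = False


# Hypothetical registry of agents that were registered beforehand.
REGISTERED_AGENTS = {"agent-123"}


def authorize(token: AgenticToken, merchant_id: str, amount_cents: int) -> str:
    """Apply agent authentication, then transaction authentication."""
    if token.agent_id not in REGISTERED_AGENTS:
        return "declined: unknown agent"
    if token.used:
        return "declined: token already used"
    if merchant_id != token.merchant_id or amount_cents > token.max_amount_cents:
        return "declined: outside token context"
    token.used = True  # single use: a second attempt will be declined
    return "authorized"


token = AgenticToken("agent-123", "merchant-A", 2500)
print(authorize(token, "merchant-A", 2000))  # authorized
print(authorize(token, "merchant-A", 2000))  # declined: token already used
```

Because the token is bound to one merchant and one basket, a leaked or replayed token is of little use anywhere else.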
The distinction between Mastercard AgentPay and Visa TAP
Visa TAP is the equivalent of giving an agent approval to transact on your behalf. In a nutshell, imagine you are giving the agent your credit card and allowing it to transact on your behalf, within defined boundaries. Here, Visa requires the user to pre-authorize the agent to act on the user’s behalf. AgentPay, by contrast, scopes each payment through a single-use, contextual token.
That’s not to say 2FA will not be required: in India and SEA regions, 2FA is centrally mandated, so the final step will require user intervention
Even with AgentPay, the agent may still need to trigger biometric authentication or PIN entry to complete the payment. This isn’t a bug, it’s by design. Mastercard is acknowledging that fully invisible payments may not be what users want. So AgentPay sits in the autonomous/agentic camp, but with intelligent friction baked in.
But with OpenAI’s Apps SDK allowing native embedding of authentication flows directly within the LLM interface, this friction becomes almost invisible to the agent while remaining visible to the user. The agent initiates, and the user authorizes via biometrics or passkeys, seamlessly. That’s the sweet spot. And that seems to be EXACTLY the way some payment aggregators are thinking about it.
Update: 17th Feb 2025: At the AI Summit held in Delhi, India: Cashfree launched Cashfree here, a way for users to execute UPI and card payments natively within the LLM chat
My hunch is, especially in regions where 2FA is mandated, this is the best of both worlds. It prevents friction from redirection by handling the payment within the same window, but at the same time, it keeps enough friction for the user to have ‘trust’ that the payment is being authenticated the right way.
The Real Infrastructure Story: Google’s Universal Commerce Protocol (UCP)
What is interesting is how Google is approaching this. Everything is being looked at through the lens of simplifying the agent experience. You can check out the developer documentation here.
While protocols like AP2, ACP, TAP, and AgentPay define how agents authenticate and move money, they all face the same underlying problem: every payment provider speaks a different language. MCP reduced integration complexity for both merchants and LLMs by giving them a standard way to integrate with each other. And now, with MCP gateways, merchants can integrate once with the gateway instead of integrating with each LLM’s MCP. This also gave merchants a standardized way of interacting with LLM chats to share things like products and pricing.
But the problem of standardizing states across payments still exists
What I mean by that is that different payment providers have different ways of communicating the same state. Some examples:
✅ Razorpay uses states like "created", "authorized", "captured".
✅ Stripe uses "requires_action", "succeeded".
For an AI agent trying to complete a checkout across multiple merchants using different payment providers, this variability is a nightmare. The agent has to:
Learn different error codes
Handle provider-specific edge cases
Interpret different status workflows
Manage different retry logic
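To make that burden concrete, here is a sketch of the kind of per-provider lookup an agent would have to maintain today. The status strings are only the ones mentioned above (real providers have more), and the actions they map to, along with the `next_action` helper, are my own illustrative assumptions.

```python
# Per-provider status tables the agent must maintain without a standard.
# Only the states named in this article are listed; the actions are illustrative.
PROVIDER_STATUS_ACTIONS = {
    "razorpay": {
        "created": "wait",
        "authorized": "capture",
        "captured": "done",
    },
    "stripe": {
        "requires_action": "authenticate",
        "succeeded": "done",
    },
}


def next_action(provider: str, status: str) -> str:
    """Look up what the agent should do for a provider-specific status."""
    try:
        return PROVIDER_STATUS_ACTIONS[provider][status]
    except KeyError:
        # An unknown provider or status is exactly where agents break.
        return "unknown"


print(next_action("razorpay", "authorized"))     # capture
print(next_action("stripe", "requires_action"))  # authenticate
print(next_action("adyen", "some_status"))       # unknown: no table for this provider yet
```

Every new provider means another table, and every table is another place for the agent to misread a state.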
And that is exactly what Google’s UCP is trying to solve: abstracting away the complexity that comes when agents have to deal with the multiple payment providers that merchants use.
That is what UCP is. It doesn’t move money. It doesn’t authenticate users. And it doesn’t compete with Stripe or Adyen.
Instead, UCP is a standardization layer that sits on top of payment aggregators and methods, creating a universal language for agents to interact with checkout flows. Just as MCP was a standardization layer that sat between LLMs and merchants, UCP is a standardization layer that sits between the agent and the payment providers. Think of it this way: AP2 is the trust/mandate layer, while UCP is the distribution/standardization layer sitting on checkout. Let’s walk through an example.
Without UCP
Agent initiates payment at Merchant A (uses Razorpay)
Agent must understand: `payment.status = "authorized"` → proceed to capture
Agent initiates payment at Merchant B (uses Stripe)
Agent must understand: `payment_intent.status = "requires_action"` → trigger 3DS flow
Agent initiates payment at Merchant C (uses Adyen)
Agent must learn another set of states and error codes
❌ Every new merchant integration = learning a new payment provider’s quirks.
With UCP:
Agent calls Merchant A’s UCP endpoint → gets standardized response: `status: incomplete, action: requires_authentication`
Agent calls Merchant B’s UCP endpoint → gets same standardized response: `status: incomplete, action: requires_authentication`
Agent calls Merchant C’s UCP endpoint → same structure, same logic
✅ The agent only needs to understand one set of states, regardless of what’s happening underneath.
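On the agent side, that single set of states could be handled with one function, whichever merchant is responding. This is a sketch under assumptions: the `status` and `action` field names follow the example responses above rather than the UCP specification, and `handle_checkout_response` is a hypothetical helper.

```python
def handle_checkout_response(response: dict) -> str:
    """Decide the agent's next step from a standardized checkout response."""
    status = response.get("status")
    if status == "completed":
        return "confirm order to user"
    if status == "processing":
        return "poll again later"
    if status == "incomplete":
        if response.get("action") == "requires_authentication":
            return "ask user to authenticate"
        return "ask user for missing details"
    return "escalate to user"  # unknown status: fail safe rather than guess


# The same handler works for every merchant, whatever provider sits underneath.
merchant_responses = {
    "Merchant A": {"status": "incomplete", "action": "requires_authentication"},
    "Merchant B": {"status": "incomplete", "action": "requires_authentication"},
    "Merchant C": {"status": "processing"},
}
for merchant, response in merchant_responses.items():
    print(merchant, "->", handle_checkout_response(response))
```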
Sidebar: The difference between MCP and UCP is the layer they’re standardizing
MCP (Model Context Protocol):
Enables LLMs to call merchant APIs and access catalogue data, which essentially allows for agent-to-merchant communication. Example: product browsing, cart creation, order initiation, and in some cases payments as well.
UCP (Universal Commerce Protocol):
Standardizes checkout/payment states across providers. Even if the MCP allows the LLM agent to pull merchant details, the merchant will still have multiple payment providers. Some type of payment orchestration will happen at the time when checkout is invoked. The agent will have to handle different responses from different methods, and different payment providers themselves. That is where UCP comes in. It gives a standardized way for the agent to communicate across different payment providers.
Example: “Complete payment” → UCP returns a standardized state regardless of the underlying payment aggregator (PA)
How MCP + UCP + AP2 work together
Let’s walk through a real transaction:
1. Agent initiates payment: User says “pay now” in ChatGPT
2. Agent calls merchant’s UCP endpoint: “POST /ucp/checkout” with order details
3. Merchant’s UCP layer determines payment method: Could be AP2 (mandate), Stripe (delegated token), or stored card
4. If using AP2:
   - UCP calls AP2 to check standing mandate
   - AP2 validates: Does this match user’s pre-approved spending rules?
   - AP2 authorizes consent across stakeholders
   - AP2 returns response to UCP
5. UCP translates AP2 response into standard format:
   - If approved: {status: “processing”, next_action: null}
   - If needs auth: {status: “incomplete”, next_action: “requires_authentication”}
6. UCP returns to agent: Agent sees standardized response, acts accordingly
The agent never knows if the backend used AP2, Stripe, or Razorpay, and it doesn’t need to know. It just knows the payment is “processing” or “incomplete”.
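The AP2 branch of this flow can be sketched as two small functions: one standing in for the mandate check, and one standing in for the merchant’s UCP layer that translates the result. Everything here is hypothetical (function names, the mandate fields, amounts in cents); only the two response shapes are taken from the walkthrough above.

```python
def check_mandate(mandate: dict, order: dict) -> bool:
    """Stand-in for the AP2 step: does the order fit the user's pre-approved rules?"""
    return (
        order["merchant"] in mandate["allowed_merchants"]
        and order["amount_cents"] <= mandate["max_amount_cents"]
    )


def ucp_checkout(order: dict, mandate: dict) -> dict:
    """Stand-in for the merchant's UCP layer: translate the backend result."""
    if check_mandate(mandate, order):
        return {"status": "processing", "next_action": None}
    return {"status": "incomplete", "next_action": "requires_authentication"}


mandate = {"allowed_merchants": {"grocer"}, "max_amount_cents": 5000}

# Within the mandate: payment proceeds without bothering the user.
print(ucp_checkout({"merchant": "grocer", "amount_cents": 3200}, mandate))
# Outside the mandate: the agent is told to bring the user back in.
print(ucp_checkout({"merchant": "grocer", "amount_cents": 9000}, mandate))
```

The agent only ever sees the returned dictionary, so swapping the mandate check for a delegated token or a stored card would not change the agent’s code.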
Why this matters: Google is building the pipes for agent-first commerce
What Google has done with AP2 + UCP is create a two layer infrastructure:
Layer 1 (AP2): The trust layer, allowing for somewhat autonomous payments
Stores and validates mandates
Provides authorization framework
Creates new authentication paradigm
Layer 2 (UCP): The distribution layer, allowing for easy agent integration
Standardizes agent-facing APIs
Abstracts provider complexity
Enables one-integration-many-merchants model
This is the plumbing that enables AI-assisted commerce. Compare this to the current state, where every PA has different APIs, every merchant has different checkout flows, and every agent integration is custom. That becomes a problem in a world where agentic commerce comes first.
Protocols will help standardize the ecosystem, enabling faster scale up of agent assisted actions
The first wave of agentic payments focused on protocols: how to authenticate, how to authorize, how to move money.
The second wave is about infrastructure - how to standardize, how to scale, how to make agentic commerce viable for 10,000 merchants, not 10.
Google’s UCP is an important development in this space because it’s not trying to own the payment flow, but instead it’s trying to make everyone else’s flow accessible to agents. If UCP (or something like it) becomes the standard, we’ll look back at this as the moment agentic commerce became possible rather than just plausible.
But let’s not forget: the most elegant infrastructure in the world can’t force users to adopt fully autonomous payments if they don’t want them. The future is agentic, for sure. The only question is how much of it will be autonomous and how much will be intelligently assisted. My money’s on the latter.
While we’re on the point of agent / agent-assisted work, this same “lack of protocol” is what is missing in today’s workflows. I’ll explain through an example that probably hits closer to home.
I’m sure we’ve all been reading about it: if you don’t adopt AI, you’ll get left behind. But this works in 2 ways:
At a personal level, adopting AI in your workflows. Maybe for building personal projects, apps, automating certain workflows (like note taking, research, and so on, even PPTs etc). I’d say most folks are getting up to speed here - almost everyone is using some sort of AI tool to assist them, especially in repetitive tasks. Most knowledge workers are here ✅
At a professional level: This is where we’re falling short, because we’re mixing up personal workflows with professional ones. Even at work, I use AI at a personal level, for my own work, research, and deliverables. But using it at a professional level requires these “protocols,” for lack of a better word, to be set up at an organization level. Most companies are stuck here. ❌
The issues include: messy organizational data (product names inconsistent across systems), no standardized protocols (each department has different processes), edge cases everywhere (manual workarounds that AI can’t handle), high stakes (mistakes affect customers, revenue, compliance), and system dependencies (needs to talk to 10 different tools correctly).
Let’s take an analytics example. I want my AI agent to be able to pull data, clean it, sort it, and then give me the analysis I want. In an ideal world, the employee asks the AI agent for an analysis of Q3 revenues, and the AI agent returns the answer.
But there is a lot that is required for this to work. You need:
✅ Databases set up properly: Product categories standardized (not “Electronics” in one DB, “Consumer Electronics” in another), and revenue tables need to have consistent schemas, consistent date and revenue formats
✅ Clear data dictionary: AI knows which table to query, which fields mean what, and how to JOIN tables correctly
❌ But what actually exists: Data in 3 different systems, product names spelled differently, categories changed halfway through Q2, and data scattered across Excel, SQL, and Salesforce
Result: Garbage in, garbage out. AI returns wrong numbers, employee loses trust, goes back to manual work.
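A tiny sketch of what “standardized categories” buys you: once every system’s label maps to one canonical name, the revenue sum is trivial, and unmapped labels fail loudly instead of silently skewing the number. The `CANONICAL_CATEGORY` mapping, the row format, and the revenue figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data dictionary: labels used by each system -> one canonical category.
CANONICAL_CATEGORY = {
    "electronics": "Electronics",
    "consumer electronics": "Electronics",
}


def revenue_by_category(rows):
    """Sum revenue per canonical category; reject labels the dictionary doesn't know."""
    totals = defaultdict(float)
    for row in rows:
        key = row["category"].strip().lower()
        if key not in CANONICAL_CATEGORY:
            raise ValueError(f"Unmapped category: {row['category']!r}")
        totals[CANONICAL_CATEGORY[key]] += row["revenue"]
    return dict(totals)


rows = [
    {"category": "Electronics", "revenue": 1200.0},          # from one database
    {"category": "Consumer Electronics", "revenue": 800.0},  # from another
]
print(revenue_by_category(rows))  # {'Electronics': 2000.0}
```

Without the mapping, the same query would report two categories with partial totals, which is the “wrong numbers” failure described above.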
When a company claims they’re building AI-first, I don’t ask about their LLM choice or their agent capabilities. I first ask to understand their workflow.
I ask: Show me your database schema. Walk me through your customer onboarding workflow. Explain how code moves from development to production. Because AI’s value isn’t in the interface, it’s embedded in the workflow. AI in the professional space is not about buying employees AI subscriptions or “building a personal project using Cursor,” which is a low-stakes and relatively un-messy task.
The real test of an “AI first” company is in their plumbing.
If your product taxonomy is inconsistent, your AI will return wrong answers. If your onboarding has manual workarounds, your agent will break on edge cases. If your deployment pipeline requires three Slack approvals, your AI can’t ship code autonomously. The companies actually building AI-first aren’t the ones adding chatbots to their UI, they’re the ones who rebuilt their foundations to eliminate complexity at every layer. Clean databases, standardized processes, documented edge cases and automated pipelines are not “jargon” anymore; they are what is ACTUALLY required to make this AI stuff work.
Companies skip this because it’s boring infra work, and a painful process that requires cross-functional buy-in. It has no immediate impact, and it can take months if not years. And if the key tracker for companies is “revenue” then it’s hard to attribute anything to this immediately. But without it, AI will only work for personal productivity, never professional automation.
The future is agentic, the only question is how much
Here’s where I land after watching this space evolve:
Agent initiated payments are inevitable: Shopping in LLMs provides exponential value through personalization. Payment initiation needs to follow.
Infrastructure standardization will happen. UCP-like layers will emerge across the ecosystem because the current complexity is unsustainable. And the companies that are truly able to optimize will build system foundations first, keeping an eye on ease of automation.
Trust layers will be required. Whether AP2, AgentPay, or something else, the ecosystem needs ways to verify agent authority, and it also needs to recognize the authority of said agents, which probably requires stakeholder buy-in.
You’ll also need a way to encrypt this data - agents should ideally be able to execute without being privy to private information. If there are agents who are doing this on behalf of the customer, they will have customer details that ideally should not be moved.
What’s uncertain:
How autonomous will payments actually be? My bet: less than the hype suggests. Friction will remain a feature. OpenAI’s Apps SDK allowing embedded authentication flows will probably be a big deal, since it will open up the floor for native experiences while maintaining data security sanity.
Will mandate based approaches scale? Setting up mandates per merchant is cumbersome. Category-based mandates require infrastructure that doesn’t exist yet.
Will banking systems adapt? This isn’t just a fintech problem. Core banking systems need to recognize AP2 mandates, AgentPay tokens, and delegated credentials as valid. That’s a multi year effort.
The likely outcome: I think we’ll see a hybrid model emerge:
High frequency, low value transactions: Mandate-based (AP2, UPI Reserve Pay) for repeat merchants like groceries, food delivery
Medium value, occasional purchases: Autonomous with 2FA via embedded authentication (Stripe ACP, AgentPay)
High value, infrequent purchases: This will have a human element to it, with explicit confirmation. High chance that this remains as is, and never becomes agentic.
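The hybrid model above is essentially a routing rule. Here is a rough sketch of it; the thresholds are placeholder numbers I picked for illustration, not values from any protocol, and `choose_flow` is a hypothetical helper.

```python
def choose_flow(amount: float, repeat_merchant: bool,
                low_value_limit: float = 1_000,
                high_value_limit: float = 20_000) -> str:
    """Pick a payment flow from transaction value and merchant familiarity.

    The limits are placeholder numbers, not values from any protocol.
    """
    if amount >= high_value_limit:
        return "human confirmation"
    if amount <= low_value_limit and repeat_merchant:
        return "mandate"
    return "agentic with embedded 2FA"


print(choose_flow(300, repeat_merchant=True))     # mandate
print(choose_flow(5_000, repeat_merchant=False))  # agentic with embedded 2FA
print(choose_flow(50_000, repeat_merchant=True))  # human confirmation
```

In practice the inputs would be richer than amount and merchant history, but the shape stays the same: the riskier the transaction, the more of the human comes back into the loop.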
A note of caution: while I’m kicked about all these models coming in, what is key to remember is that authorizing agent access to personal details comes with a level of risk
You’re giving the agent authority to act on your behalf. And even though you’re defining boundaries clearly, what if a “bug” in the code gives the agent authority over and above what it has been granted, or the system malfunctions and allows the agent to go beyond its limits?
Second, the agent does have access to valuable information: maybe not payment details, which are encrypted, but behavioral data, which is valuable in its own right.
So while we’re solving for customer experience and ease of enablement, we also need to keep in mind that these must go hand in hand with data privacy protocols and stringent risk measures.

Love this. The UCP standardization story is critical, but I think we're all missing the bigger insight buried in your conclusion:
"Friction is a feature, not a flaw."
Here's the contrarian take: The protocols that win in India won't be the most autonomous - they'll be the most *contextually* autonomous.
Fashion can get away with low-friction because wrong purchase = return it. But try that in travel (wrong flight = disaster), healthcare (wrong medicine = harm), or finance (wrong investment = broke). In these categories, intelligent friction *builds* trust, not destroys it.
So the protocol question isn't "mandate vs delegated vs real-time" - it's "which protocol allows for dynamic friction based on transaction risk?"
Example from travel:
- ₹2,000 bus ticket for frequent user → Reserve Pay, zero friction
- ₹25,000 family flight booking, first-time international → Agentic + 2FA, high friction
- ₹8,000 emergency same-day flight → Conversational guidance + embedded auth, medium friction
The magic is in the state machine: UCP gives you standardized *states*, but someone needs to build the *state transition logic* that's category-aware and context-aware.
That's the moat. Not which payment protocol you use, but whether you understand when to remove friction vs when to add it.
The fashion AI assistants proved conversational commerce works. The next wave won't be won by whoever adds AI to more categories - it'll be won by whoever understands that different categories need fundamentally different autonomy models.
Question: Have you seen anyone building this kind of dynamic friction logic? Or are we still in the "make everything frictionless" phase?