Build, Buy, or Partner: The AI Agent Decision Just Changed

For about a year, the hardest part of shipping an AI agent was everything that wasn't the model. The sandbox it runs in. The memory that survives across sessions. The eval harness that tells you whether the thing is getting better or worse. The orchestration that lets one agent hand work to another without the whole system collapsing into a loop. That was the work. For a lot of teams, that was the moat.

In roughly six weeks this spring, three vendors turned most of it into a line item on a pricing page.

Anthropic's Managed Agents, which launched in April, picked up three new capabilities in a single May update: cross-session memory that lets agents review past sessions and self-improve, rubric-based grading that scores an agent's output against success criteria you define, and lead-agent orchestration that decomposes a task and farms it out to specialized subagents. Two weeks later, Google used I/O 2026 to ship a Managed Agents API that spins up an agent in a single call, inside an isolated, ephemeral Linux environment, with tool use, code execution, file handling, and web browsing already wired in. OpenAI had pushed its Agents SDK in the same direction the month before.

Sandboxing, long-running sessions, persistent memory, evals, orchestration: the things that used to take a quarter to build and another quarter to harden are now commodities you rent by the token. That's good news. It also quietly rewrote a decision most teams are about to get wrong, because most teams frame it with one option missing.

The Frame Is Broken Before You Start

"Build versus buy" sounds like a coin flip. It never was, and it especially isn't now.

The framing has two problems. First, it pretends the two paths are pure. In practice almost nobody builds everything or buys everything. They buy a model, build a workflow, buy an observability tool, build a set of evals, and end up somewhere in the middle whether they planned to or not. The interesting question was never "build or buy." It was "which parts do we build, and which parts do we rent."

Second, and this is the bigger one, the frame leaves out a third option that usually determines whether the other two go well: partner. Not "outsource the whole thing to an agency." Partner in the sense of bringing in someone who has already run these failures, to help you draw the build-versus-buy line in the right place and stand up the parts that are easy to get wrong. The teams that skip this leg are the ones who discover, eighteen months and a few hundred thousand dollars later, that they custom-built a sandbox three vendors now ship as a standard feature.

So the real decision has three legs. The managed-agent launches this spring didn't simplify it. They sharpened it, by moving the line between what's worth building and what's worth renting.

What Just Became a Commodity, and What Didn't

Before you can decide what to build, you have to be clear-eyed about what's still yours to build.

Here is what the managed-agent platforms now do for you, off the shelf: the sandbox, the long-running sessions, the persistent memory, the eval rubrics, and the lead-agent orchestration. If your engineering plan still has tickets for any of those as net-new infrastructure, you are planning to pay to rebuild a commodity. That work no longer differentiates you. It's plumbing, and it just got cheap.

Here is what is still yours, and always was: the specific workflow you're automating and the domain knowledge baked into it. Your data, and the access controls around it. The evals that encode your definition of a good answer, which is not the same as a generic grader's. The integration with the systems your business actually runs on. And the judgment calls about when an agent should stop and ask a human, which is the part no platform can decide for you because it depends on your risk tolerance and your liability, not theirs.

Notice that the vendors are happy to host both. They'll run the commodity layer, and they'll also store your memory, your eval rubrics, and your orchestration graph. That second category is where the lock-in lives, and we'll come back to it. First, the three options themselves.

Buy: Fast, Cheap to Start, and Quietly Sticky

Buying means a managed platform or a pre-built vertical agent does the work, and you wire it into your business. The case for it is strong and getting stronger. You skip the infrastructure entirely. Time to first working system drops from months to days. The vendor handles scaling, security patches, and the model upgrades that would otherwise be your migration problem. For a generic capability that isn't your competitive advantage, buying is almost always the right call, and the data backs it up. As I wrote in the pilot-to-production piece, buying or licensing an existing AI solution succeeds far more often than building one from scratch. The figure that gets cited is roughly 67% versus 22%, and while the exact numbers move around, the direction has been consistent for years.

The costs of buying are real but easy to defer. You inherit the vendor's roadmap, their pricing changes, and their outages. Your context and accumulated learning live in their platform. And you have to actually verify that what you bought is an agent and not a wrapper. Gartner spent this spring warning about "agent washing," the practice of relabeling a chatbot or a rules engine as agentic AI to ride the hype. A large share of what's marketed as an autonomous agent right now is neither autonomous nor an agent. Buy when the problem is generic, the vendor is real, and the thing you're buying is not the thing that makes you different.

Build: Control, No Lock-In, and a Pile of Work Three Vendors Just Did for You

Building means you own the stack. The case for it is also real. You get exact fit to your workflow, no dependency on a vendor's priorities, full control over your data, and no surprise pricing letter. For the slice of your system that genuinely is your differentiation, this is the right answer. Nobody is going to sell you a better version of the thing that makes your business yours.

The trap is building the parts that are no longer worth building. Every hour spent hand-rolling a sandbox, a memory store, or an orchestration loop in 2026 is an hour spent rebuilding what Anthropic, Google, and OpenAI now rent for cents. I have watched a team spend two quarters building a sandboxed execution layer with its own memory store, then see the same capability ship as a checkbox in a managed platform in the very quarter they went live. Worse, you inherit the operating burden that comes with it: the eval labor, the on-call rotation, and the token economics. By Anthropic's own published numbers, a single agent uses roughly 4 times the tokens of a plain chat interaction and a multi-agent system roughly 15 times, and when you own the whole stack you own that bill and the work of optimizing it. Build the thin layer that's yours. Rent the layer underneath it. A custom agent that's mostly your workflow logic on top of a managed harness is a different financial proposition than a from-scratch platform, and it's the one that survives contact with production.
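To see why owning the token bill matters, here is some back-of-the-envelope arithmetic. Every input is a hypothetical placeholder (tokens per task, monthly volume, blended price per million tokens); the only borrowed figures are the rough multipliers relative to a plain chat interaction, about 4x for a single agent and about 15x for a multi-agent system, as Anthropic has reported for its own multi-agent research system. Treat it as a sketch for plugging in your own numbers, not a pricing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    chat_tokens_per_task: int       # hypothetical: tokens a plain chat answer would use
    tasks_per_month: int            # hypothetical monthly volume
    usd_per_million_tokens: float   # hypothetical blended price, not a real quote


# Token multipliers relative to a plain chat interaction (Anthropic's reported ~4x and ~15x).
SINGLE_AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def monthly_cost(w: Workload, multiplier: float) -> float:
    """Estimated monthly spend in USD for a given token multiplier."""
    tokens = w.chat_tokens_per_task * multiplier * w.tasks_per_month
    return tokens / 1_000_000 * w.usd_per_million_tokens


w = Workload(chat_tokens_per_task=5_000, tasks_per_month=100_000, usd_per_million_tokens=10.0)
single = monthly_cost(w, SINGLE_AGENT_MULTIPLIER)
multi = monthly_cost(w, MULTI_AGENT_MULTIPLIER)
print(f"single-agent: ${single:,.0f} per month")
print(f"multi-agent:  ${multi:,.0f} per month ({multi / single:.2f}x)")
```

With these made-up inputs the sketch prints $20,000 versus $75,000 a month, a 3.75x gap. Because cost is linear in volume, that gap grows in absolute dollars as usage grows, and on a stack you own, it is your team's job to close it.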

Partner: The Leg Nobody Puts on the Slide

This is the option the build-versus-buy framing erases, and it's usually the one that decides the outcome.

Partnering doesn't mean handing the project to an agency and hoping. It means bringing in someone who has already shipped agents into production to make the build-versus-buy calls alongside your team: which capabilities to rent, which thin layer to build, how to wire your evals so they measure what your business cares about, and how to stand up observability before launch instead of doing forensics after. The goal of a good partner is to make your team better at this, not dependent on the partner. You walk away owning the system and the judgment, not renting both.

This is the work I spend most of my time on. The point of honest build-versus-buy guidance is that the answer changes with the situation. When it's "buy the Anthropic platform and build almost nothing," that's the answer. When it's "this one workflow is your moat, build it properly and rent everything else," that's the answer. A partner with no product to push is the only one who can give you either verdict without a conflict of interest.

The Lock-In the Managed-Agent Era Hands You

There's a specific trap worth naming, because the May launches created it and the marketing won't mention it.

When the harness was something you built, your agent's memory, its evaluation rubrics, and its orchestration graph were yours, sitting in your infrastructure. Now the platforms will gladly hold all three. Anthropic's memory feature has agents accumulating learning across sessions inside the managed platform. The grading rubrics that encode your standards live there too. So does the orchestration logic that defines how your agents coordinate. That's convenient right up until you want to leave, at which point you discover that the most valuable, most you-specific parts of your system are the parts that don't travel.

This is the new switching cost, and it's denominated in the assets that matter most. It doesn't mean don't buy. It means decide, deliberately, which of those assets you're willing to keep inside a vendor's walls and which you want to own and keep portable. Keeping your eval suite and your core context in infrastructure you control, even while renting the execution harness, is the kind of architectural choice that costs nothing to make on day one and a fortune to retrofit on day four hundred. Plan the exit before you walk in the door.
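Here is a minimal sketch of what "own the evals, rent the harness" can look like in code, assuming Python and a vendor-neutral setup. `AgentHarness`, `Criterion`, `RUBRIC`, `score`, `evaluate`, and `StubHarness` are hypothetical names, not any platform's API, and the three rubric checks are toy placeholders for your real standards. The point is the shape: the rubric and the scoring live in your repository, and the only vendor-specific code is a thin adapter that implements `run`.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class AgentHarness(Protocol):
    """Hypothetical adapter interface: one thin class per vendor platform."""
    def run(self, task: str) -> str: ...


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[str, str], bool]  # (task, output) -> passed?


# The rubric lives in your repo as code and data, not inside a vendor's grader.
RUBRIC = [
    Criterion("mentions_order", 2.0, lambda task, out: "order" in out.lower()),
    Criterion("no_refund_promise", 3.0, lambda task, out: "guaranteed refund" not in out.lower()),
    Criterion("concise", 1.0, lambda task, out: len(out.split()) <= 120),
]


def score(task: str, output: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of criteria passed, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(task, output))
    return earned / total


def evaluate(harness: AgentHarness, tasks: list[str], rubric: list[Criterion]) -> float:
    """Mean rubric score across tasks; swap the harness, keep the suite."""
    scores = [score(t, harness.run(t), rubric) for t in tasks]
    return sum(scores) / len(scores)


class StubHarness:
    """Stand-in for a vendor adapter so the sketch runs offline."""
    def run(self, task: str) -> str:
        return f"Looked up your order for: {task}. A specialist will follow up."


tasks = ["late delivery", "damaged item"]
print(evaluate(StubHarness(), tasks, RUBRIC))
```

Switching platforms then means writing one new adapter class and rerunning the same suite, so you can compare the old and new harness on identical criteria before you migrate.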

A Decision You Can Actually Run

Frameworks are useless if you can't apply them on a Tuesday. So here is the version you can take into a planning meeting. For any capability you're deciding on, ask these in order.

Is this our differentiation? If a competitor having the identical capability wouldn't hurt you, it's not your moat. Don't build it. Buy it or rent it and move on.
Is the data sensitive or regulated? If the workflow touches data you can't legally or contractually hand to a third-party platform, that constraint can force a build regardless of the economics. Decide this before you fall in love with a vendor demo.
Do we have eval discipline in-house? If your team can't already tell you, with numbers, whether an agent is performing, then buying a platform won't save you and building one will bury you. This is the gap a partner most often fills first.
How fast is this capability commoditizing? Sandboxing and orchestration were moats a year ago and are commodities today. If a thing is on an obvious path to becoming a platform feature, building it is buying a depreciating asset.
What does exit cost? If you bought, how hard is it to leave? If your memory, evals, and orchestration logic live in one vendor's platform, the answer is "harder than you think." Price the exit before you sign.
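The five questions can also be encoded as a function you run against a list of capabilities. This is a sketch, not a rule: the `Capability` fields and `Verdict` labels are hypothetical, the example capabilities are invented, and it treats the eval-discipline and data questions as gates checked before the others. That is one reading of how they interact with the first question, not the only one.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    BUY = "buy or rent it"
    BUILD = "build the thin layer, on a rented harness where possible"
    PARTNER_FIRST = "get eval discipline in place first, with outside help if needed"


@dataclass(frozen=True)
class Capability:
    name: str
    is_differentiator: bool     # would a competitor having it hurt us?
    data_restricted: bool       # data we can't legally or contractually hand to a platform
    has_eval_discipline: bool   # can we already measure performance with numbers?
    commoditizing_fast: bool    # on an obvious path to becoming a platform feature?
    exit_cost_priced: bool      # have we priced leaving the vendor?


def decide(c: Capability) -> tuple[Verdict, list[str]]:
    """One reading of the five questions; returns a verdict plus the reasons behind it."""
    # Gate 1: without measurement, neither buying nor building can be judged.
    if not c.has_eval_discipline:
        return Verdict.PARTNER_FIRST, ["no way to measure the agent yet"]
    # Gate 2: a data constraint can force a build regardless of economics.
    if c.data_restricted:
        return Verdict.BUILD, ["data can't leave our control"]
    # Not our moat, or about to become a platform feature: rent it.
    if not c.is_differentiator or c.commoditizing_fast:
        notes = [] if c.exit_cost_priced else ["price the exit before signing"]
        return Verdict.BUY, notes
    # What's left is the moat: build the thin layer that's ours.
    return Verdict.BUILD, ["this is the differentiator"]


examples = [
    Capability(name="sandboxed code execution", is_differentiator=False,
               data_restricted=False, has_eval_discipline=True,
               commoditizing_fast=True, exit_cost_priced=False),
    Capability(name="claims-triage workflow", is_differentiator=True,
               data_restricted=False, has_eval_discipline=True,
               commoditizing_fast=False, exit_cost_priced=True),
]
for cap in examples:
    verdict, notes = decide(cap)
    print(f"{cap.name}: {verdict.name} {notes}")
```

Run as written, the sandbox comes out as BUY with a reminder to price the exit, and the hypothetical claims-triage workflow comes out as BUILD. Returning the reasons alongside the verdict is deliberate: the output is something to argue with in the planning meeting, not a substitute for it.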

Most real systems come out of this as a blend of rented harness and custom logic rather than a pure build or a pure buy, and that's the correct outcome, not a compromise. The framework isn't there to force you into one camp. It's there to make sure each individual decision is made on purpose instead of by default.

Where This Leaves You

The managed-agent platforms are genuinely good, and they made a lot of expensive engineering obsolete in the span of a spring. That's worth celebrating, not resisting. The mistake isn't using them. The mistake is letting their existence answer a question they can't answer for you.

Buy the harness. It's a commodity now, and rebuilding it is a vanity project. Build the thin layer that's actually your business, and build it with the engineering discipline any production system deserves. And get an honest second opinion on where that line falls, before you commit a budget to either side. The hard part of agents used to be the infrastructure, and the infrastructure is now a commodity. The real work is knowing what to rent, what to own, and what to keep portable, and that's a judgment call worth getting right the first time.

If you're staring at that decision and want a read from someone with no platform to sell you, that's the conversation I'm built for. The line between rent and own is different for every business. Drawing it well is most of the work.