Sunday, May 24, 2026
Rethinking the Agent Harness – O’Reilly

We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in half an hour, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.

Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the full episode to find out why you should be building your own agent and why John believes eventually there will be no internet for humans.

AI’s security problem is now a policy problem

You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a restricted program giving monitored access to a small group of trusted partners for defensive patching.

That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.

The compute race is getting stranger

Anthropic leased xAI’s entire Colossus 1 supercluster in Memphis: more than 200,000 GPUs and 300 megawatts of power. A month before that deal, Anthropic expanded its agreement with Google and Broadcom for 3.5 gigawatts of capacity coming online in 2027. For context, that’s more than 10 times the power of the Colossus 1 deal, in a single contract. After this episode aired, Anthropic announced that the xAI deal has been expanded to Colossus 2 as well.

Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.
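The comparisons in this section are easy to sanity-check. The short sketch below reruns the arithmetic using the figures quoted here; the Manhattan land area (about 14,600 acres) and the one-gigawatt-per-reactor figure are outside assumptions used only for illustration, not numbers from the episode.

```python
# Back-of-the-envelope checks for the figures quoted in this section.
# Assumptions (not from the episode): Manhattan's land area is taken as
# roughly 14,600 acres, and one commercial reactor as roughly 1 GW.

COLOSSUS_1_MW = 300          # Colossus 1 lease, from the text
GOOGLE_BROADCOM_MW = 3_500   # 3.5 GW agreement, from the text
STRATOS_ACRES = 40_000       # Stratos footprint, from the text
STRATOS_MW = 9_000           # 9 GW at full buildout, from the text

MANHATTAN_ACRES = 14_600     # assumed approximate land area
REACTOR_MW = 1_000           # assumed typical reactor output

power_ratio = GOOGLE_BROADCOM_MW / COLOSSUS_1_MW
area_ratio = STRATOS_ACRES / MANHATTAN_ACRES
reactor_equivalents = STRATOS_MW / REACTOR_MW

print(f"Google/Broadcom vs. Colossus 1 power: {power_ratio:.1f}x")
print(f"Stratos footprint vs. Manhattan: {area_ratio:.1f}x")
print(f"Stratos power in reactor equivalents: {reactor_equivalents:.0f}")
```

Under those assumptions the Google and Broadcom agreement comes out a little under 12 times the Colossus 1 lease, and the Stratos site a little under three Manhattans.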

Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.

The harness matters more than the model

John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that occurred in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.

The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior written in plain English and better understand it. As John explained,

Rather than building a bespoke agent. . ., I just built something that was just the agent harness, the agent, and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.

The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s going on and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people who know best how to change it, the experts.

That’s a different relationship between the experts and the tool than anything a wrapped commercial product provides.
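To make the idea of a skills-driven agent concrete, here is a minimal sketch of what “coded in English” could look like. It is not John’s implementation: the `Skill` and `Harness` classes, the example skills, and the `echo_model` stand-in for a real LLM call are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    """A plain-English instruction block a domain expert can read and edit."""
    name: str
    instructions: str


@dataclass
class Harness:
    """Hypothetical minimal harness: folds skills into a prompt, calls a model."""
    model: Callable[[str], str]  # any function mapping a prompt to a reply
    skills: list[Skill] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Each skill becomes a readable section ahead of the task itself.
        sections = [f"## Skill: {s.name}\n{s.instructions}" for s in self.skills]
        return "\n\n".join(sections + [f"## Task\n{task}"])

    def run(self, task: str) -> str:
        return self.model(self.build_prompt(task))


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call; reports how many skills it was given."""
    return f"received {prompt.count('## Skill:')} skill(s)"


harness = Harness(
    model=echo_model,
    skills=[
        Skill("triage", "Ask for the account ID before anything else."),
        Skill("escalation", "Hand off to a human if the customer mentions a refund."),
    ],
)
print(harness.run("A customer reports a billing error."))
```

The point of the design is that the behavior lives in the `instructions` strings rather than in the Python: an expert who spots a problem edits a sentence, and the harness code stays unchanged.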

As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.

John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
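Since no such protocol exists, any code can only gesture at the shape of the idea. The toy sketch below shows one way an agent might keep its context while swapping in skills offered by each environment it enters; every name in it (`Environment`, `Agent`, `enter`, and the skill text) is a hypothetical assumption, not part of John’s demo or any real standard.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical environment that advertises its own skills to visiting agents."""
    name: str
    skills: dict[str, str]  # skill name -> plain-English instructions


@dataclass
class Agent:
    """Hypothetical agent that carries context but swaps skills per environment."""
    context: list[str] = field(default_factory=list)
    skills: dict[str, str] = field(default_factory=dict)

    def enter(self, env: Environment) -> None:
        # Replace the environment-specific skills; the carried context is kept.
        self.skills = dict(env.skills)
        self.context.append(f"entered {env.name}")


obsidian = Environment("obsidian", {"edit_note": "Append findings to the open note."})
wikipedia = Environment("wikipedia", {"lookup": "Find the article and summarize it."})

agent = Agent()
for env in (obsidian, wikipedia, obsidian):
    agent.enter(env)

print(agent.context)         # history accumulated across all three moves
print(sorted(agent.skills))  # only the skills of the current environment
```

A real protocol would also have to settle questions this sketch ignores, such as how environments are discovered and how much an agent should trust the skills it is handed.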

What’s next

Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live. Register here.

We’ll continue to share full episodes and publish our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.
