
Cat Wu leads product for Claude Code and Cowork at Anthropic, so she’s well-versed in building reliable, interpretable, and steerable AI systems. And since 90% of Anthropic’s code is now written by Claude Code, she’s also deeply familiar with fitting these tools into routine day-to-day work. Last month, Cat joined Addy Osmani at AI Codecon for a fireside chat on the future of agentic coding and, equally important, agentic code review; how Anthropic actually uses the tools it’s building; and what skills matter now for developers.
The feedback loop is itself a product
Boris Cherny originally built Claude Code as a side project to test Anthropic’s APIs. Then he shared the tool in a notebook, and within two months the entire company was using it. That organic growth, Cat said, was part of what convinced the team it was worth releasing externally.
But what really made that internal adoption visible was the response on Anthropic’s internal “dogfooding” Slack channel. The Claude Code channel gets a new message every 5 to 10 minutes around the clock, and this feedback directly and immediately informs the product experience. Cat described it this way:
We hire for people who love polishing the user experience. And so a lot of our engineers actually live in this channel and notice when there’s issues with new features that they’ve worked on, and they proactively lay out the fixes.
The team ships new versions of Claude Code to internal users many times a day. The feedback loop is tight enough that it functions as a continuous integration system for product quality, not just code quality.
Cat told Addy how she once accidentally introduced a small interaction bug between prompts and auto-suggestions. But by the time she started working on a solution, she found another team member had already beaten her to it. It turns out he had set up a scheduled job in Claude Code to scan the feedback channel for anything that hadn’t been responded to in 24 hours and open a PR for it. Since Cat hadn’t gotten to it yet (whoops!), her teammate’s Claude saw the unaddressed issue and fixed it for her. And Cat only found out when “[her own] Claude noticed that his Claude had already landed a change.”
The infrastructure for rapid improvement, in other words, is now partly automated. The agents are writing the code, then monitoring the feedback and closing the loop.
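The internals of that scheduled job aren’t public, but the pattern is simple: poll the channel, find reports nobody has answered within the window, and hand each one to an agent. Here’s a minimal sketch under stated assumptions — `Message`, `find_stale_reports`, and `dispatch_agent` are all hypothetical names standing in for whatever Slack and Claude Code plumbing the real job used:

```python
# Hypothetical sketch of a scheduled feedback-triage job. The Message shape,
# the 24-hour staleness window, and dispatch_agent() are illustrative only.

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)

@dataclass(frozen=True)
class Message:
    text: str
    posted_at: datetime
    has_reply: bool  # whether anyone on the team has responded yet

def find_stale_reports(messages, now):
    """Return reports with no response after the staleness window."""
    return [m for m in messages
            if not m.has_reply and now - m.posted_at > STALE_AFTER]

def dispatch_agent(report):
    # Placeholder: the real job would ask a coding agent to reproduce
    # the issue and open a PR with a fix.
    return f"PR opened for: {report.text}"

if __name__ == "__main__":
    now = datetime(2026, 1, 2, tzinfo=timezone.utc)
    channel = [
        Message("prompt/auto-suggest interaction bug",
                now - timedelta(hours=30), has_reply=False),
        Message("typo in help text", now - timedelta(hours=30), has_reply=True),
        Message("fresh report", now - timedelta(hours=1), has_reply=False),
    ]
    for report in find_stale_reports(channel, now):
        print(dispatch_agent(report))
```

Run on a schedule (cron, or a Claude Code scheduled task), this is all the orchestration the story requires; the agent does the actual debugging.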
The bottleneck has shifted to review
There’s no question that AI-assisted coding has created a boom in output. Anthropic engineers are producing roughly 200% more code than they were a year ago, Cat noted. Today the main constraint is reviewing all that code to make sure it’s production-ready.
Cat’s team concluded you can buy a lot of extra robustness for not that much additional cost.
We opted for the heaviest, most robust version [of code review]. We actually plot how many agents and how comprehensive of a review Claude does, and then how many bugs does it recall. And we picked a number with very high recall and decided we should ship this, because if you really want AI code review to be a load-bearing part of your process, you actually probably just want the most comprehensive possible review.
The review agent doesn’t just look at the diff. It traces code across multiple files and catches bugs in adjacent code that has nothing to do with the change in question. Cat gave two examples. One was a ZFS encryption refactor where the agent found a key cache invalidation bug that wasn’t related to the author’s change at all but would have invalidated it. The other was a routine auth update that turned out to have a bad side effect, caught premerge. In both cases, engineers manually reviewing the code likely would have missed the bugs.
The human review that remains is deliberately small in scope. For most PRs, the human reviewer skims for design principle violations and obvious problems and assumes functional correctness has been handled. Five to 10 agents run in parallel, each given slightly different tasks, returning independently and then deduplicating what they found.
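The fan-out-and-dedupe shape described above can be sketched in a few lines. Everything here is an assumption for illustration: `run_review_agent` is a stub with canned findings (a real pass would be a full Claude review of the PR), and the focus areas are invented, not Anthropic’s actual list:

```python
# Sketch of parallel review agents with deduplicated findings.
# run_review_agent() and REVIEW_FOCUSES are hypothetical stand-ins.

from concurrent.futures import ThreadPoolExecutor

REVIEW_FOCUSES = [
    "security", "concurrency", "error handling",
    "adjacent-code regressions", "API contract changes",
]

def run_review_agent(focus, diff):
    """Stub agent: returns (file, line, description) findings for one focus."""
    canned = {  # a real agent would analyze the diff instead
        "security": [("auth.py", 42, "token not revalidated after refresh")],
        "concurrency": [("cache.py", 17, "key cache invalidation race")],
        "error handling": [("cache.py", 17, "key cache invalidation race")],
    }
    return canned.get(focus, [])

def review_pr(diff, focuses=REVIEW_FOCUSES):
    # Fan out: each agent reviews independently, in parallel.
    with ThreadPoolExecutor(max_workers=len(focuses)) as pool:
        results = list(pool.map(lambda f: run_review_agent(f, diff), focuses))
    # Dedupe: independent agents often surface the same underlying bug.
    seen, merged = set(), []
    for findings in results:
        for finding in findings:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged
```

The design point is that the agents don’t coordinate: independence plus a wide spread of focuses is what buys the high recall, and dedup is a cheap post-processing step.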
The cultural shift that made this work, though, was ownership. The team moved to a model where the engineer who authors a PR owns it end to end, including postdeploy bugs, and doesn’t lean on peer reviewers to catch mistakes. “Otherwise,” as Cat pointed out, “you have situations where junior engineers put out a bunch of PRs and then your senior engineers are like drowning in AI-generated stuff where they’re not sure how thoroughly it’s been tested.”
Full ownership meant the AI review had to actually be trustworthy, which drove the decision to opt for high recall rather than a lighter touch. That said, engineers are still expected to understand every line of code an agent creates. . .for now. As Cat explained, it’s the only way to really prevent “unknown security vulnerabilities and to be able to quickly respond to incidents if they are to occur.”
Everybody’s kind of an engineer now
Cowork, Anthropic’s agent tool for nontechnical users, is the company’s attempt to take what Claude Code does for engineers and bring it to knowledge work more broadly. Cat sketched a picture of someone looking at five or six agent tasks running concurrently in a side panel, managing a fleet of agents the way a senior engineer manages a PR queue.
In the nearer term, she’s keeping tabs on the shift toward people using Claude Code to build things for themselves, their teams, or their families that wouldn’t have justified professional development effort or “otherwise been possible.” The prototype is the garage project, the family expense tracker, the tool that a small team actually needs but that no SaaS product quite addresses. Cat’s goal and hope is that Claude Code helps people “solve their own problems for themselves” and “stewards a new future of personal software.”
Product taste as the new technical skill
More people building more software is unambiguously good. Boris Cherny has even floated the idea that coding as we know it is “solved.” But what does that mean for the craft of software engineering? Cat’s read of the current moment is more nuanced:
I think pre-AI, the skills that were important were being able to take a spec and implement it well. And I think now the really important skill is product taste. Even for engineers. Can you use code to ingest a huge amount of user feedback? Do you have good intuition about which feature to build to address those needs, because it’s often different than exactly what users are asking you for? And then, when Claude builds it, are you setting the right bar so that what you ship people actually love?
Cat’s not alone in highlighting the importance of taste in a world where code is a commodity. Steve Yegge, Wes McKinney, and many others, myself included, see taste and judgment as a uniquely human value. This has practical implications for how engineers should spend their time now, and for what the next generation needs to learn.
For junior engineers specifically, Cat described a progression: Start by using Claude Code to understand the codebase (ask all the “dumb questions” without embarrassment), take those answers to a senior engineer for calibration, and then close the loop by updating the CLAUDE.md with whatever was missing.
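Closing that loop concretely might mean appending what the senior engineer corrected to the project’s CLAUDE.md, so the next agent session starts with the fix built in. A hypothetical entry (the conventions shown are invented for illustration, not Anthropic’s):

```
## Conventions Claude kept getting wrong (added after review with a senior engineer)

- Database access goes through `db/client.py`; never open raw connections.
- Feature flags are read at request time, not cached at import time.
- Prefer extending an existing slash command over adding a new one.
```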
Think of Claude Code as your intern that you’re trying to level up. Like, teach it back to Claude. Add a /confirm slash command. Put it in the CLAUDE.md or the agent README. Approach this as senior engineers helping you level up, and then you helping Claude and other agents level up.
The improvement process, in other words, should be bidirectional. Engineers get better at using the tools, and the tools get better through the engineers’ accumulated knowledge. And significantly, this process keeps humans firmly in the loop, playing a role that’s “active, continuous, and skilled.”
You can watch Cat and Addy’s full chat, plus everything else from AI Codecon, on the O’Reilly learning platform. Not a member? Sign up for a free 10-day trial, no strings attached.
