Methods to Construct a Basic-Objective AI Agent in 131 Strains of Python – O’Reilly

March 25, 2026

22

The next article initially appeared on Hugo Bowne-Anderson’s publication, Vanishing Gradients, and is being republished right here with the writer’s permission.

On this put up, we’ll construct two AI brokers from scratch in Python. One will likely be a coding agent, the opposite a search agent.

Why have I known as this put up “Methods to Construct a Basic-Objective AI Agent in 131 Strains of Python” then? Effectively, because it seems now, coding brokers are literally general-purpose brokers in some fairly shocking methods.

What I imply by that is after getting an agent that may write code, it could possibly:

Do an enormous variety of stuff you don’t typically consider as involving code, and
Lengthen itself to do much more issues.

It’s extra acceptable to think about coding brokers as “computer-using brokers” that occur to be nice at writing code. That doesn’t imply you need to all the time construct a general-purpose agent, however it’s value understanding what you’re really constructing once you give an LLM shell entry. That’s additionally why we’ll construct a search agent on this put up: to point out the sample works no matter what you’re constructing.

For instance, the coding agent we’ll construct under has 4 instruments: learn, write, edit, and bash.

It will possibly do

File/life group: Clear your desktop, kind downloads by sort, rename trip photographs with dates, discover and delete duplicates, manage receipts into folders. . .
Private productiveness: Search all of your notes for one thing you half-remember, compile a packing listing from previous journeys, discover all PDFs containing “tax” from final yr. . .
Media administration: Rename a season of TV episodes correctly, convert pictures to totally different codecs, extract audio from movies, resize photographs for social media. . .
Writing and content material: Mix a number of docs into one, convert between codecs, find-and-replace throughout many recordsdata. . .
Information wrangling: Flip a messy CSV right into a clear deal with e-book, extract emails from a pile of recordsdata, merge spreadsheets from totally different sources. . .

This can be a small subset of what’s attainable. It’s additionally the rationale Claude Cowork appeared promising and why OpenClaw has taken off in the way in which it did.

So how are you going to construct this? On this put up, I’ll present you construct a minimal model.

Brokers are simply LLMs with instruments in a loop

Brokers are simply LLMs with instruments in a dialog loop and as soon as you understand the sample, you’ll have the ability to construct all forms of brokers with it:

As Ivan Leo wrote,

The barrier to entry is remarkably low: half-hour and you’ve got an AI that may perceive your codebase and make edits simply by speaking to it.

The objective right here is to point out that the sample is similar no matter what you’re constructing an agent for. Coding agent, search agent, browser agent, e mail agent, database agent: all of them observe the identical construction. The one distinction is the instruments you give them.

Half 1: The coding agent

We’ll begin with a coding agent that may learn, write, and execute code. As acknowledged, the flexibility to jot down and execute code with bash additionally turns a “coding agent” right into a “general-purpose agent.” With shell entry, it could possibly do something you are able to do from a terminal:

Kind and manage your native filesystem
Clear up your desktop
Batch rename photographs
Convert file codecs
Handle Git repos throughout a number of tasks
Set up and configure software program

You will discover the code right here.

Try Ivan Leo’s put up for a way to do that in JavaScript and Thorsten Ball’s put up for do it in Go.

Setup

Begin by creating our mission:

We’ll be utilizing Anthropic right here. Be at liberty to make use of your LLM of selection. For bonus factors, use Pydantic AI (or an identical library) and have a constant interface for the assorted totally different LLM suppliers. That approach you should utilize the identical agentic framework for each Claude and Gemini!

Be sure to’ve acquired an Anthropic API key set as ANTHROPIC_API_KEY setting variable.

We’ll construct our agent in 4 steps:

Hook up our LLM
Add a device that reads recordsdata
1. Add extra instruments: write, edit, and bash
Construct the agentic loop
Construct the conversational loop

1. Hook up our LLM

Textual content in, textual content out. Good! Now let’s give it a device.

2. Add a device (learn)

We’ll begin by implementing a device known as learn which is able to permit the agent to learn recordsdata from the filesystem. In Python, we are able to use Pydantic for schema validation, which additionally generates JSON schemas we are able to present to the API:

The Pydantic mannequin provides us two issues: validation and a JSON schema. We are able to see what the schema seems to be like:

We wrap this right into a device definition that Claude understands:

Then we add instruments to the API name, deal with the device request, execute it, and ship the end result again:

Add tools, handle request, execute, send result

Let’s see what occurs after we run it:

This script calls the Claude API with a person question handed through command line. It sends the question, will get a response, and prints it.

Observe that the LLM matched on the device description: Correct, particular descriptions are key! It’s additionally value mentioning that we’ve made two LLM calls right here:

One through which the device known as
A second through which we ship the results of the device name again to the LLM to get the ultimate end result

This typically journeys up individuals constructing brokers for the primary time, and Google has made a pleasant visualization of what we’re really doing:

2a. Add extra instruments (write, edit, bash)

We have now a learn device, however a coding agent must do greater than learn. It must:

Write new recordsdata
Edit current ones
Execute code to check it

That’s three extra instruments: write, edit, and bash.

Similar sample as learn. First the schemas:

Then the executors:

And the device definitions, together with the code that runs whichever one Claude picks:

The bash device is what makes this really helpful: Claude can now write code, run it, see errors, and repair them. However it’s additionally harmful. This device may delete your total filesystem! Proceed with warning: Run it in a sandbox, a container, or a VM.

Curiously, bash is what turns a “coding agent” right into a “general-purpose agent.” With shell entry, it could possibly do something you are able to do from a terminal:

Kind and manage your native filesystem
Clear up your desktop
Batch rename photographs
Convert file codecs
Handle Git repos throughout a number of tasks
Set up and configure software program

It was really “Pi: The Minimal Agent Inside OpenClaw” that impressed this instance.

Strive asking Claude to edit a file: It typically desires to learn it first to see what’s there. However our present code solely handles one device name. That’s the place the agentic loop is available in.

3. Construct the agentic loop

Proper now Claude can solely name one device per request. However actual duties want a number of steps: learn a file, edit it, run it, see the error, repair it. We want a loop that lets Claude preserve calling instruments till it’s finished.

We wrap the device dealing with in a whereas True loop:

Observe that right here now we have despatched the complete previous historical past of gathered messages as we progress via loop iterations. When constructing this out extra, you’ll wish to engineer and handle your context extra successfully. (See under for extra on this.)

Let’s strive a multistep process:

4. Construct the conversational loop

Proper now the agent handles one question and exits. However we wish a back-and-forth dialog: Ask a query, get a solution, ask a follow-up. We want an outer loop that retains asking for enter.

We wrap every part in a whereas True:

The messages listing persists throughout turns, so Claude remembers context. That’s the entire coding agent.

As soon as once more we’re merely appending all earlier messages, which implies the context will develop fairly rapidly!

A notice on agent harnesses

An agent harness is the scaffolding and infrastructure that wraps round an LLM to show it into an agent. It handles:

The loop: prompting the mannequin, parsing its output, executing instruments, feeding outcomes again
Instrument execution: really working the code/instructions the mannequin asks for
Context administration: what goes within the immediate, token limits, historical past
Security/guardrails: affirmation prompts, sandboxing, disallowed actions
State: holding monitor of the dialog, recordsdata touched, and many others.

And extra.

Consider it like this: The LLM is the mind; the harness is every part else that lets it really do issues.

What we’ve constructed above is the howdy world of agent harnesses. It covers the loop, device execution, and fundamental context administration. What it doesn’t have: security guardrails, token limits, persistence, or perhaps a system immediate!

When constructing out from this foundation, I encourage you to observe the paths of:

The Pi coding agent, which provides context loading AGENTS.md from a number of directories, persistent classes you may resume and department, and an extensibility system (abilities, extensions, prompts)
OpenClaw, which matches additional: a persistent daemon (always-on, not invoked), chat because the interface (Telegram, WhatsApp, and many others.), file-based continuity (SOUL.md, MEMORY.md, every day logs), proactive conduct (heartbeats, cron), preintegrated instruments (browser, subagents, gadget management), and the flexibility to message you with out being prompted

Half 2: The search agent

As a way to actually present you that the agentic loop is what powers any agent, we’ll now construct a search agent (impressed by a podcast I did with search legends John Berryman and Doug Turnbull). We’ll use Gemini for the LLM and Exa for internet search. You will discover the code right here.

However first, the astute reader might have an fascinating query: If a coding agent actually is a general-purpose agent, why would anybody wish to construct a search agent after we may simply get a coding agent to increase itself and switch itself right into a search agent? Effectively, as a result of if you wish to construct a search agent for a enterprise, you’re not going to do it by constructing a coding agent first… So let’s construct it!