Overview
Welcome to Build Your Own Mini Coding Agent in Rust. Over the next seven chapters you will implement a mini coding agent from scratch – a small cousin of programs like Claude Code or OpenCode: a program that takes a prompt, talks to a large language model (LLM), and uses tools to interact with the real world. After that, a series of extension chapters adds streaming, a TUI, user input, plan mode, and more.
By the end of this book you will have an agent that can run shell commands, read
and write files, and edit code, all driven by an LLM. No API key is required
until Chapter 6, and when you get there the default model is `openrouter/free` – a zero-cost endpoint on OpenRouter, no credits needed.
What is an AI agent?
An LLM on its own is a function: text in, text out. Ask it to summarize
doc.pdf and it will either refuse or hallucinate – it has no way to open the
file.
An agent solves this by giving the LLM tools. A tool is just a function your code can run – read a file, execute a shell command, hit an API. The agent sits in a loop:
- Send the user’s prompt to the LLM.
- The LLM decides it needs to read `doc.pdf` and outputs a tool call.
- Your code executes the `read` tool and feeds the file contents back.
- The LLM now has the text and returns a summary.
The LLM never touches the filesystem. It just asks, and your code does. That loop – ask, execute, feed back – is the entire idea.
How does an LLM use a tool?
An LLM cannot execute code. It is a text generator. So “calling a tool” really means the LLM outputs a structured request and your code does the rest.
When you send a request to the LLM, you include a list of tool definitions
alongside the conversation. Each definition is a name, a description, and a
JSON schema describing the arguments. For our read tool that looks like:
{
  "name": "read",
  "description": "Read the contents of a file.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}
The LLM reads these definitions the same way it reads the user’s prompt – they are just part of the input. When it decides it needs to read a file, it does not run any code. It produces a structured output like:
{ "name": "read", "arguments": { "path": "doc.pdf" } }
along with a signal that says “I’m not done yet – I made a tool call.” Your code parses this, runs the real function, and sends the result back as a new message. The LLM then continues with that result in context.
Here is the full exchange for our “Summarize doc.pdf” example:
sequenceDiagram
participant U as User
participant A as Agent
participant L as LLM
participant T as read tool
U->>A: "Summarize doc.pdf"
A->>L: prompt + tool definitions
L-->>A: tool_call: read("doc.pdf")
A->>T: read("doc.pdf")
T-->>A: file contents
A->>L: tool result (file contents)
L-->>A: "Here is a summary: ..."
A->>U: "Here is a summary: ..."
The LLM’s only job is deciding which tool to call and what arguments to pass. Your code does the actual work.
A minimal agent in pseudocode
Here is that example as code:
tools = [read_file]
messages = ["Summarize doc.pdf"]

loop:
    response = llm(messages, tools)
    if response.done:
        print(response.text)
        break
    // The LLM wants to call a tool -- run it and feed the result back.
    for call in response.tool_calls:
        result = execute(call.name, call.args)
        messages.append(result)
That is the entire agent. The rest of this book is implementing each piece –
the llm function, the tools, and the types that connect them – in Rust.
The tool-calling loop
Here is the flow of a single agent invocation:
flowchart TD
A["👤 User prompt"] --> B["🤖 LLM"]
B -- "StopReason::Stop" --> C["✅ Text response"]
B -- "StopReason::ToolUse" --> D["🔧 Execute tool calls"]
D -- "tool results" --> B
- The user sends a prompt.
- The LLM either responds with text (done) or requests one or more tool calls.
- Your code executes each tool and gathers the results.
- The results are fed back to the LLM as new messages.
- Repeat from step 2 until the LLM responds with text.
That is the entire architecture. Everything else is implementation detail.
What we will build
We will build a simple agent framework consisting of:
4 tools:
| Tool | What it does |
|---|---|
| read | Read the contents of a file |
| write | Write content to a file (creating directories as needed) |
| edit | Replace an exact string in a file |
| bash | Run a shell command and capture its output |
1 provider:
| Provider | Purpose |
|---|---|
| OpenRouterProvider | Talks to a real LLM over HTTP via the OpenAI-compatible API |
Tests use a MockProvider that returns pre-configured responses so you can
run the full test suite without an API key.
Project structure
The project is a Cargo workspace with three crates and a tutorial book:
mini-claw-code/
Cargo.toml # workspace root
mini-claw-code/ # reference solution (do not peek!)
mini-claw-code-starter/ # YOUR code -- you implement things here
mini-claw-code-xtask/ # helper commands (cargo x ...)
mini-claw-code-book/ # this tutorial
- mini-claw-code contains the complete, working implementation. It is there so the test suite can verify that the exercises are solvable, but you should avoid reading it until you have tried on your own.
- mini-claw-code-starter is your working crate. Each source file contains struct definitions, trait implementations with `unimplemented!()` bodies, and doc-comment hints. Your job is to replace the `unimplemented!()` calls with real code.
- mini-claw-code-xtask provides the `cargo x` helper with `check`, `solution-check`, and `book` commands.
- mini-claw-code-book is this mdbook tutorial.
Prerequisites
Before starting, make sure you have:
- Rust installed (1.85+ required, for edition 2024). Install from https://rustup.rs.
- Basic Rust knowledge: ownership, structs, enums, pattern matching, and `Result`/`Option`. If you have read the first half of The Rust Programming Language book, you are ready.
- A terminal and a text editor.
- mdbook (optional, for reading the tutorial locally). Install with `cargo install mdbook mdbook-mermaid`.
You do not need an API key until Chapter 6. Chapters 1 through 5 use the
MockProvider for testing, so everything runs locally.
Setup
Clone the repository and verify things build:
git clone https://github.com/odysa/mini-claw-code.git
cd mini-claw-code
cargo build
Then verify the test harness works:
cargo test -p mini-claw-code-starter ch1
The tests should fail – that is expected! Your job in Chapter 1 is to make them pass.
If cargo x does not work, make sure you are in the workspace root (the
directory containing the top-level Cargo.toml).
Chapter roadmap
| Chapter | Topic | What you build |
|---|---|---|
| 1 | Core Types | MockProvider – understand the core types by building a test helper |
| 2 | Your First Tool | ReadTool – reading files |
| 3 | Single Turn | single_turn() – explicit match on StopReason, one round of tool calls |
| 4 | More Tools | BashTool, WriteTool, EditTool |
| 5 | Your First Agent SDK! | SimpleAgent – generalizes single_turn() into a loop |
| 6 | The OpenRouter Provider | OpenRouterProvider – talking to a real LLM API |
| 7 | A Simple CLI | Wire everything into an interactive CLI with conversation memory |
| 8 | The Singularity | Your agent can now code itself – what’s next |
Chapters 1–7 are hands-on: you write code in mini-claw-code-starter and run
tests to check your work. Chapter 8 marks the transition to extension
chapters (9+) which walk through the reference implementation:
| Chapter | Topic | What it adds |
|---|---|---|
| 9 | A Better TUI | Markdown rendering, spinners, collapsed tool calls |
| 10 | Streaming | StreamingAgent with SSE parsing and AgentEvents |
| 11 | User Input | AskTool – let the LLM ask you clarifying questions |
| 12 | Plan Mode | PlanAgent – read-only planning phase with approval gating |
Chapters 1–7 follow the same rhythm:
- Read the chapter to understand the concepts.
- Open the corresponding source file in mini-claw-code-starter/src/.
- Replace the `unimplemented!()` calls with your implementation.
- Run `cargo test -p mini-claw-code-starter chN` to check your work.
Ready? Let’s build an agent.
What’s next
Head to Chapter 1: Core Types to understand the
foundational types – StopReason, Message, and the Provider trait – and
build MockProvider, the test helper you will use throughout the next four
chapters.
Chapter 1: Core Types
In this chapter you will understand the types that make up the agent protocol –
StopReason, AssistantTurn, Message, and the Provider trait. These are
the building blocks everything else is built on.
To verify your understanding, you will implement a small test helper:
MockProvider, a struct that returns pre-configured responses so that you can
test future chapters without an API key.
Goal
Understand the core types, then implement MockProvider so that:
- You create it with a `VecDeque<AssistantTurn>` of canned responses.
- Each call to `chat()` returns the next response in sequence.
- If all responses have been consumed, it returns an error.
The core types
Open mini-claw-code-starter/src/types.rs. These types define the protocol
between the agent and any LLM backend.
Here is how they relate to each other:
classDiagram
class Provider {
<<trait>>
+chat(messages, tools) AssistantTurn
}
class AssistantTurn {
text: Option~String~
tool_calls: Vec~ToolCall~
stop_reason: StopReason
}
class StopReason {
<<enum>>
Stop
ToolUse
}
class ToolCall {
id: String
name: String
arguments: Value
}
class Message {
<<enum>>
System(String)
User(String)
Assistant(AssistantTurn)
ToolResult(id, content)
}
class ToolDefinition {
name: &'static str
description: &'static str
parameters: Value
}
Provider --> AssistantTurn : returns
Provider --> Message : receives
Provider --> ToolDefinition : receives
AssistantTurn --> StopReason
AssistantTurn --> ToolCall : contains 0..*
Message --> AssistantTurn : wraps
Provider takes in messages and tool definitions, and returns an
AssistantTurn. The turn’s stop_reason tells you what to do next.
ToolDefinition and its builder
#![allow(unused)]
fn main() {
pub struct ToolDefinition {
pub name: &'static str,
pub description: &'static str,
pub parameters: Value,
}
}
Each tool declares a ToolDefinition that tells the LLM what it can do. The
parameters field is a JSON Schema object describing the tool’s arguments.
Rather than building JSON by hand every time, ToolDefinition has a builder
API:
#![allow(unused)]
fn main() {
ToolDefinition::new("read", "Read the contents of a file.")
.param("path", "string", "The file path to read", true)
}
- `new(name, description)` creates a definition with an empty parameter schema.
- `param(name, type, description, required)` adds a parameter and returns `self`, so you can chain calls.
You will use this builder in every tool starting from Chapter 2.
StopReason and AssistantTurn
#![allow(unused)]
fn main() {
pub enum StopReason {
Stop,
ToolUse,
}
pub struct AssistantTurn {
pub text: Option<String>,
pub tool_calls: Vec<ToolCall>,
pub stop_reason: StopReason,
}
}
The ToolCall struct holds a single tool invocation:
#![allow(unused)]
fn main() {
pub struct ToolCall {
pub id: String,
pub name: String,
pub arguments: Value,
}
}
Each tool call has an id (for matching results back to requests), a name
(which tool to call), and arguments (a JSON value the tool will parse).
Every response from the LLM comes with a stop_reason that tells you why
the model stopped generating:
- `StopReason::Stop` – the model is done. Check `text` for the response.
- `StopReason::ToolUse` – the model wants to call tools. Check `tool_calls`.
This is the raw LLM protocol: the model tells you what to do next. In
Chapter 3 you will write a function that explicitly matches on
stop_reason to handle each case. In Chapter 5 you will wrap that match
inside a loop to create the full agent.
The Provider trait
#![allow(unused)]
fn main() {
pub trait Provider: Send + Sync {
fn chat<'a>(
&'a self,
messages: &'a [Message],
tools: &'a [&'a ToolDefinition],
) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
}
This says: “A Provider is something that can take a slice of messages and a
slice of tool definitions, and asynchronously return an AssistantTurn.”
The Send + Sync bounds mean the provider must be safe to share across
threads. This is important because tokio (the async runtime) may move tasks
between threads.
Notice that chat() takes &self, not &mut self. The real provider
(OpenRouterProvider) does not need mutation – it just fires HTTP requests.
Making the trait &mut self would force every caller to hold exclusive access,
which is unnecessarily restrictive. The trade-off: MockProvider (a test
helper) does need to mutate its response list, so it must use interior
mutability to conform to the trait.
The Message enum
#![allow(unused)]
fn main() {
pub enum Message {
System(String),
User(String),
Assistant(AssistantTurn),
ToolResult { id: String, content: String },
}
}
The conversation history is a list of Message values:
- `System(text)` – a system prompt that sets the agent’s role and behavior. Typically the first message in the history.
- `User(text)` – a prompt from the user.
- `Assistant(turn)` – a response from the LLM (text, tool calls, or both).
- `ToolResult { id, content }` – the result of executing a tool call. The `id` matches the `ToolCall::id` so the LLM knows which call this result belongs to.
You will use these variants starting in Chapter 3 when building the
single_turn() function.
Why Provider uses impl Future but Tool uses #[async_trait]
You may notice in Chapter 2 that the Tool trait uses #[async_trait] while
Provider uses impl Future directly. The difference is about how the trait
is used:
- `Provider` is used generically (`SimpleAgent<P: Provider>`). The compiler knows the concrete type at compile time, so `impl Future` works.
- `Tool` is stored as a trait object (`Box<dyn Tool>`) in a collection of different tool types. Trait objects require a uniform return type, which `#[async_trait]` provides by boxing the future.
When implementing a trait that uses impl Future, you can simply write
async fn in the impl block – Rust desugars it to the impl Future form
automatically. So while the trait definition says -> impl Future<...>,
your implementation can just write async fn chat(...).
If this distinction is unclear now, it will click in Chapter 5 when you see both patterns in action.
ToolSet – a collection of tools
One more type you will use starting in Chapter 3: ToolSet. It wraps a
HashMap<String, Box<dyn Tool>> and indexes tools by name, giving O(1)
lookup when executing tool calls. You build one with a builder:
#![allow(unused)]
fn main() {
let tools = ToolSet::new()
.with(ReadTool::new())
.with(BashTool::new());
}
You do not need to implement ToolSet – it is provided in types.rs.
Implementing MockProvider
Now that you understand the types, let’s put them to use. MockProvider is a
test helper – it implements Provider by returning canned responses instead of
calling a real LLM. You will use it throughout chapters 2–5 to test tools and
the agent loop without needing an API key.
Open mini-claw-code-starter/src/mock.rs. You will see the struct and method
signatures already laid out with unimplemented!() bodies.
Interior mutability with Mutex
MockProvider needs to remove responses from a list each time chat()
is called. But chat() takes &self. How do we mutate through a shared
reference?
Rust’s std::sync::Mutex provides interior mutability: you wrap a value in a
Mutex, and calling .lock().unwrap() gives you a mutable guard even through
&self. The lock ensures only one thread accesses the data at a time.
#![allow(unused)]
fn main() {
use std::collections::VecDeque;
use std::sync::Mutex;
struct MyState {
items: Mutex<VecDeque<String>>,
}
impl MyState {
fn take_one(&self) -> Option<String> {
self.items.lock().unwrap().pop_front()
}
}
}
Step 1: The struct fields
The struct already has the field you need: a Mutex<VecDeque<AssistantTurn>>
to hold the responses. This is provided so that the method signatures compile.
Your job is to implement the methods that use this field.
Step 2: Implement new()
The new() method receives a VecDeque<AssistantTurn>. We want FIFO order –
each call to chat() should return the first remaining response, not the
last. VecDeque::pop_front() does exactly that in O(1):
flowchart LR
subgraph "VecDeque (FIFO)"
direction LR
A["A"] ~~~ B["B"] ~~~ C["C"]
end
A -- "pop_front()" --> out1["chat() → A"]
B -. "next call" .-> out2["chat() → B"]
C -. "next call" .-> out3["chat() → C"]
So in new():
- Wrap the input deque in a `Mutex`.
- Store it in `Self`.
Step 3: Implement chat()
The chat() method should:
- Lock the mutex.
- `pop_front()` the next response.
- If there is one, return `Ok(response)`.
- If the deque is empty, return an error.
The mock provider intentionally ignores the messages and tools parameters.
It does not care what the “user” said – it just returns the next canned
response.
A useful pattern for converting Option to Result:
#![allow(unused)]
fn main() {
some_option.ok_or_else(|| anyhow::anyhow!("no more responses"))
}
Running the tests
Run the Chapter 1 tests:
cargo test -p mini-claw-code-starter ch1
What the tests verify
- `test_ch1_returns_text`: Creates a `MockProvider` with one response containing text. Calls `chat()` once and checks the text matches.
- `test_ch1_returns_tool_calls`: Creates a provider with one response containing a tool call. Verifies the tool call name and id.
- `test_ch1_steps_through_sequence`: Creates a provider with three responses. Calls `chat()` three times and verifies they come back in the correct order (First, Second, Third).
These are the core tests. There are also additional edge-case tests (empty responses, exhausted queue, multiple tool calls, etc.) that will pass once your core implementation is correct.
Recap
You have learned the core types that define the agent protocol:
- `StopReason` tells you whether the LLM is done or wants to call tools.
- `AssistantTurn` carries the LLM’s response – text, tool calls, or both.
- `Provider` is the trait any LLM backend implements.
You also built MockProvider, a test helper you will use throughout the next
four chapters to simulate LLM conversations without HTTP requests.
What’s next
In Chapter 2: Your First Tool you will implement the
ReadTool – a tool that reads file contents and returns them to the LLM.
Chapter 2: Your First Tool
Now that you have a mock provider, it is time to build your first tool. You will
implement ReadTool – a tool that reads a file and returns its contents. This
is the simplest tool in our agent, but it introduces the Tool trait pattern
that every other tool follows.
Goal
Implement ReadTool so that:
- It declares its name, description, and parameter schema.
- When called with a `{"path": "some/file.txt"}` argument, it reads the file and returns its contents as a string.
- Missing arguments or non-existent files produce errors.
Key Rust concepts
The Tool trait
Open mini-claw-code-starter/src/types.rs and look at the Tool trait:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait Tool: Send + Sync {
fn definition(&self) -> &ToolDefinition;
async fn call(&self, args: Value) -> anyhow::Result<String>;
}
}
Two methods:
- `definition()` returns metadata about the tool: its name, a description, and a JSON schema describing its parameters. The LLM uses this to decide which tool to call and how to format the arguments.
- `call()` actually executes the tool. It receives a `serde_json::Value` containing the arguments and returns a string result.
ToolDefinition
#![allow(unused)]
fn main() {
pub struct ToolDefinition {
pub name: &'static str,
pub description: &'static str,
pub parameters: Value,
}
}
As you saw in Chapter 1, ToolDefinition has a builder API for declaring
parameters. For ReadTool, we need a single required parameter called "path"
of type "string":
#![allow(unused)]
fn main() {
ToolDefinition::new("read", "Read the contents of a file.")
.param("path", "string", "The file path to read", true)
}
Under the hood, the builder constructs the JSON Schema you saw in Chapter 1.
The last argument (true) marks the parameter as required.
Why #[async_trait] instead of plain async fn?
You might wonder why we use the async_trait macro instead of writing
async fn directly in the trait. The reason is trait object compatibility.
Later, in the agent loop, we will store tools in a ToolSet – a HashMap-backed
collection of different tool types behind a common interface. This requires
dynamic dispatch, which means the compiler needs to know the size of the
return type at compile time.
async fn in traits generates a different, uniquely-sized Future type for
each implementation. That breaks dynamic dispatch. The #[async_trait] macro
automatically rewrites async fn into a method that returns
Pin<Box<dyn Future<...>>>, which has a known, fixed size regardless of
which tool produced it. You write normal async fn code, and the macro
handles the boxing for you.
Here is the data flow when the agent calls a tool:
flowchart LR
A["LLM returns<br/>ToolCall"] --> B["args: JSON Value<br/>{'path': 'f.txt'}"]
B --> C["Tool::call(args)"]
C --> D["Result: String<br/>(file contents)"]
D --> E["Sent back to LLM<br/>as ToolResult"]
The LLM never touches the filesystem. It produces a JSON request, your code executes it, and returns a string.
The implementation
Open mini-claw-code-starter/src/tools/read.rs. The struct, Default impl, and
method signatures are already provided.
Remember to annotate your impl Tool for ReadTool block with
#[async_trait::async_trait]. The starter file already has this in place.
Step 1: Implement new()
Create a ToolDefinition and store it in self.definition. Use the builder:
#![allow(unused)]
fn main() {
ToolDefinition::new("read", "Read the contents of a file.")
.param("path", "string", "The file path to read", true)
}
Step 2: definition() – already provided
The definition() method is already implemented in the starter – it simply
returns &self.definition. No work needed here.
Step 3: Implement call()
This is where the real work happens. Your implementation should:
- Extract the
"path"argument fromargs. - Read the file asynchronously.
- Return the file contents.
Here is the shape:
#![allow(unused)]
fn main() {
async fn call(&self, args: Value) -> anyhow::Result<String> {
// 1. Extract path
// 2. Read file with tokio::fs::read_to_string
// 3. Return contents
}
}
Some useful APIs:
- `args["path"].as_str()` returns `Option<&str>`. Use `.context("missing 'path' argument")?` from `anyhow` to convert `None` into a descriptive error.
- `tokio::fs::read_to_string(path).await` reads a file asynchronously. Chain `.with_context(|| format!("failed to read '{path}'"))?` for a clear error message.
That is it – extract the path, read the file, return the contents.
Running the tests
Run the Chapter 2 tests:
cargo test -p mini-claw-code-starter ch2
What the tests verify
- `test_ch2_read_definition`: Creates a `ReadTool` and checks that its name is `"read"`, description is non-empty, and `"path"` is in the required parameters.
- `test_ch2_read_file`: Creates a temp file with known content, calls `ReadTool` with the file path, and checks the returned content matches.
- `test_ch2_read_missing_file`: Calls `ReadTool` with a path that does not exist and verifies it returns an error.
- `test_ch2_read_missing_arg`: Calls `ReadTool` with an empty JSON object (no `"path"` key) and verifies it returns an error.
There are also additional edge-case tests (empty files, unicode content, wrong argument types, etc.) that will pass once your core implementation is correct.
Recap
You built your first tool by implementing the Tool trait. The key patterns:
- `ToolDefinition::new(...).param(...)` declares the tool’s name, description, and parameters.
- `#[async_trait::async_trait]` on the `impl` block lets you write `async fn call()` while keeping trait object compatibility.
- `tokio::fs` for async file I/O.
- `anyhow::Context` for adding descriptive error messages.
Every tool in the agent follows this exact same structure. Once you understand
ReadTool, the remaining tools are variations on the theme.
What’s next
In Chapter 3: Single Turn you will write a function
that matches on StopReason to handle a single round of tool calls.
Chapter 3: Single Turn
You have a provider and a tool. Before jumping to the full agent loop, let’s
see the raw protocol: the LLM returns a stop_reason that tells you whether
it is done or wants to use tools. In this chapter you will write a function
that handles exactly one prompt with at most one round of tool calls.
Goal
Implement single_turn() so that:
- It sends a prompt to the provider.
- It matches on `stop_reason`.
- If `Stop` – return the text.
- If `ToolUse` – execute the tools, send results back, return the final text.
No loop. Just one turn.
Key Rust concepts
ToolSet – a HashMap of tools
The function signature takes a &ToolSet instead of a raw slice or vector:
#![allow(unused)]
fn main() {
pub async fn single_turn<P: Provider>(
provider: &P,
tools: &ToolSet,
prompt: &str,
) -> anyhow::Result<String>
}
ToolSet wraps a HashMap<String, Box<dyn Tool>> and indexes tools by their
definition name. This gives O(1) lookup when executing tool calls instead of
scanning a list. The builder API auto-extracts the name from each tool’s
definition:
#![allow(unused)]
fn main() {
let tools = ToolSet::new().with(ReadTool::new());
let result = single_turn(&provider, &tools, "Read test.txt").await?;
}
match on StopReason
This is the core teaching point. Instead of checking tool_calls.is_empty(),
you explicitly match on the stop reason:
#![allow(unused)]
fn main() {
match turn.stop_reason {
StopReason::Stop => { /* return text */ }
StopReason::ToolUse => { /* execute tools */ }
}
}
This makes the protocol visible. The LLM is telling you what to do, and you handle each case explicitly.
Here is the complete flow of single_turn():
flowchart TD
A["prompt"] --> B["provider.chat()"]
B --> C{"stop_reason?"}
C -- "Stop" --> D["Return text"]
C -- "ToolUse" --> E["Execute each tool call"]
E --> F{"Tool error?"}
F -- "Ok" --> G["result = output"]
F -- "Err" --> H["result = error message"]
G --> I["Push Assistant message"]
H --> I
I --> J["Push ToolResult messages"]
J --> K["provider.chat() again"]
K --> L["Return final text"]
The key difference from the full agent loop (Chapter 5) is that there is no
outer loop here. If the LLM asks for tools a second time, single_turn() does
not handle it – that is what the agent loop is for.
The implementation
Open mini-claw-code-starter/src/agent.rs. You will see the single_turn()
function signature at the top of the file, before the SimpleAgent struct.
Step 1: Collect tool definitions
ToolSet has a definitions() method that returns all tool schemas:
#![allow(unused)]
fn main() {
let defs = tools.definitions();
}
Step 2: Create the initial message
#![allow(unused)]
fn main() {
let mut messages = vec![Message::User(prompt.to_string())];
}
Step 3: Call the provider
#![allow(unused)]
fn main() {
let turn = provider.chat(&messages, &defs).await?;
}
Step 4: Match on stop_reason
This is the heart of the function:
#![allow(unused)]
fn main() {
match turn.stop_reason {
StopReason::Stop => Ok(turn.text.unwrap_or_default()),
StopReason::ToolUse => {
// execute tools, send results, get final answer
}
}
}
For the ToolUse branch:
- For each tool call, find the matching tool and call it. Collect the results into a `Vec` first – you will need `turn.tool_calls` for this, so you cannot move `turn` yet.
- Push `Message::Assistant(turn)` and then `Message::ToolResult` for each result. Pushing the assistant turn moves `turn`, which is why you must collect results beforehand.
- Call the provider again to get the final answer.
- Return `final_turn.text.unwrap_or_default()`.
The tool-finding and execution logic is the same as what you will use in the agent loop (Chapter 5):
#![allow(unused)]
fn main() {
println!("{}", tool_summary(call));
let content = match tools.get(&call.name) {
Some(t) => t.call(call.arguments.clone()).await
.unwrap_or_else(|e| format!("error: {e}")),
None => format!("error: unknown tool `{}`", call.name),
};
}
The tool_summary() helper prints each tool call to the terminal so you can
see which tools the agent is using and what arguments it passed. For example,
[bash: ls -la] or [read: src/main.rs]. (The reference implementation uses
print!("\x1b[2K\r...") instead of println! to clear the thinking...
indicator line before printing – you’ll see this pattern in Chapter 7. A plain
println! works fine for now.)
Error handling – never crash the loop
Notice that tool errors are caught, not propagated. The .unwrap_or_else()
converts any error into a string like "error: failed to read 'missing.txt'".
This string is sent back to the LLM as a normal tool result. The LLM can then
decide what to do – try a different file, use another tool, or explain the
problem to the user.
The same applies to unknown tools – instead of panicking, you send an error message back as a tool result.
This is a key design principle: the agent loop should never crash because of a tool failure. Tools operate on the real world (files, processes, network), and failures are expected. The LLM is smart enough to recover if you give it the error message.
Here is the message sequence for a successful tool call:
sequenceDiagram
participant ST as single_turn()
participant P as Provider
participant T as ReadTool
ST->>P: [User("Read test.txt")] + tool defs
P-->>ST: ToolUse: read({path: "test.txt"})
ST->>T: call({path: "test.txt"})
T-->>ST: "file contents..."
Note over ST: Push Assistant + ToolResult
ST->>P: [User, Assistant, ToolResult]
P-->>ST: Stop: "Here are the contents: ..."
ST-->>ST: return text
And here is what happens when a tool fails (e.g. file not found):
sequenceDiagram
participant ST as single_turn()
participant P as Provider
participant T as ReadTool
ST->>P: [User("Read missing.txt")] + tool defs
P-->>ST: ToolUse: read({path: "missing.txt"})
ST->>T: call({path: "missing.txt"})
T--xST: Err("failed to read 'missing.txt'")
Note over ST: Catch error, use as result
Note over ST: Push Assistant + ToolResult("error: failed to read ...")
ST->>P: [User, Assistant, ToolResult]
P-->>ST: Stop: "Sorry, that file doesn't exist."
ST-->>ST: return text
The error does not crash the agent. It becomes a tool result that the LLM reads and responds to.
Running the tests
Run the Chapter 3 tests:
cargo test -p mini-claw-code-starter ch3
What the tests verify
- `test_ch3_direct_response`: Provider returns `StopReason::Stop`. `single_turn` should return the text directly.
- `test_ch3_one_tool_call`: Provider returns `StopReason::ToolUse` with a `read` tool call, then `StopReason::Stop`. Verifies the file was read and the final text is returned.
- `test_ch3_unknown_tool`: Provider returns `StopReason::ToolUse` for a tool that does not exist. Verifies the error message is sent as a tool result and the final text is returned.
- `test_ch3_tool_error_propagates`: Provider requests a `read` on a file that does not exist. The error should be caught and sent back to the LLM as a tool result (not crash the function). The LLM then responds with text.
There are also additional edge-case tests (empty responses, multiple tool calls in one turn, etc.) that will pass once your core implementation is correct.
Recap
You have written the simplest possible handler for the LLM protocol:
- Match on `StopReason` – the model tells you what to do next.
- No loop – you handle at most one round of tool calls.
- `ToolSet` – a HashMap-backed collection with O(1) tool lookup by name.
This is the foundation. In Chapter 5 you will wrap this same logic in a loop to create the full agent.
What’s next
In Chapter 4: More Tools you will implement three
more tools: BashTool, WriteTool, and EditTool.
Chapter 4: More Tools
You have already implemented ReadTool and understand the Tool trait pattern.
Now you will implement three more tools: BashTool, WriteTool, and EditTool.
Each follows the same structure – define a schema, implement call() – so this
chapter reinforces the pattern through repetition.
By the end of this chapter your agent will have all four tools it needs to interact with the file system and execute commands.
flowchart LR
subgraph ToolSet
R["read<br/>Read a file"]
B["bash<br/>Run a command"]
W["write<br/>Write a file"]
E["edit<br/>Replace a string"]
end
Agent -- "tools.get(name)" --> ToolSet
Goal
Implement three tools:
- BashTool – run a shell command and return its output.
- WriteTool – write content to a file, creating directories as needed.
- EditTool – replace an exact string in a file (must appear exactly once).
Key Rust concepts
tokio::process::Command
Tokio provides an async wrapper around std::process::Command. You will use it
in BashTool:
#![allow(unused)]
fn main() {
let output = tokio::process::Command::new("bash")
.arg("-c")
.arg(command)
.output()
.await?;
}
This runs bash -c "<command>" and captures stdout and stderr. The output
struct has stdout and stderr fields as Vec<u8>, which you convert to
strings with String::from_utf8_lossy().
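A quick std-only demonstration of the lossy conversion – invalid bytes become the replacement character U+FFFD instead of causing an error, which is exactly why it is a safe choice for arbitrary command output:

```rust
fn main() {
    // Valid UTF-8 converts cleanly.
    let ok = String::from_utf8_lossy(b"hello");
    assert_eq!(ok, "hello");

    // 0xFF is not valid UTF-8; it becomes U+FFFD (the replacement character).
    let bad = String::from_utf8_lossy(&[0x68, 0x69, 0xFF]);
    assert_eq!(bad, "hi\u{FFFD}");
}
```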
bail!() macro
The anyhow::bail!() macro is shorthand for returning an error immediately:
#![allow(unused)]
fn main() {
use anyhow::bail;
if count == 0 {
bail!("not found");
}
// equivalent to:
// return Err(anyhow::anyhow!("not found"));
}
You will use this in EditTool for validation.
Make sure to import it: use anyhow::{Context, bail};. The starter file
already includes this import in edit.rs.
create_dir_all
When writing a file to a path like a/b/c/file.txt, the parent directories
might not exist. tokio::fs::create_dir_all creates the entire directory tree:
#![allow(unused)]
fn main() {
if let Some(parent) = std::path::Path::new(path).parent() {
tokio::fs::create_dir_all(parent).await?;
}
}
Tool 1: BashTool
Open mini-claw-code-starter/src/tools/bash.rs.
Schema
Use the builder pattern you learned in Chapter 2:
#![allow(unused)]
fn main() {
ToolDefinition::new("bash", "Run a bash command and return its output.")
.param("command", "string", "The bash command to run", true)
}
Implementation
The call() method should:
- Extract "command" from args.
- Run bash -c <command> using tokio::process::Command.
- Capture stdout and stderr.
- Build a result string:
  - Start with stdout (if non-empty).
  - Append stderr prefixed with "stderr: " (if non-empty).
  - If both are empty, return "(no output)".
Think about how you combine stdout and stderr. If both are present, you want them separated by a newline. Something like:
#![allow(unused)]
fn main() {
let mut result = String::new();
if !stdout.is_empty() {
result.push_str(&stdout);
}
if !stderr.is_empty() {
if !result.is_empty() {
result.push('\n');
}
result.push_str("stderr: ");
result.push_str(&stderr);
}
if result.is_empty() {
result.push_str("(no output)");
}
}
Tool 2: WriteTool
Open mini-claw-code-starter/src/tools/write.rs.
Schema
#![allow(unused)]
fn main() {
ToolDefinition::new("write", "Write content to a file, creating directories as needed.")
.param("path", "string", "The file path to write to", true)
.param("content", "string", "The content to write to the file", true)
}
Implementation
The call() method should:
- Extract "path" and "content" from args.
- Create parent directories if they do not exist.
- Write the content to the file.
- Return a confirmation message like "wrote {path}".
For creating parent directories:
#![allow(unused)]
fn main() {
if let Some(parent) = std::path::Path::new(path).parent() {
tokio::fs::create_dir_all(parent).await
.with_context(|| format!("failed to create directories for '{path}'"))?;
}
}
Then write the file:
#![allow(unused)]
fn main() {
tokio::fs::write(path, content).await
.with_context(|| format!("failed to write '{path}'"))?;
}
Tool 3: EditTool
Open mini-claw-code-starter/src/tools/edit.rs.
Schema
#![allow(unused)]
fn main() {
ToolDefinition::new("edit", "Replace an exact string in a file (must appear exactly once).")
.param("path", "string", "The file path to edit", true)
.param("old_string", "string", "The exact string to find and replace", true)
.param("new_string", "string", "The replacement string", true)
}
Implementation
The call() method is the most interesting of the bunch. It should:
- Extract "path", "old_string", and "new_string" from args.
- Read the file contents.
- Count how many times old_string appears in the content.
- If the count is 0, return an error: the string was not found.
- If the count is greater than 1, return an error: the string is ambiguous.
- Replace the single occurrence and write the file back.
- Return a confirmation like "edited {path}".
The validation is important – requiring exactly one match prevents accidental edits in the wrong place.
flowchart TD
A["Read file"] --> B["Count matches<br/>of old_string"]
B --> C{"count?"}
C -- "0" --> D["Error: not found"]
C -- "1" --> E["Replace + write file"]
C -- ">1" --> F["Error: ambiguous"]
E --> G["Return 'edited path'"]
Useful APIs:
- content.matches(old).count() counts occurrences of a substring.
- content.replacen(old, new, 1) replaces the first occurrence.
- bail!("old_string not found in '{path}'") for the not-found case.
- bail!("old_string appears {count} times in '{path}', must be unique") for the ambiguous case.
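Putting those APIs together, the validation logic might look like this standalone sketch. The apply_edit helper name is illustrative – in the real tool this lives inside call() with file I/O and bail!() around it:

```rust
// Illustrative helper: validate and apply a unique-string replacement.
fn apply_edit(content: &str, old: &str, new: &str) -> Result<String, String> {
    match content.matches(old).count() {
        0 => Err("old_string not found".to_string()),
        1 => Ok(content.replacen(old, new, 1)),
        n => Err(format!("old_string appears {n} times, must be unique")),
    }
}

fn main() {
    // Exactly one match: the replacement goes through.
    assert_eq!(apply_edit("hello world", "hello", "goodbye").unwrap(), "goodbye world");
    // Three matches: ambiguous, so the edit is rejected.
    assert!(apply_edit("aaa", "a", "b").is_err());
    // Zero matches: not found, also rejected.
    assert!(apply_edit("abc", "x", "y").is_err());
}
```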
Running the tests
Run the Chapter 4 tests:
cargo test -p mini-claw-code-starter ch4
What the tests verify
BashTool:
- test_ch4_bash_definition: Checks name is "bash" and "command" is required.
- test_ch4_bash_runs_command: Runs echo hello and checks the output contains "hello".
- test_ch4_bash_captures_stderr: Runs echo err >&2 and checks stderr is captured.
- test_ch4_bash_missing_arg: Passes empty args and expects an error.
WriteTool:
- test_ch4_write_definition: Checks name is "write".
- test_ch4_write_creates_file: Writes to a temp file and reads it back.
- test_ch4_write_creates_dirs: Writes to a/b/c/out.txt and verifies directories were created.
- test_ch4_write_missing_arg: Passes only "path" (no "content") and expects an error.
EditTool:
- test_ch4_edit_definition: Checks name is "edit".
- test_ch4_edit_replaces_string: Edits "hello" to "goodbye" in a file containing "hello world" and checks the result is "goodbye world".
- test_ch4_edit_not_found: Tries to replace a string that does not exist and expects an error.
- test_ch4_edit_not_unique: Tries to replace "a" in a file containing "aaa" (three occurrences) and expects an error.
There are also additional edge-case tests for each tool (wrong argument types, missing arguments, output format checks, etc.) that will pass once your core implementations are correct.
Recap
You now have four tools, and they all follow the same pattern:
- Define a ToolDefinition with ::new(...).param(...) builder calls.
- Return &self.definition from definition().
- Add #[async_trait::async_trait] on the impl Tool block and write async fn call().
This is a deliberate design. The Tool trait makes every tool interchangeable
from the agent’s perspective. The agent does not know or care how a tool works
internally – it only needs the definition (to tell the LLM) and the call method
(to execute it).
What’s next
With a provider and four tools ready, it is time to connect them. In
Chapter 5: Your First Agent SDK! you will build the
SimpleAgent – the core loop that sends prompts to the provider, executes
tool calls, and iterates until the LLM gives a final answer.
Chapter 5: Your First Agent SDK!
This is the chapter where everything comes together. You have a provider that
returns AssistantTurn responses and four tools that execute actions. Now you
will build the SimpleAgent – the loop that connects them.
This is the “aha!” moment of the tutorial. The agent loop is surprisingly short, but it is the engine that makes an LLM into an agent.
What is an agent loop?
In Chapter 3 you built single_turn() – one prompt, one round of tool calls,
one final answer. That is enough when the LLM knows everything it needs after
reading a single file. But real tasks are messier:
“Find the bug in this project and fix it.”
The LLM might need to read five files, run the test suite, edit a source file, run the tests again, and then report back. Each of those is a tool call, and the LLM cannot plan them all upfront because the result of one call determines the next. It needs a loop.
The agent loop is that loop:
flowchart TD
A["User prompt"] --> B["Call LLM"]
B -- "StopReason::Stop" --> C["Return text"]
B -- "StopReason::ToolUse" --> D["Execute tool calls"]
D -- "Push assistant + tool results" --> B
- Send messages to the LLM.
- If the LLM says “I’m done” (StopReason::Stop), return its text.
- If the LLM says “I need tools” (StopReason::ToolUse), execute them.
- Append the assistant turn and tool results to the message history.
- Go to step 1.
That is the entire architecture of every coding agent – Claude Code, Cursor, OpenCode, Copilot. The details vary (streaming, parallel tool calls, safety checks), but the core loop is always the same. And you are about to build it in about 30 lines of Rust.
Goal
Implement SimpleAgent so that:
- It holds a provider and a collection of tools.
- You can register tools using a builder pattern (.tool(ReadTool::new())).
- The run() method implements the tool-calling loop: prompt -> provider -> tool calls -> tool results -> provider -> … -> final text.
Key Rust concepts
Generics with trait bounds
#![allow(unused)]
fn main() {
pub struct SimpleAgent<P: Provider> {
provider: P,
tools: ToolSet,
}
}
The <P: Provider> means SimpleAgent is generic over any type that
implements the Provider trait. When you use MockProvider, the compiler
generates code specialized for MockProvider. When you use
OpenRouterProvider, it generates code for that type. Same logic, different
providers.
ToolSet – a HashMap of trait objects
The tools field is a ToolSet, which wraps a HashMap<String, Box<dyn Tool>>
internally. Each value is a heap-allocated trait object that implements Tool,
but the concrete types can differ. One might be a ReadTool, the next a
BashTool. The HashMap key is the tool’s name, giving O(1) lookup when executing
tool calls.
Why trait objects (Box<dyn Tool>) instead of generics? Because you need a
heterogeneous collection. A Vec<T> requires all elements to be the same
type. With Box<dyn Tool>, you erase the concrete type and store them all
behind the same interface.
This is why the Tool trait uses #[async_trait] – the macro rewrites
async fn into a boxed future with a uniform type across different tool
implementations.
The builder pattern
The tool() method takes self by value (not &mut self) and returns Self:
#![allow(unused)]
fn main() {
pub fn tool(mut self, t: impl Tool + 'static) -> Self {
// push the tool
self
}
}
This lets you chain calls:
#![allow(unused)]
fn main() {
let agent = SimpleAgent::new(provider)
.tool(BashTool::new())
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(EditTool::new());
}
The impl Tool + 'static parameter accepts any type implementing Tool with
a 'static lifetime (meaning it does not borrow temporary data). Inside the
method, you push it into the ToolSet, which boxes it and indexes it by name.
The implementation
Open mini-claw-code-starter/src/agent.rs. The struct definition and method
signatures are provided.
Step 1: Implement new()
Store the provider and initialize an empty ToolSet:
#![allow(unused)]
fn main() {
pub fn new(provider: P) -> Self {
Self {
provider,
tools: ToolSet::new(),
}
}
}
This one is straightforward.
Step 2: Implement tool()
Push the tool into the set, return self:
#![allow(unused)]
fn main() {
pub fn tool(mut self, t: impl Tool + 'static) -> Self {
self.tools.push(t);
self
}
}
Step 3: Implement run() – the core loop
This is the heart of the agent. Here is the flow:
- Collect tool definitions from all registered tools.
- Create a messages vector starting with the user’s prompt.
- Loop:
  a. Call self.provider.chat(&messages, &defs) to get an AssistantTurn.
  b. Match on turn.stop_reason:
     - StopReason::Stop – the LLM is done, return turn.text.
     - StopReason::ToolUse – for each tool call:
       - Find the matching tool by name.
       - Call it with the arguments.
       - Collect the result.
  c. Push the AssistantTurn as a Message::Assistant.
  d. Push each tool result as a Message::ToolResult.
  e. Continue the loop.
Think about the data flow carefully. After executing tools, you push both the assistant’s turn (so the LLM can see what it requested) and the tool results (so it can see what happened). This gives the LLM full context to decide what to do next.
Gathering tool definitions
At the start of run(), collect all tool definitions from the ToolSet:
#![allow(unused)]
fn main() {
let defs = self.tools.definitions();
}
The loop structure
This is single_turn() (from Chapter 3) wrapped in a loop. Instead of
handling just one round, we match on stop_reason inside a loop:
#![allow(unused)]
fn main() {
loop {
let turn = self.provider.chat(&messages, &defs).await?;
match turn.stop_reason {
StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
StopReason::ToolUse => {
// Execute tool calls, collect results
// Push messages
}
}
}
}
Finding and calling tools
For each tool call, look it up by name in the ToolSet:
#![allow(unused)]
fn main() {
println!("{}", tool_summary(call));
let content = match self.tools.get(&call.name) {
Some(t) => t.call(call.arguments.clone()).await
.unwrap_or_else(|e| format!("error: {e}")),
None => format!("error: unknown tool `{}`", call.name),
};
}
The tool_summary() helper prints each tool call to the terminal – one line
per tool with its key argument, so you can watch what the agent does in real
time. For example: [bash: cat Cargo.toml] or [write: src/lib.rs].
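The exact shape of the helper in the starter may differ, but a minimal sketch could look like this (the ToolCall struct here is a simplified stand-in – the real one stores arguments as a serde_json::Value, from which the key argument is extracted):

```rust
// Simplified stand-in for the crate's ToolCall type.
struct ToolCall {
    name: String,
    key_arg: String, // e.g. a path or command, pulled from the arguments
}

// Format one tool call as a single summary line.
fn tool_summary(call: &ToolCall) -> String {
    format!("[{}: {}]", call.name, call.key_arg)
}

fn main() {
    let call = ToolCall { name: "read".into(), key_arg: "README.md".into() };
    assert_eq!(tool_summary(&call), "[read: README.md]");
}
```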
Error handling
Tool errors are caught with .unwrap_or_else() and converted into a string
that gets sent back to the LLM as a tool result. This is the same pattern from
Chapter 3, and it is critical here because the agent loop runs multiple
iterations. If a tool error crashed the loop, the agent would die on the first
missing file or failed command. Instead, the LLM sees the error and can
recover – try a different path, adjust the command, or explain the problem.
> What's in README.md?
[read: README.md] <-- tool fails (file not found)
[read: Cargo.toml] <-- LLM recovers, tries another file
Here is the project info from Cargo.toml...
Unknown tools are handled the same way – an error string as the tool result, not a crash.
Pushing messages
After executing all tool calls for a turn, push the assistant message and the
tool results. You need to collect results first (because the turn is moved
into Message::Assistant):
#![allow(unused)]
fn main() {
let mut results = Vec::new();
for call in &turn.tool_calls {
// ... execute and collect (id, content) pairs
}
messages.push(Message::Assistant(turn));
for (id, content) in results {
messages.push(Message::ToolResult { id, content });
}
}
The order matters: assistant message first, then tool results. This matches the format that LLM APIs expect.
Running the tests
Run the Chapter 5 tests:
cargo test -p mini-claw-code-starter ch5
What the tests verify
- test_ch5_text_response: Provider returns text immediately (no tools). Agent should return that text.
- test_ch5_single_tool_call: Provider first requests a read tool call, then returns text. Agent should execute the tool and return the final text.
- test_ch5_unknown_tool: Provider requests a tool that does not exist. Agent should handle it gracefully (return an error string as the tool result) and continue to get the final text.
- test_ch5_multi_step_loop: Provider requests read twice across two turns, then returns text. Verifies the loop runs multiple iterations.
- test_ch5_empty_response: Provider returns None for text and no tool calls. Agent should return an empty string.
- test_ch5_builder_chain: Verifies that .tool().tool() chaining compiles – a compile-time check for the builder pattern.
- test_ch5_tool_error_propagates: Provider requests a read on a file that does not exist. The error should be caught and sent back as a tool result. The LLM then responds with text. Verifies the loop does not crash on tool failures.
There are also additional edge-case tests (three-step loops, multi-tool pipelines, etc.) that will pass once your core implementation is correct.
Seeing it all work
Once the tests pass, take a moment to appreciate what you have built. With
about 30 lines of code in run(), you have a working agent loop. Here is what
happens when a test runs agent.run("Read test.txt"):
- Messages: [User("Read test.txt")]
- Provider returns: tool call for read with {"path": "test.txt"}
- Agent calls ReadTool::call(), gets file contents
- Messages: [User("Read test.txt"), Assistant(tool_call), ToolResult("file content")]
- Provider returns: text response
- Agent returns the text
The mock provider makes this deterministic and testable. But the exact same
loop works with a real LLM provider – you just swap MockProvider for
OpenRouterProvider.
Recap
The agent loop is the core of the framework:
- Generics (<P: Provider>) let it work with any provider.
- ToolSet (a HashMap of Box<dyn Tool>) gives O(1) tool lookup by name.
- The builder pattern makes setup ergonomic.
- Error resilience – tool errors are caught and sent back to the LLM, not propagated. The loop never crashes from a tool failure.
- The loop is simple: call provider, match on stop_reason, execute tools, feed results back, repeat.
What’s next
Your agent works, but only with the mock provider. In
Chapter 6: The OpenRouter Provider you will implement
OpenRouterProvider, which talks to a real LLM API over HTTP. This is what
turns your agent from a testing harness into a real, usable tool.
Chapter 6: The OpenRouter Provider
Up to now, everything has run locally with the MockProvider. In this chapter
you will implement OpenRouterProvider – a provider that talks to a real LLM
over HTTP using the OpenAI-compatible chat completions API.
This is the chapter that makes your agent real.
Goal
Implement OpenRouterProvider so that:
- It can be created with an API key and model name.
- It converts our internal Message and ToolDefinition types to the API format.
- It sends HTTP POST requests to the chat completions endpoint.
- It parses responses back into AssistantTurn.
Key Rust concepts
Serde derives and attributes
The API types in openrouter.rs are already provided – you do not need to
modify them. But understanding them helps:
#![allow(unused)]
fn main() {
#[derive(Serialize, Deserialize, Clone, Debug)]
pub(crate) struct ApiToolCall {
pub(crate) id: String,
#[serde(rename = "type")]
pub(crate) type_: String,
pub(crate) function: ApiFunction,
}
}
Key serde attributes used:
- #[serde(rename = "type")] – The JSON field is called "type", but type is a reserved keyword in Rust. So the struct field is type_ and serde renames it during serialization/deserialization.
- #[serde(skip_serializing_if = "Option::is_none")] – Omits the field from JSON if the value is None. This is important because the API expects certain fields to be absent (not null) when unused.
- #[serde(skip_serializing_if = "Vec::is_empty")] – Same idea for empty vectors. If there are no tools, we omit the tools field entirely.
The reqwest HTTP client
reqwest is the standard HTTP client crate in Rust. The pattern:
#![allow(unused)]
fn main() {
let response: MyType = client
.post(url)
.bearer_auth(&api_key)
.json(&body) // serialize body as JSON
.send()
.await
.context("request failed")?
.error_for_status() // turn 4xx/5xx into errors
.context("API returned error status")?
.json() // deserialize response as JSON
.await
.context("failed to parse response")?;
}
Each method returns a builder or future that you chain together. The ?
operator propagates errors at each step.
impl Into<String>
Several methods use impl Into<String> as a parameter type:
#![allow(unused)]
fn main() {
pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self
}
This accepts anything that can be converted into a String: String, &str,
Cow<str>, etc. Inside the method, call .into() to get the String:
#![allow(unused)]
fn main() {
api_key: api_key.into(),
model: model.into(),
}
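A std-only sketch of the same pattern outside the provider (the greet function is purely illustrative):

```rust
// Accepts anything convertible into String: &str, String, Cow<str>, ...
fn greet(name: impl Into<String>) -> String {
    let name: String = name.into();
    format!("hello, {name}")
}

fn main() {
    assert_eq!(greet("world"), "hello, world");
    assert_eq!(greet(String::from("rust")), "hello, rust");
    assert_eq!(greet(std::borrow::Cow::from("cow")), "hello, cow");
}
```

The caller picks whatever string type is convenient; the function body always works with an owned String.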
dotenvy
The dotenvy crate loads environment variables from a .env file:
#![allow(unused)]
fn main() {
let _ = dotenvy::dotenv(); // loads .env if present, ignores errors
let key = std::env::var("OPENROUTER_API_KEY")?;
}
The let _ = discards the result because it is fine if .env does not exist
(the variable might already be in the environment).
The API types
The file mini-claw-code-starter/src/providers/openrouter.rs starts with a block
of serde structs. These represent the OpenAI-compatible chat completions API
format. Here is a quick summary:
Request types:
- ChatRequest – the POST body: model name, messages, tools
- ApiMessage – a single message with role, content, optional tool calls
- ApiTool / ApiToolDef – tool definition in API format
Response types:
- ChatResponse – the API response: a list of choices
- Choice – a single choice containing a message and a finish_reason
- ResponseMessage – the assistant’s response: optional content, optional tool calls
The finish_reason field on Choice tells you why the model stopped
generating. Map it to StopReason in your chat() implementation:
"tool_calls" becomes StopReason::ToolUse, anything else becomes
StopReason::Stop.
These are already complete. Your job is to implement the methods that use them.
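Before moving on, here is the finish_reason mapping in isolation, using a local stand-in enum (the crate's real StopReason is defined elsewhere and this sketch only mirrors the two variants used here):

```rust
// Local stand-in for the crate's StopReason enum.
#[derive(Debug, PartialEq)]
enum StopReason { Stop, ToolUse }

// "tool_calls" means the model wants tools; anything else means it is done.
fn map_finish_reason(finish_reason: Option<&str>) -> StopReason {
    match finish_reason {
        Some("tool_calls") => StopReason::ToolUse,
        _ => StopReason::Stop,
    }
}

fn main() {
    assert_eq!(map_finish_reason(Some("tool_calls")), StopReason::ToolUse);
    assert_eq!(map_finish_reason(Some("stop")), StopReason::Stop);
    assert_eq!(map_finish_reason(None), StopReason::Stop);
}
```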
The implementation
Step 1: Implement new()
Initialize all four fields:
#![allow(unused)]
fn main() {
pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self {
Self {
client: reqwest::Client::new(),
api_key: api_key.into(),
model: model.into(),
base_url: "https://openrouter.ai/api/v1".into(),
}
}
}
Step 2: Implement base_url()
A simple builder method that overrides the base URL:
#![allow(unused)]
fn main() {
pub fn base_url(mut self, url: impl Into<String>) -> Self {
self.base_url = url.into();
self
}
}
Step 3: Implement from_env_with_model()
- Load .env with dotenvy::dotenv() (ignore the result).
- Read OPENROUTER_API_KEY from the environment.
- Call Self::new() with the key and model.
Use std::env::var("OPENROUTER_API_KEY") and chain .context(...) for a
clear error message if the key is missing.
Step 4: Implement from_env()
This is a one-liner that calls from_env_with_model with the default model
"openrouter/free". This is a free model on OpenRouter – no credits needed
to get started.
Step 5: Implement convert_messages()
This method translates our Message enum into the API’s ApiMessage format.
Iterate over the messages and match on each variant:
- Message::System(text) becomes an ApiMessage with role "system" and content: Some(text.clone()). The other fields are None.

- Message::User(text) becomes an ApiMessage with role "user" and content: Some(text.clone()). The other fields are None.

- Message::Assistant(turn) becomes an ApiMessage with role "assistant". Set content to turn.text.clone(). If turn.tool_calls is non-empty, convert each ToolCall to an ApiToolCall:

  #![allow(unused)]
  fn main() {
  ApiToolCall {
      id: c.id.clone(),
      type_: "function".into(),
      function: ApiFunction {
          name: c.name.clone(),
          arguments: c.arguments.to_string(), // Value -> String
      },
  }
  }

  If tool_calls is empty, set tool_calls: None (not Some(vec![])).

- Message::ToolResult { id, content } becomes an ApiMessage with role "tool", content: Some(content.clone()), and tool_call_id: Some(id.clone()).
Step 6: Implement convert_tools()
Map each &ToolDefinition to an ApiTool:
#![allow(unused)]
fn main() {
ApiTool {
type_: "function",
function: ApiToolDef {
name: t.name,
description: t.description,
parameters: t.parameters.clone(),
},
}
}
Step 7: Implement chat()
This is the main method. It brings everything together:
- Build a ChatRequest with the model, converted messages, and converted tools.
- POST it to {base_url}/chat/completions with bearer auth.
- Parse the response as ChatResponse.
- Extract the first choice.
- Convert tool_calls back to our ToolCall type.
The tool call conversion is the trickiest part. The API returns
function.arguments as a string (JSON-encoded), but our ToolCall stores
it as a serde_json::Value. So you need to parse it:
#![allow(unused)]
fn main() {
let arguments = serde_json::from_str(&tc.function.arguments)
.unwrap_or(Value::Null);
}
The unwrap_or(Value::Null) handles the case where the arguments string is
not valid JSON (unlikely with a well-behaved API, but good to be safe).
Here is the skeleton for the chat() method:
#![allow(unused)]
fn main() {
async fn chat(
&self,
messages: &[Message],
tools: &[&ToolDefinition],
) -> anyhow::Result<AssistantTurn> {
let body = ChatRequest {
model: &self.model,
messages: Self::convert_messages(messages),
tools: Self::convert_tools(tools),
};
let response: ChatResponse = self.client
.post(format!("{}/chat/completions", self.base_url))
// ... bearer_auth, json, send, error_for_status, json ...
;
let choice = response.choices.into_iter().next()
.context("no choices in response")?;
// Convert choice.message.tool_calls to Vec<ToolCall>
// Map finish_reason to StopReason
// Return AssistantTurn { text, tool_calls, stop_reason }
todo!()
}
}
Fill in the HTTP call chain and the response conversion logic.
Running the tests
Run the Chapter 6 tests:
cargo test -p mini-claw-code-starter ch6
The Chapter 6 tests verify the conversion methods (convert_messages and
convert_tools), the constructor logic, and the full chat() method using a
local mock HTTP server. They do not call a real LLM API, so no API key is
needed. There are also additional edge-case tests that will pass once your core
implementation is correct.
Optional: Live test
If you want to test with a real API, set up an OpenRouter API key:
- Sign up at openrouter.ai.
- Create an API key.
- Create a .env file in the workspace root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here
Then try building and running the chat example from Chapter 7. But first, finish reading this chapter and move on to Chapter 7 where you wire everything up.
Recap
You have implemented a real HTTP provider that:
- Constructs from an API key and model name (or from environment variables).
- Converts between your internal types and the OpenAI-compatible API format.
- Sends HTTP requests and parses responses.
The key patterns:
- Serde attributes for JSON field mapping (
rename,skip_serializing_if). reqwestfor HTTP with a fluent builder API.impl Into<String>for flexible string parameters.dotenvyfor loading.envfiles.
Your agent framework is now complete. Every piece – tools, the agent loop, and the HTTP provider – is implemented and tested.
What’s next
In Chapter 7: A Simple CLI you will wire everything into an interactive CLI with conversation memory.
Chapter 7: A Simple CLI
You have built every component: a mock provider for testing, four tools, the agent loop, and an HTTP provider. Now it is time to wire them all into a working CLI.
Goal
Add a chat() method to SimpleAgent and write examples/chat.rs so that:
- The agent remembers the conversation – each prompt builds on the previous ones.
- It prints >, reads a line, runs the agent, and prints the result.
- It shows a thinking... indicator while the agent works.
- It keeps running until the user presses Ctrl+D (EOF).
The chat() method
Open mini-claw-code-starter/src/agent.rs. Below run() you will see the chat()
method signature.
Why a new method?
run() creates a fresh Vec<Message> each time it is called. That means the
LLM has no memory of previous exchanges. A real CLI should carry context
forward, so the LLM can say “I already read that file” or “as I mentioned
earlier.”
chat() solves this by accepting the message history from the caller:
#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String>
}
The caller pushes Message::User(…) before calling, and chat() appends the
assistant turns. When it returns, messages contains the full conversation
history ready for the next round.
The implementation
The loop body is identical to run(). The only differences are:
- Use the provided messages instead of creating a new vec.
- On StopReason::Stop, clone the text before pushing Message::Assistant(turn) – the push moves turn, so you need the text first.
- Push Message::Assistant(turn) so the history includes the final response.
- Return the cloned text.
#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
let defs = self.tools.definitions();
loop {
let turn = self.provider.chat(messages, &defs).await?;
match turn.stop_reason {
StopReason::Stop => {
let text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));
return Ok(text);
}
StopReason::ToolUse => {
// Same tool execution as run() ...
}
}
}
}
}
The ToolUse branch is exactly the same as in run(): execute each tool,
collect results, push the assistant turn, push the tool results.
Ownership detail
In run() you could do return Ok(turn.text.unwrap_or_default()) directly
because the function was done with turn. In chat() you also need to push
Message::Assistant(turn) into the history. Since that push moves turn, you
must extract the text first:
#![allow(unused)]
fn main() {
let text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn)); // moves turn
return Ok(text); // return the clone
}
This is a one-line change from run(), but it matters.
The CLI
Open mini-claw-code-starter/examples/chat.rs. You will see a skeleton with
unimplemented!(). Replace it with the full program.
Step 1: Imports
#![allow(unused)]
fn main() {
use mini_claw_code_starter::{
BashTool, EditTool, Message, OpenRouterProvider, ReadTool, SimpleAgent, WriteTool,
};
use std::io::{self, BufRead, Write};
}
Note the Message import – you need it to build the history vector.
Step 2: Create the provider and agent
#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = SimpleAgent::new(provider)
.tool(BashTool::new())
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(EditTool::new());
}
Same as before – nothing new here. (In Chapter 11
you’ll add AskTool here so the agent can ask you clarifying questions.)
Step 3: The system prompt and history vector
#![allow(unused)]
fn main() {
let cwd = std::env::current_dir()?.display().to_string();
let mut history: Vec<Message> = vec![Message::System(format!(
"You are a coding agent. Help the user with software engineering tasks \
using all available tools. Be concise and precise.\n\n\
Working directory: {cwd}"
))];
}
The system prompt is the first message in the history. It tells the LLM what role it should play. Two things to note:
- No tool names in the prompt. Tool definitions are sent separately to the API. The system prompt focuses on behavior – be a coding agent, use whatever tools are available, be concise.

- Working directory is included. The LLM needs to know where it is so that tool calls like read and bash use correct paths. This is what real coding agents do – Claude Code, OpenCode, and Kimi CLI all inject the current directory (and sometimes platform, date, etc.) into their system prompts.
The history vector lives outside the loop and accumulates every user prompt, assistant response, and tool result across the entire session. The system prompt stays at the front, giving the LLM consistent instructions on every turn.
Step 4: The REPL loop
#![allow(unused)]
fn main() {
let stdin = io::stdin();
loop {
print!("> ");
io::stdout().flush()?;
let mut line = String::new();
if stdin.lock().read_line(&mut line)? == 0 {
println!();
break;
}
let prompt = line.trim();
if prompt.is_empty() {
continue;
}
history.push(Message::User(prompt.to_string()));
print!(" thinking...");
io::stdout().flush()?;
match agent.chat(&mut history).await {
Ok(text) => {
print!("\x1b[2K\r");
println!("{}\n", text.trim());
}
Err(e) => {
print!("\x1b[2K\r");
println!("error: {e}\n");
}
}
}
}
A few things to note:
- history.push(Message::User(…)) adds the prompt before calling the agent. chat() will append the rest.
- print!(" thinking...") shows a status while the agent works. The flush() is needed because print! (no newline) does not flush automatically.
- \x1b[2K\r is an ANSI escape sequence: “erase entire line, move cursor to column 1.” This clears the thinking... text before printing the response. It also gets cleared automatically when the agent prints a tool summary (since tool_summary() uses the same escape).
- stdout.flush()? after print! ensures the prompt and thinking indicator appear immediately.
- read_line returns 0 on EOF (Ctrl+D), which breaks the loop.
- Errors from the agent are printed instead of crashing – this keeps the loop alive even if one request fails.
The main function
Wrap everything in an async main:
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Steps 1-4 go here
Ok(())
}
The complete program
Putting it all together, the entire program is about 45 lines. That is the beauty of the framework you built – the final assembly is straightforward because each component has a clean interface.
Running the full test suite
Run the full test suite:
cargo test -p mini-claw-code-starter
This runs all tests from chapters 1 through 7. If everything passes, congratulations – your agent framework is complete and fully tested.
What the tests verify
The Chapter 7 tests are integration tests that combine all components:
- Write-then-read flows: Write a file, read it back, verify contents.
- Edit flows: Write a file, edit it, read back the result.
- Multi-tool pipelines: Use bash, write, edit, and read across multiple turns.
- Long conversations: Five-step tool-call sequences.
There are about 10 integration tests that exercise the full agent pipeline.
Running the chat example
To try it with a real LLM, you need an API key. Create a .env file in the
workspace root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here
Then run:
cargo run -p mini-claw-code-starter --example chat
You will get an interactive prompt. Try a multi-turn conversation:
> List the files in the current directory
thinking...
[bash: ls]
Cargo.toml src/ examples/ ...
> What is in Cargo.toml?
thinking...
[read: Cargo.toml]
The Cargo.toml contains the package definition for mini-claw-code-starter...
> Add a new dependency for serde
thinking...
[read: Cargo.toml]
[edit: Cargo.toml]
Done! I added serde to the dependencies.
>
Notice how the second prompt (“What is in Cargo.toml?”) works without repeating context – the LLM already knows the directory listing from the first exchange. That is conversation history at work.
Press Ctrl+D (or Ctrl+C) to exit.
What you have built
Let’s step back and look at the complete picture:
examples/chat.rs
|
| creates
v
SimpleAgent<OpenRouterProvider>
|
| holds
+---> OpenRouterProvider (HTTP to LLM API)
+---> ToolSet (HashMap<String, Box<dyn Tool>>)
|
+---> BashTool
+---> ReadTool
+---> WriteTool
+---> EditTool
The chat() method drives the interaction:
User prompt
|
v
history: [User, Assistant, ToolResult, ..., User]
|
v
Provider.chat() ---HTTP---> LLM API
|
| AssistantTurn
v
Tool calls? ----yes---> Execute tools ---> append to history ---> loop
|
no
|
v
Append final Assistant to history, return text
In about 300 lines of Rust across all files, you have:
- A trait-based tool system with JSON schema definitions.
- A generic agent loop that works with any provider.
- A mock provider for deterministic testing.
- An HTTP provider for real LLM APIs.
- A CLI with conversation memory that ties it all together.
Where to go from here
This framework is intentionally minimal. Here are ideas for extending it:
Streaming responses – Instead of waiting for the full response, stream
tokens as they arrive. This means changing chat() to return a Stream
instead of a single AssistantTurn.
Token limits – Track token usage and truncate old messages when the context window fills up.
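That idea can be sketched with a hypothetical helper (character counts stand in for real token counting, which would use the API's usage numbers or a tokenizer):

```rust
// Hypothetical truncation sketch: keep the message at index 0 (the system
// prompt) and drop the oldest turns until the history fits a budget.
// Character counts stand in for real token counting.
fn truncate(history: &mut Vec<String>, max_chars: usize) {
    while history.len() > 2
        && history.iter().map(|m| m.len()).sum::<usize>() > max_chars
    {
        history.remove(1); // index 0 is the system prompt; 1 is the oldest turn
    }
}

fn main() {
    let mut h: Vec<String> = vec![
        "system".into(), "old turn".into(), "newer".into(), "newest".into(),
    ];
    truncate(&mut h, 20);
    assert_eq!(h[0], "system"); // the system prompt always survives
    assert!(h.len() < 4);       // at least one old turn was dropped
}
```

A real implementation would also take care not to drop a tool result without its matching assistant tool call, or the API will reject the history.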
More tools – Add a web search tool, a database query tool, or anything
else you can imagine. The Tool trait makes it easy to plug in new
capabilities.
A richer UI – Add a spinner animation, markdown rendering, or collapsed
tool call display. See mini-claw-code/examples/tui.rs for an example that does
all three using termimad.
The foundation you built is solid. Every extension is a matter of adding to the
existing patterns, not rewriting them. The Provider trait, the Tool trait,
and the agent loop are the building blocks for anything you want to build next.
What’s next
Head to Chapter 8: The Singularity – your agent can now modify its own source code, and we will talk about what that means and where to go from here.
Chapter 8: The Singularity
Your agent can edit itself and it starts self-evolving. You don’t need to write any code starting from now.
Extensions
The extension chapters that follow walk through the reference implementation. You don’t need to write the code yourself – read them to understand the design, then let your agent implement them (or do it yourself for practice):
- Chapter 9: A Better TUI – Markdown rendering, spinners, collapsed tool calls.
- Chapter 10: Streaming – Stream tokens as they arrive with `StreamingAgent`.
- Chapter 11: User Input – Let the LLM ask you clarifying questions.
- Chapter 12: Plan Mode – Read-only planning with approval gating.
Beyond the extension chapters, here are more ideas to explore:
- Parallel tool calls – Execute concurrent tool calls with `tokio::join!`.
- Token tracking – Truncate old messages when approaching the context limit.
- More tools – Web search, database queries, HTTP requests. The `Tool` trait makes it easy.
- MCP – Expose your tools as an MCP server or connect to external ones.
Chapter 9: A Better TUI
The chat.rs CLI works, but it dumps plain text and shows every tool call. A
real coding agent deserves markdown rendering, a thinking spinner, and
collapsed tool calls when the agent gets busy.
See mini-claw-code/examples/tui.rs for a reference implementation. It uses:
- `termimad` for inline markdown rendering in the terminal.
- `crossterm` for raw terminal mode (used by the arrow-key selection UI in Chapter 11).
- An animated spinner (`⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏`) that ticks while the agent thinks.
- Collapsed tool calls: after 3 tool calls, subsequent ones are collapsed into a `... and N more` counter to keep the output clean.
The TUI builds on the AgentEvent stream from StreamingAgent (Chapter 10).
The event loop uses tokio::select! to multiplex three sources:
- Agent events (`AgentEvent::TextDelta`, `ToolCall`, `Done`, `Error`) – render streaming text, tool summaries, or final output.
- User input requests from `AskTool` (Chapter 11) – pause the spinner and show a text prompt or arrow-key selection list.
- Timer ticks – advance the spinner animation.
This chapter is exposition only – no code to write. Read through
examples/tui.rs to see how the pieces fit together, or ask your mini-claw-code
agent to build a TUI for you.
Chapter 10: Streaming
In Chapter 6 you built OpenRouterProvider::chat(), which waits for the
entire response before returning. That works, but the user stares at a blank
screen until every token has been generated. Real coding agents print tokens as
they arrive – that is streaming.
This chapter adds streaming support and a StreamingAgent – the streaming
counterpart to SimpleAgent. You will:
- Define a `StreamEvent` enum that represents real-time deltas.
- Build a `StreamAccumulator` that collects deltas into a complete `AssistantTurn`.
- Write a `parse_sse_line()` function that converts raw Server-Sent Events into `StreamEvent`s.
- Define a `StreamProvider` trait – the streaming counterpart to `Provider`.
- Implement `StreamProvider` for `OpenRouterProvider`.
- Build a `MockStreamProvider` for testing without HTTP.
- Build `StreamingAgent<P: StreamProvider>` – a full agent loop with real-time text streaming.
None of this touches the Provider trait or SimpleAgent. Streaming is
layered on top of the existing architecture.
Why streaming?
Without streaming, a long response (say 500 tokens) makes the CLI feel frozen. Streaming fixes three things:
- Immediate feedback – the user sees the first word within milliseconds instead of waiting seconds for the full response.
- Early cancellation – if the agent is heading in the wrong direction, the user can Ctrl-C without waiting for the full response.
- Progress visibility – watching tokens arrive confirms the agent is working, not stuck.
How SSE works
The OpenAI-compatible API supports streaming via
Server-Sent Events (SSE).
You set "stream": true in the request, and instead of one big JSON response,
the server sends a series of text lines:
data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}
data: {"choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Each line starts with data: followed by a JSON object (or the sentinel
[DONE]). The key difference from the non-streaming response: instead of a
message field with the complete text, each chunk has a delta field with
just the new part. Your code reads these deltas one by one, prints them
immediately, and accumulates them into the final result.
Here is the flow:
sequenceDiagram
participant A as Agent
participant L as LLM (SSE)
participant U as User
A->>L: POST /chat/completions (stream: true)
L-->>A: data: {"delta":{"content":"Hello"}}
A->>U: print "Hello"
L-->>A: data: {"delta":{"content":" world"}}
A->>U: print " world"
L-->>A: data: [DONE]
A->>U: (done)
Tool calls stream the same way, but with tool_calls deltas instead of
content deltas. The tool call’s name and arguments arrive in pieces that you
concatenate.
StreamEvent
Open mini-claw-code/src/streaming.rs. The StreamEvent enum is our domain type
for streaming deltas:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
/// A chunk of assistant text.
TextDelta(String),
/// A new tool call has started.
ToolCallStart { index: usize, id: String, name: String },
/// More argument JSON for a tool call in progress.
ToolCallDelta { index: usize, arguments: String },
/// The stream is complete.
Done,
}
}
This is the interface between the SSE parser and the rest of the application.
The parser produces StreamEvents; the UI consumes them for display; the
accumulator collects them into an AssistantTurn.
StreamAccumulator
The accumulator is a simple state machine. It keeps a running text buffer
and a list of partial tool calls. Each feed() call appends to the
appropriate place:
#![allow(unused)]
fn main() {
pub struct StreamAccumulator {
text: String,
tool_calls: Vec<PartialToolCall>,
}
impl StreamAccumulator {
pub fn new() -> Self { /* ... */ }
pub fn feed(&mut self, event: &StreamEvent) { /* ... */ }
pub fn finish(self) -> AssistantTurn { /* ... */ }
}
}
The implementation is straightforward:
- `TextDelta` → append to `self.text`.
- `ToolCallStart` → grow the `tool_calls` vec if needed, set the `id` and `name` at the given index.
- `ToolCallDelta` → append to the arguments string at the given index.
- `Done` → no-op (we handle completion in `finish()`).
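Written out, that logic is only a match statement. Here is a self-contained sketch, with the chapter's `StreamEvent` and a `PartialToolCall` (the `id`/`name`/`arguments` fields described above) inlined so the example compiles on its own:

```rust
#[derive(Debug, Clone)]
enum StreamEvent {
    TextDelta(String),
    ToolCallStart { index: usize, id: String, name: String },
    ToolCallDelta { index: usize, arguments: String },
    Done,
}

#[derive(Default)]
struct PartialToolCall { id: String, name: String, arguments: String }

#[derive(Default)]
struct StreamAccumulator { text: String, tool_calls: Vec<PartialToolCall> }

impl StreamAccumulator {
    fn feed(&mut self, event: &StreamEvent) {
        match event {
            StreamEvent::TextDelta(t) => self.text.push_str(t),
            StreamEvent::ToolCallStart { index, id, name } => {
                // Grow the vec so `index` is valid, then record id and name.
                while self.tool_calls.len() <= *index {
                    self.tool_calls.push(PartialToolCall::default());
                }
                self.tool_calls[*index].id = id.clone();
                self.tool_calls[*index].name = name.clone();
            }
            // Assumes the matching ToolCallStart arrived first.
            StreamEvent::ToolCallDelta { index, arguments } => {
                self.tool_calls[*index].arguments.push_str(arguments);
            }
            StreamEvent::Done => {} // completion is handled in finish()
        }
    }
}

fn main() {
    let mut acc = StreamAccumulator::default();
    acc.feed(&StreamEvent::TextDelta("Hel".into()));
    acc.feed(&StreamEvent::TextDelta("lo".into()));
    acc.feed(&StreamEvent::ToolCallStart { index: 0, id: "c1".into(), name: "read".into() });
    acc.feed(&StreamEvent::ToolCallDelta { index: 0, arguments: "{\"pa".into() });
    acc.feed(&StreamEvent::ToolCallDelta { index: 0, arguments: "th\": \"f.txt\"}".into() });
    acc.feed(&StreamEvent::Done);
    assert_eq!(acc.text, "Hello");
    assert_eq!(acc.tool_calls[0].arguments, "{\"path\": \"f.txt\"}");
}
```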
finish() consumes the accumulator and builds an AssistantTurn:
#![allow(unused)]
fn main() {
pub fn finish(self) -> AssistantTurn {
let text = if self.text.is_empty() { None } else { Some(self.text) };
let tool_calls: Vec<ToolCall> = self.tool_calls
.into_iter()
.filter(|tc| !tc.name.is_empty())
.map(|tc| ToolCall {
id: tc.id,
name: tc.name,
arguments: serde_json::from_str(&tc.arguments)
.unwrap_or(Value::Null),
})
.collect();
let stop_reason = if tool_calls.is_empty() {
StopReason::Stop
} else {
StopReason::ToolUse
};
AssistantTurn { text, tool_calls, stop_reason }
}
}
Notice that `arguments` is accumulated as a raw string and only parsed as JSON
at the very end. This is because the API sends argument fragments like
`{"pa` and `th": "f.txt"}` – neither fragment is valid JSON until they are
concatenated.
Parsing SSE lines
The parse_sse_line() function takes a single line from the SSE stream and
returns zero or more StreamEvents:
#![allow(unused)]
fn main() {
pub fn parse_sse_line(line: &str) -> Option<Vec<StreamEvent>> {
let data = line.strip_prefix("data: ")?;
if data == "[DONE]" {
return Some(vec![StreamEvent::Done]);
}
let chunk: ChunkResponse = serde_json::from_str(data).ok()?;
// ... extract events from chunk.choices[0].delta
}
}
The SSE chunk types mirror the OpenAI delta format:
#![allow(unused)]
fn main() {
#[derive(Deserialize)]
struct ChunkResponse { choices: Vec<ChunkChoice> }
#[derive(Deserialize)]
struct ChunkChoice { delta: Delta, finish_reason: Option<String> }
#[derive(Deserialize)]
struct Delta {
content: Option<String>,
tool_calls: Option<Vec<DeltaToolCall>>,
}
}
For tool calls, the first chunk includes id and function.name (indicating
a new tool call). Subsequent chunks only have function.arguments fragments.
The parser emits ToolCallStart when id is present, and ToolCallDelta for
non-empty argument strings.
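That emission rule can be sketched on its own. The structs below are built by hand rather than deserialized so the example runs without serde, and the `DeltaToolCall` fields (`index`, `id`, `function`) are assumptions following the OpenAI delta format where the chapter elides them:

```rust
struct Delta { content: Option<String>, tool_calls: Option<Vec<DeltaToolCall>> }
struct DeltaToolCall { index: usize, id: Option<String>, function: DeltaFunction }
struct DeltaFunction { name: Option<String>, arguments: Option<String> }

#[derive(Debug, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    ToolCallStart { index: usize, id: String, name: String },
    ToolCallDelta { index: usize, arguments: String },
}

fn events_from_delta(delta: Delta) -> Vec<StreamEvent> {
    let mut events = Vec::new();
    if let Some(text) = delta.content {
        if !text.is_empty() {
            events.push(StreamEvent::TextDelta(text));
        }
    }
    for tc in delta.tool_calls.unwrap_or_default() {
        // A present `id` marks the start of a new tool call.
        if let Some(id) = tc.id {
            events.push(StreamEvent::ToolCallStart {
                index: tc.index,
                id,
                name: tc.function.name.clone().unwrap_or_default(),
            });
        }
        // Argument fragments arrive in this and later chunks.
        if let Some(args) = tc.function.arguments {
            if !args.is_empty() {
                events.push(StreamEvent::ToolCallDelta { index: tc.index, arguments: args });
            }
        }
    }
    events
}

fn main() {
    let delta = Delta {
        content: None,
        tool_calls: Some(vec![DeltaToolCall {
            index: 0,
            id: Some("call_1".into()),
            function: DeltaFunction { name: Some("read".into()), arguments: None },
        }]),
    };
    assert_eq!(events_from_delta(delta).len(), 1); // one ToolCallStart
}
```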
StreamProvider trait
Just as Provider defines the non-streaming interface, StreamProvider
defines the streaming one:
#![allow(unused)]
fn main() {
pub trait StreamProvider: Send + Sync {
fn stream_chat<'a>(
&'a self,
messages: &'a [Message],
tools: &'a [&'a ToolDefinition],
tx: mpsc::UnboundedSender<StreamEvent>,
) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
}
The key difference from Provider::chat() is the tx parameter – an mpsc
channel sender. The implementation sends StreamEvents through this channel
as they arrive and returns the final accumulated AssistantTurn. This gives
callers both real-time events and the complete result.
We keep StreamProvider separate from Provider rather than adding a method
to the existing trait. This means SimpleAgent and all existing code are
completely unaffected.
Implementing StreamProvider for OpenRouterProvider
The implementation ties together SSE parsing, the accumulator, and the channel:
#![allow(unused)]
fn main() {
impl StreamProvider for OpenRouterProvider {
async fn stream_chat(
&self,
messages: &[Message],
tools: &[&ToolDefinition],
tx: mpsc::UnboundedSender<StreamEvent>,
) -> anyhow::Result<AssistantTurn> {
// 1. Build request with stream: true
// 2. Send HTTP request
// 3. Read response chunks in a loop:
// - Buffer incoming bytes
// - Split on newlines
// - parse_sse_line() each complete line
// - feed() each event into the accumulator
// - send each event through tx
// 4. Return acc.finish()
}
}
}
The buffering detail is important. HTTP responses may arrive in arbitrary byte
chunks that do not align with SSE line boundaries. So we maintain a String
buffer, append each chunk, and process only complete lines (splitting on \n):
#![allow(unused)]
fn main() {
let mut buffer = String::new();
while let Some(chunk) = resp.chunk().await? {
buffer.push_str(&String::from_utf8_lossy(&chunk));
while let Some(newline_pos) = buffer.find('\n') {
let line = buffer[..newline_pos].trim_end_matches('\r').to_string();
buffer = buffer[newline_pos + 1..].to_string();
if line.is_empty() { continue; }
if let Some(events) = parse_sse_line(&line) {
for event in events {
acc.feed(&event);
let _ = tx.send(event);
}
}
}
}
}
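The buffering technique can be exercised in isolation with simulated chunks, without any HTTP:

```rust
// The line-buffering technique in isolation: bytes arrive in arbitrary
// chunks, and only complete lines are handed to the parser.
fn complete_lines(chunks: &[&str]) -> Vec<String> {
    let mut buffer = String::new();
    let mut lines = Vec::new();
    for chunk in chunks {
        buffer.push_str(chunk); // a chunk may end mid-line
        while let Some(pos) = buffer.find('\n') {
            let line = buffer[..pos].trim_end_matches('\r').to_string();
            buffer = buffer[pos + 1..].to_string();
            if !line.is_empty() {
                lines.push(line);
            }
        }
    }
    lines
}

fn main() {
    // "data: [DONE]" is split across two chunks but comes out whole.
    let lines = complete_lines(&["data: x\nda", "ta: [DONE]\n"]);
    assert_eq!(lines, vec!["data: x", "data: [DONE]"]);
}
```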
MockStreamProvider
For testing, we need a streaming provider that does not make HTTP calls.
MockStreamProvider wraps the existing MockProvider and synthesizes
StreamEvents from each canned AssistantTurn:
#![allow(unused)]
fn main() {
pub struct MockStreamProvider {
inner: MockProvider,
}
impl StreamProvider for MockStreamProvider {
async fn stream_chat(
&self,
messages: &[Message],
tools: &[&ToolDefinition],
tx: mpsc::UnboundedSender<StreamEvent>,
) -> anyhow::Result<AssistantTurn> {
let turn = self.inner.chat(messages, tools).await?;
// Synthesize stream events from the complete turn
if let Some(ref text) = turn.text {
for ch in text.chars() {
let _ = tx.send(StreamEvent::TextDelta(ch.to_string()));
}
}
for (i, call) in turn.tool_calls.iter().enumerate() {
let _ = tx.send(StreamEvent::ToolCallStart {
index: i, id: call.id.clone(), name: call.name.clone(),
});
let _ = tx.send(StreamEvent::ToolCallDelta {
index: i, arguments: call.arguments.to_string(),
});
}
let _ = tx.send(StreamEvent::Done);
Ok(turn)
}
}
}
It sends text one character at a time (simulating token-by-token streaming)
and each tool call as a start + delta pair. This lets us test StreamingAgent
without any network calls.
StreamingAgent
Now for the main event. StreamingAgent is the streaming counterpart to
SimpleAgent. It has the same structure – a provider, a tool set, and an
agent loop – but it uses StreamProvider and emits AgentEvent::TextDelta
events in real time:
#![allow(unused)]
fn main() {
pub struct StreamingAgent<P: StreamProvider> {
provider: P,
tools: ToolSet,
}
impl<P: StreamProvider> StreamingAgent<P> {
pub fn new(provider: P) -> Self { /* ... */ }
pub fn tool(mut self, t: impl Tool + 'static) -> Self { /* ... */ }
pub async fn run(
&self,
prompt: &str,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> { /* ... */ }
pub async fn chat(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> { /* ... */ }
}
}
The chat() method is the heart of the streaming agent. Let us walk through
it:
#![allow(unused)]
fn main() {
pub async fn chat(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
let defs = self.tools.definitions();
loop {
// 1. Set up a stream channel
let (stream_tx, mut stream_rx) = mpsc::unbounded_channel();
// 2. Spawn a forwarder that converts StreamEvent::TextDelta
// into AgentEvent::TextDelta for the UI
let events_clone = events.clone();
let forwarder = tokio::spawn(async move {
while let Some(event) = stream_rx.recv().await {
if let StreamEvent::TextDelta(text) = event {
let _ = events_clone.send(AgentEvent::TextDelta(text));
}
}
});
// 3. Call stream_chat — this streams AND returns the turn
let turn = self.provider.stream_chat(messages, &defs, stream_tx).await?;
let _ = forwarder.await;
// 4. Same stop_reason logic as SimpleAgent
match turn.stop_reason {
StopReason::Stop => {
let text = turn.text.clone().unwrap_or_default();
let _ = events.send(AgentEvent::Done(text.clone()));
messages.push(Message::Assistant(turn));
return Ok(text);
}
StopReason::ToolUse => {
// Execute tools, push results, continue loop
// (same pattern as SimpleAgent)
}
}
}
}
}
The architecture has two channels flowing simultaneously:
flowchart LR
SC["stream_chat()"] -- "StreamEvent" --> CH["mpsc channel"]
CH --> FW["forwarder task"]
FW -- "AgentEvent::TextDelta" --> UI["UI / events channel"]
SC -- "feeds" --> ACC["StreamAccumulator"]
ACC -- "finish()" --> TURN["AssistantTurn"]
TURN --> LOOP["Agent loop"]
The forwarder task is a bridge: it receives raw StreamEvents from the
provider and converts TextDelta events into AgentEvent::TextDelta for the
UI. This keeps the provider’s streaming protocol separate from the agent’s
event protocol.
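The same bridge shape can be exercised with std threads and channels – a std-only stand-in for the tokio version, with both enums trimmed to the variants the example needs:

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
enum StreamEvent { TextDelta(String), Done }

#[derive(Debug, PartialEq)]
enum AgentEvent { TextDelta(String) }

fn main() {
    let (stream_tx, stream_rx) = mpsc::channel::<StreamEvent>();
    let (ui_tx, ui_rx) = mpsc::channel::<AgentEvent>();

    // Forwarder: translate TextDelta for the UI, drop everything else.
    let forwarder = thread::spawn(move || {
        for event in stream_rx {
            if let StreamEvent::TextDelta(t) = event {
                let _ = ui_tx.send(AgentEvent::TextDelta(t));
            }
        }
    });

    stream_tx.send(StreamEvent::TextDelta("hi".into())).unwrap();
    stream_tx.send(StreamEvent::Done).unwrap();
    drop(stream_tx); // closing the channel ends the forwarder loop
    forwarder.join().unwrap();

    assert_eq!(ui_rx.recv().unwrap(), AgentEvent::TextDelta("hi".into()));
}
```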
Notice that AgentEvent now has a TextDelta variant:
#![allow(unused)]
fn main() {
pub enum AgentEvent {
TextDelta(String), // NEW — streaming text chunks
ToolCall { name: String, summary: String },
Done(String),
Error(String),
}
}
Using StreamingAgent in the TUI
The TUI example (examples/tui.rs) uses StreamingAgent for the full
experience:
#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = Arc::new(
StreamingAgent::new(provider)
.tool(BashTool::new())
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(EditTool::new()),
);
}
The agent is wrapped in Arc so it can be shared with spawned tasks. Each
turn spawns the agent and processes events with a spinner:
#![allow(unused)]
fn main() {
let (tx, mut rx) = mpsc::unbounded_channel();
let agent = agent.clone();
let mut msgs = std::mem::take(&mut history);
let handle = tokio::spawn(async move {
let _ = agent.chat(&mut msgs, tx).await;
msgs
});
// UI event loop — print TextDeltas, show spinner for tool calls
loop {
tokio::select! {
event = rx.recv() => {
match event {
Some(AgentEvent::TextDelta(text)) => print!("{text}"),
Some(AgentEvent::ToolCall { summary, .. }) => { /* spinner */ },
Some(AgentEvent::Done(_)) => break,
// ...
}
}
_ = tick.tick() => { /* animate spinner */ }
}
}
}
Compare this to the SimpleAgent version from Chapter 9: the structure is
almost identical. The only difference is that TextDelta events let us print
tokens as they arrive instead of waiting for the full Done event.
Running the tests
cargo test -p mini-claw-code ch10
The tests verify:
- Accumulator: text assembly, tool call assembly, mixed events, empty input, multiple parallel tool calls.
- SSE parsing: text deltas, tool call start/delta, `[DONE]`, non-data lines, empty deltas, invalid JSON, full multi-line sequences.
- MockStreamProvider: text responses synthesize char-by-char events; tool call responses synthesize start + delta events.
- StreamingAgent: text-only responses, tool call loops, and multi-turn chat history – all using `MockStreamProvider` for deterministic testing.
- Integration: mock TCP servers that send real SSE responses to `stream_chat()` and verify both the returned `AssistantTurn` and the events sent through the channel.
Recap
- `StreamEvent` represents real-time deltas: text chunks, tool call starts, argument fragments, and completion.
- `StreamAccumulator` collects deltas into a complete `AssistantTurn`.
- `parse_sse_line()` converts raw SSE `data:` lines into `StreamEvent`s.
- `StreamProvider` is the streaming counterpart to `Provider` – it adds an `mpsc` channel parameter for real-time events.
- `MockStreamProvider` wraps `MockProvider` to synthesize streaming events for testing.
- `StreamingAgent` is the streaming counterpart to `SimpleAgent` – same tool loop, but with real-time `TextDelta` events forwarded to the UI.
- The `Provider` trait and `SimpleAgent` are unchanged. Streaming is an additive feature layered on top.
Chapter 11: User Input
Your agent can read files, run commands, and write code – but it can’t ask you a question. If it’s unsure which approach to take, which file to target, or whether to proceed with a destructive operation, it just guesses.
Real coding agents solve this with an ask tool. Claude Code has
AskUserQuestion, Kimi CLI has approval prompts. The LLM calls a special tool,
the agent pauses, and the user types an answer. The answer goes back as a tool
result and execution continues.
In this chapter you’ll build:
- An `InputHandler` trait that abstracts how user input is collected.
- An `AskTool` that the LLM calls to ask the user a question.
- Three handler implementations: CLI, channel-based (for TUI), and mock (for tests).
Why a trait?
Different UIs collect input differently:
- A CLI app prints to stdout and reads from stdin.
- A TUI app sends a request through a channel and waits for the event loop to collect the answer (maybe with arrow-key selection).
- Tests need to provide canned answers without any I/O.
The InputHandler trait lets AskTool work with all three without knowing
which one it’s using:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait InputHandler: Send + Sync {
async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String>;
}
}
The question is what the LLM wants to ask. The options slice is an optional
list of choices – if empty, the user types free-text. If non-empty, the UI can
present a selection list.
AskTool
AskTool implements the Tool trait. It takes an Arc<dyn InputHandler> so
the handler can be shared across threads:
#![allow(unused)]
fn main() {
pub struct AskTool {
definition: ToolDefinition,
handler: Arc<dyn InputHandler>,
}
}
Tool definition
The LLM needs to know what parameters the tool accepts. question is required
(a string). options is optional (an array of strings).
For options, we need a JSON schema for an array type – something param()
can’t express since it only handles scalar types. So first, add param_raw()
to ToolDefinition:
#![allow(unused)]
fn main() {
/// Add a parameter with a raw JSON schema value.
///
/// Use this for complex types (arrays, nested objects) that `param()` can't express.
pub fn param_raw(mut self, name: &str, schema: Value, required: bool) -> Self {
self.parameters["properties"][name] = schema;
if required {
self.parameters["required"]
.as_array_mut()
.unwrap()
.push(serde_json::Value::String(name.to_string()));
}
self
}
}
Now the tool definition uses both param() and param_raw():
#![allow(unused)]
fn main() {
impl AskTool {
pub fn new(handler: Arc<dyn InputHandler>) -> Self {
Self {
definition: ToolDefinition::new(
"ask_user",
"Ask the user a clarifying question...",
)
.param("question", "string", "The question to ask the user", true)
.param_raw(
"options",
json!({
"type": "array",
"items": { "type": "string" },
"description": "Optional list of choices to present to the user"
}),
false,
),
handler,
}
}
}
}
Tool::call
The call implementation extracts question, parses options with a helper,
and delegates to the handler:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl Tool for AskTool {
fn definition(&self) -> &ToolDefinition {
&self.definition
}
async fn call(&self, args: Value) -> anyhow::Result<String> {
let question = args
.get("question")
.and_then(|v| v.as_str())
.ok_or_else(|| anyhow::anyhow!("missing required parameter: question"))?;
let options = parse_options(&args);
self.handler.ask(question, &options).await
}
}
/// Extract the optional `options` array from tool arguments.
fn parse_options(args: &Value) -> Vec<String> {
args.get("options")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(String::from))
.collect()
})
.unwrap_or_default()
}
}
The parse_options helper keeps call() focused on the happy path. If
options is missing or not an array, it defaults to an empty vec – the
handler treats this as free-text input.
Three handlers
CliInputHandler
The simplest handler. Prints the question, lists numbered choices (if any), reads a line from stdin, and resolves numbered answers:
#![allow(unused)]
fn main() {
pub struct CliInputHandler;
#[async_trait::async_trait]
impl InputHandler for CliInputHandler {
async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
let question = question.to_string();
let options = options.to_vec();
// spawn_blocking because stdin is synchronous
tokio::task::spawn_blocking(move || {
// Display the question and numbered choices (if any)
println!("\n {question}");
for (i, opt) in options.iter().enumerate() {
println!(" {}) {opt}", i + 1);
}
// Read the answer
print!(" > ");
io::stdout().flush()?;
let mut line = String::new();
io::stdin().lock().read_line(&mut line)?;
let answer = line.trim().to_string();
// If the user typed a valid option number, resolve it
Ok(resolve_option(&answer, &options))
}).await?
}
}
/// If `answer` is a number matching one of the options, return that option.
/// Otherwise return the raw answer.
fn resolve_option(answer: &str, options: &[String]) -> String {
if let Ok(n) = answer.parse::<usize>()
&& n >= 1
&& n <= options.len()
{
return options[n - 1].clone();
}
answer.to_string()
}
}
The `resolve_option` helper keeps the closure body clean. It uses let-chain
syntax (stabilized in Rust 1.88 / edition 2024): multiple conditions joined
with `&&`, including `let Ok(n) = ...` pattern bindings. If the user types "2"
and there are three options, it resolves to `options[1]`. Otherwise the raw
text is returned.
Note the for loop over options does nothing when the slice is empty – no
special if branch needed.
Use this in simple CLI apps like examples/chat.rs:
#![allow(unused)]
fn main() {
let agent = SimpleAgent::new(provider)
.tool(BashTool::new())
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(EditTool::new())
.tool(AskTool::new(Arc::new(CliInputHandler)));
}
ChannelInputHandler
For TUI apps, input collection happens in the event loop, not in the tool. The
ChannelInputHandler bridges the gap with a channel:
#![allow(unused)]
fn main() {
pub struct UserInputRequest {
pub question: String,
pub options: Vec<String>,
pub response_tx: oneshot::Sender<String>,
}
pub struct ChannelInputHandler {
tx: mpsc::UnboundedSender<UserInputRequest>,
}
}
When ask() is called, it sends a UserInputRequest through the channel and
awaits the oneshot response:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl InputHandler for ChannelInputHandler {
async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
let (response_tx, response_rx) = oneshot::channel();
self.tx.send(UserInputRequest {
question: question.to_string(),
options: options.to_vec(),
response_tx,
})?;
Ok(response_rx.await?)
}
}
}
The TUI event loop receives the request and renders it however it likes –
a simple text prompt, or an arrow-key-navigable selection list using
crossterm in raw terminal mode.
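The roundtrip can be sketched with std channels standing in for tokio's `mpsc` and `oneshot` (the `options` field is omitted for brevity):

```rust
use std::sync::mpsc;
use std::thread;

// The request carries its own reply sender, so the event loop can answer
// whichever request it receives without any shared state.
struct UserInputRequest {
    question: String,
    response_tx: mpsc::Sender<String>,
}

fn main() {
    let (req_tx, req_rx) = mpsc::channel::<UserInputRequest>();

    // "Event loop" thread: receive the request, render it, send an answer.
    let ui = thread::spawn(move || {
        let req = req_rx.recv().unwrap();
        assert_eq!(req.question, "Which file?");
        req.response_tx.send("main.rs".to_string()).unwrap();
    });

    // ask() side: send a request, then block on the reply channel.
    let (resp_tx, resp_rx) = mpsc::channel();
    req_tx
        .send(UserInputRequest { question: "Which file?".into(), response_tx: resp_tx })
        .unwrap();
    let answer = resp_rx.recv().unwrap();
    assert_eq!(answer, "main.rs");
    ui.join().unwrap();
}
```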
MockInputHandler
For tests, pre-configure answers in a queue:
#![allow(unused)]
fn main() {
pub struct MockInputHandler {
answers: Mutex<VecDeque<String>>,
}
#[async_trait::async_trait]
impl InputHandler for MockInputHandler {
async fn ask(&self, _question: &str, _options: &[String]) -> anyhow::Result<String> {
self.answers.lock().await.pop_front()
.ok_or_else(|| anyhow::anyhow!("MockInputHandler: no more answers"))
}
}
}
This follows the same pattern as MockProvider – pop from the front, error
when empty. Note that this uses tokio::sync::Mutex (with .lock().await),
not std::sync::Mutex. The reason: ask() is an async fn, and the lock
guard must be held across the .await boundary. A std::sync::Mutex guard is
!Send, so holding it across .await won’t compile. tokio::sync::Mutex
produces a Send-safe guard that works in async contexts. Compare this with
MockProvider from Chapter 1, which uses std::sync::Mutex because its
chat() method doesn’t hold the guard across an .await.
Tool summary
Update tool_summary() in agent.rs to display "question" for ask_user
calls in the terminal output:
#![allow(unused)]
fn main() {
let detail = call.arguments
.get("command")
.or_else(|| call.arguments.get("path"))
.or_else(|| call.arguments.get("question")) // <-- new
.and_then(|v| v.as_str());
}
Plan mode integration
ask_user is read-only – it collects information without mutating anything.
Add it to PlanAgent’s default read_only set (see
Chapter 12) so the LLM can ask questions during
planning:
#![allow(unused)]
fn main() {
read_only: HashSet::from(["bash", "read", "ask_user"]),
}
Wiring it up
Add the module to mini-claw-code/src/tools/mod.rs:
#![allow(unused)]
fn main() {
mod ask;
pub use ask::*;
}
And re-export from lib.rs:
#![allow(unused)]
fn main() {
pub use tools::{
AskTool, BashTool, ChannelInputHandler, CliInputHandler,
EditTool, InputHandler, MockInputHandler, ReadTool,
UserInputRequest, WriteTool,
};
}
Running the tests
cargo test -p mini-claw-code ch11
The tests verify:
- Tool definition: schema has `question` (required) and `options` (optional array).
- Question only: `MockInputHandler` returns the answer for a question-only call.
- With options: the tool passes options to the handler correctly.
- Missing question: a missing `question` argument returns an error.
- Handler exhausted: an empty `MockInputHandler` returns an error.
- Agent loop: the LLM calls `ask_user`, gets an answer, then returns final text.
- Ask then tool: `ask_user` followed by another tool call (e.g. `read`).
- Multiple asks: two sequential `ask_user` calls with different answers.
- Channel roundtrip: `ChannelInputHandler` sends a request and receives the response via a oneshot channel.
- param_raw: `param_raw()` adds an array parameter to `ToolDefinition` correctly.
Recap
- The `InputHandler` trait abstracts input collection across CLI, TUI, and tests.
- `AskTool` lets the LLM pause execution and ask the user a question.
- `param_raw()` extends `ToolDefinition` to support complex JSON schema types like arrays.
- Three handlers: `CliInputHandler` for simple apps, `ChannelInputHandler` for TUI apps, `MockInputHandler` for tests.
- Plan mode: `ask_user` is read-only by default, so it works during planning.
- Purely additive: no changes to `SimpleAgent`, `StreamingAgent`, or any existing tool.
Chapter 12: Plan Mode
Real coding agents can be dangerous. Give an LLM access to write, edit,
and bash and it might rewrite your config, delete a file, or run a
destructive command – all before you’ve had a chance to review what it’s doing.
Plan mode solves this with a two-phase workflow:
- Plan – the agent explores the codebase using read-only tools (`read`, `bash`, and `ask_user`). It cannot write, edit, or mutate anything. It returns a plan describing what it intends to do.
- Execute – after the user reviews and approves the plan, the agent runs again with all tools available.
This is exactly how Claude Code’s plan mode works. In this chapter you’ll build
PlanAgent – a streaming agent with caller-driven approval gating.
You will:
- Build `PlanAgent<P: StreamProvider>` with `plan()` and `execute()` methods.
- Inject a system prompt that tells the LLM it’s in planning mode.
- Add an `exit_plan` tool the LLM calls when its plan is ready.
exit_plantool the LLM calls when its plan is ready. - Implement double defense: definition filtering and an execution guard.
- Let the caller drive the approval flow between phases.
Why plan mode?
Consider this scenario:
User: "Refactor auth.rs to use JWT instead of session cookies"
Agent (no plan mode):
→ calls write("auth.rs", ...) immediately
→ rewrites half your auth system
→ you didn't want that approach at all
With plan mode:
User: "Refactor auth.rs to use JWT instead of session cookies"
Agent (plan phase):
→ calls read("auth.rs") to understand current code
→ calls bash("grep -r 'session' src/") to find related files
→ calls exit_plan to submit its plan
→ "Plan: Replace SessionStore with JwtProvider in 3 files..."
User: "Looks good, go ahead."
Agent (execute phase):
→ calls write/edit with the approved changes
The key insight: the same agent loop works for both phases. The only difference is which tools are available.
Design
PlanAgent has the same shape as StreamingAgent – a provider, a ToolSet,
and an agent loop. Three additions make it a planning agent:
- A `HashSet<&'static str>` recording which tools are allowed during planning.
- A system prompt injected at the start of the planning phase.
- An `exit_plan` tool definition the LLM calls when its plan is ready.
#![allow(unused)]
fn main() {
pub struct PlanAgent<P: StreamProvider> {
provider: P,
tools: ToolSet,
read_only: HashSet<&'static str>,
plan_system_prompt: String,
exit_plan_def: ToolDefinition,
}
}
Two public methods drive the two phases:
- `plan()` – injects the system prompt, runs the agent loop with only read-only tools and `exit_plan` visible.
- `execute()` – runs the agent loop with all tools visible.
Both delegate to a private run_loop() that takes an optional tool filter.
The builder
Construction follows the same builder pattern as SimpleAgent and
StreamingAgent:
#![allow(unused)]
fn main() {
impl<P: StreamProvider> PlanAgent<P> {
pub fn new(provider: P) -> Self {
Self {
provider,
tools: ToolSet::new(),
read_only: HashSet::from(["bash", "read", "ask_user"]),
plan_system_prompt: DEFAULT_PLAN_PROMPT.to_string(),
exit_plan_def: ToolDefinition::new(
"exit_plan",
"Signal that your plan is complete and ready for user review. \
Call this when you have finished exploring and are ready to \
present your plan.",
),
}
}
pub fn tool(mut self, t: impl Tool + 'static) -> Self {
self.tools.push(t);
self
}
pub fn read_only(mut self, names: &[&'static str]) -> Self {
self.read_only = names.iter().copied().collect();
self
}
pub fn plan_prompt(mut self, prompt: impl Into<String>) -> Self {
self.plan_system_prompt = prompt.into();
self
}
}
}
By default, bash, read, and ask_user are read-only. (Chapter 11 added
ask_user so the LLM can ask clarifying questions during planning.) The
.read_only() method lets callers override this – for example, to exclude
bash from planning if you want a stricter mode.
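The override is just a set replacement. Here is a minimal standalone sketch of the effect, using a hypothetical `read_only_set` helper in place of the real builder method:

```rust
use std::collections::HashSet;

// What .read_only(&[...]) would store: the caller's names, collected into a set
fn read_only_set(names: &[&'static str]) -> HashSet<&'static str> {
    names.iter().copied().collect()
}

fn main() {
    let default_set = read_only_set(&["bash", "read", "ask_user"]); // PlanAgent::new() default
    let stricter = read_only_set(&["read", "ask_user"]);            // stricter override
    assert!(default_set.contains("bash"));  // bash allowed by default
    assert!(!stricter.contains("bash"));    // stricter mode: bash blocked during planning
}
```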
The .plan_prompt() method lets callers override the system prompt – useful
for specialized agents like security auditors or code reviewers.
System prompt
The LLM needs to know it’s in planning mode. Without this, it will try to accomplish the task with whatever tools it sees, rather than producing a deliberate plan.
plan() injects a system prompt at the start of the conversation:
#![allow(unused)]
fn main() {
const DEFAULT_PLAN_PROMPT: &str = "\
You are in PLANNING MODE. Explore the codebase using the available tools and \
create a plan. You can read files, run shell commands, and ask the user \
questions — but you CANNOT write, edit, or create files.\n\n\
When your plan is ready, call the `exit_plan` tool to submit it for review.";
}
The injection is conditional – if the caller already provided a System
message, plan() respects it:
#![allow(unused)]
fn main() {
pub async fn plan(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
if !messages
.first()
.is_some_and(|m| matches!(m, Message::System(_)))
{
messages.insert(0, Message::System(self.plan_system_prompt.clone()));
}
self.run_loop(messages, Some(&self.read_only), events).await
}
}
This means:
- First call: no system message → inject the plan prompt.
- Re-plan call: system message already there → skip.
- Caller provided their own: caller’s system message → respect it.
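All three cases fall out of one guard. A minimal sketch of the injection logic in isolation, with a pared-down two-variant `Message` enum standing in for the real one:

```rust
// Simplified Message enum for illustration only
#[derive(Debug, PartialEq)]
enum Message {
    System(String),
    User(String),
}

// Mirrors the guard in plan(): inject only when no System message leads the history
fn inject_plan_prompt(messages: &mut Vec<Message>, prompt: &str) {
    if !matches!(messages.first(), Some(Message::System(_))) {
        messages.insert(0, Message::System(prompt.to_string()));
    }
}

fn main() {
    let mut msgs = vec![Message::User("Refactor auth.rs".into())];
    inject_plan_prompt(&mut msgs, "PLANNING MODE");
    assert_eq!(msgs.len(), 2); // first call: prompt injected at position 0
    inject_plan_prompt(&mut msgs, "PLANNING MODE");
    assert_eq!(msgs.len(), 2); // re-plan call: no duplicate system message
}
```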
This is how real agents work. Claude Code switches its system prompt when
entering plan mode. OpenCode uses entirely separate agent configurations with
different system prompts for plan vs build agents.
The exit_plan tool
Without exit_plan, the planning phase ends when the LLM returns
StopReason::Stop – the same way any conversation ends. This is ambiguous:
did the LLM finish planning, or did it just stop talking?
Real agents solve this with an explicit signal. Claude Code has ExitPlanMode.
OpenCode has exit_plan. The LLM calls the tool to say “my plan is ready for
review.”
In PlanAgent, exit_plan is a tool definition stored on the struct – not
registered in the ToolSet. This means:
- During plan: `exit_plan` is injected into the tool list alongside read-only tools. The LLM can see and call it.
- During execute: `exit_plan` is not in the tool list. The LLM doesn't know it exists.
When the agent loop sees an exit_plan call, it returns immediately with the
plan text (the LLM’s text from that turn):
#![allow(unused)]
fn main() {
// Handle exit_plan: signal plan completion
if allowed.is_some() && call.name == "exit_plan" {
results.push((call.id.clone(), "Plan submitted for review.".into()));
exit_plan = true;
continue;
}
}
After the tool-call loop, plan_text captures the LLM’s text from this turn
(the plan itself), and the turn is pushed onto the message history:
#![allow(unused)]
fn main() {
let plan_text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));
}
If exit_plan was among the tool calls, we’re done:
#![allow(unused)]
fn main() {
if exit_plan {
let _ = events.send(AgentEvent::Done(plan_text.clone()));
return Ok(plan_text);
}
}
The planning phase now has two exit paths:
- `StopReason::Stop` – LLM stops naturally (backward compatible).
- `exit_plan` tool call – LLM explicitly signals plan completion.
Both work. The exit_plan path is better because it’s unambiguous.
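The two exit conditions can be sketched as a single predicate, using simplified stand-in types rather than the book's real ones:

```rust
#[derive(PartialEq)]
enum StopReason {
    Stop,
    ToolUse,
}

// The planning loop ends when the LLM stops naturally or calls exit_plan
fn plan_finished(stop_reason: &StopReason, exit_plan_called: bool) -> bool {
    exit_plan_called || *stop_reason == StopReason::Stop
}

fn main() {
    assert!(plan_finished(&StopReason::Stop, false));     // natural stop (ambiguous)
    assert!(plan_finished(&StopReason::ToolUse, true));   // explicit exit_plan (unambiguous)
    assert!(!plan_finished(&StopReason::ToolUse, false)); // still exploring: keep looping
}
```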
Double defense
Tool filtering still uses two layers of protection:
Layer 1: Definition filtering
During plan(), only read-only tool definitions plus exit_plan are sent to
the LLM. The model literally cannot see write or edit in its tool list:
#![allow(unused)]
fn main() {
let all_defs = self.tools.definitions();
let defs: Vec<&ToolDefinition> = match allowed {
Some(names) => {
let mut filtered: Vec<&ToolDefinition> = all_defs
.into_iter()
.filter(|d| names.contains(d.name))
.collect();
filtered.push(&self.exit_plan_def);
filtered
}
None => all_defs,
};
}
During execute(), allowed is None, so all registered tools are sent –
and exit_plan is not included.
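The filtering logic can be exercised in isolation. A self-contained sketch with a one-field `ToolDefinition` stand-in (the real struct carries more than a name):

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct ToolDefinition {
    name: &'static str,
}

// During plan (Some): keep only allowed defs and append exit_plan.
// During execute (None): pass everything through and omit exit_plan.
fn visible_defs(
    all: Vec<ToolDefinition>,
    allowed: Option<&HashSet<&'static str>>,
    exit_plan: &ToolDefinition,
) -> Vec<ToolDefinition> {
    match allowed {
        Some(names) => {
            let mut filtered: Vec<ToolDefinition> =
                all.into_iter().filter(|d| names.contains(d.name)).collect();
            filtered.push(exit_plan.clone());
            filtered
        }
        None => all,
    }
}

fn main() {
    let all = vec![
        ToolDefinition { name: "read" },
        ToolDefinition { name: "write" },
    ];
    let exit_plan = ToolDefinition { name: "exit_plan" };
    let read_only = HashSet::from(["read"]);

    let plan_defs = visible_defs(all.clone(), Some(&read_only), &exit_plan);
    assert_eq!(plan_defs.len(), 2); // read + exit_plan; write is invisible

    let exec_defs = visible_defs(all, None, &exit_plan);
    assert_eq!(exec_defs.len(), 2); // read + write; exit_plan is invisible
}
```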
Layer 2: Execution guard
If the LLM somehow hallucinated a blocked tool call, the execution guard
catches it and returns an error ToolResult instead of executing the tool:
#![allow(unused)]
fn main() {
if let Some(names) = allowed
&& !names.contains(call.name.as_str())
{
results.push((
call.id.clone(),
format!(
"error: tool '{}' is not available in planning mode",
call.name
),
));
continue;
}
}
The error goes back to the LLM as a tool result, so it learns the tool is blocked and adjusts its behavior. The file is never touched.
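The guard reduces to a small check that either permits execution or produces the error string. A sketch of that check on its own, with the tool-call machinery stripped away:

```rust
use std::collections::HashSet;

// Mirrors Layer 2: a blocked call becomes an error result instead of executing
fn guard(call_name: &str, allowed: Option<&HashSet<&'static str>>) -> Result<(), String> {
    if let Some(names) = allowed {
        if !names.contains(call_name) {
            return Err(format!(
                "error: tool '{call_name}' is not available in planning mode"
            ));
        }
    }
    Ok(())
}

fn main() {
    let read_only: HashSet<&'static str> = HashSet::from(["read", "bash", "ask_user"]);
    assert!(guard("read", Some(&read_only)).is_ok());   // allowed during planning
    assert!(guard("write", Some(&read_only)).is_err()); // blocked: the file is never touched
    assert!(guard("write", None).is_ok());              // execute phase: everything allowed
}
```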
The shared agent loop
Both plan() and execute() delegate to run_loop(). The only parameter
that differs is allowed:
#![allow(unused)]
fn main() {
pub async fn plan(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
// System prompt injection (shown earlier)
self.run_loop(messages, Some(&self.read_only), events).await
}
pub async fn execute(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
self.run_loop(messages, None, events).await
}
}
plan() passes Some(&self.read_only) to restrict tools. execute() passes
None to allow everything.
The run_loop itself is identical to StreamingAgent::chat() from Chapter 10,
with these additions:
- Tool definition filtering (read-only + `exit_plan` during plan; all during execute).
- The `exit_plan` handler that breaks the loop when the LLM signals plan completion.
- The execution guard for blocked tools.
Caller-driven approval flow
The approval flow lives entirely in the caller. PlanAgent does not ask for
approval – it just runs whichever phase is called. This keeps the agent
simple and lets the caller implement any approval UX they want.
Here is the typical flow:
#![allow(unused)]
fn main() {
let agent = PlanAgent::new(provider)
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(EditTool::new())
.tool(BashTool::new());
let mut messages = vec![Message::User("Refactor auth.rs".into())];
// Phase 1: Plan (read-only tools + exit_plan)
let (tx, _rx) = mpsc::unbounded_channel(); // a real app would read streaming events from rx
let plan = agent.plan(&mut messages, tx).await?;
println!("Plan: {plan}");
// Show plan to user, get approval
if user_approves() {
// Phase 2: Execute (all tools)
messages.push(Message::User("Approved. Execute the plan.".into()));
let (tx2, _rx2) = mpsc::unbounded_channel();
let result = agent.execute(&mut messages, tx2).await?;
println!("Result: {result}");
} else {
// Re-plan with feedback
messages.push(Message::User("No, try a different approach.".into()));
let (tx3, _rx3) = mpsc::unbounded_channel();
let revised_plan = agent.plan(&mut messages, tx3).await?;
println!("Revised plan: {revised_plan}");
}
}
Notice how the same messages vec is shared across phases. This is critical –
the LLM sees its own plan, the user’s approval (or rejection), and all
previous context when it enters the execute phase. Re-planning is just
pushing feedback as a User message and calling plan() again.
sequenceDiagram
participant C as Caller
participant P as PlanAgent
participant L as LLM
C->>P: plan(&mut messages)
P->>L: [read, bash, ask_user, exit_plan tools only]
L-->>P: reads files, calls exit_plan
P-->>C: "Plan: ..."
C->>C: User reviews plan
alt Approved
C->>P: execute(&mut messages)
P->>L: [all tools]
L-->>P: writes/edits files
P-->>C: "Done."
else Rejected
C->>P: plan(&mut messages) [with feedback]
P->>L: [read, bash, ask_user, exit_plan tools only]
L-->>P: revised plan
P-->>C: "Revised plan: ..."
end
Wiring it up
Add the module to mini-claw-code/src/lib.rs:
#![allow(unused)]
fn main() {
pub mod planning;
// ...
pub use planning::PlanAgent;
}
That’s it. Like streaming, plan mode is a purely additive feature – no existing code is modified.
Running the tests
cargo test -p mini-claw-code ch12
The tests verify:
- Text response: `plan()` returns text when the LLM stops immediately.
- Read tool allowed: `read` executes during planning.
- Write tool blocked: `write` is blocked during planning; the file is NOT created; an error `ToolResult` is sent back to the LLM.
- Edit tool blocked: same behavior for `edit`.
- Execute allows write: `write` works during execution; the file IS created.
- Full plan-then-execute: end-to-end flow – plan reads a file, approval, execute writes a file.
- Message continuity: messages from the plan phase carry into the execute phase, including the injected system prompt.
- read_only override: `.read_only(&["read"])` excludes `bash` from planning.
- Streaming events: `TextDelta` and `Done` events are emitted during planning.
- Provider error: empty mock propagates errors correctly.
- Builder pattern: chained `.tool().read_only().plan_prompt()` compiles.
- System prompt injection: `plan()` injects a system prompt at position 0.
- System prompt not duplicated: calling `plan()` twice doesn't add a second system message.
- Caller system prompt respected: if the caller provides a `System` message, `plan()` doesn't overwrite it.
- `exit_plan` tool: the LLM calls `exit_plan` to signal plan completion; `plan()` returns the plan text.
- `exit_plan` not in execute: during `execute()`, `exit_plan` is not in the tool list.
- Custom plan prompt: `.plan_prompt(...)` overrides the default.
- Full flow with `exit_plan`: plan reads file → calls `exit_plan` → approve → execute writes file.
Recap
- `PlanAgent` separates planning (read-only) from execution (all tools) using a single shared agent loop.
- System prompt: `plan()` injects a system message telling the LLM it's in planning mode — what tools are available, what's blocked, and that it should call `exit_plan` when done.
- `exit_plan` tool: the LLM explicitly signals plan completion, just like Claude Code's `ExitPlanMode`. This is injected during planning and invisible during execution.
- Double defense: definition filtering prevents the LLM from seeing blocked tools; an execution guard catches hallucinated calls.
- Caller-driven approval: the agent doesn't manage approval – the caller pushes approval/rejection as `User` messages and calls the appropriate phase.
- Message continuity: the same `messages` vec flows through both phases, giving the LLM full context.
- Streaming: both phases use `StreamProvider` and emit `AgentEvent`s, just like `StreamingAgent`.
- Purely additive: no changes to `SimpleAgent`, `StreamingAgent`, or any existing code.
Chapter 13: Subagents
Complex tasks are hard. Even the best LLM struggles when a single prompt asks it to research a codebase, design an approach, write the code, and verify the result – all while maintaining a coherent conversation. The context window fills up, the model loses focus, and quality degrades.
Subagents solve this with decomposition: the parent agent spawns a child agent for each subtask. The child has its own message history and tools, runs to completion, and returns a summary. The parent sees only the final answer – a clean, focused result without the noise of the child’s internal reasoning.
This is exactly how Claude Code’s Task tool works. When Claude Code needs to explore a large codebase or handle an independent subtask, it spawns a subagent that does the work and reports back. OpenCode and the Anthropic Agent SDK use the same pattern.
In this chapter you’ll build SubagentTool – a Tool implementation that
spawns ephemeral child agents.
You will:
- Add a blanket `impl Provider for Arc<P>` so parent and child can share a provider.
- Build `SubagentTool<P: Provider>` with a closure-based tool factory and builder methods.
- Implement the `Tool` trait with an inlined agent loop and turn limit.
- Wire it up as a module and re-export.
Why subagents?
Consider this scenario:
User: "Add error handling to all API endpoints"
Agent (no subagents):
→ reads 15 files, context window fills up
→ forgets what it learned from file 3
→ produces inconsistent changes
Agent (with subagents):
→ spawns child: "Add error handling to /api/users.rs"
→ child reads 1 file, writes changes, returns "Done: added Result types"
→ spawns child: "Add error handling to /api/posts.rs"
→ child does the same
→ parent sees clean summaries, coordinates the overall task
The key insight: a subagent is just a Tool. It takes a task description as
input, does work internally, and returns a string result. The parent’s agent
loop doesn’t need any special handling – it calls the subagent tool the same
way it calls read or bash.
Provider sharing with Arc<P>
The parent and child need to use the same LLM provider. In production this means sharing an HTTP client, API key, and configuration. Cloning the provider would duplicate connections. We want to share it cheaply.
The answer is Arc<P>. But there’s a catch: our Provider trait uses RPITIT
(return-position impl Trait in trait), which means it’s not object-safe. We
can’t use dyn Provider. We can use Arc<P> where P: Provider – but
only if Arc<P> itself implements Provider.
A blanket impl makes this work. In types.rs:
#![allow(unused)]
fn main() {
impl<P: Provider> Provider for Arc<P> {
fn chat<'a>(
&'a self,
messages: &'a [Message],
tools: &'a [&'a ToolDefinition],
) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a {
(**self).chat(messages, tools)
}
}
}
This delegates to the inner P via deref. Now Arc<MockProvider> and
Arc<OpenRouterProvider> are both valid providers. Existing code is
completely unchanged – if you were passing MockProvider before, it still
works. The Arc wrapper is opt-in.
The SubagentTool struct
#![allow(unused)]
fn main() {
pub struct SubagentTool<P: Provider> {
provider: Arc<P>,
tools_factory: Box<dyn Fn() -> ToolSet + Send + Sync>,
system_prompt: Option<String>,
max_turns: usize,
definition: ToolDefinition,
}
}
Three design decisions here:
Arc<P> for the provider. Parent creates Arc::new(provider), keeps a
clone for itself, and passes a clone to SubagentTool. Both share the same
underlying provider. Cheap, safe, no cloning of HTTP clients.
A closure factory for tools. Tools are Box<dyn Tool> – they’re not
cloneable. Each child spawn needs a fresh ToolSet. A Fn() -> ToolSet
closure produces one on demand. This naturally captures Arcs for shared
state:
#![allow(unused)]
fn main() {
let provider = Arc::new(OpenRouterProvider::from_env()?);
SubagentTool::new(provider, || {
ToolSet::new()
.with(ReadTool::new())
.with(WriteTool::new())
.with(BashTool::new())
})
}
A max_turns safety limit. Without this, a confused child could loop
forever. Defaults to 10 – generous enough for real tasks, strict enough to
prevent runaway loops.
The builder
Construction uses the same fluent builder style as elsewhere in the codebase:
#![allow(unused)]
fn main() {
impl<P: Provider> SubagentTool<P> {
pub fn new(
provider: Arc<P>,
tools_factory: impl Fn() -> ToolSet + Send + Sync + 'static,
) -> Self {
Self {
provider,
tools_factory: Box::new(tools_factory),
system_prompt: None,
max_turns: 10,
definition: ToolDefinition::new(
"subagent",
"Spawn a child agent to handle a subtask independently. \
The child has its own message history and tools.",
)
.param(
"task",
"string",
"A clear description of the subtask for the child agent to complete.",
true,
),
}
}
pub fn system_prompt(mut self, prompt: impl Into<String>) -> Self {
self.system_prompt = Some(prompt.into());
self
}
pub fn max_turns(mut self, max: usize) -> Self {
self.max_turns = max;
self
}
}
}
The tool definition exposes a single task parameter – the LLM writes a
clear description of what the child should do. Minimal and effective.
The Tool implementation
The core of SubagentTool is its Tool::call() method. It inlines a minimal
agent loop – the same protocol as SimpleAgent::chat() (call provider, execute
tools, loop), but with a turn limit, no terminal output, and a locally-owned
message vec:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl<P: Provider + 'static> Tool for SubagentTool<P> {
fn definition(&self) -> &ToolDefinition {
&self.definition
}
async fn call(&self, args: Value) -> anyhow::Result<String> {
let task = args
.get("task")
.and_then(|v| v.as_str())
.ok_or_else(|| anyhow::anyhow!("missing required parameter: task"))?;
let tools = (self.tools_factory)();
let defs = tools.definitions();
let mut messages = Vec::new();
if let Some(ref prompt) = self.system_prompt {
messages.push(Message::System(prompt.clone()));
}
messages.push(Message::User(task.to_string()));
for _ in 0..self.max_turns {
let turn = self.provider.chat(&messages, &defs).await?;
match turn.stop_reason {
StopReason::Stop => {
return Ok(turn.text.unwrap_or_default());
}
StopReason::ToolUse => {
let mut results = Vec::with_capacity(turn.tool_calls.len());
for call in &turn.tool_calls {
let content = match tools.get(&call.name) {
Some(t) => t
.call(call.arguments.clone())
.await
.unwrap_or_else(|e| format!("error: {e}")),
None => format!("error: unknown tool `{}`", call.name),
};
results.push((call.id.clone(), content));
}
messages.push(Message::Assistant(turn));
for (id, content) in results {
messages.push(Message::ToolResult { id, content });
}
}
}
}
Ok("error: max turns exceeded".to_string())
}
}
}
A few things to notice:
No tokio::spawn. The child runs within the parent’s Tool::call()
future. This is deliberate – spawning a background task would add
coordination complexity (channels, join handles, cancellation). Running
inline keeps things simple and deterministic.
Fresh message history. The child starts with only a system prompt
(optional) and the task as a User message. It never sees the parent’s
conversation. When the child finishes, only its final text is returned to the
parent as a tool result. The child’s internal messages are dropped.
Turn limit as a soft error. When max_turns is exceeded, the tool
returns an error string rather than Err(...). This lets the parent LLM see
the failure and decide what to do (retry with a simpler task, try a different
approach, etc.), rather than crashing the entire agent loop.
Provider errors propagate. If the LLM API fails during a child turn, the
error bubbles up through ? to the parent. This is intentional – API errors
are infrastructure failures, not task failures.
Wiring it up
Add the module and re-export in mini-claw-code/src/lib.rs:
#![allow(unused)]
fn main() {
pub mod subagent;
// ...
pub use subagent::SubagentTool;
}
Usage example
Here’s how you’d wire up a parent agent with a subagent tool:
#![allow(unused)]
fn main() {
use std::sync::Arc;
use mini_claw_code::*;
let provider = Arc::new(OpenRouterProvider::from_env()?);
let p = provider.clone();
let agent = SimpleAgent::new(provider)
.tool(ReadTool::new())
.tool(WriteTool::new())
.tool(BashTool::new())
.tool(SubagentTool::new(p, || {
ToolSet::new()
.with(ReadTool::new())
.with(WriteTool::new())
.with(BashTool::new())
}));
let result = agent.run("Refactor the auth module").await?;
}
The parent LLM sees subagent in its tool list alongside read, write,
and bash. When the task is complex enough, the LLM can choose to delegate
via subagent – or handle it directly with the other tools. The LLM
decides.
You can also give the child a specialized system prompt:
#![allow(unused)]
fn main() {
SubagentTool::new(provider, || {
ToolSet::new()
.with(ReadTool::new())
.with(BashTool::new())
})
.system_prompt("You are a security auditor. Review code for vulnerabilities.")
.max_turns(15)
}
Running the tests
cargo test -p mini-claw-code ch13
The tests verify:
- Text response: child returns text immediately (no tool calls).
- With tool: child uses `ReadTool` before answering.
- Multi-step: child makes multiple tool calls across turns.
- Max turns exceeded: turn limit enforced, returns error string.
- Missing task: error on missing `task` parameter.
- Provider error: child provider error propagates to parent.
- Unknown tool: child handles unknown tools gracefully.
- Builder pattern: chaining `.system_prompt().max_turns()` compiles.
- System prompt: child runs correctly with a system prompt configured.
- Write tool: child writes a file, parent continues afterward.
- Parent continues: parent resumes its own work after subagent completes.
- Isolated history: child messages don’t leak into parent’s message vec.
Recap
- `SubagentTool` is a `Tool` that spawns ephemeral child agents. The parent sees only the final answer.
- `Arc<P>` blanket impl lets parent and child share a provider without cloning. Fully backward-compatible.
- Closure factory produces a fresh `ToolSet` per child spawn, since `Box<dyn Tool>` isn't cloneable.
- Inlined agent loop with `max_turns` guard keeps `SimpleAgent` unchanged. No `tokio::spawn` needed – the child runs within `Tool::call()`.
- Message isolation: the child's internal messages are local to the `call()` future. Only the final text crosses back to the parent.
- Single `task` parameter: the LLM writes a clear task description; the child handles the rest.
- Purely additive: the only existing change is the blanket impl in `types.rs`. Everything else is new code.
Chapter 14: Token Tracking
Every call to an LLM costs money. A single agent run might loop ten or twenty times, reading files, running commands, and editing code. Without tracking how many tokens you are spending, costs can silently spiral – especially during development when you are iterating fast. Claude Code shows a running token count and cost estimate at the bottom of every session for exactly this reason.
In this chapter you will build CostTracker, a struct that accumulates token
usage across turns and computes an estimated cost. You will also see how the
OpenAI-compatible API reports usage in its response JSON, and how our
OpenRouterProvider already parses it into a TokenUsage struct on
AssistantTurn.
Why track tokens?
There are two practical reasons:
- Cost control. LLM APIs charge per token. If your agent enters a loop that keeps reading large files, the bill adds up fast. A cost tracker lets you display a running total, set budgets, or abort early.
- Context window awareness. Every model has a maximum context length. As the conversation grows, input tokens increase with each turn (because you resend the full history). Tracking input tokens gives you a signal for when you are approaching the limit and might need to summarize or truncate.
How APIs report usage
OpenAI-compatible APIs (OpenRouter, OpenAI, Anthropic’s compatibility layer)
include a usage object in every chat completion response:
{
"id": "chatcmpl-abc123",
"choices": [{ "message": { "content": "Hello!" }, "finish_reason": "stop" }],
"usage": {
"prompt_tokens": 42,
"completion_tokens": 15
}
}
- `prompt_tokens` – how many tokens the API consumed reading your input (system prompt + conversation history + tool definitions).
- `completion_tokens` – how many tokens the model generated in its response (text + tool calls).
Not every provider guarantees this field, so it is optional. But when it is present, we want to capture it.
Goal
Implement CostTracker so that:
- You create it with per-million-token pricing for input and output.
- You can `record()` a `TokenUsage` from each turn.
- It accumulates totals across turns and computes estimated cost.
- It can produce a human-readable summary string.
- It can be reset to zero.
The TokenUsage struct
Open mini-claw-code-starter/src/types.rs. You will see a new struct alongside
the types you already know:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
pub input_tokens: u64,
pub output_tokens: u64,
}
}
This is a simple data carrier – just two numbers. The Default derive gives
us TokenUsage { input_tokens: 0, output_tokens: 0 } for free, which is
useful when the API omits individual fields.
The struct lives on AssistantTurn as an optional field:
#![allow(unused)]
fn main() {
pub struct AssistantTurn {
pub text: Option<String>,
pub tool_calls: Vec<ToolCall>,
pub stop_reason: StopReason,
/// Token usage for this turn, if reported by the provider.
pub usage: Option<TokenUsage>,
}
}
The usage field is Option<TokenUsage> because not every provider reports
it. MockProvider returns None (it does not call a real API), while
OpenRouterProvider parses it from the JSON response.
How OpenRouterProvider parses usage
In Chapter 6 you built the HTTP provider. Now look at how it handles the
usage field in openrouter.rs. The response is deserialized into these
types:
#![allow(unused)]
fn main() {
#[derive(Deserialize)]
struct ChatResponse {
choices: Vec<Choice>,
usage: Option<ApiUsage>,
}
#[derive(Deserialize)]
struct ApiUsage {
prompt_tokens: Option<u64>,
completion_tokens: Option<u64>,
}
}
Both usage on ChatResponse and the individual fields on ApiUsage are
optional – some providers omit them entirely, others include the object but
leave fields null. At the end of the chat() method, the conversion looks
like this:
#![allow(unused)]
fn main() {
let usage = resp.usage.map(|u| TokenUsage {
input_tokens: u.prompt_tokens.unwrap_or(0),
output_tokens: u.completion_tokens.unwrap_or(0),
});
Ok(AssistantTurn {
text: choice.message.content,
tool_calls,
stop_reason,
usage,
})
}
The double-Option pattern – Option<ApiUsage> containing Option<u64>
fields – is a common defensive strategy when deserializing API responses.
resp.usage.map(...) handles the outer option (no usage key at all), and
unwrap_or(0) handles the inner option (key present but value null).
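The conversion can be exercised on its own, with the provider's two structs inlined (serde derives dropped, since no JSON parsing happens here):

```rust
struct ApiUsage {
    prompt_tokens: Option<u64>,
    completion_tokens: Option<u64>,
}

#[derive(Debug, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
}

// The double-Option conversion from the provider, in isolation
fn convert(usage: Option<ApiUsage>) -> Option<TokenUsage> {
    usage.map(|u| TokenUsage {
        input_tokens: u.prompt_tokens.unwrap_or(0),     // field present but null → 0
        output_tokens: u.completion_tokens.unwrap_or(0),
    })
}

fn main() {
    // No "usage" key at all: the outer Option stays None
    assert_eq!(convert(None), None);
    // "usage" present, "completion_tokens" null: the inner default kicks in
    let got = convert(Some(ApiUsage {
        prompt_tokens: Some(42),
        completion_tokens: None,
    }));
    assert_eq!(got, Some(TokenUsage { input_tokens: 42, output_tokens: 0 }));
}
```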
You do not need to modify the provider. The parsing is already done. Your job
is to build the CostTracker that consumes these TokenUsage values.
Implementing CostTracker
Open mini-claw-code-starter/src/usage.rs. You will see the struct and method
signatures already laid out with unimplemented!() bodies.
The design
CostTracker needs to be shared across the agent loop – you might pass it
into run() or hold it alongside the agent. Because the agent takes &self
(shared reference), the tracker must support mutation through &self. This is
the same interior mutability pattern you used in MockProvider:
#![allow(unused)]
fn main() {
pub struct CostTracker {
inner: Mutex<CostTrackerInner>,
/// Price per million input tokens (USD).
input_price: f64,
/// Price per million output tokens (USD).
output_price: f64,
}
struct CostTrackerInner {
total_input: u64,
total_output: u64,
turn_count: u64,
}
}
The prices are immutable after construction (they describe the model, which
does not change mid-session), so they live outside the Mutex. Only the
running totals need interior mutability.
Step 1: Implement new()
The constructor takes two prices: input and output, both in dollars per million tokens. These are the rates you find on a model’s pricing page – for example, Claude Sonnet charges $3 per million input tokens and $15 per million output tokens.
#![allow(unused)]
fn main() {
pub fn new(input_price_per_million: f64, output_price_per_million: f64) -> Self {
Self {
inner: Mutex::new(CostTrackerInner {
total_input: 0,
total_output: 0,
turn_count: 0,
}),
input_price: input_price_per_million,
output_price: output_price_per_million,
}
}
}
Store the prices on self and initialize all counters to zero inside a
Mutex.
Step 2: Implement record()
This is the method the agent loop calls after each provider response. It takes
a &TokenUsage and adds its values to the running totals:
#![allow(unused)]
fn main() {
pub fn record(&self, usage: &TokenUsage) {
let mut inner = self.inner.lock().unwrap();
inner.total_input += usage.input_tokens;
inner.total_output += usage.output_tokens;
inner.turn_count += 1;
}
}
Lock the mutex, add the token counts, bump the turn counter. That is it. The lock is held for three additions – fast enough that contention is never a problem.
Step 3: Implement the getter methods
Three simple accessors, each locking the mutex and reading a field:
#![allow(unused)]
fn main() {
pub fn total_input_tokens(&self) -> u64 {
self.inner.lock().unwrap().total_input
}
pub fn total_output_tokens(&self) -> u64 {
self.inner.lock().unwrap().total_output
}
pub fn turn_count(&self) -> u64 {
self.inner.lock().unwrap().turn_count
}
}
Each method acquires and releases the lock independently. This is fine – if you needed a consistent snapshot of all three values at once, you would lock once and read all three. But for display purposes, slight inconsistency between separate calls is acceptable.
Step 4: Implement total_cost()
The cost formula is straightforward:
cost = (input_tokens * input_price + output_tokens * output_price) / 1,000,000
We divide by one million because the prices are per million tokens:
#![allow(unused)]
fn main() {
pub fn total_cost(&self) -> f64 {
let inner = self.inner.lock().unwrap();
(inner.total_input as f64 * self.input_price
+ inner.total_output as f64 * self.output_price)
/ 1_000_000.0
}
}
Notice we lock once and read both total_input and total_output together.
This ensures the cost calculation uses a consistent pair of values.
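A quick worked example of the formula, using Claude Sonnet's $3/M input and $15/M output rates mentioned earlier:

```rust
// The cost formula as a free function, for checking the arithmetic
fn cost(input_tokens: u64, output_tokens: u64, in_price: f64, out_price: f64) -> f64 {
    (input_tokens as f64 * in_price + output_tokens as f64 * out_price) / 1_000_000.0
}

fn main() {
    // 1000 input tokens cost $0.003 and 200 output tokens cost $0.003
    let c = cost(1000, 200, 3.0, 15.0);
    assert!((c - 0.006).abs() < 1e-12);
    // A million of each at the same rates: $3 + $15
    assert!((cost(1_000_000, 1_000_000, 3.0, 15.0) - 18.0).abs() < 1e-9);
}
```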
Step 5: Implement summary()
This produces a human-readable string for display – the kind of thing you would show at the bottom of a terminal UI:
tokens: 1234 in + 567 out | cost: $0.0122
The implementation duplicates the cost calculation (instead of calling
self.total_cost()) to avoid locking the mutex twice:
#![allow(unused)]
fn main() {
pub fn summary(&self) -> String {
let inner = self.inner.lock().unwrap();
let cost = (inner.total_input as f64 * self.input_price
+ inner.total_output as f64 * self.output_price)
/ 1_000_000.0;
format!(
"tokens: {} in + {} out | cost: ${:.4}",
inner.total_input, inner.total_output, cost
)
}
}
The {:.4} format specifier gives four decimal places – enough precision
for small token counts where the cost might be fractions of a cent.
Step 6: Implement reset()
Reset all counters to zero. Useful when starting a new conversation in the same session:
#![allow(unused)]
fn main() {
pub fn reset(&self) {
let mut inner = self.inner.lock().unwrap();
inner.total_input = 0;
inner.total_output = 0;
inner.turn_count = 0;
}
}
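Assembled in one place, the steps above look like the sketch below. It is a simplified standalone version: `record()` here takes raw counts instead of the book's `&TokenUsage`, and the `Inner` struct layout is an assumption, so treat it as illustrative rather than the crate's exact code:

```rust
use std::sync::Mutex;

#[derive(Default)]
struct Inner {
    total_input: u64,
    total_output: u64,
    turn_count: u64,
}

pub struct CostTracker {
    inner: Mutex<Inner>,
    input_price: f64,  // dollars per million input tokens
    output_price: f64, // dollars per million output tokens
}

impl CostTracker {
    pub fn new(input_price: f64, output_price: f64) -> Self {
        Self { inner: Mutex::new(Inner::default()), input_price, output_price }
    }

    /// Record one turn's usage (simplified: raw counts instead of &TokenUsage).
    pub fn record(&self, input: u64, output: u64) {
        let mut inner = self.inner.lock().unwrap();
        inner.total_input += input;
        inner.total_output += output;
        inner.turn_count += 1;
    }

    pub fn total_cost(&self) -> f64 {
        let inner = self.inner.lock().unwrap();
        (inner.total_input as f64 * self.input_price
            + inner.total_output as f64 * self.output_price) / 1_000_000.0
    }

    pub fn summary(&self) -> String {
        let inner = self.inner.lock().unwrap();
        let cost = (inner.total_input as f64 * self.input_price
            + inner.total_output as f64 * self.output_price) / 1_000_000.0;
        format!("tokens: {} in + {} out | cost: ${:.4}",
            inner.total_input, inner.total_output, cost)
    }
}

fn main() {
    let t = CostTracker::new(3.0, 15.0);
    t.record(1000, 200);
    assert!((t.total_cost() - 0.006).abs() < 1e-12);
    println!("{}", t.summary()); // tokens: 1000 in + 200 out | cost: $0.0060
}
```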
Running the tests
Run the Chapter 14 tests:
cargo test -p mini-claw-code-starter ch14
What the tests verify
- `test_ch14_empty_tracker`: A freshly created tracker has zero tokens, zero turns, and zero cost.
- `test_ch14_record_single_turn`: Record one usage, verify the totals match exactly.
- `test_ch14_accumulates_across_turns`: Record three usages, verify the totals are the sum of all three.
- `test_ch14_cost_calculation`: Record exactly one million input and one million output tokens at $3/M and $15/M. Verify the cost is $18.00.
- `test_ch14_cost_small_numbers`: Record 1000 input and 200 output tokens. Verify the cost is $0.006 (six tenths of a cent).
- `test_ch14_summary_format`: Verify the summary string contains the expected token counts and a dollar sign.
- `test_ch14_reset`: Record usage, reset, verify everything is back to zero.
- `test_ch14_zero_usage`: Record a turn with zero tokens. The turn count increments but the cost stays zero.
- `test_ch14_token_usage_default`: Verify `TokenUsage::default()` gives zeros – a sanity check on the `Default` derive.
Wiring it into the agent loop
The tests cover CostTracker in isolation, but in practice you would wire it
into your agent loop. After each call to self.provider.chat(), check if the
response includes usage data and record it:
#![allow(unused)]
fn main() {
let turn = self.provider.chat(&messages, &defs).await?;
if let Some(ref usage) = turn.usage {
cost_tracker.record(usage);
}
}
Then, after the agent finishes (or periodically during long runs), display the summary:
#![allow(unused)]
fn main() {
println!("{}", cost_tracker.summary());
// tokens: 4521 in + 892 out | cost: $0.0270
}
This is exactly what tools like Claude Code do – show a running cost estimate so you know what a session is costing in real time.
Recap
You have built a CostTracker that:
- Accumulates input and output token counts across multiple agent turns.
- Computes cost from per-million-token pricing.
- Produces a summary string for display.
- Uses `Mutex` for interior mutability, the same pattern as `MockProvider`.
- Handles the full chain: API response -> `TokenUsage` on `AssistantTurn` -> `CostTracker::record()` -> running totals and cost estimate.
Token tracking is a small feature in terms of code, but it is essential for any agent you plan to use in production. Without it, you are flying blind on costs and context window usage.
What’s next
In Chapter 15: Context Management you will teach your agent to manage its own
context window – tracking token usage and compacting the conversation history
when it grows too long – so a busy session never blows past the model's token
limit.
Chapter 15: Context Management
Every LLM has a context window – a fixed number of tokens it can process in a single request. Claude has 200k tokens. GPT-4o has 128k. Sounds like a lot, until your agent reads a few large files, runs a test suite, edits some code, and runs the tests again. Each tool result gets appended to the message history, and that history gets sent to the LLM on every turn. A busy session can blow past 100k tokens in minutes.
When that happens, the API either rejects the request or silently truncates your messages. Either way, the agent breaks. Real coding agents handle this automatically – Claude Code, for example, compacts the conversation when it gets too long, summarizing old messages while keeping recent context intact. The user sees a brief “auto-compacting conversation…” message and the session continues.
In this chapter you will build a ContextManager that does exactly that:
tracks token usage, decides when to compact, and uses the LLM itself to
summarize old messages into a short recap. You will also implement the
should_compact() threshold check as an exercise.
The problem
Consider this 10-turn conversation:
User: Find the bug in src/parser.rs
[read: src/parser.rs] ← 500 lines of code
[read: src/types.rs] ← 300 lines of code
Assistant: I see the issue. The parser...
[bash: cargo test] ← 200 lines of test output
[edit: src/parser.rs] ← patch
[bash: cargo test] ← 200 lines of test output
Assistant: All tests pass now.
User: Great. Now add a --verbose flag to the CLI.
[read: src/main.rs] ← 400 lines
...
By the time the user asks a second question, the message history already contains thousands of tokens of file contents, test output, and tool calls. Most of that detail is irrelevant to the new task. But the LLM still receives it all, which wastes tokens, increases latency, and eventually hits the context limit.
The solution: periodically compact the history by summarizing old messages and keeping only the recent ones.
The strategy
Compaction works in three steps:
- Detect – after each LLM turn, check if cumulative token usage has crossed a threshold.
- Summarize – take the old messages (everything except the system prompt and the most recent N messages) and ask the LLM to summarize them into a few sentences.
- Rebuild – replace the message history with: the original system prompt, the summary as a new system message, and the recent messages.
flowchart TD
A["Messages: system + msg1 + msg2 + ... + msg20"]
A --> B["Split"]
B --> C["Keep: system prompt"]
B --> D["Middle: msg1 ... msg17 → summarize"]
B --> E["Keep: msg18, msg19, msg20"]
D --> F["LLM summary"]
F --> G["Rebuilt: system + summary + msg18 + msg19 + msg20"]
After compaction, the conversation has 4-5 messages instead of 20+. The LLM loses the fine-grained detail of early messages but retains the key facts and decisions through the summary. The recent messages are preserved verbatim, so the LLM has full context for whatever it is working on right now.
This is the same approach Claude Code uses. It is simple, effective, and
requires no changes to the agent loop or provider – just a pre-processing
step before each provider.chat() call.
The ContextManager struct
Open mini-claw-code/src/context.rs. The struct has three fields:
#![allow(unused)]
fn main() {
pub struct ContextManager {
/// Maximum total tokens before compaction triggers.
max_tokens: u64,
/// Number of recent messages to always preserve during compaction.
preserve_recent: usize,
/// Running total of tokens used in the current conversation.
tokens_used: u64,
}
}
- `max_tokens` is the budget. When cumulative usage crosses this threshold, compaction fires. This is not the model’s context window size – it is a lower number you choose to leave headroom. For a 200k-token model, you might set this to 100k so you always have room for the LLM’s response.
- `preserve_recent` is how many messages to keep verbatim. These are the messages most relevant to the current task. A value of 4-6 usually works well.
- `tokens_used` is the running counter, updated after each LLM turn.
The constructor is straightforward:
#![allow(unused)]
fn main() {
impl ContextManager {
pub fn new(max_tokens: u64, preserve_recent: usize) -> Self {
Self {
max_tokens,
preserve_recent,
tokens_used: 0,
}
}
}
}
Tracking token usage
The LLM API reports how many tokens each request consumed. Our AssistantTurn
type carries this information in an optional usage field:
#![allow(unused)]
fn main() {
pub struct AssistantTurn {
pub text: Option<String>,
pub tool_calls: Vec<ToolCall>,
pub stop_reason: StopReason,
pub usage: Option<TokenUsage>,
}
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
pub input_tokens: u64,
pub output_tokens: u64,
}
}
After each provider call, the agent records the usage:
#![allow(unused)]
fn main() {
pub fn record(&mut self, usage: &TokenUsage) {
self.tokens_used += usage.input_tokens + usage.output_tokens;
}
}
This is a rough estimate. Input tokens grow with each turn (because the full history is resent), so summing input + output across all turns overcounts. But for a threshold check, overcounting is fine – it just means we compact a little earlier than strictly necessary, which is safer than compacting too late.
You can query the current total at any time:
#![allow(unused)]
fn main() {
pub fn tokens_used(&self) -> u64 {
self.tokens_used
}
}
Exercise: implement should_compact()
This is your exercise for the chapter. The method signature is:
#![allow(unused)]
fn main() {
pub fn should_compact(&self) -> bool {
// TODO: return true if tokens_used >= max_tokens
todo!()
}
}
The logic is a single comparison. When tokens_used meets or exceeds
max_tokens, it is time to compact. Implement it in the starter crate and
run the tests:
cargo test -p mini-claw-code-starter ch15
Here are the tests that verify your implementation:
#![allow(unused)]
fn main() {
#[test]
fn test_ch15_below_threshold_no_compact() {
let cm = ContextManager::new(10000, 4);
assert!(!cm.should_compact());
}
#[test]
fn test_ch15_triggers_at_threshold() {
let mut cm = ContextManager::new(1000, 4);
cm.record(&TokenUsage {
input_tokens: 600,
output_tokens: 500,
});
assert!(cm.should_compact());
}
#[test]
fn test_ch15_tracks_tokens() {
let mut cm = ContextManager::new(10000, 4);
cm.record(&TokenUsage {
input_tokens: 100,
output_tokens: 50,
});
cm.record(&TokenUsage {
input_tokens: 200,
output_tokens: 100,
});
assert_eq!(cm.tokens_used(), 450);
}
}
The first test creates a fresh ContextManager with zero usage – it should
not compact. The second records 1100 tokens against a budget of 1000 – it
should compact. The third verifies that multiple record() calls accumulate
correctly.
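If you want to check your answer, the comparison the tests expect can be sketched like this (a stripped-down `ContextManager` with only the two fields the check needs – the full struct also carries `preserve_recent`):

```rust
struct ContextManager {
    max_tokens: u64,
    tokens_used: u64,
}

impl ContextManager {
    /// Compact once cumulative usage meets or exceeds the budget.
    fn should_compact(&self) -> bool {
        self.tokens_used >= self.max_tokens
    }
}

fn main() {
    // Fresh tracker: below threshold, no compaction.
    assert!(!ContextManager { max_tokens: 10_000, tokens_used: 0 }.should_compact());
    // Over budget: compaction fires.
    assert!(ContextManager { max_tokens: 1_000, tokens_used: 1_100 }.should_compact());
    // Exactly at the threshold also fires ("meets or exceeds").
    assert!(ContextManager { max_tokens: 1_000, tokens_used: 1_000 }.should_compact());
}
```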
The compact() method
Once should_compact() returns true, the agent calls compact(). This is
the core of context management. Let us walk through it step by step.
Guard clause: too few messages
#![allow(unused)]
fn main() {
pub async fn compact<P: Provider>(
&mut self,
provider: &P,
messages: &mut Vec<Message>,
) -> anyhow::Result<()> {
if messages.len() <= self.preserve_recent + 1 {
return Ok(());
}
}
If the conversation is short enough that there is nothing to summarize, bail out. No point summarizing two messages into two sentences.
Splitting the history
The method divides messages into three segments:
#![allow(unused)]
fn main() {
let keep_start = if matches!(messages.first(), Some(Message::System(_))) {
1
} else {
0
};
let total = messages.len();
if total <= keep_start + self.preserve_recent {
return Ok(());
}
let middle_end = total - self.preserve_recent;
let middle = &messages[keep_start..middle_end];
}
- Head (`0..keep_start`): the system prompt, if present. Always preserved.
- Middle (`keep_start..middle_end`): old messages that will be summarized.
- Tail (`middle_end..total`): the most recent `preserve_recent` messages, kept verbatim.
If the system prompt is “You are a helpful coding agent” and there are 10
messages with preserve_recent = 3, then: head = message 0, middle = messages
1-6, tail = messages 7-9.
Building the summarization prompt
The method formats each middle message into a human-readable block:
#![allow(unused)]
fn main() {
let mut summary_parts = Vec::new();
for msg in middle {
match msg {
Message::User(text) => summary_parts.push(format!("User: {text}")),
Message::Assistant(turn) => {
if let Some(ref text) = turn.text {
summary_parts.push(format!("Assistant: {text}"));
}
for call in &turn.tool_calls {
summary_parts.push(format!(" [tool: {}]", call.name));
}
}
Message::ToolResult { content, .. } => {
let preview = if content.len() > 100 {
format!("{}...", &content[..100])
} else {
content.clone()
};
summary_parts.push(format!(" Tool result: {preview}"));
}
Message::System(text) => summary_parts.push(format!("System: {text}")),
}
}
}
Notice the truncation: tool results longer than 100 characters are clipped. This matters because tool results can be huge – the entire contents of a source file, or the full output of a test suite. Including all of that in the summarization prompt would itself be expensive. The LLM only needs enough context to produce a useful summary.
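One caveat worth knowing: `&content[..100]` slices at a byte offset, and Rust will panic if byte 100 falls inside a multi-byte UTF-8 character (easy to hit with non-ASCII test output). A boundary-safe variant of the truncation – our own defensive sketch, not the book's code – backs up to the nearest character boundary first:

```rust
/// Truncate `s` to at most `max_bytes`, backing up to a UTF-8 char boundary.
fn truncate_preview(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // safe: 0 is always a char boundary
    }
    format!("{}...", &s[..end])
}

fn main() {
    assert_eq!(truncate_preview("short", 100), "short");
    // 'é' is 2 bytes; cutting at byte 1 would split it, so we back up to 0.
    assert_eq!(truncate_preview("éx", 1), "...");
    let long = "a".repeat(150);
    assert_eq!(truncate_preview(&long, 100).len(), 103); // 100 bytes + "..."
}
```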
The formatted parts are joined into a single summarization prompt:
#![allow(unused)]
fn main() {
let prompt = format!(
"Summarize this conversation history in 2-3 sentences, \
preserving key facts and decisions:\n\n{}",
summary_parts.join("\n")
);
}
Calling the LLM
The summary prompt is sent as a fresh conversation – no tools, no history:
#![allow(unused)]
fn main() {
let summary_messages = vec![Message::User(prompt)];
let turn = provider.chat(&summary_messages, &[]).await?;
let summary_text = turn.text.unwrap_or_else(|| "Previous conversation.".into());
}
This is a neat trick: we reuse the same Provider the agent already has. No
extra configuration, no special summarization model. The LLM that does the
coding also does the summarization. If the provider call fails, we use a
generic fallback string so the conversation can continue.
Rebuilding the message history
Finally, the method assembles the new, shorter history:
#![allow(unused)]
fn main() {
let mut new_messages = Vec::new();
// Keep leading messages (system prompt)
for msg in messages.iter().take(keep_start) {
if let Message::System(text) = msg {
new_messages.push(Message::System(text.clone()));
}
}
// Insert the summary as a system message
new_messages.push(Message::System(format!(
"[Conversation summary]: {summary_text}"
)));
// Keep recent messages
let recent_start = total - self.preserve_recent;
let recent: Vec<Message> = messages.drain(recent_start..).collect();
new_messages.extend(recent);
*messages = new_messages;
}
The summary is inserted as a Message::System(...) tagged with
[Conversation summary]. This tells the LLM it is reading a recap, not a
direct instruction. The recent messages come after the summary, so the LLM
sees the most relevant context last – right before it generates its response.
After rebuilding, the token counter is reduced:
#![allow(unused)]
fn main() {
self.tokens_used /= 3;
}
This is a rough heuristic. The actual token savings depend on how much was summarized, but dividing by 3 is a reasonable estimate that avoids re-triggering compaction immediately.
The integration point: maybe_compact()
The maybe_compact() method ties detection and compaction together:
#![allow(unused)]
fn main() {
pub async fn maybe_compact<P: Provider>(
&mut self,
provider: &P,
messages: &mut Vec<Message>,
) -> anyhow::Result<()> {
if self.should_compact() {
self.compact(provider, messages).await?;
}
Ok(())
}
}
This is the method the agent loop calls. The integration is a single line
added before each provider.chat() call:
#![allow(unused)]
fn main() {
loop {
// NEW: compact if needed before calling the LLM
context_manager.maybe_compact(&self.provider, &mut messages).await?;
let turn = self.provider.chat(&messages, &defs).await?;
// Record token usage from this turn
if let Some(ref usage) = turn.usage {
context_manager.record(usage);
}
match turn.stop_reason {
StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
StopReason::ToolUse => {
// ... execute tools, push results ...
}
}
}
}
That is the entire integration. Two lines added to the existing agent loop:
one to maybe compact before the call, one to record usage after. The
ContextManager handles all the logic internally.
Running the tests
cargo test -p mini-claw-code ch15
What the tests verify
- `test_ch15_below_threshold_no_compact`: A fresh `ContextManager` with zero usage should not trigger compaction.
- `test_ch15_triggers_at_threshold`: After recording 1100 tokens against a budget of 1000, `should_compact()` returns `true`.
- `test_ch15_tracks_tokens`: Two `record()` calls accumulate correctly (100 + 50 + 200 + 100 = 450).
- `test_ch15_compact_preserves_system_prompt`: After compacting a 6-message conversation (system + 5 user messages), the system prompt remains the first message and a summary message is present.
- `test_ch15_compact_too_few_messages`: When `preserve_recent` is larger than the message count, compaction is a no-op – nothing changes.
- `test_ch15_maybe_compact_skips_when_not_needed`: When token usage is below the threshold, `maybe_compact()` leaves the messages untouched.
- `test_ch15_compact_preserves_recent`: After compacting a 5-message conversation with `preserve_recent = 2`, the last two messages (“Recent A” and “Recent B”) are preserved verbatim.
The async tests use MockProvider to provide a canned summary response. No
real API calls, no network. The mock returns a fixed summary string, and the
tests verify that the message history is restructured correctly around it.
Design tradeoffs
Why not count tokens precisely? The tokens_used counter sums input and
output tokens across all turns, which overcounts because input tokens are
resent each turn. A precise implementation would track only incremental
tokens. But the threshold approach is intentionally conservative – it
triggers compaction a bit early, which is always safe. And it avoids the
complexity of a token counting model.
Why not truncate instead of summarize? You could simply drop old messages. But the LLM would lose context about what it already did, leading to repeated work or contradictory actions. Summarization preserves the key facts (“I found a bug in parser.rs line 42 and fixed it, all tests pass now”) in a compact form.
Why divide tokens_used by 3? After compaction, the actual token count is
unknown without re-counting. Dividing by 3 is a rough approximation that
works well in practice: the summary is much shorter than the original
messages, and the recent messages were already counted. The approximation
errs on the side of under-counting, which means the next compaction might
trigger slightly late. In practice this is fine because preserve_recent
keeps enough headroom.
Why use a system message for the summary? System messages are treated with
high priority by most LLMs. By tagging the summary as
[Conversation summary], we signal to the model that this is background
context, not an instruction or a user message. This avoids confusing the LLM
about who said what.
Recap
- Context windows are finite. Long agent sessions accumulate token-heavy tool results that eventually exhaust the budget.
- `ContextManager` tracks cumulative token usage and triggers compaction when a threshold is reached.
- `should_compact()` is a simple threshold check: `tokens_used >= max_tokens`.
- `compact()` splits the message history into a head (system prompt), middle (old messages to summarize), and tail (recent messages to preserve). The middle is summarized by the LLM and replaced with a single system message.
- `maybe_compact()` is the integration point – one line before each `provider.chat()` call in the agent loop.
- Token counting is approximate. The system errs on the side of compacting early, which is safer than compacting late.
What’s next
Your agent now manages its own context window – it can run indefinitely without hitting token limits. Combined with tools, streaming, subagents, and plan mode from earlier chapters, you have a complete coding agent framework.
The next step is yours. Extend the agent with new tools, experiment with different summarization strategies, add token-level counting with a proper tokenizer, or deploy it as a daily-driver CLI. The architecture you have built is the same one that powers production coding agents – the difference is polish, not structure.
Chapter 16: Configuration
Every production agent needs configurable behavior. Which model should it use? What is the context window limit? Are there directories it should never touch? Hardcoding these values works for a tutorial, but a real tool needs to let users override them – and override them at different levels.
Claude Code solves this with a multi-level configuration hierarchy: defaults, project settings, user settings, and environment variables. Each layer can override the one below it. This chapter walks through our implementation of the same pattern.
The layered config model
The core idea is simple: start with sensible defaults, then let each successive layer override specific values while leaving the rest untouched.
Priority (highest wins)
========================
4. Environment variables MINI_CLAW_MODEL=...
3. User config ~/.config/mini-claw/config.toml
2. Project config .mini-claw/config.toml
1. Defaults compiled into the binary
Why four layers?
- Defaults ensure the agent works out of the box with zero configuration.
- Project config lives in the repository (`.mini-claw/config.toml`). It sets project-specific rules: blocked commands, protected files, MCP servers. Every contributor on the project shares these settings.
- User config lives in the user’s home directory (`~/.config/mini-claw/config.toml` on Linux/macOS). It captures personal preferences: preferred model, API base URL, custom instructions. These apply across all projects.
- Environment variables override everything. They are useful for CI pipelines, one-off experiments, or temporarily switching models without editing any file.
This is the same pattern used by Git (system, global, local config), npm
(.npmrc at multiple levels), and many other CLI tools. It is worth
understanding because you will see it everywhere and can reuse it in your own
projects.
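To make the layers concrete, here is what a project-level config file might look like. The field names come from this chapter; the specific values are invented for illustration:

```toml
# .mini-claw/config.toml — checked into the repo, shared by all contributors
model = "anthropic/claude-sonnet-4"
max_context_tokens = 50000

blocked_commands = ["rm -rf /", "sudo *", "git push --force *"]

[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["@anthropic/mcp-server-filesystem"]
```

Any field omitted here (for example `preserve_recent`) falls back to the compiled-in default, and a user config or `MINI_CLAW_*` environment variable can still override what the file sets.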
The Config struct
Open mini-claw-code/src/config.rs. The top-level struct holds every
configurable value:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct Config {
pub model: String,
pub base_url: String,
pub max_context_tokens: u64,
pub preserve_recent: usize,
pub allowed_directory: Option<String>,
pub protected_patterns: Vec<String>,
pub blocked_commands: Vec<String>,
pub mcp_servers: Vec<McpServerConfig>,
pub hooks: HooksConfig,
pub instructions: Option<String>,
}
}
A quick field-by-field tour:
| Field | Purpose |
|---|---|
| `model` | LLM model identifier, e.g. `"anthropic/claude-sonnet-4"` |
| `base_url` | API endpoint URL |
| `max_context_tokens` | Token budget before the agent triggers context compaction |
| `preserve_recent` | Number of recent messages to keep during compaction |
| `allowed_directory` | If set, tools cannot access files outside this directory |
| `protected_patterns` | Glob patterns for files that tools should never write to |
| `blocked_commands` | Shell command patterns that the bash tool should reject |
| `mcp_servers` | MCP server definitions (name, command, args, env) |
| `hooks` | Pre/post tool execution hooks |
| `instructions` | Custom system prompt text |
The #[serde(default)] attribute on the struct is critical. It tells serde:
“if a field is missing from the TOML input, use its Default value instead of
returning an error.” This means a config file can specify just one field and
every other field gets a sensible default.
Defaults
The Default implementation defines the baseline:
#![allow(unused)]
fn main() {
impl Default for Config {
fn default() -> Self {
Self {
model: "openrouter/free".into(),
base_url: "https://openrouter.ai/api/v1".into(),
max_context_tokens: 100_000,
preserve_recent: 6,
allowed_directory: None,
protected_patterns: vec![
".env".into(),
".env.*".into(),
".git/**".into(),
],
blocked_commands: vec![
"rm -rf /".into(),
"sudo *".into(),
"curl * | bash".into(),
"curl * | sh".into(),
],
mcp_servers: Vec::new(),
hooks: HooksConfig::default(),
instructions: None,
}
}
}
}
The defaults are deliberately conservative. The free model keeps the barrier to
entry low. The protected patterns prevent the agent from overwriting .env
files or anything inside .git/. The blocked commands stop the most dangerous
shell operations. A user who wants to loosen these restrictions can do so in
their config file.
Nested config types
McpServerConfig
MCP servers are defined as a list of entries. Each entry describes how to spawn a server process:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
pub struct McpServerConfig {
pub name: String,
pub command: String,
#[serde(default)]
pub args: Vec<String>,
#[serde(default)]
pub env: std::collections::HashMap<String, String>,
}
}
In TOML, this uses the double-bracket array-of-tables syntax:
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["@anthropic/mcp-server-filesystem"]
The #[serde(default)] on args and env means you can omit them if the
server needs no arguments or extra environment variables.
HooksConfig and ShellHookConfig
Hooks let you run shell commands before or after the agent executes a tool. For example, you might lint a file after the agent writes to it, or log every bash command.
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct HooksConfig {
pub pre_tool: Vec<ShellHookConfig>,
pub post_tool: Vec<ShellHookConfig>,
}
#[derive(Debug, Clone, Deserialize)]
pub struct ShellHookConfig {
pub tool_pattern: Option<String>,
pub command: String,
#[serde(default = "default_hook_timeout")]
pub timeout_ms: u64,
}
fn default_hook_timeout() -> u64 {
5000
}
}
A few things to note:
- `HooksConfig` uses `#[serde(default)]` at the struct level, so a config file that does not mention hooks at all gets empty `pre_tool` and `post_tool` vectors.
- `ShellHookConfig` uses `#[serde(default = "default_hook_timeout")]` on `timeout_ms`. This is a different form of the default attribute: instead of using the type’s `Default` trait, it calls a specific function. Here, `default_hook_timeout()` returns 5000 milliseconds.
- `tool_pattern` is an `Option<String>`. When `None`, the hook runs for every tool. When set to something like `"bash"`, it only runs for the bash tool.
In TOML:
[[hooks.pre_tool]]
command = "echo pre"
tool_pattern = "bash"
timeout_ms = 3000
TOML deserialization
The toml crate handles deserialization. Because Config derives
Deserialize and has #[serde(default)], parsing a minimal TOML file works
seamlessly:
#![allow(unused)]
fn main() {
let toml_str = r#"
model = "anthropic/claude-sonnet-4"
max_context_tokens = 50000
"#;
let config: Config = toml::from_str(toml_str).unwrap();
}
This produces a Config where model is "anthropic/claude-sonnet-4",
max_context_tokens is 50000, and every other field has its default value.
The #[serde(default)] attribute is doing all the heavy lifting – without it,
serde would require every field to be present in the TOML.
This is also why we chose TOML over JSON for configuration files. TOML is designed for human-editable config: it supports comments, has clean syntax for nested tables and arrays, and does not require trailing commas or quoting of simple strings.
ConfigLoader
The ConfigLoader struct ties everything together. It has no fields – it is
just a namespace for the loading logic:
#![allow(unused)]
fn main() {
pub struct ConfigLoader;
}
The load() method
ConfigLoader::load() is the main entry point. It applies all four layers in
order:
#![allow(unused)]
fn main() {
impl ConfigLoader {
pub fn load() -> Config {
let mut config = Config::default();
// Layer 1: Project config
if let Some(project_config) = Self::load_file(".mini-claw/config.toml") {
Self::merge(&mut config, project_config);
}
// Layer 2: User config
if let Some(user_dir) = dirs::config_dir() {
let user_path = user_dir.join("mini-claw/config.toml");
if let Some(user_config) = Self::load_path(&user_path) {
Self::merge(&mut config, user_config);
}
}
// Layer 3: Environment variable overrides
if let Ok(model) = std::env::var("MINI_CLAW_MODEL") {
config.model = model;
}
if let Ok(url) = std::env::var("MINI_CLAW_BASE_URL") {
config.base_url = url;
}
if let Ok(tokens) = std::env::var("MINI_CLAW_MAX_TOKENS")
&& let Ok(n) = tokens.parse()
{
config.max_context_tokens = n;
}
config
}
}
}
The flow:
- Start with `Config::default()`.
- If `.mini-claw/config.toml` exists in the current directory, parse it and merge it into the config.
- Use the `dirs` crate to find the platform-appropriate user config directory (`~/.config` on Linux, `~/Library/Application Support` on macOS). If `mini-claw/config.toml` exists there, merge it in.
- Check three environment variables (`MINI_CLAW_MODEL`, `MINI_CLAW_BASE_URL`, `MINI_CLAW_MAX_TOKENS`) and override the corresponding fields if set.
Each file loading step uses if let Some(...) – if the file does not exist or
cannot be parsed, the step is silently skipped. This is intentional: config
files are optional at every level.
Notice the let ... && let ... syntax in the environment variable parsing for
MINI_CLAW_MAX_TOKENS. This is a let-chain: the inner let Ok(n) = tokens.parse() only runs if the outer let Ok(tokens) succeeds. If the
environment variable exists but is not a valid number, the override is skipped.
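On editions without let-chains, the same guard can be written with nested `if let`s. The helper below extracts that logic into a testable function (the function name is ours, not the book's); it takes an `Option<&str>` standing in for the result of `std::env::var`:

```rust
/// Parse an override value: both the "variable is set" and the
/// "value is a valid number" conditions must hold, else skip.
fn parse_max_tokens(raw: Option<&str>) -> Option<u64> {
    if let Some(tokens) = raw {
        if let Ok(n) = tokens.parse() {
            return Some(n);
        }
    }
    None
}

fn main() {
    assert_eq!(parse_max_tokens(Some("50000")), Some(50000));
    assert_eq!(parse_max_tokens(Some("not-a-number")), None); // invalid value: skipped
    assert_eq!(parse_max_tokens(None), None); // variable unset: skipped
}
```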
File loading helpers
Two helper methods handle reading and parsing TOML files:
#![allow(unused)]
fn main() {
pub fn load_path(path: &Path) -> Option<Config> {
let content = std::fs::read_to_string(path).ok()?;
toml::from_str(&content).ok()
}
fn load_file(relative_path: &str) -> Option<Config> {
let path = PathBuf::from(relative_path);
Self::load_path(&path)
}
}
Both return Option<Config>. The ? operator on .ok() converts Result
into Option, so any I/O error or parse error produces None and the layer
is skipped.
load_path is public – callers can use it to load a config from any
arbitrary path. load_file is private and handles the relative path case for
project config.
The merge strategy
The merge() method is where the layered override logic lives:
#![allow(unused)]
fn main() {
fn merge(base: &mut Config, overlay: Config) {
if overlay.model != Config::default().model {
base.model = overlay.model;
}
if overlay.base_url != Config::default().base_url {
base.base_url = overlay.base_url;
}
if overlay.max_context_tokens != Config::default().max_context_tokens {
base.max_context_tokens = overlay.max_context_tokens;
}
if overlay.preserve_recent != Config::default().preserve_recent {
base.preserve_recent = overlay.preserve_recent;
}
if overlay.allowed_directory.is_some() {
base.allowed_directory = overlay.allowed_directory;
}
if !overlay.protected_patterns.is_empty()
&& overlay.protected_patterns != Config::default().protected_patterns
{
base.protected_patterns = overlay.protected_patterns;
}
if !overlay.blocked_commands.is_empty()
&& overlay.blocked_commands != Config::default().blocked_commands
{
base.blocked_commands = overlay.blocked_commands;
}
if !overlay.mcp_servers.is_empty() {
base.mcp_servers = overlay.mcp_servers;
}
if overlay.instructions.is_some() {
base.instructions = overlay.instructions;
}
}
}
The merge logic compares each overlay field against the default. If a field in
the overlay still has its default value, it was probably not set in the TOML
file (remember, #[serde(default)] fills missing fields with defaults). So
the base value is kept. Only explicitly-set values override.
This is a pragmatic compromise. A more sophisticated approach would track which
fields were explicitly set (using something like Option<T> for every field,
or a separate “was this set?” bitfield). But comparing against defaults works
well in practice and keeps the code simple.
One subtlety: Vec fields like protected_patterns and blocked_commands
check both that the overlay is non-empty and that it differs from the
default. This prevents an edge case where deserializing a TOML file that does
not mention protected_patterns would produce the default value (via
#[serde(default)]) and then “override” the base with the same defaults.
Environment variable overrides
The environment variable layer is the simplest – no file loading, no merging, just direct assignment:
#![allow(unused)]
fn main() {
if let Ok(model) = std::env::var("MINI_CLAW_MODEL") {
config.model = model;
}
}
Only three fields are exposed as environment variables: model, base_url,
and max_context_tokens. These are the values most likely to change between
runs. Complex structures like mcp_servers and hooks are not practical to
express as environment variables, so they are only configurable through files.
This is a common pattern in CLI tools: environment variables handle the “quick override” case, while config files handle the “persistent, structured settings” case.
Running the tests
cargo test -p mini-claw-code ch16
The tests cover each layer and their interactions:
- `test_ch16_default_config` – verifies that `Config::default()` returns sensible values: the free model, 100k token limit, non-empty protected patterns and blocked commands.
- `test_ch16_load_from_toml` – parses a TOML string with two fields and checks that both are set correctly.
- `test_ch16_default_fills_missing_fields` – parses a TOML string with only `model` set. Verifies that unspecified fields fall back to their defaults. This is the `#[serde(default)]` attribute in action.
- `test_ch16_load_nonexistent_path` – calls `ConfigLoader::load_path()` on a path that does not exist. Confirms it returns `None` instead of panicking.
- `test_ch16_mcp_server_config` – parses TOML with a `[[mcp_servers]]` block. Verifies that the array-of-tables syntax deserializes into a `Vec<McpServerConfig>` correctly.
- `test_ch16_hooks_config` – parses TOML with a `[[hooks.pre_tool]]` block. Verifies the hook’s command, tool pattern, and timeout.
- `test_ch16_env_override` – sets `MINI_CLAW_MODEL` as an environment variable, calls `ConfigLoader::load()`, and verifies the model was overridden. Note that the test uses `unsafe` blocks around `set_var` and `remove_var` – as of the Rust 2024 edition, modifying environment variables is unsafe because it can cause undefined behavior when another thread reads the environment concurrently.
- `test_ch16_protected_patterns_default` – verifies that the default protected patterns include `.env` and `.git/**`.
Recap
- Layered configuration is a widely-used design pattern: defaults, project settings, user settings, and environment variables, each overriding the layer below.
- The `Config` struct uses `#[serde(default)]` so that TOML files only need to specify the fields they want to change.
- Nested types (`McpServerConfig`, `HooksConfig`, `ShellHookConfig`) model structured configuration with their own serde attributes and defaults.
- `ConfigLoader::load()` applies all four layers in order, using a `merge()` function that only overrides fields that differ from the default.
- Environment variables provide the highest-priority override for the most commonly changed fields.
- File loading is resilient: missing or unparseable files are silently skipped.
This pattern is reusable well beyond coding agents. Any CLI tool that needs per-project and per-user settings can use the same approach: define a config struct with serde defaults, load files from known paths, merge non-default values, and apply environment variable overrides last.
Chapter 17: Project Instructions
Every coding agent worth its salt understands the project it is working in.
Claude Code reads CLAUDE.md files to learn your coding conventions, preferred
libraries, and project-specific quirks. Your agent should do the same.
In this chapter you will build an InstructionLoader that discovers instruction
files by walking the filesystem upward from the current directory, loads their
contents, and formats them for injection into the agent’s system prompt. It is a
small piece of infrastructure, but the payoff is immediate – your agent starts
respecting project context the moment it launches.
Goal
Implement InstructionLoader so that:
- Given a starting directory, it walks upward toward the filesystem root looking for instruction files (e.g. `CLAUDE.md`).
- It returns discovered paths in root-first order (outermost files first, innermost last).
- It loads and concatenates file contents with clear headers.
- It produces a formatted section ready for the system prompt.
The discovery pattern
Consider a project with this layout:
/home/user/CLAUDE.md <-- global preferences
/home/user/projects/CLAUDE.md <-- org-level conventions
/home/user/projects/my-app/CLAUDE.md <-- project-specific rules
/home/user/projects/my-app/src/ <-- you are here
When the agent starts in /home/user/projects/my-app/src/, it should walk
upward, checking each directory for instruction files. After collecting
everything, it reverses the list so that the broadest context (closest to root)
appears first and project-specific overrides appear last. This mirrors how
Claude Code layers its own CLAUDE.md files.
flowchart TB
A["/home/user/projects/my-app/src/"] -->|"parent()"| B["/home/user/projects/my-app/"]
B -->|"parent()"| C["/home/user/projects/"]
C -->|"parent()"| D["/home/user/"]
D -->|"parent()"| E["/home/"]
E -->|"parent()"| F["/"]
B -. "CLAUDE.md found" .-> G["Collect"]
C -. "CLAUDE.md found" .-> G
D -. "CLAUDE.md found" .-> G
G -->|"reverse()"| H["Root-first order"]
The implementation
Create a new file at mini-claw-code-starter/src/instructions.rs. You will also
need to add pub mod instructions; to your lib.rs and re-export the struct:
#![allow(unused)]
fn main() {
pub use instructions::InstructionLoader;
}
The struct
InstructionLoader holds a list of file names to search for:
#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};
pub struct InstructionLoader {
file_names: Vec<String>,
}
}
It is deliberately simple – no async, no caching, just a synchronous walker. Instruction files are tiny and loaded once at startup, so there is no need for the complexity of async I/O here.
Step 1: Constructors
Provide two ways to create a loader. The first accepts an explicit list of file names:
#![allow(unused)]
fn main() {
impl InstructionLoader {
pub fn new(file_names: &[&str]) -> Self {
Self {
file_names: file_names.iter().map(|s| s.to_string()).collect(),
}
}
}
}
The second provides sensible defaults:
#![allow(unused)]
fn main() {
pub fn default_files() -> Self {
Self::new(&["CLAUDE.md", ".mini-claw/instructions.md"])
}
}
This lets users customize the file names if they want, while the common case requires no configuration at all.
Step 2: discover() – filesystem traversal
This is the core method. It takes a starting directory and walks upward:
#![allow(unused)]
fn main() {
pub fn discover(&self, start_dir: &Path) -> Vec<PathBuf> {
let mut found = Vec::new();
let mut dir = Some(start_dir.to_path_buf());
while let Some(current) = dir {
for name in &self.file_names {
let candidate = current.join(name);
if candidate.is_file() {
found.push(candidate);
}
}
dir = current.parent().map(|p| p.to_path_buf());
}
// Reverse so root-level files come first
found.reverse();
found
}
}
Walk through the key details:
- `dir = Some(start_dir.to_path_buf())` – We use `Option<PathBuf>` to drive the loop. When `parent()` returns `None` (we have reached the root), the loop ends.
- Inner loop over `file_names` – At each directory level we check every file name in the search list. This means a single directory can contribute multiple instruction files if both `CLAUDE.md` and `.mini-claw/instructions.md` exist there.
- `candidate.is_file()` – A synchronous filesystem check. We only collect paths that actually exist and are files.
- `found.reverse()` – The traversal naturally produces innermost-first order (we start at the deepest directory). Reversing gives us root-first order, which is what we want for layering: broad context first, specific overrides last.
Step 3: load() – reading and concatenating
With discovery in hand, loading is straightforward:
#![allow(unused)]
fn main() {
pub fn load(&self, start_dir: &Path) -> Option<String> {
let paths = self.discover(start_dir);
if paths.is_empty() {
return None;
}
let mut sections = Vec::new();
for path in &paths {
if let Ok(content) = std::fs::read_to_string(path) {
let content = content.trim().to_string();
if !content.is_empty() {
sections.push(format!(
"# Instructions from {}\n\n{}",
path.display(),
content
));
}
}
}
if sections.is_empty() {
None
} else {
Some(sections.join("\n\n---\n\n"))
}
}
}
A few things to note:
- Returns `Option<String>` – `None` means no instruction files were found (or all were empty). This makes it easy for the caller to skip injection entirely.
- `content.trim()` – Strips leading/trailing whitespace so empty files (or files with only whitespace) are excluded.
- Header per file – Each section starts with `# Instructions from /path/to/CLAUDE.md` so the LLM (and you, when debugging) can see exactly where each instruction came from.
- `---` separator – A horizontal rule between sections keeps the output readable when multiple files are concatenated.
Step 4: system_prompt_section() – ready for the agent
The final method wraps the loaded content with a preamble that tells the LLM to follow the instructions:
#![allow(unused)]
fn main() {
pub fn system_prompt_section(&self, start_dir: &Path) -> Option<String> {
self.load(start_dir).map(|content| {
format!(
"The following project instructions were loaded automatically. \
Follow them carefully:\n\n{content}"
)
})
}
}
This returns None when there are no instructions, so integrating it is clean:
#![allow(unused)]
fn main() {
// In your agent setup code:
let loader = InstructionLoader::default_files();
if let Some(section) = loader.system_prompt_section(&current_dir) {
messages.insert(0, Message::System(section));
}
}
The Message::System variant you defined back in Chapter 1 is the right place
for this. System messages sit at the front of the conversation and guide the
LLM’s behavior for the entire session.
Integrating with the agent
To wire this into your agent, add instruction loading to your startup code
(for example, in main() or wherever you build the initial message list).
The pattern is:
- Determine the current working directory.
- Create an `InstructionLoader` (usually with `default_files()`).
- Call `system_prompt_section()`.
- If it returns `Some`, prepend a `Message::System` to your conversation.
#![allow(unused)]
fn main() {
use std::env;
use mini_claw_code::{InstructionLoader, Message};
let cwd = env::current_dir().expect("failed to get current directory");
let loader = InstructionLoader::default_files();
let mut messages = Vec::new();
if let Some(instructions) = loader.system_prompt_section(&cwd) {
messages.push(Message::System(instructions));
}
// ... continue with user prompt and agent loop
}
That is it. No changes to the agent loop, no changes to the provider. The instructions flow in as part of the system prompt and the LLM sees them on every turn.
Running the tests
Run the Chapter 17 tests:
cargo test -p mini-claw-code-starter ch17
What the tests verify
- `test_ch17_discover_in_current_dir`: Creates a temp directory with a `CLAUDE.md` file and verifies `discover()` finds it.
- `test_ch17_discover_in_parent`: Creates a `CLAUDE.md` in a parent directory and starts discovery from a child. The file should still be found.
- `test_ch17_no_files_found`: Searches for a nonexistent file name and verifies the result is empty.
- `test_ch17_load_content`: Writes a `CLAUDE.md` with known content and verifies `load()` returns it.
- `test_ch17_load_empty_file`: An empty file should cause `load()` to return `None` – empty instructions are not useful.
- `test_ch17_multiple_file_names`: Creates both `CLAUDE.md` and `.mini-claw/instructions.md` in the same directory and verifies both are loaded.
- `test_ch17_system_prompt_section`: Verifies the output includes the preamble text (“project instructions”) and the file content.
- `test_ch17_default_files`: Confirms `default_files()` does not panic.
Recap
You built a project instruction loader with three layers:
- `discover()` walks the filesystem upward, collecting instruction file paths in root-first order.
- `load()` reads and concatenates those files with clear headers and separators.
- `system_prompt_section()` wraps the result for direct injection into `Message::System`.
The key design choices:
- Root-first ordering ensures broad conventions appear before project-specific overrides, letting the LLM resolve conflicts by giving priority to the most specific instructions (which appear last).
- `Option<String>` return types make it trivial to skip injection when no files exist.
- Synchronous I/O is appropriate here – instruction files are small and loaded once at startup.
Your agent now reads project context automatically. Drop a CLAUDE.md in any
directory and the agent picks it up. This is the same pattern that makes tools
like Claude Code project-aware from the first prompt.
Chapter 18: Safety Rails
Your agent can now read files, write files, edit code, and run arbitrary shell
commands. Take a moment to appreciate what that means: the LLM – a statistical
model that occasionally hallucinates – has unrestricted access to your file
system and can execute any command your user account can. It can `rm -rf /`. It
can read /etc/passwd. It can overwrite your .env file with your API keys
exposed. That is terrifying.
Production coding agents like Claude Code invest heavily in multi-layered safety. In this chapter you will build a miniature version of those safety rails: a set of composable checks that run before every tool call, blocking dangerous operations before they reach the file system or shell.
flowchart LR
LLM -- "tool call" --> SC["Safety Checks"]
SC -- "pass" --> Tool
SC -- "blocked" --> Err["Error returned<br/>to LLM"]
Goal
Implement four types in safety.rs:
- `SafetyCheck` trait – the common interface every check implements.
- `PathValidator` – ensures file paths stay inside an allowed directory.
- `CommandFilter` – blocks dangerous shell commands by glob pattern.
- `ProtectedFileCheck` – prevents writes to sensitive files like `.env`.
Then implement SafeToolWrapper – a decorator that wraps any Box<dyn Tool>
with a list of safety checks, running them before delegating to the inner tool.
Key Rust concepts
The decorator pattern with trait objects
Rust does not have class inheritance, but you can achieve the decorator pattern
with trait objects. A decorator struct holds a Box<dyn Tool> and itself
implements Tool. From the outside it looks like any other tool. Inside, it
adds behavior (safety checks) before delegating to the wrapped tool.
#![allow(unused)]
fn main() {
struct SafeToolWrapper {
inner: Box<dyn Tool>,
checks: Vec<Box<dyn SafetyCheck>>,
}
impl Tool for SafeToolWrapper {
fn definition(&self) -> &ToolDefinition {
self.inner.definition() // delegate
}
async fn call(&self, args: Value) -> anyhow::Result<String> {
// run checks first, then delegate
self.inner.call(args).await
}
}
}
This is the same idea as Python’s functools.wraps or the classic Gang of Four
decorator, but expressed through Rust’s trait system.
std::path::Path::canonicalize()
Canonicalizing a path resolves all ., .., and symbolic links, producing an
absolute path that cannot be tricked by directory traversal:
#![allow(unused)]
fn main() {
let sneaky = Path::new("/home/user/project/../../../etc/passwd");
let resolved = sneaky.canonicalize()?;
// resolved == "/etc/passwd"
}
This is how you defeat ../ attacks. After canonicalization, a simple
starts_with check is enough to verify containment.
glob::Pattern
The glob crate provides Unix-style glob matching. You will use it to match
both commands and file paths against patterns:
#![allow(unused)]
fn main() {
let pattern = glob::Pattern::new("sudo *").unwrap();
assert!(pattern.matches("sudo reboot"));
assert!(!pattern.matches("echo hello"));
}
The * matches any sequence of characters, ? matches any single character,
and [abc] matches character classes. This gives you flexible pattern-based
filtering without writing complex regex.
Step 1: The SafetyCheck trait
Open mini-claw-code-starter/src/safety.rs and start with the trait that all
safety checks will implement.
#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};
use async_trait::async_trait;
use serde_json::Value;
use crate::types::{Tool, ToolDefinition};
/// A check that runs before a tool call is executed.
///
/// Implementations validate tool arguments and return `Ok(())` to allow
/// execution or `Err(reason)` to block it.
pub trait SafetyCheck: Send + Sync {
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String>;
}
}
A few things to notice:
- The method is synchronous. Safety checks inspect arguments – they do not need to do I/O or anything async. Keeping them sync makes them cheap and easy to compose.
- It returns `Result<(), String>`, not `anyhow::Result`. The `String` error is the human-readable reason the check failed. This keeps safety checks self-contained with no dependency on `anyhow`.
- The trait is `Send + Sync` because tools run inside an async runtime and may be shared across tasks.
- Every check receives the tool name and the raw arguments. This lets a single check implementation decide which tools it cares about (e.g. a path validator only inspects `read`, `write`, and `edit`).
Step 2: PathValidator
The first real check prevents directory traversal attacks. A user (or a confused
LLM) might ask to read ../../etc/passwd or write to /root/.ssh/authorized_keys.
PathValidator ensures every file path resolves to somewhere inside an allowed
directory.
The struct
#![allow(unused)]
fn main() {
pub struct PathValidator {
allowed_dir: PathBuf,
}
impl PathValidator {
pub fn new(allowed_dir: impl Into<PathBuf>) -> Self {
Self {
allowed_dir: allowed_dir.into(),
}
}
}
}
The core method: validate_path
This is where the real logic lives. The method takes a raw path string and either accepts or rejects it.
#![allow(unused)]
fn main() {
pub fn validate_path(&self, path: &str) -> Result<(), String> {
let target = Path::new(path);
// Resolve to absolute path
let resolved = if target.is_absolute() {
target.to_path_buf()
} else {
self.allowed_dir.join(target)
};
}
If the path is relative (like src/main.rs), we join it with the allowed
directory to get an absolute path. If it is already absolute, we use it as-is.
Next, canonicalize both paths. This is the critical step – it collapses any
.. segments:
#![allow(unused)]
fn main() {
let canonical_allowed = self
.allowed_dir
.canonicalize()
.map_err(|e| format!("cannot resolve allowed directory: {e}"))?;
let canonical_target = if resolved.exists() {
resolved
.canonicalize()
.map_err(|e| format!("cannot resolve path: {e}"))?
}
But what about new files that do not exist yet? You cannot canonicalize a non-existent path. The trick is to canonicalize the parent directory and then append the filename:
#![allow(unused)]
fn main() {
} else {
// For new files, check the parent directory
let parent = resolved.parent().ok_or("invalid path")?;
if parent.exists() {
let mut canonical = parent
.canonicalize()
.map_err(|e| format!("cannot resolve parent: {e}"))?;
if let Some(filename) = resolved.file_name() {
canonical.push(filename);
}
canonical
} else {
return Err(format!(
"parent directory does not exist: {}",
parent.display()
));
}
};
}
Finally, the containment check. After canonicalization, starts_with is safe:
#![allow(unused)]
fn main() {
if canonical_target.starts_with(&canonical_allowed) {
Ok(())
} else {
Err(format!(
"path {} is outside allowed directory {}",
canonical_target.display(),
canonical_allowed.display()
))
}
}
}
flowchart TD
A["Raw path string"] --> B["Resolve to absolute"]
B --> C{"File exists?"}
C -- "yes" --> D["canonicalize()"]
C -- "no" --> E["canonicalize parent<br/>+ append filename"]
D --> F{"starts_with<br/>allowed_dir?"}
E --> F
F -- "yes" --> G["Ok(())"]
F -- "no" --> H["Err: outside allowed dir"]
Implementing SafetyCheck for PathValidator
The trait implementation decides which tools this check applies to. Path
validation only makes sense for tools that take a "path" argument:
#![allow(unused)]
fn main() {
impl SafetyCheck for PathValidator {
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
match tool_name {
"read" | "write" | "edit" => {
if let Some(path) = args.get("path").and_then(|v| v.as_str()) {
self.validate_path(path)
} else {
Ok(()) // No path argument, nothing to check
}
}
_ => Ok(()),
}
}
}
}
Notice the _ => Ok(()) arm. The bash tool does not have a "path" argument,
so the path validator silently allows it. Each check is responsible only for what
it understands.
Step 3: CommandFilter
The second layer blocks dangerous shell commands. You do not want the LLM to
run rm -rf /, sudo anything, or write directly to block devices.
The struct
#![allow(unused)]
fn main() {
pub struct CommandFilter {
blocked_patterns: Vec<glob::Pattern>,
}
}
Constructor and defaults
#![allow(unused)]
fn main() {
impl CommandFilter {
pub fn new(patterns: &[String]) -> Self {
Self {
blocked_patterns: patterns
.iter()
.filter_map(|p| glob::Pattern::new(p).ok())
.collect(),
}
}
pub fn default_filters() -> Self {
Self::new(&[
"rm -rf /".into(),
"rm -rf /*".into(),
"sudo *".into(),
"> /dev/sda*".into(),
"mkfs.*".into(),
"dd if=*of=/dev/*".into(),
":(){:|:&};:".into(),
])
}
}
}
The default_filters() method creates a baseline set of blocked patterns. That
last one – :(){:|:&};: – is the infamous bash fork bomb. The filter_map
call in the constructor silently drops any patterns that fail to parse, which is
a reasonable default for a list of glob strings.
The matching method
#![allow(unused)]
fn main() {
pub fn is_blocked(&self, command: &str) -> Option<&str> {
let trimmed = command.trim();
for pattern in &self.blocked_patterns {
if pattern.matches(trimmed) {
return Some(pattern.as_str());
}
}
None
}
}
It returns Some(pattern_str) when a match is found so the error message can
tell the user which pattern was triggered. Returning None means the command
is allowed.
Implementing SafetyCheck
#![allow(unused)]
fn main() {
impl SafetyCheck for CommandFilter {
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
if tool_name != "bash" {
return Ok(());
}
if let Some(command) = args.get("command").and_then(|v| v.as_str()) {
if let Some(pattern) = self.is_blocked(command) {
Err(format!("blocked command matching pattern `{pattern}`"))
} else {
Ok(())
}
} else {
Ok(())
}
}
}
}
This check only fires for the bash tool. It extracts the "command" argument
and tests it against every blocked pattern. Clean and focused.
Step 4: ProtectedFileCheck
The third layer protects sensitive files from being overwritten. Even if a path
is inside the allowed directory, you might not want the LLM writing to .env,
.git/config, or credentials.json.
The struct
#![allow(unused)]
fn main() {
pub struct ProtectedFileCheck {
patterns: Vec<glob::Pattern>,
}
impl ProtectedFileCheck {
pub fn new(patterns: &[String]) -> Self {
Self {
patterns: patterns
.iter()
.filter_map(|p| glob::Pattern::new(p).ok())
.collect(),
}
}
}
}
Implementing SafetyCheck
This check only applies to write operations (write and edit). Reading a
sensitive file is less dangerous than overwriting it:
#![allow(unused)]
fn main() {
impl SafetyCheck for ProtectedFileCheck {
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
match tool_name {
"write" | "edit" => {
if let Some(path) = args.get("path").and_then(|v| v.as_str()) {
for pattern in &self.patterns {
if pattern.matches(path)
|| pattern.matches(
Path::new(path)
.file_name()
.unwrap_or_default()
.to_str()
.unwrap_or(""),
)
{
return Err(format!(
"file `{path}` is protected (matches pattern `{}`)",
pattern.as_str()
));
}
}
Ok(())
} else {
Ok(())
}
}
_ => Ok(()),
}
}
}
}
There is a subtlety here: the check matches the pattern against both the full
path and just the filename. This means a pattern like .env will match
/home/user/project/.env as well as just .env. Without this, a user would
need to write patterns for every possible directory prefix.
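The dual match can be sketched without the `glob` crate by using exact names (the real code matches glob patterns against both strings):

```rust
use std::path::Path;

// Dependency-free sketch: match a literal protected name against both the
// full path string and the final path component.
fn is_protected(path: &str, protected: &str) -> bool {
    if path == protected {
        return true;
    }
    Path::new(path)
        .file_name()
        .and_then(|f| f.to_str())
        .map_or(false, |name| name == protected)
}

fn main() {
    assert!(is_protected(".env", ".env"));
    // The filename match catches the same file under any directory prefix.
    assert!(is_protected("/home/user/project/.env", ".env"));
    assert!(!is_protected("src/main.rs", ".env"));
}
```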
Step 5: SafeToolWrapper – the decorator
Now you have three independent safety checks. The final piece is the glue that
attaches them to actual tools. SafeToolWrapper wraps any Box<dyn Tool> and
runs all checks before delegating to the inner tool.
The struct
#![allow(unused)]
fn main() {
pub struct SafeToolWrapper {
inner: Box<dyn Tool>,
checks: Vec<Box<dyn SafetyCheck>>,
}
}
Two fields: the wrapped tool and a list of checks (each a trait object). This means you can mix and match checks freely – attach just a path validator, or stack all three.
Constructors
#![allow(unused)]
fn main() {
impl SafeToolWrapper {
pub fn new(tool: Box<dyn Tool>, checks: Vec<Box<dyn SafetyCheck>>) -> Self {
Self {
inner: tool,
checks,
}
}
pub fn with_check(tool: Box<dyn Tool>, check: impl SafetyCheck + 'static) -> Self {
Self::new(tool, vec![Box::new(check)])
}
}
}
with_check is a convenience for the common case of a single check. The
'static bound is needed because the check will be stored in a Box.
Implementing Tool
This is the core of the decorator pattern:
#![allow(unused)]
fn main() {
#[async_trait]
impl Tool for SafeToolWrapper {
fn definition(&self) -> &ToolDefinition {
self.inner.definition()
}
async fn call(&self, args: Value) -> anyhow::Result<String> {
let tool_name = self.inner.definition().name;
for check in &self.checks {
if let Err(reason) = check.check(tool_name, &args) {
return Ok(format!("error: safety check failed: {reason}"));
}
}
self.inner.call(args).await
}
}
}
Key design decisions:
- `definition()` delegates directly. The wrapped tool’s schema is unchanged. The LLM sees the exact same tool definition – it has no idea safety checks exist. The safety layer is invisible.
- Failed checks return `Ok(...)`, not `Err(...)`. This is intentional. A safety check failure is not a program crash – it is a message back to the LLM explaining why the operation was blocked. The LLM can then adjust its approach. If we returned `Err`, the agent loop might interpret it as a fatal error and abort.
- All checks run sequentially. If any check fails, the tool call is blocked immediately. The remaining checks do not run. This is a fail-fast approach – one “no” is enough.
- The tool name comes from the inner tool’s definition. This means checks see the real tool name (e.g. `"read"`, `"bash"`) and can filter accordingly.
Putting it together
Here is how you would wire up safety checks when building your agent:
#![allow(unused)]
fn main() {
use crate::safety::*;
use crate::tools::*;
// Create a ReadTool with path validation
let allowed_dir = std::env::current_dir().unwrap();
let validator = PathValidator::new(&allowed_dir);
let safe_read = SafeToolWrapper::with_check(
Box::new(ReadTool::new()),
validator,
);
// Create a BashTool with command filtering
let safe_bash = SafeToolWrapper::with_check(
Box::new(BashTool),
CommandFilter::default_filters(),
);
// Create a WriteTool with multiple checks
let safe_write = SafeToolWrapper::new(
Box::new(WriteTool),
vec![
Box::new(PathValidator::new(&allowed_dir)),
Box::new(ProtectedFileCheck::new(&[
".env".into(),
".env.*".into(),
"*.pem".into(),
"*.key".into(),
])),
],
);
}
Because SafeToolWrapper itself implements Tool, it slots into the existing
ToolSet with no changes to the agent loop. The agent does not know or care
that safety checks exist. This is the power of the decorator pattern – you add
behavior without modifying existing code.
Running the tests
Run the Chapter 18 tests:
cargo test -p mini-claw-code ch18
What the tests verify
PathValidator:
- `test_ch18_path_within_allowed`: A file inside the allowed directory is accepted.
- `test_ch18_path_outside_allowed`: `/etc/passwd` is rejected when the allowed directory is a temp dir.
- `test_ch18_path_traversal_blocked`: A path like `allowed/sub/../../../etc/passwd` is rejected after canonicalization.
- `test_ch18_path_new_file_in_allowed`: A file that does not exist yet but whose parent is inside the allowed directory is accepted.
- `test_ch18_safety_check_read_tool`: The `SafetyCheck` impl correctly checks paths for the `read` tool.
- `test_ch18_safety_check_ignores_bash`: The `PathValidator` ignores the `bash` tool (no `"path"` argument).
CommandFilter:
- `test_ch18_command_filter_blocks_rm_rf`: `rm -rf /` and `rm -rf /*` are blocked.
- `test_ch18_command_filter_blocks_sudo`: `sudo rm file` matches the `sudo *` pattern.
- `test_ch18_command_filter_allows_safe`: `ls -la`, `echo hello`, and `cargo test` pass through.
- `test_ch18_command_filter_safety_check`: The `SafetyCheck` impl blocks `sudo reboot` via the `bash` tool and allows `echo safe`.
- `test_ch18_custom_blocked_commands`: Custom patterns like `docker rm *` and `npm publish*` work correctly.
ProtectedFileCheck:
- `test_ch18_protected_file_blocks_env`: Writing to `.env` or `.env.local` is blocked.
- `test_ch18_protected_file_allows_normal`: Writing to `src/main.rs` is allowed.
SafeToolWrapper:
- `test_ch18_wrapper_blocks_on_check_failure`: A wrapped `ReadTool` returns a `"safety check failed"` message when the path is outside the allowed directory.
- `test_ch18_wrapper_allows_valid_call`: A wrapped `ReadTool` successfully reads a file inside the allowed directory, proving the decorator delegates correctly.
Defense in depth
No single check catches everything. That is the point of layered security.
Consider what happens when the LLM asks to write to
/home/user/project/.env:
- PathValidator – checks if the path is inside the allowed directory. If the allowed directory is `/home/user/project`, this passes. The path is technically inside the project.
- ProtectedFileCheck – catches it. `.env` matches the protected pattern. The write is blocked.
- CommandFilter – does not apply. This is a `write` tool call, not `bash`.
Now consider rm -rf / via the bash tool:
- PathValidator – does not apply. `bash` has no `"path"` argument.
- ProtectedFileCheck – does not apply. This is not a `write` or `edit`.
- CommandFilter – catches it. The command matches `rm -rf /`.
And a path traversal attack via read:
- PathValidator – catches it. Canonicalization resolves the `..` segments and the path ends up outside the allowed directory.
- The other checks never need to fire.
Each layer covers a different attack surface. Together they form a mesh that is much harder to slip through than any single check. This is the principle of defense in depth – do not rely on one gatekeeper; stack them.
Limitations
This is a tutorial implementation. A production safety system would also need:
- Confirmation prompts for destructive but non-blocked operations (e.g. deleting a file within the project).
- Rate limiting to prevent an LLM from making thousands of tool calls.
- Regex-based command filtering for more precise matching than globs allow.
- Audit logging so you can review every tool call after the fact.
- Sandboxing (containers, VMs) as the ultimate backstop.
But the architecture you built here – a trait-based system of composable checks
wired through a decorator – is exactly the right foundation. Adding more checks
is just implementing one more SafetyCheck.
Recap
You built a safety layer with four components:
| Type | Purpose | Applies to |
|---|---|---|
| `SafetyCheck` trait | Common interface | All checks |
| `PathValidator` | Prevent directory traversal | `read`, `write`, `edit` |
| `CommandFilter` | Block dangerous commands | `bash` |
| `ProtectedFileCheck` | Guard sensitive files | `write`, `edit` |
| `SafeToolWrapper` | Decorator that runs checks | Any `Box<dyn Tool>` |
The key patterns:
- Canonicalize before comparing – never trust raw path strings.
- Glob matching – flexible pattern-based filtering for both commands and file paths.
- Decorator pattern – wrap a trait object with additional behavior without modifying the original.
- Defense in depth – layer independent checks so no single bypass defeats the entire system.
Your agent is no longer a terrifying root-access footgun. It still has power, but now that power flows through safety rails that you control.
Chapter 19: Permissions
If you’ve used Claude Code, you’ve seen this prompt:
Claude wants to use bash:
command: git status
Allow? (y/n/always)
The agent doesn’t just run every tool call blindly. Before executing, it checks a permission system to decide: should this tool call proceed automatically, be blocked outright, or require user approval?
This is the permission system. Three possible decisions:
- Allow – execute immediately, no questions asked.
- Deny – block the call, return an error to the LLM.
- Ask – pause and prompt the user for approval.
In this chapter you’ll build:
- A `Permission` enum with the three decisions.
- A `PermissionRule` that matches tool names using glob patterns.
- A `PermissionEngine` that evaluates rules in order, supports a default fallback, and remembers session-level overrides.
Why permissions?
Chapter 18 introduced safety rails – SafeToolWrapper blocks dangerous
arguments (path traversal, rm -rf /) based on static checks. But safety
checks are binary: pass or fail. They can’t express “this tool is fine for
reading, but I want to approve writes.”
Permissions add a human-in-the-loop layer. A typical configuration might look like:
| Tool | Permission |
|---|---|
| `read` | Allow |
| `bash` | Ask |
| `write` | Ask |
| `edit` | Ask |
| `mcp__*` | Deny |
| (default) | Ask |
The read tool runs freely. bash, write, and edit require approval.
Any MCP tool is blocked entirely. Anything else falls through to the default:
ask the user.
The Permission enum
Three variants, nothing more:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum Permission {
/// Tool call is allowed without asking.
Allow,
/// Tool call is blocked without asking.
Deny,
/// User must be prompted for approval.
Ask,
}
}
PartialEq lets tests assert on decisions. Clone is needed because
evaluate() returns owned values (you’ll see why shortly).
PermissionRule
A rule pairs a glob pattern with a permission:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct PermissionRule {
/// Glob pattern matching tool names (e.g. "bash", "write", "*").
pub tool_pattern: String,
/// The permission to assign when the pattern matches.
pub permission: Permission,
}
}
The matches() method checks whether a tool name matches the rule’s pattern:
#![allow(unused)]
fn main() {
impl PermissionRule {
pub fn new(tool_pattern: impl Into<String>, permission: Permission) -> Self {
Self {
tool_pattern: tool_pattern.into(),
permission,
}
}
/// Check if this rule matches a tool name.
pub fn matches(&self, tool_name: &str) -> bool {
if let Ok(pattern) = glob::Pattern::new(&self.tool_pattern) {
pattern.matches(tool_name)
} else {
self.tool_pattern == tool_name
}
}
}
}
Glob patterns give you flexible matching:
- "bash" – matches exactly bash.
- "*" – matches everything (a catch-all rule).
- "mcp__*" – matches any MCP tool (mcp__fs__read, mcp__git__status, etc.).
If the pattern string is invalid as a glob, matches() falls back to exact
string comparison. This means plain tool names always work even if the glob
crate can’t parse them.
PermissionEngine
The engine holds an ordered list of rules, a default permission, and a set of session-level overrides:
#![allow(unused)]
fn main() {
pub struct PermissionEngine {
rules: Vec<PermissionRule>,
default_permission: Permission,
/// Session-level overrides (tool calls the user has already approved).
session_allows: std::collections::HashSet<String>,
}
}
Construction
Three constructors cover the common cases:
#![allow(unused)]
fn main() {
impl PermissionEngine {
pub fn new(rules: Vec<PermissionRule>, default_permission: Permission) -> Self {
Self {
rules,
default_permission,
session_allows: std::collections::HashSet::new(),
}
}
/// Create an engine that asks for everything by default.
pub fn ask_by_default(rules: Vec<PermissionRule>) -> Self {
Self::new(rules, Permission::Ask)
}
/// Create an engine that allows everything (no permission checks).
pub fn allow_all() -> Self {
Self::new(vec![], Permission::Allow)
}
}
}
allow_all() is useful during development or in trusted environments.
ask_by_default() is the safe default – if a tool doesn’t match any rule,
the user gets prompted.
The evaluate() method – your exercise
This is the core of the engine. Given a tool name and its arguments, return the permission decision.
The evaluation order is:
- Session overrides first. If the user already approved this tool during the current session, return Allow.
- Rules in order. Walk the rules list. The first rule whose pattern matches the tool name wins – return its permission.
- Default. If no rule matches, return the default permission.
Here is the signature:
#![allow(unused)]
fn main() {
/// Evaluate permission for a tool call.
///
/// Returns the permission decision. If the result is `Ask`, the caller
/// should prompt the user and then call `record_session_allow` if approved.
pub fn evaluate(&self, tool_name: &str, _args: &Value) -> Permission {
todo!()
}
}
The _args parameter is reserved for future use – argument-level rules (e.g.
“allow bash only for cargo test”) are a natural extension, but we won’t
implement them here.
Implement evaluate() using the three-step logic above. The rest of this
section shows the solution.
Solution
#![allow(unused)]
fn main() {
pub fn evaluate(&self, tool_name: &str, _args: &Value) -> Permission {
// Check session-level overrides first
if self.session_allows.contains(tool_name) {
return Permission::Allow;
}
// Check rules in order
for rule in &self.rules {
if rule.matches(tool_name) {
return rule.permission.clone();
}
}
self.default_permission.clone()
}
}
Three things to note:
- Session overrides take priority over rules. Even if a rule says Ask for bash, a session override makes it Allow. This is intentional – when the user says "always allow" for a session, we honor that.
- First match wins. If two rules match the same tool, the first one in the list is used. This is the same precedence model used by firewalls, .gitignore, and most rule-based systems.
- clone() on the return. Permission is a simple enum, so cloning is cheap. We clone rather than returning a reference because the caller often needs to match on the owned value.
First-match semantics
The “first match wins” rule is important. Consider:
#![allow(unused)]
fn main() {
let rules = vec![
PermissionRule::new("bash", Permission::Allow),
PermissionRule::new("bash", Permission::Deny), // never reached
];
let engine = PermissionEngine::new(rules, Permission::Ask);
assert_eq!(engine.evaluate("bash", &json!({})), Permission::Allow);
}
The second rule is dead code. This lets you put specific rules before broad ones:
#![allow(unused)]
fn main() {
let rules = vec![
PermissionRule::new("read", Permission::Allow), // specific
PermissionRule::new("*", Permission::Ask), // catch-all
];
}
read gets Allow. Everything else falls through to the wildcard and gets
Ask.
Session-level overrides
When the user responds “always allow” (or just “y”) to a permission prompt,
you don’t want to ask again for the same tool in the same session. The engine
tracks this with a HashSet<String>:
#![allow(unused)]
fn main() {
/// Record that the user approved a tool for this session.
pub fn record_session_allow(&mut self, tool_name: &str) {
self.session_allows.insert(tool_name.to_string());
}
}
The typical flow in an agent loop:
#![allow(unused)]
fn main() {
let permission = engine.evaluate("bash", &args);
match permission {
Permission::Allow => { /* execute */ }
Permission::Deny => { /* return error to LLM */ }
Permission::Ask => {
if user_approves() {
engine.record_session_allow("bash");
// execute
} else {
// return error to LLM
}
}
}
}
After record_session_allow("bash"), every subsequent evaluate("bash", ...)
returns Allow immediately – the session override is checked before rules.
Note that session overrides are per-tool, not global:
#![allow(unused)]
fn main() {
let mut engine = PermissionEngine::ask_by_default(vec![]);
engine.record_session_allow("read");
assert_eq!(engine.evaluate("read", &json!({})), Permission::Allow);
assert_eq!(engine.evaluate("write", &json!({})), Permission::Ask); // still asks
}
Approving read doesn’t approve write. Each tool must be approved
individually.
Convenience methods
Two helpers reduce boilerplate at call sites:
#![allow(unused)]
fn main() {
/// Check if a tool is allowed (returns true for Allow, false for Deny/Ask).
pub fn is_allowed(&self, tool_name: &str, args: &Value) -> bool {
matches!(self.evaluate(tool_name, args), Permission::Allow)
}
/// Check if a tool requires user approval.
pub fn needs_approval(&self, tool_name: &str, args: &Value) -> bool {
matches!(self.evaluate(tool_name, args), Permission::Ask)
}
}
These are useful when you need a boolean check rather than a full match:
#![allow(unused)]
fn main() {
if engine.is_allowed("read", &args) {
// fast path, no prompt needed
}
}
Composing with SafeToolWrapper and InputHandler
Permissions, safety checks, and user input are three independent layers that compose naturally. Here is how they fit together in an agent loop:
Tool call arrives
|
v
PermissionEngine::evaluate()
|-- Allow --> SafeToolWrapper::call()
| |-- safety check passes --> inner tool executes
| |-- safety check fails --> error returned to LLM
|
|-- Deny --> error returned to LLM
|
|-- Ask --> InputHandler::ask("Allow bash?", &["yes", "no"])
|-- user says yes --> record_session_allow() + execute
|-- user says no --> error returned to LLM
Permissions decide whether to run. Safety checks (Ch18) validate how the
tool is called. The InputHandler (Ch11) collects the user’s answer when
permission is Ask.
In code, this might look like:
#![allow(unused)]
fn main() {
let permission = engine.evaluate(&call.name, &call.arguments);
match permission {
Permission::Allow => {
// SafeToolWrapper handles safety checks internally
let result = tools.call(&call.name, call.arguments.clone()).await?;
results.push((call.id.clone(), result));
}
Permission::Deny => {
results.push((
call.id.clone(),
format!("error: tool '{}' is not permitted", call.name),
));
}
Permission::Ask => {
let answer = input_handler
.ask(
&format!("Allow {} tool?", call.name),
&["yes".into(), "no".into()],
)
.await?;
if answer == "yes" {
engine.record_session_allow(&call.name);
let result = tools.call(&call.name, call.arguments.clone()).await?;
results.push((call.id.clone(), result));
} else {
results.push((
call.id.clone(),
format!("error: user denied tool '{}'", call.name),
));
}
}
}
}
Each layer is optional. You can use permissions without safety checks, safety checks without permissions, or all three together. This is the benefit of composable design – each piece does one job.
Wiring it up
Add the module to mini-claw-code/src/lib.rs:
#![allow(unused)]
fn main() {
pub mod permissions;
// ...
pub use permissions::{Permission, PermissionEngine, PermissionRule};
}
Running the tests
cargo test -p mini-claw-code ch19
The tests verify:
- allow_all: PermissionEngine::allow_all() returns Allow for any tool.
- ask_by_default: engine with no rules and Ask default returns Ask.
- Rule matching: explicit rules for read, bash, write each return the correct permission.
- Glob pattern: "mcp__*" matches mcp__fs__read but not read.
- First rule wins: duplicate rules for bash – the first one wins.
- Session allow: after record_session_allow("bash"), evaluate("bash") returns Allow.
- Session allow per tool: approving read does not approve write.
- is_allowed: returns true only for Allow, false for Deny and Ask.
- needs_approval: returns true only for Ask.
- Wildcard rule: "*" matches any tool name.
- Deny overrides default: a Deny rule takes precedence over an Allow default.
Recap
- Permission has three variants: Allow, Deny, Ask. Simple and exhaustive.
- PermissionRule pairs a glob pattern with a permission decision. Glob matching supports wildcards for tool families like mcp__*.
- PermissionEngine evaluates rules in order – first match wins. When no rule matches, the default permission applies.
- Session overrides let the user approve a tool once and skip the prompt for the rest of the session. They take priority over rules.
- Composable: permissions layer on top of SafeToolWrapper (Ch18) and InputHandler (Ch11) without coupling to either.
- Purely additive: no changes to existing tools, agents, or safety checks.
Chapter 20: Hooks
Your agent can run tools, stream responses, ask the user questions, and plan before acting. But every new behavior – logging, auditing, blocking dangerous commands, running shell scripts on tool events – requires touching the agent loop directly. That does not scale.
Claude Code solves this with hooks: 12+ event types that let users and
extensions inject custom behavior at key points without rebuilding the agent.
Want to log every tool call? Register a hook. Want to block bash in
production? Register a hook. Want to run a linter after every file write?
Register a hook. The agent itself does not change.
In this chapter you will walk through:
- A HookEvent enum for the events hooks respond to.
- A HookAction enum for what hooks tell the agent to do.
- A Hook trait – the async interface every hook implements.
- A HookRegistry that stores hooks and dispatches events.
- Three built-in hooks: LoggingHook, BlockingHook, and ShellHook.
- How hooks integrate with the agent loop.
The event model
Open mini-claw-code/src/hooks.rs. At the top you will find two enums that
define the vocabulary between hooks and the agent.
HookEvent
HookEvent describes what happened:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub enum HookEvent {
/// Before a tool is executed.
PreToolCall {
tool_name: String,
args: Value,
},
/// After a tool finishes executing.
PostToolCall {
tool_name: String,
args: Value,
result: String,
},
/// The agent is starting a new run.
AgentStart {
prompt: String,
},
/// The agent finished with a final response.
AgentEnd {
response: String,
},
}
}
Four variants, each carrying the data a hook might need:
- PreToolCall fires before a tool runs. It carries the tool name and the arguments the LLM chose. A hook can inspect these, log them, or decide to block the call entirely.
- PostToolCall fires after a tool completes. It adds the result string so hooks can audit what happened.
- AgentStart fires once when the agent begins a new run, carrying the user's prompt.
- AgentEnd fires once when the agent produces its final response.
This gives hooks four natural insertion points: two per tool call (before and after), plus the boundaries of the entire run.
HookAction
HookAction describes what should happen next:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum HookAction {
/// Continue normally.
Continue,
/// Block the tool call with a reason.
Block(String),
/// Modify the tool arguments (PreToolCall only).
ModifyArgs(Value),
}
}
Three options:
- Continue – do nothing special, proceed as normal.
- Block(reason) – abort the tool call. The reason string becomes the tool result so the LLM knows what happened and can adjust.
- ModifyArgs(new_args) – replace the tool arguments before execution. This only makes sense for PreToolCall events (you cannot retroactively change args after the tool ran).
The combination of HookEvent and HookAction is the entire contract. Hooks
receive events and return actions. Nothing more.
The Hook trait
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait Hook: Send + Sync {
/// Handle an event and return an action.
async fn on_event(&self, event: &HookEvent) -> HookAction;
}
}
One method. It takes an immutable reference to a HookEvent and returns a
HookAction. The trait requires Send + Sync because hooks live inside the
agent, which may be shared across threads (e.g. wrapped in Arc for TUI
apps).
The method is async because some hooks need I/O – ShellHook spawns a
child process, and future hooks might call HTTP endpoints. But simple hooks
like LoggingHook just push to a Vec and return immediately.
HookRegistry
Individual hooks are useful, but you typically want multiple hooks active at
once – a logger and a blocker and a shell script. HookRegistry manages
the collection:
#![allow(unused)]
fn main() {
pub struct HookRegistry {
hooks: Vec<Box<dyn Hook>>,
}
}
It stores hooks as trait objects in registration order. register() takes
&mut self for imperative use. with() takes self and returns it for
builder-pattern chaining:
#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
.with(LoggingHook::new())
.with(BlockingHook::new(vec!["bash".into()], "blocked"));
}
There is also is_empty() so the agent loop can skip dispatch entirely when
no hooks are registered – a minor optimization, but a nice one.
Dispatch logic
The heart of the registry is dispatch():
#![allow(unused)]
fn main() {
pub async fn dispatch(&self, event: &HookEvent) -> HookAction {
let mut modified_args: Option<Value> = None;
for hook in &self.hooks {
match hook.on_event(event).await {
HookAction::Continue => {}
HookAction::Block(reason) => return HookAction::Block(reason),
HookAction::ModifyArgs(new_args) => {
modified_args = Some(new_args);
}
}
}
match modified_args {
Some(args) => HookAction::ModifyArgs(args),
None => HookAction::Continue,
}
}
}
Three rules govern dispatch:
- Iterate in order. Hooks fire in the order they were registered. Registration order is your priority system.
- Short-circuit on Block. The moment any hook returns Block, dispatch stops immediately and returns that Block. Hooks registered after the blocking hook never see the event. This is important for correctness – if a security hook blocks bash, a logging hook registered later should not log a call that never happened.
- Collect ModifyArgs. If multiple hooks modify args, the last one wins (each overwrites modified_args). If no hook blocked and at least one modified args, ModifyArgs is returned. If nobody did anything, Continue is returned.
This gives you a clean priority chain: blocking hooks should be registered before logging hooks so they can short-circuit first.
Built-in hooks
The module provides three hooks out of the box. They cover the most common patterns and serve as examples for writing your own.
LoggingHook
#![allow(unused)]
fn main() {
pub struct LoggingHook {
log: std::sync::Mutex<Vec<String>>,
}
}
LoggingHook records a one-line summary of every event into a Vec<String>.
Its on_event formats each variant into a compact tag – "pre:bash",
"post:read", "agent:start", "agent:end" – pushes it into the vec
behind the mutex, and returns Continue. Logging is observation, not
intervention.
The messages() method clones and returns the accumulated log.
Notice this uses std::sync::Mutex, not tokio::sync::Mutex. The lock is
held only long enough to push a string or clone the vec – no .await inside
the critical section. A std::sync::Mutex is cheaper than a tokio::sync::Mutex
for these short, synchronous operations. Compare this with MockInputHandler
from Chapter 11, which needed tokio::sync::Mutex because its lock guard was
held across an .await boundary.
LoggingHook is particularly useful in tests. Register it alongside other
hooks, run the agent, and then inspect messages() to verify exactly which
events fired and in what order.
BlockingHook
#![allow(unused)]
fn main() {
pub struct BlockingHook {
blocked_tools: Vec<String>,
reason: String,
}
}
BlockingHook takes a list of tool names and a reason string. If a
PreToolCall event matches any blocked tool, it returns Block:
#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl Hook for BlockingHook {
async fn on_event(&self, event: &HookEvent) -> HookAction {
if let HookEvent::PreToolCall { tool_name, .. } = event
&& self.blocked_tools.iter().any(|b| b == tool_name)
{
return HookAction::Block(self.reason.clone());
}
HookAction::Continue
}
}
}
This uses a let-chain (same syntax as resolve_option in Chapter 11):
the if let pattern match and the .any() check are joined with &&. If
either condition fails, the hook returns Continue.
Use this for safety rails. For example, block bash in a read-only review
mode:
#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
.with(BlockingHook::new(
vec!["bash".into(), "write".into(), "edit".into()],
"read-only mode: mutation tools are disabled",
));
}
The LLM receives the reason string as the tool result, so it knows why the call was blocked and can adapt its approach.
ShellHook
#![allow(unused)]
fn main() {
pub struct ShellHook {
command: String,
tool_pattern: Option<glob::Pattern>,
}
}
ShellHook runs a shell command whenever a tool event fires. It is the escape
hatch: anything you can do in a shell script, you can do in a hook.
The for_tool() builder method restricts the hook to tools matching a glob
pattern. Without it, the hook fires on every tool event. With it, only
matching tool names trigger the command. The glob crate provides
Unix-style pattern matching – "write*" would match write and
write_file, "*" matches everything.
The Hook implementation only responds to PreToolCall and PostToolCall
events (it ignores AgentStart and AgentEnd). It extracts the tool name,
checks matches_tool(), then spawns the command with
tokio::process::Command::new("sh").arg("-c").arg(&self.command):
#![allow(unused)]
fn main() {
match result {
Ok(output) => {
if output.status.success() {
HookAction::Continue
} else {
let stderr = String::from_utf8_lossy(&output.stderr).to_string();
HookAction::Block(format!("hook failed: {stderr}"))
}
}
Err(e) => HookAction::Block(format!("hook error: {e}")),
}
}
If the command succeeds (exit code 0), the hook returns Continue. If it
fails, the hook returns Block with the stderr output. This means a
ShellHook can act as a gate: run a linter after a file write, and block the
result if the linter fails.
Example – run cargo fmt --check after every write or edit:
#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
.with(ShellHook::new("cargo fmt --check").for_tool("write"))
.with(ShellHook::new("cargo fmt --check").for_tool("edit"));
}
Integrating with the agent loop
Hooks are designed to sit at two points in the agent loop: before and after tool execution. Here is how the dispatch points look conceptually in a hook-aware agent:
#![allow(unused)]
fn main() {
for call in &turn.tool_calls {
// 1. Dispatch PreToolCall
let pre_action = registry.dispatch(&HookEvent::PreToolCall {
tool_name: call.name.clone(),
args: call.arguments.clone(),
}).await;
let result = match pre_action {
HookAction::Block(reason) => reason, // skip the tool entirely
HookAction::ModifyArgs(new_args) => {
tool.call(new_args).await.unwrap_or_else(|e| format!("error: {e}"))
}
HookAction::Continue => {
tool.call(call.arguments.clone()).await.unwrap_or_else(|e| format!("error: {e}"))
}
};
// 2. Dispatch PostToolCall
registry.dispatch(&HookEvent::PostToolCall {
tool_name: call.name.clone(),
args: call.arguments.clone(),
result: result.clone(),
}).await;
}
}
The pattern is:
- Before execution: dispatch PreToolCall. If the action is Block, skip the tool entirely and use the reason as the result. If ModifyArgs, execute with the new args. If Continue, execute normally.
- After execution: dispatch PostToolCall with the result. The return action is typically Continue (you cannot undo a tool call), but hooks can still log, audit, or trigger side effects.
- Run boundaries: dispatch AgentStart at the beginning of run() and AgentEnd when the agent produces its final response.
The existing SimpleAgent and StreamingAgent do not have hooks wired in –
this is an extension point you would add when building a production agent. The
HookRegistry is intentionally separate so you can compose it into whatever
agent architecture you have.
Tests
Run the tests with:
cargo test -p mini-claw-code ch20
The tests verify each component in isolation, then test composition:
- LoggingHook: fires a single PreToolCall and checks messages() == ["pre:bash"]. A second test fires all four event types and asserts the log matches ["agent:start", "pre:read", "post:read", "agent:end"].
- BlockingHook: PreToolCall for a blocked tool returns Block("bash is disabled"); the same hook returns Continue for read.
- Registry dispatch: a registry with only LoggingHook returns Continue. Adding a BlockingHook produces Block for the targeted tool.
- Multiple hooks: two LoggingHooks both see the event (both logs have length 1).
- Short-circuit: the most important test. A BlockingHook is registered first, a LoggingHook second:
#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
.with(BlockingHook::new(vec!["bash".into()], "blocked"))
.with(ArcHook(log.clone()));
let action = registry.dispatch(&event).await;
assert_eq!(action, HookAction::Block("blocked".into()));
// The second hook should NOT have been called
assert_eq!(log.messages().len(), 0);
}
The logger never saw the event – Block stopped iteration. Registration
order matters.
- PostToolCall: LoggingHook correctly logs "post:write".
- is_empty: an empty registry returns true; adding a hook flips it to false.
The observer/middleware pattern
If you have worked with web frameworks, hooks will feel familiar. They implement two overlapping patterns:
- Observer pattern: hooks observe events without affecting them. LoggingHook is a pure observer – it watches everything and changes nothing.
- Middleware pattern: hooks can intercept and modify the pipeline. BlockingHook short-circuits execution. ModifyArgs rewrites the request before it reaches the tool. This is middleware.
The HookRegistry is a middleware chain with observer capabilities. The
dispatch loop is the pipeline, Block is early return, and ModifyArgs is
request transformation.
This design keeps the agent loop clean. Instead of scattering if statements
for every new behavior, you register hooks. The agent loop just calls
dispatch() at two points and obeys the returned action. New behaviors are
added by implementing Hook, not by modifying the agent.
Recap
- HookEvent represents four lifecycle points: PreToolCall, PostToolCall, AgentStart, AgentEnd.
- HookAction gives hooks three options: Continue, Block, or ModifyArgs.
- The Hook trait has a single async method: on_event.
- HookRegistry dispatches events to hooks in order, short-circuiting on Block and collecting ModifyArgs.
- LoggingHook records events for inspection – ideal for testing.
- BlockingHook blocks specific tools by name – ideal for safety rails.
- ShellHook runs arbitrary shell commands on tool events – the escape hatch for anything else.
- Hooks follow the observer/middleware pattern: observe without changing, or intercept and modify the pipeline.
- The agent loop stays clean – just call dispatch() before and after tool execution and obey the returned action.
Chapter 21: MCP – Model Context Protocol
Your agent has tools – read, write, bash, subagents – but they are all compiled into the binary. What happens when someone wants to give your agent access to a database, a Kubernetes cluster, or a Slack workspace?
You could write a Tool implementation for each one. That doesn’t scale.
Every integration means new code, a new release, tight coupling.
MCP (Model Context Protocol) solves this. It is an open standard created by Anthropic that lets AI agents discover and use tools from external server processes. Claude Code uses MCP. Cursor uses MCP. There are hundreds of community MCP servers for everything from GitHub to PostgreSQL.
The idea: spawn a separate process that speaks JSON-RPC over stdio. Your agent asks “what tools do you have?”, gets back definitions, and calls them like any other tool. The server handles the integration. Your agent just speaks the protocol.
In this chapter you will:
- Understand the MCP protocol: JSON-RPC 2.0 over stdio, the handshake sequence, and the tool lifecycle.
- Define the protocol types: JsonRpcRequest, JsonRpcResponse, McpToolDef.
- Build McpClient: spawn a child process, perform the handshake, list tools, and call them.
- Implement McpTool: a wrapper that bridges MCP tools into the Tool trait so the agent loop handles them transparently.
- Wire it into the config system with McpServerConfig.
This is the capstone chapter. When you finish, your agent will be able to connect to any MCP server and use its tools – the same way the real Claude Code does.
The protocol
MCP uses JSON-RPC 2.0 over stdio. The client (your agent) spawns the server as a child process, writes JSON to its stdin, and reads JSON from its stdout. Each message is a single line of JSON terminated by a newline.
The lifecycle has three phases:
Client Server
| |
|--- initialize --------------->| Phase 1: Handshake
|<-- initialize result ---------|
|--- notifications/initialized ->|
| |
|--- tools/list --------------->| Phase 2: Discovery
|<-- tools list ----------------|
| |
|--- tools/call --------------->| Phase 3: Execution
|<-- tool result ---------------|
| ... |
Phase 1: Handshake. The client sends initialize with its protocol
version and capabilities. The server responds. The client sends
notifications/initialized to signal completion.
Phase 2: Discovery. tools/list returns tool definitions – name,
description, and JSON Schema for input parameters.
Phase 3: Execution. tools/call sends a tool name and arguments. The
server executes and returns the result.
Every request is {"jsonrpc": "2.0", "id": 1, "method": "...", "params": {...}}.
Responses carry either "result" or "error". That’s the entire protocol
surface we need. MCP has more features (resources, prompts, sampling), but
tools are the core.
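Concretely, discovery and execution might look like this on the wire – one newline-terminated JSON message per line. The ids and the tool set here are illustrative, not fixed by the protocol:

```json
{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 2, "result": {"tools": [{"name": "read", "description": "Read a file.", "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}]}}
{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "read", "arguments": {"path": "Cargo.toml"}}}
{"jsonrpc": "2.0", "id": 3, "result": {"content": [{"type": "text", "text": "[package]..."}]}}
```

Note how each response echoes the id of its request – that is how the client matches replies when messages interleave.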
Protocol types
Create mini-claw-code/src/mcp/types.rs. These types map directly to the
JSON-RPC wire format.
#![allow(unused)]
fn main() {
use serde::{Deserialize, Serialize};
use serde_json::Value;
#[derive(Serialize)]
pub(crate) struct JsonRpcRequest {
pub jsonrpc: &'static str,
pub id: u64,
pub method: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub params: Option<Value>,
}
impl JsonRpcRequest {
pub fn new(id: u64, method: impl Into<String>, params: Option<Value>) -> Self {
Self {
jsonrpc: "2.0",
id,
method: method.into(),
params,
}
}
}
}
jsonrpc is always "2.0" – no allocation. params uses
skip_serializing_if because JSON-RPC omits the field when absent. id
is a monotonically increasing u64 for matching responses to requests.
The response side:
#![allow(unused)]
fn main() {
#[derive(Deserialize)]
pub(crate) struct JsonRpcResponse {
pub jsonrpc: String,
pub id: u64,
pub result: Option<Value>,
pub error: Option<JsonRpcError>,
}
#[derive(Deserialize, Debug)]
pub(crate) struct JsonRpcError {
pub code: i64,
pub message: String,
}
}
And the MCP-specific types:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
pub struct McpToolDef {
pub name: String,
#[serde(default)]
pub description: Option<String>,
#[serde(rename = "inputSchema", default)]
pub input_schema: Option<Value>,
}
#[derive(Deserialize)]
pub(crate) struct InitializeResult {
pub capabilities: Option<Value>,
}
#[derive(Deserialize)]
pub(crate) struct ToolsListResult {
pub tools: Vec<McpToolDef>,
}
#[derive(Deserialize)]
pub(crate) struct ToolCallResult {
pub content: Vec<ToolCallContent>,
}
#[derive(Deserialize)]
pub(crate) struct ToolCallContent {
#[serde(rename = "type")]
pub type_: Option<String>,
pub text: Option<String>,
}
}
McpToolDef is what the server returns from tools/list. The inputSchema
field uses camelCase on the wire (MCP convention), so we rename it with
serde. Both description and input_schema are optional – a minimal tool
only needs a name.
ToolCallResult returns an array of content blocks (similar to Claude’s
API). Each block has a type (usually "text") and a text field. We will
extract and join the text blocks to produce a single string.
Building McpClient
Create mini-claw-code/src/mcp/client.rs. The McpClient manages a child
process and speaks JSON-RPC to it.
#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicU64, Ordering};
use anyhow::Context;
use serde_json::Value;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::{Child, Command};
use tokio::sync::Mutex;
pub struct McpClient {
stdin: Mutex<tokio::process::ChildStdin>,
stdout: Mutex<BufReader<tokio::process::ChildStdout>>,
_child: Mutex<Child>,
next_id: AtomicU64,
server_name: String,
}
}
Why Mutex? Stdin and stdout are not Clone. We need shared access
(McpTool holds an Arc<McpClient>), so we wrap them in
tokio::sync::Mutex. The _child field holds ownership of the process so
it doesn’t get dropped. AtomicU64 gives us lock-free request IDs.
Connecting and handshaking
The connect constructor spawns the process and performs the handshake:
#![allow(unused)]
fn main() {
impl McpClient {
pub async fn connect(
server_name: impl Into<String>,
command: &str,
args: &[String],
) -> anyhow::Result<Self> {
let server_name = server_name.into();
let mut child = Command::new(command)
.args(args)
.stdin(std::process::Stdio::piped())
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::null())
.spawn()
.with_context(|| format!("failed to spawn MCP server: {command}"))?;
let stdin = child.stdin.take().context("failed to get stdin")?;
let stdout = child.stdout.take().context("failed to get stdout")?;
let client = Self {
stdin: Mutex::new(stdin),
stdout: Mutex::new(BufReader::new(stdout)),
_child: Mutex::new(child),
next_id: AtomicU64::new(1),
server_name,
};
client.initialize().await?;
Ok(client)
}
}
}
We use tokio::process::Command for async I/O. Stderr goes to null – MCP
servers communicate exclusively over stdout. The initialize method sends
the two-part handshake:
async fn initialize(&self) -> anyhow::Result<()> {
    let params = serde_json::json!({
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": { "name": "mini-claw-code", "version": "0.1.0" }
    });
    let result = self.request("initialize", Some(params)).await?;
    let _: InitializeResult = serde_json::from_value(result)
        .context("failed to parse initialize response")?;
    // Send initialized notification
    let id = self.next_id.fetch_add(1, Ordering::Relaxed);
    let notification = JsonRpcRequest::new(id, "notifications/initialized", None);
    let mut payload = serde_json::to_string(&notification)?;
    payload.push('\n');
    let mut stdin = self.stdin.lock().await;
    stdin.write_all(payload.as_bytes()).await?;
    stdin.flush().await?;
    Ok(())
}
First initialize – a request-response pair. Then notifications/initialized – technically a notification (no reply expected), but we reuse our request struct for simplicity and never wait for a response. The core method driving all communication:
async fn request(&self, method: &str, params: Option<Value>) -> anyhow::Result<Value> {
    let id = self.next_id.fetch_add(1, Ordering::Relaxed);
    let request = JsonRpcRequest::new(id, method, params);
    let mut payload = serde_json::to_string(&request)?;
    payload.push('\n');
    {
        let mut stdin = self.stdin.lock().await;
        stdin.write_all(payload.as_bytes()).await
            .context("failed to write to MCP server")?;
        stdin.flush().await
            .context("failed to flush MCP server stdin")?;
    }
    let mut line = String::new();
    {
        let mut stdout = self.stdout.lock().await;
        loop {
            line.clear();
            let bytes_read = stdout.read_line(&mut line).await
                .context("failed to read from MCP server")?;
            if bytes_read == 0 {
                anyhow::bail!("MCP server closed stdout unexpectedly");
            }
            let trimmed = line.trim();
            if trimmed.is_empty() { continue; }
            if let Ok(resp) = serde_json::from_str::<JsonRpcResponse>(trimmed) {
                if let Some(error) = resp.error {
                    anyhow::bail!("MCP server error ({}): {}", error.code, error.message);
                }
                return Ok(resp.result.unwrap_or(Value::Null));
            }
            // Not a valid response -- skip (could be a notification)
        }
    }
}
The read loop skips notifications and blank lines. The scope blocks drop
the stdin lock before acquiring stdout, preventing deadlocks. With
request() in place, the public methods are short:
pub async fn list_tools(&self) -> anyhow::Result<Vec<McpToolDef>> {
    let result = self.request("tools/list", None).await?;
    let list: ToolsListResult =
        serde_json::from_value(result).context("failed to parse tools/list")?;
    Ok(list.tools)
}

pub async fn call_tool(&self, name: &str, arguments: Value) -> anyhow::Result<String> {
    let params = serde_json::json!({ "name": name, "arguments": arguments });
    let result = self.request("tools/call", Some(params)).await?;
    let call_result: ToolCallResult =
        serde_json::from_value(result).context("failed to parse tools/call")?;
    let text: Vec<String> = call_result.content.into_iter()
        .filter_map(|c| c.text)
        .collect();
    Ok(text.join("\n"))
}
call_tool extracts just the text content blocks and joins them with
newlines – matching how our agent represents tool results as plain strings.
Converting MCP tools to ToolDefinition
There’s a gap between MCP’s McpToolDef (owned String fields) and our
ToolDefinition (&'static str fields). The convert_tool_defs associated
function bridges it:
pub fn convert_tool_defs(tools: &[McpToolDef], prefix: &str) -> Vec<ToolDefinition> {
    tools.iter().map(|t| {
        let name = format!("mcp__{prefix}__{}", t.name);
        let desc = t.description.clone()
            .unwrap_or_else(|| format!("MCP tool: {}", t.name));
        let params = t.input_schema.clone()
            .unwrap_or_else(|| serde_json::json!({"type": "object", "properties": {}}));
        // Leak strings for 'static lifetime (loaded once at startup)
        let name: &'static str = Box::leak(name.into_boxed_str());
        let desc: &'static str = Box::leak(desc.into_boxed_str());
        ToolDefinition { name, description: desc, parameters: params }
    }).collect()
}
Two important design decisions here:
The naming convention: mcp__servername__toolname. Double underscores
separate the MCP prefix, server name, and tool name. A filesystem server
named fs with a tool called read_file becomes mcp__fs__read_file.
This prevents collisions between MCP servers and between MCP tools and
built-in tools. Claude Code uses the exact same convention.
String leaking with Box::leak. Our ToolDefinition uses
&'static str – a design choice from Chapter 1 that avoids lifetime
parameters everywhere. MCP tool names are dynamically constructed, so they
can’t be &'static str naturally. Box::leak converts an owned String into a
&'static str by intentionally leaking the heap allocation.
Is this okay? Yes. MCP tools are loaded once at startup – typically dozens of strings. They live for the entire program duration anyway. This is a well-known Rust pattern for configuration data loaded once and never freed.
The McpTool wrapper
The agent works with the Tool trait. We need a struct that implements
Tool and forwards calls to the MCP server. This goes in
mini-claw-code/src/mcp/mod.rs:
pub(crate) mod client;
pub(crate) mod types;

pub use client::McpClient;
pub use types::McpToolDef;

use async_trait::async_trait;
use serde_json::Value;

use crate::types::{Tool, ToolDefinition};

pub struct McpTool {
    client: std::sync::Arc<McpClient>,
    definition: ToolDefinition,
    remote_name: String,
}

impl McpTool {
    pub fn new(
        client: std::sync::Arc<McpClient>,
        remote_name: String,
        definition: ToolDefinition,
    ) -> Self {
        Self { client, definition, remote_name }
    }
}

#[async_trait]
impl Tool for McpTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        self.client.call_tool(&self.remote_name, args).await
    }
}
Arc<McpClient> gives shared ownership – multiple tools from one server
share a single client. definition carries the prefixed mcp__server__tool
name the LLM sees, while remote_name is the original name the server
expects. The Tool implementation is glue: definition() returns the local
definition, and call() forwards to client.call_tool() with the remote name.
Configuration
In Chapter 16 you built the config system. MCP servers slot right in with
McpServerConfig in config.rs:
#[derive(Debug, Clone, Deserialize)]
pub struct McpServerConfig {
    pub name: String,
    pub command: String,
    #[serde(default)]
    pub args: Vec<String>,
    #[serde(default)]
    pub env: std::collections::HashMap<String, String>,
}
In the config file:
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@anthropic/mcp-filesystem-server", "/home/user/projects"]

[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@anthropic/mcp-github-server"]
env = { GITHUB_TOKEN = "ghp_..." }
At startup, iterate over configured servers, connect, discover, and register:
use std::sync::Arc;

for server_config in &config.mcp_servers {
    let client = McpClient::connect(
        &server_config.name,
        &server_config.command,
        &server_config.args,
    ).await?;
    let client = Arc::new(client);

    let mcp_tools = client.list_tools().await?;
    let defs = McpClient::convert_tool_defs(&mcp_tools, client.server_name());

    for (mcp_def, tool_def) in mcp_tools.into_iter().zip(defs) {
        tools.push(McpTool::new(client.clone(), mcp_def.name, tool_def));
    }
}
The agent loop doesn’t know or care that some tools are local and others are
remote MCP servers. They all implement Tool. The abstraction works.
Module structure
Wire up the module and re-export from lib.rs:
pub mod mcp;
// ...
pub use mcp::{McpClient, McpTool};
The submodules client and types are pub(crate) – internal
implementation details. Only McpClient, McpTool, and McpToolDef are
part of the public API.
Running the tests
cargo test -p mini-claw-code ch21
The tests verify protocol types and conversion logic without a real MCP
server. They cover: convert_tool_defs with empty, single, multiple, and
missing-description inputs; McpToolDef deserialization (including the
inputSchema rename and minimal name-only definitions); JsonRpcRequest
serialization (with and without params, verifying skip_serializing_if);
and ToolCallResult content extraction.
Integration tests for McpClient::connect require a real MCP server process
and are better suited for CI.
What you’ve built
Take a step back and look at what you have.
Your agent started as type definitions in Chapter 1. Now it has streaming, subagents, safety rails, token tracking, context management, permissions – and with MCP, it is extensible without recompilation. Anyone can write an MCP server in any language and your agent will discover and use its tools at runtime. The same protocol Claude Code and Cursor speak.
Here’s the full lifecycle when a user configures an MCP server:
1. Config loads McpServerConfig from config.toml
2. McpClient::connect() spawns the server process
3. Client sends initialize, receives capabilities
4. Client sends notifications/initialized
5. Client sends tools/list, receives tool definitions
6. convert_tool_defs() creates ToolDefinitions with mcp__ prefix
7. McpTool wrappers are added to the ToolSet
8. User asks a question
9. Agent loop sends prompt + all tool definitions to the LLM
10. LLM decides to call mcp__github__search_repos
11. Agent finds the McpTool, calls it
12. McpTool forwards to McpClient::call_tool()
13. Client sends tools/call JSON-RPC to the server process
14. Server executes, returns results
15. Client parses the response, returns text
16. Agent loop adds result to the conversation
17. LLM uses the result to answer the user
Seventeen steps, three process boundaries, one seamless experience.
Recap
- MCP is the standard protocol for AI tool servers. JSON-RPC 2.0 over stdio, line-delimited.
- The handshake: initialize -> notifications/initialized -> tools/list. Three messages and the client knows what the server can do.
- McpClient spawns the server, manages stdio via tokio::sync::Mutex, and uses AtomicU64 for request IDs. The read loop skips notifications.
- convert_tool_defs bridges MCP’s owned strings to &'static str via Box::leak. The mcp__server__tool convention prevents collisions.
- McpTool wraps Arc<McpClient> and implements Tool. The agent loop treats MCP tools identically to built-in tools.
- McpServerConfig means zero code changes to add new servers.
- The abstraction holds. A tool is a tool – whether call() reads a local file or sends JSON-RPC to a remote process.