Overview

Welcome to Build Your Own Mini Coding Agent in Rust. Over the next seven chapters you will implement a mini coding agent from scratch – a small version of programs like Claude Code or OpenCode: a program that takes a prompt, talks to a large language model (LLM), and uses tools to interact with the real world. After that, a series of extension chapters add streaming, a TUI, user input, plan mode, and more.

By the end of this book you will have an agent that can run shell commands, read and write files, and edit code, all driven by an LLM. No API key is required until Chapter 6, and when you get there the default model is openrouter/free – a zero-cost endpoint on OpenRouter, no credits needed.

What is an AI agent?

An LLM on its own is a function: text in, text out. Ask it to summarize doc.pdf and it will either refuse or hallucinate – it has no way to open the file.

An agent solves this by giving the LLM tools. A tool is just a function your code can run – read a file, execute a shell command, hit an API. The agent sits in a loop:

  1. Send the user’s prompt to the LLM.
  2. The LLM decides it needs to read doc.pdf and outputs a tool call.
  3. Your code executes the read tool and feeds the file contents back.
  4. The LLM now has the text and returns a summary.

The LLM never touches the filesystem. It just asks, and your code does. That loop – ask, execute, feed back – is the entire idea.

How does an LLM use a tool?

An LLM cannot execute code. It is a text generator. So “calling a tool” really means the LLM outputs a structured request and your code does the rest.

When you send a request to the LLM, you include a list of tool definitions alongside the conversation. Each definition is a name, a description, and a JSON schema describing the arguments. For our read tool that looks like:

{
  "name": "read",
  "description": "Read the contents of a file.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}

The LLM reads these definitions the same way it reads the user’s prompt – they are just part of the input. When it decides it needs to read a file, it does not run any code. It produces a structured output like:

{ "name": "read", "arguments": { "path": "doc.pdf" } }

along with a signal that says “I’m not done yet – I made a tool call.” Your code parses this, runs the real function, and sends the result back as a new message. The LLM then continues with that result in context.

Here is the full exchange for our “Summarize doc.pdf” example:

sequenceDiagram
    participant U as User
    participant A as Agent
    participant L as LLM
    participant T as read tool

    U->>A: "Summarize doc.pdf"
    A->>L: prompt + tool definitions
    L-->>A: tool_call: read("doc.pdf")
    A->>T: read("doc.pdf")
    T-->>A: file contents
    A->>L: tool result (file contents)
    L-->>A: "Here is a summary: ..."
    A->>U: "Here is a summary: ..."

The LLM’s only job is deciding which tool to call and what arguments to pass. Your code does the actual work.

A minimal agent in pseudocode

Here is that example as code:

tools    = [read_file]
messages = ["Summarize doc.pdf"]

loop:
    response = llm(messages, tools)

    if response.done:
        print(response.text)
        break

    // The LLM wants to call a tool -- run it and feed the result back.
    for call in response.tool_calls:
        result = execute(call.name, call.args)
        messages.append(result)

That is the entire agent. The rest of this book is implementing each piece – the llm function, the tools, and the types that connect them – in Rust.
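The same loop can be written as type-checked Rust. The `Response` enum and the closure-based `llm` stub below are illustrative stand-ins, not the book's real types (the real provider is async and returns richer messages); they exist only to show the loop's shape compiling:

```rust
// Illustrative stand-ins: the real agent uses an async provider and
// structured messages; here everything is synchronous strings.
enum Response {
    Done(String),                     // final text answer
    ToolCalls(Vec<(String, String)>), // (tool name, arguments)
}

fn run_agent(
    mut llm: impl FnMut(&[String]) -> Response,
    execute: impl Fn(&str, &str) -> String,
    prompt: &str,
) -> String {
    let mut messages = vec![prompt.to_string()];
    loop {
        match llm(&messages) {
            // The model is done: return its text.
            Response::Done(text) => return text,
            // The model asked for tools: run each one and feed the
            // result back as a new message, then loop again.
            Response::ToolCalls(calls) => {
                for (name, args) in calls {
                    messages.push(execute(&name, &args));
                }
            }
        }
    }
}
```

The structure is identical to the pseudocode: call the model, return if it is done, otherwise execute tools and loop.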

The tool-calling loop

Here is the flow of a single agent invocation:

flowchart TD
    A["👤 User prompt"] --> B["🤖 LLM"]
    B -- "StopReason::Stop" --> C["✅ Text response"]
    B -- "StopReason::ToolUse" --> D["🔧 Execute tool calls"]
    D -- "tool results" --> B

  1. The user sends a prompt.
  2. The LLM either responds with text (done) or requests one or more tool calls.
  3. Your code executes each tool and gathers the results.
  4. The results are fed back to the LLM as new messages.
  5. Repeat from step 2 until the LLM responds with text.

That is the entire architecture. Everything else is implementation detail.

What we will build

We will build a simple agent framework consisting of:

4 tools:

Tool     What it does
read     Read the contents of a file
write    Write content to a file (creating directories as needed)
edit     Replace an exact string in a file
bash     Run a shell command and capture its output

1 provider:

Provider              Purpose
OpenRouterProvider    Talks to a real LLM over HTTP via the OpenAI-compatible API

Tests use a MockProvider that returns pre-configured responses so you can run the full test suite without an API key.

Project structure

The project is a Cargo workspace with three crates and a tutorial book:

mini-claw-code/
  Cargo.toml                # workspace root
  mini-claw-code/           # reference solution (do not peek!)
  mini-claw-code-starter/   # YOUR code -- you implement things here
  mini-claw-code-xtask/     # helper commands (cargo x ...)
  mini-claw-code-book/      # this tutorial

  • mini-claw-code contains the complete, working implementation. It is there so the test suite can verify that the exercises are solvable, but you should avoid reading it until you have tried on your own.
  • mini-claw-code-starter is your working crate. Each source file contains struct definitions, trait implementations with unimplemented!() bodies, and doc-comment hints. Your job is to replace the unimplemented!() calls with real code.
  • mini-claw-code-xtask provides the cargo x helper with check, solution-check, and book commands.
  • mini-claw-code-book is this mdbook tutorial.

Prerequisites

Before starting, make sure you have:

  • Rust installed (1.85+ required, for edition 2024). Install from https://rustup.rs.
  • Basic Rust knowledge: ownership, structs, enums, pattern matching, and Result / Option. If you have read the first half of The Rust Programming Language book, you are ready.
  • A terminal and a text editor.
  • mdbook (optional, for reading the tutorial locally). Install with cargo install mdbook mdbook-mermaid.

You do not need an API key until Chapter 6. Chapters 1 through 5 use the MockProvider for testing, so everything runs locally.

Setup

Clone the repository and verify things build:

git clone https://github.com/odysa/mini-claw-code.git
cd mini-claw-code
cargo build

Then verify the test harness works:

cargo test -p mini-claw-code-starter ch1

The tests should fail – that is expected! Your job in Chapter 1 is to make them pass.

If cargo x does not work, make sure you are in the workspace root (the directory containing the top-level Cargo.toml).

Chapter roadmap

Chapter   Topic                     What you build
1         Core Types                MockProvider – understand the core types by building a test helper
2         Your First Tool           ReadTool – reading files
3         Single Turn               single_turn() – explicit match on StopReason, one round of tool calls
4         More Tools                BashTool, WriteTool, EditTool
5         Your First Agent SDK!     SimpleAgent – generalizes single_turn() into a loop
6         The OpenRouter Provider   OpenRouterProvider – talking to a real LLM API
7         A Simple CLI              Wire everything into an interactive CLI with conversation memory
8         The Singularity           Your agent can now code itself – what’s next

Chapters 1–7 are hands-on: you write code in mini-claw-code-starter and run tests to check your work. Chapter 8 marks the transition to extension chapters (9+) which walk through the reference implementation:

Chapter   Topic          What it adds
9         A Better TUI   Markdown rendering, spinners, collapsed tool calls
10        Streaming      StreamingAgent with SSE parsing and AgentEvents
11        User Input     AskTool – let the LLM ask you clarifying questions
12        Plan Mode      PlanAgent – read-only planning phase with approval gating

Chapters 1–7 follow the same rhythm:

  1. Read the chapter to understand the concepts.
  2. Open the corresponding source file in mini-claw-code-starter/src/.
  3. Replace the unimplemented!() calls with your implementation.
  4. Run cargo test -p mini-claw-code-starter chN to check your work.

Ready? Let’s build an agent.

What’s next

Head to Chapter 1: Core Types to understand the foundational types – StopReason, Message, and the Provider trait – and build MockProvider, the test helper you will use throughout the next four chapters.

Chapter 1: Core Types

In this chapter you will understand the types that make up the agent protocol – StopReason, AssistantTurn, Message, and the Provider trait. These are the building blocks everything else is built on.

To verify your understanding, you will implement a small test helper: MockProvider, a struct that returns pre-configured responses so that you can test future chapters without an API key.

Goal

Understand the core types, then implement MockProvider so that:

  1. You create it with a VecDeque<AssistantTurn> of canned responses.
  2. Each call to chat() returns the next response in sequence.
  3. If all responses have been consumed, it returns an error.

The core types

Open mini-claw-code-starter/src/types.rs. These types define the protocol between the agent and any LLM backend.

Here is how they relate to each other:

classDiagram
    class Provider {
        <<trait>>
        +chat(messages, tools) AssistantTurn
    }

    class AssistantTurn {
        text: Option~String~
        tool_calls: Vec~ToolCall~
        stop_reason: StopReason
    }

    class StopReason {
        <<enum>>
        Stop
        ToolUse
    }

    class ToolCall {
        id: String
        name: String
        arguments: Value
    }

    class Message {
        <<enum>>
        System(String)
        User(String)
        Assistant(AssistantTurn)
        ToolResult(id, content)
    }

    class ToolDefinition {
        name: &'static str
        description: &'static str
        parameters: Value
    }

    Provider --> AssistantTurn : returns
    Provider --> Message : receives
    Provider --> ToolDefinition : receives
    AssistantTurn --> StopReason
    AssistantTurn --> ToolCall : contains 0..*
    Message --> AssistantTurn : wraps

Provider takes in messages and tool definitions, and returns an AssistantTurn. The turn’s stop_reason tells you what to do next.

ToolDefinition and its builder

pub struct ToolDefinition {
    pub name: &'static str,
    pub description: &'static str,
    pub parameters: Value,
}

Each tool declares a ToolDefinition that tells the LLM what it can do. The parameters field is a JSON Schema object describing the tool’s arguments.

Rather than building JSON by hand every time, ToolDefinition has a builder API:

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

  • new(name, description) creates a definition with an empty parameter schema.
  • param(name, type, description, required) adds a parameter and returns self, so you can chain calls.

You will use this builder in every tool starting from Chapter 2.
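To make the chaining concrete, here is a hedged sketch of how such a builder could be implemented. It is a simplification: it stores parameters in a plain Vec rather than the real JSON Schema Value, so it only illustrates the take-self-return-self chaining pattern, not the actual ToolDefinition internals:

```rust
// Simplified sketch: the real ToolDefinition builds a JSON Schema
// value; this one keeps (name, type, description, required) tuples.
struct ToolDefinition {
    name: &'static str,
    description: &'static str,
    params: Vec<(&'static str, &'static str, &'static str, bool)>,
}

impl ToolDefinition {
    fn new(name: &'static str, description: &'static str) -> Self {
        Self { name, description, params: Vec::new() }
    }

    // Takes `self` by value and returns it, so calls chain fluently.
    fn param(
        mut self,
        name: &'static str,
        ty: &'static str,
        description: &'static str,
        required: bool,
    ) -> Self {
        self.params.push((name, ty, description, required));
        self
    }
}
```

Because `param` consumes and returns `Self`, a chain like `new(...).param(...).param(...)` reads as a single expression with no mutable temporary.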

StopReason and AssistantTurn

pub enum StopReason {
    Stop,
    ToolUse,
}

pub struct AssistantTurn {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,
}

The ToolCall struct holds a single tool invocation:

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,
}

Each tool call has an id (for matching results back to requests), a name (which tool to call), and arguments (a JSON value the tool will parse).

Every response from the LLM comes with a stop_reason that tells you why the model stopped generating:

  • StopReason::Stop – the model is done. Check text for the response.
  • StopReason::ToolUse – the model wants to call tools. Check tool_calls.

This is the raw LLM protocol: the model tells you what to do next. In Chapter 3 you will write a function that explicitly matches on stop_reason to handle each case. In Chapter 5 you will wrap that match inside a loop to create the full agent.

The Provider trait

pub trait Provider: Send + Sync {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}

This says: “A Provider is something that can take a slice of messages and a slice of tool definitions, and asynchronously return an AssistantTurn.”

The Send + Sync bounds mean the provider must be safe to share across threads. This is important because tokio (the async runtime) may move tasks between threads.

Notice that chat() takes &self, not &mut self. The real provider (OpenRouterProvider) does not need mutation – it just fires HTTP requests. Making the trait &mut self would force every caller to hold exclusive access, which is unnecessarily restrictive. The trade-off: MockProvider (a test helper) does need to mutate its response list, so it must use interior mutability to conform to the trait.

The Message enum

pub enum Message {
    System(String),
    User(String),
    Assistant(AssistantTurn),
    ToolResult { id: String, content: String },
}

The conversation history is a list of Message values:

  • System(text) – a system prompt that sets the agent’s role and behavior. Typically the first message in the history.
  • User(text) – a prompt from the user.
  • Assistant(turn) – a response from the LLM (text, tool calls, or both).
  • ToolResult { id, content } – the result of executing a tool call. The id matches the ToolCall::id so the LLM knows which call this result belongs to.

You will use these variants starting in Chapter 3 when building the single_turn() function.
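To see the variants in use, here is a self-contained, simplified version of the enum and a conversation history as it would look after one completed tool round. The assistant variant is reduced to a plain string here (the real one wraps an AssistantTurn), so this only illustrates the shape of a history:

```rust
// Simplified stand-in: the real Message wraps AssistantTurn and JSON
// arguments; strings are enough to show how a history is ordered.
#[derive(Debug)]
enum Message {
    System(String),
    User(String),
    Assistant(String),
    ToolResult { id: String, content: String },
}

// The history for one completed tool round: system prompt, user
// prompt, the assistant's tool call, and the matching result.
fn example_history() -> Vec<Message> {
    vec![
        Message::System("You are a helpful coding agent.".into()),
        Message::User("Summarize doc.pdf".into()),
        Message::Assistant("tool_call id=1: read(doc.pdf)".into()),
        Message::ToolResult { id: "1".into(), content: "file contents...".into() },
    ]
}
```

Note how the ToolResult's id matches the id inside the assistant message – that is the pairing the LLM relies on.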

Why Provider uses impl Future but Tool uses #[async_trait]

You may notice in Chapter 2 that the Tool trait uses #[async_trait] while Provider uses impl Future directly. The difference is about how the trait is used:

  • Provider is used generically (SimpleAgent<P: Provider>). The compiler knows the concrete type at compile time, so impl Future works.
  • Tool is stored as a trait object (Box<dyn Tool>) in a collection of different tool types. Trait objects require a uniform return type, which #[async_trait] provides by boxing the future.

When implementing a trait that uses impl Future, you can simply write async fn in the impl block – Rust desugars it to the impl Future form automatically. So while the trait definition says -> impl Future<...>, your implementation can just write async fn chat(...).

If this distinction is unclear now, it will click in Chapter 5 when you see both patterns in action.

ToolSet – a collection of tools

One more type you will use starting in Chapter 3: ToolSet. It wraps a HashMap<String, Box<dyn Tool>> and indexes tools by name, giving O(1) lookup when executing tool calls. You build one with a builder:

let tools = ToolSet::new()
    .with(ReadTool::new())
    .with(BashTool::new());

You do not need to implement ToolSet – it is provided in types.rs.
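As a rough sketch of the idea – with a toy synchronous Tool trait rather than the book's real async one – the builder reads each tool's name from the tool itself and inserts it into a HashMap, giving O(1) lookup by name:

```rust
use std::collections::HashMap;

// Toy, synchronous stand-in for the real async Tool trait.
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, args: &str) -> String;
}

struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &'static str { "echo" }
    fn call(&self, args: &str) -> String { args.to_string() }
}

struct ToolSet {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolSet {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    // The builder extracts the name from the tool itself, so the
    // HashMap key always matches the tool's own definition.
    fn with(mut self, tool: impl Tool + 'static) -> Self {
        self.tools.insert(tool.name().to_string(), Box::new(tool));
        self
    }

    fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| t.as_ref())
    }
}
```

The `Option` returned by `get` is what lets the agent report "unknown tool" instead of panicking when the LLM asks for a tool that does not exist.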

Implementing MockProvider

Now that you understand the types, let’s put them to use. MockProvider is a test helper – it implements Provider by returning canned responses instead of calling a real LLM. You will use it throughout chapters 2–5 to test tools and the agent loop without needing an API key.

Open mini-claw-code-starter/src/mock.rs. You will see the struct and method signatures already laid out with unimplemented!() bodies.

Interior mutability with Mutex

MockProvider needs to remove responses from a list each time chat() is called. But chat() takes &self. How do we mutate through a shared reference?

Rust’s std::sync::Mutex provides interior mutability: you wrap a value in a Mutex, and calling .lock().unwrap() gives you a mutable guard even through &self. The lock ensures only one thread accesses the data at a time.

use std::collections::VecDeque;
use std::sync::Mutex;

struct MyState {
    items: Mutex<VecDeque<String>>,
}

impl MyState {
    fn take_one(&self) -> Option<String> {
        self.items.lock().unwrap().pop_front()
    }
}

Step 1: The struct fields

The struct already has the field you need: a Mutex<VecDeque<AssistantTurn>> to hold the responses. This is provided so that the method signatures compile. Your job is to implement the methods that use this field.

Step 2: Implement new()

The new() method receives a VecDeque<AssistantTurn>. We want FIFO order – each call to chat() should return the first remaining response, not the last. VecDeque::pop_front() does exactly that in O(1):

flowchart LR
    subgraph "VecDeque (FIFO)"
        direction LR
        A["A"] ~~~ B["B"] ~~~ C["C"]
    end
    A -- "pop_front()" --> out1["chat() → A"]
    B -. "next call" .-> out2["chat() → B"]
    C -. "next call" .-> out3["chat() → C"]
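The diagram's behavior takes only a few lines of code to confirm – draining a VecDeque with pop_front() yields items in insertion order:

```rust
use std::collections::VecDeque;

// pop_front() removes from the front, so the queue drains in
// insertion (FIFO) order.
fn drain_in_order(mut queue: VecDeque<&'static str>) -> Vec<&'static str> {
    let mut out = Vec::new();
    while let Some(item) = queue.pop_front() {
        out.push(item);
    }
    out
}
```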

So in new():

  1. Wrap the input deque in a Mutex.
  2. Store it in Self.

Step 3: Implement chat()

The chat() method should:

  1. Lock the mutex.
  2. pop_front() the next response.
  3. If there is one, return Ok(response).
  4. If the deque is empty, return an error.

The mock provider intentionally ignores the messages and tools parameters. It does not care what the “user” said – it just returns the next canned response.

A useful pattern for converting Option to Result:

some_option.ok_or_else(|| anyhow::anyhow!("no more responses"))

Running the tests

Run the Chapter 1 tests:

cargo test -p mini-claw-code-starter ch1

What the tests verify

  • test_ch1_returns_text: Creates a MockProvider with one response containing text. Calls chat() once and checks the text matches.
  • test_ch1_returns_tool_calls: Creates a provider with one response containing a tool call. Verifies the tool call name and id.
  • test_ch1_steps_through_sequence: Creates a provider with three responses. Calls chat() three times and verifies they come back in the correct order (First, Second, Third).

These are the core tests. There are also additional edge-case tests (empty responses, exhausted queue, multiple tool calls, etc.) that will pass once your core implementation is correct.

Recap

You have learned the core types that define the agent protocol:

  • StopReason tells you whether the LLM is done or wants to call tools.
  • AssistantTurn carries the LLM’s response – text, tool calls, or both.
  • Provider is the trait any LLM backend implements.

You also built MockProvider, a test helper you will use throughout the next four chapters to simulate LLM conversations without HTTP requests.

What’s next

In Chapter 2: Your First Tool you will implement the ReadTool – a tool that reads file contents and returns them to the LLM.

Chapter 2: Your First Tool

Now that you have a mock provider, it is time to build your first tool. You will implement ReadTool – a tool that reads a file and returns its contents. This is the simplest tool in our agent, but it introduces the Tool trait pattern that every other tool follows.

Goal

Implement ReadTool so that:

  1. It declares its name, description, and parameter schema.
  2. When called with a {"path": "some/file.txt"} argument, it reads the file and returns its contents as a string.
  3. Missing arguments or non-existent files produce errors.

Key Rust concepts

The Tool trait

Open mini-claw-code-starter/src/types.rs and look at the Tool trait:

#[async_trait::async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> &ToolDefinition;
    async fn call(&self, args: Value) -> anyhow::Result<String>;
}

Two methods:

  • definition() returns metadata about the tool: its name, a description, and a JSON schema describing its parameters. The LLM uses this to decide which tool to call and how to format the arguments.
  • call() actually executes the tool. It receives a serde_json::Value containing the arguments and returns a string result.

ToolDefinition

pub struct ToolDefinition {
    pub name: &'static str,
    pub description: &'static str,
    pub parameters: Value,
}

As you saw in Chapter 1, ToolDefinition has a builder API for declaring parameters. For ReadTool, we need a single required parameter called "path" of type "string":

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

Under the hood, the builder constructs the JSON Schema you saw in Chapter 1. The last argument (true) marks the parameter as required.

Why #[async_trait] instead of plain async fn?

You might wonder why we use the async_trait macro instead of writing async fn directly in the trait. The reason is trait object compatibility.

Later, in the agent loop, we will store tools in a ToolSet – a HashMap-backed collection of different tool types behind a common interface. This requires dynamic dispatch, which means the compiler needs to know the size of the return type at compile time.

async fn in traits generates a different, uniquely-sized Future type for each implementation. That breaks dynamic dispatch. The #[async_trait] macro automatically rewrites async fn into a method that returns Pin<Box<dyn Future<...>>>, which has a known, fixed size regardless of which tool produced it. You write normal async fn code, and the macro handles the boxing for you.
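The mechanics can be sketched without the macro. Below, a toy trait hand-returns the boxed future form that #[async_trait] generates, so two different tool types share one Vec. The trait, tools, and the minimal one-shot poller are all illustrative stand-ins, not the book's real types (a real program would use tokio, not a hand-rolled executor):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Every implementation returns the same boxed future type, so the
// trait stays object-safe and tools can live behind Box<dyn Tool>.
trait Tool {
    fn call(&self) -> Pin<Box<dyn Future<Output = String> + '_>>;
}

struct Echo;
impl Tool for Echo {
    fn call(&self) -> Pin<Box<dyn Future<Output = String> + '_>> {
        Box::pin(async { "echo".to_string() })
    }
}

struct Hello;
impl Tool for Hello {
    fn call(&self) -> Pin<Box<dyn Future<Output = String> + '_>> {
        Box::pin(async { "hello".to_string() })
    }
}

// Minimal executor: enough for futures that finish on the first
// poll, like the ones above.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("these example futures are immediately ready"),
    }
}
```

With plain `async fn` in the trait, `Echo::call` and `Hello::call` would return two distinct anonymous future types and the `Vec<Box<dyn Tool>>` below would not compile; boxing erases that difference.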

Here is the data flow when the agent calls a tool:

flowchart LR
    A["LLM returns<br/>ToolCall"] --> B["args: JSON Value<br/>{&quot;path&quot;: &quot;f.txt&quot;}"]
    B --> C["Tool::call(args)"]
    C --> D["Result: String<br/>(file contents)"]
    D --> E["Sent back to LLM<br/>as ToolResult"]

The LLM never touches the filesystem. It produces a JSON request, your code executes it, and returns a string.

The implementation

Open mini-claw-code-starter/src/tools/read.rs. The struct, Default impl, and method signatures are already provided.

Note that the impl Tool for ReadTool block must be annotated with #[async_trait::async_trait]; the starter file already has this annotation in place.

Step 1: Implement new()

Create a ToolDefinition and store it in self.definition. Use the builder:

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

Step 2: definition() – already provided

The definition() method is already implemented in the starter – it simply returns &self.definition. No work needed here.

Step 3: Implement call()

This is where the real work happens. Your implementation should:

  1. Extract the "path" argument from args.
  2. Read the file asynchronously.
  3. Return the file contents.

Here is the shape:

async fn call(&self, args: Value) -> anyhow::Result<String> {
    // 1. Extract path
    // 2. Read file with tokio::fs::read_to_string
    // 3. Return contents
}

Some useful APIs:

  • args["path"].as_str() returns Option<&str>. Use .context("missing 'path' argument")? from anyhow to convert None into a descriptive error.
  • tokio::fs::read_to_string(path).await reads a file asynchronously. Chain .with_context(|| format!("failed to read '{path}'"))? for a clear error message.

That is it – extract the path, read the file, return the contents.

Running the tests

Run the Chapter 2 tests:

cargo test -p mini-claw-code-starter ch2

What the tests verify

  • test_ch2_read_definition: Creates a ReadTool and checks that its name is "read", description is non-empty, and "path" is in the required parameters.
  • test_ch2_read_file: Creates a temp file with known content, calls ReadTool with the file path, and checks the returned content matches.
  • test_ch2_read_missing_file: Calls ReadTool with a path that does not exist and verifies it returns an error.
  • test_ch2_read_missing_arg: Calls ReadTool with an empty JSON object (no "path" key) and verifies it returns an error.

There are also additional edge-case tests (empty files, unicode content, wrong argument types, etc.) that will pass once your core implementation is correct.

Recap

You built your first tool by implementing the Tool trait. The key patterns:

  • ToolDefinition::new(...).param(...) declares the tool’s name, description, and parameters.
  • #[async_trait::async_trait] on the impl block lets you write async fn call() while keeping trait object compatibility.
  • tokio::fs for async file I/O.
  • anyhow::Context for adding descriptive error messages.

Every tool in the agent follows this exact same structure. Once you understand ReadTool, the remaining tools are variations on the theme.

What’s next

In Chapter 3: Single Turn you will write a function that matches on StopReason to handle a single round of tool calls.

Chapter 3: Single Turn

You have a provider and a tool. Before jumping to the full agent loop, let’s see the raw protocol: the LLM returns a stop_reason that tells you whether it is done or wants to use tools. In this chapter you will write a function that handles exactly one prompt with at most one round of tool calls.

Goal

Implement single_turn() so that:

  1. It sends a prompt to the provider.
  2. It matches on stop_reason.
  3. If Stop – return the text.
  4. If ToolUse – execute the tools, send results back, return the final text.

No loop. Just one turn.

Key Rust concepts

ToolSet – a HashMap of tools

The function signature takes a &ToolSet instead of a raw slice or vector:

pub async fn single_turn<P: Provider>(
    provider: &P,
    tools: &ToolSet,
    prompt: &str,
) -> anyhow::Result<String>

ToolSet wraps a HashMap<String, Box<dyn Tool>> and indexes tools by their definition name. This gives O(1) lookup when executing tool calls instead of scanning a list. The builder API auto-extracts the name from each tool’s definition:

let tools = ToolSet::new().with(ReadTool::new());
let result = single_turn(&provider, &tools, "Read test.txt").await?;

match on StopReason

This is the core teaching point. Instead of checking tool_calls.is_empty(), you explicitly match on the stop reason:

match turn.stop_reason {
    StopReason::Stop => { /* return text */ }
    StopReason::ToolUse => { /* execute tools */ }
}

This makes the protocol visible. The LLM is telling you what to do, and you handle each case explicitly.

Here is the complete flow of single_turn():

flowchart TD
    A["prompt"] --> B["provider.chat()"]
    B --> C{"stop_reason?"}
    C -- "Stop" --> D["Return text"]
    C -- "ToolUse" --> E["Execute each tool call"]
    E --> F{"Tool error?"}
    F -- "Ok" --> G["result = output"]
    F -- "Err" --> H["result = error message"]
    G --> I["Push Assistant message"]
    H --> I
    I --> J["Push ToolResult messages"]
    J --> K["provider.chat() again"]
    K --> L["Return final text"]

The key difference from the full agent loop (Chapter 5) is that there is no outer loop here. If the LLM asks for tools a second time, single_turn() does not handle it – that is what the agent loop is for.

The implementation

Open mini-claw-code-starter/src/agent.rs. You will see the single_turn() function signature at the top of the file, before the SimpleAgent struct.

Step 1: Collect tool definitions

ToolSet has a definitions() method that returns all tool schemas:

let defs = tools.definitions();

Step 2: Create the initial message

let mut messages = vec![Message::User(prompt.to_string())];

Step 3: Call the provider

let turn = provider.chat(&messages, &defs).await?;

Step 4: Match on stop_reason

This is the heart of the function:

match turn.stop_reason {
    StopReason::Stop => Ok(turn.text.unwrap_or_default()),
    StopReason::ToolUse => {
        // execute tools, send results, get final answer
    }
}

For the ToolUse branch:

  1. For each tool call, find the matching tool and call it. Collect the results into a Vec first – you will need turn.tool_calls for this, so you cannot move turn yet.
  2. Push Message::Assistant(turn) and then Message::ToolResult for each result. Pushing the assistant turn moves turn, which is why you must collect results beforehand.
  3. Call the provider again to get the final answer.
  4. Return final_turn.text.unwrap_or_default().

The tool-finding and execution logic is the same as what you will use in the agent loop (Chapter 5):

println!("{}", tool_summary(call));
let content = match tools.get(&call.name) {
    Some(t) => t.call(call.arguments.clone()).await
        .unwrap_or_else(|e| format!("error: {e}")),
    None => format!("error: unknown tool `{}`", call.name),
};

The tool_summary() helper prints each tool call to the terminal so you can see which tools the agent is using and what arguments it passed. For example, [bash: ls -la] or [read: src/main.rs]. (The reference implementation uses print!("\x1b[2K\r...") instead of println! to clear the thinking... indicator line before printing – you’ll see this pattern in Chapter 7. A plain println! works fine for now.)

Error handling – never crash the loop

Notice that tool errors are caught, not propagated. The .unwrap_or_else() converts any error into a string like "error: failed to read 'missing.txt'". This string is sent back to the LLM as a normal tool result. The LLM can then decide what to do – try a different file, use another tool, or explain the problem to the user.

The same applies to unknown tools – instead of panicking, you send an error message back as a tool result.

This is a key design principle: the agent loop should never crash because of a tool failure. Tools operate on the real world (files, processes, network), and failures are expected. The LLM is smart enough to recover if you give it the error message.
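The pattern boils down to collapsing a Result into a String before it re-enters the conversation. A standalone illustration, using plain String errors in place of anyhow:

```rust
// Convert a tool outcome into message content for the LLM: success
// and failure both become ordinary text, so the loop never crashes.
fn tool_result_content(outcome: Result<String, String>) -> String {
    outcome.unwrap_or_else(|e| format!("error: {e}"))
}
```

Whatever the tool did, the agent always has a string to push as a ToolResult message.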

Here is the message sequence for a successful tool call:

sequenceDiagram
    participant ST as single_turn()
    participant P as Provider
    participant T as ReadTool

    ST->>P: [User("Read test.txt")] + tool defs
    P-->>ST: ToolUse: read({path: "test.txt"})
    ST->>T: call({path: "test.txt"})
    T-->>ST: "file contents..."
    Note over ST: Push Assistant + ToolResult
    ST->>P: [User, Assistant, ToolResult]
    P-->>ST: Stop: "Here are the contents: ..."
    ST-->>ST: return text

And here is what happens when a tool fails (e.g. file not found):

sequenceDiagram
    participant ST as single_turn()
    participant P as Provider
    participant T as ReadTool

    ST->>P: [User("Read missing.txt")] + tool defs
    P-->>ST: ToolUse: read({path: "missing.txt"})
    ST->>T: call({path: "missing.txt"})
    T--xST: Err("failed to read 'missing.txt'")
    Note over ST: Catch error, use as result
    Note over ST: Push Assistant + ToolResult("error: failed to read ...")
    ST->>P: [User, Assistant, ToolResult]
    P-->>ST: Stop: "Sorry, that file doesn't exist."
    ST-->>ST: return text

The error does not crash the agent. It becomes a tool result that the LLM reads and responds to.

Running the tests

Run the Chapter 3 tests:

cargo test -p mini-claw-code-starter ch3

What the tests verify

  • test_ch3_direct_response: Provider returns StopReason::Stop. single_turn should return the text directly.

  • test_ch3_one_tool_call: Provider returns StopReason::ToolUse with a read tool call, then StopReason::Stop. Verifies the file was read and the final text is returned.

  • test_ch3_unknown_tool: Provider returns StopReason::ToolUse for a tool that does not exist. Verifies the error message is sent as a tool result and the final text is returned.

  • test_ch3_tool_error_propagates: Provider requests a read on a file that does not exist. The error should be caught and sent back to the LLM as a tool result (not crash the function). The LLM then responds with text.

There are also additional edge-case tests (empty responses, multiple tool calls in one turn, etc.) that will pass once your core implementation is correct.

Recap

You have written the simplest possible handler for the LLM protocol:

  • Match on StopReason – the model tells you what to do next.
  • No loop – you handle at most one round of tool calls.
  • ToolSet – a HashMap-backed collection with O(1) tool lookup by name.

This is the foundation. In Chapter 5 you will wrap this same logic in a loop to create the full agent.

What’s next

In Chapter 4: More Tools you will implement three more tools: BashTool, WriteTool, and EditTool.

Chapter 4: More Tools

You have already implemented ReadTool and understand the Tool trait pattern. Now you will implement three more tools: BashTool, WriteTool, and EditTool. Each follows the same structure – define a schema, implement call() – so this chapter reinforces the pattern through repetition.

By the end of this chapter your agent will have all four tools it needs to interact with the file system and execute commands.

flowchart LR
    subgraph ToolSet
        R["read<br/>Read a file"]
        B["bash<br/>Run a command"]
        W["write<br/>Write a file"]
        E["edit<br/>Replace a string"]
    end
    Agent -- "tools.get(name)" --> ToolSet

Goal

Implement three tools:

  1. BashTool – run a shell command and return its output.
  2. WriteTool – write content to a file, creating directories as needed.
  3. EditTool – replace an exact string in a file (must appear exactly once).

Key Rust concepts

tokio::process::Command

Tokio provides an async wrapper around std::process::Command. You will use it in BashTool:

#![allow(unused)]
fn main() {
let output = tokio::process::Command::new("bash")
    .arg("-c")
    .arg(command)
    .output()
    .await?;
}

This runs bash -c "<command>" and captures stdout and stderr. The output struct has stdout and stderr fields as Vec<u8>, which you convert to strings with String::from_utf8_lossy().
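
The conversion never fails: invalid bytes are replaced with U+FFFD rather than producing an error, which is exactly what you want for untrusted command output. A self-contained illustration (the byte values are invented for the demo):

```rust
fn main() {
    // Command output arrives as raw bytes (Vec<u8>); from_utf8_lossy
    // never fails -- invalid sequences become U+FFFD instead.
    let stdout: Vec<u8> = b"hello\n".to_vec();
    let text = String::from_utf8_lossy(&stdout).to_string();
    assert_eq!(text, "hello\n");

    let bad = vec![0x68, 0x69, 0xFF]; // "hi" plus an invalid byte
    assert_eq!(String::from_utf8_lossy(&bad), "hi\u{FFFD}");
}
```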

bail!() macro

The anyhow::bail!() macro is shorthand for returning an error immediately:

#![allow(unused)]
fn main() {
use anyhow::bail;

if count == 0 {
    bail!("not found");
}
// equivalent to:
// return Err(anyhow::anyhow!("not found"));
}

You will use this in EditTool for validation.

Make sure to import it: use anyhow::{Context, bail};. The starter file already includes this import in edit.rs.

create_dir_all

When writing a file to a path like a/b/c/file.txt, the parent directories might not exist. tokio::fs::create_dir_all creates the entire directory tree:

#![allow(unused)]
fn main() {
if let Some(parent) = std::path::Path::new(path).parent() {
    tokio::fs::create_dir_all(parent).await?;
}
}

Tool 1: BashTool

Open mini-claw-code-starter/src/tools/bash.rs.

Schema

Use the builder pattern you learned in Chapter 2:

#![allow(unused)]
fn main() {
ToolDefinition::new("bash", "Run a bash command and return its output.")
    .param("command", "string", "The bash command to run", true)
}

Implementation

The call() method should:

  1. Extract "command" from args.
  2. Run bash -c <command> using tokio::process::Command.
  3. Capture stdout and stderr.
  4. Build a result string:
    • Start with stdout (if non-empty).
    • Append stderr prefixed with "stderr: " (if non-empty).
    • If both are empty, return "(no output)".

Think about how you combine stdout and stderr. If both are present, you want them separated by a newline. Something like:

#![allow(unused)]
fn main() {
let mut result = String::new();
if !stdout.is_empty() {
    result.push_str(&stdout);
}
if !stderr.is_empty() {
    if !result.is_empty() {
        result.push('\n');
    }
    result.push_str("stderr: ");
    result.push_str(&stderr);
}
if result.is_empty() {
    result.push_str("(no output)");
}
}

Tool 2: WriteTool

Open mini-claw-code-starter/src/tools/write.rs.

Schema

#![allow(unused)]
fn main() {
ToolDefinition::new("write", "Write content to a file, creating directories as needed.")
    .param("path", "string", "The file path to write to", true)
    .param("content", "string", "The content to write to the file", true)
}

Implementation

The call() method should:

  1. Extract "path" and "content" from args.
  2. Create parent directories if they do not exist.
  3. Write the content to the file.
  4. Return a confirmation message like "wrote {path}".

For creating parent directories:

#![allow(unused)]
fn main() {
if let Some(parent) = std::path::Path::new(path).parent() {
    tokio::fs::create_dir_all(parent).await
        .with_context(|| format!("failed to create directories for '{path}'"))?;
}
}

Then write the file:

#![allow(unused)]
fn main() {
tokio::fs::write(path, content).await
    .with_context(|| format!("failed to write '{path}'"))?;
}

Tool 3: EditTool

Open mini-claw-code-starter/src/tools/edit.rs.

Schema

#![allow(unused)]
fn main() {
ToolDefinition::new("edit", "Replace an exact string in a file (must appear exactly once).")
    .param("path", "string", "The file path to edit", true)
    .param("old_string", "string", "The exact string to find and replace", true)
    .param("new_string", "string", "The replacement string", true)
}

Implementation

The call() method is the most interesting of the bunch. It should:

  1. Extract "path", "old_string", and "new_string" from args.
  2. Read the file contents.
  3. Count how many times old_string appears in the content.
  4. If the count is 0, return an error: the string was not found.
  5. If the count is greater than 1, return an error: the string is ambiguous.
  6. Replace the single occurrence and write the file back.
  7. Return a confirmation like "edited {path}".

The validation is important – requiring exactly one match prevents accidental edits in the wrong place.

flowchart TD
    A["Read file"] --> B["Count matches<br/>of old_string"]
    B --> C{"count?"}
    C -- "0" --> D["Error: not found"]
    C -- "1" --> E["Replace + write file"]
    C -- ">1" --> F["Error: ambiguous"]
    E --> G["Return &quot;edited path&quot;"]

Useful APIs:

  • content.matches(old).count() counts occurrences of a substring.
  • content.replacen(old, new, 1) replaces the first occurrence.
  • bail!("old_string not found in '{path}'") for the not-found case.
  • bail!("old_string appears {count} times in '{path}', must be unique") for the ambiguous case.
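
The validation logic can be sketched on an in-memory string. This is a standalone simplification: the real tool reads and writes the file with `tokio::fs` and uses `anyhow`/`bail!` for errors, while this demo uses a plain `Result<String, String>`:

```rust
// Count, validate, then replace exactly one occurrence.
fn edit(content: &str, old: &str, new: &str) -> Result<String, String> {
    match content.matches(old).count() {
        0 => Err("old_string not found".to_string()),
        1 => Ok(content.replacen(old, new, 1)),
        n => Err(format!("old_string appears {n} times, must be unique")),
    }
}

fn main() {
    assert_eq!(edit("hello world", "hello", "goodbye").unwrap(), "goodbye world");
    assert!(edit("aaa", "a", "b").is_err());   // ambiguous: three matches
    assert!(edit("abc", "zzz", "x").is_err()); // not found
}
```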

Running the tests

Run the Chapter 4 tests:

cargo test -p mini-claw-code-starter ch4

What the tests verify

BashTool:

  • test_ch4_bash_definition: Checks name is "bash" and "command" is required.
  • test_ch4_bash_runs_command: Runs echo hello and checks the output contains "hello".
  • test_ch4_bash_captures_stderr: Runs echo err >&2 and checks stderr is captured.
  • test_ch4_bash_missing_arg: Passes empty args and expects an error.

WriteTool:

  • test_ch4_write_definition: Checks name is "write".
  • test_ch4_write_creates_file: Writes to a temp file and reads it back.
  • test_ch4_write_creates_dirs: Writes to a/b/c/out.txt and verifies directories were created.
  • test_ch4_write_missing_arg: Passes only "path" (no "content") and expects an error.

EditTool:

  • test_ch4_edit_definition: Checks name is "edit".
  • test_ch4_edit_replaces_string: Edits "hello" to "goodbye" in a file containing "hello world" and checks the result is "goodbye world".
  • test_ch4_edit_not_found: Tries to replace a string that does not exist and expects an error.
  • test_ch4_edit_not_unique: Tries to replace "a" in a file containing "aaa" (three occurrences) and expects an error.

There are also additional edge-case tests for each tool (wrong argument types, missing arguments, output format checks, etc.) that will pass once your core implementations are correct.

Recap

You now have four tools, and they all follow the same pattern:

  1. Define a ToolDefinition with ::new(...).param(...) builder calls.
  2. Return &self.definition from definition().
  3. Add #[async_trait::async_trait] on the impl Tool block and write async fn call().

This is a deliberate design. The Tool trait makes every tool interchangeable from the agent’s perspective. The agent does not know or care how a tool works internally – it only needs the definition (to tell the LLM) and the call method (to execute it).

What’s next

With a provider and four tools ready, it is time to connect them. In Chapter 5: Your First Agent SDK! you will build the SimpleAgent – the core loop that sends prompts to the provider, executes tool calls, and iterates until the LLM gives a final answer.

Chapter 5: Your First Agent SDK!

This is the chapter where everything comes together. You have a provider that returns AssistantTurn responses and four tools that execute actions. Now you will build the SimpleAgent – the loop that connects them.

This is the “aha!” moment of the tutorial. The agent loop is surprisingly short, but it is the engine that makes an LLM into an agent.

What is an agent loop?

In Chapter 3 you built single_turn() – one prompt, one round of tool calls, one final answer. That is enough when the LLM knows everything it needs after reading a single file. But real tasks are messier:

“Find the bug in this project and fix it.”

The LLM might need to read five files, run the test suite, edit a source file, run the tests again, and then report back. Each of those is a tool call, and the LLM cannot plan them all upfront because the result of one call determines the next. It needs a loop.

The agent loop is that loop:

flowchart TD
    A["User prompt"] --> B["Call LLM"]
    B -- "StopReason::Stop" --> C["Return text"]
    B -- "StopReason::ToolUse" --> D["Execute tool calls"]
    D -- "Push assistant + tool results" --> B

  1. Send messages to the LLM.
  2. If the LLM says “I’m done” (StopReason::Stop), return its text.
  3. If the LLM says “I need tools” (StopReason::ToolUse), execute them.
  4. Append the assistant turn and tool results to the message history.
  5. Go to step 1.

That is the entire architecture of every coding agent – Claude Code, Cursor, OpenCode, Copilot. The details vary (streaming, parallel tool calls, safety checks), but the core loop is always the same. And you are about to build it in about 30 lines of Rust.

Goal

Implement SimpleAgent so that:

  1. It holds a provider and a collection of tools.
  2. You can register tools using a builder pattern (.tool(ReadTool::new())).
  3. The run() method implements the tool-calling loop: prompt -> provider -> tool calls -> tool results -> provider -> … -> final text.

Key Rust concepts

Generics with trait bounds

#![allow(unused)]
fn main() {
pub struct SimpleAgent<P: Provider> {
    provider: P,
    tools: ToolSet,
}
}

The <P: Provider> means SimpleAgent is generic over any type that implements the Provider trait. When you use MockProvider, the compiler generates code specialized for MockProvider. When you use OpenRouterProvider, it generates code for that type. Same logic, different providers.
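
A stripped-down standalone demo of the pattern (a synchronous `provider_name` stand-in replaces the real async `chat` method; everything except the `SimpleAgent`/`Provider` names is simplified for illustration):

```rust
// A toy Provider trait: the real one is async and returns AssistantTurn.
trait Provider {
    fn name(&self) -> &'static str;
}

struct MockProvider;
impl Provider for MockProvider {
    fn name(&self) -> &'static str { "mock" }
}

// Generic over any Provider implementation; the compiler generates a
// specialized copy of this struct and its methods per concrete P.
struct SimpleAgent<P: Provider> {
    provider: P,
}

impl<P: Provider> SimpleAgent<P> {
    fn new(provider: P) -> Self {
        Self { provider }
    }
    fn provider_name(&self) -> &'static str {
        self.provider.name()
    }
}

fn main() {
    let agent = SimpleAgent::new(MockProvider);
    assert_eq!(agent.provider_name(), "mock"); // monomorphized for MockProvider
}
```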

ToolSet – a HashMap of trait objects

The tools field is a ToolSet, which wraps a HashMap<String, Box<dyn Tool>> internally. Each value is a heap-allocated trait object that implements Tool, but the concrete types can differ. One might be a ReadTool, the next a BashTool. The HashMap key is the tool’s name, giving O(1) lookup when executing tool calls.

Why trait objects (Box<dyn Tool>) instead of generics? Because you need a heterogeneous collection. A Vec<T> requires all elements to be the same type. With Box<dyn Tool>, you erase the concrete type and store them all behind the same interface.

This is why the Tool trait uses #[async_trait] – the macro rewrites async fn into a boxed future with a uniform type across different tool implementations.
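
Here is a standalone miniature of the trait-object pattern (a simplified synchronous `Tool` trait with invented unit structs; the starter's real trait is async and carries full definitions):

```rust
use std::collections::HashMap;

trait Tool {
    fn name(&self) -> &'static str;
}

struct ReadTool;
impl Tool for ReadTool {
    fn name(&self) -> &'static str { "read" }
}

struct BashTool;
impl Tool for BashTool {
    fn name(&self) -> &'static str { "bash" }
}

// Heterogeneous collection: concrete types differ, the boxed trait
// object erases them behind a single interface.
fn build_set() -> HashMap<String, Box<dyn Tool>> {
    let mut set: HashMap<String, Box<dyn Tool>> = HashMap::new();
    for t in [Box::new(ReadTool) as Box<dyn Tool>, Box::new(BashTool)] {
        set.insert(t.name().to_string(), t);
    }
    set
}

fn main() {
    let set = build_set();
    assert!(set.get("bash").is_some()); // O(1) lookup by name
    assert!(set.get("grep").is_none());
}
```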

The builder pattern

The tool() method takes self by value (not &mut self) and returns Self:

#![allow(unused)]
fn main() {
pub fn tool(mut self, t: impl Tool + 'static) -> Self {
    // push the tool
    self
}
}

This lets you chain calls:

#![allow(unused)]
fn main() {
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new());
}

The impl Tool + 'static parameter accepts any type implementing Tool with a 'static lifetime (meaning it does not borrow temporary data). Inside the method, you push it into the ToolSet, which boxes it and indexes it by name.

The implementation

Open mini-claw-code-starter/src/agent.rs. The struct definition and method signatures are provided.

Step 1: Implement new()

Store the provider and initialize an empty ToolSet:

#![allow(unused)]
fn main() {
pub fn new(provider: P) -> Self {
    Self {
        provider,
        tools: ToolSet::new(),
    }
}
}

This one is straightforward.

Step 2: Implement tool()

Push the tool into the set, return self:

#![allow(unused)]
fn main() {
pub fn tool(mut self, t: impl Tool + 'static) -> Self {
    self.tools.push(t);
    self
}
}

Step 3: Implement run() – the core loop

This is the heart of the agent. Here is the flow:

  1. Collect tool definitions from all registered tools.
  2. Create a messages vector starting with the user’s prompt.
  3. Loop:
     a. Call self.provider.chat(&messages, &defs) to get an AssistantTurn.
     b. Match on turn.stop_reason:
        • StopReason::Stop – the LLM is done, return turn.text.
        • StopReason::ToolUse – for each tool call:
          1. Find the matching tool by name.
          2. Call it with the arguments.
          3. Collect the result.
     c. Push the AssistantTurn as a Message::Assistant.
     d. Push each tool result as a Message::ToolResult.
     e. Continue the loop.

Think about the data flow carefully. After executing tools, you push both the assistant’s turn (so the LLM can see what it requested) and the tool results (so it can see what happened). This gives the LLM full context to decide what to do next.

Gathering tool definitions

At the start of run(), collect all tool definitions from the ToolSet:

#![allow(unused)]
fn main() {
let defs = self.tools.definitions();
}

The loop structure

This is single_turn() (from Chapter 3) wrapped in a loop. Instead of handling just one round, we match on stop_reason inside a loop:

#![allow(unused)]
fn main() {
loop {
    let turn = self.provider.chat(&messages, &defs).await?;

    match turn.stop_reason {
        StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
        StopReason::ToolUse => {
            // Execute tool calls, collect results
            // Push messages
        }
    }
}
}

Finding and calling tools

For each tool call, look it up by name in the ToolSet:

#![allow(unused)]
fn main() {
println!("{}", tool_summary(call));
let content = match self.tools.get(&call.name) {
    Some(t) => t.call(call.arguments.clone()).await
        .unwrap_or_else(|e| format!("error: {e}")),
    None => format!("error: unknown tool `{}`", call.name),
};
}

The tool_summary() helper prints each tool call to the terminal – one line per tool with its key argument, so you can watch what the agent does in real time. For example: [bash: cat Cargo.toml] or [write: src/lib.rs].

Error handling

Tool errors are caught with .unwrap_or_else() and converted into a string that gets sent back to the LLM as a tool result. This is the same pattern from Chapter 3, and it is critical here because the agent loop runs multiple iterations. If a tool error crashed the loop, the agent would die on the first missing file or failed command. Instead, the LLM sees the error and can recover – try a different path, adjust the command, or explain the problem.

> What's in README.md?
[read: README.md]          <-- tool fails (file not found)
[read: Cargo.toml]         <-- LLM recovers, tries another file
Here is the project info from Cargo.toml...

Unknown tools are handled the same way – an error string as the tool result, not a crash.

Pushing messages

After executing all tool calls for a turn, push the assistant message and the tool results. You need to collect results first (because the turn is moved into Message::Assistant):

#![allow(unused)]
fn main() {
let mut results = Vec::new();
for call in &turn.tool_calls {
    // ... execute and collect (id, content) pairs
}

messages.push(Message::Assistant(turn));
for (id, content) in results {
    messages.push(Message::ToolResult { id, content });
}
}

The order matters: assistant message first, then tool results. This matches the format that LLM APIs expect.

Running the tests

Run the Chapter 5 tests:

cargo test -p mini-claw-code-starter ch5

What the tests verify

  • test_ch5_text_response: Provider returns text immediately (no tools). Agent should return that text.

  • test_ch5_single_tool_call: Provider first requests a read tool call, then returns text. Agent should execute the tool and return the final text.

  • test_ch5_unknown_tool: Provider requests a tool that does not exist. Agent should handle it gracefully (return an error string as the tool result) and continue to get the final text.

  • test_ch5_multi_step_loop: Provider requests read twice across two turns, then returns text. Verifies the loop runs multiple iterations.

  • test_ch5_empty_response: Provider returns None for text and no tool calls. Agent should return an empty string.

  • test_ch5_builder_chain: Verifies that .tool().tool() chaining compiles – a compile-time check for the builder pattern.

  • test_ch5_tool_error_propagates: Provider requests a read on a file that does not exist. The error should be caught and sent back as a tool result. The LLM then responds with text. Verifies the loop does not crash on tool failures.

There are also additional edge-case tests (three-step loops, multi-tool pipelines, etc.) that will pass once your core implementation is correct.

Seeing it all work

Once the tests pass, take a moment to appreciate what you have built. With about 30 lines of code in run(), you have a working agent loop. Here is what happens when a test runs agent.run("Read test.txt"):

  1. Messages: [User("Read test.txt")]
  2. Provider returns: tool call for read with {"path": "test.txt"}
  3. Agent calls ReadTool::call(), gets file contents
  4. Messages: [User("Read test.txt"), Assistant(tool_call), ToolResult("file content")]
  5. Provider returns: text response
  6. Agent returns the text

The mock provider makes this deterministic and testable. But the exact same loop works with a real LLM provider – you just swap MockProvider for OpenRouterProvider.

Recap

The agent loop is the core of the framework:

  • Generics (<P: Provider>) let it work with any provider.
  • ToolSet (a HashMap of Box<dyn Tool>) gives O(1) tool lookup by name.
  • The builder pattern makes setup ergonomic.
  • Error resilience – tool errors are caught and sent back to the LLM, not propagated. The loop never crashes from a tool failure.
  • The loop is simple: call provider, match on stop_reason, execute tools, feed results back, repeat.

What’s next

Your agent works, but only with the mock provider. In Chapter 6: The OpenRouter Provider you will implement OpenRouterProvider, which talks to a real LLM API over HTTP. This is what turns your agent from a testing harness into a real, usable tool.

Chapter 6: The OpenRouter Provider

Up to now, everything has run locally with the MockProvider. In this chapter you will implement OpenRouterProvider – a provider that talks to a real LLM over HTTP using the OpenAI-compatible chat completions API.

This is the chapter that makes your agent real.

Goal

Implement OpenRouterProvider so that:

  1. It can be created with an API key and model name.
  2. It converts our internal Message and ToolDefinition types to the API format.
  3. It sends HTTP POST requests to the chat completions endpoint.
  4. It parses responses back into AssistantTurn.

Key Rust concepts

Serde derives and attributes

The API types in openrouter.rs are already provided – you do not need to modify them. But understanding them helps:

#![allow(unused)]
fn main() {
#[derive(Serialize, Deserialize, Clone, Debug)]
pub(crate) struct ApiToolCall {
    pub(crate) id: String,
    #[serde(rename = "type")]
    pub(crate) type_: String,
    pub(crate) function: ApiFunction,
}
}

Key serde attributes used:

  • #[serde(rename = "type")] – The JSON field is called "type", but type is a reserved keyword in Rust. So the struct field is type_ and serde renames it during serialization/deserialization.

  • #[serde(skip_serializing_if = "Option::is_none")] – Omits the field from JSON if the value is None. This is important because the API expects certain fields to be absent (not null) when unused.

  • #[serde(skip_serializing_if = "Vec::is_empty")] – Same idea for empty vectors. If there are no tools, we omit the tools field entirely.

The reqwest HTTP client

reqwest is the standard HTTP client crate in Rust. The pattern:

#![allow(unused)]
fn main() {
let response: MyType = client
    .post(url)
    .bearer_auth(&api_key)
    .json(&body)        // serialize body as JSON
    .send()
    .await
    .context("request failed")?
    .error_for_status() // turn 4xx/5xx into errors
    .context("API returned error status")?
    .json()             // deserialize response as JSON
    .await
    .context("failed to parse response")?;
}

Each method returns a builder or future that you chain together. The ? operator propagates errors at each step.

impl Into<String>

Several methods use impl Into<String> as a parameter type:

#![allow(unused)]
fn main() {
pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self
}

This accepts anything that can be converted into a String: String, &str, Cow<str>, etc. Inside the method, call .into() to get the String:

#![allow(unused)]
fn main() {
api_key: api_key.into(),
model: model.into(),
}

dotenvy

The dotenvy crate loads environment variables from a .env file:

#![allow(unused)]
fn main() {
let _ = dotenvy::dotenv(); // loads .env if present, ignores errors
let key = std::env::var("OPENROUTER_API_KEY")?;
}

The let _ = discards the result because it is fine if .env does not exist (the variable might already be in the environment).

The API types

The file mini-claw-code-starter/src/providers/openrouter.rs starts with a block of serde structs. These represent the OpenAI-compatible chat completions API format. Here is a quick summary:

Request types:

  • ChatRequest – the POST body: model name, messages, tools
  • ApiMessage – a single message with role, content, optional tool calls
  • ApiTool / ApiToolDef – tool definition in API format

Response types:

  • ChatResponse – the API response: a list of choices
  • Choice – a single choice containing a message and a finish_reason
  • ResponseMessage – the assistant’s response: optional content, optional tool calls

The finish_reason field on Choice tells you why the model stopped generating. Map it to StopReason in your chat() implementation: "tool_calls" becomes StopReason::ToolUse, anything else becomes StopReason::Stop.
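
A sketch of that mapping, assuming finish_reason arrives as an optional string (the exact field type is defined by the provided serde structs; check openrouter.rs):

```rust
#[derive(Debug, PartialEq)]
enum StopReason { Stop, ToolUse }

// "tool_calls" means the model wants tools; anything else (or a missing
// finish_reason) is treated as a normal stop.
fn map_finish_reason(finish_reason: Option<&str>) -> StopReason {
    match finish_reason {
        Some("tool_calls") => StopReason::ToolUse,
        _ => StopReason::Stop,
    }
}

fn main() {
    assert_eq!(map_finish_reason(Some("tool_calls")), StopReason::ToolUse);
    assert_eq!(map_finish_reason(Some("stop")), StopReason::Stop);
    assert_eq!(map_finish_reason(None), StopReason::Stop);
}
```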

These are already complete. Your job is to implement the methods that use them.

The implementation

Step 1: Implement new()

Initialize all four fields:

#![allow(unused)]
fn main() {
pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self {
    Self {
        client: reqwest::Client::new(),
        api_key: api_key.into(),
        model: model.into(),
        base_url: "https://openrouter.ai/api/v1".into(),
    }
}
}

Step 2: Implement base_url()

A simple builder method that overrides the base URL:

#![allow(unused)]
fn main() {
pub fn base_url(mut self, url: impl Into<String>) -> Self {
    self.base_url = url.into();
    self
}
}

Step 3: Implement from_env_with_model()

  1. Load .env with dotenvy::dotenv() (ignore the result).
  2. Read OPENROUTER_API_KEY from the environment.
  3. Call Self::new() with the key and model.

Use std::env::var("OPENROUTER_API_KEY") and chain .context(...) for a clear error message if the key is missing.

Step 4: Implement from_env()

This is a one-liner that calls from_env_with_model with the default model "openrouter/free". This is a free model on OpenRouter – no credits needed to get started.

Step 5: Implement convert_messages()

This method translates our Message enum into the API’s ApiMessage format. Iterate over the messages and match on each variant:

  • Message::System(text) becomes an ApiMessage with role "system" and content: Some(text.clone()). The other fields are None.

  • Message::User(text) becomes an ApiMessage with role "user" and content: Some(text.clone()). The other fields are None.

  • Message::Assistant(turn) becomes an ApiMessage with role "assistant". Set content to turn.text.clone(). If turn.tool_calls is non-empty, convert each ToolCall to an ApiToolCall:

    #![allow(unused)]
    fn main() {
    ApiToolCall {
        id: c.id.clone(),
        type_: "function".into(),
        function: ApiFunction {
            name: c.name.clone(),
            arguments: c.arguments.to_string(), // Value -> String
        },
    }
    }

    If tool_calls is empty, set tool_calls: None (not Some(vec![])).

  • Message::ToolResult { id, content } becomes an ApiMessage with role "tool", content: Some(content.clone()), and tool_call_id: Some(id.clone()).

Step 6: Implement convert_tools()

Map each &ToolDefinition to an ApiTool:

#![allow(unused)]
fn main() {
ApiTool {
    type_: "function",
    function: ApiToolDef {
        name: t.name,
        description: t.description,
        parameters: t.parameters.clone(),
    },
}
}

Step 7: Implement chat()

This is the main method. It brings everything together:

  1. Build a ChatRequest with the model, converted messages, and converted tools.
  2. POST it to {base_url}/chat/completions with bearer auth.
  3. Parse the response as ChatResponse.
  4. Extract the first choice.
  5. Convert tool_calls back to our ToolCall type.

The tool call conversion is the trickiest part. The API returns function.arguments as a string (JSON-encoded), but our ToolCall stores it as a serde_json::Value. So you need to parse it:

#![allow(unused)]
fn main() {
let arguments = serde_json::from_str(&tc.function.arguments)
    .unwrap_or(Value::Null);
}

The unwrap_or(Value::Null) handles the case where the arguments string is not valid JSON (unlikely with a well-behaved API, but good to be safe).

Here is the skeleton for the chat() method:

#![allow(unused)]
fn main() {
async fn chat(
    &self,
    messages: &[Message],
    tools: &[&ToolDefinition],
) -> anyhow::Result<AssistantTurn> {
    let body = ChatRequest {
        model: &self.model,
        messages: Self::convert_messages(messages),
        tools: Self::convert_tools(tools),
    };

    let response: ChatResponse = self.client
        .post(format!("{}/chat/completions", self.base_url))
        // ... bearer_auth, json, send, error_for_status, json ...
        ;

    let choice = response.choices.into_iter().next()
        .context("no choices in response")?;

    // Convert choice.message.tool_calls to Vec<ToolCall>
    // Map finish_reason to StopReason
    // Return AssistantTurn { text, tool_calls, stop_reason }
    todo!()
}
}

Fill in the HTTP call chain and the response conversion logic.

Running the tests

Run the Chapter 6 tests:

cargo test -p mini-claw-code-starter ch6

The Chapter 6 tests verify the conversion methods (convert_messages and convert_tools), the constructor logic, and the full chat() method using a local mock HTTP server. They do not call a real LLM API, so no API key is needed. There are also additional edge-case tests that will pass once your core implementation is correct.

Optional: Live test

If you want to test with a real API, set up an OpenRouter API key:

  1. Sign up at openrouter.ai.
  2. Create an API key.
  3. Create a .env file in the workspace root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here

Then try building and running the chat example from Chapter 7. But first, finish reading this chapter and move on to Chapter 7 where you wire everything up.

Recap

You have implemented a real HTTP provider that:

  • Constructs from an API key and model name (or from environment variables).
  • Converts between your internal types and the OpenAI-compatible API format.
  • Sends HTTP requests and parses responses.

The key patterns:

  • Serde attributes for JSON field mapping (rename, skip_serializing_if).
  • reqwest for HTTP with a fluent builder API.
  • impl Into<String> for flexible string parameters.
  • dotenvy for loading .env files.

Your agent framework is now complete. Every piece – tools, the agent loop, and the HTTP provider – is implemented and tested.

What’s next

In Chapter 7: A Simple CLI you will wire everything into an interactive CLI with conversation memory.

Chapter 7: A Simple CLI

You have built every component: a mock provider for testing, four tools, the agent loop, and an HTTP provider. Now it is time to wire them all into a working CLI.

Goal

Add a chat() method to SimpleAgent and write examples/chat.rs so that:

  1. The agent remembers the conversation – each prompt builds on the previous ones.
  2. It prints > , reads a line, runs the agent, and prints the result.
  3. It shows a thinking... indicator while the agent works.
  4. It keeps running until the user presses Ctrl+D (EOF).

The chat() method

Open mini-claw-code-starter/src/agent.rs. Below run() you will see the chat() method signature.

Why a new method?

run() creates a fresh Vec<Message> each time it is called. That means the LLM has no memory of previous exchanges. A real CLI should carry context forward, so the LLM can say “I already read that file” or “as I mentioned earlier.”

chat() solves this by accepting the message history from the caller:

#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String>
}

The caller pushes Message::User(…) before calling, and chat() appends the assistant turns. When it returns, messages contains the full conversation history ready for the next round.

The implementation

The loop body is identical to run(). The only differences are:

  1. Use the provided messages instead of creating a new vec.
  2. On StopReason::Stop, clone the text before pushing Message::Assistant(turn) – the push moves turn, so you need the text first.
  3. Push Message::Assistant(turn) so the history includes the final response.
  4. Return the cloned text.

#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
    let defs = self.tools.definitions();

    loop {
        let turn = self.provider.chat(messages, &defs).await?;

        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                // Same tool execution as run() ...
            }
        }
    }
}
}

The ToolUse branch is exactly the same as in run(): execute each tool, collect results, push the assistant turn, push the tool results.

Ownership detail

In run() you could do return Ok(turn.text.unwrap_or_default()) directly because the function was done with turn. In chat() you also need to push Message::Assistant(turn) into the history. Since that push moves turn, you must extract the text first:

#![allow(unused)]
fn main() {
let text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));  // moves turn
return Ok(text);                          // return the clone
}

This is a one-line change from run(), but it matters.

The CLI

Open mini-claw-code-starter/examples/chat.rs. You will see a skeleton with unimplemented!(). Replace it with the full program.

Step 1: Imports

#![allow(unused)]
fn main() {
use mini_claw_code_starter::{
    BashTool, EditTool, Message, OpenRouterProvider, ReadTool, SimpleAgent, WriteTool,
};
use std::io::{self, BufRead, Write};
}

Note the Message import – you need it to build the history vector.

Step 2: Create the provider and agent

#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new());
}

Same as before – nothing new here. (In Chapter 11 you’ll add AskTool here so the agent can ask you clarifying questions.)

Step 3: The system prompt and history vector

#![allow(unused)]
fn main() {
let cwd = std::env::current_dir()?.display().to_string();
let mut history: Vec<Message> = vec![Message::System(format!(
    "You are a coding agent. Help the user with software engineering tasks \
     using all available tools. Be concise and precise.\n\n\
     Working directory: {cwd}"
))];
}

The system prompt is the first message in the history. It tells the LLM what role it should play. Two things to note:

  1. No tool names in the prompt. Tool definitions are sent separately to the API. The system prompt focuses on behavior – be a coding agent, use whatever tools are available, be concise.

  2. Working directory is included. The LLM needs to know where it is so that tool calls like read and bash use correct paths. This is what real coding agents do – Claude Code, OpenCode, and Kimi CLI all inject the current directory (and sometimes platform, date, etc.) into their system prompts.

The history vector lives outside the loop and accumulates every user prompt, assistant response, and tool result across the entire session. The system prompt stays at the front, giving the LLM consistent instructions on every turn.

Step 4: The REPL loop

#![allow(unused)]
fn main() {
let stdin = io::stdin();

loop {
    print!("> ");
    io::stdout().flush()?;

    let mut line = String::new();
    if stdin.lock().read_line(&mut line)? == 0 {
        println!();
        break;
    }

    let prompt = line.trim();
    if prompt.is_empty() {
        continue;
    }

    history.push(Message::User(prompt.to_string()));
    print!("    thinking...");
    io::stdout().flush()?;
    match agent.chat(&mut history).await {
        Ok(text) => {
            print!("\x1b[2K\r");
            println!("{}\n", text.trim());
        }
        Err(e) => {
            print!("\x1b[2K\r");
            println!("error: {e}\n");
        }
    }
}
}

A few things to note:

  • history.push(Message::User(…)) adds the prompt before calling the agent. chat() will append the rest.
  • print!("    thinking...") shows a status while the agent works. The flush()? afterwards is needed because print! (no newline) does not flush stdout automatically – without it, neither the > prompt nor the indicator would appear immediately.
  • \x1b[2K\r is an ANSI escape sequence: “erase entire line, move cursor to column 1.” This clears the thinking... text before printing the response. It also gets cleared automatically when the agent prints a tool summary (since tool_summary() uses the same escape).
  • read_line returns 0 on EOF (Ctrl+D), which breaks the loop.
  • Errors from the agent are printed instead of crashing – this keeps the loop alive even if one request fails.

The main function

Wrap everything in an async main:

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Steps 1-4 go here
    Ok(())
}

The complete program

Putting it all together, the entire program is about 45 lines. That is the beauty of the framework you built – the final assembly is straightforward because each component has a clean interface.

Running the full test suite

Run it with:

cargo test -p mini-claw-code-starter

This runs all tests from chapters 1 through 7. If everything passes, congratulations – your agent framework is complete and fully tested.

What the tests verify

The Chapter 7 tests are integration tests that combine all components:

  • Write-then-read flows: Write a file, read it back, verify contents.
  • Edit flows: Write a file, edit it, read back the result.
  • Multi-tool pipelines: Use bash, write, edit, and read across multiple turns.
  • Long conversations: Five-step tool-call sequences.

There are about 10 integration tests that exercise the full agent pipeline.

Running the chat example

To try it with a real LLM, you need an API key. Create a .env file in the workspace root:

OPENROUTER_API_KEY=sk-or-v1-your-key-here

Then run:

cargo run -p mini-claw-code-starter --example chat

You will get an interactive prompt. Try a multi-turn conversation:

> List the files in the current directory
    thinking...
    [bash: ls]
Cargo.toml  src/  examples/  ...

> What is in Cargo.toml?
    thinking...
    [read: Cargo.toml]
The Cargo.toml contains the package definition for mini-claw-code-starter...

> Add a new dependency for serde
    thinking...
    [read: Cargo.toml]
    [edit: Cargo.toml]
Done! I added serde to the dependencies.

>

Notice how the second prompt (“What is in Cargo.toml?”) works without repeating context – the LLM already knows the directory listing from the first exchange. That is conversation history at work.

Press Ctrl+D (or Ctrl+C) to exit.

What you have built

Let’s step back and look at the complete picture:

examples/chat.rs
    |
    | creates
    v
SimpleAgent<OpenRouterProvider>
    |
    | holds
    +---> OpenRouterProvider (HTTP to LLM API)
    +---> ToolSet (HashMap<String, Box<dyn Tool>>)
              |
              +---> BashTool
              +---> ReadTool
              +---> WriteTool
              +---> EditTool

The chat() method drives the interaction:

User prompt
    |
    v
history: [User, Assistant, ToolResult, ..., User]
    |
    v
Provider.chat() ---HTTP---> LLM API
    |
    | AssistantTurn
    v
Tool calls? ----yes---> Execute tools ---> append to history ---> loop
    |
    no
    |
    v
Append final Assistant to history, return text

In about 300 lines of Rust across all files, you have:

  • A trait-based tool system with JSON schema definitions.
  • A generic agent loop that works with any provider.
  • A mock provider for deterministic testing.
  • An HTTP provider for real LLM APIs.
  • A CLI with conversation memory that ties it all together.

Where to go from here

This framework is intentionally minimal. Here are ideas for extending it:

Streaming responses – Instead of waiting for the full response, stream tokens as they arrive. This means changing chat() to return a Stream instead of a single AssistantTurn.

Token limits – Track token usage and truncate old messages when the context window fills up.

More tools – Add a web search tool, a database query tool, or anything else you can imagine. The Tool trait makes it easy to plug in new capabilities.

A richer UI – Add a spinner animation, markdown rendering, or collapsed tool call display. See mini-claw-code/examples/tui.rs for an example that does all three using termimad.

The foundation you built is solid. Every extension is a matter of adding to the existing patterns, not rewriting them. The Provider trait, the Tool trait, and the agent loop are the building blocks for anything you want to build next.

What’s next

Head to Chapter 8: The Singularity – your agent can now modify its own source code, and we will talk about what that means and where to go from here.

Chapter 8: The Singularity

Your agent can now edit its own source code, which means it can extend itself. From this point on, you don't have to write the code yourself – you can ask the agent to do it.

Extensions

The extension chapters that follow walk through the reference implementation. You don’t need to write the code yourself – read them to understand the design, then let your agent implement them (or do it yourself for practice):

Beyond the extension chapters, here are more ideas to explore:

  • Parallel tool calls – Execute concurrent tool calls with tokio::join!.
  • Token tracking – Truncate old messages when approaching the context limit.
  • More tools – Web search, database queries, HTTP requests. The Tool trait makes it easy.
  • MCP – Expose your tools as an MCP server or connect to external ones.

Chapter 9: A Better TUI

The chat.rs CLI works, but it dumps plain text and shows every tool call. A real coding agent deserves markdown rendering, a thinking spinner, and collapsed tool calls when the agent gets busy.

See mini-claw-code/examples/tui.rs for a reference implementation. It uses:

  • termimad for inline markdown rendering in the terminal.
  • crossterm for raw terminal mode (used by the arrow-key selection UI in Chapter 11).
  • An animated spinner (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) that ticks while the agent thinks.
  • Collapsed tool calls: after 3 tool calls, subsequent ones are collapsed into a ... and N more counter to keep the output clean.

The TUI builds on the AgentEvent stream from StreamingAgent (Chapter 10). The event loop uses tokio::select! to multiplex three sources:

  1. Agent events (AgentEvent::TextDelta, ToolCall, Done, Error) – render streaming text, tool summaries, or final output.
  2. User input requests from AskTool (Chapter 11) – pause the spinner and show a text prompt or arrow-key selection list.
  3. Timer ticks – advance the spinner animation.

This chapter is exposition only – no code to write. Read through examples/tui.rs to see how the pieces fit together, or ask your mini-claw-code agent to build a TUI for you.

Chapter 10: Streaming

In Chapter 6 you built OpenRouterProvider::chat(), which waits for the entire response before returning. That works, but the user stares at a blank screen until every token has been generated. Real coding agents print tokens as they arrive – that is streaming.

This chapter adds streaming support and a StreamingAgent – the streaming counterpart to SimpleAgent. You will:

  1. Define a StreamEvent enum that represents real-time deltas.
  2. Build a StreamAccumulator that collects deltas into a complete AssistantTurn.
  3. Write a parse_sse_line() function that converts raw Server-Sent Events into StreamEvents.
  4. Define a StreamProvider trait – the streaming counterpart to Provider.
  5. Implement StreamProvider for OpenRouterProvider.
  6. Build a MockStreamProvider for testing without HTTP.
  7. Build StreamingAgent<P: StreamProvider> – a full agent loop with real-time text streaming.

None of this touches the Provider trait or SimpleAgent. Streaming is layered on top of the existing architecture.

Why streaming?

Without streaming, a long response (say 500 tokens) makes the CLI feel frozen. Streaming fixes three things:

  • Immediate feedback – the user sees the first word within milliseconds instead of waiting seconds for the full response.
  • Early cancellation – if the agent is heading in the wrong direction, the user can Ctrl-C without waiting for the full response.
  • Progress visibility – watching tokens arrive confirms the agent is working, not stuck.

How SSE works

The OpenAI-compatible API supports streaming via Server-Sent Events (SSE). You set "stream": true in the request, and instead of one big JSON response, the server sends a series of text lines:

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Each line starts with data: followed by a JSON object (or the sentinel [DONE]). The key difference from the non-streaming response: instead of a message field with the complete text, each chunk has a delta field with just the new part. Your code reads these deltas one by one, prints them immediately, and accumulates them into the final result.

Here is the flow:

sequenceDiagram
    participant A as Agent
    participant L as LLM (SSE)
    participant U as User

    A->>L: POST /chat/completions (stream: true)
    L-->>A: data: {"delta":{"content":"Hello"}}
    A->>U: print "Hello"
    L-->>A: data: {"delta":{"content":" world"}}
    A->>U: print " world"
    L-->>A: data: [DONE]
    A->>U: (done)

Tool calls stream the same way, but with tool_calls deltas instead of content deltas. The tool call’s name and arguments arrive in pieces that you concatenate.
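For example, a streamed read call might arrive like this (the shape follows the OpenAI delta format; the id and the fragment boundaries are illustrative):

```json
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","function":{"name":"read","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"pa"}}]},"finish_reason":null}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"th\": \"f.txt\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}
```

Only the first chunk carries the id and name; every later chunk for that call contributes another slice of the arguments string.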

StreamEvent

Open mini-claw-code/src/streaming.rs. The StreamEvent enum is our domain type for streaming deltas:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    /// A chunk of assistant text.
    TextDelta(String),
    /// A new tool call has started.
    ToolCallStart { index: usize, id: String, name: String },
    /// More argument JSON for a tool call in progress.
    ToolCallDelta { index: usize, arguments: String },
    /// The stream is complete.
    Done,
}
}

This is the interface between the SSE parser and the rest of the application. The parser produces StreamEvents; the UI consumes them for display; the accumulator collects them into an AssistantTurn.

StreamAccumulator

The accumulator is a simple state machine. It keeps a running text buffer and a list of partial tool calls. Each feed() call appends to the appropriate place:

#![allow(unused)]
fn main() {
pub struct StreamAccumulator {
    text: String,
    tool_calls: Vec<PartialToolCall>,
}

impl StreamAccumulator {
    pub fn new() -> Self { /* ... */ }
    pub fn feed(&mut self, event: &StreamEvent) { /* ... */ }
    pub fn finish(self) -> AssistantTurn { /* ... */ }
}
}

The implementation is straightforward:

  • TextDelta → append to self.text.
  • ToolCallStart → grow the tool_calls vec if needed, set the id and name at the given index.
  • ToolCallDelta → append to the arguments string at the given index.
  • Done → no-op (we handle completion in finish()).

finish() consumes the accumulator and builds an AssistantTurn:

#![allow(unused)]
fn main() {
pub fn finish(self) -> AssistantTurn {
    let text = if self.text.is_empty() { None } else { Some(self.text) };

    let tool_calls: Vec<ToolCall> = self.tool_calls
        .into_iter()
        .filter(|tc| !tc.name.is_empty())
        .map(|tc| ToolCall {
            id: tc.id,
            name: tc.name,
            arguments: serde_json::from_str(&tc.arguments)
                .unwrap_or(Value::Null),
        })
        .collect();

    let stop_reason = if tool_calls.is_empty() {
        StopReason::Stop
    } else {
        StopReason::ToolUse
    };

    AssistantTurn { text, tool_calls, stop_reason }
}
}

Notice that arguments is accumulated as a raw string and only parsed as JSON at the very end. This is because the API sends argument fragments such as {"pa followed by th": "f.txt"} – neither piece is valid JSON on its own; only the full concatenation is.

Parsing SSE lines

The parse_sse_line() function takes a single line from the SSE stream and returns the StreamEvents it contains – or None for lines that carry no event data:

#![allow(unused)]
fn main() {
pub fn parse_sse_line(line: &str) -> Option<Vec<StreamEvent>> {
    let data = line.strip_prefix("data: ")?;

    if data == "[DONE]" {
        return Some(vec![StreamEvent::Done]);
    }

    let chunk: ChunkResponse = serde_json::from_str(data).ok()?;
    // ... extract events from chunk.choices[0].delta
}
}

The SSE chunk types mirror the OpenAI delta format:

#![allow(unused)]
fn main() {
#[derive(Deserialize)]
struct ChunkResponse { choices: Vec<ChunkChoice> }

#[derive(Deserialize)]
struct ChunkChoice { delta: Delta, finish_reason: Option<String> }

#[derive(Deserialize)]
struct Delta {
    content: Option<String>,
    tool_calls: Option<Vec<DeltaToolCall>>,
}
}

For tool calls, the first chunk includes id and function.name (indicating a new tool call). Subsequent chunks only have function.arguments fragments. The parser emits ToolCallStart when id is present, and ToolCallDelta for non-empty argument strings.

StreamProvider trait

Just as Provider defines the non-streaming interface, StreamProvider defines the streaming one:

#![allow(unused)]
fn main() {
pub trait StreamProvider: Send + Sync {
    fn stream_chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
}

The key difference from Provider::chat() is the tx parameter – an mpsc channel sender. The implementation sends StreamEvents through this channel as they arrive and returns the final accumulated AssistantTurn. This gives callers both real-time events and the complete result.

We keep StreamProvider separate from Provider rather than adding a method to the existing trait. This means SimpleAgent and all existing code are completely unaffected.

Implementing StreamProvider for OpenRouterProvider

The implementation ties together SSE parsing, the accumulator, and the channel:

#![allow(unused)]
fn main() {
impl StreamProvider for OpenRouterProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        // 1. Build request with stream: true
        // 2. Send HTTP request
        // 3. Read response chunks in a loop:
        //    - Buffer incoming bytes
        //    - Split on newlines
        //    - parse_sse_line() each complete line
        //    - feed() each event into the accumulator
        //    - send each event through tx
        // 4. Return acc.finish()
    }
}
}

The buffering detail is important. HTTP responses may arrive in arbitrary byte chunks that do not align with SSE line boundaries. So we maintain a String buffer, append each chunk, and process only complete lines (splitting on \n):

#![allow(unused)]
fn main() {
let mut buffer = String::new();

while let Some(chunk) = resp.chunk().await? {
    buffer.push_str(&String::from_utf8_lossy(&chunk));

    while let Some(newline_pos) = buffer.find('\n') {
        let line = buffer[..newline_pos].trim_end_matches('\r').to_string();
        buffer = buffer[newline_pos + 1..].to_string();

        if line.is_empty() { continue; }

        if let Some(events) = parse_sse_line(&line) {
            for event in events {
                acc.feed(&event);
                let _ = tx.send(event);
            }
        }
    }
}
}

MockStreamProvider

For testing, we need a streaming provider that does not make HTTP calls. MockStreamProvider wraps the existing MockProvider and synthesizes StreamEvents from each canned AssistantTurn:

#![allow(unused)]
fn main() {
pub struct MockStreamProvider {
    inner: MockProvider,
}

impl StreamProvider for MockStreamProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        let turn = self.inner.chat(messages, tools).await?;

        // Synthesize stream events from the complete turn
        if let Some(ref text) = turn.text {
            for ch in text.chars() {
                let _ = tx.send(StreamEvent::TextDelta(ch.to_string()));
            }
        }
        for (i, call) in turn.tool_calls.iter().enumerate() {
            let _ = tx.send(StreamEvent::ToolCallStart {
                index: i, id: call.id.clone(), name: call.name.clone(),
            });
            let _ = tx.send(StreamEvent::ToolCallDelta {
                index: i, arguments: call.arguments.to_string(),
            });
        }
        let _ = tx.send(StreamEvent::Done);

        Ok(turn)
    }
}
}

It sends text one character at a time (simulating token-by-token streaming) and each tool call as a start + delta pair. This lets us test StreamingAgent without any network calls.

StreamingAgent

Now for the main event. StreamingAgent is the streaming counterpart to SimpleAgent. It has the same structure – a provider, a tool set, and an agent loop – but it uses StreamProvider and emits AgentEvent::TextDelta events in real time:

#![allow(unused)]
fn main() {
pub struct StreamingAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
}

impl<P: StreamProvider> StreamingAgent<P> {
    pub fn new(provider: P) -> Self { /* ... */ }
    pub fn tool(mut self, t: impl Tool + 'static) -> Self { /* ... */ }

    pub async fn run(
        &self,
        prompt: &str,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> { /* ... */ }

    pub async fn chat(
        &self,
        messages: &mut Vec<Message>,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> { /* ... */ }
}
}

The chat() method is the heart of the streaming agent. Let us walk through it:

#![allow(unused)]
fn main() {
pub async fn chat(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    let defs = self.tools.definitions();

    loop {
        // 1. Set up a stream channel
        let (stream_tx, mut stream_rx) = mpsc::unbounded_channel();

        // 2. Spawn a forwarder that converts StreamEvent::TextDelta
        //    into AgentEvent::TextDelta for the UI
        let events_clone = events.clone();
        let forwarder = tokio::spawn(async move {
            while let Some(event) = stream_rx.recv().await {
                if let StreamEvent::TextDelta(text) = event {
                    let _ = events_clone.send(AgentEvent::TextDelta(text));
                }
            }
        });

        // 3. Call stream_chat — this streams AND returns the turn
        let turn = self.provider.stream_chat(messages, &defs, stream_tx).await?;
        let _ = forwarder.await;

        // 4. Same stop_reason logic as SimpleAgent
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                let _ = events.send(AgentEvent::Done(text.clone()));
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                // Execute tools, push results, continue loop
                // (same pattern as SimpleAgent)
            }
        }
    }
}
}

The architecture has two channels flowing simultaneously:

flowchart LR
    SC["stream_chat()"] -- "StreamEvent" --> CH["mpsc channel"]
    CH --> FW["forwarder task"]
    FW -- "AgentEvent::TextDelta" --> UI["UI / events channel"]
    SC -- "feeds" --> ACC["StreamAccumulator"]
    ACC -- "finish()" --> TURN["AssistantTurn"]
    TURN --> LOOP["Agent loop"]

The forwarder task is a bridge: it receives raw StreamEvents from the provider and converts TextDelta events into AgentEvent::TextDelta for the UI. This keeps the provider’s streaming protocol separate from the agent’s event protocol.

Notice that AgentEvent now has a TextDelta variant:

#![allow(unused)]
fn main() {
pub enum AgentEvent {
    TextDelta(String),  // NEW — streaming text chunks
    ToolCall { name: String, summary: String },
    Done(String),
    Error(String),
}
}

Using StreamingAgent in the TUI

The TUI example (examples/tui.rs) uses StreamingAgent for the full experience:

#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = Arc::new(
    StreamingAgent::new(provider)
        .tool(BashTool::new())
        .tool(ReadTool::new())
        .tool(WriteTool::new())
        .tool(EditTool::new()),
);
}

The agent is wrapped in Arc so it can be shared with spawned tasks. Each turn spawns the agent and processes events with a spinner:

#![allow(unused)]
fn main() {
let (tx, mut rx) = mpsc::unbounded_channel();
let agent = agent.clone();
let mut msgs = std::mem::take(&mut history);
let handle = tokio::spawn(async move {
    let _ = agent.chat(&mut msgs, tx).await;
    msgs
});

// UI event loop — print TextDeltas, show spinner for tool calls
loop {
    tokio::select! {
        event = rx.recv() => {
            match event {
                Some(AgentEvent::TextDelta(text)) => print!("{text}"),
                Some(AgentEvent::ToolCall { summary, .. }) => { /* spinner */ },
                Some(AgentEvent::Done(_)) => break,
                // ...
            }
        }
        _ = tick.tick() => { /* animate spinner */ }
    }
}
}

Compare this to the SimpleAgent version from Chapter 9: the structure is almost identical. The only difference is that TextDelta events let us print tokens as they arrive instead of waiting for the full Done event.

Running the tests

cargo test -p mini-claw-code ch10

The tests verify:

  • Accumulator: text assembly, tool call assembly, mixed events, empty input, multiple parallel tool calls.
  • SSE parsing: text deltas, tool call start/delta, [DONE], non-data lines, empty deltas, invalid JSON, full multi-line sequences.
  • MockStreamProvider: text responses synthesize char-by-char events; tool call responses synthesize start + delta events.
  • StreamingAgent: text-only responses, tool call loops, and multi-turn chat history – all using MockStreamProvider for deterministic testing.
  • Integration: mock TCP servers that send real SSE responses to stream_chat() and verify both the returned AssistantTurn and the events sent through the channel.

Recap

  • StreamEvent represents real-time deltas: text chunks, tool call starts, argument fragments, and completion.
  • StreamAccumulator collects deltas into a complete AssistantTurn.
  • parse_sse_line() converts raw SSE data: lines into StreamEvents.
  • StreamProvider is the streaming counterpart to Provider – it adds an mpsc channel parameter for real-time events.
  • MockStreamProvider wraps MockProvider to synthesize streaming events for testing.
  • StreamingAgent is the streaming counterpart to SimpleAgent – same tool loop, but with real-time TextDelta events forwarded to the UI.
  • The Provider trait and SimpleAgent are unchanged. Streaming is an additive feature layered on top.

Chapter 11: User Input

Your agent can read files, run commands, and write code – but it can’t ask you a question. If it’s unsure which approach to take, which file to target, or whether to proceed with a destructive operation, it just guesses.

Real coding agents solve this with an ask tool. Claude Code has AskUserQuestion, Kimi CLI has approval prompts. The LLM calls a special tool, the agent pauses, and the user types an answer. The answer goes back as a tool result and execution continues.

In this chapter you’ll build:

  1. An InputHandler trait that abstracts how user input is collected.
  2. An AskTool that the LLM calls to ask the user a question.
  3. Three handler implementations: CLI, channel-based (for TUI), and mock (for tests).

Why a trait?

Different UIs collect input differently:

  • A CLI app prints to stdout and reads from stdin.
  • A TUI app sends a request through a channel and waits for the event loop to collect the answer (maybe with arrow-key selection).
  • Tests need to provide canned answers without any I/O.

The InputHandler trait lets AskTool work with all three without knowing which one it’s using:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait InputHandler: Send + Sync {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String>;
}
}

The question is what the LLM wants to ask. The options slice is an optional list of choices – if empty, the user types free-text. If non-empty, the UI can present a selection list.

AskTool

AskTool implements the Tool trait. It takes an Arc<dyn InputHandler> so the handler can be shared across threads:

#![allow(unused)]
fn main() {
pub struct AskTool {
    definition: ToolDefinition,
    handler: Arc<dyn InputHandler>,
}
}

Tool definition

The LLM needs to know what parameters the tool accepts. question is required (a string). options is optional (an array of strings).

For options, we need a JSON schema for an array type – something param() can’t express since it only handles scalar types. So first, add param_raw() to ToolDefinition:

#![allow(unused)]
fn main() {
/// Add a parameter with a raw JSON schema value.
///
/// Use this for complex types (arrays, nested objects) that `param()` can't express.
pub fn param_raw(mut self, name: &str, schema: Value, required: bool) -> Self {
    self.parameters["properties"][name] = schema;
    if required {
        self.parameters["required"]
            .as_array_mut()
            .unwrap()
            .push(serde_json::Value::String(name.to_string()));
    }
    self
}
}

Now the tool definition uses both param() and param_raw():

#![allow(unused)]
fn main() {
impl AskTool {
    pub fn new(handler: Arc<dyn InputHandler>) -> Self {
        Self {
            definition: ToolDefinition::new(
                "ask_user",
                "Ask the user a clarifying question...",
            )
            .param("question", "string", "The question to ask the user", true)
            .param_raw(
                "options",
                json!({
                    "type": "array",
                    "items": { "type": "string" },
                    "description": "Optional list of choices to present to the user"
                }),
                false,
            ),
            handler,
        }
    }
}
}

Tool::call

The call implementation extracts question, parses options with a helper, and delegates to the handler:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl Tool for AskTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let question = args
            .get("question")
            .and_then(|v| v.as_str())
            .ok_or_else(|| anyhow::anyhow!("missing required parameter: question"))?;

        let options = parse_options(&args);

        self.handler.ask(question, &options).await
    }
}

/// Extract the optional `options` array from tool arguments.
fn parse_options(args: &Value) -> Vec<String> {
    args.get("options")
        .and_then(|v| v.as_array())
        .map(|arr| {
            arr.iter()
                .filter_map(|v| v.as_str().map(String::from))
                .collect()
        })
        .unwrap_or_default()
}
}

The parse_options helper keeps call() focused on the happy path. If options is missing or not an array, it defaults to an empty vec – the handler treats this as free-text input.

Three handlers

CliInputHandler

The simplest handler. Prints the question, lists numbered choices (if any), reads a line from stdin, and resolves numbered answers:

#![allow(unused)]
fn main() {
pub struct CliInputHandler;

#[async_trait::async_trait]
impl InputHandler for CliInputHandler {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
        let question = question.to_string();
        let options = options.to_vec();

        // spawn_blocking because stdin is synchronous
        tokio::task::spawn_blocking(move || {
            // Display the question and numbered choices (if any)
            println!("\n  {question}");
            for (i, opt) in options.iter().enumerate() {
                println!("    {}) {opt}", i + 1);
            }

            // Read the answer
            print!("  > ");
            io::stdout().flush()?;
            let mut line = String::new();
            io::stdin().lock().read_line(&mut line)?;
            let answer = line.trim().to_string();

            // If the user typed a valid option number, resolve it
            Ok(resolve_option(&answer, &options))
        }).await?
    }
}

/// If `answer` is a number matching one of the options, return that option.
/// Otherwise return the raw answer.
fn resolve_option(answer: &str, options: &[String]) -> String {
    if let Ok(n) = answer.parse::<usize>()
        && n >= 1
        && n <= options.len()
    {
        return options[n - 1].clone();
    }
    answer.to_string()
}
}

The resolve_option helper keeps the closure body clean. It uses let-chain syntax (stabilized in Rust 1.88 with the 2024 edition): multiple conditions joined with &&, including let Ok(n) = ... pattern bindings. If the user types "2" and there are three options, it resolves to options[1]. Otherwise the raw text is returned.

Note that the for loop over options simply does nothing when the slice is empty – no special if branch is needed.
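
If your compiler predates let-chain stabilization, the same logic can be written with Option combinators. A sketch equivalent to resolve_option above (the function name is illustrative):

```rust
/// Equivalent to resolve_option, using Option combinators instead of let-chains.
fn resolve_option_compat(answer: &str, options: &[String]) -> String {
    answer
        .parse::<usize>()
        .ok()
        // keep the number only if it indexes a real option (1-based)
        .filter(|&n| n >= 1 && n <= options.len())
        .map(|n| options[n - 1].clone())
        // otherwise fall back to the raw answer text
        .unwrap_or_else(|| answer.to_string())
}
```

The combinator chain reads top to bottom in the same order as the let-chain conditions: parse, bounds-check, index.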

Use this in simple CLI apps like examples/chat.rs:

#![allow(unused)]
fn main() {
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new())
    .tool(AskTool::new(Arc::new(CliInputHandler)));
}

ChannelInputHandler

For TUI apps, input collection happens in the event loop, not in the tool. The ChannelInputHandler bridges the gap with a channel:

#![allow(unused)]
fn main() {
pub struct UserInputRequest {
    pub question: String,
    pub options: Vec<String>,
    pub response_tx: oneshot::Sender<String>,
}

pub struct ChannelInputHandler {
    tx: mpsc::UnboundedSender<UserInputRequest>,
}
}

When ask() is called, it sends a UserInputRequest through the channel and awaits the oneshot response:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl InputHandler for ChannelInputHandler {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
        let (response_tx, response_rx) = oneshot::channel();
        self.tx.send(UserInputRequest {
            question: question.to_string(),
            options: options.to_vec(),
            response_tx,
        })?;
        Ok(response_rx.await?)
    }
}
}

The TUI event loop receives the request and renders it however it likes – a simple text prompt, or an arrow-key-navigable selection list using crossterm in raw terminal mode.
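
The request/response bridge can be sketched with std channels and a thread standing in for the event loop. This is a synchronous analogue, not the book's tokio-based types – the real ChannelInputHandler uses an unbounded mpsc sender plus a oneshot channel:

```rust
use std::sync::mpsc;
use std::thread;

// Std-only stand-in for UserInputRequest: carries the question plus a
// channel the event loop uses to send the answer back.
struct InputRequest {
    question: String,
    response_tx: mpsc::Sender<String>,
}

/// Handler side: send a request, block until the event loop answers.
fn ask_via_channel(tx: &mpsc::Sender<InputRequest>, question: &str) -> String {
    let (response_tx, response_rx) = mpsc::channel();
    tx.send(InputRequest {
        question: question.to_string(),
        response_tx,
    })
    .expect("event loop gone");
    response_rx.recv().expect("no response")
}

/// Event-loop side: receive requests and answer them (canned reply here;
/// a real TUI would render the question and collect keyboard input).
fn spawn_fake_event_loop(rx: mpsc::Receiver<InputRequest>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for req in rx {
            let answer = format!("answer to: {}", req.question);
            let _ = req.response_tx.send(answer);
        }
    })
}
```

The key property is the same as in the async version: the asking side owns no UI; it just parks on the response channel until the loop replies.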

MockInputHandler

For tests, pre-configure answers in a queue:

#![allow(unused)]
fn main() {
pub struct MockInputHandler {
    answers: Mutex<VecDeque<String>>,
}

#[async_trait::async_trait]
impl InputHandler for MockInputHandler {
    async fn ask(&self, _question: &str, _options: &[String]) -> anyhow::Result<String> {
        self.answers.lock().await.pop_front()
            .ok_or_else(|| anyhow::anyhow!("MockInputHandler: no more answers"))
    }
}
}

This follows the same pattern as MockProvider – pop from the front, error when empty. Note that this uses tokio::sync::Mutex (with .lock().await), not std::sync::Mutex. The reason: ask() is an async fn, and the lock guard must be held across the .await boundary. A std::sync::Mutex guard is !Send, so holding it across .await won’t compile. tokio::sync::Mutex produces a Send-safe guard that works in async contexts. Compare this with MockProvider from Chapter 1, which uses std::sync::Mutex because its chat() method doesn’t hold the guard across an .await.
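
When no guard is held across an .await, the std mutex is fine – here is a synchronous sketch of the same pop-front queue (an illustrative type, not the book's async MockInputHandler):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Synchronous sketch of the mock's answer queue. The real MockInputHandler
// needs tokio::sync::Mutex because its ask() method is async.
struct AnswerQueue {
    answers: Mutex<VecDeque<String>>,
}

impl AnswerQueue {
    fn new(answers: &[&str]) -> Self {
        Self {
            answers: Mutex::new(answers.iter().map(|s| s.to_string()).collect()),
        }
    }

    /// Pop the next pre-configured answer; error string when exhausted.
    fn next(&self) -> Result<String, String> {
        self.answers
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "no more answers".to_string())
    }
}
```

Same pop-front, error-when-empty behavior – only the mutex flavor differs.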

Tool summary

Update tool_summary() in agent.rs to display "question" for ask_user calls in the terminal output:

#![allow(unused)]
fn main() {
let detail = call.arguments
    .get("command")
    .or_else(|| call.arguments.get("path"))
    .or_else(|| call.arguments.get("question"))  // <-- new
    .and_then(|v| v.as_str());
}

Plan mode integration

ask_user is read-only – it collects information without mutating anything. Add it to PlanAgent’s default read_only set (see Chapter 12) so the LLM can ask questions during planning:

#![allow(unused)]
fn main() {
read_only: HashSet::from(["bash", "read", "ask_user"]),
}

Wiring it up

Add the module to mini-claw-code/src/tools/mod.rs:

#![allow(unused)]
fn main() {
mod ask;
pub use ask::*;
}

And re-export from lib.rs:

#![allow(unused)]
fn main() {
pub use tools::{
    AskTool, BashTool, ChannelInputHandler, CliInputHandler,
    EditTool, InputHandler, MockInputHandler, ReadTool,
    UserInputRequest, WriteTool,
};
}

Running the tests

cargo test -p mini-claw-code ch11

The tests verify:

  • Tool definition: schema has question (required) and options (optional array).
  • Question only: MockInputHandler returns answer for a question-only call.
  • With options: tool passes options to the handler correctly.
  • Missing question: missing question argument returns an error.
  • Handler exhausted: empty MockInputHandler returns an error.
  • Agent loop: LLM calls ask_user, gets an answer, then returns final text.
  • Ask then tool: ask_user followed by another tool call (e.g. read).
  • Multiple asks: two sequential ask_user calls with different answers.
  • Channel roundtrip: ChannelInputHandler sends request and receives response via oneshot channel.
  • param_raw: param_raw() adds array parameter to ToolDefinition correctly.

Recap

  • InputHandler trait abstracts input collection across CLI, TUI, and tests.
  • AskTool lets the LLM pause execution and ask the user a question.
  • param_raw() extends ToolDefinition to support complex JSON schema types like arrays.
  • Three handlers: CliInputHandler for simple apps, ChannelInputHandler for TUI apps, MockInputHandler for tests.
  • Plan mode: ask_user is read-only by default, so it works during planning.
  • Purely additive: no changes to SimpleAgent, StreamingAgent, or any existing tool.

Chapter 12: Plan Mode

Real coding agents can be dangerous. Give an LLM access to write, edit, and bash and it might rewrite your config, delete a file, or run a destructive command – all before you’ve had a chance to review what it’s doing.

Plan mode solves this with a two-phase workflow:

  1. Plan – the agent explores the codebase using read-only tools (read, bash, and ask_user). It cannot write, edit, or mutate anything. It returns a plan describing what it intends to do.
  2. Execute – after the user reviews and approves the plan, the agent runs again with all tools available.

This is exactly how Claude Code’s plan mode works. In this chapter you’ll build PlanAgent – a streaming agent with caller-driven approval gating.

You will:

  1. Build PlanAgent<P: StreamProvider> with plan() and execute() methods.
  2. Inject a system prompt that tells the LLM it’s in planning mode.
  3. Add an exit_plan tool the LLM calls when its plan is ready.
  4. Implement double defense: definition filtering and an execution guard.
  5. Let the caller drive the approval flow between phases.

Why plan mode?

Consider this scenario:

User: "Refactor auth.rs to use JWT instead of session cookies"

Agent (no plan mode):
  → calls write("auth.rs", ...) immediately
  → rewrites half your auth system
  → you didn't want that approach at all

With plan mode:

User: "Refactor auth.rs to use JWT instead of session cookies"

Agent (plan phase):
  → calls read("auth.rs") to understand current code
  → calls bash("grep -r 'session' src/") to find related files
  → calls exit_plan to submit its plan
  → "Plan: Replace SessionStore with JwtProvider in 3 files..."

User: "Looks good, go ahead."

Agent (execute phase):
  → calls write/edit with the approved changes

The key insight: the same agent loop works for both phases. The only difference is which tools are available.

Design

PlanAgent has the same shape as StreamingAgent – a provider, a ToolSet, and an agent loop. Three additions make it a planning agent:

  1. A HashSet<&'static str> recording which tools are allowed during planning.
  2. A system prompt injected at the start of the planning phase.
  3. An exit_plan tool definition the LLM calls when its plan is ready.
#![allow(unused)]
fn main() {
pub struct PlanAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
    read_only: HashSet<&'static str>,
    plan_system_prompt: String,
    exit_plan_def: ToolDefinition,
}
}

Two public methods drive the two phases:

  • plan() – injects the system prompt, runs the agent loop with only read-only tools and exit_plan visible.
  • execute() – runs the agent loop with all tools visible.

Both delegate to a private run_loop() that takes an optional tool filter.

The builder

Construction follows the same builder pattern as SimpleAgent and StreamingAgent:

#![allow(unused)]
fn main() {
impl<P: StreamProvider> PlanAgent<P> {
    pub fn new(provider: P) -> Self {
        Self {
            provider,
            tools: ToolSet::new(),
            read_only: HashSet::from(["bash", "read", "ask_user"]),
            plan_system_prompt: DEFAULT_PLAN_PROMPT.to_string(),
            exit_plan_def: ToolDefinition::new(
                "exit_plan",
                "Signal that your plan is complete and ready for user review. \
                 Call this when you have finished exploring and are ready to \
                 present your plan.",
            ),
        }
    }

    pub fn tool(mut self, t: impl Tool + 'static) -> Self {
        self.tools.push(t);
        self
    }

    pub fn read_only(mut self, names: &[&'static str]) -> Self {
        self.read_only = names.iter().copied().collect();
        self
    }

    pub fn plan_prompt(mut self, prompt: impl Into<String>) -> Self {
        self.plan_system_prompt = prompt.into();
        self
    }
}
}

By default, bash, read, and ask_user are read-only. (Chapter 11 added ask_user so the LLM can ask clarifying questions during planning.) The .read_only() method lets callers override this – for example, to exclude bash from planning if you want a stricter mode.

The .plan_prompt() method lets callers override the system prompt – useful for specialized agents like security auditors or code reviewers.

System prompt

The LLM needs to know it’s in planning mode. Without this, it will try to accomplish the task with whatever tools it sees, rather than producing a deliberate plan.

plan() injects a system prompt at the start of the conversation:

#![allow(unused)]
fn main() {
const DEFAULT_PLAN_PROMPT: &str = "\
You are in PLANNING MODE. Explore the codebase using the available tools and \
create a plan. You can read files, run shell commands, and ask the user \
questions — but you CANNOT write, edit, or create files.\n\n\
When your plan is ready, call the `exit_plan` tool to submit it for review.";
}

The injection is conditional – if the caller already provided a System message, plan() respects it:

#![allow(unused)]
fn main() {
pub async fn plan(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    if !messages
        .first()
        .is_some_and(|m| matches!(m, Message::System(_)))
    {
        messages.insert(0, Message::System(self.plan_system_prompt.clone()));
    }
    self.run_loop(messages, Some(&self.read_only), events).await
}
}

This means:

  • First call: no system message → inject the plan prompt.
  • Re-plan call: system message already there → skip.
  • Caller provided their own: caller’s system message → respect it.
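
The three cases can be exercised in isolation with a simplified message enum standing in for the book's Message type:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    System(String),
    User(String),
}

/// Insert a system prompt at position 0 unless one is already there –
/// the same conditional injection plan() performs.
fn inject_plan_prompt(messages: &mut Vec<Msg>, prompt: &str) {
    if !matches!(messages.first(), Some(Msg::System(_))) {
        messages.insert(0, Msg::System(prompt.to_string()));
    }
}
```

First call injects; a re-plan call sees the existing System message and skips; a caller-provided System message is never overwritten.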

This is how real agents work. Claude Code switches its system prompt when entering plan mode. OpenCode uses entirely separate agent configurations with different system prompts for plan vs build agents.

The exit_plan tool

Without exit_plan, the planning phase ends when the LLM returns StopReason::Stop – the same way any conversation ends. This is ambiguous: did the LLM finish planning, or did it just stop talking?

Real agents solve this with an explicit signal. Claude Code has ExitPlanMode. OpenCode has exit_plan. The LLM calls the tool to say “my plan is ready for review.”

In PlanAgent, exit_plan is a tool definition stored on the struct – not registered in the ToolSet. This means:

  • During plan: exit_plan is injected into the tool list alongside read-only tools. The LLM can see and call it.
  • During execute: exit_plan is not in the tool list. The LLM doesn’t know it exists.

When the agent loop sees an exit_plan call, it returns immediately with the plan text (the LLM’s text from that turn):

#![allow(unused)]
fn main() {
// Handle exit_plan: signal plan completion
if allowed.is_some() && call.name == "exit_plan" {
    results.push((call.id.clone(), "Plan submitted for review.".into()));
    exit_plan = true;
    continue;
}
}

After the tool-call loop, plan_text captures the LLM’s text from this turn (the plan itself), and the turn is pushed onto the message history:

#![allow(unused)]
fn main() {
let plan_text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));
}

If exit_plan was among the tool calls, we’re done:

#![allow(unused)]
fn main() {
if exit_plan {
    let _ = events.send(AgentEvent::Done(plan_text.clone()));
    return Ok(plan_text);
}
}

The planning phase now has two exit paths:

  1. StopReason::Stop – LLM stops naturally (backward compatible).
  2. exit_plan tool call – LLM explicitly signals plan completion.

Both work. The exit_plan path is better because it’s unambiguous.

Double defense

Tool filtering still uses two layers of protection:

Layer 1: Definition filtering

During plan(), only read-only tool definitions plus exit_plan are sent to the LLM. The model literally cannot see write or edit in its tool list:

#![allow(unused)]
fn main() {
let all_defs = self.tools.definitions();
let defs: Vec<&ToolDefinition> = match allowed {
    Some(names) => {
        let mut filtered: Vec<&ToolDefinition> = all_defs
            .into_iter()
            .filter(|d| names.contains(d.name))
            .collect();
        filtered.push(&self.exit_plan_def);
        filtered
    }
    None => all_defs,
};
}

During execute(), allowed is None, so all registered tools are sent – and exit_plan is not included.
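
The filtering logic can be checked on its own with a simplified definition type (illustrative names, not the book's ToolDefinition):

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
struct Def {
    name: &'static str,
}

/// During plan (Some): read-only defs plus exit_plan.
/// During execute (None): all defs, no exit_plan.
fn visible_defs<'a>(
    all: &'a [Def],
    allowed: Option<&HashSet<&str>>,
    exit_plan: &'a Def,
) -> Vec<&'a Def> {
    match allowed {
        Some(names) => {
            let mut filtered: Vec<&Def> =
                all.iter().filter(|d| names.contains(d.name)).collect();
            filtered.push(exit_plan);
            filtered
        }
        None => all.iter().collect(),
    }
}
```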

Layer 2: Execution guard

If the LLM somehow hallucinated a blocked tool call, the execution guard catches it and returns an error ToolResult instead of executing the tool:

#![allow(unused)]
fn main() {
if let Some(names) = allowed
    && !names.contains(call.name.as_str())
{
    results.push((
        call.id.clone(),
        format!(
            "error: tool '{}' is not available in planning mode",
            call.name
        ),
    ));
    continue;
}
}

The error goes back to the LLM as a tool result, so it learns the tool is blocked and adjusts its behavior. The file is never touched.
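
Layer 2 in isolation – a sketch of the guard check as a standalone function (the function name is illustrative):

```rust
use std::collections::HashSet;

/// Return Err with a tool-result error string if `name` is blocked
/// under the current filter; Ok means execution may proceed.
fn check_allowed(allowed: Option<&HashSet<&str>>, name: &str) -> Result<(), String> {
    if let Some(names) = allowed {
        if !names.contains(name) {
            return Err(format!(
                "error: tool '{name}' is not available in planning mode"
            ));
        }
    }
    Ok(())
}
```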

The shared agent loop

Both plan() and execute() delegate to run_loop(). The only parameter that differs is allowed:

#![allow(unused)]
fn main() {
pub async fn plan(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    // System prompt injection (shown earlier)
    self.run_loop(messages, Some(&self.read_only), events).await
}

pub async fn execute(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    self.run_loop(messages, None, events).await
}
}

plan() passes Some(&self.read_only) to restrict tools. execute() passes None to allow everything.

The run_loop itself is identical to StreamingAgent::chat() from Chapter 10, with these additions:

  1. Tool definition filtering (read-only + exit_plan during plan; all during execute).
  2. The exit_plan handler that breaks the loop when the LLM signals plan completion.
  3. The execution guard for blocked tools.

Caller-driven approval flow

The approval flow lives entirely in the caller. PlanAgent does not ask for approval – it just runs whichever phase is called. This keeps the agent simple and lets the caller implement any approval UX they want.

Here is the typical flow:

#![allow(unused)]
fn main() {
let agent = PlanAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new())
    .tool(BashTool::new());

let mut messages = vec![Message::User("Refactor auth.rs".into())];

// Phase 1: Plan (read-only tools + exit_plan)
let (tx, _rx) = mpsc::unbounded_channel(); // _rx is dropped: streaming events are discarded here
let plan = agent.plan(&mut messages, tx).await?;
println!("Plan: {plan}");

// Show plan to user, get approval
if user_approves() {
    // Phase 2: Execute (all tools)
    messages.push(Message::User("Approved. Execute the plan.".into()));
    let (tx2, _rx2) = mpsc::unbounded_channel();
    let result = agent.execute(&mut messages, tx2).await?;
    println!("Result: {result}");
} else {
    // Re-plan with feedback
    messages.push(Message::User("No, try a different approach.".into()));
    let (tx3, _rx3) = mpsc::unbounded_channel();
    let revised_plan = agent.plan(&mut messages, tx3).await?;
    println!("Revised plan: {revised_plan}");
}
}

Notice how the same messages vec is shared across phases. This is critical – the LLM sees its own plan, the user’s approval (or rejection), and all previous context when it enters the execute phase. Re-planning is just pushing feedback as a User message and calling plan() again.

sequenceDiagram
    participant C as Caller
    participant P as PlanAgent
    participant L as LLM

    C->>P: plan(&mut messages)
    P->>L: [read, bash, ask_user, exit_plan tools only]
    L-->>P: reads files, calls exit_plan
    P-->>C: "Plan: ..."

    C->>C: User reviews plan

    alt Approved
        C->>P: execute(&mut messages)
        P->>L: [all tools]
        L-->>P: writes/edits files
        P-->>C: "Done."
    else Rejected
        C->>P: plan(&mut messages) [with feedback]
        P->>L: [read, bash, ask_user, exit_plan tools only]
        L-->>P: revised plan
        P-->>C: "Revised plan: ..."
    end

Wiring it up

Add the module to mini-claw-code/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod planning;
// ...
pub use planning::PlanAgent;
}

That’s it. Like streaming, plan mode is a purely additive feature – no existing code is modified.

Running the tests

cargo test -p mini-claw-code ch12

The tests verify:

  • Text response: plan() returns text when the LLM stops immediately.
  • Read tool allowed: read executes during planning.
  • Write tool blocked: write is blocked during planning; the file is NOT created; an error ToolResult is sent back to the LLM.
  • Edit tool blocked: same behavior for edit.
  • Execute allows write: write works during execution; the file IS created.
  • Full plan-then-execute: end-to-end flow – plan reads a file, approval, execute writes a file.
  • Message continuity: messages from the plan phase carry into the execute phase, including the injected system prompt.
  • read_only override: .read_only(&["read"]) excludes bash from planning.
  • Streaming events: TextDelta and Done events are emitted during planning.
  • Provider error: empty mock propagates errors correctly.
  • Builder pattern: chained .tool().read_only().plan_prompt() compiles.
  • System prompt injection: plan() injects a system prompt at position 0.
  • System prompt not duplicated: calling plan() twice doesn’t add a second system message.
  • Caller system prompt respected: if the caller provides a System message, plan() doesn’t overwrite it.
  • exit_plan tool: the LLM calls exit_plan to signal plan completion; plan() returns the plan text.
  • exit_plan not in execute: during execute(), exit_plan is not in the tool list.
  • Custom plan prompt: .plan_prompt(...) overrides the default.
  • Full flow with exit_plan: plan reads file → calls exit_plan → approve → execute writes file.

Recap

  • PlanAgent separates planning (read-only) from execution (all tools) using a single shared agent loop.
  • System prompt: plan() injects a system message telling the LLM it’s in planning mode — what tools are available, what’s blocked, and that it should call exit_plan when done.
  • exit_plan tool: the LLM explicitly signals plan completion, just like Claude Code’s ExitPlanMode. This is injected during planning and invisible during execution.
  • Double defense: definition filtering prevents the LLM from seeing blocked tools; an execution guard catches hallucinated calls.
  • Caller-driven approval: the agent doesn’t manage approval – the caller pushes approval/rejection as User messages and calls the appropriate phase.
  • Message continuity: the same messages vec flows through both phases, giving the LLM full context.
  • Streaming: both phases use StreamProvider and emit AgentEvents, just like StreamingAgent.
  • Purely additive: no changes to SimpleAgent, StreamingAgent, or any existing code.

Chapter 13: Subagents

Complex tasks are hard. Even the best LLM struggles when a single prompt asks it to research a codebase, design an approach, write the code, and verify the result – all while maintaining a coherent conversation. The context window fills up, the model loses focus, and quality degrades.

Subagents solve this with decomposition: the parent agent spawns a child agent for each subtask. The child has its own message history and tools, runs to completion, and returns a summary. The parent sees only the final answer – a clean, focused result without the noise of the child’s internal reasoning.

This is exactly how Claude Code’s Task tool works. When Claude Code needs to explore a large codebase or handle an independent subtask, it spawns a subagent that does the work and reports back. OpenCode and the Anthropic Agent SDK use the same pattern.

In this chapter you’ll build SubagentTool – a Tool implementation that spawns ephemeral child agents.

You will:

  1. Add a blanket impl Provider for Arc<P> so parent and child can share a provider.
  2. Build SubagentTool<P: Provider> with a closure-based tool factory and builder methods.
  3. Implement the Tool trait with an inlined agent loop and turn limit.
  4. Wire it up as a module and re-export.

Why subagents?

Consider this scenario:

User: "Add error handling to all API endpoints"

Agent (no subagents):
  → reads 15 files, context window fills up
  → forgets what it learned from file 3
  → produces inconsistent changes

Agent (with subagents):
  → spawns child: "Add error handling to /api/users.rs"
  → child reads 1 file, writes changes, returns "Done: added Result types"
  → spawns child: "Add error handling to /api/posts.rs"
  → child does the same
  → parent sees clean summaries, coordinates the overall task

The key insight: a subagent is just a Tool. It takes a task description as input, does work internally, and returns a string result. The parent’s agent loop doesn’t need any special handling – it calls the subagent tool the same way it calls read or bash.

Provider sharing with Arc<P>

The parent and child need to use the same LLM provider. In production this means sharing an HTTP client, API key, and configuration. Cloning the provider would duplicate connections. We want to share it cheaply.

The answer is Arc<P>. But there’s a catch: our Provider trait uses RPITIT (return-position impl Trait in trait), which means it’s not object-safe. We can’t use dyn Provider. We can use Arc<P> where P: Provider – but only if Arc<P> itself implements Provider.

A blanket impl makes this work. In types.rs:

#![allow(unused)]
fn main() {
impl<P: Provider> Provider for Arc<P> {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a {
        (**self).chat(messages, tools)
    }
}
}

This delegates to the inner P via deref. Now Arc<MockProvider> and Arc<OpenRouterProvider> are both valid providers. Existing code is completely unchanged – if you were passing MockProvider before, it still works. The Arc wrapper is opt-in.
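
The delegation pattern works for any trait; here is a minimal synchronous analogue you can verify in isolation (a toy trait, not the book's Provider):

```rust
use std::sync::Arc;

trait Describe {
    fn describe(&self) -> String;
}

struct Model;
impl Describe for Model {
    fn describe(&self) -> String {
        "model".to_string()
    }
}

// Blanket impl: Arc<D> delegates to the inner D via deref, so shared
// ownership is opt-in and existing impls keep working unchanged.
impl<D: Describe> Describe for Arc<D> {
    fn describe(&self) -> String {
        (**self).describe()
    }
}

fn takes_impl(d: impl Describe) -> String {
    d.describe()
}
```

Both Model and Arc<Model> now satisfy the same bound – exactly the property the blanket Provider impl buys us.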

The SubagentTool struct

#![allow(unused)]
fn main() {
pub struct SubagentTool<P: Provider> {
    provider: Arc<P>,
    tools_factory: Box<dyn Fn() -> ToolSet + Send + Sync>,
    system_prompt: Option<String>,
    max_turns: usize,
    definition: ToolDefinition,
}
}

Three design decisions here:

Arc<P> for the provider. Parent creates Arc::new(provider), keeps a clone for itself, and passes a clone to SubagentTool. Both share the same underlying provider. Cheap, safe, no cloning of HTTP clients.

A closure factory for tools. Tools are Box<dyn Tool> – they’re not cloneable. Each child spawn needs a fresh ToolSet. A Fn() -> ToolSet closure produces one on demand. This naturally captures Arcs for shared state:

#![allow(unused)]
fn main() {
let provider = Arc::new(OpenRouterProvider::from_env()?);

SubagentTool::new(provider, || {
    ToolSet::new()
        .with(ReadTool::new())
        .with(WriteTool::new())
        .with(BashTool::new())
})
}

A max_turns safety limit. Without this, a confused child could loop forever. Defaults to 10 – generous enough for real tasks, strict enough to prevent runaway loops.
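
The factory pattern in isolation – each call produces a fresh, independently owned set. These are toy types; the real factory returns a ToolSet of Box&lt;dyn Tool&gt;, which is exactly why cloning is off the table:

```rust
// Toy stand-in for a non-cloneable tool collection.
struct FakeToolSet {
    names: Vec<String>,
}

/// Build a boxed factory closure, as SubagentTool stores internally.
fn make_factory() -> Box<dyn Fn() -> FakeToolSet + Send + Sync> {
    Box::new(|| FakeToolSet {
        names: vec!["read".to_string(), "bash".to_string()],
    })
}
```

Each child spawn calls the factory once and gets a set nobody else holds.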

The builder

Construction uses the same fluent builder style as elsewhere in the codebase:

#![allow(unused)]
fn main() {
impl<P: Provider> SubagentTool<P> {
    pub fn new(
        provider: Arc<P>,
        tools_factory: impl Fn() -> ToolSet + Send + Sync + 'static,
    ) -> Self {
        Self {
            provider,
            tools_factory: Box::new(tools_factory),
            system_prompt: None,
            max_turns: 10,
            definition: ToolDefinition::new(
                "subagent",
                "Spawn a child agent to handle a subtask independently. \
                 The child has its own message history and tools.",
            )
            .param(
                "task",
                "string",
                "A clear description of the subtask for the child agent to complete.",
                true,
            ),
        }
    }

    pub fn system_prompt(mut self, prompt: impl Into<String>) -> Self {
        self.system_prompt = Some(prompt.into());
        self
    }

    pub fn max_turns(mut self, max: usize) -> Self {
        self.max_turns = max;
        self
    }
}
}

The tool definition exposes a single task parameter – the LLM writes a clear description of what the child should do. Minimal and effective.

The Tool implementation

The core of SubagentTool is its Tool::call() method. It inlines a minimal agent loop – the same protocol as SimpleAgent::chat() (call provider, execute tools, loop), but with a turn limit, no terminal output, and a locally-owned message vec:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl<P: Provider + 'static> Tool for SubagentTool<P> {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let task = args
            .get("task")
            .and_then(|v| v.as_str())
            .ok_or_else(|| anyhow::anyhow!("missing required parameter: task"))?;

        let tools = (self.tools_factory)();
        let defs = tools.definitions();

        let mut messages = Vec::new();
        if let Some(ref prompt) = self.system_prompt {
            messages.push(Message::System(prompt.clone()));
        }
        messages.push(Message::User(task.to_string()));

        for _ in 0..self.max_turns {
            let turn = self.provider.chat(&messages, &defs).await?;

            match turn.stop_reason {
                StopReason::Stop => {
                    return Ok(turn.text.unwrap_or_default());
                }
                StopReason::ToolUse => {
                    let mut results = Vec::with_capacity(turn.tool_calls.len());
                    for call in &turn.tool_calls {
                        let content = match tools.get(&call.name) {
                            Some(t) => t
                                .call(call.arguments.clone())
                                .await
                                .unwrap_or_else(|e| format!("error: {e}")),
                            None => format!("error: unknown tool `{}`", call.name),
                        };
                        results.push((call.id.clone(), content));
                    }
                    messages.push(Message::Assistant(turn));
                    for (id, content) in results {
                        messages.push(Message::ToolResult { id, content });
                    }
                }
            }
        }

        Ok("error: max turns exceeded".to_string())
    }
}
}

A few things to notice:

No tokio::spawn. The child runs within the parent’s Tool::call() future. This is deliberate – spawning a background task would add coordination complexity (channels, join handles, cancellation). Running inline keeps things simple and deterministic.

Fresh message history. The child starts with only a system prompt (optional) and the task as a User message. It never sees the parent’s conversation. When the child finishes, only its final text is returned to the parent as a tool result. The child’s internal messages are dropped.

Turn limit as a soft error. When max_turns is exceeded, the tool returns an error string rather than Err(...). This lets the parent LLM see the failure and decide what to do (retry with a simpler task, try a different approach, etc.), rather than crashing the entire agent loop.

Provider errors propagate. If the LLM API fails during a child turn, the error bubbles up through ? to the parent. This is intentional – API errors are infrastructure failures, not task failures.
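
The loop's control flow can be checked synchronously with a stub provider. This is an illustrative enum and function, not the book's async types:

```rust
// Simplified, synchronous stand-in for one provider turn.
enum Turn {
    Stop(String),
    ToolUse,
}

/// Run until the provider stops or the turn budget is spent.
/// Exceeding the budget is a soft error: a string the parent LLM can read.
fn run_child(mut next_turn: impl FnMut() -> Turn, max_turns: usize) -> String {
    for _ in 0..max_turns {
        match next_turn() {
            Turn::Stop(text) => return text,
            Turn::ToolUse => { /* execute tools, push results, loop again */ }
        }
    }
    "error: max turns exceeded".to_string()
}
```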

Wiring it up

Add the module and re-export in mini-claw-code/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod subagent;
// ...
pub use subagent::SubagentTool;
}

Usage example

Here’s how you’d wire up a parent agent with a subagent tool:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use mini_claw_code::*;

let provider = Arc::new(OpenRouterProvider::from_env()?);
let p = provider.clone();

let agent = SimpleAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(BashTool::new())
    .tool(SubagentTool::new(p, || {
        ToolSet::new()
            .with(ReadTool::new())
            .with(WriteTool::new())
            .with(BashTool::new())
    }));

let result = agent.run("Refactor the auth module").await?;
}

The parent LLM sees subagent in its tool list alongside read, write, and bash. When the task is complex enough, the LLM can choose to delegate via subagent – or handle it directly with the other tools. The LLM decides.

You can also give the child a specialized system prompt:

#![allow(unused)]
fn main() {
SubagentTool::new(provider, || {
    ToolSet::new()
        .with(ReadTool::new())
        .with(BashTool::new())
})
.system_prompt("You are a security auditor. Review code for vulnerabilities.")
.max_turns(15)
}

Running the tests

cargo test -p mini-claw-code ch13

The tests verify:

  • Text response: child returns text immediately (no tool calls).
  • With tool: child uses ReadTool before answering.
  • Multi-step: child makes multiple tool calls across turns.
  • Max turns exceeded: turn limit enforced, returns error string.
  • Missing task: error on missing task parameter.
  • Provider error: child provider error propagates to parent.
  • Unknown tool: child handles unknown tools gracefully.
  • Builder pattern: chaining .system_prompt().max_turns() compiles.
  • System prompt: child runs correctly with a system prompt configured.
  • Write tool: child writes a file, parent continues afterward.
  • Parent continues: parent resumes its own work after subagent completes.
  • Isolated history: child messages don’t leak into parent’s message vec.

Recap

  • SubagentTool is a Tool that spawns ephemeral child agents. The parent sees only the final answer.
  • Arc<P> blanket impl lets parent and child share a provider without cloning. Fully backward-compatible.
  • Closure factory produces a fresh ToolSet per child spawn, since Box<dyn Tool> isn’t cloneable.
  • Inlined agent loop with max_turns guard keeps SimpleAgent unchanged. No tokio::spawn needed – the child runs within Tool::call().
  • Message isolation: the child’s internal messages are local to the call() future. Only the final text crosses back to the parent.
  • Single task parameter: the LLM writes a clear task description; the child handles the rest.
  • Purely additive: the only existing change is the blanket impl in types.rs. Everything else is new code.

Chapter 14: Token Tracking

Every call to an LLM costs money. A single agent run might loop ten or twenty times, reading files, running commands, and editing code. Without tracking how many tokens you are spending, costs can silently spiral – especially during development when you are iterating fast. Claude Code shows a running token count and cost estimate at the bottom of every session for exactly this reason.

In this chapter you will build CostTracker, a struct that accumulates token usage across turns and computes an estimated cost. You will also see how the OpenAI-compatible API reports usage in its response JSON, and how our OpenRouterProvider already parses it into a TokenUsage struct on AssistantTurn.

Why track tokens?

There are two practical reasons:

  1. Cost control. LLM APIs charge per token. If your agent enters a loop that keeps reading large files, the bill adds up fast. A cost tracker lets you display a running total, set budgets, or abort early.

  2. Context window awareness. Every model has a maximum context length. As the conversation grows, input tokens increase with each turn (because you resend the full history). Tracking input tokens gives you a signal for when you are approaching the limit and might need to summarize or truncate.

How APIs report usage

OpenAI-compatible APIs (OpenRouter, OpenAI, Anthropic’s compatibility layer) include a usage object in every chat completion response:

{
  "id": "chatcmpl-abc123",
  "choices": [{ "message": { "content": "Hello!" }, "finish_reason": "stop" }],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 15
  }
}

  • prompt_tokens – how many tokens the API consumed reading your input (system prompt + conversation history + tool definitions).
  • completion_tokens – how many tokens the model generated in its response (text + tool calls).

Not every provider guarantees this field, so it is optional. But when it is present, we want to capture it.

Goal

Implement CostTracker so that:

  1. You create it with per-million-token pricing for input and output.
  2. You can record() a TokenUsage from each turn.
  3. It accumulates totals across turns and computes estimated cost.
  4. It can produce a human-readable summary string.
  5. It can be reset to zero.

The TokenUsage struct

Open mini-claw-code-starter/src/types.rs. You will see a new struct alongside the types you already know:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}
}

This is a simple data carrier – just two numbers. The Default derive gives us TokenUsage { input_tokens: 0, output_tokens: 0 } for free, which is useful when the API omits individual fields.

The struct lives on AssistantTurn as an optional field:

#![allow(unused)]
fn main() {
pub struct AssistantTurn {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,
    /// Token usage for this turn, if reported by the provider.
    pub usage: Option<TokenUsage>,
}
}

The usage field is Option<TokenUsage> because not every provider reports it. MockProvider returns None (it does not call a real API), while OpenRouterProvider parses it from the JSON response.

How OpenRouterProvider parses usage

In Chapter 6 you built the HTTP provider. Now look at how it handles the usage field in openrouter.rs. The response is deserialized into these types:

#![allow(unused)]
fn main() {
#[derive(Deserialize)]
struct ChatResponse {
    choices: Vec<Choice>,
    usage: Option<ApiUsage>,
}

#[derive(Deserialize)]
struct ApiUsage {
    prompt_tokens: Option<u64>,
    completion_tokens: Option<u64>,
}
}

Both usage on ChatResponse and the individual fields on ApiUsage are optional – some providers omit them entirely, others include the object but leave fields null. At the end of the chat() method, the conversion looks like this:

#![allow(unused)]
fn main() {
let usage = resp.usage.map(|u| TokenUsage {
    input_tokens: u.prompt_tokens.unwrap_or(0),
    output_tokens: u.completion_tokens.unwrap_or(0),
});

Ok(AssistantTurn {
    text: choice.message.content,
    tool_calls,
    stop_reason,
    usage,
})
}

The double-Option pattern – Option<ApiUsage> containing Option<u64> fields – is a common defensive strategy when deserializing API responses. resp.usage.map(...) handles the outer option (no usage key at all), and unwrap_or(0) handles the inner option (key present but value null).

You do not need to modify the provider. The parsing is already done. Your job is to build the CostTracker that consumes these TokenUsage values.

Implementing CostTracker

Open mini-claw-code-starter/src/usage.rs. You will see the struct and method signatures already laid out with unimplemented!() bodies.

The design

CostTracker needs to be shared across the agent loop – you might pass it into run() or hold it alongside the agent. Because the agent takes &self (shared reference), the tracker must support mutation through &self. This is the same interior mutability pattern you used in MockProvider:

#![allow(unused)]
fn main() {
pub struct CostTracker {
    inner: Mutex<CostTrackerInner>,
    /// Price per million input tokens (USD).
    input_price: f64,
    /// Price per million output tokens (USD).
    output_price: f64,
}

struct CostTrackerInner {
    total_input: u64,
    total_output: u64,
    turn_count: u64,
}
}

The prices are immutable after construction (they describe the model, which does not change mid-session), so they live outside the Mutex. Only the running totals need interior mutability.

Step 1: Implement new()

The constructor takes two prices: input and output, both in dollars per million tokens. These are the rates you find on a model’s pricing page – for example, Claude Sonnet charges $3 per million input tokens and $15 per million output tokens.

#![allow(unused)]
fn main() {
pub fn new(input_price_per_million: f64, output_price_per_million: f64) -> Self {
    Self {
        inner: Mutex::new(CostTrackerInner {
            total_input: 0,
            total_output: 0,
            turn_count: 0,
        }),
        input_price: input_price_per_million,
        output_price: output_price_per_million,
    }
}
}

Store the prices on self and initialize all counters to zero inside a Mutex.

Step 2: Implement record()

This is the method the agent loop calls after each provider response. It takes a &TokenUsage and adds its values to the running totals:

#![allow(unused)]
fn main() {
pub fn record(&self, usage: &TokenUsage) {
    let mut inner = self.inner.lock().unwrap();
    inner.total_input += usage.input_tokens;
    inner.total_output += usage.output_tokens;
    inner.turn_count += 1;
}
}

Lock the mutex, add the token counts, bump the turn counter. That is it. The lock is held for three additions – fast enough that contention is never a problem.

Step 3: Implement the getter methods

Three simple accessors, each locking the mutex and reading a field:

#![allow(unused)]
fn main() {
pub fn total_input_tokens(&self) -> u64 {
    self.inner.lock().unwrap().total_input
}

pub fn total_output_tokens(&self) -> u64 {
    self.inner.lock().unwrap().total_output
}

pub fn turn_count(&self) -> u64 {
    self.inner.lock().unwrap().turn_count
}
}

Each method acquires and releases the lock independently. This is fine – if you needed a consistent snapshot of all three values at once, you would lock once and read all three. But for display purposes, slight inconsistency between separate calls is acceptable.

Step 4: Implement total_cost()

The cost formula is straightforward:

cost = (input_tokens * input_price + output_tokens * output_price) / 1,000,000

We divide by one million because the prices are per million tokens:

#![allow(unused)]
fn main() {
pub fn total_cost(&self) -> f64 {
    let inner = self.inner.lock().unwrap();
    (inner.total_input as f64 * self.input_price
        + inner.total_output as f64 * self.output_price)
        / 1_000_000.0
}
}

Notice we lock once and read both total_input and total_output together. This ensures the cost calculation uses a consistent pair of values.

Step 5: Implement summary()

This produces a human-readable string for display – the kind of thing you would show at the bottom of a terminal UI:

tokens: 1234 in + 567 out | cost: $0.0122

The implementation duplicates the cost calculation (instead of calling self.total_cost()) to avoid locking the mutex twice:

#![allow(unused)]
fn main() {
pub fn summary(&self) -> String {
    let inner = self.inner.lock().unwrap();
    let cost = (inner.total_input as f64 * self.input_price
        + inner.total_output as f64 * self.output_price)
        / 1_000_000.0;
    format!(
        "tokens: {} in + {} out | cost: ${:.4}",
        inner.total_input, inner.total_output, cost
    )
}
}

The {:.4} format specifier gives four decimal places – enough precision for small token counts where the cost might be fractions of a cent.

Step 6: Implement reset()

Reset all counters to zero. Useful when starting a new conversation in the same session:

#![allow(unused)]
fn main() {
pub fn reset(&self) {
    let mut inner = self.inner.lock().unwrap();
    inner.total_input = 0;
    inner.total_output = 0;
    inner.turn_count = 0;
}
}

Running the tests

Run the Chapter 14 tests:

cargo test -p mini-claw-code-starter ch14

What the tests verify

  • test_ch14_empty_tracker: A freshly created tracker has zero tokens, zero turns, and zero cost.
  • test_ch14_record_single_turn: Record one usage, verify the totals match exactly.
  • test_ch14_accumulates_across_turns: Record three usages, verify the totals are the sum of all three.
  • test_ch14_cost_calculation: Record exactly one million input and one million output tokens at $3/M and $15/M. Verify cost is $18.00.
  • test_ch14_cost_small_numbers: Record 1000 input and 200 output tokens. Verify cost is $0.006 (six tenths of a cent).
  • test_ch14_summary_format: Verify the summary string contains the expected token counts and a dollar sign.
  • test_ch14_reset: Record usage, reset, verify everything is back to zero.
  • test_ch14_zero_usage: Record a turn with zero tokens. Turn count increments but cost stays zero.
  • test_ch14_token_usage_default: Verify TokenUsage::default() gives zeros – a sanity check on the Default derive.

Wiring it into the agent loop

The tests cover CostTracker in isolation, but in practice you would wire it into your agent loop. After each call to self.provider.chat(), check if the response includes usage data and record it:

#![allow(unused)]
fn main() {
let turn = self.provider.chat(&messages, &defs).await?;

if let Some(ref usage) = turn.usage {
    cost_tracker.record(usage);
}
}

Then, after the agent finishes (or periodically during long runs), display the summary:

#![allow(unused)]
fn main() {
println!("{}", cost_tracker.summary());
// tokens: 4521 in + 892 out | cost: $0.0269
}

This is exactly what tools like Claude Code do – show a running cost estimate so you know what a session is costing in real time.

Recap

You have built a CostTracker that:

  • Accumulates input and output token counts across multiple agent turns.
  • Computes cost from per-million-token pricing.
  • Produces a summary string for display.
  • Uses Mutex for interior mutability, the same pattern as MockProvider.
  • Handles the full chain: API response -> TokenUsage on AssistantTurn -> CostTracker::record() -> running totals and cost estimate.

Token tracking is a small feature in terms of code, but it is essential for any agent you plan to use in production. Without it, you are flying blind on costs and context window usage.

What’s next

In Chapter 15: Context Management you will teach your agent to manage its own context window – tracking cumulative token usage and compacting old messages into an LLM-generated summary when the conversation grows too long.

Chapter 15: Context Management

Every LLM has a context window – a fixed number of tokens it can process in a single request. Claude has 200k tokens. GPT-4o has 128k. Sounds like a lot, until your agent reads a few large files, runs a test suite, edits some code, and runs the tests again. Each tool result gets appended to the message history, and that history gets sent to the LLM on every turn. A busy session can blow past 100k tokens in minutes.

When that happens, the API either rejects the request or silently truncates your messages. Either way, the agent breaks. Real coding agents handle this automatically – Claude Code, for example, compacts the conversation when it gets too long, summarizing old messages while keeping recent context intact. The user sees a brief “auto-compacting conversation…” message and the session continues.

In this chapter you will build a ContextManager that does exactly that: tracks token usage, decides when to compact, and uses the LLM itself to summarize old messages into a short recap. You will also implement the should_compact() threshold check as an exercise.

The problem

Consider this 10-turn conversation:

User: Find the bug in src/parser.rs
  [read: src/parser.rs]              ← 500 lines of code
  [read: src/types.rs]               ← 300 lines of code
Assistant: I see the issue. The parser...
  [bash: cargo test]                  ← 200 lines of test output
  [edit: src/parser.rs]               ← patch
  [bash: cargo test]                  ← 200 lines of test output
Assistant: All tests pass now.
User: Great. Now add a --verbose flag to the CLI.
  [read: src/main.rs]                 ← 400 lines
  ...

By the time the user asks a second question, the message history already contains thousands of tokens of file contents, test output, and tool calls. Most of that detail is irrelevant to the new task. But the LLM still receives it all, which wastes tokens, increases latency, and eventually hits the context limit.

The solution: periodically compact the history by summarizing old messages and keeping only the recent ones.

The strategy

Compaction works in three steps:

  1. Detect – after each LLM turn, check if cumulative token usage has crossed a threshold.
  2. Summarize – take the old messages (everything except the system prompt and the most recent N messages) and ask the LLM to summarize them into a few sentences.
  3. Rebuild – replace the message history with: the original system prompt, the summary as a new system message, and the recent messages.

flowchart TD
    A["Messages: system + msg1 + msg2 + ... + msg20"]
    A --> B["Split"]
    B --> C["Keep: system prompt"]
    B --> D["Middle: msg1 ... msg17 → summarize"]
    B --> E["Keep: msg18, msg19, msg20"]
    D --> F["LLM summary"]
    F --> G["Rebuilt: system + summary + msg18 + msg19 + msg20"]

After compaction, the conversation has 4-5 messages instead of 20+. The LLM loses the fine-grained detail of early messages but retains the key facts and decisions through the summary. The recent messages are preserved verbatim, so the LLM has full context for whatever it is working on right now.

This is the same approach Claude Code uses. It is simple, effective, and requires no changes to the agent loop or provider – just a pre-processing step before each provider.chat() call.

The ContextManager struct

Open mini-claw-code/src/context.rs. The struct has three fields:

#![allow(unused)]
fn main() {
pub struct ContextManager {
    /// Maximum total tokens before compaction triggers.
    max_tokens: u64,
    /// Number of recent messages to always preserve during compaction.
    preserve_recent: usize,
    /// Running total of tokens used in the current conversation.
    tokens_used: u64,
}
}

  • max_tokens is the budget. When cumulative usage crosses this threshold, compaction fires. This is not the model’s context window size – it is a lower number you choose to leave headroom. For a 200k-token model, you might set this to 100k so you always have room for the LLM’s response.
  • preserve_recent is how many messages to keep verbatim. These are the messages most relevant to the current task. A value of 4-6 usually works well.
  • tokens_used is the running counter, updated after each LLM turn.

The constructor is straightforward:

#![allow(unused)]
fn main() {
impl ContextManager {
    pub fn new(max_tokens: u64, preserve_recent: usize) -> Self {
        Self {
            max_tokens,
            preserve_recent,
            tokens_used: 0,
        }
    }
}
}

Tracking token usage

The LLM API reports how many tokens each request consumed. Our AssistantTurn type carries this information in an optional usage field:

#![allow(unused)]
fn main() {
pub struct AssistantTurn {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,
    pub usage: Option<TokenUsage>,
}

#[derive(Debug, Clone, Default)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}
}

After each provider call, the agent records the usage:

#![allow(unused)]
fn main() {
pub fn record(&mut self, usage: &TokenUsage) {
    self.tokens_used += usage.input_tokens + usage.output_tokens;
}
}

This is a rough estimate. Input tokens grow with each turn (because the full history is resent), so summing input + output across all turns overcounts. But for a threshold check, overcounting is fine – it just means we compact a little earlier than strictly necessary, which is safer than compacting too late.

You can query the current total at any time:

#![allow(unused)]
fn main() {
pub fn tokens_used(&self) -> u64 {
    self.tokens_used
}
}

Exercise: implement should_compact()

This is your exercise for the chapter. The method signature is:

#![allow(unused)]
fn main() {
pub fn should_compact(&self) -> bool {
    // TODO: return true if tokens_used >= max_tokens
    todo!()
}
}

The logic is a single comparison. When tokens_used meets or exceeds max_tokens, it is time to compact. Implement it in the starter crate and run the tests:

cargo test -p mini-claw-code-starter ch15

Here are the tests that verify your implementation:

#![allow(unused)]
fn main() {
#[test]
fn test_ch15_below_threshold_no_compact() {
    let cm = ContextManager::new(10000, 4);
    assert!(!cm.should_compact());
}

#[test]
fn test_ch15_triggers_at_threshold() {
    let mut cm = ContextManager::new(1000, 4);
    cm.record(&TokenUsage {
        input_tokens: 600,
        output_tokens: 500,
    });
    assert!(cm.should_compact());
}

#[test]
fn test_ch15_tracks_tokens() {
    let mut cm = ContextManager::new(10000, 4);
    cm.record(&TokenUsage {
        input_tokens: 100,
        output_tokens: 50,
    });
    cm.record(&TokenUsage {
        input_tokens: 200,
        output_tokens: 100,
    });
    assert_eq!(cm.tokens_used(), 450);
}
}

The first test creates a fresh ContextManager with zero usage – it should not compact. The second records 1100 tokens against a budget of 1000 – it should compact. The third verifies that multiple record() calls accumulate correctly.

The compact() method

Once should_compact() returns true, the agent calls compact(). This is the core of context management. Let us walk through it step by step.

Guard clause: too few messages

#![allow(unused)]
fn main() {
pub async fn compact<P: Provider>(
    &mut self,
    provider: &P,
    messages: &mut Vec<Message>,
) -> anyhow::Result<()> {
    if messages.len() <= self.preserve_recent + 1 {
        return Ok(());
    }
}

If the conversation is short enough that there is nothing to summarize, bail out. No point summarizing two messages into two sentences.

Splitting the history

The method divides messages into three segments:

#![allow(unused)]
fn main() {
let keep_start = if matches!(messages.first(), Some(Message::System(_))) {
    1
} else {
    0
};

let total = messages.len();
if total <= keep_start + self.preserve_recent {
    return Ok(());
}

let middle_end = total - self.preserve_recent;
let middle = &messages[keep_start..middle_end];
}

  • Head (0..keep_start): the system prompt, if present. Always preserved.
  • Middle (keep_start..middle_end): old messages that will be summarized.
  • Tail (middle_end..total): the most recent preserve_recent messages, kept verbatim.

If the system prompt is “You are a helpful coding agent” and there are 10 messages with preserve_recent = 3, then: head = message 0, middle = messages 1-6, tail = messages 7-9.

Building the summarization prompt

The method formats each middle message into a human-readable block:

#![allow(unused)]
fn main() {
let mut summary_parts = Vec::new();
for msg in middle {
    match msg {
        Message::User(text) => summary_parts.push(format!("User: {text}")),
        Message::Assistant(turn) => {
            if let Some(ref text) = turn.text {
                summary_parts.push(format!("Assistant: {text}"));
            }
            for call in &turn.tool_calls {
                summary_parts.push(format!("  [tool: {}]", call.name));
            }
        }
        Message::ToolResult { content, .. } => {
            let preview = if content.len() > 100 {
                format!("{}...", &content[..100])
            } else {
                content.clone()
            };
            summary_parts.push(format!("  Tool result: {preview}"));
        }
        Message::System(text) => summary_parts.push(format!("System: {text}")),
    }
}
}

Notice the truncation: tool results longer than 100 characters are clipped. This matters because tool results can be huge – the entire contents of a source file, or the full output of a test suite. Including all of that in the summarization prompt would itself be expensive. The LLM only needs enough context to produce a useful summary.

The formatted parts are joined into a single summarization prompt:

#![allow(unused)]
fn main() {
let prompt = format!(
    "Summarize this conversation history in 2-3 sentences, \
     preserving key facts and decisions:\n\n{}",
    summary_parts.join("\n")
);
}

Calling the LLM

The summary prompt is sent as a fresh conversation – no tools, no history:

#![allow(unused)]
fn main() {
let summary_messages = vec![Message::User(prompt)];
let turn = provider.chat(&summary_messages, &[]).await?;
let summary_text = turn.text.unwrap_or_else(|| "Previous conversation.".into());
}

This is a neat trick: we reuse the same Provider the agent already has. No extra configuration, no special summarization model. The LLM that does the coding also does the summarization. If the provider call fails, we use a generic fallback string so the conversation can continue.

Rebuilding the message history

Finally, the method assembles the new, shorter history:

#![allow(unused)]
fn main() {
let mut new_messages = Vec::new();

// Keep leading messages (system prompt)
for msg in messages.iter().take(keep_start) {
    if let Message::System(text) = msg {
        new_messages.push(Message::System(text.clone()));
    }
}

// Insert the summary as a system message
new_messages.push(Message::System(format!(
    "[Conversation summary]: {summary_text}"
)));

// Keep recent messages
let recent_start = total - self.preserve_recent;
let recent: Vec<Message> = messages.drain(recent_start..).collect();
new_messages.extend(recent);

*messages = new_messages;
}

The summary is inserted as a Message::System(...) tagged with [Conversation summary]. This tells the LLM it is reading a recap, not a direct instruction. The recent messages come after the summary, so the LLM sees the most relevant context last – right before it generates its response.

After rebuilding, the token counter is reduced:

#![allow(unused)]
fn main() {
self.tokens_used /= 3;
}

This is a rough heuristic. The actual token savings depend on how much was summarized, but dividing by 3 is a reasonable estimate that avoids re-triggering compaction immediately.

The integration point: maybe_compact()

The maybe_compact() method ties detection and compaction together:

#![allow(unused)]
fn main() {
pub async fn maybe_compact<P: Provider>(
    &mut self,
    provider: &P,
    messages: &mut Vec<Message>,
) -> anyhow::Result<()> {
    if self.should_compact() {
        self.compact(provider, messages).await?;
    }
    Ok(())
}
}

This is the method the agent loop calls. The integration is a single line added before each provider.chat() call:

#![allow(unused)]
fn main() {
loop {
    // NEW: compact if needed before calling the LLM
    context_manager.maybe_compact(&self.provider, &mut messages).await?;

    let turn = self.provider.chat(&messages, &defs).await?;

    // Record token usage from this turn
    if let Some(ref usage) = turn.usage {
        context_manager.record(usage);
    }

    match turn.stop_reason {
        StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
        StopReason::ToolUse => {
            // ... execute tools, push results ...
        }
    }
}
}

That is the entire integration. Two lines added to the existing agent loop: one to maybe compact before the call, one to record usage after. The ContextManager handles all the logic internally.

Running the tests

cargo test -p mini-claw-code ch15

What the tests verify

  • test_ch15_below_threshold_no_compact: A fresh ContextManager with zero usage should not trigger compaction.

  • test_ch15_triggers_at_threshold: After recording 1100 tokens against a budget of 1000, should_compact() returns true.

  • test_ch15_tracks_tokens: Two record() calls accumulate correctly (100 + 50 + 200 + 100 = 450).

  • test_ch15_compact_preserves_system_prompt: After compacting a 6-message conversation (system + 5 user messages), the system prompt remains as the first message and a summary message is present.

  • test_ch15_compact_too_few_messages: When preserve_recent is larger than the message count, compaction is a no-op – nothing changes.

  • test_ch15_maybe_compact_skips_when_not_needed: When token usage is below the threshold, maybe_compact() leaves messages untouched.

  • test_ch15_compact_preserves_recent: After compacting a 5-message conversation with preserve_recent = 2, the last two messages (“Recent A” and “Recent B”) are preserved verbatim.

The async tests use MockProvider to provide a canned summary response. No real API calls, no network. The mock returns a fixed summary string, and the tests verify that the message history is restructured correctly around it.

Design tradeoffs

Why not count tokens precisely? The tokens_used counter sums input and output tokens across all turns, which overcounts because input tokens are resent each turn. A precise implementation would track only incremental tokens. But the threshold approach is intentionally conservative – it triggers compaction a bit early, which is always safe. And it avoids the complexity of a token counting model.

Why not truncate instead of summarize? You could simply drop old messages. But the LLM would lose context about what it already did, leading to repeated work or contradictory actions. Summarization preserves the key facts (“I found a bug in parser.rs line 42 and fixed it, all tests pass now”) in a compact form.

Why divide tokens_used by 3? After compaction, the actual token count is unknown without re-counting. Dividing by 3 is a rough approximation that works well in practice: the summary is much shorter than the original messages, and the recent messages were already counted. The approximation errs on the side of under-counting, which means the next compaction might trigger slightly late. In practice this is fine because preserve_recent keeps enough headroom.

Why use a system message for the summary? System messages are treated with high priority by most LLMs. By tagging the summary as [Conversation summary], we signal to the model that this is background context, not an instruction or a user message. This avoids confusing the LLM about who said what.

Recap

  • Context windows are finite. Long agent sessions accumulate token-heavy tool results that eventually exhaust the budget.
  • ContextManager tracks cumulative token usage and triggers compaction when a threshold is reached.
  • should_compact() is a simple threshold check: tokens_used >= max_tokens.
  • compact() splits the message history into head (system prompt), middle (old messages to summarize), and tail (recent messages to preserve). The middle is summarized by the LLM and replaced with a single system message.
  • maybe_compact() is the integration point – one line before each provider.chat() call in the agent loop.
  • Token counting is approximate. The system errs on the side of compacting early, which is safer than compacting late.

What’s next

Your agent now manages its own context window – it can run indefinitely without hitting token limits. Combined with tools, streaming, subagents, and plan mode from earlier chapters, you have a complete coding agent framework.

The next step is yours. Extend the agent with new tools, experiment with different summarization strategies, add token-level counting with a proper tokenizer, or deploy it as a daily-driver CLI. The architecture you have built is the same one that powers production coding agents – the difference is polish, not structure.

Chapter 16: Configuration

Every production agent needs configurable behavior. Which model should it use? What is the context window limit? Are there directories it should never touch? Hardcoding these values works for a tutorial, but a real tool needs to let users override them – and override them at different levels.

Claude Code solves this with a multi-level configuration hierarchy: defaults, project settings, user settings, and environment variables. Each layer can override the one below it. This chapter walks through our implementation of the same pattern.

The layered config model

The core idea is simple: start with sensible defaults, then let each successive layer override specific values while leaving the rest untouched.

Priority (highest wins)
========================
  4. Environment variables   MINI_CLAW_MODEL=...
  3. User config             ~/.config/mini-claw/config.toml
  2. Project config          .mini-claw/config.toml
  1. Defaults                compiled into the binary

Why four layers?

  • Defaults ensure the agent works out of the box with zero configuration.
  • Project config lives in the repository (.mini-claw/config.toml). It sets project-specific rules: blocked commands, protected files, MCP servers. Every contributor on the project shares these settings.
  • User config lives in the user’s home directory (~/.config/mini-claw/config.toml on Linux/macOS). It captures personal preferences: preferred model, API base URL, custom instructions. These apply across all projects.
  • Environment variables override everything. They are useful for CI pipelines, one-off experiments, or temporarily switching models without editing any file.

This is the same pattern used by Git (system, global, local config), npm (.npmrc at multiple levels), and many other CLI tools. It is worth understanding because you will see it everywhere and can reuse it in your own projects.

The Config struct

Open mini-claw-code/src/config.rs. The top-level struct holds every configurable value:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct Config {
    pub model: String,
    pub base_url: String,
    pub max_context_tokens: u64,
    pub preserve_recent: usize,
    pub allowed_directory: Option<String>,
    pub protected_patterns: Vec<String>,
    pub blocked_commands: Vec<String>,
    pub mcp_servers: Vec<McpServerConfig>,
    pub hooks: HooksConfig,
    pub instructions: Option<String>,
}
}

A quick field-by-field tour:

  • model – LLM model identifier, e.g. "anthropic/claude-sonnet-4"
  • base_url – API endpoint URL
  • max_context_tokens – token budget before the agent triggers context compaction
  • preserve_recent – number of recent messages to keep during compaction
  • allowed_directory – if set, tools cannot access files outside this directory
  • protected_patterns – glob patterns for files that tools should never write to
  • blocked_commands – shell command patterns that the bash tool should reject
  • mcp_servers – MCP server definitions (name, command, args, env)
  • hooks – pre/post tool execution hooks
  • instructions – custom system prompt text

The #[serde(default)] attribute on the struct is critical. It tells serde: “if a field is missing from the TOML input, use its Default value instead of returning an error.” This means a config file can specify just one field and every other field gets a sensible default.
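For example, thanks to #[serde(default)], a project config can be as small as a single line – every omitted field falls back to its compiled-in default (the model name here is illustrative):

```toml
# .mini-claw/config.toml – only `model` is set;
# all other fields keep their defaults.
model = "anthropic/claude-sonnet-4"
```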

Defaults

The Default implementation defines the baseline:

#![allow(unused)]
fn main() {
impl Default for Config {
    fn default() -> Self {
        Self {
            model: "openrouter/free".into(),
            base_url: "https://openrouter.ai/api/v1".into(),
            max_context_tokens: 100_000,
            preserve_recent: 6,
            allowed_directory: None,
            protected_patterns: vec![
                ".env".into(),
                ".env.*".into(),
                ".git/**".into(),
            ],
            blocked_commands: vec![
                "rm -rf /".into(),
                "sudo *".into(),
                "curl * | bash".into(),
                "curl * | sh".into(),
            ],
            mcp_servers: Vec::new(),
            hooks: HooksConfig::default(),
            instructions: None,
        }
    }
}
}

The defaults are deliberately conservative. The free model keeps the barrier to entry low. The protected patterns prevent the agent from overwriting .env files or anything inside .git/. The blocked commands stop the most dangerous shell operations. A user who wants to loosen these restrictions can do so in their config file.

Nested config types

McpServerConfig

MCP servers are defined as a list of entries. Each entry describes how to spawn a server process:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
pub struct McpServerConfig {
    pub name: String,
    pub command: String,
    #[serde(default)]
    pub args: Vec<String>,
    #[serde(default)]
    pub env: std::collections::HashMap<String, String>,
}
}

In TOML, this uses the double-bracket array-of-tables syntax:

[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["@anthropic/mcp-server-filesystem"]

The #[serde(default)] on args and env means you can omit them if the server needs no arguments or extra environment variables.

HooksConfig and ShellHookConfig

Hooks let you run shell commands before or after the agent executes a tool. For example, you might lint a file after the agent writes to it, or log every bash command.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct HooksConfig {
    pub pre_tool: Vec<ShellHookConfig>,
    pub post_tool: Vec<ShellHookConfig>,
}

#[derive(Debug, Clone, Deserialize)]
pub struct ShellHookConfig {
    pub tool_pattern: Option<String>,
    pub command: String,
    #[serde(default = "default_hook_timeout")]
    pub timeout_ms: u64,
}

fn default_hook_timeout() -> u64 {
    5000
}
}

A few things to note:

  • HooksConfig uses #[serde(default)] at the struct level, so a config file that does not mention hooks at all will get empty pre_tool and post_tool vectors.
  • ShellHookConfig uses #[serde(default = "default_hook_timeout")] on timeout_ms. This is a different form of the default attribute: instead of using the type’s Default trait, it calls a specific function. Here, default_hook_timeout() returns 5000 milliseconds.
  • tool_pattern is an Option<String>. When None, the hook runs for every tool. When set to something like "bash", it only runs for the bash tool.

In TOML:

[[hooks.pre_tool]]
command = "echo pre"
tool_pattern = "bash"
timeout_ms = 3000

TOML deserialization

The toml crate handles deserialization. Because Config derives Deserialize and has #[serde(default)], parsing a minimal TOML file works seamlessly:

#![allow(unused)]
fn main() {
let toml_str = r#"
    model = "anthropic/claude-sonnet-4"
    max_context_tokens = 50000
"#;
let config: Config = toml::from_str(toml_str).unwrap();
}

This produces a Config where model is "anthropic/claude-sonnet-4", max_context_tokens is 50000, and every other field has its default value. The #[serde(default)] attribute is doing all the heavy lifting – without it, serde would require every field to be present in the TOML.

This is also why we chose TOML over JSON for configuration files. TOML is designed for human-editable config: it supports comments, has clean syntax for nested tables and arrays, and does not require trailing commas or quoting of simple strings.

ConfigLoader

The ConfigLoader struct ties everything together. It has no fields – it is just a namespace for the loading logic:

#![allow(unused)]
fn main() {
pub struct ConfigLoader;
}

The load() method

ConfigLoader::load() is the main entry point. It applies all four layers in order:

#![allow(unused)]
fn main() {
impl ConfigLoader {
    pub fn load() -> Config {
        let mut config = Config::default();

        // Layer 1: Project config
        if let Some(project_config) = Self::load_file(".mini-claw/config.toml") {
            Self::merge(&mut config, project_config);
        }

        // Layer 2: User config
        if let Some(user_dir) = dirs::config_dir() {
            let user_path = user_dir.join("mini-claw/config.toml");
            if let Some(user_config) = Self::load_path(&user_path) {
                Self::merge(&mut config, user_config);
            }
        }

        // Layer 3: Environment variable overrides
        if let Ok(model) = std::env::var("MINI_CLAW_MODEL") {
            config.model = model;
        }
        if let Ok(url) = std::env::var("MINI_CLAW_BASE_URL") {
            config.base_url = url;
        }
        if let Ok(tokens) = std::env::var("MINI_CLAW_MAX_TOKENS")
            && let Ok(n) = tokens.parse()
        {
            config.max_context_tokens = n;
        }

        config
    }
}
}

The flow:

  1. Start with Config::default().
  2. If .mini-claw/config.toml exists in the current directory, parse it and merge it into the config.
  3. Use the dirs crate to find the platform-appropriate user config directory (~/.config on Linux, ~/Library/Application Support on macOS). If mini-claw/config.toml exists there, merge it in.
  4. Check three environment variables (MINI_CLAW_MODEL, MINI_CLAW_BASE_URL, MINI_CLAW_MAX_TOKENS) and override the corresponding fields if set.

Each file loading step uses if let Some(...) – if the file does not exist or cannot be parsed, the step is silently skipped. This is intentional: config files are optional at every level.

Notice the let ... && let ... syntax in the environment variable parsing for MINI_CLAW_MAX_TOKENS. This is a let-chain: the inner let Ok(n) = tokens.parse() only runs if the outer let Ok(tokens) succeeds. If the environment variable exists but is not a valid number, the override is skipped.
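The same guard can be written without a let-chain, for codebases on editions where let-chains are unavailable. This is a small stand-alone sketch (the function name is invented for the demo, not part of the book's code):

```rust
// "Parse only if present and valid" expressed with Option combinators:
// a missing variable or an unparseable value both yield None, so the
// caller simply skips the override in either case.
fn max_tokens_override(raw: Option<&str>) -> Option<u64> {
    raw.and_then(|s| s.parse().ok())
}

fn main() {
    assert_eq!(max_tokens_override(Some("50000")), Some(50000));
    assert_eq!(max_tokens_override(Some("not-a-number")), None); // invalid: skipped
    assert_eq!(max_tokens_override(None), None);                 // unset: skipped
    println!("override guard behaves as expected");
}
```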

File loading helpers

Two helper methods handle reading and parsing TOML files:

#![allow(unused)]
fn main() {
pub fn load_path(path: &Path) -> Option<Config> {
    let content = std::fs::read_to_string(path).ok()?;
    toml::from_str(&content).ok()
}

fn load_file(relative_path: &str) -> Option<Config> {
    let path = PathBuf::from(relative_path);
    Self::load_path(&path)
}
}

Both return Option<Config>. .ok() converts the Result into an Option, and the ? operator returns None early on failure, so any I/O error or parse error makes the whole function return None and the layer is skipped.
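Here is that pattern in miniature – a hypothetical helper (not from the book's code) showing how .ok()? collapses any error into None:

```rust
use std::path::Path;

// Any read error (missing file, permissions, ...) becomes an early None
// via `.ok()?`, so callers see "no content" instead of an error to handle.
fn read_optional(path: &Path) -> Option<String> {
    let content = std::fs::read_to_string(path).ok()?;
    Some(content)
}

fn main() {
    assert!(read_optional(Path::new("/definitely/not/a/real/file")).is_none());
    println!("missing file handled as None");
}
```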

load_path is public – callers can use it to load a config from any arbitrary path. load_file is private and handles the relative path case for project config.

The merge strategy

The merge() method is where the layered override logic lives:

#![allow(unused)]
fn main() {
fn merge(base: &mut Config, overlay: Config) {
    if overlay.model != Config::default().model {
        base.model = overlay.model;
    }
    if overlay.base_url != Config::default().base_url {
        base.base_url = overlay.base_url;
    }
    if overlay.max_context_tokens != Config::default().max_context_tokens {
        base.max_context_tokens = overlay.max_context_tokens;
    }
    if overlay.preserve_recent != Config::default().preserve_recent {
        base.preserve_recent = overlay.preserve_recent;
    }
    if overlay.allowed_directory.is_some() {
        base.allowed_directory = overlay.allowed_directory;
    }
    if !overlay.protected_patterns.is_empty()
        && overlay.protected_patterns != Config::default().protected_patterns
    {
        base.protected_patterns = overlay.protected_patterns;
    }
    if !overlay.blocked_commands.is_empty()
        && overlay.blocked_commands != Config::default().blocked_commands
    {
        base.blocked_commands = overlay.blocked_commands;
    }
    if !overlay.mcp_servers.is_empty() {
        base.mcp_servers = overlay.mcp_servers;
    }
    if overlay.instructions.is_some() {
        base.instructions = overlay.instructions;
    }
}
}

The merge logic compares each overlay field against the default. If a field in the overlay still has its default value, it was probably not set in the TOML file (remember, #[serde(default)] fills missing fields with defaults). So the base value is kept. Only explicitly-set values override.

This is a pragmatic compromise. A more sophisticated approach would track which fields were explicitly set (using something like Option<T> for every field, or a separate “was this set?” bitfield). But comparing against defaults works well in practice and keeps the code simple.

One subtlety: Vec fields like protected_patterns and blocked_commands check both that the overlay is non-empty and that it differs from the default. This prevents an edge case where deserializing a TOML file that does not mention protected_patterns would produce the default value (via #[serde(default)]) and then “override” the base with the same defaults.
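To see the compare-against-default semantics in isolation, here is a stripped-down sketch with a hypothetical two-field config (not the book's real Config type):

```rust
struct MiniConfig {
    model: String,
    max_tokens: u64,
}

impl Default for MiniConfig {
    fn default() -> Self {
        Self { model: "openrouter/free".into(), max_tokens: 100_000 }
    }
}

// Overlay fields that still equal the default are treated as "not set"
// and leave the base value alone; only changed fields override.
fn merge(base: &mut MiniConfig, overlay: MiniConfig) {
    let default = MiniConfig::default();
    if overlay.model != default.model {
        base.model = overlay.model;
    }
    if overlay.max_tokens != default.max_tokens {
        base.max_tokens = overlay.max_tokens;
    }
}

fn main() {
    let mut base = MiniConfig::default();
    base.max_tokens = 50_000; // set by an earlier layer

    // The overlay only changes the model; its max_tokens is the default.
    let overlay = MiniConfig {
        model: "anthropic/claude-sonnet-4".into(),
        ..MiniConfig::default()
    };
    merge(&mut base, overlay);

    assert_eq!(base.model, "anthropic/claude-sonnet-4"); // overridden
    assert_eq!(base.max_tokens, 50_000);                 // preserved
    println!("merge semantics hold");
}
```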

Environment variable overrides

The environment variable layer is the simplest – no file loading, no merging, just direct assignment:

#![allow(unused)]
fn main() {
if let Ok(model) = std::env::var("MINI_CLAW_MODEL") {
    config.model = model;
}
}

Only three fields are exposed as environment variables: model, base_url, and max_context_tokens. These are the values most likely to change between runs. Complex structures like mcp_servers and hooks are not practical to express as environment variables, so they are only configurable through files.

This is a common pattern in CLI tools: environment variables handle the “quick override” case, while config files handle the “persistent, structured settings” case.

Running the tests

cargo test -p mini-claw-code ch16

The tests cover each layer and their interactions:

  • test_ch16_default_config – verifies that Config::default() returns sensible values: the free model, 100k token limit, non-empty protected patterns and blocked commands.

  • test_ch16_load_from_toml – parses a TOML string with two fields and checks that both are set correctly.

  • test_ch16_default_fills_missing_fields – parses a TOML string with only model set. Verifies that unspecified fields fall back to their defaults. This is the #[serde(default)] attribute in action.

  • test_ch16_load_nonexistent_path – calls ConfigLoader::load_path() on a path that does not exist. Confirms it returns None instead of panicking.

  • test_ch16_mcp_server_config – parses TOML with a [[mcp_servers]] block. Verifies that the array-of-tables syntax deserializes into a Vec<McpServerConfig> correctly.

  • test_ch16_hooks_config – parses TOML with a [[hooks.pre_tool]] block. Verifies the hook’s command, tool pattern, and timeout.

  • test_ch16_env_override – sets MINI_CLAW_MODEL as an environment variable, calls ConfigLoader::load(), and verifies the model was overridden. Note that the test uses unsafe blocks around set_var and remove_var – as of Rust 2024 edition, modifying environment variables is unsafe because it can cause undefined behavior when another thread reads the environment concurrently.

  • test_ch16_protected_patterns_default – verifies that the default protected patterns include .env and .git/**.

Recap

  • Layered configuration is a widely-used design pattern: defaults, project settings, user settings, and environment variables, each overriding the layer below.
  • The Config struct uses #[serde(default)] so that TOML files only need to specify the fields they want to change.
  • Nested types (McpServerConfig, HooksConfig, ShellHookConfig) model structured configuration with their own serde attributes and defaults.
  • ConfigLoader::load() applies all four layers in order, using a merge() function that only overrides fields that differ from the default.
  • Environment variables provide the highest-priority override for the most commonly changed fields.
  • File loading is resilient: missing or unparseable files are silently skipped.

This pattern is reusable well beyond coding agents. Any CLI tool that needs per-project and per-user settings can use the same approach: define a config struct with serde defaults, load files from known paths, merge non-default values, and apply environment variable overrides last.

Chapter 17: Project Instructions

Every coding agent worth its salt understands the project it is working in. Claude Code reads CLAUDE.md files to learn your coding conventions, preferred libraries, and project-specific quirks. Your agent should do the same.

In this chapter you will build an InstructionLoader that discovers instruction files by walking the filesystem upward from the current directory, loads their contents, and formats them for injection into the agent’s system prompt. It is a small piece of infrastructure, but the payoff is immediate – your agent starts respecting project context the moment it launches.

Goal

Implement InstructionLoader so that:

  1. Given a starting directory, it walks upward toward the filesystem root looking for instruction files (e.g. CLAUDE.md).
  2. It returns discovered paths in root-first order (outermost files first, innermost last).
  3. It loads and concatenates file contents with clear headers.
  4. It produces a formatted section ready for the system prompt.

The discovery pattern

Consider a project with this layout:

/home/user/CLAUDE.md               <-- global preferences
/home/user/projects/CLAUDE.md      <-- org-level conventions
/home/user/projects/my-app/CLAUDE.md  <-- project-specific rules
/home/user/projects/my-app/src/    <-- you are here

When the agent starts in /home/user/projects/my-app/src/, it should walk upward, checking each directory for instruction files. After collecting everything, it reverses the list so that the broadest context (closest to root) appears first and project-specific overrides appear last. This mirrors how Claude Code layers its own CLAUDE.md files.

flowchart TB
    A["/home/user/projects/my-app/src/"] -->|"parent()"| B["/home/user/projects/my-app/"]
    B -->|"parent()"| C["/home/user/projects/"]
    C -->|"parent()"| D["/home/user/"]
    D -->|"parent()"| E["/home/"]
    E -->|"parent()"| F["/"]
    B -. "CLAUDE.md found" .-> G["Collect"]
    C -. "CLAUDE.md found" .-> G
    D -. "CLAUDE.md found" .-> G
    G -->|"reverse()"| H["Root-first order"]

The implementation

Create a new file at mini-claw-code-starter/src/instructions.rs. You will also need to add pub mod instructions; to your lib.rs and re-export the struct:

#![allow(unused)]
fn main() {
pub use instructions::InstructionLoader;
}

The struct

InstructionLoader holds a list of file names to search for:

#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};

pub struct InstructionLoader {
    file_names: Vec<String>,
}
}

It is deliberately simple – no async, no caching, just a synchronous walker. Instruction files are tiny and loaded once at startup, so there is no need for the complexity of async I/O here.

Step 1: Constructors

Provide two ways to create a loader. The first accepts an explicit list of file names:

#![allow(unused)]
fn main() {
impl InstructionLoader {
    pub fn new(file_names: &[&str]) -> Self {
        Self {
            file_names: file_names.iter().map(|s| s.to_string()).collect(),
        }
    }
}
}

The second provides sensible defaults:

#![allow(unused)]
fn main() {
pub fn default_files() -> Self {
    Self::new(&["CLAUDE.md", ".mini-claw/instructions.md"])
}
}

This lets users customize the file names if they want, while the common case requires no configuration at all.

Step 2: discover() – filesystem traversal

This is the core method. It takes a starting directory and walks upward:

#![allow(unused)]
fn main() {
pub fn discover(&self, start_dir: &Path) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut dir = Some(start_dir.to_path_buf());

    while let Some(current) = dir {
        for name in &self.file_names {
            let candidate = current.join(name);
            if candidate.is_file() {
                found.push(candidate);
            }
        }
        dir = current.parent().map(|p| p.to_path_buf());
    }

    // Reverse so root-level files come first
    found.reverse();
    found
}
}

Walk through the key details:

  • dir = Some(start_dir.to_path_buf()) – We use Option<PathBuf> to drive the loop. When parent() returns None (we have reached the root), the loop ends.
  • Inner loop over file_names – At each directory level we check every file name in the search list. This means a single directory can contribute multiple instruction files if both CLAUDE.md and .mini-claw/instructions.md exist there.
  • candidate.is_file() – A synchronous filesystem check. We only collect paths that actually exist and are files.
  • found.reverse() – The traversal naturally produces innermost-first order (we start at the deepest directory). Reversing gives us root-first order, which is what we want for layering: broad context first, specific overrides last.
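The whole walk can be exercised with std alone. This sketch re-implements the loop for a single file name and checks the root-first ordering against a throwaway tree under the system temp directory (the directory and file names here are invented for the demo):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// The same upward walk as discover(), parameterized over one file name.
fn discover(start: &Path, name: &str) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut dir = Some(start.to_path_buf());
    while let Some(current) = dir {
        let candidate = current.join(name);
        if candidate.is_file() {
            found.push(candidate);
        }
        dir = current.parent().map(|p| p.to_path_buf());
    }
    found.reverse(); // innermost-first -> root-first
    found
}

fn main() -> std::io::Result<()> {
    // Build <tmp>/discover_demo/project/src with NOTES.md at two levels.
    let root = std::env::temp_dir().join("discover_demo");
    let inner = root.join("project").join("src");
    fs::create_dir_all(&inner)?;
    fs::write(root.join("NOTES.md"), "outer")?;
    fs::write(root.join("project").join("NOTES.md"), "inner")?;

    let found = discover(&inner, "NOTES.md");
    let n = found.len();
    assert!(n >= 2);
    // Outermost file first, innermost last.
    assert!(found[n - 2].ends_with("discover_demo/NOTES.md"));
    assert!(found[n - 1].ends_with("project/NOTES.md"));
    println!("root-first ordering confirmed");
    Ok(())
}
```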

Step 3: load() – reading and concatenating

With discovery in hand, loading is straightforward:

#![allow(unused)]
fn main() {
pub fn load(&self, start_dir: &Path) -> Option<String> {
    let paths = self.discover(start_dir);
    if paths.is_empty() {
        return None;
    }

    let mut sections = Vec::new();
    for path in &paths {
        if let Ok(content) = std::fs::read_to_string(path) {
            let content = content.trim().to_string();
            if !content.is_empty() {
                sections.push(format!(
                    "# Instructions from {}\n\n{}",
                    path.display(),
                    content
                ));
            }
        }
    }

    if sections.is_empty() {
        None
    } else {
        Some(sections.join("\n\n---\n\n"))
    }
}
}

A few things to note:

  • Returns Option<String> – None means no instruction files were found (or all were empty). This makes it easy for the caller to skip injection entirely.
  • content.trim() – Strips leading/trailing whitespace so empty files (or files with only whitespace) are excluded.
  • Header per file – Each section starts with # Instructions from /path/to/CLAUDE.md so the LLM (and you, when debugging) can see exactly where each instruction came from.
  • --- separator – A horizontal rule between sections keeps the output readable when multiple files are concatenated.

Step 4: system_prompt_section() – ready for the agent

The final method wraps the loaded content with a preamble that tells the LLM to follow the instructions:

#![allow(unused)]
fn main() {
pub fn system_prompt_section(&self, start_dir: &Path) -> Option<String> {
    self.load(start_dir).map(|content| {
        format!(
            "The following project instructions were loaded automatically. \
             Follow them carefully:\n\n{content}"
        )
    })
}
}

This returns None when there are no instructions, so integrating it is clean:

#![allow(unused)]
fn main() {
// In your agent setup code:
let loader = InstructionLoader::default_files();
if let Some(section) = loader.system_prompt_section(&current_dir) {
    messages.insert(0, Message::System(section));
}
}

The Message::System variant you defined back in Chapter 1 is the right place for this. System messages sit at the front of the conversation and guide the LLM’s behavior for the entire session.

Integrating with the agent

To wire this into your agent, add instruction loading to your startup code (for example, in main() or wherever you build the initial message list). The pattern is:

  1. Determine the current working directory.
  2. Create an InstructionLoader (usually with default_files()).
  3. Call system_prompt_section().
  4. If it returns Some, prepend a Message::System to your conversation.
#![allow(unused)]
fn main() {
use std::env;
use mini_claw_code::{InstructionLoader, Message};

let cwd = env::current_dir().expect("failed to get current directory");
let loader = InstructionLoader::default_files();

let mut messages = Vec::new();
if let Some(instructions) = loader.system_prompt_section(&cwd) {
    messages.push(Message::System(instructions));
}
// ... continue with user prompt and agent loop
}

That is it. No changes to the agent loop, no changes to the provider. The instructions flow in as part of the system prompt and the LLM sees them on every turn.

Running the tests

Run the Chapter 17 tests:

cargo test -p mini-claw-code-starter ch17

What the tests verify

  • test_ch17_discover_in_current_dir: Creates a temp directory with a CLAUDE.md file and verifies discover() finds it.
  • test_ch17_discover_in_parent: Creates a CLAUDE.md in a parent directory and starts discovery from a child. The file should still be found.
  • test_ch17_no_files_found: Searches for a nonexistent file name and verifies the result is empty.
  • test_ch17_load_content: Writes a CLAUDE.md with known content and verifies load() returns it.
  • test_ch17_load_empty_file: An empty file should cause load() to return None – empty instructions are not useful.
  • test_ch17_multiple_file_names: Creates both CLAUDE.md and .mini-claw/instructions.md in the same directory and verifies both are loaded.
  • test_ch17_system_prompt_section: Verifies the output includes the preamble text (“project instructions”) and the file content.
  • test_ch17_default_files: Confirms default_files() does not panic.

Recap

You built a project instruction loader with three layers:

  • discover() walks the filesystem upward, collecting instruction file paths in root-first order.
  • load() reads and concatenates those files with clear headers and separators.
  • system_prompt_section() wraps the result for direct injection into Message::System.

The key design choices:

  • Root-first ordering ensures broad conventions appear before project-specific overrides, letting the LLM resolve conflicts by giving priority to the most specific instructions (which appear last).
  • Option<String> return types make it trivial to skip injection when no files exist.
  • Synchronous I/O is appropriate here – instruction files are small and loaded once at startup.

Your agent now reads project context automatically. Drop a CLAUDE.md in any directory and the agent picks it up. This is the same pattern that makes tools like Claude Code project-aware from the first prompt.

Chapter 18: Safety Rails

Your agent can now read files, write files, edit code, and run arbitrary shell commands. Take a moment to appreciate what that means: the LLM – a statistical model that occasionally hallucinates – has direct access to your file system and can execute any command your user account can. It can rm -rf /. It can read /etc/passwd. It can overwrite the .env file that holds your API keys. That is terrifying.

Production coding agents like Claude Code invest heavily in multi-layered safety. In this chapter you will build a miniature version of those safety rails: a set of composable checks that run before every tool call, blocking dangerous operations before they reach the file system or shell.

flowchart LR
    LLM -- "tool call" --> SC["Safety Checks"]
    SC -- "pass" --> Tool
    SC -- "blocked" --> Err["Error returned<br/>to LLM"]

Goal

Implement four types in safety.rs:

  1. SafetyCheck trait – the common interface every check implements.
  2. PathValidator – ensures file paths stay inside an allowed directory.
  3. CommandFilter – blocks dangerous shell commands by glob pattern.
  4. ProtectedFileCheck – prevents writes to sensitive files like .env.

Then implement SafeToolWrapper – a decorator that wraps any Box<dyn Tool> with a list of safety checks, running them before delegating to the inner tool.

Key Rust concepts

The decorator pattern with trait objects

Rust does not have class inheritance, but you can achieve the decorator pattern with trait objects. A decorator struct holds a Box<dyn Tool> and itself implements Tool. From the outside it looks like any other tool. Inside, it adds behavior (safety checks) before delegating to the wrapped tool.

#![allow(unused)]
fn main() {
struct SafeToolWrapper {
    inner: Box<dyn Tool>,
    checks: Vec<Box<dyn SafetyCheck>>,
}

impl Tool for SafeToolWrapper {
    fn definition(&self) -> &ToolDefinition {
        self.inner.definition()  // delegate
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        // run checks first, then delegate
        self.inner.call(args).await
    }
}
}

This is the same idea as Python’s functools.wraps or the classic Gang of Four decorator, but expressed through Rust’s trait system.
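The gate-then-delegate flow is easier to see with simplified, synchronous stand-in traits – a sketch under those simplifications, not the book's real Tool and SafetyCheck signatures:

```rust
// Simplified stand-ins: no async, string arguments, string errors.
trait SafetyCheck {
    fn check(&self, tool_name: &str) -> Result<(), String>;
}

trait Tool {
    fn call(&self, args: &str) -> Result<String, String>;
}

struct EchoTool;
impl Tool for EchoTool {
    fn call(&self, args: &str) -> Result<String, String> {
        Ok(format!("ran with {args}"))
    }
}

struct BlockBash;
impl SafetyCheck for BlockBash {
    fn check(&self, tool_name: &str) -> Result<(), String> {
        if tool_name == "bash" { Err("bash is blocked".into()) } else { Ok(()) }
    }
}

// The decorator: looks like a Tool from the outside, but gates the
// inner call on every configured check before delegating.
struct SafeToolWrapper {
    name: String,
    inner: Box<dyn Tool>,
    checks: Vec<Box<dyn SafetyCheck>>,
}

impl Tool for SafeToolWrapper {
    fn call(&self, args: &str) -> Result<String, String> {
        for check in &self.checks {
            check.check(&self.name)?; // any Err short-circuits the call
        }
        self.inner.call(args)
    }
}

fn main() {
    let safe_bash = SafeToolWrapper {
        name: "bash".into(),
        inner: Box::new(EchoTool),
        checks: vec![Box::new(BlockBash)],
    };
    assert!(safe_bash.call("ls").is_err()); // blocked before the inner tool runs

    let safe_read = SafeToolWrapper {
        name: "read".into(),
        inner: Box::new(EchoTool),
        checks: vec![Box::new(BlockBash)],
    };
    assert_eq!(safe_read.call("main.rs").unwrap(), "ran with main.rs");
    println!("decorator gates correctly");
}
```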

std::path::Path::canonicalize()

Canonicalizing a path resolves all ., .., and symbolic links, producing an absolute path that cannot be tricked by directory traversal:

#![allow(unused)]
fn main() {
let sneaky = Path::new("/home/user/project/../../../etc/passwd");
let resolved = sneaky.canonicalize()?;
// resolved == "/etc/passwd"
}

This is how you defeat ../ attacks. After canonicalization, a simple starts_with check is enough to verify containment.
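You can verify that behavior with std alone. The sketch below builds a hypothetical sandbox under the temp directory (the helper name and paths are made up for the demo) and shows canonicalize() collapsing a ../ escape:

```rust
use std::fs;
use std::path::Path;

// True when `target` canonicalizes to somewhere inside `allowed`.
// Both sides are canonicalized so symlinks and `..` cannot fool the check.
fn contained(target: &Path, allowed: &Path) -> std::io::Result<bool> {
    Ok(target.canonicalize()?.starts_with(allowed.canonicalize()?))
}

fn main() -> std::io::Result<()> {
    // Hypothetical sandbox: <tmp>/contain_demo/project/src
    let project = std::env::temp_dir().join("contain_demo").join("project");
    fs::create_dir_all(project.join("src"))?;

    // `..` segments are collapsed, so the escape is plainly visible.
    assert!(!contained(&project.join("..").join(".."), &project)?);
    // An honest path inside the sandbox passes the same check.
    assert!(contained(&project.join("src"), &project)?);
    println!("traversal escape detected");
    Ok(())
}
```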

glob::Pattern

The glob crate provides Unix-style glob matching. You will use it to match both commands and file paths against patterns:

#![allow(unused)]
fn main() {
let pattern = glob::Pattern::new("sudo *").unwrap();
assert!(pattern.matches("sudo reboot"));
assert!(!pattern.matches("echo hello"));
}

The * matches any sequence of characters, ? matches any single character, and [abc] matches character classes. This gives you flexible pattern-based filtering without writing complex regex.


Step 1: The SafetyCheck trait

Open mini-claw-code-starter/src/safety.rs and start with the trait that all safety checks will implement.

#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};

use async_trait::async_trait;
use serde_json::Value;

use crate::types::{Tool, ToolDefinition};

/// A check that runs before a tool call is executed.
///
/// Implementations validate tool arguments and return `Ok(())` to allow
/// execution or `Err(reason)` to block it.
pub trait SafetyCheck: Send + Sync {
    fn check(&self, tool_name: &str, args: &Value) -> Result<(), String>;
}
}

A few things to notice:

  • The method is synchronous. Safety checks inspect arguments – they do not need to do I/O or anything async. Keeping them sync makes them cheap and easy to compose.
  • It returns Result<(), String>, not anyhow::Result. The String error is the human-readable reason the check failed. This keeps safety checks self-contained with no dependency on anyhow.
  • The trait is Send + Sync because tools run inside an async runtime and may be shared across tasks.
  • Every check receives the tool name and the raw arguments. This lets a single check implementation decide which tools it cares about (e.g. a path validator only inspects read, write, and edit).

Step 2: PathValidator

The first real check prevents directory traversal attacks. A user (or a confused LLM) might ask to read ../../etc/passwd or write to /root/.ssh/authorized_keys. PathValidator ensures every file path resolves to somewhere inside an allowed directory.

The struct

#![allow(unused)]
fn main() {
pub struct PathValidator {
    allowed_dir: PathBuf,
}

impl PathValidator {
    pub fn new(allowed_dir: impl Into<PathBuf>) -> Self {
        Self {
            allowed_dir: allowed_dir.into(),
        }
    }
}
}

The core method: validate_path

This is where the real logic lives. The method takes a raw path string and either accepts or rejects it.

#![allow(unused)]
fn main() {
pub fn validate_path(&self, path: &str) -> Result<(), String> {
    let target = Path::new(path);

    // Resolve to absolute path
    let resolved = if target.is_absolute() {
        target.to_path_buf()
    } else {
        self.allowed_dir.join(target)
    };
}

If the path is relative (like src/main.rs), we join it with the allowed directory to get an absolute path. If it is already absolute, we use it as-is.

Next, canonicalize both paths. This is the critical step – it collapses any .. segments:

#![allow(unused)]
fn main() {
    let canonical_allowed = self
        .allowed_dir
        .canonicalize()
        .map_err(|e| format!("cannot resolve allowed directory: {e}"))?;

    let canonical_target = if resolved.exists() {
        resolved
            .canonicalize()
            .map_err(|e| format!("cannot resolve path: {e}"))?
}

But what about new files that do not exist yet? You cannot canonicalize a non-existent path. The trick is to canonicalize the parent directory and then append the filename:

#![allow(unused)]
fn main() {
    } else {
        // For new files, check the parent directory
        let parent = resolved.parent().ok_or("invalid path")?;
        if parent.exists() {
            let mut canonical = parent
                .canonicalize()
                .map_err(|e| format!("cannot resolve parent: {e}"))?;
            if let Some(filename) = resolved.file_name() {
                canonical.push(filename);
            }
            canonical
        } else {
            return Err(format!(
                "parent directory does not exist: {}",
                parent.display()
            ));
        }
    };
}

Finally, the containment check. After canonicalization, starts_with is safe:

#![allow(unused)]
fn main() {
    if canonical_target.starts_with(&canonical_allowed) {
        Ok(())
    } else {
        Err(format!(
            "path {} is outside allowed directory {}",
            canonical_target.display(),
            canonical_allowed.display()
        ))
    }
}
}
flowchart TD
    A["Raw path string"] --> B["Resolve to absolute"]
    B --> C{"File exists?"}
    C -- "yes" --> D["canonicalize()"]
    C -- "no" --> E["canonicalize parent<br/>+ append filename"]
    D --> F{"starts_with<br/>allowed_dir?"}
    E --> F
    F -- "yes" --> G["Ok(())"]
    F -- "no" --> H["Err: outside allowed dir"]

Implementing SafetyCheck for PathValidator

The trait implementation decides which tools this check applies to. Path validation only makes sense for tools that take a "path" argument:

#![allow(unused)]
fn main() {
impl SafetyCheck for PathValidator {
    fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
        match tool_name {
            "read" | "write" | "edit" => {
                if let Some(path) = args.get("path").and_then(|v| v.as_str()) {
                    self.validate_path(path)
                } else {
                    Ok(()) // No path argument, nothing to check
                }
            }
            _ => Ok(()),
        }
    }
}
}

Notice the _ => Ok(()) arm. The bash tool does not have a "path" argument, so the path validator silently allows it. Each check is responsible only for what it understands.


Step 3: CommandFilter

The second layer blocks dangerous shell commands. You do not want the LLM to run rm -rf /, sudo anything, or write directly to block devices.

The struct

#![allow(unused)]
fn main() {
pub struct CommandFilter {
    blocked_patterns: Vec<glob::Pattern>,
}
}

Constructor and defaults

#![allow(unused)]
fn main() {
impl CommandFilter {
    pub fn new(patterns: &[String]) -> Self {
        Self {
            blocked_patterns: patterns
                .iter()
                .filter_map(|p| glob::Pattern::new(p).ok())
                .collect(),
        }
    }

    pub fn default_filters() -> Self {
        Self::new(&[
            "rm -rf /".into(),
            "rm -rf /*".into(),
            "sudo *".into(),
            "> /dev/sda*".into(),
            "mkfs.*".into(),
            "dd if=*of=/dev/*".into(),
            ":(){:|:&};:".into(),
        ])
    }
}
}

The default_filters() method creates a baseline set of blocked patterns. That last one – :(){:|:&};: – is the infamous bash fork bomb. The filter_map call in the constructor silently drops any patterns that fail to parse, which is a reasonable default for a list of glob strings.

The matching method

#![allow(unused)]
fn main() {
pub fn is_blocked(&self, command: &str) -> Option<&str> {
    let trimmed = command.trim();
    for pattern in &self.blocked_patterns {
        if pattern.matches(trimmed) {
            return Some(pattern.as_str());
        }
    }
    None
}
}

It returns Some(pattern_str) when a match is found so the error message can tell the user which pattern was triggered. Returning None means the command is allowed.
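To see the matching behavior in isolation, here is a dependency-free sketch of the same trim-then-first-match logic. A trailing-`*` prefix check stands in for glob::Pattern so the example runs with std alone:

```rust
// Simplified stand-in for CommandFilter::is_blocked: the first
// matching pattern wins. The real code uses glob::Pattern; a
// trailing-'*' prefix check substitutes for it here.
fn is_blocked<'a>(patterns: &[&'a str], command: &str) -> Option<&'a str> {
    let trimmed = command.trim();
    patterns
        .iter()
        .find(|p| match p.strip_suffix('*') {
            Some(prefix) => trimmed.starts_with(prefix), // "sudo *" style
            None => trimmed == **p,                      // exact command
        })
        .copied()
}

fn main() {
    let patterns = ["rm -rf /", "sudo *"];
    // Leading/trailing whitespace is trimmed before matching.
    assert_eq!(is_blocked(&patterns, "  sudo reboot "), Some("sudo *"));
    assert_eq!(is_blocked(&patterns, "rm -rf /"), Some("rm -rf /"));
    assert_eq!(is_blocked(&patterns, "cargo test"), None);
}
```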

Implementing SafetyCheck

#![allow(unused)]
fn main() {
impl SafetyCheck for CommandFilter {
    fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
        if tool_name != "bash" {
            return Ok(());
        }
        if let Some(command) = args.get("command").and_then(|v| v.as_str()) {
            if let Some(pattern) = self.is_blocked(command) {
                Err(format!("blocked command matching pattern `{pattern}`"))
            } else {
                Ok(())
            }
        } else {
            Ok(())
        }
    }
}
}

This check only fires for the bash tool. It extracts the "command" argument and tests it against every blocked pattern. Clean and focused.


Step 4: ProtectedFileCheck

The third layer protects sensitive files from being overwritten. Even if a path is inside the allowed directory, you might not want the LLM writing to .env, .git/config, or credentials.json.

The struct

#![allow(unused)]
fn main() {
pub struct ProtectedFileCheck {
    patterns: Vec<glob::Pattern>,
}

impl ProtectedFileCheck {
    pub fn new(patterns: &[String]) -> Self {
        Self {
            patterns: patterns
                .iter()
                .filter_map(|p| glob::Pattern::new(p).ok())
                .collect(),
        }
    }
}
}

Implementing SafetyCheck

This check only applies to write operations (write and edit). Reading a sensitive file is less dangerous than overwriting it:

#![allow(unused)]
fn main() {
impl SafetyCheck for ProtectedFileCheck {
    fn check(&self, tool_name: &str, args: &Value) -> Result<(), String> {
        match tool_name {
            "write" | "edit" => {
                if let Some(path) = args.get("path").and_then(|v| v.as_str()) {
                    for pattern in &self.patterns {
                        if pattern.matches(path)
                            || pattern.matches(
                                Path::new(path)
                                    .file_name()
                                    .unwrap_or_default()
                                    .to_str()
                                    .unwrap_or(""),
                            )
                        {
                            return Err(format!(
                                "file `{path}` is protected (matches pattern `{}`)",
                                pattern.as_str()
                            ));
                        }
                    }
                    Ok(())
                } else {
                    Ok(())
                }
            }
            _ => Ok(()),
        }
    }
}
}

There is a subtlety here: the check matches the pattern against both the full path and just the filename. This means a pattern like .env will match /home/user/project/.env as well as just .env. Without this, a user would need to write patterns for every possible directory prefix.
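You can verify the filename extraction with std alone:

```rust
use std::path::Path;

fn main() {
    // A protected pattern like ".env" does not match the full path,
    // which is why ProtectedFileCheck also tests the bare filename.
    let full = "/home/user/project/.env";
    let name = Path::new(full)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("");
    assert_eq!(name, ".env");
    assert_ne!(full, ".env"); // full-path comparison alone would miss it
}
```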


Step 5: SafeToolWrapper – the decorator

Now you have three independent safety checks. The final piece is the glue that attaches them to actual tools. SafeToolWrapper wraps any Box<dyn Tool> and runs all checks before delegating to the inner tool.

The struct

#![allow(unused)]
fn main() {
pub struct SafeToolWrapper {
    inner: Box<dyn Tool>,
    checks: Vec<Box<dyn SafetyCheck>>,
}
}

Two fields: the wrapped tool and a list of checks (each a trait object). This means you can mix and match checks freely – attach just a path validator, or stack all three.

Constructors

#![allow(unused)]
fn main() {
impl SafeToolWrapper {
    pub fn new(tool: Box<dyn Tool>, checks: Vec<Box<dyn SafetyCheck>>) -> Self {
        Self {
            inner: tool,
            checks,
        }
    }

    pub fn with_check(tool: Box<dyn Tool>, check: impl SafetyCheck + 'static) -> Self {
        Self::new(tool, vec![Box::new(check)])
    }
}
}

with_check is a convenience for the common case of a single check. The 'static bound is required because the check is stored as a boxed trait object (Box<dyn SafetyCheck> is shorthand for Box<dyn SafetyCheck + 'static>), so it must not borrow data with a shorter lifetime.

Implementing Tool

This is the core of the decorator pattern:

#![allow(unused)]
fn main() {
#[async_trait]
impl Tool for SafeToolWrapper {
    fn definition(&self) -> &ToolDefinition {
        self.inner.definition()
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let tool_name = self.inner.definition().name;
        for check in &self.checks {
            if let Err(reason) = check.check(tool_name, &args) {
                return Ok(format!("error: safety check failed: {reason}"));
            }
        }
        self.inner.call(args).await
    }
}
}

Key design decisions:

  1. definition() delegates directly. The wrapped tool’s schema is unchanged. The LLM sees the exact same tool definition – it has no idea safety checks exist. The safety layer is invisible.

  2. Failed checks return Ok(...), not Err(...). This is intentional. A safety check failure is not a program crash – it is a message back to the LLM explaining why the operation was blocked. The LLM can then adjust its approach. If we returned Err, the agent loop might interpret it as a fatal error and abort.

  3. Checks run sequentially and fail fast. The first failing check blocks the tool call immediately; the remaining checks never run. One “no” is enough.

  4. The tool name comes from the inner tool’s definition. This means checks see the real tool name (e.g. "read", "bash") and can filter accordingly.


Putting it together

Here is how you would wire up safety checks when building your agent:

#![allow(unused)]
fn main() {
use crate::safety::*;
use crate::tools::*;

// Create a ReadTool with path validation
let allowed_dir = std::env::current_dir().unwrap();
let validator = PathValidator::new(&allowed_dir);
let safe_read = SafeToolWrapper::with_check(
    Box::new(ReadTool::new()),
    validator,
);

// Create a BashTool with command filtering
let safe_bash = SafeToolWrapper::with_check(
    Box::new(BashTool),
    CommandFilter::default_filters(),
);

// Create a WriteTool with multiple checks
let safe_write = SafeToolWrapper::new(
    Box::new(WriteTool),
    vec![
        Box::new(PathValidator::new(&allowed_dir)),
        Box::new(ProtectedFileCheck::new(&[
            ".env".into(),
            ".env.*".into(),
            "*.pem".into(),
            "*.key".into(),
        ])),
    ],
);
}

Because SafeToolWrapper itself implements Tool, it slots into the existing ToolSet with no changes to the agent loop. The agent does not know or care that safety checks exist. This is the power of the decorator pattern – you add behavior without modifying existing code.
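If the async plumbing obscures the shape, here is the same decorator idea in a minimal synchronous sketch. The names are illustrative stand-ins, not the book's real Tool trait:

```rust
// Minimal synchronous sketch of the decorator pattern: the wrapper
// implements the same trait as the object it wraps, so callers cannot
// tell the difference.
trait Tool {
    fn name(&self) -> &str;
    fn call(&self, arg: &str) -> String;
}

struct Echo;
impl Tool for Echo {
    fn name(&self) -> &str { "echo" }
    fn call(&self, arg: &str) -> String { arg.to_string() }
}

struct Checked {
    inner: Box<dyn Tool>,
}
impl Tool for Checked {
    fn name(&self) -> &str {
        self.inner.name() // delegate unchanged, like definition()
    }
    fn call(&self, arg: &str) -> String {
        if arg.contains("forbidden") {
            // Report the block as a normal result, not a panic or Err.
            return "error: safety check failed".to_string();
        }
        self.inner.call(arg) // delegate to the wrapped tool
    }
}

fn main() {
    let tool: Box<dyn Tool> = Box::new(Checked { inner: Box::new(Echo) });
    assert_eq!(tool.name(), "echo");
    assert_eq!(tool.call("hello"), "hello");
    assert_eq!(tool.call("forbidden thing"), "error: safety check failed");
}
```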


Running the tests

Run the Chapter 18 tests:

cargo test -p mini-claw-code ch18

What the tests verify

PathValidator:

  • test_ch18_path_within_allowed: A file inside the allowed directory is accepted.
  • test_ch18_path_outside_allowed: /etc/passwd is rejected when the allowed directory is a temp dir.
  • test_ch18_path_traversal_blocked: A path like allowed/sub/../../../etc/passwd is rejected after canonicalization.
  • test_ch18_path_new_file_in_allowed: A file that does not exist yet but whose parent is inside the allowed directory is accepted.
  • test_ch18_safety_check_read_tool: The SafetyCheck impl correctly checks paths for the read tool.
  • test_ch18_safety_check_ignores_bash: The PathValidator ignores the bash tool (no "path" argument).

CommandFilter:

  • test_ch18_command_filter_blocks_rm_rf: rm -rf / and rm -rf /* are blocked.
  • test_ch18_command_filter_blocks_sudo: sudo rm file matches the sudo * pattern.
  • test_ch18_command_filter_allows_safe: ls -la, echo hello, and cargo test pass through.
  • test_ch18_command_filter_safety_check: The SafetyCheck impl blocks sudo reboot via the bash tool and allows echo safe.
  • test_ch18_custom_blocked_commands: Custom patterns like docker rm * and npm publish* work correctly.

ProtectedFileCheck:

  • test_ch18_protected_file_blocks_env: Writing to .env or .env.local is blocked.
  • test_ch18_protected_file_allows_normal: Writing to src/main.rs is allowed.

SafeToolWrapper:

  • test_ch18_wrapper_blocks_on_check_failure: A wrapped ReadTool returns a "safety check failed" message when the path is outside the allowed directory.
  • test_ch18_wrapper_allows_valid_call: A wrapped ReadTool successfully reads a file inside the allowed directory, proving the decorator delegates correctly.

Defense in depth

No single check catches everything. That is the point of layered security. Consider what happens when the LLM asks to write to /home/user/project/.env:

  1. PathValidator – checks if the path is inside the allowed directory. If the allowed directory is /home/user/project, this passes. The path is technically inside the project.
  2. ProtectedFileCheck – catches it. .env matches the protected pattern. The write is blocked.
  3. CommandFilter – does not apply. This is a write tool call, not bash.

Now consider rm -rf / via the bash tool:

  1. PathValidator – does not apply. bash has no "path" argument.
  2. ProtectedFileCheck – does not apply. This is not a write or edit.
  3. CommandFilter – catches it. The command matches rm -rf /.

And a path traversal attack via read:

  1. PathValidator – catches it. Canonicalization resolves the .. segments and the path ends up outside the allowed directory.
  2. The other checks never need to fire.

Each layer covers a different attack surface. Together they form a mesh that is much harder to slip through than any single check. This is the principle of defense in depth – do not rely on one gatekeeper; stack them.

Limitations

This is a tutorial implementation. A production safety system would also need:

  • Confirmation prompts for destructive but non-blocked operations (e.g. deleting a file within the project).
  • Rate limiting to prevent an LLM from making thousands of tool calls.
  • Regex-based command filtering for more precise matching than globs allow.
  • Audit logging so you can review every tool call after the fact.
  • Sandboxing (containers, VMs) as the ultimate backstop.

But the architecture you built here – a trait-based system of composable checks wired through a decorator – is exactly the right foundation. Adding more checks is just implementing one more SafetyCheck.

Recap

You built a safety layer from one trait and four types:

  Type                  Purpose                       Applies to
  SafetyCheck (trait)   Common interface              All checks
  PathValidator         Prevent directory traversal   read, write, edit
  CommandFilter         Block dangerous commands      bash
  ProtectedFileCheck    Guard sensitive files         write, edit
  SafeToolWrapper       Decorator that runs checks    Any Box<dyn Tool>

The key patterns:

  • Canonicalize before comparing – never trust raw path strings.
  • Glob matching – flexible pattern-based filtering for both commands and file paths.
  • Decorator pattern – wrap a trait object with additional behavior without modifying the original.
  • Defense in depth – layer independent checks so no single bypass defeats the entire system.

Your agent is no longer a terrifying root-access footgun. It still has power, but now that power flows through safety rails that you control.

Chapter 19: Permissions

If you’ve used Claude Code, you’ve seen this prompt:

  Claude wants to use bash:
    command: git status

  Allow? (y/n/always)

The agent doesn’t just run every tool call blindly. Before executing, it checks a permission system to decide: should this tool call proceed automatically, be blocked outright, or require user approval?

This is the permission system. Three possible decisions:

  • Allow – execute immediately, no questions asked.
  • Deny – block the call, return an error to the LLM.
  • Ask – pause and prompt the user for approval.

In this chapter you’ll build:

  1. A Permission enum with the three decisions.
  2. A PermissionRule that matches tool names using glob patterns.
  3. A PermissionEngine that evaluates rules in order, supports a default fallback, and remembers session-level overrides.

Why permissions?

Chapter 18 introduced safety rails – SafeToolWrapper blocks dangerous arguments (path traversal, rm -rf /) based on static checks. But safety checks are binary: pass or fail. They can’t express “this tool is fine for reading, but I want to approve writes.”

Permissions add a human-in-the-loop layer. A typical configuration might look like:

  Tool       Permission
  read       Allow
  bash       Ask
  write      Ask
  edit       Ask
  mcp__*     Deny
  (default)  Ask

The read tool runs freely. bash, write, and edit require approval. Any MCP tool is blocked entirely. Anything else falls through to the default: ask the user.

The Permission enum

Three variants, nothing more:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum Permission {
    /// Tool call is allowed without asking.
    Allow,
    /// Tool call is blocked without asking.
    Deny,
    /// User must be prompted for approval.
    Ask,
}
}

PartialEq lets tests assert on decisions. Clone is needed because evaluate() returns owned values (you’ll see why shortly).

PermissionRule

A rule pairs a glob pattern with a permission:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct PermissionRule {
    /// Glob pattern matching tool names (e.g. "bash", "write", "*").
    pub tool_pattern: String,
    /// The permission to assign when the pattern matches.
    pub permission: Permission,
}
}

The matches() method checks whether a tool name matches the rule’s pattern:

#![allow(unused)]
fn main() {
impl PermissionRule {
    pub fn new(tool_pattern: impl Into<String>, permission: Permission) -> Self {
        Self {
            tool_pattern: tool_pattern.into(),
            permission,
        }
    }

    /// Check if this rule matches a tool name.
    pub fn matches(&self, tool_name: &str) -> bool {
        if let Ok(pattern) = glob::Pattern::new(&self.tool_pattern) {
            pattern.matches(tool_name)
        } else {
            self.tool_pattern == tool_name
        }
    }
}
}

Glob patterns give you flexible matching:

  • "bash" – matches exactly bash.
  • "*" – matches everything (a catch-all rule).
  • "mcp__*" – matches any MCP tool (mcp__fs__read, mcp__git__status, etc.).

If the pattern string is invalid as a glob, matches() falls back to exact string comparison. This means plain tool names always work even if the glob crate can’t parse them.

PermissionEngine

The engine holds an ordered list of rules, a default permission, and a set of session-level overrides:

#![allow(unused)]
fn main() {
pub struct PermissionEngine {
    rules: Vec<PermissionRule>,
    default_permission: Permission,
    /// Session-level overrides (tool calls the user has already approved).
    session_allows: std::collections::HashSet<String>,
}
}

Construction

Three constructors cover the common cases:

#![allow(unused)]
fn main() {
impl PermissionEngine {
    pub fn new(rules: Vec<PermissionRule>, default_permission: Permission) -> Self {
        Self {
            rules,
            default_permission,
            session_allows: std::collections::HashSet::new(),
        }
    }

    /// Create an engine that asks for everything by default.
    pub fn ask_by_default(rules: Vec<PermissionRule>) -> Self {
        Self::new(rules, Permission::Ask)
    }

    /// Create an engine that allows everything (no permission checks).
    pub fn allow_all() -> Self {
        Self::new(vec![], Permission::Allow)
    }
}
}

allow_all() is useful during development or in trusted environments. ask_by_default() is the safe default – if a tool doesn’t match any rule, the user gets prompted.

The evaluate() method – your exercise

This is the core of the engine. Given a tool name and its arguments, return the permission decision.

The evaluation order is:

  1. Session overrides first. If the user already approved this tool during the current session, return Allow.
  2. Rules in order. Walk the rules list. The first rule whose pattern matches the tool name wins – return its permission.
  3. Default. If no rule matches, return the default permission.

Here is the signature:

#![allow(unused)]
fn main() {
/// Evaluate permission for a tool call.
///
/// Returns the permission decision. If the result is `Ask`, the caller
/// should prompt the user and then call `record_session_allow` if approved.
pub fn evaluate(&self, tool_name: &str, _args: &Value) -> Permission {
    todo!()
}
}

The _args parameter is reserved for future use – argument-level rules (e.g. “allow bash only for cargo test”) are a natural extension, but we won’t implement them here.

Implement evaluate() using the three-step logic above. The rest of this section shows the solution.

Solution

#![allow(unused)]
fn main() {
pub fn evaluate(&self, tool_name: &str, _args: &Value) -> Permission {
    // Check session-level overrides first
    if self.session_allows.contains(tool_name) {
        return Permission::Allow;
    }

    // Check rules in order
    for rule in &self.rules {
        if rule.matches(tool_name) {
            return rule.permission.clone();
        }
    }

    self.default_permission.clone()
}
}

Three things to note:

  1. Session overrides take priority over rules. Even if a rule says Ask for bash, a session override makes it Allow. This is intentional – when the user says “always allow” for a session, we honor that.
  2. First match wins. If two rules match the same tool, the first one in the list is used. This is the same precedence model used by firewalls, .gitignore, and most rule-based systems.
  3. clone() on the return. Permission is a simple enum, so cloning is cheap. We clone rather than returning a reference because the caller often needs to match on the owned value.

First-match semantics

The “first match wins” rule is important. Consider:

#![allow(unused)]
fn main() {
let rules = vec![
    PermissionRule::new("bash", Permission::Allow),
    PermissionRule::new("bash", Permission::Deny),  // never reached
];
let engine = PermissionEngine::new(rules, Permission::Ask);

assert_eq!(engine.evaluate("bash", &json!({})), Permission::Allow);
}

The second rule is dead code. This lets you put specific rules before broad ones:

#![allow(unused)]
fn main() {
let rules = vec![
    PermissionRule::new("read", Permission::Allow),   // specific
    PermissionRule::new("*", Permission::Ask),         // catch-all
];
}

read gets Allow. Everything else falls through to the wildcard and gets Ask.

Session-level overrides

When the user responds “always allow” (or just “y”) to a permission prompt, you don’t want to ask again for the same tool in the same session. The engine tracks this with a HashSet<String>:

#![allow(unused)]
fn main() {
/// Record that the user approved a tool for this session.
pub fn record_session_allow(&mut self, tool_name: &str) {
    self.session_allows.insert(tool_name.to_string());
}
}

The typical flow in an agent loop:

#![allow(unused)]
fn main() {
let permission = engine.evaluate("bash", &args);
match permission {
    Permission::Allow => { /* execute */ }
    Permission::Deny => { /* return error to LLM */ }
    Permission::Ask => {
        if user_approves() {
            engine.record_session_allow("bash");
            // execute
        } else {
            // return error to LLM
        }
    }
}
}

After record_session_allow("bash"), every subsequent evaluate("bash", ...) returns Allow immediately – the session override is checked before rules.

Note that session overrides are per-tool, not global:

#![allow(unused)]
fn main() {
let mut engine = PermissionEngine::ask_by_default(vec![]);
engine.record_session_allow("read");

assert_eq!(engine.evaluate("read", &json!({})), Permission::Allow);
assert_eq!(engine.evaluate("write", &json!({})), Permission::Ask); // still asks
}

Approving read doesn’t approve write. Each tool must be approved individually.
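The whole three-step order fits in a small std-only sketch. Exact-match and "*" rules stand in for the real engine's glob patterns:

```rust
use std::collections::HashSet;

// Std-only sketch of the evaluation order: session overrides, then
// first matching rule, then the default.
#[derive(Clone, Debug, PartialEq)]
enum Permission { Allow, Deny, Ask }

struct Engine {
    rules: Vec<(String, Permission)>,
    default_permission: Permission,
    session_allows: HashSet<String>,
}

impl Engine {
    fn evaluate(&self, tool: &str) -> Permission {
        // 1. Session overrides beat everything.
        if self.session_allows.contains(tool) {
            return Permission::Allow;
        }
        // 2. First matching rule wins.
        for (pattern, perm) in &self.rules {
            if pattern == tool || pattern == "*" {
                return perm.clone();
            }
        }
        // 3. Fall back to the default.
        self.default_permission.clone()
    }
}

fn main() {
    let mut engine = Engine {
        rules: vec![
            ("read".to_string(), Permission::Allow),
            ("mcp_x".to_string(), Permission::Deny),
        ],
        default_permission: Permission::Ask,
        session_allows: HashSet::new(),
    };
    assert_eq!(engine.evaluate("read"), Permission::Allow);
    assert_eq!(engine.evaluate("mcp_x"), Permission::Deny);
    assert_eq!(engine.evaluate("bash"), Permission::Ask); // default
    engine.session_allows.insert("bash".to_string());
    assert_eq!(engine.evaluate("bash"), Permission::Allow); // override wins
}
```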

Convenience methods

Two helpers reduce boilerplate at call sites:

#![allow(unused)]
fn main() {
/// Check if a tool is allowed (returns true for Allow, false for Deny/Ask).
pub fn is_allowed(&self, tool_name: &str, args: &Value) -> bool {
    matches!(self.evaluate(tool_name, args), Permission::Allow)
}

/// Check if a tool requires user approval.
pub fn needs_approval(&self, tool_name: &str, args: &Value) -> bool {
    matches!(self.evaluate(tool_name, args), Permission::Ask)
}
}

These are useful when you need a boolean check rather than a full match:

#![allow(unused)]
fn main() {
if engine.is_allowed("read", &args) {
    // fast path, no prompt needed
}
}

Composing with SafeToolWrapper and InputHandler

Permissions, safety checks, and user input are three independent layers that compose naturally. Here is how they fit together in an agent loop:

Tool call arrives
  |
  v
PermissionEngine::evaluate()
  |-- Allow --> SafeToolWrapper::call()
  |               |-- safety check passes --> inner tool executes
  |               |-- safety check fails  --> error returned to LLM
  |
  |-- Deny  --> error returned to LLM
  |
  |-- Ask   --> InputHandler::ask("Allow bash?", &["yes", "no"])
                  |-- user says yes --> record_session_allow() + execute
                  |-- user says no  --> error returned to LLM

Permissions decide whether to run. Safety checks (Ch18) validate how the tool is called. The InputHandler (Ch11) collects the user’s answer when permission is Ask.

In code, this might look like:

#![allow(unused)]
fn main() {
let permission = engine.evaluate(&call.name, &call.arguments);
match permission {
    Permission::Allow => {
        // SafeToolWrapper handles safety checks internally
        let result = tools.call(&call.name, call.arguments.clone()).await?;
        results.push((call.id.clone(), result));
    }
    Permission::Deny => {
        results.push((
            call.id.clone(),
            format!("error: tool '{}' is not permitted", call.name),
        ));
    }
    Permission::Ask => {
        let answer = input_handler
            .ask(
                &format!("Allow {} tool?", call.name),
                &["yes".into(), "no".into()],
            )
            .await?;
        if answer == "yes" {
            engine.record_session_allow(&call.name);
            let result = tools.call(&call.name, call.arguments.clone()).await?;
            results.push((call.id.clone(), result));
        } else {
            results.push((
                call.id.clone(),
                format!("error: user denied tool '{}'", call.name),
            ));
        }
    }
}
}

Each layer is optional. You can use permissions without safety checks, safety checks without permissions, or all three together. This is the benefit of composable design – each piece does one job.

Wiring it up

Add the module to mini-claw-code/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod permissions;
// ...
pub use permissions::{Permission, PermissionEngine, PermissionRule};
}

Running the tests

cargo test -p mini-claw-code ch19

The tests verify:

  • allow_all: PermissionEngine::allow_all() returns Allow for any tool.
  • ask_by_default: engine with no rules and Ask default returns Ask.
  • Rule matching: explicit rules for read, bash, write each return the correct permission.
  • Glob pattern: "mcp__*" matches mcp__fs__read but not read.
  • First rule wins: duplicate rules for bash – the first one wins.
  • Session allow: after record_session_allow("bash"), evaluate("bash") returns Allow.
  • Session allow per tool: approving read does not approve write.
  • is_allowed: returns true only for Allow, false for Deny and Ask.
  • needs_approval: returns true only for Ask.
  • Wildcard rule: "*" matches any tool name.
  • Deny overrides default: a Deny rule takes precedence over an Allow default.

Recap

  • Permission has three variants: Allow, Deny, Ask. Simple and exhaustive.
  • PermissionRule pairs a glob pattern with a permission decision. Glob matching supports wildcards for tool families like mcp__*.
  • PermissionEngine evaluates rules in order – first match wins. When no rule matches, the default permission applies.
  • Session overrides let the user approve a tool once and skip the prompt for the rest of the session. They take priority over rules.
  • Composable: permissions layer on top of SafeToolWrapper (Ch18) and InputHandler (Ch11) without coupling to either.
  • Purely additive: no changes to existing tools, agents, or safety checks.

Chapter 20: Hooks

Your agent can run tools, stream responses, ask the user questions, and plan before acting. But every new behavior – logging, auditing, blocking dangerous commands, running shell scripts on tool events – requires touching the agent loop directly. That does not scale.

Claude Code solves this with hooks: 12+ event types that let users and extensions inject custom behavior at key points without rebuilding the agent. Want to log every tool call? Register a hook. Want to block bash in production? Register a hook. Want to run a linter after every file write? Register a hook. The agent itself does not change.

In this chapter you will walk through:

  1. A HookEvent enum for the events hooks respond to.
  2. A HookAction enum for what hooks tell the agent to do.
  3. A Hook trait – the async interface every hook implements.
  4. A HookRegistry that stores hooks and dispatches events.
  5. Three built-in hooks: LoggingHook, BlockingHook, and ShellHook.
  6. How hooks integrate with the agent loop.

The event model

Open mini-claw-code/src/hooks.rs. At the top you will find two enums that define the vocabulary between hooks and the agent.

HookEvent

HookEvent describes what happened:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub enum HookEvent {
    /// Before a tool is executed.
    PreToolCall {
        tool_name: String,
        args: Value,
    },
    /// After a tool finishes executing.
    PostToolCall {
        tool_name: String,
        args: Value,
        result: String,
    },
    /// The agent is starting a new run.
    AgentStart {
        prompt: String,
    },
    /// The agent finished with a final response.
    AgentEnd {
        response: String,
    },
}
}

Four variants, each carrying the data a hook might need:

  • PreToolCall fires before a tool runs. It carries the tool name and the arguments the LLM chose. A hook can inspect these, log them, or decide to block the call entirely.
  • PostToolCall fires after a tool completes. It adds the result string so hooks can audit what happened.
  • AgentStart fires once when the agent begins a new run, carrying the user’s prompt.
  • AgentEnd fires once when the agent produces its final response.

This gives hooks four natural insertion points: two per tool call (before and after), plus the boundaries of the entire run.

HookAction

HookAction describes what should happen next:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum HookAction {
    /// Continue normally.
    Continue,
    /// Block the tool call with a reason.
    Block(String),
    /// Modify the tool arguments (PreToolCall only).
    ModifyArgs(Value),
}
}

Three options:

  • Continue – do nothing special, proceed as normal.
  • Block(reason) – abort the tool call. The reason string becomes the tool result so the LLM knows what happened and can adjust.
  • ModifyArgs(new_args) – replace the tool arguments before execution. This only makes sense for PreToolCall events (you cannot retroactively change args after the tool ran).

The combination of HookEvent and HookAction is the entire contract. Hooks receive events and return actions. Nothing more.

The Hook trait

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait Hook: Send + Sync {
    /// Handle an event and return an action.
    async fn on_event(&self, event: &HookEvent) -> HookAction;
}
}

One method. It takes an immutable reference to a HookEvent and returns a HookAction. The trait requires Send + Sync because hooks live inside the agent, which may be shared across threads (e.g. wrapped in Arc for TUI apps).

The method is async because some hooks need I/O – ShellHook spawns a child process, and future hooks might call HTTP endpoints. But simple hooks like LoggingHook just push to a Vec and return immediately.

HookRegistry

Individual hooks are useful, but you typically want multiple hooks active at once – a logger and a blocker and a shell script. HookRegistry manages the collection:

#![allow(unused)]
fn main() {
pub struct HookRegistry {
    hooks: Vec<Box<dyn Hook>>,
}
}

It stores hooks as trait objects in registration order. register() takes &mut self for imperative use. with() takes self and returns it for builder-pattern chaining:

#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
    .with(LoggingHook::new())
    .with(BlockingHook::new(vec!["bash".into()], "blocked"));
}

There is also is_empty() so the agent loop can skip dispatch entirely when no hooks are registered – a minor optimization, but a nice one.

Dispatch logic

The heart of the registry is dispatch():

#![allow(unused)]
fn main() {
pub async fn dispatch(&self, event: &HookEvent) -> HookAction {
    let mut modified_args: Option<Value> = None;

    for hook in &self.hooks {
        match hook.on_event(event).await {
            HookAction::Continue => {}
            HookAction::Block(reason) => return HookAction::Block(reason),
            HookAction::ModifyArgs(new_args) => {
                modified_args = Some(new_args);
            }
        }
    }

    match modified_args {
        Some(args) => HookAction::ModifyArgs(args),
        None => HookAction::Continue,
    }
}
}

Three rules govern dispatch:

  1. Iterate in order. Hooks fire in the order they were registered. Registration order is your priority system.

  2. Short-circuit on Block. The moment any hook returns Block, dispatch stops immediately and returns that Block. Hooks registered after the blocking hook never see the event. This is important for correctness – if a security hook blocks bash, a logging hook registered later should not log a call that never happened.

  3. Collect ModifyArgs. If multiple hooks modify args, the last one wins (each overwrites modified_args). If no hook blocked and at least one modified args, ModifyArgs is returned. If nobody did anything, Continue is returned.

This gives you a clean priority chain: blocking hooks should be registered before logging hooks so they can short-circuit first.
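The three rules are easy to verify in a synchronous sketch where plain closures stand in for the async hook trait objects:

```rust
// Synchronous sketch of the dispatch rules: Block short-circuits, and
// when several hooks modify args the last one wins.
#[derive(Clone, Debug, PartialEq)]
enum HookAction {
    Continue,
    Block(String),
    ModifyArgs(String),
}

fn dispatch(hooks: &[Box<dyn Fn() -> HookAction>]) -> HookAction {
    let mut modified = None;
    for hook in hooks {
        match hook() {
            HookAction::Continue => {}
            HookAction::Block(reason) => return HookAction::Block(reason), // stop now
            HookAction::ModifyArgs(args) => modified = Some(args),         // last wins
        }
    }
    match modified {
        Some(args) => HookAction::ModifyArgs(args),
        None => HookAction::Continue,
    }
}

fn main() {
    let modifiers: Vec<Box<dyn Fn() -> HookAction>> = vec![
        Box::new(|| HookAction::ModifyArgs("first".into())),
        Box::new(|| HookAction::ModifyArgs("second".into())),
        Box::new(|| HookAction::Continue),
    ];
    assert_eq!(dispatch(&modifiers), HookAction::ModifyArgs("second".into()));

    let blockers: Vec<Box<dyn Fn() -> HookAction>> = vec![
        Box::new(|| HookAction::Block("denied".into())),
        Box::new(|| -> HookAction { panic!("later hooks never see the event") }),
    ];
    assert_eq!(dispatch(&blockers), HookAction::Block("denied".into()));
}
```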

Built-in hooks

The module provides three hooks out of the box. They cover the most common patterns and serve as examples for writing your own.

LoggingHook

#![allow(unused)]
fn main() {
pub struct LoggingHook {
    log: std::sync::Mutex<Vec<String>>,
}
}

LoggingHook records a one-line summary of every event into a Vec<String>. Its on_event formats each variant into a compact tag – "pre:bash", "post:read", "agent:start", "agent:end" – pushes it into the vec behind the mutex, and returns Continue. Logging is observation, not intervention.

The messages() method clones and returns the accumulated log.

Notice this uses std::sync::Mutex, not tokio::sync::Mutex. The lock is held only long enough to push a string or clone the vec – no .await inside the critical section. A std::sync::Mutex is cheaper than a tokio::sync::Mutex for these short, synchronous operations. Compare this with MockInputHandler from Chapter 11, which needed tokio::sync::Mutex because its lock guard was held across an .await boundary.

LoggingHook is particularly useful in tests. Register it alongside other hooks, run the agent, and then inspect messages() to verify exactly which events fired and in what order.
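The tag format described above can be sketched with a std-only stand-in for HookEvent. The real event type carries args and results too; this keeps only what the tag needs:

```rust
// Simplified stand-in for the chapter's HookEvent, keeping only what the tag needs.
enum HookEvent {
    PreToolCall { tool_name: String },
    PostToolCall { tool_name: String },
    AgentStart,
    AgentEnd,
}

// Mirrors the compact tags LoggingHook pushes into its Vec<String>.
fn tag(event: &HookEvent) -> String {
    match event {
        HookEvent::PreToolCall { tool_name } => format!("pre:{tool_name}"),
        HookEvent::PostToolCall { tool_name } => format!("post:{tool_name}"),
        HookEvent::AgentStart => "agent:start".to_string(),
        HookEvent::AgentEnd => "agent:end".to_string(),
    }
}

fn main() {
    assert_eq!(tag(&HookEvent::PreToolCall { tool_name: "bash".into() }), "pre:bash");
    assert_eq!(tag(&HookEvent::AgentEnd), "agent:end");
}
```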

BlockingHook

#![allow(unused)]
fn main() {
pub struct BlockingHook {
    blocked_tools: Vec<String>,
    reason: String,
}
}

BlockingHook takes a list of tool names and a reason string. If a PreToolCall event matches any blocked tool, it returns Block:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl Hook for BlockingHook {
    async fn on_event(&self, event: &HookEvent) -> HookAction {
        if let HookEvent::PreToolCall { tool_name, .. } = event
            && self.blocked_tools.iter().any(|b| b == tool_name)
        {
            return HookAction::Block(self.reason.clone());
        }
        HookAction::Continue
    }
}
}

This uses a let-chain (same syntax as resolve_option in Chapter 11): the if let pattern match and the .any() check are joined with &&. If either condition fails, the hook returns Continue.

Use this for safety rails. For example, block bash in a read-only review mode:

#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
    .with(BlockingHook::new(
        vec!["bash".into(), "write".into(), "edit".into()],
        "read-only mode: mutation tools are disabled",
    ));
}

The LLM receives the reason string as the tool result, so it knows why the call was blocked and can adapt its approach.

ShellHook

#![allow(unused)]
fn main() {
pub struct ShellHook {
    command: String,
    tool_pattern: Option<glob::Pattern>,
}
}

ShellHook runs a shell command whenever a tool event fires. It is the escape hatch: anything you can do in a shell script, you can do in a hook.

The for_tool() builder method restricts the hook to tools matching a glob pattern. Without it, the hook fires on every tool event. With it, only matching tool names trigger the command. The glob crate provides Unix-style pattern matching – "write*" would match write and write_file, "*" matches everything.
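The builder and the filter can be sketched std-only. Note the real ShellHook uses glob::Pattern for matching; this stand-in supports just a literal name or a trailing `*`, enough to show the for_tool() shape:

```rust
// Std-only stand-in for ShellHook's filtering. The real code uses glob::Pattern;
// this simplified matcher handles only a literal pattern or a trailing '*'.
struct ShellHookFilter {
    tool_pattern: Option<String>,
}

impl ShellHookFilter {
    // Mirrors the for_tool() builder: restrict the hook to matching tool names.
    fn for_tool(mut self, pattern: &str) -> Self {
        self.tool_pattern = Some(pattern.to_string());
        self
    }

    // No pattern means "fire on every tool event".
    fn matches_tool(&self, name: &str) -> bool {
        match &self.tool_pattern {
            None => true,
            Some(p) => match p.strip_suffix('*') {
                Some(prefix) => name.starts_with(prefix),
                None => name == p,
            },
        }
    }
}

fn main() {
    let unfiltered = ShellHookFilter { tool_pattern: None };
    assert!(unfiltered.matches_tool("bash"));

    let writes = ShellHookFilter { tool_pattern: None }.for_tool("write*");
    assert!(writes.matches_tool("write"));
    assert!(writes.matches_tool("write_file"));
    assert!(!writes.matches_tool("read"));
}
```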

The Hook implementation only responds to PreToolCall and PostToolCall events (it ignores AgentStart and AgentEnd). It extracts the tool name, checks matches_tool(), then spawns the command with tokio::process::Command::new("sh").arg("-c").arg(&self.command):

#![allow(unused)]
fn main() {
match result {
    Ok(output) => {
        if output.status.success() {
            HookAction::Continue
        } else {
            let stderr = String::from_utf8_lossy(&output.stderr).to_string();
            HookAction::Block(format!("hook failed: {stderr}"))
        }
    }
    Err(e) => HookAction::Block(format!("hook error: {e}")),
}
}

If the command succeeds (exit code 0), the hook returns Continue. If it fails, the hook returns Block with the stderr output. This means a ShellHook can act as a gate: run a linter after a file write, and block the result if the linter fails.

Example – run cargo fmt --check after every write or edit:

#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
    .with(ShellHook::new("cargo fmt --check").for_tool("write"))
    .with(ShellHook::new("cargo fmt --check").for_tool("edit"));
}

Integrating with the agent loop

Hooks are designed to sit at two points in the agent loop: before and after tool execution. Here is how the dispatch points look conceptually in a hook-aware agent:

#![allow(unused)]
fn main() {
for call in &turn.tool_calls {
    // 1. Dispatch PreToolCall
    let pre_action = registry.dispatch(&HookEvent::PreToolCall {
        tool_name: call.name.clone(),
        args: call.arguments.clone(),
    }).await;

    let result = match pre_action {
        HookAction::Block(reason) => reason, // skip the tool entirely
        HookAction::ModifyArgs(new_args) => {
            tool.call(new_args).await.unwrap_or_else(|e| format!("error: {e}"))
        }
        HookAction::Continue => {
            tool.call(call.arguments.clone()).await.unwrap_or_else(|e| format!("error: {e}"))
        }
    };

    // 2. Dispatch PostToolCall
    registry.dispatch(&HookEvent::PostToolCall {
        tool_name: call.name.clone(),
        args: call.arguments.clone(),
        result: result.clone(),
    }).await;
}
}

The pattern is:

  1. Before execution: dispatch PreToolCall. If the action is Block, skip the tool entirely and use the reason as the result. If ModifyArgs, execute with the new args. If Continue, execute normally.

  2. After execution: dispatch PostToolCall with the result. The return action is typically Continue (you cannot undo a tool call), but hooks can still log, audit, or trigger side effects.

  3. Run boundaries: dispatch AgentStart at the beginning of run() and AgentEnd when the agent produces its final response.

The existing SimpleAgent and StreamingAgent do not have hooks wired in – this is an extension point you would add when building a production agent. The HookRegistry is intentionally separate so you can compose it into whatever agent architecture you have.

Tests

Run the tests with:

cargo test -p mini-claw-code ch20

The tests verify each component in isolation, then test composition:

  • LoggingHook: fires a single PreToolCall and checks messages() == ["pre:bash"]. A second test fires all four event types and asserts the log matches ["agent:start", "pre:read", "post:read", "agent:end"].
  • BlockingHook: PreToolCall for a blocked tool returns Block("bash is disabled"); the same hook returns Continue for read.
  • Registry dispatch: a registry with only LoggingHook returns Continue. Adding a BlockingHook produces Block for the targeted tool.
  • Multiple hooks: two LoggingHooks both see the event (both logs have length 1).
  • Short-circuit: the most important test. A BlockingHook is registered first, then a LoggingHook (wrapped in ArcHook, a thin adapter that forwards events to a shared LoggingHook so the test keeps a handle to its log):
#![allow(unused)]
fn main() {
let registry = HookRegistry::new()
    .with(BlockingHook::new(vec!["bash".into()], "blocked"))
    .with(ArcHook(log.clone()));

let action = registry.dispatch(&event).await;
assert_eq!(action, HookAction::Block("blocked".into()));

// The second hook should NOT have been called
assert_eq!(log.messages().len(), 0);
}

The logger never saw the event – Block stopped iteration. Registration order matters.

  • PostToolCall: LoggingHook correctly logs "post:write".
  • is_empty: an empty registry returns true; adding a hook flips it to false.

The observer/middleware pattern

If you have worked with web frameworks, hooks will feel familiar. They implement two overlapping patterns:

  • Observer pattern: hooks observe events without affecting them. LoggingHook is a pure observer – it watches everything and changes nothing.

  • Middleware pattern: hooks can intercept and modify the pipeline. BlockingHook short-circuits execution. ModifyArgs rewrites the request before it reaches the tool. This is middleware.

The HookRegistry is a middleware chain with observer capabilities. The dispatch loop is the pipeline, Block is early return, and ModifyArgs is request transformation.

This design keeps the agent loop clean. Instead of scattering if statements for every new behavior, you register hooks. The agent loop just calls dispatch() at two points and obeys the returned action. New behaviors are added by implementing Hook, not by modifying the agent.

Recap

  • HookEvent represents four lifecycle points: PreToolCall, PostToolCall, AgentStart, AgentEnd.
  • HookAction gives hooks three options: Continue, Block, or ModifyArgs.
  • Hook trait has a single async method: on_event.
  • HookRegistry dispatches events to hooks in order, short-circuiting on Block and collecting ModifyArgs.
  • LoggingHook records events for inspection – ideal for testing.
  • BlockingHook blocks specific tools by name – ideal for safety rails.
  • ShellHook runs arbitrary shell commands on tool events – the escape hatch for anything else.
  • Hooks follow the observer/middleware pattern: observe without changing, or intercept and modify the pipeline.
  • The agent loop stays clean – just call dispatch() before and after tool execution and obey the returned action.

Chapter 21: MCP – Model Context Protocol

Your agent has tools – read, write, bash, subagents – but they are all compiled into the binary. What happens when someone wants to give your agent access to a database, a Kubernetes cluster, or a Slack workspace?

You could write a Tool implementation for each one. That doesn’t scale. Every integration means new code, a new release, tight coupling.

MCP (Model Context Protocol) solves this. It is an open standard created by Anthropic that lets AI agents discover and use tools from external server processes. Claude Code uses MCP. Cursor uses MCP. There are hundreds of community MCP servers for everything from GitHub to PostgreSQL.

The idea: spawn a separate process that speaks JSON-RPC over stdio. Your agent asks “what tools do you have?”, gets back definitions, and calls them like any other tool. The server handles the integration. Your agent just speaks the protocol.

In this chapter you will:

  1. Understand the MCP protocol: JSON-RPC 2.0 over stdio, the handshake sequence, and the tool lifecycle.
  2. Define the protocol types: JsonRpcRequest, JsonRpcResponse, McpToolDef.
  3. Build McpClient: spawn a child process, perform the handshake, list tools, and call them.
  4. Implement McpTool: a wrapper that bridges MCP tools into the Tool trait so the agent loop handles them transparently.
  5. Wire it into the config system with McpServerConfig.

This is the capstone chapter. When you finish, your agent will be able to connect to any MCP server and use its tools – the same way the real Claude Code does.

The protocol

MCP uses JSON-RPC 2.0 over stdio. The client (your agent) spawns the server as a child process, writes JSON to its stdin, and reads JSON from its stdout. Each message is a single line of JSON terminated by a newline.

The lifecycle has three phases:

Client                          Server
  |                               |
  |--- initialize --------------->|   Phase 1: Handshake
  |<-- initialize result ---------|
  |-- notifications/initialized ->|
  |                               |
  |--- tools/list --------------->|   Phase 2: Discovery
  |<-- tools list ----------------|
  |                               |
  |--- tools/call --------------->|   Phase 3: Execution
  |<-- tool result ---------------|
  |          ...                  |

Phase 1: Handshake. The client sends initialize with its protocol version and capabilities. The server responds. The client sends notifications/initialized to signal completion.

Phase 2: Discovery. tools/list returns tool definitions – name, description, and JSON Schema for input parameters.

Phase 3: Execution. tools/call sends a tool name and arguments. The server executes and returns the result.

Every request is {"jsonrpc": "2.0", "id": 1, "method": "...", "params": {...}}. Responses carry either "result" or "error". That’s the entire protocol surface we need. MCP has more features (resources, prompts, sampling), but tools are the core.
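Concretely, the handshake and discovery might look like this on the wire – one JSON object per line, payloads illustrative (a hypothetical server exposing a single read_file tool; real capability objects vary, and a true notification carries no id):

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "mini-claw-code", "version": "0.1.0"}}}
{"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}}}}
{"jsonrpc": "2.0", "method": "notifications/initialized"}
{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 2, "result": {"tools": [{"name": "read_file", "description": "Read a file", "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}}}]}}
```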

Protocol types

Create mini-claw-code/src/mcp/types.rs. These types map directly to the JSON-RPC wire format.

#![allow(unused)]
fn main() {
use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Serialize)]
pub(crate) struct JsonRpcRequest {
    pub jsonrpc: &'static str,
    pub id: u64,
    pub method: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub params: Option<Value>,
}

impl JsonRpcRequest {
    pub fn new(id: u64, method: impl Into<String>, params: Option<Value>) -> Self {
        Self {
            jsonrpc: "2.0",
            id,
            method: method.into(),
            params,
        }
    }
}
}

jsonrpc is always "2.0" – no allocation. params uses skip_serializing_if because JSON-RPC omits the field when absent. id is a monotonically increasing u64 for matching responses to requests.

The response side:

#![allow(unused)]
fn main() {
#[derive(Deserialize)]
pub(crate) struct JsonRpcResponse {
    pub jsonrpc: String,
    pub id: u64,
    pub result: Option<Value>,
    pub error: Option<JsonRpcError>,
}

#[derive(Deserialize, Debug)]
pub(crate) struct JsonRpcError {
    pub code: i64,
    pub message: String,
}
}

And the MCP-specific types:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
pub struct McpToolDef {
    pub name: String,
    #[serde(default)]
    pub description: Option<String>,
    #[serde(rename = "inputSchema", default)]
    pub input_schema: Option<Value>,
}

#[derive(Deserialize)]
pub(crate) struct InitializeResult {
    pub capabilities: Option<Value>,
}

#[derive(Deserialize)]
pub(crate) struct ToolsListResult {
    pub tools: Vec<McpToolDef>,
}

#[derive(Deserialize)]
pub(crate) struct ToolCallResult {
    pub content: Vec<ToolCallContent>,
}

#[derive(Deserialize)]
pub(crate) struct ToolCallContent {
    #[serde(rename = "type")]
    pub type_: Option<String>,
    pub text: Option<String>,
}
}

McpToolDef is what the server returns from tools/list. The inputSchema field uses camelCase on the wire (MCP convention), so we rename it with serde. Both description and input_schema are optional – a minimal tool only needs a name.

ToolCallResult returns an array of content blocks (similar to Claude’s API). Each block has a type (usually "text") and a text field. We will extract and join the text blocks to produce a single string.
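As a concrete example of the shape these types expect, a tools/call result with two text blocks (payload illustrative) looks like:

```json
{
  "content": [
    { "type": "text", "text": "total 2" },
    { "type": "text", "text": "README.md  src/" }
  ]
}
```

Joining the text fields with newlines turns this into a single string the agent can hand back to the LLM as a tool result.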

Building McpClient

Create mini-claw-code/src/mcp/client.rs. The McpClient manages a child process and speaks JSON-RPC to it.

#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicU64, Ordering};

use anyhow::Context;
use serde_json::Value;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::{Child, Command};
use tokio::sync::Mutex;

pub struct McpClient {
    stdin: Mutex<tokio::process::ChildStdin>,
    stdout: Mutex<BufReader<tokio::process::ChildStdout>>,
    _child: Mutex<Child>,
    next_id: AtomicU64,
    server_name: String,
}
}

Why Mutex? Stdin and stdout are not Clone. We need shared access (McpTool holds an Arc<McpClient>), so we wrap them in tokio::sync::Mutex. The _child field holds ownership of the process so it doesn’t get dropped. AtomicU64 gives us lock-free request IDs.
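The lock-free ID counter is just fetch_add on the atomic. A std-only sketch of that piece in isolation:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Each request takes the current value and bumps the counter; no lock needed,
// so multiple tools sharing one Arc<McpClient> can issue IDs concurrently.
struct IdGen {
    next_id: AtomicU64,
}

impl IdGen {
    fn new() -> Self {
        Self { next_id: AtomicU64::new(1) }
    }

    // fetch_add returns the value *before* the increment.
    fn next(&self) -> u64 {
        self.next_id.fetch_add(1, Ordering::Relaxed)
    }
}

fn main() {
    let ids = IdGen::new();
    assert_eq!(ids.next(), 1);
    assert_eq!(ids.next(), 2);
    assert_eq!(ids.next(), 3);
}
```

Relaxed ordering is enough here: the IDs only need to be unique, not ordered relative to any other memory operation.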

Connecting and handshaking

The connect constructor spawns the process and performs the handshake:

#![allow(unused)]
fn main() {
impl McpClient {
    pub async fn connect(
        server_name: impl Into<String>,
        command: &str,
        args: &[String],
    ) -> anyhow::Result<Self> {
        let server_name = server_name.into();
        let mut child = Command::new(command)
            .args(args)
            .stdin(std::process::Stdio::piped())
            .stdout(std::process::Stdio::piped())
            .stderr(std::process::Stdio::null())
            .spawn()
            .with_context(|| format!("failed to spawn MCP server: {command}"))?;

        let stdin = child.stdin.take().context("failed to get stdin")?;
        let stdout = child.stdout.take().context("failed to get stdout")?;
        let client = Self {
            stdin: Mutex::new(stdin),
            stdout: Mutex::new(BufReader::new(stdout)),
            _child: Mutex::new(child),
            next_id: AtomicU64::new(1),
            server_name,
        };
        client.initialize().await?;
        Ok(client)
    }
}
}

We use tokio::process::Command for async I/O. Stderr goes to null – the protocol flows exclusively over stdout, so anything a server writes to stderr is diagnostic noise we can discard. The initialize method sends the two-part handshake:

#![allow(unused)]
fn main() {
async fn initialize(&self) -> anyhow::Result<()> {
    let params = serde_json::json!({
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": { "name": "mini-claw-code", "version": "0.1.0" }
    });

    let result = self.request("initialize", Some(params)).await?;
    let _: InitializeResult = serde_json::from_value(result)
        .context("failed to parse initialize response")?;

    // Send initialized notification
    let id = self.next_id.fetch_add(1, Ordering::Relaxed);
    let notification = JsonRpcRequest::new(id, "notifications/initialized", None);
    let mut payload = serde_json::to_string(&notification)?;
    payload.push('\n');

    let mut stdin = self.stdin.lock().await;
    stdin.write_all(payload.as_bytes()).await?;
    stdin.flush().await?;

    Ok(())
}
}

First initialize – a request-response pair. Then notifications/initialized – technically a notification, but we format it as a request for simplicity. The core method driving all communication:

#![allow(unused)]
fn main() {
async fn request(&self, method: &str, params: Option<Value>) -> anyhow::Result<Value> {
    let id = self.next_id.fetch_add(1, Ordering::Relaxed);
    let request = JsonRpcRequest::new(id, method, params);
    let mut payload = serde_json::to_string(&request)?;
    payload.push('\n');

    {
        let mut stdin = self.stdin.lock().await;
        stdin.write_all(payload.as_bytes()).await
            .context("failed to write to MCP server")?;
        stdin.flush().await
            .context("failed to flush MCP server stdin")?;
    }

    let mut line = String::new();
    {
        let mut stdout = self.stdout.lock().await;
        loop {
            line.clear();
            let bytes_read = stdout.read_line(&mut line).await
                .context("failed to read from MCP server")?;
            if bytes_read == 0 {
                anyhow::bail!("MCP server closed stdout unexpectedly");
            }
            let trimmed = line.trim();
            if trimmed.is_empty() { continue; }
            if let Ok(resp) = serde_json::from_str::<JsonRpcResponse>(trimmed) {
                if let Some(error) = resp.error {
                    anyhow::bail!("MCP server error ({}): {}", error.code, error.message);
                }
                return Ok(resp.result.unwrap_or(Value::Null));
            }
            // Not a valid response -- skip (could be a notification)
        }
    }
}
}

The read loop skips notifications and blank lines. The scope blocks drop the stdin lock before acquiring stdout, preventing deadlocks. With request() in place, the public methods are short:

#![allow(unused)]
fn main() {
pub async fn list_tools(&self) -> anyhow::Result<Vec<McpToolDef>> {
    let result = self.request("tools/list", None).await?;
    let list: ToolsListResult =
        serde_json::from_value(result).context("failed to parse tools/list")?;
    Ok(list.tools)
}

pub async fn call_tool(&self, name: &str, arguments: Value) -> anyhow::Result<String> {
    let params = serde_json::json!({ "name": name, "arguments": arguments });
    let result = self.request("tools/call", Some(params)).await?;
    let call_result: ToolCallResult =
        serde_json::from_value(result).context("failed to parse tools/call")?;

    let text: Vec<String> = call_result.content.into_iter()
        .filter_map(|c| c.text)
        .collect();
    Ok(text.join("\n"))
}
}

call_tool extracts just the text content blocks and joins them with newlines – matching how our agent represents tool results as plain strings.

Converting MCP tools to ToolDefinition

There’s a gap between MCP’s McpToolDef (owned String fields) and our ToolDefinition (&'static str fields). The convert_tool_defs method bridges it:

#![allow(unused)]
fn main() {
pub fn convert_tool_defs(tools: &[McpToolDef], prefix: &str) -> Vec<ToolDefinition> {
    tools.iter().map(|t| {
        let name = format!("mcp__{prefix}__{}", t.name);
        let desc = t.description.clone()
            .unwrap_or_else(|| format!("MCP tool: {}", t.name));
        let params = t.input_schema.clone()
            .unwrap_or_else(|| serde_json::json!({"type": "object", "properties": {}}));

        // Leak strings for 'static lifetime (loaded once at startup)
        let name: &'static str = Box::leak(name.into_boxed_str());
        let desc: &'static str = Box::leak(desc.into_boxed_str());

        ToolDefinition { name, description: desc, parameters: params }
    }).collect()
}
}

Two important design decisions here:

The naming convention: mcp__servername__toolname. Double underscores separate the MCP prefix, server name, and tool name. A filesystem server named fs with a tool called read_file becomes mcp__fs__read_file. This prevents collisions between MCP servers and between MCP tools and built-in tools. Claude Code uses the exact same convention.

String leaking with Box::leak. Our ToolDefinition uses &'static str – a design choice from Chapter 1 that avoids lifetime parameters everywhere. MCP tool names are dynamically constructed, so they can’t be &'static str naturally. Box::leak converts an owned String into a &'static str by intentionally leaking the heap allocation.

Is this okay? Yes. MCP tools are loaded once at startup – typically dozens of strings. They live for the entire program duration anyway. This is a well-known Rust pattern for configuration data loaded once and never freed.
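The leak itself is two lines; a std-only sketch of the conversion:

```rust
// Box::leak hands back a &'static mut str; coercing to &'static str makes the
// allocation immortal -- fine for a handful of tool names loaded once at startup.
fn leak_name(name: String) -> &'static str {
    Box::leak(name.into_boxed_str())
}

fn main() {
    let name: &'static str = leak_name(format!("mcp__{}__{}", "fs", "read_file"));
    assert_eq!(name, "mcp__fs__read_file");
}
```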

The McpTool wrapper

The agent works with the Tool trait. We need a struct that implements Tool and forwards calls to the MCP server. This goes in mini-claw-code/src/mcp/mod.rs:

#![allow(unused)]
fn main() {
pub(crate) mod client;
pub(crate) mod types;

pub use client::McpClient;
pub use types::McpToolDef;

use async_trait::async_trait;
use serde_json::Value;
use crate::types::{Tool, ToolDefinition};

pub struct McpTool {
    client: std::sync::Arc<McpClient>,
    definition: ToolDefinition,
    remote_name: String,
}

impl McpTool {
    pub fn new(
        client: std::sync::Arc<McpClient>,
        remote_name: String,
        definition: ToolDefinition,
    ) -> Self {
        Self { client, definition, remote_name }
    }
}

#[async_trait]
impl Tool for McpTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        self.client.call_tool(&self.remote_name, args).await
    }
}
}

Arc<McpClient> gives shared ownership (multiple tools from one server share a client). definition is the mcp__server__tool name the LLM sees. remote_name is the original name the server expects. The Tool implementation is glue: definition() returns the local definition, call() forwards to client.call_tool() with the remote name.

Configuration

In Chapter 16 you built the config system. MCP servers slot right in with McpServerConfig in config.rs:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Deserialize)]
pub struct McpServerConfig {
    pub name: String,
    pub command: String,
    #[serde(default)]
    pub args: Vec<String>,
    #[serde(default)]
    pub env: std::collections::HashMap<String, String>,
}
}

In the config file:

[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]

[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

At startup, iterate over configured servers, connect, discover, and register:

#![allow(unused)]
fn main() {
use std::sync::Arc;

for server_config in &config.mcp_servers {
    let client = McpClient::connect(
        &server_config.name,
        &server_config.command,
        &server_config.args,
    ).await?;

    let client = Arc::new(client);
    let mcp_tools = client.list_tools().await?;
    let defs = McpClient::convert_tool_defs(&mcp_tools, client.server_name());

    for (mcp_def, tool_def) in mcp_tools.into_iter().zip(defs) {
        tools.push(McpTool::new(client.clone(), mcp_def.name, tool_def));
    }
}
}

The agent loop doesn’t know or care that some tools are local and others are remote MCP servers. They all implement Tool. The abstraction works.

Module structure

Wire up the module and re-export from lib.rs:

#![allow(unused)]
fn main() {
pub mod mcp;
// ...
pub use mcp::{McpClient, McpTool};
}

The submodules client and types are pub(crate) – internal implementation details. Only McpClient, McpTool, and McpToolDef are part of the public API.

Running the tests

cargo test -p mini-claw-code ch21

The tests verify protocol types and conversion logic without a real MCP server. They cover: convert_tool_defs with empty, single, multiple, and missing-description inputs; McpToolDef deserialization (including the inputSchema rename and minimal name-only definitions); JsonRpcRequest serialization (with and without params, verifying skip_serializing_if); and ToolCallResult content extraction.

Integration tests for McpClient::connect require a real MCP server process and are better suited for CI.

What you’ve built

Take a step back and look at what you have.

Your agent started as type definitions in Chapter 1. Now it has streaming, subagents, safety rails, token tracking, context management, permissions – and with MCP, it is extensible without recompilation. Anyone can write an MCP server in any language and your agent will discover and use its tools at runtime. The same protocol Claude Code and Cursor speak.

Here’s the full lifecycle when a user configures an MCP server:

 1. Config loads McpServerConfig from config.toml
 2. McpClient::connect() spawns the server process
 3. Client sends initialize, receives capabilities
 4. Client sends notifications/initialized
 5. Client sends tools/list, receives tool definitions
 6. convert_tool_defs() creates ToolDefinitions with mcp__ prefix
 7. McpTool wrappers are added to the ToolSet
 8. User asks a question
 9. Agent loop sends prompt + all tool definitions to the LLM
10. LLM decides to call mcp__github__search_repos
11. Agent finds the McpTool, calls it
12. McpTool forwards to McpClient::call_tool()
13. Client sends tools/call JSON-RPC to the server process
14. Server executes, returns results
15. Client parses the response, returns text
16. Agent loop adds result to the conversation
17. LLM uses the result to answer the user

Seventeen steps, three process boundaries, one seamless experience.

Recap

  • MCP is the standard protocol for AI tool servers. JSON-RPC 2.0 over stdio, line-delimited.
  • The handshake: initialize -> notifications/initialized -> tools/list. Three messages and the client knows what the server can do.
  • McpClient spawns the server, manages stdio via tokio::sync::Mutex, uses AtomicU64 for request IDs. The read loop skips notifications.
  • convert_tool_defs bridges MCP’s owned strings to &'static str via Box::leak. The mcp__server__tool convention prevents collisions.
  • McpTool wraps Arc<McpClient> and implements Tool. The agent loop treats MCP tools identically to built-in tools.
  • McpServerConfig means zero code changes to add new servers.
  • The abstraction holds. A tool is a tool – whether call() reads a local file or sends JSON-RPC to a remote process.