Overview

Welcome to Build Your Own Mini Coding Agent in Rust. Over the next seven chapters you will implement a mini coding agent from scratch – a small version of programs like Claude Code or OpenCode: a program that takes a prompt, talks to a large language model (LLM), and uses tools to interact with the real world. After that, a series of extension chapters add streaming, a TUI, user input, plan mode, and more.

By the end of this book you will have an agent that can run shell commands, read and write files, and edit code, all driven by an LLM. No API key is required until Chapter 6, and when you get there the default model is openrouter/free – a zero-cost endpoint on OpenRouter, no credits needed.

What is an AI agent?

An LLM on its own is a function: text in, text out. Ask it to summarize doc.pdf and it will either refuse or hallucinate – it has no way to open the file.

An agent solves this by giving the LLM tools. A tool is just a function your code can run – read a file, execute a shell command, hit an API. The agent sits in a loop:

  1. Send the user’s prompt to the LLM.
  2. The LLM decides it needs to read doc.pdf and outputs a tool call.
  3. Your code executes the read tool and feeds the file contents back.
  4. The LLM now has the text and returns a summary.

The LLM never touches the filesystem. It just asks, and your code does. That loop – ask, execute, feed back – is the entire idea.

How does an LLM use a tool?

An LLM cannot execute code. It is a text generator. So “calling a tool” really means the LLM outputs a structured request and your code does the rest.

When you send a request to the LLM, you include a list of tool definitions alongside the conversation. Each definition is a name, a description, and a JSON schema describing the arguments. For our read tool that looks like:

{
  "name": "read",
  "description": "Read the contents of a file.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}

The LLM reads these definitions the same way it reads the user’s prompt – they are just part of the input. When it decides it needs to read a file, it does not run any code. It produces a structured output like:

{ "name": "read", "arguments": { "path": "doc.pdf" } }

along with a signal that says “I’m not done yet – I made a tool call.” Your code parses this, runs the real function, and sends the result back as a new message. The LLM then continues with that result in context.

Here is the full exchange for our “Summarize doc.pdf” example:

sequenceDiagram
    participant U as User
    participant A as Agent
    participant L as LLM
    participant T as read tool

    U->>A: "Summarize doc.pdf"
    A->>L: prompt + tool definitions
    L-->>A: tool_call: read("doc.pdf")
    A->>T: read("doc.pdf")
    T-->>A: file contents
    A->>L: tool result (file contents)
    L-->>A: "Here is a summary: ..."
    A->>U: "Here is a summary: ..."

The LLM’s only job is deciding which tool to call and what arguments to pass. Your code does the actual work.

A minimal agent in pseudocode

Here is that example as code:

tools    = [read_file]
messages = ["Summarize doc.pdf"]

loop:
    response = llm(messages, tools)

    if response.done:
        print(response.text)
        break

    // The LLM wants to call a tool -- run it and feed the result back.
    for call in response.tool_calls:
        result = execute(call.name, call.args)
        messages.append(result)

That is the entire agent. The rest of this book is implementing each piece – the llm function, the tools, and the types that connect them – in Rust.

The tool-calling loop

Here is the flow of a single agent invocation:

flowchart TD
    A["👤 User prompt"] --> B["🤖 LLM"]
    B -- "StopReason::Stop" --> C["✅ Text response"]
    B -- "StopReason::ToolUse" --> D["🔧 Execute tool calls"]
    D -- "tool results" --> B

  1. The user sends a prompt.
  2. The LLM either responds with text (done) or requests one or more tool calls.
  3. Your code executes each tool and gathers the results.
  4. The results are fed back to the LLM as new messages.
  5. Repeat from step 2 until the LLM responds with text.

That is the entire architecture. Everything else is implementation detail.

What we will build

We will build a simple agent framework consisting of:

4 tools:

Tool  | What it does
read  | Read the contents of a file
write | Write content to a file (creating directories as needed)
edit  | Replace an exact string in a file
bash  | Run a shell command and capture its output

1 provider:

Provider           | Purpose
OpenRouterProvider | Talks to a real LLM over HTTP via the OpenAI-compatible API

Tests use a MockProvider that returns pre-configured responses so you can run the full test suite without an API key.

Project structure

The project is a Cargo workspace with three crates and a tutorial book:

mini-claw-code/
  Cargo.toml                  # workspace root
  mini-claw-code/             # reference solution (do not peek!)
  mini-claw-code-starter/     # YOUR code -- you implement things here
  mini-claw-code-xtask/       # helper commands (cargo x ...)
  mini-claw-code-book/        # this tutorial

  • mini-claw-code contains the complete, working implementation. It is there so the test suite can verify that the exercises are solvable, but you should avoid reading it until you have tried on your own.
  • mini-claw-code-starter is your working crate. Each source file contains struct definitions, trait implementations with unimplemented!() bodies, and doc-comment hints. Your job is to replace the unimplemented!() calls with real code.
  • mini-claw-code-xtask provides the cargo x helper with check, solution-check, and book commands.
  • mini-claw-code-book is this mdbook tutorial.

Prerequisites

Before starting, make sure you have:

  • Rust installed (1.85+ required, for edition 2024). Install from https://rustup.rs.
  • Basic Rust knowledge: ownership, structs, enums, pattern matching, and Result / Option. If you have read the first half of The Rust Programming Language book, you are ready.
  • A terminal and a text editor.
  • mdbook (optional, for reading the tutorial locally). Install with cargo install mdbook mdbook-mermaid.

You do not need an API key until Chapter 6. Chapters 1 through 5 use the MockProvider for testing, so everything runs locally.

Setup

Clone the repository and verify things build:

git clone https://github.com/odysa/mini-claw-code.git
cd mini-claw-code
cargo build

Then verify the test harness works:

cargo test -p mini-claw-code-starter ch1

The tests should fail – that is expected! Your job in Chapter 1 is to make them pass.

If cargo x does not work, make sure you are in the workspace root (the directory containing the top-level Cargo.toml).

Chapter roadmap

Chapter | Topic                   | What you build
1       | Core Types              | MockProvider – understand the core types by building a test helper
2       | Your First Tool         | ReadTool – reading files
3       | Single Turn             | single_turn() – explicit match on StopReason, one round of tool calls
4       | More Tools              | BashTool, WriteTool, EditTool
5       | Your First Agent SDK!   | SimpleAgent – generalizes single_turn() into a loop
6       | The OpenRouter Provider | OpenRouterProvider – talking to a real LLM API
7       | A Simple CLI            | Wire everything into an interactive CLI with conversation memory
8       | The Singularity         | Your agent can now code itself – what’s next

Chapters 1–7 are hands-on: you write code in mini-claw-code-starter and run tests to check your work. Chapter 8 marks the transition to extension chapters (9+) which walk through the reference implementation:

Chapter | Topic        | What it adds
9       | A Better TUI | Markdown rendering, spinners, collapsed tool calls
10      | Streaming    | StreamingAgent with SSE parsing and AgentEvents
11      | User Input   | AskTool – let the LLM ask you clarifying questions
12      | Plan Mode    | PlanAgent – read-only planning phase with approval gating

Chapters 1–7 follow the same rhythm:

  1. Read the chapter to understand the concepts.
  2. Open the corresponding source file in mini-claw-code-starter/src/.
  3. Replace the unimplemented!() calls with your implementation.
  4. Run cargo test -p mini-claw-code-starter chN to check your work.

Ready? Let’s build an agent.

What’s next

Head to Chapter 1: Core Types to understand the foundational types – StopReason, Message, and the Provider trait – and build MockProvider, the test helper you will use throughout the next four chapters.

Chapter 1: Core Types

In this chapter you will understand the types that make up the agent protocol – StopReason, AssistantTurn, Message, and the Provider trait. These are the building blocks everything else is built on.

To verify your understanding, you will implement a small test helper: MockProvider, a struct that returns pre-configured responses so that you can test future chapters without an API key.

Goal

Understand the core types, then implement MockProvider so that:

  1. You create it with a VecDeque<AssistantTurn> of canned responses.
  2. Each call to chat() returns the next response in sequence.
  3. If all responses have been consumed, it returns an error.

The core types

Open mini-claw-code-starter/src/types.rs. These types define the protocol between the agent and any LLM backend.

Here is how they relate to each other:

classDiagram
    class Provider {
        <<trait>>
        +chat(messages, tools) AssistantTurn
    }

    class AssistantTurn {
        text: Option~String~
        tool_calls: Vec~ToolCall~
        stop_reason: StopReason
    }

    class StopReason {
        <<enum>>
        Stop
        ToolUse
    }

    class ToolCall {
        id: String
        name: String
        arguments: Value
    }

    class Message {
        <<enum>>
        System(String)
        User(String)
        Assistant(AssistantTurn)
        ToolResult(id, content)
    }

    class ToolDefinition {
        name: &'static str
        description: &'static str
        parameters: Value
    }

    Provider --> AssistantTurn : returns
    Provider --> Message : receives
    Provider --> ToolDefinition : receives
    AssistantTurn --> StopReason
    AssistantTurn --> ToolCall : contains 0..*
    Message --> AssistantTurn : wraps

Provider takes in messages and tool definitions, and returns an AssistantTurn. The turn’s stop_reason tells you what to do next.

ToolDefinition and its builder

pub struct ToolDefinition {
    pub name: &'static str,
    pub description: &'static str,
    pub parameters: Value,
}

Each tool declares a ToolDefinition that tells the LLM what it can do. The parameters field is a JSON Schema object describing the tool’s arguments.

Rather than building JSON by hand every time, ToolDefinition has a builder API:

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

  • new(name, description) creates a definition with an empty parameter schema.
  • param(name, type, description, required) adds a parameter and returns self, so you can chain calls.

You will use this builder in every tool starting from Chapter 2.

StopReason and AssistantTurn

pub enum StopReason {
    Stop,
    ToolUse,
}

pub struct AssistantTurn {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,
}

The ToolCall struct holds a single tool invocation:

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: Value,
}

Each tool call has an id (for matching results back to requests), a name (which tool to call), and arguments (a JSON value the tool will parse).

Every response from the LLM comes with a stop_reason that tells you why the model stopped generating:

  • StopReason::Stop – the model is done. Check text for the response.
  • StopReason::ToolUse – the model wants to call tools. Check tool_calls.

This is the raw LLM protocol: the model tells you what to do next. In Chapter 3 you will write a function that explicitly matches on stop_reason to handle each case. In Chapter 5 you will wrap that match inside a loop to create the full agent.

The Provider trait

pub trait Provider: Send + Sync {
    fn chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}

This says: “A Provider is something that can take a slice of messages and a slice of tool definitions, and asynchronously return an AssistantTurn.”

The Send + Sync bounds mean the provider must be safe to share across threads. This is important because tokio (the async runtime) may move tasks between threads.

Notice that chat() takes &self, not &mut self. The real provider (OpenRouterProvider) does not need mutation – it just fires HTTP requests. Making the trait &mut self would force every caller to hold exclusive access, which is unnecessarily restrictive. The trade-off: MockProvider (a test helper) does need to mutate its response list, so it must use interior mutability to conform to the trait.

The Message enum

pub enum Message {
    System(String),
    User(String),
    Assistant(AssistantTurn),
    ToolResult { id: String, content: String },
}

The conversation history is a list of Message values:

  • System(text) – a system prompt that sets the agent’s role and behavior. Typically the first message in the history.
  • User(text) – a prompt from the user.
  • Assistant(turn) – a response from the LLM (text, tool calls, or both).
  • ToolResult { id, content } – the result of executing a tool call. The id matches the ToolCall::id so the LLM knows which call this result belongs to.

You will use these variants starting in Chapter 3 when building the single_turn() function.
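
For example, a short history before any tool has run might look like this (a sketch using the variants above):

let messages = vec![
    Message::System("You are a helpful coding agent.".to_string()),
    Message::User("Summarize doc.pdf".to_string()),
];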

Why Provider uses impl Future but Tool uses #[async_trait]

You may notice in Chapter 2 that the Tool trait uses #[async_trait] while Provider uses impl Future directly. The difference is about how the trait is used:

  • Provider is used generically (SimpleAgent<P: Provider>). The compiler knows the concrete type at compile time, so impl Future works.
  • Tool is stored as a trait object (Box<dyn Tool>) in a collection of different tool types. Trait objects require a uniform return type, which #[async_trait] provides by boxing the future.

When implementing a trait that uses impl Future, you can simply write async fn in the impl block – Rust desugars it to the impl Future form automatically. So while the trait definition says -> impl Future<...>, your implementation can just write async fn chat(...).

If this distinction is unclear now, it will click in Chapter 5 when you see both patterns in action.
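
For example, a toy provider that always returns the same text could look like this (FixedProvider is a hypothetical type, not part of the starter – note the impl writes plain async fn even though the trait declares impl Future):

struct FixedProvider;

impl Provider for FixedProvider {
    async fn chat<'a>(
        &'a self,
        _messages: &'a [Message],
        _tools: &'a [&'a ToolDefinition],
    ) -> anyhow::Result<AssistantTurn> {
        // Always claim to be done, with canned text and no tool calls.
        Ok(AssistantTurn {
            text: Some("hello".to_string()),
            tool_calls: Vec::new(),
            stop_reason: StopReason::Stop,
        })
    }
}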

ToolSet – a collection of tools

One more type you will use starting in Chapter 3: ToolSet. It wraps a HashMap<String, Box<dyn Tool>> and indexes tools by name, giving O(1) lookup when executing tool calls. You build one with a builder:

let tools = ToolSet::new()
    .with(ReadTool::new())
    .with(BashTool::new());

You do not need to implement ToolSet – it is provided in types.rs.

Implementing MockProvider

Now that you understand the types, let’s put them to use. MockProvider is a test helper – it implements Provider by returning canned responses instead of calling a real LLM. You will use it throughout chapters 2–5 to test tools and the agent loop without needing an API key.

Open mini-claw-code-starter/src/mock.rs. You will see the struct and method signatures already laid out with unimplemented!() bodies.

Interior mutability with Mutex

MockProvider needs to remove responses from a list each time chat() is called. But chat() takes &self. How do we mutate through a shared reference?

Rust’s std::sync::Mutex provides interior mutability: you wrap a value in a Mutex, and calling .lock().unwrap() gives you a mutable guard even through &self. The lock ensures only one thread accesses the data at a time.

use std::collections::VecDeque;
use std::sync::Mutex;

struct MyState {
    items: Mutex<VecDeque<String>>,
}

impl MyState {
    fn take_one(&self) -> Option<String> {
        self.items.lock().unwrap().pop_front()
    }
}

Step 1: The struct fields

The struct already has the field you need: a Mutex<VecDeque<AssistantTurn>> to hold the responses. This is provided so that the method signatures compile. Your job is to implement the methods that use this field.

Step 2: Implement new()

The new() method receives a VecDeque<AssistantTurn>. We want FIFO order – each call to chat() should return the first remaining response, not the last. VecDeque::pop_front() does exactly that in O(1):

flowchart LR
    subgraph "VecDeque (FIFO)"
        direction LR
        A["A"] ~~~ B["B"] ~~~ C["C"]
    end
    A -- "pop_front()" --> out1["chat() → A"]
    B -. "next call" .-> out2["chat() → B"]
    C -. "next call" .-> out3["chat() → C"]

So in new():

  1. Wrap the input deque in a Mutex.
  2. Store it in Self.
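
In code, a minimal sketch (this assumes the starter names the field responses – check yours):

impl MockProvider {
    pub fn new(responses: VecDeque<AssistantTurn>) -> Self {
        Self {
            // Wrap the deque in a Mutex for interior mutability.
            responses: Mutex::new(responses),
        }
    }
}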

Step 3: Implement chat()

The chat() method should:

  1. Lock the mutex.
  2. pop_front() the next response.
  3. If there is one, return Ok(response).
  4. If the deque is empty, return an error.

The mock provider intentionally ignores the messages and tools parameters. It does not care what the “user” said – it just returns the next canned response.

A useful pattern for converting Option to Result:

some_option.ok_or_else(|| anyhow::anyhow!("no more responses"))
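
Putting the four steps together, chat() can be just a few lines (a sketch, again assuming the field is named responses):

async fn chat<'a>(
    &'a self,
    _messages: &'a [Message],
    _tools: &'a [&'a ToolDefinition],
) -> anyhow::Result<AssistantTurn> {
    self.responses
        .lock()
        .unwrap()
        .pop_front()
        .ok_or_else(|| anyhow::anyhow!("no more responses"))
}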

Running the tests

Run the Chapter 1 tests:

cargo test -p mini-claw-code-starter ch1

What the tests verify

  • test_ch1_returns_text: Creates a MockProvider with one response containing text. Calls chat() once and checks the text matches.
  • test_ch1_returns_tool_calls: Creates a provider with one response containing a tool call. Verifies the tool call name and id.
  • test_ch1_steps_through_sequence: Creates a provider with three responses. Calls chat() three times and verifies they come back in the correct order (First, Second, Third).

These are the core tests. There are also additional edge-case tests (empty responses, exhausted queue, multiple tool calls, etc.) that will pass once your core implementation is correct.

Recap

You have learned the core types that define the agent protocol:

  • StopReason tells you whether the LLM is done or wants to call tools.
  • AssistantTurn carries the LLM’s response – text, tool calls, or both.
  • Provider is the trait any LLM backend implements.

You also built MockProvider, a test helper you will use throughout the next four chapters to simulate LLM conversations without HTTP requests.

What’s next

In Chapter 2: Your First Tool you will implement the ReadTool – a tool that reads file contents and returns them to the LLM.

Chapter 2: Your First Tool

Now that you have a mock provider, it is time to build your first tool. You will implement ReadTool – a tool that reads a file and returns its contents. This is the simplest tool in our agent, but it introduces the Tool trait pattern that every other tool follows.

Goal

Implement ReadTool so that:

  1. It declares its name, description, and parameter schema.
  2. When called with a {"path": "some/file.txt"} argument, it reads the file and returns its contents as a string.
  3. Missing arguments or non-existent files produce errors.

Key Rust concepts

The Tool trait

Open mini-claw-code-starter/src/types.rs and look at the Tool trait:

#[async_trait::async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> &ToolDefinition;
    async fn call(&self, args: Value) -> anyhow::Result<String>;
}

Two methods:

  • definition() returns metadata about the tool: its name, a description, and a JSON schema describing its parameters. The LLM uses this to decide which tool to call and how to format the arguments.
  • call() actually executes the tool. It receives a serde_json::Value containing the arguments and returns a string result.

ToolDefinition

pub struct ToolDefinition {
    pub name: &'static str,
    pub description: &'static str,
    pub parameters: Value,
}

As you saw in Chapter 1, ToolDefinition has a builder API for declaring parameters. For ReadTool, we need a single required parameter called "path" of type "string":

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

Under the hood, the builder constructs the JSON Schema you saw in Chapter 1. The last argument (true) marks the parameter as required.

Why #[async_trait] instead of plain async fn?

You might wonder why we use the async_trait macro instead of writing async fn directly in the trait. The reason is trait object compatibility.

Later, in the agent loop, we will store tools in a ToolSet – a HashMap-backed collection of different tool types behind a common interface. This requires dynamic dispatch through trait objects, and a trait object’s methods must have a single, fixed-size return type regardless of the implementation.

async fn in traits generates a different, uniquely-sized Future type for each implementation. That breaks dynamic dispatch. The #[async_trait] macro automatically rewrites async fn into a method that returns Pin<Box<dyn Future<...>>>, which has a known, fixed size regardless of which tool produced it. You write normal async fn code, and the macro handles the boxing for you.

Here is the data flow when the agent calls a tool:

flowchart LR
    A["LLM returns<br/>ToolCall"] --> B["args: JSON Value<br/>{&quot;path&quot;: &quot;f.txt&quot;}"]
    B --> C["Tool::call(args)"]
    C --> D["Result: String<br/>(file contents)"]
    D --> E["Sent back to LLM<br/>as ToolResult"]

The LLM never touches the filesystem. It produces a JSON request, your code executes it, and returns a string.

The implementation

Open mini-claw-code-starter/src/tools/read.rs. The struct, Default impl, and method signatures are already provided.

Remember to annotate your impl Tool for ReadTool block with #[async_trait::async_trait]. The starter file already has this in place.

Step 1: Implement new()

Create a ToolDefinition and store it in self.definition. Use the builder:

ToolDefinition::new("read", "Read the contents of a file.")
    .param("path", "string", "The file path to read", true)

Step 2: definition() – already provided

The definition() method is already implemented in the starter – it simply returns &self.definition. No work needed here.

Step 3: Implement call()

This is where the real work happens. Your implementation should:

  1. Extract the "path" argument from args.
  2. Read the file asynchronously.
  3. Return the file contents.

Here is the shape:

async fn call(&self, args: Value) -> anyhow::Result<String> {
    // 1. Extract path
    // 2. Read file with tokio::fs::read_to_string
    // 3. Return contents
}

Some useful APIs:

  • args["path"].as_str() returns Option<&str>. Use .context("missing 'path' argument")? from anyhow to convert None into a descriptive error.
  • tokio::fs::read_to_string(path).await reads a file asynchronously. Chain .with_context(|| format!("failed to read '{path}'"))? for a clear error message.

That is it – extract the path, read the file, return the contents.
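
Combined, the whole body can be two statements (a sketch following the hints above; it assumes anyhow::Context is imported):

let path = args["path"].as_str().context("missing 'path' argument")?;
tokio::fs::read_to_string(path)
    .await
    .with_context(|| format!("failed to read '{path}'"))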

Running the tests

Run the Chapter 2 tests:

cargo test -p mini-claw-code-starter ch2

What the tests verify

  • test_ch2_read_definition: Creates a ReadTool and checks that its name is "read", description is non-empty, and "path" is in the required parameters.
  • test_ch2_read_file: Creates a temp file with known content, calls ReadTool with the file path, and checks the returned content matches.
  • test_ch2_read_missing_file: Calls ReadTool with a path that does not exist and verifies it returns an error.
  • test_ch2_read_missing_arg: Calls ReadTool with an empty JSON object (no "path" key) and verifies it returns an error.

There are also additional edge-case tests (empty files, unicode content, wrong argument types, etc.) that will pass once your core implementation is correct.

Recap

You built your first tool by implementing the Tool trait. The key patterns:

  • ToolDefinition::new(...).param(...) declares the tool’s name, description, and parameters.
  • #[async_trait::async_trait] on the impl block lets you write async fn call() while keeping trait object compatibility.
  • tokio::fs for async file I/O.
  • anyhow::Context for adding descriptive error messages.

Every tool in the agent follows this exact same structure. Once you understand ReadTool, the remaining tools are variations on the theme.

What’s next

In Chapter 3: Single Turn you will write a function that matches on StopReason to handle a single round of tool calls.

Chapter 3: Single Turn

You have a provider and a tool. Before jumping to the full agent loop, let’s see the raw protocol: the LLM returns a stop_reason that tells you whether it is done or wants to use tools. In this chapter you will write a function that handles exactly one prompt with at most one round of tool calls.

Goal

Implement single_turn() so that:

  1. It sends a prompt to the provider.
  2. It matches on stop_reason.
  3. If Stop – return the text.
  4. If ToolUse – execute the tools, send results back, return the final text.

No loop. Just one turn.

Key Rust concepts

ToolSet – a HashMap of tools

The function signature takes a &ToolSet instead of a raw slice or vector:

pub async fn single_turn<P: Provider>(
    provider: &P,
    tools: &ToolSet,
    prompt: &str,
) -> anyhow::Result<String>

ToolSet wraps a HashMap<String, Box<dyn Tool>> and indexes tools by their definition name. This gives O(1) lookup when executing tool calls instead of scanning a list. The builder API auto-extracts the name from each tool’s definition:

let tools = ToolSet::new().with(ReadTool::new());
let result = single_turn(&provider, &tools, "Read test.txt").await?;

match on StopReason

This is the core teaching point. Instead of checking tool_calls.is_empty(), you explicitly match on the stop reason:

match turn.stop_reason {
    StopReason::Stop => { /* return text */ }
    StopReason::ToolUse => { /* execute tools */ }
}

This makes the protocol visible. The LLM is telling you what to do, and you handle each case explicitly.

Here is the complete flow of single_turn():

flowchart TD
    A["prompt"] --> B["provider.chat()"]
    B --> C{"stop_reason?"}
    C -- "Stop" --> D["Return text"]
    C -- "ToolUse" --> E["Execute each tool call"]
    E --> F{"Tool error?"}
    F -- "Ok" --> G["result = output"]
    F -- "Err" --> H["result = error message"]
    G --> I["Push Assistant message"]
    H --> I
    I --> J["Push ToolResult messages"]
    J --> K["provider.chat() again"]
    K --> L["Return final text"]

The key difference from the full agent loop (Chapter 5) is that there is no outer loop here. If the LLM asks for tools a second time, single_turn() does not handle it – that is what the agent loop is for.

The implementation

Open mini-claw-code-starter/src/agent.rs. You will see the single_turn() function signature at the top of the file, before the SimpleAgent struct.

Step 1: Collect tool definitions

ToolSet has a definitions() method that returns all tool schemas:

let defs = tools.definitions();

Step 2: Create the initial message

let mut messages = vec![Message::User(prompt.to_string())];

Step 3: Call the provider

let turn = provider.chat(&messages, &defs).await?;

Step 4: Match on stop_reason

This is the heart of the function:

match turn.stop_reason {
    StopReason::Stop => Ok(turn.text.unwrap_or_default()),
    StopReason::ToolUse => {
        // execute tools, send results, get final answer
    }
}

For the ToolUse branch:

  1. For each tool call, find the matching tool and call it. Collect the results into a Vec first – you will need turn.tool_calls for this, so you cannot move turn yet.
  2. Push Message::Assistant(turn) and then Message::ToolResult for each result. Pushing the assistant turn moves turn, which is why you must collect results beforehand.
  3. Call the provider again to get the final answer.
  4. Return final_turn.text.unwrap_or_default().

The tool-finding and execution logic is the same as what you will use in the agent loop (Chapter 5):

println!("{}", tool_summary(call));
let content = match tools.get(&call.name) {
    Some(t) => t.call(call.arguments.clone()).await
        .unwrap_or_else(|e| format!("error: {e}")),
    None => format!("error: unknown tool `{}`", call.name),
};

The tool_summary() helper prints each tool call to the terminal so you can see which tools the agent is using and what arguments it passed. For example, [bash: ls -la] or [read: src/main.rs]. (The reference implementation uses print!("\x1b[2K\r...") instead of println! to clear the thinking... indicator line before printing – you’ll see this pattern in Chapter 7. A plain println! works fine for now.)

Error handling – never crash the loop

Notice that tool errors are caught, not propagated. The .unwrap_or_else() converts any error into a string like "error: failed to read 'missing.txt'". This string is sent back to the LLM as a normal tool result. The LLM can then decide what to do – try a different file, use another tool, or explain the problem to the user.

The same applies to unknown tools – instead of panicking, you send an error message back as a tool result.

This is a key design principle: the agent loop should never crash because of a tool failure. Tools operate on the real world (files, processes, network), and failures are expected. The LLM is smart enough to recover if you give it the error message.
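
Putting the pieces together, the ToolUse branch might be shaped like this (a sketch; collecting results before pushing Message::Assistant(turn) avoids moving turn too early):

StopReason::ToolUse => {
    let mut results = Vec::new();
    for call in &turn.tool_calls {
        println!("{}", tool_summary(call));
        // Errors become tool results, never crashes.
        let content = match tools.get(&call.name) {
            Some(t) => t.call(call.arguments.clone()).await
                .unwrap_or_else(|e| format!("error: {e}")),
            None => format!("error: unknown tool `{}`", call.name),
        };
        results.push((call.id.clone(), content));
    }
    messages.push(Message::Assistant(turn));
    for (id, content) in results {
        messages.push(Message::ToolResult { id, content });
    }
    let final_turn = provider.chat(&messages, &defs).await?;
    Ok(final_turn.text.unwrap_or_default())
}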

Here is the message sequence for a successful tool call:

sequenceDiagram
    participant ST as single_turn()
    participant P as Provider
    participant T as ReadTool

    ST->>P: [User("Read test.txt")] + tool defs
    P-->>ST: ToolUse: read({path: "test.txt"})
    ST->>T: call({path: "test.txt"})
    T-->>ST: "file contents..."
    Note over ST: Push Assistant + ToolResult
    ST->>P: [User, Assistant, ToolResult]
    P-->>ST: Stop: "Here are the contents: ..."
    ST-->>ST: return text

And here is what happens when a tool fails (e.g. file not found):

sequenceDiagram
    participant ST as single_turn()
    participant P as Provider
    participant T as ReadTool

    ST->>P: [User("Read missing.txt")] + tool defs
    P-->>ST: ToolUse: read({path: "missing.txt"})
    ST->>T: call({path: "missing.txt"})
    T--xST: Err("failed to read 'missing.txt'")
    Note over ST: Catch error, use as result
    Note over ST: Push Assistant + ToolResult("error: failed to read ...")
    ST->>P: [User, Assistant, ToolResult]
    P-->>ST: Stop: "Sorry, that file doesn't exist."
    ST-->>ST: return text

The error does not crash the agent. It becomes a tool result that the LLM reads and responds to.

Running the tests

Run the Chapter 3 tests:

cargo test -p mini-claw-code-starter ch3

What the tests verify

  • test_ch3_direct_response: Provider returns StopReason::Stop. single_turn should return the text directly.

  • test_ch3_one_tool_call: Provider returns StopReason::ToolUse with a read tool call, then StopReason::Stop. Verifies the file was read and the final text is returned.

  • test_ch3_unknown_tool: Provider returns StopReason::ToolUse for a tool that does not exist. Verifies the error message is sent as a tool result and the final text is returned.

  • test_ch3_tool_error_propagates: Provider requests a read on a file that does not exist. The error should be caught and sent back to the LLM as a tool result (not crash the function). The LLM then responds with text.

There are also additional edge-case tests (empty responses, multiple tool calls in one turn, etc.) that will pass once your core implementation is correct.

Recap

You have written the simplest possible handler for the LLM protocol:

  • Match on StopReason – the model tells you what to do next.
  • No loop – you handle at most one round of tool calls.
  • ToolSet – a HashMap-backed collection with O(1) tool lookup by name.

This is the foundation. In Chapter 5 you will wrap this same logic in a loop to create the full agent.

What’s next

In Chapter 4: More Tools you will implement three more tools: BashTool, WriteTool, and EditTool.

Chapter 4: More Tools

You have already implemented ReadTool and understand the Tool trait pattern. Now you will implement three more tools: BashTool, WriteTool, and EditTool. Each follows the same structure – define a schema, implement call() – so this chapter reinforces the pattern through repetition.

By the end of this chapter your agent will have all four tools it needs to interact with the file system and execute commands.

flowchart LR
    subgraph ToolSet
        R["read<br/>Read a file"]
        B["bash<br/>Run a command"]
        W["write<br/>Write a file"]
        E["edit<br/>Replace a string"]
    end
    Agent -- "tools.get(name)" --> ToolSet

Goal

Implement three tools:

  1. BashTool – run a shell command and return its output.
  2. WriteTool – write content to a file, creating directories as needed.
  3. EditTool – replace an exact string in a file (must appear exactly once).

Key Rust concepts

tokio::process::Command

Tokio provides an async wrapper around std::process::Command. You will use it in BashTool:

let output = tokio::process::Command::new("bash")
    .arg("-c")
    .arg(command)
    .output()
    .await?;

This runs bash -c "<command>" and captures stdout and stderr. The output struct has stdout and stderr fields as Vec<u8>, which you convert to strings with String::from_utf8_lossy().

bail!() macro

The anyhow::bail!() macro is shorthand for returning an error immediately:

use anyhow::bail;

if count == 0 {
    bail!("not found");
}
// equivalent to:
// return Err(anyhow::anyhow!("not found"));

You will use this in EditTool for validation.

Make sure to import it: use anyhow::{Context, bail};. The starter file already includes this import in edit.rs.

create_dir_all

When writing a file to a path like a/b/c/file.txt, the parent directories might not exist. tokio::fs::create_dir_all creates the entire directory tree:

if let Some(parent) = std::path::Path::new(path).parent() {
    tokio::fs::create_dir_all(parent).await?;
}

Tool 1: BashTool

Open mini-claw-code-starter/src/tools/bash.rs.

Schema

Use the builder pattern you learned in Chapter 2:

ToolDefinition::new("bash", "Run a bash command and return its output.")
    .param("command", "string", "The bash command to run", true)

Implementation

The call() method should:

  1. Extract "command" from args.
  2. Run bash -c <command> using tokio::process::Command.
  3. Capture stdout and stderr.
  4. Build a result string:
    • Start with stdout (if non-empty).
    • Append stderr prefixed with "stderr: " (if non-empty).
    • If both are empty, return "(no output)".

Think about how you combine stdout and stderr. If both are present, you want them separated by a newline. Something like:

let mut result = String::new();
if !stdout.is_empty() {
    result.push_str(&stdout);
}
if !stderr.is_empty() {
    if !result.is_empty() {
        result.push('\n');
    }
    result.push_str("stderr: ");
    result.push_str(&stderr);
}
if result.is_empty() {
    result.push_str("(no output)");
}
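
The steps before that combining logic might look like this (a sketch using the APIs above; the error-context strings are illustrative):

let command = args["command"].as_str().context("missing 'command' argument")?;
let output = tokio::process::Command::new("bash")
    .arg("-c")
    .arg(command)
    .output()
    .await
    .with_context(|| format!("failed to run '{command}'"))?;
// Lossy conversion tolerates non-UTF-8 bytes in command output.
let stdout = String::from_utf8_lossy(&output.stdout);
let stderr = String::from_utf8_lossy(&output.stderr);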

Tool 2: WriteTool

Open mini-claw-code-starter/src/tools/write.rs.

Schema

ToolDefinition::new("write", "Write content to a file, creating directories as needed.")
    .param("path", "string", "The file path to write to", true)
    .param("content", "string", "The content to write to the file", true)

Implementation

The call() method should:

  1. Extract "path" and "content" from args.
  2. Create parent directories if they do not exist.
  3. Write the content to the file.
  4. Return a confirmation message like "wrote {path}".

For creating parent directories:

if let Some(parent) = std::path::Path::new(path).parent() {
    tokio::fs::create_dir_all(parent).await
        .with_context(|| format!("failed to create directories for '{path}'"))?;
}

Then write the file:

tokio::fs::write(path, content).await
    .with_context(|| format!("failed to write '{path}'"))?;

Tool 3: EditTool

Open mini-claw-code-starter/src/tools/edit.rs.

Schema

ToolDefinition::new("edit", "Replace an exact string in a file (must appear exactly once).")
    .param("path", "string", "The file path to edit", true)
    .param("old_string", "string", "The exact string to find and replace", true)
    .param("new_string", "string", "The replacement string", true)

Implementation

The call() method is the most interesting of the bunch. It should:

  1. Extract "path", "old_string", and "new_string" from args.
  2. Read the file contents.
  3. Count how many times old_string appears in the content.
  4. If the count is 0, return an error: the string was not found.
  5. If the count is greater than 1, return an error: the string is ambiguous.
  6. Replace the single occurrence and write the file back.
  7. Return a confirmation like "edited {path}".

The validation is important – requiring exactly one match prevents accidental edits in the wrong place.

flowchart TD
    A["Read file"] --> B["Count matches<br/>of old_string"]
    B --> C{"count?"}
    C -- "0" --> D["Error: not found"]
    C -- "1" --> E["Replace + write file"]
    C -- ">1" --> F["Error: ambiguous"]
    E --> G["Return &quot;edited path&quot;"]

Useful APIs:

  • content.matches(old).count() counts occurrences of a substring.
  • content.replacen(old, new, 1) replaces the first occurrence.
  • bail!("old_string not found in '{path}'") for the not-found case.
  • bail!("old_string appears {count} times in '{path}', must be unique") for the ambiguous case.

Running the tests

Run the Chapter 4 tests:

cargo test -p mini-claw-code-starter ch4

What the tests verify

BashTool:

  • test_ch4_bash_definition: Checks name is "bash" and "command" is required.
  • test_ch4_bash_runs_command: Runs echo hello and checks the output contains "hello".
  • test_ch4_bash_captures_stderr: Runs echo err >&2 and checks stderr is captured.
  • test_ch4_bash_missing_arg: Passes empty args and expects an error.

WriteTool:

  • test_ch4_write_definition: Checks name is "write".
  • test_ch4_write_creates_file: Writes to a temp file and reads it back.
  • test_ch4_write_creates_dirs: Writes to a/b/c/out.txt and verifies directories were created.
  • test_ch4_write_missing_arg: Passes only "path" (no "content") and expects an error.

EditTool:

  • test_ch4_edit_definition: Checks name is "edit".
  • test_ch4_edit_replaces_string: Edits "hello" to "goodbye" in a file containing "hello world" and checks the result is "goodbye world".
  • test_ch4_edit_not_found: Tries to replace a string that does not exist and expects an error.
  • test_ch4_edit_not_unique: Tries to replace "a" in a file containing "aaa" (three occurrences) and expects an error.

There are also additional edge-case tests for each tool (wrong argument types, missing arguments, output format checks, etc.) that will pass once your core implementations are correct.

Recap

You now have four tools, and they all follow the same pattern:

  1. Define a ToolDefinition with ::new(...).param(...) builder calls.
  2. Return &self.definition from definition().
  3. Add #[async_trait::async_trait] on the impl Tool block and write async fn call().

This is a deliberate design. The Tool trait makes every tool interchangeable from the agent’s perspective. The agent does not know or care how a tool works internally – it only needs the definition (to tell the LLM) and the call method (to execute it).

What’s next

With a provider and four tools ready, it is time to connect them. In Chapter 5: Your First Agent SDK! you will build the SimpleAgent – the core loop that sends prompts to the provider, executes tool calls, and iterates until the LLM gives a final answer.

Chapter 5: Your First Agent SDK!

This is the chapter where everything comes together. You have a provider that returns AssistantTurn responses and four tools that execute actions. Now you will build the SimpleAgent – the loop that connects them.

This is the “aha!” moment of the tutorial. The agent loop is surprisingly short, but it is the engine that makes an LLM into an agent.

What is an agent loop?

In Chapter 3 you built single_turn() – one prompt, one round of tool calls, one final answer. That is enough when the LLM knows everything it needs after reading a single file. But real tasks are messier:

“Find the bug in this project and fix it.”

The LLM might need to read five files, run the test suite, edit a source file, run the tests again, and then report back. Each of those is a tool call, and the LLM cannot plan them all upfront because the result of one call determines the next. It needs a loop.

The agent loop is that loop:

flowchart TD
    A["User prompt"] --> B["Call LLM"]
    B -- "StopReason::Stop" --> C["Return text"]
    B -- "StopReason::ToolUse" --> D["Execute tool calls"]
    D -- "Push assistant + tool results" --> B

  1. Send messages to the LLM.
  2. If the LLM says “I’m done” (StopReason::Stop), return its text.
  3. If the LLM says “I need tools” (StopReason::ToolUse), execute them.
  4. Append the assistant turn and tool results to the message history.
  5. Go to step 1.

That is the entire architecture of every coding agent – Claude Code, Cursor, OpenCode, Copilot. The details vary (streaming, parallel tool calls, safety checks), but the core loop is always the same. And you are about to build it in about 30 lines of Rust.

Goal

Implement SimpleAgent so that:

  1. It holds a provider and a collection of tools.
  2. You can register tools using a builder pattern (.tool(ReadTool::new())).
  3. The run() method implements the tool-calling loop: prompt -> provider -> tool calls -> tool results -> provider -> … -> final text.

Key Rust concepts

Generics with trait bounds

pub struct SimpleAgent<P: Provider> {
    provider: P,
    tools: ToolSet,
}

The <P: Provider> means SimpleAgent is generic over any type that implements the Provider trait. When you use MockProvider, the compiler generates code specialized for MockProvider. When you use OpenRouterProvider, it generates code for that type. Same logic, different providers.

ToolSet – a HashMap of trait objects

The tools field is a ToolSet, which wraps a HashMap<String, Box<dyn Tool>> internally. Each value is a heap-allocated trait object that implements Tool, but the concrete types can differ. One might be a ReadTool, the next a BashTool. The HashMap key is the tool’s name, giving O(1) lookup when executing tool calls.

Why trait objects (Box<dyn Tool>) instead of generics? Because you need a heterogeneous collection. A Vec<T> requires all elements to be the same type. With Box<dyn Tool>, you erase the concrete type and store them all behind the same interface.

This is why the Tool trait uses #[async_trait] – the macro rewrites async fn into a boxed future with a uniform type across different tool implementations.

The builder pattern

The tool() method takes self by value (not &mut self) and returns Self:

pub fn tool(mut self, t: impl Tool + 'static) -> Self {
    // push the tool
    self
}

This lets you chain calls:

let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new());

The impl Tool + 'static parameter accepts any type implementing Tool with a 'static lifetime (meaning it does not borrow temporary data). Inside the method, you push it into the ToolSet, which boxes it and indexes it by name.

The implementation

Open mini-claw-code-starter/src/agent.rs. The struct definition and method signatures are provided.

Step 1: Implement new()

Store the provider and initialize an empty ToolSet:

pub fn new(provider: P) -> Self {
    Self {
        provider,
        tools: ToolSet::new(),
    }
}

This one is straightforward.

Step 2: Implement tool()

Push the tool into the set, return self:

pub fn tool(mut self, t: impl Tool + 'static) -> Self {
    self.tools.push(t);
    self
}

Step 3: Implement run() – the core loop

This is the heart of the agent. Here is the flow:

  1. Collect tool definitions from all registered tools.
  2. Create a messages vector starting with the user’s prompt.
  3. Loop:
     a. Call self.provider.chat(&messages, &defs) to get an AssistantTurn.
     b. Match on turn.stop_reason:
        • StopReason::Stop – the LLM is done, return turn.text.
        • StopReason::ToolUse – for each tool call:
          1. Find the matching tool by name.
          2. Call it with the arguments.
          3. Collect the result.
     c. Push the AssistantTurn as a Message::Assistant.
     d. Push each tool result as a Message::ToolResult.
     e. Continue the loop.

Think about the data flow carefully. After executing tools, you push both the assistant’s turn (so the LLM can see what it requested) and the tool results (so it can see what happened). This gives the LLM full context to decide what to do next.

Gathering tool definitions

At the start of run(), collect all tool definitions from the ToolSet:

let defs = self.tools.definitions();

The loop structure

This is single_turn() (from Chapter 3) wrapped in a loop. Instead of handling just one round, we match on stop_reason inside a loop:

loop {
    let turn = self.provider.chat(&messages, &defs).await?;

    match turn.stop_reason {
        StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
        StopReason::ToolUse => {
            // Execute tool calls, collect results
            // Push messages
        }
    }
}

Finding and calling tools

For each tool call, look it up by name in the ToolSet:

println!("{}", tool_summary(call));
let content = match self.tools.get(&call.name) {
    Some(t) => t.call(call.arguments.clone()).await
        .unwrap_or_else(|e| format!("error: {e}")),
    None => format!("error: unknown tool `{}`", call.name),
};

The tool_summary() helper prints each tool call to the terminal – one line per tool with its key argument, so you can watch what the agent does in real time. For example: [bash: cat Cargo.toml] or [write: src/lib.rs].

Error handling

Tool errors are caught with .unwrap_or_else() and converted into a string that gets sent back to the LLM as a tool result. This is the same pattern from Chapter 3, and it is critical here because the agent loop runs multiple iterations. If a tool error crashed the loop, the agent would die on the first missing file or failed command. Instead, the LLM sees the error and can recover – try a different path, adjust the command, or explain the problem.

> What's in README.md?
[read: README.md]          <-- tool fails (file not found)
[read: Cargo.toml]         <-- LLM recovers, tries another file
Here is the project info from Cargo.toml...

Unknown tools are handled the same way – an error string as the tool result, not a crash.

Pushing messages

After executing all tool calls for a turn, push the assistant message and the tool results. You need to collect results first (because the turn is moved into Message::Assistant):

let mut results = Vec::new();
for call in &turn.tool_calls {
    // ... execute and collect (id, content) pairs
}

messages.push(Message::Assistant(turn));
for (id, content) in results {
    messages.push(Message::ToolResult { id, content });
}

The order matters: assistant message first, then tool results. This matches the format that LLM APIs expect.
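
Assembled, run() really is about 30 lines (a sketch combining the snippets above – treat it as a reference shape, not the one true answer):

pub async fn run(&self, prompt: &str) -> anyhow::Result<String> {
    let defs = self.tools.definitions();
    let mut messages = vec![Message::User(prompt.to_string())];

    loop {
        let turn = self.provider.chat(&messages, &defs).await?;
        match turn.stop_reason {
            StopReason::Stop => return Ok(turn.text.unwrap_or_default()),
            StopReason::ToolUse => {
                // Execute every tool call, collecting (id, content) pairs.
                let mut results = Vec::new();
                for call in &turn.tool_calls {
                    println!("{}", tool_summary(call));
                    let content = match self.tools.get(&call.name) {
                        Some(t) => t.call(call.arguments.clone()).await
                            .unwrap_or_else(|e| format!("error: {e}")),
                        None => format!("error: unknown tool `{}`", call.name),
                    };
                    results.push((call.id.clone(), content));
                }
                // Assistant turn first, then its tool results.
                messages.push(Message::Assistant(turn));
                for (id, content) in results {
                    messages.push(Message::ToolResult { id, content });
                }
            }
        }
    }
}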

Running the tests

Run the Chapter 5 tests:

cargo test -p mini-claw-code-starter ch5

What the tests verify

  • test_ch5_text_response: Provider returns text immediately (no tools). Agent should return that text.

  • test_ch5_single_tool_call: Provider first requests a read tool call, then returns text. Agent should execute the tool and return the final text.

  • test_ch5_unknown_tool: Provider requests a tool that does not exist. Agent should handle it gracefully (return an error string as the tool result) and continue to get the final text.

  • test_ch5_multi_step_loop: Provider requests read twice across two turns, then returns text. Verifies the loop runs multiple iterations.

  • test_ch5_empty_response: Provider returns None for text and no tool calls. Agent should return an empty string.

  • test_ch5_builder_chain: Verifies that .tool().tool() chaining compiles – a compile-time check for the builder pattern.

  • test_ch5_tool_error_propagates: Provider requests a read on a file that does not exist. The error should be caught and sent back as a tool result. The LLM then responds with text. Verifies the loop does not crash on tool failures.

There are also additional edge-case tests (three-step loops, multi-tool pipelines, etc.) that will pass once your core implementation is correct.

Seeing it all work

Once the tests pass, take a moment to appreciate what you have built. With about 30 lines of code in run(), you have a working agent loop. Here is what happens when a test runs agent.run("Read test.txt"):

  1. Messages: [User("Read test.txt")]
  2. Provider returns: tool call for read with {"path": "test.txt"}
  3. Agent calls ReadTool::call(), gets file contents
  4. Messages: [User("Read test.txt"), Assistant(tool_call), ToolResult("file content")]
  5. Provider returns: text response
  6. Agent returns the text

The mock provider makes this deterministic and testable. But the exact same loop works with a real LLM provider – you just swap MockProvider for OpenRouterProvider.

Recap

The agent loop is the core of the framework:

  • Generics (<P: Provider>) let it work with any provider.
  • ToolSet (a HashMap of Box<dyn Tool>) gives O(1) tool lookup by name.
  • The builder pattern makes setup ergonomic.
  • Error resilience – tool errors are caught and sent back to the LLM, not propagated. The loop never crashes from a tool failure.
  • The loop is simple: call provider, match on stop_reason, execute tools, feed results back, repeat.

What’s next

Your agent works, but only with the mock provider. In Chapter 6: The OpenRouter Provider you will implement OpenRouterProvider, which talks to a real LLM API over HTTP. This is what turns your agent from a testing harness into a real, usable tool.

Chapter 6: The OpenRouter Provider

Up to now, everything has run locally with the MockProvider. In this chapter you will implement OpenRouterProvider – a provider that talks to a real LLM over HTTP using the OpenAI-compatible chat completions API.

This is the chapter that makes your agent real.

Goal

Implement OpenRouterProvider so that:

  1. It can be created with an API key and model name.
  2. It converts our internal Message and ToolDefinition types to the API format.
  3. It sends HTTP POST requests to the chat completions endpoint.
  4. It parses responses back into AssistantTurn.

Key Rust concepts

Serde derives and attributes

The API types in openrouter.rs are already provided – you do not need to modify them. But understanding them helps:

#[derive(Serialize, Deserialize, Clone, Debug)]
pub(crate) struct ApiToolCall {
    pub(crate) id: String,
    #[serde(rename = "type")]
    pub(crate) type_: String,
    pub(crate) function: ApiFunction,
}

Key serde attributes used:

  • #[serde(rename = "type")] – The JSON field is called "type", but type is a reserved keyword in Rust. So the struct field is type_ and serde renames it during serialization/deserialization.

  • #[serde(skip_serializing_if = "Option::is_none")] – Omits the field from JSON if the value is None. This is important because the API expects certain fields to be absent (not null) when unused.

  • #[serde(skip_serializing_if = "Vec::is_empty")] – Same idea for empty vectors. If there are no tools, we omit the tools field entirely.

The reqwest HTTP client

reqwest is the standard HTTP client crate in Rust. The pattern:

let response: MyType = client
    .post(url)
    .bearer_auth(&api_key)
    .json(&body)        // serialize body as JSON
    .send()
    .await
    .context("request failed")?
    .error_for_status() // turn 4xx/5xx into errors
    .context("API returned error status")?
    .json()             // deserialize response as JSON
    .await
    .context("failed to parse response")?;

Each method returns a builder or future that you chain together. The ? operator propagates errors at each step.

impl Into<String>

Several methods use impl Into<String> as a parameter type:

pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self

This accepts anything that can be converted into a String: String, &str, Cow<str>, etc. Inside the method, call .into() to get the String:

api_key: api_key.into(),
model: model.into(),

dotenvy

The dotenvy crate loads environment variables from a .env file:

let _ = dotenvy::dotenv(); // loads .env if present, ignores errors
let key = std::env::var("OPENROUTER_API_KEY")?;

The let _ = discards the result because it is fine if .env does not exist (the variable might already be in the environment).

The API types

The file mini-claw-code-starter/src/providers/openrouter.rs starts with a block of serde structs. These represent the OpenAI-compatible chat completions API format. Here is a quick summary:

Request types:

  • ChatRequest – the POST body: model name, messages, tools
  • ApiMessage – a single message with role, content, optional tool calls
  • ApiTool / ApiToolDef – tool definition in API format

Response types:

  • ChatResponse – the API response: a list of choices
  • Choice – a single choice containing a message and a finish_reason
  • ResponseMessage – the assistant’s response: optional content, optional tool calls

The finish_reason field on Choice tells you why the model stopped generating. Map it to StopReason in your chat() implementation: "tool_calls" becomes StopReason::ToolUse, anything else becomes StopReason::Stop.

These are already complete. Your job is to implement the methods that use them.

The implementation

Step 1: Implement new()

Initialize all four fields:

pub fn new(api_key: impl Into<String>, model: impl Into<String>) -> Self {
    Self {
        client: reqwest::Client::new(),
        api_key: api_key.into(),
        model: model.into(),
        base_url: "https://openrouter.ai/api/v1".into(),
    }
}

Step 2: Implement base_url()

A simple builder method that overrides the base URL:

pub fn base_url(mut self, url: impl Into<String>) -> Self {
    self.base_url = url.into();
    self
}

Step 3: Implement from_env_with_model()

  1. Load .env with dotenvy::dotenv() (ignore the result).
  2. Read OPENROUTER_API_KEY from the environment.
  3. Call Self::new() with the key and model.

Use std::env::var("OPENROUTER_API_KEY") and chain .context(...) for a clear error message if the key is missing.
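
A sketch of those three steps (assuming the method returns anyhow::Result<Self> in the starter):

pub fn from_env_with_model(model: impl Into<String>) -> anyhow::Result<Self> {
    let _ = dotenvy::dotenv(); // a missing .env is fine
    let key = std::env::var("OPENROUTER_API_KEY")
        .context("OPENROUTER_API_KEY is not set")?;
    Ok(Self::new(key, model))
}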

Step 4: Implement from_env()

This is a one-liner that calls from_env_with_model with the default model "openrouter/free". This is a free model on OpenRouter – no credits needed to get started.

Step 5: Implement convert_messages()

This method translates our Message enum into the API’s ApiMessage format. Iterate over the messages and match on each variant:

  • Message::System(text) becomes an ApiMessage with role "system" and content: Some(text.clone()). The other fields are None.

  • Message::User(text) becomes an ApiMessage with role "user" and content: Some(text.clone()). The other fields are None.

  • Message::Assistant(turn) becomes an ApiMessage with role "assistant". Set content to turn.text.clone(). If turn.tool_calls is non-empty, convert each ToolCall to an ApiToolCall:

    #![allow(unused)]
    fn main() {
    ApiToolCall {
        id: c.id.clone(),
        type_: "function".into(),
        function: ApiFunction {
            name: c.name.clone(),
            arguments: c.arguments.to_string(), // Value -> String
        },
    }
    }

    If tool_calls is empty, set tool_calls: None (not Some(vec![])).

  • Message::ToolResult { id, content } becomes an ApiMessage with role "tool", content: Some(content.clone()), and tool_call_id: Some(id.clone()).
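Putting the four variants together, the method might look like this – a sketch based on the field names above; verify against the exact ApiMessage definition in the starter file:

#![allow(unused)]
fn main() {
fn convert_messages(messages: &[Message]) -> Vec<ApiMessage> {
    messages
        .iter()
        .map(|m| match m {
            Message::System(text) => ApiMessage {
                role: "system".into(),
                content: Some(text.clone()),
                tool_calls: None,
                tool_call_id: None,
            },
            Message::User(text) => ApiMessage {
                role: "user".into(),
                content: Some(text.clone()),
                tool_calls: None,
                tool_call_id: None,
            },
            Message::Assistant(turn) => ApiMessage {
                role: "assistant".into(),
                content: turn.text.clone(),
                // None (not Some(vec![])) when there are no tool calls.
                tool_calls: if turn.tool_calls.is_empty() {
                    None
                } else {
                    Some(
                        turn.tool_calls
                            .iter()
                            .map(|c| ApiToolCall {
                                id: c.id.clone(),
                                type_: "function".into(),
                                function: ApiFunction {
                                    name: c.name.clone(),
                                    arguments: c.arguments.to_string(),
                                },
                            })
                            .collect(),
                    )
                },
                tool_call_id: None,
            },
            Message::ToolResult { id, content } => ApiMessage {
                role: "tool".into(),
                content: Some(content.clone()),
                tool_calls: None,
                tool_call_id: Some(id.clone()),
            },
        })
        .collect()
}
}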

Step 6: Implement convert_tools()

Map each &ToolDefinition to an ApiTool:

#![allow(unused)]
fn main() {
ApiTool {
    type_: "function",
    function: ApiToolDef {
        name: t.name,
        description: t.description,
        parameters: t.parameters.clone(),
    },
}
}

Step 7: Implement chat()

This is the main method. It brings everything together:

  1. Build a ChatRequest with the model, converted messages, and converted tools.
  2. POST it to {base_url}/chat/completions with bearer auth.
  3. Parse the response as ChatResponse.
  4. Extract the first choice.
  5. Convert tool_calls back to our ToolCall type.

The tool call conversion is the trickiest part. The API returns function.arguments as a string (JSON-encoded), but our ToolCall stores it as a serde_json::Value. So you need to parse it:

#![allow(unused)]
fn main() {
let arguments = serde_json::from_str(&tc.function.arguments)
    .unwrap_or(Value::Null);
}

The unwrap_or(Value::Null) handles the case where the arguments string is not valid JSON (unlikely with a well-behaved API, but good to be safe).

Here is the skeleton for the chat() method:

#![allow(unused)]
fn main() {
async fn chat(
    &self,
    messages: &[Message],
    tools: &[&ToolDefinition],
) -> anyhow::Result<AssistantTurn> {
    let body = ChatRequest {
        model: &self.model,
        messages: Self::convert_messages(messages),
        tools: Self::convert_tools(tools),
    };

    let response: ChatResponse = self.client
        .post(format!("{}/chat/completions", self.base_url))
        // ... bearer_auth, json, send, error_for_status, json ...
        ;

    let choice = response.choices.into_iter().next()
        .context("no choices in response")?;

    // Convert choice.message.tool_calls to Vec<ToolCall>
    // Map finish_reason to StopReason
    // Return AssistantTurn { text, tool_calls, stop_reason }
    todo!()
}
}

Fill in the HTTP call chain and the response conversion logic.
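As a hint for the conversion step, here is one plausible shape of the tool-call mapping. It assumes ResponseMessage.tool_calls is an Option&lt;Vec&lt;...&gt;&gt; whose items mirror the request-side ApiToolCall – check the starter's response types:

#![allow(unused)]
fn main() {
// Sketch: convert the API's tool calls back into our ToolCall type.
let tool_calls: Vec<ToolCall> = choice
    .message
    .tool_calls
    .unwrap_or_default()
    .into_iter()
    .map(|tc| ToolCall {
        id: tc.id,
        name: tc.function.name,
        arguments: serde_json::from_str(&tc.function.arguments)
            .unwrap_or(Value::Null),
    })
    .collect();
}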

Running the tests

Run the Chapter 6 tests:

cargo test -p mini-claw-code-starter ch6

The Chapter 6 tests verify the conversion methods (convert_messages and convert_tools), the constructor logic, and the full chat() method using a local mock HTTP server. They do not call a real LLM API, so no API key is needed. There are also additional edge-case tests that will pass once your core implementation is correct.

Optional: Live test

If you want to test with a real API, set up an OpenRouter API key:

  1. Sign up at openrouter.ai.
  2. Create an API key.
  3. Create a .env file in the workspace root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here

Then try building and running the chat example. But first, finish reading this chapter and move on to Chapter 7, where you wire everything up.

Recap

You have implemented a real HTTP provider that:

  • Constructs from an API key and model name (or from environment variables).
  • Converts between your internal types and the OpenAI-compatible API format.
  • Sends HTTP requests and parses responses.

The key patterns:

  • Serde attributes for JSON field mapping (rename, skip_serializing_if).
  • reqwest for HTTP with a fluent builder API.
  • impl Into<String> for flexible string parameters.
  • dotenvy for loading .env files.

Your agent framework is now complete. Every piece – tools, the agent loop, and the HTTP provider – is implemented and tested.

What’s next

In Chapter 7: A Simple CLI you will wire everything into an interactive CLI with conversation memory.

Chapter 7: A Simple CLI

You have built every component: a mock provider for testing, four tools, the agent loop, and an HTTP provider. Now it is time to wire them all into a working CLI.

Goal

Add a chat() method to SimpleAgent and write examples/chat.rs so that:

  1. The agent remembers the conversation – each prompt builds on the previous ones.
  2. It prints > , reads a line, runs the agent, and prints the result.
  3. It shows a thinking... indicator while the agent works.
  4. It keeps running until the user presses Ctrl+D (EOF).

The chat() method

Open mini-claw-code-starter/src/agent.rs. Below run() you will see the chat() method signature.

Why a new method?

run() creates a fresh Vec<Message> each time it is called. That means the LLM has no memory of previous exchanges. A real CLI should carry context forward, so the LLM can say “I already read that file” or “as I mentioned earlier.”

chat() solves this by accepting the message history from the caller:

#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String>
}

The caller pushes Message::User(…) before calling, and chat() appends the assistant turns. When it returns, messages contains the full conversation history ready for the next round.

The implementation

The loop body is identical to run(). The only differences are:

  1. Use the provided messages instead of creating a new vec.
  2. On StopReason::Stop, clone the text before pushing Message::Assistant(turn) – the push moves turn, so you need the text first.
  3. Push Message::Assistant(turn) so the history includes the final response.
  4. Return the cloned text.

#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
    let defs = self.tools.definitions();

    loop {
        let turn = self.provider.chat(messages, &defs).await?;

        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                // Same tool execution as run() ...
            }
        }
    }
}
}

The ToolUse branch is exactly the same as in run(): execute each tool, collect results, push the assistant turn, push the tool results.
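If you need a refresher, the branch follows this shape – a sketch only; mirror your own run() body, including however your ToolSet dispatches tool calls:

#![allow(unused)]
fn main() {
StopReason::ToolUse => {
    // Run every tool call and remember each (id, output) pair.
    let mut results = Vec::new();
    for call in &turn.tool_calls {
        let output = self
            .tools
            .call(&call.name, call.arguments.clone()) // hypothetical ToolSet helper
            .await
            .unwrap_or_else(|e| format!("error: {e}"));
        results.push((call.id.clone(), output));
    }

    // History order matters: the assistant turn first, then its tool results.
    messages.push(Message::Assistant(turn));
    for (id, content) in results {
        messages.push(Message::ToolResult { id, content });
    }
}
}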

Ownership detail

In run() you could do return Ok(turn.text.unwrap_or_default()) directly because the function was done with turn. In chat() you also need to push Message::Assistant(turn) into the history. Since that push moves turn, you must extract the text first:

#![allow(unused)]
fn main() {
let text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));  // moves turn
return Ok(text);                          // return the clone
}

This is a one-line change from run(), but it matters.

The CLI

Open mini-claw-code-starter/examples/chat.rs. You will see a skeleton with unimplemented!(). Replace it with the full program.

Step 1: Imports

#![allow(unused)]
fn main() {
use mini_claw_code_starter::{
    BashTool, EditTool, Message, OpenRouterProvider, ReadTool, SimpleAgent, WriteTool,
};
use std::io::{self, BufRead, Write};
}

Note the Message import – you need it to build the history vector.

Step 2: Create the provider and agent

#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new());
}

Same as before – nothing new here. (In Chapter 11 you’ll add AskTool here so the agent can ask you clarifying questions.)

Step 3: The system prompt and history vector

#![allow(unused)]
fn main() {
let cwd = std::env::current_dir()?.display().to_string();
let mut history: Vec<Message> = vec![Message::System(format!(
    "You are a coding agent. Help the user with software engineering tasks \
     using all available tools. Be concise and precise.\n\n\
     Working directory: {cwd}"
))];
}

The system prompt is the first message in the history. It tells the LLM what role it should play. Two things to note:

  1. No tool names in the prompt. Tool definitions are sent separately to the API. The system prompt focuses on behavior – be a coding agent, use whatever tools are available, be concise.

  2. Working directory is included. The LLM needs to know where it is so that tool calls like read and bash use correct paths. This is what real coding agents do – Claude Code, OpenCode, and Kimi CLI all inject the current directory (and sometimes platform, date, etc.) into their system prompts.

The history vector lives outside the loop and accumulates every user prompt, assistant response, and tool result across the entire session. The system prompt stays at the front, giving the LLM consistent instructions on every turn.

Step 4: The REPL loop

#![allow(unused)]
fn main() {
let stdin = io::stdin();

loop {
    print!("> ");
    io::stdout().flush()?;

    let mut line = String::new();
    if stdin.lock().read_line(&mut line)? == 0 {
        println!();
        break;
    }

    let prompt = line.trim();
    if prompt.is_empty() {
        continue;
    }

    history.push(Message::User(prompt.to_string()));
    print!("    thinking...");
    io::stdout().flush()?;
    match agent.chat(&mut history).await {
        Ok(text) => {
            print!("\x1b[2K\r");
            println!("{}\n", text.trim());
        }
        Err(e) => {
            print!("\x1b[2K\r");
            println!("error: {e}\n");
        }
    }
}
}

A few things to note:

  • history.push(Message::User(…)) adds the prompt before calling the agent. chat() will append the rest.
  • print!(" thinking...") shows a status while the agent works. The flush() is needed because print! (no newline) does not flush automatically.
  • \x1b[2K\r is an ANSI escape sequence: “erase entire line, move cursor to column 1.” This clears the thinking... text before printing the response. It also gets cleared automatically when the agent prints a tool summary (since tool_summary() uses the same escape).
  • stdout.flush()? after print! ensures the prompt and thinking indicator appear immediately.
  • read_line returns 0 on EOF (Ctrl+D), which breaks the loop.
  • Errors from the agent are printed instead of crashing – this keeps the loop alive even if one request fails.

The main function

Wrap everything in an async main:

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Steps 1-4 go here
    Ok(())
}

The complete program

Putting it all together, the entire program is about 45 lines. That is the beauty of the framework you built – the final assembly is straightforward because each component has a clean interface.

Running the full test suite

Run the full test suite:

cargo test -p mini-claw-code-starter

This runs all tests from chapters 1 through 7. If everything passes, congratulations – your agent framework is complete and fully tested.

What the tests verify

The Chapter 7 tests are integration tests that combine all components:

  • Write-then-read flows: Write a file, read it back, verify contents.
  • Edit flows: Write a file, edit it, read back the result.
  • Multi-tool pipelines: Use bash, write, edit, and read across multiple turns.
  • Long conversations: Five-step tool-call sequences.

There are about 10 integration tests that exercise the full agent pipeline.

Running the chat example

To try it with a real LLM, you need an API key. Create a .env file in the workspace root:

OPENROUTER_API_KEY=sk-or-v1-your-key-here

Then run:

cargo run -p mini-claw-code-starter --example chat

You will get an interactive prompt. Try a multi-turn conversation:

> List the files in the current directory
    thinking...
    [bash: ls]
Cargo.toml  src/  examples/  ...

> What is in Cargo.toml?
    thinking...
    [read: Cargo.toml]
The Cargo.toml contains the package definition for mini-claw-code-starter...

> Add a new dependency for serde
    thinking...
    [read: Cargo.toml]
    [edit: Cargo.toml]
Done! I added serde to the dependencies.

>

Notice how the second prompt (“What is in Cargo.toml?”) works without repeating context – the LLM already knows the directory listing from the first exchange. That is conversation history at work.

Press Ctrl+D (or Ctrl+C) to exit.

What you have built

Let’s step back and look at the complete picture:

examples/chat.rs
    |
    | creates
    v
SimpleAgent<OpenRouterProvider>
    |
    | holds
    +---> OpenRouterProvider (HTTP to LLM API)
    +---> ToolSet (HashMap<String, Box<dyn Tool>>)
              |
              +---> BashTool
              +---> ReadTool
              +---> WriteTool
              +---> EditTool

The chat() method drives the interaction:

User prompt
    |
    v
history: [User, Assistant, ToolResult, ..., User]
    |
    v
Provider.chat() ---HTTP---> LLM API
    |
    | AssistantTurn
    v
Tool calls? ----yes---> Execute tools ---> append to history ---> loop
    |
    no
    |
    v
Append final Assistant to history, return text

In about 300 lines of Rust across all files, you have:

  • A trait-based tool system with JSON schema definitions.
  • A generic agent loop that works with any provider.
  • A mock provider for deterministic testing.
  • An HTTP provider for real LLM APIs.
  • A CLI with conversation memory that ties it all together.

Where to go from here

This framework is intentionally minimal. Here are ideas for extending it:

Streaming responses – Instead of waiting for the full response, stream tokens as they arrive. This means changing chat() to return a Stream instead of a single AssistantTurn.

Token limits – Track token usage and truncate old messages when the context window fills up.

More tools – Add a web search tool, a database query tool, or anything else you can imagine. The Tool trait makes it easy to plug in new capabilities.

A richer UI – Add a spinner animation, markdown rendering, or collapsed tool call display. See mini-claw-code/examples/tui.rs for an example that does all three using termimad.

The foundation you built is solid. Every extension is a matter of adding to the existing patterns, not rewriting them. The Provider trait, the Tool trait, and the agent loop are the building blocks for anything you want to build next.

What’s next

Head to Chapter 8: The Singularity – your agent can now modify its own source code, and we will talk about what that means and where to go from here.

Chapter 8: The Singularity

Your agent can now edit its own source code – it can start self-evolving. From this point on, you don't have to write any code yourself.

Extensions

The extension chapters that follow walk through the reference implementation. You don't need to write the code yourself – read them to understand the design, then let your agent implement them (or do it yourself for practice).

Beyond the extension chapters, here are more ideas to explore:

  • Parallel tool calls – Execute concurrent tool calls with tokio::join!.
  • Token tracking – Truncate old messages when approaching the context limit.
  • More tools – Web search, database queries, HTTP requests. The Tool trait makes it easy.
  • MCP – Expose your tools as an MCP server or connect to external ones.

Chapter 9: A Better TUI

The chat.rs CLI works, but it dumps plain text and shows every tool call. A real coding agent deserves markdown rendering, a thinking spinner, and collapsed tool calls when the agent gets busy.

See mini-claw-code/examples/tui.rs for a reference implementation. It uses:

  • termimad for inline markdown rendering in the terminal.
  • crossterm for raw terminal mode (used by the arrow-key selection UI in Chapter 11).
  • An animated spinner (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) that ticks while the agent thinks.
  • Collapsed tool calls: after 3 tool calls, subsequent ones are collapsed into a "... and N more" counter to keep the output clean.

The TUI builds on the AgentEvent stream from StreamingAgent (Chapter 10). The event loop uses tokio::select! to multiplex three sources:

  1. Agent events (AgentEvent::TextDelta, ToolCall, Done, Error) – render streaming text, tool summaries, or final output.
  2. User input requests from AskTool (Chapter 11) – pause the spinner and show a text prompt or arrow-key selection list.
  3. Timer ticks – advance the spinner animation.

This chapter is exposition only – no code to write. Read through examples/tui.rs to see how the pieces fit together, or ask your mini-claw-code agent to build a TUI for you.

Chapter 10: Streaming

In Chapter 6 you built OpenRouterProvider::chat(), which waits for the entire response before returning. That works, but the user stares at a blank screen until every token has been generated. Real coding agents print tokens as they arrive – that is streaming.

This chapter adds streaming support and a StreamingAgent – the streaming counterpart to SimpleAgent. You will:

  1. Define a StreamEvent enum that represents real-time deltas.
  2. Build a StreamAccumulator that collects deltas into a complete AssistantTurn.
  3. Write a parse_sse_line() function that converts raw Server-Sent Events into StreamEvents.
  4. Define a StreamProvider trait – the streaming counterpart to Provider.
  5. Implement StreamProvider for OpenRouterProvider.
  6. Build a MockStreamProvider for testing without HTTP.
  7. Build StreamingAgent<P: StreamProvider> – a full agent loop with real-time text streaming.

None of this touches the Provider trait or SimpleAgent. Streaming is layered on top of the existing architecture.

Why streaming?

Without streaming, a long response (say 500 tokens) makes the CLI feel frozen. Streaming fixes three things:

  • Immediate feedback – the user sees the first word within milliseconds instead of waiting seconds for the full response.
  • Early cancellation – if the agent is heading in the wrong direction, the user can Ctrl-C without waiting for the full response.
  • Progress visibility – watching tokens arrive confirms the agent is working, not stuck.

How SSE works

The OpenAI-compatible API supports streaming via Server-Sent Events (SSE). You set "stream": true in the request, and instead of one big JSON response, the server sends a series of text lines:

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" world"},"finish_reason":null}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Each line starts with data: followed by a JSON object (or the sentinel [DONE]). The key difference from the non-streaming response: instead of a message field with the complete text, each chunk has a delta field with just the new part. Your code reads these deltas one by one, prints them immediately, and accumulates them into the final result.

Here is the flow:

sequenceDiagram
    participant A as Agent
    participant L as LLM (SSE)
    participant U as User

    A->>L: POST /chat/completions (stream: true)
    L-->>A: data: {"delta":{"content":"Hello"}}
    A->>U: print "Hello"
    L-->>A: data: {"delta":{"content":" world"}}
    A->>U: print " world"
    L-->>A: data: [DONE]
    A->>U: (done)

Tool calls stream the same way, but with tool_calls deltas instead of content deltas. The tool call’s name and arguments arrive in pieces that you concatenate.
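For illustration, a streamed read call might arrive as chunks like these (the shapes follow the OpenAI delta format; they are illustrative, not captured output):

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","function":{"name":"read","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"path\":\"f.txt\"}"}}]},"finish_reason":null}]}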

StreamEvent

Open mini-claw-code/src/streaming.rs. The StreamEvent enum is our domain type for streaming deltas:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    /// A chunk of assistant text.
    TextDelta(String),
    /// A new tool call has started.
    ToolCallStart { index: usize, id: String, name: String },
    /// More argument JSON for a tool call in progress.
    ToolCallDelta { index: usize, arguments: String },
    /// The stream is complete.
    Done,
}
}

This is the interface between the SSE parser and the rest of the application. The parser produces StreamEvents; the UI consumes them for display; the accumulator collects them into an AssistantTurn.

StreamAccumulator

The accumulator is a simple state machine. It keeps a running text buffer and a list of partial tool calls. Each feed() call appends to the appropriate place:

#![allow(unused)]
fn main() {
pub struct StreamAccumulator {
    text: String,
    tool_calls: Vec<PartialToolCall>,
}

impl StreamAccumulator {
    pub fn new() -> Self { /* ... */ }
    pub fn feed(&mut self, event: &StreamEvent) { /* ... */ }
    pub fn finish(self) -> AssistantTurn { /* ... */ }
}
}

The implementation is straightforward:

  • TextDelta → append to self.text.
  • ToolCallStart → grow the tool_calls vec if needed, set the id and name at the given index.
  • ToolCallDelta → append to the arguments string at the given index.
  • Done → no-op (we handle completion in finish()).
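A sketch of feed(), assuming PartialToolCall is a small Default-able struct with id, name, and arguments String fields:

#![allow(unused)]
fn main() {
pub fn feed(&mut self, event: &StreamEvent) {
    match event {
        StreamEvent::TextDelta(t) => self.text.push_str(t),
        StreamEvent::ToolCallStart { index, id, name } => {
            // Grow the vec so the index is valid, then record id and name.
            if self.tool_calls.len() <= *index {
                self.tool_calls.resize_with(*index + 1, PartialToolCall::default);
            }
            self.tool_calls[*index].id = id.clone();
            self.tool_calls[*index].name = name.clone();
        }
        StreamEvent::ToolCallDelta { index, arguments } => {
            if self.tool_calls.len() <= *index {
                self.tool_calls.resize_with(*index + 1, PartialToolCall::default);
            }
            // Concatenate raw argument fragments; parsed later in finish().
            self.tool_calls[*index].arguments.push_str(arguments);
        }
        StreamEvent::Done => {} // completion is handled in finish()
    }
}
}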

finish() consumes the accumulator and builds an AssistantTurn:

#![allow(unused)]
fn main() {
pub fn finish(self) -> AssistantTurn {
    let text = if self.text.is_empty() { None } else { Some(self.text) };

    let tool_calls: Vec<ToolCall> = self.tool_calls
        .into_iter()
        .filter(|tc| !tc.name.is_empty())
        .map(|tc| ToolCall {
            id: tc.id,
            name: tc.name,
            arguments: serde_json::from_str(&tc.arguments)
                .unwrap_or(Value::Null),
        })
        .collect();

    let stop_reason = if tool_calls.is_empty() {
        StopReason::Stop
    } else {
        StopReason::ToolUse
    };

    AssistantTurn { text, tool_calls, stop_reason }
}
}

Notice that arguments is accumulated as a raw string and only parsed as JSON at the very end. This is because the API splits the arguments JSON across chunks – one fragment might be {"pa and the next th": "f.txt"} – so the pieces are not valid JSON until concatenated.

Parsing SSE lines

The parse_sse_line() function takes a single line from the SSE stream and returns zero or more StreamEvents:

#![allow(unused)]
fn main() {
pub fn parse_sse_line(line: &str) -> Option<Vec<StreamEvent>> {
    let data = line.strip_prefix("data: ")?;

    if data == "[DONE]" {
        return Some(vec![StreamEvent::Done]);
    }

    let chunk: ChunkResponse = serde_json::from_str(data).ok()?;
    // ... extract events from chunk.choices[0].delta
}
}

The SSE chunk types mirror the OpenAI delta format:

#![allow(unused)]
fn main() {
#[derive(Deserialize)]
struct ChunkResponse { choices: Vec<ChunkChoice> }

#[derive(Deserialize)]
struct ChunkChoice { delta: Delta, finish_reason: Option<String> }

#[derive(Deserialize)]
struct Delta {
    content: Option<String>,
    tool_calls: Option<Vec<DeltaToolCall>>,
}
}

For tool calls, the first chunk includes id and function.name (indicating a new tool call). Subsequent chunks only have function.arguments fragments. The parser emits ToolCallStart when id is present, and ToolCallDelta for non-empty argument strings.
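The elided extraction might look roughly like this, assuming DeltaToolCall carries an index, an optional id, and a function with optional name and arguments fields:

#![allow(unused)]
fn main() {
let mut events = Vec::new();
let choice = chunk.choices.into_iter().next()?;

if let Some(text) = choice.delta.content {
    if !text.is_empty() {
        events.push(StreamEvent::TextDelta(text));
    }
}

if let Some(calls) = choice.delta.tool_calls {
    for call in calls {
        // The first chunk for a tool call carries the id and name.
        if let Some(id) = call.id {
            events.push(StreamEvent::ToolCallStart {
                index: call.index,
                id,
                name: call.function.name.unwrap_or_default(),
            });
        }
        // Every chunk may carry another argument fragment.
        if let Some(args) = call.function.arguments {
            if !args.is_empty() {
                events.push(StreamEvent::ToolCallDelta {
                    index: call.index,
                    arguments: args,
                });
            }
        }
    }
}

Some(events)
}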

StreamProvider trait

Just as Provider defines the non-streaming interface, StreamProvider defines the streaming one:

#![allow(unused)]
fn main() {
pub trait StreamProvider: Send + Sync {
    fn stream_chat<'a>(
        &'a self,
        messages: &'a [Message],
        tools: &'a [&'a ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> impl Future<Output = anyhow::Result<AssistantTurn>> + Send + 'a;
}
}

The key difference from Provider::chat() is the tx parameter – an mpsc channel sender. The implementation sends StreamEvents through this channel as they arrive and returns the final accumulated AssistantTurn. This gives callers both real-time events and the complete result.

We keep StreamProvider separate from Provider rather than adding a method to the existing trait. This means SimpleAgent and all existing code are completely unaffected.

Implementing StreamProvider for OpenRouterProvider

The implementation ties together SSE parsing, the accumulator, and the channel:

#![allow(unused)]
fn main() {
impl StreamProvider for OpenRouterProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        // 1. Build request with stream: true
        // 2. Send HTTP request
        // 3. Read response chunks in a loop:
        //    - Buffer incoming bytes
        //    - Split on newlines
        //    - parse_sse_line() each complete line
        //    - feed() each event into the accumulator
        //    - send each event through tx
        // 4. Return acc.finish()
    }
}
}

The buffering detail is important. HTTP responses may arrive in arbitrary byte chunks that do not align with SSE line boundaries. So we maintain a String buffer, append each chunk, and process only complete lines (splitting on \n):

#![allow(unused)]
fn main() {
let mut buffer = String::new();

while let Some(chunk) = resp.chunk().await? {
    buffer.push_str(&String::from_utf8_lossy(&chunk));

    while let Some(newline_pos) = buffer.find('\n') {
        let line = buffer[..newline_pos].trim_end_matches('\r').to_string();
        buffer = buffer[newline_pos + 1..].to_string();

        if line.is_empty() { continue; }

        if let Some(events) = parse_sse_line(&line) {
            for event in events {
                acc.feed(&event);
                let _ = tx.send(event);
            }
        }
    }
}
}

MockStreamProvider

For testing, we need a streaming provider that does not make HTTP calls. MockStreamProvider wraps the existing MockProvider and synthesizes StreamEvents from each canned AssistantTurn:

#![allow(unused)]
fn main() {
pub struct MockStreamProvider {
    inner: MockProvider,
}

impl StreamProvider for MockStreamProvider {
    async fn stream_chat(
        &self,
        messages: &[Message],
        tools: &[&ToolDefinition],
        tx: mpsc::UnboundedSender<StreamEvent>,
    ) -> anyhow::Result<AssistantTurn> {
        let turn = self.inner.chat(messages, tools).await?;

        // Synthesize stream events from the complete turn
        if let Some(ref text) = turn.text {
            for ch in text.chars() {
                let _ = tx.send(StreamEvent::TextDelta(ch.to_string()));
            }
        }
        for (i, call) in turn.tool_calls.iter().enumerate() {
            let _ = tx.send(StreamEvent::ToolCallStart {
                index: i, id: call.id.clone(), name: call.name.clone(),
            });
            let _ = tx.send(StreamEvent::ToolCallDelta {
                index: i, arguments: call.arguments.to_string(),
            });
        }
        let _ = tx.send(StreamEvent::Done);

        Ok(turn)
    }
}
}

It sends text one character at a time (simulating token-by-token streaming) and each tool call as a start + delta pair. This lets us test StreamingAgent without any network calls.

StreamingAgent

Now for the main event. StreamingAgent is the streaming counterpart to SimpleAgent. It has the same structure – a provider, a tool set, and an agent loop – but it uses StreamProvider and emits AgentEvent::TextDelta events in real time:

#![allow(unused)]
fn main() {
pub struct StreamingAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
}

impl<P: StreamProvider> StreamingAgent<P> {
    pub fn new(provider: P) -> Self { /* ... */ }
    pub fn tool(mut self, t: impl Tool + 'static) -> Self { /* ... */ }

    pub async fn run(
        &self,
        prompt: &str,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> { /* ... */ }

    pub async fn chat(
        &self,
        messages: &mut Vec<Message>,
        events: mpsc::UnboundedSender<AgentEvent>,
    ) -> anyhow::Result<String> { /* ... */ }
}
}

The chat() method is the heart of the streaming agent. Let us walk through it:

#![allow(unused)]
fn main() {
pub async fn chat(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    let defs = self.tools.definitions();

    loop {
        // 1. Set up a stream channel
        let (stream_tx, mut stream_rx) = mpsc::unbounded_channel();

        // 2. Spawn a forwarder that converts StreamEvent::TextDelta
        //    into AgentEvent::TextDelta for the UI
        let events_clone = events.clone();
        let forwarder = tokio::spawn(async move {
            while let Some(event) = stream_rx.recv().await {
                if let StreamEvent::TextDelta(text) = event {
                    let _ = events_clone.send(AgentEvent::TextDelta(text));
                }
            }
        });

        // 3. Call stream_chat — this streams AND returns the turn
        let turn = self.provider.stream_chat(messages, &defs, stream_tx).await?;
        let _ = forwarder.await;

        // 4. Same stop_reason logic as SimpleAgent
        match turn.stop_reason {
            StopReason::Stop => {
                let text = turn.text.clone().unwrap_or_default();
                let _ = events.send(AgentEvent::Done(text.clone()));
                messages.push(Message::Assistant(turn));
                return Ok(text);
            }
            StopReason::ToolUse => {
                // Execute tools, push results, continue loop
                // (same pattern as SimpleAgent)
            }
        }
    }
}
}

The architecture has two channels flowing simultaneously:

flowchart LR
    SC["stream_chat()"] -- "StreamEvent" --> CH["mpsc channel"]
    CH --> FW["forwarder task"]
    FW -- "AgentEvent::TextDelta" --> UI["UI / events channel"]
    SC -- "feeds" --> ACC["StreamAccumulator"]
    ACC -- "finish()" --> TURN["AssistantTurn"]
    TURN --> LOOP["Agent loop"]

The forwarder task is a bridge: it receives raw StreamEvents from the provider and converts TextDelta events into AgentEvent::TextDelta for the UI. This keeps the provider’s streaming protocol separate from the agent’s event protocol.

Notice that AgentEvent now has a TextDelta variant:

#![allow(unused)]
fn main() {
pub enum AgentEvent {
    TextDelta(String),  // NEW — streaming text chunks
    ToolCall { name: String, summary: String },
    Done(String),
    Error(String),
}
}

Using StreamingAgent in the TUI

The TUI example (examples/tui.rs) uses StreamingAgent for the full experience:

#![allow(unused)]
fn main() {
let provider = OpenRouterProvider::from_env()?;
let agent = Arc::new(
    StreamingAgent::new(provider)
        .tool(BashTool::new())
        .tool(ReadTool::new())
        .tool(WriteTool::new())
        .tool(EditTool::new()),
);
}

The agent is wrapped in Arc so it can be shared with spawned tasks. Each turn spawns the agent and processes events with a spinner:

#![allow(unused)]
fn main() {
let (tx, mut rx) = mpsc::unbounded_channel();
let agent = agent.clone();
let mut msgs = std::mem::take(&mut history);
let handle = tokio::spawn(async move {
    let _ = agent.chat(&mut msgs, tx).await;
    msgs
});

// UI event loop — print TextDeltas, show spinner for tool calls
loop {
    tokio::select! {
        event = rx.recv() => {
            match event {
                Some(AgentEvent::TextDelta(text)) => print!("{text}"),
                Some(AgentEvent::ToolCall { summary, .. }) => { /* spinner */ },
                Some(AgentEvent::Done(_)) => break,
                // ...
            }
        }
        _ = tick.tick() => { /* animate spinner */ }
    }
}
}

Compare this to the SimpleAgent version from Chapter 9: the structure is almost identical. The only difference is that TextDelta events let us print tokens as they arrive instead of waiting for the full Done event.

Running the tests

cargo test -p mini-claw-code ch10

The tests verify:

  • Accumulator: text assembly, tool call assembly, mixed events, empty input, multiple parallel tool calls.
  • SSE parsing: text deltas, tool call start/delta, [DONE], non-data lines, empty deltas, invalid JSON, full multi-line sequences.
  • MockStreamProvider: text responses synthesize char-by-char events; tool call responses synthesize start + delta events.
  • StreamingAgent: text-only responses, tool call loops, and multi-turn chat history – all using MockStreamProvider for deterministic testing.
  • Integration: mock TCP servers that send real SSE responses to stream_chat() and verify both the returned AssistantTurn and the events sent through the channel.

Recap

  • StreamEvent represents real-time deltas: text chunks, tool call starts, argument fragments, and completion.
  • StreamAccumulator collects deltas into a complete AssistantTurn.
  • parse_sse_line() converts raw SSE "data:" lines into StreamEvents.
  • StreamProvider is the streaming counterpart to Provider – it adds an mpsc channel parameter for real-time events.
  • MockStreamProvider wraps MockProvider to synthesize streaming events for testing.
  • StreamingAgent is the streaming counterpart to SimpleAgent – same tool loop, but with real-time TextDelta events forwarded to the UI.
  • The Provider trait and SimpleAgent are unchanged. Streaming is an additive feature layered on top.

Chapter 11: User Input

Your agent can read files, run commands, and write code – but it can’t ask you a question. If it’s unsure which approach to take, which file to target, or whether to proceed with a destructive operation, it just guesses.

Real coding agents solve this with an ask tool. Claude Code has AskUserQuestion, Kimi CLI has approval prompts. The LLM calls a special tool, the agent pauses, and the user types an answer. The answer goes back as a tool result and execution continues.

In this chapter you’ll build:

  1. An InputHandler trait that abstracts how user input is collected.
  2. An AskTool that the LLM calls to ask the user a question.
  3. Three handler implementations: CLI, channel-based (for TUI), and mock (for tests).

Why a trait?

Different UIs collect input differently:

  • A CLI app prints to stdout and reads from stdin.
  • A TUI app sends a request through a channel and waits for the event loop to collect the answer (maybe with arrow-key selection).
  • Tests need to provide canned answers without any I/O.

The InputHandler trait lets AskTool work with all three without knowing which one it’s using:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
pub trait InputHandler: Send + Sync {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String>;
}
}

The question is what the LLM wants to ask. The options slice is an optional list of choices – if empty, the user types free-text. If non-empty, the UI can present a selection list.
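For example, a caller holding a handler might use it like this (hypothetical calls, for illustration only):

#![allow(unused)]
fn main() {
// Free text: empty options slice, the user types an answer.
let answer = handler.ask("Which database are you using?", &[]).await?;

// Multiple choice: the UI may render a selection list.
let choice = handler
    .ask("Proceed with the refactor?", &["yes".into(), "no".into()])
    .await?;
}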

AskTool

AskTool implements the Tool trait. It takes an Arc<dyn InputHandler> so the handler can be shared across threads:

#![allow(unused)]
fn main() {
pub struct AskTool {
    definition: ToolDefinition,
    handler: Arc<dyn InputHandler>,
}
}

Tool definition

The LLM needs to know what parameters the tool accepts. question is required (a string). options is optional (an array of strings).

For options, we need a JSON schema for an array type – something param() can’t express since it only handles scalar types. So first, add param_raw() to ToolDefinition:

#![allow(unused)]
fn main() {
/// Add a parameter with a raw JSON schema value.
///
/// Use this for complex types (arrays, nested objects) that `param()` can't express.
pub fn param_raw(mut self, name: &str, schema: Value, required: bool) -> Self {
    self.parameters["properties"][name] = schema;
    if required {
        self.parameters["required"]
            .as_array_mut()
            .unwrap()
            .push(serde_json::Value::String(name.to_string()));
    }
    self
}
}

Now the tool definition uses both param() and param_raw():

#![allow(unused)]
fn main() {
impl AskTool {
    pub fn new(handler: Arc<dyn InputHandler>) -> Self {
        Self {
            definition: ToolDefinition::new(
                "ask_user",
                "Ask the user a clarifying question...",
            )
            .param("question", "string", "The question to ask the user", true)
            .param_raw(
                "options",
                json!({
                    "type": "array",
                    "items": { "type": "string" },
                    "description": "Optional list of choices to present to the user"
                }),
                false,
            ),
            handler,
        }
    }
}
}

Tool::call

The call implementation extracts question, parses options with a helper, and delegates to the handler:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl Tool for AskTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let question = args
            .get("question")
            .and_then(|v| v.as_str())
            .ok_or_else(|| anyhow::anyhow!("missing required parameter: question"))?;

        let options = parse_options(&args);

        self.handler.ask(question, &options).await
    }
}

/// Extract the optional `options` array from tool arguments.
fn parse_options(args: &Value) -> Vec<String> {
    args.get("options")
        .and_then(|v| v.as_array())
        .map(|arr| {
            arr.iter()
                .filter_map(|v| v.as_str().map(String::from))
                .collect()
        })
        .unwrap_or_default()
}
}

The parse_options helper keeps call() focused on the happy path. If options is missing or not an array, it defaults to an empty vec – the handler treats this as free-text input.

Three handlers

CliInputHandler

The simplest handler. Prints the question, lists numbered choices (if any), reads a line from stdin, and resolves numbered answers:

#![allow(unused)]
fn main() {
pub struct CliInputHandler;

#[async_trait::async_trait]
impl InputHandler for CliInputHandler {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
        let question = question.to_string();
        let options = options.to_vec();

        // spawn_blocking because stdin is synchronous
        tokio::task::spawn_blocking(move || {
            // Display the question and numbered choices (if any)
            println!("\n  {question}");
            for (i, opt) in options.iter().enumerate() {
                println!("    {}) {opt}", i + 1);
            }

            // Read the answer
            print!("  > ");
            io::stdout().flush()?;
            let mut line = String::new();
            io::stdin().lock().read_line(&mut line)?;
            let answer = line.trim().to_string();

            // If the user typed a valid option number, resolve it
            Ok(resolve_option(&answer, &options))
        }).await?
    }
}

/// If `answer` is a number matching one of the options, return that option.
/// Otherwise return the raw answer.
fn resolve_option(answer: &str, options: &[String]) -> String {
    if let Ok(n) = answer.parse::<usize>()
        && n >= 1
        && n <= options.len()
    {
        return options[n - 1].clone();
    }
    answer.to_string()
}
}

The resolve_option helper keeps the closure body clean. It uses let-chain syntax (stabilized in Rust 1.88 / edition 2024): multiple conditions joined with && including let Ok(n) = ... pattern bindings. If the user types "2" and there are three options, it resolves to options[1]. Otherwise the raw text is returned.

Note the for loop over options does nothing when the slice is empty – no special if branch needed.

Use this in simple CLI apps like examples/chat.rs:

#![allow(unused)]
fn main() {
let agent = SimpleAgent::new(provider)
    .tool(BashTool::new())
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new())
    .tool(AskTool::new(Arc::new(CliInputHandler)));
}

ChannelInputHandler

For TUI apps, input collection happens in the event loop, not in the tool. The ChannelInputHandler bridges the gap with a channel:

#![allow(unused)]
fn main() {
pub struct UserInputRequest {
    pub question: String,
    pub options: Vec<String>,
    pub response_tx: oneshot::Sender<String>,
}

pub struct ChannelInputHandler {
    tx: mpsc::UnboundedSender<UserInputRequest>,
}
}

When ask() is called, it sends a UserInputRequest through the channel and awaits the oneshot response:

#![allow(unused)]
fn main() {
#[async_trait::async_trait]
impl InputHandler for ChannelInputHandler {
    async fn ask(&self, question: &str, options: &[String]) -> anyhow::Result<String> {
        let (response_tx, response_rx) = oneshot::channel();
        self.tx.send(UserInputRequest {
            question: question.to_string(),
            options: options.to_vec(),
            response_tx,
        })?;
        Ok(response_rx.await?)
    }
}
}

The TUI event loop receives the request and renders it however it likes – a simple text prompt, or an arrow-key-navigable selection list using crossterm in raw terminal mode.
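Sketched as one arm of the tokio::select! loop from Chapter 9 – here input_rx is the receiver paired with the handler's sender, and read_user_answer is a hypothetical helper that renders the prompt or selection list:

#![allow(unused)]
fn main() {
tokio::select! {
    // ... agent events and spinner ticks as before ...
    Some(req) = input_rx.recv() => {
        // Pause the spinner, show the question (and options, if any).
        let answer = read_user_answer(&req.question, &req.options)?;
        // Unblock AskTool: send the answer back through the oneshot channel.
        let _ = req.response_tx.send(answer);
    }
}
}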

MockInputHandler

For tests, pre-configure answers in a queue:

#![allow(unused)]
fn main() {
pub struct MockInputHandler {
    answers: Mutex<VecDeque<String>>,
}

#[async_trait::async_trait]
impl InputHandler for MockInputHandler {
    async fn ask(&self, _question: &str, _options: &[String]) -> anyhow::Result<String> {
        self.answers.lock().await.pop_front()
            .ok_or_else(|| anyhow::anyhow!("MockInputHandler: no more answers"))
    }
}
}

This follows the same pattern as MockProvider – pop from the front, error when empty. Note that this uses tokio::sync::Mutex (with .lock().await), not std::sync::Mutex. The reason: ask() is an async fn, and in async code a lock guard can end up held across an .await boundary. A std::sync::Mutex guard is !Send, so a future that holds one across an .await won't compile once it is spawned on tokio. tokio::sync::Mutex produces a Send-safe guard that works in async contexts. Compare this with MockProvider from Chapter 1, which uses std::sync::Mutex because its chat() method never holds the guard across an .await.

Tool summary

Update tool_summary() in agent.rs to display "question" for ask_user calls in the terminal output:

#![allow(unused)]
fn main() {
let detail = call.arguments
    .get("command")
    .or_else(|| call.arguments.get("path"))
    .or_else(|| call.arguments.get("question"))  // <-- new
    .and_then(|v| v.as_str());
}

Plan mode integration

ask_user is read-only – it collects information without mutating anything. Add it to PlanAgent’s default read_only set (see Chapter 12) so the LLM can ask questions during planning:

#![allow(unused)]
fn main() {
read_only: HashSet::from(["bash", "read", "ask_user"]),
}

Wiring it up

Add the module to mini-claw-code/src/tools/mod.rs:

#![allow(unused)]
fn main() {
mod ask;
pub use ask::*;
}

And re-export from lib.rs:

#![allow(unused)]
fn main() {
pub use tools::{
    AskTool, BashTool, ChannelInputHandler, CliInputHandler,
    EditTool, InputHandler, MockInputHandler, ReadTool,
    UserInputRequest, WriteTool,
};
}

Running the tests

cargo test -p mini-claw-code ch11

The tests verify:

  • Tool definition: schema has question (required) and options (optional array).
  • Question only: MockInputHandler returns answer for a question-only call.
  • With options: tool passes options to the handler correctly.
  • Missing question: missing question argument returns an error.
  • Handler exhausted: empty MockInputHandler returns an error.
  • Agent loop: LLM calls ask_user, gets an answer, then returns final text.
  • Ask then tool: ask_user followed by another tool call (e.g. read).
  • Multiple asks: two sequential ask_user calls with different answers.
  • Channel roundtrip: ChannelInputHandler sends request and receives response via oneshot channel.
  • param_raw: param_raw() adds array parameter to ToolDefinition correctly.

Recap

  • InputHandler trait abstracts input collection across CLI, TUI, and tests.
  • AskTool lets the LLM pause execution and ask the user a question.
  • param_raw() extends ToolDefinition to support complex JSON schema types like arrays.
  • Three handlers: CliInputHandler for simple apps, ChannelInputHandler for TUI apps, MockInputHandler for tests.
  • Plan mode: ask_user is read-only by default, so it works during planning.
  • Purely additive: no changes to SimpleAgent, StreamingAgent, or any existing tool.

Chapter 12: Plan Mode

Real coding agents can be dangerous. Give an LLM access to write, edit, and bash and it might rewrite your config, delete a file, or run a destructive command – all before you’ve had a chance to review what it’s doing.

Plan mode solves this with a two-phase workflow:

  1. Plan – the agent explores the codebase using read-only tools (read, bash, and ask_user). It cannot write, edit, or mutate anything. It returns a plan describing what it intends to do.
  2. Execute – after the user reviews and approves the plan, the agent runs again with all tools available.

This is exactly how Claude Code’s plan mode works. In this chapter you’ll build PlanAgent – a streaming agent with caller-driven approval gating.

You will:

  1. Build PlanAgent<P: StreamProvider> with plan() and execute() methods.
  2. Inject a system prompt that tells the LLM it’s in planning mode.
  3. Add an exit_plan tool the LLM calls when its plan is ready.
  4. Implement double defense: definition filtering and an execution guard.
  5. Let the caller drive the approval flow between phases.

Why plan mode?

Consider this scenario:

User: "Refactor auth.rs to use JWT instead of session cookies"

Agent (no plan mode):
  → calls write("auth.rs", ...) immediately
  → rewrites half your auth system
  → you didn't want that approach at all

With plan mode:

User: "Refactor auth.rs to use JWT instead of session cookies"

Agent (plan phase):
  → calls read("auth.rs") to understand current code
  → calls bash("grep -r 'session' src/") to find related files
  → calls exit_plan to submit its plan
  → "Plan: Replace SessionStore with JwtProvider in 3 files..."

User: "Looks good, go ahead."

Agent (execute phase):
  → calls write/edit with the approved changes

The key insight: the same agent loop works for both phases. The only difference is which tools are available.

Design

PlanAgent has the same shape as StreamingAgent – a provider, a ToolSet, and an agent loop. Three additions make it a planning agent:

  1. A HashSet<&'static str> recording which tools are allowed during planning.
  2. A system prompt injected at the start of the planning phase.
  3. An exit_plan tool definition the LLM calls when its plan is ready.

#![allow(unused)]
fn main() {
pub struct PlanAgent<P: StreamProvider> {
    provider: P,
    tools: ToolSet,
    read_only: HashSet<&'static str>,
    plan_system_prompt: String,
    exit_plan_def: ToolDefinition,
}
}

Two public methods drive the two phases:

  • plan() – injects the system prompt, runs the agent loop with only read-only tools and exit_plan visible.
  • execute() – runs the agent loop with all tools visible.

Both delegate to a private run_loop() that takes an optional tool filter.

The builder

Construction follows the same builder pattern as SimpleAgent and StreamingAgent:

#![allow(unused)]
fn main() {
impl<P: StreamProvider> PlanAgent<P> {
    pub fn new(provider: P) -> Self {
        Self {
            provider,
            tools: ToolSet::new(),
            read_only: HashSet::from(["bash", "read", "ask_user"]),
            plan_system_prompt: DEFAULT_PLAN_PROMPT.to_string(),
            exit_plan_def: ToolDefinition::new(
                "exit_plan",
                "Signal that your plan is complete and ready for user review. \
                 Call this when you have finished exploring and are ready to \
                 present your plan.",
            ),
        }
    }

    pub fn tool(mut self, t: impl Tool + 'static) -> Self {
        self.tools.push(t);
        self
    }

    pub fn read_only(mut self, names: &[&'static str]) -> Self {
        self.read_only = names.iter().copied().collect();
        self
    }

    pub fn plan_prompt(mut self, prompt: impl Into<String>) -> Self {
        self.plan_system_prompt = prompt.into();
        self
    }
}
}

By default, bash, read, and ask_user are read-only. (Chapter 11 added ask_user so the LLM can ask clarifying questions during planning.) The .read_only() method lets callers override this – for example, to exclude bash from planning if you want a stricter mode.

The .plan_prompt() method lets callers override the system prompt – useful for specialized agents like security auditors or code reviewers.

System prompt

The LLM needs to know it’s in planning mode. Without this, it will try to accomplish the task with whatever tools it sees, rather than producing a deliberate plan.

plan() injects a system prompt at the start of the conversation:

#![allow(unused)]
fn main() {
const DEFAULT_PLAN_PROMPT: &str = "\
You are in PLANNING MODE. Explore the codebase using the available tools and \
create a plan. You can read files, run shell commands, and ask the user \
questions — but you CANNOT write, edit, or create files.\n\n\
When your plan is ready, call the `exit_plan` tool to submit it for review.";
}

The injection is conditional – if the caller already provided a System message, plan() respects it:

#![allow(unused)]
fn main() {
pub async fn plan(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    if !messages
        .first()
        .is_some_and(|m| matches!(m, Message::System(_)))
    {
        messages.insert(0, Message::System(self.plan_system_prompt.clone()));
    }
    self.run_loop(messages, Some(&self.read_only), events).await
}
}

This means:

  • First call: no system message → inject the plan prompt.
  • Re-plan call: system message already there → skip.
  • Caller provided their own: caller’s system message → respect it.

This is how real agents work. Claude Code switches its system prompt when entering plan mode. OpenCode uses entirely separate agent configurations with different system prompts for plan vs build agents.

The exit_plan tool

Without exit_plan, the planning phase ends when the LLM returns StopReason::Stop – the same way any conversation ends. This is ambiguous: did the LLM finish planning, or did it just stop talking?

Real agents solve this with an explicit signal. Claude Code has ExitPlanMode. OpenCode has exit_plan. The LLM calls the tool to say “my plan is ready for review.”

In PlanAgent, exit_plan is a tool definition stored on the struct – not registered in the ToolSet. This means:

  • During plan: exit_plan is injected into the tool list alongside read-only tools. The LLM can see and call it.
  • During execute: exit_plan is not in the tool list. The LLM doesn’t know it exists.

When the agent loop sees an exit_plan call, it returns immediately with the plan text (the LLM’s text from that turn):

#![allow(unused)]
fn main() {
// Handle exit_plan: signal plan completion
if allowed.is_some() && call.name == "exit_plan" {
    results.push((call.id.clone(), "Plan submitted for review.".into()));
    exit_plan = true;
    continue;
}
}

After processing all tool calls in the turn, if exit_plan was among them:

#![allow(unused)]
fn main() {
if exit_plan {
    let _ = events.send(AgentEvent::Done(plan_text.clone()));
    return Ok(plan_text);
}
}

The planning phase now has two exit paths:

  1. StopReason::Stop – LLM stops naturally (backward compatible).
  2. exit_plan tool call – LLM explicitly signals plan completion.

Both work. The exit_plan path is better because it’s unambiguous.

Double defense

Tool filtering still uses two layers of protection:

Layer 1: Definition filtering

During plan(), only read-only tool definitions plus exit_plan are sent to the LLM. The model literally cannot see write or edit in its tool list:

#![allow(unused)]
fn main() {
let all_defs = self.tools.definitions();
let defs: Vec<&ToolDefinition> = match allowed {
    Some(names) => {
        let mut filtered: Vec<&ToolDefinition> = all_defs
            .into_iter()
            .filter(|d| names.contains(d.name))
            .collect();
        filtered.push(&self.exit_plan_def);
        filtered
    }
    None => all_defs,
};
}

During execute(), allowed is None, so all registered tools are sent – and exit_plan is not included.

Layer 2: Execution guard

If the LLM somehow hallucinated a blocked tool call, the execution guard catches it and returns an error ToolResult instead of executing the tool:

#![allow(unused)]
fn main() {
if let Some(names) = allowed
    && !names.contains(call.name.as_str())
{
    results.push((
        call.id.clone(),
        format!(
            "error: tool '{}' is not available in planning mode",
            call.name
        ),
    ));
    continue;
}
}

The error goes back to the LLM as a tool result, so it learns the tool is blocked and adjusts its behavior. The file is never touched.

The shared agent loop

Both plan() and execute() delegate to run_loop(). The only parameter that differs is allowed:

#![allow(unused)]
fn main() {
pub async fn plan(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    // System prompt injection (shown earlier)
    self.run_loop(messages, Some(&self.read_only), events).await
}

pub async fn execute(
    &self,
    messages: &mut Vec<Message>,
    events: mpsc::UnboundedSender<AgentEvent>,
) -> anyhow::Result<String> {
    self.run_loop(messages, None, events).await
}
}

plan() passes Some(&self.read_only) to restrict tools. execute() passes None to allow everything.

The run_loop itself is identical to StreamingAgent::chat() from Chapter 10, with these additions:

  1. Tool definition filtering (read-only + exit_plan during plan; all during execute).
  2. The exit_plan handler that breaks the loop when the LLM signals plan completion.
  3. The execution guard for blocked tools.

Caller-driven approval flow

The approval flow lives entirely in the caller. PlanAgent does not ask for approval – it just runs whichever phase is called. This keeps the agent simple and lets the caller implement any approval UX they want.

Here is the typical flow:

#![allow(unused)]
fn main() {
let agent = PlanAgent::new(provider)
    .tool(ReadTool::new())
    .tool(WriteTool::new())
    .tool(EditTool::new())
    .tool(BashTool::new());

let mut messages = vec![Message::User("Refactor auth.rs".into())];

// Phase 1: Plan (read-only tools + exit_plan)
let (tx, rx) = mpsc::unbounded_channel();
let plan = agent.plan(&mut messages, tx).await?;
println!("Plan: {plan}");

// Show plan to user, get approval
if user_approves() {
    // Phase 2: Execute (all tools)
    messages.push(Message::User("Approved. Execute the plan.".into()));
    let (tx2, rx2) = mpsc::unbounded_channel();
    let result = agent.execute(&mut messages, tx2).await?;
    println!("Result: {result}");
} else {
    // Re-plan with feedback
    messages.push(Message::User("No, try a different approach.".into()));
    let (tx3, rx3) = mpsc::unbounded_channel();
    let revised_plan = agent.plan(&mut messages, tx3).await?;
    println!("Revised plan: {revised_plan}");
}
}

Notice how the same messages vec is shared across phases. This is critical – the LLM sees its own plan, the user’s approval (or rejection), and all previous context when it enters the execute phase. Re-planning is just pushing feedback as a User message and calling plan() again.

sequenceDiagram
    participant C as Caller
    participant P as PlanAgent
    participant L as LLM

    C->>P: plan(&mut messages)
    P->>L: [read, bash, exit_plan tools only]
    L-->>P: reads files, calls exit_plan
    P-->>C: "Plan: ..."

    C->>C: User reviews plan

    alt Approved
        C->>P: execute(&mut messages)
        P->>L: [all tools]
        L-->>P: writes/edits files
        P-->>C: "Done."
    else Rejected
        C->>P: plan(&mut messages) [with feedback]
        P->>L: [read, bash, exit_plan tools only]
        L-->>P: revised plan
        P-->>C: "Revised plan: ..."
    end

Wiring it up

Add the module to mini-claw-code/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod planning;
// ...
pub use planning::PlanAgent;
}

That’s it. Like streaming, plan mode is a purely additive feature – no existing code is modified.

Running the tests

cargo test -p mini-claw-code ch12

The tests verify:

  • Text response: plan() returns text when the LLM stops immediately.
  • Read tool allowed: read executes during planning.
  • Write tool blocked: write is blocked during planning; the file is NOT created; an error ToolResult is sent back to the LLM.
  • Edit tool blocked: same behavior for edit.
  • Execute allows write: write works during execution; the file IS created.
  • Full plan-then-execute: end-to-end flow – plan reads a file, approval, execute writes a file.
  • Message continuity: messages from the plan phase carry into the execute phase, including the injected system prompt.
  • read_only override: .read_only(&["read"]) excludes bash from planning.
  • Streaming events: TextDelta and Done events are emitted during planning.
  • Provider error: empty mock propagates errors correctly.
  • Builder pattern: chained .tool().read_only().plan_prompt() compiles.
  • System prompt injection: plan() injects a system prompt at position 0.
  • System prompt not duplicated: calling plan() twice doesn’t add a second system message.
  • Caller system prompt respected: if the caller provides a System message, plan() doesn’t overwrite it.
  • exit_plan tool: the LLM calls exit_plan to signal plan completion; plan() returns the plan text.
  • exit_plan not in execute: during execute(), exit_plan is not in the tool list.
  • Custom plan prompt: .plan_prompt(...) overrides the default.
  • Full flow with exit_plan: plan reads file → calls exit_plan → approve → execute writes file.

Recap

  • PlanAgent separates planning (read-only) from execution (all tools) using a single shared agent loop.
  • System prompt: plan() injects a system message telling the LLM it’s in planning mode — what tools are available, what’s blocked, and that it should call exit_plan when done.
  • exit_plan tool: the LLM explicitly signals plan completion, just like Claude Code’s ExitPlanMode. This is injected during planning and invisible during execution.
  • Double defense: definition filtering prevents the LLM from seeing blocked tools; an execution guard catches hallucinated calls.
  • Caller-driven approval: the agent doesn’t manage approval – the caller pushes approval/rejection as User messages and calls the appropriate phase.
  • Message continuity: the same messages vec flows through both phases, giving the LLM full context.
  • Streaming: both phases use StreamProvider and emit AgentEvents, just like StreamingAgent.
  • Purely additive: no changes to SimpleAgent, StreamingAgent, or any existing code.

Chapter 13: Subagents

This chapter is not yet written. It will cover spawning child agents that handle subtasks independently.

Want to contribute? Open an issue or submit a PR!

Chapter 14: MCP: Model Context Protocol

This chapter is not yet written. It will cover integrating MCP servers to give your agent access to external tools and data sources.

Want to contribute? Open an issue or submit a PR!

Chapter 15: Safety Rails

This chapter is not yet written. It will cover adding confirmation prompts, command filtering, timeouts, and other safety measures to your agent.

Want to contribute? Open an issue or submit a PR!