Chapter 4: Query Engine
This is the chapter where everything clicks.
In the previous chapters you built the vocabulary (messages), the mouth (provider), and the hands (tools). Now you build the brain – the loop that ties them all together. The QueryEngine is the heart of a coding agent. It is the thing that takes a user prompt, talks to an LLM, executes tools, feeds results back, and keeps going until the job is done.
Every coding agent – Claude Code, Cursor, Aider, OpenCode – has some version of this loop. The details vary (streaming, permissions, compaction), but the skeleton is identical. Get this right and you have a working agent. Everything else in this book is refinement.
What the QueryEngine does
Here is the entire agent lifecycle in one sentence: prompt the LLM, check if it wants to use tools, execute those tools, send the results back, repeat until the LLM says it is done.
That is it. The QueryEngine implements this loop. It owns three things:
- A provider – the LLM backend (from Chapter 2)
- A tool set – the registered tools (from Chapter 3)
- A config – safety limits and behavior knobs
User prompt
|
v
+-------------------+
| QueryEngine::chat |<---------+
+-------------------+ |
| |
v |
Provider.chat() |
| |
v |
StopReason? |
| | |
Stop ToolUse |
| | |
v v |
return execute_tools() |
text | |
v |
push Assistant |
push ToolResults |
| |
+----------------+
If you have read Claude Code’s source, this maps to QueryEngine.ts and the query function in query.ts. Our version strips away streaming, permissions, hooks, and compaction – those come in later chapters – leaving the pure control flow.
QueryConfig: safety from day one
Before writing any loop logic, we define the guardrails. A coding agent without limits is a runaway process that burns through your API budget and fills your context window with garbage.
#![allow(unused)]
fn main() {
pub struct QueryConfig {
/// Maximum number of agent loop iterations before stopping.
pub max_turns: usize,
/// Maximum tool result size in characters before truncation.
pub max_result_chars: usize,
}
impl Default for QueryConfig {
fn default() -> Self {
Self {
max_turns: 50,
max_result_chars: 100_000,
}
}
}
}
Two fields, both critical:
-
max_turns– The hard ceiling on loop iterations. Without this, a confused model that keeps calling tools will loop forever. Claude Code defaults to 200 in headless mode; we use 50 as a sensible tutorial default. When the limit is reached,chat()returns an error. The agent stops. The user is told what happened. -
max_result_chars– The maximum size of any single tool result before it gets truncated. Acaton a 2MB file should not consume your entire context window. When a result exceeds this limit, we slice it and append a[truncated]marker. The model sees enough to understand what happened without the full payload.
These two values are the difference between a prototype that works on toy examples and a system you can point at a real codebase. Production safety from day one.
The QueryEngine struct
#![allow(unused)]
fn main() {
pub struct QueryEngine<P: Provider> {
provider: P,
tools: ToolSet,
config: QueryConfig,
}
}
Generic over P: Provider, so the same engine works with OpenRouterProvider in production and MockProvider in tests. The builder pattern lets you configure it fluently – new() creates an engine with defaults, then config(), tool(), and tools() chain on configuration. Each takes mut self and returns Self:
#![allow(unused)]
fn main() {
let engine = QueryEngine::new(provider)
.tool(BashTool::new())
.tool(ReadTool::new())
.tool(WriteTool::new())
.config(QueryConfig { max_turns: 20, ..Default::default() });
}
No surprises. The interesting part is the methods that actually run.
execute_tools: the tool dispatch pipeline
Before tackling the main loop, we need a helper that takes a slice of ToolCalls from the LLM and produces results. This is execute_tools, and it handles three concerns: lookup, validation, and truncation.
#![allow(unused)]
fn main() {
async fn execute_tools(&self, calls: &[ToolCall]) -> Vec<(String, ToolResult)> {
let mut results = Vec::with_capacity(calls.len());
for call in calls {
let result = match self.tools.get(&call.name) {
Some(t) => {
// Validate input
match t.validate_input(&call.arguments) {
ValidationResult::Ok => {}
ValidationResult::Error { message, .. } => {
results.push((call.id.clone(), ToolResult::error(message)));
continue;
}
}
// Execute
match t.call(call.arguments.clone()).await {
Ok(mut r) => {
// Truncate large results
if r.content.len() > self.config.max_result_chars {
r.content = format!(
"{}... [truncated, {} chars total]",
&r.content[..self.config.max_result_chars],
r.content.len()
);
r.is_truncated = true;
}
r
}
Err(e) => ToolResult::error(e.to_string()),
}
}
None => ToolResult::error(format!("unknown tool `{}`", call.name)),
};
results.push((call.id.clone(), result));
}
results
}
}
Walk through the three stages:
-
Tool lookup – If the LLM hallucinates a tool name that does not exist, we return
ToolResult::error(...). The model sees"error: unknown tool \foo`“` and can recover. This happens more than you might expect, especially with smaller models. -
Input validation – Before executing anything, we call
validate_input(). The default implementation always returnsOk, but tools can override it to check required fields, validate paths, or reject dangerous arguments. Validation failures skip execution entirely – push the error andcontinueto the next call. -
Execute and truncate – Run the tool. If it succeeds but the output exceeds
max_result_chars, slice it and append a[truncated, N chars total]marker. If it fails, convert the error toToolResult::error(...).
This is a key design decision: tool errors become results, not panics. The agent loop never crashes because a tool failed. The model reads the error, adjusts its approach, and tries again. This is how Claude Code works too – if bash returns a non-zero exit code, the agent sees the stderr output and adapts.
Why sequential, not parallel?
You may notice we execute tools sequentially in a for loop rather than spawning them concurrently with join_all. This is deliberate for correctness: tools can have side effects. If the LLM calls write_file("foo.rs", ...) and then bash("cargo build") in the same turn, those must run in order. Claude Code does support concurrent execution for read-only tools (tools where is_concurrent_safe() returns true), but we keep it simple here. You can add concurrency later using tokio::JoinSet – the is_concurrent_safe() method on the Tool trait is already there waiting for you.
The chat() method: the core loop
This is it. The agentic loop. Read it carefully – it is shorter than you expect.
#![allow(unused)]
fn main() {
pub async fn chat(&self, messages: &mut Vec<Message>) -> anyhow::Result<String> {
let defs = self.tools.definitions();
let mut turns = 0;
loop {
if turns >= self.config.max_turns {
anyhow::bail!("exceeded max turns ({})", self.config.max_turns);
}
let turn = self.provider.chat(messages, &defs).await?;
match turn.stop_reason {
StopReason::Stop => {
let text = turn.text.clone().unwrap_or_default();
messages.push(Message::Assistant(turn));
return Ok(text);
}
StopReason::ToolUse => {
let results = self.execute_tools(&turn.tool_calls).await;
messages.push(Message::Assistant(turn));
for (id, result) in results {
messages.push(Message::tool_result(id, result.content));
}
}
}
turns += 1;
}
}
}
Let’s break it down.
Tool definitions: collected once
#![allow(unused)]
fn main() {
let defs = self.tools.definitions();
}
We gather tool definitions outside the loop. They do not change between iterations – the tool set is fixed for the lifetime of the engine. Every call to provider.chat() includes these definitions so the LLM knows which tools are available.
The max turns guard
#![allow(unused)]
fn main() {
if turns >= self.config.max_turns {
anyhow::bail!("exceeded max turns ({})", self.config.max_turns);
}
}
Checked at the top of each iteration, before calling the provider. If we have hit the limit, bail immediately. The anyhow::bail! macro returns an Err, which propagates up to the caller. No infinite loops, no surprise bills.
Call the provider
#![allow(unused)]
fn main() {
let turn = self.provider.chat(messages, &defs).await?;
}
Send the full message history and tool definitions to the LLM. The ? propagates provider errors (network failure, auth error, rate limit) directly to the caller. Provider errors are not recoverable by the agent loop – they need human intervention.
Match the stop reason
#![allow(unused)]
fn main() {
match turn.stop_reason {
StopReason::Stop => { /* final answer */ }
StopReason::ToolUse => { /* tool dispatch */ }
}
}
The LLM tells us why it stopped generating. Two possibilities:
Stop– The model is done. It has a final text answer. Extract it, push the assistant message into history, return.ToolUse– The model wants to use tools. It has populatedtool_callswith one or more calls. Execute them, push results, loop.
The two branches
StopReason::Stop – Clone the text, push the assistant message into history, return. The conversation ends with an Assistant message, ready for the next user turn.
StopReason::ToolUse – Execute the tools, then push messages in this exact order:
- First,
Message::Assistant(turn)– the assistant’s response including its tool calls - Then,
Message::tool_result(...)for each tool result
This ordering matters. The LLM API expects tool results to follow the assistant message that requested them. Each ToolResult is linked to its ToolCall by the id field. If you push them in the wrong order, the provider will reject the request.
After pushing results, the loop continues. The next iteration sends the entire history – including the tool calls and their results – back to the LLM. The model sees what happened and decides what to do next.
Why &mut Vec<Message>?
The caller owns the message history and passes it as &mut Vec<Message>. This is deliberate:
- Multi-turn conversations – The caller can push a new
Message::user(...)and callchat()again. The engine picks up where it left off with the full context. - Inspection – After
chat()returns, the caller can examine the full message history to see every tool call, every result, every intermediate step. - Persistence – The caller can serialize the messages to disk for session save/resume (Chapter 19).
run(): the convenience wrapper
Most of the time you just want to send a prompt and get a response. That is run():
#![allow(unused)]
fn main() {
pub async fn run(&self, prompt: &str) -> anyhow::Result<String> {
let mut messages = vec![Message::user(prompt)];
self.chat(&mut messages).await
}
}
Two lines. Creates a fresh message history with the user prompt, delegates to chat(). The message history is discarded after the call – use chat() directly if you need to preserve it.
QueryEvent: making it observable
The chat() method returns when the agent is done. That is fine for tests, but a real UI needs to show progress while the loop is running. What tool is being called? How long has it been running? Is it done?
The QueryEvent enum models these updates:
#![allow(unused)]
fn main() {
#[derive(Debug)]
pub enum QueryEvent {
/// A chunk of text streamed from the LLM.
TextDelta(String),
/// A tool is about to be called.
ToolStart { name: String, summary: String },
/// A tool finished executing.
ToolEnd { name: String, result: String },
/// The engine finished with a final response.
Done(String),
/// The engine encountered an error.
Error(String),
}
}
Five variants covering the full lifecycle:
| Event | When | UI use |
|---|---|---|
TextDelta | LLM streams a text chunk | Append to terminal output |
ToolStart | About to execute a tool | Show spinner: “Running bash: ls -la…” |
ToolEnd | Tool finished | Show result preview, stop spinner |
Done | Agent loop finished | Display final answer |
Error | Unrecoverable error | Show error message |
The summary field in ToolStart comes from Tool::summary() – recall from Chapter 3 that this produces compact descriptions like [bash: ls -la] or [read: src/main.rs]. The result field in ToolEnd is capped at 200 characters to keep event payloads small.
run_with_events / chat_with_events
These methods duplicate the core loop logic but emit events through a tokio::sync::mpsc::UnboundedSender<QueryEvent> channel. The caller creates the channel, passes the sender, and consumes events from the receiver – typically in a separate task that drives the UI.
#![allow(unused)]
fn main() {
pub async fn run_with_events(
&self,
prompt: &str,
events: mpsc::UnboundedSender<QueryEvent>,
) -> Vec<Message> {
let mut messages = vec![Message::user(prompt)];
self.chat_with_events(&mut messages, events).await;
messages
}
}
run_with_events returns Vec<Message> instead of Result<String> – errors are sent as QueryEvent::Error rather than propagated. This keeps the event-driven API simple: the receiver always gets a terminal event (Done or Error).
chat_with_events has the same structure as chat() but with events woven in. Here is the full implementation:
#![allow(unused)]
fn main() {
pub async fn chat_with_events(
&self,
messages: &mut Vec<Message>,
events: mpsc::UnboundedSender<QueryEvent>,
) {
let defs = self.tools.definitions();
let mut turns = 0;
loop {
if turns >= self.config.max_turns {
let _ = events.send(QueryEvent::Error(format!(
"exceeded max turns ({})",
self.config.max_turns
)));
return;
}
let turn = match self.provider.chat(messages, &defs).await {
Ok(t) => t,
Err(e) => {
let _ = events.send(QueryEvent::Error(e.to_string()));
return;
}
};
match turn.stop_reason {
StopReason::Stop => {
let text = turn.text.clone().unwrap_or_default();
let _ = events.send(QueryEvent::Done(text));
messages.push(Message::Assistant(turn));
return;
}
StopReason::ToolUse => {
// Emit ToolStart events before execution
for call in &turn.tool_calls {
if let Some(t) = self.tools.get(&call.name) {
let _ = events.send(QueryEvent::ToolStart {
name: call.name.clone(),
summary: t.summary(&call.arguments),
});
}
}
let results = self.execute_tools(&turn.tool_calls).await;
// Emit ToolEnd events with truncated previews
for (id, result) in &results {
let call_name = turn
.tool_calls
.iter()
.find(|c| c.id == *id)
.map(|c| c.name.clone())
.unwrap_or_default();
let _ = events.send(QueryEvent::ToolEnd {
name: call_name,
result: if result.content.len() > 200 {
format!("{}...", &result.content[..200])
} else {
result.content.clone()
},
});
}
messages.push(Message::Assistant(turn));
for (id, result) in results {
messages.push(Message::tool_result(id, result.content));
}
}
}
turns += 1;
}
}
}
The key differences from chat():
- Provider errors are caught with
matchinstead of?, and sent asQueryEvent::Error. - Max turns sends an error event instead of returning
Err. - ToolStart events fire before
execute_tools– the UI shows the spinner before the tool runs. - ToolEnd events fire after execution with a truncated result preview (200 chars max).
- Done event fires before pushing the final assistant message, so the UI gets the text immediately.
Note the let _ = events.send(...) pattern. The send can fail if the receiver has been dropped (the UI task crashed or exited early). We ignore the error because the engine should finish its work regardless of whether anyone is watching.
Using events in practice
The caller creates an unbounded channel, passes the sender to the engine, and reads events from the receiver – typically in a separate task:
#![allow(unused)]
fn main() {
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();
let engine_handle = tokio::spawn(async move {
engine.run_with_events("Fix the bug in main.rs", tx).await
});
while let Some(event) = rx.recv().await {
match event {
QueryEvent::ToolStart { summary, .. } => println!(" -> {summary}"),
QueryEvent::Done(text) => { println!("{text}"); break; }
QueryEvent::Error(e) => { eprintln!("Error: {e}"); break; }
_ => {}
}
}
}
This two-task pattern is what Chapter 26 (TUI) builds on. The UI task renders events; the engine task runs the loop. They communicate through the channel.
Tool result truncation in depth
Truncation deserves special attention because getting it wrong breaks the agent. When a tool returns a huge result – say, bash("find / -name '*.rs'") producing 500KB of output – the engine keeps the first max_result_chars characters and appends "... [truncated, 523847 chars total]". This gives the model:
- Enough data to understand the result (100K characters is usually more than sufficient)
- A signal that data was lost, so it can request a more targeted query if needed
- The total size, so it can reason about how much it missed
The is_truncated flag on ToolResult is carried for downstream consumers (session logging, cost tracking) but does not affect the agent loop itself. Claude Code does something similar but more sophisticated – separate limits per tool type and token counting instead of character counting. Our character-based approach is simpler and good enough.
Error handling philosophy
The engine has two distinct error strategies, and the boundary between them is intentional.
Tool errors become results
When a tool fails – validation error, execution error, unknown tool – the error becomes a ToolResult::error(...) that the model sees as a normal tool result. The loop continues. The model reads the error and adapts.
Tool error flow:
LLM requests bash("rm -rf /")
-> Tool validates -> ValidationResult::Error("prohibited command")
-> ToolResult::error("prohibited command")
-> Pushed as Message::ToolResult
-> LLM sees error, tries different approach
This is essential for robust agents. Models make mistakes. Tools fail for legitimate reasons. The agent should recover, not crash.
Provider errors propagate
When the provider fails – network timeout, authentication error, rate limit, malformed response – the error propagates up via ? (in chat()) or via QueryEvent::Error (in chat_with_events()). The loop stops.
Provider error flow:
Engine calls provider.chat()
-> Provider returns Err(network timeout)
-> chat() returns Err(network timeout)
-> Caller handles it (retry, show error, etc.)
Provider errors are not the agent’s problem. They need human or system-level intervention (check your API key, wait for rate limits, fix your network). The engine does not try to recover.
The max turns safety guard
The max turns check is simple but its placement matters:
#![allow(unused)]
fn main() {
loop {
if turns >= self.config.max_turns {
anyhow::bail!("exceeded max turns ({})", self.config.max_turns);
}
let turn = self.provider.chat(messages, &defs).await?;
// ... handle turn ...
turns += 1;
}
}
A few things to notice:
- Checked before the provider call, not after. We do not waste an API call if we are already at the limit.
- Incremented after the full iteration (tool execution + message pushing), not before. The first iteration runs with
turns == 0. - Only tool-use turns count. If the model responds with
StopReason::Stop, we return before hittingturns += 1. A direct text response does not count as a “turn” because it does not add to the looping risk.
This means max_turns: 3 allows exactly 3 tool-use iterations. If the model has not finished after 3 rounds of tool use, it is probably stuck.
Message history management
The order in which messages are pushed into the history is load-bearing. After a tool-use turn:
#![allow(unused)]
fn main() {
StopReason::ToolUse => {
let results = self.execute_tools(&turn.tool_calls).await;
messages.push(Message::Assistant(turn)); // 1. Assistant message (with tool_calls)
for (id, result) in results {
messages.push(Message::tool_result(id, result.content)); // 2. Tool results
}
}
}
The resulting message sequence looks like:
[User] "What files are in src/?"
[Assistant] tool_calls: [bash("ls src/")] <- includes the tool call
[ToolResult] "main.rs\nlib.rs\n" <- linked by call ID
[Assistant] "There are two files: ..." <- next LLM response
Why this order?
-
API requirement: The Claude API (and OpenAI-compatible APIs) require that
tool_resultmessages immediately follow theassistantmessage that generated the correspondingtool_use. Violating this causes a 400 error. -
ID linking: Each
ToolResulthas atool_use_idthat matches aToolCall.idin the preceding assistant message. The LLM uses this to associate results with requests when there are multiple parallel tool calls. -
Context for the next turn: The LLM needs to see its own tool calls to understand what it asked for, and the results to know what happened. Both must be present in the history for the next
provider.chat()call.
Putting it all together: a complete trace
Let’s trace through a realistic scenario. The user asks: “What is 2 + 3?”
The engine has an AddTool registered. The mock provider is configured to return a tool call first, then a final answer.
Turn 0:
messages: [User("What is 2 + 3?")]
-> provider.chat() returns: ToolUse, tool_calls: [add(a=2, b=3)]
-> execute_tools: AddTool.call({a:2, b:3}) -> Ok("5")
-> push: Assistant(tool_calls: [add(a=2, b=3)])
-> push: ToolResult(id=call_1, content="5")
-> turns = 1
Turn 1:
messages: [User, Assistant, ToolResult]
-> provider.chat() returns: Stop, text: "The sum is 5"
-> push: Assistant(text: "The sum is 5")
-> return Ok("The sum is 5")
Two provider calls, one tool execution, clean exit. The final message history has 4 entries: User, Assistant (with tool call), ToolResult, Assistant (with text).
How this compares to Claude Code
Our QueryEngine is a teaching implementation. Claude Code’s real engine is considerably more complex. Here is what it adds:
| Feature | Our engine | Claude Code |
|---|---|---|
| Core loop | loop { match stop_reason } | Same pattern, but with async hooks at every stage |
| Streaming | Separate chat_with_events | Integrated SSE streaming with StreamProvider |
| Permissions | None | Full permission pipeline checked before every tool call |
| Compaction | None | Auto-compacts when approaching token limit |
| Hooks | None | Pre/post tool hooks with shell command execution |
| Concurrency | Sequential tool execution | Parallel execution for is_concurrent_safe tools |
| Error recovery | Tool errors as results | Same, plus retry logic for transient provider errors |
| Subagents | None | Spawns child QueryEngines with isolated history |
| Cost tracking | None | Accumulates TokenUsage from every turn |
The good news: the architecture is the same. Every feature in the right column plugs into the same loop structure. Permissions are checked in execute_tools before calling t.call(). Compaction runs at the top of the loop when token count is high. Hooks fire around tool execution. You will build all of these in later chapters, and they will slot into the engine you are building now.
Tests
Run the chapter 4 tests to verify your implementation:
cargo test -p claw-code test_ch4
Here is what each test covers:
test_ch4_direct_text_response – The simplest case. The provider returns StopReason::Stop on the first call. The engine should return the text without executing any tools.
test_ch4_single_tool_call – The provider returns a tool call, then a final text. Tests the full loop: call tool, push results, call provider again, return.
test_ch4_multi_step_loop – Two rounds of tool use before the final answer. Verifies the loop handles multiple iterations correctly.
test_ch4_unknown_tool – The LLM requests a tool that does not exist. The engine should return an error result (not crash), and the model should see it and respond.
test_ch4_max_turns – A provider that always returns ToolUse. The engine should bail with a “max turns” error after hitting the configured limit.
test_ch4_chat_preserves_history – Calls chat() twice on the same message vec. Verifies that history accumulates correctly across calls, enabling multi-turn conversations.
test_ch4_events_emitted – Uses run_with_events with a channel. Checks that ToolStart, ToolEnd, and Done events are emitted in the correct order.
test_ch4_result_truncation – A tool that returns 200 characters of output with max_result_chars: 50. Verifies that the result is sliced and the truncation marker is applied.
test_ch4_provider_error – An empty MockProvider with no responses. The first provider.chat() call fails. The engine should propagate the error.
Implementation checklist
Here is what you need to implement in the starter crate:
-
QueryConfig– The struct withmax_turnsandmax_result_chars, plus theDefaultimpl. -
QueryEvent– The five-variant enum. -
QueryEngine– The struct and builder methods (new,config,tool,tools). -
execute_tools– Look up each tool, validate, execute, truncate, collect. ReturnVec<(String, ToolResult)>. -
chat– The core loop. Check max turns, call provider, match stop reason, dispatch tools, push messages, loop. -
run– Create messages, callchat. -
chat_with_events– Same loop aschatbut emit events. Handle errors as events instead of?. -
run_with_events– Create messages, callchat_with_events, return messages.
Start with QueryConfig, QueryEngine::new, and the builder methods. Then implement execute_tools – you can test it implicitly through run. Then chat, then run. Save the event methods for last.
What you have now
After this chapter, you have a working coding agent. Not a complete one – it has no tools yet (those come in Part II), no permissions (Part III), no context management (Part V) – but the core loop is done. You can register any tool that implements the Tool trait, point it at any provider that implements Provider, and the engine will autonomously loop until it has an answer.
This is the skeleton that everything else hangs on. The permission engine (Chapter 10) adds a check inside execute_tools. The hook system (Chapter 12) wraps tool calls with pre/post events. Context compaction (Chapter 18) triggers at the top of the loop. Session management (Chapter 19) serializes the message vec. Every feature in this book is a modification to the loop you just built.
Next up: Chapter 5 builds the system prompt – the instructions that tell the LLM how to behave as a coding agent.