
Chapter 11: Search Tools

File(s) to edit: none -- this is an extension chapter with no stubs in the starter. Tests: no tests in the starter; GlobTool and GrepTool are extension tools. Estimated time: 25 min (read-only)

Goal

  • Understand why file discovery (GlobTool) and content search (GrepTool) are separate tools with distinct parameter schemas.
  • Implement GlobTool so the agent can find files by name pattern using the glob crate.
  • Implement GrepTool with recursive directory walking, regex matching, and an optional file type filter.
  • Learn when to use async vs sync helper functions in tool implementations (I/O-bound file reads vs fast directory walking).

A coding agent that can only read files it already knows about is like a developer who never uses find or grep. You can hand it a specific file path and it will read it faithfully, but drop it into an unfamiliar codebase and it is blind. It cannot discover which files exist, cannot search for where a function is defined, cannot find all the places a type is used. Without search, the LLM has to guess file paths -- and it will guess wrong.

Search tools fix this. In this chapter we explore two: GlobTool finds files by name pattern, and GrepTool searches file contents by regex. Together they give the LLM the ability to navigate any codebase, no matter how large or unfamiliar. These are the eyes of the agent.

How the search tools fit into the agent workflow

flowchart TD
    LLM[LLM decides what to do]
    LLM -->|"What files exist?"| Glob[GlobTool]
    LLM -->|"Where is this defined?"| Grep[GrepTool]
    Glob -->|returns file paths| LLM
    Grep -->|"returns path:line: content"| LLM
    LLM -->|reads specific file| Read[ReadTool]
    LLM -->|modifies file| Edit[EditTool]
    Read -->|file contents| LLM
    Edit -->|confirmation| LLM

Note: Search tools are extensions in this book -- neither the starter (mini-claw-code-starter) nor the reference implementation (mini-claw-code) ships a GlobTool or GrepTool. If you want to add them, you will create src/tools/glob.rs and src/tools/grep.rs from scratch and register them in src/tools/mod.rs. The complete reference code for both tools is reproduced inline below -- treat this chapter as an annotated implementation walkthrough rather than a stub-filling exercise.

Two tools, two questions

The split between glob and grep maps to two distinct questions the LLM asks when exploring code:

  1. "What files exist?" -- GlobTool. The LLM knows it wants Rust files, or test files, or config files. It does not know their exact paths. A glob pattern like **/*.rs or tests/*.toml answers this.

  2. "Where is this thing defined?" -- GrepTool. The LLM knows a function name, a type, an error message. It needs to find which file and which line contain it. A regex pattern like fn parse_sse_line or struct QueryConfig answers this.
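Concretely, the two questions become two different tool calls. A hypothetical pair of invocations (the name/input envelope here is illustrative; the actual parameter schemas are defined later in this chapter):

```json
[
  {"name": "glob", "input": {"pattern": "tests/**/*.rs"}},
  {"name": "grep", "input": {"pattern": "fn parse_sse_line", "include": "*.rs"}}
]
```

The first returns a list of file paths; the second returns path:line_no: content matches.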

Claude Code has both as separate tools for exactly this reason. They serve different purposes, take different inputs, and the LLM chooses between them based on what it knows. Merging them into one tool would muddy the interface -- the LLM would have to figure out whether it is doing a name search or a content search, and the parameter schema would be awkward.


GlobTool

GlobTool is the simpler of the two. It takes a glob pattern, optionally scoped to a base directory, and returns all matching file paths.

File layout

The implementation lives at src/tools/glob.rs. Here is the complete code:

use async_trait::async_trait;
use serde_json::Value;

use crate::types::*;

pub struct GlobTool {
    definition: ToolDefinition,
}

impl GlobTool {
    pub fn new() -> Self {
        Self {
            definition: ToolDefinition::new("glob", "Find files matching a glob pattern")
                .param("pattern", "string", "Glob pattern (e.g. \"**/*.rs\")", true)
                .param(
                    "path",
                    "string",
                    "Base directory to search in (default: current directory)",
                    false,
                ),
        }
    }
}

impl Default for GlobTool {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl Tool for GlobTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let pattern = args["pattern"]
            .as_str()
            .ok_or_else(|| anyhow::anyhow!("missing 'pattern' argument"))?;

        let base = args
            .get("path")
            .and_then(|v| v.as_str())
            .unwrap_or(".");

        let full_pattern = if pattern.starts_with('/') || pattern.starts_with('.') {
            pattern.to_string()
        } else {
            format!("{base}/{pattern}")
        };

        let entries: Vec<String> = glob::glob(&full_pattern)
            .map_err(|e| anyhow::anyhow!("invalid glob pattern: {e}"))?
            .filter_map(|entry| entry.ok())
            .map(|p| p.display().to_string())
            .collect();

        if entries.is_empty() {
            Ok("no files matched".to_string())
        } else {
            Ok(entries.join("\n"))
        }
    }
}

Walking through the implementation

The definition. Two parameters: pattern (required) and path (optional). The pattern is a standard glob -- *.rs for Rust files in the current directory, **/*.rs for Rust files recursively, src/**/*.toml for TOML files under src/. The path sets the base directory; it defaults to "." (the current working directory) when omitted.

Pattern construction. The call method builds the full glob pattern from the base directory and the user-supplied pattern. If the pattern already starts with / or ., it is treated as an absolute or relative path and used directly. Otherwise, the base directory is prepended: format!("{base}/{pattern}"). This means calling with {"pattern": "*.rs", "path": "/home/user/project"} produces the glob /home/user/project/*.rs.
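The branching is easy to see in isolation. A minimal sketch, pulling the logic out into a free function (the name full_pattern is ours; in the tool it lives inline in call()):

```rust
// Sketch of GlobTool's pattern construction, extracted as a pure function.
fn full_pattern(pattern: &str, base: &str) -> String {
    // Absolute ("/...") or explicitly relative ("./...") patterns are used as-is;
    // everything else is scoped under the base directory.
    if pattern.starts_with('/') || pattern.starts_with('.') {
        pattern.to_string()
    } else {
        format!("{base}/{pattern}")
    }
}

fn main() {
    assert_eq!(full_pattern("*.rs", "/home/user/project"), "/home/user/project/*.rs");
    assert_eq!(full_pattern("./src/*.rs", "/ignored"), "./src/*.rs");
    assert_eq!(full_pattern("/abs/**/*.toml", "/ignored"), "/abs/**/*.toml");
}
```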

The glob crate. We use the glob crate (already in Cargo.toml) to do the actual matching. glob::glob() returns an iterator of Result<PathBuf> entries. We filter_map with entry.ok() to silently skip any paths that fail (permission errors, broken symlinks). The remaining paths are converted to display strings and collected.

Output format. Matching paths are joined with newlines -- one path per line. If nothing matches, we return "no files matched" rather than an empty string. This matters for the LLM: an explicit "no files matched" message tells it the pattern was valid but found nothing, prompting it to try a different pattern. An empty string would be ambiguous.


GrepTool

GrepTool is more complex. It searches file contents using regex, optionally scoped to a directory and filtered by file type. The output follows the classic grep format: path:line_no: content.

The complete implementation

Here is src/tools/grep.rs:

use std::path::Path;

use async_trait::async_trait;
use serde_json::Value;

use crate::types::*;

pub struct GrepTool {
    definition: ToolDefinition,
}

impl GrepTool {
    pub fn new() -> Self {
        Self {
            definition: ToolDefinition::new("grep", "Search file contents using a regex pattern")
                .param("pattern", "string", "Regex pattern to search for", true)
                .param(
                    "path",
                    "string",
                    "File or directory to search in (default: current directory)",
                    false,
                )
                .param(
                    "include",
                    "string",
                    "Glob pattern to filter files (e.g. \"*.rs\")",
                    false,
                ),
        }
    }
}

impl Default for GrepTool {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl Tool for GrepTool {
    fn definition(&self) -> &ToolDefinition {
        &self.definition
    }

    async fn call(&self, args: Value) -> anyhow::Result<String> {
        let pattern = args["pattern"]
            .as_str()
            .ok_or_else(|| anyhow::anyhow!("missing 'pattern' argument"))?;

        let re = regex::Regex::new(pattern)
            .map_err(|e| anyhow::anyhow!("invalid regex pattern: {e}"))?;

        let search_path = args
            .get("path")
            .and_then(|v| v.as_str())
            .unwrap_or(".");

        let include_pattern = args.get("include").and_then(|v| v.as_str());
        let include_glob = include_pattern
            .map(|p| glob::Pattern::new(p))
            .transpose()
            .map_err(|e| anyhow::anyhow!("invalid include pattern: {e}"))?;

        let path = Path::new(search_path);
        let mut matches = Vec::new();

        if path.is_file() {
            search_file(&re, path, &mut matches).await;
        } else if path.is_dir() {
            let mut entries = Vec::new();
            collect_files(path, &include_glob, &mut entries);
            entries.sort();
            for file_path in entries {
                search_file(&re, &file_path, &mut matches).await;
            }
        } else {
            anyhow::bail!("path does not exist: {search_path}");
        }

        if matches.is_empty() {
            Ok("no matches found".to_string())
        } else {
            Ok(matches.join("\n"))
        }
    }
}

/// Search a single file for regex matches and append formatted results.
async fn search_file(re: &regex::Regex, path: &Path, matches: &mut Vec<String>) {
    let Ok(content) = tokio::fs::read_to_string(path).await else {
        return; // Skip binary/unreadable files
    };
    let display = path.display();
    for (line_no, line) in content.lines().enumerate() {
        if re.is_match(line) {
            matches.push(format!("{display}:{}: {line}", line_no + 1));
        }
    }
}

/// Recursively collect files from a directory, optionally filtering by glob.
fn collect_files(
    dir: &Path,
    include: &Option<glob::Pattern>,
    out: &mut Vec<std::path::PathBuf>,
) {
    let Ok(entries) = std::fs::read_dir(dir) else {
        return;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            // Skip hidden directories
            if path
                .file_name()
                .is_some_and(|n| n.to_string_lossy().starts_with('.'))
            {
                continue;
            }
            collect_files(&path, include, out);
        } else if path.is_file() {
            if let Some(glob) = include {
                let name = path
                    .file_name()
                    .map(|n| n.to_string_lossy().to_string())
                    .unwrap_or_default();
                if !glob.matches(&name) {
                    continue;
                }
            }
            out.push(path);
        }
    }
}

Walking through the implementation

There is more going on here, so let's take it piece by piece.

The definition. Three parameters: pattern (required regex), path (optional file or directory), and include (optional glob filter for file names). The LLM might call it as {"pattern": "fn main"} to search the current directory, or {"pattern": "TODO", "path": "src/", "include": "*.rs"} to search only Rust files under src/.

Regex compilation. The pattern is compiled into a regex::Regex upfront. If the LLM provides an invalid regex (missing closing bracket, bad escape), we return an error immediately rather than crashing partway through the search. The regex crate handles the full Rust regex syntax -- character classes, quantifiers, alternation, captures.

The include filter. The include parameter is a glob pattern, not a regex. We compile it into a glob::Pattern using the same glob crate that powers GlobTool.

Rust concept: Option::transpose

The .transpose() call converts Option<Result<T>> into Result<Option<T>>. This is a common Rust idiom when you have an optional operation that might fail. Without transpose, you would need a match or if let to handle the Some(Ok(...)), Some(Err(...)), and None cases separately. With it, you can use ? to propagate the error and end up with a clean Option<T>. The pattern x.map(fallible_fn).transpose()? reads as: "if present, try the operation; if it fails, propagate the error; if absent, produce None."
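A minimal standalone illustration of the idiom, using u16 port parsing as the fallible operation (the function parse_port is ours, for demonstration only):

```rust
// Fallible operation: parsing a port number can fail.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse()
}

fn main() {
    // Some(Ok(8080)) -> Ok(Some(8080))
    let present: Option<&str> = Some("8080");
    let parsed: Result<Option<u16>, _> = present.map(parse_port).transpose();
    assert_eq!(parsed.unwrap(), Some(8080));

    // None -> Ok(None): absence is not an error.
    let absent: Option<&str> = None;
    let parsed: Result<Option<u16>, _> = absent.map(parse_port).transpose();
    assert_eq!(parsed.unwrap(), None);

    // Some(Err(_)) -> Err(_): a present-but-invalid value propagates the error.
    let bad: Option<&str> = Some("not a number");
    assert!(bad.map(parse_port).transpose().is_err());
}
```

In GrepTool this is exactly the shape of the include parameter: absent means "no filter", present-but-invalid means "report the bad glob to the LLM".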

Three-way path dispatch. The search path can be a file, a directory, or nonexistent:

  • File: Search just that one file. The LLM does this when it already knows which file to look in.
  • Directory: Recursively collect all files (filtered by include if provided), sort them for deterministic output, then search each one.
  • Nonexistent: Return an error via bail!. The agent loop catches this and reports it to the LLM as "error: path does not exist: /nonexistent/path", and the model can recover by trying a different path.

Output format. Each match is formatted as path:line_no: content, following the classic grep convention. Line numbers are 1-based (humans and LLMs both expect line 1 to be the first line, not line 0). When no matches are found, the tool returns "no matches found" -- again, explicit is better than empty.
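The formatting logic can be sketched in isolation. This stand-in uses a plain substring check in place of the compiled regex so it stays dependency-free, and the helper name format_matches is ours:

```rust
// Sketch of grep-style output formatting with 1-based line numbers.
fn format_matches(path: &str, content: &str, needle: &str) -> Vec<String> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(i, line)| format!("{path}:{}: {line}", i + 1)) // enumerate is 0-based
        .collect()
}

fn main() {
    let src = "fn main() {\n    println!(\"hello\");\n}\n";
    let out = format_matches("src/main.rs", src, "println");
    assert_eq!(out.len(), 1);
    // The match is on the second line, reported as line 2, not line 1.
    assert!(out[0].starts_with("src/main.rs:2:"));
}
```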


Helper function design

Rust concept: choosing async vs sync for helpers

The two helper functions -- search_file and collect_files -- are deliberately designed with different signatures. Understanding why reveals practical Rust async patterns. The decision rule is simple: if the function does I/O that could block (reading file contents), make it async. If it does fast metadata operations (listing directory entries), keep it sync. Making everything async "just in case" adds complexity -- recursive async functions require Pin<Box<dyn Future>> or the async_recursion crate -- and provides no benefit when the operation is already fast.

search_file is async

async fn search_file(re: &regex::Regex, path: &Path, matches: &mut Vec<String>) {
    let Ok(content) = tokio::fs::read_to_string(path).await else {
        return; // Skip binary/unreadable files
    };
    let display = path.display();
    for (line_no, line) in content.lines().enumerate() {
        if re.is_match(line) {
            matches.push(format!("{display}:{}: {line}", line_no + 1));
        }
    }
}

This function reads a file from disk, which is I/O. Using tokio::fs::read_to_string instead of std::fs::read_to_string keeps the async runtime free to do other work while waiting on the filesystem. In a real agent with concurrent tool execution, this matters -- a slow NFS mount or large file should not block the entire runtime.

The let Ok(content) = ... else { return; } pattern is a quiet bailout. If the file cannot be read -- it is binary, it is a symlink to a deleted file, the user lacks permissions -- we silently skip it. This is the right behavior for a search tool. The LLM asked "where does this pattern appear?" and the answer should only include files where we could actually check. Reporting an error for every unreadable file in a directory tree would drown the useful results in noise.

collect_files is sync

fn collect_files(
    dir: &Path,
    include: &Option<glob::Pattern>,
    out: &mut Vec<std::path::PathBuf>,
) {
    let Ok(entries) = std::fs::read_dir(dir) else {
        return;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            if path
                .file_name()
                .is_some_and(|n| n.to_string_lossy().starts_with('.'))
            {
                continue;
            }
            collect_files(&path, include, out);
        } else if path.is_file() {
            if let Some(glob) = include {
                let name = path
                    .file_name()
                    .map(|n| n.to_string_lossy().to_string())
                    .unwrap_or_default();
                if !glob.matches(&name) {
                    continue;
                }
            }
            out.push(path);
        }
    }
}

Directory walking is fast -- it reads metadata, not file contents. Making it async would add complexity (recursive async functions require boxing) without meaningful performance benefit. The sync std::fs::read_dir is fine here.

Three details worth noting:

Hidden directory skipping. Directories whose names start with . are skipped entirely. This excludes .git, .cargo, .vscode, and similar dot-prefixed directories that are almost never what the LLM wants to search. Without this filter, a grep through a project directory would spend most of its time scanning .git/objects -- thousands of binary blob files that produce no useful matches.

The include filter. When present, the glob pattern is matched against the file name only (not the full path). This means "*.rs" matches src/main.rs by checking just main.rs against the pattern. This is intuitive -- when the LLM says "search only Rust files," it means files ending in .rs, regardless of where they live in the tree.

The sort. After collecting all files, the caller sorts them before searching. This ensures deterministic output order. Without sorting, read_dir returns entries in filesystem order, which varies across operating systems and even across runs on the same system. Deterministic output makes tests reliable and makes the LLM's experience consistent.
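The dot-prefix check can be extracted as a standalone predicate (the name is_hidden is ours). Note that it inspects only the final path component, which is why collect_files must prune hidden directories during the walk, before descending into them:

```rust
use std::path::Path;

// Is the final path component dot-prefixed?
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .is_some_and(|n| n.to_string_lossy().starts_with('.'))
}

fn main() {
    assert!(is_hidden(Path::new("repo/.git")));
    assert!(!is_hidden(Path::new("repo/src")));
    // Only the last component is checked: a child of a hidden directory is
    // not itself flagged, so the walk must skip .git before recursing.
    assert!(!is_hidden(Path::new("repo/.git/objects")));
}
```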


Why two separate tools

You might wonder: why not one SearchTool with a mode parameter? The answer comes down to how LLMs make decisions.

When the LLM sees two separate tools in its schema -- one called glob described as "find files matching a pattern" and one called grep described as "search file contents using regex" -- it can instantly match its intent to the right tool. "I need to find all test files" maps to glob. "I need to find where parse_sse_line is defined" maps to grep.

A combined tool with a mode: "files" | "content" parameter adds a decision layer. The LLM has to read the schema more carefully, understand the mode field, and get it right. With smaller models, this extra indirection leads to mistakes -- calling the tool in the wrong mode, or omitting the mode parameter entirely.

Claude Code keeps them separate. So do we.

There is also a practical reason: the parameter sets are different. Glob takes a glob pattern and a base path. Grep takes a regex pattern, a path, and an include filter. Merging them would mean the LLM always sees parameters that are irrelevant to what it is doing, which wastes context tokens and increases the chance of confusion.


How Claude Code does it

Our implementations are the essential protocol -- they capture the core behavior in under 200 lines. Claude Code's production versions are considerably more sophisticated.

Claude Code's Glob uses ripgrep internally for speed. On large codebases with hundreds of thousands of files, the glob crate's pure-Rust implementation can be slow. Ripgrep's directory walker is optimized for this use case, respecting .gitignore rules and parallelizing the walk. Claude Code's Glob also supports sorting results by modification time (most recently changed files first, which is often what the LLM wants) and limits the number of results to avoid flooding the context window.

Claude Code's Grep is equally enhanced. It supports context lines (-A, -B, -C flags) to show surrounding code, which helps the LLM understand matches without making a separate read call. It offers multiple output modes: show matching lines (default), show only file paths (for counting), or show match counts per file. File type filtering uses ripgrep's built-in type system rather than a glob pattern, so --type rust knows about .rs files, Cargo.toml, and build.rs without the user spelling out the glob.

Our versions skip all of this. We use the glob crate instead of ripgrep, we have no context lines, no output modes, no result limits. What we do have is the correct protocol: the LLM sends a pattern and gets back matching results in a format it can parse. Everything else is optimization. If you want to upgrade later, the Tool trait interface stays the same -- only the internals of call() change.


Tests

Since GlobTool and GrepTool are extensions, neither the starter nor the reference implementation ships tests for them. The assertions below describe the test cases you would add alongside the tool code if you build these out yourself -- they are the contract the tools should satisfy. Once you have copied the tool code into mini-claw-code-starter/src/tools/ and written these tests, you can run them with:

cargo test -p mini-claw-code-starter grep

Recommended test cases:

GlobTool tests

test_grep_glob_find_files -- Creates a temp directory with a.rs, b.rs, and c.txt. Globs for *.rs. Verifies that both .rs files appear in the result and the .txt file does not.

test_grep_glob_recursive -- Creates a temp directory with top.rs at the root and sub/deep.rs in a subdirectory. Globs for **/*.rs. Verifies that both files are found, confirming recursive descent works.

test_grep_glob_no_matches -- Creates a temp directory with file.txt and globs for *.xyz. Verifies the result contains "no files matched".

test_grep_glob_definition -- Verifies the tool definition has the name "glob".

GrepTool tests

test_grep_grep_single_file -- Creates a file containing fn main() and println!("hello"). Greps for "println". Verifies the match includes the content and the correct line number (:2:).

test_grep_grep_directory -- Creates two files, both containing fn foo(). Greps the directory for "fn foo". Verifies both files appear in the results.

test_grep_grep_with_include -- Creates code.rs and data.txt, both containing "hello world". Greps with include: "*.rs". Verifies only the .rs file appears in results.

test_grep_grep_no_matches -- Creates a file and greps for a pattern that does not appear. Verifies the result contains "no matches found".

test_grep_grep_regex -- Creates a file with foo123, bar456, baz789. Greps with the regex \d{3} (three digits). Verifies all three lines match, confirming real regex support rather than plain string matching.

test_grep_grep_nonexistent_path -- Greps a path that does not exist. Verifies the result is an error.

test_grep_grep_definition -- Verifies the tool definition has the name "grep".
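As a concrete starting point, here is the shape such a test can take. Since the tool types are not available standalone, this sketch exercises only the temp-directory setup and a filesystem stand-in for the glob match; in your project you would call GlobTool::new().call(...) on the temp directory instead (and likely use the tempfile crate rather than a fixed name under std::env::temp_dir):

```rust
use std::fs;

fn main() {
    // Set up a scratch directory with two .rs files and one .txt file.
    let dir = std::env::temp_dir().join("mini_claw_glob_test");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("a.rs"), "fn a() {}").unwrap();
    fs::write(dir.join("b.rs"), "fn b() {}").unwrap();
    fs::write(dir.join("c.txt"), "not rust").unwrap();

    // Stand-in for globbing "*.rs": filter directory entries by extension.
    let mut hits: Vec<String> = fs::read_dir(&dir)
        .unwrap()
        .filter_map(|e| e.ok())
        .map(|e| e.path())
        .filter(|p| p.extension().and_then(|x| x.to_str()) == Some("rs"))
        .filter_map(|p| p.file_name().map(|n| n.to_string_lossy().into_owned()))
        .collect();
    hits.sort(); // deterministic order, as in the tool

    assert_eq!(hits, vec!["a.rs", "b.rs"]);

    fs::remove_dir_all(&dir).unwrap();
}
```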


Recap

This chapter added two search tools that let the agent discover and navigate code:

  • GlobTool finds files by name pattern. It takes a glob like **/*.rs and returns matching paths, one per line. It uses the glob crate for pattern matching and defaults to the current directory when no base path is provided.

  • GrepTool searches file contents by regex. It takes a pattern like fn main and returns matches in path:line_no: content format. It supports scoping to a file or directory and filtering by file type with the include parameter. Two helper functions split the work: search_file (async, handles I/O) and collect_files (sync, walks the directory tree).

  • Both tools are read-only. They never modify the filesystem. In a production agent with safety flags, they would be marked as read-only and concurrent-safe.

  • The separation is deliberate. Glob answers "what files exist?" Grep answers "where is this content?" Two tools with clear purposes are easier for the LLM to use correctly than one tool with a mode switch.

  • These are extensions. The starter does not include stubs for GlobTool or GrepTool. If you want to add them, create the files from scratch following the patterns shown above and register them in src/tools/mod.rs.

Key takeaway

Search tools are what turn a coding agent from a tool that edits known files into one that can explore and understand an unfamiliar codebase. The two-tool split (glob for names, grep for contents) maps directly to the two questions a developer asks when navigating code: "what files exist?" and "where is this thing?" Keeping them separate gives the LLM a clear, unambiguous interface for each question.

With search tools in place, the agent can now explore an unfamiliar codebase on its own. Given a prompt like "find and fix the bug in the parser," it can glob for source files, grep for the parser code, read the relevant files, and then use the write and edit tools from Chapter 9 to make changes. The tool suite is becoming complete.

