ReAct pattern

Research agent — sandboxed Rhai scripts as LLM tools

An LLM-driven research agent that writes and executes Rhai scripts in a deterministic sandbox, so the model can use code without being handed the keys to the kingdom.

rust

Highlights

The LLM writes Rhai (an embeddable scripting language for Rust) to compute, transform, and plot.
Sandboxed: no filesystem, no network, no subprocess; only the functions you whitelist.
Deterministic: the same script and inputs give the same output, so results are cacheable (as long as every whitelisted function is pure).
Demonstrates the safer alternative to letting an LLM call Python eval() or exec().
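
The cacheability point can be sketched as a small memoizer. This is a minimal sketch and not part of the example's source: ScriptCache and its fields are hypothetical names, and the evaluator is passed in as a closure so the block needs only the standard library. It assumes every function exposed to scripts is pure; a whitelisted clock or random source would make cached results stale.

```rust
use std::collections::HashMap;

/// Hypothetical memoizer: caches the output of a deterministic script
/// evaluator, keyed by the exact script text.
struct ScriptCache<F>
where
    F: Fn(&str) -> Result<String, String>,
{
    eval: F,
    entries: HashMap<String, String>,
    hits: u64,
}

impl<F> ScriptCache<F>
where
    F: Fn(&str) -> Result<String, String>,
{
    fn new(eval: F) -> Self {
        ScriptCache { eval, entries: HashMap::new(), hits: 0 }
    }

    /// Returns the cached output if this script was seen before;
    /// otherwise evaluates it once and stores the result.
    fn run(&mut self, script: &str) -> Result<String, String> {
        if let Some(cached) = self.entries.get(script) {
            self.hits += 1;
            return Ok(cached.clone());
        }
        // Errors are returned but not cached, so a failing script is retried.
        let output = (self.eval)(script)?;
        self.entries.insert(script.to_owned(), output.clone());
        Ok(output)
    }
}
```

In the agent, the closure would wrap the sandboxed evaluator, so a script the model emits twice costs one evaluation.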

What this shows

LLMs love writing code. The unsafe pattern is exec(llm_output) in Python: you have just given the model arbitrary code execution on your machine. The safe pattern is to give it a sandboxed scripting language with a whitelisted standard library. This example uses Rhai (an embeddable scripting language for Rust) wrapped as an Agentmatic tool.

Architecture

   Agent (ReAct)
      ▼
   tool: rhai_eval(script: String) -> Value
      ▼
   Rhai engine with whitelisted std (no fs, no net, no exec)
      ▼
   typed return value back to the model

Key snippet

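use agentmatic::prelude::*;
use rhai::module_resolvers::DummyModuleResolver;
use rhai::{Dynamic, Engine};

#[tool]
fn rhai_eval(script: String) -> anyhow::Result<String> {
    let mut engine = Engine::new();
    // Rhai ships no file, network, or process functions. The one filesystem
    // path is `import`, so swap the default file-based resolver for a dummy.
    engine.set_module_resolver(DummyModuleResolver::new());
    // Bound the work a script can do so runaway loops or recursion terminate.
    engine.set_max_operations(100_000);
    engine.set_max_call_levels(8);

    // Rhai errors are not Send + Sync without the `sync` feature, so `?`
    // cannot convert them into anyhow::Error directly; stringify them.
    let result: Dynamic = engine
        .eval(&script)
        .map_err(|e| anyhow::anyhow!("Rhai error: {e}"))?;
    Ok(result.to_string())
}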

let agent = Agent::builder("researcher")
    .llm(OpenAI::from_env()?)
    .tools(vec![rhai_eval_tool()])
    .build()?;

Why not Python eval

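eval(str) in Python is arbitrary code execution. Even with globals={} you can escape: Python re-inserts __builtins__ into a globals dict that lacks it, and object introspection such as ().__class__.__base__.__subclasses__() reaches most of the runtime.
Moving the script into a child process with subprocess.run does not help by itself: the child still has filesystem and network access unless you add OS-level isolation.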
A sandboxed scripting language has a hard, auditable surface area. You decide what’s callable.

Use cases

Math, statistics, transformations on data the model needs to manipulate.
Plotting (whitelist a chart() builder that returns a serialized chart spec).
DSLs — let the model emit a config / policy / template in your DSL.
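
The plotting use case can be sketched as a host function that validates its arguments and returns a chart description as text, leaving rendering to the caller. This is an illustrative sketch only: the chart signature, the accepted kinds, and the JSON shape are all assumptions rather than the example's actual API, and the block uses only the standard library. In the agent, a function like this would be one of the whitelisted entries exposed to scripts.

```rust
/// Hypothetical chart builder: returns a serialized chart spec (JSON text)
/// and never draws or writes anything itself.
fn chart(kind: &str, xs: &[f64], ys: &[f64]) -> Result<String, String> {
    // Accept only a fixed set of chart kinds.
    match kind {
        "line" | "bar" | "scatter" => {}
        other => return Err(format!("unsupported chart kind: {}", other)),
    }
    if xs.len() != ys.len() {
        return Err(format!("length mismatch: {} xs vs {} ys", xs.len(), ys.len()));
    }
    // NaN and infinity have no JSON representation, so reject them.
    if xs.iter().chain(ys.iter()).any(|v| !v.is_finite()) {
        return Err("non-finite value in chart data".to_string());
    }
    // Serialize each (x, y) pair as a two-element JSON array.
    let points: Vec<String> = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| format!("[{},{}]", x, y))
        .collect();
    Ok(format!(
        "{{\"kind\":\"{}\",\"points\":[{}]}}",
        kind,
        points.join(",")
    ))
}
```

Because the function returns data instead of performing I/O, it stays pure, and the determinism and caching properties described above still hold.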