Local LLMs with Ollama
VoidBox can use local models served by Ollama instead of the Anthropic API. The guest VM reaches Ollama through the platform networking bridge — no Anthropic API key required.
Prerequisites
- Install Ollama: ollama.com.
- Pull a model:
  ollama pull qwen3-coder
- Start Ollama. On macOS, bind to all interfaces so the guest VM can reach it through the VZ NAT gateway; a 127.0.0.1-only listener won’t accept that traffic:
  OLLAMA_HOST=0.0.0.0:11434 ollama serve
- Build the guest initramfs: see Running on Linux or Running on macOS for the platform-specific scripts/build_claude_rootfs.sh invocation.
How guest-to-host networking works
The guest reaches the host at a fixed gateway address that differs by backend:
- Linux/KVM: 10.0.2.2:11434 (SLIRP NAT).
- macOS/VZ: 192.168.64.1:11434 (VZ NAT).
    Guest VM                        Host
┌──────────────┐               ┌──────────────┐
│ claude-code  │──────────────>│ Ollama:11434 │
│              │               │              │
└──────────────┘               └──────────────┘
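The per-backend gateway addresses above can be captured in a small lookup. This is only an illustrative sketch: `default_ollama_gateway` is a hypothetical helper, not part of the VoidBox API, and it assumes the host OS name is enough to tell the KVM backend from the VZ backend.

```rust
/// Map a host OS name (as reported by `std::env::consts::OS`) to the
/// address at which the guest reaches Ollama on the host.
/// Hypothetical helper for illustration; not a VoidBox function.
fn default_ollama_gateway(host_os: &str) -> Option<&'static str> {
    match host_os {
        // Linux/KVM backend: SLIRP NAT gateway.
        "linux" => Some("10.0.2.2:11434"),
        // macOS/VZ backend: VZ NAT gateway.
        "macos" => Some("192.168.64.1:11434"),
        // No documented backend for other platforms.
        _ => None,
    }
}
```

Calling `default_ollama_gateway(std::env::consts::OS)` on the host would return the address for the current platform, or `None` where no backend is documented.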
For Ollama-backed runs, VoidBox injects three environment variables into the guest so claude-code can talk to the local endpoint as if it were an Anthropic-compatible proxy:
- ANTHROPIC_BASE_URL: set to the host Ollama endpoint.
- ANTHROPIC_API_KEY="": required but ignored by Ollama.
- ANTHROPIC_AUTH_TOKEN=ollama: required placeholder.
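As a rough sketch of what that injection amounts to, the three variables can be built from a gateway address. The `ollama_guest_env` helper is hypothetical, and the `http://` scheme is an assumption based on Ollama serving plain HTTP by default; the exact URL VoidBox constructs may differ.

```rust
/// Build the environment that lets claude-code treat Ollama as an
/// Anthropic-compatible endpoint.
/// Hypothetical helper; VoidBox's real injection code may differ.
fn ollama_guest_env(gateway: &str) -> Vec<(String, String)> {
    vec![
        // Assumed URL shape: plain HTTP to the host gateway address.
        ("ANTHROPIC_BASE_URL".to_string(), format!("http://{gateway}")),
        // Must be present, but Ollama ignores its value.
        ("ANTHROPIC_API_KEY".to_string(), String::new()),
        // Required placeholder token.
        ("ANTHROPIC_AUTH_TOKEN".to_string(), "ollama".to_string()),
    ]
}
```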
Code example
LlmProvider::ollama(model) automatically selects the correct per-platform gateway address. You only need ollama_with_host if you want a non-default host or port.
use void_box::agent_box::VoidBox;
use void_box::llm::LlmProvider;
use void_box::skill::Skill;

// Fall back to qwen3-coder when OLLAMA_MODEL is unset.
let model = std::env::var("OLLAMA_MODEL")
    .unwrap_or_else(|_| "qwen3-coder".into());

let agent = VoidBox::new("ollama_demo")
    .llm(LlmProvider::ollama(&model))
    .skill(Skill::agent("claude-code"))
    .memory_mb(2048) // guest VM memory, not Ollama's
    .prompt("Write a Python script that prints the first 10 Fibonacci numbers.")
    .build()?;

// `?` and `.await` assume an enclosing async fn that returns a Result.
let result = agent.run(None, None).await?;
memory_mb sizes the guest VM, not Ollama. Ollama runs on the host and uses host memory; the guest still needs ~2 GB for claude-code plus its Node/bun runtime. Don’t set memory_mb below 1024.
Running the example
OLLAMA_MODEL=qwen3-coder \
VOID_BOX_KERNEL=/boot/vmlinuz-$(uname -r) \
VOID_BOX_INITRAMFS=/tmp/void-box-rootfs.cpio.gz \
cargo run --example ollama_local
Environment variables
| Env var | Purpose |
|---|---|
| OLLAMA_MODEL | Ollama model name (e.g. qwen3-coder, phi4-mini). |
| VOID_BOX_KERNEL | Path to the host kernel image for KVM. |
| VOID_BOX_INITRAMFS | Path to the guest initramfs built by scripts/build_claude_rootfs.sh. |
Without VOID_BOX_KERNEL set, the example falls back to mock mode (no real VM).
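The fallback can be pictured as a simple check on the environment. The sketch below is an assumption about the shape of that logic, not the example's actual code: `RunMode` and `select_run_mode` are hypothetical names, and the lookup is passed in as a closure so the decision can be exercised without touching the real process environment.

```rust
use std::path::PathBuf;

/// How the example would run. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum RunMode {
    /// No kernel configured: no real VM is started.
    Mock,
    /// Kernel configured: boot a real guest VM.
    Vm { kernel: PathBuf, initramfs: Option<PathBuf> },
}

/// Choose the run mode from an environment lookup function.
fn select_run_mode(lookup: impl Fn(&str) -> Option<String>) -> RunMode {
    match lookup("VOID_BOX_KERNEL") {
        Some(kernel) => RunMode::Vm {
            kernel: PathBuf::from(kernel),
            initramfs: lookup("VOID_BOX_INITRAMFS").map(PathBuf::from),
        },
        // VOID_BOX_KERNEL unset: fall back to mock mode.
        None => RunMode::Mock,
    }
}
```

Against the real environment this would be called as `select_run_mode(|key| std::env::var(key).ok())`.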
For YAML/spec-driven runs, VOIDBOX_LLM_PROVIDER, VOIDBOX_LLM_MODEL, and VOIDBOX_LLM_BASE_URL override the spec — see YAML Specs. Those env vars do not apply to hand-built VoidBox instances from Rust.
Model and context tuning
- Model choice is a footprint-vs-quality tradeoff. phi4-mini (~4 GB) boots fast and handles simple prompts; qwen3-coder is larger but much stronger on code-heavy agent workloads. Pick based on your host RAM and iteration time, not the guest’s memory_mb.
- Context length defaults to 4K in Ollama, which is often too short for long-running agent loops. Raise it globally with OLLAMA_CONTEXT_LENGTH on the host, or per-model via a Modelfile PARAMETER num_ctx directive. This is an Ollama-side concern; VoidBox doesn’t touch it.
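For the per-model route, a Modelfile needs only a FROM line and a PARAMETER num_ctx line. The sketch below renders one as a string; `render_modelfile` is a hypothetical helper, and any context size passed to it (for example 16384) is an illustrative value rather than a recommendation from this guide.

```rust
/// Render a minimal Ollama Modelfile that derives from `base_model`
/// and raises the context window to `num_ctx` tokens.
/// Hypothetical helper for illustration.
fn render_modelfile(base_model: &str, num_ctx: u32) -> String {
    format!("FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n")
}
```

Writing the output of `render_modelfile("qwen3-coder", 16384)` to a file named Modelfile and running `ollama create qwen3-coder-16k -f Modelfile` on the host would register a derived model (the name qwen3-coder-16k is made up here), which could then be passed as OLLAMA_MODEL.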
Next
See Pipeline Composition to chain Ollama-backed boxes, or define specs with YAML using llm.provider: ollama.