Local LLMs with Ollama

VoidBox can use local models served by Ollama instead of the Anthropic API. The guest VM reaches Ollama on the host through the platform networking bridge, so no Anthropic API key is required.

Prerequisites

Install Ollama: ollama.com.
Pull a model: ollama pull qwen3-coder.
Start Ollama. On macOS, bind to all interfaces so the guest VM can reach it through the VZ NAT gateway — a 127.0.0.1-only listener won’t accept that traffic:
```
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Build the guest initramfs — see Running on Linux or Running on macOS for the platform-specific scripts/build_claude_rootfs.sh invocation.
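Before booting a guest, it can help to confirm that something is listening on the Ollama port. The following is a minimal sketch using only the Rust standard library. `is_reachable` and `probe` are hypothetical helpers, not part of VoidBox, and the two-second timeout is an arbitrary choice.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper (not part of VoidBox): returns true if a TCP
/// connection to `addr` succeeds within `timeout`.
fn is_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Parses an "ip:port" string and probes it with a 2-second timeout.
/// An address that fails to parse is reported as unreachable.
fn probe(addr: &str) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => is_reachable(sock, Duration::from_secs(2)),
        Err(_) => false,
    }
}
```

Calling `probe("127.0.0.1:11434")` on the host only checks the loopback listener. A successful connect there does not prove the guest can reach the same port, so treat this as a quick sanity check rather than a full test of the bridge.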

How guest-to-host networking works

The guest reaches the host at a fixed gateway address that differs by backend:

Linux/KVM: 10.0.2.2:11434 (SLIRP NAT).
macOS/VZ: 192.168.64.1:11434 (VZ NAT).

```
 Guest VM                        Host
 ┌──────────────┐               ┌──────────────┐
 │ claude-code  │──────────────>│ Ollama:11434 │
 │              │               │              │
 └──────────────┘               └──────────────┘
```
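The per-backend gateway addresses above can be expressed as a small lookup. This is only an illustrative sketch, not VoidBox's actual implementation: the `Backend` enum, the function names, and the `http://` scheme in the resulting URL are all assumptions made for the example.

```rust
use std::net::Ipv4Addr;

/// Hypothetical enum naming the two backends described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    LinuxKvm,
    MacosVz,
}

/// Default Ollama port, as used throughout this page.
const OLLAMA_PORT: u16 = 11434;

/// Gateway address at which the guest sees the host.
fn gateway_ip(backend: Backend) -> Ipv4Addr {
    match backend {
        Backend::LinuxKvm => Ipv4Addr::new(10, 0, 2, 2),   // SLIRP NAT
        Backend::MacosVz => Ipv4Addr::new(192, 168, 64, 1), // VZ NAT
    }
}

/// Base URL the guest would use (the http scheme is an assumption).
fn ollama_base_url(backend: Backend) -> String {
    format!("http://{}:{}", gateway_ip(backend), OLLAMA_PORT)
}

/// Picks a backend from the compile-time target OS.
fn host_backend() -> Backend {
    if cfg!(target_os = "macos") {
        Backend::MacosVz
    } else {
        Backend::LinuxKvm
    }
}
```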

For Ollama-backed runs, VoidBox injects three environment variables into the guest so claude-code can talk to the local endpoint as if it were an Anthropic-compatible proxy:

ANTHROPIC_BASE_URL — set to the host Ollama endpoint.
ANTHROPIC_API_KEY="" — required but ignored by Ollama.
ANTHROPIC_AUTH_TOKEN=ollama — required placeholder.
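To make the three variables concrete, here is a minimal sketch of how such an environment could be assembled. The helper `ollama_guest_env` is hypothetical and not a VoidBox API, and the exact form of the base URL (an `http://` scheme with no path) is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds the three variables listed above for a
/// given host Ollama endpoint.
fn ollama_guest_env(base_url: &str) -> BTreeMap<&'static str, String> {
    let mut env = BTreeMap::new();
    // Points claude-code at the host Ollama endpoint.
    env.insert("ANTHROPIC_BASE_URL", base_url.to_string());
    // Must be present, but Ollama ignores its value.
    env.insert("ANTHROPIC_API_KEY", String::new());
    // Required placeholder token.
    env.insert("ANTHROPIC_AUTH_TOKEN", "ollama".to_string());
    env
}
```

A map like this can be passed directly to `std::process::Command::envs` when launching a process by hand, which is one way to reproduce the guest's view of the endpoint outside a VM.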

Code example

LlmProvider::ollama(model) automatically selects the correct per-platform gateway address. You only need ollama_with_host if you want a non-default host or port.

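```rust
use void_box::agent_box::VoidBox;
use void_box::llm::LlmProvider;
use void_box::skill::Skill;

// Model name comes from OLLAMA_MODEL, falling back to qwen3-coder.
let model = std::env::var("OLLAMA_MODEL")
    .unwrap_or_else(|_| "qwen3-coder".into());

// The statements below use `?` and `.await`, so they belong inside an
// async function that returns a Result.
let agent = VoidBox::new("ollama_demo")
    .llm(LlmProvider::ollama(&model)) // selects the per-platform gateway address
    .skill(Skill::agent("claude-code"))
    .memory_mb(2048) // guest VM memory, not Ollama's
    .prompt("Write a Python script that prints the first 10 Fibonacci numbers.")
    .build()?;

let result = agent.run(None, None).await?;
```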

memory_mb sizes the guest VM, not Ollama. Ollama runs on the host and uses host memory; the guest still needs about 2 GB for claude-code plus its Node/bun runtime. Don’t set memory_mb below 1024.

Running the example

```
OLLAMA_MODEL=qwen3-coder \
  VOID_BOX_KERNEL=/boot/vmlinuz-$(uname -r) \
  VOID_BOX_INITRAMFS=/tmp/void-box-rootfs.cpio.gz \
  cargo run --example ollama_local
```

Environment variables

| Env var | Purpose |
| --- | --- |
| `OLLAMA_MODEL` | Ollama model name (e.g. `qwen3-coder`, `phi4-mini`). |
| `VOID_BOX_KERNEL` | Path to the host kernel image for KVM. |
| `VOID_BOX_INITRAMFS` | Path to the guest initramfs built by `scripts/build_claude_rootfs.sh`. |

Without VOID_BOX_KERNEL set, the example falls back to mock mode (no real VM).
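The fallback can be pictured as a small decision on the environment. The sketch below is not the example's actual source: `RunMode`, `select_mode`, and `mode_from_env` are hypothetical names, and treating an empty `VOID_BOX_KERNEL` as unset is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical description of how the example might run.
#[derive(Debug, PartialEq)]
enum RunMode {
    /// Boot a real VM with the given kernel and optional initramfs.
    Vm { kernel: PathBuf, initramfs: Option<PathBuf> },
    /// No kernel configured: run without a real VM.
    Mock,
}

/// Chooses a mode from already-read values, so the logic is easy to test.
/// Assumption: an empty kernel path is treated the same as an unset one.
fn select_mode(kernel: Option<String>, initramfs: Option<String>) -> RunMode {
    match kernel.filter(|k| !k.is_empty()) {
        Some(k) => RunMode::Vm {
            kernel: PathBuf::from(k),
            initramfs: initramfs.map(PathBuf::from),
        },
        None => RunMode::Mock,
    }
}

/// Reads VOID_BOX_KERNEL and VOID_BOX_INITRAMFS from the process environment.
fn mode_from_env() -> RunMode {
    select_mode(
        std::env::var("VOID_BOX_KERNEL").ok(),
        std::env::var("VOID_BOX_INITRAMFS").ok(),
    )
}
```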

For YAML/spec-driven runs, VOIDBOX_LLM_PROVIDER, VOIDBOX_LLM_MODEL, and VOIDBOX_LLM_BASE_URL override the spec — see YAML Specs. Those env vars do not apply to hand-built VoidBox instances from Rust.

Model and context tuning

Model choice is a footprint-vs-quality tradeoff. phi4-mini (~4 GB) boots fast and handles simple prompts; qwen3-coder is larger but much stronger on code-heavy agent workloads. Pick based on your host RAM and iteration time, not the guest’s memory_mb.
Context length defaults to 4K in Ollama, which is often too short for long-running agent loops. Raise it globally with OLLAMA_CONTEXT_LENGTH on the host, or per-model via a Modelfile PARAMETER num_ctx directive. This is an Ollama-side concern; VoidBox doesn’t touch it.
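For the per-model route, a Modelfile needs only a `FROM` line and the `PARAMETER num_ctx` directive. As a minimal sketch, the hypothetical helper below (not part of VoidBox or Ollama) renders that text; the context size used in the usage note is illustrative, not a recommendation.

```rust
/// Hypothetical helper: renders a two-line Modelfile that derives from
/// `base_model` and overrides its context length.
fn modelfile_with_context(base_model: &str, num_ctx: u32) -> String {
    format!("FROM {}\nPARAMETER num_ctx {}\n", base_model, num_ctx)
}
```

For example, `modelfile_with_context("qwen3-coder", 32768)` yields a file that can be saved as `Modelfile` and registered with `ollama create qwen3-coder-32k -f Modelfile`. The name `qwen3-coder-32k` is made up for this example; whichever name you choose is the one to pass as `OLLAMA_MODEL`.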

See Pipeline Composition to chain Ollama-backed boxes, or define specs with YAML using llm.provider: ollama.