Defense in Depth for AI Agents
AI agents execute code, call tools, and hold real credentials. Void-Box treats them as untrusted execution — with hardware-isolated boundaries, declared capabilities, and built-in observability.
Why AI Agents Should Be Treated as Untrusted Execution
AI agents execute code, call APIs, modify files, and hold real credentials. They are non-deterministic — their behavior depends on prompts, context, and model state that can be manipulated.
Two fundamental risks apply:
- Prompt injection — An attacker embeds instructions in data the agent processes. The agent follows the attacker's goals using its legitimate access.
- Confused deputy — The agent loses its safety context and takes destructive actions without external manipulation. This has happened in production with real coding agents.
In both cases, the agent uses its legitimate access to cause damage. Prompt-level restrictions are not sufficient because the execution environment itself must enforce boundaries.
Why Containers Are Not Always Enough
Docker containers are a reasonable first step for isolation, but they share a kernel with the host. This shared kernel creates escape paths:
Docker Socket Escape
Mounting /var/run/docker.sock lets a container create sibling containers with full host access.
Privileged Container Escape
--privileged mode exposes host block devices, allowing direct disk mounting and host filesystem access.
Cgroup v1 Escape
The cgroups v1 release_agent mechanism can be exploited to execute arbitrary code as root on the host.
These are not theoretical. The AI Agent Security Labs demonstrate these escapes, plus cloud metadata SSRF and mounted credential exfiltration, with reproducible exploits.
Five Layers of Defense
Void-Box implements defense in depth with five distinct layers:
Hardware Isolation (KVM / Virtualization.framework)
Each agent runs in its own micro-VM with a dedicated kernel. The host kernel is behind a hardware virtualization boundary — not reachable from the guest.
Seccomp-BPF
Syscall filtering applies even within the guest. Only the syscalls required for the agent's declared skills are permitted.
Session Authentication
Host-guest communication uses vsock with session tokens. No unauthenticated commands are accepted by the guest agent.
Guest Hardening
The guest environment is minimal: read-only rootfs, no package manager, no unnecessary services. The attack surface inside the VM is deliberately small.
Network Isolation
SLIRP user-mode networking provides controlled egress without host network access. No bridged networking, no host port exposure.
Capability Model
Void-Box uses a declared capability model. Skills are what an agent can do — they must be explicitly bound at definition time.
- Skills not mounted in the VM don't exist at runtime. There is no implicit access to host resources.
- Command allowlists restrict which executables can run inside the guest.
- Resource limits (CPU, memory) are enforced by the VMM, not by guest-side promises.
- Network access is opt-in and constrained via SLIRP.
This model ensures that an agent's capabilities are bounded by its declaration, not by what the runtime environment happens to expose.
Observability and Control
Security without visibility is incomplete. Void-Box emits structured telemetry by design:
- Per-stage traces via OpenTelemetry — every agent run and pipeline stage is traced.
- Structured events — tool calls, skill invocations, and execution transitions are logged as events.
- Metrics — resource consumption, execution duration, and agent health are measured.
- vsock communication — all host-guest interaction flows through a single, inspectable channel.
Observability is not an add-on. It is part of the runtime contract.
Security Labs: Proof
The AI Agent Security Labs provide six hands-on, reproducible labs covering prompt injection, container escape, metadata SSRF, and sensitive mount exfiltration — and show how Void-Box reduces blast radius with micro-VM isolation and explicit capability boundaries.
| # | Lab | What it demonstrates | Docker result | Void-Box result |
|---|---|---|---|---|
| 01 | Prompt Injection | Agent follows attacker-injected instructions | Secrets leaked | Nothing to steal |
| 02 | Docker Socket Escape | Mounted Docker socket gives full host access | Full host control | No socket exists |
| 03 | Privileged Container Escape | --privileged container mounts host disk | Host filesystem access | No host devices |
| 04 | Cgroup Escape | cgroups v1 release_agent executes on host | Root code exec on host | Guest kernel only |
| 05 | Cloud Metadata SSRF | Default networking reaches cloud metadata service | IAM credentials stolen | Metadata unreachable |
| 06 | Sensitive Mount Exfil | Mounted credential dirs expose host secrets | Credentials readable | No host mounts |
Assumptions and Non-Goals
Void-Box is honest about what it does and does not claim to solve:
- Not a sandbox for arbitrary untrusted binaries. Void-Box is designed for AI agent workloads, not general-purpose sandboxing.
- Not a replacement for application-level security. If your agent has credentials, it can use them. Void-Box reduces blast radius, not credential exposure.
- Not a compliance framework. Void-Box provides technical isolation. Compliance, audit, and identity management are separate concerns.
- Guest escape is assumed possible in theory. The security model is defense in depth — any single layer can fail, and the remaining layers still reduce damage.