Defense in Depth for AI Agents

AI agents execute code, call tools, and hold real credentials. Void-Box treats them as untrusted execution — with hardware-isolated boundaries, declared capabilities, and built-in observability.

Why AI Agents Should Be Treated as Untrusted Execution

AI agents execute code, call APIs, modify files, and hold real credentials. They are non-deterministic — their behavior depends on prompts, context, and model state that can be manipulated.

Two fundamental risks apply:

Prompt injection — An attacker embeds instructions in data the agent processes. The agent follows the attacker's goals using its legitimate access.
Confused deputy — The agent loses its safety context and takes destructive actions without external manipulation. This has happened in production with real coding agents.

In both cases, the agent uses its legitimate access to cause damage. Prompt-level restrictions are not sufficient because the execution environment itself must enforce boundaries.

Why Containers Are Not Always Enough

Docker containers are a reasonable first step for isolation, but they share a kernel with the host. This shared kernel creates escape paths:

Docker Socket Escape

Mounting /var/run/docker.sock lets a container create sibling containers with full host access.

Privileged Container Escape

--privileged mode exposes host block devices, allowing direct disk mounting and host filesystem access.

Cgroup v1 Escape

The cgroups v1 release_agent mechanism can be exploited to execute arbitrary code as root on the host.

These are not theoretical. The AI Agent Security Labs demonstrate these escapes, plus cloud metadata SSRF and mounted credential exfiltration, with reproducible exploits.

Five Layers of Defense

Void-Box implements defense in depth with five distinct layers:

Hardware Isolation (KVM / Virtualization.framework)

Each agent runs in its own micro-VM with a dedicated kernel. The host kernel is behind a hardware virtualization boundary — not reachable from the guest.

Seccomp-BPF

Syscall filtering applies even within the guest. Only the syscalls required for the agent's declared skills are permitted.

Session Authentication

Host-guest communication uses vsock with session tokens. No unauthenticated commands are accepted by the guest agent.

Guest Hardening

The guest environment is minimal: read-only rootfs, no package manager, no unnecessary services. The attack surface inside the VM is deliberately small.

Network Isolation

SLIRP user-mode networking provides controlled egress without host network access. No bridged networking, no host port exposure.

Capability Model

Void-Box uses a declared capability model. Skills are what an agent can do — they must be explicitly bound at definition time.

Skills not mounted in the VM don't exist at runtime. There is no implicit access to host resources.
Command allowlists restrict which executables can run inside the guest.
Resource limits (CPU, memory) are enforced by the VMM, not by guest-side promises.
Network access is opt-in and constrained via SLIRP.

This model ensures that an agent's capabilities are bounded by its declaration, not by what the runtime environment happens to expose.

Observability and Control

Security without visibility is incomplete. Void-Box emits structured telemetry by design:

Per-stage traces via OpenTelemetry — every agent run and pipeline stage is traced.
Structured events — tool calls, skill invocations, and execution transitions are logged as events.
Metrics — resource consumption, execution duration, and agent health are measured.
vsock communication — all host-guest interaction flows through a single, inspectable channel.

Observability is not an add-on. It is part of the runtime contract.

Security Labs: Proof

The AI Agent Security Labs provide six hands-on, reproducible labs covering prompt injection, container escape, metadata SSRF, and sensitive mount exfiltration — and show how Void-Box reduces blast radius with micro-VM isolation and explicit capability boundaries.

#	Lab	What it demonstrates	Docker result	Void-Box result
01	Prompt Injection	Agent follows attacker-injected instructions	Secrets leaked	Nothing to steal
02	Docker Socket Escape	Mounted Docker socket gives full host access	Full host control	No socket exists
03	Privileged Container Escape	`--privileged` container mounts host disk	Host filesystem access	No host devices
04	Cgroup Escape	cgroups v1 release_agent executes on host	Root code exec on host	Guest kernel only
05	Cloud Metadata SSRF	Default networking reaches cloud metadata service	IAM credentials stolen	Metadata unreachable
06	Sensitive Mount Exfil	Mounted credential dirs expose host secrets	Credentials readable	No host mounts

Explore the Security Labs →

Assumptions and Non-Goals

Void-Box is honest about what it does and does not claim to solve:

Not a sandbox for arbitrary untrusted binaries. Void-Box is designed for AI agent workloads, not general-purpose sandboxing.
Not a replacement for application-level security. If your agent has credentials, it can use them. Void-Box reduces blast radius, not credential exposure.
Not a compliance framework. Void-Box provides technical isolation. Compliance, audit, and identity management are separate concerns.
Guest escape is assumed possible in theory. The security model is defense in depth — any single layer can fail, and the remaining layers still reduce damage.

Learn More

Security Model Docs Architecture Read the Essay