Defense in depth

VoidBox uses a layered security model with five distinct isolation boundaries. Each layer protects independently: compromising one layer does not bypass the layers behind it.

Five layers of defense

Layer 1: Hardware isolation (KVM / VZ)
  — Separate kernel, memory space, devices per VM

Layer 2: Seccomp-BPF (Linux/KVM)
  — VMM thread restricted to KVM ioctls + vsock + networking syscalls

Layer 3: Session authentication (vsock)
  — 32-byte random secret, per-VM, injected at boot

Layer 4: Guest hardening (guest-agent)
  — Command allowlist, rlimits, privilege drop, timeout watchdog

Layer 5: Network isolation
  — CIDR deny list (both platforms, different enforcement points)
      • Linux/KVM: host-side SLIRP packet filter
      • macOS/VZ:  guest-side blackhole routes
  — Linux/KVM only: SLIRP rate limiting and max concurrent connections

Layer 1: Hardware isolation (KVM / VZ)

Each VoidBox runs in its own micro-VM with a separate kernel, memory space, and devices. Isolation is enforced by hardware virtualization rather than by advisory process-level controls. On Linux the hypervisor is KVM; on macOS, Apple’s Virtualization.framework provides equivalent hypervisor-level isolation.

Layer 2: Seccomp-BPF (Linux/KVM)

On Linux/KVM, the VMM event-loop thread is restricted via seccomp-BPF to only the syscalls needed for KVM operation: KVM ioctls, vsock communication, and networking syscalls. Violations kill that thread (SECCOMP_RET_KILL_THREAD), ending the run without taking down other guests on the host. On macOS/VZ this layer does not apply.

Layer 3: Session authentication (vsock)

Every VM gets a unique 32-byte random session secret, injected via the kernel command line. The host authenticates each control connection with a Ping/Pong handshake before sending any ExecRequest, WriteFile, PTY, or telemetry-subscription messages.

Host                                    Guest
  |                                       |
  +-- getrandom(32 bytes)                 |
  +-- hex-encode -> kernel cmdline        |
  |   voidbox.secret=abc123...            |
  |                                       |
  |              boot                     |
  | ----------------------------------->  |
  |                                       +-- parse /proc/cmdline
  |                                       +-- store in OnceLock
  |                                       |
  +-- Ping [secret + version]             |
  | ----------------------------------->  |
  |                                       +-- verify secret
  |                                       +-- mark connection authenticated
  | <-----------------------------------  |
  |  Pong [version]                       |
  |                                       |
  +-- ExecRequest { ... }                 |
  | ----------------------------------->  |
  |                                       +-- execute on authenticated channel
  | <-----------------------------------  |
  |  ExecResponse { ... }                 |
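The guest side of this handshake can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the VoidBox implementation: the names hex_encode, parse_secret, secrets_match, and Connection are hypothetical, and the best-effort constant-time comparison is an added precaution that the text above does not describe. Only the voidbox.secret= parameter, the 32-byte length, the hex encoding, and the OnceLock storage come from the diagram.

```rust
use std::sync::OnceLock;

/// Guest-side storage for the secret, set once at boot (as in the diagram).
static SESSION_SECRET: OnceLock<[u8; 32]> = OnceLock::new();

/// Host side: hex-encode the random bytes for the kernel command line.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Guest side: extract `voidbox.secret=<64 hex chars>` from the contents
/// of /proc/cmdline. Returns None if the parameter is missing or malformed.
fn parse_secret(cmdline: &str) -> Option<[u8; 32]> {
    let hex = cmdline
        .split_whitespace()
        .find_map(|arg| arg.strip_prefix("voidbox.secret="))?;
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut secret = [0u8; 32];
    for (i, byte) in secret.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(secret)
}

/// Store the parsed secret; returns false if parsing fails or it was already set.
fn init_secret(cmdline: &str) -> bool {
    match parse_secret(cmdline) {
        Some(secret) => SESSION_SECRET.set(secret).is_ok(),
        None => false,
    }
}

/// Compare without an early exit on the first mismatching byte.
/// Assumption: best-effort constant-time comparison, not taken from the source.
fn secrets_match(expected: &[u8; 32], presented: &[u8]) -> bool {
    if presented.len() != expected.len() {
        return false;
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(presented) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Per-connection state: nothing but Ping is honored until authenticated.
struct Connection {
    authenticated: bool,
}

impl Connection {
    /// Handle a Ping carrying the presented secret; true means "send Pong".
    fn on_ping(&mut self, expected: &[u8; 32], presented: &[u8]) -> bool {
        self.authenticated = secrets_match(expected, presented);
        self.authenticated
    }

    /// Gate for ExecRequest and the other post-handshake messages.
    fn may_exec(&self) -> bool {
        self.authenticated
    }
}
```

In this sketch the guest would call init_secret once with the contents of /proc/cmdline, then create one Connection per accepted vsock connection and refuse every message other than Ping while may_exec() is false.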

Layer 4: Guest hardening (guest-agent)

The guest-agent (PID 1) enforces four independent controls:

Command allowlist

Only approved binaries execute. The allowlist is read from /etc/voidbox/allowed_commands.json, provisioned by the trusted host at boot.
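A rough sketch of the allowlist check follows. The schema of allowed_commands.json is not documented here, so the example assumes it has already been decoded into a list of absolute paths; the Allowlist type and its permits method are hypothetical names, and matching on exact absolute paths is an assumption about how approval is decided.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical in-memory form of the allowlist after the JSON file is decoded.
struct Allowlist {
    allowed: HashSet<PathBuf>,
}

impl Allowlist {
    fn new<I, P>(entries: I) -> Self
    where
        I: IntoIterator<Item = P>,
        P: Into<PathBuf>,
    {
        Allowlist {
            allowed: entries.into_iter().map(Into::into).collect(),
        }
    }

    /// Permit only absolute paths that appear in the list verbatim.
    /// Relative names and paths containing `..` never match an entry.
    fn permits(&self, program: &Path) -> bool {
        program.is_absolute() && self.allowed.contains(program)
    }
}
```

Rejecting relative names in this sketch avoids depending on PATH lookup inside the guest; a real implementation might instead resolve names before checking.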

Resource limits

setrlimit enforces memory, file descriptor, and process count limits. The limits are read from /etc/voidbox/resource_limits.json.

Privilege drop

Child processes run as uid:1000. The guest-agent drops privileges before executing any command, preventing root access inside the VM.
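One way to express this drop with the Rust standard library is shown below. It is a sketch, not the guest-agent's code: the helper name unprivileged_command is hypothetical, and the group ID of 1000 is an assumption (the text specifies only the uid). The uid and gid builder methods come from std::os::unix::process::CommandExt and take effect in the child between fork and exec, so the agent itself keeps its privileges.

```rust
use std::os::unix::process::CommandExt;
use std::process::Command;

const SANDBOX_UID: u32 = 1000; // from the text above
const SANDBOX_GID: u32 = 1000; // assumption: gid mirrors the uid

/// Build a command that will run as the unprivileged user.
/// Spawning it requires the caller to have permission to change uid/gid
/// (for example, running as root, as PID 1 in the guest does).
fn unprivileged_command(program: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args).uid(SANDBOX_UID).gid(SANDBOX_GID);
    cmd
}
```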

Timeout watchdog

A watchdog timer sends SIGKILL to child processes that exceed the configured timeout, preventing runaway execution.
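The watchdog behavior can be approximated with a polling loop over std::process::Child. This is a simplified sketch under assumptions: the function names and the 10 ms polling interval are illustrative, and the real agent may use a timer or signal-based mechanism instead. On Unix, Child::kill sends SIGKILL, which matches the behavior described above.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Wait for `child` up to `timeout`.
/// Returns Ok(Some(status)) if it exited in time, Ok(None) if it was killed.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?; // SIGKILL on Unix
            child.wait()?; // reap the process so it does not linger as a zombie
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10)); // illustrative polling interval
    }
}

/// Spawn `cmd` and supervise it with the watchdog above.
fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.spawn()?;
    wait_with_timeout(&mut child, timeout)
}
```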

Layer 5: Network isolation

The CIDR deny list applies on both platforms, but the enforcement point differs. Linux/KVM adds two further host-side controls; macOS/VZ does not, because Apple’s NAT attachment exposes no host-side filter hook.

CIDR deny list (both platforms)

The deny list defaults to 169.254.0.0/16 (link-local, which covers cloud metadata endpoints such as 169.254.169.254). Where it is enforced depends on the backend:

Linux/KVM — host-side SLIRP packet filter. The smoltcp-based usermode stack consults the deny list before NATing and drops outbound packets whose destination matches. The guest never sees a response. No file is provisioned in the guest.
macOS/VZ — guest-side blackhole routes. Apple’s VZNATNetworkDeviceAttachment has no host-side filter hook, so the host writes /etc/voidbox/network_deny_list.json before boot. The guest-agent reads it and installs blackhole routes for each entry immediately after bringing up the network — before any workload runs.
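Whichever side enforces it, the core check is a prefix match of the destination address against each deny-list entry. The sketch below shows that match for IPv4 only; the Cidr type and is_denied function are hypothetical names, and it does not model smoltcp or the guest routing table.

```rust
use std::net::Ipv4Addr;

/// An IPv4 CIDR block stored as a masked network address plus its mask.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cidr {
    network: u32,
    mask: u32,
}

impl Cidr {
    /// Parse text such as "169.254.0.0/16". Returns None on malformed input.
    fn parse(s: &str) -> Option<Cidr> {
        let (addr, len) = s.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let len: u32 = len.parse().ok()?;
        if len > 32 {
            return None;
        }
        // A shift by 32 would overflow, so /0 is handled explicitly.
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        Some(Cidr {
            network: u32::from(addr) & mask,
            mask,
        })
    }

    /// True if `ip` falls inside this block.
    fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.network
    }
}

/// Drop decision for an outbound packet: true if any entry matches.
fn is_denied(deny_list: &[Cidr], dst: Ipv4Addr) -> bool {
    deny_list.iter().any(|cidr| cidr.contains(dst))
}
```

With the default list, a packet addressed to 169.254.169.254 matches the single /16 entry and would be dropped, while ordinary public addresses pass through.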

Host-side SLIRP controls (Linux/KVM only)

On top of the deny-list filter, the Linux/KVM SLIRP stack adds:

Rate limiting on new connections — prevents connection floods from the guest.
Maximum concurrent connection limit — bounds host resource usage.
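The two controls above can be combined in one admission check. The sketch below uses a token bucket for the rate limit, which is an assumption: the text does not say which algorithm the SLIRP stack uses. ConnLimiter and its parameters are hypothetical, and the caller passes the current time explicitly so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Admission control for new guest connections (illustrative only).
struct ConnLimiter {
    max_concurrent: usize,
    active: usize,
    capacity: f64,       // maximum burst of new connections
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // sustained new-connection rate
    last_refill: Instant,
}

impl ConnLimiter {
    fn new(max_concurrent: usize, burst: u32, refill_per_sec: f64, now: Instant) -> Self {
        ConnLimiter {
            max_concurrent,
            active: 0,
            capacity: f64::from(burst),
            tokens: f64::from(burst),
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Try to admit one new connection at time `now`.
    fn try_open(&mut self, now: Instant) -> bool {
        // Refill tokens for the time elapsed since the last call, up to capacity.
        if now > self.last_refill {
            let elapsed = now.duration_since(self.last_refill).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill = now;
        }
        // Reject if either the concurrency cap or the rate limit is exhausted.
        if self.active >= self.max_concurrent || self.tokens < 1.0 {
            return false;
        }
        self.tokens -= 1.0;
        self.active += 1;
        true
    }

    /// Record that a previously admitted connection has closed.
    fn close(&mut self) {
        self.active = self.active.saturating_sub(1);
    }
}
```

A rejected attempt consumes no token in this sketch, so a guest that hits the concurrency cap is not additionally penalized by the rate limiter.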

macOS/VZ networking

macOS uses VZNATNetworkDeviceAttachment. The VM boundary and the guest-side blackhole routes still apply, but this backend has no host-side rate limiting or concurrent connection cap.

Snapshot security considerations

Snapshot cloning shares identical VM state across restored instances. Three areas require awareness:

RNG entropy

Restored VMs inherit the snapshot’s /dev/urandom pool state, so clones of one snapshot can start from the same random state. Operationally, treat cloned snapshots as cloned execution state: use short-lived tasks, avoid assuming fresh entropy after restore, and rebuild the snapshot when a workload depends on unique randomness.

ASLR

The guest kernel boots with nokaslr, so KASLR is disabled guest-wide — this is a guest-kernel property, not a snapshot-specific effect. Clones inherit an identical layout regardless. Mitigations: short-lived tasks, no direct network addressability (SLIRP NAT), and the command allowlist limiting attack surface.

Session isolation

Restored VMs reuse the snapshot’s stored session secret for vsock authentication (the secret is baked into the guest’s kernel cmdline in snapshot memory). Per-restore secret rotation would require guest-side support.