KVM
What It Is
KVM (Kernel-based Virtual Machine) is the Linux kernel's built-in hypervisor. It turns Linux into a Type 1 hypervisor -- the kernel manages virtual machines directly, using hardware virtualization extensions (Intel VT-x / AMD-V) in the CPU.
KVM is not a standalone program you install. It's a kernel module (kvm.ko)
that exposes an API (/dev/kvm) for creating and managing VMs. Userspace tools
(QEMU, cloud-hypervisor, Firecracker, crosvm) use this API to build complete
VM managers.
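The /dev/kvm API can be poked directly with nothing but the standard library. A minimal sketch (the ioctl number is `_IO(0xAE, 0x00)` from `<linux/kvm.h>`; `KVM_GET_API_VERSION` is documented to return 12 on all modern kernels):

```python
import fcntl
import os

KVM_GET_API_VERSION = 0xAE00  # _IO(0xAE, 0x00) from <linux/kvm.h>

def kvm_api_version():
    """Return the KVM API version, or None if /dev/kvm is unavailable."""
    try:
        fd = os.open("/dev/kvm", os.O_RDWR)
    except OSError:
        return None  # no KVM module loaded, or no permission
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    finally:
        os.close(fd)

print(kvm_api_version())  # 12 on a KVM-capable host, None otherwise
```

Every userspace VMM starts with exactly this handshake before creating VM and vCPU file descriptors.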
How It Works
Hardware Virtualization
Modern CPUs have extensions specifically for running VMs:
- Intel VT-x / AMD-V: Allow a hypervisor to run guest code directly on the CPU in a restricted mode. The guest thinks it's running on bare metal, but certain operations (I/O, privileged instructions) trap to the hypervisor.
- Intel EPT / AMD NPT: Hardware-assisted second-level page tables for guests. The guest manages its own page tables, and the CPU translates guest-physical to host-physical addresses in hardware, removing the need for software-maintained shadow page tables.
- Intel VT-d / AMD-Vi: IOMMU for device passthrough. A physical device (GPU, NIC, NVMe) can be assigned directly to a VM with DMA isolation.
KVM uses these extensions to run guest code at near-native speed. Only privileged operations (I/O, interrupts) require hypervisor intervention.
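Whether the CPU exposes these extensions is visible from userspace in /proc/cpuinfo. A small sketch checking for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag:

```python
def virt_extension():
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V), or None if unsupported."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags = line.split(":", 1)[1].split()
                    if "vmx" in flags:
                        return "vmx"
                    if "svm" in flags:
                        return "svm"
                    return None  # flags line present, neither extension
    except OSError:
        pass  # non-Linux or unreadable /proc
    return None
```

Note that the flag can be present while virtualization is disabled in firmware; opening /dev/kvm is the authoritative check.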
The VMM (Virtual Machine Monitor)
KVM provides the CPU and memory virtualization. A userspace VMM provides everything else: virtual devices (disk, network, display), firmware (UEFI for the guest), and management APIs.
FortrOS uses cloud-hypervisor as its VMM:
- Rust-native (no C codebase to audit)
- REST API over Unix socket (easy to automate)
- Live migration support (send/receive VM state between hosts)
- SEV-SNP/TDX support (confidential VMs)
- Minimal: ~106,000 lines (vs QEMU's ~2 million)
- 4.8MB static binary
VM Lifecycle
1. Create VM config (CPUs, memory, disks, network)
2. Open /dev/kvm, create VM file descriptor
3. Allocate guest memory, load firmware/kernel
4. Create vCPU file descriptors
5. Enter VM loop: KVM_RUN -> guest executes -> exit on I/O/interrupt ->
VMM handles exit -> resume guest
The VM loop is the core: the guest runs directly on the CPU (via VT-x/AMD-V), and only traps to the VMM when it needs something the VMM provides (disk I/O, network packets, interrupts).
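The early lifecycle steps can be sketched against the raw /dev/kvm API using only the standard library. This is an illustration, not a working VMM: it creates the VM and vCPU file descriptors and maps the shared kvm_run region, but stops before KVM_RUN since no guest memory or registers are set up (ioctl numbers copied from `<linux/kvm.h>`):

```python
import fcntl
import mmap
import os

KVM_CREATE_VM          = 0xAE01  # _IO(0xAE, 0x01)
KVM_GET_VCPU_MMAP_SIZE = 0xAE04  # _IO(0xAE, 0x04)
KVM_CREATE_VCPU        = 0xAE41  # _IO(0xAE, 0x41)

def create_vm_skeleton():
    """Lifecycle steps 2 and 4: VM fd, vCPU fd, kvm_run mapping.

    Returns the kvm_run region size, or None on hosts without KVM.
    """
    fds = []
    try:
        kvm = os.open("/dev/kvm", os.O_RDWR)
        fds.append(kvm)
        vm = fcntl.ioctl(kvm, KVM_CREATE_VM, 0)      # per-VM file descriptor
        fds.append(vm)
        vcpu = fcntl.ioctl(vm, KVM_CREATE_VCPU, 0)   # vCPU 0
        fds.append(vcpu)
        size = fcntl.ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0)
        run = mmap.mmap(vcpu, size)  # shared kvm_run struct (exit_reason, ...)
        # A real VMM would now: KVM_SET_USER_MEMORY_REGION (step 3), load
        # firmware/kernel into that memory, set vCPU registers, then loop:
        #   ioctl(vcpu, KVM_RUN, 0) -> read run.exit_reason -> handle -> re-enter
        run.close()
        return size
    except OSError:
        return None
    finally:
        for fd in fds:
            os.close(fd)
```

The kvm_run mapping is how the VMM and kernel communicate: after each KVM_RUN returns, the exit reason and its payload (e.g. the port and data of an I/O access) are read from this shared page.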
How FortrOS Uses It
Tier 3-4 workloads: Org VMs (build VM, database) and user VMs (desktops) run under cloud-hypervisor via KVM. The reconciler manages the VM lifecycle.
VM storage: Base image (from org shard storage) + qcow2 COW (copy-on-write) overlay. The base image is read-only and shared; the overlay captures writes.
VM networking: TAP interfaces created by FortrOS, bridged to the WireGuard overlay or physical NIC per the workload manifest's network profile.
Live migration: cloud-hypervisor supports send/receive-migration for moving running VMs between hosts. The reconciler coordinates via conn_auth TCP (port 7206).
Confidential VMs: On AMD EPYC with SEV-SNP, VM memory is encrypted by the CPU with per-VM keys. The host cannot read guest memory. FortrOS detects SEV-SNP at runtime and enables it per org policy.
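One way to detect SEV-SNP at runtime is the kvm_amd module parameter in sysfs. A hedged sketch (the exact parameter path varies by kernel version; treat anything unreadable as "not available"):

```python
def sev_snp_enabled():
    """True if the kvm_amd module reports SEV-SNP support, else False."""
    try:
        with open("/sys/module/kvm_amd/parameters/sev_snp") as f:
            return f.read().strip() in ("Y", "1")
    except OSError:
        return False  # Intel host, older kernel, or kvm_amd not loaded
```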
Why cloud-hypervisor
FortrOS chose cloud-hypervisor as its VMM. The decision comes from balancing attack surface, feature coverage, and API quality.
The VMM Landscape
| | QEMU | cloud-hypervisor | Firecracker | crosvm |
|---|---|---|---|---|
| Language | C (~2M lines) | Rust (~106K lines) | Rust (~83K lines) | Rust (~200K lines) |
| CVEs | 354+ | Very few | ~5-6 | Few |
| qcow2 COW overlays | Full | Supported | No (raw only) | Supported |
| Live migration | Mature | Supported | No | Experimental |
| SEV-SNP/TDX | Production | Experimental | No | No |
| GPU passthrough (VFIO) | Full | Supported | No (no PCIe) | Supported + virtio-gpu |
| API | QMP (JSON socket) | REST (OpenAPI) | REST | CLI args |
| Incremental disk tracking | Full dirty bitmap API | No | No | No |
| Windows guests | Mature | Supported | No | Supported |
| Boot time | ~500ms+ | ~200ms | ~125ms | ~150ms |
| Production users | Everyone | Fly.io, Kata | AWS Lambda | ChromeOS |
Why Not QEMU
QEMU is the most feature-complete VMM, and its dirty bitmap API for incremental disk backup is unmatched -- no other VMM has anything comparable. But QEMU is ~2 million lines of C with 354+ CVEs. The attack surface includes device emulation for hardware that hasn't been manufactured in decades (floppy controllers, ISA bus, PS/2). This code is compiled in by default and runs in the VMM's address space.
For an OS where the VMM runs in the trust boundary (a compromised VMM compromises the host), the codebase size and language matter. QEMU's feature breadth is a liability when you only need a fraction of its capabilities.
QEMU also lacks a clean programmatic API -- QMP is a stateful JSON socket protocol with async events, requiring careful connection management. The reconciler needs to create, boot, snapshot, migrate, and destroy VMs programmatically. A REST API maps to this naturally.
Why Not Firecracker
Firecracker is the smallest and most secure VMM, purpose-built for ephemeral serverless functions. But it deliberately excludes features FortrOS needs: no qcow2 (raw images only, no COW overlays), no GPU passthrough (no PCIe at all), no live migration, no Windows guests. It is optimized for a use case (thousands of short-lived function VMs) that isn't FortrOS's.
Why Not crosvm
crosvm has an interesting feature FortrOS lacks: paravirtualized virtio-gpu with 3D acceleration. This would let VMs render with GPU acceleration without full VFIO passthrough -- relevant for laptops with integrated GPUs that can't be exclusively assigned to one VM.
However, crosvm is tightly coupled to Google's ecosystem (ChromeOS, Android), has no REST API (controlled via CLI arguments), no SEV-SNP/TDX support, and is less documented for standalone server use. The process-per-device sandbox model is strong but adds operational complexity.
crosvm's virtio-gpu capability is worth tracking as a future option for the client VM / thin client use case.
Why cloud-hypervisor
cloud-hypervisor provides the right tradeoff:
- Attack surface: 106K lines of Rust vs 2M lines of C. Memory safety eliminates entire categories of vulnerabilities (buffer overflows, use-after-free).
- REST API: Clean OpenAPI interface. The reconciler calls HTTP endpoints (/vm.create, /vm.boot, /vm.snapshot, /vm.send-migration). Stateless request/response, no connection management.
- Feature coverage: qcow2 COW overlays, live migration, VFIO GPU passthrough, SEV-SNP/TDX (experimental), snapshot/restore, CPU/memory hotplug. Covers FortrOS's requirements.
- Boot time: ~200ms. Fast enough for on-demand VM creation.
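Driving that REST API from the standard library is a few lines: HTTP over the VMM's Unix socket. A sketch, with the socket path and the (deliberately minimal) VmConfig fields as illustrative assumptions; the OpenAPI spec shipped with cloud-hypervisor is authoritative:

```python
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that dials a Unix socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # host header only; never resolved
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock

def vm_create_body(cpus, memory_bytes, disk_path):
    """JSON body for PUT /api/v1/vm.create (small subset of VmConfig)."""
    return json.dumps({
        "cpus": {"boot_vcpus": cpus, "max_vcpus": cpus},
        "memory": {"size": memory_bytes},
        "disks": [{"path": disk_path}],
    })

# Usage against a running cloud-hypervisor (socket path is an assumption):
#   conn = UnixHTTPConnection("/run/cloud-hypervisor.sock")
#   conn.request("PUT", "/api/v1/vm.create",
#                body=vm_create_body(2, 1 << 30, "/vm/disk.qcow2"))
#   conn.request("PUT", "/api/v1/vm.boot")
```

This is the property the document calls out: each call is a stateless request/response, so the reconciler needs no persistent session to the VMM.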
The Gap: Incremental Disk Tracking
The one significant feature cloud-hypervisor lacks is QEMU's dirty bitmap API for incremental disk backup. QEMU can track which disk blocks changed since the last checkpoint and export only those blocks. cloud-hypervisor tracks dirty memory pages (for live migration) but not dirty disk blocks.
FortrOS solves this at the storage layer rather than the VMM layer:
- dm-snapshot: Linux device-mapper tracks block changes below the VMM. The VMM doesn't need to participate -- the kernel maintains a bitmap of changed blocks since the last snapshot.
- qcow2 allocation diffing: The qcow2 overlay's allocation tables record which clusters have been written. Comparing tables between snapshots identifies new data without VMM involvement.
This approach is VMM-independent: if FortrOS ever needs to switch VMMs, the incremental sync still works because it operates below the VMM.
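The allocation-diffing idea reduces to a set difference once the overlay's L1/L2 tables have been walked. A conceptual sketch, modeling each snapshot's allocated guest clusters as a plain set of cluster indices (real code would parse the qcow2 tables to build these sets):

```python
def changed_clusters(prev_allocated, curr_allocated):
    """Cluster indices allocated since the previous snapshot's tables.

    prev_allocated / curr_allocated: sets of guest cluster indices that the
    overlay's allocation tables map to host data at each checkpoint.
    """
    return sorted(curr_allocated - prev_allocated)

# Example: clusters 3 and 9 were written between the two checkpoints.
print(changed_clusters({0, 1, 5}, {0, 1, 3, 5, 9}))  # [3, 9]
```

Note this captures newly allocated clusters; a cluster rewritten in place stays allocated in both tables, which is why the scheme pairs naturally with a fresh overlay (or dm-snapshot epoch) per checkpoint.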
Alternatives
QEMU: The reference VMM. Unmatched feature set including dirty bitmap API. Rejected for FortrOS due to attack surface (2M lines C, 354+ CVEs) and lack of clean API.
Firecracker: Minimal attack surface, fastest boot. Rejected because it lacks qcow2, GPU passthrough, live migration, and Windows support.
crosvm: Paravirtualized virtio-gpu is interesting for desktop VMs. Rejected due to Google ecosystem coupling, no REST API, no SEV-SNP/TDX. Tracked as a future option for virtio-gpu.
Links
- KVM -- Official KVM site
- cloud-hypervisor
- KVM API Documentation
- Firecracker
- crosvm