Client Profiles and Roaming
What It Is
A FortrOS client is a user's roaming VM -- their desktop environment, apps, and session state. Close the laptop, open a different one, pick up where you left off. The VM IS the profile.
The Problem
Moving a user's environment between machines is harder than it sounds:
- Full disk copy is slow. A 50GB VM image takes roughly seven minutes to transfer even over gigabit Ethernet, and most of the image is identical OS files.
- Profile sync (dotfiles, config directories) misses application state (running processes, open files, in-memory buffers).
- Cloud desktops (Citrix, VMware Horizon, Microsoft AVD) require constant connectivity and expensive licensing. Offline = no desktop.
- ChromeOS roaming syncs browser state and settings but not Linux containers, local files, or app data that isn't cloud-backed.
None of these provide "close laptop, open another, keep working with full state" without significant limitations.
How Others Do It
VDI (Citrix, VMware Horizon, AVD)
Enterprise VDI separates the OS image from user state. A golden image
(shared, read-only) provides the OS and apps. User state comes from a
profile container: FSLogix mounts a VHDX file from a network share at
login. The user's C:\Users\name directory IS the virtual disk. Writes go
directly to the network-stored VHDX. At logoff, the VHDX is detached.
Strengths: Works for thousands of users. Non-persistent desktops (any user gets any VM) keep costs down. FSLogix eliminates the copy-at-login latency of traditional roaming profiles.
Weaknesses: Requires constant network to the profile share. Offline access is either nonexistent or poorly maintained. GPU workloads are expensive (vGPU licensing). Windows licensing adds cost (VDA subscription required for Windows VMs on non-Windows hosts). App compatibility issues when apps hardcode paths.
Qubes OS
Qubes uses a two-layer model: a TemplateVM provides the read-only root
filesystem (OS, packages), and an AppVM gets the template's root mounted
read-only plus its own persistent /home and /rw directories. Multiple
AppVMs share one template. Root filesystem changes are discarded on shutdown;
user files persist.
DisposableVMs are ephemeral: spawned from a template, used once, destroyed entirely. Boot in 1-3 seconds using cached memory snapshots. No state persists.
Strengths: Clean separation of system (template) from user (AppVM). Disposables are fast and safe. Weaknesses: No roaming -- AppVMs are tied to a single machine. No remote access built in.
ChromeOS
ChromeOS syncs through Google account services. Browser state, Android apps, and system settings sync. Linux container (Crostini) state does NOT sync. Local files outside Google Drive don't sync. First sign-in on a new device takes 2-5 minutes to rebuild from cloud state.
Strengths: Simple for browser-centric users. Fast enough for casual use. Weaknesses: Not instant (rebuilds from cloud, doesn't transfer running state). Anything outside the Google ecosystem is local-only.
The Fundamentals
The research points to a design built on block-level copy-on-write (COW) rather than application-level state tracking.
Base Image + Block-Level Overlay
The VM's disk is a base image (shared, read-only) with a qcow2 overlay (per-user, captures all writes). This is the same mechanism as Qubes (template + AppVM), VDI (golden image + differencing disk), and Docker (image layers + container layer).
- Base image: the OS, installed packages, default config. Shared across clients using the same blueprint. Stored in org shard storage.
- Overlay: every write the user makes (documents, app data, config changes, browser history). Per-user. This is what makes the VM "theirs."
The overlay is typically much smaller than the base image (changed blocks only), making it efficient to transfer.
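The copy-on-write read path can be sketched with a toy model. This is illustrative only: Python dicts stand in for qcow2's L1/L2 cluster tables, and `OverlayDisk` is not a real disk format.

```python
# Toy model of base-image + overlay copy-on-write (illustrative only;
# a real qcow2 overlay maps clusters via L1/L2 tables, not a dict).

CLUSTER = 64 * 1024  # qcow2's default cluster size is 64 KiB

class OverlayDisk:
    def __init__(self, base):
        self.base = base      # shared, read-only base image
        self.overlay = {}     # per-user writes, keyed by cluster index

    def write(self, cluster, data):
        # Every write lands in the overlay; the base is never modified.
        self.overlay[cluster] = data

    def read(self, cluster):
        # Unallocated overlay clusters fall through to the base image.
        if cluster in self.overlay:
            return self.overlay[cluster]
        return self.base.get(cluster, b"\x00" * CLUSTER)

disk = OverlayDisk(base={0: b"os", 1: b"apps"})
disk.write(1, b"user-config")   # shadow a base cluster
disk.write(9, b"document")      # allocate a brand-new cluster

assert disk.read(0) == b"os"            # falls through to base
assert disk.read(1) == b"user-config"   # served from the overlay
assert len(disk.overlay) == 2           # only 2 clusters ever need to sync
```

When the user roams, only the two overlay clusters would be transferred; the multi-gigabyte base image stays put.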
Incremental Sync via Overlay Diffing
The VM's disk overlay is a qcow2 file. Only allocated clusters (blocks that have been written) contain data -- the rest falls through to the base image. The qcow2 format's allocation tables tell you exactly which clusters have data.
The sync mechanism works at the qcow2 level, independent of the VMM (cloud-hypervisor doesn't need to participate):
- VM runs, writes accumulate in the qcow2 overlay
- Periodically: snapshot the overlay (filesystem-level reflink copy, or device-mapper snapshot below the qcow2 file)
- Diff the current overlay against the previous snapshot at the cluster level (compare L1/L2 allocation tables, identify newly allocated clusters)
- Ship only the new clusters to org shard storage
- To resume on another host: start from base image + apply cluster diffs in order
Alternatively, device-mapper snapshots at the kernel level track block changes below the VMM entirely -- dm-snapshot's exception store records which chunks have changed since the last snapshot, giving precise incremental diffs without parsing qcow2 internals.
No per-app awareness needed. No filesystem-level hooks. The block layer tracks everything regardless of which process wrote it.
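The snapshot, diff, ship, and replay loop above can be sketched as follows. Dicts keyed by cluster index stand in for real allocation tables; a production version would walk qcow2 L1/L2 tables or a dm-snapshot exception store instead.

```python
# Sketch of the snapshot -> diff -> ship -> replay loop (toy data
# structures; not a qcow2 parser).

def diff_clusters(prev, curr):
    """Clusters allocated or rewritten since the previous snapshot."""
    return {i: data for i, data in curr.items() if prev.get(i) != data}

def resume(base, diffs):
    """On a new host: start from the base, replay cluster diffs in order."""
    disk = dict(base)
    for d in diffs:
        disk.update(d)
    return disk

snap0 = {}                                          # first checkpoint: empty overlay
snap1 = {3: b"draft", 7: b"cache"}                  # after some writes
snap2 = {3: b"draft-v2", 7: b"cache", 8: b"notes"}  # cluster 3 rewritten, 8 new

d1 = diff_clusters(snap0, snap1)    # ships clusters 3 and 7
d2 = diff_clusters(snap1, snap2)    # ships clusters 3 and 8 -- 7 is unchanged

disk = resume({0: b"os"}, [d1, d2])
assert disk[3] == b"draft-v2" and disk[8] == b"notes"
assert 7 not in d2                  # unchanged clusters are never re-shipped
```

Because diffs are replayed in order, a later checkpoint's version of a cluster always wins, which is exactly the semantics the shard storage needs.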
Three-Layer Model
| Layer | What It Contains | Sync Model | Lifecycle |
|---|---|---|---|
| Blueprint | Base image + workload manifest (packages, settings, defaults) | Org CRDT (same as workload manifests) | Rebuilt on blueprint change |
| State | Block-level overlay (all user writes: files, app data, config) | Dirty bitmap incremental sync to shard storage | Streams continuously while VM runs |
| Scratch | Local performance cache (build artifacts, game installs, download cache) | Not synced. Travels with the VM during live migration. Rebuilt from state layer on cold migration. | Ephemeral per-host (except during live migration) |
The blueprint IS a workload manifest -- a client VM is a workload. The distinction from org service manifests is that client VMs have a state layer (the user's overlay) that persists and roams.
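As a sketch, the three layers might be described in a blueprint record like this. The field names and values are hypothetical, not FortrOS's actual manifest schema.

```python
# Hypothetical blueprint record for a client VM; all names illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    sync: str        # how (or whether) the layer moves between hosts
    lifecycle: str

@dataclass
class ClientBlueprint:
    base_image: str
    packages: List[str] = field(default_factory=list)
    layers: Dict[str, Layer] = field(default_factory=lambda: {
        "blueprint": Layer(sync="org-crdt", lifecycle="rebuilt on change"),
        "state": Layer(sync="incremental to shard storage",
                       lifecycle="streams while running"),
        "scratch": Layer(sync="none", lifecycle="ephemeral per host"),
    })

bp = ClientBlueprint(base_image="desktop-base", packages=["firefox"])
assert bp.layers["scratch"].sync == "none"   # scratch is never shipped
assert bp.layers["state"].sync != "none"     # only state roams with the user
```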
Per-App State Isolation
Block-level sync handles the transport. But for rollback, conflict resolution, and selective restore, it helps to know which files belong to which app. Linux provides this naturally:
- Flatpak model: Each app has its own data directory (~/.var/app/<app-id>/). The app sees standard XDG paths via bind mounts.
- Separate mount points: Each app's data directory is a separate subvolume or mountpoint within the overlay
- cgroup attribution: systemd puts each app in its own cgroup. Filesystem writes can be attributed to the cgroup (via eBPF or fanotify)
The simplest approach: apps that need isolated state get their own data directory (Flatpak pattern). Apps that don't (traditional Linux apps) share the user's home directory. Rollback is per-directory, not per-block.
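The directory-level attribution can be sketched like this. The paths follow the Flatpak convention from above; `app_data_dir` and `rollback_target` are illustrative helpers, and "rollback" here just means selecting which directory to restore.

```python
# Sketch of per-app state attribution (Flatpak-style paths; toy logic
# that assumes a /home/user prefix).
from pathlib import PurePosixPath
from typing import Optional

HOME = PurePosixPath("/home/user")

def app_data_dir(app_id: Optional[str]) -> PurePosixPath:
    """Isolated apps get ~/.var/app/<app-id>/; everything else shares $HOME."""
    return HOME / ".var" / "app" / app_id if app_id else HOME

def rollback_target(path: PurePosixPath) -> PurePosixPath:
    """Map a written file to the directory-level unit of rollback."""
    parts = path.parts
    if len(parts) > 5 and parts[3:5] == (".var", "app"):
        return PurePosixPath(*parts[:6])   # the app's own data directory
    return HOME                            # shared home for traditional apps

assert app_data_dir("org.mozilla.firefox") == HOME / ".var/app/org.mozilla.firefox"
assert rollback_target(
    PurePosixPath("/home/user/.var/app/org.mozilla.firefox/config/prefs.js")
) == HOME / ".var/app/org.mozilla.firefox"
assert rollback_target(PurePosixPath("/home/user/notes.txt")) == HOME
```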
Client VM Access and Privacy
Who Can See Inside the VM?
This is org-policy-driven, not a fixed design choice:
| Policy | Org Visibility | Unlock Method | Use Case |
|---|---|---|---|
| managed | Org can inspect, monitor, manage | Org auth (automatic at boot) | Family (parent manages kids), enterprise IT |
| private | Org manages the base image but user data is encrypted per-user | YubiKey / CAC + PIN | Enterprise (user privacy within org management) |
| zero-knowledge | Org cannot see inside the VM at all. User data encrypted with user-only key. | CAC + PIN only (org has no key) | Intelligence, whistleblower, sensitive operations |
For private and zero-knowledge policies, the state layer (user's overlay)
is encrypted with a key derived from the user's physical token + PIN. The org
can manage the base image and the VM's lifecycle (start, stop, migrate) but
cannot decrypt the user's data. The org sees opaque encrypted blocks in shard
storage.
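The derivation might look like the following sketch. Real hardware tokens compute their half on-device rather than exposing a raw secret, and the KDF parameters here are placeholders, not FortrOS's actual choices.

```python
# Illustrative two-factor key derivation for the private / zero-knowledge
# policies (placeholder parameters; a real token keeps its secret on-device).
import hashlib
import hmac

def state_key(token_secret: bytes, pin: str, salt: bytes) -> bytes:
    # Mix the hardware token's secret with the PIN, then stretch the result.
    ikm = hmac.new(token_secret, pin.encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", ikm, salt, 100_000, dklen=32)

token = b"secret-that-never-leaves-the-yubikey"
salt = b"per-user-salt-stored-beside-the-overlay"

k_right = state_key(token, "1234", salt)
k_wrong = state_key(token, "4321", salt)
assert len(k_right) == 32
assert k_right != k_wrong   # wrong PIN, wrong key
```

The org stores only the salt and the encrypted blocks; lacking both the token secret and the PIN, it cannot reproduce the key.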
Windows Client VMs
Windows VMs have specific considerations:
- Licensing: Windows VMs on a non-Windows host require VDA (Virtual Desktop Access) subscriptions (~$100/user/year) or Microsoft 365 E3/E5. This is a real cost for the family/business use case.
- Activation: VM migration changes hardware IDs, potentially triggering reactivation. KMS or Active Directory Based Activation handles this for enterprises. Home users may face friction.
- GPU passthrough: NVIDIA removed the anti-VM block (Error 43) for KVM guests as of driver 465+ (2021). Consumer GeForce cards work in passthrough (one GPU, one VM). GPU sharing (vGPU) requires enterprise NVIDIA GRID licensing.
- The user experience: The user sees Windows. FortrOS is invisible. The base image is a Windows install with FortrOS's guest agent (handles overlay sync, display streaming, peripheral passthrough). The user doesn't know they're in a VM unless they look at Device Manager.
Resuming on a New Host
The full sequence when a user opens a different device:
1. Device boots FortrOS (invisible), connects to the org overlay
2. User authenticates (CAC + PIN, YubiKey + PIN, or automatic per policy)
3. Reconciler checks: is this user's client VM running elsewhere?
   - Yes, running on another host: live migrate it (VM state and scratch layer travel with it) or connect via streaming
   - No: pull the base image from org storage (or use the local cache), apply the latest checkpoint plus incremental diffs from shard storage, and boot the VM
4. VM resumes. The user sees their desktop exactly as they left it.
For cached hosts (the user's regular laptop), step 3 is mostly local -- the base image is cached, and only the most recent diffs need to be pulled. This takes seconds. For a fresh device, the full base image + state must be pulled, which takes longer depending on size and network speed.
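The reconciler's decision reduces to a small branch, sketched below. `ClientVM` and `resume_plan` are illustrative names, not the actual reconciler API.

```python
# Sketch of the resume decision on a new host (illustrative names only).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClientVM:
    running_on: Optional[str]   # host id, or None if stopped
    base_cached: bool           # does the resuming host hold the base image?

def resume_plan(vm: ClientVM, this_host: str) -> List[str]:
    if vm.running_on and vm.running_on != this_host:
        # VM is live elsewhere: move it whole rather than rebuild from storage.
        return [f"live-migrate from {vm.running_on}"]
    steps = [] if vm.base_cached else ["pull base image"]
    return steps + ["apply checkpoint + incremental diffs", "boot VM"]

assert resume_plan(ClientVM("host-a", True), "host-b") == ["live-migrate from host-a"]
assert resume_plan(ClientVM(None, False), "host-b")[0] == "pull base image"
```

On a cached host the plan skips the base-image pull, which is why resuming on the user's regular laptop takes seconds.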
Links
- 09 Running Workloads -- VM lifecycle and reconciliation
- App Streaming -- Per-window streaming for thin clients
- Content-Addressed Storage -- How state diffs are stored in the org
- Erasure Coding -- Shard distribution for state data
- Key Derivation -- Per-user encryption keys
- Device Obfuscation -- The laptop that looks normal under inspection