A custom virtual hardware stack — virtual GPUs, virtual NVLink, virtual InfiniBand switches and virtual RDMA NICs — built from scratch in C around QEMU, so unmodified Linux guests run real distributed workloads on simulated silicon.
VDC does not use host-based shortcuts like Linux bridges or Soft-RoCE. Instead it builds custom QEMU PCIe devices and their corresponding Linux kernel drivers, so the guest sees a real RDMA HCA and a real GPU on its PCIe bus. From the application's point of view it is talking to hardware — even though every byte is moving through software.
The guest kernel drivers (hicain_net.ko, hicain_ib.ko) probe the device and register with ib_core, so libibverbs works unmodified: ibv_rc_pingpong, perftest and NCCL all run.
A student cannot bring home a $200,000 InfiniBand switch or a 4× H100 server. Existing Linux virtualization approaches (bridges, Soft-RoCE) hide the hardware semantics that matter: DCB, ECN marking, IB LIDs, GPU peer DMA. VDC keeps those semantics in software, so what you learn here transfers directly to NVIDIA, AMD or Intel production hardware.
Each component is its own open-source project under PacketFive. VDC ties them together as the integration / orchestration repository.
HiNIC: a PCIe device model exposing standard BARs, doorbells, and DMA queues. The companion
kernel drivers register a netdev and an RDMA HCA, so guest applications use libibverbs
as if the device were Mellanox/NVIDIA hardware.
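The doorbell/DMA-queue handshake that such a device model implements can be sketched in plain C. This is a minimal sketch under stated assumptions, not HiNIC's actual register map: the names `hn_desc`, `hn_queue`, `hn_post_send` and `hn_device_poll`, the descriptor layout and the ring size are all hypothetical. Guest memory is modelled as a byte array with descriptor addresses as offsets into it, and the MMIO doorbell write and DMA read that QEMU would route through BAR callbacks are reduced to a plain store and `memcpy`.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u   /* hypothetical; a power of two so indices wrap with a mask */

/* Hypothetical send descriptor the driver places in guest memory. */
struct hn_desc {
    uint64_t addr;     /* payload location (an offset into guest_mem in this sketch) */
    uint32_t len;      /* payload length in bytes */
    uint32_t flags;
};

struct hn_queue {
    struct hn_desc ring[RING_SIZE];  /* descriptor ring shared by driver and device */
    uint32_t doorbell;               /* producer index, written by the driver via a BAR */
    uint32_t consumer;               /* consumer index, owned by the device */
    uint32_t completions;            /* completions reported back to the driver */
};

/* Driver side: fill the next free descriptor, then ring the doorbell. */
int hn_post_send(struct hn_queue *q, uint64_t addr, uint32_t len)
{
    if (q->doorbell - q->consumer == RING_SIZE)
        return -1;                               /* ring full */
    struct hn_desc *d = &q->ring[q->doorbell & (RING_SIZE - 1)];
    d->addr = addr;
    d->len = len;
    d->flags = 0;
    q->doorbell++;   /* real driver: write barrier, then an MMIO write of the new index */
    return 0;
}

/* Device side: run when the doorbell register is written. Consumes every
 * descriptor up to the published producer index and copies the payload out. */
uint32_t hn_device_poll(struct hn_queue *q, const uint8_t *guest_mem,
                        uint8_t *wire, size_t wire_cap, size_t *wire_len)
{
    uint32_t done = 0;
    while (q->consumer != q->doorbell) {
        const struct hn_desc *d = &q->ring[q->consumer & (RING_SIZE - 1)];
        if (*wire_len + d->len > wire_cap)
            break;                               /* no room: leave it queued */
        memcpy(wire + *wire_len, guest_mem + d->addr, d->len);  /* stands in for a DMA read */
        *wire_len += d->len;
        q->consumer++;
        done++;
    }
    q->completions += done;  /* real device: write completion entries, raise an interrupt */
    return done;
}
```

In this sketch every post rings the doorbell; a real driver would typically batch several descriptors behind one MMIO write, because each MMIO access to an emulated device costs a VM exit under QEMU.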
HiSwitch: a standalone C daemon acting as a top-of-rack switch. An epoll event loop listens on
per-port UNIX sockets, classifies frames (Ethernet / RoCEv2 / InfiniBand LRH), and
forwards them using a per-pipeline forwarding table. The control plane is JSON over a UNIX domain socket (UDS).
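The classify-then-forward step can be sketched as follows. It is a minimal illustration, not HiSwitch's code: `classify`, `forward`, `struct switch_tables`, the per-port `ib_port` flag and the table sizes are hypothetical. It assumes native InfiniBand ports carry frames that begin with the 8-byte LRH, recognises RoCEv2 as IPv4 or IPv6 UDP to destination port 4791, ignores VLAN tags and IPv6 extension headers, and lets the Ethernet and RoCEv2 pipelines share one exact-match MAC table while the IB pipeline uses a LID-indexed table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum frame_class { FRAME_INVALID, FRAME_ETHERNET, FRAME_ROCEV2, FRAME_IB };

#define ETH_HLEN        14
#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_IPV6  0x86DD
#define PROTO_UDP       17
#define ROCEV2_UDP_PORT 4791
#define IB_LRH_LEN      8

static uint16_t be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

/* Classify one frame. `ib_port` (hypothetical) marks native InfiniBand ports. */
enum frame_class classify(const uint8_t *f, size_t len, int ib_port)
{
    if (ib_port)
        return len >= IB_LRH_LEN ? FRAME_IB : FRAME_INVALID;
    if (len < ETH_HLEN)
        return FRAME_INVALID;

    uint16_t ethertype = be16(f + 12);
    const uint8_t *l3 = f + ETH_HLEN;
    size_t l3len = len - ETH_HLEN;
    const uint8_t *udp = NULL;

    if (ethertype == ETHERTYPE_IPV4 && l3len >= 20) {
        size_t ihl = (size_t)(l3[0] & 0x0F) * 4;        /* IPv4 header length in bytes */
        if (l3[9] == PROTO_UDP && ihl >= 20 && l3len >= ihl + 8)
            udp = l3 + ihl;
    } else if (ethertype == ETHERTYPE_IPV6 && l3len >= 40 + 8) {
        if (l3[6] == PROTO_UDP)                          /* IPv6 Next Header field */
            udp = l3 + 40;
    }
    if (udp != NULL && be16(udp + 2) == ROCEV2_UDP_PORT) /* UDP destination port */
        return FRAME_ROCEV2;
    return FRAME_ETHERNET;
}

#define FDB_SIZE   64
#define MAX_LID    1024   /* sketch only; IB unicast LIDs run 0x0001..0xBFFF */
#define PORT_DROP  0
#define PORT_FLOOD (-1)

struct fdb_entry { uint8_t mac[6]; int port; };   /* port 0 marks an empty slot */

struct switch_tables {
    struct fdb_entry fdb[FDB_SIZE];   /* MAC -> port, Ethernet and RoCEv2 pipelines */
    int lft[MAX_LID];                 /* DLID -> port, IB pipeline; 0 = no route */
};

/* Returns an egress port (1-based), PORT_DROP or PORT_FLOOD. */
int forward(const struct switch_tables *t, const uint8_t *f, size_t len, int ib_port)
{
    switch (classify(f, len, ib_port)) {
    case FRAME_IB: {
        uint16_t dlid = be16(f + 2);          /* LRH bytes 2-3 hold the destination LID */
        return dlid < MAX_LID ? t->lft[dlid] : PORT_DROP;
    }
    case FRAME_ETHERNET:
    case FRAME_ROCEV2:
        for (size_t i = 0; i < FDB_SIZE; i++)  /* destination MAC is bytes 0-5 */
            if (t->fdb[i].port > 0 && memcmp(t->fdb[i].mac, f, 6) == 0)
                return t->fdb[i].port;
        return PORT_FLOOD;                     /* unknown unicast or broadcast */
    default:
        return PORT_DROP;
    }
}
```

Dropping unknown LIDs rather than flooding mirrors InfiniBand, where the subnet manager programs every switch's forwarding table; Ethernet switches instead flood unknown destinations and rely on MAC learning.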
HiGPU: a QEMU PCIe device with a SIMT execution model (Streaming Multiprocessors, tensor units, shared memory and HBM-equivalent device memory). It also ships a complete NVIDIA-equivalent userland: HCC (CUDA-equivalent), hi-smi, the hcc compiler and HiCCL (NCCL-equivalent).
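One way a software GPU model can present a SIMT programming model is to dispatch a grid of thread blocks onto simulated SMs and run each block warp by warp. The sketch below is an assumption about how that could look, not HiGPU's scheduler: `launch`, `struct thread_ctx`, `vadd` and `struct vadd_args` are hypothetical. Blocks are assigned to SMs round-robin but executed sequentially, and each lane runs to completion, so barriers such as `__syncthreads()` and lockstep divergence handling are out of scope.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WARP_SIZE 32

/* Per-thread context handed to a kernel, mirroring CUDA's built-in variables. */
struct thread_ctx {
    uint32_t block_idx, block_dim, grid_dim;
    uint32_t thread_idx;
    uint32_t sm_id;        /* simulated SM the block was assigned to */
    uint8_t *shared;       /* per-block shared memory */
};

typedef void (*kernel_fn)(const struct thread_ctx *ctx, void *args);

/* Launch grid_dim blocks of block_dim threads across num_sms simulated SMs. */
int launch(kernel_fn k, void *args, uint32_t grid_dim, uint32_t block_dim,
           uint32_t num_sms, size_t shared_bytes)
{
    if (num_sms == 0)
        return -1;
    uint8_t *shared = malloc(shared_bytes ? shared_bytes : 1);
    if (shared == NULL)
        return -1;
    for (uint32_t b = 0; b < grid_dim; b++) {
        memset(shared, 0, shared_bytes);         /* fresh shared memory for each block */
        uint32_t sm = b % num_sms;               /* round-robin block-to-SM assignment */
        for (uint32_t warp = 0; warp * WARP_SIZE < block_dim; warp++) {
            for (uint32_t lane = 0; lane < WARP_SIZE; lane++) {
                uint32_t tid = warp * WARP_SIZE + lane;
                if (tid >= block_dim)
                    break;                       /* inactive lanes in a partial last warp */
                struct thread_ctx ctx = { b, block_dim, grid_dim, tid, sm, shared };
                k(&ctx, args);
            }
        }
    }
    free(shared);
    return 0;
}

/* Example kernel: c[i] = a[i] + b[i]. */
struct vadd_args { const float *a, *b; float *c; uint32_t n; };

void vadd(const struct thread_ctx *t, void *p)
{
    struct vadd_args *v = p;
    uint32_t i = t->block_idx * t->block_dim + t->thread_idx;   /* global thread index */
    if (i < v->n)
        v->c[i] = v->a[i] + v->b[i];
}
```

A host-side call such as `launch(vadd, &args, 2, 64, 4, 0)` starts 128 threads, and the `i < n` guard masks off the 28 that fall past the end of a 100-element vector, just as a bounds check does in a CUDA kernel.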
HiLink: an independent GPU-to-GPU interconnect (NVSwitch-equivalent). A VM with a HiGPU but no
HiNIC can still do GPU peer DMA. HiLink uses ivshmem as the fast path for
cross-VM shared memory.
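ivshmem maps the same host shared-memory region into several VMs as a PCI BAR, so a cross-VM channel can be built as a single-producer/single-consumer ring whose indices are published with release/acquire atomics. This is a hypothetical sketch, not HiLink's wire format: `struct peer_ring`, `ring_send`, `ring_recv` and the slot sizes are illustrative, the shared region is modelled as an ordinary struct rather than the mmap()ed BAR, and it assumes 32-bit atomics are lock-free (and therefore address-free) on the platform, which is what makes them usable across address spaces.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS     16u    /* power of two */
#define SLOT_SIZE 256u

/* Layout placed at the start of the shared region. */
struct peer_ring {
    _Atomic uint32_t head;            /* next slot to fill; stored only by the producer */
    _Atomic uint32_t tail;            /* next slot to drain; stored only by the consumer */
    uint32_t len[SLOTS];
    uint8_t data[SLOTS][SLOT_SIZE];
};

/* Producer (sending VM): copy the message in, then publish it with a release store. */
int ring_send(struct peer_ring *r, const void *msg, uint32_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (len > SLOT_SIZE || head - tail == SLOTS)
        return -1;                                   /* too large, or ring full */
    uint32_t slot = head & (SLOTS - 1);
    memcpy(r->data[slot], msg, len);
    r->len[slot] = len;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer (receiving VM): the acquire load guarantees the payload is visible.
 * Returns the message length, -1 if empty, -2 if `cap` is too small. */
int ring_recv(struct peer_ring *r, void *out, uint32_t cap)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return -1;
    uint32_t slot = tail & (SLOTS - 1);
    uint32_t len = r->len[slot];
    if (len > cap)
        return -2;
    memcpy(out, r->data[slot], len);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return (int)len;
}
```

Because only the producer stores `head` and only the consumer stores `tail`, no lock is needed; a consumer that should not spin could instead wait for an interrupt from ivshmem's doorbell mode.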
The reference deployment follows the Open Compute Project's Open Rack v3 (ORv3), the rack design much of the modern AI build-out has been built around. At the top of the rack sit two 1OU top-of-rack switches (HiSwitch and HiLink), followed by a management host and eight 2OU VM "blades" acting as GPU compute nodes. The OU heights are only visual in the emulator, but the layout maps 1:1 to real ORv3 deployments, so the same diagrams are useful for teaching, design and real lab build-outs.
The HiSwitch fabric carries every Ethernet frame, including RoCEv2/RDMA
(UDP port 4791), which is the same path used by NCCL's IB transport, perftest and
ibv_rc_pingpong. The HiLink fabric is independent
and carries only GPU-to-GPU peer traffic. A guest can use one, the other, or both —
exactly as in a real GPU rack.
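Inside each RoCEv2 UDP datagram (destination port 4791) the payload starts with the InfiniBand Base Transport Header (BTH): 12 bytes carrying the opcode, partition key, destination queue pair and packet sequence number. Any extended transport headers, the message payload and a 4-byte ICRC follow. The sketch below packs and unpacks the BTH fields by their bit positions. `struct bth`, `bth_pack` and `bth_unpack` are hypothetical helpers, the ICRC is not computed, and byte 4 (reserved or congestion-notification bits depending on the specification revision) is simply written as zero.

```c
#include <stdint.h>

#define BTH_LEN 12

struct bth {
    uint8_t  opcode;                      /* e.g. 0x04 = RC SEND Only */
    uint8_t  se, migreq, pad_count, tver; /* solicited event, MigReq, pad count, version */
    uint16_t pkey;                        /* partition key; 0xFFFF = default partition */
    uint32_t dest_qp;                     /* 24-bit destination queue pair */
    uint8_t  ack_req;
    uint32_t psn;                         /* 24-bit packet sequence number */
};

void bth_pack(const struct bth *h, uint8_t out[BTH_LEN])
{
    out[0]  = h->opcode;
    out[1]  = (uint8_t)((h->se & 1) << 7 | (h->migreq & 1) << 6 |
                        (h->pad_count & 3) << 4 | (h->tver & 0xF));
    out[2]  = (uint8_t)(h->pkey >> 8);
    out[3]  = (uint8_t)h->pkey;
    out[4]  = 0;                                  /* written as zero in this sketch */
    out[5]  = (uint8_t)(h->dest_qp >> 16);
    out[6]  = (uint8_t)(h->dest_qp >> 8);
    out[7]  = (uint8_t)h->dest_qp;
    out[8]  = (uint8_t)((h->ack_req & 1) << 7);   /* low 7 bits reserved */
    out[9]  = (uint8_t)(h->psn >> 16);
    out[10] = (uint8_t)(h->psn >> 8);
    out[11] = (uint8_t)h->psn;
}

void bth_unpack(const uint8_t in[BTH_LEN], struct bth *h)
{
    h->opcode    = in[0];
    h->se        = in[1] >> 7;
    h->migreq    = (in[1] >> 6) & 1;
    h->pad_count = (in[1] >> 4) & 3;
    h->tver      = in[1] & 0xF;
    h->pkey      = (uint16_t)(in[2] << 8 | in[3]);
    h->dest_qp   = (uint32_t)in[5] << 16 | (uint32_t)in[6] << 8 | in[7];
    h->ack_req   = in[8] >> 7;
    h->psn       = (uint32_t)in[9] << 16 | (uint32_t)in[10] << 8 | in[11];
}
```

Because the BTH is identical on native InfiniBand and RoCEv2, a switch or debugger that can read it sees the same queue-pair and PSN view on both fabrics.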
VDC is the lab environment behind the HiCAIN training programme. Students run real distributed-training workloads, RDMA microbenchmarks and MPI collectives against the emulated fabric, building the skills that modern AI cluster operations demand, on a laptop.
No physical GPUs, NICs, or switches required. A multi-node GPU cluster fits on a single workstation.
IB LIDs, GRH/LRH headers, DCB pause frames, ECN marking — all the things that matter for real production debugging.
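For ECN marking, a switch that marks rather than drops sets the Congestion Experienced (CE) codepoint in the low two bits of the IPv4 TOS byte once its egress queue is congested, and must then fix the header checksum. The sketch below uses a single fixed threshold (production switches typically mark probabilistically, RED/WRED style) and handles IPv4 only; `ecn_mark_ipv4`, `ipv4_checksum` and the byte-based threshold are hypothetical, not VDC's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00
#define ECN_CE      0x03

/* Internet checksum (RFC 1071) over the IPv4 header, skipping the checksum field. */
static uint16_t ipv4_checksum(const uint8_t *ip, size_t ihl_bytes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < ihl_bytes; i += 2) {
        if (i == 10)
            continue;                              /* bytes 10-11 hold the checksum */
        sum += (uint32_t)(ip[i] << 8 | ip[i + 1]);
    }
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);        /* fold carries back in */
    return (uint16_t)~sum;
}

/* Set CE if the egress queue holds more than `threshold` bytes. Returns 1 if marked.
 * Not-ECT packets are left untouched (an AQM would typically drop them instead). */
int ecn_mark_ipv4(uint8_t *ip, size_t len, size_t queue_bytes, size_t threshold)
{
    if (len < 20 || (ip[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;
    if (ihl < 20 || ihl > len || queue_bytes <= threshold)
        return 0;
    uint8_t ecn = ip[1] & ECN_MASK;                /* low two bits of the TOS byte */
    if (ecn == ECN_NOT_ECT || ecn == ECN_CE)
        return 0;
    ip[1] |= ECN_CE;
    uint16_t csum = ipv4_checksum(ip, ihl);
    ip[10] = (uint8_t)(csum >> 8);
    ip[11] = (uint8_t)csum;
    return 1;
}
```

Forgetting the checksum update is a classic bug: the receiver silently discards the marked packet, so congestion looks like loss instead of a clean ECN signal.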
Open-source, modifiable, reproducible. Ideal for academic networking and HPC research where commercial silicon is a black box.