# Runtime Model

## High-Level Model
slotd is a single-host scheduler. The runtime model is intentionally simple:
- one local daemon
- one local SQLite database
- one local execution host
- one local user workflow
There is no controller/worker split and no remote node launch protocol.
## Core Resources
slotd schedules three resource types:
- CPU
- memory
- GPU
Current behavior:
- CPU reservation is `ntasks * cpus-per-task`
- `ntasks` launches one local process per task rank for batch and foreground execution
- total memory defaults to host-detected `MemTotal` from `/proc/meminfo`, with a `16384 MB` fallback
- memory is stored in MB
- GPUs are integer slots
- admission is reservation-based, not usage-based
- if `SLOTD_CGROUP_BASE` is unset, CPU and memory remain reservation-only
- if `SLOTD_CGROUP_BASE` is set to a writable cgroup v2 subtree, `slotd` writes `memory.max` and `cpu.max`
- if cgroup setup fails after explicit configuration, launch fails instead of silently skipping enforcement
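The reservation arithmetic and the memory default above can be sketched as follows. This is a minimal illustration, not slotd's actual code; the function names and the fallback path handling are assumptions.

```python
import re

FALLBACK_MEM_MB = 16384  # documented fallback when MemTotal cannot be read

def detect_total_memory_mb(meminfo_path="/proc/meminfo"):
    """Return host MemTotal in MB, falling back to 16384 MB."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                m = re.match(r"MemTotal:\s+(\d+)\s+kB", line)
                if m:
                    return int(m.group(1)) // 1024  # kB -> MB
    except OSError:
        pass
    return FALLBACK_MEM_MB

def cpu_reservation(ntasks, cpus_per_task):
    """Reserved CPUs for a job: ntasks * cpus-per-task."""
    return ntasks * cpus_per_task
```

Note that admission uses these reserved figures, not observed usage: a job asking for 4 tasks with 2 CPUs each occupies 8 CPU slots whether or not it runs hot.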
## Partitions
Configured by environment:
- `SLOTD_CPU_PARTITIONS`
- `SLOTD_GPU_PARTITIONS`
Rules:
- only configured partition names are accepted
- if there are no GPUs, no GPU partition is exposed
- if a GPU partition is selected and `--gpus` is omitted, the default GPU request is `1`
- otherwise the default GPU request is `0`
- CPU and GPU partitions are virtual views over one local host
- CPU and memory capacity stay shared across partitions; only GPU visibility/defaults differ by partition
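The acceptance and default-GPU rules can be expressed compactly. This is a hypothetical sketch; the function and parameter names (`validate_partition`, `default_gpu_request`, `gpus_flag`) are illustrative, not slotd's API.

```python
def validate_partition(name, cpu_partitions, gpu_partitions, host_gpu_count):
    """Only configured partition names are accepted; with no GPUs,
    no GPU partition is exposed at all."""
    allowed = set(cpu_partitions)
    if host_gpu_count > 0:
        allowed |= set(gpu_partitions)
    if name not in allowed:
        raise ValueError(f"unknown partition: {name}")

def default_gpu_request(partition, gpu_partitions, gpus_flag=None):
    """Default --gpus when omitted: 1 on a GPU partition, else 0."""
    if gpus_flag is not None:
        return gpus_flag  # an explicit request always wins
    return 1 if partition in gpu_partitions else 0
```

Because partitions are views over one host, neither function touches capacity: CPU and memory accounting is global regardless of which partition admitted the job.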
## GPU Detection
If `SLOTD_GPU_COUNT` is not set, `slotd` tries to detect GPUs from `nvidia-smi`.
The current implementation checks:
- `nvidia-smi`
- `/usr/bin/nvidia-smi`
- `/usr/lib/wsl/lib/nvidia-smi`
- `/bin/nvidia-smi`
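A probe over those candidates might look like the sketch below. How slotd actually invokes and parses `nvidia-smi` is not specified here; this sketch assumes `nvidia-smi -L` (which lists one GPU per line) and counts non-empty lines.

```python
import subprocess

CANDIDATES = [
    "nvidia-smi",                   # resolved via PATH
    "/usr/bin/nvidia-smi",
    "/usr/lib/wsl/lib/nvidia-smi",
    "/bin/nvidia-smi",
]

def detect_gpu_count(candidates=CANDIDATES):
    """Try each candidate binary in order; return the first successful count."""
    for path in candidates:
        try:
            out = subprocess.run(
                [path, "-L"], capture_output=True, text=True, timeout=5
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            continue  # missing or hung binary: try the next candidate
        if out.returncode == 0:
            return len([l for l in out.stdout.splitlines() if l.strip()])
    return 0  # no working nvidia-smi means no GPUs
```

Checking a fixed list of absolute paths matters on WSL, where `nvidia-smi` lives under `/usr/lib/wsl/lib` and may not be on `PATH`.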
## Job Types
Persisted records are one of:
- top-level batch jobs
- allocation-only jobs
- array tasks
- steps under allocations
## Job States
Implemented states:
- `PENDING`
- `RUNNING`
- `COMPLETING`
- `COMPLETED`
- `FAILED`
- `CANCELLED`
- `TIMEOUT`
- `OUT_OF_MEMORY`
Terminal states:
- `COMPLETED`
- `FAILED`
- `CANCELLED`
- `TIMEOUT`
- `OUT_OF_MEMORY`
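The terminal set can be captured directly from the names above; the helper name is illustrative:

```python
TERMINAL_STATES = frozenset(
    {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY"}
)

def is_terminal(state):
    """True once a job record can no longer change state."""
    return state in TERMINAL_STATES
```

Note that `COMPLETING` is not terminal: a job in that state is still being torn down and has not yet settled on an outcome.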
## Scheduling Rules
The daemon loop runs every 300ms.
Pending jobs are blocked by:
- dependencies
- array concurrency limits
- delayed start time
- exclusive host use
- insufficient reserved resources
- user hold state
Ordering:
- submission order is the base rule
- explicit job priority can override pure submission order
- array tasks are interleaved by array group
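The first two ordering rules reduce to a sort key, sketched below. The field names are illustrative, and array-group interleaving is deliberately omitted from this sketch.

```python
from dataclasses import dataclass

@dataclass
class PendingJob:
    submit_seq: int    # monotonically increasing submission counter
    priority: int = 0  # explicit priority; higher dispatches first

def dispatch_order(jobs):
    """Priority overrides pure submission order; ties fall back to FIFO."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_seq))
```

Because Python's sort is stable, equal-priority jobs keep their submission order even without the `submit_seq` tiebreaker, but encoding it in the key makes the rule explicit.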
## Runtime Files
Within `SLOTD_ROOT`:
- `run/slotd.sock`: daemon socket
- `lib/state.db`: SQLite state
- `lib/jobs/<job_id>/script.sh`: batch script
- `lib/jobs/<job_id>/runner.sh`: daemon wrapper
- `lib/jobs/<job_id>/exit_status`: wrapper exit status
## Notifications
If `SLOTD_NOTIFY_CMD` is set, `slotd` runs it whenever a top-level job reaches a terminal state.
Exported variables:
- `SLOTD_JOB_ID`
- `SLOTD_JOB_NAME`
- `SLOTD_JOB_STATE`
- `SLOTD_JOB_PARTITION`
- `SLOTD_JOB_REASON`
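A hook invocation in this shape could look like the sketch below. Whether slotd runs the command through a shell is an assumption here, as are the helper name and the job-record fields.

```python
import os
import subprocess

def notify(cmd, job):
    """Run the notify command with the documented variables exported."""
    env = dict(os.environ)
    env.update({
        "SLOTD_JOB_ID": str(job["id"]),
        "SLOTD_JOB_NAME": job["name"],
        "SLOTD_JOB_STATE": job["state"],
        "SLOTD_JOB_PARTITION": job["partition"],
        "SLOTD_JOB_REASON": job.get("reason", ""),
    })
    # check=False: a failing notify command should not affect the job outcome
    subprocess.run(cmd, shell=True, env=env, check=False)
```

A typical hook is a small script that reads these variables, e.g. one that posts `$SLOTD_JOB_NAME finished as $SLOTD_JOB_STATE` to a chat channel.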