Scheduler
The scheduler is a background Go service that runs a reconciliation loop. It matches pending job demand to available RISC-V node capacity, provisions runner pods, syncs worker state with Kubernetes and GitHub, and cleans up terminated pods. It also serves read-only HTML dashboards for jobs and workers.
Source: container/cmd/scheduler/
Reconciliation loop
The scheduler is woken by PostgreSQL LISTEN/NOTIFY (the ghfe webhook handler emits a {schema}_queue_event notification when a new job is recorded) or by a 15-second timeout (PollInterval). Each iteration acquires the workers-table lock, runs three phases, then releases the lock:
- syncJobsState: reconciles every active DB job against GH.GetJobInfo. If GitHub returns 404, the job is marked failed. If the job’s parent run has completed but the job is still queued past JobStuckQueuedMinAge (10m), it is also marked failed.
- syncWorkersState: a single transaction holding LOCK TABLE workers IN EXCLUSIVE MODE that runs five sub-phases (see below).
- demandMatch: provisions new runner pods where demand exceeds supply.
Only one scheduler at a time may run syncWorkersState. If a second instance is deployed, it blocks on the table lock until the first commits. The scheduler container’s serverless.yml pins minScale=1 and maxScale=1, so the invariant holds trivially.
```mermaid
sequenceDiagram
participant GH as GitHub
participant H as ghfe
participant DB as PostgreSQL
participant S as Scheduler
participant K as Kubernetes
participant N as RISC-V Node
GH->>H: workflow_job (queued)
H->>DB: INSERT job + NOTIFY queue_event
DB-->>S: LISTEN wakes scheduler
S->>S: syncJobsState (GH reconcile)
S->>S: syncWorkersState (5 phases under LOCK TABLE)
S->>S: demandMatch
S->>DB: SELECT pending jobs (FIFO)
S->>S: Check (demand > supply, max_workers cap, k8s capacity)
S->>GH: AuthenticateApp (per-installation JWT cached 59m)
S->>GH: CreateJITRunnerConfig{Org,Repo}
S->>K: ProvisionRunner (pod with RUNNER_JITCONFIG)
K->>N: Schedule on board-matching node
N->>GH: Register as JIT runner
GH->>H: workflow_job (in_progress)
H->>DB: UPDATE status → running
Note over N: Job executes
GH->>H: workflow_job (completed)
H->>DB: UPDATE status → completed
S->>K: Reconcile pod phase, kill stuck pods
S->>K: Delete pods past 6h grace
```
Demand matching
Demand and supply are matched by (entity_id, job_labels), not by pool. This avoids stuck workers when different label sets map to the same pool but the workflow expects matching runner labels.
```
demand = COUNT(jobs WHERE entity_id = ? AND job_labels = ? AND status IN ('pending','running'))
supply = COUNT(workers WHERE entity_id = ? AND job_labels = ? AND status IN ('pending','running'))
deficit = demand - supply
```
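To make the counting concrete, here is a small in-memory Go sketch of the same deficit calculation. It assumes labels are stored as one canonical string per row; the `groupKey` and `row` types and the example label are hypothetical, and the real scheduler computes these counts in SQL.

```go
package main

import "fmt"

// groupKey is the (entity_id, job_labels) pair used for matching.
// Storing the label set as a single string is an assumption of this sketch.
type groupKey struct {
	EntityID int64
	Labels   string
}

// row is a simplified job or worker record.
type row struct {
	Key    groupKey
	Status string
}

// countActive counts rows whose status is pending or running, per group.
func countActive(rows []row) map[groupKey]int {
	counts := make(map[groupKey]int)
	for _, r := range rows {
		if r.Status == "pending" || r.Status == "running" {
			counts[r.Key]++
		}
	}
	return counts
}

// deficits returns demand minus supply for every group that has demand.
func deficits(jobs, workers []row) map[groupKey]int {
	demand := countActive(jobs)
	supply := countActive(workers)
	out := make(map[groupKey]int, len(demand))
	for k, d := range demand {
		out[k] = d - supply[k] // a missing supply entry counts as zero
	}
	return out
}

func main() {
	k := groupKey{EntityID: 1, Labels: "example-riscv-label"} // placeholder label
	jobs := []row{{k, "pending"}, {k, "pending"}, {k, "running"}}
	workers := []row{{k, "running"}, {k, "failed"}} // the failed worker is not supply
	fmt.Println("deficit:", deficits(jobs, workers)[k])
}
```

With three active jobs and one active worker, the deficit for that group is 2, so up to two more runners may be provisioned.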
For each pending job, processed FIFO by created_at:
- Demand check. Skip if supply >= demand.
- Max-workers cap. Skip if the entity (organization or personal account) has reached its configured limit across all pools. The default is DefaultMaxWorkers = 20; per-entity overrides live in EntityConfigs in internal/constants.go.
- Capacity check. Query Kubernetes for available riseproject.com/runner slots on nodes matching the pool’s riseproject.dev/board selector. Skip if no slots are free.
- Provision. All checks pass:
  - DB.AddWorker reserves a unique runner name (retries on ErrDuplicatePodName).
  - GH.AuthenticateApp for the correct App (org or personal, chosen by entity type).
  - For organizations: GH.EnsureRunnerGroup("RISE RISC-V Runners"), then GH.CreateJITRunnerConfigOrg.
  - For personal accounts: GH.CreateJITRunnerConfigRepo (repo-scoped).
  - K8s.ProvisionRunner creates the pod.
A failed worker does not count toward supply, so the next loop iteration automatically re-provisions a runner for the same pending job.
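The three gating checks above can be expressed as one ordered decision. The following Go sketch is illustrative only: the `snapshot` type, its fields, and the returned strings are hypothetical, and treating `MaxWorkers == 0` as "no override" is an assumption made for brevity.

```go
package main

import "fmt"

// DefaultMaxWorkers mirrors the default named in the text.
const DefaultMaxWorkers = 20

// snapshot holds the values gathered for one pending job.
// All field names are hypothetical.
type snapshot struct {
	Demand        int // active jobs for this (entity_id, job_labels)
	Supply        int // active workers for this (entity_id, job_labels)
	EntityWorkers int // active workers for the entity across all pools
	MaxWorkers    int // per-entity override; 0 means use the default (assumption)
	FreeSlots     int // free riseproject.com/runner slots on matching nodes
}

// decide applies the checks in the documented order and names the first
// one that blocks provisioning.
func decide(s snapshot) string {
	limit := s.MaxWorkers
	if limit == 0 {
		limit = DefaultMaxWorkers
	}
	switch {
	case s.Supply >= s.Demand:
		return "skip: supply covers demand"
	case s.EntityWorkers >= limit:
		return "skip: max-workers cap reached"
	case s.FreeSlots <= 0:
		return "skip: no free runner slots"
	default:
		return "provision"
	}
}

func main() {
	fmt.Println(decide(snapshot{Demand: 2, Supply: 1, EntityWorkers: 3, FreeSlots: 1}))
	fmt.Println(decide(snapshot{Demand: 2, Supply: 1, EntityWorkers: 20, FreeSlots: 1}))
}
```

Ordering the cheap, database-only checks before the Kubernetes capacity query avoids an API call for jobs that would be skipped anyway.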
Pod provisioning
Runner pods are created via the Kubernetes API with:
- Namespace: default.
- Labels: app=rise-riscv-runner, riseproject.dev/board=<pool>.
- Node selector: riseproject.dev/board: <pool> (targets the correct hardware).
- Resource limit: riseproject.com/runner: 1 (enforces one pod per node via the device plugin).
- Ephemeral storage: request 40 Gi, limit 90 Gi (on scw-em-* pools only).
- Active deadline: 525,600 seconds (~6 days). Patched to 1 to kill stuck pods (see Health checks).
- Security context: privileged: true, host network. Required by the in-pod Docker daemon to program iptables and bridge devices.
- Environment: RUNNER_JITCONFIG (base64 JIT token from GitHub), RUNNER_WAIT_FOR_DOCKER_IN_SECONDS=60.
- Single container. No init containers, no volumes. The image entrypoint launches the GitHub Actions runner directly. See Container Images.
syncWorkersState phases
Each iteration of syncWorkersState runs under LOCK TABLE workers IN EXCLUSIVE MODE so the five phases observe a consistent snapshot:
| Phase | Method | What it does |
|---|---|---|
| 1 | OrphanSweep | Worker rows with no matching k8s pod are marked terminal. |
| 2 | PodPhaseSync | Pod phase Running / Succeeded / Failed is reflected onto the worker status. failure_info is populated for failed pods. |
| 2.5 | UnreachableNodeCheck | Lists nodes tainted with node.kubernetes.io/unreachable and fails any worker whose pod is stranded on one. The pod is force-deleted so it does not sit in Terminating forever waiting on an absent kubelet. Skipped if the node-list call fails or returns no unreachable nodes. |
| 3 | HealthChecks | Per (installation, entity_type, entity_id, repo) fetches GH access tokens and runner lists; classifyWorker runs a state machine over each worker (see below). |
| 4 | GitHubCleanup | Deletes GH-registered runners whose worker row is terminal or missing. Org runners are scoped to the RISE RISC-V Runners runner group; repo runners are filtered by rise-riscv-runner{-staging}- name prefix. |
| 5 | DeleteTerminalPods | Deletes Succeeded/Failed pods past PodDeleteGrace (6h). |
Health checks
Phase 3 runs a per-worker decision tree. Rather than deleting the pod directly, the scheduler patches spec.activeDeadlineSeconds = 1. The kubelet then transitions the pod to Failed (reason DeadlineExceeded), so it enters the normal grace-and-delete flow and logs/events remain inspectable.
- runner_never_registered: pod has been Running for more than RunnerRegistrationTimeout (120s) but the runner never appeared in the GitHub API. Worker is marked failed with full diagnostics in failure_info; pod is killed so its slot frees up for a retry.
- pod_stuck_pending: pod has been Pending for more than PodPendingTimeout (600s), typically due to missing capacity or an image-pull failure.
- runner_idle: runner is registered with GitHub, online, and not busy for longer than RunnerPendingTimeout (600s).
Phase 3 first tries to delete the GH-side runner. If GitHub refuses (e.g. 422 "Runner is busy"), syncWorkersState aborts cleanup for that worker. GitHub believes a job is still running, so the worker is left alone and retried next cycle.
Phase 2.5 (UnreachableNodeCheck) handles the separate case of a pod on a node the kube node controller has marked unreachable. The kubelet is gone, so:
- node_unreachable: failure reason recorded on the worker. The pod is force-deleted (zero grace period) instead of patched, since a patch to activeDeadlineSeconds cannot take effect without a live kubelet. The GH runner row, now offline, is cleaned up by Phase 4 on a subsequent iteration.
Lifecycle state machines
Job: forward-only pending → running → completed | failed. Every UPDATE enforces this with explicit WHERE clauses.
queued webhook in_progress webhook completed webhook
│ │ │
▼ ▼ ▼
pending ────────────► running ────────────► completed
│ ▲
└─────────────────── completed webhook ─────────┘
(received before provisioning happened)
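The forward-only rule for jobs can be captured as a transition table plus a guarded UPDATE. The Go sketch below is illustrative: the `allowed` map and `canTransition` are hypothetical, and the table and column names in the SQL string (`jobs`, `job_id`, `status`) are assumptions, not the project's schema. The pending-to-failed edge reflects syncJobsState marking queued jobs as failed.

```go
package main

import "fmt"

// allowed lists the forward-only job transitions described above,
// including the direct pending -> completed edge from the diagram.
var allowed = map[string][]string{
	"pending": {"running", "completed", "failed"},
	"running": {"completed", "failed"},
}

// canTransition reports whether a job may move from one status to another.
// Terminal states (completed, failed) have no outgoing edges.
func canTransition(from, to string) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// markRunningSQL shows how a WHERE clause can enforce the same rule in the
// database. Table and column names are assumed for illustration.
const markRunningSQL = `UPDATE jobs SET status = 'running' WHERE job_id = $1 AND status = 'pending'`

func main() {
	fmt.Println(canTransition("pending", "running"))   // forward move
	fmt.Println(canTransition("completed", "running")) // backward move is rejected
	fmt.Println(markRunningSQL)
}
```

With the status guard in the WHERE clause, a late or duplicated webhook updates zero rows instead of moving a job backwards.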
Worker: pending → running → completed | failed. Failures populate failure_info. Worker rows are never deleted; the row outlives the pod by the 6-hour grace period.
AddWorker pod transitions pod terminates
reserves ┌─► Running ─► completed (after 6h grace)
pending row │ ▲
│ │
▼ │
Failed pod or
health-check kill
│
▼
failed (failure_info populated)
Configuration
| Setting | Value | Source |
|---|---|---|
| Poll interval | 15 s | internal/constants.go (PollInterval) |
| Max workers per entity | Per-entity in EntityConfigs, else 20 | internal/constants.go (DefaultMaxWorkers) |
| Pod active deadline | 525,600 s (~6 days) | internal/k8s.go |
| Pod delete grace | 6 h | PodDeleteGrace |
| Runner registration timeout | 120 s | RunnerRegistrationTimeout |
| Runner pending timeout | 600 s | RunnerPendingTimeout |
| Pod pending timeout | 600 s | PodPendingTimeout |
| Stuck-queued job age | 10 m | JobStuckQueuedMinAge |
| Postgres pool size | 10 | PostgresMaxConn |
| HTTP port | 8080 | HTTPPort |
HTTP endpoints
The scheduler exposes read-only HTML dashboards on its HTTP port (8080). Each page has a .json variant that returns paginated JSON with a GitHub-style Link header. Pagination uses page and per_page (default 100); the date filters start and end each accept YYYY-MM-DD or -Xd.
| Route | Method | Purpose |
|---|---|---|
| /health | GET | Health check |
| /usage, /usage.json | GET | Live pool/job/worker view, grouped by (entity_id, labels) |
| /history, /history.json, /jobs, /jobs.json | GET | Job history sorted by status then created_at |
| /workers, /workers.json | GET | Worker history with failure_info |
/history and /jobs serve the same content; /jobs exists as a shorter alias.
Related files
- container/cmd/scheduler/main.go: server, loop, WithWorkerLock wrapper.
- container/cmd/scheduler/handlers.go: HTTP routes.
- container/cmd/scheduler/sync_jobs.go: GH reconcile, stuck-queued detection.
- container/cmd/scheduler/sync_workers.go: 5-phase pipeline and classifyWorker.
- container/cmd/scheduler/demand_match.go: demand matching and provisioning.
- container/cmd/scheduler/gh_auth.go: ghAuthenticate wrapper and auth_attempt.* logging.
- container/cmd/scheduler/templates.go: HTML rendering for dashboards.
- container/internal/k8s.go: pod provisioning, capacity checks, CollectPodFailureInfo.
- container/internal/github.go: GitHub App auth, JIT config, runner group / runner CRUD.
- container/internal/db.go: jobs / workers / events DB operations, WithWorkerLock.
- container/internal/constants.go: all configuration and EntityConfigs.