Operations

Material for operators of the service. Read this if you maintain a deployment, provision new RISC-V hardware, debug a stuck job, or inspect the running state.

For an architectural overview of what runs where, see the Architecture section. For component-level build and test commands, see each component’s README.md in the monorepo.

Pages in this section

Cluster Provisioning: create and maintain Scaleway control planes and bare-metal RISC-V nodes with scripts/scw.py.
Runbooks: inspect database state, force pod cleanup, debug failed workers via the trace endpoints, and rotate secrets.

Deployments at a glance

Production and staging each have their own Kubernetes cluster on Scaleway. Each environment runs two Scaleway Container Functions, for four in total:

ghfe + scheduler (production, deployed from main).
ghfe + scheduler (staging, deployed from staging).

All four are defined by container/serverless.yml. The scheduler’s serverless.yml pins minScale=1 and maxScale=1, so exactly one scheduler instance runs at a time and the LOCK TABLE workers invariant is trivially preserved.
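The single-instance pin can be checked mechanically, for example in CI before a deploy. The following is a minimal sketch, not the project's own tooling: it assumes the configuration has already been parsed into a dictionary and that scaling settings live under `custom.containers.<name>`. The function name `scheduler_is_single_instance`, the dictionary layout, and the `ghfe` scaling values are hypothetical and should be adapted to the real serverless.yml.

```python
from typing import Any, Mapping


def scheduler_is_single_instance(
    config: Mapping[str, Any], name: str = "scheduler"
) -> bool:
    """Return True if the named container pins minScale == maxScale == 1.

    Assumes (hypothetically) that scaling settings are stored under
    config["custom"]["containers"][name].
    """
    containers = config.get("custom", {}).get("containers", {})
    container = containers.get(name, {})
    # Missing keys return None, which never equals 1, so an absent
    # or incomplete entry is reported as "not pinned".
    return container.get("minScale") == 1 and container.get("maxScale") == 1


# Hypothetical parsed configuration, e.g. the result of a YAML loader.
# The ghfe values are made up for illustration only.
example_config = {
    "custom": {
        "containers": {
            "ghfe": {"minScale": 0, "maxScale": 5},
            "scheduler": {"minScale": 1, "maxScale": 1},
        }
    }
}

if __name__ == "__main__":
    if not scheduler_is_single_instance(example_config):
        raise SystemExit("scheduler must pin minScale=1 and maxScale=1")
    print("scheduler scaling pin looks correct")
```

A check like this fails loudly if someone raises maxScale, which would allow two scheduler instances to run concurrently and make the single-instance assumption no longer trivially true.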

| Service | Product | Purpose |
| --- | --- | --- |
| `ghfe` | Scaleway Container Function | Receives GitHub webhooks, writes job state to PostgreSQL |
| `scheduler` | Scaleway Container Function | Demand matching, pod provisioning, cleanup, worker sync |
| State store | Scaleway Managed Database | PostgreSQL: `jobs`, `workers`, `installation_events` |
| Runner pods | Self-hosted Kubernetes clusters on Scaleway EM-RV1 (and CloudV 10xE Pioneer / Jupiter) | Ephemeral RISC-V runner pods |
