Operations
Material for operators of the service. Read this if you maintain a deployment, provision new RISC-V hardware, debug a stuck job, or inspect the running state.
For an architectural overview of what runs where, see the Architecture section. For component-level build and test commands, see each component’s README.md in the monorepo.
Pages in this section
- Cluster Provisioning: create and maintain Scaleway control planes and bare-metal RISC-V nodes with scripts/scw.py.
- Runbooks: inspect database state, force pod cleanup, debug failed workers via the trace endpoints, and rotate secrets.
Deployments at a glance
Production and staging each have their own Kubernetes cluster on Scaleway. Four Scaleway Container Functions are deployed in total:
- ghfe + scheduler (production, deployed from main).
- ghfe + scheduler (staging, deployed from staging).
All four are defined by container/serverless.yml. The scheduler’s serverless.yml pins minScale=1 and maxScale=1, so exactly one scheduler instance runs at a time and the LOCK TABLE workers invariant is trivially preserved.
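
The scaling pin is easy to lose in a config edit, so a small pre-deploy check may be worth having. The sketch below is illustrative only: the function name is hypothetical, the nested `custom.containers.<name>` layout is an assumption about how serverless.yml is organised, and the ghfe values in the sample are made up. It operates on an already-parsed config dictionary, so it needs no YAML library.

```python
# Hypothetical stand-in for the parsed contents of container/serverless.yml.
# The key layout and the ghfe values are assumptions; only the scheduler
# pin (minScale=1, maxScale=1) comes from this page.
SAMPLE_CONFIG = {
    "custom": {
        "containers": {
            "ghfe": {"minScale": 0, "maxScale": 5},
            "scheduler": {"minScale": 1, "maxScale": 1},
        }
    }
}


def is_pinned_to_single_instance(config: dict, container: str) -> bool:
    """Return True when the named container is pinned to exactly one instance."""
    settings = config.get("custom", {}).get("containers", {}).get(container, {})
    # Both bounds must be exactly 1; a missing key counts as "not pinned".
    return settings.get("minScale") == 1 and settings.get("maxScale") == 1


if __name__ == "__main__":
    for name in ("ghfe", "scheduler"):
        print(name, is_pinned_to_single_instance(SAMPLE_CONFIG, name))
```

A check like this could run in CI before a deploy and fail the build if the scheduler entry ever allows more than one instance.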
| Service | Product | Purpose |
|---|---|---|
| ghfe | Scaleway Container Function | Receives GitHub webhooks, writes job state to PostgreSQL |
| scheduler | Scaleway Container Function | Demand matching, pod provisioning, cleanup, worker sync |
| State store | Scaleway Managed Database | PostgreSQL: jobs, workers, installation_events |
| Runner pods | Self-hosted Kubernetes clusters on Scaleway EM-RV1 (and CloudV 10xE Pioneer / Jupiter) | Ephemeral RISC-V runner pods |