Cluster Provisioning
Production and staging each have their own Kubernetes cluster on Scaleway, managed via scripts/scw.py.
Provisioning subcommands
| Subcommand | Purpose |
|---|---|
| `scw.py control-plane create [--staging]` | Create a Kubernetes control plane (Scaleway POP2-2C-8G) with containerd, kubeadm, Flannel CNI, RBAC, and device plugins |
| `scw.py runner create --control-plane <name> <count>` | Create bare-metal RISC-V runner nodes (Scaleway EM-RV1) and join them to the cluster |
| `scw.py runner list --control-plane <name>` | List runners tagged to a control plane |
| `scw.py runner reinstall <runner-name>` | Reinstall the OS on a runner (wipes and re-joins the cluster). Accepts brace expansion: `riscv-runner-{6,25,27}` |
| `scw.py runner setup <runner-name>` | Re-run post-install configuration |
| `scw.py runner reboot <runner-name>` | Reboot the bare-metal server |
| `scw.py runner delete <runner-name>` | Delete a runner node |
Defaults: ZONE=fr-par-2, PROJECT_ID=03a2e06e-…, PRIVATE_NETWORK_ID=58fa41d0-…. Constants are hard-coded at the top of scw.py.
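The brace-expansion form accepted by `runner reinstall` can be illustrated with a small Python sketch. This is not taken from scw.py: the function name `expand_braces` is hypothetical, and this section does not say whether scw.py expands the pattern itself or relies on the shell to do so.

```python
import re


def expand_braces(pattern: str) -> list:
    """Expand {a,b,c} groups in a runner name pattern (hypothetical helper)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a plain name.
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    names = []
    for option in match.group(1).split(","):
        # Recurse so that any further groups in the tail are expanded too.
        names.extend(expand_braces(head + option + tail))
    return names


print(expand_braces("riscv-runner-{6,25,27}"))
# prints ['riscv-runner-6', 'riscv-runner-25', 'riscv-runner-27']
```

When the pattern is passed unquoted in bash, the shell expands it before scw.py sees it; quote the argument if the script is meant to receive it verbatim.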
Creating a new cluster from scratch
cd scripts
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
# 1. Create the control plane (--staging for the staging cluster).
python scw.py control-plane create
# 2. Add 3 bare-metal RISC-V runners.
python scw.py runner create --control-plane <control-plane-name> 3
# 3. Push kubeconfigs into GitHub Secrets. Replace `--env prod` with `--env staging`
# when targeting the staging cluster.
SCW_QUERY='zone=fr-par-2 project-id=03a2e06e-e7c1-45a6-9f05-775d813c2e28'
SELECT_HOST='.[] | select(.name == "<control-plane-name>") | .public_ip.address'
HOST=$(scw instance server list $SCW_QUERY -o json | jq -r "$SELECT_HOST")
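# NOTE: both commands below write the same secret name (K8S_KUBECONFIG) in the
# same environment, so the second overwrites the first. Confirm the secret name
# intended for the gh-app kubeconfig before running.
ssh "root@$HOST" cat /etc/kubernetes/kubeconfig-gh-app.conf \
| gh secret set K8S_KUBECONFIG --repo riseproject-dev/riscv-runner --env prod
ssh "root@$HOST" cat /etc/kubernetes/kubeconfig-gh-deploy.conf \
| gh secret set K8S_KUBECONFIG --repo riseproject-dev/riscv-runner --env prod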
gh-app is the kubeconfig used at runtime by the scheduler container; it has edit access plus node list permission. gh-deploy is used by CI (the K8S_KUBECONFIG secret read by deploy-images.yml and deploy-device-plugin.yml); it has cluster-admin.
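For readers who prefer to script step 3 in Python rather than jq, the host lookup can be sketched as follows. It assumes the JSON from `scw instance server list -o json` is a list of objects with `name` and `public_ip.address` fields, as the jq filter above implies. The function name and the sample data are made up for illustration.

```python
import json
from typing import Optional


def select_public_ip(servers_json: str, name: str) -> Optional[str]:
    """Return public_ip.address of the first server called `name`, else None."""
    for server in json.loads(servers_json):
        if server.get("name") == name:
            # public_ip may be null for servers without a public address.
            public_ip = server.get("public_ip") or {}
            return public_ip.get("address")
    return None


# Made-up sample in the shape the jq filter expects.
sample = json.dumps([
    {"name": "example-control-plane", "public_ip": {"address": "203.0.113.10"}},
    {"name": "riscv-runner-6", "public_ip": None},
])
print(select_public_ip(sample, "example-control-plane"))
# prints 203.0.113.10
```

Unlike the jq filter, which emits every matching server, this sketch returns only the first match.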
After provisioning
The control plane bootstraps with device-plugin/k8s-ds-device-plugin.yaml and device-plugin/k8s-ds-node-labeller.yaml applied. Each newly-joined node is auto-labelled by the node labeller; the device plugin advertises riseproject.com/runner: 1 so the scheduler can target it.
To verify a node is ready to accept jobs:
kubectl describe node <node-name> | grep -E 'riseproject|kubernetes.io/arch'
Expected:
kubernetes.io/arch=riscv64
riseproject.dev/board=scw-em-rv1 # or cloudv10x-pioneer / cloudv10x-jupiter
riseproject.com/runner: 1 # under "Allocatable"
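The same check can be automated against the output of `kubectl get node <node-name> -o json`. The sketch below is an assumption about how one might do it, not part of the repository: it reads the standard `metadata.labels` and `status.allocatable` fields of a Node object, and the function name and sample node are hypothetical.

```python
def node_ready_for_jobs(node: dict) -> bool:
    """Check the labels and allocatable resource described above (sketch)."""
    labels = node.get("metadata", {}).get("labels", {})
    allocatable = node.get("status", {}).get("allocatable", {})
    # Kubernetes serialises resource quantities as strings, e.g. "1".
    runners = int(allocatable.get("riseproject.com/runner", "0"))
    return (
        labels.get("kubernetes.io/arch") == "riscv64"
        and "riseproject.dev/board" in labels
        and runners >= 1
    )


# Made-up node object containing only the fields the check reads.
sample_node = {
    "metadata": {"labels": {
        "kubernetes.io/arch": "riscv64",
        "riseproject.dev/board": "scw-em-rv1",
    }},
    "status": {"allocatable": {"riseproject.com/runner": "1"}},
}
print(node_ready_for_jobs(sample_node))
# prints True
```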
Kubernetes RBAC
RBAC is configured automatically by scw.py control-plane create. Two user identities matter:
- `gh-app`: used by the scheduler container. `edit` access plus `nodes: list` for capacity checks.
- `gh-deploy`: used by CI. `cluster-admin`. Stored in GitHub Secrets as `K8S_KUBECONFIG`.
The node labeller has its own ServiceAccount in kube-system with a ClusterRole granting nodes: get, patch. The device plugin needs no RBAC (it talks to the local kubelet via a Unix socket).
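As an illustration of the node labeller's permissions, a ClusterRole granting `get` and `patch` on nodes could look like the object built below. This is a sketch only: the role name `node-labeller` is hypothetical, and the manifest actually applied by scw.py may differ.

```python
import json


def node_labeller_cluster_role(name: str = "node-labeller") -> dict:
    """Build a ClusterRole allowing get/patch on nodes (name is hypothetical)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            # Nodes live in the core API group, written as the empty string.
            {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get", "patch"]},
        ],
    }


# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(node_labeller_cluster_role(), indent=2))
```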
Adding a new board
When new RISC-V hardware enters the fleet:
- SSH into a node of the new board and read `/sys/firmware/devicetree/base/compatible`. Note the first NUL-separated entry.
- Add a row to `boardMap` in `device-plugin/pkg/soc/detect.go`.
- If the new board needs a dedicated label, extend `matchLabelsToK8s` in `container/cmd/ghfe/payload.go` and add the label to Runner Labels.
- Push and let the device-plugin deploy workflow roll out the new labeller.
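The first step above, extracting the first NUL-separated entry and mapping it to a board label, can be sketched in Python. The real lookup lives in Go in `device-plugin/pkg/soc/detect.go` and is not reproduced here. `BOARD_MAP`, its single entry, and both function names are placeholders invented for illustration.

```python
from typing import Optional

# Hypothetical mapping from a device-tree compatible string to a board label.
# Real entries belong in boardMap in detect.go.
BOARD_MAP = {
    "example,board-a": "example-board-a",
}


def first_compatible(raw: bytes) -> str:
    """Return the first NUL-separated entry of a `compatible` property."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def detect_board(raw: bytes, board_map: dict = BOARD_MAP) -> Optional[str]:
    """Look up the board label for the first compatible entry, else None."""
    return board_map.get(first_compatible(raw))


# On a node, `raw` would be the bytes of
# /sys/firmware/devicetree/base/compatible; a made-up value is used here.
raw = b"example,board-a\x00example,soc\x00"
print(first_compatible(raw))  # prints example,board-a
print(detect_board(raw))      # prints example-board-a
```

A `None` result corresponds to a board that has not yet been given a row in the map.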