Database Schema
Persistent state for the service lives in a single Scaleway Managed PostgreSQL database. Tables are isolated by schema (prod or staging), selected per-connection via SET search_path. There is no auto-migration: the DDL below is the source of truth and must be applied by hand when it changes.
The schema models three things:
jobsrecords demand (each GitHubworkflow_jobbecomes one row).workersrecords supply (each provisioned pod becomes one row; never deleted).installation_eventsis an append-only history of every webhook delivery and every scheduler GitHub-auth failure. See Installation event log.
Enums
CREATE TYPE status_enum AS ENUM ('pending', 'running', 'completed', 'failed');
CREATE TYPE provider_enum AS ENUM ('github', 'gitlab', 'azdo');
CREATE TYPE entity_type_enum AS ENUM ('Organization', 'User');
jobs
CREATE TABLE jobs (
job_id BIGINT PRIMARY KEY,
status status_enum NOT NULL DEFAULT 'pending',
failure_info JSONB,
provider provider_enum NOT NULL,
entity_id BIGINT NOT NULL,
entity_name TEXT NOT NULL,
entity_type TEXT NOT NULL, -- 'Organization' or 'User'
repo_full_name TEXT NOT NULL,
installation_id BIGINT NOT NULL,
job_labels JSONB NOT NULL DEFAULT '[]', -- sorted at write time
k8s_pool TEXT NOT NULL,
k8s_image TEXT NOT NULL,
k8s_pod TEXT,
html_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_jobs_active ON jobs (entity_id, job_labels, created_at) WHERE status != 'completed';
CREATE INDEX idx_jobs_reconcile ON jobs (installation_id) WHERE status != 'completed';
CREATE INDEX idx_jobs_created ON jobs (created_at DESC);
Inserts come exclusively from ghfe, with ON CONFLICT (job_id) DO NOTHING so redelivered webhooks are no-ops. Every INSERT fires a trigger that emits NOTIFY {schema}_queue_event; the scheduler LISTENs on that channel and wakes immediately rather than waiting for its 15-second tick.
Status transitions are forward-only: pending → running → completed | failed. All UPDATE queries enforce this with explicit WHERE clauses, so out-of-order webhook deliveries cannot regress state.
workers
CREATE TABLE workers (
pod_name TEXT PRIMARY KEY,
provider provider_enum NOT NULL,
entity_id BIGINT NOT NULL,
entity_name TEXT NOT NULL,
entity_type TEXT NOT NULL, -- 'Organization' or 'User'
installation_id BIGINT NOT NULL, -- GitHub App installation
repo_full_name TEXT, -- set only for User entities; NULL for Organization
job_labels JSONB NOT NULL DEFAULT '[]',
k8s_pool TEXT NOT NULL,
k8s_image TEXT NOT NULL,
k8s_node TEXT,
status status_enum NOT NULL DEFAULT 'pending',
failure_info JSONB, -- exhaustive diagnostics for Failed and stuck pods (version=2)
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
running_at TIMESTAMPTZ, -- set when pod first reaches Running
completed_at TIMESTAMPTZ, -- set when status transitions to completed | failed
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_workers_active ON workers (entity_id, job_labels, k8s_pool) WHERE status != 'completed';
Inserts come exclusively from the scheduler as part of tryProvision (which retries on ErrDuplicatePodName to handle name collisions). Worker rows are never deleted: terminal rows with failure_info populated are kept for post-mortem debugging. The (entity_id, job_labels) index supports demand matching.
A worker row’s lifecycle is decoupled from its pod’s. The row transitions to completed or failed on pod phase change, even though the pod itself stays around for the 6-hour delete grace so its logs and events remain inspectable.
installation_events
CREATE TABLE installation_events (
id BIGSERIAL PRIMARY KEY,
source TEXT NOT NULL, -- 'webhook' or 'scheduler'
event TEXT NOT NULL, -- '{X-GitHub-Event}.{payload.action}' or 'auth_attempt.{status}'
outcome TEXT NOT NULL, -- WebhookOutcome value (open-set TEXT, no migration on add)
installation_id BIGINT,
app_id BIGINT, -- GHAppOrgID or GHAppPersonalID
entity_type entity_type_enum,
entity_id BIGINT, -- installation.target_id = account.id (stable across renames)
entity_name TEXT, -- account login
payload JSONB NOT NULL, -- full webhook body, or synthesised payload for scheduler rows
received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_install_events_installation ON installation_events (installation_id, entity_id);
CREATE INDEX idx_install_events_entity ON installation_events (entity_id, received_at DESC);
ghfe writes one row per accepted webhook; the scheduler writes one row per GitHub-auth failure. See Installation event log for the read side, available diagnoses, and the trace endpoints.
Demand matching query
The scheduler’s demandMatch reduces to two COUNTs keyed by (entity_id, job_labels). job_labels is the JSONB array sorted at write time, so equality matches without normalisation at read time.
-- demand
SELECT COUNT(*) FROM jobs
WHERE entity_id = $1 AND job_labels = $2
AND status IN ('pending', 'running');
-- supply
SELECT COUNT(*) FROM workers
WHERE entity_id = $1 AND job_labels = $2
AND status IN ('pending', 'running');
failed workers do not count toward supply, so the next loop iteration automatically re-provisions a runner for the same pending job.
Transactional model
ghfewrites thejobsside-effect and theinstallation_eventsrow in separate transactions. If the log write fails after the side-effect has committed, the handler returns 500 and GitHub redelivers. Re-deliveries converge:add_jobis idempotent and the worker-status updates are no-ops on a second run.- Scheduler runs
syncWorkersStateinsideWithWorkerLock, which opens a transaction, issuesLOCK TABLE workers IN EXCLUSIVE MODE, and pins the same connection for allmark_worker_*calls. Concurrent scheduler instances block on the lock, never racing. - The scheduler
WaitForJobopens a separatepgx.Conn(outside the pool) forLISTEN {schema}_queue_eventso the long-lived notification connection does not exhaust the connection pool.
Connection settings
The runtime opens a pgxpool.Pool with PostgresMaxConn = 10 and a separate pgx.Conn for the LISTEN channel. The DSN is read from POSTGRES_URL; the schema (prod or staging) is selected per-connection via SET search_path in the post-connect hook. SSL is required by Scaleway (sslmode=require).