Runbook: Portfolio App CI Triage (Quality + Build Gates)
Purpose
Provide a fast and repeatable procedure to diagnose and resolve CI failures for the Portfolio App.
CI failures are treated as “stop-the-line” events. The correct response is to fix the root cause or roll back, not to weaken the gates.
Governance Context
This runbook assumes Vercel and GitHub governance are already configured per rbk-vercel-setup-and-promotion-validation.md. Required checks are:
- `ci / quality` (lint, format:check, typecheck)
- `ci / test` (unit tests, coverage, E2E tests)
- `ci / build` (Next.js build)
When any required check fails, this runbook provides deterministic diagnosis and fix procedures. See rbk-portfolio-deploy.md for the deploy workflow where CI gating is enforced.
Scope
Use when
- `ci / quality` fails (lint, format:check, typecheck)
- `ci / test` fails (unit tests, coverage, E2E tests)
- `ci / build` fails (Next build)
- Vercel promotion is blocked due to failing checks
Do not use when
- failures are unrelated to CI (use relevant operational runbooks)
Prereqs / Inputs
- Access to GitHub Actions logs for the failing run
- Ability to run commands locally:
  - `pnpm lint`
  - `pnpm format:check`
  - `pnpm typecheck`
  - `pnpm build`
  - `pnpm test:unit`
  - `pnpm test:coverage`
  - `pnpm test:e2e`
Procedure / Content
CI topology (for context)
- `ci / quality` job runs:
  - Auto-format step (Dependabot PRs only)
  - `pnpm lint`
  - `pnpm format:check`
  - `pnpm typecheck`
- `ci / secrets-scan` job runs on pull requests only (not on push to main):
  - TruffleHog secret scanning with verified detectors
  - reason: TruffleHog requires a diff between base and head; direct pushes to main have identical references and would fail
- `ci / build` job runs:
  - `pnpm install --frozen-lockfile`
  - `pnpm build`
  - Playwright browser installation (`npx playwright install --with-deps`)
  - Dev server startup (`pnpm dev &` + readiness check via `wait-on`)
  - Smoke tests (`pnpm test`, 12 tests across Chromium + Firefox)
  - depends on `ci / quality` being green
  - note: `secrets-scan` is not a strict dependency (it only runs on PRs, but all PRs require it via branch protection)
- `ci / test` job runs:
  - `pnpm test:unit`
  - `pnpm test:e2e`
  - uploads coverage artifacts from `pnpm test:coverage` when configured
1) Identify the failing check and error class
In the PR or main workflow run, identify:
- failing job: `quality`, `test`, or `build`
- failing step (lint vs format vs typecheck vs test vs build)
- affected file paths
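Once the failing step is known, the lookup into the fix sections of step 3 can be sketched as a small shell helper. The function name `section_for_step` is hypothetical and not part of the repository; the step names and section letters are the ones used in this runbook.

```shell
# Map a failing CI step to the section of this runbook that covers it.
# section_for_step is an illustrative helper, not part of the repository.
section_for_step() {
  case "$1" in
    format:check)            echo "A) Formatting failures" ;;
    lint)                    echo "B) Lint failures" ;;
    typecheck)               echo "C) Typecheck failures" ;;
    build)                   echo "D) Build failures" ;;
    test)                    echo "E) Smoke test failures" ;;
    test:unit|test:coverage) echo "F) Unit test or coverage failures" ;;
    *)
      # Anything else is outside the steps this runbook describes.
      echo "unknown step: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `section_for_step typecheck` prints `C) Typecheck failures`, and an unrecognized step name returns a non-zero status.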
2) Reproduce locally (required)
Option 1: Comprehensive verification (recommended)
Run the complete validation suite to identify all issues at once:
pnpm install
pnpm verify
The verify script runs all CI checks (environment validation, auto-format, format check, lint, typecheck, registry validation, build) with detailed error reporting and troubleshooting guidance.
Option 2: Individual commands (targeted debugging)
On the same branch/commit:
pnpm install
pnpm lint
pnpm format:check
pnpm typecheck
pnpm build
pnpm test # Smoke tests (Playwright)
Use individual commands when you need to:
- Debug a specific failure type
- Run checks in isolation
- Understand what each check validates
If local results differ from CI:
- confirm Node and pnpm versions match project standards
- ensure lockfile is committed and install is deterministic
- for smoke test failures: start the dev server (`pnpm dev`) or let Playwright start it automatically
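When local and CI results differ, comparing tool versions is usually the quickest check. The following is a minimal sketch, assuming the expected major versions are supplied by the caller, since this runbook does not state them. `major_of` and `check_tool` are hypothetical helper names.

```shell
# Extract the major version from strings such as "v20.11.0" or "9.1.0".
major_of() {
  mo_version="${1#v}"            # drop a leading "v" (node --version prints one)
  echo "${mo_version%%.*}"       # keep everything before the first dot
}

# Compare an actual version string against an expected major version.
# Usage: check_tool <name> <actual-version> <expected-major>
check_tool() {
  if [ "$(major_of "$2")" = "$3" ]; then
    echo "$1 OK ($2)"
  else
    echo "$1 mismatch: found $2, expected major $3" >&2
    return 1
  fi
}

# Example (the expected majors are placeholders; use the project's documented values):
#   check_tool node "$(node --version)" "$EXPECTED_NODE_MAJOR"
#   check_tool pnpm "$(pnpm --version)" "$EXPECTED_PNPM_MAJOR"
```

Comparing only the major version is a deliberate simplification; a stricter check could compare the full version string if the project pins one.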
3) Fix by failure type
A) Formatting failures (format:check)
Symptoms:
- Prettier reports files are not formatted
Fix:
- run formatting write (if available):
pnpm format:write
- re-run:
pnpm format:check
Known failure mode:
- Prettier fails with ESM plugin / require() errors:
  - ensure config file is `prettier.config.mjs`
  - ensure plugins are specified as strings (e.g., `"prettier-plugin-tailwindcss"`)
B) Lint failures (lint)
Symptoms:
- ESLint reports rule violations
Fix:
- resolve violations explicitly
- avoid disabling rules without governance rationale
- if a rule is overly strict:
- tune intentionally and document via ADR if policy change is significant
C) Typecheck failures (typecheck)
Symptoms:
- TypeScript errors appear (unsafe typing, invalid imports)
Fix:
- correct typings or imports
- avoid broad `any` usage unless explicitly justified
- ensure tsconfig aligns with Next.js project structure
D) Build failures (build)
Symptoms:
- Next.js build fails due to code errors, routing issues, or environment assumptions
Fix:
- reproduce with `pnpm build`
- correct the root cause
- do not “paper over” build errors by weakening the build process
Common build failure modes:
- Registry validation errors during page data collection:
  - Error: `"demoUrl"` is missing or invalid according to a Zod schema validation
  - Symptom: Build fails during static page generation for `/projects/[slug]`
  - Root Cause: Environment variable interpolation failing (see Known Issue below)
  - Fix: Verify environment variables are set correctly
  - Verification: `pnpm registry:validate` should pass
- Known Issue: Registry interpolation with tsx/Node.js:
  - Problem: Module load order causes environment variables to not be visible during registry loading
  - Solution (Fixed in commit 1a1e272): Use `process.env` directly in the `interpolate()` function instead of module-level imports
  - Prevention: Ensure `NEXT_PUBLIC_*` environment variables are set before build
- Environment variable check:
# Verify required variables are set
echo $NEXT_PUBLIC_DOCS_BASE_URL
echo $NEXT_PUBLIC_GITHUB_URL
# Test registry interpolation
pnpm registry:validate
# Should output: Registry OK (projects: N)
- Quick verification recipe (registry-specific):
cd portfolio-app
pnpm registry:validate # Expect: Registry OK (projects: N)
pnpm lint # Expect: silent, 0 warnings
pnpm build # Expect: ✓ Compiled successfully
- If build still fails on registry interpolation:
  - Check env vars: `cat .env.local | grep NEXT_PUBLIC`
  - Run with debug: `DEBUG_REGISTRY=1 pnpm registry:validate 2>&1 | head -20`
    - Look for `interpolated="https://..."` (absolute URLs)
  - Clean and rebuild: `rm -rf .next node_modules/.cache && pnpm build`
  - Ensure `interpolate()` reads from `process.env` (fixed in commit `1a1e272`)
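The environment variable checks in this section can be made fail-fast instead of relying on reading `echo` output. The sketch below assumes that both variables are expected to hold absolute URLs (suggested by the `interpolated="https://..."` debug output, but not confirmed here) and that they are exported in the current shell. `require_absolute_url_env` is a hypothetical helper name.

```shell
# Fail if any named environment variable is unset, empty, or not an absolute URL.
# require_absolute_url_env is an illustrative helper, not part of the repository.
require_absolute_url_env() {
  rae_status=0
  for rae_name in "$@"; do
    # printenv only sees exported variables; values that exist only in
    # .env.local are not checked here.
    rae_value="$(printenv "$rae_name" || true)"
    case "$rae_value" in
      http://*|https://*)
        echo "ok: $rae_name=$rae_value"
        ;;
      "")
        echo "missing: $rae_name is unset or empty" >&2
        rae_status=1
        ;;
      *)
        echo "invalid: $rae_name=$rae_value is not an absolute URL" >&2
        rae_status=1
        ;;
    esac
  done
  return "$rae_status"
}
```

A possible use is `require_absolute_url_env NEXT_PUBLIC_DOCS_BASE_URL NEXT_PUBLIC_GITHUB_URL && pnpm registry:validate`, so that registry validation only runs once both values look plausible.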
E) Smoke test failures (pnpm test)
Smoke tests are part of the current CI quality baseline.
Symptoms:
- Playwright tests fail (route rendering, navigation, evidence links)
- Browser launch failures in CI
- Server connection errors
Common failure modes:
- Browser binaries missing in CI:
  - Error: `browserType.launch: Executable doesn't exist`
  - Fix: Ensure `npx playwright install --with-deps` runs in CI before tests
  - Verification: Check CI workflow includes installation step
- Dev server not running:
  - Error: `NS_ERROR_CONNECTION_REFUSED` or `net::ERR_CONNECTION_REFUSED`
  - Fix: Ensure dev server starts before tests (`pnpm dev &` + `wait-on http://localhost:3000`)
  - Local: Playwright auto-starts server via `webServer` config (disabled in CI)
- Route rendering failures:
  - Error: Test expects status < 400 but receives 404 or 500
  - Fix: Verify route exists and renders correctly locally
  - Check: Dynamic routes may need param fixes (Next.js 15 async params)
- Evidence link resolution failures:
  - Error: `a[href*="/docs/"]` locator not found
  - Fix: Verify project pages include documentation links
  - Check: `NEXT_PUBLIC_DOCS_BASE_URL` is configured correctly
- Timeout failures:
  - Error: Test timeout exceeded (default 30s per test)
  - Fix: Increase timeout in `playwright.config.ts` or optimize slow routes
  - CI: Reduce parallelism (already set to 1 worker in CI for stability)
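For the "Dev server not running" case, it can help to see what a readiness check does. The sketch below is a stand-in that polls with `curl`; it is not the CI workflow's actual step, which uses `wait-on`. The helper names `probe_url` and `wait_for_url` are hypothetical.

```shell
# Return success if the URL answers with an HTTP status below 400.
probe_url() {
  # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
  curl -fs -o /dev/null --max-time 5 "$1"
}

# Poll a URL about once per second until it responds or the retry budget runs out.
# Usage: wait_for_url <url> [max-seconds]
# The budget counts sleep intervals, so the real wall-clock time is approximate.
wait_for_url() {
  wfu_url="$1"
  wfu_budget="${2:-60}"
  wfu_elapsed=0
  until probe_url "$wfu_url"; do
    if [ "$wfu_elapsed" -ge "$wfu_budget" ]; then
      echo "timed out after ${wfu_budget}s waiting for $wfu_url" >&2
      return 1
    fi
    sleep 1
    wfu_elapsed=$((wfu_elapsed + 1))
  done
  return 0
}
```

Locally, something like `pnpm dev &` followed by `wait_for_url http://localhost:3000 60 && pnpm test` reproduces the CI ordering: the tests only start once the server answers.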
Debugging smoke tests:
# Local debugging
pnpm test:debug # Opens Playwright inspector
pnpm test:ui # Opens Playwright UI mode
# CI debugging
# - Download HTML test report artifact from failed CI run
# - Open playwright-report/index.html locally to see screenshots/traces
Fix workflow (smoke tests):
- Reproduce locally with `pnpm test`
- Check Playwright config (`playwright.config.ts`) for environment differences
- Verify test file (`tests/e2e/smoke.spec.ts`) expectations match actual behavior
- Update tests or fix routes as needed
- Re-run locally to confirm fix
- Push and verify CI passes
F) Unit test or coverage failures (`pnpm test:unit` / `pnpm test:coverage`)
Symptoms:
- Vitest failures in UI, API route handlers, data wrappers, or lib helpers
- Coverage thresholds failing after new code paths are added
Fix:
- Run `pnpm test:unit` to reproduce and isolate the failing test
- Run `pnpm test:coverage` to identify uncovered files or branches
- Add or update unit tests for affected modules (pages, components, API handlers)
4) Validate and push fix
After changes:
pnpm lint
pnpm format:check
pnpm typecheck
pnpm build
pnpm test # Smoke tests
Commit and push to PR branch.
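The validation sequence above can be wrapped so that it stops at the first failing gate and names it. This is a minimal sketch; `run_steps` is a hypothetical helper, and the commands passed to it are the ones listed in this step.

```shell
# Run each command in order and stop at the first failure, naming the step.
# run_steps is an illustrative helper, not part of the repository.
run_steps() {
  for rs_step in "$@"; do
    echo "==> $rs_step"
    if ! sh -c "$rs_step"; then
      echo "FAILED: $rs_step" >&2
      return 1
    fi
  done
  echo "all steps passed"
}

# Example:
#   run_steps "pnpm lint" "pnpm format:check" "pnpm typecheck" "pnpm build" "pnpm test"
```

Stopping early keeps the output focused on one failure at a time; when the goal is to see every issue at once, `pnpm verify` is the better fit.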
5) Confirm CI is green and promotion unblocks
- Confirm GitHub checks pass.
- Confirm Vercel promotion gates clear.
6) Prevent recurrence
If the failure mode is likely to repeat:
- update contributor guidance
- add a checklist item to PR template
- add or refine lint/format/typecheck configuration
- consider pre-commit hooks (optional; CI remains authoritative)
Validation / Expected outcomes
- Local and CI results converge (deterministic)
- Required checks are green:
  - `ci / quality`
  - `ci / build`
- Production promotion proceeds once checks pass
Rollback / Recovery
If the fix is non-trivial and production is impacted:
- roll back via revert and stabilize first
- fix forward in a new PR with proper validation
Failure modes / Troubleshooting
- CI fails but local passes:
  - toolchain mismatch; confirm Node/pnpm; ensure frozen lockfile install behavior
- Persistent formatting churn:
  - ensure editor integration and formatting scripts are documented and used
- Type errors cascade:
  - reduce scope; fix incrementally; avoid mixing large refactors with feature changes
- Merge is blocked because required checks are unavailable to select in the ruleset:
  - ensure checks exist with the exact names `ci / quality` and `ci / build`
  - run the workflow on a PR and on `main` so GitHub can offer them as Required
How to re-run checks
- From the GitHub Actions UI:
  - Use “Re-run all jobs” on the failed workflow run (preferred for transient issues).
- Push a no-op change if necessary to retrigger (e.g., amend commit message or whitespace change). Avoid `ci skip` patterns since required checks must execute for promotion.
- If checks are still not appearing as Required candidates, ensure a recent successful run exists on both a PR and a push to `main` with the exact job names.
References
- Portfolio App testing and gates: `docs/60-projects/portfolio-app/testing.md`
- Deploy runbook: `docs/50-operations/runbooks/rbk-portfolio-deploy.md`
- Rollback runbook: `docs/50-operations/runbooks/rbk-portfolio-rollback.md`