Incident Response Handbook

Purpose

This handbook gives on-call responders severity guidance, quick selectors, and operational patterns in one place. It complements the runbook catalog at /docs/50-operations/runbooks.

Scope

In scope

quick selectors to choose the right runbook
severity-based response guidance
operational patterns for detection, recovery, and verification
tools and utilities used during response
runbook improvement standards

Out of scope

step-by-step procedures (see runbooks)
architecture rationale (see ADRs)

Quick Selector: Find Your Runbook

Match your scenario to the appropriate runbook:

I'm seeing...	Use this runbook
❌ Deployment shows "Failed" in Vercel	Deployment Failure
⚠️ Health endpoint returns 503	Service Degradation
🔴 All routes return 500	Deployment Failure (after a recent deployment), otherwise General Incident Response
🐌 Pages load slowly (>3s) but no errors	Performance Troubleshooting
📦 Bundle size too large (>30MB)	Performance Troubleshooting
❓ Unclear incident, need framework	General Incident Response
🔍 Want to understand monitoring setup	Observability & Health Checks
⚡ Want to improve performance proactively	Performance Optimization
🔐 CVE alert or dependency vulnerability	Dependency Vulnerability Response
🚨 Suspected secret leak in repo	Secrets Incident Response

Severity-Based Quick Reference

Critical Incident (SEV-1) — Immediate Response

Symptoms: Complete service outage, all users affected, all routes return 500

Quick Steps:

Page on-call engineer + VP Engineering (Slack + SMS + phone)
Create incident channel: #incident-INC-YYYYMMDD-NNN
Execute runbook: Deployment Failure if recent deployment, otherwise General Incident Response
Post updates every 5 minutes
Announce the all-clear once the incident is resolved
Schedule postmortem within 24 hours

MTTR Target: 15 minutes

If Secrets Incident: Execute Secrets Incident Response immediately; MTTR ≤5 min for critical secrets
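
The channel naming step above can be scripted so the name is consistent under pressure. The following is a minimal sketch, not an established team tool: the function name `incident_channel` is hypothetical, and it assumes NNN is a per-day sequence number and that the date is taken in UTC, neither of which this handbook specifies.

```shell
#!/bin/sh
# Sketch: build an incident channel name in the #incident-INC-YYYYMMDD-NNN format.
# Assumptions (not stated in the handbook): NNN is a per-day sequence number,
# and the date portion uses UTC.
incident_channel() {
  # $1 = date as YYYYMMDD
  # $2 = sequence number as a plain integer without leading zeros (1, 2, 12, ...)
  printf '#incident-INC-%s-%03d\n' "$1" "$2"
}

# Name for the first incident of the current UTC day
incident_channel "$(date -u +%Y%m%d)" 1
```

Pass the sequence number without leading zeros; some shells interpret a value such as `08` as invalid octal in `printf`.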

High Severity (SEV-2) — Urgent Response

Symptoms: Significant user impact, core features broken, partial outage

Quick Steps:

Notify on-call engineer via Slack + PagerDuty
Execute runbook: Service Degradation or Deployment Failure
Target resolution: less than 1 hour
Post updates every 10 minutes
Schedule postmortem within 48 hours

MTTR Target: 1 hour

If Dependency CVE (High): Execute Dependency Vulnerability Response; MTTR 48 hours

Medium Severity (SEV-3) — Normal Response

Symptoms: Minor user impact, slow performance, non-critical features unavailable

Quick Steps:

Notify team lead via Slack
Create GitHub issue to track
Execute runbook: Service Degradation or Performance Troubleshooting
Investigate during business hours
No formal postmortem (document learnings in issue)

MTTR Target: 4 hours

Low Severity (SEV-4) — Low Priority

Symptoms: Cosmetic issues, documentation errors, non-user-facing problems

Quick Steps:

Create GitHub issue with appropriate label
Fix during next sprint
No incident response required

MTTR Target: 24 hours or next sprint
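
The MTTR targets in this section can be encoded as a small lookup for use in status scripts. This is an illustrative sketch only: `mttr_minutes` and `mttr_breached` are hypothetical helper names, and SEV-4 uses the 24-hour figure rather than "next sprint".

```shell
#!/bin/sh
# Sketch: MTTR targets from the severity quick reference, in minutes.
mttr_minutes() {
  case "$1" in
    SEV-1) echo 15 ;;
    SEV-2) echo 60 ;;
    SEV-3) echo 240 ;;
    SEV-4) echo 1440 ;;   # 24 hours; "next sprint" is not modeled here
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Succeeds (exit 0) when the elapsed minutes exceed the target for the severity.
# $1 = severity label, $2 = minutes elapsed since the incident started
mttr_breached() {
  target="$(mttr_minutes "$1")" || return 2
  [ "$2" -gt "$target" ]
}

# Usage: flag a SEV-1 incident that has been open for 20 minutes
mttr_breached SEV-1 20 && echo "SEV-1 target exceeded"
```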

Common Operational Patterns

Error Detection Patterns

Pattern	Where to Look	What to Search For
Deployment errors	Vercel Deployments → Build logs	`error`, `Error:`, `failed`, `FAILED`
Runtime errors	Vercel Functions → Logs	`"level":"error"`, `500`, `timeout`
Performance issues	Vercel Analytics	Response time >3s, LCP >2.5s
Data issues	Health endpoint	`projectCount: 0`, `status: "degraded"`
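
When logs have been saved to a local file, the runtime-error search terms from the table can be applied with `grep`. This is a rough sketch under assumptions: the helper name `count_runtime_errors` and the idea of a saved log file are illustrative, and the bare `500` pattern is left out because it also matches unrelated numbers such as durations.

```shell
#!/bin/sh
# Sketch: count lines in a saved log file that match the runtime-error patterns.
# $1 = path to a log file (hypothetical; this handbook does not define one)
count_runtime_errors() {
  # -c prints the number of matching lines, -E enables alternation with |
  grep -c -E '"level":"error"|timeout' "$1"
}
```

Call it as `count_runtime_errors path/to/saved.log`. Note that `grep -c` prints `0` and exits non-zero when nothing matches, which is convenient for `if` checks.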

Recovery Patterns

Issue Category	Recovery Method	Example
Deployment failure	Vercel UI rollback or Git revert	Promote previous deployment
Data corruption	Restore from backup commit	`git show <commit>:file.yml > file.yml`
Config issue	Revert environment variable	Vercel Settings → Env Vars → Restore
Resource exhaustion	Clear cache or scale up	Vercel Cache → Clear All

Verification Patterns

After any fix, always verify:

# 1. Health check returns HTTP 200 and reports its status field
curl -s -o /dev/null -w '%{http_code}\n' https://portfolio-app.vercel.app/api/health
curl -s https://portfolio-app.vercel.app/api/health | jq '.status'

# 2. Routes are accessible (-s hides the progress meter when piping)
curl -sI https://portfolio-app.vercel.app/ | grep HTTP

# 3. No errors in recent logs
# Check Vercel Dashboard → Functions → Logs (last 5 minutes)

# 4. Response times normal
time curl -s https://portfolio-app.vercel.app/projects > /dev/null
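
The HTTP checks above can be combined into one pass/fail run. This is a minimal sketch, not an official script: `is_ok`, `status_code`, and `verify_all` are hypothetical names, the `BASE_URL` override is an assumption, and the log review in step 3 still has to be done in the dashboard.

```shell
#!/bin/sh
# Sketch: post-fix verification of the health endpoint and key routes.
# BASE_URL defaults to the production URL used in this handbook.
BASE_URL="${BASE_URL:-https://portfolio-app.vercel.app}"

# Succeeds only when the given HTTP status code is exactly 200.
is_ok() { [ "$1" = "200" ]; }

# Print the HTTP status code for a path (curl prints 000 if the request fails).
status_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "${BASE_URL}$1"
}

# Check each path; return non-zero if any of them is not 200.
verify_all() {
  failed=0
  for path in /api/health / /projects; do
    code="$(status_code "$path")"
    if is_ok "$code"; then
      echo "PASS $path ($code)"
    else
      echo "FAIL $path ($code)"
      failed=1
    fi
  done
  return "$failed"
}
```

Run `verify_all` after a fix; because a degraded health endpoint returns 503, checking the status code catches degradation as well as outright failures.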

Tools & Utilities

Quick Commands

# Health check
curl -s https://portfolio-app.vercel.app/api/health | jq '.'

# Test route
curl -sI https://portfolio-app.vercel.app/projects | grep HTTP

# View recent deployments (requires Vercel CLI)
vercel ls | head -10

# View logs (requires Vercel CLI)
vercel logs --follow

# Git rollback
git revert <commit-sha> --no-edit && git push

External Dashboards

Vercel Dashboard: https://vercel.com/bryce-seefieldts-projects/portfolio-app
Vercel Deployments: https://vercel.com/bryce-seefieldts-projects/portfolio-app/deployments
Vercel Logs: https://vercel.com/bryce-seefieldts-projects/portfolio-app/logs
Vercel Status: https://www.vercel-status.com/
GitHub Repository: https://github.com/bryce-seefieldt/portfolio-app

Monitoring Integrations

UptimeRobot: (to be configured)
PagerDuty: (to be configured)
Slack Alerts: #incidents, #deployments, #alerts

Runbook Improvement & Feedback

Review Schedule

After each use: Document any deviations from procedure
After incidents: Update with new learnings from postmortem
Quarterly: Full review of all runbooks for accuracy and completeness
After platform changes: Update commands/screenshots if Vercel UI changes

Submitting Improvements

If you use a runbook and encounter issues:

Unclear steps: Create GitHub issue to clarify
Missing steps: Add to runbook and submit PR
Incorrect commands: Test and correct in PR
MTTR targets not achievable: Reassess and update target

Template for runbook improvements:

gh issue create \
  --title "Runbook improvement: [runbook-name]" \
  --body "Issue found: [description]

Suggested improvement: [what to change]

Context: Used during INC-YYYYMMDD-NNN" \
  --label "documentation,runbook,ops" \
  --assignee ops-team-lead

Complete Runbook Index

Documentation App Runbooks

docs/50-operations/runbooks/rbk-docs-deploy.md
docs/50-operations/runbooks/rbk-docs-rollback.md
docs/50-operations/runbooks/rbk-docs-broken-links-triage.md

Portfolio App Runbooks (Current Baseline)

Core runbooks:

docs/50-operations/runbooks/rbk-vercel-setup-and-promotion-validation.md — Vercel setup
docs/50-operations/runbooks/rbk-portfolio-deploy.md
docs/50-operations/runbooks/rbk-portfolio-rollback.md
docs/50-operations/runbooks/rbk-portfolio-ci-triage.md
docs/50-operations/runbooks/rbk-portfolio-secrets-incident.md — secrets incident response
docs/50-operations/runbooks/rbk-portfolio-project-publish.md — project publication workflow
docs/50-operations/runbooks/troubleshooting-portfolio-publish.md — publication troubleshooting
docs/50-operations/runbooks/rbk-portfolio-environment-promotion.md — environment promotion
docs/50-operations/runbooks/rbk-portfolio-environment-rollback.md — environment rollback

Performance and incident runbooks:

docs/50-operations/runbooks/rbk-portfolio-performance-optimization.md — proactive performance tuning
docs/50-operations/runbooks/rbk-portfolio-performance-troubleshooting.md — performance troubleshooting
docs/50-operations/runbooks/rbk-portfolio-incident-response.md — incident response framework
docs/50-operations/runbooks/rbk-portfolio-service-degradation.md — service degradation procedures
docs/50-operations/runbooks/rbk-portfolio-deployment-failure.md — deployment failure recovery

Related Documentation

Runbook template: docs/_meta/templates/template-runbook.md (internal-only)
ADRs: docs/10-architecture/adr/
Threat models: docs/40-security/threat-models/
Observability: docs/30-devops-platform/observability-health-checks.md

Last Updated: 2026-02-04
Maintained By: Portfolio Operations Team
Next Review: 2026-05-04 (Quarterly)
