ADR-059: Dependency Vulnerability Remediation + Blocking CI Security Gate
Status: Implemented
Date: 2026-05-30
Sprint: 75
Context
A routine audit surfaced 31 npm audit vulnerabilities (6 high, 25 moderate) across the single root package-lock.json (npm workspaces; no separate mobile lockfile), corresponding to ~13–25 open Dependabot alerts depending on how they dedupe. The vulnerable packages fell into three groups:
- Root-tree transitive deps reachable from direct root dependencies: qs (express), ip-address (express-rate-limit), uuid (bull, node-cron), fast-uri.
- One direct dependency: axios@1.15.2 (high).
- Workspace-nested transitive deps buried in the apps/* trees: tar, @xmldom/xmldom, node-forge, picomatch (all via expo in apps/mobile); postcss / next (build-time CSS in apps/frontend + apps/landing); ws / engine.io (via jsdom, test-only in apps/frontend).
The CI security: job had been capped at --audit-level=critical with an explicit comment that "high vulns in expo@54 are unfixable until SDK upgrade." That cap let dependency debt silently reaccumulate. This sprint's mandate: drive the count to zero and convert the cap into a blocking --audit-level=high gate with an SLA, so debt can never silently return.
Options Considered
- Expo SDK upgrade to clear the expo-chain highs at the source: large, risky, out of scope; deferred.
- npm audit fix --force: rejected. It installs next@9.3.3 (a catastrophic framework downgrade) and other breaking majors.
- Patch-at-the-leaf via root overrides + direct bump for axios: chosen. Force-resolve patched leaf versions so the "expo-* depends on a vulnerable …" parent alerts auto-clear without touching the expo SDK major.
Decision
1. Remediation: overrides-at-the-leaf + one direct bump
- axios (direct) bumped 1.15.2 → ^1.16.0.
- Root overrides extended (keeping the pre-existing tar / minimatch / react / react-dom entries) with patched leaf versions: @xmldom/xmldom ^0.8.13, node-forge >=1.4.0, fast-uri >=3.1.2, qs >=6.15.2, ip-address >=10.1.1, postcss ^8.5.10, plus surgical version-range overrides for packages where the patched version is a major bump beyond the narrow vulnerable range (this avoids dragging unrelated lower-major copies up): picomatch@3.0.0 - 3.0.1, ws@8.0.0 - 8.20.0, brace-expansion@5.0.2 - 5.0.5, and a parent-scoped jsdom → ws override.
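The version-range overrides above only target copies whose installed version falls inside the narrow vulnerable range. The following is a minimal sketch of that matching rule, assuming plain MAJOR.MINOR.PATCH versions and inclusive hyphen ranges. It is not npm's resolver; the helper names and the installed versions passed in are hypothetical.

```typescript
// Parse "MAJOR.MINOR.PATCH" into numbers (prerelease tags are not handled in this sketch).
function parseVersion(v: string): number[] {
  return v.split(".").map((part) => Number(part));
}

// Compare two versions: negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// True when `version` lies inside the inclusive hyphen range "low - high".
function inHyphenRange(version: string, range: string): boolean {
  const [low, high] = range.split(" - ");
  return compareVersions(version, low) >= 0 && compareVersions(version, high) <= 0;
}

// The range comes from the ADR; the installed versions are hypothetical examples.
const picomatchRange = "3.0.0 - 3.0.1";
console.log(inHyphenRange("3.0.1", picomatchRange)); // true: this copy would be overridden
console.log(inHyphenRange("2.3.1", picomatchRange)); // false: a lower-major copy is left alone
```

Scoping the override to the range is what keeps an unrelated 2.x copy from being forced onto a new major.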
2. The blocking gate
The CI security: job now runs:
- name: Run npm audit (blocking — no high/critical vulns; see ADR-059)
  run: npm audit --package-lock-only --audit-level=high
No build passes with an unaddressed high or critical dependency vulnerability. --package-lock-only keeps it fast and deterministic.
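To make the effect of moving the threshold concrete, here is a minimal sketch of the pass/fail rule behind --audit-level: the build fails when any vulnerability at or above the chosen severity is present. The counts object is loosely modeled on the per-severity summary that npm audit reports; the type and function names are assumptions for illustration.

```typescript
// Severity levels in ascending order, as used by npm audit.
const SEVERITIES = ["info", "low", "moderate", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];
type SeverityCounts = Record<Severity, number>;

// True when any vulnerability at or above `level` is present, i.e. the gate would fail.
function gateFails(counts: SeverityCounts, level: Severity): boolean {
  const threshold = SEVERITIES.indexOf(level);
  return SEVERITIES.some((sev, i) => i >= threshold && counts[sev] > 0);
}

// Counts from the Context section: 6 high, 25 moderate.
const preRemediation: SeverityCounts = { info: 0, low: 0, moderate: 25, high: 6, critical: 0 };

console.log(gateFails(preRemediation, "critical")); // false: the old cap let the build pass
console.log(gateFails(preRemediation, "high"));     // true: the new gate blocks it
```

With the old cap at critical, the 6 high vulnerabilities never failed a build, which is how the debt accumulated unnoticed.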
3. The SLA (standing policy)
- No high or critical vulnerability (dependency or code-scanning) open longer than 1 week.
- No vulnerability of any severity open longer than 2 weeks.
- The gate blocks at high; moderates/lows are tracked to zero under the 2-week clause but do not block a hotfix.
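The SLA clauses above can be expressed as a small check. This is a sketch only, assuming each alert carries a severity and an opened-at timestamp; the OpenAlert shape, the function names, and the sample dates are hypothetical and do not describe an existing tool in the repo.

```typescript
type AlertSeverity = "low" | "moderate" | "high" | "critical";

interface OpenAlert {
  severity: AlertSeverity;
  openedAt: Date; // when the alert was first reported
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Maximum allowed age in days: 7 for high/critical, 14 for every other severity.
function slaDays(severity: AlertSeverity): number {
  return severity === "high" || severity === "critical" ? 7 : 14;
}

// True when the alert has been open longer than its SLA window at time `now`.
function breachesSla(alert: OpenAlert, now: Date): boolean {
  const ageDays = (now.getTime() - alert.openedAt.getTime()) / DAY_MS;
  return ageDays > slaDays(alert.severity);
}

// Hypothetical alerts, both opened 10 days before the check.
const now = new Date("2026-05-30T00:00:00Z");
const highAlert: OpenAlert = { severity: "high", openedAt: new Date("2026-05-20T00:00:00Z") };
const moderateAlert: OpenAlert = { severity: "moderate", openedAt: new Date("2026-05-20T00:00:00Z") };

console.log(breachesSla(highAlert, now));     // true: 10 days exceeds the 1-week limit
console.log(breachesSla(moderateAlert, now)); // false: 10 days is within the 2-week limit
```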
Version
10.3.0 → 10.4.0 (minor — ships a behavioral CI gate).
Implementation Notes (hard-won)
These are recorded because they cost real debugging time and will recur:
- npm overrides do not reach apps/* workspace subtrees on an incremental install. npm install against the existing lockfile applies overrides to the root workspace tree (uuid / qs / ip-address / fast-uri cleared) but leaves the expo / next / jsdom subtree leaves untouched (14 residual vulns). Only a from-scratch lockfile regen (rm package-lock.json && rm -rf **/node_modules && npm install) applies every override and reaches zero. The trade-off: a from-scratch regen re-floats every ^ / ~ dependency to its newest satisfying version (~302 packages changed). This was a deliberate, owner-approved decision for this sprint, not an accident.
- uuid must be capped at ^11.1.1, not >=11.1.1. The vulnerability is fixed at exactly 11.1.1, but >= resolves to uuid@14, which is ESM-only for Node (no require export condition) and breaks bull's require('uuid') under Jest (SyntaxError: Unexpected token 'export'). uuid@11.1.1 ships a proper CJS build. Verified: node-cron schedules fire and the full suite passes under 11.1.1.
- tar needs an exact-version override ("tar": "7.5.15"), not a range. A range override left apps/mobile's @expo/cli copy at the vulnerable 7.5.7; a parent-scoped nested override caused npm to drop tar entirely. Exact-version forces the hoisted, patched copy everywhere.
- @swc/helpers must be pinned ("@swc/helpers": "0.5.15"). A from-scratch regen under Node 24 silently drops it, breaking next build with Cannot find module '@swc/helpers/_/_interop_require_default'. The pin forces npm to materialize the node.
- ts-jest is pinned to 29.4.6. The re-float bumped it to 29.4.11, which changed how its inline tsconfig object merges with the project tsconfig, dropping moduleResolution: node16 and breaking @karmyq/shared/schemas/ui subpath resolution in request-service tests (TS2307).
- apps/mobile type-check was already red on master (pre-existing FlatList / refreshControl overload errors) and is not in the CI gate. The expo-internal version churn from the re-float lands in that already-broken, non-web-deployed workspace and does not regress any gated check.
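The uuid note above turns on how the two range operators resolve. The sketch below illustrates it under simplifying assumptions: it handles only ">=X.Y.Z" and "^X.Y.Z" with a non-zero major, it is not the semver library npm uses, and the list of available versions is a hypothetical stand-in for the registry.

```typescript
type Version = [number, number, number];

function parse(v: string): Version {
  const [major, minor, patch] = v.split(".").map(Number);
  return [major, minor, patch];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Supports only ">=X.Y.Z" and "^X.Y.Z" (non-zero major), which is all this sketch needs.
function satisfies(version: string, range: string): boolean {
  const v = parse(version);
  if (range.slice(0, 2) === ">=") {
    return compare(v, parse(range.slice(2))) >= 0;
  }
  if (range.slice(0, 1) === "^") {
    const base = parse(range.slice(1));
    // Caret: same major version, and at least the base version.
    return v[0] === base[0] && compare(v, base) >= 0;
  }
  throw new Error("unsupported range: " + range);
}

// Highest available version satisfying the range, or undefined if none does.
function maxSatisfying(available: string[], range: string): string | undefined {
  return available
    .filter((v) => satisfies(v, range))
    .sort((a, b) => compare(parse(a), parse(b)))
    .pop();
}

// Hypothetical registry snapshot for illustration.
const available = ["11.1.0", "11.1.1", "14.0.0"];
console.log(maxSatisfying(available, ">=11.1.1")); // 14.0.0: crosses into the ESM-only major
console.log(maxSatisfying(available, "^11.1.1"));  // 11.1.1: stays on the patched 11.x line
```

The caret cap trades automatic major upgrades for a predictable module format, which is the maintenance cost noted under Consequences.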
Consequences
Positive
- Zero high/critical/moderate/low npm audit vulnerabilities at v10.4.0.
- Dependency debt can no longer silently reaccumulate: the gate fails the build.
- No expo SDK upgrade required; the web demo's shipped backend + frontend + landing runtimes are unaffected by the leaf overrides.
Negative / costs
- Override maintenance burden. Each override is a manual pin that must be revisited as the ecosystem moves; a too-low cap (e.g. uuid ^11) blocks legitimate future majors until reviewed.
- Large lockfile churn. Reaching zero required a from-scratch regen that re-floated ~302 transitive packages. Future remediations should prefer the smallest diff that clears the gate (high) and only re-float when zeroing moderates is explicitly in scope.
- Emergency escape. If the gate blocks a genuine hotfix, git push --no-verify (local) bypasses it; CI remains the backstop. Use only to unblock, then remediate within the SLA.
Relationship to other gates
This is the dependency half of the standing security posture. Code scanning (CodeQL) is a distinct alert class with its own gate under ADR-060 (Sprint 76). /security-review remains the human-level complement to both automated gates, not a replacement.
Alternatives Rejected
- Expo SDK 54 → 55/56 upgrade: clears the expo-chain highs at the source but is a large, breaking change; deferred to a dedicated sprint.
- npm audit fix --force: installs next@9.3.3 and other breaking downgrades.
- Leaving the gate at critical: the status quo that allowed the debt; rejected.