Code Assurance Platform PRD

Build a unified code assurance platform with 67 integrated tools, 12 scan profiles, cryptographic attestation, and 1000-point quality scoring.


1. Problem Statement

The core challenge: AI-generated code ships with significant quality and security deficits. Studies show up to 62% dead code and duplication in AI-assisted codebases, with 40% containing known security vulnerabilities. There is no unified assurance layer that combines code quality, security scanning, testing adequacy, supply chain integrity, and compliance reporting into a single pipeline.

The result: teams either skip quality checks entirely (shipping vulnerable code) or cobble together ad-hoc tool chains that produce thousands of unranked findings, most of which are false positives. Developers lose trust in the tooling and stop paying attention to findings.


2. Architecture Overview

[Architecture diagram: Code Assurance Platform]

3. Key Components

3.1 Scan Profile Engine

The profile engine determines which tools run, at what depth, and with what thresholds. 12 predefined profiles cover the full spectrum from rapid pre-commit checks to exhaustive compliance audits.

| Profile | Tools Active | Duration | Use Case |
|---|---|---|---|
| quick | 8–12 | < 30 s | Developer desktop, rapid feedback |
| standard | 25–35 | 1–3 min | Default scan for most workflows |
| deep | 50–67 | 5–15 min | Thorough analysis, all tools enabled |
| security-focused | 20–30 | 2–5 min | Security tools only, deep analysis |
| pre-commit | 5–8 | < 10 s | Git hook, staged files only |
| ci-pipeline | 30–45 | 3–8 min | CI/CD integration with SARIF output |
| pre-release | 55–67 | 10–20 min | Release candidate validation |
| compliance | 40–55 | 5–10 min | Regulatory compliance checks |
| performance | 10–15 | 2–5 min | Performance-specific analysis |
| supply-chain | 8–12 | 1–3 min | Dependency and SBOM focused |
| custom | Variable | Variable | User-defined tool selection |
| full-audit | 67 | 15–30 min | Complete audit with attestation |
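Per REQ-CH-001 and REQ-CH-028, profiles are defined in YAML. A profile file might look like the following sketch (field names and structure are illustrative, not a final schema):

```yaml
# profiles/security-focused.yaml — illustrative schema, not normative
name: security-focused
description: Security tools only, deep analysis
tools:
  - semgrep
  - trivy
  - gitleaks
  - checkov
depth: deep
thresholds:
  fail_below_score: 700   # hypothetical gate on the 1000-point score
  max_critical: 0
timeout_minutes: 5
output:
  formats: [sarif, json]
```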

3.2 Code Quality Analysis

Combines 10 custom AI-powered analyzers with 8 established open-source tools for comprehensive code quality assessment.

AI Analyzers (10)

  • Dead code detection
  • Duplication analysis
  • Complexity scoring
  • Naming convention audit
  • Error handling review
  • API contract validation
  • Type safety analysis
  • Documentation completeness
  • Pattern conformance
  • Architectural boundary check

OSS Tools (8)

  • ESLint (JavaScript/TypeScript)
  • Pylint (Python)
  • Ruff (Python, fast linting)
  • Clippy (Rust)
  • golangci-lint (Go)
  • SonarScanner (multi-language)
  • PMD (Java)
  • ShellCheck (Bash/Shell)

3.3 Security Scanning

13 security-focused tools covering SAST, secret detection, dependency vulnerabilities, container scanning, and API security.

| Tool | Category | Coverage |
|---|---|---|
| Semgrep | SAST | Multi-language pattern matching with custom rules |
| Trivy | Vulnerability | OS packages, language deps, container images, IaC |
| Bandit | SAST (Python) | Python-specific security anti-patterns |
| Gitleaks | Secret Detection | API keys, tokens, passwords in source and history |
| Bearer | Secret/API | Sensitive data flows and API security |
| Checkov | IaC Security | Terraform, CloudFormation, Kubernetes configs |
| Grype | Vulnerability | Container image and filesystem vulnerability scanning |
| OSV-Scanner | Vulnerability | Google's open source vulnerability database |
| njsscan | SAST (Node.js) | Node.js/Express specific security patterns |
| gosec | SAST (Go) | Go-specific security analysis |
| cargo-audit | Vulnerability | Rust crate vulnerability scanning |
| pip-audit | Vulnerability | Python package vulnerability scanning |
| npm audit | Vulnerability | Node.js dependency vulnerability scanning |

3.4 Testing & Mutation

Beyond test execution, mutation testing validates that your test suite actually catches bugs: small code mutations (mutants) are injected, and any mutant the tests fail to detect reveals a weak spot in the suite.

Test Runners

  • Jest (JavaScript/TypeScript)
  • Pytest (Python)
  • Go test (Go)
  • Cargo test (Rust)

Mutation Engines

  • Stryker (JS/TS — 30+ mutators)
  • mutmut (Python)
  • Pitest (Java/JVM)
  • Coverage gap analysis

3.5 Supply Chain Assurance

Full software supply chain integrity from dependency analysis to cryptographic signing and provenance tracking.

3.6 Finding Enrichment Pipeline

Raw tool output is noise. The 6-stage enrichment pipeline transforms thousands of findings into a ranked, actionable set with near-zero false positives.

1. Static Analysis: Aggregate raw findings from all active tools. Normalize format, deduplicate by location and type, assign initial severity based on tool confidence.

2. Framework-Aware Suppression: Suppress findings that are false positives in the context of known frameworks. Django ORM queries aren't SQL injection; React's dangerouslySetInnerHTML with sanitized input isn't XSS.

3. Reachability Analysis: Trace call paths from entry points to vulnerable code. If no execution path reaches the finding, downgrade severity. Dead-code vulnerabilities are informational, not critical.

4. Dataflow Tracing: Track tainted data from sources (user input, API responses, file reads) through transforms to sinks (database queries, system commands, network calls). Only flag findings where tainted data reaches a sink without sanitization.

5. Exploitability Scoring: Assign EPSS-style exploitability scores based on attack surface exposure, authentication requirements, network accessibility, and known exploit availability.

6. LLM-Assisted Verification: For findings that survive stages 1–5, use an LLM to review the surrounding code context and make a final determination. It can generate proof-of-concept exploits or confirm safe usage patterns.
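Structurally, the pipeline is a chain of filters over a shared finding record. The sketch below shows that shape only; the stage bodies are placeholders (real stages do framework detection, call-graph construction, and taint analysis), and all field names are illustrative.

```python
# Minimal sketch of the enrichment pipeline as a chain of stage functions.
from dataclasses import dataclass

@dataclass
class Finding:
    rule: str
    severity: str
    reachable: bool = True       # placeholder for real call-graph analysis
    tainted_flow: bool = True    # placeholder for real source-to-sink tracing
    suppressed: bool = False     # placeholder for framework-aware rules

def framework_suppression(findings):
    return [f for f in findings if not f.suppressed]

def reachability(findings):
    # Downgrade rather than drop: unreachable findings become informational.
    for f in findings:
        if not f.reachable:
            f.severity = "info"
    return findings

def dataflow(findings):
    return [f for f in findings if f.tainted_flow or f.severity == "info"]

STAGES = [framework_suppression, reachability, dataflow]

def enrich(findings):
    for stage in STAGES:
        findings = stage(findings)
    return findings

raw = [
    Finding("sql-injection", "critical"),
    Finding("xss", "high", suppressed=True),           # framework false positive
    Finding("path-traversal", "high", reachable=False),
]
print([(f.rule, f.severity) for f in enrich(raw)])
```

The key design property this preserves from the spec: suppression removes findings outright, while reachability only downgrades them.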

3.7 Quality Scoring

Every scan produces a composite quality score on a 1000-point scale. The scoring algorithm uses a square-root penalty curve — early findings cost more, preventing "good enough" complacency.

Base Score Calculation

Start at 1000. Deduct points per finding: penalty = weight × sqrt(count). The sqrt curve means the first critical finding costs ~31 points, but 10 critical findings cost ~98 (not 310).

| Severity | Weight | 1 Finding | 5 Findings | 10 Findings |
|---|---|---|---|---|
| Critical | 31 | −31 | −69 | −98 |
| High | 18 | −18 | −40 | −57 |
| Medium | 8 | −8 | −18 | −25 |
| Low | 3 | −3 | −7 | −9 |
| Info | 1 | −1 | −2 | −3 |
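The penalty curve is small enough to sketch directly. This follows the stated formula and weights; whether bonuses can push the score above 1000 is unspecified in the document, so this sketch simply caps the bonus at +150 as stated.

```python
import math

# Weights from the severity table; formula: penalty = weight * sqrt(count).
WEIGHTS = {"critical": 31, "high": 18, "medium": 8, "low": 3, "info": 1}

def quality_score(counts, bonus=0):
    """counts: mapping severity -> number of findings; bonus capped at 150."""
    penalty = sum(w * math.sqrt(counts.get(sev, 0)) for sev, w in WEIGHTS.items())
    return max(0, round(1000 - penalty + min(bonus, 150)))

# 10 criticals cost ~98 points, not 310 — the curve flattens with count.
print(quality_score({"critical": 10}))   # → 902
print(quality_score({"critical": 1}))    # → 969
```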

Bonus Categories (15)

Positive quality signals add up to 150 bonus points across 15 categories: test coverage (>80%), zero critical findings, complete documentation, no dead code, SBOM present, all dependencies up-to-date, mutation score >60%, no secrets detected, supply chain signed, SLSA attestation, performance thresholds met, accessibility compliance, license compliance, code review completed, and CI/CD integration active.

3.8 Cryptographic Attestation

Every scan result can be cryptographically signed to create a tamper-proof record of what was scanned, when, and what was found.
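Per REQ-CH-017, signing uses Ed25519. A minimal sketch with the third-party `cryptography` package (an assumption — the document doesn't name a library) looks like this; key management and the Rekor transparency-log upload are out of scope here.

```python
# Sketch: Ed25519 signing of a scan record (pip install cryptography).
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Canonicalize the record before signing so verification is deterministic.
scan_record = json.dumps(
    {"repo": "example/app", "profile": "full-audit", "score": 902},
    sort_keys=True,
).encode()

signature = private_key.sign(scan_record)

# Verification raises InvalidSignature on any tampering.
public_key.verify(signature, scan_record)
try:
    public_key.verify(signature, scan_record + b"x")
    print("tampered record accepted")
except InvalidSignature:
    print("tampering detected")
```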

3.9 Integration Methods

Five integration methods ensure the platform fits into any development workflow:

| Method | Interface | Best For |
|---|---|---|
| MCP Server | Model Context Protocol | Claude Code direct integration, real-time scanning during development |
| Claude Code Skill | Slash command (/harden) | Developer-initiated scans with natural language configuration |
| REST API | HTTP endpoints | CI/CD pipelines, webhook integration, custom tooling |
| n8n Workflows | n8n nodes | Automated scan pipelines, scheduled audits, notification flows |
| Natural Language | Chat interface | Ad-hoc queries: "scan this repo for security issues" or "what's the quality score?" |

3.10 Reporting

Scan results are available in 5 output formats with executive summaries for non-technical stakeholders.

Output Formats

  • PDF (executive summary + details)
  • JSON (machine-readable, full data)
  • HTML (interactive, filterable)
  • SARIF (IDE/CI integration)
  • CSV (spreadsheet analysis)

Report Sections

  • Executive summary with quality score
  • Finding breakdown by pillar
  • Trend analysis (score over time)
  • Remediation guidance per finding
  • Attestation verification status

4. Requirements

REQ-CH-001 — The system SHALL support 12 predefined scan profiles configurable via YAML or JSON
REQ-CH-002 — Each scan profile SHALL declare which tools are active, their depth settings, and severity thresholds
REQ-CH-003 — The quick profile SHALL complete in under 30 seconds for repositories under 50K lines
REQ-CH-004 — The system SHALL integrate a minimum of 67 tools across 6 pillars
REQ-CH-005 — Code quality analysis SHALL include 10 AI-powered analyzers and 8 OSS tools
REQ-CH-006 — Security scanning SHALL include SAST, secret detection, dependency scanning, and IaC security
REQ-CH-007 — The system SHALL support mutation testing via Stryker (JS/TS), mutmut (Python), and Pitest (Java)
REQ-CH-008 — SBOM generation SHALL produce both CycloneDX and SPDX format outputs
REQ-CH-009 — The finding enrichment pipeline SHALL implement 6 stages of progressive analysis
REQ-CH-010 — Framework-aware suppression SHALL recognize Django, Rails, React, Express, Spring, and Flask patterns
REQ-CH-011 — Reachability analysis SHALL trace call graphs from entry points to flag only reachable vulnerabilities
REQ-CH-012 — Dataflow tracing SHALL track tainted input from sources through transforms to sinks
REQ-CH-013 — Exploitability scoring SHALL factor in attack surface, authentication, and known exploit availability
REQ-CH-014 — LLM-assisted verification SHALL review findings surviving stages 1–5 with code context
REQ-CH-015 — Quality scoring SHALL use a 1000-point scale with square-root penalty curve
REQ-CH-016 — The scoring algorithm SHALL support 15 bonus categories worth up to 150 points
REQ-CH-017 — Cryptographic attestation SHALL use Ed25519 for signing scan results
REQ-CH-018 — Attestations SHALL be logged to a Rekor-compatible transparency log
REQ-CH-019 — SLSA provenance SHALL be generated at Level 3 for build artifacts
REQ-CH-020 — The system SHALL expose an MCP server for Claude Code integration
REQ-CH-021 — The system SHALL provide a REST API with OpenAPI specification
REQ-CH-022 — Reports SHALL be generated in PDF, JSON, HTML, SARIF, and CSV formats
REQ-CH-023 — The system SHALL support natural language queries for scan configuration and results
REQ-CH-024 — Tool execution SHALL be parallelized where dependencies allow
REQ-CH-025 — The system SHALL cache tool results and invalidate on source file changes
REQ-CH-026 — Each finding SHALL include: tool source, file path, line range, severity, confidence, and enrichment stage
REQ-CH-027 — The system SHALL maintain a suppression database for acknowledged findings
REQ-CH-028 — Custom scan profiles SHALL be definable via YAML configuration files
REQ-CH-029 — The system SHALL detect and report on at least 8 programming languages
REQ-CH-030 — CI/CD integration SHALL support GitHub Actions, GitLab CI, and Jenkins via SARIF output
REQ-CH-031 — The system SHALL produce executive summaries suitable for non-technical stakeholders
REQ-CH-032 — Score trends SHALL be tracked over time for regression detection
REQ-CH-033 — The system SHALL support n8n workflow integration for automated scan pipelines
REQ-CH-034 — All tool binaries SHALL be managed via version-locked container images
REQ-CH-035 — The system SHALL provide remediation guidance for every finding category
REQ-CH-036 — License compliance checking SHALL flag copyleft, restrictive, and unknown licenses
REQ-CH-037 — The system SHALL integrate with governance audit events for compliance reporting
REQ-CH-038 — Scan results SHALL be storable in the vector memory system for cross-session analysis
REQ-CH-039 — The system SHALL support pre-commit hook integration for staged-file-only scanning
REQ-CH-040 — The verification CLI SHALL validate attestations against the transparency log
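The finding record required by REQ-CH-026 can be sketched as a typed structure; field names here are illustrative, not a committed schema.

```python
# Sketch of the per-finding record implied by REQ-CH-026.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Finding:
    tool: str              # tool source, e.g. "semgrep"
    file_path: str
    line_start: int        # line range start
    line_end: int          # line range end
    severity: str          # critical | high | medium | low | info
    confidence: float      # 0.0–1.0, from the originating tool
    enrichment_stage: int  # last pipeline stage (1–6) that processed it

f = Finding("semgrep", "src/db.py", 42, 48, "critical", 0.9, 4)
print(asdict(f)["severity"])   # → critical
```

Freezing the dataclass makes findings hashable, which simplifies the deduplication required in enrichment stage 1.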

5. Prompt to Build It

Build a unified code assurance platform (Code Hardener) that scans codebases across 6 pillars:
Code Quality, Security, Testing, Performance, Supply Chain, and Policy/Reporting.

Architecture:
- 12 scan profiles (quick, standard, deep, security-focused, pre-commit, ci-pipeline,
  pre-release, compliance, performance, supply-chain, custom, full-audit)
- Each profile is a YAML file declaring active tools, depth, and thresholds
- Tool execution is parallelized with dependency-aware scheduling

Code Quality Pillar:
- 10 AI-powered analyzers: dead code, duplication, complexity, naming, error handling,
  API contracts, type safety, documentation, pattern conformance, architectural boundaries
- 8 OSS tools: ESLint, Pylint, Ruff, Clippy, golangci-lint, SonarScanner, PMD, ShellCheck

Security Pillar:
- 13 tools: Semgrep (SAST), Trivy (vuln/container), Bandit (Python SAST),
  Gitleaks (secrets), Bearer (API/secrets), Checkov (IaC), Grype (container),
  OSV-Scanner (vuln DB), njsscan (Node SAST), gosec (Go SAST),
  cargo-audit (Rust), pip-audit (Python), npm audit (Node)

Testing Pillar:
- Test runners: Jest, Pytest, go test, cargo test
- Mutation engines: Stryker (JS/TS), mutmut (Python), Pitest (Java)
- Coverage gap analysis with mutation score tracking

Supply Chain Pillar:
- SBOM: Syft + cdxgen (CycloneDX and SPDX output)
- Signing: Sigstore keyless signing
- Provenance: SLSA Level 3 attestations
- License compliance checking

Finding Enrichment Pipeline (6 stages):
1. Static Analysis - aggregate, normalize, deduplicate raw findings
2. Framework-Aware Suppression - suppress known false positives for Django, React, etc.
3. Reachability Analysis - trace call graphs, downgrade unreachable findings
4. Dataflow Tracing - track tainted data source-to-sink
5. Exploitability Scoring - EPSS-style scoring (attack surface, auth, known exploits)
6. LLM-Assisted Verification - final review with code context

Quality Scoring (1000-point scale):
- Base: 1000 minus sqrt-weighted penalties per severity
- Weights: Critical=31, High=18, Medium=8, Low=3, Info=1
- Formula: penalty = weight * sqrt(finding_count)
- 15 bonus categories (up to +150): test coverage >80%, zero criticals,
  complete docs, no dead code, SBOM present, deps current, mutation >60%,
  no secrets, supply chain signed, SLSA attestation, perf thresholds,
  accessibility, license compliance, code review done, CI/CD active

Cryptographic Attestation:
- Ed25519 signing of scan results
- Rekor transparency log integration
- SLSA Level 3 provenance generation
- Verification CLI: hardener verify <attestation-id>

Integration Methods:
- MCP server (for Claude Code)
- Claude Code skill (/harden slash command)
- REST API with OpenAPI spec
- n8n workflow nodes
- Natural language interface

Reporting: PDF, JSON, HTML, SARIF, CSV with executive summaries
Governance: emit audit events for the governance framework
Memory: store scan results in vector memory for trend analysis

6. Design Decisions

Tool Orchestration: Parallel with Dependency Graph

Decision: Run tools in parallel using a dependency-aware scheduler rather than sequentially.

Trade-off: Higher memory usage but significantly faster scans. A full-audit scan with 67 tools completes in 15–30 minutes instead of 2+ hours sequential. Tool dependencies (e.g., SBOM must complete before license checking) are modeled as a DAG.
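The DAG scheduling described above maps directly onto Python's stdlib `graphlib.TopologicalSorter`, which yields "ready" batches of nodes whose prerequisites are done. Tool names and edges below are illustrative.

```python
# Sketch: dependency-aware batching of tool execution with graphlib.
from graphlib import TopologicalSorter

# Each key depends on the tools in its value set
# (e.g. SBOM must complete before license checking).
deps = {
    "license-check": {"sbom"},
    "sbom": set(),
    "semgrep": set(),
    "trivy": {"sbom"},
}

ts = TopologicalSorter(deps)
ts.prepare()
batches = []
while ts.is_active():
    ready = list(ts.get_ready())   # everything here can run in parallel
    batches.append(sorted(ready))
    ts.done(*ready)

print(batches)   # → [['sbom', 'semgrep'], ['license-check', 'trivy']]
```

In a real scheduler each batch would be dispatched to a worker pool (threads or containers) instead of processed synchronously.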

Scoring Algorithm: Square-Root Penalty Curve

Decision: Use sqrt(count) rather than linear penalties.

Trade-off: A codebase with 1 critical and 100 low findings scores very differently from one with 100 criticals and 1 low. The sqrt curve prevents "why bother" despair at high finding counts while still penalizing the first findings heavily: the first critical finding hurts your score more than the 10th.

Enrichment Pipeline: 6 Stages over 3

Decision: Invest in 6 enrichment stages rather than simpler 3-stage filtering.

Trade-off: The pipeline adds processing time (30s–2min per stage) but reduces false positive rates from a typical 60–80% to under 5%. The LLM verification stage (Stage 6) is the most expensive but catches context-dependent patterns that static analysis cannot.

Self-Hosted vs SaaS: Self-Hosted First

Decision: Design for self-hosted deployment with Docker Compose, SaaS as optional future layer.

Trade-off: More operational overhead but no data leaves the organization. Source code never touches external servers. Critical for enterprises with strict data residency requirements. The containerized architecture means tools are version-locked and reproducible.

7. Integration Points

Governance Framework

The code assurance platform emits audit events to the governance framework's event bus. Every scan produces scan_initiated, scan_completed, and attestation_created events. Quality scores below configured thresholds trigger policy_violation events that can block deployments.

Multi-Agent Orchestration

The conductor's QA tier integrates directly with the code assurance platform. When the conductor classifies a task as STANDARD or MAJOR, it automatically triggers a scan at the appropriate profile depth. The QA agent reviews findings and can request code modifications before approving the task.

Persistent Memory System

Scan results are stored in the vector memory system for trend analysis. Over time, the system builds a knowledge base of common findings, successful remediations, and quality trajectories. Memory queries like "what security findings have we seen in authentication code?" return contextualized results.

Plugin Ecosystem

The code assurance platform integrates via the plugin hook system. PreToolUse hooks can trigger pre-commit scans. PostToolUse hooks can validate that file modifications don't introduce new findings. Custom skills expose scanning capabilities through slash commands.

Context Management

Scan profiles and quality thresholds can be configured per-project via CLAUDE.md or project-level configuration. The context system ensures that the right scan profile is automatically selected based on the project's security requirements and compliance posture.