Multi-Agent Orchestration PRD

Build intelligent multi-agent systems with tier-based workflows, quality gates, BRD-driven development, and intent engineering

Tiered Workflows · Quality Gates · Intent Engineering · BRD-Driven

1. Problem Statement

The orchestration challenge: Single-agent AI systems fail at complex software development because:

Multi-agent orchestration solves this by decomposing work into specialized agents with formal handoffs, quality gates, and requirement tracing. Instead of one overwhelmed agent, you get a coordinated team where each agent has a clear role, deliverables, and verification criteria.

2. Architecture Overview

[Figure: Agent Orchestration]

3. Key Components

3.1 Five-Signal Tier Classification

Every task is classified into TRIVIAL, MINOR, STANDARD, or MAJOR tier using a weighted 5-signal matrix:

| Signal | Weight | What It Measures | 1 (Low) | 4 (High) |
|---|---|---|---|---|
| scope | 30% | How many components affected | Single file tweak | Multi-service platform |
| type | 25% | Nature of work | Bug fix | Greenfield system |
| risk | 25% | Blast radius of failure | Dev-only change | Production auth system |
| ambiguity | 20% | Clarity of requirements | Exact spec provided | Vague description |
| intent_sensitivity | 25% | How closely task touches intent objectives or hard limits | Cosmetic change | Core security decision |

Calculation: score = (scope × 0.30) + (type × 0.25) + (risk × 0.25) + (ambiguity × 0.20) + (intent_sensitivity × 0.25)

    # Example: "Build a SaaS dashboard with Stripe integration"
    scope = 3.5               # Multi-page app, API, database, payment integration
    type = 3.0                # New feature in existing codebase
    risk = 3.5                # Payment processing (PCI compliance, fraud risk)
    ambiguity = 2.5           # Some details provided, but UX/design unclear
    intent_sensitivity = 3.0  # Core product feature, revenue-critical

    score = (3.5 * 0.30) + (3.0 * 0.25) + (3.5 * 0.25) + (2.5 * 0.20) + (3.0 * 0.25)
    #     = 1.05 + 0.75 + 0.875 + 0.5 + 0.75
    #     = 3.925 → MAJOR tier (3.3-4.0)

Why intent_sensitivity matters: A simple CSS change (scope=1) normally scores TRIVIAL. But if it's changing the color of a security warning that users must notice, intent_sensitivity=4, escalating the tier to ensure proper review.
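The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual tier-classifier.py: the function names and the handling of band edges are assumptions. Two details are worth noting. The weights as listed sum to 1.25 rather than 1.0, so raw scores can exceed 4.0; and the published bands leave small gaps (for example between 1.5 and 1.6). The sketch assumes each band's upper bound is inclusive and treats everything above 3.2 as MAJOR.

```python
from typing import Dict

# Weights copied from the signal table. They sum to 1.25, not 1.0.
WEIGHTS: Dict[str, float] = {
    "scope": 0.30,
    "type": 0.25,
    "risk": 0.25,
    "ambiguity": 0.20,
    "intent_sensitivity": 0.25,
}

# ASSUMPTION: upper bounds are inclusive, scores falling between two
# published bands go to the higher tier, and anything above 3.2 is MAJOR.
TIER_UPPER_BOUNDS = [(1.5, "TRIVIAL"), (2.3, "MINOR"), (3.2, "STANDARD")]


def tier_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the five signals (each expected in the range 1-4)."""
    missing = set(WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"Missing signals: {sorted(missing)}")
    total = sum(signals[name] * weight for name, weight in WEIGHTS.items())
    return round(total, 3)


def classify(signals: Dict[str, float]) -> str:
    """Map a signal breakdown to a tier name."""
    score = tier_score(signals)
    for upper_bound, tier in TIER_UPPER_BOUNDS:
        if score <= upper_bound:
            return tier
    return "MAJOR"


saas_dashboard = {
    "scope": 3.5, "type": 3.0, "risk": 3.5,
    "ambiguity": 2.5, "intent_sensitivity": 3.0,
}
print(tier_score(saas_dashboard), classify(saas_dashboard))
```

Under these assumptions, a change scored 1 on every signal lands in TRIVIAL, while the same change with intent_sensitivity=4 scores 2.0 and moves up to MINOR, which is the escalation effect described above.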

3.2 Tiered Workflow Templates

Each tier uses a different workflow. Higher tiers add more phases and stricter gates.

TRIVIAL Tier (1.0-1.5)

analyze-codebase → conductor-builder(plan-and-implement) → verify

Characteristics: Single agent, no critic gates, no BRD extraction. For quick fixes and cosmetic changes.

MINOR Tier (1.6-2.3)

analyze-codebase → conductor-builder(plan) → conductor-builder(implement) → conductor-ciso(advisory) → conductor-critic(advisory) → verify → conductor-completeness-validator(advisory)

Characteristics: Split planning/implementation, advisory-only gates (log findings but don't block).

STANDARD Tier (2.4-3.2)

conductor-project-setup → conductor-research → conductor-ciso(requirements) → CRITIC(post-ciso, advisory)
→ BRD-EXTRACTION → CRITIC(post-extraction, advisory)
→ [conductor-architect + api-design + database] → CRITIC(post-architect, advisory)
→ conductor-qa → CRITIC(post-qa, advisory)
→ conductor-builder(implement) → conductor-ciso(code-review) → [code-reviewer + qa + performance + compliance] → CRITIC(post-implementation, advisory)
→ FINAL-BRD-VERIFICATION → pentest-coordinator → CRITIC(post-pentest, BLOCKING) → CRITIC(pre-release, BLOCKING)
→ conductor-doc-gen → api-docs → devops → observability → conductor-completeness-validator(BLOCKING)

Characteristics: Full workflow, most gates advisory; POST-PENTEST, PRE-RELEASE, and COMPLETENESS gates BLOCKING. Pentest required.

MAJOR Tier (3.3-4.0)

Same as STANDARD, but ALL gates are BLOCKING.

Characteristics: Every checkpoint must pass before progression. Maximum scrutiny.

3.3 Quality Gate Matrix

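Quality gates are checkpoints where the conductor-critic agent validates deliverables. Gates run in one of three modes: BLOCKING (stop the workflow until the gate passes), advisory (log findings but don't block), or skip (don't run).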

| Gate | TRIVIAL | MINOR | STANDARD | MAJOR |
|---|---|---|---|---|
| POST-CISO | skip | advisory | advisory | BLOCKING |
| POST-BRD-EXTRACTION | skip | advisory | advisory | BLOCKING |
| POST-ARCHITECT | skip | advisory | advisory | BLOCKING |
| POST-QA | skip | skip | advisory | BLOCKING |
| POST-IMPLEMENTATION | skip | advisory | advisory | BLOCKING |
| PRE-RELEASE | skip | skip | BLOCKING | BLOCKING |
| POST-PENTEST | skip | skip | BLOCKING | BLOCKING |
| COMPLETENESS | skip | advisory | BLOCKING | BLOCKING |

Critical insight: PRE-RELEASE and COMPLETENESS gates are ALWAYS blocking in STANDARD+ tiers. This ensures no half-finished implementations or broken deployments.
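The gate matrix lends itself to a simple lookup table. The sketch below encodes the matrix exactly as listed; the names GATE_MODES, gate_mode, and blocking_gates are illustrative and not taken from the plugin.

```python
from typing import Dict, List, Tuple

TIERS: Tuple[str, ...] = ("TRIVIAL", "MINOR", "STANDARD", "MAJOR")

# One row per gate, one column per tier, in the same order as TIERS.
GATE_MODES: Dict[str, Tuple[str, ...]] = {
    "POST-CISO":           ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-BRD-EXTRACTION": ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-ARCHITECT":      ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-QA":             ("skip", "skip",     "advisory", "BLOCKING"),
    "POST-IMPLEMENTATION": ("skip", "advisory", "advisory", "BLOCKING"),
    "PRE-RELEASE":         ("skip", "skip",     "BLOCKING", "BLOCKING"),
    "POST-PENTEST":        ("skip", "skip",     "BLOCKING", "BLOCKING"),
    "COMPLETENESS":        ("skip", "advisory", "BLOCKING", "BLOCKING"),
}


def gate_mode(gate: str, tier: str) -> str:
    """Return the mode a gate runs in for the given tier."""
    return GATE_MODES[gate][TIERS.index(tier)]


def blocking_gates(tier: str) -> List[str]:
    """List the gates that can stop the workflow at this tier."""
    return [gate for gate in GATE_MODES if gate_mode(gate, tier) == "BLOCKING"]


print(blocking_gates("STANDARD"))
```

Keeping the matrix as data rather than branching logic means a tier's behavior can be audited by reading one table.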

3.4 BRD-Driven Development

Every project starts with a Business Requirements Document (BRD). The workflow ensures 100% BRD traceability:

  1. Requirements gathering — conductor-research agent creates BRD with numbered requirements (REQ-001, REQ-002, ...)
  2. BRD extraction (MANDATORY BLOCKING GATE) — every requirement extracted to BRD-tracker.json:
    {
      "requirements": [
        {
          "id": "REQ-001",
          "description": "User can log in with email/password",
          "category": "functional",
          "priority": "critical",
          "acceptance_criteria": ["...", "..."],
          "status": "pending",
          "todo_file": null,
          "is_placeholder": false
        }
      ]
    }
  3. Specification decomposition — conductor-architect creates TODO spec for each requirement, links in BRD-tracker.json
  4. Implementation tracking — conductor-builder updates status: pending → in_progress → implemented → tested → complete
  5. Final verification (MANDATORY BLOCKING GATE) — 100% requirements must be "complete" before release
Anti-pattern detected: No placeholder implementations allowed. Every integration must actually connect. Tests must pass with real data. The BRD-tracker.json enforces this via is_placeholder: false validation.
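The status lifecycle and the final verification gate can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the rule that a requirement advances exactly one status at a time is an assumption rather than something the workflow spells out.

```python
from typing import Dict, List

STATUS_ORDER = ["pending", "in_progress", "implemented", "tested", "complete"]


def advance_status(requirement: Dict) -> Dict:
    """Return a copy of the requirement moved one step along the lifecycle.

    ASSUMPTION: statuses advance strictly one step at a time.
    """
    index = STATUS_ORDER.index(requirement["status"])
    if index == len(STATUS_ORDER) - 1:
        raise ValueError(f"{requirement['id']} is already complete")
    return {**requirement, "status": STATUS_ORDER[index + 1]}


def final_verification(tracker: Dict) -> List[str]:
    """Return blocking findings; an empty list means the gate passes."""
    findings = []
    for req in tracker["requirements"]:
        if req["status"] != "complete":
            findings.append(f"{req['id']}: status is '{req['status']}', expected 'complete'")
        if req.get("is_placeholder", False):
            findings.append(f"{req['id']}: placeholder implementation")
    return findings


tracker = {
    "requirements": [
        {"id": "REQ-001", "status": "complete", "is_placeholder": False},
        {"id": "REQ-002", "status": "tested", "is_placeholder": False},
    ]
}
print(final_verification(tracker))
```

Here REQ-002 blocks the release until it reaches "complete".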

3.5 Agent Capability Routing

The conductor uses a capability matrix to route tasks to the right agent. Each agent declares what it accepts, produces, and requires, along with its constraints and intent_constraints.

Example: conductor-builder capability

    conductor-builder:
      accepts:
        - specification
        - bug_fix_request
        - implementation_task
      produces:
        - code
        - tests
        - updated_brd_tracker
      requires:
        - TODO_spec_file
        - BRD-tracker.json
      constraints:
        - "No stub implementations"
        - "Must update BRD-tracker status"
      intent_constraints:
        - "Must respect trade-off resolutions when making implementation decisions"
        - "Must check delegation_boundaries before executing"
        - "Must never violate hard_limits"

Handoff validation: Before dispatching, conductor checks:

    # source.produces ⊆ target.accepts
    if not set(source.produces) <= set(target.accepts):
        raise ValueError("Handoff invalid: source doesn't produce what target accepts")

    # target.requires ⊆ available_artifacts
    missing = set(target.requires) - set(available_artifacts)
    if missing:
        raise ValueError(f"Missing dependencies: {sorted(missing)}")

This turns agent orchestration into a type-checked workflow.
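A self-contained version of the handoff check might look like the sketch below. The Capability dataclass and validate_handoff function are hypothetical names, and the conductor-architect entry is invented for illustration; only the conductor-builder fields come from the capability example above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Capability:
    name: str
    accepts: FrozenSet[str] = frozenset()
    produces: FrozenSet[str] = frozenset()
    requires: FrozenSet[str] = frozenset()


def validate_handoff(source: Capability, target: Capability,
                     available_artifacts: Iterable[str]) -> List[str]:
    """Return a list of errors; an empty list means the handoff is valid."""
    errors = []
    unaccepted = source.produces - target.accepts
    if unaccepted:
        errors.append(f"{target.name} does not accept: {sorted(unaccepted)}")
    missing = target.requires - set(available_artifacts)
    if missing:
        errors.append(f"Missing dependencies: {sorted(missing)}")
    return errors


# Hypothetical architect entry, for illustration only.
architect = Capability(name="conductor-architect",
                       produces=frozenset({"specification"}))
builder = Capability(
    name="conductor-builder",
    accepts=frozenset({"specification", "bug_fix_request", "implementation_task"}),
    produces=frozenset({"code", "tests", "updated_brd_tracker"}),
    requires=frozenset({"TODO_spec_file", "BRD-tracker.json"}),
)

print(validate_handoff(architect, builder, {"TODO_spec_file"}))
```

Returning a list of errors instead of raising on the first one lets the conductor report every problem with a handoff in a single pass.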

3.6 Intent Engineering

Intent engineering solves the "agent guesses wrong trade-off" problem. Instead of hoping the agent picks the right balance (speed vs security, simplicity vs features), you declare intent upfront.

The intent block in conductor-state.json has four sections:

1. Objectives (what success looks like)

    "objectives": [
      "Production-ready authentication with MFA",
      "WCAG AA accessibility compliance",
      "Sub-200ms API response times"
    ]

2. Trade-offs (resolved upfront)

    "trade_offs": [
      {
        "decision": "Security over speed",
        "rationale": "Financial data - compliance is non-negotiable",
        "implications": ["May sacrifice some UX convenience for MFA"]
      },
      {
        "decision": "Simplicity over features",
        "rationale": "MVP launch in 6 weeks",
        "implications": ["Defer advanced analytics to v2"]
      }
    ]

3. Delegation boundaries (what requires human approval)

    "delegation_boundaries": {
      "autonomous": [
        "Code implementation within approved specs",
        "Test generation",
        "Documentation"
      ],
      "human_in_loop": [
        "Architecture decisions affecting >3 components",
        "Third-party API selection",
        "Database schema changes"
      ]
    }

4. Hard limits (never violate these)

    "hard_limits": [
      "No GPL dependencies in proprietary code",
      "No API keys in code or config files",
      "No unauthenticated routes to PII",
      "Max bundle size 500KB (gzip)",
      "Zero OWASP Top 10 violations"
    ]

Intent cascade: Intent flows from global CLAUDE.md → project CLAUDE.md → conductor-state.json → agent constraints. Each layer can add specificity but never violate parent intent.

Intent-aware agent behavior

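Agents with intent_constraints in the capability matrix respect trade-off resolutions when making implementation decisions, check delegation_boundaries before executing, never violate hard_limits, and log the rationale for their decisions.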

Critic validation of intent alignment

At every checkpoint, conductor-critic validates:

    # POST-IMPLEMENTATION gate
    # Helper predicates (uses_gpl_library, uses_weak_hash) and the inputs
    # (code, hard_limits, trade_offs, ...) are assumed to be defined elsewhere.
    findings = []

    # Check hard_limit violations (always BLOCKING).
    # hard_limits is a list of sentences, so match on a substring of each entry.
    if uses_gpl_library(code) and any("No GPL" in limit for limit in hard_limits):
        findings.append({
            "severity": "CRITICAL",
            "type": "HARD_LIMIT_VIOLATION",
            "message": "GPL library detected",
            "blocking": True,  # regardless of tier
        })

    # Check trade-off compliance
    security_first = any(t["decision"] == "Security over speed" for t in trade_offs)
    if security_first and uses_weak_hash(code):
        findings.append({
            "severity": "HIGH",
            "type": "TRADE_OFF_VIOLATION",
            "message": "Weak hashing contradicts security priority",
        })

    # Check delegation boundary violations
    if architectural_change and not has_approval:
        findings.append({
            "severity": "HIGH",
            "type": "DELEGATION_VIOLATION",
            "message": "Architecture change requires human approval",
        })

Hard limit violations escalate tier. A TRIVIAL task (score=1.2) that touches a hard limit (e.g., "No unauthenticated PII routes") auto-escalates to STANDARD tier (minimum) for blocking gates.
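The escalation rule amounts to applying a floor to the classified tier. A minimal sketch follows, assuming the caller has already decided whether the task touches a hard limit (how that is detected is not specified here, so it is passed in as a boolean):

```python
TIERS = ["TRIVIAL", "MINOR", "STANDARD", "MAJOR"]


def effective_tier(classified_tier: str, touches_hard_limit: bool) -> str:
    """Raise the tier to at least STANDARD when a hard limit is involved."""
    if not touches_hard_limit:
        return classified_tier
    floor = TIERS.index("STANDARD")
    return TIERS[max(TIERS.index(classified_tier), floor)]


print(effective_tier("TRIVIAL", touches_hard_limit=True))
```

Because the rule is a floor rather than an override, a task already classified as MAJOR keeps its tier.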

3.7 State Persistence (conductor-state.json)

The workflow survives session restarts via conductor-state.json. Schema excerpt:

    {
      "project_name": "saas-dashboard",
      "initiated_at": "2026-03-17T10:00:00Z",
      "last_updated": "2026-03-17T14:32:15Z",
      "tier": "MAJOR",
      "tier_score": 3.925,
      "tier_signals": {
        "scope": 3.5,
        "type": 3.0,
        "risk": 3.5,
        "ambiguity": 2.5,
        "intent_sensitivity": 3.0
      },
      "current_phase": {
        "number": 3,
        "name": "Implementation",
        "started_at": "2026-03-17T12:00:00Z"
      },
      "current_step": {
        "number": 11,
        "name": "Code Generation",
        "assigned_agent": "conductor-builder",
        "status": "in_progress"
      },
      "intent": {
        "objectives": ["..."],
        "trade_offs": [...],
        "delegation_boundaries": {...},
        "hard_limits": [...]
      },
      "task_queue": [
        {
          "id": "task-042",
          "agent": "conductor-builder",
          "prompt": "Implement TODO/feature-payment.md",
          "status": "pending"
        }
      ],
      "completed_tasks": [...],
      "verification_status": {
        "extraction_complete": true,
        "specs_complete": true,
        "post_ciso_passed": true,
        "post_architect_passed": false,
        "gate_failures": [
          {
            "gate": "POST-ARCHITECT",
            "reason": "Missing API error handling spec",
            "timestamp": "2026-03-17T11:45:00Z"
          }
        ]
      }
    }

Recovery: /conduct resume reads state, verifies no steps were skipped, continues from current_step.
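The first part of a resume, reading the state file and checking that the required keys are present, could be sketched as below. The function names are illustrative. The required keys follow the fields listed in REQ-016, and the skipped-step verification mentioned above is deliberately left out because its rules are not described here.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# Top-level keys the state schema must contain (per REQ-016).
REQUIRED_KEYS = (
    "project_name", "tier", "tier_score", "tier_signals",
    "current_phase", "current_step", "intent",
)


def validate_state(state: Dict) -> List[str]:
    """Return the required keys that are missing from the state."""
    return [key for key in REQUIRED_KEYS if key not in state]


def load_state(path: str = "conductor-state.json") -> Dict:
    """Read the state file and fail loudly if it is incomplete."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = validate_state(state)
    if missing:
        raise ValueError(f"State file is missing required keys: {missing}")
    return state


def resume_point(state: Dict) -> Tuple[int, int, str]:
    """Return (phase number, step number, assigned agent) to continue from."""
    step = state["current_step"]
    return state["current_phase"]["number"], step["number"], step["assigned_agent"]
```

For the schema excerpt above, resume_point would return phase 3, step 11, and conductor-builder.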

3.8 Completeness Validation (12 Domains)

The conductor-completeness-validator agent runs exhaustive checks across 12 domains:

| Domain | What It Checks |
|---|---|
| Dependencies | Every import resolves, no missing packages |
| Dead Code | No orphan files, unused functions |
| Configuration | All env vars defined, no hardcoded secrets |
| Links | All internal links resolve, external links reachable |
| Assets | All referenced images/fonts/files exist |
| Build | Build succeeds with zero errors |
| Tests | Full test suite passes |
| Routes | Every route returns valid response (not 500) |
| API | Every endpoint responds correctly |
| UI | Pages load without console errors (if applicable) |
| Containers | Health checks pass (if containerized) |
| BRD Traceability | 100% requirements marked complete |

Output: completeness-report-<timestamp>.json with verdict (PASS/FAIL) and findings per domain.

When it runs: Phase 7 (after all code changes). In STANDARD+ tier, BLOCKING gate — workflow cannot complete until PASS.
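One way to roll per-domain findings into a single verdict is sketched below. The report layout and the domain keys are assumptions, since the real report schema is not shown. The sketch also assumes that a domain which was never checked counts as a failure, while a domain that does not apply (for example Containers in a non-containerized project) is reported with an empty findings list.

```python
from typing import Dict, List

DOMAINS = [
    "dependencies", "dead_code", "configuration", "links", "assets", "build",
    "tests", "routes", "api", "ui", "containers", "brd_traceability",
]


def build_report(findings_by_domain: Dict[str, List[str]]) -> Dict:
    """Build a report with a verdict per domain and an overall verdict."""
    domains = {}
    for domain in DOMAINS:
        if domain not in findings_by_domain:
            # ASSUMPTION: an unchecked domain fails rather than passing silently.
            domains[domain] = {"verdict": "FAIL", "findings": ["domain not checked"]}
            continue
        findings = list(findings_by_domain[domain])
        domains[domain] = {
            "verdict": "FAIL" if findings else "PASS",
            "findings": findings,
        }
    all_pass = all(entry["verdict"] == "PASS" for entry in domains.values())
    return {"verdict": "PASS" if all_pass else "FAIL", "domains": domains}
```

Failing on unchecked domains keeps a partially run validator from producing a PASS by omission.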

3.9 Adversarial Dual-AI Review

The conductor-qa-review agent runs multi-model consensus reviews at checkpoints:

Consensus logic:

    # Example: Code quality review
    claude_findings = ["Weak input validation in auth.js", "No rate limiting"]
    gemini_findings = ["Weak input validation in auth.js", "Missing error logging"]
    codex_findings = ["Weak input validation in auth.js"]

    # Consensus: 3/3 agree on input validation → CRITICAL
    # Split: 1/3 on rate limiting → escalate to user decision
    # Split: 1/3 on error logging → escalate to user decision
    consensus_report = {
        "critical": ["Weak input validation in auth.js"],
        "escalated": [
            {"finding": "No rate limiting", "votes": 1, "requires_review": True},
            {"finding": "Missing error logging", "votes": 1, "requires_review": True},
        ],
    }

Escalation rule: If 1/3 models flag CRITICAL and others don't, escalate to user. Never auto-dismiss.
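The consensus logic can be expressed as a vote count per finding. In the sketch below, unanimous findings become critical and everything else is escalated. Treating a 2/3 split the same as a 1/3 split is an assumption, since only the 3/3 and 1/3 cases are described above, and the function name is illustrative. Findings are matched by exact string, which a real reviewer would likely need to relax.

```python
from typing import Dict, List


def build_consensus(findings_by_model: Dict[str, List[str]]) -> Dict:
    """Count how many models raised each finding and sort into buckets."""
    total_models = len(findings_by_model)
    votes: Dict[str, int] = {}
    for findings in findings_by_model.values():
        # dict.fromkeys removes duplicates within one model, keeping order.
        for finding in dict.fromkeys(findings):
            votes[finding] = votes.get(finding, 0) + 1

    report: Dict[str, list] = {"critical": [], "escalated": []}
    for finding, count in votes.items():
        if count == total_models:
            report["critical"].append(finding)
        else:
            # ASSUMPTION: any non-unanimous finding goes to the user.
            report["escalated"].append(
                {"finding": finding, "votes": count, "requires_review": True}
            )
    return report


report = build_consensus({
    "claude": ["Weak input validation in auth.js", "No rate limiting"],
    "gemini": ["Weak input validation in auth.js", "Missing error logging"],
    "codex": ["Weak input validation in auth.js"],
})
print(report)
```

Nothing is ever dropped: a finding either reaches consensus or lands in the escalated list for a human to decide.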

Profile selection by tier:

3.10 Code Hardener Integration

After implementation, the code-hardener agent runs automated security fixes:

Example hardener output:

    {
      "auto_fixed": [
        {"file": "auth.js", "issue": "MD5 hash", "fix": "Replaced with bcrypt"},
        {"file": "config.js", "issue": "Hardcoded API key", "fix": "Moved to .env"}
      ],
      "requires_review": [
        {
          "file": "payment.js",
          "issue": "SQL injection risk in dynamic query",
          "todo_file": "TODO/security-payment-sqli.md",
          "severity": "CRITICAL"
        }
      ]
    }

Integration point: The hardener runs in Phase 3, after conductor-builder and before the conductor-critic POST-IMPLEMENTATION gate. This ensures security issues are caught before final review.

3.11 Non-Human Identity (NHI) Tracking

Every agent invocation receives a unique Non-Human Identity instance ID following the pattern nhi_{agent}_{YYYYMMDD}_{hex8}. This extends agent identity from static manifest to per-invocation tracking.

| Event | Trigger | Audit Type | Data Recorded |
|---|---|---|---|
| Spawn | Task tool dispatches agent | NHI_SPAWN | nhi_id, agent_id, parent_nhi_id, manifest_id |
| Terminate | Task completes or fails | NHI_TERMINATE | nhi_id, exit_reason, duration_seconds, tool_count |
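Generating an ID in the nhi_{agent}_{YYYYMMDD}_{hex8} pattern and the matching spawn event might look like the sketch below. The function names are hypothetical, and using the UTC date and a cryptographically random suffix are assumptions.

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Dict, Optional

NHI_PATTERN = re.compile(r"^nhi_(?P<agent>.+)_(?P<date>\d{8})_(?P<suffix>[0-9a-f]{8})$")


def new_nhi_id(agent: str, now: Optional[datetime] = None) -> str:
    """Build a per-invocation ID; ASSUMPTION: the date is taken in UTC."""
    now = now or datetime.now(timezone.utc)
    # token_hex(4) returns 4 random bytes as 8 hexadecimal characters.
    return f"nhi_{agent}_{now:%Y%m%d}_{secrets.token_hex(4)}"


def spawn_event(agent_id: str, manifest_id: str,
                parent_nhi_id: Optional[str] = None) -> Dict:
    """Assemble the fields recorded for an NHI_SPAWN audit event."""
    return {
        "audit_type": "NHI_SPAWN",
        "nhi_id": new_nhi_id(agent_id),
        "agent_id": agent_id,
        "parent_nhi_id": parent_nhi_id,
        "manifest_id": manifest_id,
    }


print(new_nhi_id("conductor-builder"))
```

Passing the parent's nhi_id into spawn_event is what lets a delegation chain be reconstructed from the audit trail.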

The conductor tracks active NHIs in conductor-state.json under agent_instances. Each entry maps an NHI ID to its agent_id, manifest_id, spawned_at timestamp, status, parent_nhi_id, and tool_count. This enables:

3.12 Prohibited Behavior Monitoring

Hard-coded kill switches for behaviors that should never occur, regardless of manifest permissions or trust level. These are checked in the PreToolUse hook before any policy engine evaluation.

Prohibited Patterns

  • Network tunneling (ssh -L/-R/-D, socat, ncat)
  • Credential exfiltration (reading secrets + external send)
  • Audit trail modification (DELETE/UPDATE on audit.db)
  • Unauthorized internal API calls

Enforcement

  • Always PROHIBITED_BEHAVIOR audit event
  • Always severity critical, outcome block
  • Cannot be overridden by manifest permissions
  • Cannot be approved via human gate
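As an illustration of the network tunneling check only, the sketch below inspects a shell command before it runs. It is deliberately narrow: it looks only at the leading program of the command, so pipelines, wrappers, and ssh forwarding configured through -o options are not covered, and a production kill switch would need a much more thorough matcher. The function name and the returned event fields beyond those listed under Enforcement are assumptions.

```python
import os
import shlex
from typing import Dict, Optional

TUNNEL_BINARIES = {"socat", "ncat"}
SSH_FORWARD_FLAGS = ("-L", "-R", "-D")


def check_command(command: str) -> Optional[Dict[str, str]]:
    """Return a PROHIBITED_BEHAVIOR event if the command is blocked, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fall back to naive whitespace splitting.
        tokens = command.split()
    if not tokens:
        return None

    program = os.path.basename(tokens[0])
    tunneling = program in TUNNEL_BINARIES or (
        program == "ssh"
        and any(token.startswith(SSH_FORWARD_FLAGS) for token in tokens[1:])
    )
    if not tunneling:
        return None
    return {
        "audit_type": "PROHIBITED_BEHAVIOR",
        "severity": "critical",
        "outcome": "block",
        "pattern": "network_tunneling",  # hypothetical field name
    }


print(check_command("ssh -L 8080:localhost:80 user@host"))
```

The check takes no manifest or approval argument, which mirrors the rule that these blocks cannot be overridden.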

3.13 External Audit Sink & Cost Tracking

Two conductor-level protections against operational risk:

External Audit Sink

All security-relevant events are emitted to an external syslog endpoint (Wazuh compatible) in addition to local SQLite. This provides a tamper-resistant audit trail: even if an agent compromises the local audit database, the external copy persists. Delivery is fail-open, so syslog delivery failures never block operations.

Cost Tracking (Denial of Wallet)

Tracks token consumption per agent invocation and per workflow. Configurable thresholds prevent runaway costs:

| Scope | Default Threshold | Action |
|---|---|---|
| Per-agent invocation | 100K tokens | Warn at 80%, halt at 100% |
| Per-workflow | 500K tokens | Warn at 80%, halt at 100% |
| Per-session | 1M tokens | Hard stop, require restart |

Cost data stored in conductor-state.json under cost_tracking with fields: token_budget, tokens_used, cost_estimate_usd, halt_on_exceed.
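The warn and halt thresholds reduce to a small comparison. A minimal sketch, with illustrative names and the default budgets taken from the table above:

```python
DEFAULT_BUDGETS = {
    "agent_invocation": 100_000,
    "workflow": 500_000,
    "session": 1_000_000,
}


def budget_action(tokens_used: int, token_budget: int, warn_percent: int = 80) -> str:
    """Return "ok", "warn", or "halt" for the current token consumption."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    if tokens_used >= token_budget:
        return "halt"
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    if tokens_used * 100 >= token_budget * warn_percent:
        return "warn"
    return "ok"


print(budget_action(85_000, DEFAULT_BUDGETS["agent_invocation"]))
```

The same function can be applied at each scope; only the budget passed in changes.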

3.14 Cryptographic Bill of Materials (C-BOM)

The conductor-ciso agent generates a C-BOM at STANDARD+ tiers — a comprehensive inventory of all cryptographic implementations in the target project with PQC readiness assessment.

| Category | What's Inventoried | PQC Risk |
|---|---|---|
| Symmetric | AES modes, key sizes, implementations | Low |
| Asymmetric | RSA, ECDSA, Ed25519 usage | Critical |
| Hash functions | SHA-256, SHA-3, HMAC | Low-Medium |
| Key exchange | DH, ECDH, X25519 | Critical |
| TLS/mTLS | Protocol versions, cipher suites | Varies |

Output includes migration recommendations: quantum-safe alternatives (ML-KEM, ML-DSA, SLH-DSA per NIST standards), estimated migration effort, and CNSA 2.0 compliance status. Stored in conductor-state.json under project_characteristics.crypto_implementations and project_characteristics.pqc_readiness.

4. Requirements

REQ-001 Tier classification MUST use 5-signal weighted matrix (scope, type, risk, ambiguity, intent_sensitivity)
REQ-002 Tier score MUST map to workflow template: 1.0-1.5=TRIVIAL, 1.6-2.3=MINOR, 2.4-3.2=STANDARD, 3.3-4.0=MAJOR
REQ-003 Hard limit violations MUST auto-escalate tier to STANDARD minimum (for blocking gates)
REQ-004 BRD extraction MUST be MANDATORY BLOCKING GATE — no phase progression until 100% requirements extracted
REQ-005 Every BRD requirement MUST have BRD-tracker.json entry with id, description, status, todo_file, is_placeholder
REQ-006 Conductor MUST validate handoffs: source.produces ⊆ target.accepts, target.requires ⊆ available_artifacts
REQ-007 Agent dispatch MUST use Task tool: Task(subagent_type="agent-name", prompt="...", description="...")
REQ-008 Quality gates MUST support three modes: BLOCKING (stop), ADVISORY (log), SKIP (don't run)
REQ-009 PRE-RELEASE gate MUST be BLOCKING in STANDARD+ tiers (no half-finished releases)
REQ-010 COMPLETENESS gate MUST be BLOCKING in STANDARD+ tiers (validate all 12 domains)
REQ-011 Intent block MUST have: objectives, trade_offs, delegation_boundaries, hard_limits
REQ-012 Agents with intent_constraints MUST check trade-offs before decisions and log rationale
REQ-013 Conductor-critic MUST validate hard_limit compliance at every checkpoint (violations always BLOCKING)
REQ-014 Conductor-critic MUST validate delegation_boundaries were respected (no unauthorized architectural changes)
REQ-015 State persistence MUST survive session restarts via conductor-state.json
REQ-016 State schema MUST validate: project_name, tier, tier_score, tier_signals, current_phase, current_step, intent
REQ-017 Completeness validator MUST check 12 domains: dependencies, dead code, config, links, assets, build, tests, routes, API, UI, containers, BRD
REQ-018 Adversarial review MUST use multi-model consensus (Claude + Gemini minimum)
REQ-019 Adversarial review MUST escalate 1/3 CRITICAL findings to user (never auto-dismiss)
REQ-020 Code hardener MUST run after implementation, before POST-IMPLEMENTATION gate
REQ-021 No placeholder implementations allowed — every integration must actually connect
REQ-022 No spec may exceed 50% context window (split large specs into multiple files)
REQ-023 Workflow MUST enforce strict sequencing — no phase/step may be skipped or reordered
REQ-024 Max 2 retries per task, then escalate to user
REQ-025 Git ratcheting MUST commit after every logical change for rollback capability
REQ-026 Every agent invocation MUST receive a unique NHI instance ID (nhi_{agent}_{YYYYMMDD}_{hex8}) with NHI_SPAWN/NHI_TERMINATE audit events
REQ-027 Conductor MUST track active NHI instances in conductor-state.json agent_instances registry with parent chain propagation
REQ-028 Prohibited behaviors (network tunneling, credential exfiltration, audit modification) MUST be unconditionally blocked before policy evaluation
REQ-029 Token consumption MUST be tracked per NHI instance and per workflow with configurable warn/halt thresholds (COST_THRESHOLD events)
REQ-030 Conductor-ciso MUST generate C-BOM at STANDARD+ tiers with PQC readiness assessment (ML-KEM, ML-DSA, SLH-DSA alternatives)
REQ-031 Security-relevant audit events MUST be emitted to external syslog sink in addition to local SQLite (fail-open delivery)

5. Prompt to Build It

Build a multi-agent orchestration system for software development with tier-based workflows, quality gates, and intent engineering. Create a "conductor" plugin with the following:

**1. Tier Classification System**
- 5-signal weighted matrix: scope (30%), type (25%), risk (25%), ambiguity (20%), intent_sensitivity (25%)
- Score range 1.0-4.0 maps to TRIVIAL/MINOR/STANDARD/MAJOR tiers
- Auto-escalate tier when task touches hard_limits
- Create tier-classifier.py that accepts task description, returns tier + signal breakdown

**2. Workflow Templates**
- Create workflow-templates.yaml with phase sequences for each tier
- TRIVIAL: single agent, no gates
- MINOR: split plan/implement, advisory gates
- STANDARD: full workflow, PRE-RELEASE and COMPLETENESS blocking
- MAJOR: all gates blocking
- Each template defines: phases, agents, gates (with mode: blocking/advisory/skip)

**3. BRD-Driven Development**
- Create BRD-tracker.json schema: id, description, category, status, todo_file, is_placeholder
- conductor-research agent generates BRD with numbered requirements
- BRD extraction (MANDATORY BLOCKING GATE) extracts all to BRD-tracker.json
- conductor-architect creates TODO specs, links in BRD-tracker
- conductor-builder updates status: pending → in_progress → implemented → tested → complete
- Final verification gate: 100% requirements must be "complete"

**4. Intent Engineering**
- Extend conductor-state.json schema with intent block:
  - objectives (array of strings)
  - trade_offs (array of {decision, rationale, implications})
  - delegation_boundaries ({autonomous: [...], human_in_loop: [...]})
  - hard_limits (array of strings)
- conductor-critic validates at every checkpoint:
  - hard_limit violations → always BLOCKING
  - trade_off compliance → log findings
  - delegation_boundary violations → escalate
- Agents with intent_constraints check before decisions, log rationale

**5. Agent Capability Matrix**
- Create capabilities.yaml with 14 core agents:
  - conductor (orchestrator, model: opus[1m])
  - conductor-research (requirements, model: sonnet)
  - conductor-ciso (security, model: opus)
  - conductor-architect (design, model: opus)
  - conductor-builder (implementation, model: opus)
  - conductor-qa (testing, model: sonnet)
  - conductor-critic (validation, model: opus)
  - conductor-code-reviewer (quality, model: sonnet)
  - conductor-completeness-validator (artifact checks, model: opus)
  - conductor-doc-gen (documentation, model: sonnet)
  - conductor-devops (CI/CD, model: sonnet)
  - conductor-performance (load tests, model: sonnet)
  - conductor-compliance (SBOM, licenses, model: sonnet)
  - conductor-qa-review (adversarial review, model: opus)
- Each agent defines: accepts, produces, requires, constraints, intent_constraints
- Handoff validation: source.produces ⊆ target.accepts, target.requires ⊆ available_artifacts

**6. Quality Gate System**
- Create quality-gates.yaml defining 8 gates:
  - POST-CISO (STRIDE, OWASP coverage)
  - POST-BRD-EXTRACTION (100% requirements captured)
  - POST-ARCHITECT (100% BRD-to-spec mapping)
  - POST-QA (test coverage validation)
  - POST-IMPLEMENTATION (no placeholders)
  - PRE-RELEASE (comprehensive readiness check)
  - POST-PENTEST (findings remediated)
  - COMPLETENESS (12-domain artifact validation)
- Each gate has mode matrix (tier → blocking/advisory/skip)
- conductor-critic agent executes gates, returns verdict + findings

**7. Completeness Validator**
- conductor-completeness-validator agent checks 12 domains:
  - Dependencies (all imports resolve)
  - Dead code (no orphan files)
  - Configuration (all env vars defined)
  - Links (internal resolve, external reachable)
  - Assets (all referenced files exist)
  - Build (succeeds with 0 errors)
  - Tests (full suite passes)
  - Routes (all return valid responses)
  - API (all endpoints respond)
  - UI (pages load without console errors)
  - Containers (health checks pass)
  - BRD traceability (100% complete)
- Output: completeness-report-<timestamp>.json with verdict + findings

**8. Adversarial Review**
- conductor-qa-review agent with multi-model consensus:
  - Claude Opus 4.6 (primary)
  - Google Gemini 2.0 (adversarial)
  - OpenAI GPT-4o (tie-breaker if available)
- Consensus logic: 3/3 agree → CRITICAL, 1/3 → escalate to user
- Profile selection by tier: quick/standard/thorough
- Never auto-dismiss 1/3 CRITICAL findings

**9. State Persistence**
- conductor-state.json schema with:
  - project_name, tier, tier_score, tier_signals
  - current_phase, current_step
  - task_queue, completed_tasks
  - verification_status (gates passed/failed)
  - intent block
  - BRD progress
- SessionStart hook injects status if state exists
- PostToolUse hook validates state against schema

**10. Commands**
- /conduct command with argument routing:
  - new  → tier classification, create state, begin workflow
  - resume → read state, continue from current_step
  - status → comprehensive status display
  - reset → delete state
  - validate → run completeness-validator

**11. Agentic Security Hardening**
- NHI instance tracking: generate nhi_{agent}_{YYYYMMDD}_{hex8} per invocation
  - Emit NHI_SPAWN on dispatch, NHI_TERMINATE on completion
  - Track in conductor-state.json agent_instances registry
  - Propagate parent_nhi_id through delegation chains
- Prohibited behavior monitoring: hard-coded kill switches
  - Network tunneling (ssh -L/-R/-D, socat, ncat)
  - Credential exfiltration, audit trail modification
  - Always block, always critical, no override
  - Emit PROHIBITED_BEHAVIOR events
- External audit sink: all security events → syslog (Wazuh)
  - Tamper-resistant external copy alongside local SQLite
  - Fail-open: delivery failures never block
- Cost tracking (denial of wallet):
  - Per-agent (100K), per-workflow (500K), per-session (1M) token budgets
  - Warn at 80%, halt at 100%, emit COST_THRESHOLD events
  - Store in conductor-state.json cost_tracking
- C-BOM generation by conductor-ciso at STANDARD+ tiers:
  - Inventory all crypto: symmetric, asymmetric, hash, key exchange, TLS, RNG
  - PQC readiness assessment with NIST alternatives (ML-KEM, ML-DSA, SLH-DSA)
  - Store in project_characteristics.crypto_implementations

**Deliverables:**
- Complete conductor plugin with all agents, skills, commands
- Tier classification system with 5-signal matrix
- BRD-tracker.json schema and extraction workflow
- Intent engineering with 4-section intent block
- Quality gate system with mode matrix
- Completeness validator checking 12 domains
- Adversarial review with multi-model consensus
- State persistence with recovery
- NHI instance tracking with lifecycle events
- Prohibited behavior kill switches
- Cost tracking with configurable thresholds
- C-BOM generation with PQC readiness
- Working /conduct command with full workflow orchestration

6. Design Decisions

6.1 Why Tiered Workflows Instead of One-Size-Fits-All?

A CSS color change and a payment processing system need different rigor levels. One-size-fits-all means:

Tiered workflows solve this by matching rigor to risk. The 5-signal matrix ensures objective classification.

6.2 Why Intent Engineering Over Implicit Optimization?

Without explicit intent, agents guess trade-offs:

Intent engineering declares upfront what matters. Agents don't guess; they consult the intent block. Misalignment is detected at gates, not in production.

6.3 Why BRD Extraction as MANDATORY BLOCKING GATE?

Without forced extraction, agents:

The blocking gate ensures 100% of requirements are captured before any design work begins. There is no progression until BRD-tracker.json is complete.

6.4 Why Capability Matrix Over Ad-Hoc Task Passing?

Without capability validation, you get:

The capability matrix turns orchestration into a type-checked workflow. Handoffs are validated at dispatch, not at failure.

6.5 Why Adversarial Multi-Model Review?

Single-model review has blind spots:

Multi-model consensus provides defense in depth. If 3/3 agree it's safe, confidence is high. If 1/3 flags CRITICAL, the finding is escalated and never auto-dismissed.

6.6 Why Completeness Validator in Phase 7?

Agents claim "done" but:

The completeness validator is the "does it actually work" gate. It runs after all code changes, checks 12 domains, and is blocking in STANDARD+ tiers.

6.7 Why State Persistence?

Software projects take days/weeks. Without state:

State persistence in conductor-state.json enables resuming from any step. The workflow survives network failures, session timeouts, and even machine reboots.

7. Integration Points

7.1 With Plugin Ecosystem

Conductor is a Claude Code plugin. It uses:

7.2 With Memory Systems

Conductor integrates with memory plugins for:

    # After successful workflow completion
    memory_store(
        type="procedure",
        content=f"STANDARD tier workflow for {project_type}",
        metadata={"tier": "STANDARD", "phases": 7, "duration_hours": 18},
    )

7.3 With Governance Systems

Governance plugins enforce compliance during orchestration:

7.4 With Code Hardener

Code hardener runs in Phase 3 after conductor-builder:

# Phase 3 sequence conductor-builder(implement) → code-hardener → conductor-ciso(code-review) → [code-reviewer + qa] → CRITIC(post-implementation)

Hardener auto-fixes safe issues (weak crypto → strong crypto). Complex issues generate TODO specs routed back to conductor-builder.

7.5 With Testing Infrastructure

Testing runs at multiple checkpoints:

Tests are stored in git and run in a Docker container (testing-security-stack) for isolation.

7.6 With CI/CD Pipelines

Phase 6 (deployment) integrates with CI/CD:

Summary

Multi-agent orchestration transforms software development from a single overwhelmed agent to a coordinated specialist team:

The result: production-ready code with no placeholders, a fully passing test suite, validated security, and every requirement proven complete.