Agent Governance & Trust Framework

Build a production-grade governance layer for AI agent systems with cryptographic identity, trust-mediated delegation, policy enforcement, audit trails, LLM threat detection, and SIEM integration.

Gartner AI TRiSM Compliant | OWASP LLM Top 10 | Claude Code Plugin | SQLite + WAL | HMAC-SHA256

1. Problem Statement

Modern AI agent systems present unique governance challenges. Unlike monolithic applications, agents can autonomously delegate tasks to other agents, invoke external tools, write to persistent memory, and access sensitive data. Without a comprehensive governance framework, organizations face:

Privilege Escalation Risk

A low-trust agent can delegate to higher-trust agents, bypassing security boundaries. Delegation chains compound risk as each hop loses visibility into the original intent.

Data Leakage & Misclassification

Agents writing to memory without classification checks can persist restricted data in shared collections. No provenance tracking means you can't audit who wrote what or when.

Unlimited Delegation Chains

Without breadth and depth limits, a single agent could spawn hundreds of child tasks, creating infinite loops or resource exhaustion attacks.

No Audit Trail

Traditional logging captures tool invocations but misses the context: which agent, under what authority, with what data classification, and what was the delegation chain?

Prompt Injection Attacks

LLMs are vulnerable to adversarial inputs that override system instructions. Without runtime detection, an attacker can manipulate agent behavior through crafted prompts.

System Prompt Leakage

Tool outputs can inadvertently expose governance internals, manifest structures, or system instructions. No post-tool scanning means leaks go undetected.

Gartner AI TRiSM Framework
This governance system aligns with Gartner's AI Trust, Risk, and Security Management (TRiSM) framework:
  • Trust: Manifest-based identity with cryptographic signing
  • Risk: Policy engine with tool classification and delegation limits
  • Security: LLM threat detection, output validation, audit trails
  • Management: Metrics collection, SIEM integration, environment-aware policy

The governance framework addresses these challenges through six core subsystems working in concert: manifest identity, trust broker, policy engine, audit bus, memory governor, and LLM threat detector. Together they provide end-to-end governance from session start through tool execution to memory persistence.

2. Architecture Overview

The governance framework is implemented as a Claude Code plugin with hook-based enforcement. All components share a unified audit bus for event logging and a centralized policy file for configuration.

[Figure: Governance Framework Architecture]

Data Flow

1. Session Start

Load root agent manifest, generate session ID, initialize audit bus, emit MANIFEST_LOADED event, purge stale delegation registry entries, apply environment-aware policy overrides.

2. Tool Invocation (PreToolUse)

Scan input for prompt injection, classify tool risk tier, check manifest permissions, evaluate policy gate (allow/deny/human_gate), emit POLICY_CHECK or LLM_THREAT event.

3. Delegation (Task Tool)

Trust broker validates breadth/depth limits, checks classification ceiling, derives child manifest with parent constraints, issues delegation token, registers child manifest, emits DELEGATION_EVENT.

4. Memory Write

Memory governor classifies content, enforces agent ceiling, blocks restricted data, queues confidential writes for review, adds provenance tags (9 fields), emits MEMORY_WRITE event.

5. Tool Output (PostToolUse)

Output validator scans for system prompt leakage, sensitive data disclosure, governance artifacts, emits LLM_THREAT events for critical/high severity findings, records metrics.

6. Session End

Flush audit queue, export session events to JSONL, send session summary metrics, optionally archive old events per retention policy, deregister session manifests.

Fail-Open vs Fail-Closed Design
Hook failures (timeout, crash) fail-open to prevent system lock-up. Trust broker and policy engine fail-closed to prevent privilege escalation. Buffer fallback ensures audit events survive DB failures.

3. Key Components

3.1 Manifest System

Agent identity documents define trust level, data classification, permitted tools, and delegation rules. Manifests are YAML files stored in state/manifests/ with cryptographic signing for tamper evidence.

agent_id: security-analyst
manifest_id: gov-sec-analyst-v2
manifest_version: "2.1.0"
trust_level: 4                    # 1-5 scale
data_classification: confidential  # public | internal | confidential | restricted
permitted_tools:
  - "Read"
  - "Grep"
  - "Bash"
  - "mcp__*"                      # fnmatch wildcards supported
permitted_delegations:
  - "pentest-agent"
  - "compliance-*"
human_required: false
max_autonomy_depth: 3             # Delegation depth budget
max_delegation_count: 5           # Breadth limit per session
model_id: claude-opus-4-6
model_version: "4.6"

Manifest Resolution Logic

  1. Static + Parent: Load static manifest and enforce parent ceiling (intersect capabilities)
  2. Static Only: Root agent, use static manifest as authoritative
  3. Parent Only: No static manifest found, derive restrictive child from parent
  4. Neither: Default to trust_level=1, data_classification=public, no tools, no delegation

Parent ceiling enforcement is the security foundation: a child can never exceed its parent's trust level, data classification, or autonomy depth. This creates a monotonically decreasing privilege chain.

| Field | Type | Ceiling Rule |
|---|---|---|
| trust_level | Integer (1-5) | min(static, parent) |
| data_classification | Enum | Lower classification wins |
| max_autonomy_depth | Integer | min(static, parent - 1) |
| permitted_tools | List[pattern] | Union (additive) |
| permitted_delegations | List[pattern] | Union (additive) |
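
The scalar ceiling rules above can be sketched as a small function. This is a minimal illustration rather than the plugin's actual code: the function name apply_parent_ceiling and the plain-dict manifest layout are assumptions, while the field names and classification order come from this document. The list fields are left out for brevity.

```python
# Hypothetical sketch of parent ceiling enforcement for the scalar fields.
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

def apply_parent_ceiling(static: dict, parent: dict) -> dict:
    """Return a child manifest capped by the parent's scalar limits."""
    child = dict(static)
    child["trust_level"] = min(static["trust_level"], parent["trust_level"])
    # "Lower classification wins": pick whichever comes first in the order.
    child["data_classification"] = min(
        static["data_classification"], parent["data_classification"],
        key=CLASSIFICATION_ORDER.index)
    # The child receives at most one less than the parent's depth budget.
    child["max_autonomy_depth"] = min(
        static["max_autonomy_depth"], parent["max_autonomy_depth"] - 1)
    return child

parent = {"trust_level": 3, "data_classification": "internal",
          "max_autonomy_depth": 2}
static = {"trust_level": 4, "data_classification": "confidential",
          "max_autonomy_depth": 3}
child = apply_parent_ceiling(static, parent)
```

Here the static manifest asks for more than the parent holds on every field, so the child is capped to the parent's trust level and classification and receives one less than the parent's depth.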

Cryptographic Signing

Manifests are signed with HMAC-SHA256 on load. The signing key is a 32-byte secret stored in state/.signing-key with 0600 permissions. Signature verification happens during manifest validation — tampered manifests fail validation and fall back to default-restrictive.

# Canonical manifest for signing (excludes volatile fields)
canonical = json.dumps(
    {k: v for k, v in manifest.items()
     if k not in {"manifest_signature", "manifest_hash",
                  "audit_session_id", "audit_parent_id"}},
    sort_keys=True, separators=(",", ":"))

signature = hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()
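
Verification is the mirror image of signing: recompute the signature over the same canonical form and compare in constant time. The sketch below assumes the signature is stored in the manifest under manifest_signature; the helper names sign_manifest and verify_manifest and the all-zero key are illustrative only.

```python
import hashlib
import hmac
import json

# Fields excluded from the canonical form, as in the snippet above.
VOLATILE = {"manifest_signature", "manifest_hash",
            "audit_session_id", "audit_parent_id"}

def sign_manifest(manifest: dict, signing_key: bytes) -> str:
    canonical = json.dumps(
        {k: v for k, v in manifest.items() if k not in VOLATILE},
        sort_keys=True, separators=(",", ":"))
    return hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, signing_key: bytes) -> bool:
    expected = sign_manifest(manifest, signing_key)
    stored = manifest.get("manifest_signature", "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, stored)

key = b"\x00" * 32  # placeholder key, for illustration only
manifest = {"agent_id": "security-analyst", "trust_level": 4}
manifest["manifest_signature"] = sign_manifest(manifest, key)
valid_before = verify_manifest(manifest, key)
manifest["trust_level"] = 5  # simulate tampering
valid_after = verify_manifest(manifest, key)
```

A caller would fall back to the default-restrictive manifest whenever verify_manifest returns False.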

3.2 Trust Broker

Mediates inter-agent delegation by validating breadth/depth limits, classification boundaries, and permitted targets. Issues delegation tokens for forensic linkage and registers resolved child manifests in a session-scoped registry.

Delegation Validation Rules

Breadth Limit

Query audit bus for previous DELEGATION_EVENT events from this agent in this session. If count >= max_delegation_count, deny with TRUST_DENY event. Prevents agent from spawning unlimited child tasks.

Depth Budget

Check parent's max_autonomy_depth. If <= 0, escalate to human gate with TRUST_DENY. Child receives parent.depth - 1, creating a monotonically decreasing budget.

Classification Ceiling

Child's data_classification cannot exceed parent's. If target manifest requests higher classification, deny delegation. Prevents data exfiltration via upward delegation.

Trust Escalation

Child's trust_level cannot exceed parent's. Defense-in-depth check — should be caught by ceiling enforcement, but trust broker validates independently.

Permitted Targets

Target agent ID must match a pattern in parent's permitted_delegations list (supports fnmatch wildcards). Prevents delegation to arbitrary agents.

Delegation Token

SHA-256 hash of session_id:parent_manifest_id:child_manifest_id:timestamp:nonce. Stored in DELEGATION_EVENT detail for forensic chain reconstruction.
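
A minimal sketch of token issuance follows. The function name, the ISO-8601 timestamp format, the 8-byte nonce, and the child manifest ID in the example are assumptions; only the colon-joined field order and the use of SHA-256 come from the description above.

```python
import hashlib
import secrets
from datetime import datetime, timezone

def issue_delegation_token(session_id: str, parent_manifest_id: str,
                           child_manifest_id: str) -> str:
    """Hash session, manifests, time, and a random nonce into one token."""
    timestamp = datetime.now(timezone.utc).isoformat()
    nonce = secrets.token_hex(8)  # randomness keeps tokens unique per call
    material = ":".join([session_id, parent_manifest_id,
                         child_manifest_id, timestamp, nonce])
    return hashlib.sha256(material.encode()).hexdigest()

token = issue_delegation_token(
    "gov-sess-a4f8d2c1", "gov-sec-analyst-v2", "pentest-agent-v1")
```

The nonce means two delegations with identical inputs in the same instant still receive distinct tokens.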

Manifest Registry

A JSON file (state/manifest-registry.json) with file-lock-based concurrency control. Stores active child manifests keyed by session_id:agent_id. TTL purging (1 hour) removes stale entries. SessionStart hook purges on startup.
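
One way to combine file locking with TTL purging is sketched below. It assumes a Unix host (fcntl is not available on Windows) and a hypothetical registered_at field on each entry; the real registry layout may differ.

```python
import fcntl
import json
import os
import time

TTL_SECONDS = 3600  # 1 hour, matching the TTL described above

def register_manifest(path, session_id, agent_id, manifest, now=None):
    """Add one entry under an exclusive lock, purging expired entries."""
    now = time.time() if now is None else now
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            raw = fh.read()
            registry = json.loads(raw) if raw.strip() else {}
            # Drop entries older than the TTL before adding the new one.
            registry = {k: v for k, v in registry.items()
                        if now - v["registered_at"] < TTL_SECONDS}
            registry[f"{session_id}:{agent_id}"] = {
                "manifest": manifest, "registered_at": now}
            fh.seek(0)
            fh.truncate()
            json.dump(registry, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return registry
```

Holding the lock across the read-modify-write cycle is what prevents two concurrent hooks from overwriting each other's entries.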

3.3 Policy Engine

Classifies tools into risk tiers (exempt/standard/elevated) and enforces manifest permissions + conductor tier matrix to decide allow/deny/human_gate. Unknown tools default to elevated (fail toward scrutiny).

Tool Risk Tiers

| Tier | Examples | Audit Strategy | Gate Logic |
|---|---|---|---|
| exempt | Read, Glob, Grep, TaskList | Async emit (nowait queue) | Always allow, skip policy checks |
| standard | Edit, Write, Task, Bash | Sync audit if tier >= STANDARD | Check manifest permissions, depth, human_required |
| elevated | memory_store, NotebookEdit, MCP tools | Always sync audit | Check + tier matrix (MAJOR + elevated = human_gate) |

Conductor Tier Matrix

Conductor plugin exports current task tier (TRIVIAL/MINOR/STANDARD/MAJOR) in conductor-state.json. Policy engine loads this and applies tier-specific gates:

# MAJOR tier + elevated tool = always human gate
if tool_tier == "elevated" and conductor_tier == "MAJOR":
    return PolicyDecision("human_gate",
                         "MAJOR task + elevated tool requires human approval")

# Synchronous audit for elevated or STANDARD+
if tool_tier == "elevated" or conductor_tier in ("STANDARD", "MAJOR"):
    audit_bus.emit(EventType.POLICY_CHECK, manifest, tool_name=tool_name)
else:
    audit_bus.emit_nowait(EventType.TOOL_INVOKED, manifest, tool_name=tool_name)

3.3.1 Tool Exemption Tiers & Practical Overhead

A common misconception is that governance adds "5 forms per git commit." In practice, most daily development work hits ZERO approval gates. Here's how it actually works:

Full Tier Definitions

| Tier | Tools | Governance Overhead | User Experience |
|---|---|---|---|
| Exempt | Read, Glob, Grep, TaskList, TaskGet, AskUserQuestion | None. No policy evaluation; async audit log only. | Instant, zero latency added |
| Standard | Edit, Write, Task, Bash, WebFetch, WebSearch | Policy evaluated, audit logged, auto-allowed if manifest permits. | ~50ms, imperceptible |
| Elevated | memory_store, memory_forget, NotebookEdit, all mcp__MCP_DOCKER__*, all mcp__hostinger-mcp__* | Full policy evaluation + conductor tier matrix check. Human gate triggered only at MAJOR tier. | ~100ms, or human approval at MAJOR |

Conductor Tier Matrix — When Human Approval Actually Triggers

| Conductor Tier | Exempt Tool | Standard Tool | Elevated Tool |
|---|---|---|---|
| TRIVIAL | Allow | Allow | Allow |
| MINOR | Allow | Allow | Allow |
| STANDARD | Allow | Allow (audited) | Allow (audited) |
| MAJOR | Allow | Allow (audited) | Human Gate |
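
The matrix above reduces to a few branches. The sketch below is illustrative only: the function name gate_decision and the result label "allow_audited" are made up for this example and are not part of the plugin.

```python
def gate_decision(conductor_tier: str, tool_tier: str) -> str:
    """Map (conductor tier, tool tier) to the matrix cell shown above."""
    if tool_tier == "exempt":
        return "allow"
    if conductor_tier == "MAJOR" and tool_tier == "elevated":
        return "human_gate"
    if conductor_tier in ("STANDARD", "MAJOR"):
        return "allow_audited"
    return "allow"  # TRIVIAL and MINOR never gate

matrix = {
    tier: [gate_decision(tier, tool)
           for tool in ("exempt", "standard", "elevated")]
    for tier in ("TRIVIAL", "MINOR", "STANDARD", "MAJOR")
}
```

Only one of the twelve cells produces a human gate, which is the point the matrix is making.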

Key Insight

  • Human approval is ONLY required when an elevated tool is used during a MAJOR-tier conductor workflow
  • Normal coding session (TRIVIAL/MINOR): zero approval prompts
  • Significant feature build (STANDARD): zero approval prompts, full audit trail
  • Greenfield architecture (MAJOR): approval only for elevated tools

Real-World Workflow Example

Typical 5-Step Coding Session (STANDARD Tier)

| Step | Action | Tool Tier | Governance Result | Overhead |
|---|---|---|---|---|
| 1 | Read files | Exempt | Skip, no evaluation | 0ms |
| 2 | Edit code | Standard | Auto-allowed, audited | ~50ms |
| 3 | Run tests | Standard | Auto-allowed, audited | ~50ms |
| 4 | Write fix | Standard | Auto-allowed, audited | ~50ms |
| 5 | Commit | Standard | Auto-allowed, audited | ~50ms |

Total governance overhead: ~200ms. Invisible.

Performance After Hook Consolidation

Before unified hook consolidation: 5–6 separate process spawns × ~100ms = 500–600ms overhead. After: 2–3 process spawns × ~100ms = 200–300ms.

Unknown Tools

Any tool not explicitly listed in the tier definitions defaults to elevated — the system fails toward scrutiny, not permissiveness. This ensures new or unexpected tools receive maximum governance evaluation until explicitly classified.
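
A sketch of tier classification with the fail-toward-scrutiny default is shown below. The tool lists are copied from the tier definitions in this section; the TOOL_TIERS dict and classify_tool function are hypothetical names, and the real policy file format is not shown in this document.

```python
from fnmatch import fnmatchcase

TOOL_TIERS = {
    "exempt": ["Read", "Glob", "Grep", "TaskList", "TaskGet",
               "AskUserQuestion"],
    "standard": ["Edit", "Write", "Task", "Bash", "WebFetch", "WebSearch"],
    "elevated": ["memory_store", "memory_forget", "NotebookEdit",
                 "mcp__MCP_DOCKER__*", "mcp__hostinger-mcp__*"],
}

def classify_tool(tool_name: str) -> str:
    """Return the first tier whose pattern matches, else 'elevated'."""
    for tier, patterns in TOOL_TIERS.items():
        if any(fnmatchcase(tool_name, p) for p in patterns):
            return tier
    return "elevated"  # unknown tools get maximum scrutiny
```

For example, classify_tool("SomeNewTool") returns "elevated" even though the name appears in no list.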

3.4 Audit Bus

SQLite database with WAL mode for concurrent writes. Bounded async queue (256 depth) for low-risk events, synchronous writes for critical events, JSON buffer fallback for database failures. Supports JSONL export and retention-based archival.

Event Types (21 Total)

Operational

  • TOOL_INVOKED
  • MEMORY_WRITE
  • MEMORY_READ
  • MANIFEST_LOADED
  • MANIFEST_DERIVED

Governance

  • POLICY_CHECK
  • POLICY_DENY
  • TRUST_CHECK
  • TRUST_DENY
  • DELEGATION_EVENT

Security

  • HUMAN_GATE
  • CIRCUIT_BREAK
  • LLM_THREAT
  • CONTEXT_PRESSURE
  • BUFFER_REPLAY

Agentic Security Events (6 New)

Agent Lifecycle

  • NHI_SPAWN — agent instance created with Non-Human Identity ID
  • NHI_TERMINATE — agent completed, failed, or killed

Runtime Protection

  • PROHIBITED_BEHAVIOR — kill switch triggered
  • DLP_VIOLATION — sensitive data blocked from MCP tools

External Validation

  • LAST_HOP_VIOLATION — external connection flagged
  • COST_THRESHOLD — token budget exceeded

Schema & Indexes

CREATE TABLE audit_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT UNIQUE NOT NULL,
    timestamp TEXT NOT NULL,
    audit_session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    agent_id TEXT NOT NULL,
    manifest_id TEXT,
    manifest_version TEXT,
    manifest_hash TEXT,
    trust_level INTEGER,
    data_classification TEXT,
    autonomy_depth_remaining INTEGER,
    tool_name TEXT,
    task_id TEXT,
    target_agent_id TEXT,
    context_hash TEXT,
    detail TEXT,               -- JSON-encoded event-specific fields
    outcome TEXT               -- allow | deny | escalate | warn
);

CREATE INDEX idx_audit_session ON audit_events(audit_session_id);
CREATE INDEX idx_audit_timestamp ON audit_events(timestamp);
CREATE INDEX idx_audit_agent ON audit_events(agent_id);
CREATE INDEX idx_audit_type ON audit_events(event_type);

Buffered Fallback

When SQLite writes fail (locked, disk full, corrupted), events are appended to state/audit-buffer.jsonl. On the next startup, the buffer is renamed to .replaying, its events are replayed into the database, and the buffer is then deleted. This preserves events across database failures, provided the buffer file itself can still be written.
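
The fallback and replay steps can be sketched as follows. This is a simplified illustration, not the audit bus itself: it writes only two columns (event_id and detail), and the function names write_event and replay_buffer are assumptions.

```python
import json
import os
import sqlite3

def write_event(conn, buffer_path: str, event: dict) -> bool:
    """Try SQLite first; on any database error, append to the JSONL buffer."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO audit_events (event_id, detail) VALUES (?, ?)",
                (event["event_id"], json.dumps(event)))
        return True
    except sqlite3.Error:
        with open(buffer_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return False

def replay_buffer(conn, buffer_path: str) -> int:
    """Rename the buffer, replay each line, then delete the renamed file."""
    if not os.path.exists(buffer_path):
        return 0
    replaying = buffer_path + ".replaying"
    os.replace(buffer_path, replaying)
    replayed = 0
    with open(replaying, encoding="utf-8") as fh:
        for line in fh:
            # A failed replay re-buffers the event into a fresh buffer file.
            if line.strip() and write_event(conn, buffer_path, json.loads(line)):
                replayed += 1
    os.remove(replaying)
    return replayed
```

Renaming before replay means events that fail again land in a new buffer file instead of being appended to the one currently being read.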

3.5 Memory Governor

Intercepts memory writes via PreToolUse hook on mcp__claude-memory__memory_store. Classifies content using regex patterns, enforces agent classification ceiling, blocks restricted data, queues confidential writes for review, adds 9-field provenance tags.

Classification Patterns

restricted:
  - '\b(password|secret|api[_-]?key|private[_-]?key|token|credential)\s*[:=]\s*\S+'
  - '\b\d{3}-\d{2}-\d{4}\b'                    # SSN pattern
  - '-----BEGIN\s+(?:(?:RSA|EC|DSA|OPENSSH|ENCRYPTED)\s+)?PRIVATE\s+KEY-----'
  - '\b(bearer\s+[a-zA-Z0-9\-._~+/]+=*)\b'

confidential:
  - '\b(internal[_-]?only|do[_-]?not[_-]?share|proprietary|confidential)\b'
  - '\bCVE-\d{4}-\d{4,7}\b'                   # Vulnerability IDs
  - '\b(salary|compensation|revenue|profit)\s*[:=$]'
  - '\b(ssn|social[_-]?security|tax[_-]?id)\b'

internal:
  - '\b(prod(uction)?|staging)\s+\b(server|host|endpoint|cluster)\b'
  - '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b' # IP addresses
  - '\b[a-zA-Z0-9\-]+\.(internal|corp|local)\b'

Governance Decisions

| Classification | Agent Ceiling Check | Action | Audit Event |
|---|---|---|---|
| public | N/A | Allow with provenance tags | MEMORY_WRITE (allow) |
| internal | Agent must be internal+ | Allow with provenance tags | MEMORY_WRITE (allow) |
| confidential | Agent must be confidential+ | Queue for review, persist with pending_review tag | HUMAN_GATE (escalate) |
| restricted | Always exceeds ceiling | Block (do not persist) | POLICY_DENY (deny) |
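
The decision table can be sketched as a classify-then-decide pair. Only one pattern per level is included here for brevity, and treating content above the agent's ceiling as a block is an assumption: the table states the ceiling requirement but not the failure action. The function names are hypothetical.

```python
import re

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

# One pattern per level, taken from the classification patterns above.
PATTERNS = {
    "restricted": [r'\b\d{3}-\d{2}-\d{4}\b'],
    "confidential": [r'\bCVE-\d{4}-\d{4,7}\b'],
    "internal": [r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'],
}

def classify(content: str) -> str:
    """Return the highest classification whose pattern matches."""
    for level in ("restricted", "confidential", "internal"):
        if any(re.search(p, content, re.IGNORECASE) for p in PATTERNS[level]):
            return level
    return "public"

def decide_write(content: str, agent_ceiling: str) -> str:
    level = classify(content)
    if level == "restricted":
        return "block"  # never persisted, regardless of ceiling
    if (CLASSIFICATION_ORDER.index(level)
            > CLASSIFICATION_ORDER.index(agent_ceiling)):
        return "block"  # assumption: above-ceiling content is denied
    if level == "confidential":
        return "queue_for_review"
    return "allow"
```

Checking levels from most to least sensitive ensures content that matches several patterns is treated at its highest classification.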

Provenance Tags (9 Fields)

provenance = {
    "gov_manifest_id": manifest["manifest_id"],
    "gov_agent_id": manifest["agent_id"],
    "gov_manifest_version": manifest["manifest_version"],
    "gov_manifest_hash": manifest["manifest_hash"],
    "gov_trust_level": manifest["trust_level"],
    "gov_classification": manifest["data_classification"],
    "gov_session_id": manifest["audit_session_id"],
    "gov_task_id": manifest.get("task_id"),
    "gov_timestamp": datetime.now(timezone.utc).isoformat(),
}

These tags are merged into the metadata field of the memory_store tool input, persisting alongside the content in Qdrant. This enables provenance-based memory queries (e.g., "show me all memories written by security-analyst agent in session X").

3.6 LLM Threat Detector (OWASP LLM Top 10)

Scans tool inputs for prompt injection attempts and tool outputs for system prompt leakage and sensitive data disclosure. Uses 30 prompt injection patterns and 24 system leakage patterns across three severity levels (critical/high/medium).

Prompt Injection Detection

Critical Patterns

  • Delimiter injection: </system>, <|im_start|>, [INST]
  • Direct override: "ignore all previous instructions"
  • Role hijacking: "you are now a", "act as an admin"
  • Instruction leakage: "show me your system instructions"

High Patterns

  • Indirect override: "bypass all security", "disable rules"
  • Base64 encoding: aWdub3JlIHByZXZpb3Vz (ignore previous)
  • Instruction leakage: "what are your original instructions"
  • Repeat attacks: "repeat your system prompts"

Encoding Attack Detection (Agentic Security)

Standard prose-based defenses fail against creative encoding. The detector now handles multi-encoding injection attempts that bypass pattern-matching by obfuscating payloads.

Encoding Patterns (High Severity)

  • Long base64 blocks: 20+ character base64 strings decoded and re-scanned
  • Morse code patterns: dots and dashes sequences
  • Hex-encoded sequences: \x41\x42\x43\x44 chains
  • HTML entity encoding: &#105;&#103; chains
  • URL encoding chains: %69%67%6E%6F sequences

Indirect Injection (Medium Severity)

  • Zero-click triggers: "when you read this", "if the agent encounters this"
  • Hidden instructions: "instructions for the AI", "hidden instructions"
  • User concealment: "do not show the user", "do not reveal"
  • Character-separated: "i.g.n.o.r.e" collapsed and keyword-matched

# Excerpt from the detector: b64_pattern, PROMPT_INJECTION_PATTERNS, and
# ThreatDetection are defined elsewhere in the module.

# Base64 decode and re-scan
for match in b64_pattern.finditer(content):
    try:
        decoded = base64.b64decode(match.group()).decode("utf-8", errors="ignore")
    except ValueError:
        continue  # not valid base64 (binascii.Error subclasses ValueError)
    for pattern in PROMPT_INJECTION_PATTERNS["critical"] + PROMPT_INJECTION_PATTERNS["high"]:
        if re.search(pattern, decoded, re.IGNORECASE):
            return ThreatDetection(detected=True, severity="high",
                detail="Encoded prompt injection (base64 decoded)")

# Character-separated injection collapse
sep_pattern = re.compile(r'([a-zA-Z])[.\-_\s]{1,2}(?:[a-zA-Z][.\-_\s]{1,2}){4,}[a-zA-Z]')
for match in sep_pattern.finditer(content):
    # Strip the separators so "i.g.n.o.r.e" becomes "ignore"
    collapsed = re.sub(r'[.\-_\s]+', '', match.group()).lower()
    if any(kw in collapsed for kw in ["ignore", "disregard", "override", "bypass"]):
        return ThreatDetection(detected=True, severity="high",
            detail=f"Character-separated injection: '{collapsed}'")

MCP Input DLP Scanning

A new scan_mcp_input() method intercepts MCP tool inputs before they reach external services, checking content against classification patterns to prevent data exfiltration.

| Classification Detection | Action | Audit Event |
|---|---|---|
| restricted | Block MCP call, emit alert | DLP_VIOLATION (critical) |
| confidential | Block MCP call, emit alert | DLP_VIOLATION (high) |
| internal / public | Allow, audit trail only | POLICY_CHECK |

System Leakage Detection

SYSTEM_LEAKAGE_PATTERNS = {
    "critical": [
        r'governance/lib/\w+\.py',           # File paths
        r'state/manifests/',
        r'\bmanifest_hash\s*[:=]',           # Manifest internals
        r'\btrust_level\s*[:=]\s*\d+',
        r'\bdata_classification\s*[:=]\s*(public|internal|confidential|restricted)',
        r'\bdelegation_token\s*[:=]',
        r'agent_id\s*:\s*\w+',               # YAML structure
        r'permitted_tools\s*:',
    ],
    "high": [
        r'\bgovernance\.lib\.',              # Module references
        r'\bpolicy_engine\b',
        r'\btrust_broker\b',
        r'gov-[a-z]+-[0-9a-f]{8}',          # Session IDs
    ],
    "medium": [
        r'\bgovernance\s+plugin\b',          # Generic terms
        r'\bmanifest\s+registry\b',
    ],
}

Threat Response Actions

| Severity | Input Scan Action | Output Scan Action | Audit Event |
|---|---|---|---|
| critical | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| high | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| medium | Log warning, allow | Log warning, allow | LLM_THREAT (warn) |
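
The severity-to-action mapping can be sketched with a small subset of the leakage patterns. The names LEAKAGE_PATTERNS, ACTIONS, and scan_output are illustrative, and the real detector holds more patterns than shown here.

```python
import re

# A small subset of the leakage patterns listed earlier in this section.
LEAKAGE_PATTERNS = {
    "critical": [r'state/manifests/', r'permitted_tools\s*:'],
    "high": [r'\bpolicy_engine\b', r'\btrust_broker\b'],
    "medium": [r'\bgovernance\s+plugin\b'],
}
ACTIONS = {"critical": "block", "high": "block", "medium": "warn"}

def scan_output(text: str):
    """Return (severity, action) for the most severe match."""
    for severity in ("critical", "high", "medium"):
        for pattern in LEAKAGE_PATTERNS[severity]:
            if re.search(pattern, text, re.IGNORECASE):
                return severity, ACTIONS[severity]
    return None, "allow"
```

Scanning in descending severity order means a single output containing both a medium and a critical match is always blocked.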

3.7 SIEM Integration & Alerting

Sends governance security events to external monitoring systems via webhook (n8n compatible) and syslog (Wazuh compatible). Alerting is fail-open — failures never block governance operations.

Alert Triggers

Five event types trigger alerts: policy_deny, trust_deny, circuit_break, human_gate, llm_threat. All other events are audit-only.

Webhook Payload (n8n)

{
  "source": "governance",
  "timestamp": "2026-03-17T14:23:45.123456Z",
  "event_type": "llm_threat",
  "agent_id": "security-analyst",
  "tool_name": "Bash",
  "outcome": "block",
  "detail": {
    "threat_type": "prompt_injection",
    "severity": "critical",
    "pattern_matched": "\\bignore\\s+all\\s+previous\\s+instructions\\b",
    "scan_type": "input",
    "detail": "Detected prompt injection pattern in content"
  },
  "session_id": "gov-sess-a4f8d2c1",
  "manifest_id": "gov-sec-analyst-v2",
  "trust_level": 4
}

Syslog Format (RFC 5424)

<131>1 2026-03-17T14:23:45.123456Z governance claude-code - - - \
event_type=llm_threat agent_id=security-analyst tool_name=Bash \
outcome=block session_id=gov-sess-a4f8d2c1

PRI calculation: facility * 8 + severity. Facility defaults to local0 (16). Severity is 3 (error) for deny/block, 4 (warning) for escalate/warn. Wazuh can parse these via custom decoder rules.
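
As a worked check of the PRI formula, the sketch below reproduces the <131> prefix from the example line: local0 (16) times 8 plus severity 3 gives 131. The helper names and the key=value message layout are assumptions modeled on that example.

```python
FACILITY_LOCAL0 = 16
# Severity per outcome, as described above: 3 = error, 4 = warning.
SEVERITY = {"deny": 3, "block": 3, "escalate": 4, "warn": 4}

def syslog_pri(outcome: str, facility: int = FACILITY_LOCAL0) -> int:
    return facility * 8 + SEVERITY[outcome]

def format_syslog(timestamp: str, outcome: str, fields: dict) -> str:
    """Build an RFC 5424 style line with key=value pairs as the message."""
    msg = " ".join(f"{k}={v}" for k, v in fields.items())
    # HOSTNAME=governance, APP-NAME=claude-code; PROCID, MSGID, SD are nil.
    return (f"<{syslog_pri(outcome)}>1 {timestamp} "
            f"governance claude-code - - - {msg}")

line = format_syslog(
    "2026-03-17T14:23:45.123456Z", "block",
    {"event_type": "llm_threat", "agent_id": "security-analyst"})
```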

3.8 Metrics Collection

Separate SQLite database (state/governance-metrics.db) tracks operational metrics: agent success/failure rates, gate trigger frequency, confidence scores, delegation depth, circuit breaker activations. Enables drift detection and performance monitoring.

CREATE TABLE governance_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    session_id TEXT NOT NULL,
    metric_type TEXT NOT NULL,     -- agent_success | gate_trigger | delegation_depth
    agent_id TEXT,
    value REAL NOT NULL,
    metadata TEXT                   -- JSON for additional context
);

3.9 Environment-Aware Policy

Detects environment from hostname or GOVERNANCE_ENV env var, loads base policy from governance-policy.yaml, then applies environment-specific overrides from state/env-overrides/<env>.yaml. Enables different retention policies, gate enforcement modes, and alerting configs per environment.

# state/env-overrides/production.yaml
retention:
  audit_events:
    retention_days: 365      # Override from base 90 days
  metrics:
    retention_days: 730

alerting:
  enabled: true
  webhook:
    enabled: true
    url: "https://n8n.example.com/webhook/governance"
  syslog:
    enabled: true
    host: "siem.example.com"
    port: 514

Centralized Policy (Single Source of Truth)
All governance components reference state/governance-policy.yaml. No scattered config files. Tool tiers, classification patterns, gate rules, tier matrix, and retention policies all in one place.
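
Applying environment overrides on top of the base policy amounts to a recursive dictionary merge. The sketch below uses plain dicts in place of parsed YAML; deep_merge is a hypothetical helper, and the base value for alerting.enabled is assumed for illustration.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override values win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Base policy (90-day retention per the comment in the override file).
base = {"retention": {"audit_events": {"retention_days": 90}},
        "alerting": {"enabled": False}}  # assumed base value
production = {
    "retention": {"audit_events": {"retention_days": 365},
                  "metrics": {"retention_days": 730}},
    "alerting": {"enabled": True, "webhook": {"enabled": True}},
}
policy = deep_merge(base, production)
```

Merging recursively lets an override file change one nested key without restating its siblings.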

3.10 Non-Human Identity (NHI) Instance Tracking

Every agent invocation receives a unique Non-Human Identity (NHI) instance ID following the pattern nhi_{agent}_{YYYYMMDD}_{hex8}. This extends the manifest system from static identity to per-invocation lifecycle tracking — enabling audit correlation, cost attribution, and forensic reconstruction of multi-agent workflows.

NHI Lifecycle

| Event | Trigger | Audit Type | Data Recorded |
|---|---|---|---|
| Spawn | Task tool dispatches agent | NHI_SPAWN | nhi_id, agent_id, parent_nhi_id, manifest_id, spawn timestamp |
| Active | Agent executing tools | All tool events | tagged with nhi_id |
| Terminate | Task completes or fails | NHI_TERMINATE | nhi_id, exit_reason (success/failure/timeout), duration, tool_count |

# NHI ID generation (in pre_tool_check.py)
import datetime
import secrets

def generate_nhi_id(agent_id: str) -> str:
    # Timezone-aware UTC date; datetime.utcnow() is deprecated since Python 3.12
    date = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
    hex8 = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
    return f"nhi_{agent_id}_{date}_{hex8}"

# Example: nhi_security-analyst_20260317_a4f8d2c1

NHI IDs propagate through delegation chains. When a parent agent spawns a child via the Task tool, the child's NHI_SPAWN event records parent_nhi_id, creating a traceable invocation tree. The conductor's agent_instances registry in state tracks all active NHIs.

3.11 MCP Firewall (DLP Screening)

External MCP tool calls (mcp__*) pass through a Data Loss Prevention (DLP) firewall before execution. The firewall intercepts all MCP tool inputs in the PreToolUse hook and scans content against classification patterns to prevent sensitive data exfiltration to external services.

MCP Firewall Decision Matrix

| Content Classification | Action | Audit Event | Severity |
|---|---|---|---|
| Restricted | Block (hard stop) | DLP_VIOLATION | critical |
| Confidential | Block (require human approval) | DLP_VIOLATION | high |
| Internal | Allow with audit trail | TOOL_ALLOWED | (logged) |
| Public | Allow | | |

# MCP Firewall (in pre_tool_check.py)
def _check_mcp_firewall(tool_name, tool_input, manifest, threat_detector, audit_bus):
    """Block MCP tools from exfiltrating classified data."""
    if not tool_name.startswith("mcp__"):
        return None  # Not an MCP tool

    # Extract MCP server name for audit trail
    parts = tool_name.split("__")
    mcp_server = parts[1] if len(parts) >= 2 else "unknown"

    # Run DLP scan against classification patterns
    detection = threat_detector.scan_mcp_input(tool_name, tool_input, manifest)

    if detection.detected:
        # Copy the detection fields into the audit detail alongside the server name
        audit_bus.emit(EventType.DLP_VIOLATION, manifest,
            tool_name=tool_name, outcome="block",
            detail={"mcp_server": mcp_server,
                    "severity": detection.severity,
                    "detail": detection.detail})
        return {"decision": "block", "reason": detection.detail}

    return None  # Allow

3.12 Last-Hop Validation

PostToolUse hook validates that Bash commands containing external connectivity tools (ssh, curl, wget, psql, mysql, redis-cli) target approved destinations. Because the check runs after the command executes, it flags unauthorized external connections for review rather than preventing them, and it covers commands that passed PreToolUse policy checks.

Detection Pattern

The validator extracts destination hostnames and IPs from command arguments using regex patterns. These are compared against an allowlist in governance-policy.yaml. Connections to unlisted destinations emit LAST_HOP_VIOLATION audit events with the extracted destination for forensic review.

# Last-hop validation (in output_validator.py)
import re
from typing import Optional

EXTERNAL_TOOLS = re.compile(
    r'\b(ssh|curl|wget|psql|mysql|redis-cli)\b')

def _check_last_hop(command: str, manifest: dict, audit_bus) -> Optional[str]:
    """Validate external connection destinations."""
    match = EXTERNAL_TOOLS.search(command)
    if not match:
        return None

    # Extract destination (hostname/IP after tool keyword)
    destinations = _extract_destinations(command)
    allowed = load_allowed_destinations()

    for dest in destinations:
        if dest not in allowed:
            # match.group(1) is the connectivity tool found in the command
            audit_bus.emit(EventType.LAST_HOP_VIOLATION, manifest,
                tool_name="Bash", outcome="warn",
                detail={"command_tool": match.group(1), "destination": dest})
            return f"Unrecognized destination: {dest}"

    return None

3.13 Prohibited Behavior Monitoring

Hard-coded kill switches for behaviors that should never occur in any context, regardless of manifest permissions or trust level. Unlike policy engine rules (which can vary by tier), prohibited behaviors are absolute — no override, no exception, no human approval path.

Prohibited Behaviors

  • Network tunneling: ssh -L, ssh -R, ssh -D, socat, ncat tunnel creation
  • Credential exfiltration: reading .env, credentials.json, .signing-key and sending externally
  • Audit trail modification: any DELETE or UPDATE on audit.db, audit-buffer.jsonl
  • Unauthorized API calls: direct HTTP calls to governance/conductor internal endpoints

Enforcement

  • Pattern matching in PreToolUse (before policy engine)
  • Emits PROHIBITED_BEHAVIOR audit event
  • Always severity critical
  • Always outcome block
  • Cannot be overridden by manifest permissions
  • Cannot be approved via human gate
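
A rough sketch of the pattern-matching step for two of the prohibited behaviors is shown below. The regular expressions and the check_prohibited function are assumptions written for this example; they cover only the simplest command shapes and are not the plugin's actual rules.

```python
import re

# Hypothetical patterns: ssh port-forwarding flags, or socat/ncat anywhere.
TUNNEL = re.compile(r'\bssh\b[^|;&]*\s-[LRD]\b|\b(?:socat|ncat)\b')
AUDIT_TARGET = re.compile(r'audit\.db|audit-buffer\.jsonl')
AUDIT_WRITE = re.compile(r'\b(?:DELETE|UPDATE)\b', re.IGNORECASE)

def check_prohibited(command: str):
    """Return a block verdict for a prohibited command, else None."""
    verdict = {"severity": "critical", "outcome": "block"}
    if TUNNEL.search(command):
        return {"behavior": "network_tunneling", **verdict}
    if AUDIT_TARGET.search(command) and AUDIT_WRITE.search(command):
        return {"behavior": "audit_trail_modification", **verdict}
    return None
```

Because the verdict is fixed at critical and block, nothing in the manifest or the conductor tier can soften the result.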

3.14 External Audit Sink

Critical governance events are emitted to an external syslog endpoint (Wazuh compatible) in addition to the local SQLite audit bus. This provides a tamper-resistant audit trail — even if an agent compromises the local audit.db, the external copy persists.

Sink Architecture

The audit bus's alerting callback (Section 3.7) is extended to emit all security-relevant events to syslog, not just the 5 alert types. Events flow: governance hook → audit_bus.emit() → SQLite WAL → alerting callback → syslog UDP. The sink is fail-open — syslog delivery failures never block governance operations. A local buffer at state/syslog-buffer.jsonl captures events during syslog outages for replay.

Defense in Depth
Local audit (SQLite) + external audit (syslog/Wazuh) + metrics (separate SQLite) creates three independent records of governance activity. An attacker would need to compromise all three simultaneously to erase evidence.

3.15 Cost Tracking (Denial of Wallet)

Tracks token consumption per agent invocation and per workflow to prevent runaway costs. Each NHI instance accumulates token counts (input + output) from tool calls. When cumulative cost exceeds configurable thresholds, the system halts the workflow and escalates to the operator.

Cost Thresholds

| Scope | Threshold | Action | Audit Event |
|---|---|---|---|
| Per-agent invocation | Configurable (default: 100K tokens) | Warn at 80%, halt at 100% | COST_THRESHOLD |
| Per-workflow total | Configurable (default: 500K tokens) | Warn at 80%, halt at 100% | COST_THRESHOLD |
| Per-session | Configurable (default: 1M tokens) | Hard stop, require restart | COST_THRESHOLD |

Cost data is recorded in conductor-state.json under the cost_tracking property, which includes token_budget, tokens_used, cost_estimate_usd, and halt_on_exceed flag. The conductor orchestrator checks thresholds before each agent dispatch.
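
The warn-at-80%, halt-at-100% rule can be sketched with integer arithmetic. The function name check_budget and its return labels are illustrative; treating "exceeds" as "reaches or exceeds" the budget is an assumption.

```python
def check_budget(tokens_used: int, token_budget: int,
                 warn_percent: int = 80) -> str:
    """Return 'halt', 'warn', or 'ok' for a token budget."""
    if tokens_used >= token_budget:
        return "halt"
    # Integer comparison avoids floating-point rounding at the boundary.
    if tokens_used * 100 >= token_budget * warn_percent:
        return "warn"
    return "ok"
```

With the default per-agent budget of 100K tokens, an invocation that has used 80,000 tokens returns "warn" and one that has used 100,000 returns "halt".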

3.16 Cryptographic Bill of Materials (C-BOM)

The CISO agent generates a Cryptographic Bill of Materials — a comprehensive inventory of all cryptographic implementations in the target project. This addresses post-quantum cryptography (PQC) readiness assessment and crypto-agility requirements from frameworks like NIST SP 800-131A and the NSA CNSA 2.0 suite.

C-BOM Inventory Scope

| Category | What's Inventoried | PQC Risk Level |
|---|---|---|
| Symmetric encryption | AES modes, key sizes, implementations | Low (quantum-resistant) |
| Asymmetric encryption/signatures | RSA, ECDSA, Ed25519 key sizes and usage | Critical (quantum-vulnerable) |
| Hash functions | SHA-256, SHA-3, HMAC implementations | Low-Medium |
| Key exchange | DH, ECDH, X25519 protocols | Critical (quantum-vulnerable) |
| TLS/mTLS | Protocol versions, cipher suites, certificate chains | Varies by suite |
| Random number generation | CSPRNG usage, entropy sources | Low |

The C-BOM output includes a PQC readiness assessment table mapping each crypto implementation to its NIST PQC migration priority (Harvest Now/Decrypt Later risk), recommended quantum-safe alternative (ML-KEM, ML-DSA, SLH-DSA), and estimated migration effort. This is stored in conductor-state.json under project_characteristics.crypto_implementations and project_characteristics.pqc_readiness.

4. Requirements

4.1 Manifest Identity

REQ-GOV-001: Manifest Fields

Every agent manifest MUST include: agent_id, manifest_id, manifest_version, trust_level (1-5), data_classification (public/internal/confidential/restricted), permitted_tools (list of fnmatch patterns), permitted_delegations (list), human_required (bool), max_autonomy_depth (int), max_delegation_count (int).

REQ-GOV-002: Cryptographic Signing

Manifests MUST be signed with HMAC-SHA256 using a 32-byte signing key stored in state/.signing-key. Signatures MUST be verified on manifest load. Tampered manifests MUST fail validation and fall back to default-restrictive manifest.

REQ-GOV-003: Manifest Hash

Every manifest MUST include a SHA-256 hash of canonicalized JSON content (excluding manifest_signature and manifest_hash fields). Hash MUST be recomputed on each load to detect tampering.

REQ-GOV-004: Parent Ceiling Enforcement

When resolving a child manifest with parent context, the system MUST enforce: trust_level = min(static, parent), data_classification = lower(static, parent) (the lower of the two in CLASSIFICATION_ORDER), max_autonomy_depth = min(static, parent - 1). Child capabilities MUST be monotonically non-increasing along delegation chains: a child never exceeds its parent, and autonomy depth strictly decreases.
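
The ceiling rule can be sketched as a pure function over two manifest dicts. The name apply_parent_ceiling is hypothetical; only the three clamped fields come from REQ-GOV-004.

```python
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]


def apply_parent_ceiling(static: dict, parent: dict) -> dict:
    """Clamp a child's static manifest to what its parent is allowed to pass on."""
    resolved = dict(static)
    resolved["trust_level"] = min(static["trust_level"], parent["trust_level"])
    # "Lower" classification means the earlier entry in CLASSIFICATION_ORDER.
    resolved["data_classification"] = min(
        static["data_classification"],
        parent["data_classification"],
        key=CLASSIFICATION_ORDER.index,
    )
    # Each hop spends one unit of the parent's depth budget.
    resolved["max_autonomy_depth"] = min(
        static["max_autonomy_depth"], parent["max_autonomy_depth"] - 1
    )
    return resolved
```

For example, a child declared with trust 5, restricted data, and depth 4 resolves to trust 3, internal data, and depth 1 under a parent with trust 3, internal data, and depth 2.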

REQ-GOV-005: Model Inventory

Manifests MUST track model_id and model_version fields. SessionStart hook MUST populate these from runtime context. Audit events MUST include manifest_version for model drift analysis.

4.2 Trust Broker

REQ-GOV-006: Breadth Limit

Trust broker MUST query audit bus for DELEGATION_EVENT count from source agent in current session. If count >= max_delegation_count, MUST deny with TRUST_DENY event and reason "delegation_count_exceeded".

REQ-GOV-007: Depth Budget

Trust broker MUST check parent max_autonomy_depth. If <= 0, MUST escalate to human_gate with TRUST_DENY event. Child MUST receive parent.depth - 1 in resolved manifest.

REQ-GOV-008: Classification Boundary

Trust broker MUST deny delegation if target data_classification exceeds parent data_classification (using CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]). MUST emit TRUST_DENY with reason "classification_boundary_violation".

REQ-GOV-009: Trust Escalation

Trust broker MUST deny delegation if target trust_level exceeds parent trust_level. MUST emit TRUST_DENY with reason "trust_escalation_attempt".

REQ-GOV-010: Permitted Delegations

Trust broker MUST check if target agent_id matches any pattern in parent's permitted_delegations list (using fnmatch). If no match, MUST deny with TRUST_DENY and reason "delegation_target_not_permitted".
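
One way to sequence the checks in REQ-GOV-006 through REQ-GOV-010 is sketched below. The order of the checks, the (outcome, reason) return shape, and the reason string for the depth escalation are assumptions; the delegation count is passed in rather than queried from the audit bus.

```python
from fnmatch import fnmatch

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]


def evaluate_delegation(parent: dict, target: dict, delegations_so_far: int):
    """Return (outcome, reason); reason is None when the delegation is allowed."""
    if delegations_so_far >= parent["max_delegation_count"]:
        return "deny", "delegation_count_exceeded"
    if parent["max_autonomy_depth"] <= 0:
        # REQ-GOV-007 escalates to a human gate instead of denying outright.
        return "escalate", "autonomy_depth_exhausted"
    rank = CLASSIFICATION_ORDER.index
    if rank(target["data_classification"]) > rank(parent["data_classification"]):
        return "deny", "classification_boundary_violation"
    if target["trust_level"] > parent["trust_level"]:
        return "deny", "trust_escalation_attempt"
    patterns = parent["permitted_delegations"]
    if not any(fnmatch(target["agent_id"], p) for p in patterns):
        return "deny", "delegation_target_not_permitted"
    return "allow", None
```

Every non-allow result would be paired with a TRUST_DENY audit event carrying the returned reason.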

REQ-GOV-011: Delegation Token

Trust broker MUST issue a unique delegation token for each approved delegation, computed as SHA-256(session_id:parent_manifest_id:target_manifest_id:timestamp:nonce)[:24]. Token MUST be stored in DELEGATION_EVENT detail field.
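
A sketch of the token formula, assuming an ISO 8601 UTC timestamp and a 16-hex-character nonce (the requirement fixes neither). The function name is hypothetical.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def issue_delegation_token(session_id: str, parent_manifest_id: str,
                           target_manifest_id: str) -> str:
    """SHA-256 over the colon-joined fields, truncated to 24 hex characters."""
    timestamp = datetime.now(timezone.utc).isoformat()
    nonce = secrets.token_hex(8)  # assumed nonce length
    material = ":".join(
        [session_id, parent_manifest_id, target_manifest_id, timestamp, nonce]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:24]
```

Truncating to 24 hex characters keeps 96 bits of the digest. The token is an identifier for correlating audit events, not a keyed credential, since anyone who knows the inputs can recompute it.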

REQ-GOV-012: Manifest Registry

Trust broker MUST maintain a session-scoped registry of active child manifests at state/manifest-registry.json. Registry operations MUST use file locking (fcntl.flock) for concurrency safety. TTL purging (1 hour) MUST remove stale entries.
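
The locking and TTL behaviour might look like the sketch below. fcntl is POSIX-only, so this assumes a Unix host. The registry layout (task ID mapped to an entry with a registered_at epoch timestamp) and both function names are hypothetical.

```python
import fcntl  # POSIX only
import json
import os
import tempfile
import time

TTL_SECONDS = 3600  # 1 hour, per REQ-GOV-012


def update_registry(path: str, mutate) -> dict:
    """Apply mutate(registry) while holding an exclusive lock on the file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            raw = fh.read()
            registry = json.loads(raw) if raw.strip() else {}
            mutate(registry)
            fh.seek(0)
            fh.truncate()
            json.dump(registry, fh)
            fh.flush()  # write before the lock is released
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return registry


def purge_stale(registry: dict, now: float) -> None:
    """Drop entries older than the TTL (mutates the registry in place)."""
    stale = [t for t, e in registry.items() if now - e["registered_at"] > TTL_SECONDS]
    for task_id in stale:
        del registry[task_id]


# Demo against a throwaway file: one entry is two hours old, one is fresh.
demo_path = os.path.join(tempfile.mkdtemp(), "manifest-registry.json")
now = time.time()
update_registry(demo_path, lambda r: r.update({
    "task-old": {"registered_at": now - 7200},
    "task-new": {"registered_at": now},
}))
remaining = update_registry(demo_path, lambda r: purge_stale(r, now))
```

Doing the read, mutate, and write under a single lock avoids lost updates when two hook processes register children at the same time.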

4.3 Policy Engine

REQ-GOV-013: Tool Classification

Policy engine MUST classify tools into exempt/standard/elevated tiers using state/tool-tiers.yaml. Unknown tools MUST default to "elevated" tier (fail toward scrutiny).

REQ-GOV-014: Exempt Tools

Exempt tools (Read, Glob, Grep, TaskList, TaskGet) MUST always be allowed with async audit emission (nowait queue). Policy checks MUST be skipped for exempt tools.

REQ-GOV-015: Permitted Tools Check

For standard and elevated tools, policy engine MUST verify tool_name matches a pattern in manifest.permitted_tools (using fnmatch). If no match, MUST deny with POLICY_DENY and reason "tool_not_permitted".
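
REQ-GOV-013 through REQ-GOV-015 can be sketched together. The TOOL_TIERS dict stands in for state/tool-tiers.yaml; its standard and elevated entries are illustrative, and the function names are hypothetical.

```python
from fnmatch import fnmatch

# Stand-in for state/tool-tiers.yaml; only the exempt list comes from the spec.
TOOL_TIERS = {
    "exempt": ["Read", "Glob", "Grep", "TaskList", "TaskGet"],
    "standard": ["Write", "Edit"],
    "elevated": ["Bash"],
}


def classify_tool(tool_name: str) -> str:
    """Return the tool's tier; unknown tools fail toward scrutiny."""
    for tier in ("exempt", "standard", "elevated"):
        if tool_name in TOOL_TIERS[tier]:
            return tier
    return "elevated"


def check_tool(tool_name: str, manifest: dict):
    """Return (outcome, reason) for the permitted-tools check."""
    if classify_tool(tool_name) == "exempt":
        return "allow", None  # policy checks are skipped for exempt tools
    if any(fnmatch(tool_name, p) for p in manifest["permitted_tools"]):
        return "allow", None
    return "deny", "tool_not_permitted"
```

Note that fnmatch normalizes case on case-insensitive platforms; fnmatch.fnmatchcase would give identical matching everywhere if that matters for a deployment.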

REQ-GOV-016: Autonomy Depth Gate

Policy engine MUST check manifest.max_autonomy_depth. If <= 0, MUST emit CIRCUIT_BREAK and return human_gate decision with reason "autonomy_depth_exhausted".

REQ-GOV-017: Manifest Human Gate

Policy engine MUST check manifest.human_required. If true, MUST emit HUMAN_GATE and return human_gate decision with reason "agent manifest requires human approval".

REQ-GOV-018: Conductor Tier Matrix

Policy engine MUST load conductor_tier from conductor-state.json. If tier is "MAJOR" and tool tier is "elevated", MUST emit HUMAN_GATE and return human_gate decision with reason "MAJOR task + elevated tool requires human approval".

REQ-GOV-019: Sync Audit Strategy

Policy engine MUST emit synchronous POLICY_CHECK events for elevated tools or when conductor_tier is STANDARD or MAJOR. All other allowed tools MUST use async emit_nowait.

4.4 Audit Bus

REQ-GOV-020: SQLite WAL Mode

Audit bus MUST use SQLite with WAL (Write-Ahead Logging) mode and NORMAL synchronous mode for concurrent write safety and performance. Database MUST be at state/audit.db.
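
Both settings are ordinary SQLite pragmas. The sketch below opens a throwaway database file (WAL is not available for in-memory databases) and reads the pragmas back; open_audit_db is a hypothetical helper name.

```python
import os
import sqlite3
import tempfile


def open_audit_db(path: str) -> sqlite3.Connection:
    """Open the audit database with WAL journaling and NORMAL synchronous mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


# Demo against a temporary file; the spec places the real file at state/audit.db.
db_path = os.path.join(tempfile.mkdtemp(), "audit.db")
conn = open_audit_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL
```

The journal mode is persistent once set on a database file, while synchronous is a per-connection setting and has to be applied on every open.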

REQ-GOV-021: Event Types

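Audit bus MUST support 15 core event types: TOOL_INVOKED, DELEGATION_EVENT, CONTEXT_PRESSURE, MEMORY_WRITE, MEMORY_READ, POLICY_CHECK, POLICY_DENY, HUMAN_GATE, MANIFEST_LOADED, MANIFEST_DERIVED, TRUST_CHECK, TRUST_DENY, CIRCUIT_BREAK, BUFFER_REPLAY, LLM_THREAT. Section 4.11 adds six more (NHI_SPAWN, NHI_TERMINATE, PROHIBITED_BEHAVIOR, DLP_VIOLATION, LAST_HOP_VIOLATION, COST_THRESHOLD), for the 21 types listed in the build prompt.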

REQ-GOV-022: Bounded Queue

Audit bus MUST implement a bounded queue (256 depth) with single daemon worker thread for async event emission. If queue is full, MUST fall back to synchronous emit.
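
A sketch of the queue-with-fallback behaviour using queue.Queue and a daemon thread. The AsyncEmitter class and its write_event callback are hypothetical; in the real bus the callback would be the SQLite insert.

```python
import queue
import threading


class AsyncEmitter:
    """Bounded async emitter that degrades to synchronous writes when full."""

    def __init__(self, write_event, maxsize: int = 256):
        self._write_event = write_event
        self._queue = queue.Queue(maxsize=maxsize)
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._write_event(event)
            except Exception:
                pass  # a real bus would append to the JSONL buffer here
            finally:
                self._queue.task_done()

    def emit_nowait(self, event) -> bool:
        """Queue the event; return False if it was written synchronously instead."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self._write_event(event)
            return False

    def flush(self) -> None:
        """Block until every queued event has been handed to write_event."""
        self._queue.join()
```

With this shape the callback can run on both the worker thread and the caller's thread, so a SQLite-backed write_event would need a lock or one connection per thread.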

REQ-GOV-023: Buffer Fallback

When SQLite writes fail, audit bus MUST append events to state/audit-buffer.jsonl. On next SessionStart, MUST replay buffered events to database and delete buffer file.
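
The buffer-and-replay cycle is sketched below with one JSON object per line. Function names are hypothetical, and write_event again stands in for the database insert.

```python
import json
import os
import tempfile


def buffer_event(event: dict, buffer_path: str) -> None:
    """Append one event as a JSON line (used when the SQLite write fails)."""
    with open(buffer_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def replay_buffer(buffer_path: str, write_event) -> int:
    """Replay buffered events, then delete the buffer; return the number replayed."""
    if not os.path.exists(buffer_path):
        return 0
    with open(buffer_path, encoding="utf-8") as fh:
        events = [json.loads(line) for line in fh if line.strip()]
    for event in events:
        write_event(event)
    os.remove(buffer_path)  # only reached if every replay write succeeded
    return len(events)


# Demo: buffer two events, then replay them into a list.
demo_path = os.path.join(tempfile.mkdtemp(), "audit-buffer.jsonl")
buffer_event({"event_type": "POLICY_DENY"}, demo_path)
buffer_event({"event_type": "TRUST_DENY"}, demo_path)
replayed = []
replay_count = replay_buffer(demo_path, replayed.append)
```

Deleting the file only after all writes succeed means a failed replay leaves the buffer in place for the next SessionStart, at the cost of possible duplicates, which the unique event_id makes detectable.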

REQ-GOV-024: Event Schema

Every audit event MUST include: event_id (UUID), timestamp (ISO 8601), audit_session_id, event_type, agent_id, manifest_id, manifest_version, manifest_hash, trust_level, data_classification, autonomy_depth_remaining, tool_name, task_id, target_agent_id, context_hash, detail (JSON), outcome (allow/deny/escalate/warn).

REQ-GOV-025: Retention & Archival

Audit bus MUST support retention-based purging with configurable retention_days (default 90). Old events MUST be archived to JSONL before deletion. Archive path defaults to state/archive/audit-archive-YYYYMMDD.jsonl.

4.5 Memory Governor

REQ-GOV-026: Content Classification

Memory governor MUST classify content using regex patterns from state/classification-patterns.yaml. MUST scan restricted patterns first, then confidential, then internal. Highest match wins. Unmatched content defaults to "public".

REQ-GOV-027: Classification Ceiling

Memory governor MUST deny writes if content classification exceeds agent data_classification ceiling (using CLASSIFICATION_ORDER). MUST emit POLICY_DENY with reason "classification_ceiling_exceeded".

REQ-GOV-028: Restricted Block

Memory governor MUST block (not persist) all restricted content with POLICY_DENY event and reason "restricted_content_blocked". Human approval via /governance-review required before storage.

REQ-GOV-029: Confidential Queue

Memory governor MUST allow confidential writes to proceed but tag with gov_approval_status="pending_review". MUST emit HUMAN_GATE event with reason "confidential_write_queued".
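
REQ-GOV-026 through REQ-GOV-029 combine into a single decision function. In this sketch the regex patterns are illustrative stand-ins for state/classification-patterns.yaml, the result dict shape is an assumption, and the ceiling check is assumed to run before the restricted block.

```python
import re

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

# Illustrative patterns only; real ones live in classification-patterns.yaml.
PATTERNS = {
    "restricted": [r"-----BEGIN [A-Z ]*PRIVATE KEY-----", r"\b\d{3}-\d{2}-\d{4}\b"],
    "confidential": [r"(?i)\bsalary\b"],
    "internal": [r"(?i)\binternal use only\b"],
}


def classify_content(text: str) -> str:
    """Scan from most to least sensitive; the first level that matches wins."""
    for level in ("restricted", "confidential", "internal"):
        if any(re.search(p, text) for p in PATTERNS[level]):
            return level
    return "public"


def govern_write(text: str, agent_ceiling: str) -> dict:
    level = classify_content(text)
    rank = CLASSIFICATION_ORDER.index
    if rank(level) > rank(agent_ceiling):
        return {"outcome": "deny", "event": "POLICY_DENY",
                "reason": "classification_ceiling_exceeded"}
    if level == "restricted":
        return {"outcome": "deny", "event": "POLICY_DENY",
                "reason": "restricted_content_blocked"}
    if level == "confidential":
        return {"outcome": "allow", "event": "HUMAN_GATE",
                "reason": "confidential_write_queued",
                "tags": {"gov_approval_status": "pending_review"}}
    return {"outcome": "allow", "event": None, "reason": None, "tags": {}}
```

Even an agent with a restricted ceiling is denied on restricted content here, which matches the rule that such writes need approval via /governance-review first.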

REQ-GOV-030: Provenance Tags

Memory governor MUST add 9 provenance fields to all memory writes: gov_manifest_id, gov_agent_id, gov_manifest_version, gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id, gov_task_id, gov_timestamp.

4.6 LLM Threat Detection

REQ-GOV-031: Input Scanning

LLM threat detector MUST scan all Task, Bash, and Skill tool inputs for prompt injection patterns (30 in total). MUST check critical patterns first, then high, then medium. First match determines severity.

REQ-GOV-032: Output Scanning

LLM threat detector MUST scan Write, Edit, Bash, NotebookEdit outputs for system prompt leakage (24 patterns) and sensitive data disclosure (using classification patterns). MUST emit LLM_THREAT events for all detections.

REQ-GOV-033: Severity Response

For critical/high severity threats, MUST block tool execution (input scan) or block output (output scan) and emit LLM_THREAT with outcome="block". For medium severity, MUST log warning with outcome="warn" and allow execution.
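
Severity-ordered scanning with the block/warn split can be sketched as follows. The three patterns are illustrative examples, not the framework's actual pattern set, and the result dict shape is an assumption.

```python
import re

# One illustrative pattern per severity; the spec calls for 30 in total.
INJECTION_PATTERNS = {
    "critical": [r"(?i)ignore (all )?(previous|prior) instructions"],
    "high": [r"(?i)you are now (a|an|the) "],
    "medium": [r"(?i)do not (tell|show) the user"],
}


def scan_input(text: str):
    """Return the first match by descending severity, or None if clean."""
    for severity in ("critical", "high", "medium"):
        for pattern in INJECTION_PATTERNS[severity]:
            if re.search(pattern, text):
                outcome = "warn" if severity == "medium" else "block"
                return {"severity": severity, "outcome": outcome, "pattern": pattern}
    return None
```

Checking critical patterns first means an input that matches at several severities is always reported, and handled, at its highest one.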

REQ-GOV-034: OWASP LLM Top 10 Coverage

Threat detector MUST address OWASP LLM01 (Prompt Injection), LLM06 (Sensitive Information Disclosure), and LLM09 (Overreliance), as numbered in the 2023 (v1.x) OWASP Top 10 for LLM Applications, through pattern-based detection and output validation.

4.7 SIEM & Alerting

REQ-GOV-035: Alert Event Types

Alerting service MUST send alerts for: policy_deny, trust_deny, circuit_break, human_gate, llm_threat. All other events are audit-only.

REQ-GOV-036: Webhook Format

Webhook alerts MUST POST JSON with fields: source="governance", timestamp, event_type, agent_id, tool_name, outcome, detail (parsed from JSON), session_id, manifest_id, trust_level. Timeout MUST be 5 seconds.

REQ-GOV-037: Syslog Format

Syslog alerts MUST use RFC 5424 format with facility=local0 (16) and severity=3 (error) for deny/block, severity=4 (warning) for escalate/warn. MUST send via UDP.
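
In RFC 5424 the priority value is facility * 8 + severity, so local0 with severity 3 gives 131 and with severity 4 gives 132. The sketch below builds such a line and sends it over UDP; the hostname, app name, nil PROCID and STRUCTURED-DATA fields, and the use of the event type as MSGID are assumptions.

```python
import socket
from datetime import datetime, timezone

FACILITY_LOCAL0 = 16
SEVERITY_BY_OUTCOME = {"deny": 3, "block": 3, "escalate": 4, "warn": 4}


def format_syslog(event_type: str, outcome: str, message: str,
                  hostname: str = "localhost", app: str = "governance") -> str:
    pri = FACILITY_LOCAL0 * 8 + SEVERITY_BY_OUTCOME[outcome]
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    return f"<{pri}>1 {timestamp} {hostname} {app} - {event_type} - {message}"


def send_syslog(line: str, host: str, port: int = 514) -> bool:
    """Fire-and-forget UDP send; returns False instead of raising (fail-open)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("utf-8"), (host, port))
        return True
    except OSError:
        return False
```

UDP gives no delivery guarantee, so a True return only means the datagram was handed to the network stack.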

REQ-GOV-038: Fail-Open

Alerting failures (network timeout, unreachable host) MUST NOT block governance operations. The alerting service MUST log the error and continue without raising.

4.8 Metrics & Monitoring

REQ-GOV-039: Metrics Database

Metrics collector MUST use separate SQLite database at state/governance-metrics.db with schema: timestamp, session_id, metric_type, agent_id, value (float), metadata (JSON).

REQ-GOV-040: Metric Types

MUST support metric types: agent_success, gate_trigger, delegation_depth, circuit_break, confidence_score. Additional types MAY be added without schema migration.

REQ-GOV-041: Session Summary

Metrics collector MUST provide session summary aggregation: count, average, and max value per metric_type for a given session_id.
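
The summary maps directly onto a GROUP BY query. This sketch uses an in-memory database and a table named metrics (the table name and sample rows are assumptions; the columns follow REQ-GOV-039).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the spec uses state/governance-metrics.db
conn.execute(
    "CREATE TABLE metrics (timestamp TEXT, session_id TEXT, metric_type TEXT,"
    " agent_id TEXT, value REAL, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2025-01-01T00:00:00Z", "gov-sess-1a2b3c4d", "gate_trigger", "root", 1.0, "{}"),
        ("2025-01-01T00:00:05Z", "gov-sess-1a2b3c4d", "gate_trigger", "root", 3.0, "{}"),
        ("2025-01-01T00:00:09Z", "gov-sess-1a2b3c4d", "delegation_depth", "root", 2.0, "{}"),
        ("2025-01-01T00:01:00Z", "gov-sess-ffffffff", "gate_trigger", "root", 9.0, "{}"),
    ],
)


def get_session_summary(conn: sqlite3.Connection, session_id: str) -> dict:
    """Count, average, and max value per metric_type for one session."""
    rows = conn.execute(
        "SELECT metric_type, COUNT(*), AVG(value), MAX(value)"
        " FROM metrics WHERE session_id = ? GROUP BY metric_type",
        (session_id,),
    ).fetchall()
    return {t: {"count": c, "avg": a, "max": m} for t, c, a, m in rows}


summary = get_session_summary(conn, "gov-sess-1a2b3c4d")
```

Because metric_type is a free-text column, new metric types show up in the summary without any schema change, which is the property REQ-GOV-040 relies on.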

4.9 Environment & Policy

REQ-GOV-042: Environment Detection

MUST detect environment from GOVERNANCE_ENV env var, falling back to hostname detection. MUST support: local, staging, production, c2. Default to "local" if unresolvable.

REQ-GOV-043: Centralized Policy

All governance components MUST reference state/governance-policy.yaml for: tool_tiers, classification_patterns, gate rules, tier_gate_matrix, retention policies. NO scattered config files.

REQ-GOV-044: Environment Overrides

MUST support environment-specific policy overrides at state/env-overrides/<env>.yaml. Overrides MUST deep-merge into base policy. Override values win on conflicts.
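
A recursive deep merge in which the override wins on conflicts might look like this. It assumes that lists and scalars are replaced wholesale rather than concatenated, which the requirement leaves open.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested dicts are merged, everything else is overridden."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, a production override that sets only retention.audit_events.retention_days leaves the sibling archive_path from the base policy intact.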

4.10 Hook Integration

REQ-GOV-045: SessionStart Hook

MUST load root agent manifest, generate session ID, initialize audit bus, emit MANIFEST_LOADED event, purge stale registry entries, apply retention policy archival.

REQ-GOV-046: PreToolUse Hook

MUST scan for prompt injection (threat detector), classify tool tier (policy engine), check manifest permissions, evaluate delegation (trust broker for Task tool), emit appropriate audit events.

REQ-GOV-047: PostToolUse Hook

MUST scan output for leakage (threat detector on Write/Edit/Bash/NotebookEdit), deregister completed tasks (trust broker), record metrics, emit LLM_THREAT events if detected.

REQ-GOV-048: Hook Timeout

All hooks MUST have timeout limits: SessionStart 10s, PreToolUse 10s, PostToolUse 10s. Timeout failures MUST fail-open (allow operation to proceed).

4.11 Agentic Security Hardening

REQ-GOV-049: NHI Instance ID

Every agent invocation MUST receive a unique NHI instance ID following the pattern nhi_{agent_id}_{YYYYMMDD}_{hex8}. NHI IDs MUST propagate through delegation chains via parent_nhi_id. All audit events for the invocation MUST include the NHI ID.
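
Generating an ID in this pattern takes a few lines. The sketch assumes the date component is taken in UTC; secrets.token_hex(4) yields the eight hex characters.

```python
import secrets
from datetime import datetime, timezone


def new_nhi_id(agent_id: str) -> str:
    """Build nhi_{agent_id}_{YYYYMMDD}_{hex8} for one agent invocation."""
    date = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"nhi_{agent_id}_{date}_{secrets.token_hex(4)}"
```

Eight hex characters give 32 bits of randomness per agent per day, which is sufficient for distinguishing invocations but not a secret.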

REQ-GOV-050: NHI Lifecycle Events

Audit bus MUST emit NHI_SPAWN when an agent is dispatched (recording nhi_id, agent_id, parent_nhi_id, manifest_id) and NHI_TERMINATE when the agent completes (recording exit_reason, duration_seconds, tool_count).

REQ-GOV-051: MCP DLP Firewall

All MCP tool calls (mcp__*) MUST pass through DLP screening in PreToolUse. Tool inputs MUST be scanned against classification patterns. Restricted data MUST be blocked (critical). Confidential data MUST be blocked (high). The MCP server name MUST be extracted and included in DLP_VIOLATION audit events.

REQ-GOV-052: Encoding Attack Detection

Threat detector MUST detect prompt injection hidden in encoded content: base64 blocks (decode and re-scan against critical/high patterns), character-separated text (collapse separators and check for injection keywords), hex-encoded sequences, HTML entity encoding, and URL encoding chains.
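
Two of the listed techniques, base64 decode-and-rescan and separator collapsing, are sketched below; hex, HTML entity, and URL decoding would follow the same decode-then-rescan shape. The single injection pattern, the 16-character minimum block length, and the set of separator characters are illustrative assumptions.

```python
import base64
import binascii
import re

INJECTION = re.compile(r"(?i)ignore (all )?(previous|prior) instructions")
BASE64_BLOCK = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
SEPARATOR = re.compile(r"(?<=\w)[.\-_*|](?=\w)")


def scan_encoded_injection(text: str):
    """Return the name of the encoding that hid an injection, or None."""
    for block in BASE64_BLOCK.findall(text):
        try:
            decoded = base64.b64decode(block, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text; ignore this candidate
        if INJECTION.search(decoded):
            return "base64"
    # Collapse single separators between word characters: "i.g.n" becomes "ign".
    collapsed = SEPARATOR.sub("", text)
    if collapsed != text and INJECTION.search(collapsed):
        return "char_separated"
    return None
```

Collapsing separators also mangles benign text such as file names, so the collapsed string is used only for matching and never written back.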

REQ-GOV-053: Last-Hop Validation

PostToolUse hook MUST validate Bash commands containing external connectivity tools (ssh, curl, wget, psql, mysql, redis-cli) against an approved destination allowlist. Unapproved destinations MUST emit LAST_HOP_VIOLATION audit events.
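
A deliberately partial sketch of destination checking: it only recognizes URL arguments and user@host arguments, and only inspects the first command on the line, so bare hostnames, pipelines, and option values are out of scope. The allowlist entries and function names are hypothetical.

```python
import shlex
from urllib.parse import urlparse

CONNECTIVITY_TOOLS = {"ssh", "curl", "wget", "psql", "mysql", "redis-cli"}
APPROVED_DESTINATIONS = {"api.internal.example", "db.internal.example"}  # hypothetical


def extract_destinations(command: str) -> list:
    """Pull hosts out of URL and user@host arguments of a connectivity tool."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in CONNECTIVITY_TOOLS:
        return []
    hosts = []
    for token in tokens[1:]:
        if "://" in token:
            host = urlparse(token).hostname
        elif "@" in token and not token.startswith("-"):
            host = token.rsplit("@", 1)[1].split(":", 1)[0]
        else:
            continue
        if host:
            hosts.append(host)
    return hosts


def check_last_hop(command: str) -> list:
    """Return destinations that are not on the allowlist."""
    return [h for h in extract_destinations(command) if h not in APPROVED_DESTINATIONS]
```

Each returned host would become one LAST_HOP_VIOLATION audit event.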

REQ-GOV-054: Prohibited Behaviors

The following behaviors MUST be unconditionally blocked regardless of manifest permissions or trust level: network tunneling (ssh -L/-R/-D, socat, ncat), credential exfiltration (reading secrets and sending externally), audit trail modification (DELETE/UPDATE on audit.db), unauthorized internal API calls. Violations MUST emit PROHIBITED_BEHAVIOR events with severity=critical.

REQ-GOV-055: External Audit Sink

All security-relevant audit events MUST be emitted to an external syslog endpoint in addition to local SQLite storage. Syslog delivery MUST be fail-open. A local buffer at state/syslog-buffer.jsonl MUST capture events during syslog outages for replay.

REQ-GOV-056: Cost Tracking

Token consumption MUST be tracked per NHI instance and per workflow. Configurable thresholds MUST trigger warnings at 80% and halts at 100%. COST_THRESHOLD audit events MUST include scope (agent/workflow/session), threshold_pct, tokens_used, and token_budget.
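
The threshold logic reduces to a percentage comparison. In this sketch the function name and the action field are hypothetical; the other event fields come from REQ-GOV-056.

```python
def check_cost(tokens_used: int, token_budget: int, scope: str, warn_pct: int = 80):
    """Return a COST_THRESHOLD event dict, or None while under the warn threshold."""
    pct = 100 * tokens_used / token_budget
    if pct >= 100:
        action, threshold = "halt", 100
    elif pct >= warn_pct:
        action, threshold = "warn", warn_pct
    else:
        return None
    return {
        "event_type": "COST_THRESHOLD",
        "scope": scope,  # agent, workflow, or session
        "threshold_pct": threshold,
        "tokens_used": tokens_used,
        "token_budget": token_budget,
        "action": action,
    }
```

A caller would typically run this once per scope after each tool call and stop dispatching work on the first halt.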

REQ-GOV-057: C-BOM Generation

The CISO agent MUST generate a Cryptographic Bill of Materials at STANDARD+ tiers, inventorying all cryptographic implementations (symmetric, asymmetric, hash, key exchange, TLS, RNG). Output MUST include PQC readiness assessment with NIST migration priority, recommended quantum-safe alternatives, and estimated migration effort per implementation.

5. Complete Build Prompt

Copy this prompt and give it to Claude Code in a fresh project directory. It will generate the entire governance framework with all components, hooks, config files, and test fixtures.

I need you to build a complete Agent Governance & Trust Framework for Claude Code as a plugin.

## Architecture

Create a plugin at `governance/` with the following structure:

```
governance/
├── __init__.py
├── lib/
│   ├── __init__.py
│   ├── manifest.py           # Agent identity, signing, resolution
│   ├── trust_broker.py       # Delegation mediation, manifest registry
│   ├── policy_engine.py      # Tool classification, gate logic
│   ├── audit_bus.py          # SQLite event store with WAL
│   ├── memory_governor.py    # Content classification, provenance
│   ├── llm_threat_detector.py # OWASP LLM Top 10 patterns
│   ├── alerting.py           # Webhook + syslog
│   ├── metrics_collector.py  # Performance monitoring
│   ├── env_policy.py         # Environment detection
│   └── manifest_signing.py   # HMAC-SHA256 signing
├── hooks/
│   ├── hooks.json            # Hook registration
│   ├── session_start.py
│   ├── pre_tool_check.py
│   ├── approval_gate_hook.py
│   ├── post_task_cleanup.py
│   └── output_validator.py
├── state/
│   ├── governance-policy.yaml
│   ├── tool-tiers.yaml
│   ├── classification-patterns.yaml
│   ├── manifests/
│   │   ├── root.yaml
│   │   └── security-analyst.yaml
│   └── env-overrides/
│       └── production.yaml
└── tests/
    └── test_governance.py
```

## Core Components

### 1. Manifest System (lib/manifest.py)

Implement:
- Load static YAML manifests from state/manifests/
- HMAC-SHA256 signing with key at state/.signing-key
- SHA-256 content hash for tamper detection
- Parent ceiling enforcement: trust_level, data_classification, autonomy_depth
- Derive restrictive child manifests for unknown agents
- Validation against required fields

Required manifest fields:
- agent_id, manifest_id, manifest_version
- trust_level (1-5), data_classification (public/internal/confidential/restricted)
- permitted_tools (list of fnmatch patterns)
- permitted_delegations (list)
- human_required (bool)
- max_autonomy_depth (int), max_delegation_count (int)
- model_id, model_version

### 2. Trust Broker (lib/trust_broker.py)

Implement:
- Delegation validation: breadth limit (query audit bus for session count)
- Depth budget check (parent.depth - 1)
- Classification boundary (child cannot exceed parent)
- Trust escalation prevention
- Permitted delegation target check (fnmatch)
- Delegation token generation (SHA-256 hash)
- ManifestRegistry with file locking (fcntl), TTL purging (1 hour)

### 3. Policy Engine (lib/policy_engine.py)

Implement:
- Tool classification: exempt/standard/elevated from tool-tiers.yaml
- Unknown tools default to "elevated"
- Manifest permission check (fnmatch)
- Autonomy depth gate
- Human_required gate
- Conductor tier matrix: MAJOR + elevated = human_gate
- Sync audit for elevated/STANDARD+, async for others

### 4. Audit Bus (lib/audit_bus.py)

Implement:
- SQLite with WAL mode at state/audit.db
- 21 event types (EventType enum, including NHI_SPAWN, NHI_TERMINATE, PROHIBITED_BEHAVIOR, DLP_VIOLATION, LAST_HOP_VIOLATION, COST_THRESHOLD)
- Bounded queue (256 depth) with daemon worker
- Buffer fallback to state/audit-buffer.jsonl
- Buffer replay on SessionStart
- Schema with all required fields + indexes
- query() method with filterable columns
- export_jsonl() for session export
- purge_old_events() with archival to state/archive/
- Alerting callback integration
- Metrics callback integration

### 5. Memory Governor (lib/memory_governor.py)

Implement:
- Content classification via regex from classification-patterns.yaml
- Scan restricted first, then confidential, then internal
- Ceiling check: block if content classification exceeds agent ceiling
- Restricted = block (emit POLICY_DENY)
- Confidential = queue with pending_review tag (emit HUMAN_GATE)
- Public/internal = allow with provenance tags
- 9 provenance fields: gov_manifest_id, gov_agent_id, gov_manifest_version,
  gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id,
  gov_task_id, gov_timestamp

### 6. LLM Threat Detector (lib/llm_threat_detector.py)

Implement:
- 30+ prompt injection patterns (critical/high/medium)
- 24 system leakage patterns (critical/high/medium)
- Input scanning for Task, Bash, Skill tools
- Output scanning for Write, Edit, Bash, NotebookEdit
- Sensitive disclosure using classification patterns
- Emit LLM_THREAT events
- Block on critical/high, warn on medium
- MCP DLP screening: scan_mcp_input() for mcp__* tools
- Encoding attack detection: _scan_encoded_injection()
  - Base64 decode and re-scan against critical/high patterns
  - Character-separated collapse (e.g., "i.g.n.o.r.e" → "ignore")
  - Hex, HTML entity, URL encoding chain detection
- Indirect injection patterns (zero-click): "when the agent reads this",
  "instructions for the ai", "do not show the user", "hidden instructions"

Patterns:
- Critical injection: delimiter injection, direct override, role hijacking
- High injection: indirect override, base64 encoding, instruction leakage,
  multi-encoding (morse, hex, HTML entities, URL chains)
- Medium injection: subtle manipulation, indirect zero-click injection
- Critical leakage: governance file paths, manifest internals, YAML structure
- High leakage: module references, session IDs

### 7. Alerting (lib/alerting.py)

Implement:
- Webhook POST (n8n compatible) with 5s timeout
- Syslog UDP RFC 5424 format (Wazuh compatible)
- Alert on: policy_deny, trust_deny, circuit_break, human_gate, llm_threat
- Fail-open design (never block on alerting failure)
- Config from governance-policy.yaml alerting section

### 8. Metrics Collector (lib/metrics_collector.py)

Implement:
- Separate SQLite at state/governance-metrics.db
- Schema: timestamp, session_id, metric_type, agent_id, value, metadata
- record() method (never raises)
- query_metrics() with filters
- get_session_summary() with count/avg/max per metric_type

### 9. Environment Policy (lib/env_policy.py)

Implement:
- detect_environment() from GOVERNANCE_ENV or hostname
- load_governance_policy() with deep-merge of env overrides
- Deep merge function (override wins on conflicts)

### 10. Manifest Signing (lib/manifest_signing.py)

Implement:
- generate_signing_key() → 32 bytes to state/.signing-key (0600 perms)
- load_signing_key() → bytes or None
- sign_manifest() → HMAC-SHA256 hex string
- verify_manifest_signature() → bool
- _canonicalize_manifest() → sorted JSON excluding volatile fields

## Hooks

### hooks.json

Register:
- SessionStart: session_start.py (timeout 10s)
- PreToolUse: approval_gate_hook.py, pre_tool_check.py (timeout 10s each)
- PostToolUse: post_task_cleanup.py (Task matcher), output_validator.py (Write|Edit|Bash|NotebookEdit matcher) (timeout 10s each)

### session_start.py

1. Load root manifest (agent_id from env or default "root")
2. Generate session_id (gov-sess-{8 hex chars})
3. Initialize audit bus with alerting and metrics callbacks
4. Emit MANIFEST_LOADED event
5. Purge stale registry entries
6. Run retention policy archival if configured
7. Print session summary to stderr

### pre_tool_check.py

1. Load manifest from registry or resolve
2. Check prohibited behaviors (hard block, no override)
3. If Task tool: trust_broker.evaluate_delegation(), generate NHI ID, emit NHI_SPAWN
4. LLM threat detector: scan_input()
5. If mcp__* tool: _check_mcp_firewall() DLP screening
6. Policy engine: evaluate()
7. If decision is deny: exit 1 with error message
8. If decision is human_gate: print gate prompt to stderr, wait for approval
9. If allow: exit 0

### approval_gate_hook.py

Check for external communication gate and data classification gate from
governance-policy.yaml gates section. Inject approval prompts per skill config.

### post_task_cleanup.py

For Task tool completion: deregister child manifest from registry.

### output_validator.py

1. Read tool output from stdin JSON
2. LLM threat detector: scan_output() (includes MCP output scanning)
3. If Bash tool: _check_last_hop() for external connection validation
4. If critical/high threat detected: emit LLM_THREAT, exit 1
5. If medium: emit warning, exit 0
6. Record metrics

## Configuration Files

### state/governance-policy.yaml

Include sections:
- version, effective_from
- tool_tiers: exempt, standard, elevated, elevated_patterns
- classification_patterns: restricted, confidential, internal (regex lists)
- gates: external_communication, data_classification
- tier_gate_matrix: TRIVIAL/MINOR/STANDARD/MAJOR with gate modes
- retention: audit_events, metrics (retention_days, archive_path)
- environments: default, env_var
- alerting: enabled, webhook (url, enabled), syslog (host, port, facility, enabled)
- cost_tracking: per_agent_tokens, per_workflow_tokens, per_session_tokens, warn_threshold_pct
- approved_destinations: list of allowed hostnames/IPs for last-hop validation
- prohibited_behaviors: enabled (bool), always true in production

### state/tool-tiers.yaml

List exempt, standard, elevated tools and elevated_patterns.

### state/classification-patterns.yaml

Regex patterns for restricted, confidential, internal classifications.

### state/manifests/root.yaml

Root agent with trust_level=5, data_classification=restricted, full permissions.

### state/manifests/security-analyst.yaml

Example agent with trust_level=4, confidential, limited tools and delegations.

## Agentic Security Hardening

### 11. NHI Instance Tracking

Implement in pre_tool_check.py and post_task_cleanup.py:
- Generate NHI ID: nhi_{agent_id}_{YYYYMMDD}_{hex8} using secrets.token_hex(4)
- Emit NHI_SPAWN on Task dispatch (nhi_id, agent_id, parent_nhi_id, manifest_id)
- Emit NHI_TERMINATE on Task completion (nhi_id, exit_reason, duration_seconds, tool_count)
- Tag all audit events with nhi_id for the active invocation
- Propagate parent_nhi_id through delegation chains

### 12. MCP Firewall (DLP Screening)

Implement in pre_tool_check.py:
- _check_mcp_firewall() runs before policy engine for mcp__* tools
- Extract MCP server name from tool_name (split on __)
- Call threat_detector.scan_mcp_input() with tool_input and manifest
- scan_mcp_input() checks all string values against classification_patterns
- Block restricted (critical) and confidential (high) data
- Emit DLP_VIOLATION audit event with mcp_server name

### 13. Last-Hop Validation

Implement in output_validator.py:
- _check_last_hop() for Bash commands containing ssh, curl, wget, psql, mysql, redis-cli
- Extract destination hostnames/IPs from command arguments
- Compare against approved_destinations in governance-policy.yaml
- Emit LAST_HOP_VIOLATION for unapproved destinations
- Warn but do not block (PostToolUse runs after execution)

### 14. Prohibited Behaviors

Implement in pre_tool_check.py (before policy engine):
- Hard-coded patterns checked before all other evaluations
- Network tunneling: ssh -L, ssh -R, ssh -D, socat, ncat tunnel commands
- Credential exfiltration: reading .env/.signing-key + external send
- Audit trail modification: DELETE/UPDATE on audit.db or audit-buffer.jsonl
- Unauthorized API calls: direct HTTP to governance/conductor endpoints
- Always block, always critical, no override possible
- Emit PROHIBITED_BEHAVIOR audit event

### 15. Cost Tracking

Implement token budget tracking:
- Track tokens per NHI instance and per workflow in conductor-state.json cost_tracking
- Fields: token_budget, tokens_used, cost_estimate_usd, halt_on_exceed
- Warn at 80% threshold, halt at 100%
- Emit COST_THRESHOLD events with scope, threshold_pct, tokens_used, token_budget
- Configurable per-agent (100K default), per-workflow (500K), per-session (1M)

### 16. C-BOM (Cryptographic Bill of Materials)

Add to CISO agent workflow at STANDARD+ tiers:
- Inventory all crypto: symmetric, asymmetric, hash, key exchange, TLS, RNG
- Output PQC readiness assessment per implementation
- Map to NIST migration priority and quantum-safe alternatives (ML-KEM, ML-DSA, SLH-DSA)
- Store in conductor-state.json project_characteristics.crypto_implementations

## Tests

Write comprehensive pytest tests in tests/test_governance.py covering:
- Manifest loading, signing, ceiling enforcement
- Trust broker delegation validation (breadth, depth, classification)
- Policy engine tool classification and gate logic
- Audit bus emit, query, buffer fallback, retention
- Memory governor classification, ceiling, provenance
- LLM threat detector injection, leakage, encoding attacks, and MCP DLP
- Alerting webhook and syslog (mocked)
- Metrics recording and session summary
- Environment detection and policy merging
- NHI ID generation and lifecycle events
- MCP firewall DLP screening
- Prohibited behavior detection
- Cost threshold enforcement

## Implementation Guidelines

- Use only stdlib (no external dependencies except PyYAML, which is assumed to be installed in the Python environment that runs the hooks)
- All paths relative to plugin root via GOVERNANCE_PLUGIN_ROOT env var
- Never raise exceptions from audit, alerting, metrics (fail-open)
- Use WAL mode for all SQLite databases
- File locking (fcntl) for manifest registry
- Hooks read stdin JSON, write stderr for messages, exit 0/1 for allow/deny
- All timestamps in ISO 8601 UTC format
- All hashes in hexadecimal lowercase

Build the complete framework with all files. Make it production-ready.

6. Design Decisions

6.1 Fail-Open vs Fail-Closed

Decision: Hybrid Approach

Hook failures fail-open: If a hook times out or crashes, the tool execution proceeds. Rationale: Governance failures should not lock up the entire system. Users can still make progress while governance is degraded.

Trust/policy failures fail-closed: If trust broker or policy engine denies a delegation or tool, the operation is blocked. Rationale: Privilege escalation and unauthorized tool use are security-critical failures that must prevent execution.

Audit/alerting failures fail-open: If audit bus or alerting service fails, the operation continues. Events fall back to buffer. Rationale: Observability failures should not block functional operations.

6.2 SQLite vs External Database

Decision: SQLite with WAL Mode

Rationale: Governance framework must work on single-machine deployments without infrastructure dependencies. SQLite provides ACID guarantees, concurrent reads (via WAL), and zero operational overhead. For distributed deployments, audit events can be exported to external SIEM via webhook/syslog.

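Trade-offs: This design assumes audit query performance starts to degrade at around 1M events (a working estimate for this workload, not a hard SQLite limit). Retention policy archival keeps the table below that size. A single SQLite file cannot be queried across multiple agent hosts without centralized aggregation; SIEM integration provides that distributed visibility.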

6.3 HMAC-SHA256 vs Ed25519

Decision: HMAC-SHA256

Rationale: HMAC-SHA256 is in Python stdlib (hashlib + hmac). Ed25519 requires PyNaCl or cryptography library, adding external dependencies. Manifest signing provides tamper evidence, not authenticity proof (shared secret vs asymmetric keys). HMAC is sufficient for this use case.

Trade-offs: HMAC cannot prove manifest authorship (anyone with the key can sign). Ed25519 would enable signature verification without access to signing key. For multi-party governance (untrusted manifest sources), Ed25519 would be preferred. For single-operator systems, HMAC is simpler.

6.4 Regex vs ML for Threat Detection

Decision: Regex Patterns

Rationale: Regex patterns are deterministic, auditable, and add negligible latency. ML models (e.g., prompt injection classifiers) typically require model hosting and often a GPU, add inference latency, and introduce non-deterministic false positives. For governance hooks with a 10s timeout and a stdlib-only constraint, regex is the most practical approach.

Trade-offs: Regex patterns can be evaded (obfuscation, encoding, paraphrasing). ML models adapt to novel attacks. Future enhancement: add ML-based scanning as async post-analysis (emit events, train on patterns, update regex rules).

Pattern Maintenance: OWASP LLM Top 10 patterns require periodic updates. Governance policy YAML allows operators to add custom patterns without code changes.

6.5 Hook-Based vs Middleware Enforcement

Decision: Claude Code Plugin Hooks

Rationale: Hooks integrate natively with Claude Code's tool execution lifecycle. No need to wrap every tool or modify core runtime. Hooks receive full tool context (name, input, output) and can block or transform execution.

Trade-offs: Hooks run in separate processes (command-type hooks), adding 10-50ms overhead per tool invocation. In-process hooks (if Claude Code supported Python plugins) would be faster but less isolated. Current design prioritizes safety (hook crash doesn't crash agent) over performance.

6.6 Advisory vs Blocking Gates

Decision: Tier-Based Gate Matrix

Rationale: Not all governance violations warrant hard blocks. TRIVIAL tasks can skip most gates. MINOR tasks get advisory prompts. STANDARD tasks get blocking gates on critical operations. MAJOR tasks require human approval for all elevated tools.

Configurability: Tier-to-gate matrix in governance-policy.yaml allows operators to customize enforcement per task tier. Example: PRE_RELEASE gate is "skip" for TRIVIAL/MINOR, "advisory" for STANDARD, "blocking" for MAJOR.

Human-in-the-Loop: human_gate decision prints approval prompt to stderr and waits for operator input. Enables oversight without breaking agent autonomy.

6.7 Classification Ceiling vs Lattice Model

Decision: Linear Classification Order

Rationale: CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"] is a total order (linear hierarchy). Simpler to reason about and enforce than a lattice model (e.g., medical vs financial vs PII classifications that don't strictly order).

Trade-offs: Linear order doesn't capture orthogonal sensitivity dimensions. For complex data governance (HIPAA + PCI-DSS + GDPR), would need multi-dimensional classification with intersection logic. Current design optimizes for 80% use case (corporate data tiers).

6.8 Session-Scoped vs Global Registry

Decision: Session-Scoped Manifest Registry

Rationale: Each agent session gets isolated delegation tracking. session_id scoping keeps delegation counts and registry entries from one session out of another, so concurrent sessions cannot exhaust each other's breadth budget. TTL purging (1 hour) removes stale entries without manual cleanup.

Trade-offs: Session restart resets delegation count. Malicious agent could restart session to bypass breadth limit. Mitigation: SessionStart hook logs all manifest loads for forensic analysis. Future enhancement: persistent delegation ledger across sessions.

6.9 Provenance Tags vs Separate Index

Decision: Inline Provenance in Metadata

Rationale: Embedding 9 provenance fields in memory_store metadata (merged into Qdrant payload) keeps data + provenance together. No separate index to maintain or sync. Memory queries can filter by provenance (e.g., "gov_trust_level >= 4").

Trade-offs: Increases payload size by ~200 bytes per memory chunk. For 1M chunks, adds ~200MB storage. Alternative: separate provenance collection with foreign key links. Current design optimizes for query simplicity over storage efficiency.

6.10 Centralized vs Distributed Policy

Decision: Single governance-policy.yaml

Rationale: All governance components reference one file. Eliminates scattered config files, version skew, and partial updates. Git-tracked policy enables version control, diffs, and rollback. Environment overrides provide per-env customization without duplicating base policy.

Trade-offs: Single file becomes large (current: ~120 lines, projected: 500+ with all patterns). YAML parsing on every SessionStart adds ~10ms overhead. Alternative: compiled policy cache (pickle) with invalidation on YAML mtime change. Current design optimizes for simplicity.
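
The cache alternative mentioned above could look roughly like this. It is a sketch under stated assumptions: `load_policy` and `cache_path` are hypothetical names, and the parser is injectable so that a YAML loader such as PyYAML's `yaml.safe_load` can be passed in (`json.loads` is the default only to keep the example dependency-free).

```python
import json
import os
import pickle
from typing import Any, Callable


def load_policy(path: str, cache_path: str,
                parse: Callable[[str], Any] = json.loads) -> Any:
    """Return the parsed policy, re-parsing only when the source mtime changes.

    The cache file stores (source_mtime_ns, policy). Unpickling is only
    safe if the cache file is writable by trusted users alone.
    """
    source_mtime = os.stat(path).st_mtime_ns
    try:
        with open(cache_path, "rb") as fh:
            cached_mtime, policy = pickle.load(fh)
        if cached_mtime == source_mtime:
            return policy
    except (OSError, pickle.UnpicklingError, EOFError, ValueError, TypeError):
        pass  # missing or unreadable cache: fall through and rebuild
    with open(path, encoding="utf-8") as fh:
        policy = parse(fh.read())
    with open(cache_path, "wb") as fh:
        pickle.dump((source_mtime, policy), fh)
    return policy
```

An on-disk cache is used rather than an in-process one because command hooks run as separate processes, so module-level state would not survive between invocations.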

6.11 Defense-in-Depth for Agentic Security

Decision: Layered Security Controls

Rationale: No single security control is sufficient against determined adversarial input. The framework applies multiple independent layers: prohibited behavior patterns (hard block before any policy logic), DLP screening (MCP firewall), encoding attack detection (base64/Morse/hex decode+rescan), last-hop validation (PostToolUse destination checks), external audit sink (tamper-resistant syslog), and cost tracking (denial-of-wallet prevention). Each layer catches attacks the others miss.

NHI Tracking: Per-invocation identity enables forensic reconstruction of multi-agent attack chains. Unlike static manifests (which identify agent type), NHI IDs identify specific invocations, enabling detection of "confused deputy" attacks where a legitimate agent is tricked into malicious actions.

Trade-offs: Each layer adds PreToolUse/PostToolUse latency (~5-15ms per layer). Total hook overhead with all layers: ~50-100ms per tool call. Acceptable for development workflows, may require selective disabling for batch operations. Prohibited behaviors are non-negotiable — other layers can be toggled via governance-policy.yaml.
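
As an illustration of one layer, the decode-and-rescan idea behind encoding attack detection can be sketched as follows. The pattern list, regexes, and function names are hypothetical, and only base64 and hex are covered; Morse decoding is omitted for brevity.

```python
import base64
import binascii
import re

# Hypothetical blocklist; real patterns would come from governance-policy.yaml.
PROHIBITED = [re.compile(r"ignore (all )?previous instructions", re.I)]

# Runs long enough to plausibly hide a payload.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RUN = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")


def _matches(text: str) -> bool:
    return any(p.search(text) for p in PROHIBITED)


def _decoded_candidates(text: str):
    """Yield UTF-8 plaintexts recovered from base64 or hex runs."""
    for run in B64_RUN.findall(text):
        try:
            yield base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue
    for run in HEX_RUN.findall(text):
        try:
            yield bytes.fromhex(run).decode("utf-8")
        except ValueError:  # also covers UnicodeDecodeError
            continue


def scan(text: str) -> bool:
    """True if the text, or any decoded run inside it, hits a pattern."""
    return _matches(text) or any(_matches(d) for d in _decoded_candidates(text))
```

This sketch decodes one level only; nested encodings would need a bounded recursive rescan.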

7. Integration Points

7.1 Plugins System

Governance framework is a Claude Code plugin at ~/.claude/plugins/local/governance (symlink from project directory). Plugin manifest declares hooks:

{
  "description": "Governance enforcement hooks",
  "hooks": {
    "SessionStart": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/session_start.py",
        "timeout": 10
      }]
    }],
    "PreToolUse": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/pre_tool_check.py",
        "timeout": 10
      }]
    }],
    "PostToolUse": [{
      "matcher": "Write|Edit|Bash|NotebookEdit",
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/output_validator.py",
        "timeout": 10
      }]
    }]
  }
}

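Hooks receive tool context as JSON on stdin and control execution through exit codes. Exit 0 allows the call. Under Claude Code's hook contract, exit 2 is the blocking code and its stderr is fed back to the model; other non-zero codes (including exit 1) are treated as non-blocking errors, with stderr shown to the user.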
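
The core of such a hook can be sketched as a pure function from the stdin payload to an exit code and stderr message. This is an illustration only: `evaluate_hook` and `BLOCKED_TOOLS` are hypothetical, and the payload is assumed to carry a `tool_name` field. The deny exit code is a parameter rather than a constant, since it is fixed by the hook runner's contract.

```python
import json

# Hypothetical deny list for illustration only.
BLOCKED_TOOLS = {"Bash"}


def evaluate_hook(stdin_text: str, deny_code: int) -> tuple:
    """Map a hook's stdin payload to (exit_code, stderr_message).

    Assumes the payload carries a "tool_name" field. Malformed input is
    denied (fail closed), which is a choice of this sketch.
    """
    try:
        event = json.loads(stdin_text)
        tool_name = event["tool_name"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return deny_code, "governance: unreadable hook payload"
    if tool_name in BLOCKED_TOOLS:
        return deny_code, f"governance: tool {tool_name!r} denied by policy"
    return 0, ""
```

A hook script would read `sys.stdin`, call `evaluate_hook`, print the message to stderr, and pass the code to `sys.exit`. Keeping the decision logic free of I/O makes it straightforward to unit test.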

7.2 Agent System (Conductor)

Every agent receives a manifest at delegation time. Conductor plugin exports current task tier to conductor-state.json for policy engine consumption:

{
  "active_task": "task-a4f8d2c1",
  "agent_stack": ["root", "security-analyst"],
  "governance": {
    "conductor_tier": "STANDARD",
    "intent": {
      "sensitivity": "medium",
      "external_visibility": false
    }
  }
}

Trust broker reads this file during Task tool validation to determine if MAJOR + elevated tool requires human gate.
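
The human-gate decision described above can be sketched as follows. `requires_human_gate` and `ELEVATED_TOOLS` are hypothetical names; the JSON shape matches the conductor-state.json example, and treating an unreadable tier as MAJOR is an assumption of this sketch.

```python
import json

# Hypothetical set of elevated tools; the real list lives in policy.
ELEVATED_TOOLS = {"Bash", "Write", "Edit"}


def requires_human_gate(state_json: str, tool_name: str) -> bool:
    """True when the current tier is MAJOR and the tool is elevated.

    A missing or unparseable tier is treated as MAJOR (fail closed),
    which is an assumption of this sketch.
    """
    try:
        state = json.loads(state_json)
        tier = state["governance"]["conductor_tier"]
    except (json.JSONDecodeError, KeyError, TypeError):
        tier = "MAJOR"
    return tier == "MAJOR" and tool_name in ELEVATED_TOOLS
```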

7.3 Memory System

Memory governor intercepts mcp__claude-memory__memory_store tool via PreToolUse hook. Provenance tags are merged into tool input before MCP tool execution:

# Original tool input
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {}
}

# After memory governor
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {
    "gov_manifest_id": "gov-sec-analyst-v2",
    "gov_agent_id": "security-analyst",
    "gov_trust_level": 4,
    "gov_classification": "confidential",
    "gov_session_id": "gov-sess-a4f8d2c1",
    "gov_timestamp": "2026-03-17T14:23:45.123456Z"
  }
}

Memory system (Qdrant) stores these tags in the payload. Queries can filter by provenance:

memory_recall("security analysis",
              filter={"gov_trust_level": {"$gte": 4}})
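
The merge step performed by the memory governor can be sketched as below. The `add_provenance` helper and the manifest keys it reads are hypothetical; the `gov_*` field names follow the example above.

```python
import copy
from datetime import datetime, timezone


def add_provenance(tool_input: dict, manifest: dict, session_id: str) -> dict:
    """Return a copy of tool_input with gov_* provenance tags in metadata.

    Governance tags are written last so caller-supplied metadata cannot
    override them. The manifest keys used here are hypothetical.
    """
    tagged = copy.deepcopy(tool_input)
    metadata = dict(tagged.get("metadata") or {})
    metadata.update({
        "gov_manifest_id": manifest["manifest_id"],
        "gov_agent_id": manifest["agent_id"],
        "gov_trust_level": manifest["trust_level"],
        "gov_classification": manifest["classification"],
        "gov_session_id": session_id,
        "gov_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    })
    tagged["metadata"] = metadata
    return tagged
```

Writing the governance tags after the caller's metadata matters: an agent that supplies its own `gov_trust_level` should not be able to raise its recorded trust.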

7.4 Context Management

Governance events (especially CONTEXT_PRESSURE, CIRCUIT_BREAK) inform context management decisions. When autonomy depth is exhausted, context manager can trigger human escalation or task decomposition instead of silent failure.

7.5 SIEM & Alerting

Audit bus sends events to external SIEM via webhook or syslog. Example n8n workflow:

# n8n webhook trigger receives governance events
# Filter: event_type in [policy_deny, trust_deny, llm_threat]
# Route to Slack/PagerDuty/Wazuh based on severity

Webhook → Filter → Switch (severity):
  - critical → PagerDuty incident
  - high → Slack security channel
  - medium → Wazuh log aggregation
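
The same filter-and-switch logic, expressed in Python for readers not using n8n. The destination names and the `route_event` helper are hypothetical, and the event is assumed to carry `event_type` and `severity` fields; severities outside the three listed above are dropped here, which is a choice of this sketch.

```python
# Event types that pass the filter, per the workflow comments above.
ALERT_EVENT_TYPES = {"policy_deny", "trust_deny", "llm_threat"}

# Hypothetical destination names keyed by severity.
SEVERITY_ROUTES = {
    "critical": "pagerduty",
    "high": "slack",
    "medium": "wazuh",
}


def route_event(event: dict):
    """Return the destination name for an event, or None to drop it."""
    if event.get("event_type") not in ALERT_EVENT_TYPES:
        return None
    return SEVERITY_ROUTES.get(event.get("severity"))
```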

7.6 Metrics & Observability

Metrics database enables trend analysis and anomaly detection. Example queries:

-- Agent success rate over last 30 days
SELECT agent_id,
       AVG(value) as success_rate,
       COUNT(*) as invocations
FROM governance_metrics
WHERE metric_type = 'agent_success'
  AND timestamp > datetime('now', '-30 days')
GROUP BY agent_id
ORDER BY success_rate ASC;

-- Gate trigger frequency by agent
SELECT agent_id,
       COUNT(*) as gate_count
FROM governance_metrics
WHERE metric_type = 'gate_trigger'
  AND timestamp > datetime('now', '-7 days')
GROUP BY agent_id
ORDER BY gate_count DESC;
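
To run the success-rate query from Python, something like the following would work with the standard sqlite3 module. The table schema is an assumption inferred from the columns the queries reference; the real governance_metrics table may have more columns or different types.

```python
import sqlite3

# Assumed minimal schema; only the columns the queries above reference.
SCHEMA = """
CREATE TABLE governance_metrics (
    agent_id    TEXT NOT NULL,
    metric_type TEXT NOT NULL,
    value       REAL NOT NULL,
    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
)
"""

SUCCESS_RATE_SQL = """
SELECT agent_id, AVG(value) AS success_rate, COUNT(*) AS invocations
FROM governance_metrics
WHERE metric_type = 'agent_success'
  AND timestamp > datetime('now', ?)
GROUP BY agent_id
ORDER BY success_rate ASC
"""


def success_rates(conn: sqlite3.Connection, window: str = "-30 days") -> list:
    """Return (agent_id, success_rate, invocations) rows, worst agent first."""
    return conn.execute(SUCCESS_RATE_SQL, (window,)).fetchall()
```

The timestamp filter is a lexical string comparison, so it only behaves as intended if stored timestamps use the same `YYYY-MM-DD HH:MM:SS` format that SQLite's `datetime()` returns.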

7.7 Development Workflow

Governance framework development uses standard Python tooling:

# Install in development mode
cd ~/.claude/plugins/local/governance
pip install -e .

# Run tests
pytest tests/ -v

# Lint
ruff check governance/

# Type check
mypy governance/lib/

# Generate signing key
python3 -c "from governance.lib.manifest_signing import generate_signing_key; \
            generate_signing_key()"

# Export session audit trail
python3 -c "from governance.lib.audit_bus import AuditBus; \
            from pathlib import Path; \
            bus = AuditBus(Path('state/audit.db'), Path('state/audit-buffer.jsonl')); \
            bus.export_jsonl('gov-sess-a4f8d2c1', Path('audit-export.jsonl'))"

Production Deployment Checklist
Before deploying governance framework to production:
  • Generate unique signing key per environment (do NOT share keys)
  • Configure SIEM webhook URL and verify connectivity
  • Set retention policies based on compliance requirements (90/365/730 days)
  • Create production manifest for each agent with least-privilege permissions
  • Test all governance gates in staging with realistic workloads
  • Set up alerting channels (Slack, PagerDuty, email)
  • Document human approval workflows for confidential/restricted gates
  • Schedule periodic manifest audits (quarterly review of trust levels)