Build a production-grade governance layer for AI agent systems with cryptographic identity, trust-mediated delegation, policy enforcement, audit trails, LLM threat detection, and SIEM integration.
Modern AI agent systems present unique governance challenges. Unlike monolithic applications, agents can autonomously delegate tasks to other agents, invoke external tools, write to persistent memory, and access sensitive data. Without a comprehensive governance framework, organizations face:
A low-trust agent can delegate to higher-trust agents, bypassing security boundaries. Delegation chains compound risk as each hop loses visibility into the original intent.
Agents writing to memory without classification checks can persist restricted data in shared collections. No provenance tracking means you can't audit who wrote what or when.
Without breadth and depth limits, a single agent could spawn hundreds of child tasks, creating infinite loops or resource exhaustion attacks.
Traditional logging captures tool invocations but misses the context: which agent, under what authority, with what data classification, and what was the delegation chain?
LLMs are vulnerable to adversarial inputs that override system instructions. Without runtime detection, an attacker can manipulate agent behavior through crafted prompts.
Tool outputs can inadvertently expose governance internals, manifest structures, or system instructions. No post-tool scanning means leaks go undetected.
The governance framework addresses these challenges through six core subsystems working in concert: manifest identity, trust broker, policy engine, audit bus, memory governor, and LLM threat detector. Together they provide end-to-end governance from session start through tool execution to memory persistence.
The governance framework is implemented as a Claude Code plugin with hook-based enforcement. All components share a unified audit bus for event logging and a centralized policy file for configuration.

Load root agent manifest, generate session ID, initialize audit bus, emit MANIFEST_LOADED event, purge stale delegation registry entries, apply environment-aware policy overrides.
Scan input for prompt injection, classify tool risk tier, check manifest permissions, evaluate policy gate (allow/deny/human_gate), emit POLICY_CHECK or LLM_THREAT event.
Trust broker validates breadth/depth limits, checks classification ceiling, derives child manifest with parent constraints, issues delegation token, registers child manifest, emits DELEGATION_EVENT.
Memory governor classifies content, enforces agent ceiling, blocks restricted data, queues confidential writes for review, adds provenance tags (9 fields), emits MEMORY_WRITE event.
Output validator scans for system prompt leakage, sensitive data disclosure, governance artifacts, emits LLM_THREAT events for critical/high severity findings, records metrics.
Flush audit queue, export session events to JSONL, send session summary metrics, optionally archive old events per retention policy, deregister session manifests.
Agent identity documents define trust level, data classification, permitted tools, and delegation rules.
Manifests are YAML files stored in state/manifests/ with cryptographic signing for tamper evidence.
agent_id: security-analyst
manifest_id: gov-sec-analyst-v2
manifest_version: "2.1.0"
trust_level: 4 # 1-5 scale
data_classification: confidential # public | internal | confidential | restricted
permitted_tools:
- "Read"
- "Grep"
- "Bash"
- "mcp__*" # fnmatch wildcards supported
permitted_delegations:
- "pentest-agent"
- "compliance-*"
human_required: false
max_autonomy_depth: 3 # Delegation depth budget
max_delegation_count: 5 # Breadth limit per session
model_id: claude-opus-4-6
model_version: "4.6"
Parent ceiling enforcement is the security foundation: a child can never exceed its parent's trust level, data classification, or autonomy depth. This creates a monotonically decreasing privilege chain.
| Field | Type | Ceiling Rule |
|---|---|---|
| trust_level | Integer (1-5) | min(static, parent) |
| data_classification | Enum | Lower classification wins |
| max_autonomy_depth | Integer | min(static, parent - 1) |
| permitted_tools | List[pattern] | Union (additive) |
| permitted_delegations | List[pattern] | Union (additive) |
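The three clamping rules in the table can be sketched as a small pure function. This is a minimal illustration, not the plugin's actual resolver: the function name `apply_parent_ceiling` is hypothetical, and the list-valued fields are left untouched because the table describes them as additive.

```python
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]


def apply_parent_ceiling(static: dict, parent: dict) -> dict:
    """Return a copy of the child's static manifest clamped to the parent's ceilings."""
    child = dict(static)
    # trust_level = min(static, parent)
    child["trust_level"] = min(static["trust_level"], parent["trust_level"])
    # Lower classification wins: compare by position in CLASSIFICATION_ORDER.
    child["data_classification"] = min(
        static["data_classification"],
        parent["data_classification"],
        key=CLASSIFICATION_ORDER.index,
    )
    # max_autonomy_depth = min(static, parent - 1): each hop spends one unit of depth.
    child["max_autonomy_depth"] = min(
        static["max_autonomy_depth"], parent["max_autonomy_depth"] - 1
    )
    return child
```

For example, a child whose static manifest requests trust level 5 and `restricted` data under a trust-4 `confidential` parent with depth 3 resolves to trust 4, `confidential`, depth 2.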
Manifests are signed with HMAC-SHA256 on load. The signing key is a 32-byte secret stored in
state/.signing-key with 0600 permissions. Signature verification happens during
manifest validation — tampered manifests fail validation and fall back to default-restrictive.
# Canonical manifest for signing (excludes volatile fields)
import hashlib
import hmac
import json

canonical = json.dumps(
    {k: v for k, v in manifest.items()
     if k not in {"manifest_signature", "manifest_hash",
                  "audit_session_id", "audit_parent_id"}},
    sort_keys=True, separators=(",", ":"))
# signing_key is the 32-byte secret read from state/.signing-key
signature = hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()
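Verification is the mirror image of signing: recompute the HMAC over the same canonical form and compare in constant time. The sketch below assumes the signature is stored in the manifest's `manifest_signature` field as a hex string; the helper names `sign_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
import hmac
import json

VOLATILE_FIELDS = {"manifest_signature", "manifest_hash",
                   "audit_session_id", "audit_parent_id"}


def sign_manifest(manifest: dict, signing_key: bytes) -> str:
    """HMAC-SHA256 over the canonical JSON form, excluding volatile fields."""
    canonical = json.dumps(
        {k: v for k, v in manifest.items() if k not in VOLATILE_FIELDS},
        sort_keys=True, separators=(",", ":"))
    return hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signing_key: bytes) -> bool:
    """Return True only if the stored signature matches a fresh computation."""
    expected = sign_manifest(manifest, signing_key)
    stored = str(manifest.get("manifest_signature", ""))
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), stored.encode())
```

A caller that gets `False` here would fall back to the default-restrictive manifest described above.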
Mediates inter-agent delegation by validating breadth/depth limits, classification boundaries, and permitted targets. Issues delegation tokens for forensic linkage and registers resolved child manifests in a session-scoped registry.
Query audit bus for previous DELEGATION_EVENT events from this agent in this session. If count >= max_delegation_count, deny with TRUST_DENY event. Prevents agent from spawning unlimited child tasks.
Check parent's max_autonomy_depth. If <= 0, escalate to human gate with TRUST_DENY. Child receives parent.depth - 1, creating a monotonically decreasing budget.
Child's data_classification cannot exceed parent's. If target manifest requests higher classification, deny delegation. Prevents data exfiltration via upward delegation.
Child's trust_level cannot exceed parent's. Defense-in-depth check — should be caught by ceiling enforcement, but trust broker validates independently.
Target agent ID must match a pattern in parent's permitted_delegations list (supports fnmatch wildcards). Prevents delegation to arbitrary agents.
SHA-256 hash of session_id:parent_manifest_id:child_manifest_id:timestamp:nonce. Stored in DELEGATION_EVENT detail for forensic chain reconstruction.
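The validation checks above can be read as an ordered chain where the first failure decides the outcome. The following is a sketch under assumptions: `check_delegation` is a hypothetical name, the prior delegation count is passed in rather than queried from the audit bus, and the depth reason string is borrowed from the policy engine because the trust broker's own string is not given here. `fnmatchcase` is used instead of `fnmatch` so matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]


def check_delegation(parent: dict, target: dict, prior_delegations: int):
    """Return (decision, reason) for a proposed parent -> target delegation."""
    # Breadth: cap the number of delegations per session.
    if prior_delegations >= parent["max_delegation_count"]:
        return ("deny", "delegation_count_exceeded")
    # Depth: an exhausted budget escalates to a human instead of denying.
    if parent["max_autonomy_depth"] <= 0:
        return ("human_gate", "autonomy_depth_exhausted")  # reason string assumed
    # Classification: the target may not sit above the parent.
    rank = CLASSIFICATION_ORDER.index
    if rank(target["data_classification"]) > rank(parent["data_classification"]):
        return ("deny", "classification_boundary_violation")
    # Trust: defense-in-depth check, independent of ceiling enforcement.
    if target["trust_level"] > parent["trust_level"]:
        return ("deny", "trust_escalation_attempt")
    # Target: agent ID must match a permitted pattern.
    if not any(fnmatchcase(target["agent_id"], p)
               for p in parent["permitted_delegations"]):
        return ("deny", "delegation_target_not_permitted")
    return ("allow", "ok")
```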
A JSON file (state/manifest-registry.json) with file-lock-based concurrency control.
Stores active child manifests keyed by session_id:agent_id. TTL purging (1 hour)
removes stale entries. SessionStart hook purges on startup.
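A lock-protected read-modify-write with TTL purging might look like the sketch below. It assumes a POSIX system (`fcntl` is not available on Windows) and a hypothetical `registered_at` timestamp on each entry; the real registry layout may differ.

```python
import fcntl
import json
import time

TTL_SECONDS = 3600  # 1 hour, per the registry description


def purge_stale(entries: dict, now: float, ttl: float = TTL_SECONDS) -> dict:
    """Drop entries whose registration time is older than the TTL."""
    return {k: v for k, v in entries.items()
            if now - v.get("registered_at", 0) <= ttl}


def register_child(path: str, session_id: str, agent_id: str,
                   manifest: dict, now: float = None) -> dict:
    """Add a child manifest under an exclusive file lock and return the registry."""
    now = time.time() if now is None else now
    # "a+" creates the file if it is missing and never truncates on open.
    with open(path, "a+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.seek(0)
            raw = fh.read()
            entries = json.loads(raw) if raw.strip() else {}
            entries = purge_stale(entries, now)
            entries[f"{session_id}:{agent_id}"] = {
                "manifest": manifest, "registered_at": now}
            fh.seek(0)
            fh.truncate()  # empty the file, then rewrite it in full
            json.dump(entries, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return entries
```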
Classifies tools into risk tiers (exempt/standard/elevated) and enforces manifest permissions + conductor tier matrix to decide allow/deny/human_gate. Unknown tools default to elevated (fail toward scrutiny).
| Tier | Examples | Audit Strategy | Gate Logic |
|---|---|---|---|
| exempt | Read, Glob, Grep, TaskList | Async emit (nowait queue) | Always allow, skip policy checks |
| standard | Edit, Write, Task, Bash | Sync audit if tier >= STANDARD | Check manifest permissions, depth, human_required |
| elevated | memory_store, NotebookEdit, MCP tools | Always sync audit | Check + tier matrix (MAJOR + elevated = human_gate) |
Conductor plugin exports current task tier (TRIVIAL/MINOR/STANDARD/MAJOR) in conductor-state.json.
Policy engine loads this and applies tier-specific gates:
# MAJOR tier + elevated tool = always human gate
if tool_tier == "elevated" and conductor_tier == "MAJOR":
    return PolicyDecision("human_gate",
                          "MAJOR task + elevated tool requires human approval")
# Synchronous audit for elevated or STANDARD+
if tool_tier == "elevated" or conductor_tier in ("STANDARD", "MAJOR"):
    audit_bus.emit(EventType.POLICY_CHECK, manifest, tool_name=tool_name)
else:
    audit_bus.emit_nowait(EventType.TOOL_INVOKED, manifest, tool_name=tool_name)
A common misconception is that governance adds "5 forms per git commit." In practice, most daily development work hits ZERO approval gates. Here's how it actually works:
| Tier | Tools | Governance Overhead | User Experience |
|---|---|---|---|
| Exempt | Read, Glob, Grep, TaskList, TaskGet, AskUserQuestion | None. No policy evaluation; events are only logged asynchronously. | Instant, zero latency added |
| Standard | Edit, Write, Task, Bash, WebFetch, WebSearch | Policy evaluated, audit logged, auto-allowed if manifest permits. | ~50ms — imperceptible |
| Elevated | memory_store, memory_forget, NotebookEdit, all mcp__MCP_DOCKER__*, all mcp__hostinger-mcp__* | Full policy evaluation + conductor tier matrix check. Human gate triggered only at MAJOR tier. | ~100ms, or human approval at MAJOR |
| Conductor Tier | Exempt Tool | Standard Tool | Elevated Tool |
|---|---|---|---|
| TRIVIAL | Allow | Allow | Allow |
| MINOR | Allow | Allow | Allow |
| STANDARD | Allow | Allow (audited) | Allow (audited) |
| MAJOR | Allow | Allow (audited) | Human Gate |
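The tier tables reduce to two lookups: classify the tool, then consult the matrix. The sketch below hard-codes the tool lists from the table for illustration (the text says the real definitions live in policy configuration) and leaves out manifest permission checks; `classify_tool` and `gate` are hypothetical names.

```python
from fnmatch import fnmatchcase

TOOL_TIERS = {
    "exempt": ["Read", "Glob", "Grep", "TaskList", "TaskGet", "AskUserQuestion"],
    "standard": ["Edit", "Write", "Task", "Bash", "WebFetch", "WebSearch"],
    "elevated": ["memory_store", "memory_forget", "NotebookEdit",
                 "mcp__MCP_DOCKER__*", "mcp__hostinger-mcp__*"],
}


def classify_tool(tool_name: str) -> str:
    """Map a tool name to its tier; unknown tools fail toward scrutiny."""
    for tier, patterns in TOOL_TIERS.items():
        if any(fnmatchcase(tool_name, p) for p in patterns):
            return tier
    return "elevated"


def gate(tool_name: str, conductor_tier: str) -> str:
    """Apply the conductor tier matrix: only MAJOR + elevated needs a human."""
    tool_tier = classify_tool(tool_name)
    if tool_tier == "elevated" and conductor_tier == "MAJOR":
        return "human_gate"
    return "allow"
```

Under this sketch an unlisted tool such as a newly installed MCP server is treated as elevated, so it is gated at MAJOR tier until someone classifies it.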
| Step | Action | Tool Tier | Governance Result | Overhead |
|---|---|---|---|---|
| 1 | Read files | Exempt | Skip — no evaluation | 0ms |
| 2 | Edit code | Standard | Auto-allowed, audited | ~50ms |
| 3 | Run tests | Standard | Auto-allowed, audited | ~50ms |
| 4 | Write fix | Standard | Auto-allowed, audited | ~50ms |
| 5 | Commit | Standard | Auto-allowed, audited | ~50ms |
Total governance overhead: ~200ms. Invisible.
Before unified hook consolidation: 5–6 separate process spawns × ~100ms = 500–600ms overhead. After: 2–3 process spawns × ~100ms = 200–300ms.
Any tool not explicitly listed in the tier definitions defaults to elevated — the system fails toward scrutiny, not permissiveness. This ensures new or unexpected tools receive maximum governance evaluation until explicitly classified.
SQLite database with WAL mode for concurrent writes. Bounded async queue (256 depth) for low-risk events, synchronous writes for critical events, JSON buffer fallback for database failures. Supports JSONL export and retention-based archival.
CREATE TABLE audit_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id TEXT UNIQUE NOT NULL,
timestamp TEXT NOT NULL,
audit_session_id TEXT NOT NULL,
event_type TEXT NOT NULL,
agent_id TEXT NOT NULL,
manifest_id TEXT,
manifest_version TEXT,
manifest_hash TEXT,
trust_level INTEGER,
data_classification TEXT,
autonomy_depth_remaining INTEGER,
tool_name TEXT,
task_id TEXT,
target_agent_id TEXT,
context_hash TEXT,
detail TEXT, -- JSON-encoded event-specific fields
outcome TEXT -- allow | deny | escalate | warn
);
CREATE INDEX idx_audit_session ON audit_events(audit_session_id);
CREATE INDEX idx_audit_timestamp ON audit_events(timestamp);
CREATE INDEX idx_audit_agent ON audit_events(agent_id);
CREATE INDEX idx_audit_type ON audit_events(event_type);
When SQLite writes fail (locked, disk full, corrupted), events are appended to
state/audit-buffer.jsonl. On next startup, buffer is renamed to
.replaying, events are replayed to database, then buffer is deleted.
This avoids losing events during database failures, provided the buffer file itself remains writable.
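The fallback and replay path can be sketched in a few lines. Several details here are assumptions: the table is reduced to two columns, the rename target is the buffer path with a `.replaying` suffix, and `INSERT OR IGNORE` is used so that replaying the same event twice is harmless.

```python
import json
import os
import sqlite3

# Reduced schema for illustration; the real audit_events table has more columns.
SCHEMA = """CREATE TABLE IF NOT EXISTS audit_events (
    event_id TEXT UNIQUE NOT NULL,
    event_type TEXT NOT NULL
)"""


def _insert(conn, event):
    # OR IGNORE makes replay idempotent on the UNIQUE event_id column.
    conn.execute(
        "INSERT OR IGNORE INTO audit_events (event_id, event_type) VALUES (?, ?)",
        (event["event_id"], event["event_type"]))


def write_event(conn, buffer_path, event):
    """Write to SQLite; on any SQLite error, append to the JSONL buffer instead."""
    try:
        _insert(conn, event)
        conn.commit()
        return "db"
    except sqlite3.Error:
        with open(buffer_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return "buffer"


def replay_buffer(conn, buffer_path):
    """Replay buffered events into the database and return how many were read."""
    if not os.path.exists(buffer_path):
        return 0
    replaying = buffer_path + ".replaying"
    os.replace(buffer_path, replaying)  # new failures go to a fresh buffer file
    count = 0
    with open(replaying, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                _insert(conn, json.loads(line))
                count += 1
    conn.commit()
    os.remove(replaying)
    return count
```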
Intercepts memory writes via PreToolUse hook on mcp__claude-memory__memory_store.
Classifies content using regex patterns, enforces agent classification ceiling, blocks restricted
data, queues confidential writes for review, adds 9-field provenance tags.
restricted:
- '\b(password|secret|api[_-]?key|private[_-]?key|token|credential)\s*[:=]\s*\S+'
- '\b\d{3}-\d{2}-\d{4}\b' # SSN pattern
- '-----BEGIN\s+(RSA|EC|PRIVATE)\s+KEY-----'
- '\b(bearer\s+[a-zA-Z0-9\-._~+/]+=*)\b'
confidential:
- '\b(internal[_-]?only|do[_-]?not[_-]?share|proprietary|confidential)\b'
- '\bCVE-\d{4}-\d{4,7}\b' # Vulnerability IDs
- '\b(salary|compensation|revenue|profit)\s*[:=$]'
- '\b(ssn|social[_-]?security|tax[_-]?id)\b'
internal:
- '\b(prod(uction)?|staging)\s+\b(server|host|endpoint|cluster)\b'
- '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b' # IP addresses
- '\b[a-zA-Z0-9\-]+\.(internal|corp|local)\b'
| Classification | Agent Ceiling Check | Action | Audit Event |
|---|---|---|---|
| public | N/A | Allow with provenance tags | MEMORY_WRITE (allow) |
| internal | Agent must be internal+ | Allow with provenance tags | MEMORY_WRITE (allow) |
| confidential | Agent must be confidential+ | Queue for review, persist with pending_review tag | HUMAN_GATE (escalate) |
| restricted | Always exceeds ceiling | Block (do not persist) | POLICY_DENY (deny) |
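Classification and the action table combine into one decision function. The sketch below uses a small subset of the patterns listed above and assumes case-insensitive matching; `classify` and `decide_write` are hypothetical names, and the reason strings follow the requirement text where one is given.

```python
import re

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

# Subset of the patterns shown above, for illustration only.
PATTERNS = {
    "restricted": [
        r'\b(password|secret|api[_-]?key|private[_-]?key|token|credential)\s*[:=]\s*\S+',
        r'\b\d{3}-\d{2}-\d{4}\b',
    ],
    "confidential": [
        r'\b(internal[_-]?only|do[_-]?not[_-]?share|proprietary|confidential)\b',
        r'\bCVE-\d{4}-\d{4,7}\b',
    ],
    "internal": [
        r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b',
    ],
}


def classify(content: str) -> str:
    """Scan from most to least sensitive; the first level that matches wins."""
    for level in ("restricted", "confidential", "internal"):
        if any(re.search(p, content, re.IGNORECASE) for p in PATTERNS[level]):
            return level
    return "public"


def decide_write(content: str, agent_ceiling: str):
    """Return (action, reason) for a memory write by an agent with this ceiling."""
    level = classify(content)
    if level == "restricted":
        return ("block", "restricted_content_blocked")
    rank = CLASSIFICATION_ORDER.index
    if rank(level) > rank(agent_ceiling):
        return ("block", "classification_ceiling_exceeded")
    if level == "confidential":
        return ("queue", "confidential_write_queued")
    return ("allow", level)
```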
provenance = {
    "gov_manifest_id": manifest["manifest_id"],
    "gov_agent_id": manifest["agent_id"],
    "gov_manifest_version": manifest["manifest_version"],
    "gov_manifest_hash": manifest["manifest_hash"],
    "gov_trust_level": manifest["trust_level"],
    "gov_classification": manifest["data_classification"],
    "gov_session_id": manifest["audit_session_id"],
    "gov_task_id": manifest.get("task_id"),
    "gov_timestamp": datetime.now(timezone.utc).isoformat(),
}
These tags are merged into the metadata field of the memory_store tool input,
persisting alongside the content in Qdrant. This enables provenance-based memory queries
(e.g., "show me all memories written by security-analyst agent in session X").
Scans tool inputs for prompt injection attempts and tool outputs for system prompt leakage and sensitive data disclosure. Uses 30 prompt injection patterns and 24 system leakage patterns across three severity levels (critical/high/medium).
Delimiter injection: </system>, <|im_start|>, [INST]
Base64 payloads: aWdub3JlIHByZXZpb3Vz (decodes to "ignore previous")
Standard prose-based defenses fail against creative encoding. The detector now handles multi-encoding injection attempts that bypass pattern-matching by obfuscating payloads.
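\x41\x42\x43\x44 chains (hex escapes)
&#105;&#103; chains (HTML numeric character references, which render as "ig")
%69%67%6E%6F sequences (URL encoding)
# Base64 decode and re-scan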
for match in b64_pattern.finditer(content):
    decoded = base64.b64decode(match.group()).decode("utf-8", errors="ignore")
    for pattern in PROMPT_INJECTION_PATTERNS["critical"] + PROMPT_INJECTION_PATTERNS["high"]:
        if re.search(pattern, decoded, re.IGNORECASE):
            return ThreatDetection(detected=True, severity="high",
                                   detail="Encoded prompt injection (base64 decoded)")
# Character-separated injection collapse
sep_pattern = re.compile(r'([a-zA-Z])[.\-_\s]{1,2}(?:[a-zA-Z][.\-_\s]{1,2}){4,}[a-zA-Z]')
for match in sep_pattern.finditer(content):
    collapsed = re.sub(r'[.\-_\s]+', '', match.group()).lower()
    if any(kw in collapsed for kw in ["ignore", "disregard", "override", "bypass"]):
        return ThreatDetection(detected=True, severity="high",
                               detail=f"Character-separated injection: '{collapsed}'")
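The same decode-then-rescan idea extends to the hex-escape and URL-encoded forms listed above. This is a sketch, not the detector's code: `find_encoded_keywords` is a hypothetical helper, it reuses the four keywords from the character-separated check, and it only considers runs of at least four escapes.

```python
import re
from urllib.parse import unquote

KEYWORDS = ("ignore", "disregard", "override", "bypass")
HEX_ESCAPE = re.compile(r'(?:\\x[0-9a-fA-F]{2}){4,}')  # e.g. \x69\x67\x6e\x6f
URL_ESCAPE = re.compile(r'(?:%[0-9a-fA-F]{2}){4,}')    # e.g. %69%67%6E%6F


def _decode_hex(run: str) -> str:
    """Turn a run of \\xNN escapes into text, ignoring invalid UTF-8."""
    return bytes.fromhex(run.replace("\\x", "")).decode("utf-8", errors="ignore")


def find_encoded_keywords(content: str) -> list:
    """Return (encoding, keyword) pairs for keywords hidden in escape runs."""
    findings = []
    for m in HEX_ESCAPE.finditer(content):
        decoded = _decode_hex(m.group()).lower()
        findings += [("hex", kw) for kw in KEYWORDS if kw in decoded]
    for m in URL_ESCAPE.finditer(content):
        decoded = unquote(m.group(), errors="ignore").lower()
        findings += [("url", kw) for kw in KEYWORDS if kw in decoded]
    return findings
```

A production scanner would more likely feed the decoded text back through the full pattern set, as the Base64 branch above does, rather than a fixed keyword list.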
A new scan_mcp_input() method intercepts MCP tool inputs before they reach external
services, checking content against classification patterns to prevent data exfiltration.
| Classification | Detection Action | Audit Event |
|---|---|---|
| restricted | Block MCP call, emit alert | DLP_VIOLATION (critical) |
| confidential | Block MCP call, emit alert | DLP_VIOLATION (high) |
| internal / public | Allow, audit trail only | POLICY_CHECK |
SYSTEM_LEAKAGE_PATTERNS = {
"critical": [
r'governance/lib/\w+\.py', # File paths
r'state/manifests/',
r'\bmanifest_hash\s*[:=]', # Manifest internals
r'\btrust_level\s*[:=]\s*\d+',
r'\bdata_classification\s*[:=]\s*(public|internal|confidential|restricted)',
r'\bdelegation_token\s*[:=]',
r'agent_id\s*:\s*\w+', # YAML structure
r'permitted_tools\s*:',
],
"high": [
r'\bgovernance\.lib\.', # Module references
r'\bpolicy_engine\b',
r'\btrust_broker\b',
r'gov-[a-z]+-[0-9a-f]{8}', # Session IDs
],
"medium": [
r'\bgovernance\s+plugin\b', # Generic terms
r'\bmanifest\s+registry\b',
],
}
| Severity | Input Scan Action | Output Scan Action | Audit Event |
|---|---|---|---|
| critical | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| high | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| medium | Log warning, allow | Log warning, allow | LLM_THREAT (warn) |
Sends governance security events to external monitoring systems via webhook (n8n compatible) and syslog (Wazuh compatible). Alerting is fail-open — failures never block governance operations.
Five event types trigger alerts: policy_deny, trust_deny,
circuit_break, human_gate, llm_threat. All other events
are audit-only.
{
"source": "governance",
"timestamp": "2026-03-17T14:23:45.123456Z",
"event_type": "llm_threat",
"agent_id": "security-analyst",
"tool_name": "Bash",
"outcome": "block",
"detail": {
"threat_type": "prompt_injection",
"severity": "critical",
"pattern_matched": "\\bignore\\s+all\\s+previous\\s+instructions\\b",
"scan_type": "input",
"detail": "Detected prompt injection pattern in content"
},
"session_id": "gov-sess-a4f8d2c1",
"manifest_id": "gov-sec-analyst-v2",
"trust_level": 4
}
<131>1 2026-03-17T14:23:45.123456Z governance claude-code - - - \
event_type=llm_threat agent_id=security-analyst tool_name=Bash \
outcome=block session_id=gov-sess-a4f8d2c1
PRI calculation: facility * 8 + severity. Facility defaults to local0 (16).
Severity is 3 (error) for deny/block, 4 (warning) for escalate/warn. Wazuh can parse these
via custom decoder rules.
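The PRI arithmetic and message layout can be checked with a short helper. The sketch below reproduces the header fields of the example line above; `syslog_pri` and `format_syslog` are hypothetical names, and actual delivery over UDP is left out.

```python
SEVERITY_BY_OUTCOME = {"deny": 3, "block": 3, "escalate": 4, "warn": 4}
LOCAL0 = 16  # default facility


def syslog_pri(facility: int, severity: int) -> int:
    """PRI = facility * 8 + severity (RFC 5424)."""
    return facility * 8 + severity


def format_syslog(timestamp: str, outcome: str, fields: dict,
                  facility: int = LOCAL0) -> str:
    """Build an RFC 5424-style line with key=value pairs as the message."""
    pri = syslog_pri(facility, SEVERITY_BY_OUTCOME[outcome])
    body = " ".join(f"{key}={value}" for key, value in fields.items())
    # Header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA
    return f"<{pri}>1 {timestamp} governance claude-code - - - {body}"
```

With facility local0 (16) and severity 3, the PRI is 16 * 8 + 3 = 131, matching the `<131>` prefix in the example; an escalate or warn outcome yields 132.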
Separate SQLite database (state/governance-metrics.db) tracks operational metrics:
agent success/failure rates, gate trigger frequency, confidence scores, delegation depth,
circuit breaker activations. Enables drift detection and performance monitoring.
CREATE TABLE governance_metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
session_id TEXT NOT NULL,
metric_type TEXT NOT NULL, -- agent_success | gate_trigger | delegation_depth
agent_id TEXT,
value REAL NOT NULL,
metadata TEXT -- JSON for additional context
);
Detects environment from hostname or GOVERNANCE_ENV env var, loads base policy
from governance-policy.yaml, then applies environment-specific overrides from
state/env-overrides/<env>.yaml. Enables different retention policies,
gate enforcement modes, and alerting configs per environment.
# state/env-overrides/production.yaml
retention:
  audit_events:
    retention_days: 365  # Override from base 90 days
  metrics:
    retention_days: 730
alerting:
  enabled: true
  webhook:
    enabled: true
    url: "https://n8n.example.com/webhook/governance"
  syslog:
    enabled: true
    host: "siem.example.com"
    port: 514
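Applying an override file amounts to a recursive dictionary merge in which override values win on conflicts. The sketch below works on already-parsed dictionaries so it needs no YAML library; `deep_merge` is a hypothetical name.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override merged in; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the base.
            merged[key] = copy.deepcopy(value)
    return merged
```

Merging the production override above into a base policy with `retention_days: 90` yields 365 for audit events while leaving any base keys the override does not mention untouched.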
All policy configuration lives in state/governance-policy.yaml, with no scattered config files.
Tool tiers, classification patterns, gate rules, the tier matrix, and retention policies are all in one place.
Every agent invocation receives a unique Non-Human Identity (NHI) instance ID following the pattern
nhi_{agent}_{YYYYMMDD}_{hex8}. This extends the manifest system from static identity
to per-invocation lifecycle tracking — enabling audit correlation, cost attribution, and
forensic reconstruction of multi-agent workflows.
| Event | Trigger | Audit Type | Data Recorded |
|---|---|---|---|
| Spawn | Task tool dispatches agent | NHI_SPAWN | nhi_id, agent_id, parent_nhi_id, manifest_id, spawn timestamp |
| Active | Agent executing tools | — | All tool events tagged with nhi_id |
| Terminate | Task completes or fails | NHI_TERMINATE | nhi_id, exit_reason (success/failure/timeout), duration, tool_count |
# NHI ID generation (in pre_tool_check.py)
import datetime
import secrets

def generate_nhi_id(agent_id: str) -> str:
    # Timezone-aware UTC date; datetime.utcnow() is deprecated since Python 3.12
    date = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
    hex8 = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
    return f"nhi_{agent_id}_{date}_{hex8}"

# Example: nhi_security-analyst_20260317_a4f8d2c1
NHI IDs propagate through delegation chains. When a parent agent spawns a child via the Task tool,
the child's NHI_SPAWN event records parent_nhi_id, creating a traceable
invocation tree. The conductor's agent_instances registry in state tracks all active NHIs.
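Because each NHI_SPAWN event carries `parent_nhi_id`, the chain from any invocation back to the root can be rebuilt with a dictionary walk. The sketch assumes spawn events are available as plain dictionaries with `nhi_id` and `parent_nhi_id` keys; `nhi_chain` is a hypothetical helper.

```python
def nhi_chain(spawn_events: list, nhi_id: str) -> list:
    """Return the invocation path from the root NHI down to nhi_id."""
    parents = {e["nhi_id"]: e.get("parent_nhi_id") for e in spawn_events}
    chain, seen = [], set()
    current = nhi_id
    # Walk upward until the root (no parent); the seen set guards against cycles.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain[::-1]
```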
External MCP tool calls (mcp__*) pass through a Data Loss Prevention (DLP) firewall
before execution. The firewall intercepts all MCP tool inputs in the PreToolUse hook and scans
content against classification patterns to prevent sensitive data exfiltration to external services.
| Content Classification | Action | Audit Event | Severity |
|---|---|---|---|
| Restricted | Block — hard stop | DLP_VIOLATION | critical |
| Confidential | Block — require human approval | DLP_VIOLATION | high |
| Internal | Allow with audit trail | TOOL_ALLOWED (logged) | — |
| Public | Allow | — | — |
# MCP Firewall (in pre_tool_check.py)
def _check_mcp_firewall(tool_name, tool_input, manifest, threat_detector, audit_bus):
    """Block MCP tools from exfiltrating classified data."""
    if not tool_name.startswith("mcp__"):
        return None  # Not an MCP tool
    # Extract MCP server name for audit trail
    parts = tool_name.split("__")
    mcp_server = parts[1] if len(parts) >= 2 else "unknown"
    # Run DLP scan against classification patterns
    detection = threat_detector.scan_mcp_input(tool_name, tool_input, manifest)
    if detection.detected:
        # Copy the detection fields explicitly into the audit detail
        audit_bus.emit(EventType.DLP_VIOLATION, manifest,
                       tool_name=tool_name, outcome="block",
                       detail={"mcp_server": mcp_server,
                               "severity": detection.severity,
                               "reason": detection.detail})
        return {"decision": "block", "reason": detection.detail}
    return None  # Allow
PostToolUse hook validates that Bash commands containing external connectivity tools
(ssh, curl, wget, psql, mysql,
redis-cli) target approved destinations. Prevents agents from establishing
unauthorized external connections even when the command itself passes PreToolUse policy checks.
The validator extracts destination hostnames and IPs from command arguments using regex patterns.
These are compared against an allowlist in governance-policy.yaml. Connections to
unlisted destinations emit LAST_HOP_VIOLATION audit events with the extracted
destination for forensic review.
# Last-hop validation (in output_validator.py)
import re
from typing import Optional

EXTERNAL_TOOLS = re.compile(
    r'\b(ssh|curl|wget|psql|mysql|redis-cli)\b')

def _check_last_hop(command: str, manifest: dict, audit_bus) -> Optional[str]:
    """Validate external connection destinations."""
    match = EXTERNAL_TOOLS.search(command)
    if not match:
        return None
    # Extract destination (hostname/IP after tool keyword)
    destinations = _extract_destinations(command)
    allowed = load_allowed_destinations()
    for dest in destinations:
        if dest not in allowed:
            audit_bus.emit(EventType.LAST_HOP_VIOLATION, manifest,
                           tool_name="Bash", outcome="warn",
                           detail={"command_tool": match.group(1), "destination": dest})
            return f"Unrecognized destination: {dest}"
    return None
Hard-coded kill switches for behaviors that should never occur in any context, regardless of manifest permissions or trust level. Unlike policy engine rules (which can vary by tier), prohibited behaviors are absolute — no override, no exception, no human approval path.
ssh -L, ssh -R, ssh -D, socat, ncat tunnel creation
Reading .env, credentials.json, .signing-key and sending externally
DELETE or UPDATE on audit.db, audit-buffer.jsonl
Each match emits a PROHIBITED_BEHAVIOR audit event (severity: critical, outcome: block).
Critical governance events are emitted to an external syslog endpoint (Wazuh compatible)
in addition to the local SQLite audit bus. This provides a tamper-resistant audit trail —
even if an agent compromises the local audit.db, the external copy persists.
The audit bus's alerting callback (Section 3.7) is extended to emit all security-relevant
events to syslog, not just the 5 alert types. Events flow: governance hook → audit_bus.emit() →
SQLite WAL → alerting callback → syslog UDP. The sink is fail-open — syslog delivery
failures never block governance operations. A local buffer at state/syslog-buffer.jsonl
captures events during syslog outages for replay.
Tracks token consumption per agent invocation and per workflow to prevent runaway costs. Each NHI instance accumulates token counts (input + output) from tool calls. When cumulative cost exceeds configurable thresholds, the system halts the workflow and escalates to the operator.
| Scope | Threshold | Action | Audit Event |
|---|---|---|---|
| Per-agent invocation | Configurable (default: 100K tokens) | Warn at 80%, halt at 100% | COST_THRESHOLD |
| Per-workflow total | Configurable (default: 500K tokens) | Warn at 80%, halt at 100% | COST_THRESHOLD |
| Per-session | Configurable (default: 1M tokens) | Hard stop, require restart | COST_THRESHOLD |
Cost data is recorded in conductor-state.json under the cost_tracking property,
which includes token_budget, tokens_used, cost_estimate_usd,
and halt_on_exceed flag. The conductor orchestrator checks thresholds before each agent dispatch.
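The warn-at-80%, halt-at-100% rule is a simple threshold comparison. The sketch below uses integer arithmetic to avoid floating-point edge cases at the boundary; `check_budget` is a hypothetical name, while the token counts correspond to the `tokens_used` and `token_budget` fields named above.

```python
def check_budget(tokens_used: int, token_budget: int, warn_percent: int = 80) -> str:
    """Return "halt" at or above the budget, "warn" from warn_percent, else "ok"."""
    if tokens_used >= token_budget:
        return "halt"
    # Integer comparison: tokens_used / token_budget >= warn_percent / 100
    if tokens_used * 100 >= token_budget * warn_percent:
        return "warn"
    return "ok"
```

With the default per-agent budget of 100K tokens, an invocation moves to `warn` at 80,000 tokens and to `halt` at 100,000.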
The CISO agent generates a Cryptographic Bill of Materials — a comprehensive inventory of all cryptographic implementations in the target project. This addresses post-quantum cryptography (PQC) readiness assessment and crypto-agility requirements from frameworks like NIST SP 800-131A and the NSA CNSA 2.0 suite.
| Category | What's Inventoried | PQC Risk Level |
|---|---|---|
| Symmetric encryption | AES modes, key sizes, implementations | Low (quantum-resistant) |
| Asymmetric cryptography (encryption and signatures) | RSA, ECDSA, Ed25519 key sizes and usage | Critical (quantum-vulnerable) |
| Hash functions | SHA-256, SHA-3, HMAC implementations | Low-Medium |
| Key exchange | DH, ECDH, X25519 protocols | Critical (quantum-vulnerable) |
| TLS/mTLS | Protocol versions, cipher suites, certificate chains | Varies by suite |
| Random number generation | CSPRNG usage, entropy sources | Low |
The C-BOM output includes a PQC readiness assessment table mapping each crypto implementation to its
NIST PQC migration priority (Harvest Now/Decrypt Later risk), recommended quantum-safe alternative
(ML-KEM, ML-DSA, SLH-DSA), and estimated migration effort. This is stored in
conductor-state.json under project_characteristics.crypto_implementations
and project_characteristics.pqc_readiness.
Every agent manifest MUST include: agent_id, manifest_id, manifest_version, trust_level (1-5), data_classification (public/internal/confidential/restricted), permitted_tools (list of fnmatch patterns), permitted_delegations (list), human_required (bool), max_autonomy_depth (int), max_delegation_count (int).
Manifests MUST be signed with HMAC-SHA256 using a 32-byte signing key stored in
state/.signing-key. Signatures MUST be verified on manifest load. Tampered
manifests MUST fail validation and fall back to default-restrictive manifest.
Every manifest MUST include a SHA-256 hash of canonicalized JSON content (excluding manifest_signature and manifest_hash fields). Hash MUST be recomputed on each load to detect tampering.
When resolving a child manifest with parent context, the system MUST enforce: trust_level = min(static, parent), data_classification = lower(static, parent), max_autonomy_depth = min(static, parent - 1). Child capabilities MUST be monotonically decreasing along delegation chains.
Manifests MUST track model_id and model_version fields. SessionStart hook MUST populate these from runtime context. Audit events MUST include manifest_version for model drift analysis.
Trust broker MUST query audit bus for DELEGATION_EVENT count from source agent in current session. If count >= max_delegation_count, MUST deny with TRUST_DENY event and reason "delegation_count_exceeded".
Trust broker MUST check parent max_autonomy_depth. If <= 0, MUST escalate to human_gate with TRUST_DENY event. Child MUST receive parent.depth - 1 in resolved manifest.
Trust broker MUST deny delegation if target data_classification exceeds parent data_classification (using CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]). MUST emit TRUST_DENY with reason "classification_boundary_violation".
Trust broker MUST deny delegation if target trust_level exceeds parent trust_level. MUST emit TRUST_DENY with reason "trust_escalation_attempt".
Trust broker MUST check if target agent_id matches any pattern in parent's permitted_delegations list (using fnmatch). If no match, MUST deny with TRUST_DENY and reason "delegation_target_not_permitted".
Trust broker MUST issue a unique delegation token for each approved delegation, computed as SHA-256(session_id:parent_manifest_id:target_manifest_id:timestamp:nonce)[:24]. Token MUST be stored in DELEGATION_EVENT detail field.
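A token matching this formula can be produced with the standard library alone. The sketch assumes an ISO 8601 UTC timestamp and a 16-character hex nonce, neither of which is specified here; `issue_delegation_token` is a hypothetical name.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def issue_delegation_token(session_id: str, parent_manifest_id: str,
                           target_manifest_id: str,
                           timestamp: str = None, nonce: str = None) -> str:
    """SHA-256 over the colon-joined fields, truncated to 24 hex characters."""
    timestamp = timestamp or datetime.now(timezone.utc).isoformat()
    nonce = nonce or secrets.token_hex(8)  # random per call, so tokens are unique
    material = ":".join(
        [session_id, parent_manifest_id, target_manifest_id, timestamp, nonce])
    return hashlib.sha256(material.encode()).hexdigest()[:24]
```

The optional `timestamp` and `nonce` parameters exist only so the derivation can be reproduced during forensic review; normal callers would omit them.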
Trust broker MUST maintain a session-scoped registry of active child manifests at
state/manifest-registry.json. Registry operations MUST use file locking
(fcntl.flock) for concurrency safety. TTL purging (1 hour) MUST remove stale entries.
Policy engine MUST classify tools into exempt/standard/elevated tiers using
state/tool-tiers.yaml. Unknown tools MUST default to "elevated" tier
(fail toward scrutiny).
Exempt tools (Read, Glob, Grep, TaskList, TaskGet) MUST always be allowed with async audit emission (nowait queue). Policy checks MUST be skipped for exempt tools.
For standard and elevated tools, policy engine MUST verify tool_name matches a pattern in manifest.permitted_tools (using fnmatch). If no match, MUST deny with POLICY_DENY and reason "tool_not_permitted".
Policy engine MUST check manifest.max_autonomy_depth. If <= 0, MUST emit CIRCUIT_BREAK and return human_gate decision with reason "autonomy_depth_exhausted".
Policy engine MUST check manifest.human_required. If true, MUST emit HUMAN_GATE and return human_gate decision with reason "agent manifest requires human approval".
Policy engine MUST load conductor_tier from conductor-state.json. If tier
is "MAJOR" and tool tier is "elevated", MUST emit HUMAN_GATE and return human_gate
decision with reason "MAJOR task + elevated tool requires human approval".
Policy engine MUST emit synchronous POLICY_CHECK events for elevated tools or when conductor_tier is STANDARD or MAJOR. All other allowed tools MUST use async emit_nowait.
Audit bus MUST use SQLite with WAL (Write-Ahead Logging) mode and NORMAL synchronous
mode for concurrent write safety and performance. Database MUST be at
state/audit.db.
Audit bus MUST support 15 event types: TOOL_INVOKED, DELEGATION_EVENT, CONTEXT_PRESSURE, MEMORY_WRITE, MEMORY_READ, POLICY_CHECK, POLICY_DENY, HUMAN_GATE, MANIFEST_LOADED, MANIFEST_DERIVED, TRUST_CHECK, TRUST_DENY, CIRCUIT_BREAK, BUFFER_REPLAY, LLM_THREAT.
Audit bus MUST implement a bounded queue (256 depth) with single daemon worker thread for async event emission. If queue is full, MUST fall back to synchronous emit.
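The queue-with-fallback behavior can be sketched with `queue.Queue` and one daemon thread. `AsyncEmitter` is a hypothetical class name, the `sink` callable stands in for the database write, and the `start_worker` switch exists only to make the overflow path easy to exercise.

```python
import queue
import threading


class AsyncEmitter:
    """Bounded async emitter that falls back to a synchronous write when full."""

    def __init__(self, sink, maxsize: int = 256, start_worker: bool = True):
        self._sink = sink
        self._queue = queue.Queue(maxsize=maxsize)
        if start_worker:
            threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Single worker: events are written in the order they were queued.
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass  # a real bus would route this event to the JSONL buffer
            finally:
                self._queue.task_done()

    def emit_nowait(self, event) -> str:
        try:
            self._queue.put_nowait(event)
            return "queued"
        except queue.Full:
            self._sink(event)  # queue is full: write synchronously instead
            return "sync"

    def flush(self):
        """Block until every queued event has been handed to the sink."""
        self._queue.join()
```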
When SQLite writes fail, audit bus MUST append events to state/audit-buffer.jsonl.
On next SessionStart, MUST replay buffered events to database and delete buffer file.
Every audit event MUST include: event_id (UUID), timestamp (ISO 8601), audit_session_id, event_type, agent_id, manifest_id, manifest_version, manifest_hash, trust_level, data_classification, autonomy_depth_remaining, tool_name, task_id, target_agent_id, context_hash, detail (JSON), outcome (allow/deny/escalate/warn).
Audit bus MUST support retention-based purging with configurable retention_days
(default 90). Old events MUST be archived to JSONL before deletion. Archive path
defaults to state/archive/audit-archive-YYYYMMDD.jsonl.
Memory governor MUST classify content using regex patterns from
state/classification-patterns.yaml. MUST scan restricted patterns first,
then confidential, then internal. Highest match wins. Unmatched content defaults to "public".
Memory governor MUST deny writes if content classification exceeds agent data_classification ceiling (using CLASSIFICATION_ORDER). MUST emit POLICY_DENY with reason "classification_ceiling_exceeded".
Memory governor MUST block (not persist) all restricted content with POLICY_DENY event and reason "restricted_content_blocked". Human approval via /governance-review required before storage.
Memory governor MUST allow confidential writes to proceed but tag with gov_approval_status="pending_review". MUST emit HUMAN_GATE event with reason "confidential_write_queued".
Memory governor MUST add 9 provenance fields to all memory writes: gov_manifest_id, gov_agent_id, gov_manifest_version, gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id, gov_task_id, gov_timestamp.
LLM threat detector MUST scan all Task, Bash, and Skill tool inputs for prompt injection patterns. MUST check critical patterns first, then high, then medium (30 patterns in total). First match determines severity.
LLM threat detector MUST scan Write, Edit, Bash, NotebookEdit outputs for system prompt leakage (24 patterns) and sensitive data disclosure (using classification patterns). MUST emit LLM_THREAT events for all detections.
For critical/high severity threats, MUST block tool execution (input scan) or block output (output scan) and emit LLM_THREAT with outcome="block". For medium severity, MUST log warning with outcome="warn" and allow execution.
Threat detector MUST address OWASP LLM01 (Prompt Injection), LLM06 (Sensitive Information Disclosure), and LLM09 (Overreliance), using the numbering of the 2023 OWASP Top 10 for LLM Applications, through pattern-based detection and output validation.
Alerting service MUST send alerts for: policy_deny, trust_deny, circuit_break, human_gate, llm_threat. All other events are audit-only.
Webhook alerts MUST POST JSON with fields: source="governance", timestamp, event_type, agent_id, tool_name, outcome, detail (parsed from JSON), session_id, manifest_id, trust_level. Timeout MUST be 5 seconds.
Syslog alerts MUST use RFC 5424 format with facility=local0 (16) and severity=3 (error) for deny/block, severity=4 (warning) for escalate/warn. MUST send via UDP.
Alerting failures (network timeout, unreachable host) MUST NOT block governance operations. The alerting service MUST log the error and continue without raising.
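To make the syslog requirement concrete: RFC 5424 encodes the priority as facility * 8 + severity, so local0 (16) with severity 3 gives `<131>` and severity 4 gives `<132>`. The sketch below assumes hypothetical function names and default hostname/app values, and leaves PROCID, MSGID, and structured data as the nil value `-`.

```python
import socket
from datetime import datetime, timezone

FACILITY_LOCAL0 = 16
SEVERITY = {"deny": 3, "block": 3, "escalate": 4, "warn": 4}


def format_syslog(outcome, message, hostname="agent-host", app="governance", now=None):
    """Build an RFC 5424 line; PRI = facility * 8 + severity."""
    pri = FACILITY_LOCAL0 * 8 + SEVERITY[outcome]
    # `now` is assumed to be a UTC datetime, since a literal "Z" is appended.
    ts = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # Header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD, then MSG
    return f"<{pri}>1 {ts} {hostname} {app} - - - {message}"


def send_syslog(line, host, port=514, timeout=5.0):
    """Send over UDP; return False on network errors instead of raising."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(line.encode("utf-8"), (host, port))
        return True
    except OSError:  # covers timeouts, DNS failures, unreachable hosts
        return False
```

Returning a boolean rather than raising keeps the fail-open contract: the caller can log the failure and move on.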
Metrics collector MUST use separate SQLite database at state/governance-metrics.db
with schema: timestamp, session_id, metric_type, agent_id, value (float), metadata (JSON).
MUST support metric types: agent_success, gate_trigger, delegation_depth, circuit_break, confidence_score. Additional types MAY be added without schema migration.
Metrics collector MUST provide session summary aggregation: count, average, and max value per metric_type for a given session_id.
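One way the schema and session summary could be expressed with the stdlib `sqlite3` module is sketched below. The `governance_metrics` table name matches the example queries later in this document; the column constraints and the shape of the returned dictionary are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS governance_metrics (
    timestamp   TEXT NOT NULL,
    session_id  TEXT NOT NULL,
    metric_type TEXT NOT NULL,
    agent_id    TEXT,
    value       REAL NOT NULL,
    metadata    TEXT
)
"""


def get_session_summary(conn, session_id):
    """Return {metric_type: {"count", "avg", "max"}} for one session."""
    rows = conn.execute(
        "SELECT metric_type, COUNT(*), AVG(value), MAX(value) "
        "FROM governance_metrics WHERE session_id = ? GROUP BY metric_type",
        (session_id,),
    ).fetchall()
    return {m: {"count": c, "avg": a, "max": mx} for m, c, a, mx in rows}
```

Since `metric_type` is free text rather than an enumerated column, new metric types need no schema migration.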
Environment detection MUST read the GOVERNANCE_ENV env var, falling back to hostname detection. MUST support: local, staging, production, c2. MUST default to "local" if unresolvable.
All governance components MUST reference state/governance-policy.yaml for:
tool_tiers, classification_patterns, gate rules, tier_gate_matrix, retention policies.
NO scattered config files.
MUST support environment-specific policy overrides at
state/env-overrides/<env>.yaml. Overrides MUST deep-merge into base
policy. Override values win on conflicts.
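A deep merge with override-wins semantics can be sketched in a few lines. Replacing lists wholesale, rather than concatenating them, is an assumption here; the requirement only says that override values win on conflicts.

```python
import copy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced, not combined (assumed behavior).
            merged[key] = copy.deepcopy(value)
    return merged
```

The function copies its inputs, so the base policy loaded from state/governance-policy.yaml is never mutated by an environment override.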
SessionStart hook MUST load root agent manifest, generate session ID, initialize audit bus, emit MANIFEST_LOADED event, purge stale registry entries, apply retention policy archival.
PreToolUse hook MUST scan for prompt injection (threat detector), classify tool tier (policy engine), check manifest permissions, evaluate delegation (trust broker for Task tool), emit appropriate audit events.
PostToolUse hook MUST scan output for leakage (threat detector on Write/Edit/Bash/NotebookEdit), deregister completed tasks (trust broker), record metrics, emit LLM_THREAT events if detected.
All hooks MUST have timeout limits: SessionStart 10s, PreToolUse 10s, PostToolUse 10s. Timeout failures MUST fail-open (allow operation to proceed).
Every agent invocation MUST receive a unique NHI instance ID following the pattern
nhi_{agent_id}_{YYYYMMDD}_{hex8}. NHI IDs MUST propagate through delegation
chains via parent_nhi_id. All audit events for the invocation MUST include
the NHI ID.
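The ID pattern can be generated with `secrets.token_hex(4)`, which yields 8 lowercase hex characters. The helper name and the validation regex below are illustrative assumptions; using the UTC date is also an assumption, chosen to match the framework's UTC timestamps.

```python
import re
import secrets
from datetime import datetime, timezone

# Validates nhi_{agent_id}_{YYYYMMDD}_{hex8}; agent_id may itself contain "_" or "-".
NHI_PATTERN = re.compile(r"^nhi_(?P<agent>.+)_(?P<date>\d{8})_(?P<hex>[0-9a-f]{8})$")


def generate_nhi_id(agent_id, now=None):
    """Return a per-invocation NHI instance ID for agent_id."""
    now = now or datetime.now(timezone.utc)
    return f"nhi_{agent_id}_{now:%Y%m%d}_{secrets.token_hex(4)}"
```

A child invocation would record its parent's ID in `parent_nhi_id`, so a delegation chain can be walked back to the root invocation.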
Audit bus MUST emit NHI_SPAWN when an agent is dispatched (recording nhi_id,
agent_id, parent_nhi_id, manifest_id) and NHI_TERMINATE when the agent
completes (recording exit_reason, duration_seconds, tool_count).
All MCP tool calls (mcp__*) MUST pass through DLP screening in PreToolUse.
Tool inputs MUST be scanned against classification patterns. Restricted data MUST be blocked
(critical). Confidential data MUST be blocked (high). The MCP server name MUST be extracted
and included in DLP_VIOLATION audit events.
Threat detector MUST detect prompt injection hidden in encoded content: base64 blocks (decode and re-scan against critical/high patterns), character-separated text (collapse separators and check for injection keywords), hex-encoded sequences, HTML entity encoding, and URL encoding chains.
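The decode-and-rescan idea might be sketched as below. This is an illustration rather than the framework's `_scan_encoded_injection()`: the single `INJECTION` regex stands in for the critical/high pattern set, and the block-length thresholds and the three-round URL-decoding bound are assumed values.

```python
import base64
import html
import re
from urllib.parse import unquote

# Hypothetical stand-in for the critical/high injection patterns.
INJECTION = re.compile(r"(?i)ignore\s*(all\s*)?(previous|prior)\s*instructions")
B64_BLOCK = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_BLOCK = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")
SEPARATED = re.compile(r"(?:\w[.\-_ ]){3,}\w")  # e.g. "i.g.n.o.r.e"


def decoded_views(text):
    """Yield candidate decodings of text so each can be re-scanned."""
    yield html.unescape(text)
    current = text
    for _ in range(3):  # unwind chained URL encoding, bounded to 3 rounds
        nxt = unquote(current)
        if nxt == current:
            break
        current = nxt
        yield current
    for block in B64_BLOCK.findall(text):
        try:
            decoded = base64.b64decode(block, validate=True).decode("utf-8")
        except ValueError:  # binascii.Error and UnicodeDecodeError derive from it
            continue
        yield decoded
    for block in HEX_BLOCK.findall(text):
        try:
            decoded = bytes.fromhex(block).decode("utf-8")
        except ValueError:
            continue
        yield decoded
    # Collapse runs of single characters joined by separators.
    yield SEPARATED.sub(lambda m: re.sub(r"[.\-_ ]", "", m.group()), text)


def scan_encoded_injection(text):
    """True if any decoded view of text matches the injection pattern."""
    return any(INJECTION.search(view) for view in decoded_views(text))
```

The separator collapse is intentionally aggressive and may over-match on ordinary spaced-out text, which is one reason a design like this re-scans only against the highest-severity patterns.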
PostToolUse hook MUST validate Bash commands containing external connectivity tools (ssh, curl, wget, psql, mysql, redis-cli) against an approved destination allowlist. Unapproved destinations MUST emit LAST_HOP_VIOLATION audit events.
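Destination extraction for last-hop validation could start from something like the following. It is a simplified sketch with hypothetical helper names: it only handles commands that begin with one of the listed tools (no pipes, `sudo`, or subshells), recognizes URL arguments, `-h`/`--host` flags for the database clients, and the first positional argument for `ssh`.

```python
import shlex
from urllib.parse import urlparse

CONNECT_TOOLS = {"ssh", "curl", "wget", "psql", "mysql", "redis-cli"}
DB_TOOLS = {"psql", "mysql", "redis-cli"}
HOST_FLAGS = {"-h", "--host"}
SSH_FLAGS_WITH_ARG = {"-p", "-i", "-l", "-o", "-F", "-J"}  # partial list


def extract_destinations(command):
    """Return the set of hostnames/IPs a simple command would connect to."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return set()
    if not tokens or tokens[0] not in CONNECT_TOOLS:
        return set()
    tool, args, hosts = tokens[0], tokens[1:], set()
    i = 0
    while i < len(args):
        tok = args[i]
        if "://" in tok:
            host = urlparse(tok).hostname
            if host:
                hosts.add(host)
        elif tool in DB_TOOLS and tok in HOST_FLAGS and i + 1 < len(args):
            hosts.add(args[i + 1])
            i += 1  # skip the host value just consumed
        elif tool == "ssh":
            if tok in SSH_FLAGS_WITH_ARG:
                i += 1  # skip the flag's argument
            elif not tok.startswith("-"):
                hosts.add(tok.rsplit("@", 1)[-1])
                break  # anything after the destination is the remote command
        i += 1
    return hosts


def unapproved_destinations(command, approved):
    """Destinations in command that are missing from the allowlist."""
    return extract_destinations(command) - set(approved)
```

A non-empty result from `unapproved_destinations()` is what would trigger a LAST_HOP_VIOLATION event.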
The following behaviors MUST be unconditionally blocked regardless of manifest permissions
or trust level: network tunneling (ssh -L/-R/-D, socat, ncat), credential exfiltration
(reading secrets and sending externally), audit trail modification (DELETE/UPDATE on
audit.db), unauthorized internal API calls. Violations MUST emit
PROHIBITED_BEHAVIOR events with severity=critical.
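A hard-block check of this kind reduces to matching the command against a fixed pattern table before any policy logic runs. The sketch below covers only two of the four categories (tunneling and audit-trail modification); the regexes are hypothetical and deliberately broad, so they can over-block, for example on a remote `ls -R` run through ssh.

```python
import re

# Hypothetical, deliberately broad patterns; a real rule set would need tuning.
PROHIBITED_PATTERNS = {
    "network_tunneling": re.compile(
        r"\bssh\b[^|;&]*\s-[A-Za-z]*[LRD]\b|\b(?:socat|ncat)\b"
    ),
    "audit_trail_modification": re.compile(
        r"\b(?:delete|update)\b.*\baudit\.db\b|\baudit\.db\b.*\b(?:delete|update)\b",
        re.IGNORECASE | re.DOTALL,
    ),
}


def check_prohibited(command):
    """Return the first prohibited category matched by command, else None."""
    for name, pattern in PROHIBITED_PATTERNS.items():
        if pattern.search(command):
            return name
    return None
```

Any non-None result would be blocked and reported as a PROHIBITED_BEHAVIOR event with severity=critical, without consulting the manifest or trust level.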
All security-relevant audit events MUST be emitted to an external syslog endpoint in addition
to local SQLite storage. Syslog delivery MUST be fail-open. A local buffer at
state/syslog-buffer.jsonl MUST capture events during syslog outages for replay.
Token consumption MUST be tracked per NHI instance and per workflow. Configurable thresholds
MUST trigger warnings at 80% and halts at 100%. COST_THRESHOLD audit events
MUST include scope (agent/workflow/session), threshold_pct, tokens_used, and token_budget.
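The threshold logic is simple arithmetic: usage percentage is 100 * tokens_used / token_budget, compared against the warning level and 100%. In this sketch the function name and the `action` field are assumptions; the other event fields come from the requirement above.

```python
def check_cost_threshold(scope, tokens_used, token_budget, warn_pct=80):
    """Return a COST_THRESHOLD event dict, or None below the warning level."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    pct = 100 * tokens_used / token_budget
    if pct >= 100:
        action, threshold = "halt", 100
    elif pct >= warn_pct:
        action, threshold = "warn", warn_pct
    else:
        return None
    return {
        "event_type": "COST_THRESHOLD",
        "scope": scope,  # "agent", "workflow", or "session"
        "threshold_pct": threshold,
        "tokens_used": tokens_used,
        "token_budget": token_budget,
        "action": action,  # hypothetical field: "warn" or "halt"
    }
```

For a 100K-token agent budget, 80,000 tokens used yields a warn event and 100,000 yields a halt event.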
The CISO agent MUST generate a Cryptographic Bill of Materials at STANDARD+ tiers, inventorying all cryptographic implementations (symmetric, asymmetric, hash, key exchange, TLS, RNG). Output MUST include PQC readiness assessment with NIST migration priority, recommended quantum-safe alternatives, and estimated migration effort per implementation.
Copy this prompt and give it to Claude Code in a fresh project directory. It will generate the entire governance framework with all components, hooks, config files, and test fixtures.
I need you to build a complete Agent Governance & Trust Framework for Claude Code as a plugin.
## Architecture
Create a plugin at `governance/` with the following structure:
```
governance/
├── __init__.py
├── lib/
│ ├── __init__.py
│ ├── manifest.py # Agent identity, signing, resolution
│ ├── trust_broker.py # Delegation mediation, manifest registry
│ ├── policy_engine.py # Tool classification, gate logic
│ ├── audit_bus.py # SQLite event store with WAL
│ ├── memory_governor.py # Content classification, provenance
│ ├── llm_threat_detector.py # OWASP LLM Top 10 patterns
│ ├── alerting.py # Webhook + syslog
│ ├── metrics_collector.py # Performance monitoring
│ ├── env_policy.py # Environment detection
│ └── manifest_signing.py # HMAC-SHA256 signing
├── hooks/
│ ├── hooks.json # Hook registration
│ ├── session_start.py
│ ├── pre_tool_check.py
│ ├── approval_gate_hook.py
│ ├── post_task_cleanup.py
│ └── output_validator.py
├── state/
│ ├── governance-policy.yaml
│ ├── tool-tiers.yaml
│ ├── classification-patterns.yaml
│ ├── manifests/
│ │ ├── root.yaml
│ │ └── security-analyst.yaml
│ └── env-overrides/
│ └── production.yaml
└── tests/
└── test_governance.py
```
## Core Components
### 1. Manifest System (lib/manifest.py)
Implement:
- Load static YAML manifests from state/manifests/
- HMAC-SHA256 signing with key at state/.signing-key
- SHA-256 content hash for tamper detection
- Parent ceiling enforcement: trust_level, data_classification, autonomy_depth
- Derive restrictive child manifests for unknown agents
- Validation against required fields
Required manifest fields:
- agent_id, manifest_id, manifest_version
- trust_level (1-5), data_classification (public/internal/confidential/restricted)
- permitted_tools (list of fnmatch patterns)
- permitted_delegations (list)
- human_required (bool)
- max_autonomy_depth (int), max_delegation_count (int)
- model_id, model_version
### 2. Trust Broker (lib/trust_broker.py)
Implement:
- Delegation validation: breadth limit (query audit bus for session count)
- Depth budget check (parent.depth - 1)
- Classification boundary (child cannot exceed parent)
- Trust escalation prevention
- Permitted delegation target check (fnmatch)
- Delegation token generation (SHA-256 hash)
- ManifestRegistry with file locking (fcntl), TTL purging (1 hour)
### 3. Policy Engine (lib/policy_engine.py)
Implement:
- Tool classification: exempt/standard/elevated from tool-tiers.yaml
- Unknown tools default to "elevated"
- Manifest permission check (fnmatch)
- Autonomy depth gate
- Human_required gate
- Conductor tier matrix: MAJOR + elevated = human_gate
- Sync audit for elevated/STANDARD+, async for others
### 4. Audit Bus (lib/audit_bus.py)
Implement:
- SQLite with WAL mode at state/audit.db
- 21 event types (EventType enum, including NHI_SPAWN, NHI_TERMINATE, PROHIBITED_BEHAVIOR, DLP_VIOLATION, LAST_HOP_VIOLATION, COST_THRESHOLD)
- Bounded queue (256 depth) with daemon worker
- Buffer fallback to state/audit-buffer.jsonl
- Buffer replay on SessionStart
- Schema with all required fields + indexes
- query() method with filterable columns
- export_jsonl() for session export
- purge_old_events() with archival to state/archive/
- Alerting callback integration
- Metrics callback integration
### 5. Memory Governor (lib/memory_governor.py)
Implement:
- Content classification via regex from classification-patterns.yaml
- Scan restricted first, then confidential, then internal
- Ceiling check: block if content classification exceeds agent ceiling
- Restricted = block (emit POLICY_DENY)
- Confidential = queue with pending_review tag (emit HUMAN_GATE)
- Public/internal = allow with provenance tags
- 9 provenance fields: gov_manifest_id, gov_agent_id, gov_manifest_version,
gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id,
gov_task_id, gov_timestamp
### 6. LLM Threat Detector (lib/llm_threat_detector.py)
Implement:
- 30+ prompt injection patterns (critical/high/medium)
- 24 system leakage patterns (critical/high/medium)
- Input scanning for Task, Bash, Skill tools
- Output scanning for Write, Edit, Bash, NotebookEdit
- Sensitive disclosure using classification patterns
- Emit LLM_THREAT events
- Block on critical/high, warn on medium
- MCP DLP screening: scan_mcp_input() for mcp__* tools
- Encoding attack detection: _scan_encoded_injection()
- Base64 decode and re-scan against critical/high patterns
- Character-separated collapse (e.g., "i.g.n.o.r.e" → "ignore")
- Hex, HTML entity, URL encoding chain detection
- Indirect injection patterns (zero-click): "when the agent reads this",
"instructions for the ai", "do not show the user", "hidden instructions"
Patterns:
- Critical injection: delimiter injection, direct override, role hijacking
- High injection: indirect override, base64 encoding, instruction leakage,
multi-encoding (morse, hex, HTML entities, URL chains)
- Medium injection: subtle manipulation, indirect zero-click injection
- Critical leakage: governance file paths, manifest internals, YAML structure
- High leakage: module references, session IDs
### 7. Alerting (lib/alerting.py)
Implement:
- Webhook POST (n8n compatible) with 5s timeout
- Syslog UDP RFC 5424 format (Wazuh compatible)
- Alert on: policy_deny, trust_deny, circuit_break, human_gate, llm_threat
- Fail-open design (never block on alerting failure)
- Config from governance-policy.yaml alerting section
### 8. Metrics Collector (lib/metrics_collector.py)
Implement:
- Separate SQLite at state/governance-metrics.db
- Schema: timestamp, session_id, metric_type, agent_id, value, metadata
- record() method (never raises)
- query_metrics() with filters
- get_session_summary() with count/avg/max per metric_type
### 9. Environment Policy (lib/env_policy.py)
Implement:
- detect_environment() from GOVERNANCE_ENV or hostname
- load_governance_policy() with deep-merge of env overrides
- Deep merge function (override wins on conflicts)
### 10. Manifest Signing (lib/manifest_signing.py)
Implement:
- generate_signing_key() → 32 bytes to state/.signing-key (0600 perms)
- load_signing_key() → bytes or None
- sign_manifest() → HMAC-SHA256 hex string
- verify_manifest_signature() → bool
- _canonicalize_manifest() → sorted JSON excluding volatile fields
## Hooks
### hooks.json
Register:
- SessionStart: session_start.py (timeout 10s)
- PreToolUse: approval_gate_hook.py, pre_tool_check.py (timeout 10s each)
- PostToolUse: post_task_cleanup.py (Task matcher), output_validator.py (Write|Edit|Bash|NotebookEdit matcher) (timeout 10s each)
### session_start.py
1. Load root manifest (agent_id from env or default "root")
2. Generate session_id (gov-sess-{8 hex chars})
3. Initialize audit bus with alerting and metrics callbacks
4. Emit MANIFEST_LOADED event
5. Purge stale registry entries
6. Run retention policy archival if configured
7. Print session summary to stderr
### pre_tool_check.py
1. Load manifest from registry or resolve
2. Check prohibited behaviors (hard block, no override)
3. If Task tool: trust_broker.evaluate_delegation(), generate NHI ID, emit NHI_SPAWN
4. LLM threat detector: scan_input()
5. If mcp__* tool: _check_mcp_firewall() DLP screening
6. Policy engine: evaluate()
7. If decision is deny: exit 2 with error message on stderr (Claude Code blocks the tool call only on exit code 2)
8. If decision is human_gate: print gate prompt to stderr, wait for approval
9. If allow: exit 0
### approval_gate_hook.py
Check for external communication gate and data classification gate from
governance-policy.yaml gates section. Inject approval prompts per skill config.
### post_task_cleanup.py
For Task tool completion: deregister child manifest from registry.
### output_validator.py
1. Read tool output from stdin JSON
2. LLM threat detector: scan_output() (includes MCP output scanning)
3. If Bash tool: _check_last_hop() for external connection validation
4. If critical/high threat detected: emit LLM_THREAT, exit 2 (the code Claude Code treats as a blocking error)
5. If medium: emit warning, exit 0
6. Record metrics
## Configuration Files
### state/governance-policy.yaml
Include sections:
- version, effective_from
- tool_tiers: exempt, standard, elevated, elevated_patterns
- classification_patterns: restricted, confidential, internal (regex lists)
- gates: external_communication, data_classification
- tier_gate_matrix: TRIVIAL/MINOR/STANDARD/MAJOR with gate modes
- retention: audit_events, metrics (retention_days, archive_path)
- environments: default, env_var
- alerting: enabled, webhook (url, enabled), syslog (host, port, facility, enabled)
- cost_tracking: per_agent_tokens, per_workflow_tokens, per_session_tokens, warn_threshold_pct
- approved_destinations: list of allowed hostnames/IPs for last-hop validation
- prohibited_behaviors: enabled (bool), always true in production
### state/tool-tiers.yaml
List exempt, standard, elevated tools and elevated_patterns.
### state/classification-patterns.yaml
Regex patterns for restricted, confidential, internal classifications.
### state/manifests/root.yaml
Root agent with trust_level=5, data_classification=restricted, full permissions.
### state/manifests/security-analyst.yaml
Example agent with trust_level=4, confidential, limited tools and delegations.
## Agentic Security Hardening
### 11. NHI Instance Tracking
Implement in pre_tool_check.py and post_task_cleanup.py:
- Generate NHI ID: nhi_{agent_id}_{YYYYMMDD}_{hex8} using secrets.token_hex(4)
- Emit NHI_SPAWN on Task dispatch (nhi_id, agent_id, parent_nhi_id, manifest_id)
- Emit NHI_TERMINATE on Task completion (nhi_id, exit_reason, duration_seconds, tool_count)
- Tag all audit events with nhi_id for the active invocation
- Propagate parent_nhi_id through delegation chains
### 12. MCP Firewall (DLP Screening)
Implement in pre_tool_check.py:
- _check_mcp_firewall() runs before policy engine for mcp__* tools
- Extract MCP server name from tool_name (split on __)
- Call threat_detector.scan_mcp_input() with tool_input and manifest
- scan_mcp_input() checks all string values against classification_patterns
- Block restricted (critical) and confidential (high) data
- Emit DLP_VIOLATION audit event with mcp_server name
### 13. Last-Hop Validation
Implement in output_validator.py:
- _check_last_hop() for Bash commands containing ssh, curl, wget, psql, mysql, redis-cli
- Extract destination hostnames/IPs from command arguments
- Compare against approved_destinations in governance-policy.yaml
- Emit LAST_HOP_VIOLATION for unapproved destinations
- Warn but do not block (PostToolUse runs after execution)
### 14. Prohibited Behaviors
Implement in pre_tool_check.py (before policy engine):
- Hard-coded patterns checked before all other evaluations
- Network tunneling: ssh -L, ssh -R, ssh -D, socat, ncat tunnel commands
- Credential exfiltration: reading .env/.signing-key + external send
- Audit trail modification: DELETE/UPDATE on audit.db or audit-buffer.jsonl
- Unauthorized API calls: direct HTTP to governance/conductor endpoints
- Always block, always critical, no override possible
- Emit PROHIBITED_BEHAVIOR audit event
### 15. Cost Tracking
Implement token budget tracking:
- Track tokens per NHI instance and per workflow in conductor-state.json cost_tracking
- Fields: token_budget, tokens_used, cost_estimate_usd, halt_on_exceed
- Warn at 80% threshold, halt at 100%
- Emit COST_THRESHOLD events with scope, threshold_pct, tokens_used, token_budget
- Configurable per-agent (100K default), per-workflow (500K), per-session (1M)
### 16. C-BOM (Cryptographic Bill of Materials)
Add to CISO agent workflow at STANDARD+ tiers:
- Inventory all crypto: symmetric, asymmetric, hash, key exchange, TLS, RNG
- Output PQC readiness assessment per implementation
- Map to NIST migration priority and quantum-safe alternatives (ML-KEM, ML-DSA, SLH-DSA)
- Store in conductor-state.json project_characteristics.crypto_implementations
## Tests
Write comprehensive pytest tests in tests/test_governance.py covering:
- Manifest loading, signing, ceiling enforcement
- Trust broker delegation validation (breadth, depth, classification)
- Policy engine tool classification and gate logic
- Audit bus emit, query, buffer fallback, retention
- Memory governor classification, ceiling, provenance
- LLM threat detector injection, leakage, encoding attacks, and MCP DLP
- Alerting webhook and syslog (mocked)
- Metrics recording and session summary
- Environment detection and policy merging
- NHI ID generation and lifecycle events
- MCP firewall DLP screening
- Prohibited behavior detection
- Cost threshold enforcement
## Implementation Guidelines
- Use only stdlib (no external dependencies except PyYAML, which must be available in the Python environment that runs the hooks)
- All paths relative to plugin root via GOVERNANCE_PLUGIN_ROOT env var
- Never raise exceptions from audit, alerting, metrics (fail-open)
- Use WAL mode for all SQLite databases
- File locking (fcntl) for manifest registry
- Hooks read stdin JSON, write stderr for messages, exit 0 to allow or exit 2 to deny (Claude Code blocks only on exit code 2; other non-zero codes are non-blocking errors)
- All timestamps in ISO 8601 UTC format
- All hashes in hexadecimal lowercase
Build the complete framework with all files. Make it production-ready.
Hook failures fail-open: If a hook times out or crashes, the tool execution proceeds. Rationale: Governance failures should not lock up the entire system. Users can still make progress while governance is degraded.
Trust/policy failures fail-closed: If trust broker or policy engine denies a delegation or tool, the operation is blocked. Rationale: Privilege escalation and unauthorized tool use are security-critical failures that must prevent execution.
Audit/alerting failures fail-open: If audit bus or alerting service fails, the operation continues. Events fall back to buffer. Rationale: Observability failures should not block functional operations.
Rationale: Governance framework must work on single-machine deployments without infrastructure dependencies. SQLite provides ACID guarantees, concurrent reads (via WAL), and zero operational overhead. For distributed deployments, audit events can be exported to external SIEM via webhook/syslog.
Trade-offs: SQLite performance degrades beyond roughly 1M events; retention policy archival mitigates this. Events cannot be queried across multiple agent hosts without centralized aggregation; SIEM integration provides that distributed visibility.
Rationale: HMAC-SHA256 is in Python stdlib (hashlib + hmac). Ed25519 requires PyNaCl or cryptography library, adding external dependencies. Manifest signing provides tamper evidence, not authenticity proof (shared secret vs asymmetric keys). HMAC is sufficient for this use case.
Trade-offs: HMAC cannot prove manifest authorship (anyone with the key can sign). Ed25519 would enable signature verification without access to signing key. For multi-party governance (untrusted manifest sources), Ed25519 would be preferred. For single-operator systems, HMAC is simpler.
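For reference, HMAC-SHA256 signing over a canonical form needs only `hmac`, `hashlib`, and `json` from the stdlib. The sketch below is an assumption-laden illustration: the `VOLATILE_FIELDS` names are hypothetical, and canonicalization is taken to mean key-sorted compact JSON.

```python
import hashlib
import hmac
import json

# Hypothetical volatile fields excluded from the signed content.
VOLATILE_FIELDS = {"signature", "loaded_at"}


def canonicalize(manifest):
    """Serialize the stable fields as key-sorted compact JSON bytes."""
    stable = {k: v for k, v in manifest.items() if k not in VOLATILE_FIELDS}
    return json.dumps(stable, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    """Return the HMAC-SHA256 signature as a lowercase hex string."""
    return hmac.new(key, canonicalize(manifest), hashlib.sha256).hexdigest()


def verify_manifest_signature(manifest, signature, key):
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Verification requires the same secret key as signing, which is the trade-off described above: any holder of the key can both verify and forge signatures.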
Rationale: Regex patterns are deterministic, auditable, and add negligible latency. ML models (e.g., prompt injection classifiers) typically require model hosting, often a GPU, and introduce non-deterministic false positives. For governance hooks with a 10s timeout and no infrastructure dependencies, regex is the most practical approach.
Trade-offs: Regex patterns can be evaded (obfuscation, encoding, paraphrasing). ML models adapt to novel attacks. Future enhancement: add ML-based scanning as async post-analysis (emit events, train on patterns, update regex rules).
Pattern Maintenance: OWASP LLM Top 10 patterns require periodic updates. Governance policy YAML allows operators to add custom patterns without code changes.
Rationale: Hooks integrate natively with Claude Code's tool execution lifecycle. No need to wrap every tool or modify core runtime. Hooks receive full tool context (name, input, output) and can block or transform execution.
Trade-offs: Hooks run in separate processes (command-type hooks), adding 10-50ms overhead per tool invocation. In-process hooks (if Claude Code supported Python plugins) would be faster but less isolated. Current design prioritizes safety (hook crash doesn't crash agent) over performance.
Rationale: Not all governance violations warrant hard blocks. TRIVIAL tasks can skip most gates. MINOR tasks get advisory prompts. STANDARD tasks get blocking gates on critical operations. MAJOR tasks require human approval for all elevated tools.
Configurability: Tier-to-gate matrix in governance-policy.yaml allows operators to customize enforcement per task tier. Example: PRE_RELEASE gate is "skip" for TRIVIAL/MINOR, "advisory" for STANDARD, "blocking" for MAJOR.
Human-in-the-Loop: human_gate decision prints approval prompt to stderr and waits for operator input. Enables oversight without breaking agent autonomy.
Rationale: CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"] is a total order (linear hierarchy). Simpler to reason about and enforce than a lattice model (e.g., medical vs financial vs PII classifications that don't strictly order).
Trade-offs: Linear order doesn't capture orthogonal sensitivity dimensions. For complex data governance (HIPAA + PCI-DSS + GDPR), would need multi-dimensional classification with intersection logic. Current design optimizes for 80% use case (corporate data tiers).
Rationale: Each agent session gets isolated delegation tracking. Scoping by session_id keeps delegation counts from leaking between concurrent sessions. TTL purging (1 hour) removes stale entries without manual cleanup.
Trade-offs: Session restart resets delegation count. Malicious agent could restart session to bypass breadth limit. Mitigation: SessionStart hook logs all manifest loads for forensic analysis. Future enhancement: persistent delegation ledger across sessions.
Rationale: Embedding 9 provenance fields in memory_store metadata (merged into Qdrant payload) keeps data + provenance together. No separate index to maintain or sync. Memory queries can filter by provenance (e.g., "gov_trust_level >= 4").
Trade-offs: Increases payload size by ~200 bytes per memory chunk. For 1M chunks, adds ~200MB storage. Alternative: separate provenance collection with foreign key links. Current design optimizes for query simplicity over storage efficiency.
Rationale: All governance components reference one file. Eliminates scattered config files, version skew, and partial updates. Git-tracked policy enables version control, diffs, and rollback. Environment overrides provide per-env customization without duplicating base policy.
Trade-offs: Single file becomes large (current: ~120 lines, projected: 500+ with all patterns). YAML parsing on every SessionStart adds ~10ms overhead. Alternative: compiled policy cache (pickle) with invalidation on YAML mtime change. Current design optimizes for simplicity.
Rationale: No single security control is sufficient against determined adversarial input. The framework applies multiple independent layers: prohibited behavior patterns (hard block before any policy logic), DLP screening (MCP firewall), encoding attack detection (base64/Morse/hex decode+rescan), last-hop validation (PostToolUse destination checks), external audit sink (tamper-resistant syslog), and cost tracking (denial-of-wallet prevention). Each layer catches attacks the others miss.
NHI Tracking: Per-invocation identity enables forensic reconstruction of multi-agent attack chains. Unlike static manifests (which identify agent type), NHI IDs identify specific invocations, enabling detection of "confused deputy" attacks where a legitimate agent is tricked into malicious actions.
Trade-offs: Each layer adds PreToolUse/PostToolUse latency (~5-15ms per layer). Total hook overhead with all layers: ~50-100ms per tool call. Acceptable for development workflows, may require selective disabling for batch operations. Prohibited behaviors are non-negotiable — other layers can be toggled via governance-policy.yaml.
Governance framework is a Claude Code plugin at ~/.claude/plugins/local/governance
(symlink from project directory). Plugin manifest declares hooks:
{
  "description": "Governance enforcement hooks",
  "hooks": {
    "SessionStart": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/session_start.py",
        "timeout": 10
      }]
    }],
    "PreToolUse": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/pre_tool_check.py",
        "timeout": 10
      }]
    }],
    "PostToolUse": [{
      "matcher": "Write|Edit|Bash|NotebookEdit",
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/output_validator.py",
        "timeout": 10
      }]
    }]
  }
}
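Hooks receive tool context via stdin JSON and control execution via exit codes. Exit 0 = allow, exit 2 = deny: Claude Code treats exit code 2 as a blocking error and feeds stderr back to the model, while other non-zero codes are non-blocking and execution continues.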
Every agent receives a manifest at delegation time. Conductor plugin exports current task tier
to conductor-state.json for policy engine consumption:
{
  "active_task": "task-a4f8d2c1",
  "agent_stack": ["root", "security-analyst"],
  "governance": {
    "conductor_tier": "STANDARD",
    "intent": {
      "sensitivity": "medium",
      "external_visibility": false
    }
  }
}
Policy engine reads this file during tool evaluation to determine whether a MAJOR-tier task using an elevated tool requires a human gate.
Memory governor intercepts mcp__claude-memory__memory_store tool via PreToolUse hook.
Provenance tags are merged into tool input before MCP tool execution:
# Original tool input
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {}
}
# After memory governor
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {
    "gov_manifest_id": "gov-sec-analyst-v2",
    "gov_agent_id": "security-analyst",
    "gov_trust_level": 4,
    "gov_classification": "confidential",
    "gov_session_id": "gov-sess-a4f8d2c1",
    "gov_timestamp": "2026-03-17T14:23:45.123456Z"
  }
}
Memory system (Qdrant) stores these tags in the payload. Queries can filter by provenance:
memory_recall("security analysis",
filter={"gov_trust_level": {"$gte": 4}})
Governance events (especially CONTEXT_PRESSURE, CIRCUIT_BREAK) inform context management decisions. When autonomy depth is exhausted, context manager can trigger human escalation or task decomposition instead of silent failure.
Audit bus sends events to external SIEM via webhook or syslog. Example n8n workflow:
# n8n webhook trigger receives governance events
# Filter: event_type in [policy_deny, trust_deny, llm_threat]
# Route to Slack/PagerDuty/Wazuh based on severity
Webhook → Filter → Switch (severity):
- critical → PagerDuty incident
- high → Slack security channel
- medium → Wazuh log aggregation
Metrics database enables trend analysis and anomaly detection. Example queries:
-- Agent success rate over the last 30 days
SELECT agent_id,
       AVG(value) AS success_rate,
       COUNT(*) AS invocations
FROM governance_metrics
WHERE metric_type = 'agent_success'
  AND timestamp > datetime('now', '-30 days')
GROUP BY agent_id
ORDER BY success_rate ASC;
-- Gate trigger frequency by agent over the last 7 days
SELECT agent_id,
       COUNT(*) AS gate_count
FROM governance_metrics
WHERE metric_type = 'gate_trigger'
  AND timestamp > datetime('now', '-7 days')
GROUP BY agent_id
ORDER BY gate_count DESC;
Governance framework development uses standard Python tooling:
# Install in development mode
cd ~/.claude/plugins/local/governance
pip install -e .
# Run tests
pytest tests/ -v
# Lint
ruff check governance/
# Type check
mypy governance/lib/
# Generate signing key
python3 -c "from governance.lib.manifest_signing import generate_signing_key; \
generate_signing_key()"
# Export session audit trail
python3 -c "from governance.lib.audit_bus import AuditBus; \
from pathlib import Path; \
bus = AuditBus(Path('state/audit.db'), Path('state/audit-buffer.jsonl')); \
bus.export_jsonl('gov-sess-a4f8d2c1', Path('audit-export.jsonl'))"