AIP — Agent Infrastructure Protocol Analysis
A deep-dive into the design, architecture, and implementation of AIP v1 — the governance control plane that sits above tool protocols like MCP, providing risk-classified capability management, policy evaluation, quota enforcement, and immutable evidence audit for every AI agent invocation.
1. Why AIP Exists
Modern AI agents invoke external tools via protocols like MCP (Model Context Protocol). MCP gives agents reach — the ability to read files, call APIs, and execute commands. But it offers no answer to:
- Who is allowed to call this tool?
- Under what conditions should the call be modified or blocked?
- What happened after the call, and can we prove it?
AIP fills this gap. It is not a replacement for MCP — it is a governance layer above it.
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: MCP servers are consumed as execution backends, not replaced
- aip-spec/spec/aip-1.md: §1.1, “AIP sits between the agent runtime and execution adapters”
2. Design Principles
AIP is built on five foundational principles, each traceable to code.
2.1 Capability-First
The core abstraction is the Capability — a governed, typed, risk-classified unit of work. Every tool, API, or function an agent can invoke is wrapped as a Capability with explicit metadata.
Code evidence:
- aip-spec/reference/aip_runtime/models.py: @dataclass(frozen=True) class Capability with capability_id, version, risk_tier, input_schema, output_schema, execution, lifecycle
2.2 Governance-Native
Policy evaluation, approval workflows, quota enforcement, and audit records are protocol-level primitives — not optional add-ons.
Code evidence:
- aip-spec/reference/aip_runtime/runtime.py: invoke() executes 5 mandatory phases; policy evaluation cannot be skipped
- aip-spec/reference/aip_runtime/policy_engine.py: _DEFAULT_BY_RISK, where CRITICAL defaults to deny even without an explicit policy
2.3 Execution-Agnostic
AIP defines semantics, not wire format. The same Capability contract can dispatch to MCP, REST, CLI, internal functions, containers, or worker queues.
Code evidence:
- aip-spec/reference/aip_runtime/models.py: ExecutionBinding.type: str  # mcp | rest | cli | internal | container | worker
- aip-spec/reference/aip_runtime/runtime.py: self._handlers: dict[str, ExecutionHandler], with handlers pluggable per transport type
2.4 Immutability by Design
All core data models are frozen dataclasses. Evidence records cannot be modified after creation. Capability versions are append-only.
Code evidence:
- aip-spec/reference/aip_runtime/models.py: every model uses @dataclass(frozen=True)
- aip-spec/reference/aip_runtime/evidence_store.py: record() raises ValueError if evidence_id already exists
2.5 Fail-Closed Defaults
When no policy explicitly allows an invocation, the risk tier determines the default behavior. CRITICAL operations are denied by default.
Code evidence:
aip-spec/reference/aip_runtime/policy_engine.py:
_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}
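To make the fail-closed behavior concrete, here is a minimal sketch of how such a table might be consulted when no policy matched. The helper name default_decision and the treatment of unknown tiers are assumptions for illustration; the source only shows the table itself.

```python
# Mirrors the _DEFAULT_BY_RISK table quoted from policy_engine.py.
_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}


def default_decision(risk_tier: str) -> str:
    """Hypothetical helper: the decision used when no policy matched.

    Assumption: an unrecognized tier falls back to "deny", so the lookup
    itself also fails closed. The reference runtime may behave differently.
    """
    return _DEFAULT_BY_RISK.get(risk_tier, "deny")
```

With this sketch, a HIGH capability with no matching policy would stop at require_approval, and a misspelled tier would be denied instead of silently allowed.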
3. The 5-Phase Control Flow
Every AIP invocation follows a deterministic 5-phase pipeline. No phase can be skipped, and the order is fixed.
Code evidence:
- aip-spec/reference/aip_runtime/runtime.py: invoke() method, lines 64–195
Phase 1: Resolve Capability
The registry looks up the requested capability_id with version resolution:
- Exact version match if specified
- Otherwise: latest non-deprecated version (active > staged > shadow)
Code evidence:
- aip-spec/reference/aip_runtime/registry.py: get() method with status priority {"active": 0, "staged": 1, "shadow": 2}
candidates.sort(
    key=lambda c: (
        status_priority.get(c.lifecycle.status, 99),
        -c.major_version,
        -c.minor_version,
    )
)
If no capability is found, the control flow terminates immediately with CAPABILITY_NOT_FOUND.
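The resolution rule can be sketched end to end as follows. The Cap dataclass is a hypothetical, flattened stand-in for the real Capability model, and the choice to let an exact version request return a deprecated entry is an assumption; only the sort key is taken from the quoted registry code.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_PRIORITY = {"active": 0, "staged": 1, "shadow": 2}


@dataclass(frozen=True)
class Cap:
    """Hypothetical stand-in for Capability (lifecycle flattened to status)."""
    capability_id: str
    major_version: int
    minor_version: int
    status: str  # shadow | staged | active | deprecated


def resolve(caps: list[Cap], capability_id: str,
            version: Optional[str] = None) -> Optional[Cap]:
    candidates = [c for c in caps if c.capability_id == capability_id]
    if version is not None:
        # Exact version match when the caller pins one.
        for c in candidates:
            if f"{c.major_version}.{c.minor_version}" == version:
                return c
        return None
    # Otherwise: best lifecycle status first, then highest version.
    candidates = [c for c in candidates if c.status != "deprecated"]
    candidates.sort(
        key=lambda c: (
            STATUS_PRIORITY.get(c.status, 99),
            -c.major_version,
            -c.minor_version,
        )
    )
    return candidates[0] if candidates else None


caps = [
    Cap("fs.file.read", 1, 0, "deprecated"),
    Cap("fs.file.read", 1, 2, "active"),
    Cap("fs.file.read", 2, 0, "staged"),
]
best = resolve(caps, "fs.file.read")
```

Note that status outranks version in the sort key: the active 1.2 wins over the staged 2.0, which is what makes staged rollouts safe by default. A None result corresponds to the CAPABILITY_NOT_FOUND case.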
Phase 2: Evaluate Policies
All matching policies are evaluated in priority order (lower number = higher priority). Each policy carries targeting rules that select the invocations it applies to, plus conditions that are evaluated against the request.
| Decision | Behavior | Terminal? |
|---|---|---|
| deny | Stop immediately, block execution | Yes |
| require_approval | Stop immediately, await approval | Yes |
| modify | Apply input/option overrides, continue | No (cumulative) |
| allow | Mark as explicitly allowed, continue | No |
| log_only | Record evaluation, continue | No |
Code evidence:
- aip-spec/reference/aip_runtime/policy_engine.py: evaluate() method:
# Terminal: deny
if rule.decision == "deny":
    return "deny", summaries, evaluations, {}
# Terminal: require_approval
if rule.decision == "require_approval":
    return "require_approval", summaries, evaluations, {}
# Cumulative: modify
if rule.decision == "modify" and rule.modifications:
    # ... merge modifications
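The terminal-versus-cumulative distinction can be illustrated with a simplified loop. This is a sketch under stated assumptions, not the reference evaluate(): each Rule here stands for one already-matched policy, the return value is reduced to (decision, merged_modifications), and the fallback to the risk-tier default when nothing explicitly allowed the call is an assumption.

```python
from dataclasses import dataclass, field

DEFAULT_BY_RISK = {"LOW": "allow", "MEDIUM": "allow",
                   "HIGH": "require_approval", "CRITICAL": "deny"}


@dataclass(frozen=True)
class Rule:
    """Hypothetical flattened rule: one decision per matched policy."""
    policy_id: str
    priority: int  # lower = evaluated first
    decision: str  # deny | require_approval | modify | allow | log_only
    modifications: dict = field(default_factory=dict)


def evaluate(rules: list[Rule], risk_tier: str) -> tuple[str, dict]:
    merged: dict = {}
    explicitly_allowed = False
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.decision in ("deny", "require_approval"):
            # Terminal: stop immediately and discard modifications.
            return rule.decision, {}
        if rule.decision == "modify":
            merged.update(rule.modifications)  # cumulative
        elif rule.decision == "allow":
            explicitly_allowed = True
        # log_only: nothing to do besides recording the evaluation.
    if explicitly_allowed:
        return "allow", merged
    # Assumption: without an explicit allow, the risk tier decides.
    return DEFAULT_BY_RISK.get(risk_tier, "deny"), merged


sandbox_write = [
    Rule("pol_sandbox", 10, "allow"),
    Rule("pol_force_dry_run", 20, "modify", {"dry_run": True}),
]
decision, mods = evaluate(sandbox_write, "HIGH")
```

Because a terminal decision returns from inside the loop, a deny at any priority wins over every non-terminal rule, including those evaluated earlier.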
Policy Targeting
Policies match invocations via four dimensions:
| Dimension | Mechanism |
|---|---|
| Capabilities | Glob patterns (github.*, fs.file.*) |
| Risk tiers | Exact match (HIGH, CRITICAL) |
| Actors | Glob patterns on actor_id |
| Actor types | Exact match (agent, user, scheduler) |
Code evidence:
- aip-spec/reference/aip_runtime/policy_engine.py: _policy_matches_target() uses fnmatch.fnmatch() for glob patterns
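A rough sketch of four-dimensional targeting with fnmatch follows. The dict-shaped target and the rule that an empty or missing dimension matches everything are assumptions made for illustration; the real PolicyTarget is a dataclass whose exact semantics are not shown in the source.

```python
from fnmatch import fnmatch


def matches_target(target: dict, capability_id: str, risk_tier: str,
                   actor_id: str, actor_type: str) -> bool:
    """Hypothetical matcher. Assumption: an unset dimension is a wildcard."""
    patterns = target.get("capabilities")
    if patterns and not any(fnmatch(capability_id, p) for p in patterns):
        return False  # glob match on capability ID
    tiers = target.get("risk_tiers")
    if tiers and risk_tier not in tiers:
        return False  # exact match on risk tier
    actors = target.get("actors")
    if actors and not any(fnmatch(actor_id, p) for p in actors):
        return False  # glob match on actor_id
    types = target.get("actor_types")
    if types and actor_type not in types:
        return False  # exact match on actor type
    return True


target = {"capabilities": ["fs.file.*"], "risk_tiers": ["HIGH", "CRITICAL"],
          "actor_types": ["agent"]}
hit = matches_target(target, "fs.file.write", "HIGH", "agent-7", "agent")
miss = matches_target(target, "github.repo.read", "HIGH", "agent-7", "agent")
```

All dimensions are combined with AND, so a policy scoped to fs.file.* and agents never fires for a human user calling the same capability.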
Condition Language
Leaf conditions use a {field, operator, value} structure with dot-path field resolution into the invocation request.
Supported operators: eq, ne, gt, lt, gte, lte, in, not_in, matches, starts_with, contains
Composites: all_of (AND), any_of (OR), not_ (NOT) — evaluated recursively.
Code evidence:
- aip-spec/reference/aip_runtime/policy_engine.py: _evaluate_condition() with recursive AND/OR/NOT, _resolve_field() for dot-path access, _compare() for the 11 operators
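The condition language is small enough to sketch in full. The version here assumes conditions and requests are plain dicts, that matches means a regular-expression search, and that a missing field makes a leaf evaluate to False; none of these details is confirmed by the source, so treat it as an illustration of the recursive shape only.

```python
import re
from typing import Any

_MISSING = object()

OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "gte": lambda a, b: a >= b,
    "lte": lambda a, b: a <= b,
    "in": lambda a, b: a in b,
    "not_in": lambda a, b: a not in b,
    "matches": lambda a, b: re.search(b, str(a)) is not None,  # assumed regex
    "starts_with": lambda a, b: str(a).startswith(b),
    "contains": lambda a, b: b in a,
}


def resolve_field(request: dict, path: str) -> Any:
    """Walk a dot-path such as 'input.path' into nested dicts."""
    current: Any = request
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def evaluate_condition(cond: dict, request: dict) -> bool:
    if "all_of" in cond:  # AND
        return all(evaluate_condition(c, request) for c in cond["all_of"])
    if "any_of" in cond:  # OR
        return any(evaluate_condition(c, request) for c in cond["any_of"])
    if "not_" in cond:  # NOT
        return not evaluate_condition(cond["not_"], request)
    actual = resolve_field(request, cond["field"])
    if actual is _MISSING:
        return False  # assumption: missing fields never satisfy a leaf
    return OPS[cond["operator"]](actual, cond["value"])


in_sandbox = {"all_of": [
    {"field": "input.path", "operator": "starts_with", "value": "/workspace/"},
    {"not_": {"field": "actor.type", "operator": "eq", "value": "user"}},
]}
request = {"input": {"path": "/workspace/out.txt"}, "actor": {"type": "agent"}}
result = evaluate_condition(in_sandbox, request)
```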
Risk Tier Monotonicity
A critical safety invariant: policies can only make the effective risk tier more restrictive, never less.
# MUST NOT reduce below capability's declared tier
if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(
    capability.risk_tier, 0
):
    effective_risk_tier = most_restrictive_tier(
        effective_risk_tier, candidate
    )
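A self-contained sketch of the invariant follows. The numeric ranks in RISK_TIER_ORDER and the helper apply_overrides are assumptions; only the comparison pattern is taken from the quoted excerpt.

```python
# Assumed ordering: a higher number means a more restrictive tier.
RISK_TIER_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def most_restrictive_tier(a: str, b: str) -> str:
    """Return whichever of the two tiers ranks higher."""
    return a if RISK_TIER_ORDER.get(a, 0) >= RISK_TIER_ORDER.get(b, 0) else b


def apply_overrides(declared: str, candidates: list[str]) -> str:
    """Hypothetical helper: fold policy-proposed tiers into an effective tier."""
    effective = declared
    for candidate in candidates:
        # Candidates below the declared tier are ignored outright.
        if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(declared, 0):
            effective = most_restrictive_tier(effective, candidate)
    return effective
```

Whatever order the candidates arrive in, the effective tier can only climb from the declared one, so a permissive policy cannot downgrade a HIGH capability to LOW.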
Phase 3: Check Quota
Budget enforcement across multiple dimensions:
| Dimension | Description |
|---|---|
| max_calls_per_minute | Rate limiting |
| max_calls_per_hour | Hourly cap |
| max_calls_per_day | Daily cap |
| max_tokens_per_day | Token budget |
| max_cost_per_day | Cost ceiling (USD) |
| max_parallel | Concurrency limit |
Quotas scope to: capability, actor, session, tenant, or global. The most restrictive limit wins when quotas exist at multiple scopes.
Code evidence:
- aip-spec/reference/aip_runtime/models.py: QuotaSpec dataclass with all dimensions
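The "most restrictive limit wins" rule can be sketched as a per-dimension minimum across scopes. QuotaLimits is a hypothetical stand-in for QuotaSpec, and treating None as "no limit at this scope" is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class QuotaLimits:
    """Hypothetical stand-in for QuotaSpec; None means no limit at this scope."""
    max_calls_per_minute: Optional[int] = None
    max_calls_per_hour: Optional[int] = None
    max_calls_per_day: Optional[int] = None
    max_tokens_per_day: Optional[int] = None
    max_cost_per_day: Optional[float] = None  # USD
    max_parallel: Optional[int] = None


def effective_limits(scopes: list[QuotaLimits]) -> QuotaLimits:
    """Take the smallest configured limit for each dimension."""
    merged = {}
    for f in fields(QuotaLimits):
        configured = [getattr(s, f.name) for s in scopes
                      if getattr(s, f.name) is not None]
        merged[f.name] = min(configured) if configured else None
    return QuotaLimits(**merged)


tenant = QuotaLimits(max_calls_per_minute=60, max_cost_per_day=10.0)
actor = QuotaLimits(max_calls_per_minute=10, max_parallel=2)
combined = effective_limits([tenant, actor])
```

Here the actor-level rate limit of 10 calls per minute overrides the looser tenant limit, while the tenant cost ceiling still applies because the actor scope does not set one.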
Phase 4: Dispatch Execution
The runtime routes to the registered handler for the capability’s transport type, applying any accumulated policy modifications.
transport_type = capability.execution.type if capability.execution else "internal"
handler = self._handlers.get(transport_type)
output = handler(capability, effective_input, effective_options)
If no handler exists for the transport type, TRANSPORT_ERROR is returned.
Code evidence:
- aip-spec/reference/aip_runtime/runtime.py: dispatch section with handler lookup and execution
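A minimal sketch of handler dispatch, assuming capabilities are plain dicts and results are reduced to small status dicts. The Dispatcher class and its register method are hypothetical; the handler call signature and the two error codes follow the text and the error taxonomy.

```python
from typing import Any, Callable

# Assumed handler shape: (capability, input, options) -> output.
Handler = Callable[[dict, dict, dict], Any]


class Dispatcher:
    """Hypothetical stand-in for the runtime's handler table."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, transport_type: str, handler: Handler) -> None:
        self._handlers[transport_type] = handler

    def dispatch(self, capability: dict, effective_input: dict,
                 effective_options: dict) -> dict:
        execution = capability.get("execution")
        transport_type = execution["type"] if execution else "internal"
        handler = self._handlers.get(transport_type)
        if handler is None:
            return {"status": "error", "code": "TRANSPORT_ERROR"}
        try:
            output = handler(capability, effective_input, effective_options)
        except Exception as exc:  # handler failure becomes an error result
            return {"status": "error", "code": "EXECUTION_FAILED",
                    "message": str(exc)}
        return {"status": "success", "output": output}


dispatcher = Dispatcher()
dispatcher.register("internal", lambda cap, inp, opts: {"echo": inp})
ok = dispatcher.dispatch({"capability_id": "demo.echo"}, {"x": 1}, {})
missing = dispatcher.dispatch(
    {"capability_id": "demo.mcp", "execution": {"type": "mcp"}}, {}, {})
```

Adding a new transport is then a matter of registering one more callable; the control flow around it does not change.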
Phase 5: Record Evidence
Every invocation — success or failure — produces exactly one immutable evidence record.
Code evidence:
aip-spec/reference/aip_runtime/evidence_store.py:
class EvidenceStore:
    """AIP-1 §3.4.1 guarantees:
    - Immutable: records cannot be modified after creation.
    - Complete: every invocation produces exactly one record.
    - Auditable: enough info to answer who/what/when/why/cost.
    """
    def record(self, evidence: Evidence) -> None:
        if evidence.evidence_id in self._records:
            raise ValueError(
                f"Evidence {evidence.evidence_id} already exists (immutability violated)"
            )
Evidence records include: SHA-256 digests of input/output, full policy evaluations, cost metrics, execution details, error information, and actor delegation chains.
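The digest-plus-append-only idea can be sketched in a few lines. The canonical JSON encoding (sorted keys, compact separators) is an assumption, as are the EvidenceSketch and AppendOnlyStore names; the source only states that compute_digest() uses hashlib.sha256 and that duplicate IDs raise ValueError.

```python
import hashlib
import json
from dataclasses import dataclass


def compute_digest(payload: object) -> str:
    """SHA-256 over an assumed canonical JSON form, so key order is irrelevant."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceSketch:
    """Hypothetical subset of the Evidence fields."""
    evidence_id: str
    invocation_id: str
    input_digest: str
    output_digest: str


class AppendOnlyStore:
    def __init__(self) -> None:
        self._records: dict[str, EvidenceSketch] = {}

    def record(self, evidence: EvidenceSketch) -> None:
        if evidence.evidence_id in self._records:
            raise ValueError(f"Evidence {evidence.evidence_id} already exists")
        self._records[evidence.evidence_id] = evidence

    def get(self, evidence_id: str) -> EvidenceSketch:
        return self._records[evidence_id]


store = AppendOnlyStore()
evidence = EvidenceSketch(
    evidence_id="ev_1",
    invocation_id="inv_1",
    input_digest=compute_digest({"path": "/workspace/readme.md"}),
    output_digest=compute_digest({"ok": True}),
)
store.record(evidence)
```

Storing digests instead of payloads lets an auditor confirm that a given input produced a given record without the store ever holding the raw data.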
4. Four Core Objects
AIP defines exactly four core domain objects, all mapped to JSON schemas.
4.1 Capability
The fundamental unit of agent functionality — a governed, typed, risk-classified operation.
| Field | Type | Description |
|---|---|---|
| capability_id | string | Hierarchical dot-separated ID (e.g., github.repo.read) |
| version | string | MAJOR.MINOR (backward-compatible within major) |
| risk_tier | enum | LOW, MEDIUM, HIGH, or CRITICAL |
| lifecycle | object | shadow → staged → active → deprecated |
| input_schema | JSON Schema | Input validation contract |
| output_schema | JSON Schema | Output structure contract |
| execution | ExecutionBinding | Transport type, endpoint, method |
| constraints | object | Rate limits, timeouts, token budgets |
| auth_requirements | object | oauth2, api_key, oidc, mtls, bearer_token |
Code evidence:
- aip-spec/reference/aip_runtime/models.py: Capability dataclass, lines 43–64
- aip-spec/spec/schemas/capability.json: canonical JSON schema
4.2 Invocation (Request / Response)
The transactional unit: one request in, one response out, one evidence record created.
Request fields: invocation_id, capability_id, actor (with delegation_chain), input, context (run_id, plan_id, step_index), options (dry_run, timeout, priority, trace)
Response fields: status (success | error | denied | timeout | circuit_open | quota_exceeded), output, error, cost, evidence_id, policy_decisions
Code evidence:
- aip-spec/reference/aip_runtime/models.py: InvocationRequest (lines 98–105), InvocationResponse (lines 134–141)
4.3 Policy
Declarative rules that govern capability invocations.
@dataclass(frozen=True)
class Policy:
    policy_id: str
    target: PolicyTarget            # capabilities, risk_tiers, actors, actor_types
    rules: tuple[PolicyRule, ...]   # ordered, first match per policy
    description: str = ""
    enabled: bool = True
    priority: int = 100             # lower = evaluated first
Code evidence:
- aip-spec/reference/aip_runtime/models.py: Policy, PolicyRule, PolicyTarget, PolicyCondition, PolicyModifications, lines 147–198
4.4 Evidence
Immutable audit record produced for every invocation.
| Field | Description |
|---|---|
| evidence_id | Unique record ID |
| invocation_id | Which invocation produced this |
| input_digest | SHA-256 of input (privacy-preserving) |
| output_digest | SHA-256 of output |
| policy_evaluations | Full chain of policy decisions |
| cost | Latency, execution time, tokens, monetary cost |
| execution | Transport type, endpoint, retry count, circuit state |
| error | Error code, message, retryable flag |
Code evidence:
- aip-spec/reference/aip_runtime/models.py: Evidence dataclass, lines 221–237
- aip-spec/reference/aip_runtime/evidence_store.py: compute_digest() uses hashlib.sha256
5. MCP Bridge — Zero-Modification Integration
The most important adapter: any existing MCP server becomes an AIP execution backend without modification.
5.1 Architecture
AIP Runtime.invoke()
↓
MCPAdapter.handle(capability, input, options)
↓
_StdioTransport (per MCP server)
↓ JSON-RPC 2.0 over stdio
MCP Server (unmodified)
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: class hierarchy MCPAdapter → _StdioTransport → subprocess
5.2 Auto-Discovery
MCP tools are automatically discovered and registered as AIP capabilities:
def discover_capabilities(self, provider_id: str, ...) -> list[Capability]:
    result = transport.request("tools/list", {})
    tools = result.get("tools", [])
    # Each MCP tool → one AIP Capability with:
    #   capability_id = f"{provider_id}.{name}"
    #   risk_tier = inferred from tool name/description
    #   execution = ExecutionBinding(type="mcp", endpoint=provider_id, method=name)
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: discover_capabilities(), lines 296–346
5.3 Risk Tier Inference
Tools are automatically classified by keyword analysis:
| Keywords | Risk Tier |
|---|---|
| write, delete, remove, create, update, modify, send, deploy, execute, run | HIGH |
| fetch, request, post, put, patch, connect, upload | MEDIUM |
| (default) | LOW |
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: infer_risk_tier() function, lines 231–238
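One way to implement the keyword table is sketched here. It matches whole words after splitting the tool name and description on non-letters, which is an assumption: the source does not say whether the real infer_risk_tier() uses word or substring matching, and substring matching would classify names such as compute_output as MEDIUM because they contain "put".

```python
import re

HIGH_KEYWORDS = frozenset({
    "write", "delete", "remove", "create", "update",
    "modify", "send", "deploy", "execute", "run",
})
MEDIUM_KEYWORDS = frozenset({
    "fetch", "request", "post", "put", "patch", "connect", "upload",
})


def infer_risk_tier(name: str, description: str = "") -> str:
    """Sketch: classify a tool by the keywords found in its name/description."""
    words = set(re.findall(r"[a-z]+", f"{name} {description}".lower()))
    if words & HIGH_KEYWORDS:
        return "HIGH"  # checked first so the stricter tier wins
    if words & MEDIUM_KEYWORDS:
        return "MEDIUM"
    return "LOW"
```

Keyword inference never yields CRITICAL in this table, so the most sensitive tools would still need their tier declared explicitly.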
5.4 Server Lifecycle
Each MCP server is managed as a subprocess with full lifecycle control:
- Start: subprocess + JSON-RPC initialize handshake (protocol version 2024-11-05)
- Health check: is_alive() via process poll
- Auto-restart: up to max_restarts (default 3)
- Graceful stop: SIGTERM → wait 5s → SIGKILL
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: _StdioTransport class, lines 78–221
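The graceful-stop sequence maps directly onto the standard subprocess API. The function name stop_gracefully and the stand-in child process are assumptions for illustration; the real _StdioTransport presumably also closes its stdio pipes, which this sketch omits.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """SIGTERM, wait up to grace_seconds, then SIGKILL. Returns the exit code."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


# Stand-in for an MCP server: a child process that would sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_gracefully(child)
```

The same poll() call that short-circuits the function is what a health check such as is_alive() would rely on: a return value of None means the process is still running.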
5.5 Error Mapping
MCP errors are translated to AIP error codes:
| MCP Error | AIP Error Code |
|---|---|
| tool_not_found | CAPABILITY_NOT_FOUND |
| invalid_params | INPUT_VALIDATION_FAILED |
| invalid_request | INPUT_VALIDATION_FAILED |
| Timeout | TIMEOUT |
| Server exited | TRANSPORT_ERROR |
Code evidence:
- aip-spec/reference/aip_runtime/mcp_adapter.py: _MCP_ERROR_MAP dict, line 244
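The mapping table might be applied along these lines. The three dictionary entries come from the table; the helper names, the EXECUTION_FAILED fallback for unlisted errors, and the choice of Python exception types to represent "timeout" and "server exited" are all assumptions.

```python
_MCP_ERROR_MAP = {
    "tool_not_found": "CAPABILITY_NOT_FOUND",
    "invalid_params": "INPUT_VALIDATION_FAILED",
    "invalid_request": "INPUT_VALIDATION_FAILED",
}


def map_mcp_error(kind: str) -> str:
    """Translate a named MCP error; unknown names fall back (assumption)."""
    return _MCP_ERROR_MAP.get(kind, "EXECUTION_FAILED")


def map_transport_failure(exc: BaseException) -> str:
    """Hypothetical mapping of local failures that are not MCP errors."""
    if isinstance(exc, TimeoutError):
        return "TIMEOUT"
    if isinstance(exc, (ConnectionError, EOFError)):
        # BrokenPipeError is a ConnectionError: the server process went away.
        return "TRANSPORT_ERROR"
    return "EXECUTION_FAILED"
```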
6. HTTP Binding
AIP exposes three HTTP endpoints for external integration:
| Method | Path | Description |
|---|---|---|
| POST | /invoke | Invoke a capability through the full control plane |
| GET | /capabilities | List all registered capabilities |
| GET | /evidence/{id} | Retrieve an evidence record |
Built on FastAPI with Pydantic validation. Intentionally minimal — registry admin, policy admin, and quota admin endpoints are deferred to later versions.
Code evidence:
- aip-spec/reference/aip_runtime/http_binding.py: create_app(runtime: AIPRuntime) -> FastAPI
@app.post("/invoke")
async def invoke(req: InvokeRequest) -> JSONResponse:
    """Full control flow: resolve → policy → quota → execute → evidence."""
    aip_request = InvocationRequest(...)
    response = runtime.invoke(aip_request)
    status_code = 200 if response.status == "success" else 422
    return JSONResponse(content=result, status_code=status_code)
7. Error Taxonomy
AIP defines a structured error taxonomy with 11 error codes:
| Error Code | Description | Retryable |
|---|---|---|
| CAPABILITY_NOT_FOUND | Capability ID does not exist in registry | No |
| CAPABILITY_VERSION_NOT_FOUND | Specific version not found | No |
| INPUT_VALIDATION_FAILED | Input does not match schema | No |
| POLICY_DENIED | Policy engine blocked the invocation | No |
| QUOTA_EXCEEDED | Budget limit reached | Yes (after window) |
| AUTH_REQUIRED | Authentication needed | No |
| AUTH_FAILED | Authentication credentials invalid | No |
| EXECUTION_FAILED | Handler threw an exception | Yes |
| TIMEOUT | Execution exceeded time limit | Yes |
| CIRCUIT_OPEN | Circuit breaker tripped | Yes (after cooldown) |
| TRANSPORT_ERROR | No handler for transport type | No |
Code evidence:
- aip-spec/reference/aip_runtime/models.py: InvocationError dataclass with code, message, retryable, retry_after_ms
- aip-spec/reference/aip_runtime/runtime.py: _fail() method constructs error responses
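The retryable column can be encoded once and applied whenever an error is built. In this sketch the four field names come from the source, while their types, the make_error helper, and the rule that retry_after_ms is dropped for non-retryable codes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Codes marked "Yes" in the Retryable column of the taxonomy table.
_RETRYABLE_CODES = frozenset(
    {"QUOTA_EXCEEDED", "EXECUTION_FAILED", "TIMEOUT", "CIRCUIT_OPEN"}
)


@dataclass(frozen=True)
class InvocationError:
    code: str
    message: str
    retryable: bool = False
    retry_after_ms: Optional[int] = None


def make_error(code: str, message: str,
               retry_after_ms: Optional[int] = None) -> InvocationError:
    """Hypothetical constructor that derives retryable from the code."""
    retryable = code in _RETRYABLE_CODES
    return InvocationError(
        code=code,
        message=message,
        retryable=retryable,
        retry_after_ms=retry_after_ms if retryable else None,
    )


quota = make_error("QUOTA_EXCEEDED", "daily budget reached", retry_after_ms=60_000)
denied = make_error("POLICY_DENIED", "blocked by pol_sandbox", retry_after_ms=1_000)
```

A caller can then branch on error.retryable alone, without hard-coding the list of codes at every call site.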
8. Relationship to OctopusOS Kernel
AIP was extracted from OctopusOS governance infrastructure. The kernel’s internal contracts map directly to AIP objects:
| AIP Object | OctopusOS Kernel Contract |
|---|---|
| Capability | CapabilityManifest + CapabilityRegistryEntryV2 |
| Invocation | PlanStep → DispatchRouter → DispatchResult |
| Policy | PolicyDecision + AccessDecision + DiscoveryPolicy |
| Evidence | AuditEnvelope + EvidenceIndexEntry + ExecutionOutcome |
| Risk Tier | RiskTaxonomy (LOW / MEDIUM / HIGH / CRITICAL) |
| Lifecycle | LifecycleTier (shadow / staged / active) |
Code evidence:
- kernel/contracts/policy_engine.py: RiskTaxonomy, BudgetQuotaSpec, PolicyDecision, ApprovalPipelineState
- kernel/governance/policy_engine.py: evaluate_policy_decision()
- kernel/governance/registry.py: validate_registry_snapshot() (fail-closed)
- kernel/governance/quota.py: evaluate_quota() (fail-closed)
- kernel/ports/evidence/interfaces.py: EvidenceIndexPort, ReplayLedgerPort
The kernel uses the same frozen-dataclass pattern, the same risk tier ordering, and the same fail-closed defaults — AIP is the public specification of governance patterns that were battle-tested inside OctopusOS.
9. Reference Implementation Architecture
The reference implementation consists of 7 modules: 6 for the core runtime, which has zero external dependencies, plus an optional FastAPI HTTP binding:
| Module | Lines | Responsibility |
|---|---|---|
| models.py | 253 | Frozen dataclasses for all AIP objects |
| runtime.py | 246 | 5-phase control flow orchestrator |
| policy_engine.py | 301 | Policy targeting, conditions, decisions |
| registry.py | 120 | Capability resolution with version logic |
| evidence_store.py | 75 | Append-only immutable audit store |
| mcp_adapter.py | 449 | MCP server lifecycle and bridge |
| http_binding.py | 261 | FastAPI HTTP wrapper (optional) |
Code evidence:
- aip-spec/reference/aip_runtime/: all 7 modules
- aip-spec/reference/tests/: 34 tests covering core flows
10. Four Scenarios Walkthrough
The AIP animation on the site demonstrates four representative scenarios that illustrate the protocol’s behavior across risk tiers.
Scenario 1: Read File (LOW Risk — Allowed)
Request: fs.file.read → path=/workspace/readme.md
Policy: No explicit policy matches
Default: LOW risk → allow
Result: ✅ Executed, evidence recorded
Scenario 2: Write File with Policy Modification (HIGH Risk — Modified)
Request: fs.file.write → path=/workspace/output.txt
Policy 1: pol_sandbox → allow (path is within /workspace)
Policy 2: pol_force_dry_run → modify (inject dry_run: true)
Result: ✅ Executed with dry_run=true, evidence recorded
Scenario 3: Write Outside Sandbox (HIGH Risk — Denied)
Request: fs.file.write → path=/etc/passwd
Policy: pol_sandbox → deny (path outside /workspace)
Result: ❌ POLICY_DENIED, execution never reached, evidence recorded
Scenario 4: Delete (CRITICAL Risk — Denied)
Request: fs.file.delete → path=/workspace/temp.log
Policy: pol_no_critical_agent → deny (CRITICAL ops blocked for agents)
Default: CRITICAL risk → deny even without policy
Result: ❌ POLICY_DENIED, evidence recorded
Key insight: in scenarios 3 and 4, the execution phase is never reached. The policy engine terminates the control flow before any side effect occurs. Yet an evidence record is still created — the audit trail is complete regardless of outcome.
11. Design Patterns Summary
| Pattern | Implementation | Purpose |
|---|---|---|
| Frozen dataclasses | @dataclass(frozen=True) everywhere | Immutability guarantee |
| Fail-closed defaults | _DEFAULT_BY_RISK with CRITICAL→deny | Safety by default |
| Terminal vs cumulative | deny/require_approval stop; modify accumulates | Predictable evaluation |
| Risk monotonicity | most_restrictive_tier() only increases | Cannot bypass declared risk |
| Append-only evidence | ValueError on duplicate evidence_id | Tamper-resistant audit |
| SHA-256 digests | hashlib.sha256 for input/output | Privacy-preserving proof |
| Transport abstraction | ExecutionHandler callable protocol | Pluggable backends |
| Glob-based targeting | fnmatch.fnmatch() for policy matching | Flexible policy scoping |
| Version resolution | active > staged > shadow priority | Safe rollout lifecycle |
| Delegation chain | Actor.delegation_chain: tuple[str, ...] | Multi-agent accountability |