AIP — Agent Infrastructure Protocol Analysis

A deep-dive into the design, architecture, and implementation of AIP v1 — the governance control plane that sits above tool protocols like MCP, providing risk-classified capability management, policy evaluation, quota enforcement, and immutable evidence audit for every AI agent invocation.

1. Why AIP Exists

Modern AI agents invoke external tools via protocols like MCP (Model Context Protocol). MCP gives agents reach — the ability to read files, call APIs, and execute commands. But it offers no answer to:

Who is allowed to call this tool?
Under what conditions should the call be modified or blocked?
What happened after the call, and can we prove it?

AIP fills this gap. It is not a replacement for MCP — it is a governance layer above it.

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — MCP servers are consumed as execution backends, not replaced
aip-spec/spec/aip-1.md — §1.1: “AIP sits between the agent runtime and execution adapters”

AIP Control Flow Architecture

Agent Runtime (Claude, GPT, Cursor, custom agents)
    ↓
AIP Control Plane (governance, policy, quota, evidence)
    ↓
Execution Adapters (MCP, REST, CLI, internal, container, worker)
    ↓
External Services (GitHub, filesystem, databases, APIs)

2. Design Principles

AIP is built on five foundational principles, each traceable to code.

2.1 Capability-First

The core abstraction is the Capability — a governed, typed, risk-classified unit of work. Every tool, API, or function an agent can invoke is wrapped as a Capability with explicit metadata.

Code evidence:

aip-spec/reference/aip_runtime/models.py — @dataclass(frozen=True) class Capability with capability_id, version, risk_tier, input_schema, output_schema, execution, lifecycle

2.2 Governance-Native

Policy evaluation, approval workflows, quota enforcement, and audit records are protocol-level primitives — not optional add-ons.

Code evidence:

aip-spec/reference/aip_runtime/runtime.py — invoke() executes 5 mandatory phases; policy evaluation cannot be skipped
aip-spec/reference/aip_runtime/policy_engine.py — _DEFAULT_BY_RISK: CRITICAL defaults to deny even without explicit policy

2.3 Execution-Agnostic

AIP defines semantics, not wire format. The same Capability contract can dispatch to MCP, REST, CLI, internal functions, containers, or worker queues.

Code evidence:

aip-spec/reference/aip_runtime/models.py — ExecutionBinding.type: str # mcp | rest | cli | internal | container | worker
aip-spec/reference/aip_runtime/runtime.py — self._handlers: dict[str, ExecutionHandler] — handlers pluggable per transport type

2.4 Immutability by Design

All core data models are frozen dataclasses. Evidence records cannot be modified after creation. Capability versions are append-only.

Code evidence:

aip-spec/reference/aip_runtime/models.py — every model uses @dataclass(frozen=True)
aip-spec/reference/aip_runtime/evidence_store.py — record() raises ValueError if evidence_id already exists
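The two guarantees can be illustrated with a minimal sketch. `Record` and `Store` are hypothetical stand-ins invented for this example, not the reference implementation's classes; only the frozen-dataclass pattern and the `ValueError` on a duplicate ID come from the text above.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a frozen AIP model."""
    record_id: str
    status: str


class Store:
    """Hypothetical append-only store keyed by record_id."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def record(self, rec: Record) -> None:
        # Reject a second write for the same ID instead of overwriting it.
        if rec.record_id in self._records:
            raise ValueError(f"Record {rec.record_id} already exists")
        self._records[rec.record_id] = rec


rec = Record(record_id="ev_1", status="success")

try:
    rec.status = "failed"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

store = Store()
store.record(rec)
try:
    store.record(rec)  # same ID a second time
    duplicate_accepted = True
except ValueError:
    duplicate_accepted = False
```

Note that `frozen=True` only blocks attribute assignment; a mutable value held inside a frozen field (for example a `dict`) can still be changed in place, which is presumably why tuples appear in fields such as `rules` and `delegation_chain`.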

2.5 Fail-Closed Defaults

When no policy explicitly allows an invocation, the risk tier determines the default behavior. CRITICAL operations are denied by default.

Code evidence:

aip-spec/reference/aip_runtime/policy_engine.py:

_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}
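As a rough sketch of how such a table might be consulted, the helper below falls back to the risk-tier default when no policy produced a decision. The function name `default_decision` and the choice to deny unknown tiers are assumptions made for this example, not confirmed behavior of the reference implementation.

```python
# Table copied from the excerpt above.
_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}


def default_decision(risk_tier: str) -> str:
    """Return the fallback decision for a tier when no policy matched.

    Assumption: an unrecognized tier is treated as "deny", which keeps
    the sketch fail-closed.
    """
    return _DEFAULT_BY_RISK.get(risk_tier, "deny")
```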

3. The 5-Phase Control Flow

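Every AIP invocation follows a deterministic 5-phase pipeline. No phase can be bypassed, and the order is fixed. A terminal outcome (an unknown capability, a denial, an exceeded quota) ends the flow before dispatch, but an evidence record is still written.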

Code evidence:

aip-spec/reference/aip_runtime/runtime.py — invoke() method, lines 64–195

Phase 1: Resolve Capability

The registry looks up the requested capability_id with version resolution:

Exact version match if specified
Otherwise: the best non-deprecated candidate, ranked first by lifecycle status (active > staged > shadow) and then by highest version

Code evidence:

aip-spec/reference/aip_runtime/registry.py — get() method with status priority: {"active": 0, "staged": 1, "shadow": 2}

candidates.sort(
    key=lambda c: (
        status_priority.get(c.lifecycle.status, 99),
        -c.major_version,
        -c.minor_version,
    )
)

If no capability is found, the control flow terminates immediately with CAPABILITY_NOT_FOUND.
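The resolution rule can be sketched as a standalone function. `Cap` and `resolve` are hypothetical names for this example, and the assumption that an exact version request may return any lifecycle status (including deprecated) is not confirmed by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cap:
    """Hypothetical, trimmed-down capability record."""
    capability_id: str
    major_version: int
    minor_version: int
    status: str  # shadow | staged | active | deprecated


STATUS_PRIORITY = {"active": 0, "staged": 1, "shadow": 2}


def resolve(candidates: list[Cap], version: Optional[str] = None) -> Optional[Cap]:
    """Pick one capability version, or None if nothing qualifies."""
    if version is not None:
        major, minor = (int(part) for part in version.split("."))
        for cap in candidates:
            if (cap.major_version, cap.minor_version) == (major, minor):
                return cap
        return None

    live = [c for c in candidates if c.status != "deprecated"]
    if not live:
        return None
    # Same sort key as the excerpt: status first, then newest version.
    return min(
        live,
        key=lambda c: (
            STATUS_PRIORITY.get(c.status, 99),
            -c.major_version,
            -c.minor_version,
        ),
    )
```

One consequence of this ordering is that an older `active` version wins over a newer `staged` one, so a staged release is only picked up when requested by exact version or when nothing active exists.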

Phase 2: Evaluate Policies

All matching policies are evaluated in priority order (lower number = higher priority). Each policy has targeting rules and condition evaluation.

Decision	Behavior	Terminal?
`deny`	Stop immediately, block execution	Yes
`require_approval`	Stop immediately, await approval	Yes
`modify`	Apply input/option overrides, continue	No (cumulative)
`allow`	Mark as explicitly allowed, continue	No
`log_only`	Record evaluation, continue	No

Code evidence:

aip-spec/reference/aip_runtime/policy_engine.py — evaluate() method:

# Terminal: deny
if rule.decision == "deny":
    return "deny", summaries, evaluations, {}

# Terminal: require_approval
if rule.decision == "require_approval":
    return "require_approval", summaries, evaluations, {}

# Cumulative: modify
if rule.decision == "modify" and rule.modifications:
    # ... merge modifications
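The terminal versus cumulative distinction can be sketched with a simplified loop. This is not the reference `evaluate()`: `Rule` flattens policy and rule into one hypothetical record, the return value is reduced to a `(decision, modifications)` pair, and returning `"default"` to signal "fall back to the risk-tier default" is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical flattened policy rule."""
    policy_id: str
    priority: int  # lower number = evaluated first
    decision: str  # deny | require_approval | modify | allow | log_only
    modifications: dict = field(default_factory=dict)


def evaluate(rules: list[Rule]) -> tuple[str, dict]:
    merged: dict = {}
    explicitly_allowed = False
    for rule in sorted(rules, key=lambda r: r.priority):
        # Terminal decisions stop evaluation and discard modifications.
        if rule.decision in ("deny", "require_approval"):
            return rule.decision, {}
        if rule.decision == "modify":
            merged.update(rule.modifications)  # cumulative, later rules win
        elif rule.decision == "allow":
            explicitly_allowed = True
        # log_only: nothing to do in this sketch beyond continuing.
    return ("allow" if explicitly_allowed else "default"), merged
```

With an `allow` rule followed by a `modify` rule that injects `dry_run`, the sketch returns `("allow", {"dry_run": True})`; adding a higher-priority `deny` rule short-circuits everything.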

Policy Targeting

Policies match invocations via four dimensions:

Dimension	Mechanism
Capabilities	Glob patterns (`github.*`, `fs.file.*`)
Risk tiers	Exact match (`HIGH`, `CRITICAL`)
Actors	Glob patterns on actor_id
Actor types	Exact match (`agent`, `user`, `scheduler`)

Code evidence:

aip-spec/reference/aip_runtime/policy_engine.py — _policy_matches_target() uses fnmatch.fnmatch() for glob patterns
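A minimal sketch of four-dimension targeting with `fnmatch.fnmatch()`. The dict-based target shape, the function name `matches_target`, and the rules that an omitted dimension matches everything and that dimensions combine with AND are assumptions for this example.

```python
from fnmatch import fnmatch


def matches_target(
    target: dict,
    capability_id: str,
    risk_tier: str,
    actor_id: str,
    actor_type: str,
) -> bool:
    """Return True if the invocation falls inside the policy target."""
    patterns = target.get("capabilities")
    if patterns and not any(fnmatch(capability_id, p) for p in patterns):
        return False

    tiers = target.get("risk_tiers")
    if tiers and risk_tier not in tiers:  # exact match
        return False

    actor_patterns = target.get("actors")
    if actor_patterns and not any(fnmatch(actor_id, p) for p in actor_patterns):
        return False

    actor_types = target.get("actor_types")
    if actor_types and actor_type not in actor_types:  # exact match
        return False

    return True
```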

Condition Language

Leaf conditions use a {field, operator, value} structure with dot-path field resolution into the invocation request.

Supported operators: eq, ne, gt, lt, gte, lte, in, not_in, matches, starts_with, contains

Composites: all_of (AND), any_of (OR), not_ (NOT) — evaluated recursively.

Code evidence:

aip-spec/reference/aip_runtime/policy_engine.py — _evaluate_condition() with recursive AND/OR/NOT, _resolve_field() for dot-path access, _compare() for 11 operators
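The condition language lends itself to a short recursive evaluator. The sketch below assumes conditions are plain dicts, that a missing field resolves to `None`, and that `matches` means a regular-expression search; none of these details are confirmed by the source, and the function names are chosen for this example.

```python
import re
from typing import Any


def resolve_field(data: dict, path: str) -> Any:
    """Walk a dot-separated path; return None if any segment is missing."""
    current: Any = data
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


_OPERATORS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a is not None and a > b,
    "lt": lambda a, b: a is not None and a < b,
    "gte": lambda a, b: a is not None and a >= b,
    "lte": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not_in": lambda a, b: a not in b,
    "matches": lambda a, b: isinstance(a, str) and re.search(b, a) is not None,
    "starts_with": lambda a, b: isinstance(a, str) and a.startswith(b),
    "contains": lambda a, b: a is not None and b in a,
}


def evaluate_condition(cond: dict, request: dict) -> bool:
    """Evaluate a leaf or composite condition against a request dict."""
    if "all_of" in cond:
        return all(evaluate_condition(c, request) for c in cond["all_of"])
    if "any_of" in cond:
        return any(evaluate_condition(c, request) for c in cond["any_of"])
    if "not_" in cond:
        return not evaluate_condition(cond["not_"], request)
    actual = resolve_field(request, cond["field"])
    return bool(_OPERATORS[cond["operator"]](actual, cond["value"]))
```

For example, a sandbox rule could combine `starts_with` on `input.path` with a negated `eq` on `actor.type` inside an `all_of` composite.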

Risk Tier Monotonicity

A critical safety invariant: policies can only make the effective risk tier more restrictive, never less.

# MUST NOT reduce below capability's declared tier
if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(
    capability.risk_tier, 0
):
    effective_risk_tier = most_restrictive_tier(
        effective_risk_tier, candidate
    )
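To make the invariant concrete, here is a self-contained sketch. The numeric values in `RISK_TIER_ORDER` and the wrapper `apply_override` are assumptions for illustration; only the comparison logic mirrors the excerpt above.

```python
# Assumed ordering values; only the relative order matters here.
RISK_TIER_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def most_restrictive_tier(a: str, b: str) -> str:
    """Return whichever tier ranks higher (more restrictive)."""
    return a if RISK_TIER_ORDER.get(a, 0) >= RISK_TIER_ORDER.get(b, 0) else b


def apply_override(declared: str, effective: str, candidate: str) -> str:
    """Apply a policy's proposed tier without ever going below `declared`."""
    if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(declared, 0):
        return most_restrictive_tier(effective, candidate)
    # A candidate below the declared tier is ignored.
    return effective
```

A policy that proposes `LOW` for a capability declared `HIGH` therefore has no effect, while one that proposes `CRITICAL` raises the effective tier.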

Phase 3: Check Quota

Budget enforcement across multiple dimensions:

Dimension	Description
`max_calls_per_minute`	Rate limiting
`max_calls_per_hour`	Hourly cap
`max_calls_per_day`	Daily cap
`max_tokens_per_day`	Token budget
`max_cost_per_day`	Cost ceiling (USD)
`max_parallel`	Concurrency limit

Quotas can be scoped to a capability, an actor, a session, a tenant, or globally. When quotas exist at multiple scopes, the most restrictive limit wins.

Code evidence:

aip-spec/reference/aip_runtime/models.py — QuotaSpec dataclass with all dimensions
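The "most restrictive limit wins" rule can be sketched for a single dimension such as `max_calls_per_minute`. The function names, the per-scope dict shape, and the convention that `None` means "no limit at this scope" are assumptions of this example.

```python
from typing import Optional


def effective_limit(limits_by_scope: dict[str, Optional[int]]) -> Optional[int]:
    """Return the smallest configured limit, or None if no scope sets one."""
    defined = [v for v in limits_by_scope.values() if v is not None]
    return min(defined) if defined else None


def within_quota(used: int, limits_by_scope: dict[str, Optional[int]]) -> bool:
    """True if one more call is allowed given `used` calls in the window."""
    limit = effective_limit(limits_by_scope)
    return limit is None or used < limit
```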

Phase 4: Dispatch Execution

The runtime routes to the registered handler for the capability’s transport type, applying any accumulated policy modifications.

transport_type = capability.execution.type if capability.execution else "internal"
handler = self._handlers.get(transport_type)
output = handler(capability, effective_input, effective_options)

If no handler exists for the transport type, TRANSPORT_ERROR is returned.

Code evidence:

aip-spec/reference/aip_runtime/runtime.py — dispatch section with handler lookup and execution
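A minimal sketch of handler dispatch by transport type. `Dispatcher` is a hypothetical class, capabilities are simplified to dicts, and the response shape is invented for this example; the `TRANSPORT_ERROR` and `EXECUTION_FAILED` codes follow the error taxonomy described in this document.

```python
from typing import Any, Callable

# A handler receives (capability, input, options) and returns the output.
Handler = Callable[[dict, dict, dict], Any]


class Dispatcher:
    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, transport_type: str, handler: Handler) -> None:
        self._handlers[transport_type] = handler

    def dispatch(self, capability: dict, input_: dict, options: dict) -> dict:
        # Capabilities without an execution binding default to "internal".
        binding = capability.get("execution") or {}
        transport_type = binding.get("type", "internal")
        handler = self._handlers.get(transport_type)
        if handler is None:
            return {"status": "error", "code": "TRANSPORT_ERROR"}
        try:
            return {"status": "success", "output": handler(capability, input_, options)}
        except Exception as exc:  # handler failures become structured errors
            return {"status": "error", "code": "EXECUTION_FAILED", "message": str(exc)}
```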

Phase 5: Record Evidence

Every invocation — success or failure — produces exactly one immutable evidence record.

Code evidence:

aip-spec/reference/aip_runtime/evidence_store.py:

class EvidenceStore:
    """AIP-1 §3.4.1 guarantees:
    - Immutable: records cannot be modified after creation.
    - Complete: every invocation produces exactly one record.
    - Auditable: enough info to answer who/what/when/why/cost.
    """

    def record(self, evidence: Evidence) -> None:
        if evidence.evidence_id in self._records:
            raise ValueError(
                f"Evidence {evidence.evidence_id} already exists (immutability violated)"
            )

Evidence records include: SHA-256 digests of input/output, full policy evaluations, cost metrics, execution details, error information, and actor delegation chains.
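Storing a digest rather than the payload lets an auditor later confirm what was sent without the store holding the data itself. The sketch below shows one way to compute such a digest; the canonical-JSON serialization (sorted keys, compact separators) is an assumption, and the reference `compute_digest()` may serialize differently.

```python
import hashlib
import json
from typing import Any


def digest_of(payload: Any) -> str:
    """Hex SHA-256 over a canonical JSON encoding of `payload`.

    Assumption: keys are sorted and whitespace removed so that logically
    equal payloads always hash to the same value.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Without a canonical form, two dicts with the same content but different key order could produce different digests, which would defeat later verification.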

4. Four Core Objects

AIP defines exactly four core domain objects, all mapped to JSON schemas.

4.1 Capability

The fundamental unit of agent functionality — a governed, typed, risk-classified operation.

Field	Type	Description
`capability_id`	string	Hierarchical dot-separated ID (e.g., `github.repo.read`)
`version`	string	MAJOR.MINOR (backward-compatible within major)
`risk_tier`	enum	`LOW` | `MEDIUM` | `HIGH` | `CRITICAL`
`lifecycle`	object	`shadow` → `staged` → `active` → `deprecated`
`input_schema`	JSON Schema	Input validation contract
`output_schema`	JSON Schema	Output structure contract
`execution`	ExecutionBinding	Transport type, endpoint, method
`constraints`	object	Rate limits, timeouts, token budgets
`auth_requirements`	object	oauth2, api_key, oidc, mtls, bearer_token

Code evidence:

aip-spec/reference/aip_runtime/models.py — Capability dataclass, lines 43–64
aip-spec/spec/schemas/capability.json — canonical JSON schema

4.2 Invocation (Request / Response)

The transactional unit: one request in, one response out, one evidence record created.

Request fields: invocation_id, capability_id, actor (with delegation_chain), input, context (run_id, plan_id, step_index), options (dry_run, timeout, priority, trace)

Code evidence:

aip-spec/reference/aip_runtime/models.py — InvocationRequest (lines 98–105), InvocationResponse (lines 134–141)

4.3 Policy

Declarative rules that govern capability invocations.

@dataclass(frozen=True)
class Policy:
    policy_id: str
    target: PolicyTarget         # capabilities, risk_tiers, actors, actor_types
    rules: tuple[PolicyRule, ...]  # ordered, first match per policy
    description: str = ""
    enabled: bool = True
    priority: int = 100          # lower = evaluated first

Code evidence:

aip-spec/reference/aip_runtime/models.py — Policy, PolicyRule, PolicyTarget, PolicyCondition, PolicyModifications, lines 147–198

4.4 Evidence

Immutable audit record produced for every invocation.

Field	Description
`evidence_id`	Unique record ID
`invocation_id`	Which invocation produced this
`input_digest`	SHA-256 of input (privacy-preserving)
`output_digest`	SHA-256 of output
`policy_evaluations`	Full chain of policy decisions
`cost`	Latency, execution time, tokens, monetary cost
`execution`	Transport type, endpoint, retry count, circuit state
`error`	Error code, message, retryable flag

Code evidence:

aip-spec/reference/aip_runtime/models.py — Evidence dataclass, lines 221–237
aip-spec/reference/aip_runtime/evidence_store.py — compute_digest() uses hashlib.sha256

5. MCP Bridge — Zero-Modification Integration

The most important adapter: any existing MCP server becomes an AIP execution backend without modification.

5.1 Architecture

AIP Runtime.invoke()
    ↓
MCPAdapter.handle(capability, input, options)
    ↓
_StdioTransport (per MCP server)
    ↓ JSON-RPC 2.0 over stdio
MCP Server (unmodified)

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — class hierarchy: MCPAdapter → _StdioTransport → subprocess

5.2 Auto-Discovery

MCP tools are automatically discovered and registered as AIP capabilities:

def discover_capabilities(self, provider_id: str, ...) -> list[Capability]:
    result = transport.request("tools/list", {})
    tools = result.get("tools", [])
    # Each MCP tool → one AIP Capability with:
    #   capability_id = f"{provider_id}.{name}"
    #   risk_tier = inferred from tool name/description
    #   execution = ExecutionBinding(type="mcp", endpoint=provider_id, method=name)

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — discover_capabilities(), lines 296–346
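The mapping from an MCP `tools/list` result to capability records can be sketched without a live server. Capabilities are represented as plain dicts here, the function name is hypothetical, and risk-tier inference is left out; `name` and `inputSchema` are the tool fields defined by MCP.

```python
def tools_to_capabilities(provider_id: str, tools_list_result: dict) -> list[dict]:
    """Turn a tools/list result into one capability dict per MCP tool."""
    capabilities = []
    for tool in tools_list_result.get("tools", []):
        name = tool["name"]
        capabilities.append(
            {
                "capability_id": f"{provider_id}.{name}",
                "description": tool.get("description", ""),
                "input_schema": tool.get("inputSchema", {}),
                "execution": {
                    "type": "mcp",
                    "endpoint": provider_id,
                    "method": name,
                },
            }
        )
    return capabilities
```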

5.3 Risk Tier Inference

Tools are automatically classified by keyword analysis:

Keywords	Risk Tier
write, delete, remove, create, update, modify, send, deploy, execute, run	HIGH
fetch, request, post, put, patch, connect, upload	MEDIUM
(default)	LOW

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — infer_risk_tier() function, lines 231–238
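A possible shape for keyword-based classification, using the keyword lists from the table. The function name, the whole-word matching strategy, and checking HIGH before MEDIUM are assumptions; the reference `infer_risk_tier()` may match differently (for example on substrings).

```python
import re

_HIGH_KEYWORDS = frozenset(
    {"write", "delete", "remove", "create", "update",
     "modify", "send", "deploy", "execute", "run"}
)
_MEDIUM_KEYWORDS = frozenset(
    {"fetch", "request", "post", "put", "patch", "connect", "upload"}
)


def infer_tier_sketch(name: str, description: str = "") -> str:
    """Classify a tool by whole-word keyword hits in its name/description."""
    # Split on anything that is not a letter, so "write_file" -> {"write", "file"}.
    words = set(re.findall(r"[a-z]+", f"{name} {description}".lower()))
    if words & _HIGH_KEYWORDS:
        return "HIGH"
    if words & _MEDIUM_KEYWORDS:
        return "MEDIUM"
    return "LOW"
```

Whole-word matching avoids false positives such as `put` inside `get_output`, at the cost of missing inflected forms like `writes`. Either way, name-based inference is a heuristic and cannot assign `CRITICAL`, so sensitive tools likely need an explicit tier.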

5.4 Server Lifecycle

Each MCP server is managed as a subprocess with full lifecycle control:

Start: subprocess + JSON-RPC initialize handshake (protocol version 2024-11-05)
Health check: is_alive() via process poll
Auto-restart: up to max_restarts (default 3)
Graceful stop: SIGTERM → wait 5s → SIGKILL

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — _StdioTransport class, lines 78–221
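The health check and graceful stop steps map onto the standard `subprocess` API roughly as follows. This is a sketch of the pattern, not the `_StdioTransport` code; the function names and the `grace_seconds` parameter are chosen for this example.

```python
import subprocess


def is_alive(proc: subprocess.Popen) -> bool:
    """A process is alive while poll() has no exit code to report."""
    return proc.poll() is None


def graceful_stop(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it does not comply."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

Calling `wait()` after `kill()` matters: it reaps the child so no zombie process is left behind.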

5.5 Error Mapping

MCP errors are translated to AIP error codes:

MCP Error	AIP Error Code
`tool_not_found`	`CAPABILITY_NOT_FOUND`
`invalid_params`	`INPUT_VALIDATION_FAILED`
`invalid_request`	`INPUT_VALIDATION_FAILED`
Timeout	`TIMEOUT`
Server exited	`TRANSPORT_ERROR`

Code evidence:

aip-spec/reference/aip_runtime/mcp_adapter.py — _MCP_ERROR_MAP dict, line 244
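One way to express the table as code is sketched below. The string-keyed rows come from the table; representing timeouts and server exits as Python exceptions, and falling back to `EXECUTION_FAILED` for anything unrecognized, are assumptions of this example.

```python
from typing import Union

# Rows from the table above that are keyed by an MCP error name.
ERROR_MAP_SKETCH = {
    "tool_not_found": "CAPABILITY_NOT_FOUND",
    "invalid_params": "INPUT_VALIDATION_FAILED",
    "invalid_request": "INPUT_VALIDATION_FAILED",
}


def map_failure(failure: Union[str, BaseException]) -> str:
    """Translate an MCP error name or a transport exception to an AIP code."""
    if isinstance(failure, TimeoutError):
        return "TIMEOUT"
    # Assumption: a dead server surfaces as a broken pipe or EOF on stdio.
    if isinstance(failure, (ConnectionError, EOFError)):
        return "TRANSPORT_ERROR"
    if isinstance(failure, str):
        return ERROR_MAP_SKETCH.get(failure, "EXECUTION_FAILED")
    return "EXECUTION_FAILED"
```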

6. HTTP Binding

AIP exposes three HTTP endpoints for external integration:

Method	Path	Description
POST	`/invoke`	Invoke a capability through the full control plane
GET	`/capabilities`	List all registered capabilities
GET	`/evidence/{id}`	Retrieve an evidence record

Built on FastAPI with Pydantic validation. Intentionally minimal — registry admin, policy admin, and quota admin endpoints are deferred to later versions.

Code evidence:

aip-spec/reference/aip_runtime/http_binding.py — create_app(runtime: AIPRuntime) -> FastAPI

@app.post("/invoke")
async def invoke(req: InvokeRequest) -> JSONResponse:
    """Full control flow: resolve → policy → quota → execute → evidence."""
    aip_request = InvocationRequest(...)
    response = runtime.invoke(aip_request)
    status_code = 200 if response.status == "success" else 422
    return JSONResponse(content=result, status_code=status_code)

7. Error Taxonomy

AIP defines a structured error taxonomy with 11 error codes:

Error Code	Description	Retryable
`CAPABILITY_NOT_FOUND`	Capability ID does not exist in registry	No
`CAPABILITY_VERSION_NOT_FOUND`	Specific version not found	No
`INPUT_VALIDATION_FAILED`	Input does not match schema	No
`POLICY_DENIED`	Policy engine blocked the invocation	No
`QUOTA_EXCEEDED`	Budget limit reached	Yes (after window)
`AUTH_REQUIRED`	Authentication needed	No
`AUTH_FAILED`	Authentication credentials invalid	No
`EXECUTION_FAILED`	Handler raised an exception	Yes
`TIMEOUT`	Execution exceeded time limit	Yes
`CIRCUIT_OPEN`	Circuit breaker tripped	Yes (after cooldown)
`TRANSPORT_ERROR`	No handler for transport type, or the transport itself failed (e.g., MCP server exited)	No

Code evidence:

aip-spec/reference/aip_runtime/models.py — InvocationError dataclass with code, message, retryable, retry_after_ms
aip-spec/reference/aip_runtime/runtime.py — _fail() method constructs error responses

8. Relationship to OctopusOS Kernel

AIP was extracted from OctopusOS governance infrastructure. The kernel’s internal contracts map directly to AIP objects:

AIP Object	OctopusOS Kernel Contract
Capability	`CapabilityManifest` + `CapabilityRegistryEntryV2`
Invocation	`PlanStep` → `DispatchRouter` → `DispatchResult`
Policy	`PolicyDecision` + `AccessDecision` + `DiscoveryPolicy`
Evidence	`AuditEnvelope` + `EvidenceIndexEntry` + `ExecutionOutcome`
Risk Tier	`RiskTaxonomy` (LOW / MEDIUM / HIGH / CRITICAL)
Lifecycle	`LifecycleTier` (shadow / staged / active)

Code evidence:

kernel/contracts/policy_engine.py — RiskTaxonomy, BudgetQuotaSpec, PolicyDecision, ApprovalPipelineState
kernel/governance/policy_engine.py — evaluate_policy_decision()
kernel/governance/registry.py — validate_registry_snapshot() (fail-closed)
kernel/governance/quota.py — evaluate_quota() (fail-closed)
kernel/ports/evidence/interfaces.py — EvidenceIndexPort, ReplayLedgerPort

The kernel uses the same frozen-dataclass pattern, the same risk tier ordering, and the same fail-closed defaults — AIP is the public specification of governance patterns that were battle-tested inside OctopusOS.

9. Reference Implementation Architecture

The reference implementation consists of seven modules, six core plus the optional HTTP binding, with zero external dependencies for the core runtime:

Module	Lines	Responsibility
`models.py`	253	Frozen dataclasses for all AIP objects
`runtime.py`	246	5-phase control flow orchestrator
`policy_engine.py`	301	Policy targeting, conditions, decisions
`registry.py`	120	Capability resolution with version logic
`evidence_store.py`	75	Append-only immutable audit store
`mcp_adapter.py`	449	MCP server lifecycle and bridge
`http_binding.py`	261	FastAPI HTTP wrapper (optional)

Code evidence:

aip-spec/reference/aip_runtime/ — all 7 modules
aip-spec/reference/tests/ — 34 tests covering core flows

Dependency Graph

http_binding.py (FastAPI)
mcp_adapter.py (subprocess)
runtime.py (5-phase orchestrator)
registry.py (capability resolution)
policy_engine.py (policy evaluation)
evidence_store.py (immutable audit)
models.py (all modules depend on this)

10. Four Scenarios Walkthrough

The AIP animation on the site demonstrates four representative scenarios that illustrate the protocol’s behavior across risk tiers.

Scenario 1: Read File (LOW Risk — Allowed)

Request:  fs.file.read → path=/workspace/readme.md
Policy:   No explicit policy matches
Default:  LOW risk → allow
Result:   ✅ Executed, evidence recorded

Scenario 2: Write File with Policy Modification (HIGH Risk — Modified)

Request:  fs.file.write → path=/workspace/output.txt
Policy 1: pol_sandbox → allow (path is within /workspace)
Policy 2: pol_force_dry_run → modify (inject dry_run: true)
Result:   ✅ Executed with dry_run=true, evidence recorded

Scenario 3: Write Outside Sandbox (HIGH Risk — Denied)

Request:  fs.file.write → path=/etc/passwd
Policy:   pol_sandbox → deny (path outside /workspace)
Result:   ❌ POLICY_DENIED, execution never reached, evidence recorded

Scenario 4: Delete (CRITICAL Risk — Denied)

Request:  fs.file.delete → path=/workspace/temp.log
Policy:   pol_no_critical_agent → deny (CRITICAL ops blocked for agents)
Default:  CRITICAL risk → deny even without policy
Result:   ❌ POLICY_DENIED, evidence recorded

Key insight: in scenarios 3 and 4, the execution phase is never reached. The policy engine terminates the control flow before any side effect occurs. Yet an evidence record is still created — the audit trail is complete regardless of outcome.

11. Design Patterns Summary

Pattern	Implementation	Purpose
Frozen dataclasses	`@dataclass(frozen=True)` everywhere	Immutability guarantee
Fail-closed defaults	`_DEFAULT_BY_RISK` with CRITICAL→deny	Safety by default
Terminal vs cumulative	deny/require_approval stop; modify accumulates	Predictable evaluation
Risk monotonicity	`most_restrictive_tier()` only increases	Cannot bypass declared risk
Append-only evidence	`ValueError` on duplicate evidence_id	Tamper-resistant audit
SHA-256 digests	`hashlib.sha256` for input/output	Privacy-preserving proof
Transport abstraction	`ExecutionHandler` callable protocol	Pluggable backends
Glob-based targeting	`fnmatch.fnmatch()` for policy matching	Flexible policy scoping
Version resolution	active > staged > shadow priority	Safe rollout lifecycle
Delegation chain	`Actor.delegation_chain: tuple[str, ...]`	Multi-agent accountability

12. Architecture Diagram

Complete AIP Architecture

Agent Runtime: Claude, GPT, Cursor, Custom Agents
AIP Control Plane: Registry (resolve), Policy Engine (target+eval), Quota (check), Dispatch (handler)
Evidence Store: Immutable, append-only; SHA-256 digests + full audit
Execution Backends: MCP (stdio), REST (httpx), CLI (exec), Container (docker), Worker Queue (redis/amqp)