AIP — Agent Infrastructure Protocol Analysis

A deep-dive into the design, architecture, and implementation of AIP v1 — the governance control plane that sits above tool protocols like MCP, providing risk-classified capability management, policy evaluation, quota enforcement, and immutable evidence audit for every AI agent invocation.


1. Why AIP Exists

Modern AI agents invoke external tools via protocols like MCP (Model Context Protocol). MCP gives agents reach — the ability to read files, call APIs, and execute commands. But it offers no answer to:

  • Who is allowed to call this tool?
  • Under what conditions should the call be modified or blocked?
  • What happened after the call, and can we prove it?

AIP fills this gap. It is not a replacement for MCP — it is a governance layer above it.

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py — MCP servers are consumed as execution backends, not replaced
  • aip-spec/spec/aip-1.md — §1.1: “AIP sits between the agent runtime and execution adapters”

AIP Control Flow Architecture (diagram, top to bottom):

  1. Agent Runtime: Claude, GPT, Cursor, custom agents
  2. AIP Control Plane: governance, policy, quota, evidence
  3. Execution Adapters: MCP, REST, CLI, internal, container, worker
  4. External Services: GitHub, filesystem, databases, APIs

2. Design Principles

AIP is built on five foundational principles, each traceable to code.

2.1 Capability-First

The core abstraction is the Capability — a governed, typed, risk-classified unit of work. Every tool, API, or function an agent can invoke is wrapped as a Capability with explicit metadata.

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: @dataclass(frozen=True) class Capability with capability_id, version, risk_tier, input_schema, output_schema, execution, lifecycle

2.2 Governance-Native

Policy evaluation, approval workflows, quota enforcement, and audit records are protocol-level primitives — not optional add-ons.

Code evidence:

  • aip-spec/reference/aip_runtime/runtime.py: invoke() executes 5 mandatory phases; policy evaluation cannot be skipped
  • aip-spec/reference/aip_runtime/policy_engine.py: _DEFAULT_BY_RISK makes CRITICAL default to deny even without an explicit policy

2.3 Execution-Agnostic

AIP defines semantics, not wire format. The same Capability contract can dispatch to MCP, REST, CLI, internal functions, containers, or worker queues.

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: ExecutionBinding.type: str # mcp | rest | cli | internal | container | worker
  • aip-spec/reference/aip_runtime/runtime.py: self._handlers: dict[str, ExecutionHandler], so handlers are pluggable per transport type

2.4 Immutability by Design

All core data models are frozen dataclasses. Evidence records cannot be modified after creation. Capability versions are append-only.

Code evidence:

  • aip-spec/reference/aip_runtime/models.py — every model uses @dataclass(frozen=True)
  • aip-spec/reference/aip_runtime/evidence_store.py: record() raises ValueError if evidence_id already exists

2.5 Fail-Closed Defaults

When no policy explicitly allows an invocation, the risk tier determines the default behavior. CRITICAL operations are denied by default.

Code evidence:

  • aip-spec/reference/aip_runtime/policy_engine.py:
_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}
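
A minimal sketch of how such a table might be consulted when no policy matches. The default_decision helper, and the choice to deny unrecognized tiers, are assumptions made for illustration and are not taken from the reference implementation.

```python
# Default decisions per risk tier, copied from the excerpt above.
_DEFAULT_BY_RISK = {
    "LOW": "allow",
    "MEDIUM": "allow",
    "HIGH": "require_approval",
    "CRITICAL": "deny",
}


def default_decision(risk_tier: str) -> str:
    """Hypothetical helper: fallback decision when no policy matched.

    Assumption: an unrecognized tier is treated as "deny", which keeps
    the lookup itself fail-closed.
    """
    return _DEFAULT_BY_RISK.get(risk_tier, "deny")


for tier in ("LOW", "HIGH", "CRITICAL", "UNKNOWN"):
    print(tier, "->", default_decision(tier))
```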

3. The 5-Phase Control Flow

Every AIP invocation follows a deterministic 5-phase pipeline. No phase can be skipped, and the order is fixed.

Code evidence:

  • aip-spec/reference/aip_runtime/runtime.py: invoke() method, lines 64–195

Phase 1: Resolve Capability

The registry looks up the requested capability_id with version resolution:

  • Exact version match if specified
  • Otherwise: latest non-deprecated version (active > staged > shadow)

Code evidence:

  • aip-spec/reference/aip_runtime/registry.py: get() method with status priority {"active": 0, "staged": 1, "shadow": 2}, used as the primary sort key:
candidates.sort(
    key=lambda c: (
        status_priority.get(c.lifecycle.status, 99),
        -c.major_version,
        -c.minor_version,
    )
)

If no capability is found, the control flow terminates immediately with CAPABILITY_NOT_FOUND.
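
The resolution rules above can be sketched end to end as follows. CapabilityStub is a hypothetical, flattened stand-in for the real Capability model, and the resolve function is illustrative only; a None result would correspond to the not-found error in the real runtime.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Status priority from the registry excerpt: lower values sort first.
STATUS_PRIORITY = {"active": 0, "staged": 1, "shadow": 2}


@dataclass(frozen=True)
class CapabilityStub:
    """Hypothetical, flattened stand-in for the real Capability model."""
    capability_id: str
    major_version: int
    minor_version: int
    status: str  # shadow | staged | active | deprecated


def resolve(caps: Iterable[CapabilityStub], capability_id: str,
            version: Optional[str] = None) -> Optional[CapabilityStub]:
    """Exact version if requested, else the preferred non-deprecated one."""
    matches = [c for c in caps if c.capability_id == capability_id]
    if version is not None:
        major, minor = (int(part) for part in version.split("."))
        for cap in matches:
            if (cap.major_version, cap.minor_version) == (major, minor):
                return cap
        return None
    candidates = [c for c in matches if c.status != "deprecated"]
    candidates.sort(
        key=lambda c: (
            STATUS_PRIORITY.get(c.status, 99),
            -c.major_version,
            -c.minor_version,
        )
    )
    return candidates[0] if candidates else None
```

Note that status outranks version number in this ordering: an active 1.2 is preferred over a staged 2.0.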

Phase 2: Evaluate Policies

All matching policies are evaluated in priority order (lower number = higher priority). Each policy has targeting rules and condition evaluation.

Decision | Behavior | Terminal?
deny | Stop immediately, block execution | Yes
require_approval | Stop immediately, await approval | Yes
modify | Apply input/option overrides, continue | No (cumulative)
allow | Mark as explicitly allowed, continue | No
log_only | Record evaluation, continue | No

Code evidence:

  • aip-spec/reference/aip_runtime/policy_engine.py: evaluate() method:
# Terminal: deny
if rule.decision == "deny":
    return "deny", summaries, evaluations, {}

# Terminal: require_approval
if rule.decision == "require_approval":
    return "require_approval", summaries, evaluations, {}

# Cumulative: modify
if rule.decision == "modify" and rule.modifications:
    # ... merge modifications
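
The terminal-versus-cumulative behavior can be condensed into a small, self-contained loop. RuleStub and this evaluate function are hypothetical simplifications: they assume targeting and conditions have already selected the rules, and that the risk-tier default applies when nothing explicitly allows the call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RuleStub:
    """Hypothetical stand-in for a policy rule that already matched."""
    priority: int
    decision: str  # deny | require_approval | modify | allow | log_only
    modifications: dict = field(default_factory=dict)


def evaluate(rules, default_decision):
    """Return (decision, merged_modifications) for already-matched rules."""
    merged = {}
    explicitly_allowed = False
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.decision in ("deny", "require_approval"):
            return rule.decision, {}           # terminal: stop at once
        if rule.decision == "modify":
            merged.update(rule.modifications)  # cumulative: later keys win
        elif rule.decision == "allow":
            explicitly_allowed = True
        # log_only contributes nothing to the outcome in this sketch
    # Assumption: without an explicit allow, the risk-tier default applies.
    return ("allow" if explicitly_allowed else default_decision), merged
```

With an allow rule at priority 10 and a modify rule injecting dry_run at priority 20, the sketch returns an allow decision carrying the merged override; adding a deny rule at priority 5 short-circuits both.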

Policy Targeting

Policies match invocations via four dimensions:

Dimension | Mechanism
Capabilities | Glob patterns (github.*, fs.file.*)
Risk tiers | Exact match (HIGH, CRITICAL)
Actors | Glob patterns on actor_id
Actor types | Exact match (agent, user, scheduler)

Code evidence:

  • aip-spec/reference/aip_runtime/policy_engine.py: _policy_matches_target() uses fnmatch.fnmatch() for glob patterns
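
A rough sketch of four-dimension targeting using the standard-library fnmatch module. The dict-based target shape and the matches_target name are assumptions, as is the rule that an empty or missing dimension matches everything.

```python
from fnmatch import fnmatch


def matches_target(target: dict, capability_id: str, risk_tier: str,
                   actor_id: str, actor_type: str) -> bool:
    """Hypothetical matcher: every specified dimension must match.

    Assumption: a dimension that is absent or empty imposes no constraint.
    """
    patterns = target.get("capabilities")
    if patterns and not any(fnmatch(capability_id, p) for p in patterns):
        return False
    tiers = target.get("risk_tiers")
    if tiers and risk_tier not in tiers:
        return False
    actor_patterns = target.get("actors")
    if actor_patterns and not any(fnmatch(actor_id, p) for p in actor_patterns):
        return False
    actor_types = target.get("actor_types")
    if actor_types and actor_type not in actor_types:
        return False
    return True
```

Because the fnmatch wildcard `*` also matches dots, a pattern such as `fs.*` covers every capability under the fs namespace, not just one level.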

Condition Language

Leaf conditions use a {field, operator, value} structure with dot-path field resolution into the invocation request.

Supported operators: eq, ne, gt, lt, gte, lte, in, not_in, matches, starts_with, contains

Composites: all_of (AND), any_of (OR), not_ (NOT) — evaluated recursively.

Code evidence:

  • aip-spec/reference/aip_runtime/policy_engine.py: _evaluate_condition() with recursive AND/OR/NOT, _resolve_field() for dot-path access, _compare() for 11 operators
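
The condition language can be illustrated with a compact recursive evaluator. This is a sketch under stated assumptions, not the reference code: conditions are plain dicts, `matches` is treated as a regular-expression search, and a leaf whose field cannot be resolved evaluates to False.

```python
import re


def resolve_field(data, path):
    """Walk a dot-separated path through nested dicts; None if missing."""
    current = data
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# The 11 operators listed above, as (actual, expected) predicates.
OPERATORS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "gte": lambda a, b: a >= b,
    "lte": lambda a, b: a <= b,
    "in": lambda a, b: a in b,
    "not_in": lambda a, b: a not in b,
    "matches": lambda a, b: re.search(b, a) is not None,  # assumed regex
    "starts_with": lambda a, b: a.startswith(b),
    "contains": lambda a, b: b in a,
}


def evaluate_condition(cond, request):
    """Evaluate a leaf or composite condition against a request dict."""
    if "all_of" in cond:
        return all(evaluate_condition(c, request) for c in cond["all_of"])
    if "any_of" in cond:
        return any(evaluate_condition(c, request) for c in cond["any_of"])
    if "not_" in cond:
        return not evaluate_condition(cond["not_"], request)
    actual = resolve_field(request, cond["field"])
    if actual is None:
        return False  # assumption: a missing field never satisfies a leaf
    return OPERATORS[cond["operator"]](actual, cond["value"])
```

For example, an all_of composite can require that input.path starts with /workspace and that actor.type equals agent; wrapping it in not_ inverts the result.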

Risk Tier Monotonicity

A critical safety invariant: policies can only make the effective risk tier more restrictive, never less.

# MUST NOT reduce below capability's declared tier
if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(
    capability.risk_tier, 0
):
    effective_risk_tier = most_restrictive_tier(
        effective_risk_tier, candidate
    )
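
To make the invariant concrete, here is a self-contained sketch. The numeric values in RISK_TIER_ORDER and the apply_override wrapper are assumptions; only the comparison logic mirrors the excerpt above.

```python
# Assumed ordering: higher number means more restrictive.
RISK_TIER_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def most_restrictive_tier(a: str, b: str) -> str:
    """Return whichever tier ranks higher in RISK_TIER_ORDER."""
    return a if RISK_TIER_ORDER.get(a, 0) >= RISK_TIER_ORDER.get(b, 0) else b


def apply_override(declared: str, effective: str, candidate: str) -> str:
    """Hypothetical wrapper: accept a policy's tier only if it does not
    fall below the capability's declared tier."""
    if RISK_TIER_ORDER.get(candidate, 0) >= RISK_TIER_ORDER.get(declared, 0):
        return most_restrictive_tier(effective, candidate)
    return effective  # attempts to lower the tier are ignored


print(apply_override("HIGH", "HIGH", "LOW"))
print(apply_override("MEDIUM", "MEDIUM", "CRITICAL"))
```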

Phase 3: Check Quota

Budget enforcement across multiple dimensions:

Dimension | Description
max_calls_per_minute | Rate limiting
max_calls_per_hour | Hourly cap
max_calls_per_day | Daily cap
max_tokens_per_day | Token budget
max_cost_per_day | Cost ceiling (USD)
max_parallel | Concurrency limit

Quotas scope to: capability, actor, session, tenant, or global. The most restrictive limit wins when quotas exist at multiple scopes.

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: QuotaSpec dataclass with all dimensions
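
One way to realize "the most restrictive limit wins" is to take the per-dimension minimum across every scope that sets a limit. The QuotaSpec below reuses the dimension names from the table, but its field types, the None-means-unlimited convention, and the effective_quota helper are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class QuotaSpec:
    """Illustrative quota model; None means no limit at this scope."""
    max_calls_per_minute: Optional[int] = None
    max_calls_per_hour: Optional[int] = None
    max_calls_per_day: Optional[int] = None
    max_tokens_per_day: Optional[int] = None
    max_cost_per_day: Optional[float] = None
    max_parallel: Optional[int] = None


def effective_quota(*specs: QuotaSpec) -> QuotaSpec:
    """Combine quotas from several scopes; the smallest limit wins."""
    merged = {}
    for f in fields(QuotaSpec):
        limits = [getattr(s, f.name) for s in specs
                  if getattr(s, f.name) is not None]
        merged[f.name] = min(limits) if limits else None
    return QuotaSpec(**merged)
```

Combining a tenant quota of 100 calls per minute with an actor quota of 10 yields 10, while dimensions set at only one scope pass through unchanged.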

Phase 4: Dispatch Execution

The runtime routes to the registered handler for the capability’s transport type, applying any accumulated policy modifications.

transport_type = capability.execution.type if capability.execution else "internal"
handler = self._handlers.get(transport_type)
output = handler(capability, effective_input, effective_options)

If no handler exists for the transport type, TRANSPORT_ERROR is returned.

Code evidence:

  • aip-spec/reference/aip_runtime/runtime.py — dispatch section with handler lookup and execution
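
A simplified sketch of handler-based dispatch. The Dispatcher class, its dict-shaped results, and the handler signature taking a capability ID string are assumptions; the error codes come from the taxonomy in this document.

```python
from typing import Any, Callable, Dict

# Hypothetical handler signature: (capability_id, input, options) -> output.
Handler = Callable[[str, dict, dict], Any]


class Dispatcher:
    """Minimal registry mapping transport types to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, transport_type: str, handler: Handler) -> None:
        self._handlers[transport_type] = handler

    def dispatch(self, transport_type: str, capability_id: str,
                 payload: dict, options: dict) -> dict:
        handler = self._handlers.get(transport_type)
        if handler is None:
            # No handler registered for this transport type.
            return {"status": "error", "error": {"code": "TRANSPORT_ERROR"}}
        try:
            output = handler(capability_id, payload, options)
        except Exception as exc:  # handler threw an exception
            return {"status": "error",
                    "error": {"code": "EXECUTION_FAILED", "message": str(exc)}}
        return {"status": "success", "output": output}
```

Registering one callable per transport type keeps the control plane unaware of how MCP, REST, or CLI calls are actually made.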

Phase 5: Record Evidence

Every invocation — success or failure — produces exactly one immutable evidence record.

Code evidence:

  • aip-spec/reference/aip_runtime/evidence_store.py:
class EvidenceStore:
    """AIP-1 §3.4.1 guarantees:
    - Immutable: records cannot be modified after creation.
    - Complete: every invocation produces exactly one record.
    - Auditable: enough info to answer who/what/when/why/cost.
    """

    def record(self, evidence: Evidence) -> None:
        if evidence.evidence_id in self._records:
            raise ValueError(
                f"Evidence {evidence.evidence_id} already exists (immutability violated)"
            )

Evidence records include: SHA-256 digests of input/output, full policy evaluations, cost metrics, execution details, error information, and actor delegation chains.
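
The digest and append-only behavior can be sketched in a few lines. Hashing a canonical JSON encoding with sorted keys is an assumption about how inputs are serialized, and InMemoryEvidenceStore is a hypothetical stand-in that stores plain dicts rather than Evidence objects.

```python
import hashlib
import json


def compute_digest(payload) -> str:
    """SHA-256 hex digest of a canonical JSON encoding (assumed format)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryEvidenceStore:
    """Append-only store: an evidence_id can be written exactly once."""

    def __init__(self) -> None:
        self._records = {}

    def record(self, evidence_id: str, evidence: dict) -> None:
        if evidence_id in self._records:
            raise ValueError(f"Evidence {evidence_id} already exists")
        self._records[evidence_id] = evidence

    def get(self, evidence_id: str):
        return self._records.get(evidence_id)


store = InMemoryEvidenceStore()
store.record("ev_1", {"input_digest": compute_digest({"path": "/workspace/readme.md"})})
```

Storing only digests lets an auditor later confirm that a given input produced a record without the store retaining the input itself.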


4. Four Core Objects

AIP defines exactly four core domain objects, all mapped to JSON schemas.

4.1 Capability

The fundamental unit of agent functionality — a governed, typed, risk-classified operation.

Field | Type | Description
capability_id | string | Hierarchical dot-separated ID (e.g., github.repo.read)
version | string | MAJOR.MINOR (backward-compatible within major)
risk_tier | enum | LOW | MEDIUM | HIGH | CRITICAL
lifecycle | object | shadow → staged → active → deprecated
input_schema | JSON Schema | Input validation contract
output_schema | JSON Schema | Output structure contract
execution | ExecutionBinding | Transport type, endpoint, method
constraints | object | Rate limits, timeouts, token budgets
auth_requirements | object | oauth2, api_key, oidc, mtls, bearer_token

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: Capability dataclass, lines 43–64
  • aip-spec/spec/schemas/capability.json — canonical JSON schema

4.2 Invocation (Request / Response)

The transactional unit: one request in, one response out, one evidence record created.

Request fields: invocation_id, capability_id, actor (with delegation_chain), input, context (run_id, plan_id, step_index), options (dry_run, timeout, priority, trace)

Response fields: status (success | error | denied | timeout | circuit_open | quota_exceeded), output, error, cost, evidence_id, policy_decisions

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: InvocationRequest (lines 98–105), InvocationResponse (lines 134–141)

4.3 Policy

Declarative rules that govern capability invocations.

@dataclass(frozen=True)
class Policy:
    policy_id: str
    target: PolicyTarget         # capabilities, risk_tiers, actors, actor_types
    rules: tuple[PolicyRule, ...]  # ordered, first match per policy
    description: str = ""
    enabled: bool = True
    priority: int = 100          # lower = evaluated first

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: Policy, PolicyRule, PolicyTarget, PolicyCondition, PolicyModifications, lines 147–198

4.4 Evidence

Immutable audit record produced for every invocation.

Field | Description
evidence_id | Unique record ID
invocation_id | Which invocation produced this
input_digest | SHA-256 of input (privacy-preserving)
output_digest | SHA-256 of output
policy_evaluations | Full chain of policy decisions
cost | Latency, execution time, tokens, monetary cost
execution | Transport type, endpoint, retry count, circuit state
error | Error code, message, retryable flag

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: Evidence dataclass, lines 221–237
  • aip-spec/reference/aip_runtime/evidence_store.py: compute_digest() uses hashlib.sha256

5. MCP Bridge — Zero-Modification Integration

The most important adapter: any existing MCP server becomes an AIP execution backend without modification.

5.1 Architecture

AIP Runtime.invoke()
    ↓
MCPAdapter.handle(capability, input, options)
    ↓
_StdioTransport (per MCP server)
    ↓ JSON-RPC 2.0 over stdio
MCP Server (unmodified)

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py: class hierarchy MCPAdapter → _StdioTransport → subprocess

5.2 Auto-Discovery

MCP tools are automatically discovered and registered as AIP capabilities:

def discover_capabilities(self, provider_id: str, ...) -> list[Capability]:
    result = transport.request("tools/list", {})
    tools = result.get("tools", [])
    # Each MCP tool → one AIP Capability with:
    #   capability_id = f"{provider_id}.{name}"
    #   risk_tier = inferred from tool name/description
    #   execution = ExecutionBinding(type="mcp", endpoint=provider_id, method=name)

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py: discover_capabilities(), lines 296–346
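
The mapping step in the excerpt above can be tried without a live server. The sketch below takes an already-parsed tools/list result and emits plain dicts instead of Capability objects; tools_to_capabilities is a hypothetical name, risk-tier inference is left out, and only the ID and execution binding rules come from the excerpt.

```python
def tools_to_capabilities(provider_id: str, tools_list_result: dict) -> list:
    """Map an MCP tools/list result to capability-like dicts (illustrative)."""
    capabilities = []
    for tool in tools_list_result.get("tools", []):
        name = tool["name"]
        capabilities.append({
            "capability_id": f"{provider_id}.{name}",
            "description": tool.get("description", ""),
            # MCP tools describe their parameters under "inputSchema".
            "input_schema": tool.get("inputSchema", {}),
            "execution": {"type": "mcp", "endpoint": provider_id, "method": name},
        })
    return capabilities


sample = {"tools": [{"name": "read_file",
                     "description": "Read a file",
                     "inputSchema": {"type": "object"}}]}
print(tools_to_capabilities("fs", sample)[0]["capability_id"])
```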

5.3 Risk Tier Inference

Tools are automatically classified by keyword analysis:

Keywords | Risk Tier
write, delete, remove, create, update, modify, send, deploy, execute, run | HIGH
fetch, request, post, put, patch, connect, upload | MEDIUM
(default) | LOW

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py: infer_risk_tier() function, lines 231–238
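
A plausible shape for keyword-based inference, using the keyword lists from the table. Case-insensitive substring matching over the tool name and description is an assumption; the actual function may tokenize or match differently.

```python
# Keyword lists copied from the table above.
HIGH_KEYWORDS = ("write", "delete", "remove", "create", "update",
                 "modify", "send", "deploy", "execute", "run")
MEDIUM_KEYWORDS = ("fetch", "request", "post", "put", "patch",
                   "connect", "upload")


def infer_risk_tier(name: str, description: str = "") -> str:
    """Classify a tool by keywords; HIGH is checked before MEDIUM."""
    text = f"{name} {description}".lower()
    if any(keyword in text for keyword in HIGH_KEYWORDS):
        return "HIGH"
    if any(keyword in text for keyword in MEDIUM_KEYWORDS):
        return "MEDIUM"
    return "LOW"


for tool in ("read_file", "fetch_url", "write_file"):
    print(tool, "->", infer_risk_tier(tool))
```

Under this substring assumption the heuristic errs toward over-classification: a tool named get_output would be rated MEDIUM because its name contains "put". An explicit override in the registry would be the natural correction for such cases.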

5.4 Server Lifecycle

Each MCP server is managed as a subprocess with full lifecycle control:

  • Start: subprocess + JSON-RPC initialize handshake (protocol version 2024-11-05)
  • Health check: is_alive() via process poll
  • Auto-restart: up to max_restarts (default 3)
  • Graceful stop: SIGTERM → wait 5s → SIGKILL

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py: _StdioTransport class, lines 78–221
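
The graceful-stop sequence maps directly onto the standard subprocess API. The stop_process helper below is a hypothetical sketch of that pattern, not the adapter's actual code; on POSIX, terminate() sends SIGTERM and kill() sends SIGKILL.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Terminate a child politely, then forcefully if it does not exit."""
    if proc.poll() is not None:
        return proc.returncode           # already exited
    proc.terminate()                     # polite request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                      # forced stop (SIGKILL on POSIX)
        return proc.wait()
```

The same poll() call that guards the early return is what a health check like is_alive() would rely on: it returns None while the child is still running.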

5.5 Error Mapping

MCP errors are translated to AIP error codes:

MCP Error | AIP Error Code
tool_not_found | CAPABILITY_NOT_FOUND
invalid_params | INPUT_VALIDATION_FAILED
invalid_request | INPUT_VALIDATION_FAILED
Timeout | TIMEOUT
Server exited | TRANSPORT_ERROR

Code evidence:

  • aip-spec/reference/aip_runtime/mcp_adapter.py: _MCP_ERROR_MAP dict, line 244
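
A small sketch of how such a translation might look. The three string entries come from the table; representing a timeout and a dead server as Python exceptions, the to_aip_error_code name, and the EXECUTION_FAILED fallback for unknown errors are all assumptions.

```python
# String-keyed entries taken from the table above.
_MCP_ERROR_MAP = {
    "tool_not_found": "CAPABILITY_NOT_FOUND",
    "invalid_params": "INPUT_VALIDATION_FAILED",
    "invalid_request": "INPUT_VALIDATION_FAILED",
}


def to_aip_error_code(failure) -> str:
    """Translate an MCP error name or a transport exception (illustrative)."""
    if isinstance(failure, TimeoutError):
        return "TIMEOUT"
    if isinstance(failure, ConnectionError):  # includes BrokenPipeError
        return "TRANSPORT_ERROR"
    # Assumption: anything unrecognized is reported as an execution failure.
    return _MCP_ERROR_MAP.get(str(failure), "EXECUTION_FAILED")


print(to_aip_error_code("invalid_params"))
print(to_aip_error_code(BrokenPipeError()))
```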

6. HTTP Binding

AIP exposes three HTTP endpoints for external integration:

Method | Path | Description
POST | /invoke | Invoke a capability through the full control plane
GET | /capabilities | List all registered capabilities
GET | /evidence/{id} | Retrieve an evidence record

Built on FastAPI with Pydantic validation. Intentionally minimal — registry admin, policy admin, and quota admin endpoints are deferred to later versions.

Code evidence:

  • aip-spec/reference/aip_runtime/http_binding.py: create_app(runtime: AIPRuntime) -> FastAPI
@app.post("/invoke")
async def invoke(req: InvokeRequest) -> JSONResponse:
    """Full control flow: resolve → policy → quota → execute → evidence."""
    aip_request = InvocationRequest(...)
    response = runtime.invoke(aip_request)
    status_code = 200 if response.status == "success" else 422
    return JSONResponse(content=result, status_code=status_code)

7. Error Taxonomy

AIP defines a structured error taxonomy with 11 error codes:

Error Code | Description | Retryable
CAPABILITY_NOT_FOUND | Capability ID does not exist in registry | No
CAPABILITY_VERSION_NOT_FOUND | Specific version not found | No
INPUT_VALIDATION_FAILED | Input does not match schema | No
POLICY_DENIED | Policy engine blocked the invocation | No
QUOTA_EXCEEDED | Budget limit reached | Yes (after window)
AUTH_REQUIRED | Authentication needed | No
AUTH_FAILED | Authentication credentials invalid | No
EXECUTION_FAILED | Handler threw an exception | Yes
TIMEOUT | Execution exceeded time limit | Yes
CIRCUIT_OPEN | Circuit breaker tripped | Yes (after cooldown)
TRANSPORT_ERROR | No handler for transport type | No

Code evidence:

  • aip-spec/reference/aip_runtime/models.py: InvocationError dataclass with code, message, retryable, retry_after_ms
  • aip-spec/reference/aip_runtime/runtime.py: _fail() method constructs error responses
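
As an illustration of how the retryable column could be carried on the error object, here is a sketch. The four field names follow the InvocationError description above; the field types and defaults, the RETRYABLE_CODES set, and the make_error helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Codes marked "Yes" in the Retryable column of the table above.
RETRYABLE_CODES = frozenset(
    {"QUOTA_EXCEEDED", "EXECUTION_FAILED", "TIMEOUT", "CIRCUIT_OPEN"}
)


@dataclass(frozen=True)
class InvocationError:
    """Illustrative error model; field types are assumed."""
    code: str
    message: str
    retryable: bool = False
    retry_after_ms: Optional[int] = None


def make_error(code: str, message: str,
               retry_after_ms: Optional[int] = None) -> InvocationError:
    """Hypothetical factory that derives retryable from the error code."""
    return InvocationError(code, message, code in RETRYABLE_CODES, retry_after_ms)


print(make_error("QUOTA_EXCEEDED", "hourly cap reached", retry_after_ms=60_000))
```

A caller can then branch on retryable and retry_after_ms without hard-coding knowledge of individual error codes.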

8. Relationship to OctopusOS Kernel

AIP was extracted from OctopusOS governance infrastructure. The kernel’s internal contracts map directly to AIP objects:

AIP Object | OctopusOS Kernel Contract
Capability | CapabilityManifest + CapabilityRegistryEntryV2
Invocation | PlanStep → DispatchRouter → DispatchResult
Policy | PolicyDecision + AccessDecision + DiscoveryPolicy
Evidence | AuditEnvelope + EvidenceIndexEntry + ExecutionOutcome
Risk Tier | RiskTaxonomy (LOW / MEDIUM / HIGH / CRITICAL)
Lifecycle | LifecycleTier (shadow / staged / active)

Code evidence:

  • kernel/contracts/policy_engine.py: RiskTaxonomy, BudgetQuotaSpec, PolicyDecision, ApprovalPipelineState
  • kernel/governance/policy_engine.py: evaluate_policy_decision()
  • kernel/governance/registry.py: validate_registry_snapshot() (fail-closed)
  • kernel/governance/quota.py: evaluate_quota() (fail-closed)
  • kernel/ports/evidence/interfaces.py: EvidenceIndexPort, ReplayLedgerPort

The kernel uses the same frozen-dataclass pattern, the same risk tier ordering, and the same fail-closed defaults — AIP is the public specification of governance patterns that were battle-tested inside OctopusOS.


9. Reference Implementation Architecture

The reference implementation comprises seven modules: six make up the core runtime, which has zero external dependencies, and the seventh (http_binding.py) is an optional FastAPI wrapper.

Module | Lines | Responsibility
models.py | 253 | Frozen dataclasses for all AIP objects
runtime.py | 246 | 5-phase control flow orchestrator
policy_engine.py | 301 | Policy targeting, conditions, decisions
registry.py | 120 | Capability resolution with version logic
evidence_store.py | 75 | Append-only immutable audit store
mcp_adapter.py | 449 | MCP server lifecycle and bridge
http_binding.py | 261 | FastAPI HTTP wrapper (optional)

Code evidence:

  • aip-spec/reference/aip_runtime/: all 7 modules
  • aip-spec/reference/tests/: 34 tests covering core flows

Dependency Graph

Reference Implementation Dependency Graph (diagram):

  • http_binding.py: FastAPI
  • mcp_adapter.py: subprocess
  • runtime.py: 5-phase orchestrator
  • registry.py: Capability resolution
  • policy_engine.py: Policy evaluation
  • evidence_store.py: Immutable audit
  • models.py: all modules depend on this

10. Four Scenarios Walkthrough

The AIP animation on the site demonstrates four representative scenarios that illustrate the protocol’s behavior across risk tiers.

Scenario 1: Read File (LOW Risk — Allowed)

Request:  fs.file.read → path=/workspace/readme.md
Policy:   No explicit policy matches
Default:  LOW risk → allow
Result:   ✅ Executed, evidence recorded

Scenario 2: Write File with Policy Modification (HIGH Risk — Modified)

Request:  fs.file.write → path=/workspace/output.txt
Policy 1: pol_sandbox → allow (path is within /workspace)
Policy 2: pol_force_dry_run → modify (inject dry_run: true)
Result:   ✅ Executed with dry_run=true, evidence recorded

Scenario 3: Write Outside Sandbox (HIGH Risk — Denied)

Request:  fs.file.write → path=/etc/passwd
Policy:   pol_sandbox → deny (path outside /workspace)
Result:   ❌ POLICY_DENIED, execution never reached, evidence recorded

Scenario 4: Delete (CRITICAL Risk — Denied)

Request:  fs.file.delete → path=/workspace/temp.log
Policy:   pol_no_critical_agent → deny (CRITICAL ops blocked for agents)
Default:  CRITICAL risk → deny even without policy
Result:   ❌ POLICY_DENIED, evidence recorded

Key insight: in scenarios 3 and 4, the execution phase is never reached. The policy engine terminates the control flow before any side effect occurs. Yet an evidence record is still created — the audit trail is complete regardless of outcome.


11. Design Patterns Summary

Pattern | Implementation | Purpose
Frozen dataclasses | @dataclass(frozen=True) everywhere | Immutability guarantee
Fail-closed defaults | _DEFAULT_BY_RISK with CRITICAL→deny | Safety by default
Terminal vs cumulative | deny/require_approval stop; modify accumulates | Predictable evaluation
Risk monotonicity | most_restrictive_tier() only increases | Cannot bypass declared risk
Append-only evidence | ValueError on duplicate evidence_id | Tamper-resistant audit
SHA-256 digests | hashlib.sha256 for input/output | Privacy-preserving proof
Transport abstraction | ExecutionHandler callable protocol | Pluggable backends
Glob-based targeting | fnmatch.fnmatch() for policy matching | Flexible policy scoping
Version resolution | active > staged > shadow priority | Safe rollout lifecycle
Delegation chain | Actor.delegation_chain: tuple[str, ...] | Multi-agent accountability

12. Architecture Diagram

Complete AIP Architecture (diagram, top to bottom):

  • Agent Runtime: Claude, GPT, Cursor, Custom Agents
  • AIP Control Plane: Registry (resolve), Policy Engine (target+eval), Quota (check), Dispatch (handler)
  • Evidence Store: immutable, append-only; SHA-256 digests + full audit
  • Execution Backends: MCP (stdio), REST (httpx), CLI (exec), Container (docker), Worker Queue (redis/amqp)