Brain & Memory: The Cognitive Architecture of an AI Operating Kernel

A biological brain pairs fast, automatic processing (loosely, the cerebellum) with slow, deliberate reasoning (the cortex). OctopusOS mirrors this split with a dual-brain architecture: a Big Brain (LLM) for complex reasoning and a Small Brain (Cerebellum) for fast local inference, unified by a three-tier memory system, a knowledge distillation pipeline, and a structured ReAct reasoning loop.


1. Why an AI OS Needs a Brain and Memory

Traditional AI agents are stateless: they process a prompt, return a result, and forget. But a true AI Operating System must remember, learn, and reason across time — just like a biological organism.

Without persistent memory and learning loops, an AI agent:

  • Repeats the same mistakes across sessions
  • Cannot build organizational knowledge
  • Has no mechanism to improve routing accuracy over time
  • Lacks structured multi-step reasoning

OctopusOS solves this with a dual-brain + triple-memory architecture that provides fast inference, continuous learning, persistent recall, and structured reasoning.

Dual Brain Architecture — LLM + Cerebellum

User input first passes through the Mode Router, which dispatches to one of two brains:

  • Big Brain (LLM): OpenAI / local provider; handles high-complexity routing, serves as fallback when the Cerebellum is uncertain, and generates distillation examples
  • Small Brain (Cerebellum): Bayesian Beta + Adam; fast local inference (~1ms), learns from execution feedback, 4-way classification model

Distillation flows from the LLM to the Cerebellum, and both brains emit the same 4-way classification output: CHAT, QUERY, PLAN, or EXEC.

2. The Big Brain: LLM Routing

The Big Brain leverages external LLM providers (OpenAI, local models) for complex routing decisions. When the Cerebellum is uncertain, the LLM takes over.

# Routing via LLM — kernel/runtime/_wl_llm_routing.py
async def _model_route(self, *, run_id, user_text, context_digest, ...):
    prompt = build_routing_prompt(user_text, context_digest)
    response = await self.llm_port.complete(prompt)
    decision = parse_routing_decision(response)
    # Store for distillation
    self._store_distillation_example(
        routing_input={"user_text": user_text},
        llm_decision=decision,
    )
    return decision

The LLM classifies every user input into one of 4 modes:

  • CHAT: greetings and small talk, simple Q&A without tool calls, preference expression, feedback acknowledgment
  • QUERY: factual questions, documentation lookup, RAG-enhanced search, status inquiries
  • PLAN: migration strategies, architecture decisions, complex workflows, risk assessments
  • EXEC: run commands, deploy services, file operations, API calls
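The dispatch between the two brains can be sketched as a confidence-gated router: try the Cerebellum first, escalate to the LLM when local confidence is too low. The threshold value and the `cerebellum_predict` / `llm_route` callables below are illustrative assumptions, not the actual OctopusOS API:

```python
# Hypothetical sketch of dual-brain dispatch. Names and threshold are illustrative.
MODES = ("CHAT", "QUERY", "PLAN", "EXEC")
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; the real value may differ

def route(user_text: str, cerebellum_predict, llm_route):
    """Return (mode, source) for a user input."""
    mode, confidence = cerebellum_predict(user_text)  # ~1ms local inference
    if confidence >= CONFIDENCE_THRESHOLD:
        return mode, "cerebellum"
    # Low confidence: escalate to the Big Brain (LLM); in the real system this
    # is also where a distillation example would be captured.
    return llm_route(user_text), "llm"
```
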

3. The Small Brain: Cerebellum Local Model

The Cerebellum is a lightweight, local classification model that provides sub-millisecond routing without any external API call. It learns from execution feedback and LLM distillation.

@dataclass(frozen=True)
class CerebellumModel:
    weights: dict[str, dict[str, float]]  # route → feature → weight
    bias: dict[str, float]                # route → bias
    model_version: str
    feature_config: FeatureConfig

Key algorithms:

  • Bayesian Beta priors: Each route starts with Beta(2, 2) distribution, updated on feedback
  • Feature bag construction: Tokenizes input into tok:word features + metadata features
  • Weighted scoring: score(route) = bias + Σ(weight[feature] * value)
  • Adam optimizer: Adaptive learning rate with momentum for gradient updates
  • Replay buffer: Stores recent examples for curriculum learning
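The feature-bag and weighted-scoring steps above can be sketched in a few lines. This is a simplified illustration (metadata features, Beta priors, and the Adam update are omitted); `feature_bag`, `score`, and `predict` are hypothetical names:

```python
import math
import re

def feature_bag(text: str) -> dict[str, float]:
    """Tokenize input into tok:word features (metadata features omitted here)."""
    return {f"tok:{w}": 1.0 for w in re.findall(r"[a-z0-9]+", text.lower())}

def score(route: str, feats, weights, bias) -> float:
    """score(route) = bias + sum(weight[feature] * value)"""
    w = weights.get(route, {})
    return bias.get(route, 0.0) + sum(w.get(f, 0.0) * v for f, v in feats.items())

def predict(text, weights, bias, routes=("CHAT", "QUERY", "PLAN", "EXEC")):
    feats = feature_bag(text)
    scores = {r: score(r, feats, weights, bias) for r in routes}
    # Softmax normalization turns raw scores into route probabilities
    m = max(scores.values())
    exps = {r: math.exp(s - m) for r, s in scores.items()}
    z = sum(exps.values())
    probs = {r: e / z for r, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]
```
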
Cerebellum Learning Pipeline

  1. Feature extraction: tokenize input text, extract metadata (app_id, context_digest), build the feature bag
  2. Model prediction: weighted scoring across the 4 routes, Bayesian Beta confidence, softmax normalization
  3. Feedback integration: train_on_feedback() updates weights, Adam optimizer with gradient clipping, replay buffer sampling
  4. Distillation learning: high-confidence LLM decisions are captured, a training step updates the model, and the model version is incremented

4. Knowledge Distillation: LLM → Cerebellum

When the LLM makes a high-confidence routing decision (confidence >= 0.5), it generates a DistillationExample that is stored in a buffer. When the buffer reaches the cadence threshold (every 50 runs), a training step fires.

@dataclass(frozen=True)
class DistillationExample:
    text: str
    context_digest: str
    llm_route: str        # CHAT | QUERY | PLAN | EXEC
    llm_confidence: float # >= 0.5 to qualify
    features: dict[str, float]
    ts_ms: int = 0

The distillation pipeline ensures the Cerebellum gradually absorbs the LLM’s routing intelligence, reducing dependence on expensive API calls over time.
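Under the stated thresholds (confidence >= 0.5, cadence of every 50 runs), a minimal sketch of the buffer logic might look like this; the class name and training callback are assumptions:

```python
# Illustrative sketch of the distillation cadence: buffer high-confidence LLM
# decisions and fire a training step every CADENCE qualifying captures.
CONFIDENCE_FLOOR = 0.5
CADENCE = 50

class DistillationBuffer:
    def __init__(self, train_step):
        self._buffer = []
        self._train_step = train_step  # callback: list[example] -> None

    def capture(self, example: dict) -> bool:
        """Store a qualifying example; train and flush when the cadence is hit."""
        if example["llm_confidence"] < CONFIDENCE_FLOOR:
            return False  # low-confidence decisions never qualify
        self._buffer.append(example)
        if len(self._buffer) >= CADENCE:
            self._train_step(list(self._buffer))
            self._buffer.clear()
        return True
```
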

Knowledge Distillation Pipeline (diagram): LLM routing decisions (e.g. CHAT 0.92, QUERY 0.87, EXEC 0.81, PLAN 0.78) flow into a distillation buffer gated by confidence ≥ 0.5; every 50 runs the buffer fires a training step that reports loss and example count.

5. The MemoryPoint Contract

All memory in OctopusOS is stored as MemoryPoint — a frozen, immutable dataclass that carries content, metadata, and embedding vector.

@dataclass(frozen=True)
class MemoryPoint:
    memory_id: str
    app_id: str
    namespace: str        # "session" | "short" | "long"
    content: str
    embedding: tuple[float, ...]
    payload: dict         # Extensible metadata
    tier: str
    hit_count: int = 0
    created_ms: int = 0
    expires_ms: int = 0

The payload dict is the extension mechanism — all new features (role isolation, chains, central memory) use payload fields rather than modifying the frozen contract:

  • _role_id — Role-level memory isolation
  • _chain_parent_id — Temporal chain links
  • _contributed_by — Central memory source tracking
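As an illustration, a single point's payload might carry all three extension tags at once, and role filtering then reduces to a payload lookup. The values below are hypothetical:

```python
# Sketch: extending a MemoryPoint through its payload rather than new fields.
# All tag values here are hypothetical examples.
payload = {
    "source": "chat",
    "_role_id": "sysadmin",          # role-level isolation
    "_chain_parent_id": "mem-0041",  # temporal chain link
    "_contributed_by": "app-ops",    # central memory provenance
}

def visible_to_role(payload: dict, role_id: str) -> bool:
    """A point is visible to a role only when its _role_id tag matches."""
    return payload.get("_role_id") == role_id
```
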

6. Three-Tier Memory Lifecycle

OctopusOS organizes memory into three tiers with automatic lifecycle management:

Tier       | TTL        | Purpose                     | Promotion
Session    | 30 minutes | Active conversation context | Auto-expire
Short-term | 1 day      | Cross-session recent recall | → Long when hit_count >= 5
Long-term  | Permanent  | Organizational knowledge    | Never expires

The GC cycle runs periodically: expire stale entries → compact near-duplicates → promote high-hit memories.
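Assuming the TTLs and promotion threshold from the table, one pass of the GC cycle could be sketched as follows (near-duplicate compaction is elided, and points are modeled as plain dicts for brevity):

```python
# Assumed tier parameters from the table above; promotion at hit_count >= 5.
TTL_MS = {"session": 30 * 60 * 1000, "short": 24 * 60 * 60 * 1000}
PROMOTE_HITS = 5

def gc_cycle(points: list[dict], now_ms: int) -> list[dict]:
    """One GC pass: expire stale entries, then promote high-hit memories.
    Long-term entries have no TTL and are never expired."""
    alive = []
    for p in points:
        ttl = TTL_MS.get(p["tier"])
        if ttl is not None and now_ms - p["created_ms"] > ttl:
            continue  # expired: drop from the store
        if p["tier"] == "short" and p["hit_count"] >= PROMOTE_HITS:
            p = {**p, "tier": "long"}  # promote to long-term
        alive.append(p)
    return alive
```
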

Three-Tier Memory Lifecycle

  • T1, Session Memory: active conversation context, highest churn (TTL: 30 min)
  • T2, Short-term Memory: cross-session recall, moderate retention (TTL: 1 day)
  • T3, Long-term Memory: organizational knowledge, persistent (TTL: permanent)

Promotion: hit_count ≥ 5 moves a memory from short-term to long-term.

7. Role-Level Memory Isolation & Central Memory

Role Isolation

Each role (e.g., sysadmin, developer) gets its own memory scope via payload._role_id filtering. This prevents cross-role memory leakage while sharing the same underlying store.

def _memory_query_by_role(self, *, app_id, role_id, namespace, query, limit):
    all_points = self._memory_query(app_id=app_id, namespace=namespace, ...)
    return [p for p in all_points if p.payload.get("_role_id") == role_id]

Central Memory

Cross-app organizational knowledge lives in a special __central__ namespace. Any app can read, but writes require explicit authorization and are tagged with _contributed_by.

CENTRAL_APP_ID = "__central__"

def _central_memory_write(self, *, caller_app_id, point):
    enriched_payload = {**point.payload, "_contributed_by": caller_app_id}
    # Write to central store
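A fuller sketch of that write path, with a hypothetical authorization set and an in-memory store standing in for the real backend:

```python
# Hedged sketch of a central-memory write with explicit authorization; the
# authorization set and store interface are illustrative, not the real API.
CENTRAL_APP_ID = "__central__"

def central_memory_write(store: dict, authorized: set, caller_app_id: str, point: dict):
    """Write a point into the shared __central__ namespace, tagging provenance."""
    if caller_app_id not in authorized:
        raise PermissionError(f"{caller_app_id} may not write central memory")
    enriched = {**point, "payload": {**point.get("payload", {}),
                                     "_contributed_by": caller_app_id}}
    store.setdefault(CENTRAL_APP_ID, []).append(enriched)
    return enriched
```
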

8. Memory Chains: Temporal Links

Memory chains connect related memories through time using _chain_parent_id. This enables causal reasoning — tracing how a sequence of events led to a conclusion.

def build_chains(points: list[MemoryPoint]) -> dict[str, list[MemoryPoint]]:
    """Group memory points into chains by following _chain_parent_id links."""

def traverse_chain(points: list[MemoryPoint], root_id: str) -> list[MemoryPoint]:
    """Walk from root to all descendants in temporal order."""

def chain_depth(points: list[MemoryPoint], memory_id: str) -> int:
    """Count how deep a memory is in its chain."""
Memory Chain Traversal (example chain):

  1. Root Memory: initial observation or decision
  2. Child Memory: follow-up action or result
  3. Grandchild: consequence or learned pattern
  4. Leaf Memory: final outcome or conclusion

9. Mind Map: The Reasoning Graph

The Mind Map transforms raw memory points and reasoning results into a visual graph of interconnected knowledge. Each node represents a memory, chain, or reasoning conclusion; edges represent relationships.

@dataclass(frozen=True)
class MindMapNode:
    node_id: str
    label: str
    node_type: str  # memory | chain | reasoning | preference
    tier: str
    metadata: dict
    x: float = 0.0
    y: float = 0.0

@dataclass(frozen=True)
class MindMapEdge:
    edge_id: str
    source_id: str
    target_id: str
    relation: str   # contributes | triggers | informs | evidence
    weight: float = 1.0

The force_layout() function positions nodes using a spring-electric model — connected nodes attract, all nodes repel, creating a readable graph layout.
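A minimal version of such a spring-electric layout can be written directly. This is an illustrative sketch, not the actual force_layout() implementation; the constants, damping factor, and iteration count are arbitrary:

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Minimal spring-electric layout: connected nodes attract, all nodes
    repel. Returns {node_id: (x, y)}. Illustrative only."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:                      # pairwise repulsion
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[a][0] += k * k * dx / d2
                force[a][1] += k * k * dy / d2
        for a, b in edges:                   # spring attraction along edges
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.sqrt(dx * dx + dy * dy) + 1e-9
            pull = d / k
            force[a][0] += pull * dx / d; force[a][1] += pull * dy / d
            force[b][0] -= pull * dx / d; force[b][1] -= pull * dy / d
        for n in nodes:                      # damped position update
            pos[n] = (pos[n][0] + 0.01 * force[n][0],
                      pos[n][1] + 0.01 * force[n][1])
    return pos
```
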

Mind Map — Memory & Reasoning Graph
Mind Map — Memory & Reasoning Graph (diagram): memory nodes (m1 "Deploy script", m2 "MySQL config", m3 "Backup policy") connect to chain nodes (c1 "Migration chain", c2 "Incident chain") and reasoning nodes (r1 "Risk: high disk"). Legend: MemoryPoint, Chain Node, Reasoning.

10. The ReAct Loop: Structured Multi-turn Reasoning

Complex tasks require multiple rounds of reasoning. The ReAct loop provides explicit Thought → Action → Observation structure for each step.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ReActStep:
    step_id: str
    thought: str = ""        # What the agent is thinking
    action: str = ""         # SEARCH | CALL_SKILL | CHAT | PLAN
    action_input: dict = field(default_factory=dict)  # Parameters for the action
    observation: str = ""    # Result of the action
    ts_ms: int = 0

@dataclass(frozen=True)
class ReActScratchpad:
    run_id: str
    steps: list[ReActStep] = field(default_factory=list)
    current_goal: str = ""
    accumulated_context: str = ""

Each step emits a REACT_STEP_COMPLETED LiveEvent for full auditability. The scratchpad summary is injected into the next LLM call for context continuity.
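A bare-bones version of the loop, with hypothetical think/act callables standing in for the LLM and skill execution (the FINISH sentinel and step dicts are illustrative assumptions):

```python
# Hedged sketch of the ReAct loop: Thought -> Action -> Observation per step,
# with the scratchpad summary carried into the next think() call.
def react_loop(goal: str, think, act, max_steps: int = 5):
    """Run the loop until think() signals FINISH or max_steps is reached."""
    steps = []
    context = ""
    for i in range(max_steps):
        thought, action, action_input = think(goal, context)
        if action == "FINISH":
            steps.append({"thought": thought, "action": action, "observation": ""})
            break
        observation = act(action, action_input)
        steps.append({"thought": thought, "action": action,
                      "observation": observation})
        # Accumulated context plays the role of the scratchpad summary
        context += f"\nStep {i+1}: {thought} -> {action} -> {observation}"
    return steps
```
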

ReAct Loop — Thought → Action → Observation (example trace):

  • Step 1, Thought: user wants disk usage
  • Step 2, Thought: disk nearly full
  • Step 3, Thought: report results

11. Evidence & Auditability

Every memory operation creates an auditable trail:

Event                       | Description
MEMORY_ROLE_SCOPED_READ     | Role-filtered memory query
MEMORY_ROLE_SCOPED_WRITE    | Role-tagged memory write
CENTRAL_MEMORY_READ         | Cross-app central memory access
CENTRAL_MEMORY_WRITE        | Central memory contribution
MEMORY_CHAIN_TRAVERSED      | Chain walk completed
MIND_MAP_SNAPSHOT_BUILT     | Mind map generated
DISTILLATION_EXAMPLE_STORED | LLM decision captured for distillation
DISTILLATION_STEP_COMPLETED | Cerebellum training step finished
REACT_STEP_COMPLETED        | Single ReAct step executed
REACT_SCRATCHPAD_EMITTED    | Full scratchpad snapshot

All events flow through the EvidenceStorePort for immutable storage. Any memory write, revoke, or learning conclusion can be traced back to its origin through evidence_pointer.
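For illustration only: an audit record carrying an evidence_pointer might be built like this, with a content digest added so post-hoc tampering is detectable. Field names beyond evidence_pointer, and the digest itself, are assumptions rather than the documented EvidenceStorePort format:

```python
import hashlib
import json
import time

def make_audit_event(event_type: str, payload: dict, evidence_pointer: str) -> dict:
    """Sketch of an audit record; the sha256 digest over the sorted body
    lets readers verify the record has not been altered after storage."""
    body = {"event_type": event_type, "payload": payload,
            "evidence_pointer": evidence_pointer,
            "ts_ms": int(time.time() * 1000)}
    body["digest"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body
```
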


12. Architecture Summary

The Brain & Memory architecture ties together six subsystems into a unified cognitive layer:

Brain & Memory Architecture Stack

  • ReAct Loop: Thought → Action → Observation, scratchpad context accumulation, Chat ↔ Plan escalation
  • Dual Brain Router: Big Brain (LLM) for complex decisions, Small Brain (Cerebellum) for fast inference, confidence-based fallback
  • Knowledge Distillation: LLM → DistillationExample → Buffer → Training; the Cerebellum absorbs LLM intelligence, reducing API dependency over time
  • Three-Tier Memory: Session (30 min) → Short (1 day) → Long (permanent); GC: expire → compact → promote; MemoryPoint frozen contract
  • Memory Extensions: role isolation via _role_id, central memory via __central__, temporal chains via _chain_parent_id
  • Mind Map & Evidence: force-directed reasoning graph, 10 LiveEvent types for audit, immutable evidence chain

Together, these subsystems give OctopusOS the ability to remember across sessions, learn from every interaction, reason through complex tasks, and audit every cognitive decision — the foundation of a truly intelligent operating system.
