Brain & Memory: The Cognitive Architecture of an AI Operating Kernel

A biological brain pairs fast, automatic processing (loosely, the cerebellum) with slow, deliberate reasoning (the cortex). OctopusOS mirrors this split with a dual-brain architecture: a Big Brain (LLM) for complex reasoning and a Small Brain (Cerebellum) for fast local inference, unified by a three-tier memory system, a knowledge distillation pipeline, and a structured ReAct reasoning loop.


1. Why an AI OS Needs a Brain and Memory

Traditional AI agents are stateless: they process a prompt, return a result, and forget. But a true AI Operating System must remember, learn, and reason across time — just like a biological organism.

Without persistent memory and learning loops, an AI agent:

  • Repeats the same mistakes across sessions
  • Cannot build organizational knowledge
  • Has no mechanism to improve routing accuracy over time
  • Lacks structured multi-step reasoning

OctopusOS solves this with a dual-brain + triple-memory architecture that provides fast inference, continuous learning, persistent recall, and structured reasoning.

Dual Brain Architecture — LLM + Cerebellum

User input first passes through the Mode Router, which dispatches to one of two brains:

  • Big Brain (LLM): OpenAI / local provider; handles high-complexity routing, serves as fallback when the Cerebellum is uncertain, and generates distillation examples
  • Small Brain (Cerebellum): Bayesian Beta + Adam; fast local inference (~1ms), learns from execution feedback, 4-way classification model

Distillation flows from the LLM to the Cerebellum, and both brains emit the same 4-way classification output: CHAT, QUERY, PLAN, or EXEC.

2. The Big Brain: LLM Routing

The Big Brain leverages external LLM providers (OpenAI, local models) for complex routing decisions. When the Cerebellum is uncertain, the LLM takes over.

# Routing via LLM — kernel/runtime/_wl_llm_routing.py
async def _model_route(self, *, run_id, user_text, context_digest, ...):
    prompt = build_routing_prompt(user_text, context_digest)
    response = await self.llm_port.complete(prompt)
    decision = parse_routing_decision(response)
    # Store for distillation
    self._store_distillation_example(
        routing_input={"user_text": user_text},
        llm_decision=decision,
    )
    return decision

The LLM classifies every user input into one of 4 modes:

  • CHAT: greetings and small talk, simple Q&A without tool calls, preference expression, feedback acknowledgment
  • QUERY: factual questions, documentation lookup, RAG-enhanced search, status inquiries
  • PLAN: migration strategies, architecture decisions, complex workflows, risk assessments
  • EXEC: run commands, deploy services, file operations, API calls
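The dispatch between the two brains can be sketched as a confidence-gated router: try the Cerebellum first, escalate to the LLM when local confidence is too low. The threshold value and the `cerebellum_predict` / `llm_route` callables below are illustrative assumptions, not the actual OctopusOS API:

```python
# Hypothetical sketch of dual-brain dispatch. Names and threshold are illustrative.
MODES = ("CHAT", "QUERY", "PLAN", "EXEC")
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; the real value may differ

def route(user_text: str, cerebellum_predict, llm_route):
    """Return (mode, source) for a user input."""
    mode, confidence = cerebellum_predict(user_text)  # ~1ms local inference
    if confidence >= CONFIDENCE_THRESHOLD:
        return mode, "cerebellum"
    # Low confidence: escalate to the Big Brain (LLM); in the real system this
    # is also where a distillation example would be captured.
    return llm_route(user_text), "llm"
```
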

3. The Small Brain: Cerebellum Local Model

The Cerebellum is a lightweight, local classification model that provides sub-millisecond routing without any external API call. It learns from execution feedback and LLM distillation.

@dataclass(frozen=True)
class CerebellumModel:
    weights: dict[str, dict[str, float]]  # route → feature → weight
    bias: dict[str, float]                # route → bias
    model_version: str
    feature_config: FeatureConfig

Key algorithms:

  • Bayesian Beta priors: Each route starts with Beta(2, 2) distribution, updated on feedback
  • Feature bag construction: Tokenizes input into tok:word features + metadata features
  • Weighted scoring: score(route) = bias + Σ(weight[feature] * value)
  • Adam optimizer: Adaptive learning rate with momentum for gradient updates
  • Replay buffer: Stores recent examples for curriculum learning
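The feature-bag and weighted-scoring steps above can be sketched in a few lines. This is a simplified illustration (metadata features, Beta priors, and the Adam update are omitted); `feature_bag`, `score`, and `predict` are hypothetical names:

```python
import math
import re

def feature_bag(text: str) -> dict[str, float]:
    """Tokenize input into tok:word features (metadata features omitted here)."""
    return {f"tok:{w}": 1.0 for w in re.findall(r"[a-z0-9]+", text.lower())}

def score(route: str, feats, weights, bias) -> float:
    """score(route) = bias + sum(weight[feature] * value)"""
    w = weights.get(route, {})
    return bias.get(route, 0.0) + sum(w.get(f, 0.0) * v for f, v in feats.items())

def predict(text, weights, bias, routes=("CHAT", "QUERY", "PLAN", "EXEC")):
    feats = feature_bag(text)
    scores = {r: score(r, feats, weights, bias) for r in routes}
    # Softmax normalization turns raw scores into route probabilities
    m = max(scores.values())
    exps = {r: math.exp(s - m) for r, s in scores.items()}
    z = sum(exps.values())
    probs = {r: e / z for r, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]
```
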
Cerebellum Learning Pipeline

  1. Feature extraction: tokenize input text, extract metadata (app_id, context_digest), build the feature bag
  2. Model prediction: weighted scoring across the 4 routes, Bayesian Beta confidence, softmax normalization
  3. Feedback integration: train_on_feedback() updates weights, Adam optimizer with gradient clipping, replay buffer sampling
  4. Distillation learning: high-confidence LLM decisions are captured, a training step updates the model, and the model version is incremented

4. Knowledge Distillation: LLM → Cerebellum

When the LLM makes a high-confidence routing decision (confidence >= 0.5), it generates a DistillationExample that is stored in a buffer. When the buffer reaches the cadence threshold (every 50 runs), a training step fires.

@dataclass(frozen=True)
class DistillationExample:
    text: str
    context_digest: str
    llm_route: str        # CHAT | QUERY | PLAN | EXEC
    llm_confidence: float # >= 0.5 to qualify
    features: dict[str, float]
    ts_ms: int = 0

The distillation pipeline ensures the Cerebellum gradually absorbs the LLM’s routing intelligence, reducing dependence on expensive API calls over time.
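Under the stated thresholds (confidence >= 0.5, cadence of every 50 runs), a minimal sketch of the buffer logic might look like this; the class name and training callback are assumptions:

```python
# Illustrative sketch of the distillation cadence: buffer high-confidence LLM
# decisions and fire a training step every CADENCE qualifying captures.
CONFIDENCE_FLOOR = 0.5
CADENCE = 50

class DistillationBuffer:
    def __init__(self, train_step):
        self._buffer = []
        self._train_step = train_step  # callback: list[example] -> None

    def capture(self, example: dict) -> bool:
        """Store a qualifying example; train and flush when the cadence is hit."""
        if example["llm_confidence"] < CONFIDENCE_FLOOR:
            return False  # low-confidence decisions never qualify
        self._buffer.append(example)
        if len(self._buffer) >= CADENCE:
            self._train_step(list(self._buffer))
            self._buffer.clear()
        return True
```
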

Knowledge Distillation Pipeline (diagram): LLM routing decisions (e.g. CHAT 0.92, QUERY 0.87, EXEC 0.81, PLAN 0.78) flow into a distillation buffer gated by confidence ≥ 0.5; every 50 runs the buffer fires a training step that reports loss and example count.

5. The MemoryPoint Contract

All memory in OctopusOS is stored as MemoryPoint — a frozen, immutable dataclass that carries content, metadata, and embedding vector.

@dataclass(frozen=True)
class MemoryPoint:
    memory_id: str
    app_id: str
    namespace: str        # "session" | "short" | "long"
    content: str
    embedding: tuple[float, ...]
    payload: dict         # Extensible metadata
    tier: str
    hit_count: int = 0
    created_ms: int = 0
    expires_ms: int = 0

The payload dict is the extension mechanism — all new features (role isolation, chains, central memory) use payload fields rather than modifying the frozen contract:

  • _role_id — Role-level memory isolation
  • _chain_parent_id — Temporal chain links
  • _contributed_by — Central memory source tracking
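As an illustration, a single point's payload might carry all three extension tags at once, and role filtering then reduces to a payload lookup. The values below are hypothetical:

```python
# Sketch: extending a MemoryPoint through its payload rather than new fields.
# All tag values here are hypothetical examples.
payload = {
    "source": "chat",
    "_role_id": "sysadmin",          # role-level isolation
    "_chain_parent_id": "mem-0041",  # temporal chain link
    "_contributed_by": "app-ops",    # central memory provenance
}

def visible_to_role(payload: dict, role_id: str) -> bool:
    """A point is visible to a role only when its _role_id tag matches."""
    return payload.get("_role_id") == role_id
```
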

6. Three-Tier Memory Lifecycle

OctopusOS organizes memory into three tiers with automatic lifecycle management:

Tier       | TTL        | Purpose                     | Promotion
Session    | 30 minutes | Active conversation context | Auto-expire
Short-term | 1 day      | Cross-session recent recall | → Long when hit_count >= 5
Long-term  | Permanent  | Organizational knowledge    | Never expires

The GC cycle runs periodically: expire stale entries → compact near-duplicates → promote high-hit memories.
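Assuming the TTLs and promotion threshold from the table, one pass of the GC cycle could be sketched as follows (near-duplicate compaction is elided, and points are modeled as plain dicts for brevity):

```python
# Assumed tier parameters from the table above; promotion at hit_count >= 5.
TTL_MS = {"session": 30 * 60 * 1000, "short": 24 * 60 * 60 * 1000}
PROMOTE_HITS = 5

def gc_cycle(points: list[dict], now_ms: int) -> list[dict]:
    """One GC pass: expire stale entries, then promote high-hit memories.
    Long-term entries have no TTL and are never expired."""
    alive = []
    for p in points:
        ttl = TTL_MS.get(p["tier"])
        if ttl is not None and now_ms - p["created_ms"] > ttl:
            continue  # expired: drop from the store
        if p["tier"] == "short" and p["hit_count"] >= PROMOTE_HITS:
            p = {**p, "tier": "long"}  # promote to long-term
        alive.append(p)
    return alive
```
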

Three-Tier Memory Lifecycle

  • T1, Session Memory: active conversation context, highest churn (TTL: 30 min)
  • T2, Short-term Memory: cross-session recall, moderate retention (TTL: 1 day)
  • T3, Long-term Memory: organizational knowledge, persistent (TTL: permanent)

Promotion: hit_count ≥ 5 moves a memory from short-term to long-term.

7. Role-Level Memory Isolation & Central Memory

Role Isolation

Each role (e.g., sysadmin, developer) gets its own memory scope via payload._role_id filtering. This prevents cross-role memory leakage while sharing the same underlying store.

def _memory_query_by_role(self, *, app_id, role_id, namespace, query, limit):
    all_points = self._memory_query(app_id=app_id, namespace=namespace, ...)
    return [p for p in all_points if p.payload.get("_role_id") == role_id]

Central Memory

Cross-app organizational knowledge lives in a special __central__ namespace. Any app can read, but writes require explicit authorization and are tagged with _contributed_by.

CENTRAL_APP_ID = "__central__"

def _central_memory_write(self, *, caller_app_id, point):
    enriched_payload = {**point.payload, "_contributed_by": caller_app_id}
    # Write to central store
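A fuller sketch of that write path, with a hypothetical authorization set and an in-memory store standing in for the real backend:

```python
# Hedged sketch of a central-memory write with explicit authorization; the
# authorization set and store interface are illustrative, not the real API.
CENTRAL_APP_ID = "__central__"

def central_memory_write(store: dict, authorized: set, caller_app_id: str, point: dict):
    """Write a point into the shared __central__ namespace, tagging provenance."""
    if caller_app_id not in authorized:
        raise PermissionError(f"{caller_app_id} may not write central memory")
    enriched = {**point, "payload": {**point.get("payload", {}),
                                     "_contributed_by": caller_app_id}}
    store.setdefault(CENTRAL_APP_ID, []).append(enriched)
    return enriched
```
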

8. Memory Chains: Temporal Links

Memory chains connect related memories through time using _chain_parent_id. This enables causal reasoning — tracing how a sequence of events led to a conclusion.

def build_chains(points: list[MemoryPoint]) -> dict[str, list[MemoryPoint]]:
    """Group memory points into chains by following _chain_parent_id links."""

def traverse_chain(points: list[MemoryPoint], root_id: str) -> list[MemoryPoint]:
    """Walk from root to all descendants in temporal order."""

def chain_depth(points: list[MemoryPoint], memory_id: str) -> int:
    """Count how deep a memory is in its chain."""
Memory Chain Traversal (example chain):

  1. Root Memory: initial observation or decision
  2. Child Memory: follow-up action or result
  3. Grandchild: consequence or learned pattern
  4. Leaf Memory: final outcome or conclusion

9. Mind Map: The Reasoning Graph

The Mind Map transforms raw memory points and reasoning results into a visual graph of interconnected knowledge. Each node represents a memory, chain, or reasoning conclusion; edges represent relationships.

@dataclass(frozen=True)
class MindMapNode:
    node_id: str
    label: str
    node_type: str  # memory | chain | reasoning | preference
    tier: str
    metadata: dict
    x: float = 0.0
    y: float = 0.0

@dataclass(frozen=True)
class MindMapEdge:
    edge_id: str
    source_id: str
    target_id: str
    relation: str   # contributes | triggers | informs | evidence
    weight: float = 1.0

The force_layout() function positions nodes using a spring-electric model — connected nodes attract, all nodes repel, creating a readable graph layout.
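A minimal version of such a spring-electric layout can be written directly. This is an illustrative sketch, not the actual force_layout() implementation; the constants, damping factor, and iteration count are arbitrary:

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Minimal spring-electric layout: connected nodes attract, all nodes
    repel. Returns {node_id: (x, y)}. Illustrative only."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:                      # pairwise repulsion
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[a][0] += k * k * dx / d2
                force[a][1] += k * k * dy / d2
        for a, b in edges:                   # spring attraction along edges
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.sqrt(dx * dx + dy * dy) + 1e-9
            pull = d / k
            force[a][0] += pull * dx / d; force[a][1] += pull * dy / d
            force[b][0] -= pull * dx / d; force[b][1] -= pull * dy / d
        for n in nodes:                      # damped position update
            pos[n] = (pos[n][0] + 0.01 * force[n][0],
                      pos[n][1] + 0.01 * force[n][1])
    return pos
```
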

Mind Map — Memory & Reasoning Graph
Mind Map — Memory & Reasoning Graph (diagram): memory nodes (m1 "Deploy script", m2 "MySQL config", m3 "Backup policy") connect to chain nodes (c1 "Migration chain", c2 "Incident chain") and reasoning nodes (r1 "Risk: high disk"). Legend: MemoryPoint, Chain Node, Reasoning.

10. The ReAct Loop: Structured Multi-turn Reasoning

Complex tasks require multiple rounds of reasoning. The ReAct loop provides explicit Thought → Action → Observation structure for each step.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ReActStep:
    step_id: str
    thought: str = ""        # What the agent is thinking
    action: str = ""         # SEARCH | CALL_SKILL | CHAT | PLAN
    action_input: dict = field(default_factory=dict)  # Parameters for the action
    observation: str = ""    # Result of the action
    ts_ms: int = 0

@dataclass(frozen=True)
class ReActScratchpad:
    run_id: str
    steps: list[ReActStep] = field(default_factory=list)
    current_goal: str = ""
    accumulated_context: str = ""

Each step emits a REACT_STEP_COMPLETED LiveEvent for full auditability. The scratchpad summary is injected into the next LLM call for context continuity.
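A bare-bones version of the loop, with hypothetical think/act callables standing in for the LLM and skill execution (the FINISH sentinel and step dicts are illustrative assumptions):

```python
# Hedged sketch of the ReAct loop: Thought -> Action -> Observation per step,
# with the scratchpad summary carried into the next think() call.
def react_loop(goal: str, think, act, max_steps: int = 5):
    """Run the loop until think() signals FINISH or max_steps is reached."""
    steps = []
    context = ""
    for i in range(max_steps):
        thought, action, action_input = think(goal, context)
        if action == "FINISH":
            steps.append({"thought": thought, "action": action, "observation": ""})
            break
        observation = act(action, action_input)
        steps.append({"thought": thought, "action": action,
                      "observation": observation})
        # Accumulated context plays the role of the scratchpad summary
        context += f"\nStep {i+1}: {thought} -> {action} -> {observation}"
    return steps
```
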

ReAct Loop — Thought → Action → Observation (example trace):

  • Step 1, Thought: user wants disk usage
  • Step 2, Thought: disk nearly full
  • Step 3, Thought: report results

11. Evidence & Auditability

Every memory operation creates an auditable trail:

Event                       | Description
MEMORY_ROLE_SCOPED_READ     | Role-filtered memory query
MEMORY_ROLE_SCOPED_WRITE    | Role-tagged memory write
CENTRAL_MEMORY_READ         | Cross-app central memory access
CENTRAL_MEMORY_WRITE        | Central memory contribution
MEMORY_CHAIN_TRAVERSED      | Chain walk completed
MIND_MAP_SNAPSHOT_BUILT     | Mind map generated
DISTILLATION_EXAMPLE_STORED | LLM decision captured for distillation
DISTILLATION_STEP_COMPLETED | Cerebellum training step finished
REACT_STEP_COMPLETED        | Single ReAct step executed
REACT_SCRATCHPAD_EMITTED    | Full scratchpad snapshot

All events flow through the EvidenceStorePort for immutable storage. Any memory write, revoke, or learning conclusion can be traced back to its origin through evidence_pointer.
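For illustration only: an audit record carrying an evidence_pointer might be built like this, with a content digest added so post-hoc tampering is detectable. Field names beyond evidence_pointer, and the digest itself, are assumptions rather than the documented EvidenceStorePort format:

```python
import hashlib
import json
import time

def make_audit_event(event_type: str, payload: dict, evidence_pointer: str) -> dict:
    """Sketch of an audit record; the sha256 digest over the sorted body
    lets readers verify the record has not been altered after storage."""
    body = {"event_type": event_type, "payload": payload,
            "evidence_pointer": evidence_pointer,
            "ts_ms": int(time.time() * 1000)}
    body["digest"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body
```
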


12. Architecture Summary

The Brain & Memory architecture ties together six subsystems into a unified cognitive layer:

Brain & Memory Architecture Stack

  • ReAct Loop: Thought → Action → Observation, scratchpad context accumulation, Chat ↔ Plan escalation
  • Dual Brain Router: Big Brain (LLM) for complex decisions, Small Brain (Cerebellum) for fast inference, confidence-based fallback
  • Knowledge Distillation: LLM → DistillationExample → Buffer → Training; the Cerebellum absorbs LLM intelligence, reducing API dependency over time
  • Three-Tier Memory: Session (30 min) → Short (1 day) → Long (permanent); GC: expire → compact → promote; MemoryPoint frozen contract
  • Memory Extensions: role isolation via _role_id, central memory via __central__, temporal chains via _chain_parent_id
  • Mind Map & Evidence: force-directed reasoning graph, 10 LiveEvent types for audit, immutable evidence chain

Together, these subsystems give OctopusOS the ability to remember across sessions, learn from every interaction, reason through complex tasks, and audit every cognitive decision — the foundation of a truly intelligent operating system.
