Brain & Memory: The Cognitive Architecture of an AI Operating Kernel
A biological brain pairs fast, automatic processing (loosely analogized here to the cerebellum) with slow, deliberate reasoning (the cortex). OctopusOS mirrors this split with a dual-brain architecture: a Big Brain (LLM) for complex reasoning and a Small Brain (Cerebellum) for fast local inference, unified by a 3-tier memory system, a knowledge distillation pipeline, and a structured ReAct reasoning loop.
1. Why an AI OS Needs a Brain and Memory
Traditional AI agents are stateless: they process a prompt, return a result, and forget. But a true AI Operating System must remember, learn, and reason across time — just like a biological organism.
Without persistent memory and learning loops, an AI agent:
- Repeats the same mistakes across sessions
- Cannot build organizational knowledge
- Has no mechanism to improve routing accuracy over time
- Lacks structured multi-step reasoning
OctopusOS solves this with a dual-brain + triple-memory architecture that provides fast inference, continuous learning, persistent recall, and structured reasoning.
2. The Big Brain: LLM Routing
The Big Brain leverages external LLM providers (OpenAI, local models) for complex routing decisions. When the Cerebellum is uncertain, the LLM takes over.
```python
# Routing via LLM — kernel/runtime/_wl_llm_routing.py
async def _model_route(self, *, run_id, user_text, context_digest, ...):
    prompt = build_routing_prompt(user_text, context_digest)
    response = await self.llm_port.complete(prompt)
    decision = parse_routing_decision(response)
    # Store for distillation
    self._store_distillation_example(
        routing_input={"user_text": user_text},
        llm_decision=decision,
    )
    return decision
```
The LLM classifies every user input into one of four modes: CHAT, QUERY, PLAN, or EXEC — the same labels used by the Cerebellum and the distillation pipeline.
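The source does not show the parser itself; the sketch below assumes the LLM replies with a small JSON object carrying `mode` and `confidence`, and falls back to CHAT on anything unparseable. `RoutingDecision` is a hypothetical container, not the kernel's actual type.

```python
import json
from dataclasses import dataclass

ROUTING_MODES = {"CHAT", "QUERY", "PLAN", "EXEC"}

@dataclass(frozen=True)
class RoutingDecision:
    mode: str
    confidence: float
    reason: str = ""

def parse_routing_decision(response: str) -> RoutingDecision:
    """Parse the LLM's JSON reply into a RoutingDecision, falling back to CHAT."""
    try:
        data = json.loads(response)
        mode = str(data.get("mode", "")).upper()
        if mode not in ROUTING_MODES:
            return RoutingDecision(mode="CHAT", confidence=0.0, reason="unknown mode")
        return RoutingDecision(
            mode=mode,
            confidence=float(data.get("confidence", 0.0)),
            reason=str(data.get("reason", "")),
        )
    except (json.JSONDecodeError, TypeError, ValueError):
        # Anything the parser cannot make sense of degrades to plain chat.
        return RoutingDecision(mode="CHAT", confidence=0.0, reason="unparseable response")
```

Falling back to CHAT keeps a malformed LLM reply from ever blocking the user.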
3. The Small Brain: Cerebellum Local Model
The Cerebellum is a lightweight, local classification model that provides sub-millisecond routing without any external API call. It learns from execution feedback and LLM distillation.
```python
@dataclass(frozen=True)
class CerebellumModel:
    weights: dict[str, dict[str, float]]  # route → feature → weight
    bias: dict[str, float]                # route → bias
    model_version: str
    feature_config: FeatureConfig
```
Key algorithms:
- Bayesian Beta priors: Each route starts with Beta(2, 2) distribution, updated on feedback
- Feature bag construction: Tokenizes input into `tok:<word>` features plus metadata features
- Weighted scoring: `score(route) = bias + Σ(weight[feature] * value)`
- Adam optimizer: Adaptive learning rate with momentum for gradient updates
- Replay buffer: Stores recent examples for curriculum learning
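The scoring and Beta-prior updates in the list above can be sketched in a few lines. `feature_bag`, `score_route`, and `beta_update` are illustrative helpers under those stated semantics, not the kernel's internal API:

```python
def feature_bag(text: str) -> dict[str, float]:
    """Tokenize input into tok:<word> count features."""
    bag: dict[str, float] = {}
    for tok in text.lower().split():
        key = f"tok:{tok}"
        bag[key] = bag.get(key, 0.0) + 1.0
    return bag

def score_route(features: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Weighted scoring: score(route) = bias + Σ(weight[feature] * value)."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())

def beta_update(alpha: float, beta: float, success: bool) -> tuple[float, float]:
    """Update a Beta(alpha, beta) prior with one feedback observation."""
    return (alpha + 1.0, beta) if success else (alpha, beta + 1.0)
```

With the document's Beta(2, 2) starting prior, each route begins at a mean success estimate of 0.5 and sharpens as feedback accumulates.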
4. Knowledge Distillation: LLM → Cerebellum
When the LLM makes a high-confidence routing decision (confidence >= 0.5), it generates a DistillationExample that is stored in a buffer. When the buffer reaches the cadence threshold (every 50 runs), a training step fires.
```python
@dataclass(frozen=True)
class DistillationExample:
    text: str
    context_digest: str
    llm_route: str         # CHAT | QUERY | PLAN | EXEC
    llm_confidence: float  # >= 0.5 to qualify
    features: dict[str, float]
    ts_ms: int = 0
```
The distillation pipeline ensures the Cerebellum gradually absorbs the LLM’s routing intelligence, reducing dependence on expensive API calls over time.
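A minimal sketch of the buffer-and-cadence mechanism, assuming the thresholds stated above (confidence >= 0.5 to qualify, a training step every 50 examples). `DistillationBuffer` is a hypothetical stand-in for the kernel's internal buffer:

```python
from dataclasses import dataclass, field

DISTILL_MIN_CONFIDENCE = 0.5  # qualification threshold from the text
DISTILL_CADENCE = 50          # training step fires every 50 buffered examples

@dataclass
class DistillationBuffer:
    """Collects qualifying LLM decisions; signals when a training step should fire."""
    examples: list = field(default_factory=list)

    def store(self, text: str, llm_route: str, llm_confidence: float) -> bool:
        """Buffer a decision if it qualifies; return True when cadence is reached."""
        if llm_confidence < DISTILL_MIN_CONFIDENCE:
            return False  # low-confidence decisions never enter the buffer
        self.examples.append((text, llm_route, llm_confidence))
        if len(self.examples) >= DISTILL_CADENCE:
            self.examples.clear()  # hand off to the trainer, reset the buffer
            return True
        return False
```

The caller would run one Cerebellum gradient step whenever `store()` returns True.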
5. The MemoryPoint Contract
All memory in OctopusOS is stored as MemoryPoint — a frozen, immutable dataclass that carries content, metadata, and embedding vector.
```python
@dataclass(frozen=True)
class MemoryPoint:
    memory_id: str
    app_id: str
    namespace: str  # "session" | "short" | "long"
    content: str
    embedding: tuple[float, ...]
    payload: dict   # Extensible metadata
    tier: str
    hit_count: int = 0
    created_ms: int = 0
    expires_ms: int = 0
```
The payload dict is the extension mechanism — all new features (role isolation, chains, central memory) use payload fields rather than modifying the frozen contract:
- `_role_id` — Role-level memory isolation
- `_chain_parent_id` — Temporal chain links
- `_contributed_by` — Central memory source tracking
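For illustration, here is how a role-scoped, chained memory might be constructed against the frozen contract; the specific field values are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryPoint:
    memory_id: str
    app_id: str
    namespace: str
    content: str
    embedding: tuple[float, ...]
    payload: dict
    tier: str
    hit_count: int = 0
    created_ms: int = 0
    expires_ms: int = 0

# New capabilities ride in payload, so the frozen contract never changes.
point = MemoryPoint(
    memory_id="m-002",
    app_id="ops",
    namespace="short",
    content="Disk alert resolved by log rotation",
    embedding=(0.1, 0.2, 0.3),
    payload={"_role_id": "sysadmin", "_chain_parent_id": "m-001"},
    tier="short",
)
```

Because the dataclass is frozen, any attempt to mutate a stored point raises an error, which keeps the contract stable across subsystems.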
6. Three-Tier Memory Lifecycle
OctopusOS organizes memory into three tiers with automatic lifecycle management:
| Tier | TTL | Purpose | Promotion |
|---|---|---|---|
| Session | 30 minutes | Active conversation context | Auto-expire |
| Short-term | 1 day | Cross-session recent recall | → Long when hit_count >= 5 |
| Long-term | Permanent | Organizational knowledge | Never expires |
The GC cycle runs periodically: expire stale entries → compact near-duplicates → promote high-hit memories.
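The expire-then-promote steps of that cycle can be sketched as below (compaction of near-duplicates is omitted). Plain dicts stand in for `MemoryPoint`, and the thresholds come from the table above:

```python
PROMOTE_HITS = 5  # short-term → long-term promotion threshold

def gc_pass(points: list[dict], now_ms: int) -> list[dict]:
    """One GC cycle over memory records: expire stale entries, then promote."""
    survivors = []
    for p in points:
        # Expire: drop anything past its TTL (long-term never expires).
        if p["tier"] != "long" and now_ms >= p["expires_ms"]:
            continue
        # Promote: hot short-term memories graduate to permanent storage.
        if p["tier"] == "short" and p["hit_count"] >= PROMOTE_HITS:
            p = {**p, "tier": "long", "expires_ms": 0}
        survivors.append(p)
    return survivors
```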
7. Role-Level Memory Isolation & Central Memory
Role Isolation
Each role (e.g., sysadmin, developer) gets its own memory scope via payload._role_id filtering. This prevents cross-role memory leakage while sharing the same underlying store.
```python
def _memory_query_by_role(self, *, app_id, role_id, namespace, query, limit):
    all_points = self._memory_query(app_id=app_id, namespace=namespace, ...)
    return [p for p in all_points if p.payload.get("_role_id") == role_id]
```
Central Memory
Cross-app organizational knowledge lives in a special __central__ namespace. Any app can read, but writes require explicit authorization and are tagged with _contributed_by.
```python
CENTRAL_APP_ID = "__central__"

def _central_memory_write(self, *, caller_app_id, point):
    enriched_payload = {**point.payload, "_contributed_by": caller_app_id}
    # Write to central store
```
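A sketch of the authorize-and-tag flow, assuming a simple allow-list and an in-memory store; `AUTHORIZED_CONTRIBUTORS` and `CentralMemoryError` are hypothetical names, not part of the kernel:

```python
CENTRAL_APP_ID = "__central__"
AUTHORIZED_CONTRIBUTORS = {"ops", "security"}  # hypothetical allow-list

class CentralMemoryError(PermissionError):
    """Raised when an unauthorized app attempts a central write."""

def central_memory_write(store: dict, caller_app_id: str,
                         memory_id: str, payload: dict) -> dict:
    """Authorize, tag with _contributed_by, and store under the central namespace."""
    if caller_app_id not in AUTHORIZED_CONTRIBUTORS:
        raise CentralMemoryError(f"{caller_app_id} may not write central memory")
    enriched = {**payload, "_contributed_by": caller_app_id}
    store.setdefault(CENTRAL_APP_ID, {})[memory_id] = enriched
    return enriched
```

Reads, by contrast, would skip the allow-list check entirely, matching the any-app-can-read rule above.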
8. Memory Chains: Temporal Links
Memory chains connect related memories through time using _chain_parent_id. This enables causal reasoning — tracing how a sequence of events led to a conclusion.
```python
def build_chains(points: list[MemoryPoint]) -> dict[str, list[MemoryPoint]]:
    """Group memory points into chains by following _chain_parent_id links."""

def traverse_chain(points: list[MemoryPoint], root_id: str) -> list[MemoryPoint]:
    """Walk from root to all descendants in temporal order."""

def chain_depth(points: list[MemoryPoint], memory_id: str) -> int:
    """Count how deep a memory is in its chain."""
```
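One possible implementation of `build_chains` under these semantics; plain dicts stand in for `MemoryPoint`, and points whose parent is missing or unset are treated as chain roots:

```python
from collections import defaultdict

def build_chains(points: list[dict]) -> dict[str, list[dict]]:
    """Group points into chains keyed by their root memory_id."""
    by_id = {p["memory_id"]: p for p in points}
    children: dict[str, list[dict]] = defaultdict(list)
    roots = []
    for p in points:
        parent = p["payload"].get("_chain_parent_id")
        if parent and parent in by_id:
            children[parent].append(p)
        else:
            roots.append(p)  # no resolvable parent → this point starts a chain

    def walk(node: dict) -> list[dict]:
        # Depth-first walk, visiting children in temporal order.
        out = [node]
        for child in sorted(children[node["memory_id"]], key=lambda c: c["created_ms"]):
            out.extend(walk(child))
        return out

    return {r["memory_id"]: walk(r) for r in roots}
```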
9. Mind Map: The Reasoning Graph
The Mind Map transforms raw memory points and reasoning results into a visual graph of interconnected knowledge. Each node represents a memory, chain, or reasoning conclusion; edges represent relationships.
```python
@dataclass(frozen=True)
class MindMapNode:
    node_id: str
    label: str
    node_type: str  # memory | chain | reasoning | preference
    tier: str
    metadata: dict
    x: float = 0.0
    y: float = 0.0

@dataclass(frozen=True)
class MindMapEdge:
    edge_id: str
    source_id: str
    target_id: str
    relation: str  # contributes | triggers | informs | evidence
    weight: float = 1.0
```
The force_layout() function positions nodes using a spring-electric model — connected nodes attract, all nodes repel, creating a readable graph layout.
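A minimal spring-electric layout in that spirit, as a simplified Fruchterman-Reingold iteration rather than OctopusOS's actual `force_layout()`: every pair of nodes repels with force ~k²/d, connected nodes attract with force ~d²/k, and per-iteration movement is capped for stability.

```python
import math
import random

def force_layout(nodes: list[str], edges: list[tuple[str, str]],
                 iterations: int = 200, k: float = 1.0, seed: int = 7):
    """Spring-electric layout over unit-square starting positions."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
        # Attraction along edges.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Cap per-iteration movement so the layout converges instead of exploding.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            move = min(d, 0.1)
            pos[n] = (pos[n][0] + dx / d * move, pos[n][1] + dy / d * move)
    return pos
```

A production version would also add cooling (a shrinking movement cap) and spatial indexing to avoid the O(n²) repulsion pass.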
10. The ReAct Loop: Structured Multi-turn Reasoning
Complex tasks require multiple rounds of reasoning. The ReAct loop provides explicit Thought → Action → Observation structure for each step.
```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ReActStep:
    step_id: str
    thought: str = ""      # What the agent is thinking
    action: str = ""       # SEARCH | CALL_SKILL | CHAT | PLAN
    # Mutable defaults must use default_factory in a dataclass
    action_input: dict = field(default_factory=dict)  # Parameters for the action
    observation: str = ""  # Result of the action
    ts_ms: int = 0

@dataclass(frozen=True)
class ReActScratchpad:
    run_id: str
    steps: list[ReActStep] = field(default_factory=list)
    current_goal: str = ""
    accumulated_context: str = ""
```
Each step emits a REACT_STEP_COMPLETED LiveEvent for full auditability. The scratchpad summary is injected into the next LLM call for context continuity.
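A driver for such a loop might look like the following sketch, where `think` and `act` are caller-supplied stand-ins for the LLM and the skill layer (the real kernel additionally emits a `REACT_STEP_COMPLETED` event per step):

```python
def run_react(goal: str, think, act, max_steps: int = 5) -> list[dict]:
    """Drive Thought → Action → Observation rounds until the agent answers.

    `think(goal, history)` returns (thought, action, action_input);
    `act(action, action_input)` returns an observation string.
    """
    history: list[dict] = []
    for _ in range(max_steps):
        thought, action, action_input = think(goal, history)
        if action == "CHAT":  # terminal action: reply to the user
            history.append({"thought": thought, "action": action,
                            "observation": action_input.get("answer", "")})
            break
        observation = act(action, action_input)
        history.append({"thought": thought, "action": action,
                        "observation": observation})
    return history
```

The `max_steps` bound plays the role of the scratchpad's budget: a run that never reaches a terminal action is cut off rather than looping forever.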
11. Evidence & Auditability
Every memory operation creates an auditable trail:
| Event | Description |
|---|---|
| `MEMORY_ROLE_SCOPED_READ` | Role-filtered memory query |
| `MEMORY_ROLE_SCOPED_WRITE` | Role-tagged memory write |
| `CENTRAL_MEMORY_READ` | Cross-app central memory access |
| `CENTRAL_MEMORY_WRITE` | Central memory contribution |
| `MEMORY_CHAIN_TRAVERSED` | Chain walk completed |
| `MIND_MAP_SNAPSHOT_BUILT` | Mind map generated |
| `DISTILLATION_EXAMPLE_STORED` | LLM decision captured for distillation |
| `DISTILLATION_STEP_COMPLETED` | Cerebellum training step finished |
| `REACT_STEP_COMPLETED` | Single ReAct step executed |
| `REACT_SCRATCHPAD_EMITTED` | Full scratchpad snapshot |
All events flow through the EvidenceStorePort for immutable storage. Any memory write, revoke, or learning conclusion can be traced back to its origin through evidence_pointer.
12. Architecture Summary
The Brain & Memory architecture ties together six subsystems into a unified cognitive layer: the Big Brain (LLM routing), the Cerebellum (local model), the knowledge distillation pipeline, the three-tier memory system with role isolation and chains, the Mind Map reasoning graph, and the ReAct loop, all traced through the evidence store.
Together, these subsystems give OctopusOS the ability to remember across sessions, learn from every interaction, reason through complex tasks, and audit every cognitive decision — the foundation of a truly intelligent operating system.