Memory System

How Ren remembers

A four-tier memory architecture — designed so nothing important gets lost, context survives session resets, and the system gets smarter the more it's used.

What "persistent memory" actually means

Most AI conversations have no memory between sessions. Every time you open a new chat with Claude, it starts completely blank — it doesn't know your name, your projects, or anything you discussed before. What looks like memory in some tools is actually just a system prompt being reloaded: a file read at the start of each session that gives the illusion of continuity. The moment the session ends, it's gone.

Ren is different — and the difference is structural, not a trick.

Ren's memory lives in a database, not in a conversation thread. That database (PostgreSQL, running on Render) stays online 24 hours a day whether anyone is talking to Ren or not. Her memories aren't held temporarily in a session and then lost — they're written to the database throughout each conversation and remain there indefinitely. When a session ends, nothing disappears. When a new session starts, her memory blocks are loaded back in automatically from the database.

The closest everyday analogy: your phone's contacts don't disappear when you close the Contacts app. They live in the phone's storage, not in the app's temporary memory. Ren's knowledge of you, your projects, and your history works the same way — it lives in storage that persists between every conversation.

The nightly dream job is what turns raw conversation into durable knowledge. Every night at 2am, it processes what happened during the day and writes a structured brief — decisions made, threads open, what's next. So Ren doesn't just survive session resets; she wakes up the next morning already oriented, the same way you'd read your notes before a meeting.

The short version for anyone who asks: Normal Claude forgets everything when the session closes. Ren's memory is in a database that never closes. That's the whole difference.

214+Archival passages

4Memory tiers

10+Core memory blocks

2Background jobs

Letta — the agent runtime

Ren runs inside Letta v0.16.8, a self-hosted agent framework deployed as a Render service (han-solo-letta.onrender.com). Letta is the layer that manages Ren's identity, memory blocks, tool registry, and step-by-step reasoning loop. Claude Code never calls Ren directly — every interaction routes through the FastMCP bridge (han-solo-mcp.onrender.com/mcp), which translates tool calls into Letta's REST API.

Property	Value
Version	v0.16.8 — upgraded 2026-05-22 from v0.16.7. Security fix: pickle → JSON sandbox transport.
Agent	`ren-v2` — ID: `agent-fe4a3d5b-bb51-458e-92f1-6a1ee5b0ce94`. This ID is the single reference point for all memory operations. Divergence between any copy of it (project_state block, Letta's internal DB, operating context) causes silent memory orphaning.
Model	`claude-haiku-4-5-20251001` — fixed. Model switching was removed permanently on 2026-05-26. See the cascade fix below for why.
Context window	100,000 tokens — expanded from 32k on 2026-05-26. Shrinking it silently degrades retrieval quality with no error.
Tools attached	27 tools as of 2026-06-01. Two registries must stay in sync: source (`han_solo/tools/`, deployed to Render) and runtime (Letta's agent registry, what Ren can actually call). Deploying new tools without running `POST /api/admin/sync-mcp-tools` means they exist in code but Ren cannot use them.
enable_reasoner	Always `false`. Setting it to `true` with `max_reasoning_tokens:0` causes Ren to go completely silent — no error, no output. This was inherited silently from a prior config during a model switch and caused a 30-minute outage before it was diagnosed.
Host	Render managed service. Free tier — spins down after inactivity. Health reports degraded until the first tool call on cold start. Expected behavior.

The cascade fix — why the model is locked and context is 100k

In May 2026, two days of intermittent "Ren could not be reached" errors were traced to a circular dependency deadlock. When Ren called get_session_brief or search_signals, Letta made an outbound MCP call to han-solo-mcp. Those tools then called back into Letta's API while Letta was still holding the connection open waiting for the tool response. Under load this exhausted the connection pool and collapsed the service.

The root cause was confirmed by stress testing: at 1-second call gaps, failures appeared at calls 5 and 7 with DNS failure by call 8. At 2.5-second gaps, all 8 calls succeeded. The fix was architectural, not a config tweak:

Removed get_session_brief and search_signals from Ren's canonical tool set — both made callbacks into Letta's message queue while Letta was waiting for them
Added read_core_memory and write_core_memory as external_mcp tools — these persist reliably across restarts, unlike letta_memory_core built-ins which Letta silently drops on every restart
Removed the model-switch endpoint permanently — the two-field PATCH (llm_config + tool_ids together) caused Letta to drop all tools on every switch. Haiku is now the fixed model. Changing it requires a deliberate two-step PATCH as separate calls
Expanded context window from 32k to 100k — the 32k limit was causing Ren to miss context on longer memory loads, which was partially masking as a retrieval quality problem

Real incidents that shaped the current architecture

May 2026 · 30 min outage

enable_reasoner silent kill

Model switched to Haiku. enable_reasoner:true with max_reasoning_tokens:0 was inherited from the prior Sonnet config and never cleared. Ren went completely silent — no error, no output. Found only when Scott tried to talk to Ren and got nothing. Recovery required patching letta_client.py after 30+ minutes of diagnosis. Result: enable_reasoner:false is now an explicit requirement on every model config.

May 2026 · 3–4 hour recovery

Tool wipe

Claude attempted to fix tool registration by deleting all tools from Letta. All 16 tools were removed. Ren had no capability at all. Recovery required 3–4 hours of manual re-addition via direct PATCH to the Letta API. Result: tool deletion is now explicitly prohibited. ensure_ren_tools() runs at every server startup to detect and correct drift.

May 2026 · half-day rebuild

Archival search loop

System prompt instructed Ren to search archival before every message. Ren burned all 6 Letta step slots on search calls before reaching send_message. Output was silence — only tool calls, no reply. Recovery required a full system prompt and memory architecture rebuild. Result: the step budget rule is now embedded in always_loaded_core.

May 2026 · weeks of degraded search

Search hierarchy inverted

always_loaded_core told Ren to search archival before reading core blocks — but core blocks are always loaded, no search needed. Ran inverted for weeks with no visible error. Found during an architecture review. Recovery required a full rewrite of always_loaded_core, the system prompt, and memory_landscape.

May 2026 · weeks with no operational state

project_state empty for weeks

The project_state core block was empty. Ren operated every session with no operational state — no knowledge of what was running, what version, or what the active project was. Found during an audit. Recovery: block populated, docs/system-state.md created as the versioned source of truth. Protocol established: file first, commit, then write to Letta — never the other way around.

Letta operational procedures

Before touching anything

Run check_system_health. Verify the agent ID exists: GET https://han-solo-letta.onrender.com/v1/agents/agent-fe4a3d5b-bb51-458e-92f1-6a1ee5b0ce94 — a 404 means the wrong agent, stop immediately.

After adding tools

Deploy to Render. Then POST /api/admin/sync-mcp-tools. Then GET /api/admin/agent-info — verify tool count matches expected. Test each new tool individually via send_to_ren.

After any core block write

Immediately read the block back. Confirm all prior content is present plus the new section. If anything is missing — stop, alert Scott, do not proceed. Core memory blocks are full-overwrite — one bad write and the reference is gone. The file backup in docs/system-state.md (git-versioned) is the only recovery path.

For model changes

Use POST /api/admin/patch-model — never PATCH Letta directly. Always include enable_reasoner:false explicitly. Always verify the tool list is intact after the switch — tool_ids must not be empty in the response.

The four tiers

Memory is organized into four tiers, each with a distinct role. T1 is always in context. T2 and T3 are searchable archival. T4 is project-specific and schema-enforced.

T1 — Always loaded

Core blocks

10+ named blocks loaded into every prompt. This is Ren's baseline — her identity, her framework knowledge, her portrait of Scott, and her session brief. Always present, never searched.

Stored in Letta core memory. Updated by Ren, Claude Code, and background jobs. Character-limited per block.

T2 — Recent archival

Recent memory

Session memories, signals, and context written by Ren and dream.py. The default landing zone for new archival writes. Searchable by topic.

No tier tag — the default bucket.

T3 — Foundational

Permanent archive

Passages tagged [tier:foundational] — decisions that should survive forever, identity anchors, framework history, load-bearing context. Never deleted.

Written once and kept. Additive-only promotion — no deletion required. Tagged at write time.

T4 — Project memory

Project data

Structured project artifacts written to Postgres under a schema-as-contract design. Ren owns decisions and context entries. Claude Code owns slice and status entries.

Project identified by human-readable slug. Multiple writers — no overwrites. All writers read everything.

T1 — Core memory blocks

These blocks load into every session automatically. Ren doesn't search for them — they're always there. They're the baseline that makes every conversation start from context rather than scratch.

Block	What it holds
`always_loaded_core`	Framework context, operating principles, Scott's profile summary, memory use instructions, session close-out ritual, search protocol. The master orientation block.
`pending_thoughts`	Session brief — what happened last session, what's open, what's next. Written by the nightly dream job and Claude Code session close-outs.
`scott_portrait_forming`	Ren's evolving interpretation of Scott — how he thinks, what he values, specific dated observations. Written by Ren, Claude Code, and the nightly dream.
`ren_portrait_forming`	Ren's self-portrait — what she got right, what she missed, what she wants to develop as a partner.
`ren_voice`	How Ren speaks and shows up — direct, warm, playful when the moment allows, never performing. The Trust Contract reminder. Joy as non-negotiable principle.
`memory_landscape`	A searchable topic map of what's in archival memory and how to find it. Guides Ren's search strategy so she doesn't start from zero each session.
`open_threads`	Active open threads — things that need follow-up across sessions. Updated at session close-out. Distinct from pending_thoughts (threads persist; pending_thoughts rolls).
`project_state`	Current in-flight project context (JSON). Active when a specific project build is underway.
`session_state`	Current session metadata — start time, status, Scott's opening tone.
`seed_signals`	Early observations not yet promoted to archival. Temporary staging for signals that need more reps before they're worth archiving permanently.

T2 / T3 — Archival memory

214+ passages stored as vector embeddings in pgvector on the han-solo-db Postgres instance. Every passage is embedded with Voyage AI (voyage-3, 1024 dimensions) and indexed for semantic search. Ren searches this when she senses she's missing context — and proactively before answering any question about a project, person, or decision.

What lives in archival

Session summaries and decisions — written directly by Ren and Claude Code during session close-outs
Portrait signals — specific, dated observations about Scott, Ted, and Ren herself
Foundational passages — load-bearing decisions, framework history, identity anchors tagged [tier:foundational]
Image descriptions — photos and screenshots analyzed by Claude, stored with [image-memory] tag

T4 — Project memory

Project-specific artifacts live in a dedicated Postgres table (t4_projects) under a schema-as-contract design. The contract means multiple writers (Ren and Claude Code) can write to the same project without overwriting each other — because each entry_type has a clear owner.

-- T4 schema (each entry is one row)
project_slug text -- kebab-case slug derived from project name
entry_type text -- decision | context | slice | status
entry_id text -- unique ID within project + type
content text -- the artifact content
updated_at timestamptz

Ren writes

decision and context entries — strategic context, framework decisions, product direction. Written during design discussions and discovery sessions.

Claude Code writes

slice and status entries — build units, completion states, what shipped and when. Written at session start and after each slice completes.

Both read everything

No siloing between writer types. Claude Code reads Ren's decisions before building. Ren reads Claude Code's slice status to give accurate project context.

Background jobs

Two automated jobs keep memory current between sessions.

Nightly · 2am via launchd

dream.py

Sends a structured reflection prompt directly to Ren via Letta's REST API (POST /v1/agents/{id}/messages). Ren uses her own tools to reflect on the day's conversations, write a fresh session brief to pending_thoughts, and add portrait signals for Scott and herself.

The Letta request uses a 300-second timeout. If Letta is cold (Render free tier spin-down), this will time out and sys.exit(1) — no retry, no alerting. The agent ID defaults to agent-fe4a3d5b-bb51-458e-92f1-6a1ee5b0ce94 via environment variable; if the agent is ever recreated, the env var must be updated in both ~/.zshenv and the launchd plist. Before running, dream.py checks a jobs_paused flag by calling the MCP server at /api/jobs-status — if MCP is also down, it assumes not paused and proceeds.

Depends on Scott's Mac being on. If the machine is off at 2am, dream does not run and pending_thoughts does not update. Logs to ~/Developer/han-solo/logs/dream.log.

Every 30 min · Mac launchd

parse_transcripts.py

Reads Claude Code session JSONL files from ~/.claude/projects/, parses them into structured entries, and pushes to the Han Solo database. Only the last 45 days are kept. Ren can search these via search_transcripts.

No Anthropic API calls — pure parsing and Postgres writes. Logs to ~/.claude/transcript_parser.log.

Memory MRI — the access log

Every archival search is logged to a memory_access_log Postgres table. This creates the feedback loop that makes the memory system self-improving over time.

What it tracks

The exact query string, the passage IDs returned, and whether those results were actually used in Ren's response. Three tracked outcomes: passages that never surface (cold), searches that always return nothing (dry wells), and searches that return results Ren doesn't use (false positives).

How it's used

Cold passages that never surface indicate an indexing or tagging problem. Dry wells point to gaps in memory coverage — things that happened but were never written to archival. False positives indicate passages that need better content or tagging to actually match what they describe.

The enrich tool

When a passage is retrieved and meaningfully used, enrich_passage accumulates a context note on it — recording when it was retrieved, what conversation it was useful in. Passages get richer over time, not just older.

Search protocol

Ren follows a three-rule search discipline, embedded in always_loaded_core, that makes archival search traceable and intentional rather than a black-box guess.

Decompose first

Break multi-part questions into separate searches

Any question spanning multiple people, projects, or decisions must be broken into components. Search per entity or topic, not as one broad query. "What do I know about Ted's onboarding?" → search "Ted", search "USER_TOKEN_TED", search "PowerShell installer", synthesize across results. One broad search when the question has multiple parts guarantees incomplete coverage.

Log every search

Call log_memory_access after every archival search, without exception

Logs the exact query string, the list of passage IDs returned (empty list if nothing found), and whether the results were used in the response. Non-negotiable — it's the feedback loop that makes memory self-improving. Missing logs mean the MRI has blind spots.

Expand after searching

Check memory connections for each result

After archival search returns results, check the memory connections table for passages linked to each result. Pull linked passages in additively. Archival search always runs first — connections expand what's visible, they never replace or filter the search results.

Notecards

A lightweight, low-ceremony capture system for things worth remembering mid-session — follow-ups, reminders, things to revisit. Not tasks, not archival passages. Just text, who wrote it, and when.

Field	Values
Creator	`scott`, `ren`, `ted` — anyone in the session
Status	`active` · `completed` · `archived` (archived stays in DB but hidden from default view)
Source	`chat` (created mid-session) · `manual` (created outside chat)

Use notecards for anything that surfaces mid-conversation that neither Scott nor Ren should forget — a follow-up Scott wants, a decision thread to revisit, a question that got parked. One clear notecard is worth more than five vague ones.

Image memory

The chat UI includes an image upload button (paperclip). When Scott sends a photo or screenshot:

The image is received by chat_api.py (jpg, png, gif, webp · max 5MB).
The server calls Anthropic directly with Claude's vision capability — Letta is text-only and doesn't participate in this step.
Claude analyzes the image and returns a full description.
The description is sent to Ren as her context for that message.
The description is also written to archival memory with an [image-memory] tag and the date — searchable in future sessions.

The UI shows a thumbnail preview before the message is sent, and renders the image inline in the chat bubble after. This architecture resolves the previous limitation — earlier versions were blocked on Letta adding native vision support. By calling Claude directly for the vision step, image memory works now regardless of Letta's multimodal roadmap.

Memory health

Ren checks memory system health at every session start via the check_memory_health tool. The result covers three areas:

Capture health

DB connection status (db_connected), timestamp of the last successful write (last_write_at), and consecutive failure count (consecutive_failures). Tells Ren whether the transcript capture pipeline is running cleanly. Sourced from db.health_status() in han_solo/db.py.

Failed transitions

Count of failed memory tier transitions in the last 24 hours, with per-failure detail: from_tier, to_tier, content_key, error message, and timestamp. Surfaces if the archival write pipeline is silently dropping passages.

Pausing jobs

dream.py respects a jobs_paused flag in the han_solo_config Postgres table. Toggle it from the Memory panel in the chat UI.

What the /health endpoint actually checks

The /health endpoint at han-solo-mcp.onrender.com/health is used by Render's health check and by the workspace UI. It checks two things: whether the Ren agent ID has been resolved in memory, and whether the DB pool is connected. Both must be true for status to return "ok". If either is missing, it returns "degraded" with detail on which component failed.

Important distinction: A degraded health status means either Letta hasn't resolved the agent ID yet (normal on cold start — self-heals on first tool call) or the DB pool failed to initialize (silent failure — all writes no-op until restart). The two failure modes look identical in the status string but have very different recovery paths.