Architecture

How it works

Five pieces working together — each with a distinct role. Understanding the stack tells you what breaks, why it breaks, and what it costs.

The stack

| Service | What it does | Cost |
| --- | --- | --- |
| han-solo-db | PostgreSQL 16 + pgvector on Render. Stores everything — Letta agents, memory blocks, archival passages, conversation history. The source of truth for all of Ren's memory. | $7/mo |
| han-solo-letta | Letta v0.16.7 on Render. The AI memory runtime — manages Ren as a MemGPT agent, handles tool calls, conversation storage, and memory compression. Ren's brain lives here. | $7/mo |
| han-solo-mcp | FastMCP server on Render. The bridge between Claude Code and Letta — 15 tools for reading/writing Ren's memory, plus the chat UI and REST API. Deployed from GitHub on every push. | $7/mo |
| Anthropic API | Claude Haiku 4.5 — the model powering Ren's responses. Every message you send hits this API. Pay-per-token, billed separately from the Claude Pro subscription. | ~$0.003–0.01/message |
| Voyage AI | Embedding model (voyage-3, 1024-dim) for archival memory search. When Ren searches her memory, Voyage converts the query to a vector and finds semantically similar passages. | Minimal |
Total fixed cost: ~$21/month for Render (3 services). Anthropic API is variable — $50 in credits lasts months at normal usage. The Claude Pro/Max subscription is separate and covers claude.ai and Claude Code only.

What happens when you send a message

Every message in the chat UI goes through this chain:

  1. Chat UI (browser) — you type a message and hit Send.
  2. han-solo-mcp — receives the message via POST /api/send, authenticates your bearer token, forwards to Letta.
  3. Letta — receives the message, loads Ren's core memory blocks into context (always_loaded_core, pending_thoughts, portraits), and sends everything to Anthropic.
  4. Anthropic API — Claude Haiku generates Ren's response, potentially calling tools (archival search, memory write, web fetch).
  5. Letta — stores the exchange in PostgreSQL, returns the response.
  6. han-solo-mcp — passes the response back to the UI.
  7. Chat UI — renders Ren's reply in gold.
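The relay in steps 2 and 3 can be sketched with two small helpers. This is a minimal illustration, not the real FastMCP or Letta API — the function names and payload shape are assumptions:

```python
import json

def authorized(headers: dict, expected_token: str) -> bool:
    # Step 2: bearer-token check before anything is forwarded to Letta.
    return headers.get("Authorization", "") == f"Bearer {expected_token}"

def build_letta_payload(text: str) -> dict:
    # Step 3: envelope forwarded to Letta; the runtime then prepends the
    # core memory blocks before the prompt reaches Anthropic.
    return {"messages": [{"role": "user", "content": text}]}

request_headers = {"Authorization": "Bearer s3cret"}
if authorized(request_headers, "s3cret"):
    payload = build_letta_payload("hi Ren")
    print(json.dumps(payload))
```

The point of the shape: authentication happens at the MCP edge, and the MCP server never touches memory itself — it only forwards a message envelope.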

Tool calls (when Ren searches memory, writes to a block, or fetches a page) add extra round-trips to Anthropic. Each tool call is a separate LLM inference — this is why sessions with heavy archival searching or page fetching cost more.
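Why tool-heavy sessions cost more can be made concrete with a rough cost model. The per-token prices below are placeholders, not Anthropic's actual Haiku rates — substitute current pricing:

```python
# Assumed prices for illustration only — not quoted Anthropic rates.
PRICE_IN = 1.00 / 1_000_000    # $/input token, assumed
PRICE_OUT = 5.00 / 1_000_000   # $/output token, assumed

def message_cost(in_tokens: int, out_tokens: int, tool_calls: int = 0) -> float:
    # Each tool call is a separate inference, so the context is re-sent:
    # bill one extra input-side round trip per tool call (a simplification).
    total_in = in_tokens * (1 + tool_calls)
    return total_in * PRICE_IN + out_tokens * PRICE_OUT

plain = message_cost(3000, 500)                 # no tools
heavy = message_cost(3000, 500, tool_calls=3)   # three archival searches
print(round(plain, 4), round(heavy, 4))
```

Under these assumed rates, three tool calls roughly triple the cost of a message — consistent with the ~$0.003–0.01/message range in the table above.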


Ren's memory layers

Core memory blocks (always loaded)

Six blocks loaded into every prompt. This is the baseline context cost — roughly 12,000 characters that Ren carries into every message whether she uses it or not.

| Block | What it holds | Limit |
| --- | --- | --- |
| always_loaded_core | Framework context, working norms, Scott's profile summary, memory-use instructions, session close-out ritual | 10,000 chars |
| pending_thoughts | Session brief — what happened last session, what's open, what's next. Written nightly by the dream script. | 8,000 chars |
| scott_portrait_forming | Ren's evolving interpretation of Scott — how he thinks, what he cares about, specific observations | 20,000 chars |
| ren_portrait_forming | Ren's self-portrait — what she got right, what she missed, what she wants to develop | 20,000 chars |
| seed_signals | Early-session observations, dated signals, relational notes not yet moved to archival | 20,000 chars |
| project_state | Current in-flight project context (JSON). Used when a specific project is active. | 10,000 chars |
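The ~12,000-character baseline converts to tokens with the common ~4 characters/token heuristic (an approximation, not a tokenizer — and the per-block fill levels below are illustrative, not real data):

```python
# Illustrative fill levels per block, in characters — not actual contents.
block_chars = {
    "always_loaded_core": 5000,
    "pending_thoughts": 2000,
    "scott_portrait_forming": 2000,
    "ren_portrait_forming": 1500,
    "seed_signals": 1000,
    "project_state": 500,
}

def estimated_tokens(chars: int) -> int:
    # ~4 chars per token is a rough English-text heuristic.
    return chars // 4

total_chars = sum(block_chars.values())
print(total_chars, estimated_tokens(total_chars))
```

At 12,000 characters this lands on ~3,000 tokens — the baseline context cost quoted later in the context-window figures.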

Archival memory (searchable)

108+ passages stored as vector embeddings in pgvector. Ren searches this with archival_memory_search when she needs context that isn't in her core blocks. Holds project summaries, ren-memory file chunks, MemPalace drawers, and session-captured insights.

Archival search is triggered by Ren's judgment — she searches when she knows she might be missing something. You can ask her to search explicitly too.
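Under the hood, pgvector ranks passages by vector similarity — its `<=>` operator returns cosine distance, and similarity is 1 minus that. The core operation is cosine similarity between the query embedding and each stored passage. A toy 3-dimensional sketch (real voyage-3 vectors are 1024-dim):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.9, 0.1]                       # embedded search query
passages = {                                  # embedded archival passages
    "p1": [0.2, 0.8, 0.2],
    "p2": [0.9, 0.1, 0.0],
}
best = max(passages, key=lambda k: cosine(query, passages[k]))
print(best)  # p1 — it points in nearly the same direction as the query
```

This is why archival search finds semantically similar passages rather than keyword matches: nearness is measured in embedding space, not in text.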

Conversation history

Every message is stored in PostgreSQL. Letta loads recent messages into context on each send. This is what fills the context window over time and, if left unmanaged, what eventually causes a context crash.

Context window: 200,000 tokens (Haiku's limit). Core blocks use ~3,000 tokens. Each message exchange uses 500–2,000 tokens. Heavy page fetches can use 10,000+ tokens in a single exchange. Auto-rollover fires at 50 messages to stay well clear of the limit.
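The figures above can be sanity-checked against the 50-message rollover threshold — even a worst-case session stays well under the window (all numbers approximate, taken from the text):

```python
WINDOW = 200_000          # Haiku's context limit, tokens
CORE_BLOCKS = 3_000       # always-loaded core blocks
PER_EXCHANGE_MAX = 2_000  # upper end of a message exchange
HEAVY_FETCH = 10_000      # a single heavy page fetch

def worst_case(messages: int, heavy_fetches: int = 0) -> int:
    # Pessimistic budget: every exchange at max size, plus page fetches.
    return CORE_BLOCKS + messages * PER_EXCHANGE_MAX + heavy_fetches * HEAVY_FETCH

used = worst_case(50, heavy_fetches=5)
print(used, used < WINDOW)  # 153000 True — comfortable headroom at rollover
```

Even 50 maximum-size exchanges plus five heavy fetches leaves roughly a quarter of the window free, which is the margin the auto-rollover is designed to preserve.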

Session rollover

The context window crash that happened on 2026-05-13 (the day this docs site was built) exposed a gap: no way to reset a Letta conversation without losing everything. The fix is now live.

When the session rolls over — either automatically at 50 messages, or manually via the "New session" button:

  1. All six core memory blocks are copied to a new Letta agent.
  2. The old agent is left intact (its conversation history is readable for recovery).
  3. The new agent becomes active for all subsequent messages.
  4. The chat UI clears and shows a "Session refreshed — memory intact" divider.

Ren's memory is never lost in a rollover — only the raw conversation thread resets. The nightly dream captures session content before that happens.
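The rollover amounts to copying block state, not history. A minimal in-memory sketch — the real implementation goes through the Letta API, and the dict shape here is purely illustrative:

```python
def rollover(old_agent: dict) -> dict:
    # Step 1: copy all core blocks to a fresh agent.
    # Step 2: old_agent is not mutated; its history stays readable.
    # Step 3: the new agent starts with an empty conversation thread.
    return {"blocks": dict(old_agent["blocks"]), "history": []}

old = {"blocks": {"pending_thoughts": "session brief"}, "history": ["m1", "m2"]}
new = rollover(old)
print(new["blocks"] == old["blocks"], new["history"], old["history"])
```

The design choice matters: because blocks are copied rather than moved, a failed rollover can never lose memory — the old agent remains the fallback.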


The nightly dream

Every night at 2:00am, a script on Scott's Mac sends Ren a structured reflection prompt. She:

  1. Searches her conversation history for the day's exchanges
  2. Searches archival memory for relevant context
  3. Writes a fresh session brief to pending_thoughts
  4. Adds portrait signals for Scott and herself if anything worth noting happened
  5. Checks the Letta GitHub releases page for updates newer than the current version

The dream runs via launchd (com.scotth.rendream.plist) and logs to ~/Developer/han-solo/logs/dream.log. No external dependencies — stdlib Python only.
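A launchd plist for this kind of nightly job looks roughly like the following. The label and 2:00am schedule come from the text above; the script filename (`dream.py`), Python path, and home-directory username are assumptions for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.scotth.rendream</string>
  <!-- Script path and interpreter are assumed, not taken from the repo. -->
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/scott/Developer/han-solo/dream.py</string>
  </array>
  <!-- Fire every night at 2:00am local time. -->
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>2</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
  <key>StandardOutPath</key>
  <string>/Users/scott/Developer/han-solo/logs/dream.log</string>
</dict>
</plist>
```

Note that `StartCalendarInterval` jobs missed while the Mac is asleep run once at next wake, so the dream still fires (late) if the machine was off at 2:00am.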


Claude Code integration

Claude Code connects to Han Solo via the MCP server at han-solo-mcp.onrender.com/mcp, which exposes the full set of 15 tools for reading and writing Ren's memory.

Claude Code uses these tools to keep Ren's memory current after sessions — writing session summaries, portrait signals, and project state updates without requiring manual copy-paste.


Deployment

All three Render services deploy from github.com/scoots31/han-solo. A push to main triggers automatic redeploy of han-solo-mcp (the only service with code that changes). Letta and the database are stable services that rarely need touching.

Detailed deployment notes — 16 challenges logged and resolved — are in ~/Developer/han-solo/DEPLOYMENT.md. Read that before touching the stack.