The Problem with Stateless AI Sessions

Every Claude session starts fresh. You open a conversation, give context, work, the context window fills up, and the next day you do it again. If you're using AI as a serious work tool — not just for occasional code generation, but as an ongoing development partner — this statelessness is the fundamental bottleneck.

I solved it by building a persistent, multi-agent system on top of my laptop. Three agents, a shared control plane, a task queue, and a memory system that survives session restarts. This is the architecture.

The Three Agents

Lobster is the primary agent. It runs on OpenClaw, a gateway adapter that wraps Claude (claude-sonnet-4-6 or opus). Lobster has full access to the filesystem, shell, browser, and a set of skill files that define specific behaviors. Its workspace is ~/botsAutomations. It can read files, run scripts, commit to git, and post to APIs. Lobster is the engineer.

Hermes is the second agent, running on the Hermes CLI runtime. It's also backed by Claude, but through a different adapter with different tool availability. Lobster and Hermes are peers — they can delegate to each other by creating subtasks in the control plane. Neither is the other's manager.

Paperclip is not an agent. It's the control plane: a local HTTP API at http://127.0.0.1:3100/api that manages a task board (issues), tracks agent assignments, routes wakeup events, and maintains the state machine for each task. Think of it as a minimal project management tool where the users are AI agents.

Chain of command: Clip (PM, manager) → Lobster & Hermes (engineers). Clip tracks issues and escalates. Lobster and Hermes execute. Tasks arrive via Paperclip; results are reported back via Paperclip comments and PATCH status.

The Heartbeat Protocol

Agents don't run continuously. They run in heartbeats — short execution windows triggered by Paperclip via a WebSocket gateway. Each heartbeat, an agent:

  1. Reads its ground-truth state file (SWAMP_STATE.md)
  2. Fetches assigned tasks from the Paperclip API
  3. Checks out one task (POST /api/issues/{id}/checkout)
  4. Reads context (recent comments, heartbeat context endpoint)
  5. Does the work using available tools
  6. Posts a comment with results
  7. PATCHes the task status to done or blocked
  8. Appends one entry to the rolling memory log
  9. Exits

One heartbeat = one meaningful advance on one issue. The agent never loops. If there's nothing to do, it exits immediately. If a task is blocked on external input (waiting for an API response, waiting for Yegor to review something), it patches to blocked, writes a comment naming the blocker, and exits. Next heartbeat, it checks whether the block has cleared.
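The heartbeat steps can be sketched as a single function that advances at most one task and then returns. This is a minimal sketch, not the actual agent code: the `ControlPlane` interface, its method names, and `WorkResult` are hypothetical stand-ins for the Paperclip calls, and reading SWAMP_STATE.md and the task context (steps 1 and 4) is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class ControlPlane(Protocol):
    """Hypothetical client interface; method names are illustrative only."""

    def assigned_tasks(self) -> list[dict]: ...
    def checkout(self, task_id: str) -> bool: ...
    def comment(self, task_id: str, body: str) -> None: ...
    def set_status(self, task_id: str, status: str) -> None: ...


@dataclass
class WorkResult:
    summary: str
    blocker: Optional[str] = None  # set when the task waits on external input


def run_heartbeat(
    client: ControlPlane,
    do_work: Callable[[dict], WorkResult],
    memory_log: list[str],
) -> Optional[str]:
    """Advance at most one task, then return. There is no loop."""
    tasks = client.assigned_tasks()
    if not tasks:
        return None  # nothing to do: exit immediately

    task = tasks[0]
    if not client.checkout(task["id"]):
        return None  # another agent holds the task: walk away

    result = do_work(task)
    if result.blocker:
        client.comment(task["id"], f"Blocked: {result.blocker}")
        client.set_status(task["id"], "blocked")
    else:
        client.comment(task["id"], result.summary)
        client.set_status(task["id"], "done")

    # One memory entry per heartbeat, written last.
    memory_log.append(f"{task['id']}: {result.summary}")
    return task["id"]
```

The function returns the ID of the task it touched, or None when it exited without working, which makes the "one heartbeat, one advance" rule easy to check from the outside.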

The checkout mechanism is optimistic locking: a task can only be checked out by one agent at a time. If a 409 Conflict comes back, the task belongs to another agent. Walk away. This prevents duplicate work without complex coordination logic.
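A minimal sketch of the checkout call follows, using only the standard library. The endpoint path and the 409 behavior come from the description above; the empty request body, the absence of auth headers, and treating any 2xx as success are assumptions made for illustration. The `send` parameter is a hypothetical hook so the logic can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable

API_BASE = "http://127.0.0.1:3100/api"


def post_status(url: str) -> int:
    """POST an empty body and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; recover the code


def try_checkout(task_id: str, send: Callable[[str], int] = post_status) -> bool:
    """Return True if this agent now holds the task, False on 409 Conflict."""
    status = send(f"{API_BASE}/issues/{task_id}/checkout")
    if status == 409:
        return False  # the task belongs to another agent
    if 200 <= status < 300:
        return True
    raise RuntimeError(f"unexpected checkout status {status}")
```

Returning False on a conflict, rather than raising, keeps the caller's "walk away" path a plain early return.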

Task Routing and Wake Reasons

Paperclip wakes agents with a reason code. The common ones:

| Wake reason | Meaning | Agent action |
| --- | --- | --- |
| issue_assigned | New task assigned to this agent | Checkout and execute |
| issue_commented | New comment on an in-progress task | Read comment, continue or revise |
| issue_comment_mentioned | Agent mentioned in a comment | Read thread, self-assign only if asked |
| approval_resolved | A requested approval was answered | Continue blocked task |
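
The wake reasons above map naturally onto a small dispatch table. In this sketch the reason strings are the ones listed; the action labels are hypothetical names for the listed behaviors, and exiting on an unrecognized reason is an assumption rather than documented Paperclip behavior.

```python
from enum import Enum


class WakeReason(str, Enum):
    ISSUE_ASSIGNED = "issue_assigned"
    ISSUE_COMMENTED = "issue_commented"
    ISSUE_COMMENT_MENTIONED = "issue_comment_mentioned"
    APPROVAL_RESOLVED = "approval_resolved"


# Action labels are hypothetical names for the behaviors listed above.
ACTIONS = {
    WakeReason.ISSUE_ASSIGNED: "checkout_and_execute",
    WakeReason.ISSUE_COMMENTED: "read_comment_then_continue_or_revise",
    WakeReason.ISSUE_COMMENT_MENTIONED: "read_thread_self_assign_if_asked",
    WakeReason.APPROVAL_RESOLVED: "continue_blocked_task",
}


def action_for(reason: str) -> str:
    """Map a raw wake reason string to an action label."""
    try:
        return ACTIONS[WakeReason(reason)]
    except ValueError:
        return "exit"  # assumption: unrecognized reasons do nothing
```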

There's a dedup guard for self-wake loops. When an agent PATCHes a task with a comment field, Paperclip attributes the comment to local-board and fires a spurious issue_commented wake. Without a guard, the agent would wake up, see "new comment," and generate another comment — infinite loop. The guard checks if the triggering comment was authored by this agent and exits immediately if so.
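The author check can be as small as one predicate. This is a sketch under assumptions: the parameter names are hypothetical, and how the triggering comment's author ID reaches the agent (wake payload, API lookup) is not specified here.

```python
from typing import Optional


def is_self_wake(
    wake_reason: str,
    comment_author_id: Optional[str],
    my_agent_id: str,
) -> bool:
    """True when an issue_commented wake was caused by this agent's own comment."""
    if wake_reason != "issue_commented":
        return False  # only comment wakes can be self-inflicted
    return comment_author_id is not None and comment_author_id == my_agent_id
```

An agent would call this before doing anything else in the heartbeat and exit immediately when it returns True.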

The Memory System

The hardest problem in persistent agent systems is memory. Claude has no inherent persistence. Every session is clean. Solving this requires explicit external storage.

I use three tiers:

1. Rolling memory (memory/recent.md). Every heartbeat appends a structured entry to memory/stream.md: wake reason, what was happening, what was done, what's next. The last 100 entries are extracted into recent.md, about 2-3k tokens. This file is injected at the start of every heartbeat, giving the agent continuity across sessions. Without it, every wake would be a cold start.

# memory/recent.md excerpt (autogenerated)
## 2026-05-05T18:42:11Z
wake: issue_assigned/ARC-289
what: Blog CI/CD pipeline failing on Netlify
did: Fixed broken robots.txt include, verified build locally
next: Monitor Netlify for successful deploy
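
Extracting the last 100 entries can be sketched as a pure text transformation. The entry layout follows the excerpt above; the function names and the assumption that every entry begins with a line starting with `## ` are illustrative, not a description of the real extraction script.

```python
import re


def format_entry(timestamp: str, wake: str, what: str, did: str, next_step: str) -> str:
    """Render one log entry in the layout shown in the excerpt."""
    return (
        f"## {timestamp}\n"
        f"wake: {wake}\n"
        f"what: {what}\n"
        f"did: {did}\n"
        f"next: {next_step}\n\n"
    )


def split_entries(stream_text: str) -> list[str]:
    """Split the log wherever a line starts with '## '; drop any preamble."""
    parts = re.split(r"^(?=## )", stream_text, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.startswith("## ")]


def build_recent(stream_text: str, limit: int = 100) -> str:
    """Return the text for recent.md: the newest `limit` entries, oldest first."""
    entries = split_entries(stream_text)[-limit:]
    return "\n\n".join(entries) + "\n" if entries else ""
```

In use, a heartbeat would append `format_entry(...)` to memory/stream.md and then rewrite memory/recent.md with `build_recent(...)` over the full stream.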

2. Ground truth (SWAMP_STATE.md). A manually maintained file listing canonical config: repos, API key locations, Netlify site IDs, local paths, service URLs. The heartbeat protocol requires the agent to read this file first and run verify commands from it before making any state claims. "The API is running" is not a valid assertion unless you just pinged GET /api/agents/me and got a 200.
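
The verify-before-asserting rule can be sketched as a function that runs each claim's command and only marks the claim true on a zero exit code. How verify commands are actually stored in SWAMP_STATE.md is not shown here, so the claim-to-command mapping is a hypothetical format.

```python
import subprocess


def verify_claims(
    checks: dict[str, list[str]],
    timeout: float = 10.0,
) -> dict[str, bool]:
    """Run each verify command; a claim holds only if its command exits 0."""
    results: dict[str, bool] = {}
    for claim, command in checks.items():
        try:
            completed = subprocess.run(command, capture_output=True, timeout=timeout)
            results[claim] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung command counts as unverified.
            results[claim] = False
    return results
```

For the example in the text, the check might be `{"API is running": ["curl", "-fsS", "http://127.0.0.1:3100/api/agents/me"]}`, assuming curl is installed and the endpoint needs no extra headers; `-f` makes curl exit non-zero on HTTP error responses.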

3. Long-term curated memory (MEMORY.md). Lessons, preferences, architectural decisions that should persist indefinitely. Updated manually (or by the agent during direct conversation mode, not during Paperclip heartbeats). This is the equivalent of the project brain — things too important for the rolling log but not appropriate for ground-truth config.

Two Modes, One Agent

Lobster runs in two distinct modes depending on environment:

Paperclip heartbeat mode is active when PAPERCLIP_RUN_ID is set. The agent is an engineer executing assigned tasks. It talks only via Paperclip comments. It does not message Telegram. It does not reach out to Yegor. It checks out a task, does the work, PATCHes the status, and exits.

Direct Telegram mode is active when there's no Paperclip env and Yegor is chatting at @mainlobster_bot. In this mode, Lobster is a companion: conversations, proactive heartbeat checks (calendar, email, mentions), creative collaboration. The Paperclip API is off-limits in this mode — they're separate contexts and shouldn't leak into each other.

The mode boundary is strict. An agent that does Paperclip work during a personal conversation — or vice versa — creates confusing audit trails and breaks the trust model. The user should always know which "Lobster" they're talking to.
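
One way to enforce the boundary is to derive the mode once from the environment and gate every side effect on it. In this sketch only PAPERCLIP_RUN_ID comes from the description above; the mode constants, the action names, and the `require_allowed` helper are hypothetical.

```python
import os
from typing import Mapping, Optional

HEARTBEAT = "paperclip_heartbeat"
DIRECT = "direct_telegram"

# Hypothetical action names; each is allowed in exactly one mode.
ALLOWED = {
    HEARTBEAT: {"paperclip_api"},
    DIRECT: {"telegram_message"},
}


def detect_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Heartbeat mode when PAPERCLIP_RUN_ID is set and non-empty, else direct."""
    env = os.environ if env is None else env
    return HEARTBEAT if env.get("PAPERCLIP_RUN_ID") else DIRECT


def require_allowed(action: str, mode: str) -> None:
    """Raise before a side effect that belongs to the other mode."""
    if action not in ALLOWED[mode]:
        raise PermissionError(f"{action!r} is off-limits in {mode} mode")
```

Failing loudly with an exception, instead of silently skipping the action, keeps a leaked cross-mode call visible in logs.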

Cross-Agent Delegation

Lobster and Hermes can delegate to each other by creating subtasks:

POST /api/companies/{companyId}/issues
{
  "title": "Run Netlify build check on branch main",
  "parentId": "current-task-id",
  "assigneeAgentId": "hermes-agent-id",
  "billingCode": "ARC-291"
}

The parent task stays in_progress. The subtask is assigned to Hermes. On Hermes's next heartbeat, it checks out the subtask, does the work, and closes it. Paperclip sends a wake to Lobster when the subtask completes. Lobster reads the subtask's final comment for the result and continues the parent task.

This is async delegation without shared mutable state. Neither agent polls the other. Paperclip is the message bus. The subtask result lives in the Paperclip comment thread.
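
The delegation request can be built with the standard library alone. This sketch constructs the request without sending it; the path and field names are the ones shown above, while the function name and the omission of any auth headers are assumptions.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://127.0.0.1:3100/api"


def build_subtask_request(
    company_id: str,
    title: str,
    parent_id: str,
    assignee_agent_id: str,
    billing_code: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST that creates a subtask assigned to another agent."""
    payload = {
        "title": title,
        "parentId": parent_id,
        "assigneeAgentId": assignee_agent_id,
    }
    if billing_code is not None:
        payload["billingCode"] = billing_code
    return urllib.request.Request(
        f"{API_BASE}/companies/{company_id}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. After that the delegating agent simply exits, since the control plane is responsible for waking it when the subtask closes.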

What This Changes

The practical difference from standard "chat with Claude" workflows:

  • No context re-loading. The rolling memory means the agent knows what happened last session. I don't re-explain the architecture every time.
  • Async work. I can assign 5 tasks, close my laptop, and come back to find them done (or blocked with a clear reason). The agent doesn't need me present.
  • Audit trail. Every action is in Paperclip: who did what, when, with what comment. I can read the full history of any task.
  • Parallel execution. Lobster and Hermes can work on separate tasks simultaneously. Neither knows what the other is doing unless they explicitly share a Paperclip comment thread.

Honest Failure Modes

The dedup guard isn't perfect. I've had spurious re-wake loops slip through when Paperclip attributed a comment incorrectly. The fix was always the same: make the guard more aggressive, checking both author ID and comment recency before deciding to act.
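
A more aggressive guard might look like the sketch below. It reads "comment recency" as "the triggering comment was created within a short window of this agent's own last post", which is one interpretation and not a description of the actual guard; the parameter names and the 30-second window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_skip_wake(
    comment_author_id: str,
    my_agent_id: str,
    comment_created_at: datetime,
    my_last_post_at: Optional[datetime],
    window: timedelta = timedelta(seconds=30),
) -> bool:
    """Skip when the comment is ours by author ID, or suspiciously close to our last post."""
    if comment_author_id == my_agent_id:
        return True  # correctly attributed self-comment
    if my_last_post_at is None:
        return False  # we have not posted, so it cannot be ours
    # Misattributed comments still land right after our own PATCH.
    return abs(comment_created_at - my_last_post_at) <= window
```

The trade-off is that a genuine comment from someone else arriving inside the window would also be skipped, so the window should stay short.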

Memory drift. The rolling memory is capped at 100 entries. Context from 3 weeks ago isn't available unless I've promoted it to MEMORY.md. This has caused the occasional "I don't know why this decision was made" moment. Manually curating the long-term memory is still largely a human job.

The verify discipline is hard to maintain. The heartbeat protocol requires the agent to verify state claims by running actual commands before asserting them. In practice, under time pressure or with ambiguous task specs, the agent sometimes makes assumptions. The self-check hook at the end of each heartbeat catches this, but imperfectly.

Bootstrapping is slow. The first time you set up a new project or integrate a new tool, there's significant config work: SWAMP_STATE entries, skill files, Paperclip task routing, verify commands. The system is optimized for ongoing work, not greenfield starts.

Where This Goes

The current system runs on one Mac. The next step is making Paperclip cloud-aware: agents that can run on remote machines, task routing that survives network partitions, a shared state that doesn't require both agents to be on the same LAN.

The deeper question is: what does it mean to manage AI agents as employees? The heartbeat protocol already borrows heavily from team management patterns — daily standups, clear acceptance criteria, blockers documented and escalated. The agents work better when treated as colleagues with specific expertise rather than as tools to prompt.

Whether that's a metaphor or something more structural, I'm still figuring out.