AI Memory System

Memory is how the system gets better at knowing you.

Everything else in MateClaw is static the moment you configure it. Agents, tools, knowledge bases — they change when you change them. Memory is the one part that changes on its own, as a byproduct of actual use. That's the whole point.

Your AI dreams about you while you sleep

That's not a marketing line. It's literal code in the memory/dreaming/ package.

Every night at 2 AM (default; configurable) a scheduled job runs — its name is Dreaming. It walks every agent's conversation trail from the day, consolidates scattered signals into a coherent understanding of you, filters out one-offs and contradictions and stale facts, promotes recurring patterns into MEMORY.md, and appends "what it saw, what it concluded, what it rewrote" to DREAMS.md — a human-readable audit trail of how memory got to where it is today.

When you open MateClaw the next morning, it picks up where yesterday left off — not from zero.

Most AI assistants start each day from scratch. MateClaw continues from where yesterday ended.

This page covers the four layers that make up memory, the files the system writes for each agent, and how agents themselves read and write those files during a conversation.

The four layers

  ┌────────────────────────────────────────────────────────────┐
  │  1. This turn                                                │
  │     What you're saying, what was just said, auto-trimmed     │
  │     to the model's token budget                              │
  │     Updated: every turn                                      │
  └────────────────────────────────────────────────────────────┘
                            │
                            ▼ (after conversation completes)
  ┌────────────────────────────────────────────────────────────┐
  │  2. Post-chat extraction                                     │
  │     Pulls the worth-keeping bits out of the conversation,    │
  │     writes them into PROFILE.md / MEMORY.md / today's note   │
  │     Updated: asynchronously, after each meaningful chat      │
  └────────────────────────────────────────────────────────────┘
                            │
                            ▼ (daily at 2:00 AM, configurable)
  ┌────────────────────────────────────────────────────────────┐
  │  3. Nightly consolidation (Dreaming)                         │
  │     Scans recent daily notes, finds recurring patterns,      │
  │     merges them into MEMORY.md, logs the run in DREAMS.md    │
  │     Updated: scheduled; manual trigger available             │
  └────────────────────────────────────────────────────────────┘
                            │
                            ▼ (next conversation picks up the latest)
  ┌────────────────────────────────────────────────────────────┐
  │  4. Workspace files as system prompt                         │
  │     The four markdown files are injected every turn          │
  │     Updated: file changes take effect on the next turn       │
  └────────────────────────────────────────────────────────────┘

Each layer operates at a different timescale. Short-term context covers this turn. Extraction runs after each conversation. Consolidation runs nightly. Workspace file injection happens every turn and uses whatever is current. Together they form a loop: what you say becomes context, context becomes files, files become system prompt, and system prompt becomes what the agent knows tomorrow.

Memory knows who's who: per-owner isolation (1.5.0)

Before 1.5.0, an employee's (agent's) memory was shared: whether the speaker was you logged into the web console, a colleague in a Feishu group, or an end user coming in through a third-party API, everything piled into the same MEMORY.md. One employee serving multiple people would cross wires.

1.5.0 gives every memory an owner and a visibility scope.

A unified owner_key

Whatever the identity source, it normalizes to one prefixed string:

Source	owner_key
Web console	`user:<user id>`
IM channel (Feishu / DingTalk / WeCom…)	`<channel>:<sender id>`
Third-party API (with endUserId)	`api:<endUserId>`
System / cron	`system`
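
The mapping in the table can be sketched as a small function. This is an illustration only: the `source` labels ("web", "api", "system", or a channel name) are hypothetical names chosen for the sketch, not identifiers taken from MateClaw.

```python
from typing import Optional


def owner_key(source: str, identity: Optional[str] = None) -> str:
    """Normalize an identity to a prefixed owner_key string.

    The `source` labels are assumptions made for this sketch.
    """
    if source == "system":
        return "system"            # system / cron carries no per-user identity
    if not identity:
        raise ValueError("identity is required for non-system sources")
    if source == "web":
        return f"user:{identity}"  # web console: user:<user id>
    if source == "api":
        return f"api:{identity}"   # third-party API: api:<endUserId>
    return f"{source}:{identity}"  # IM channel: <channel>:<sender id>
```

Whatever the entry point, everything downstream only ever sees the single prefixed string.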

Three visibility scopes

scope	Who reads it	Typical content
PERSONAL	Only the matching owner	Memory extracted from conversations defaults here
TEAM	Everyone using this employee	Agent config files (AGENTS.md / SOUL.md / PROFILE.md), backfilled legacy data
GLOBAL	Always visible across employees / workspaces	Preset facts, system reference material

Recall prefers personal memory

The system prompt bakes in only the shared TEAM/GLOBAL memory (cacheable); each turn then prefetches that owner's personal memory by owner_key. So when someone asks "what stack does my project use," the employee recalls that person's private memory files first, not generic KB material.
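
A minimal sketch of that visibility rule, assuming memory rows are plain records with a scope and an owner_key. The `MemoryRow` class and `visible_to` function are hypothetical names for illustration, not MateClaw's actual types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryRow:
    content: str
    scope: str      # "PERSONAL", "TEAM", or "GLOBAL"
    owner_key: str  # only meaningful for PERSONAL rows


def visible_to(rows: List[MemoryRow], owner_key: str) -> List[MemoryRow]:
    """Return the rows one owner may read, personal rows first."""
    personal = [r for r in rows
                if r.scope == "PERSONAL" and r.owner_key == owner_key]
    shared = [r for r in rows if r.scope in ("TEAM", "GLOBAL")]
    return personal + shared
```

The shared part does not depend on who is asking, which is what makes it cacheable; only the personal part has to be fetched per turn.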

On the structured "fact" layer, the fact recall query itself supports owner-visibility filtering (PERSONAL is owner-only, TEAM/GLOBAL are shared). The current automatic fact projection, however, is built mainly from shared memory files and doesn't set ownerKey/scope on insert. In practice, personalization today comes mostly from the personal-memory-file prefetch; per-owner facts are still being filled in.

Third-party APIs pass through an end-user identity

/api/v1/chat and /api/v1/chat/stream request bodies gain an optional endUserId field (a string, to preserve large-integer precision). One PAT-authenticated integration represents one MateClaw user but can pass a distinct endUserId per end user, and memory isolates per end user automatically.
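
To see why endUserId is a string, here is a small sketch that builds a request body and compares it with what happens to a large integer id once it passes through a double-precision number. Only `endUserId` comes from this page; the `message` field name is a placeholder assumption.

```python
import json


def chat_request_body(message: str, end_user_id: int) -> str:
    """Serialize a chat request body with endUserId sent as a string."""
    body = {
        "message": message,             # hypothetical field name
        "endUserId": str(end_user_id),  # documented: optional, string-typed
    }
    return json.dumps(body)


big_id = 2**53 + 1

# A double-precision float cannot represent this id exactly.
lossy = int(float(big_id))

# The string form survives a JSON round trip unchanged.
round_tripped = json.loads(chat_request_body("hello", big_id))["endUserId"]
```

Clients whose JSON numbers are doubles (JavaScript, for instance) would silently change such an id if it were sent as a number.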

It's a feature flag

The master switch is mate.memory.lifecycle-mediator-enabled.

Mind the default

The Java property's bare default is false, but the application.yml shipped with the release sets it to true — so per-owner isolation is on by default in a default install. To go back to the old shared behavior (all writes to TEAM), set it to false explicitly in your config.

When on, conversation extraction writes to the owner's PERSONAL memory and recall filters by owner_key; when off, all writes fall back to shared TEAM. Multi-tenant instances should keep it on; single-user deployments can turn it off.

Under the hood: migration V137 adds owner_key + scope columns to mate_workspace_file / mate_memory_recall / mate_fact, backfilling legacy rows as TEAM (so no memory gets hidden on upgrade). Memory tools like remember resolve owner_key from the current request context — when the flag is on they write to that owner's PERSONAL memory, when off they fall back to shared writes.

Multi-layer memory with pluggable providers

The memory layer is not one hard-coded implementation. It's an interface — the multi-layer architecture lets you stack providers:

The default provider is the workspace-file-based memory described in the rest of this page. It ships with MateClaw, and for most people it's all they'll ever need.
Custom providers can be dropped in for specialized retrieval — vector-based long-term memory, graph memory, external memory services.
Layering means a single agent can talk to multiple providers at once. A short-term provider returns recent context; a semantic provider returns related memories; a Wiki provider returns authoritative references. They compose at read time.

For most agents, default is enough and you should ignore this section. If you're building something specialized — an agent that needs to remember thousands of facts with vector search, an agent that needs graph-structured memory — this is where you plug in. See Architecture.
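
As a rough picture of "compose at read time", the sketch below stacks two toy providers behind one recall call. The `MemoryProvider` protocol, both provider classes, and `layered_recall` are hypothetical; the real extension points are described in Architecture.

```python
from typing import List, Protocol


class MemoryProvider(Protocol):
    def recall(self, query: str) -> List[str]: ...


class RecentContext:
    """Toy short-term provider: returns the last two turns."""

    def __init__(self, turns: List[str]):
        self.turns = turns

    def recall(self, query: str) -> List[str]:
        return self.turns[-2:]


class KeywordMemory:
    """Toy long-term provider: case-insensitive substring match."""

    def __init__(self, notes: List[str]):
        self.notes = notes

    def recall(self, query: str) -> List[str]:
        return [n for n in self.notes if query.lower() in n.lower()]


def layered_recall(providers: List[MemoryProvider], query: str) -> List[str]:
    """Ask each provider in order and merge results, dropping duplicates."""
    seen, merged = set(), []
    for provider in providers:
        for item in provider.recall(query):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

Provider order doubles as priority here; a real stack might rank or budget results instead.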

The four files every agent has

Every agent has its own workspace. Four markdown files form the backbone of long-term memory:

workspace/{agentId}/
├── AGENTS.md          # How the agent uses memory — behavior guide
├── SOUL.md            # Who the agent is — core identity, personality, boundaries
├── PROFILE.md         # Who you are — user profile, preferences, background
├── MEMORY.md          # What matters — key decisions, project context, todos
└── memory/
    ├── 2026-04-09.md  # Daily notes — what happened today, append-only
    ├── 2026-04-10.md
    └── 2026-04-11.md

The first four are injected into the system prompt on every turn (if enabled=true). Daily notes are not — they feed consolidation instead.

What each file is for

AGENTS.md — the agent's user manual for itself. When to write memory, what goes where, what tools are available. Seed: enabled=true, sort_order=0.
SOUL.md — who the agent fundamentally is. Self-awareness, evolution guidance, privacy and boundary principles. Edit when you want to change the agent's character at a deep level. Seed: enabled=true, sort_order=1.
PROFILE.md — what the agent has learned about you. Name, occupation, tech stack, communication preferences. Updated by the extractor when conversations reveal something durable. Full-replace writes. Seed: enabled=true, sort_order=2.
MEMORY.md — what the agent has decided matters enough to keep. Active projects, unresolved decisions, open threads, things you asked it to remember. Updated by both the extractor and the consolidator. Seed: enabled=true, sort_order=3.

New in 1.3.0: workflows can write memory

From v1.3.0, the workflow write_memory step can write the run's output directly into an employee's MEMORY.md (or any enabled memory file) when the flow completes. Four merge strategies: append / replace_section / upsert_kv / overwrite. Memory is no longer written exclusively by the conversation extractor or the Dreaming consolidator — a business-process outcome can be persisted too.

Daily notes

Conversation highlights archived by date, in append mode — multiple conversations in one day concatenate into the same file. Not injected into the system prompt (enabled=false). They exist so the consolidator has something to scan at 2 AM.

Short-term: the context window

Before every LLM call, MateClaw builds the prompt that actually gets sent:

[System Prompt]                        ← Always first
[Workspace file injection]             ← AGENTS / SOUL / PROFILE / MEMORY
[Conversation context summary]         ← Only if earlier turns got compressed
[Message 1: user]
[Message 2: assistant]
...
[Current user message]                 ← Always last

Workspace files are injected sorted by sort_order, formatted as:

--- AGENTS.md ---
(content)

--- SOUL.md ---
(content)

--- PROFILE.md ---
(content)

--- MEMORY.md ---
(content)

Only files with enabled=true are included.
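
The injection format above can be reproduced in a few lines. This is a sketch assuming each file is a dict with `filename`, `content`, `enabled`, and `sort_order` keys; that dict shape is an assumption for illustration.

```python
from typing import Dict, List


def render_workspace_files(files: List[Dict]) -> str:
    """Render enabled workspace files in sort_order, one headed block each."""
    enabled = sorted(
        (f for f in files if f["enabled"]),
        key=lambda f: f["sort_order"],
    )
    return "\n\n".join(
        f"--- {f['filename']} ---\n{f['content']}" for f in enabled
    )
```

Disabling a file removes it from the prompt without deleting it, which is how daily notes stay out of the system prompt.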

When context gets too big

Three-stage defense:

Stage 1: proactive compression. When the estimated token total exceeds 75% of the budget (default window 128k tokens), the system calls the LLM to summarize earlier turns. The most recent 2 turns (4 messages) survive verbatim. The summary is cached for 30 minutes.

Stage 2: emergency recovery. If the LLM still returns context-too-large, the system stops asking the LLM for summaries. It discards older messages, keeps the last 2 turns, and retries the call once.

Stage 3: hard trim. If tokens are still over budget, messages drop from the front until the prompt fits. The last 2 messages are always preserved.

Security design — the summary is injected as a user message, not a system message. Deliberate: preventing compressed historical user input from being elevated into system-level instructions eliminates an injection vector.
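
Two of these checks are simple enough to sketch: the Stage 1 trigger and the Stage 3 trim. The function names are hypothetical, and the token counter is passed in because how MateClaw estimates tokens is not described here.

```python
from typing import Callable, List


def should_compact(estimated_tokens: int,
                   max_input_tokens: int = 128_000,
                   trigger_ratio: float = 0.75) -> bool:
    """Stage 1 trigger: compress once the estimate passes ratio * budget."""
    return estimated_tokens > max_input_tokens * trigger_ratio


def hard_trim(messages: List[str],
              budget: int,
              count_tokens: Callable[[str], int],
              keep_last: int = 2) -> List[str]:
    """Stage 3: drop from the front until it fits, never below keep_last."""
    trimmed = list(messages)
    while (len(trimmed) > keep_last
           and sum(count_tokens(m) for m in trimmed) > budget):
        trimmed.pop(0)  # the oldest message goes first
    return trimmed
```

With the defaults, compression starts once the estimate passes 96,000 tokens. Note that the trim can still return a prompt over budget if the preserved messages alone exceed it.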

Configuration

yaml

mate:
  agent:
    conversation:
      window:
        default-max-input-tokens: 128000   # Global max
        compact-trigger-ratio: 0.75        # Compression trigger
        preserve-recent-pairs: 2           # Turns preserved verbatim
        summary-max-tokens: 300            # Compression budget

Post-chat extraction

After a conversation ends, the system asynchronously pulls out what's memorable and writes it to PROFILE.md, MEMORY.md, and the day's daily note. This happens off the user-response path — it never blocks the next turn.

What triggers it

After a turn completes, the system handles extraction on a background thread. A few preconditions must pass before it actually runs:

Auto-summarize is on
The conversation wasn't itself triggered by the consolidation cron job (avoids recursion)
Message count meets the minimum (default 4)
The last user message is long enough (default at least 10 chars)

All pass — extraction begins.
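
Put together, the gate might look like the sketch below. The field names mirror the mate.memory config keys listed in the configuration reference; the function, the settings class, and the message dict shape are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExtractionSettings:
    # Defaults mirror the values quoted in this section.
    auto_summarize_enabled: bool = True
    skip_cron_conversations: bool = True
    min_messages_for_summarize: int = 4
    min_user_message_length: int = 10


def should_extract(messages: List[Dict[str, str]],
                   triggered_by_cron: bool,
                   settings: Optional[ExtractionSettings] = None) -> bool:
    """Return True only when every precondition passes."""
    s = settings or ExtractionSettings()
    if not s.auto_summarize_enabled:
        return False
    if s.skip_cron_conversations and triggered_by_cron:
        return False  # avoids consolidation re-triggering extraction
    if len(messages) < s.min_messages_for_summarize:
        return False
    last_user = next(
        (m for m in reversed(messages) if m["role"] == "user"), None
    )
    if last_user is None:
        return False
    return len(last_user["content"].strip()) >= s.min_user_message_length
```

The checks are ordered cheapest first, so most skipped conversations never touch the message list.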

Concurrency control

Cooldown — same agent won't extract twice within 5 minutes (default)
Per-agent lock — if an extraction is already running for this agent, the new request is skipped
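
One way to combine the two rules is a small in-process gate, sketched below. `ExtractionGate` is a hypothetical name, and measuring the cooldown from the end of the previous run is an assumption; the page does not say whether the clock starts at the beginning or the end of a run.

```python
import threading
import time
from typing import Callable, Dict, Set


class ExtractionGate:
    """Skip extraction while one is running or within the cooldown."""

    def __init__(self, cooldown_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_run: Dict[int, float] = {}
        self._running: Set[int] = set()
        self._mutex = threading.Lock()

    def try_begin(self, agent_id: int) -> bool:
        with self._mutex:
            if agent_id in self._running:
                return False  # per-agent lock: a run is in flight
            last = self._last_run.get(agent_id)
            if last is not None and self._clock() - last < self._cooldown:
                return False  # still cooling down
            self._running.add(agent_id)
            return True

    def finish(self, agent_id: int) -> None:
        with self._mutex:
            self._running.discard(agent_id)
            self._last_run[agent_id] = self._clock()
```

A caller wraps the extraction in `try_begin` / `finish` (with `finish` in a `finally` block) and simply drops the request when `try_begin` returns False.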

What the LLM actually does

Load conversation messages
Read current PROFILE.md, MEMORY.md, today's daily note
Build a transcript: up to 30 messages, each truncated to 2000 chars
Call the LLM with the memory-summarize prompt templates
Parse the JSON response
Apply writes
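
The transcript step is the easiest to picture in code. This sketch assumes messages are dicts with `role` and `content`, and that "up to 30" means the most recent 30; both are assumptions, as is the `role: content` line format.

```python
from typing import Dict, List


def build_transcript(messages: List[Dict[str, str]],
                     max_messages: int = 30,
                     max_chars: int = 2000) -> str:
    """Keep the latest messages and truncate each to max_chars."""
    recent = messages[-max_messages:]
    return "\n".join(
        f"{m['role']}: {m['content'][:max_chars]}" for m in recent
    )
```

The two caps bound the extraction prompt at roughly 30 x 2000 characters regardless of how long the conversation ran.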

LLM response schema

Field	Type	What it does
`should_update`	boolean	Whether memory needs updating
`reason`	string	Why (for audit)
`daily_entry`	string	Content to append to today's daily note
`memory_update`	string	Full new content for MEMORY.md
`profile_update`	string	Full new content for PROFILE.md

File write rules

PROFILE.md — full replace, only if profile_update is non-empty
MEMORY.md — full replace, only if memory_update is non-empty
memory/YYYY-MM-DD.md — append, created with date heading if missing
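
The schema and write rules fit together roughly as follows. This is a sketch that uses an in-memory dict as a stand-in for the workspace; the function name and the exact date-heading format are assumptions.

```python
import json
from datetime import date
from typing import Dict


def apply_extraction(response_json: str,
                     files: Dict[str, str],
                     today: date) -> Dict[str, str]:
    """Apply the LLM's JSON response to a filename -> content mapping."""
    result = json.loads(response_json)
    if not result.get("should_update"):
        return files
    if result.get("profile_update"):
        files["PROFILE.md"] = result["profile_update"]  # full replace
    if result.get("memory_update"):
        files["MEMORY.md"] = result["memory_update"]    # full replace
    if result.get("daily_entry"):
        name = f"memory/{today.isoformat()}.md"
        # Create the note with a date heading if it does not exist yet.
        existing = files.get(name) or f"# {today.isoformat()}\n"
        files[name] = existing + "\n" + result["daily_entry"] + "\n"
    return files
```

Because the two core files are replaced wholesale, an empty `memory_update` or `profile_update` means "leave it alone" rather than "clear it".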

Consolidation and dreaming

The third layer runs on a schedule. Its job is to watch daily notes pile up and periodically ask: what's the pattern here, what should be promoted into core memory, what's stale and should be forgotten?

What it does

Lists the agent's memory/*.md files, takes the most recent 7 days
Reads those + the current MEMORY.md
Calls the LLM with the consolidation prompt templates
The LLM returns {should_update, reason, memory_content}
If should_update is true, MEMORY.md is fully replaced
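
The first step, picking which daily notes to read, could look like the sketch below. It reads "the most recent 7 days" as a calendar window ending today; the system may instead take the 7 newest files, so treat that interpretation, and the function name, as assumptions.

```python
import re
from datetime import date, timedelta
from typing import List

DAILY_NOTE = re.compile(r"^memory/(\d{4}-\d{2}-\d{2})\.md$")


def recent_daily_notes(filenames: List[str],
                       today: date,
                       day_range: int = 7) -> List[str]:
    """Return daily-note filenames dated within the last day_range days."""
    cutoff = today - timedelta(days=day_range - 1)
    picked = []
    for name in filenames:
        match = DAILY_NOTE.match(name)
        if not match:
            continue  # not a daily note (MEMORY.md and friends)
        try:
            day = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if cutoff <= day <= today:
            picked.append(name)
    return sorted(picked)
```

The default of 7 matches `emergence-day-range` in the configuration reference.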

Trigger methods

Automatic — every agent has a row in the system's scheduled jobs, set to run nightly at 2 AM
Manual — POST /api/v1/memory/{agentId}/emergence

Why it's not recursive

Consolidation triggers a "conversation" through the agent. Without protection, that conversation would re-trigger the post-chat extraction listener, which would trigger another conversation, ad infinitum.

The event carries a trigger-source flag. The extraction listener sees that the conversation was started by the consolidation job and skips it.

DREAMS.md — the consolidation diary

Each consolidation run appends a short entry to workspace/{agentId}/DREAMS.md:

what it looked at
what patterns it found
what changed in MEMORY.md
the date

Human-readable audit trail — open DREAMS.md and see how the memory got to its current state. Caps its own growth; old entries get summarized when the file exceeds a threshold.

Scored emergence and recall tracking

Consolidation tracks:

Which memory entries were actively recalled in recent conversations — read patterns feed back into importance
Scored emergence — candidate patterns ranked by frequency + recency + explicit recall, only high-scoring ones make it into MEMORY.md
Multi-gate filtering — low-signal extractions (one-off mentions, contradictions, things the user later corrected) get filtered before becoming memory
Dreaming status API — GET /api/v1/memory/{agentId}/dreaming/status

Full lifecycle (opt-in via flag)

Memory grows from "dream nightly" to a complete turn-by-turn lifecycle. This behavior lands behind feature flags — default off in the open-source build, on in production builds.

What it does:

Every turn is bookkept — the system takes notes at the start and end of every turn, not just at nightly consolidation
Fact projection — conversations are projected into structured "fact" rows the agent can query. Trust scoring + decay built in.
Structured nightly report — consolidation produces a full report; you can re-consolidate by topic on demand
Morning card — the first conversation of the day surfaces yesterday's report; you Confirm / Edit / Forget each fact
Contradiction inbox — when new facts conflict with old ones, you get a queue instead of silent overwrites
Explicit forget — say "forget that," and it actually forgets, everywhere
Feedback scoring — thumbs up/down on retrieved facts feeds back into trust
SOUL auto-evolution — the agent's persona file rewrites itself from accumulated facts
Monthly archive — old reports roll into a compressed monthly archive, browsable in the timeline
Memory Browser — timeline, facts, contradictions, diff viewer, and a trust bar across the top

Enable in application.yml:

yaml

mateclaw:
  memory:
    dream-v2:
      enabled: true
      fact-projection: true
      contradictions: true
      morning-card: true

Agents reading and writing their own memory

Memory isn't just something that happens to an agent. The agent itself can actively read and write its own files during a conversation, through a set of workspace memory tools:

Method	What it does
`list_workspace_memory_files`	List files, optional filename prefix filter, sorted by `sort_order`
`read_workspace_memory_file`	Read a specific file's content
`write_workspace_memory_file`	Create or overwrite a file (full replace)
`edit_workspace_memory_file`	Find-and-replace edit (incremental, `replaceAll` supported)

Keyword search over its own memory

New in 1.4.0

An employee can do more than read whole files — during a conversation it can search all of its workspace memory files by keyword and jump straight to the line.

This is an agent runtime capability: the employee supplies a keyword and the system searches across its own workspace memory files:

Tokenization — CJK is split into 2-character sliding windows, Latin text on whitespace, so both languages match
Per-file weighted scoring — hits in core files like AGENTS.md / MEMORY.md / PROFILE.md rank above hits in the daily ledger
What comes back — each hit gives a filename + line number + an 80-char context snippet (matched term highlighted) + a relevance score
Scan scope — up to ~50 candidate files, sorted by score, highest first

Use it when the employee wants to confirm "did I note this before?" or recover a specific decision spread across many days of notes — without pulling whole files into context.
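
The tokenization rule is worth seeing concretely. The sketch below is an approximation: it treats only the basic CJK Unified Ideographs block as CJK, lowercases Latin tokens, and keeps a lone CJK character as its own token, all of which are assumptions beyond what the rule above states.

```python
import re
from typing import List

# Assumption: only the basic CJK Unified Ideographs block counts as CJK.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def tokenize(text: str) -> List[str]:
    """Split Latin text on whitespace and CJK runs into 2-char windows."""
    tokens = []
    # Non-CJK text: blank out CJK runs, then split on whitespace.
    for word in CJK_RUN.sub(" ", text).split():
        tokens.append(word.lower())
    # CJK text: overlapping 2-character sliding windows per run.
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens
```

Sliding windows let a query like 数据 match inside 数据库迁移 without a word-segmentation dictionary.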

Examples

List:

json

// in
{"agentId": 1, "filenamePrefix": "memory/"}
// out
{"agentId": 1, "count": 3, "files": [
  {"filename": "memory/2026-04-09.md", "enabled": false, "fileSize": 512},
  ...
]}

Read:

json

// in
{"agentId": 1, "filename": "MEMORY.md"}
// out
{"agentId": 1, "filename": "MEMORY.md", "enabled": true, "content": "..."}

Edit:

json

// in
{"agentId": 1, "filename": "MEMORY.md", "oldText": "old", "newText": "new"}
// out
{"agentId": 1, "filename": "MEMORY.md", "replacements": 1}

Safety rules

.md files only
No absolute paths, no .. directory traversal
write is a full overwrite — read first if you care about existing content
Newly created files have enabled=false by default
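
The first two rules amount to a filename check along these lines. The function name is hypothetical, and rejecting backslashes is an extra precaution added for the sketch, not a documented rule.

```python
from pathlib import PurePosixPath


def is_safe_memory_filename(filename: str) -> bool:
    """Accept only relative .md paths with no '..' components."""
    if not filename or "\\" in filename:
        return False  # backslash rejection is this sketch's own addition
    path = PurePosixPath(filename)
    if path.is_absolute():
        return False
    if ".." in path.parts:
        return False
    return path.suffix == ".md"
```

Checking path components rather than searching the raw string for `..` avoids rejecting legitimate names such as `notes..draft.md`.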

Memory snapshot export / import

New in 1.4.0

An employee's entire accumulated memory can be packaged into a ZIP and taken with you — for backup, migration to another deployment, or cloning a coworker who "already knows you."

A snapshot packages an employee's core memory into a single ZIP:

AGENTS.md / MEMORY.md / PROFILE.md / SOUL.md / KNOWLEDGE.md
daily ledger files (memory/YYYY-MM-DD.md)
a manifest.json (what's in the package, and which employee it came from)

Three endpoints

Method	Path	Role	What it does
GET	`/api/v1/agents/{agentId}/workspace/memory/export`	Viewer	Export the ZIP — even read-only access can take a backup
POST	`.../workspace/memory/import/preview`	Member	Dry run: parse the ZIP, classify each file as create / update / skip, write nothing
POST	`.../workspace/memory/import`	Member	Apply the import, written atomically

Preview to see the diff, confirm, then import — you always know what will change before it does.
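
The dry run boils down to comparing the snapshot with what the target already holds. In this sketch both sides are filename-to-content dicts, and "skip" is taken to mean the content is already identical; that reading of skip, and the function name, are assumptions.

```python
from typing import Dict


def preview_import(snapshot: Dict[str, str],
                   existing: Dict[str, str]) -> Dict[str, str]:
    """Classify each snapshot file as create / update / skip; write nothing."""
    plan = {}
    for name, content in snapshot.items():
        if name not in existing:
            plan[name] = "create"
        elif existing[name] == content:
            plan[name] = "skip"    # assumed: identical content
        else:
            plan[name] = "update"
    return plan
```

Because the preview is a pure function of the two inputs, running it never changes the target workspace.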

Safety guards

Whitelist — only the file types listed above are accepted; everything else is ignored
Zip-bomb guards — ≤ 500 entries, ≤ 1 MB each (uncompressed), ≤ 16 MB total; anything over is rejected
UI toggle state is not serialized — enabled / sortOrder are kept out of the snapshot; on import into a new employee the target decides them by seed rules, rather than forcing the source's toggle state
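
The zip-bomb limits can be checked before any entry is extracted. The sketch below uses Python's standard zipfile module and takes 1 MB as 1024 x 1024 bytes, which is an assumption. It trusts the sizes declared in the archive's directory, so a hardened implementation would also count bytes while actually reading each entry.

```python
import io
import zipfile
from typing import List

MAX_ENTRIES = 500
MAX_ENTRY_BYTES = 1 * 1024 * 1024    # assumption: 1 MB = 1024 * 1024 bytes
MAX_TOTAL_BYTES = 16 * 1024 * 1024


def check_snapshot(zip_bytes: bytes) -> List[str]:
    """Validate declared sizes and return entry names, or raise ValueError."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            raise ValueError("too many entries")
        total = 0
        for info in infos:
            # file_size is the uncompressed size declared for the entry.
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("snapshot too large")
        return [info.filename for info in infos]
```

Checking uncompressed sizes matters because a few kilobytes of compressed input can expand into gigabytes.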

UI

The Agent Context page right panel has Export / Import buttons
Import shows a diff first (what's created, overwritten, skipped) and only writes after you confirm

Configuration reference

Memory extraction & consolidation

yaml

mate:
  memory:
    # --- Automatic extraction ---
    auto-summarize-enabled: true
    min-messages-for-summarize: 4
    min-user-message-length: 10
    skip-cron-conversations: true
    summary-max-tokens: 1000
    max-transcript-messages: 30

    # --- Concurrency ---
    cooldown-minutes: 5

    # --- Consolidation / dreaming ---
    emergence-enabled: true
    emergence-day-range: 7

    # --- per-owner memory isolation (1.5.0) ---
    # The value shipped with the release is true (on): conversation extraction writes to the owner's
    # PERSONAL memory and recall filters by owner_key. Set false for the old shared behavior (all writes
    # to TEAM). The bare Java-property default is false.
    lifecycle-mediator-enabled: true

Prefix: mate.memory.

Context window

yaml

mate:
  agent:
    conversation:
      window:
        default-max-input-tokens: 128000
        compact-trigger-ratio: 0.75
        preserve-recent-pairs: 2
        summary-max-tokens: 300

API endpoints

Method	Path	Purpose
POST	`/api/v1/memory/{agentId}/emergence`	Manually trigger consolidation
POST	`/api/v1/memory/{agentId}/summarize/{conversationId}`	Manually trigger extraction
GET	`/api/v1/memory/{agentId}/dreaming/status`	Last run, next run, latest DREAMS.md entry

For developers extending the memory layer, see Architecture.

Next

Agents — how agents use memory during a turn
LLM Wiki — the deliberate knowledge layer, contrasted with passive memory
Tools — the workspace memory tool is one of many
Configuration — full config reference
Architecture — backend code organization, SPI extension points
