v1.5.0

Stable · 2026-06-04 · Previous stable v1.4.0

Three big things this release

Let me say it straight.

In v1.4.0 we made the employee more autonomous — you set a goal, it locks on, self-checks, keeps itself going. But that self-check was fuzzy: the evaluator gave a 0–1 score, and you couldn't see what was missing or how many sub-tasks were left.

This release is about three things: making autonomy verifiable, making the knowledge base self-maintaining, and making the whole system genuinely multi-user.

First, goals grew a checklist. No more fuzzy score. The employee breaks a goal into a few independently verifiable criteria and the evaluator ticks them off one by one, every turn. Done means every box ticked — no "95% is close enough." The ring around the avatar, on hover, is now a checklist you can read box by box.

Second, the Wiki learned to maintain itself. Pages interlink with [[wikilinks]]; renames and deletes cascade-fix the links; and a one-click lint finds broken links. Knowledge splits into a fact layer and an experience layer: change a fact page and the experience pages that depend on it auto-flag as "needs review." Page types (pageType) carry schemas and permissions, so which employee can read/write which kind of page is controlled. You can attach processing pipelines that fire when a page hits some condition. And a local directory can be mounted as a knowledge source with scheduled incremental sync.

Third, memory knows who's who. Before, an employee's memory was one big pot — whoever chatted, it all piled into the same MEMORY.md. Now every memory carries an owner_key and a visibility scope (personal / team / global). One employee serving a group keeps each person's private memory separate; third-party APIs can even pass through an end-user identity to isolate memory per end user.

Plus two medium things: each employee can bind one primary knowledge base, and model selection actually honors your preferred provider. And a pile of polish.

That's it.


1. Goals grew a checklist — from "a score" to "ticked boxes"

In v1.4.0 the evaluator returned a completion score (0–1) and a one-line "what's missing" each turn. The problem: what does 0.8 mean? Which parts are done, which aren't, how many steps remain — you couldn't see it, and the employee was deciding whether to continue off a fuzzy number too.

This release replaces that with a checklist.

A goal = a set of independently verifiable criteria. You say "deploy the blog to fly.io" and the employee (or evaluator) breaks it into concrete criteria: DNS resolves correctly, SSL cert valid, health check passes, smoke tests green. Each one is a sentence a human can read and an LLM can judge.

The evaluator has two modes:

| Mode | When | What it does |
| --- | --- | --- |
| bootstrap | No criteria yet | Decomposes the goal into a checklist; each starts "not passed" |
| verdict | Criteria exist | Judges each one: satisfied? with evidence |

Both modes use structured output — the evaluator must return a typed object (criterion id + passed + evidence), not free text we have to parse.

Completion is now deterministic. There used to be a fuzzy threshold. Now: completion only when every criterion passes. 19 of 20 passed (a 0.95 score) is still "continue," not "done." Miss one, and one is missing.

Auto-followup targets the remaining criteria. With autoFollowup on, if the employee finishes a turn without ticking everything, the injected follow-up prompt lists the criteria still open — "5/8 done, remaining: ① … ② …, take the next step on these" — instead of a vague "continue."
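
To make the "all boxes ticked" rule and the targeted follow-up concrete, here is a minimal sketch in Python. It is not the shipped evaluator: the Criterion class, the function names, the evidence strings, and the exact prompt wording are all illustrative assumptions, and treating an empty checklist as "not complete" is an assumption too.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    # Mirrors the typed verdict described above: id + passed + evidence.
    id: str
    text: str
    passed: bool = False
    evidence: str = ""


def is_complete(criteria):
    """Done only when the checklist is non-empty and every criterion passed."""
    return bool(criteria) and all(c.passed for c in criteria)


def followup_prompt(criteria):
    """Build a follow-up that names the criteria still open (wording is illustrative)."""
    open_items = [c for c in criteria if not c.passed]
    done = len(criteria) - len(open_items)
    remaining = "; ".join(f"({i}) {c.text}" for i, c in enumerate(open_items, 1))
    return f"{done}/{len(criteria)} done, remaining: {remaining}. Take the next step on these."


# Example checklist for "deploy the blog to fly.io"; evidence strings are made up.
checklist = [
    Criterion("c1", "DNS resolves correctly", True, "example evidence"),
    Criterion("c2", "SSL cert valid", True, "example evidence"),
    Criterion("c3", "health check passes"),
    Criterion("c4", "smoke tests green"),
]
```

With two of four criteria passed, is_complete(checklist) is False and the follow-up lists only the two open items, which is the behavior the paragraph above describes.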

The ring, on hover, is a checklist card. With no checklist it's a one-line tooltip (title + what's missing); with a checklist it's a card: title + X/Y progress, then each criterion shown with a status marker, either open or done (green, struck through). While evaluating, a sand-gold breathing halo surrounds the avatar.

Evaluator SPI. The evaluation logic implements Spring AI's Evaluator interface, so it handles goal-specific checklist verdicts and can also be reused as a generic evaluator. Failed evaluator calls still count against the LLM budget (no free rides), so the budget accounting stays honest.

One new tool, one new API endpoint, and one creation-time option:

  • Agent tool addGoalCriterion — append a criterion to a live goal without restarting it
  • REST POST /api/v1/goals/{id}/criteria — append a criterion programmatically
  • Goal creation can carry an initial checklist directly: criteria: ["...", "..."] (skips the bootstrap round)
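
A rough sketch of calling the new endpoint from Python, using only the standard library. Only the path POST /api/v1/goals/{id}/criteria comes from these notes. The base URL, the Bearer header, and the "criterion" body field are assumptions for illustration, so check the API Reference for the real request shape.

```python
import json
import urllib.request


def build_add_criterion_request(base_url, goal_id, text, token):
    """Build (but do not send) a request that appends one criterion to a live goal."""
    # ASSUMPTION: the body field name "criterion" is hypothetical.
    body = json.dumps({"criterion": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/goals/{goal_id}/criteria",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: a personal access token sent as a Bearer header.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_add_criterion_request("http://localhost:8080", 42, "smoke tests green", "YOUR_PAT")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```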

New config keys (mateclaw.goal.*): default-auto-followup (create-time default for auto-followup), allow-auto-followup (runtime master switch — off means no goal injects a follow-up), max-followups-per-run (hard cap on auto-followups within a single graph run, default 8).
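
One way to read how these three keys interact, sketched in Python. The function and parameter names are hypothetical. The sketch assumes the per-goal flag was seeded from default-auto-followup at creation time, that allow-auto-followup is checked at runtime, and that the cap counts follow-ups already injected in the current run.

```python
def should_inject_followup(goal_auto_followup, allow_auto_followup,
                           followups_this_run, max_followups_per_run=8):
    """Decide whether one more auto-followup may be injected in this graph run."""
    if not allow_auto_followup:
        # Runtime master switch: off means no goal injects a follow-up.
        return False
    if not goal_auto_followup:
        # Per-goal setting, assumed to be seeded from default-auto-followup.
        return False
    # Hard cap within a single run (default 8 per the notes above).
    return followups_this_run < max_followups_per_run
```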

Full details in Persistent Goals.

The v1.4.0 goal was "the employee remembers what it's doing." The v1.5.0 goal is "the employee knows exactly which boxes are still open." From a score to a checklist you can tick.


2. The Wiki learned to maintain itself

This is the heaviest chunk of the release. The Wiki grew from "a searchable knowledge base" into "a knowledge engine that maintains its own consistency, layers itself, and runs its own pipelines."

Write [[target-slug]] or [[slug|display text]] in a page body to link to another page.

  • Slug-first resolution — links match by slug exactly (case-insensitive), no fuzzy guessing. [[...]] inside fenced (```) or inline code is left alone.
  • Rename / delete cascade: rename a page's slug and, in the same transaction, every wikilink across the KB pointing at the old slug is rewritten with its alias text preserved ([[oldslug|x]] → [[newslug|x]]). Cascade delete cleans up references too.
  • Broken-link lint: POST .../lint/broken-links starts an async scan job; results are persisted onto the page rows, so they survive a restart. GET .../lint/broken-links returns the aggregate (how many pages have broken links, total broken refs).
  • Clickable wikilinks in chat — wikilinks rendered in a chat answer are clickable; a cross-KB lookup navigates to the target page.
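
The rename cascade and the "leave code alone" rule can be sketched with one regular expression. This is an illustrative Python approximation, not the actual implementation: it matches fenced blocks and inline code first so that any [[...]] inside them is skipped, then rewrites exact (case-insensitive) slug matches and keeps the alias. The function name is hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the sketch easy to embed

# Order matters: fenced blocks, then inline code, then a wikilink with optional |alias.
_TOKEN = re.compile(
    "(?P<fence>" + FENCE + ".*?" + FENCE + ")"
    + r"|(?P<inline>`[^`\n]*`)"
    + r"|\[\[(?P<slug>[^\]|]+)(?P<alias>\|[^\]]*)?\]\]",
    re.DOTALL,
)


def rewrite_wikilinks(body, old_slug, new_slug):
    """Rewrite [[old_slug]] and [[old_slug|alias]] outside code, keeping the alias."""
    def repl(match):
        slug = match.group("slug")
        if slug is None:
            return match.group(0)  # a code span or fenced block: leave untouched
        if slug.strip().lower() != old_slug.lower():
            return match.group(0)  # a link to some other page
        return f"[[{new_slug}{match.group('alias') or ''}]]"

    return _TOKEN.sub(repl, body)
```

Running this over every page body in the same transaction as the slug change would give the cascade described above; the transaction handling itself is outside the sketch.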

Knowledge is layered — fact vs experience

Each page can carry a knowledge layer:

  • fact — "what is": foundational fact pages (unlabeled defaults to fact)
  • experience — "what it means": synthesis, analysis, insight, which depends on a set of fact pages

Staleness propagates. An experience page declares which fact pages it depends on (edges stored by page id, so renames don't break them). When a fact page is updated during ingest, every experience page depending on it is auto-marked stale (needs review), with a reason recorded. The wiki_stale_pages tool lists everything currently flagged.
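
A small in-memory sketch of that propagation, assuming nothing about the real storage. The class and method names are hypothetical; the only points taken from the notes are that edges are keyed by page id and that a fact update flags its dependents with a reason.

```python
from collections import defaultdict


class StalenessGraph:
    def __init__(self):
        self.dependents = defaultdict(set)  # fact page id -> experience page ids
        self.stale = {}                     # experience page id -> reason

    def declare(self, experience_id, fact_ids):
        """Record that an experience page depends on the given fact pages."""
        for fact_id in fact_ids:
            self.dependents[fact_id].add(experience_id)

    def fact_updated(self, fact_id, reason):
        """Flag every experience page that depends on the updated fact page."""
        flagged = sorted(self.dependents.get(fact_id, ()))
        for experience_id in flagged:
            self.stale[experience_id] = reason
        return flagged

    def stale_pages(self):
        """Rough analogue of what a wiki_stale_pages listing would return."""
        return sorted(self.stale)
```

Because the edges use ids rather than slugs, renaming a fact page leaves the graph untouched.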

Search can filter by knowledge layer (facts only / experience only / all).

Page types now have profiles and permissions

pageType profile (KB-scoped) — defines which page types a KB has (e.g. "concept / tutorial / decision record"), each carrying: a structured-field schema, route/create/merge-stage prompts, and a Markdown template. New pages get their metadata schema-validated on save, with the validation status recorded. At most one enabled profile per KB; unconfigured KBs use a built-in default.

pageType permissions (per-agent) — for "this agent + this KB + this page type" you can set read/create/update/delete flags plus a write policy (allow immediate / approval_required / deny). page_type='*' is the KB-wide default; exact matches beat the wildcard. Read and write fall back differently: an unmatched read falls back to the KB-level default read policy (allow_all by default, so KBs stay fully readable after upgrade); write is opt-in tightened — allow with no rules, but once any rule exists the KB is "locked down" and a page type with no matching rule resolves to deny (fail-safe).
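
The asymmetric fallback is easier to see in code. The following Python sketch is a simplification under stated assumptions: it collapses the create/update/delete flags into a single write policy, uses "allow" as a stand-in name for the immediate-allow policy, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    page_type: str               # exact type name, or "*" for the KB-wide default
    can_read: bool = True
    write_policy: str = "allow"  # "allow" | "approval_required" | "deny" (names assumed)


def _match(rules, page_type):
    """Exact page-type match beats the "*" wildcard."""
    by_type = {rule.page_type: rule for rule in rules}
    return by_type.get(page_type) or by_type.get("*")


def can_read(rules, page_type, kb_default_read="allow_all"):
    rule = _match(rules, page_type)
    if rule is None:
        # Unmatched read falls back to the KB-level default read policy.
        return kb_default_read == "allow_all"
    return rule.can_read


def write_policy(rules, page_type):
    if not rules:
        return "allow"  # no rules at all: writes stay open
    rule = _match(rules, page_type)
    # Once any rule exists the KB is locked down: no match resolves to deny.
    return rule.write_policy if rule else "deny"
```

Note the design consequence: adding a single rule for one page type is enough to turn every other page type in that KB into write-deny for that agent, unless a "*" rule says otherwise.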

Knowledge bases can run pipelines

Wiki Pipeline — define a processing flow for a KB, fired automatically by page events:

  • Triggers: page_type_count (a page-type count crosses a threshold), page_created (a page of a given type is created), stale_marked (pages get flagged stale)
  • Step executors: llm (run input through the model, output becomes the step result), skill (run a skill from a restricted set, as the owner agent)
  • Definitions are written in YAML or JSON, with CRUD + validate endpoints; every run and every step is persisted and queryable (.../pipelines/{id}/runs, .../pipeline-runs/{runId}), deduplicated by (definition, trigger, subject, bucket) for idempotency.
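
The idempotency key in the last bullet can be illustrated with a tiny ledger. This is a sketch only: the notes do not say what the bucket is, so the hourly time bucket here is an assumption, and the class and function names are hypothetical.

```python
class PipelineRunLedger:
    """Remembers which (definition, trigger, subject, bucket) keys already ran."""

    def __init__(self):
        self._seen = set()

    def try_start(self, definition_id, trigger, subject, bucket):
        """Return True the first time a key is seen, False for duplicates."""
        key = (definition_id, trigger, subject, bucket)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def hour_bucket(epoch_seconds):
    # ASSUMPTION: the bucket is a coarse time window; one hour is an arbitrary choice.
    return epoch_seconds // 3600
```

Two page_created events for the same page inside one bucket start a single run; the same event in a later bucket starts a new one.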

A local directory can be a knowledge source — pluggable + scheduled incremental

Ingest-Source SPI — knowledge sources are a pluggable interface (WikiIngestSourceProvider) with a built-in filesystem provider: give a KB a source_directory and files in it get ingested.

  • Scheduled incremental sync — a background scheduler (@SchedulerLock so only one node runs per cycle) scans periodically, detects changes by content hash, and re-ingests only new/modified files (text and binary).
  • Security is fail-closed — paths are normalized then toRealPath()-resolved to follow symlinks (closing TOCTOU), and validated against an allowed-roots allowlist; under the production profile an empty allowlist rejects everything by default.
  • Status + manual trigger: GET .../source-watcher shows watcher status, POST .../source-watcher/scan runs a scan immediately.
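
A Python approximation of the two checks described in this list: the fail-closed path validation and the content-hash change detection. The real provider is Java (toRealPath()); here os.path.realpath plays a similar role. Function names are hypothetical, and allowing an empty allowlist outside production is an assumption, since the notes only specify the production behavior.

```python
import hashlib
import os


def is_allowed_source(path, allowed_roots, production=True):
    """Resolve symlinks, then require the real path to sit under an allowed root."""
    if not allowed_roots:
        # Fail closed: an empty allowlist rejects everything in production.
        return not production
    real = os.path.realpath(path)  # normalizes ".." and follows symlinks
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        try:
            if os.path.commonpath([real, real_root]) == real_root:
                return True
        except ValueError:
            continue  # e.g. paths on different drives
    return False


def changed_files(current, previous_hashes):
    """current: {relative path: bytes}. Return paths whose SHA-256 is new or different."""
    changed = []
    for name, data in sorted(current.items()):
        digest = hashlib.sha256(data).hexdigest()
        if previous_hashes.get(name) != digest:
            changed.append(name)
    return changed
```

Comparing resolved paths rather than raw strings is what stops a ../ segment or a symlink from escaping the allowed root.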

New Wiki tools

  • wiki_update_page: in-place edit of a page (keeps the slug), gated by the pageType "update" permission
  • wiki_stale_pages — list every page currently flagged for review

All wiki write tools (create / update / archive / delete) now pass through pageType permission gating; read tools filter lists and search results by pageType readability (an unreadable type is treated as nonexistent — no existence leak).

The admin console gains a Wiki advanced management panel (five sub-pages: page-type profile / layers & staleness / permissions / source watcher / pipelines). Full details in LLM Wiki.


3. Memory knows who's who — per-owner isolation

Before, an employee's memory was shared: whether it was you logged into the web, a colleague in a Feishu group, or an end user coming in through a third-party API, the memory piled into the same MEMORY.md. One employee serving multiple people would cross wires.

This release gives every memory an owner and a visibility scope.

A unified owner_key. Whatever the identity source, it normalizes to one prefixed string:

| Source | owner_key |
| --- | --- |
| Web console | user:<user id> |
| IM channel (Feishu/DingTalk/WeCom…) | <channel>:<sender id> |
| Third-party API (with endUserId) | api:<endUserId> |
| System / cron | system |
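
The normalization in the table can be sketched as a single function. The source labels ("web", "im", "api", "system") and the parameter names are hypothetical; only the resulting key formats come from the table. The sketch deliberately raises for an API call without an end-user id, since the table does not cover that case.

```python
def owner_key(source, user_id=None, channel=None, sender_id=None, end_user_id=None):
    """Normalize an identity from any source into one prefixed string."""
    if source == "web":
        return f"user:{user_id}"
    if source == "im":
        return f"{channel}:{sender_id}"
    if source == "api":
        if end_user_id is None:
            raise ValueError("this sketch only covers API calls that carry an end-user id")
        return f"api:{end_user_id}"
    if source == "system":
        return "system"
    raise ValueError(f"unknown identity source: {source}")
```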

Three visibility scopes:

  • PERSONAL — only the matching owner can read it. Memory extracted from conversations defaults here.
  • TEAM — everyone using this employee can read it. Agent config files (AGENTS.md / SOUL.md / PROFILE.md) and backfilled legacy data live here.
  • GLOBAL — always visible across employees/workspaces. Preset facts, system reference material.

Recall prefers personal memory. The system prompt bakes in only the shared TEAM/GLOBAL memory (cacheable); each turn then prefetches that owner's personal memory files by owner_key — so when someone asks "what stack does my project use," the employee recalls that person's private memory first, not generic KB material. (The fact recall query supports owner-visibility filtering too; per-owner automatic fact projection is still being filled in.)
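
A minimal sketch of the visibility rule and the "personal first" ordering, for a single employee. The Memory class and function names are hypothetical, and the sketch ignores the cross-employee side of GLOBAL as well as the system-prompt caching described above.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    owner_key: str
    scope: str  # "PERSONAL" | "TEAM" | "GLOBAL"


def visible_to(memory, caller_owner_key):
    """PERSONAL is owner-only; TEAM and GLOBAL are readable by every caller."""
    if memory.scope == "PERSONAL":
        return memory.owner_key == caller_owner_key
    return memory.scope in ("TEAM", "GLOBAL")


def recall(memories, caller_owner_key):
    """Return visible memories with the caller's personal ones first."""
    visible = [m for m in memories if visible_to(m, caller_owner_key)]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(visible, key=lambda m: m.scope != "PERSONAL")
```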

Third-party APIs can pass through an end-user identity. /api/v1/chat and /api/v1/chat/stream request bodies gain an optional endUserId field (a string, to preserve large-integer precision). One PAT-authenticated integration represents one MateClaw user but can pass a distinct endUserId per end user, and memory isolates per end user automatically.
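
A short sketch of building such a request body in Python. Only the endUserId field and its string type come from these notes; the "message" field is a placeholder for whatever the chat body already contains. The example id is 2^53 + 1, a value that a JSON number parsed as a double cannot hold exactly, which is why sending it as a string matters.

```python
import json


def chat_body(message, end_user_id=None):
    """Serialize a chat request, passing the end-user id through as a string."""
    body = {"message": message}  # ASSUMPTION: "message" is a placeholder field name
    if end_user_id is not None:
        # Always a string, never a JSON number, so large integer ids keep full precision.
        body["endUserId"] = str(end_user_id)
    return json.dumps(body)


payload = chat_body("what stack does my project use?", 9007199254740993)
```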

It's a feature flag. The master switch is mate.memory.lifecycle-mediator-enabled — the bare Java-property default is false, but the application.yml shipped with the release sets it to true, so isolation is on in a default install. To go back to the old shared behavior (all writes to TEAM), set it to false explicitly in your config.

Migration V137 adds owner_key + scope columns to mate_workspace_file / mate_memory_recall / mate_fact, backfilling legacy rows as TEAM (so no memory gets hidden on upgrade). Full details in Memory.


4. Each employee binds one primary knowledge base

The agent editor gains a "Knowledge Base" tab where you can designate a primary knowledge base per employee.

  • The KB is still a workspace-shared resource. Binding doesn't change the KB's ownership or visibility — other employees can still access the same KB.
  • "Primary KB" is just a default. It tells the wiki tools: when a call doesn't name a kbName / kbId explicitly, default to this KB. Unset falls back to the workspace's most recently updated KB.

Stored on mate_agent.primary_kb_id (migration V130). Versions before 1.5.0 stored this on mate_wiki_knowledge_base.agent_id (one-to-one, exclusive); V130 backfills the old value into primary_kb_id and keeps the old column as a read-only fallback. See API Reference and Agent Engine.
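
The default-resolution order described in this section, as a hedged Python sketch. The function name and the (kb_id, updated_at) tuple shape are illustrative assumptions; the order itself (explicit KB, then primary KB, then the workspace's most recently updated KB) follows the bullets above.

```python
def resolve_kb(explicit_kb_id, primary_kb_id, workspace_kbs):
    """Pick the KB a wiki tool call should use.

    workspace_kbs is a list of (kb_id, updated_at) tuples.
    """
    if explicit_kb_id is not None:
        return explicit_kb_id   # the call named a KB explicitly
    if primary_kb_id is not None:
        return primary_kb_id    # the employee's bound default
    if not workspace_kbs:
        return None             # nothing to fall back to
    # Unset primary: fall back to the most recently updated KB in the workspace.
    return max(workspace_kbs, key=lambda kb: kb[1])[0]
```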


5. Model selection actually honors you

Preferred provider drives the primary model. Once you set preferred providers on an employee (mate_agent_provider_preference, ordered by sortOrder), primary-model selection actually follows that preference. The full precedence:

  1. A conversation-pinned model wins — the chat header bound a model to this conversation
  2. then the per-agent model override (modelName) — the employee has a model pinned on it
  3. then the global default model
  4. only when none of those are set does preferred-provider routing kick in — picking the preferred provider's primary model

Preferred-provider routing has a capability gate: if the employee's bound skills declare a need like requires-model: vision, routing first picks a provider that can satisfy those modalities; only if none can does it fall back unconstrained.
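
The precedence and the capability gate together, sketched in Python. The dictionary keys other than sortOrder, the model names in the usage, and the function name are hypothetical; the ordering logic follows the numbered list and the gate paragraph above.

```python
def select_model(conversation_model, agent_model, global_default,
                 preferred_providers, required_modalities=()):
    """Return the model name to use, or None if nothing is configured.

    preferred_providers: list of dicts with "sortOrder", "primary_model",
    and "modalities" (a set of capability names).
    """
    # Steps 1 to 3: the first pinned value wins, in this order.
    for pinned in (conversation_model, agent_model, global_default):
        if pinned:
            return pinned
    # Step 4: preferred-provider routing, ordered by sortOrder.
    ordered = sorted(preferred_providers, key=lambda p: p["sortOrder"])
    needed = set(required_modalities)
    capable = [p for p in ordered if needed <= p["modalities"]]
    # Capability gate: prefer providers that satisfy the skills' needs,
    # and fall back unconstrained only if none can.
    chosen = capable or ordered
    return chosen[0]["primary_model"] if chosen else None
```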

New Claude Opus 4.8 model entries (migration V131): Anthropic-direct claude-opus-4-8 / claude-opus-4-8-fast, the OpenRouter passthrough equivalents, and a Claude Code OAuth entry. Both variants support the xhigh thinking tier. Not set as the default — admins designate one explicitly after install.

See Model Configuration.


A few more things

Chat experience:

  • Execution-plan & tool-call detail viewer — every plan step and every tool-call row gets a "view details" icon on the right; click it for a frosted-glass dialog showing the full request arguments and response output (the parts the inline preview truncates), with copy buttons. The data lives in message metadata, so it survives a page reload.
  • /skill slash menu in chat — type / in the composer to open a searchable skill picker: ↑↓ to move, Enter/Tab to select, Esc to close. Selecting inserts Use the "skill name" skill: into the box; you add context and send, and the employee runs load_skill to pull it. The list comes from GET /api/v1/skills/enabled (including MCP/ACP-derived virtual skills).

Generated files survive restarts (#243):

  • Tool-generated files (documents/images/audio…) are now persisted to disk under data/generated-files/, with a 7-day retention window + a 6-hour cleanup sweep, plus an in-memory LRU on top. Download links keep working after a restart and are no longer bounded by the old 10-minute in-memory window.
  • The frontend intercepts /api/v1/files/generated/{id} downloads via a global click delegator: success goes through an authenticated fetch → blob download; failure (404/410/expired) shows a toast — a dead link no longer wedges the whole SPA.

Channel / model reliability:

  • MCP tool read timeout default 30s → 60s (#247) — a single callTool round-trip that legitimately runs longer no longer gets cut off; each MCP server stays individually tunable in the UI (5–300s).
  • Shared inbound media pipeline: WeChat and WeCom are currently wired onto a shared inbound-media downloader + magic-byte type detection + exponential-backoff retry (other IM channels to follow). File types are decided from content bytes (no more hardcoded image/*); HEIC/WEBP/DOCX/XLSX and friends are detected correctly.
  • Feishu: follow-up text auto-carries recent files (#201) — send a file in a Feishu chat first (even without @-mentioning the employee), then a text message, and the cached files are auto-attached as content parts for the employee (5 files per chat, 60-minute TTL).
  • DashScope tool calls fixed: the web_search tool got an internal rename (DashScope's native protocol reserves search and rejects the whole request), and error classification was tightened. An illegal-tool-name InvalidParameter is no longer misread as "model not found," a misreading that used to evict a healthy model from the failover pool.
  • Plan-Execute triage rebalanced — whether to split into multiple steps is now decided by "does the goal decompose into clearly independent sub-tasks," not by difficulty; multi-part goals are no longer forced into single-step mode.
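
For the content-byte type detection mentioned in the shared inbound media pipeline item above, here is a small Python sketch of the general technique. It is not the shipped detector: the function name is hypothetical and the signature list is a short, illustrative subset. The byte signatures themselves are the standard ones for these formats.

```python
def sniff_type(data):
    """Guess a MIME type from the leading bytes instead of trusting a declared type."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[4:8] == b"ftyp" and data[8:12] in (b"heic", b"heix"):
        return "image/heic"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):
        # DOCX and XLSX are ZIP containers; telling them apart
        # requires inspecting the archive's entries.
        return "application/zip"
    return "application/octet-stream"
```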

Full list: git log v1.4.0..HEAD.


Upgrade path

Config is fully compatible. All your agents / skills / wikis / channels / cron / workflows / triggers / goals stay put across the upgrade.

New schema migrates automatically via Flyway. Goal checklist column (mate_agent_goal.criteria, V140), memory owner/scope (V137), per-agent primary KB (V130), Claude 4.8 model entries (V131), Wiki broken-links/layers/page-types/permissions/pipelines (V129, V133–V136), MCP default timeout (V139) all migrate on first startup; existing databases auto-baseline.

If you're already running v1.4.0 in production:

  • Goal checklists are additive — existing goals keep running on the old logic; only new goals use the checklist. With completion now meaning "all boxes ticked," auto-followup is more rigorous about finishing the remaining criteria — it stops when the budget is exhausted, so behavior stays bounded.
  • Memory isolation defaults on — the mate.memory.lifecycle-mediator-enabled value shipped with the release is true, so after upgrade conversation extraction writes per-owner PERSONAL memory and recall filters by owner_key. To keep the old shared behavior, set it to false explicitly (the bare Java-property default is false).
  • Wiki permissions default open — an agent with no pageType rules for a KB is fully read/write (old behavior). To lock a KB down, add rules in the Wiki advanced panel.
  • Source watcher follows config — it only scans a KB that has a source_directory set and the watcher enabled; under the production profile set the mate.wiki.allowed-source-roots allowlist.
  • MCP read timeout default becomes 60s — existing MCP server rows keep their stored value; new ones default to 60s.

What this release means for you

If you're a regular user

Set a goal with a checklist. "Translate this article, publish it, reply to comments" — it breaks it into a few criteria, ticks them off where you can see, and stops only when all are ticked. Hover the ring on the avatar to see which boxes are still open.

If you're building a knowledge base

Mount a local document directory as a knowledge source with scheduled incremental sync; weave pages together with [[wikilinks]] so renames and deletes don't leave dead links; mark "conclusions that move with the facts" as the experience layer, and a fact change auto-prompts you to review them.

If you manage a team

Turn on memory isolation so one employee serves the whole group without crossing private memories; use pageType permissions to control who can write which kind of KB page.

If you're a developer

POST /api/v1/goals/{id}/criteria to append criteria programmatically; pass endUserId to /api/v1/chat to isolate memory per end user; use Wiki pipelines + the Ingest-Source SPI to wire the KB into your own data flow.

If you run production

Upgrade. Disk-persisted generated files make download links survive restarts; the 60s MCP timeout stops slow tools getting cut; the DashScope tool-call and error-classification fixes make tool use on qwen-family models actually work.


One more thing.

Checklist. Layers. Owner.

An employee deciding "close enough" off a fuzzy score is estimating. An employee that splits the goal into criteria, ticks each, and stops only when all are ticked is doing acceptance.

Same with the knowledge base. A pile of searchable documents is a warehouse. A web of pages that interlink, layer themselves, and prompt you to review conclusions when a fact changes — that's knowledge.

Same with memory. One pot everyone stirs into will cross wires eventually. A memory that recognizes "whose memory is this" is one you can finally give to a group.

What this release did is make autonomy verifiable, knowledge self-maintaining, and memory personal.