v1.4.0

Stable · 2026-05-23 · Previous stable: v1.3.0

Five things

Let me cut to it.

In v1.3.0 we assembled employees into business processes — workflows and triggers, a group of employees collaborating on a procedure. But the procedure was fixed by you, and each employee still stopped after one turn.

This release we put the focus back on the employee: make a single employee more autonomous, able to build and lead a sub-team, with a toolset that scales with the task — then make the whole system multi-user and turn Feishu into a first-class citizen.

  • One: Persistent Goals are in. State a goal once; the employee locks it, self-evaluates every turn, and keeps itself going until done or out of budget.
  • Two: Subagent delegation became a tree. Employees delegate to employees, three levels deep; plus async delegation and a "digital-employee builder" that spins up a whole team from one sentence.
  • Three: Progressive tool/skill disclosure. Only core tools are advertised by default; the employee calls enable_tool / load_skill when it needs more. However many tools you install, the context doesn't blow up.
  • Four: Workspace RBAC. Owner / Admin / Member / Viewer, four roles plus capability gating; menus and endpoints close down by role. The first time MateClaw is usable by a team.
  • Five: Feishu is a first-class citizen. Interactive cards, approval cards, streaming cards, voice transcription, file/audio/video both ways, channel-native tools. Anything you can do in Feishu, the employee can do.

That's it.


1. Persistent Goals — the employee follows through, you don't nag every turn

You said "deploy this blog to fly.io," the employee answered one turn and stopped. Next turn you had to ask again: "Is DNS set up? The cert? Did the tests run?" — you were keeping the goal for it.

This release we flipped it. You say it once, the employee locks the goal, and self-checks every turn: what's still missing? Should I take another step myself?

It's not a new button in the chat. It's a state of the employee — a ring of light around the assistant avatar; how full it is, is how close it is to done. Done, the ring vanishes. Hover the avatar for the full tooltip (title + what's missing); don't hover, it doesn't nag you.

Three ways to set a goal:

  • Let the employee set it — when first describing the task, signal it's a long one and ask explicitly to "lock it with setGoal, turnBudget=8, autoFollowup on." The employee recognizes the signals and creates the goal.
  • Command the tool directly — tell it to call setGoal, with which params, and "don't ask for pre-confirmation."
  • Create it via API: POST /api/v1/goals, for automation and external scripts.
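For the API route, a request might be assembled like the sketch below. Only the path POST /api/v1/goals and the names turnBudget and autoFollowupEnabled come from these notes; the base URL, the conversationId and title fields, and the bearer-token header are assumptions for illustration, so check the Persistent Goals guide for the real schema.

```python
import json
import urllib.request


def build_goal_request(base_url, token, conversation_id, title, turn_budget=8):
    """Build (but do not send) a POST /api/v1/goals request."""
    payload = {
        "conversationId": conversation_id,  # assumed field name
        "title": title,                     # assumed field name
        "turnBudget": turn_budget,
        "autoFollowupEnabled": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/goals",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
        method="POST",
    )


req = build_goal_request(
    "http://localhost:8080", "TOKEN", "conv-1", "Deploy this blog to fly.io"
)
# Send it with urllib.request.urlopen(req) once the fields match your server.
```

Building the request separately from sending it keeps the payload easy to inspect in a script before anything hits the server.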

Four built-in tools every employee has by default (agent-wide, system-level, no manual binding): setGoal (create), addGoalCriterion (append a criterion), completeGoal (mark done), getGoalStatus (check progress).

Auto-followup is the key. With autoFollowupEnabled on, after the employee answers a turn a lightweight evaluator scores completion (0–1) and "what's missing"; if it judges "continue," it injects a follow-up at the end of the conversation and the employee takes the next step on its own. What you feel: it answers a bit → pauses half a beat → keeps going, like a person finishing a step, thinking, and continuing.
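The answer → evaluate → follow-up cycle can be sketched as a small loop. This is not MateClaw's implementation: the function names, the Evaluation shape, and the rule "stop when the score reaches 1.0 or nothing is missing" are assumptions made to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    score: float        # completion estimate in [0, 1]
    missing: List[str]  # what the evaluator thinks is still missing


def run_with_followup(
    answer_turn: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
    turn_budget: int,
) -> Tuple[str, int]:
    """Answer, evaluate, and keep going until done or out of budget."""
    turns_used = 0
    prompt = "initial user request"
    while True:
        reply = answer_turn(prompt)   # the reply reaches the user first
        turns_used += 1
        verdict = evaluate(reply)     # evaluation runs after the answer
        if verdict.score >= 1.0 or not verdict.missing:
            return "completed", turns_used
        if turns_used >= turn_budget:
            return "exhausted", turns_used
        # The injected follow-up that lets the employee take the next step.
        prompt = "Continue. Still missing: " + "; ".join(verdict.missing)


# Toy employee that finishes one step per turn.
steps = ["DNS", "cert", "tests"]
done: List[str] = []


def fake_answer(prompt: str) -> str:
    done.append(steps[len(done)])
    return "finished " + done[-1]


def fake_evaluate(reply: str) -> Evaluation:
    return Evaluation(score=len(done) / len(steps), missing=steps[len(done):])


result = run_with_followup(fake_answer, fake_evaluate, turn_budget=8)
print(result)
```

With three steps and a budget of 8 the toy run completes on the third turn; with a budget of 2 it ends as exhausted instead.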

A few deliberate constraints:

  • Evaluation happens after the answer streams — it never blocks you from reading the reply; the ring updates a moment after the answer appears
  • One goal per conversation — at most one active goal at a time, kept concurrency-safe by a generated column + unique index
  • Subagents can't see the goal tools — the goal is the parent conversation's state; children are stateless executors (see section 2)
  • Budget exhausted means stop: turnsUsed >= turnBudget or LLM-call budget spent flips the state to exhausted, the ring turns red-orange, and you decide whether to add budget or let go
  • Terminal states don't revive — completed / exhausted / abandoned are the end; to continue, open a new goal, avoiding the budget-accounting mess of "restart"
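The "generated column + unique index" trick for one active goal per conversation can be shown with a toy schema. The sketch uses SQLite (3.31 or newer, for generated columns) purely for demonstration; the table and column names are hypothetical and the real MateClaw DDL is not shown in these notes.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE goal (
            id INTEGER PRIMARY KEY,
            conversation_id INTEGER NOT NULL,
            status TEXT NOT NULL DEFAULT 'active',
            -- NULL unless the goal is active; NULLs never collide in a unique index
            active_conversation_id INTEGER GENERATED ALWAYS AS (
                CASE WHEN status = 'active' THEN conversation_id END
            ) VIRTUAL
        );
        CREATE UNIQUE INDEX uq_goal_one_active ON goal (active_conversation_id);
    """)
    return conn


def create_goal(conn, conversation_id):
    """Return the new goal id, or None if the conversation already has an active goal."""
    try:
        cur = conn.execute(
            "INSERT INTO goal (conversation_id) VALUES (?)", (conversation_id,)
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def finish_goal(conn, goal_id, status):
    # Only active goals can move; completed / exhausted / abandoned stay terminal.
    conn.execute(
        "UPDATE goal SET status = ? WHERE id = ? AND status = 'active'",
        (status, goal_id),
    )
```

Because the database enforces uniqueness, two concurrent setGoal calls cannot both succeed, and a finished goal frees the slot for a new one without being revived itself.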

Point the evaluator at a cheap small model (mateclaw.goal.evaluator-model); on completion the employee syncs the goal summary into long-term memory. Full guide: Persistent Goals.

A Goal isn't a feature added to the employee. It changes the employee's state. The old employee "forgot when it answered." Now it remembers one thing across many turns: what it's doing, what's missing, when it's done.


2. Subagent delegation became a tree — employees build and lead teams

Since v1.1.0 an employee could delegate to another employee. But that was "single-level, synchronous, one at a time."

This release we made it a tree.

Recursive delegation, up to 3 levels deep. A parent delegates to a child, the child can delegate further — a "project manager" employee can spin up "frontend / backend / QA" employees, each delegating again. Every subagent has a stable subagentId + parentSubagentId + depth, with events relayed to the root conversation in real time.
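As a rough model of that tree, each node can carry the three identifiers and refuse to delegate past the depth limit. The class, the id format, and the convention that the root conversation sits at depth 0 are assumptions for illustration; only subagentId, parentSubagentId, depth, and the 3-level cap come from the text.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

MAX_DEPTH = 3        # "up to 3 levels deep"
_next_id = count(1)  # hypothetical id source


@dataclass
class Subagent:
    name: str
    depth: int  # 0 = root conversation (assumed convention)
    parent_subagent_id: Optional[str]
    subagent_id: str = field(default_factory=lambda: f"sub-{next(_next_id)}")
    children: List["Subagent"] = field(default_factory=list)

    def delegate(self, child_name: str) -> "Subagent":
        if self.depth >= MAX_DEPTH:
            raise RuntimeError(
                f"{self.name} is at depth {self.depth}; cannot delegate further"
            )
        child = Subagent(child_name, self.depth + 1, self.subagent_id)
        self.children.append(child)
        return child


def render(node: Subagent, lines: Optional[List[str]] = None) -> List[str]:
    """Flatten the tree into indented lines, one per subagent."""
    lines = [] if lines is None else lines
    lines.append("  " * node.depth + f"{node.name} (depth={node.depth})")
    for child in node.children:
        render(child, lines)
    return lines


root = Subagent("project-manager", 0, None)
backend = root.delegate("backend")
root.delegate("frontend")
api = backend.delegate("api")
tests = api.delegate("api-tests")
print("\n".join(render(root)))
```

The parent id on every node is what lets events from any level be attributed and relayed back up to the root conversation.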

Three delegation tools:

| Tool | Behavior |
| --- | --- |
| delegateToAgent | Synchronously delegate one child, return after its final result; optional inheritParentContext carries recent parent context over |
| delegateParallel | Fan out several children at once, return after they all collect |
| delegateAsync | Delegate in the background, get a task_id immediately, fetch later with taskOutput; long tasks don't block the parent conversation |

Async delegation has an attribution gate: taskOutput only lets the same conversation and same user fetch the result of a task they spawned; another conversation can't peek.
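One way to picture the attribution gate is a registry that stores the owner next to each background task. The class and method names below are hypothetical and a thread pool stands in for the real executor; the only behavior taken from the notes is "task_id now, result later, same conversation and same user only."

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncDelegations:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._tasks = {}  # task_id -> (conversation_id, user_id, future)

    def delegate_async(self, conversation_id, user_id, work):
        """Start work in the background and return a task id immediately."""
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = (conversation_id, user_id, self._pool.submit(work))
        return task_id

    def task_output(self, task_id, conversation_id, user_id, timeout=None):
        entry = self._tasks.get(task_id)
        # Unknown and foreign tasks get the same error, so ids can't be probed.
        if entry is None or entry[:2] != (conversation_id, user_id):
            raise PermissionError("task not found for this conversation and user")
        return entry[2].result(timeout=timeout)


delegations = AsyncDelegations()
task_id = delegations.delegate_async("conv-1", "alice", lambda: "market report")
print(delegations.task_output(task_id, "conv-1", "alice", timeout=5))
```

Returning the same error for "no such task" and "not your task" is a design choice in this sketch, not something the notes specify.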

Children deny a default set of tools: delegateToAgent / delegateParallel (recursion guard), the setGoal family and the remember family (goal and memory ownership stay with the parent), and create_employee (no recursive team-building). Tunable via mateclaw.delegation.child-denied-tools.

You can see the whole tree in the UI. A nested subagent timeline in the chat stream + an always-on plan panel — delegation start, each child's name / depth / task excerpt, completion badges on finish (success / timeout / error / duration / content length). The multi-level shape reads at a glance.

The "digital-employee builder" skill — spin up a team from one sentence. Describe a need, and this built-in skill: clarifies the requirement → designs 2–6 roles → creates each as a real employee via create_employee → chains them into a workflow draft for you to review. Its companion list_capability_catalog lets it look up which skills/tools are bindable before it acts. Employees are enabled on creation; binding mirrors template apply.

Long tasks no longer lose context to "prompt too long." Structured compaction runs a four-stage strategy (soft trim → hard clear → pre-prune → LLM structured summary), always preserving the prefix (system prompt + goal anchor), injecting the summary as a UserMessage. Delegation tool results are never compacted (child execution isn't reproducible). A 10-minute cooldown after a failed summary prevents cascades.
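A simplified sketch of staged compaction follows. It models only the first two stages (soft trim, hard clear) and leaves out pre-prune, the LLM summary, and the cooldown; character counts stand in for tokens, and the Msg shape, the limits, and keep_recent are all assumptions. What it does mirror from the text: stages run in order, the prefix is preserved, and delegation results are never touched.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Msg:
    role: str                 # "system", "user", "assistant", "tool"
    text: str
    pinned: bool = False      # prefix: system prompt + goal anchor
    delegation: bool = False  # delegation tool result, never compacted


def size(msgs: List[Msg]) -> int:
    return sum(len(m.text) for m in msgs)


def soft_trim(m: Msg, limit: int = 200) -> Msg:
    # Stage 1: shorten long tool output but keep its head.
    if m.role == "tool" and len(m.text) > limit:
        return replace(m, text=m.text[:limit] + " [trimmed]")
    return m


def hard_clear(m: Msg) -> Msg:
    # Stage 2: drop tool output entirely, leaving a placeholder.
    if m.role == "tool":
        return replace(m, text="[tool output cleared]")
    return m


def compact(msgs: List[Msg], budget: int, keep_recent: int = 2) -> List[Msg]:
    """Apply stages in order to older messages, stopping once under budget."""
    for stage in (soft_trim, hard_clear):
        if size(msgs) <= budget:
            break
        cutoff = len(msgs) - keep_recent
        msgs = [
            m if (m.pinned or m.delegation or i >= cutoff) else stage(m)
            for i, m in enumerate(msgs)
        ]
    return msgs


history = [
    Msg("system", "S" * 50, pinned=True),
    Msg("tool", "x" * 1000),
    Msg("tool", "d" * 1000, delegation=True),
    Msg("assistant", "a"),
    Msg("user", "u"),
]
```

Escalating only when the cheaper stage was not enough keeps as much original detail as the budget allows.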

One employee working alone is a tool. One employee working with a team it built itself is an organization.


3. Progressive tool/skill disclosure — however many tools, the context doesn't blow up

The old approach: take every tool an employee can use, with every tool's description, and dump it all into the system prompt. With many tools, just "what can I use" eats thousands of tokens — before the model does any work, half the context is gone.

That was the engineer's shortcut.

This release we switched to two-tier disclosure:

  • Core tier (CORE) — always visible to the model, callable out of the box
  • Extension tier (EXTENSION) — by default the system prompt lists only a compressed directory (name + source + one-line description), without the full schema. The employee opens it when needed.

Two new built-in tools are the switches:

  • enable_tool(toolName) — activate an extension-tier tool for the rest of the conversation. It validates the tool is in this employee's effective set; once active it's callable on the next reasoning turn of the current ReAct loop (ReasoningNode recomputes the toolset each turn, so it takes effect immediately)
  • load_skill(skillName, filePath?) — load a skill's SKILL.md on demand. The content is injected through message history (not the system prompt), which keeps the prompt cache stable and pins loaded skills to the top of later turns so it doesn't reload
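The two-tier idea can be reduced to a few lines: core schemas are always callable, extension tools show up only as one-line directory entries until enabled, and the callable set is recomputed each turn. Class names, tool entries, and schema shapes here are hypothetical; this is a sketch of the mechanism, not the ReasoningNode code.

```python
class ToolCatalog:
    def __init__(self, core, extension):
        self.core = dict(core)            # name -> full schema
        self.extension = dict(extension)  # name -> {"summary": ..., "schema": ...}


class Conversation:
    def __init__(self, catalog):
        self.catalog = catalog
        self.enabled = set()  # extension tools activated in this conversation

    def enable_tool(self, tool_name):
        if tool_name not in self.catalog.extension:
            raise KeyError(f"{tool_name} is not an extension tool for this employee")
        self.enabled.add(tool_name)

    def callable_tools(self):
        """Recomputed every reasoning turn, so enable_tool applies on the next one."""
        tools = dict(self.catalog.core)
        for name in self.enabled:
            tools[name] = self.catalog.extension[name]["schema"]
        return tools

    def directory(self):
        """Compressed listing for extension tools that are not enabled yet."""
        return [
            f"{name}: {entry['summary']}"
            for name, entry in sorted(self.catalog.extension.items())
            if name not in self.enabled
        ]


catalog = ToolCatalog(
    core={"read_file": {"params": {"path": "string"}}},
    extension={
        "image_generate": {
            "summary": "Generate an image from a prompt",
            "schema": {"params": {"prompt": "string"}},
        }
    },
)
conv = Conversation(catalog)
```

Before enable_tool runs, the prompt pays only for the one-line summary; the full schema is added once the employee asks for it.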

Default tiering: generative tools (image_generate / music_generate / video_generate / model3d_generate) and browser_use default to extension; everything else defaults to core. The Tools page has tier UI — built-in and channel tools get a per-row tier toggle so admins can move tools between core / extension (MCP / ACP sources are locked).

An escape hatch for conservative deployments: mateclaw.tools.disclosure.mode=legacy turns tiering off and advertises all tools again; mateclaw.skill.disclosure.load-skill-tool.enabled=false falls back to the old readSkillFile.

The system prompt should scale with the task, not with the total tool count. Install 50 tools on an employee and it shouldn't burn thousands of extra tokens every turn for the privilege.


4. Workspace RBAC — the first time MateClaw is usable by a team

MateClaw used to be a single-person system — one admin who saw and changed everything. Want to bring a colleague in? There was no "read-only," no "manage only your own area."

This release we laid the multi-user foundation. Four roles, capability gating.

| Role | What it can do |
| --- | --- |
| Viewer | Read-only: can chat, can view the Wiki; no management surfaces, can't create/modify |
| Member | Content contributor: on top of Viewer, can manage agents, manage the Wiki, view memory and the dashboard |
| Admin | Resource manager: on top of Member, can manage skills, channels, models, security, settings |
| Owner | Workspace owner: same capabilities as Admin, plus deleting the workspace and transferring ownership |

Capabilities are the backend's single source of truth. The backend RoleCapabilities defines the "role → capability set" mapping; the frontend never derives it locally — after a workspace switch or a capability-related 403, it calls GET /api/v1/workspaces/{id}/access for the effectiveRole + capabilities and renders from that.

  • Endpoints close down by role — system-level endpoints (models / providers / OAuth / datasources) require global admin (@RequireGlobalAdmin); skills / tools / plugins go by workspace role (reads need member, writes need admin)
  • Frontend routes and sidebar are capability-gated — routes declare requiredCapability, nav items filter by store.can(cap), no menu flash before capabilities load; a Viewer lands on /chat
  • Global admin vs workspace role: mate_user.role='admin' is the system-wide global admin (manages users, creates workspaces); mate_workspace_member.role is the per-workspace membership role. Global admins span all workspaces and hold owner-equivalent power in workspaces they haven't joined
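The "role → capability set" mapping and the client-side can(cap) check can be sketched as follows. The capability strings are invented for illustration (the notes don't list the real ones); the layering of the four roles, the effectiveRole + capabilities response, and the "no menu before capabilities load" rule follow the text.

```python
VIEWER = frozenset({"chat", "wiki.view"})
MEMBER = VIEWER | {"agents.manage", "wiki.manage", "memory.view", "dashboard.view"}
ADMIN = MEMBER | {
    "skills.manage", "channels.manage", "models.manage",
    "security.manage", "settings.manage",
}
OWNER = ADMIN | {"workspace.delete", "workspace.transfer"}

# Server side: the single source of truth.
ROLE_CAPABILITIES = {"viewer": VIEWER, "member": MEMBER, "admin": ADMIN, "owner": OWNER}


def access(role):
    """Shape of a hypothetical access response for one workspace."""
    return {"effectiveRole": role, "capabilities": sorted(ROLE_CAPABILITIES[role])}


class AccessStore:
    """Client side: renders from what the server returned, derives nothing."""

    def __init__(self):
        self.capabilities = None  # None means "not loaded yet"

    def load(self, response):
        self.capabilities = set(response["capabilities"])

    def can(self, cap):
        return self.capabilities is not None and cap in self.capabilities


NAV = [("Chat", "chat"), ("Agents", "agents.manage"), ("Models", "models.manage")]


def visible_nav(store):
    return [label for label, cap in NAV if store.can(cap)]


store = AccessStore()
```

Treating "not loaded" as "can do nothing" is what avoids a flash of menu items the user is not allowed to see.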

Member management out of the box. POST /api/v1/workspaces/{id}/members adds a member — if the user doesn't exist it creates the account (with a password); if it exists and a password is given it resets the password (handles re-adds). Changing roles and removing members require admin+; an owner can't be changed or removed.

Viewers can still chat normally — deliberately opened up this release: a Viewer can read the active model and read an employee's workspace files, otherwise "read-only" would mean you can't even chat.

Full guides: Security & Approval and Workspaces.

One person using MateClaw is a personal assistant. A team using MateClaw — each seeing only what they should — is an organization's operating system.


5. Feishu is a first-class citizen

Feishu used to be "receive and send text." This release we made Feishu deep — anything you can do in Feishu, the employee can do.

Interactive cards (Schema 2.0) — structured replies (JSON, Markdown with tables/headers/lists, long-form text) auto-render as Feishu interactive cards instead of an escaped string blob. Short plain text still goes as text. Controlled by the channel setting card_format (default auto); over Feishu's 32 KB payload ceiling it falls back to text.
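The text-versus-card decision might look roughly like this. The heuristics for "structured," the "text" value of card_format, the long-form threshold, and the card JSON shape are assumptions; only the auto default and the fallback to text above 32 KB come from the notes.

```python
import json

MAX_CARD_BYTES = 32 * 1024  # "32 KB payload ceiling"; exact byte count assumed


def looks_structured(text, long_form_chars=1000):
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return True
        except ValueError:
            pass
    lines = stripped.splitlines()
    has_markdown = any(
        line.lstrip().startswith(("#", "|", "- ", "* ")) for line in lines
    )
    return has_markdown or len(stripped) > long_form_chars


def build_card(text):
    # Illustrative card shape only; consult the Feishu card docs for the real one.
    return {"schema": "2.0", "body": {"elements": [{"tag": "markdown", "content": text}]}}


def choose_format(text, card_format="auto"):
    if card_format == "text":  # assumed value: never use cards
        return "text"
    if card_format == "auto" and not looks_structured(text):
        return "text"
    payload = json.dumps(build_card(text)).encode("utf-8")
    return "card" if len(payload) <= MAX_CARD_BYTES else "text"


print(choose_format("ok"))
print(choose_format("# Release plan\n\n- step one\n- step two"))
```

Measuring the encoded payload rather than the raw text matters because JSON escaping and the card wrapper both add bytes.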

Approval cards — a tool-approval flow now sends an Approve / Deny button card right in Feishu. You tap one, the system injects a synthetic /approve / /deny message, and the employee runs the approved action end-to-end — no switching back to the web.

Streaming cards (CardKit) — the employee's reply streams character by character in a Feishu card instead of waiting for the whole answer. 500 ms throttle, first token flushes immediately. card_streaming_enabled is on by default and falls back to accumulate-then-send if card creation/delivery fails.
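The throttle behavior (first token flushes immediately, later updates at most every 500 ms, a final flush at the end) can be sketched with an injectable clock. The class and callback names are hypothetical and the CardKit calls are replaced by a plain send function.

```python
import time


class StreamingCard:
    def __init__(self, send, clock=time.monotonic, interval=0.5):
        self.send = send          # pushes the full text so far to the card
        self.clock = clock
        self.interval = interval  # 500 ms throttle
        self.buffer = ""
        self.last_flush = None
        self.dirty = False

    def on_token(self, token):
        self.buffer += token
        self.dirty = True
        now = self.clock()
        # First token goes out immediately; later ones wait for the interval.
        if self.last_flush is None or now - self.last_flush >= self.interval:
            self._flush(now)

    def finish(self):
        # Make sure the tail of the answer is not left in the buffer.
        if self.dirty:
            self._flush(self.clock())

    def _flush(self, now):
        self.send(self.buffer)
        self.last_flush = now
        self.dirty = False


# Deterministic demo with a fake clock.
ticks = iter([0.0, 0.1, 0.2, 0.6, 0.7, 0.8])
sent = []
card = StreamingCard(sent.append, clock=lambda: next(ticks))
for ch in "Hello":
    card.on_token(ch)
card.finish()
print(sent)
```

Each update carries the whole text so far, so a dropped update costs nothing once the next one lands.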

Voice / file / audio / video both ways:

  • Inbound voice transcription — Feishu voice messages get transcribed via SttService and fed to the employee, so it sees content instead of an [audio] placeholder
  • Inbound file / audio / video download — not just images; files, audio, and video download, cache locally, and render in the conversation and UI via /api/v1/files/generated/{id} (media_download_enabled is now on by default — mind disk and privacy)
  • Outbound generated files become native attachments: /api/v1/files/generated/{id} URLs the employee replies with auto-convert to native Feishu attachments (10 MB images, 30 MB files/audio/video; audio opus only, video mp4 only, the rest degrade to file)
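The outbound limits above amount to a small classification rule, sketched here. The image extension list, the byte value of "MB," and the "too_large" outcome are assumptions (the notes don't say what happens to oversized files); the opus-only, mp4-only, and degrade-to-file rules follow the text.

```python
MB = 1024 * 1024
IMAGE_EXTS = {"png", "jpg", "jpeg", "gif", "webp"}  # assumed list


def classify_attachment(filename, size_bytes):
    """Return "image", "audio", "video", "file", or "too_large"."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in IMAGE_EXTS:
        kind, limit = "image", 10 * MB
    elif ext == "opus":   # audio: opus only
        kind, limit = "audio", 30 * MB
    elif ext == "mp4":    # video: mp4 only
        kind, limit = "video", 30 * MB
    else:                 # everything else degrades to a plain file
        kind, limit = "file", 30 * MB
    return kind if size_bytes <= limit else "too_large"


for name in ["chart.png", "voice.opus", "voice.mp3", "demo.mp4", "demo.mov"]:
    print(name, classify_attachment(name, 1 * MB))
```

An mp3 or mov still gets delivered under this rule; it just arrives as a generic file instead of a playable audio or video message.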

Channel-native tools (ChannelToolProvider SPI) — Feishu platform capabilities like calendar and docs (feishu_calendar_list_events / feishu_doc_read / feishu_doc_create) are exposed directly as employee tools — no separate MCP server, no duplicate credentials. Read tools are on by default, write tools require approval (DB-seeded guard rules auto-apply NEEDS_APPROVAL to mutating tools).

A few more:

  • Sender context injection — the employee prompt now carries "channel / sender / (group) chat" info so it can tailor the reply by origin
  • DONE reaction — a ✅ reaction on the inbound message after a successful reply (enable_done_reaction on by default)
  • Mention filtering: require_mention=true uses the Feishu SDK's mentions field to filter out group messages that didn't @ the bot; bot open_id is prefetched on startup + a 60 s negative cache

QQ got scan-to-bind too — scan-to-authorize through the QQ Open Platform Lite portal, no copying AppID / AppSecret by hand (AES-256-GCM exchange, 12-minute TTL).

IM conversations respect per-conversation model selection too — same as the web, each conversation can bind its own model (see below).

Details: Channels.


A few more things

Models / providers:

  • Native Gemini — Gemini runs through the native generateContent API (instead of making do with OpenAI-compat). A new GeminiChatModel handles systemInstruction / functionCall / inline images, parses the streaming SSE, and strips JSON Schema keywords Gemini rejects
  • Nano Banana image generation — Google images go through Nano Banana Pro (gemini-3-pro-image-preview); the image tool passes input images as inline parts to support image editing
  • xAI / Grok provider — wired in OpenAI-compat style, Grok 3 / Grok 4, with an xAI brand icon in the UI
  • Embedding models from any provider (#79) — Settings → Models gains an embedding section; pick an embedding model from any provider, reuse that provider's API key, KBs pick their embedding model from a dropdown; keyless local proxies use NoopApiKey
  • Xiaomi MiMo thinking-mode multi-turn fix (#189): reasoning_content is now kept correctly across turns

Chat experience:

  • Per-conversation model selection (#150) — the chat-header ModelSelector binds a model per conversation, stored in mate_conversation, effective on the next message; unset falls back to the workspace default
  • Conversation list: pin / multi-select delete / filter by employee (#144) — pin from the ⋮ menu, enter select-mode for batch delete, a filter-by-employee dropdown appears with 2+ employees
  • Global shortcuts Ctrl+K / Ctrl+N — Ctrl+K opens the employee picker to jump, Ctrl+N starts a new conversation, neither fires inside an input field
  • Sessions admin page — reached from the header menu, server-side pagination (search by title/ID), depth-styled cards, an inline editable model chip per row

Scheduler / console:

  • Cron and triggers merged into a unified Scheduler page — three tabs: scheduled jobs / event triggers / run history, with a visual cron builder (segmented editor + presets + human-readable preview)
  • New wiki_process scheduled-task type — schedule KB processing off-peak (pick a KB + optional force-reprocess), no agent binding required, queued asynchronously
  • Runtime view folded into the Employees page: /backstage redirects to /agents?view=live; one Employees page with a segmented toggle between "Roster" and "Live"
  • Sidebar notification badges — pending approvals (red count → Security), stuck employees (orange dot → Live view)
  • Dashboard model-config card + onboarding provider enablement — the home screen shows enabled providers, liveness, and the active model

Wiki:

  • New raw-material formats: HTML / Excel / PowerPoint / CSV — HTML is cleaned of script/style/nav/footer noise via jsoup, Excel/PPT use Apache POI, CSV uses Tika
  • Cascade-delete a knowledge base — one transaction deletes KB → raw materials → pages → citations → chunks → processing jobs; the UI shows the raw-material and page counts before deletion so you see the blast radius
  • create_page tool returns the new page id; KB failure stats count only the latest job per raw material

Memory:

  • Agent memory snapshot export / import — package an employee's AGENTS.md / MEMORY.md / PROFILE.md / SOUL.md / KNOWLEDGE.md + daily ledger into a ZIP for backup/migration; a dry-run preview before import (create / update / skip classification), with a whitelist + zip-bomb guards (≤500 entries, ≤1 MB each, ≤16 MB total). Export / import from the Agent Context page's right panel
  • Memory keyword search — runtime tools can search an employee's own workspace files by keyword (CJK 2-char windows + Latin tokenization, per-file weighted scoring, returning line numbers + highlighted snippets)
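For the snapshot import guards mentioned above (whitelist, ≤500 entries, ≤1 MB each, ≤16 MB total), a pre-flight check could look like this sketch. The function name, the ledger/ prefix for daily-ledger files, and treating 1 MB as 1024 × 1024 bytes are assumptions; the limits and the five file names come from the notes.

```python
import io
import zipfile

ALLOWED_FILES = {"AGENTS.md", "MEMORY.md", "PROFILE.md", "SOUL.md", "KNOWLEDGE.md"}
LEDGER_PREFIX = "ledger/"  # hypothetical location of daily-ledger files
MAX_ENTRIES = 500
MAX_ENTRY_BYTES = 1 * 1024 * 1024
MAX_TOTAL_BYTES = 16 * 1024 * 1024


def validate_snapshot(data: bytes):
    """Return the whitelisted entry names, or raise ValueError on a guard hit."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            raise ValueError("too many entries")
        total = 0
        accepted = []
        for info in infos:
            # file_size is the declared uncompressed size from the archive header.
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"{info.filename} exceeds the per-entry limit")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds the total size limit")
            if info.filename in ALLOWED_FILES or info.filename.startswith(LEDGER_PREFIX):
                accepted.append(info.filename)
        return accepted


def make_zip(files):
    """Test helper: build an in-memory ZIP from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


snapshot = make_zip({"AGENTS.md": b"# agents", "evil.sh": b"rm -rf /"})
print(validate_snapshot(snapshot))
```

Checking declared sizes before extracting anything is what stops a small, highly compressed archive from expanding into gigabytes; a hardened importer would also cap the bytes it actually reads per entry.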

Skills:

  • Skill lifecycle curator — a daily sweep ages idle, agent-created skills through active → stale → archived (default 30 days to stale, 90 to archived). Settings → Skill Curator gives a control panel to preview / pause / toggle; built-in, pinned, MCP/ACP, and sys-/ops--prefixed skills are exempt. Configurable under mateclaw.skill.curator.*
  • SkillMarket lifecycle UI — the Skills page adds Enabled / Stale / Archived tabs, cards show "last used," and the detail drawer offers manual archive / restore / pin
  • skill-authoring built-in skill — teaches employees how to write SKILL.md (frontmatter / validator limits / directory placement / common pitfalls), so they can author skills without knowing JVM internals
  • Virtual SKILL.md synthesis for MCP/ACP-derived skills (#136) — MCP/ACP integrations also become navigable skill catalogs that load_skill can reach
  • Typed wrapper tools for script entrypoints — declare scripts: with JSON Schema in SKILL.md, and each entrypoint becomes a named tool with typed params; the model fills the fields, the runtime serializes and passes them to the subprocess
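The curator's aging rule from the first bullet in this list can be written as a pure function. Parameter names and the use of "at least N days idle" as the boundary are assumptions; the 30/90-day defaults and the exemptions (built-in, pinned, MCP/ACP, sys- and ops- prefixes, anything not agent-created) follow the text.

```python
from datetime import datetime, timedelta


def classify_skill(
    last_used,
    now,
    *,
    name="",
    source="agent",  # "agent", "builtin", "mcp", "acp", ...
    pinned=False,
    stale_days=30,
    archive_days=90,
):
    """Return "active", "stale", or "archived" for one skill."""
    exempt = (
        pinned
        or source != "agent"  # only agent-created skills are aged
        or name.startswith(("sys-", "ops-"))
    )
    if exempt:
        return "active"
    idle = now - last_used
    if idle >= timedelta(days=archive_days):
        return "archived"
    if idle >= timedelta(days=stale_days):
        return "stale"
    return "active"


now = datetime(2026, 5, 23)
for days in (10, 45, 120):
    print(days, classify_skill(now - timedelta(days=days), now, name="csv-cleanup"))
```

Keeping the rule free of side effects makes a preview mode cheap: the same function can report what a sweep would do without changing anything.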

LLM reliability:

  • Automatic failover on rate-limit — when the primary is 429-throttled and same-model retries are exhausted, fail over to the next backup provider (it used to just return the error)
  • Network connection errors classified as retryable — "network connection error" maps to a retryable SERVER_ERROR; MAX_RETRIES bumped 5 → 10 to ride out provider flaps during Wiki batch loads
  • DashScope max_tokens clamped to 8192 — over that, DashScope returns 400, which used to be misread as "model not found" and silently switch providers; now clamped, so the model you picked actually runs
  • Soft-deleted models no longer block the uniqueness check (#173) + tombstone rows purged from mate_model_config (V118)
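The rate-limit failover from the first bullet in this list follows a "retry the same provider, then move on" pattern, sketched below. The exception type, the provider callables, and max_retries=3 are illustrative; backoff delays are left out to keep the example short.

```python
class RateLimited(Exception):
    """Stands in for an HTTP 429 from a provider."""


def call_with_failover(providers, prompt, max_retries=3):
    """Try each provider in order; retry a throttled one before failing over."""
    if not providers:
        raise RuntimeError("no providers configured")
    last_error = None
    for provider in providers:
        for _attempt in range(max_retries):
            try:
                return provider(prompt)
            except RateLimited as err:
                last_error = err  # a real client would back off before retrying
    raise last_error


calls = {"primary": 0, "backup": 0}


def primary(prompt):
    calls["primary"] += 1
    raise RateLimited("429 from primary")


def backup(prompt):
    calls["backup"] += 1
    return "answer from backup"


print(call_with_failover([primary, backup], "summarize the wiki page"))
```

Only rate-limit errors trigger the retry and failover path here; any other exception propagates immediately, so a bad request is not silently rerouted to a different model.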

Deploy / build:

  • Server packaged as an executable JAR: java -jar mateclaw-server-1.4.0.jar works out of the box (added the spring-boot repackage goal)
  • Compile with -parameters — a centralized-POM migration had dropped this flag, so @PathVariable / @RequestParam without explicit names failed to bind at runtime (a batch of endpoints 500'd); restored
  • OpenPDF 3 migration: flying-saucer-pdf bumped to 10.2 pulling OpenPDF 3.0.3 (com.lowagie.text → org.openpdf.text), alongside JDA / ShedLock / Pebble / Tika / jsoup upgrades
  • Snowflake ID precision check in CI: scripts/check-snowflake-precision.sh scans for v-model.number, Number(id), type="number" and the like, wired into pnpm lint / pnpm build, so 19-digit IDs can't get truncated by the JS Number type
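Why the Snowflake check matters: the JS Number type is an IEEE 754 double, which represents integers exactly only up to 2^53 − 1, while Snowflake IDs run to 19 digits. The sketch below reproduces the effect with Python's float (also a double); the sample ID is made up.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JS Number.MAX_SAFE_INTEGER


def through_js_number(value: int) -> int:
    """Simulate Number(id): round-trip an integer through a 64-bit double."""
    return int(float(value))


def is_safe(value: int) -> bool:
    return abs(value) <= MAX_SAFE_INTEGER


snowflake_id = 1234567890123456789  # made-up 19-digit id

print(is_safe(snowflake_id))
print(through_js_number(snowflake_id) == snowflake_id)
# Keeping the id as a string end to end avoids the problem entirely.
print(str(snowflake_id))
```

The corrupted value is still a plausible-looking ID, which is why these bugs surface as "record not found" far from the line that caused them.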

Full list: git log v1.3.0..HEAD.


Upgrade path

Config is fully compatible. All your agents / skills / wikis / channels / cron jobs / workflows / triggers come across untouched.

New table schemas are migrated by Flyway automatically. Goal (mate_agent_goal*, V120), tool tiers (mate_tool.disclosure_tier / mate_mcp_server.disclosure_tier), channel-native tools (mate_tool.channel_id, V119) tables are created/altered on first boot; existing databases auto-baseline.

If you're already running v1.3.0 in production:

  • Goals are on by default but don't change old behavior — don't create a goal and conversations work exactly as before; the graph node passes through calls with no goal
  • Progressive disclosure is on by default — employees see core tools + the extension directory. If your automation depends on "all tools always visible," set mateclaw.tools.disclosure.mode=legacy to restore the old behavior
  • RBAC is invisible for single-user setups — an existing single-admin deployment behaves the same after upgrade (the global admin holds everything); to bring in a team, add members and assign roles in the workspace
  • Feishu media_download_enabled is now on by default — after upgrade Feishu downloads inbound files/audio/video. Turn it off in channel settings if you care about disk or privacy
  • The skill curator is on by default — idle skills get aged. For conservative deployments set mateclaw.skill.curator.enabled=false, or pause it in the UI

What this means for you

If you're a regular user

Set a goal for an employee. "Translate this article to English, post it, reply to comments" — say it once, watch the ring of light around the avatar follow it through to done.

If you manage a team

Pull colleagues into the workspace and assign roles: make ops a Member who manages agents, make an intern a read-only Viewer. Then use the digital-employee builder to spin up a specialized team from one sentence and chain it into a workflow you hand to the system.

If you're a developer

Create goals programmatically with POST /api/v1/goals, fetch async-delegation results with taskOutput, and pull a capability set from GET /api/v1/workspaces/{id}/access to build your own gating. enable_tool / load_skill let you install a wall of tools without blowing the context.

If you run production

Upgrade. Automatic rate-limit failover + network-error retries make long tasks steadier; the Snowflake precision check in CI kills a whole class of ghost bugs; the executable JAR makes deployment simpler.

If you gave up before because something didn't quite work

Come back. Employees follow goals on their own now, build sub-teams, and scale their toolset to the task; the whole system is usable by a team, and Feishu is a first-class citizen. Every change is here because a real user got blocked.


One more thing.

Goals. Sub-teams. Roles.

An employee that stops when it answers is a chatbot.
An employee that locks a goal and follows it to done is an assistant.
An employee working with a team it built, inside a workspace many people share: that's an organization's operating system.

That's the direction a personal AI operating system is supposed to grow.