v1.3.0

Stable · 2026-05-13 · Previous stable: v1.2.0

Five things

Let me cut to it.

In v1.2.0 we renamed agents to "digital employees." But an employee who works alone is just a starting point — real work needs orchestration.

This release we filled in the missing piece.

One. Workflow is in. Compose multiple employees plus system actions into a linear business process.

Two. Triggers are in. Let things that happen in the system start a workflow or talk to an employee, automatically.

Three. Wiki is no longer just a search index. It's a processing pipeline. Template-driven transformations turn every raw material and every page into a structured product.

Four. Each employee binds MCP tools independently; multimodal traffic gets a sidecar. No more "one MCP server, everybody sees it."

Five. Office files come straight out of the chat. Docx / Xlsx / Pptx / PDF: four document generation tools, no subprocess, no npm, no Office install.

That's it.


1. Workflow — MateClaw graduates from chatbot to business-process OS

You wanted to say: "let the data analyst enrich a customer record, then let the enterprise-sales employee run VIP onboarding, then fan out to Feishu and email at the same time, then write the result to memory once both channels acknowledge."

Before this release, that meant: stitch the prompt yourself, write the cron yourself, handle the approval yourself.

This release we did the wiring for you.

Open the Workflow menu. You see a linear array of steps plus a mode field. That's it. Not a 30-node Dify-style canvas. Not a drag-and-drop if/else maze. Intentionally minimal.

Seven step modes cover 90% of business flows:

  • sequential: run in order; the previous step's output flows into the next step
  • fan_out / collect — run a group of steps in parallel, then collect
  • conditional — Pebble expression decides whether to run
  • await_approval — pause the run, request approval, resume after sign-off. Persisted across server restarts.
  • dispatch_channel — fan out the previous output to multiple channels
  • write_memory — write the result into the employee's MEMORY.md (four merge strategies: append / replace_section / upsert_kv / overwrite)
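
The four write_memory merge strategies can be pictured roughly like this. This is a minimal sketch, not MateClaw's implementation: the class name, the "## heading" section convention, and the "key: value" line format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the four write_memory merge strategies. */
final class MemoryMerge {
    enum Strategy { APPEND, REPLACE_SECTION, UPSERT_KV, OVERWRITE }

    /** key is the section heading (REPLACE_SECTION) or the key (UPSERT_KV); ignored otherwise. */
    static String merge(Strategy strategy, String memory, String key, String value) {
        switch (strategy) {
            case APPEND:
                return memory.isEmpty() ? value : memory + "\n" + value;
            case OVERWRITE:
                return value;
            case UPSERT_KV:
                return upsertKv(memory, key, value);
            case REPLACE_SECTION:
                return replaceSection(memory, key, value);
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    private static List<String> lines(String memory) {
        List<String> result = new ArrayList<>();
        if (!memory.isEmpty()) {
            result.addAll(Arrays.asList(memory.split("\n", -1)));
        }
        return result;
    }

    // Assumed format: one "key: value" pair per line. Replace the line if present, else append.
    private static String upsertKv(String memory, String key, String value) {
        List<String> out = new ArrayList<>();
        boolean replaced = false;
        for (String line : lines(memory)) {
            if (line.startsWith(key + ":")) {
                out.add(key + ": " + value);
                replaced = true;
            } else {
                out.add(line);
            }
        }
        if (!replaced) {
            out.add(key + ": " + value);
        }
        return String.join("\n", out);
    }

    // Assumed format: a section runs from its "## heading" line to the next "## " line.
    private static String replaceSection(String memory, String heading, String body) {
        String header = "## " + heading;
        List<String> out = new ArrayList<>();
        boolean inSection = false;
        boolean found = false;
        for (String line : lines(memory)) {
            if (line.startsWith("## ")) {
                inSection = line.equals(header);
                out.add(line);
                if (inSection) {
                    found = true;
                    out.add(body);
                }
            } else if (!inSection) {
                out.add(line);
            }
        }
        if (!found) {
            out.add(header);
            out.add(body);
        }
        return String.join("\n", out);
    }
}
```

In this sketch append and overwrite touch the whole file, while replace_section and upsert_kv only touch the part they name, which is why they are the safer choice when several workflows write to the same MEMORY.md.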

Two ways to edit:

  • JSON-first — Monaco + JSON schema + Pebble static checks + template dropdowns. For people who can write DSL. Red squiggles while typing, compile diagnostics before publish.
  • @vue-flow/core canvas — for people who want to see the shape. Read-only linear render in this release; drag-to-edit ships in the next.

Don't know the DSL? Open natural-language draft generation (POST /workflows/draft/generate). Describe the flow in one sentence, and an agent generates graph_json plus compile diagnostics. It does not auto-publish. You review, you hit publish.

A few engineering details, invisible but load-bearing:

  • Integer revisions — publish writes an immutable new row; drafts and published versions are decoupled. Your in-flight runs won't suddenly change semantics because someone edited the draft.
  • Inline payload storage — large inputs/outputs spill to a payload:// URI, the database doesn't bloat
  • Cross-workspace ACL — publish-time validation that every agent / channel / employeeId reference belongs to this workspace
  • Run history — every step's input / output / duration / token count / failure chain is recorded; you can open any past run and replay it
  • Async dispatch + GC schedulers — long-running workflows don't pin a request thread
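
To make the "integer revisions" point concrete, here is a small in-memory sketch of the idea: publishing copies the draft into a new immutable revision, so later draft edits cannot reach a run that pinned an earlier revision. The class and method names are hypothetical, and a real implementation would write database rows rather than list entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of draft vs. published workflow revisions. */
final class WorkflowRevisions {
    /** An immutable published snapshot; number starts at 1 and only grows. */
    record Revision(int number, String graphJson) {}

    private String draft = "{}";
    private final List<Revision> published = new ArrayList<>();

    /** Editing the draft never touches anything already published. */
    void editDraft(String graphJson) {
        this.draft = graphJson;
    }

    /** Publishing appends a new revision; existing revisions are never rewritten. */
    Revision publish() {
        Revision revision = new Revision(published.size() + 1, draft);
        published.add(revision);
        return revision;
    }

    /** A run would pin the revision number it started with and load it from here. */
    Revision get(int number) {
        return published.get(number - 1);
    }
}
```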

Workflow is not a replacement for ReAct or Plan-and-Execute. Single-employee multi-turn reasoning still runs on those engines. Workflow is for assembling employees into a business process — promoting "a task" into "a procedure."


2. Triggers — events drive workflows now

OK, the workflow is written. Who starts it?

Before, the answer was: you do. Manually. Or you write a cron yourself. Or you ship a webhook endpoint yourself.

This release we unified that. Triggers wire "an event happening somewhere" to "an action that should run."

Six pattern types. They cover every triggering scenario you can think of:

Pattern              Fires when
cron                 A cron expression matches (reuses the existing cron module's ShedLock + Spring TaskScheduler)
webhook              A generic event passes through POST /api/v1/triggers/events
channel_message      A channel receives a message; filterable by channelType + senderEquals
agent_lifecycle      An employee lifecycle event occurs: spawned / terminated / crashed
content_match        A substring matches the event content (case-insensitive)
workflow_completion  An upstream workflow enters a terminal state

Two action targets: start a workflow, or send a message directly to an employee.
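
As a rough illustration of how pattern matching might look, the sketch below handles two of the six pattern types and rejects anything it does not recognize. The flat string maps and the field names "substring", "content", "type", and "sender" are assumptions for the example; only channelType and senderEquals come from the table above.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical matcher for two pattern types; event and pattern field names are illustrative. */
final class PatternMatcher {
    static boolean matches(String patternType, Map<String, String> pattern, Map<String, String> event) {
        switch (patternType) {
            case "content_match": {
                // Case-insensitive substring match on the event content.
                String needle = pattern.getOrDefault("substring", "");
                String content = event.getOrDefault("content", "");
                return !needle.isEmpty()
                        && content.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
            }
            case "channel_message": {
                // Optional filters: an absent filter matches any value.
                return "channel_message".equals(event.get("type"))
                        && filterMatches(pattern.get("channelType"), event.get("channelType"))
                        && filterMatches(pattern.get("senderEquals"), event.get("sender"));
            }
            default:
                // Unknown pattern type: never match, so a typo cannot fire a trigger.
                return false;
        }
    }

    private static boolean filterMatches(String expected, String actual) {
        return expected == null || expected.equals(actual);
    }
}
```

The default branch is the interesting design choice: returning false for unknown types means a misspelled or not-yet-supported pattern stays silent instead of matching everything.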

Safe-by-default governance — this is the part that matters:

  • Event dedup — events with a dedup_key already seen within the default 60s window get dropped
  • Per-trigger rate limit — default cap of 10/min keeps one chatty trigger from drowning the queue
  • Bot self-msg filter — Feishu / DingTalk / WeCom echo bot messages back as events? The default-bound SPI lets the channel adapter recognize and discard them
  • Recursion guard: workflow_completion → workflow → another workflow_completion… Dispatch chains past depth 5 get cut, with an alert
  • Unknown pattern types fail closed — a typo or a future-added pattern won't silently fire every trigger in the workspace
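
The defaults above (60s dedup window, 10 fires per minute, depth 5) can be sketched as a single admission check. This is an illustrative, single-threaded model only: the class name, the string return values, and the choice to run the recursion check first are assumptions, and the dedup map here is never evicted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical admission gate combining dedup, per-trigger rate limit, and recursion depth. */
final class TriggerGate {
    static final long DEDUP_WINDOW_MS = 60_000;
    static final long RATE_WINDOW_MS = 60_000;
    static final int RATE_LIMIT = 10;
    static final int MAX_DEPTH = 5;

    private final Map<String, Long> lastSeenByDedupKey = new HashMap<>();
    private final Map<String, Deque<Long>> firesByTrigger = new HashMap<>();

    /** Returns "ok" when the event may dispatch, otherwise the reason it was dropped. */
    String check(String triggerId, String dedupKey, int depth, long nowMs) {
        if (depth > MAX_DEPTH) {
            return "recursion";
        }
        Long lastSeen = dedupKey == null ? null : lastSeenByDedupKey.get(dedupKey);
        if (lastSeen != null && nowMs - lastSeen < DEDUP_WINDOW_MS) {
            return "duplicate";
        }
        // Sliding window: forget fires older than one minute, then count what is left.
        Deque<Long> fires = firesByTrigger.computeIfAbsent(triggerId, id -> new ArrayDeque<>());
        while (!fires.isEmpty() && nowMs - fires.peekFirst() >= RATE_WINDOW_MS) {
            fires.pollFirst();
        }
        if (fires.size() >= RATE_LIMIT) {
            return "rate_limited";
        }
        if (dedupKey != null) {
            lastSeenByDedupKey.put(dedupKey, nowMs);
        }
        fires.addLast(nowMs);
        return "ok";
    }
}
```

Passing the clock in as nowMs keeps the gate deterministic, which makes the window boundaries easy to test.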

Cross-instance consistency — in a multi-instance deployment, the pattern_version self-cancel mechanism plus periodic syncFromDatabase keeps every node converged. cron triggers grab ShedLock through CronDelegationPort so they fire exactly once.

Webhook ACK is fire-and-forget by design — receive → envelope wrap → dedup check → bot-self check → rate-limit check → ACK 200 → async dispatch. Upstream gateways see 200 and stop retrying.

The UI has a structured form per pattern. No hand-writing patternJson. Pick cron → cron expression input + timezone dropdown + next-fire preview. Pick content_match → substring input. Pick agent_lifecycle → agent dropdown + phase dropdown.


3. Wiki is no longer just a search index. It's a processing pipeline.

The old Wiki worked one direction. You threw a document in, it got chunked, embedded, and could be retrieved by semantic search. One way. Raw in, recall out.

This release Wiki learned to process.

The transformations engine — attach a "template" to a raw material or a page, run it through an LLM, save the structured output back to the Wiki.

Concretely:

  • User-defined templates — write a prompt, pick a model, pick an output format (markdown / json), decide whether to auto-save as a synthesis page
  • Per-template model picker — analyze contracts with Claude Opus, do summaries with a cheap Flash. No more "one LLM does everything."
  • Run templates against pages — not just raw materials. Feed an existing synthesis page back through a template, produce a new one.
  • Cross-material aggregator (map-reduce) — run a template against every raw material in a KB, then map-reduce the runs into one KB page. That's real synthesis.
  • Reverse-citation extractor — a synthesis page is bound to the exact source chunks it cited. Click the page, see where every claim came from.
  • Structured JSON output + optional JSON Schema — downstream code can consume the output directly without parsing markdown
  • Cancel a running transformation + re-run any past run — long task halfway done and you spot a prompt bug? Cancel, edit prompt, rerun.
  • Token usage recorded per run — you can see exactly how much each template is costing you
  • Side-by-side compare modal — tweaked the prompt, want to see the difference? Open compare.

Seven seeded templates aligned with enterprise scenarios: contract clause extraction, account intel, risk summary, KPI distillation, meeting-notes structuring, knowledge-page structuring, Q&A pair generation. Install once, ready to run.

Synthesis pages themselves get embedded at page level. So now search hits don't just return raw chunks — they also return products that you (or an agent) already processed.

The Wiki UI was rebuilt too — library home + workspace split. The library home is the entry into every KB in your workspace. Inside a KB: four tabs — raws / pages / templates / runs.

Wiki went from "passive retrieval" to "active processing." Raw materials don't just sit there waiting to be recalled — they get transformed into more useful artifacts, and those artifacts get recalled too.


4. MCP per-agent tool binding + multimodal sidecar

Before, connecting an MCP server meant every employee in the workspace saw all of its tools. You installed GitHub MCP for your executive assistant — and customer support also thought it could open PRs.

That was the engineer's shortcut.

This release we fixed it.

Each employee binds MCP tools independently:

  • The tool picker groups by server and tags by status — connected / stale / unavailable / orphan
  • Validation on save: do the tools you're binding actually exist on the current MCP server? If not, save is rejected — no publishing an agent that's going to crash
  • Namespace collisions auto-prefix — two servers both have a search tool? Becomes server-a:search / server-b:search; calls are unambiguous
  • Rename an MCP server, bindings follow automatically
  • Stable prefixed callback names + per-server tool cache persisted to disk — no re-probe on restart
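
The collision rule can be sketched as follows. It assumes a tool is prefixed with "server:" only when more than one server exposes the same name; whether MateClaw prefixes every tool or only colliding ones is not stated above, so treat that as an assumption, along with the class and method names.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical resolver that turns per-server tool lists into unambiguous callable names. */
final class ToolNamespaces {
    /** Returns callable name -> owning server, sorted by callable name. */
    static Map<String, String> resolve(Map<String, List<String>> toolsByServer) {
        // Count how many servers expose each tool name.
        Map<String, Integer> serversPerTool = new HashMap<>();
        for (List<String> tools : toolsByServer.values()) {
            for (String tool : new LinkedHashSet<>(tools)) {
                serversPerTool.merge(tool, 1, Integer::sum);
            }
        }
        // Prefix only the names that collide across servers.
        Map<String, String> callable = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : toolsByServer.entrySet()) {
            String server = entry.getKey();
            for (String tool : new LinkedHashSet<>(entry.getValue())) {
                String name = serversPerTool.get(tool) > 1 ? server + ":" + tool : tool;
                callable.put(name, server);
            }
        }
        return callable;
    }
}
```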

MCP-derived skills and tools flow through the same picker endpoints that built-in tools use. The "tool catalog" and "what this employee can use" are the same view.

Multimodal sidecar routing — issue #87

Before, you sent an image to a text-only main model (DeepSeek V3.5, say) — it refused or hallucinated. The engineer's compromise: "you should pick a vision model."

That's wrong. Users shouldn't be responsible for your model-selection problem.

This release does sidecar routing — when the main model is text-only and the message has an image, the system automatically calls the configured vision model to describe the image, then injects the description into the main conversation. Main conversation stays cheap. Vision gets invoked only when needed.

  • Settings → Multimodal to configure the sidecar vision model — pick any vision-capable provider (GPT-4V / Claude Sonnet / GLM-V / Doubao Vision)
  • Routing badge above the input — you can see whether this message goes "main model direct" or "sidecar first." The whole path is visible.
  • Reply model attribution on the assistant bubble — which model wrote this answer, right there on the side of the bubble
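
The routing decision itself is small. Here is a hedged sketch: the class, enum, and parameter names are invented for illustration, and the describeImage function stands in for the call to whichever vision model is configured.

```java
import java.util.function.Function;

/** Hypothetical routing decision for messages that may carry an image. */
final class SidecarRouter {
    enum Route { MAIN_DIRECT, SIDECAR_FIRST }

    /** Sidecar is used only for image messages, text-only main models, and a configured sidecar. */
    static Route route(boolean mainSupportsVision, boolean hasImage, String sidecarModel) {
        boolean sidecarConfigured = sidecarModel != null && !sidecarModel.isBlank();
        boolean needsSidecar = hasImage && !mainSupportsVision && sidecarConfigured;
        return needsSidecar ? Route.SIDECAR_FIRST : Route.MAIN_DIRECT;
    }

    /** On the sidecar route, the image description is injected into the text sent to the main model. */
    static String buildMainPrompt(Route route, String userText, String imageRef,
                                  Function<String, String> describeImage) {
        if (route == Route.MAIN_DIRECT) {
            return userText;
        }
        return userText + "\n\n[image description] " + describeImage.apply(imageRef);
    }
}
```

With no sidecar model configured the route is always MAIN_DIRECT, so the main model sees the message exactly as it would without the feature.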

The "hard ban" got dismantled along the way — user-defined tools used to be suppressed by hard-coded "if user asks X, answer Y" rules. This release respects user config — you installed the tool, you decide what it does.


5. Office files straight out of the chat + image edit

The employee writes a contract summary. You say, "export to Word."

Before, that meant one of two things: the agent gave you markdown and you copy-pasted it, or you installed an npm subprocess script.

This release ships four document-rendering tools that run inside the JVM — no subprocess, no npm, no Office install required:

  • DocxRenderTool — Markdown → .docx. Headings, lists, tables, images, all rendered.
  • XlsxRenderTool — Markdown tables → .xlsx. Multi-sheet supported.
  • PptxRenderTool: Markdown → .pptx. Each Markdown section (a heading plus its content) becomes one slide.
  • PdfRenderTool — Markdown → .pdf. Native CJK support.

They coexist with the old docx skill — clear division of labor: the old skill is for "edit an existing Word doc with style preservation and tracked changes." The new tools are for "create from scratch."

Image edit (issue #75): image_generate now takes image / images parameters with five reference forms:

  1. A https:// URL
  2. An inline data:image/png;base64,...
  3. An attachment://<id> for the current message
  4. A msg:<conversationId>:<attachmentIndex> referencing an image from earlier in this conversation
  5. An array of URLs or paths (multi-image fusion)

Meaning — you say to an image you already discussed, "change the background to red." The agent uses form 4 to point back at that exact image, calls the image-edit model, recolors it. No re-upload.
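
A resolver for these reference forms might start by classifying each string by its prefix. The sketch below is an assumption-heavy illustration: the class and enum names are hypothetical, anything without a known prefix is treated as a path, and the conversation id is assumed to be everything between "msg:" and the last colon.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical classifier for the reference strings accepted by image / images. */
final class ImageRefs {
    enum Kind { URL, DATA_URI, ATTACHMENT, MESSAGE, PATH }

    /** Parsed form of msg:<conversationId>:<attachmentIndex>. */
    record MessageRef(String conversationId, int attachmentIndex) {}

    static Kind classify(String ref) {
        if (ref.startsWith("https://")) return Kind.URL;
        if (ref.startsWith("data:image/")) return Kind.DATA_URI;
        if (ref.startsWith("attachment://")) return Kind.ATTACHMENT;
        if (ref.startsWith("msg:")) return Kind.MESSAGE;
        return Kind.PATH; // assumption: anything else is a file path
    }

    /** The array form (multi-image fusion) is classified element by element. */
    static List<Kind> classifyAll(List<String> refs) {
        List<Kind> kinds = new ArrayList<>();
        for (String ref : refs) {
            kinds.add(classify(ref));
        }
        return kinds;
    }

    static MessageRef parseMessageRef(String ref) {
        if (!ref.startsWith("msg:")) {
            throw new IllegalArgumentException("not a msg: reference: " + ref);
        }
        int lastColon = ref.lastIndexOf(':');
        if (lastColon <= 3) {
            throw new IllegalArgumentException("missing attachment index: " + ref);
        }
        String conversationId = ref.substring(4, lastColon);
        int index = Integer.parseInt(ref.substring(lastColon + 1));
        return new MessageRef(conversationId, index);
    }
}
```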

New models:

  • DashScope-compat mode — same sk- key, dotted version-number families (qwen3.5-plus / qwen3.6-plus / qwen3-vl-plus, etc.)
  • New Wanxiang / qwen-image series — 14 new image models, 3 new video models (including happyhorse-1.0-t2v)
  • Xiaomi MiMo provider — MiMo V2.5 Pro / V2.5 / V2 Pro / V2 Omni / V2 Flash
  • Tencent Hunyuan 3D — already shipped in v1.2.0; this release completes the icons and routing

render_html_image tool — turn HTML the employee generated into an image and deliver it as a native IM attachment, sidestepping "format not supported."


Context engineering — long tasks no longer forget

When a digital employee runs a 30-step task, the biggest problem isn't model intelligence. It's memory loss.

This release we did several invisible, mission-critical things in the context layer:

  • The first user message is anchored — compaction never evicts it. Your original goal stays.
  • Compaction never splits a tool_call / tool_response pair — before, you could end up with a "call" without its "return" and reasoning would collapse. Not anymore.
  • Older tool results stay raw instead of getting rewritten to lossy summaries. What should spill to disk, spills. What shouldn't be rewritten, isn't.
  • Spill markers persist across multiple compaction phases — a long task with three compaction rounds doesn't suddenly lose its spill references
  • compact_status SSE event — the frontend knows the compaction state and loads history from the latest boundary. No half-page artifacts.
  • Head-side orphan tool-response repair — orphan tool responses on a pagination cut (order-sensitive forward scan) get reaped instead of polluting the next round
  • Per-conversation spill files — retention sweep cleans them on schedule and conversation delete purges them immediately
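
Two of the rules above, anchoring the first user message and never splitting a tool_call / tool_response pair, can be shown with a toy compactor. This is a sketch under simplifying assumptions: role names are plain strings, tool responses directly follow their call, and everything between the anchor and the kept tail is simply dropped, whereas the real context layer also spills and tracks boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tail-keeping compactor that respects the anchor and tool-pair rules. */
final class Compactor {
    /** role is one of "user", "assistant", "tool_call", "tool_response" in this sketch. */
    record Msg(String role, String content) {}

    static List<Msg> compact(List<Msg> history, int keepTail) {
        if (history.size() <= keepTail + 1) {
            return new ArrayList<>(history);
        }
        // Locate the first user message: it is always kept.
        int anchor = -1;
        for (int i = 0; i < history.size(); i++) {
            if (history.get(i).role().equals("user")) {
                anchor = i;
                break;
            }
        }
        // Move the cut point back while it would orphan a tool_response from its tool_call.
        int cut = history.size() - keepTail;
        while (cut > 0 && cut < history.size()
                && history.get(cut).role().equals("tool_response")) {
            cut--;
        }
        List<Msg> compacted = new ArrayList<>();
        if (anchor >= 0 && anchor < cut) {
            compacted.add(history.get(anchor));
        }
        compacted.addAll(history.subList(cut, history.size()));
        return compacted;
    }
}
```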

A few streaming improvements too:

  • Streamed tool-call arguments are sanitized to valid JSON in flight — the LLM emits, the runtime repairs, the receiver always gets valid JSON
  • Smarter repetition detection — employee arguing with itself? content-repetition cap cuts it off before token burn
  • Recovery affordance card for non-transient errors — LLM returned an error that won't self-heal (quota, auth)? A card tells you what to do instead of leaving you watching a spinner
  • Transient TLS / socket errors get retried — a network blip doesn't surface as a red error
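
The split between transient and non-transient errors can be sketched as a retry wrapper that only retries network-level exceptions and lets everything else propagate at once. Which exception types MateClaw actually treats as transient is not stated here, so the three caught types, the class name, and the absence of backoff are assumptions made to keep the example short.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

/** Hypothetical retry wrapper: retries TLS / socket failures, surfaces everything else. */
final class TransientRetry {
    static <T> T call(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (SSLException | SocketException | SocketTimeoutException e) {
                // Treated as transient in this sketch: retry until attempts run out.
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
            // Any other exception (quota, auth, ...) is not caught and surfaces immediately.
        }
    }
}
```

A production version would add backoff between attempts and would likely exclude certificate failures, which arrive as SSLException subclasses but do not heal on retry.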

A few more things

Channels (WeCom deep tuning):

  • WeCom approval cards — keepalive on outgoing cards, so long tasks don't time out the upstream session
  • Group chat attribution — per-sender ID attribution and a time-windowed debounce boundary; two people speaking in the same group don't collide
  • Adaptive debounce window for paste-split long messages — a long paste broken into five messages gets reassembled into one
  • Inbound quoted-message context — employees can see which earlier message you replied to
  • WeCom file pipeline completed — upload size pre-check + group reply fallback + appmsg parsing + public-account-article paste-body hint
  • Async tool results forward back to IM — long task finishes, the result lands in the originating channel (Feishu / DingTalk / WeCom / Slack), files uploaded per channel
  • WS / long-polling channels go through a leader lease — multi-instance deployments stop producing duplicate replies
  • Fake generated-file URLs scrubbed — employees no longer invent https://example.com/file.docx
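
Paste-split reassembly with per-sender attribution can be pictured like this. The sketch uses a fixed window rather than the adaptive one described above, and the record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical debounce that merges rapid consecutive messages from the same sender. */
final class PasteDebounce {
    record Inbound(String sender, long atMs, String text) {}

    /** Input must be ordered by time; messages from different senders are never merged. */
    static List<Inbound> reassemble(List<Inbound> ordered, long windowMs) {
        List<Inbound> merged = new ArrayList<>();
        Map<String, Integer> lastIndexBySender = new HashMap<>();
        for (Inbound message : ordered) {
            Integer index = lastIndexBySender.get(message.sender());
            if (index != null && message.atMs() - merged.get(index).atMs() <= windowMs) {
                // Same sender, still inside the window: append and slide the window forward.
                Inbound previous = merged.get(index);
                merged.set(index, new Inbound(previous.sender(), message.atMs(),
                        previous.text() + "\n" + message.text()));
            } else {
                lastIndexBySender.put(message.sender(), merged.size());
                merged.add(message);
            }
        }
        return merged;
    }
}
```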

Enterprise scenario workbench — contract review + account intel + approvals + audit in one workbench, each panel scrollable with a centered vertical stepper. A preview of the v1.4 scenario-app direction.

Deploy / build:

  • Docker no longer requires DASHSCOPE_API_KEY — bring up with OpenAI / Ollama / any provider; it just boots
  • Five heavyweight bundled skills are now optional (optional: true in SKILL frontmatter): off by default, enable on demand, faster startup
  • mateclaw-ui build heap bumped to 6 GB + pnpm pinned to v10 + build scripts whitelisted — first-time fork builds no longer OOM
  • Spring AI 1.1.5 → 1.1.6 / Spring AI Alibaba 1.1.2.2 → 1.1.2.3

LLM / providers:

  • OpenAI Whisper STT routable to any OpenAI-compatible endpoint (issue #76)
  • OpenAI-compatible HTTP client pinned to HTTP/1.1 — some proxies don't like HTTP/2 and were corrupting TLS frames
  • Custom OpenAI-compatible providers can opt out of API Key requirement (issue #89) — for local proxies
  • Default model activates automatically after OAuth login — no manual model pick required after sign-in

Fixes (you won't notice but will swear less):

  • Conversation delete is actually delete — cascade cleans messages / approvals / async tasks / cancels running workers; no orphan rows (root cause for issues like #66)
  • Assistant message loss from column truncation closed off — and JDBC charset corrected to Java's canonical UTF-8
  • Tool guard rule blank-row guards — one empty rule no longer crashes the whole guard module
  • Digital-employee isolation hardened — creation / skill binding / prompt catalog all locked end-to-end
  • Agent templates pre-bind skills and tools — pick a template and it's ready; no more "now go configure it"
  • Duplicate-name agent save surfaces the real error — not a vague toast anymore
  • MCP virtual cards (from MCP server / ACP) reject edits and scans (issue #83)
  • MCP server names preserve Chinese characters (issue #65)
  • MCP tools auto-included in the effective allowlist (issue #108) — no more manual allowlist edit every time you add a server
  • JSON examples in i18n templates escaped — vue-i18n stops parsing them as interpolation
  • Clipboard copy degrades gracefully in non-HTTPS contexts — local IPs work
  • OCR triggers automatically when PDF text extraction returns unreadable bytes
  • Chat-LLM model popup uses row-based liveness pings — each row tells you whether that model is currently reachable
  • Mermaid streaming renders without flicker + copy/download buttons — agent emits mermaid live, the diagram doesn't jitter

Full list: git log v1.2.0..HEAD.


Upgrade path

Config is fully compatible. All your agents / skills / wikis / channels / cron jobs come across untouched.

New table schemas are migrated by Flyway automatically. Workflow (mate_workflow*), trigger (mate_trigger*), wiki-transformation (mate_wiki_transformation*) tables are created on first boot; existing databases auto-baseline.

If you're already running v1.2.0 in production:

  • Upgrade — Workflow and Triggers menus appear. Try a single demo flow in an internal workspace first (e.g. "9 a.m. daily customer morning report") before rolling out
  • MCP per-agent binding auto-migrates on first upgrade — old workspace-level MCP config stays put, each agent defaults to binding the full set (preserving prior behavior); tighten as needed
  • Multimodal sidecar is off by default — without a sidecar vision model configured, the main model handles images exactly as before. Configure it to opt in.

What this means for you

If you're a regular user

Try the five seeded workflow templates: morning briefing, contract review, account intel, approval dispatch, scheduled knowledge-page synthesis. No DSL required — the structured form gets you running in a few clicks.

If you manage a team

Take that recurring flow you've been doing manually ("9 a.m. every day, summarize yesterday's customer questions, send to the ops group"). Write it as a workflow once, hand it to the system forever. Employees no longer drift whenever your prompt drifts: the procedure is fixed, and so is the owner of each step.

If you're a developer

JSON-first DSL + Monaco + schema validation. Use POST /workflows/draft/generate to bootstrap a draft from natural language, then review by hand. Trigger entry is POST /api/v1/triggers/events — wire your existing systems straight in.

If you run production

Upgrade. Try the default-on governance (dedup / rate limit / bot self-msg / recursion guard) with a handful of triggers. In multi-instance deployments, CronDelegationPort ensures cron triggers fire exactly once across the cluster.

If you gave up before because something didn't quite work

Come back. MCP tools bind per employee now, multimodal routes itself, Office files render straight out of the chat, and the Wiki learned to process. Every change is here because a real user got blocked.


One more thing.

Digital employees. Workflows. Triggers.

One employee working alone is a tool.
Multiple employees collaborating on a procedure is an organization.
Events that start the organization automatically: that's an operating system.

That's what a personal AI operating system is supposed to feel like.