Multi-Agent Engine
They're called "digital employees" now. The back office uses that term throughout. The runtime is still an Agent under the hood, but the UI, the mental model, and the templates treat each one as a coworker on your team. The renaming brings a worldview shift with it: you give an employee a Role, a Goal, and a Backstory — they know who they are and why they exist. You don't have to write a cold system prompt asking an "agent" to please understand the task.
An employee is a personality with tools. Multiple employees form a team.
That's the short version. The longer one: an employee is a name, a system prompt that defines how it thinks (built from role / goal / backstory), a model that actually thinks, a set of tools it's allowed to reach for, optional knowledge bases it can read, optional skills that extend what it can do, its own slice of memory, and a choice of how to approach hard problems — incrementally (ReAct) or with a plan (Plan-and-Execute).
You can have many employees. Each one is specialized. You give them different jobs.
What a digital employee has
| Piece | What it is |
|---|---|
| Name | How you and your team find them |
| Icon | Pixel-art style, color coded by role |
| Role | One sentence — "I'm the product researcher" / "I'm customer support" |
| Goal | One sentence — "I help you see how the market is moving" |
| Backstory | Where they came from, why they exist, what they care about; auto-spliced into the final system prompt |
| Employee-card tagline | The "self-introduction" shown on the card |
| System prompt | Their personality, rules, style, priorities (role/goal/backstory inject automatically) |
| Type | react or plan_execute |
| Tools | Which tools they're allowed to call (built-in, MCP, skills, ACP-bridged) |
| Knowledge bases | LLM Wikis they can read from (KB hot cache auto-injects into the system prompt) |
| Workspace memory | Their own PROFILE.md, MEMORY.md, SOUL.md, AGENTS.md, and daily notes |
| Max iterations | How many reasoning loops are allowed before forced convergence |
| Enabled flag | Off switch |
Notice what's not here: the model. A single global default model (set in Settings → Models) is used for every agent at runtime. The model_name field on the agent row is a legacy artifact — it's ignored. This is intentional: swapping models across your whole deployment is one click, not thirty.
Templates: hire a coworker who already knows the job
You don't start from scratch. Digital Employees → New opens a two-tier template picker.
5 career templates (recommended)
Each one ships with a role, goal, backstory, the right toolset, a pixel-art avatar, and a color that belongs to the role. Open one, it works:
- Product Researcher — competitive scans, market tracking, interview synthesis
- Customer Support — catch every question, look it up in the KB, escalate what they can't resolve
- Knowledge Curator — feed scattered material into the LLM Wiki, maintain bidirectional links, periodic consolidation
- Data Analyst — query datasources, run SQL, build charts, write conclusions
- Executive Assistant — calendar, email drafts, cross-tool coordination
Generic templates (blank or half-finished)
- General Assistant — the default chat employee
- Research / Code / Writing / Knowledge Curator / Data Analyst — semi-finished, organized by purpose
- Custom — fully blank, if you know exactly what you want
Pick one, give them a name, adjust the role and goal, save. Working coworker in under a minute. Every field is editable after creation.
Two ways of thinking
ReAct — think, act, observe, continue
The default. An agent in ReAct mode runs a loop: reason about what to do next, act (maybe by calling a tool), observe the result, decide whether to loop again or answer.
Use it for:
- simple Q&A that might need one or two tool calls
- conversational interaction where each user turn is small
- tasks where the agent needs to react to what it learns along the way
Example: "What's the weather in Beijing today?" → reason (need current data), act (call web search), observe (15–26°C, sunny), answer.
Plan-and-Execute — plan first, execute second
For larger tasks. The agent starts by generating a plan — an ordered list of 2 to 6 steps. Then it executes each step, one at a time. When done, it summarizes everything it did.
Use it for:
- multi-step research ("investigate X, compare Y, write a brief")
- anything where the steps are knowable up front
- anything where you want to watch progress — the plan and each step's status show up in a persistent task list next to the conversation
Example: "Research Spring AI frameworks, compare the top three, write me a brief." → plan (4 steps) → execute in order → summarize.
How to choose
| Situation | Use | Why |
|---|---|---|
| Simple Q&A, single-tool calls | ReAct | No planning overhead |
| Information retrieval | ReAct | Usually done in 2–3 cycles |
| Multi-step ordered work | Plan-and-Execute | Explicit plan is easier to watch and debug |
| Research + comparison + writing | Plan-and-Execute | Each step feeds the next |
| "Read this file and tell me X" | ReAct | One tool, one answer |
| "Build me a structured report on X" | Plan-and-Execute | Multiple gathering + synthesis steps |
Change an agent's type at any time. Same system prompt works reasonably in both modes.
Multi-agent parallel delegation
An agent doesn't work alone. One agent can delegate to another — or to three at once.
- Single delegation — hand a sub-task to a specific agent; it runs in an isolated session, results stream back
- Parallel delegation — fan out to multiple agents at once, each in its own session
- Live child visibility — see reasoning, tool calls, and progress for each child in the ChatConsole as it happens
- Routing hints — built into the system prompt, so agents know when to handle it themselves vs. when to delegate
Example: coding agent takes the Jira ticket, research agent pulls competitor data, writing agent drafts the Slack reply. Three in parallel, results flow back to the orchestrator.
Multi-level subagent delegation tree
New in 1.4.0
Delegation is no longer flat. A parent employee can delegate to children, and those children can delegate further — recursively, up to 3 levels deep. A temporary team can grow its own hierarchy for a specific task.
Three delegation tools, one per cadence:
delegateToAgent— synchronous. Hand a sub-task to a specific employee, wait for it to finish, and return only after the child's final result. OptionalinheritParentContextcarries the parent conversation's recent context to the child, so you don't have to re-explain the background.delegateParallel— fan out. Delegate to several children at once; each runs in its own isolated session and the results are collected together.delegateAsync— background. Returns atask_idimmediately while the child runs in the background; fetch the result later withtaskOutput.taskOutputhas an attribution gate — only the same conversation + the same user that spawned the task can read its result, preventing cross-conversation / cross-user leakage.
Children deny a default set of tools so the tree can't run away:
delegateToAgent/delegateParallel(recursion guard — children can't launch their own synchronous/parallel delegations, avoiding a delegation storm)- the
setGoalfamily + therememberfamily (goal and memory ownership stays with the parent) create_employee(children can't conjure new employees)
This default deny list is tunable via mateclaw.delegation.child-denied-tools.
Delegation pairs with the Goals system — the parent sets goals, breaks the work down, and delegates sub-tasks; children focus on execution.
UI — nested subagent timeline + always-on plan panel
The ChatConsole draws the whole delegation tree, not a flat log:
- Delegation start is marked clearly
- Each child shows its name / depth / task excerpt
- Completion badges: success / timeout / error, plus duration and content length
- Every subagent has a stable id + parentId + depth, so the nesting is legible in the timeline — you can see exactly who delegated to whom
- The plan panel is always on — no longer Plan-and-Execute only; delegation-tree progress folds into the same panel
Build a team from one sentence: the digital-employee builder skill
New in 1.4.0
Don't want to create employees one at a time? Give it a sentence and let the "digital-employee builder" skill assemble the whole team for you.
The skill starts from your one sentence and runs the full chain:
- Clarify the requirement — it pins down the vague sentence first, confirming the problem you're actually trying to solve
- Design the roles — breaks it into 2 to 6 complementary roles
- Create each one — calls
create_employeeper role to produce real, usable employees - Chain them into a workflow draft — links the employees into a workflow draft you can tweak right away
The companion tool list_capability_catalog lets the skill survey which tools / skills / knowledge bases the deployment has available before assigning capabilities to roles. Created employees are enabled on creation — no extra toggle to flip.
Deep thinking
Not every question deserves deep reasoning, but some do. MateClaw lets you turn on deep thinking per agent, per conversation:
thinkingLevel:off/low/medium/high/max- Supports Anthropic extended thinking, DashScope qwq reasoning, OpenAI o1
reasoning_effort=high - The thinking block streams into the UI as a collapsible panel — you see the model reason, tokens don't get wasted on tasks that don't need it
Hiring a digital employee
Digital Employees → New:
- Pick a template (one of the 5 career templates, a generic template, or Custom)
- Name them, choose an avatar (pixel-art library, or upload your own)
- Write a one-sentence Role, a one-sentence Goal, a few-sentence Backstory
- Write a one-line employee-card tagline — the self-introduction shown on the card
- Choose the type (
reactorplan_execute) - Write (or edit) the system prompt (role / goal / backstory get auto-appended — don't repeat them)
- Pick which tools they can use, bind any knowledge bases they should read
- Set
max_iterations(default 10) - Save
Live immediately. Call them from chat or via API.
Tool binding (per-agent tool picker)
New in 1.3.0
In v1.2.0 the employee's tool binding was a flat "check what you want" list. v1.3.0 reworks this into a grouped + status-aware + namespace-aware picker, specifically to handle MCP tool grime.
Open the digital-employee editor's Tools tab and you get:
- Grouped by source: built-in tools / skill-injected tools / MCP tools (further grouped per server) / ACP tools
- Status badges: each tool carries a tag —
connected— currently usablestale— this MCP server is currently unreachable, but the binding is preserved (it'll work as soon as the server is back)unavailable— server / skill has been disabled; binding is preserved but the runtime won't surface it to the employeeorphan— references a tool that no longer exists (server removed, tool renamed); the save action rejects orphan references and forces cleanup
- Namespace collisions: when two different MCP servers expose the same tool name (e.g. both have
read_file), the picker shows the fully prefixed names (server-a__read_file/server-b__read_file); the employee's system prompt maps them back to the originals so the LLM doesn't get confused - Validation on save: every checked tool runs through
AgentBindingService.validate(...)— any orphan reference fails save and must be cleared - MCP server rename: bindings tied to a renamed server follow automatically (matched via persisted tool cache) — no need to re-tick
UI: Agents → pick employee → Tools.
Implementation details: see MCP.
Knowledge base binding (per-agent primary KB)
New in 1.5.0
The employee editor has a new "Knowledge Base" tab where you can pick a primary KB for each employee. Knowledge bases stay workspace-shared — binding only declares "this is the one I default to," it doesn't restrict other employees' access.
Short version: each employee can pick one knowledge base as their "primary KB" — the default they query. Or pick none.
The model (worth reading once so it doesn't surprise you later):
- Knowledge bases are workspace-shared. A KB belongs to the workspace it was created in; every employee in that workspace can see it. Binding a KB to an employee does not make it exclusive — other employees can still use it
- The "primary KB" is just a default. It tells the wiki tools (
wiki_search/wiki_read/wiki_backlinks/ ...): "when the caller doesn't specifykbName/kbId, use this one" - Multiple employees can pick the same KB as primary. They don't interfere — each one's binding is its own, the KB itself isn't mutated
- Not binding is fine. With no primary set, the runtime falls back to the most-recently-updated KB in the workspace
UI: Employees → pick employee → Edit → Knowledge Base.
| Option | Behavior |
|---|---|
| 🚫 No primary KB | Clear the binding; the next time the employee's wiki tools omit kbName, the runtime falls back to the workspace's most-recently-active KB |
| 📚 <KB name> | Set this KB as primary; wiki tools default to it. The row also shows the KB's page count |
Each row shows: icon, name, description, page count. The list is the full set of KBs in the current workspace — including ones already picked as primary by other employees.
How the runtime decides "which KB to read"
When an employee invokes a wiki tool, the resolution order is:
- The tool call explicitly carried
kbName/kbId— use that - No explicit target → check the employee's
primaryKbId; if it points to a workspace-visible KB, use that - No
primaryKbIdeither → pick the most-recently-updated KB from the workspace's visible set - The workspace has zero KBs → tool returns empty, the LLM decides what to do next
Migration note: early versions persisted the binding on mate_wiki_knowledge_base.agent_id (one-to-one, exclusive semantics). Starting with the V130 migration, every legacy kb.agent_id is backfilled into the corresponding agent.primary_kb_id; the old column stays around as a read-only fallback, but new writes only touch agent.primary_kb_id. If you relied on kb.agent_id to isolate a KB to a specific agent, revisit those bindings in the editor — KBs are now visible to every employee in the workspace.
System prompt best practices
The system prompt is the employee's voice, priorities, and constraints. Role / Goal / Backstory, skill instructions, and workspace memory all get automatically appended to the final prompt — you don't write those yourself.
Your part should cover:
- How they should speak — tone, style, phrasing preferences ("professional but not stiff" / "stay cautious in customer-facing replies")
- What they're allowed and expected to do — the task boundary
- How to behave when uncertain — "search first, don't make things up" / "ask before running a dangerous command"
- Output format — if you need structure, say so
Leave out:
- Tool descriptions — auto-injected
- Workspace memory instructions — they come from
AGENTS.md - Framework-specific behavior (tool call format, ReAct structure) — don't fight the runtime
Example:
You are a professional technical documentation assistant. Your responsibilities:
- Search and organize technical materials based on user needs
- Answer questions using clear, structured formatting
- Ensure code examples are syntactically correct
- When unsure, search first rather than fabricating information
Guidelines:
- Cite sources when referencing external information
- For time-sensitive questions, get the current date before searching
For developers: how the agent actually runs
If you're just using agents, skip this section. If you're building on top of them — adding nodes, customizing routing, plugging in extensions — go straight to Architecture. The graph topologies, node lists, shared state keys, and extension points all live there.
Lifecycle states
| State | Meaning |
|---|---|
IDLE | Ready for input |
PLANNING | Generating a plan (Plan-and-Execute mode) |
EXECUTING | Running tool calls or sub-tasks |
RUNNING | Active ReAct loop or Plan-Execute graph execution |
WAITING_USER_INPUT | Paused for user response |
DONE | Completed |
FAILED | Execution failed |
ERROR | Error state |
Why the turn ended:
| Value | Meaning |
|---|---|
NORMAL | LLM gave a direct final answer |
SUMMARIZED | Completed after a context-compression pass |
MAX_ITERATIONS_REACHED | Forced convergence at iteration limit |
ERROR_FALLBACK | Degraded answer after an error |
Reliability features
These are things the runtime does so agents don't fail in ways you'd have to debug:
- Context pruning — when the context window gets too full, earlier turns get summarized by the LLM and the summary replaces them. Cached for 30 minutes. Injected as a user message, not a system message, to prevent prompt injection from historical content.
- Structured compaction (on prompt-too-long) — when the model returns "prompt too long," the runtime walks a four-stage escalation: soft trim → hard clear → pre-prune → LLM structured summary. At every stage it always preserves the prefix — the system prompt + the goal anchor stay intact — and injects the final summary as a UserMessage. Delegation tool results are never compacted (they're a child's hard-won output; lose them and they're gone). After a failed summary there's a 10-minute cooldown, so the runtime won't keep hammering the LLM inside the same over-budget turn.
- Thinking recovery — if a stream breaks mid-response, the partial thinking and content persist and show up when the conversation reloads.
- Iteration limit handler — instead of crashing when
max_iterationsis hit, the runtime forces a best-effort summary answer. - Stale stream cleanup — every open SSE stream is tracked, abandoned ones are reaped automatically.
- 429 retry — LLM rate-limit errors trigger automatic retries with backoff.
- Repetition detection — agents looping on the same tool call get forced out.
- Configurable tool timeouts — one slow tool can't freeze a turn.
- Channel health monitor — failing channel adapters restart with exponential backoff.
None of these are user-facing buttons. They just happen.
Agent management API
Create
curl -X POST http://localhost:18088/api/v1/agents \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Tech Assistant",
"description": "A professional technical documentation assistant",
"agentType": "react",
"systemPrompt": "You are a professional technical documentation assistant...",
"maxIterations": 10
}'List / Get / Update / Delete
curl http://localhost:18088/api/v1/agents -H "Authorization: Bearer YOUR_JWT_TOKEN"
curl http://localhost:18088/api/v1/agents/1 -H "Authorization: Bearer YOUR_JWT_TOKEN"
curl -X PUT http://localhost:18088/api/v1/agents/1 \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d '{"name":"Tech Assistant v2","maxIterations":15}'
curl -X DELETE http://localhost:18088/api/v1/agents/1 \
-H "Authorization: Bearer YOUR_JWT_TOKEN"Streaming chat
curl -N "http://localhost:18088/api/v1/agents/1/chat/stream?message=hello&conversationId=default" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"Debugging
DEBUG logging in application.yml:
logging:
level:
vip.mate.agent: DEBUG
vip.mate.agent.graph: DEBUGYou'll see node-by-node execution: state transitions, dispatcher routing, iteration counts, tool call arguments and results, Tool Guard check results.
Common issues
| Symptom | Likely cause |
|---|---|
| Agent doesn't respond or times out | Model config wrong, API key invalid, quota exhausted |
| Agent stuck in a loop | max_iterations too low, or a tool returning errors repeatedly |
MAX_ITERATIONS_REACHED happening often | Refine the system prompt or raise the limit |
| Tool calls silently failing | Tool Guard is blocking — check mate_tool_guard_audit_log |
| Approval-waiting graph won't resume | toolCallPayload format mismatch in chatWithReplay |
Next
- Tools — what agents can call
- Skills — how to extend what agents can do
- LLM Wiki — how knowledge gets read by agents
- Memory — how agents remember across conversations
- Workflow (1.3.0+) — orchestrate multiple digital employees and system actions into a business process
- Triggers (1.3.0+) — let events automatically start workflows or agent conversations
- Architecture — the StateGraph runtime in depth
