Chat & Messaging

Chat is where you actually work. Everything else in MateClaw — agents, tools, memory, wiki, channels — exists so that what happens inside this box can be good.

This page is about what that box actually does. Not the REST endpoint. Not the SSE event schema. What you see, what it does for you, and why the interaction design is what it is. (The API part is at the bottom for integrators who need it.)

What you see

You type. The agent thinks. Tokens start streaming. You watch.

But it's not just a wall of text coming back. A single assistant response is composed of segments, and each segment has a type:

Thinking — the agent's internal reasoning, shown in a collapsed panel you can open. Off by default for brevity, one click to expand. Preserved even if the network drops mid-turn.
Tool call — a card showing the tool name, its arguments, and (when it finishes) its result. ChatGPT-style, inline with the conversation, not hidden behind a debug panel.
Content — the actual reply text, streamed token by token.
Attachments — generated images, videos, music, TTS clips — bound to the message that produced them, not floating in a new bubble.

Segments arrive progressively. They persist to the database in real time — meaning if you refresh the page mid-response, you don't lose what's already rendered. If the backend dies and restarts, the partial reply is still there when it comes back.

This used to not be true. Now it is.

The task list, for plans that take time

When you ask something that triggers Plan-and-Execute, a persistent task list appears next to the conversation. It shows:

The current plan (2–6 steps, generated by the agent before execution starts)
Each step's live status: pending → running → done (or failed)
Each step's output, captured as the agent executes
A compact summary when the plan completes

The task list survives page refresh. It survives navigating away and coming back. It survives Plan generation failures. If a plan blows up mid-flight, you see which step blew up and why — not a blank spinner that never resolves.
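
The step lifecycle described above can be read as a small state machine. The following is a minimal sketch, not MateClaw's implementation: `STEP_TRANSITIONS` and `advanceStep` are hypothetical names, and the assumption that a step can only fail while running is ours. Only the four status names come from the text.

```javascript
// Hypothetical sketch: allowed status transitions for a single plan step.
// Statuses (pending, running, done, failed) are from the docs; the table is assumed.
const STEP_TRANSITIONS = {
  pending: ['running'],
  running: ['done', 'failed'],
  done: [],
  failed: [],
};

// Returns a copy of the step with the new status, or throws on an illegal jump.
function advanceStep(step, nextStatus) {
  const allowed = STEP_TRANSITIONS[step.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Illegal transition: ${step.status} -> ${nextStatus}`);
  }
  return { ...step, status: nextStatus };
}
```

Keeping terminal states (`done`, `failed`) without outgoing transitions is one way to guarantee that a finished step never reverts to a spinner after a reload.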

Use Plan-and-Execute when the task needs several ordered steps and you want to watch the work happen. Use ReAct for everything smaller.

Thinking, tool calls, and what to trust

One of the questions MateClaw tries to answer with its chat UI is: should you trust what the AI just told you? The default answer elsewhere is "look at the answer and guess". MateClaw tries to do better.

Thinking is visible. If the agent's thinking was sloppy, you can expand the thinking panel and see it. If it skipped a step, it's in there. If it hallucinated a fact before catching itself, you can watch it catch itself.

Tool calls are visible. Whether the agent searches the web, reads a file, or queries the Wiki — for every tool call you see the query, the result, and how the agent used it. Nothing is hidden in a "trust me" box.

Phase hints are visible. At the top of a streaming response, a small indicator shows the current phase — thinking, searching, reading, generating, summarizing. You're never staring at a spinner wondering whether the agent is alive.

Trust is earned by showing the work. MateClaw shows the work.

Execution-plan & tool-call detail viewer (1.5.0). Every plan step and every tool-call row gets a "view details" icon on the right. Click it for a frosted-glass dialog showing the full request arguments and response output — the parts the inline preview truncates — with copy buttons for request and response, and a status badge (in progress / completed / failed / pending). The data lives in message metadata, so plan steps and tool calls stay readable after a page reload.

Multi-channel realtime sync

The ChatConsole isn't just where you chat. It's an operations console.

Realtime sync for external channels — a WeChat user talks to your agent, you see the reasoning, tool calls, and streaming reply in the ChatConsole sidebar. No refresh. No waiting.
Running indicator — conversations with an active agent run show an amber pulse on their icon. You see what's alive at a glance.
Switch doesn't kill — flip to a different conversation mid-stream, the previous one keeps running in the background. Flip back, reconnect to the live buffer. Not a single token lost.
No duplicate bubbles — the reconcile layer matches client-uuid placeholders with DB-persisted messages via ID promotion. Messages don't flash into duplicates.
Actionable error cards — Ollama "does not support tools" is no longer "unknown error". You get specific guidance: "switch to qwen3 / qwen2.5:7b+ / llama3.1:8b+".
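
The "no duplicate bubbles" behavior can be illustrated with a short sketch of ID promotion. This is an assumption-heavy illustration rather than the actual reconcile layer: the `reconcile` function, the `clientId` field on persisted messages, and the `pending` flag are all hypothetical.

```javascript
// Hypothetical sketch of ID promotion. Assumes each persisted message echoes
// the client-generated UUID in a `clientId` field (not confirmed by the docs).
function reconcile(localMessages, persistedMessages) {
  const byClientId = new Map(
    persistedMessages.filter((m) => m.clientId).map((m) => [m.clientId, m])
  );
  // Promote placeholders in place: same list slot, database ID.
  const merged = localMessages.map((local) => {
    const persisted = byClientId.get(local.id);
    return persisted ? { ...local, ...persisted, pending: false } : local;
  });
  // Append persisted messages that the client is not already showing.
  const known = new Set(merged.map((m) => m.id));
  for (const m of persistedMessages) {
    if (!known.has(m.id)) merged.push(m);
  }
  return merged;
}
```

Because a promoted placeholder keeps its position in the list, the bubble never unmounts and remounts, which is what would otherwise show up as a flash or a duplicate.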

Attachments and file uploads

Three ways to give the agent a file:

Method	Behavior
Click the attachment button	Open a file picker, select one or more files
Paste from clipboard (Ctrl/Cmd+V)	Paste images or files copied from other apps
Drag & drop into the chat area	A translucent overlay appears; drop anywhere inside

Drop a folder on the desktop app and the agent gets a reference to the folder's absolute path — it can then walk it with the file-reader or shell tool. Drop a folder on the web and MateClaw recursively expands it and uploads each file individually.

Default upload limits:

Setting	Default
Max file size	100 MB
Max request size	200 MB
Allowed types	All
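
A client can check these defaults before sending anything. The sketch below is illustrative only: `checkUploadLimits` is a hypothetical helper, it treats 1 MB as 1024 × 1024 bytes, and it approximates the request size as the sum of file sizes (ignoring multipart overhead). Both of those are assumptions, and a deployment may configure different limits.

```javascript
// Defaults from the table above. Binary megabytes are an assumption.
const MB = 1024 * 1024;
const MAX_FILE_BYTES = 100 * MB;
const MAX_REQUEST_BYTES = 200 * MB;

// `files` is any array of objects with `name` and `size` (browser File objects qualify).
function checkUploadLimits(files) {
  const tooLarge = files
    .filter((f) => f.size > MAX_FILE_BYTES)
    .map((f) => f.name);
  const totalBytes = files.reduce((sum, f) => sum + f.size, 0);
  return {
    ok: tooLarge.length === 0 && totalBytes <= MAX_REQUEST_BYTES,
    tooLarge,
    totalBytes,
  };
}
```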

Images handed to a vision-capable model get attached for visual understanding. PDFs and DOCX files go through text extraction (with OCR fallback for scanned material). Everything the agent reads lands in its context for that turn.

Tool-generated files: download links survive restarts (1.5.0, #243)

Files a worker generates via tools (documents / images / audio…) are now persisted to disk under data/generated-files/, with a 7-day retention window + a 6-hour cleanup sweep and an in-memory LRU on top — download links keep working after a restart and are no longer bounded by the old 10-minute in-memory window. The frontend intercepts /api/v1/files/generated/{id} downloads via a global click delegator: success goes through an authenticated fetch → blob download; failure (404/410/expired) just shows a toast, so a dead link no longer wedges the whole page.
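
The click-delegator pattern described above might look roughly like the following. This is a sketch under stated assumptions, not the frontend's actual code: `generatedFileId`, `handleGeneratedFileClick`, `getToken`, and `showToast` are hypothetical names, and only the `/api/v1/files/generated/{id}` path comes from the text.

```javascript
// Matches the download path from the text and captures the file id.
const GENERATED_FILE_PATH = /^\/api\/v1\/files\/generated\/([^/?#]+)$/;

// Returns the file id for a generated-file link, or null for any other URL.
function generatedFileId(href, base = 'http://localhost') {
  let url;
  try {
    url = new URL(href, base);
  } catch {
    return null;
  }
  const match = GENERATED_FILE_PATH.exec(url.pathname);
  return match ? match[1] : null;
}

// Hypothetical handler; `getToken` and `showToast` are supplied by the caller.
async function handleGeneratedFileClick(event, { getToken, showToast }) {
  const link = event.target.closest('a[href]');
  if (!link || generatedFileId(link.href) === null) return;
  event.preventDefault();
  const resp = await fetch(link.href, {
    headers: { Authorization: `Bearer ${getToken()}` },
  });
  if (!resp.ok) {
    // 404 / 410 / expired: report it and leave the page alone.
    showToast(`Download unavailable (HTTP ${resp.status})`);
    return;
  }
  const blobUrl = URL.createObjectURL(await resp.blob());
  const a = document.createElement('a');
  a.href = blobUrl;
  a.download = link.getAttribute('download') || generatedFileId(link.href);
  a.click();
  setTimeout(() => URL.revokeObjectURL(blobUrl), 0);
}

// Registration in a browser (one listener for the whole page):
// document.addEventListener('click', (e) => handleGeneratedFileClick(e, deps));
```

A single delegated listener covers links rendered later by streaming messages, so no per-link wiring is needed.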

Primary model can't see images? "Multimodal sidecar" routing

Added in 1.3.0

When the agent's primary model is text-only (e.g. deepseek-chat, kimi-k2), uploading an image no longer fails. The runtime automatically routes the image through a vision sidecar model. See issue #87.

How it works:

You configure a vision sidecar model in Settings → Models → Multimodal sidecar (e.g. glm-4v, qwen-vl-max).
On every upload, the router checks whether the primary model supports vision:
- Yes → take the existing native multimodal path; raw image bytes go to the primary model.
- No → sidecar fires. The vision model captions the image once, the description is folded back into the user message text, and the primary model answers as if you'd typed those words.
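
The two branches above can be sketched as a small decision function. Everything here is illustrative: `routeImage`, `foldCaption`, the field names, and the bracketed caption format are hypothetical, and the text does not say what happens when no sidecar is configured, so that branch is an assumption.

```javascript
// Hypothetical sketch of the routing decision; field names are illustrative.
function routeImage({ primaryModel, primarySupportsVision, sidecarModel }) {
  if (primarySupportsVision) {
    // Native multimodal path: raw image bytes go to the primary model.
    return { mode: 'native', model: primaryModel };
  }
  if (sidecarModel) {
    // Sidecar path: the vision model captions the image once.
    return { mode: 'sidecar', model: sidecarModel };
  }
  // Not covered by the docs; assumed here to be reported as unsupported.
  return { mode: 'unsupported', model: null };
}

// In sidecar mode the caption is folded into the user's message text.
// The bracketed format is an assumption.
function foldCaption(userText, caption) {
  return `${userText}\n\n[Image description: ${caption}]`;
}
```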

The primary chat stays cheap (one vision-model call per uploaded image, conversation flow unchanged). And — importantly — your custom tools are no longer blocked: previously the system prompt forced a "do not call any tools" instruction when an attachment couldn't be consumed natively. That hard ban is gone. With a media-capable tool bound to the agent, the LLM can now choose to delegate to it.

The whole routing decision is fully visible:

Hint above the input box: pasting an image immediately surfaces "will route to xxx (sidecar mode)".
Routing badge on the assistant bubble: the action row gets a 🔀 routed-to-xxx (image) chip; hover for primary / sidecar / provider details.
Per-message attribution: every reply shows which model actually served it. Previously a black box, now a glass box.

Current limit (v1): image sidecar only. Video attachments still require either switching to a video-capable primary model, or binding a custom video tool (which is no longer suppressed). Video sidecar is queued for the next iteration.

How messages really flow

This is the thirty-second version. The ninety-second version is in Agents.

You type
   │
   ▼
POST /api/v1/chat?agentId={id}                ← or SSE for streaming (POST /api/v1/chat/stream)
   │
   ▼
Conversation Manager                          ← load/create conversation, append user message
   │
   ▼
Agent Engine                                  ← ReAct loop or Plan-and-Execute graph
   │     ┌──► context window assembly: system prompt + workspace files + history
   │     ├──► tool calls (guarded by Tool Guard; may pause for approval)
   │     ├──► wiki reads (if bound to a knowledge base)
   │     └──► memory writes (async, after the turn ends)
   │
   ▼
SSE stream / direct response                  ← segment-by-segment delivery
   │
   ▼
Persist to mate_message                       ← real-time, segment-by-segment

The thing to notice: persistence is synchronous with streaming. Segments land in the database as they arrive from the LLM, not in a single write at the end. That's why refreshing mid-stream doesn't eat your reply.

Conversations

A conversation is a sequence of messages scoped to a single agent and a single user. MateClaw stores them in two tables:

mate_conversation

Column	Purpose
`id`	Conversation ID
`user_id`	Owner
`agent_id`	Which agent this conversation runs against
`title`	Auto-generated from the first user message (editable)
`create_time` / `update_time`	Timestamps

mate_message

Column	Purpose
`id`	Message ID
`conversation_id`	Parent conversation
`role`	`user` / `assistant` / `system` / `tool`
`content`	Full text of the message (for segmented responses: the concatenated final content)
`segments`	JSON array of segments (thinking, tool_call, tool_result, content), for progressive display
`tool_calls`	JSON array of tool calls made by the assistant
`tool_call_id`	For tool-role messages, the call they satisfy
`create_time`	Timestamp

The segment representation is what powers the progressive display. It also makes the database the source of truth — the UI can reconstruct any past response exactly as it looked while streaming.
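
To make the relationship between `segments` and `content` concrete, here is a hypothetical example. The segment type names come from the table above. The per-segment field names (`text`, `name`, `arguments`, `result`), the `get_time` tool, and the `finalContent` helper are assumptions for illustration, not the stored schema.

```javascript
// Hypothetical `segments` value for one assistant message.
const segments = [
  { type: 'thinking', text: 'The user wants Tokyo time. Call the clock tool.' },
  { type: 'tool_call', name: 'get_time', arguments: { tz: 'Asia/Tokyo' } },
  { type: 'tool_result', name: 'get_time', result: '09:30' },
  { type: 'content', text: 'It is 09:30 ' },
  { type: 'content', text: 'in Tokyo.' },
];

// `content` holds the concatenated final reply text, so it can be derived
// from the content-type segments alone.
function finalContent(segs) {
  return segs
    .filter((s) => s.type === 'content')
    .map((s) => s.text)
    .join('');
}
```

Thinking and tool segments stay out of `content`, which keeps that column usable as the plain reply text while `segments` carries the full replayable structure.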

Per-conversation model selection

Added in 1.4.0

The model selector in the chat header now binds a model to the current conversation instead of acting as a global switch. See issue #150.

Switching the model in the header affects only this conversation: the choice is stored on the conversation and takes effect starting with the next message. A conversation you never set explicitly falls back to the workspace default model. The runtime model indicator stays in sync with whatever is pinned on the conversation — what you see is what the next turn actually uses.
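
The fallback rule amounts to a one-line lookup. This is a minimal sketch with hypothetical names (`resolveModel`, `modelId`). How MateClaw actually stores the pinned model on the conversation is not described here.

```javascript
// Hypothetical sketch: which model the next turn uses.
// A conversation with no explicit choice falls back to the workspace default.
function resolveModel(conversation, workspaceDefaultModel) {
  return conversation.modelId ?? workspaceDefaultModel;
}
```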

This isolation also makes model configuration more robust: a single bad model ID no longer takes its whole provider offline. A conversation pinned to a broken model affects only itself; everything else keeps running.

Conversation list management

Added in 1.4.0

The conversation sidebar grew from a plain history list into an actionable operations panel. See issue #144.

Pin / unpin — from each row's ⋮ overflow menu. Important threads stay at the top in a "Pinned" group.
Multi-select batch delete — enter multi-select mode and a checkbox appears on each row; tick several and delete them in one go.
Filter by employee — when the workspace has 2 or more employees, a dropdown appears at the top of the sidebar to filter the list by employee (hidden with a single employee, so there's no pointless control).
Status dots — read each conversation's state at a glance: currently generating (blue pulse), an active goal in progress, or unread content.

Global keyboard shortcuts

Added in 1.4.0

Two global shortcuts let you jump between conversations without touching the mouse. The hint lives in the sidebar footer.

Shortcut	Action
`Ctrl/Cmd + K`	Open the employee picker to jump to any chat
`Ctrl/Cmd + N`	Start a new conversation

Ctrl+N does not fire while you're typing in an input or textarea — its native behavior is left alone.
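
The input-focus rule can be expressed as a pure function over the keyboard event. This is a sketch, not the shipped handler: `resolveShortcut` and the action names are hypothetical, and letting Ctrl/Cmd + K fire inside inputs is an assumption, since the text only specifies the Ctrl+N exception.

```javascript
// Pure decision function, so it can be tested without a DOM.
// Returns an action name, or null when the key press should be left alone.
function resolveShortcut(event) {
  if (!(event.ctrlKey || event.metaKey)) return null;
  const key = String(event.key).toLowerCase();
  const tag = event.target && event.target.tagName;
  const typing = tag === 'INPUT' || tag === 'TEXTAREA';
  if (key === 'k') return 'open-employee-picker'; // assumed to work while typing
  if (key === 'n' && !typing) return 'new-conversation';
  return null;
}

// Browser wiring (hypothetical `dispatch`):
// window.addEventListener('keydown', (e) => {
//   const action = resolveShortcut(e);
//   if (action) { e.preventDefault(); dispatch(action); }
// });
```

Calling `preventDefault()` only when an action is returned is what leaves the native behavior untouched in the typing case.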

Session Admin page

Added in 1.4.0

When conversations outgrow the sidebar, open the dedicated admin page at /sessions from the chat header overflow menu ("Session Admin").

This page exists for the "lots of conversations" case:

Server-side pagination — no more cramming thousands of conversations into the sidebar.
Search by title or ID — filter as you type to locate a specific conversation.
Depth-styled card layout — one card per conversation, denser than the sidebar.
Inline editable model chip — each row shows and switches that conversation's model directly, without entering it first.
Back button — one click returns you to the chat console.

Shared employee picker

Added in 1.4.0

A single shared picker dialog is reused in three places: the sidebar, the Ctrl+K shortcut, and the new-conversation modal.

All three entry points open the same dialog with identical behavior. Agent icons inside it are color-coded per employee, so in a multi-employee workspace you can tell who's who at a glance.

Context window management

Every turn, MateClaw builds the prompt that actually goes to the LLM. Roughly:

System prompt — the agent's instructions
Workspace file injection — AGENTS.md, SOUL.md, PROFILE.md, MEMORY.md (only enabled=true files)
Conversation summary — if earlier turns got compressed
Recent turns — as many as fit in the token budget
Current user message — always last

When the total exceeds defaultMaxInputTokens × compactTriggerRatio (default 128000 × 0.75 = 96000), the system calls the LLM to summarize earlier turns, caches the result for 30 minutes, and sends a compact version. If the LLM still returns a context_length_exceeded error, emergency trimming kicks in: discard older messages without calling the LLM, keep the last two turns.
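
The threshold arithmetic and the emergency fallback can be sketched as follows. The two default values come from the paragraph above. The function names are hypothetical, and the definition of a "turn" (a user message plus everything up to the next user message) is an assumption.

```javascript
// Defaults quoted in the text: 128000 * 0.75 = 96000 tokens.
const DEFAULT_MAX_INPUT_TOKENS = 128000;
const COMPACT_TRIGGER_RATIO = 0.75;

// True when the assembled prompt exceeds the compaction threshold.
function shouldCompact(
  totalTokens,
  maxInput = DEFAULT_MAX_INPUT_TOKENS,
  ratio = COMPACT_TRIGGER_RATIO
) {
  return totalTokens > maxInput * ratio;
}

// Emergency trimming: drop older messages without calling the LLM and keep
// the last `keepTurns` turns. A turn is assumed to start at each user message.
function emergencyTrim(messages, keepTurns = 2) {
  const userIndexes = [];
  messages.forEach((m, i) => {
    if (m.role === 'user') userIndexes.push(i);
  });
  if (userIndexes.length <= keepTurns) return messages.slice();
  return messages.slice(userIndexes[userIndexes.length - keepTurns]);
}
```

With the defaults, a prompt of exactly 96000 tokens does not trigger compaction, and 96001 does.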

More detail, plus the security rationale for injecting summaries as UserMessage rather than SystemMessage, is in Memory.

Multi-channel: same agent, everywhere

Different channels use different transports, but the agent underneath is the same. Same system prompt. Same tools. Same memory.

Channel	Transport	Streaming
Web	SSE	Yes
DingTalk	Stream (WebSocket) / Webhook	Yes (AI Card)
Feishu (Lark)	WebSocket / Webhook	No
WeChat Work (WeCom)	Long connection / Webhook	No
WeChat Personal	HTTP long polling	No
Telegram	Long-Polling / Webhook	Typing indicator
Discord	Gateway WebSocket	Typing indicator
QQ	WebSocket / Callback	No
Slack	Webhook / Socket mode	No

Go deeper in Channels.

API reference (for integrators)

Send a message

bash

curl -X POST 'http://localhost:18088/api/v1/chat?agentId=1' \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the current time in Tokyo?",
    "conversationId": "conv-abc123"
  }'

Omit conversationId to start a new conversation. agentId is a query parameter, not a path segment.

SSE streaming

The SSE endpoint is POST /api/v1/chat/stream with agentId in the JSON body. Browser-native EventSource only supports GET, so integrators should use fetch() and read the response stream:

javascript

const resp = await fetch('/api/v1/chat/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT_TOKEN',
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream',
  },
  body: JSON.stringify({
    agentId: 1,
    message: 'What is the current time in Tokyo?',
    conversationId: 'conv-abc123',
  }),
});

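if (!resp.ok) throw new Error(`Chat stream failed: HTTP ${resp.status}`);

const reader = resp.body.getReader();
const decoder = new TextDecoder();
let buf = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  // SSE events end with a blank line (LF line endings assumed here);
  // keep the trailing partial event in buf until the rest arrives.
  const events = buf.split('\n\n');
  buf = events.pop();
  for (const rawEvent of events) {
    // Parse the `event:` and `data:` lines of rawEvent and dispatch by event type
  }
}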

See mateclaw-ui/src/composables/chat/useChat.ts for a full client implementation.

SSE event types

Event	Meaning
`phase`	Phase change — `thinking`, `action`, `observation`, `summarizing`
`message`	A content chunk — append to the current content segment
`thinking`	A thinking chunk — append to the thinking segment
`tool_call_start`	The agent is invoking a tool (tool name + arguments)
`tool_call_end`	The tool finished (result summary)
`plan_created`	Plan-and-Execute generated a plan
`step_start` / `step_end`	Plan-and-Execute step boundaries
`approval_required`	A guarded tool call needs human approval
`_usage_final`	Token usage statistics (end of stream)
`done`	Stream complete
`error`	Something went wrong
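
A client might turn these events into segments along the following lines. The `event:` / `data:` framing and the default event name `message` follow the standard SSE format. The payload shapes do not: this sketch assumes `message` and `thinking` carry plain text chunks, which may not match what MateClaw sends. `parseSseEvent` and `applyEvent` are hypothetical names.

```javascript
// Parse one raw SSE event block into { event, data }.
// Per the SSE format, the event name defaults to "message" and
// multiple data lines are joined with "\n".
function parseSseEvent(raw) {
  let event = 'message';
  const data = [];
  for (const line of raw.split(/\r?\n/)) {
    if (line.startsWith('event:')) {
      event = line.slice(6).trim();
    } else if (line.startsWith('data:')) {
      // Strip the single optional space after the colon.
      data.push(line.slice(5).replace(/^ /, ''));
    }
  }
  return { event, data: data.join('\n') };
}

// Hypothetical reducer: appends text chunks to the current segment of the
// same type, or opens a new segment. Other event types are ignored here.
function applyEvent(segments, { event, data }) {
  const type =
    event === 'message' ? 'content' : event === 'thinking' ? 'thinking' : null;
  if (!type) return segments;
  const last = segments[segments.length - 1];
  if (last && last.type === type) {
    return [...segments.slice(0, -1), { ...last, text: last.text + data }];
  }
  return [...segments, { type, text: data }];
}
```

Returning a new array on every event keeps the reducer easy to plug into a reactive UI store. Handling for `tool_call_start`, `plan_created`, and the other events would follow the same pattern once their payloads are known.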

Conversation management

bash

# List
curl http://localhost:18088/api/v1/conversations \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

# Get messages
curl http://localhost:18088/api/v1/conversations/conv-abc123/messages \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

# Delete
curl -X DELETE http://localhost:18088/api/v1/conversations/conv-abc123 \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Next

Agents — what's actually doing the thinking
Memory — what persists across conversations
Channels — every other place chat can happen
LLM Wiki — what the agent can read while answering
