Chat API Reference
Detailed contract for /api/v1/chat including streaming, routing, and semantic memory fields.
- Copy the request sample from this page.
- Run it in API Explorer with your key.
- Confirm the stream's final done payload (finish_reason and credits_charged).
- Move the same payload into your backend code.
Endpoint
| Method | Path | Auth | Response |
|---|---|---|---|
| POST | /api/v1/chat | Bearer mm_sk | SSE (stream=true) or JSON |
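For the last quickstart step, a minimal non-streaming backend call looks like the sketch below. The path, Bearer scheme, and response fields come from this page; the host, environment variable name, and message content are illustrative assumptions.

```python
import os

import requests  # assumes the third-party 'requests' package is installed

BASE_URL = "https://api.llmwise.example"  # assumption: replace with your actual host
API_KEY = os.environ["LLMWISE_API_KEY"]   # assumption: your mm_sk key, from your env

resp = requests.post(
    f"{BASE_URL}/api/v1/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Ping?"}],
        "stream": False,  # stream defaults to true; disable it for a plain JSON body
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["content"])
print(f"{body['credits_charged']} credit(s) charged, {body['credits_remaining']} remaining")
```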
Request fields
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Specific model id or auto |
| messages | Message[] | yes | role + content blocks |
| temperature | number | no | 0 to 2 |
| max_tokens | integer | no | 1 to 128000 |
| stream | boolean | no | default true |
| routing | object | no | Mesh fallback strategy |
| cost_saver | boolean | no | Forces model=auto with optimization_goal=cost |
| optimization_goal | string | no | One of balanced, latency, cost, reliability |
| semantic_memory | boolean | no | default true; enables cross-session recall |
| semantic_top_k | integer | no | 1 to 12 |
| semantic_min_score | number | no | 0 to 1 |
Streaming events
In single-model chat mode, SSE messages are plain JSON chunks that include a delta field (no explicit event field).
In Mesh/failover mode (when routing is set, or when Auto uses an implicit fallback chain), chunks are wrapped in explicit events (event: "route" | "chunk" | "trace"), followed by a final done payload with billing metadata.
| Event | Contains | Use |
|---|---|---|
| (delta chunk) | delta, done | Append delta to the assistant message as it streams |
| route | model, status, reason/error | Mesh/Auto failover: observe retries and circuit-open skips |
| chunk | model, delta, tokens, cost, finish_reason | Mesh/Auto failover: render deltas (event-wrapped) |
| trace | final_model, attempts (+ terminal_error on failure) | Mesh/Auto failover: inspect routing summary |
| done | credits_charged, credits_remaining, finish_reason | Finalize UI + billing state |
| terminal error | error, status_code, refunded | Handle failure (stream ends after this) |
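Because Mesh/Auto events carry their type in the JSON payload itself (see the "event": "done" example below) and terminal errors carry an error field, one reader loop can consume both modes. A minimal sketch, assuming the requests package, standard data:-prefixed SSE framing, and an illustrative host and key:

```python
import json
import os

import requests  # assumes the third-party 'requests' package is installed

BASE_URL = "https://api.llmwise.example"  # assumption: replace with your actual host
API_KEY = os.environ["LLMWISE_API_KEY"]   # assumption: your mm_sk key, from your env

payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Design retry logic for API failures."}],
    "stream": True,
}

with requests.post(
    f"{BASE_URL}/api/v1/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Standard SSE framing assumed: payload lines start with "data:",
        # blank lines separate events.
        if not line or not line.startswith("data:"):
            continue
        msg = json.loads(line[len("data:"):].strip())

        if "error" in msg:
            # Terminal error payload: error, status_code, refunded; stream ends here
            raise RuntimeError(
                f"stream failed ({msg.get('status_code')}): {msg.get('error')}, "
                f"refunded={msg.get('refunded')}"
            )

        event = msg.get("event")
        if event in (None, "chunk"):
            # No event field: plain single-model chunk (delta, done flag).
            # event == "chunk": Mesh/Auto event-wrapped delta. Render both the same way.
            print(msg.get("delta", ""), end="", flush=True)
        elif event == "route":
            # Failover progress: retries and circuit-open skips
            print(f"\n[route] {msg.get('model')}: {msg.get('status')}")
        elif event == "trace":
            # Routing summary after the stream completes or fails over
            print(f"\n[trace] final_model={msg.get('final_model')}")
        elif event == "done":
            # Finalize UI and billing state
            print(f"\n[done] charged={msg['credits_charged']}, "
                  f"remaining={msg['credits_remaining']}")
```

Dispatching on the presence of the event field (absent means a plain single-model chunk) avoids maintaining two separate stream handlers.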
Request example
{
  "model": "auto",
  "cost_saver": true,
  "optimization_goal": "cost",
  "messages": [
    {"role": "user", "content": "Design retry logic for API failures."}
  ],
  "semantic_memory": true,
  "semantic_top_k": 4,
  "stream": true
}
Done event example
{
  "event": "done",
  "id": "request_uuid",
  "resolved_model": "deepseek-v3",
  "finish_reason": "stop",
  "credits_charged": 1,
  "credits_remaining": 2038
}
Non-stream response example
{
  "id": "request_uuid",
  "model": "gpt-5.2",
  "content": "...",
  "prompt_tokens": 42,
  "completion_tokens": 312,
  "latency_ms": 1180,
  "cost": 0.0039,
  "credits_charged": 1,
  "credits_remaining": 2038,
  "finish_reason": "stop",
  "mode": "chat"
}
When cost_saver=true, the backend normalizes to model=auto and optimization_goal=cost before routing.
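If you want client-side logging to reflect what the router actually executes, the same rule can be mirrored before sending. A minimal sketch of the documented normalization; the helper name is ours, not part of the API:

```python
def apply_cost_saver(payload: dict) -> dict:
    """Mirror the documented rule: cost_saver=true implies
    model=auto and optimization_goal=cost."""
    if payload.get("cost_saver"):
        return {**payload, "model": "auto", "optimization_goal": "cost"}
    return payload
```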
ChatKit-style guided help
Product-scoped assistant for LLMWise docs and API usage. It does not answer unrelated topics.
Sign in to ask implementation questions and get runnable snippets.