Chat API Reference
Detailed contract for /api/v1/chat including streaming, routing, and semantic memory fields.
- Copy the request sample from this page.
- Run it in API Explorer with your key.
- Confirm the stream's final done payload (finish_reason and credits_charged).
- Move the same payload into your backend code.
Endpoint
| Method | Path | Auth | Response |
|---|---|---|---|
| POST | /api/v1/chat | Bearer mm_sk | SSE (stream=true) or JSON |
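For the last quickstart step, a minimal non-streaming backend call looks like the sketch below. The path, Bearer scheme, and response fields come from this page; the host, environment variable name, and message content are illustrative assumptions.

```python
import os

import requests  # assumes the third-party 'requests' package is installed

BASE_URL = "https://api.llmwise.example"  # assumption: replace with your actual host
API_KEY = os.environ["LLMWISE_API_KEY"]   # assumption: your mm_sk key, from your env

resp = requests.post(
    f"{BASE_URL}/api/v1/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Ping?"}],
        "stream": False,  # stream defaults to true; disable it for a plain JSON body
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["content"])
print(f"{body['credits_charged']} credit(s) charged, {body['credits_remaining']} remaining")
```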
Request fields
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Specific model id or auto |
| messages | Message[] | yes | role + content blocks |
| temperature | number | no | 0 to 2 |
| max_tokens | integer | no | 1 to 128000 |
| stream | boolean | no | default true |
| routing | object | no | Mesh fallback strategy |
| cost_saver | boolean | no | Forces model=auto with optimization_goal=cost |
| optimization_goal | string | no | One of balanced, latency, cost, reliability |
| semantic_memory | boolean | no | default true; enables cross-session recall |
| semantic_top_k | integer | no | 1 to 12 |
| semantic_min_score | number | no | 0 to 1 |
Streaming events
In single-model chat mode, SSE messages are plain JSON chunks that include a delta field (no explicit event field).
In Mesh/failover mode (when routing is set, or when Auto uses an implicit fallback chain), chunks are wrapped in explicit events (event: "route" | "chunk" | "trace"), followed by a final done payload with billing metadata.
| Event | Contains | Use |
|---|---|---|
| (delta chunk) | delta, done | Append delta to the assistant message as it streams |
| route | model, status, reason/error | Mesh/Auto failover: observe retries and circuit-open skips |
| chunk | model, delta, tokens, cost, finish_reason | Mesh/Auto failover: render deltas (event-wrapped) |
| trace | final_model, attempts (+ terminal_error on failure) | Mesh/Auto failover: inspect routing summary |
| done | credits_charged, credits_remaining, finish_reason | Finalize UI + billing state |
| terminal error | error, status_code, refunded | Handle failure (stream ends after this) |
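Because Mesh/Auto events carry their type in the JSON payload itself (see the "event": "done" example below) and terminal errors carry an error field, one reader loop can consume both modes. A minimal sketch, assuming the requests package, standard data:-prefixed SSE framing, and an illustrative host and key:

```python
import json
import os

import requests  # assumes the third-party 'requests' package is installed

BASE_URL = "https://api.llmwise.example"  # assumption: replace with your actual host
API_KEY = os.environ["LLMWISE_API_KEY"]   # assumption: your mm_sk key, from your env

payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Design retry logic for API failures."}],
    "stream": True,
}

with requests.post(
    f"{BASE_URL}/api/v1/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Standard SSE framing assumed: payload lines start with "data:",
        # blank lines separate events.
        if not line or not line.startswith("data:"):
            continue
        msg = json.loads(line[len("data:"):].strip())

        if "error" in msg:
            # Terminal error payload: error, status_code, refunded; stream ends here
            raise RuntimeError(
                f"stream failed ({msg.get('status_code')}): {msg.get('error')}, "
                f"refunded={msg.get('refunded')}"
            )

        event = msg.get("event")
        if event in (None, "chunk"):
            # No event field: plain single-model chunk (delta, done flag).
            # event == "chunk": Mesh/Auto event-wrapped delta. Render both the same way.
            print(msg.get("delta", ""), end="", flush=True)
        elif event == "route":
            # Failover progress: retries and circuit-open skips
            print(f"\n[route] {msg.get('model')}: {msg.get('status')}")
        elif event == "trace":
            # Routing summary after the stream completes or fails over
            print(f"\n[trace] final_model={msg.get('final_model')}")
        elif event == "done":
            # Finalize UI and billing state
            print(f"\n[done] charged={msg['credits_charged']}, "
                  f"remaining={msg['credits_remaining']}")
```

Dispatching on the presence of the event field (absent means a plain single-model chunk) avoids maintaining two separate stream handlers.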
Request example
{
  "model": "auto",
  "cost_saver": true,
  "optimization_goal": "cost",
  "messages": [
    {"role": "user", "content": "Design retry logic for API failures."}
  ],
  "semantic_memory": true,
  "semantic_top_k": 4,
  "stream": true
}
Done event example
{
  "event": "done",
  "id": "request_uuid",
  "resolved_model": "deepseek-v3",
  "finish_reason": "stop",
  "credits_charged": 1,
  "credits_remaining": 2038
}
Non-stream response example
{
  "id": "request_uuid",
  "model": "gpt-5.2",
  "content": "...",
  "prompt_tokens": 42,
  "completion_tokens": 312,
  "latency_ms": 1180,
  "cost": 0.0039,
  "credits_charged": 1,
  "credits_remaining": 2038,
  "finish_reason": "stop",
  "mode": "chat"
}
When cost_saver=true, the backend normalizes to model=auto and optimization_goal=cost before routing.
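If you want client-side logging to reflect what the router actually executes, the same rule can be mirrored before sending. A minimal sketch of the documented normalization; the helper name is ours, not part of the API:

```python
def apply_cost_saver(payload: dict) -> dict:
    """Mirror the documented rule: cost_saver=true implies
    model=auto and optimization_goal=cost."""
    if payload.get("cost_saver"):
        return {**payload, "model": "auto", "optimization_goal": "cost"}
    return payload
```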
ChatKit-style guided help
Product-scoped assistant for LLMWise docs and API usage. It does not answer unrelated topics.
Sign in to ask implementation questions and get runnable snippets.