
Chat API Reference

Detailed contract for /api/v1/chat including streaming, routing, and semantic memory fields.

Updated 2026-02-15
Quick Start
  1. Copy the request sample from this page.
  2. Run it in API Explorer with your key.
  3. Confirm the stream's final done payload (finish_reason plus credits charged).
  4. Move the same payload into your backend code.

Endpoint

Method   Path           Auth              Response
POST     /api/v1/chat   Bearer mm_sk key  SSE (stream=true) or JSON
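As a minimal client-side sketch, the endpoint and auth rows above translate into a URL and header set like the following. The base URL is a placeholder, the helper name is illustrative, and the Accept header is a common SSE convention rather than something this page specifies:

```python
def chat_request_args(api_key, payload, base_url="https://api.example.com"):
    """Return (url, headers) for POST /api/v1/chat with Bearer auth.

    base_url is a placeholder; substitute your deployment's host.
    """
    url = f"{base_url}/api/v1/chat"
    headers = {
        "Authorization": f"Bearer {api_key}",  # mm_sk key
        "Content-Type": "application/json",
    }
    if payload.get("stream", True):
        # Assumption: advertising SSE support is conventional, not documented here.
        headers["Accept"] = "text/event-stream"
    return url, headers
```

Pass the returned tuple to whatever HTTP client you use; sending the request itself is left out of the sketch.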

Request fields

Field               Type       Required  Notes
model               string     yes       Specific model id, or "auto"
messages            Message[]  yes       role + content blocks
temperature         number     no        0 to 2
max_tokens          integer    no        1 to 128000
stream              boolean    no        Default true
routing             object     no        Mesh fallback strategy
cost_saver          boolean    no        Forces auto + cost goal
optimization_goal   string     no        balanced | latency | cost | reliability
semantic_memory     boolean    no        Default true; cross-session recall
semantic_top_k      integer    no        1 to 12
semantic_min_score  number     no        0 to 1
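A small sketch of building a request body from these fields, enforcing the documented ranges client-side before sending (the helper name is illustrative, not part of the API):

```python
def build_chat_request(messages, model="auto", temperature=None,
                       max_tokens=None, stream=True, **extra):
    """Assemble a /api/v1/chat request body, checking documented ranges."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if max_tokens is not None and not 1 <= max_tokens <= 128000:
        raise ValueError("max_tokens must be in [1, 128000]")
    if "semantic_top_k" in extra and not 1 <= extra["semantic_top_k"] <= 12:
        raise ValueError("semantic_top_k must be in [1, 12]")
    if "semantic_min_score" in extra and not 0 <= extra["semantic_min_score"] <= 1:
        raise ValueError("semantic_min_score must be in [0, 1]")
    body = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    body.update(extra)  # routing, cost_saver, optimization_goal, semantic_* ...
    return body
```

The optional fields simply stay absent when unset, so the backend's documented defaults (stream=true, semantic_memory=true) apply.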

Streaming events

In single-model chat mode, SSE messages are plain JSON chunks that include a delta field (no explicit event field).

In Mesh/failover mode (when routing is set, or when Auto uses an implicit fallback chain), chunks are wrapped in explicit events (event: "route" | "chunk" | "trace"), followed by a final done payload with billing metadata.

Event           Contains                                             Use
(delta chunk)   delta, done                                          Append delta to the assistant message as it streams
route           model, status, reason/error                          Mesh/Auto failover: observe retries and circuit-open skips
chunk           model, delta, tokens, cost, finish_reason            Mesh/Auto failover: render deltas (event-wrapped)
trace           final_model, attempts (+ terminal_error on failure)  Mesh/Auto failover: inspect routing summary
done            credits_charged, credits_remaining, finish_reason    Finalize UI and billing state
terminal error  error, status_code, refunded                         Handle failure (the stream ends after this)
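A client therefore has to handle both framings: plain JSON chunks carrying a delta field in single-model mode, and event-wrapped objects in Mesh/failover mode. A minimal dispatch sketch over already-decoded SSE data lines (the function and callback names are illustrative):

```python
import json

def handle_sse_data(data_line, on_delta, on_event):
    """Dispatch one decoded SSE data payload from /api/v1/chat.

    Single-model mode: plain chunks with a "delta" field and no "event" field.
    Mesh/failover mode: objects with an explicit "event" field
    ("route" | "chunk" | "trace" | "done" | terminal error).
    """
    msg = json.loads(data_line)
    event = msg.get("event")
    if event is None:
        # Single-model mode: append the delta as it streams.
        if "delta" in msg:
            on_delta(msg["delta"])
    elif event == "chunk":
        # Event-wrapped delta in Mesh/Auto failover mode.
        on_delta(msg.get("delta", ""))
    else:
        # route / trace / done / terminal-error payloads.
        on_event(event, msg)
```

route and trace payloads are purely observational; only chunk (or plain delta chunks) contribute to the rendered message, and done carries the billing fields to finalize the UI.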

Request example

{
  "model": "auto",
  "cost_saver": true,
  "optimization_goal": "cost",
  "messages": [
    {"role": "user", "content": "Design retry logic for API failures."}
  ],
  "semantic_memory": true,
  "semantic_top_k": 4,
  "stream": true
}

Done event example

{
  "event": "done",
  "id": "request_uuid",
  "resolved_model": "deepseek-v3",
  "finish_reason": "stop",
  "credits_charged": 1,
  "credits_remaining": 2038
}

Non-stream response example

{
  "id": "request_uuid",
  "model": "gpt-5.2",
  "content": "...",
  "prompt_tokens": 42,
  "completion_tokens": 312,
  "latency_ms": 1180,
  "cost": 0.0039,
  "credits_charged": 1,
  "credits_remaining": 2038,
  "finish_reason": "stop",
  "mode": "chat"
}
Important behavior

When cost_saver=true, the backend normalizes to model=auto and optimization_goal=cost before routing.
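That rule can be illustrated as a short sketch of the documented normalization (not the actual backend code):

```python
def normalize_request(body):
    """Apply the documented cost_saver rule before routing."""
    body = dict(body)  # avoid mutating the caller's payload
    if body.get("cost_saver"):
        # cost_saver=true overrides any explicit model and goal.
        body["model"] = "auto"
        body["optimization_goal"] = "cost"
    return body
```

In other words, sending cost_saver=true with an explicit model is equivalent to sending model="auto" with optimization_goal="cost".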
