LLMWise — AI-Readable Overview

This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.

Platform Identity

name: LLMWise
tagline: Auto-first AI routing and orchestration platform
url: https://llmwise.ai
apiBase: https://llmwise.ai/api/v1
auth: Bearer mm_sk_... (API key) or Clerk JWT
streaming: Server-Sent Events (SSE)
compatibility: OpenAI-style messages format (role + content)

Models Catalog

19 models across 9 providers, plus model: "auto" for smart routing.

ID | Name | Provider | Vision | Tier
gemini-3.1-flash-lite | Gemini Flash Lite | Google | Yes | cheap
gemma-4-31b-it | Gemma 4 31B | Google | No | cheap
arcee-trinity-large-thinking | Arcee Thinking | Arcee AI | No | balanced
deepseek-v3.2 | DeepSeek V3.2 | DeepSeek | No | balanced
nvidia-nemotron-3-super-120b-a12b | Nemotron 120B | NVIDIA | No | strong
gpt-oss-120b | GPT OSS 120B | OpenAI | No | balanced
kimi-k2.5 | Kimi K2.5 | Moonshot AI | No | balanced
kimi-k2.6 | Kimi K2.6 | Moonshot AI | Yes | premium
gpt-5.3-chat | GPT-5.3 Chat | OpenAI | No | premium
gpt-4o | GPT-4o | OpenAI | No | premium
gpt-5.4 | GPT-5.4 | OpenAI | No | premium
claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | No | premium
claude-sonnet-4.6 | Claude Sonnet 4.6 | Anthropic | No | premium
claude-opus-4.5 | Claude Opus 4.5 | Anthropic | No | premium
claude-opus-4.6 | Claude Opus 4.6 | Anthropic | No | premium
minimax-m2.7 | MiniMax M2.7 | MiniMax | No | premium
grok-4.20 | Grok 4.20 | xAI | No | premium
grok-4.20-multi-agent | Grok 4.20 Multi-Agent | xAI | No | premium
gemini-3.1-pro-preview | Gemini 3.1 Pro | Google | No | premium

Orchestration Modes

Chat

Billing: actual tokens

Billed from the actual input and output tokens used by the selected model and response length.

POST /api/v1/chat
Example request
{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}
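
The request above can be assembled in Python roughly as follows. This is a minimal sketch using only the standard library: it builds the request without sending it. The helper name build_chat_request, the Content-Type header, and the all-zero placeholder key are assumptions for illustration; the base URL, the Bearer scheme, and the body fields come from this page.

```python
import json
import urllib.request

API_BASE = "https://llmwise.ai/api/v1"  # apiBase from Platform Identity


def build_chat_request(api_key: str, prompt: str, model: str = "auto",
                       stream: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request. Hypothetical helper."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key in the documented shape (mm_sk_ + 64 hex characters).
req = build_chat_request("mm_sk_" + "0" * 64, "Hello!")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; with "stream": true the response body is expected to arrive as Server-Sent Events.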

Compare

Billing: actual tokens

The same prompt is sent to 2-9 models, and the final bill reflects the combined input and output tokens actually used.

POST /api/v1/compare
Example request
{
  "models": [
    "gpt-5.4",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "stream": true
}

Blend

Billing: actual tokens

Models answer, a synthesizer combines the strongest parts, and billing reflects the actual tokens consumed across the workflow.

POST /api/v1/blend
Example request
{
  "models": [
    "gpt-5.4",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "synthesizer": "claude-sonnet-4.5",
  "strategy": "consensus",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about AI"
    }
  ]
}

Judge

Billing: actual tokens

Contestants and judge usage are billed from the actual tokens consumed after the evaluation finishes.

POST /api/v1/judge
Example request
{
  "contestants": [
    "gpt-5.4",
    "claude-sonnet-4.5"
  ],
  "judge": "gemini-3.1-pro-preview",
  "messages": [
    {
      "role": "user",
      "content": "Explain recursion"
    }
  ]
}

Failover

Billing: actual tokens

Billing reflects the actual model path and token usage after failover completes.

POST /api/v1/chat (with routing)
Example request
{
  "model": "gpt-5.4",
  "routing": {
    "strategy": "rate-limit",
    "fallback": [
      "claude-sonnet-4.5",
      "gemini-3.1-pro-preview"
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Authentication

Authorization: Bearer <token>

API Key: mm_sk_ followed by 64 hex characters. Obtain: Dashboard → API Keys → Generate
Clerk JWT: RS256-signed JWT from Clerk session. Obtain: Automatic via Clerk session (web app)

BYOK

Bring Your Own Key — add your provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Groq, Cerebras. Cost: 0 credits (billed to your provider directly).
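
As a sketch of the API key format described in the Authentication section, a client could check a token's shape before sending it. The function names are hypothetical, and accepting both upper- and lower-case hex digits is an assumption; the page only says "64 hex characters".

```python
import re

# "mm_sk_ followed by 64 hex characters"; case-insensitivity is an assumption.
API_KEY_PATTERN = re.compile(r"mm_sk_[0-9a-fA-F]{64}")


def looks_like_api_key(token: str) -> bool:
    """Return True if the token has the documented API key shape."""
    return API_KEY_PATTERN.fullmatch(token) is not None


def auth_header(token: str) -> dict:
    """Build the Authorization header used for both API keys and Clerk JWTs."""
    return {"Authorization": f"Bearer {token}"}


print(looks_like_api_key("mm_sk_" + "ab12" * 16))
print(looks_like_api_key("sk-not-an-llmwise-key"))
```

A shape check like this only catches copy-paste mistakes early; whether a key is actually valid is decided by the server, which answers 401 otherwise.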

Blend Strategies (6)

Strategy | Models | Description
consensus | 2-6 | Default strategy. Synthesizer combines strongest points from all responses and resolves contradictions by weighing majority view.
council | 2-6 | Structured deliberation. Synthesizer produces: final answer, agreement points, disagreement points, and follow-up questions.
best_of | 2-6 | Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting.
chain | 2-6 | Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally.
moa | 2-6 | Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see previous layer's answers as references and refine. Final synthesis of last layer. Reference budget: 12,000 chars total, 3,200 per answer.
self_moa | 1 (exactly) | Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism.
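
The model-count rules in the table above can be checked client-side before calling /api/v1/blend. This is a minimal sketch: the function name and error messages are hypothetical, and the server remains the authority on validation.

```python
# Allowed number of models per strategy, taken from the table above.
BLEND_MODEL_COUNTS = {
    "consensus": (2, 6),
    "council": (2, 6),
    "best_of": (2, 6),
    "chain": (2, 6),
    "moa": (2, 6),
    "self_moa": (1, 1),
}


def check_blend_models(strategy: str, models: list) -> None:
    """Raise ValueError if the model list does not fit the strategy."""
    if strategy not in BLEND_MODEL_COUNTS:
        raise ValueError(f"unknown blend strategy: {strategy!r}")
    low, high = BLEND_MODEL_COUNTS[strategy]
    if not low <= len(models) <= high:
        raise ValueError(
            f"{strategy} expects {low}-{high} models, got {len(models)}"
        )


check_blend_models("consensus", ["gpt-5.4", "claude-sonnet-4.5"])  # passes
check_blend_models("self_moa", ["claude-sonnet-4.5"])              # passes
```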

API Endpoints — Request Schemas

POST /api/v1/chat · Billing: actual tokens

Single-model chat with OpenAI-style messages and streaming SSE.

Request parameters
model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
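
The parameter ranges listed above lend themselves to a small client-side check. The sketch below applies the documented defaults and ranges and mirrors the documented cost_saver behaviour; the function name is hypothetical, and real validation happens on the server (which answers 400 on invalid input).

```python
ROLES = {"system", "user", "assistant"}
GOALS = {"balanced", "latency", "cost", "reliability"}


def prepare_chat_body(params: dict) -> dict:
    """Apply documented defaults and range checks to a /chat request body."""
    body = {"stream": True, "temperature": 0.7, **params}  # documented defaults
    if body.get("cost_saver"):
        # Documented: cost_saver forces model='auto' and optimization_goal='cost'.
        body["model"] = "auto"
        body["optimization_goal"] = "cost"
    if not body.get("model") or not body.get("messages"):
        raise ValueError("model and messages are required")
    for message in body["messages"]:
        if message.get("role") not in ROLES:
            raise ValueError(f"invalid role: {message.get('role')!r}")
    if not 0 <= body["temperature"] <= 2:
        raise ValueError("temperature must be within 0-2")
    goal = body.get("optimization_goal")
    if goal is not None and goal not in GOALS:
        raise ValueError(f"invalid optimization_goal: {goal!r}")
    top_k = body.get("semantic_top_k")
    if top_k is not None and not 1 <= top_k <= 12:
        raise ValueError("semantic_top_k must be within 1..12")
    min_score = body.get("semantic_min_score")
    if min_score is not None and not 0 <= min_score <= 1:
        raise ValueError("semantic_min_score must be within 0..1")
    return body


body = prepare_chat_body({
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "cost_saver": True,
})
print(body["model"], body["optimization_goal"])
```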

POST /api/v1/chat (with routing) · Billing: actual tokens

Mesh mode — automatic failover across model chain with circuit breakers.

Request parameters
model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)

POST /api/v1/compare · Billing: actual tokens

Run 2-9 models concurrently, stream responses side-by-side.

Request parameters
models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)

POST /api/v1/blend · Billing: actual tokens

Multi-model synthesis — gather responses then synthesize into one answer.

Request parameters
models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)

POST /api/v1/judge · Billing: actual tokens

Competitive evaluation — contestants answer, judge scores and ranks.

Request parameters
contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality

Error Codes

Code | Name | Description
400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors.
401 | Unauthorized | Missing Authorization header, invalid API key, invalid or expired JWT token.
402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}.
429 | Too Many Requests | Rate limit exceeded. Check Retry-After header. Applies per-user and per-IP.
502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to next model.
503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting.
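
One way a client might react to the codes above is sketched below. The action names and the retry policy are assumptions, not part of the API. The sketch also assumes that credits and required in the 402 body are numeric, that Retry-After carries whole seconds, and that header names arrive with the capitalization shown.

```python
from typing import Optional


def next_action(status: int, headers: Optional[dict] = None,
                body: Optional[dict] = None) -> dict:
    """Map a documented error status to a suggested client action (hypothetical)."""
    headers = headers or {}
    body = body or {}
    if status == 400:
        return {"action": "fix_request"}
    if status == 401:
        return {"action": "refresh_credentials"}
    if status == 402:
        # Documented body: {error, credits: current_balance, required: cost}.
        shortfall = body.get("required", 0) - body.get("credits", 0)
        return {"action": "add_credits", "shortfall": shortfall}
    if status == 429:
        # Assumption: Retry-After is an integer number of seconds.
        return {"action": "wait", "seconds": int(headers.get("Retry-After", "1"))}
    if status in (502, 503):
        return {"action": "retry"}
    return {"action": "raise"}


print(next_action(402, body={"error": "insufficient", "credits": 3, "required": 10}))
print(next_action(429, headers={"Retry-After": "12"}))
```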

Rate Limits

Window: 60 seconds (sliding window)
Paid: 1.5x (e.g. chat: 135/min for paid users)
Starter: 0.6x (e.g. chat: 54/min for free users)

Endpoint | Base limit/60s
chat | 90
compare | 45
blend | 30
judge | 30
uploads | 30
copilot | 30
default | 180

Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
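
The multipliers above combine with the base limits as simple arithmetic, for example 90 x 1.5 = 135 and 90 x 0.6 = 54 for chat. A minimal sketch follows; the function name, the plan keys "paid" and "starter", the use of the default row for unlisted endpoints, and the rounding are assumptions.

```python
from typing import Optional

# Base limits per 60-second sliding window, from the table above.
BASE_LIMITS = {
    "chat": 90, "compare": 45, "blend": 30, "judge": 30,
    "uploads": 30, "copilot": 30, "default": 180,
}
# Plan multipliers as listed above; the dictionary keys are hypothetical labels.
MULTIPLIERS = {"paid": 1.5, "starter": 0.6}


def effective_limit(endpoint: str, plan: Optional[str] = None) -> int:
    """Requests allowed per 60 s for an endpoint under a plan multiplier."""
    base = BASE_LIMITS.get(endpoint, BASE_LIMITS["default"])
    multiplier = MULTIPLIERS.get(plan, 1.0)
    # Assumption: results are rounded to the nearest whole request.
    return round(base * multiplier)


print(effective_limit("chat", "paid"))
print(effective_limit("chat", "starter"))
```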

Circuit Breaker & Auto-Router

Circuit Breaker (Mesh Failover)

Per-model health tracking for automatic failover in Mesh mode

Threshold: 3 failures
Duration: 30 seconds

State | Condition | Effect
Healthy | consecutive_failures < 3 | Model available for requests
Open | 3+ consecutive failures | Model skipped for 30 seconds
Half-Open | 30s elapsed since circuit opened | One probe request allowed
Recovered | Probe succeeds | Reset to Healthy, consecutive_failures = 0
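
The state transitions above can be illustrated with a small in-memory breaker. This is a sketch of the general circuit-breaker pattern using the documented threshold and duration, not LLMWise's implementation. The class and method names are hypothetical, and re-opening the circuit for another 30 seconds when the probe fails is an assumption (the page does not say what happens in that case).

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Per-model breaker: 3 consecutive failures open it for 30 seconds."""

    def __init__(self, threshold: int = 3, open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.open_seconds = open_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None
        self.probe_in_flight = False

    def state(self) -> str:
        if self.opened_at is None:
            return "healthy"
        if self.clock() - self.opened_at >= self.open_seconds:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        current = self.state()
        if current == "healthy":
            return True
        if current == "half_open" and not self.probe_in_flight:
            self.probe_in_flight = True  # exactly one probe is let through
            return True
        return False

    def record_success(self) -> None:
        # Recovered: reset to Healthy with consecutive_failures = 0.
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        self.probe_in_flight = False
        if self.consecutive_failures >= self.threshold:
            # Assumption: a failed probe restarts the 30-second open period.
            self.opened_at = self.clock()


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
print(breaker.state())
```

The injectable clock is a design choice that keeps the sketch testable without sleeping for 30 seconds.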

Auto-Router (model="auto")

Zero-latency regex-based query classification when model='auto'. No LLM call overhead.

Code → gpt-oss-120b
Math → deepseek-v3.2
Creative → gemma-4-31b-it
Translation → gemini-3.1-flash-lite
Quick fact → gemini-3.1-flash-lite
Analysis → nvidia-nemotron-3-super-120b-a12b
Vision → gemini-3.1-flash-lite
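
To illustrate how regex-based classification can pick a model without an LLM call, here is a toy router over the mapping above. The regular expressions, the has_image flag, and the choice of "quick fact" as the fallback are all invented for illustration; the actual patterns used by LLMWise are not published on this page.

```python
import re

# (category, hypothetical pattern, model from the mapping above)
ROUTES = [
    ("code", re.compile(r"\b(def|class|function|bug|compile|stack trace)\b", re.I),
     "gpt-oss-120b"),
    ("math", re.compile(r"\b(solve|integral|derivative|equation|prove)\b"
                        r"|\d\s*[-+*/^=]\s*\d", re.I),
     "deepseek-v3.2"),
    ("translation", re.compile(r"\btranslate\b", re.I), "gemini-3.1-flash-lite"),
    ("creative", re.compile(r"\b(poem|haiku|story|lyrics)\b", re.I),
     "gemma-4-31b-it"),
    ("analysis", re.compile(r"\b(analy[sz]e|compare|trade-?offs?|evaluate)\b", re.I),
     "nvidia-nemotron-3-super-120b-a12b"),
]
FALLBACK = ("quick_fact", "gemini-3.1-flash-lite")  # assumed default


def route(query: str, has_image: bool = False) -> tuple:
    """Return (category, model_id) for a query using first-match regex rules."""
    if has_image:
        return ("vision", "gemini-3.1-flash-lite")
    for category, pattern, model in ROUTES:
        if pattern.search(query):
            return (category, model)
    return FALLBACK


print(route("Write a haiku about AI"))
print(route("Fix the bug in this function"))
```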

Streaming Protocol (SSE)

All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.

data: {"model": "gpt-5.4", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.4", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
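
A client can consume the stream above by reading data: lines, decoding each JSON chunk, and stopping at [DONE]. The sketch below works on any iterable of text lines, so it runs without a network connection; the function names are hypothetical, and grouping text by the model field (useful when several models stream at once) is an assumption about multi-model responses.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE data: lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_by_model(lines: Iterable[str]) -> Dict[str, str]:
    """Concatenate the delta text of each model into one string per model."""
    texts: Dict[str, str] = {}
    for chunk in iter_sse_chunks(lines):
        model = chunk["model"]
        texts[model] = texts.get(model, "") + chunk.get("delta", "")
    return texts


sample = [
    'data: {"model": "gpt-5.4", "delta": "Hel", "done": false, "latency_ms": 123}',
    'data: {"model": "gpt-5.4", "delta": "lo", "done": false, "latency_ms": 200}',
    'data: {"model": "gpt-5.4", "delta": "", "done": true, "latency_ms": 456}',
    'data: [DONE]',
]
print(collect_by_model(sample))
```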

Pricing

Plan | Details
Free | 5 messages total to preview the product
Starter | $29/mo · 10M tokens/month · Auto lane only
Teams | $99/mo · 40M tokens/month · Auto + manual premium models
Add-ons | Available after included plan tokens are exhausted
Auto Top-up | Automatic add-on refill with monthly safety cap
Enterprise | Custom limits, team billing, SLAs

Billing basis by mode

Chat: actual tokens after run
Compare: actual tokens after run
Blend: actual tokens after run
Judge: actual tokens after run
Failover: actual tokens after run