LLMWise — AI-Readable Overview
This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.
Platform Identity
- name: LLMWise
- tagline: Auto-first AI routing and orchestration platform
- url: https://llmwise.ai
- apiBase: https://llmwise.ai/api/v1
- auth: Bearer mm_sk_... (API key) or Clerk JWT
- streaming: Server-Sent Events (SSE)
- compatibility: OpenAI-style messages format (role + content)
Models Catalog
19 models across 9 providers, plus model: "auto" for smart routing.
| ID | Name | Provider | Vision | Tier |
|---|---|---|---|---|
| gemini-3.1-flash-lite | Gemini Flash Lite | Google | Yes | cheap |
| gemma-4-31b-it | Gemma 4 31B | Google | No | cheap |
| arcee-trinity-large-thinking | Arcee Thinking | Arcee AI | No | balanced |
| deepseek-v3.2 | DeepSeek V3.2 | DeepSeek | No | balanced |
| nvidia-nemotron-3-super-120b-a12b | Nemotron 120B | NVIDIA | No | strong |
| gpt-oss-120b | GPT OSS 120B | OpenAI | No | balanced |
| kimi-k2.5 | Kimi K2.5 | Moonshot AI | No | balanced |
| kimi-k2.6 | Kimi K2.6 | Moonshot AI | Yes | premium |
| gpt-5.3-chat | GPT-5.3 Chat | OpenAI | No | premium |
| gpt-4o | GPT-4o | OpenAI | No | premium |
| gpt-5.4 | GPT-5.4 | OpenAI | No | premium |
| claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | No | premium |
| claude-sonnet-4.6 | Claude Sonnet 4.6 | Anthropic | No | premium |
| claude-opus-4.5 | Claude Opus 4.5 | Anthropic | No | premium |
| claude-opus-4.6 | Claude Opus 4.6 | Anthropic | No | premium |
| minimax-m2.7 | MiniMax M2.7 | MiniMax | No | premium |
| grok-4.20 | Grok 4.20 | xAI | No | premium |
| grok-4.20-multi-agent | Grok 4.20 Multi-Agent | xAI | No | premium |
| gemini-3.1-pro-preview | Gemini 3.1 Pro | Google | No | premium |
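Because an unknown model ID is rejected with a 400, it can help to check IDs on the client before sending. The following is a minimal sketch that copies the IDs from the catalog table; the helper name `unknown_models` and the `allow_auto` flag are hypothetical and not part of the LLMWise API. Whether "auto" is accepted in multi-model lists is not stated in the source, which is why it is a flag.

```python
# Model IDs copied from the catalog table; "auto" is the smart-routing alias.
KNOWN_MODEL_IDS = frozenset({
    "gemini-3.1-flash-lite", "gemma-4-31b-it", "arcee-trinity-large-thinking",
    "deepseek-v3.2", "nvidia-nemotron-3-super-120b-a12b", "gpt-oss-120b",
    "kimi-k2.5", "kimi-k2.6", "gpt-5.3-chat", "gpt-4o", "gpt-5.4",
    "claude-sonnet-4.5", "claude-sonnet-4.6", "claude-opus-4.5",
    "claude-opus-4.6", "minimax-m2.7", "grok-4.20", "grok-4.20-multi-agent",
    "gemini-3.1-pro-preview",
})


def unknown_models(model_ids, allow_auto=True):
    """Return the IDs that are not in the catalog (hypothetical helper)."""
    allowed = KNOWN_MODEL_IDS | ({"auto"} if allow_auto else set())
    return [m for m in model_ids if m not in allowed]
```

An empty result means every ID appears in the catalog; the catalog itself may change, so a hard-coded set like this would need refreshing.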
Orchestration Modes
Chat
Billing (actual tokens): billed from the input and output tokens actually used by the selected model.
{
"model": "auto",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"stream": true
}
Compare
Billing (actual tokens): the same prompt is sent to 2-9 models, and the final bill reflects the combined input and output tokens actually used.
{
"models": [
"gpt-5.4",
"claude-sonnet-4.5",
"gemini-3.1-pro-preview"
],
"messages": [
{
"role": "user",
"content": "Explain quantum computing"
}
],
"stream": true
}
Blend
Billing (actual tokens): each model answers, a synthesizer combines the strongest parts, and billing reflects the tokens actually consumed across the whole workflow.
{
"models": [
"gpt-5.4",
"claude-sonnet-4.5",
"gemini-3.1-pro-preview"
],
"synthesizer": "claude-sonnet-4.5",
"strategy": "consensus",
"messages": [
{
"role": "user",
"content": "Write a haiku about AI"
}
]
}
Judge
Billing (actual tokens): contestant and judge usage is billed from the tokens actually consumed once the evaluation finishes.
{
"contestants": [
"gpt-5.4",
"claude-sonnet-4.5"
],
"judge": "gemini-3.1-pro-preview",
"messages": [
{
"role": "user",
"content": "Explain recursion"
}
]
}
Failover
Billing (actual tokens): reflects the model path actually taken and the tokens used once failover completes.
{
"model": "gpt-5.4",
"routing": {
"strategy": "rate-limit",
"fallback": [
"claude-sonnet-4.5",
"gemini-3.1-pro-preview"
]
},
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"stream": true
}
Authentication
Authorization: Bearer <token>
Bring Your Own Key (BYOK): add your own provider API keys to route requests directly to the provider. Supported providers: OpenAI, Anthropic, Google, Groq, Cerebras. Cost: 0 credits (usage is billed by your provider directly).
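As an illustration of the Authorization header, here is a small sketch that builds request headers from a token. The function names are hypothetical, and the `Content-Type` and `Accept` values are assumptions based on common JSON and SSE conventions rather than documented requirements.

```python
def build_headers(token: str, stream: bool = True) -> dict:
    """Build request headers for a Bearer token (API key or Clerk JWT)."""
    token = token.strip()
    if not token:
        raise ValueError("missing token")
    headers = {
        "Authorization": f"Bearer {token}",
        # Assumption: request bodies are sent as JSON.
        "Content-Type": "application/json",
    }
    if stream:
        # Assumption: SSE responses are requested with this Accept value.
        headers["Accept"] = "text/event-stream"
    return headers


def looks_like_api_key(token: str) -> bool:
    """API keys are documented with the mm_sk_ prefix; other tokens are JWTs."""
    return token.startswith("mm_sk_")
```

Keeping the token in an environment variable rather than in source code is the usual practice for keys of this kind.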
Blend Strategies (6)
| Strategy | Models | Description |
|---|---|---|
| consensus | 2-6 | Default strategy. Synthesizer combines strongest points from all responses and resolves contradictions by weighing majority view. |
| council | 2-6 | Structured deliberation. Synthesizer produces: final answer, agreement points, disagreement points, and follow-up questions. |
| best_of | 2-6 | Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting. |
| chain | 2-6 | Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally. |
| moa | 2-6 | Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see previous layer's answers as references and refine. Final synthesis of last layer. Reference budget: 12,000 chars total, 3,200 per answer. |
| self_moa | 1 (exactly) | Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism. |
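The model-count limits in the table can be checked before a blend request is sent, and the Self-MoA temperature spread can be sketched in a few lines. Both helpers below are hypothetical. In particular, the evenly spaced offsets and the `step` value are assumptions, since the source only states that offsets around the base temperature are clamped to [0.2, 1.4] and that 2-8 candidates are generated.

```python
# Model-count limits per strategy, taken from the strategy table.
STRATEGY_MODEL_RANGE = {
    "consensus": (2, 6), "council": (2, 6), "best_of": (2, 6),
    "chain": (2, 6), "moa": (2, 6), "self_moa": (1, 1),
}


def validate_blend(strategy, models):
    """Return a list of problems; an empty list means the counts look valid."""
    if strategy not in STRATEGY_MODEL_RANGE:
        return [f"unknown strategy: {strategy}"]
    low, high = STRATEGY_MODEL_RANGE[strategy]
    if not low <= len(models) <= high:
        return [f"{strategy} expects {low}-{high} models, got {len(models)}"]
    return []


def self_moa_temperatures(base, samples, step=0.15):
    """Spread `samples` temperatures around `base`, clamped to [0.2, 1.4].

    The symmetric, evenly spaced offsets are an assumption for illustration.
    """
    if not 2 <= samples <= 8:
        raise ValueError("samples must be between 2 and 8")
    mid = (samples - 1) / 2
    return [round(min(1.4, max(0.2, base + (i - mid) * step)), 3)
            for i in range(samples)]
```

For example, `validate_blend("self_moa", ["gpt-5.4", "claude-sonnet-4.5"])` reports a problem because Self-MoA takes exactly one model.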
API Endpoints — Request Schemas
Chat: single-model chat with OpenAI-style messages and SSE streaming.
model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
Mesh mode: automatic failover across a model chain with circuit breakers.
model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)
Compare: run 2-9 models concurrently and stream their responses side by side.
models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)
Blend: multi-model synthesis. Responses are gathered, then synthesized into one answer.
models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)
Judge: competitive evaluation. Contestants answer; the judge scores and ranks them.
contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality
Error Codes
| Code | Name | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors. |
| 401 | Unauthorized | Missing Authorization header, invalid API key, invalid or expired JWT token. |
| 402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}. |
| 429 | Too Many Requests | Rate limit exceeded. Check Retry-After header. Applies per-user and per-IP. |
| 502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to next model. |
| 503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting. |
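One way a client might react to the codes in this table is sketched below. The action names and the function `next_action` are hypothetical. The sketch also assumes that `credits` and `required` in a 402 body are numeric, and that `Retry-After` carries a number of seconds; when that header is missing or not numeric, it falls back to one second.

```python
def next_action(status, headers=None, body=None):
    """Map a documented status code to an (action, detail) pair."""
    headers = headers or {}
    body = body or {}
    if status == 429:
        # Retry-After may be absent or an HTTP date; fall back to 1 second.
        raw = str(headers.get("Retry-After", "")).strip()
        delay = float(raw) if raw.replace(".", "", 1).isdigit() else 1.0
        return ("retry_after", delay)
    if status == 402:
        # Assumption: both fields are numbers, so the shortfall is a difference.
        shortfall = body.get("required", 0) - body.get("credits", 0)
        return ("add_credits", shortfall)
    if status == 502:
        return ("retry_or_switch_model", None)
    if status == 503:
        return ("retry_later", None)
    if status in (400, 401):
        return ("fix_request", None)
    return ("ok" if 200 <= status < 300 else "unknown", None)
```

Treating 400 and 401 as non-retryable avoids resending a request that will fail the same way again.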
Rate Limits
| Endpoint | Base limit/60s |
|---|---|
| chat | 90 |
| compare | 45 |
| blend | 30 |
| judge | 30 |
| uploads | 30 |
| copilot | 30 |
| default | 180 |
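A client can stay under these limits by counting its own requests. The sketch below uses a sliding 60-second window per endpoint name. The class `ClientThrottle` is hypothetical, and the server's actual windowing scheme (fixed or sliding, and how the base limits scale per plan) is not stated in the source, so this is only an approximation.

```python
import time
from collections import deque

# Base limits per 60 seconds, copied from the rate-limit table.
BASE_LIMITS = {"chat": 90, "compare": 45, "blend": 30, "judge": 30,
               "uploads": 30, "copilot": 30}
DEFAULT_LIMIT = 180
WINDOW_SECONDS = 60.0


class ClientThrottle:
    """Client-side sliding-window counter, one window per endpoint name."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent = {}  # endpoint name -> deque of send timestamps

    def try_acquire(self, endpoint):
        """Record a request and return True if it fits in the current window."""
        now = self._clock()
        stamps = self._sent.setdefault(endpoint, deque())
        # Drop timestamps that have aged out of the 60-second window.
        while stamps and now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()
        if len(stamps) >= BASE_LIMITS.get(endpoint, DEFAULT_LIMIT):
            return False
        stamps.append(now)
        return True
```

When `try_acquire` returns False, the caller can wait and try again; a 429 from the server should still be handled, because the limits also apply per IP.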
Circuit Breaker & Auto-Router
Circuit Breaker (Mesh Failover)
Per-model health tracking for automatic failover in Mesh mode
Auto-Router (model="auto")
Regex-based query classification when model='auto'. No LLM call is made, so routing adds no model-call latency.
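To make per-model health tracking concrete, here is a generic circuit-breaker sketch with a fallback chain. It shows the general pattern and makes no claim about the LLMWise implementation. The failure threshold, cooldown, and all names are hypothetical.

```python
import time


class CircuitBreaker:
    """Per-model breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        # Threshold and cooldown are illustrative values, not documented ones.
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = {}   # model ID -> consecutive failure count
        self._opened_at = {}  # model ID -> time the breaker last opened

    def available(self, model):
        opened = self._opened_at.get(model)
        if opened is None:
            return True
        # After the cooldown, allow a trial request (half-open state).
        return self._clock() - opened >= self.cooldown_s

    def record(self, model, ok):
        if ok:
            # A success closes the breaker and resets the failure count.
            self._failures.pop(model, None)
            self._opened_at.pop(model, None)
            return
        self._failures[model] = self._failures.get(model, 0) + 1
        if self._failures[model] >= self.failure_threshold:
            self._opened_at[model] = self._clock()


def pick_model(breaker, primary, fallback):
    """Return the first model in the chain whose breaker is not open, or None."""
    for model in [primary, *fallback]:
        if breaker.available(model):
            return model
    return None
```

With the failover request shown earlier, `pick_model(breaker, "gpt-5.4", ["claude-sonnet-4.5", "gemini-3.1-pro-preview"])` would skip `gpt-5.4` while its breaker is open and return the next healthy model.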
Streaming Protocol (SSE)
All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.
data: {"model": "gpt-5.4", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.4", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
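The stream format above can be consumed with a small parser. This sketch reads `data:` lines, stops at the `[DONE]` sentinel, and concatenates `delta` values per `model`. The function names are hypothetical, and grouping by `model` assumes that multi-model streams tag each chunk with its model ID as in the sample.

```python
import json


def parse_sse(lines):
    """Yield decoded chunk dicts from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines):
    """Concatenate deltas per model ID into a dict of full responses."""
    text = {}
    for chunk in parse_sse(lines):
        model = chunk.get("model", "")
        text[model] = text.get(model, "") + chunk.get("delta", "")
    return text
```

`lines` can be any iterable of strings, for example the decoded lines of a streaming HTTP response body.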
Pricing
| Plan | Details |
|---|---|
| Free | 5 messages total to preview the product |
| Starter | $29/mo · 10M tokens/month · Auto lane only |
| Teams | $99/mo · 40M tokens/month · Auto + manual premium models |
| Add-ons | Available after included plan tokens are exhausted |
| Auto Top-up | Automatic add-on refill with monthly safety cap |
| Enterprise | Custom limits, team billing, SLAs |