LLMWise — AI-Readable Overview
This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.
Platform Identity
- name: LLMWise
- tagline: Multi-model LLM API orchestration platform
- url: https://llmwise.ai
- apiBase: https://llmwise.ai/api/v1
- auth: Bearer mm_sk_... (API key) or Clerk JWT
- streaming: Server-Sent Events (SSE)
- compatibility: OpenAI-style messages format (role + content)
Models Catalog
31 models across 16 providers. Plus model: "auto" for smart routing.
| ID | Name | Provider | Vision | Free |
|---|---|---|---|---|
| gpt-5.2 | GPT-5.2 | OpenAI | Yes | |
| claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | Yes | |
| gemini-3-flash | Gemini 3 Flash | Google | Yes | |
| claude-haiku-4.5 | Claude Haiku 4.5 | Anthropic | No | |
| deepseek-v3 | DeepSeek V3 | DeepSeek | No | |
| llama-4-maverick | Llama 4 Maverick | Meta | No | |
| mistral-large | Mistral Large | Mistral | No | |
| grok-3 | Grok 3 | xAI | Yes | |
| zai-glm-5 | GLM 5 | Z.ai | No | |
| liquid-lfm-2.2-6b | LFM2 2.6B | LiquidAI | No | |
| liquid-lfm-2.5-1.2b-thinking-free | LFM2.5 1.2B Thinking | LiquidAI | No | Yes |
| liquid-lfm2-8b-a1b | LFM2 8B A1B | LiquidAI | No | |
| minimax-m2.5 | MiniMax M2.5 | MiniMax | No | |
| llama-3.3-70b-instruct | Llama 3.3 70B Instruct | Meta | No | |
| gpt-oss-20b | GPT OSS 20B | OpenAI | No | |
| gpt-oss-120b | GPT OSS 120B | OpenAI | No | |
| gpt-oss-safeguard-20b | GPT OSS Safeguard 20B | OpenAI | No | |
| kimi-k2.5 | Kimi K2.5 | MoonshotAI | Yes | |
| nemotron-3-nano-30b-a3b | Nemotron 3 Nano 30B | NVIDIA | No | |
| nemotron-nano-12b-v2-vl | Nemotron Nano 12B VL | NVIDIA | Yes | |
| claude-opus-4.6 | Claude Opus 4.6 | Anthropic | Yes | |
| claude-opus-4.5 | Claude Opus 4.5 | Anthropic | Yes | |
| arcee-coder-large | Arcee Coder Large | Arcee AI | No | |
| arcee-trinity-large-preview-free | Arcee Trinity Large (Free) | Arcee AI | No | Yes |
| qwen3-coder-next | Qwen3 Coder Next | Qwen | No | |
| olmo-3.1-32b-think | OLMo 3.1 32B Think | AllenAI | No | |
| llama-guard-3-8b | Llama Guard 3 8B | Meta | No | |
| gpt-4o-2024-08-06 | GPT-4o (2024-08-06) | OpenAI | Yes | |
| gpt-audio | GPT Audio | OpenAI | No | |
| openrouter-free | OpenRouter Free | OpenRouter | Yes | Yes |
| openrouter-auto | OpenRouter Auto | OpenRouter | Yes | |
Orchestration Modes
Chat (1 credit)

Send a prompt to one model with OpenAI-style messages and streaming SSE.

```json
{
  "model": "auto",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true
}
```
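A minimal Python sketch of this call is below. Note that the `/chat` route is an assumption: this page specifies the request schema and the SSE chunk format, but does not list endpoint paths.

```python
# Minimal sketch of a streaming Chat request. The POST path "/chat"
# is hypothetical -- this page does not document route names.
import json
import requests

API_BASE = "https://llmwise.ai/api/v1"
API_KEY = "mm_sk_..."  # your LLMWise API key

resp = requests.post(
    f"{API_BASE}/chat",  # hypothetical route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # keep the connection open so SSE chunks arrive incrementally
)

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank SSE separator lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # documented end-of-stream sentinel
    chunk = json.loads(payload)
    print(chunk["delta"], end="", flush=True)
```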
Compare (3 credits)

Same prompt hits 2-9 models simultaneously. See which performs best.

```json
{
"models": [
"gpt-5.2",
"claude-sonnet-4.5",
"gemini-3-flash"
],
"messages": [
{
"role": "user",
"content": "Explain quantum computing"
}
],
"stream": true
}
```
Blend (4 credits)

Models answer; a synthesizer combines the strongest parts.

```json
{
"models": [
"gpt-5.2",
"claude-sonnet-4.5",
"gemini-3-flash"
],
"synthesizer": "claude-sonnet-4.5",
"strategy": "consensus",
"messages": [
{
"role": "user",
"content": "Write a haiku about AI"
}
]
}
```
Judge (5 credits)

Models compete. A judge scores, ranks, and explains.

```json
{
"contestants": [
"gpt-5.2",
"claude-sonnet-4.5"
],
"judge": "gemini-3-flash",
"messages": [
{
"role": "user",
"content": "Explain recursion"
}
]
}
```
Failover (1 credit)

Chat with an automatic fallback chain on 429/500/timeout errors. Still costs just 1 credit.

```json
{
"model": "gpt-5.2",
"routing": {
"strategy": "rate-limit",
"fallback": [
"claude-sonnet-4.5",
"gemini-3-flash"
]
},
"stream": true
}
```

Authentication
```
Authorization: Bearer <token>
```
Bring Your Own Key — add your own provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Mistral, xAI, DeepSeek. Cost: 0 credits (usage is billed by your provider directly).
Blend Strategies (6)
| Strategy | Models | Description |
|---|---|---|
| consensus | 2-6 | Default strategy. Synthesizer combines strongest points from all responses and resolves contradictions by weighing majority view. |
| council | 2-6 | Structured deliberation. Synthesizer produces: final answer, agreement points, disagreement points, and follow-up questions. |
| best_of | 2-6 | Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting. |
| chain | 2-6 | Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally. |
| moa | 2-6 | Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see previous layer's answers as references and refine. Final synthesis of last layer. Reference budget: 12,000 chars total, 3,200 per answer. |
| self_moa | 1 (exactly) | Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism. |
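To make the self_moa temperature fan-out concrete, here is a small sketch. The clamp range [0.2, 1.4] is taken from the table above; the offset spacing is an illustrative assumption, since the actual offsets are not published.

```python
# Illustrative sketch of self_moa temperature variation.
# The clamp to [0.2, 1.4] comes from the table; the 0.15 offset
# spacing is an assumption, not LLMWise's actual value.

def candidate_temperatures(base: float, samples: int) -> list[float]:
    """Fan a base temperature out into `samples` diverse values."""
    offsets = [0.15 * (i - (samples - 1) / 2) for i in range(samples)]
    return [min(1.4, max(0.2, base + off)) for off in offsets]

print(candidate_temperatures(0.7, 4))  # [0.475, 0.625, 0.775, 0.925]
```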
API Endpoints — Request Schemas
Chat

Single-model chat with OpenAI-style messages and streaming SSE.
- model: string (required) — model ID or 'auto'
- messages: array (required) — [{role, content}]. Roles: system, user, assistant
- stream: boolean (default: true) — enable SSE streaming
- temperature: number (0-2, default: 0.7)
- max_tokens: number (optional) — max response tokens
- cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
- optimization_goal: string (optional) — balanced|latency|cost|reliability
- semantic_memory: boolean (optional) — semantic recall toggle
- semantic_top_k: number (optional) — 1..12
- semantic_min_score: number (optional) — 0..1
- conversation_id: string (optional) — for conversation threading

Mesh (Failover)

Mesh mode — automatic failover across a model chain with circuit breakers.
- model: string (required) — primary model ID
- routing: object (required) — {strategy: string, fallback: string[]}
- messages: array (required) — [{role, content}]
- stream: boolean (default: true)

Compare

Run 2-9 models concurrently, stream responses side-by-side.
- models: string[] (required, 2-9) — model IDs to compare
- messages: array (required) — [{role, content}]
- stream: boolean (default: true)
- temperature: number (optional)
- max_tokens: number (optional)

Blend

Multi-model synthesis — gather responses, then synthesize them into one answer.
- models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
- synthesizer: string (required) — model ID for synthesis step
- strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
- messages: array (required) — [{role, content}]
- layers: number (1-3, MoA only) — refinement layers
- samples: number (2-8, Self-MoA only, default: 4) — candidate count
- temperature: number (optional)

Judge

Competitive evaluation — contestants answer, a judge scores and ranks them.
- contestants: string[] (required, 2-4) — model IDs to compete
- judge: string (required) — model ID for judging (runs at temperature 0.3)
- messages: array (required) — [{role, content}]
- criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality
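As a worked example, the sketch below assembles a Judge request from this schema. The `/judge` route name is an assumption, since this page documents schemas but not paths.

```python
# Hedged sketch: constructing a Judge request per the schema above.
# The endpoint path "/judge" is assumed, not documented on this page.
import requests

API_BASE = "https://llmwise.ai/api/v1"

judge_request = {
    "contestants": ["gpt-5.2", "claude-sonnet-4.5"],  # 2-4 model IDs
    "judge": "gemini-3-flash",                        # judges at temperature 0.3
    "messages": [{"role": "user", "content": "Explain recursion"}],
    "criteria": ["accuracy", "clarity", "code quality"],  # optional override
}

resp = requests.post(
    f"{API_BASE}/judge",  # hypothetical route
    headers={"Authorization": "Bearer mm_sk_..."},
    json=judge_request,
)
```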
Error Codes

| Code | Name | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors. |
| 401 | Unauthorized | Missing Authorization header, invalid API key, invalid or expired JWT token. |
| 402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}. |
| 429 | Too Many Requests | Rate limit exceeded. Check Retry-After header. Applies per-user and per-IP. |
| 502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to next model. |
| 503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting. |
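A hedged sketch of client-side handling for the two most actionable codes, using the documented 402 body shape and the 429 Retry-After header:

```python
# Sketch of handling 402 and 429 responses per the table above.
import time
import requests

def post_with_retry(url: str, body: dict, headers: dict, retries: int = 3):
    for _ in range(retries):
        resp = requests.post(url, json=body, headers=headers)
        if resp.status_code == 402:
            info = resp.json()  # documented shape: {error, credits, required}
            raise RuntimeError(
                f"Insufficient credits: have {info['credits']}, need {info['required']}"
            )
        if resp.status_code == 429:
            # Honor the documented Retry-After header before retrying.
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        return resp
    raise RuntimeError("still rate limited after retries")
```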
Rate Limits
| Endpoint | Base limit (requests per 60s) |
|---|---|
| chat | 90 |
| compare | 45 |
| blend | 30 |
| judge | 30 |
| uploads | 30 |
| copilot | 30 |
| default | 180 |
Circuit Breaker & Auto-Router
Circuit Breaker (Mesh Failover)
Per-model health tracking drives automatic failover in Mesh mode.
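The breaker's internals are not published; the sketch below only illustrates the standard closed/open/half-open pattern such a failover layer typically uses, with invented thresholds.

```python
# Illustrative circuit breaker sketch -- NOT LLMWise's actual implementation.
# A per-model breaker opens after repeated failures and skips that model
# until a cooldown elapses, letting the mesh fall through to the next model.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Half-open: allow one probe; a single failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open the circuit
```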
Auto-Router (model="auto")
Zero-latency regex-based query classification when model='auto'. No LLM call overhead.
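The actual classification patterns are likewise unpublished. This sketch only shows the general shape of zero-overhead regex routing; the rules and the category-to-model mapping are invented for illustration.

```python
# Illustrative regex router sketch -- the real patterns and model
# mapping are not documented; these rules are made up for illustration.
import re

ROUTES = [
    (re.compile(r"```|def |class |function ", re.I), "qwen3-coder-next"),
    (re.compile(r"\b(prove|integral|derivative|equation)\b", re.I), "gpt-5.2"),
    (re.compile(r"\b(summari[sz]e|tl;?dr)\b", re.I), "gemini-3-flash"),
]
DEFAULT = "claude-sonnet-4.5"

def route(prompt: str) -> str:
    """Pick a model with pure regex matching -- no LLM call needed."""
    for pattern, model in ROUTES:
        if pattern.search(prompt):
            return model
    return DEFAULT

print(route("def fib(n): ..."))  # -> qwen3-coder-next
```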
Streaming Protocol (SSE)
All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a `data:` line; the stream terminates with a `[DONE]` sentinel.

```
data: {"model": "gpt-5.2", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.2", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
```
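Because every chunk carries its model ID, a client can demultiplex a multi-model stream (for example, from Compare) into per-model transcripts. A minimal sketch, assuming lines arrive exactly as shown above:

```python
# Sketch: demultiplexing SSE chunks by the "model" field, so one
# Compare stream can be accumulated into per-model transcripts.
import json
from collections import defaultdict

def demux(sse_lines) -> dict[str, str]:
    """Accumulate delta text per model from an iterable of SSE lines."""
    transcripts: dict[str, str] = defaultdict(str)
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        transcripts[chunk["model"]] += chunk["delta"]
    return dict(transcripts)

sample = [
    'data: {"model": "gpt-5.2", "delta": "Hi", "done": false, "latency_ms": 12}',
    'data: {"model": "claude-sonnet-4.5", "delta": "Hello", "done": false, "latency_ms": 15}',
    'data: [DONE]',
]
print(demux(sample))  # {'gpt-5.2': 'Hi', 'claude-sonnet-4.5': 'Hello'}
```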
Pricing
| Plan | Details |
|---|---|
| Free Trial | 40 credits, 7-day expiry, no credit card |
| Pay-per-use | Add credits anytime, paid credits never expire |
| Auto Top-up | Automatic refill with monthly safety cap |
| Enterprise | Custom limits, team billing, SLAs |