LLMWise — AI-Readable Overview

This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.

Platform Identity

name: LLMWise
tagline: Multi-model LLM API orchestration platform
url: https://llmwise.ai
apiBase: https://llmwise.ai/api/v1
auth: Bearer mm_sk_... (API key) or Clerk JWT
streaming: Server-Sent Events (SSE)
compatibility: OpenAI-style messages format (role + content)

Models Catalog

31 models across 16 providers, plus the special model ID "auto" for smart routing.

ID | Name | Provider | Vision | Free
gpt-5.2 | GPT-5.2 | OpenAI | Yes | No
claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | Yes | No
gemini-3-flash | Gemini 3 Flash | Google | Yes | No
claude-haiku-4.5 | Claude Haiku 4.5 | Anthropic | No | No
deepseek-v3 | DeepSeek V3 | DeepSeek | No | No
llama-4-maverick | Llama 4 Maverick | Meta | No | No
mistral-large | Mistral Large | Mistral | No | No
grok-3 | Grok 3 | xAI | Yes | No
zai-glm-5 | GLM 5 | Z.ai | No | No
liquid-lfm-2.2-6b | LFM2 2.6B | LiquidAI | No | No
liquid-lfm-2.5-1.2b-thinking-free | LFM2.5 1.2B Thinking | LiquidAI | No | Yes
liquid-lfm2-8b-a1b | LFM2 8B A1B | LiquidAI | No | No
minimax-m2.5 | MiniMax M2.5 | MiniMax | No | No
llama-3.3-70b-instruct | Llama 3.3 70B Instruct | Meta | No | No
gpt-oss-20b | GPT OSS 20B | OpenAI | No | No
gpt-oss-120b | GPT OSS 120B | OpenAI | No | No
gpt-oss-safeguard-20b | GPT OSS Safeguard 20B | OpenAI | No | No
kimi-k2.5 | Kimi K2.5 | MoonshotAI | Yes | No
nemotron-3-nano-30b-a3b | Nemotron 3 Nano 30B | NVIDIA | No | No
nemotron-nano-12b-v2-vl | Nemotron Nano 12B VL | NVIDIA | Yes | No
claude-opus-4.6 | Claude Opus 4.6 | Anthropic | Yes | No
claude-opus-4.5 | Claude Opus 4.5 | Anthropic | Yes | No
arcee-coder-large | Arcee Coder Large | Arcee AI | No | No
arcee-trinity-large-preview-free | Arcee Trinity Large (Free) | Arcee AI | No | Yes
qwen3-coder-next | Qwen3 Coder Next | Qwen | No | No
olmo-3.1-32b-think | OLMo 3.1 32B Think | AllenAI | No | No
llama-guard-3-8b | Llama Guard 3 8B | Meta | No | No
gpt-4o-2024-08-06 | GPT-4o (2024-08-06) | OpenAI | Yes | No
gpt-audio | GPT Audio | OpenAI | No | No
openrouter-free | OpenRouter Free | OpenRouter | Yes | Yes
openrouter-auto | OpenRouter Auto | OpenRouter | Yes | No

Orchestration Modes

Chat

1 credit

Send a prompt to one model with OpenAI-style messages and streaming SSE.

POST /api/v1/chat
Example request
{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}
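A minimal request builder putting the auth header and chat schema together. The endpoint, header, and payload shape come from this page; the helper name and the key value are illustrative only:

```python
import json

API_BASE = "https://llmwise.ai/api/v1"

def build_chat_request(api_key, prompt, model="auto", stream=True):
    """Return (url, headers, body) for POST /api/v1/chat."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    })
    return f"{API_BASE}/chat", headers, body

# Placeholder key; a real key is mm_sk_ plus 64 hex characters.
url, headers, body = build_chat_request("mm_sk_" + "0" * 64, "Hello!")
```

Send the resulting request with any HTTP client; with stream=true the response arrives as SSE chunks (see Streaming Protocol below).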

Compare

3 credits

Same prompt hits 2-9 models simultaneously. See which performs best.

POST /api/v1/compare
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3-flash"
  ],
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "stream": true
}
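Since compare accepts between 2 and 9 models, a client can validate before spending credits. A sketch (function name is illustrative, not part of the API):

```python
def build_compare_payload(models, prompt, stream=True):
    """Build a POST /api/v1/compare body, enforcing the 2-9 model range."""
    if not 2 <= len(models) <= 9:
        raise ValueError("compare requires 2-9 models")
    return {
        "models": list(models),
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
```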

Blend

4 credits

Models answer; a synthesizer combines the strongest parts.

POST /api/v1/blend
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3-flash"
  ],
  "synthesizer": "claude-sonnet-4.5",
  "strategy": "consensus",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about AI"
    }
  ]
}

Judge

5 credits

Models compete. A judge scores, ranks, and explains.

POST /api/v1/judge
Example request
{
  "contestants": [
    "gpt-5.2",
    "claude-sonnet-4.5"
  ],
  "judge": "gemini-3-flash",
  "messages": [
    {
      "role": "user",
      "content": "Explain recursion"
    }
  ]
}

Failover

1 credit

Chat with automatic fallback chain on 429/500/timeout. Same 1 credit.

POST /api/v1/chat (with routing)
Example request
{
  "model": "gpt-5.2",
  "routing": {
    "strategy": "rate-limit",
    "fallback": [
      "claude-sonnet-4.5",
      "gemini-3-flash"
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Authentication

Authorization: Bearer <token>

API Key | mm_sk_ followed by 64 hex characters | Obtain: Dashboard → API Keys → Generate
Clerk JWT | RS256-signed JWT from a Clerk session | Obtain: automatic via Clerk session (web app)
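The stated key format (mm_sk_ plus 64 hex characters) can be sanity-checked client-side before a request is sent. A sketch, assuming hex of either case is accepted:

```python
import re

# Mirrors the documented format: "mm_sk_" followed by 64 hex characters.
# Case-insensitivity of the hex part is an assumption.
API_KEY_RE = re.compile(r"^mm_sk_[0-9a-fA-F]{64}$")

def looks_like_api_key(token: str) -> bool:
    """Cheap local format check; the server remains the authority."""
    return bool(API_KEY_RE.fullmatch(token))
```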
BYOK

Bring Your Own Key — add your provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Mistral, xAI, DeepSeek. Cost: 0 credits (billed to your provider directly).

Blend Strategies (6)

consensus (2-6 models): Default strategy. Synthesizer combines the strongest points from all responses and resolves contradictions by weighing the majority view.
council (2-6 models): Structured deliberation. Synthesizer produces a final answer, agreement points, disagreement points, and follow-up questions.
best_of (2-6 models): Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting.
chain (2-6 models): Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally.
moa (2-6 models): Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see the previous layer's answers as references and refine. Final synthesis of the last layer. Reference budget: 12,000 chars total, 3,200 per answer.
self_moa (exactly 1 model): Single model generates 2-8 diverse candidates via temperature variation and agent-prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism.
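The self_moa temperature spread can be illustrated as follows. Only the clamp range [0.2, 1.4] and the 2-8 candidate count come from this page; the alternating offset schedule and step size are assumptions:

```python
def candidate_temperatures(base, samples=4, step=0.15):
    """Spread `samples` temperatures around `base`, clamped to [0.2, 1.4].

    Offset schedule (assumed, for illustration): +0, -step, +step, -2*step, ...
    """
    temps = []
    for i in range(samples):
        offset = step * ((i + 1) // 2) * (1 if i % 2 == 0 else -1)
        temps.append(min(1.4, max(0.2, base + offset)))
    return temps
```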

API Endpoints — Request Schemas

POST /api/v1/chat (1 credit)

Single-model chat with OpenAI-style messages and streaming SSE.

Request parameters
model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
max_tokens: number (optional) — max response tokens
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
POST /api/v1/chat (with routing, 1 credit)

Mesh mode — automatic failover across model chain with circuit breakers.

Request parameters
model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)
POST /api/v1/compare (3 credits)

Run 2-9 models concurrently, stream responses side-by-side.

Request parameters
models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)
max_tokens: number (optional)
POST /api/v1/blend (4 credits)

Multi-model synthesis — gather responses then synthesize into one answer.

Request parameters
models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)
POST /api/v1/judge (5 credits)

Competitive evaluation — contestants answer, judge scores and ranks.

Request parameters
contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality

Error Codes

Code | Name | Description
400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors.
401 | Unauthorized | Missing Authorization header, invalid API key, or invalid/expired JWT.
402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}.
429 | Too Many Requests | Rate limit exceeded. Check the Retry-After header. Applies per-user and per-IP.
502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to the next model.
503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting.
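A client can map these codes to actions mechanically. The sketch below works over plain status/headers/body values so it stays transport-agnostic; the action names are illustrative, and only the 402 body shape and Retry-After semantics come from the table above:

```python
def classify_error(status, headers=None, body=None):
    """Map an LLMWise error response to a (action, detail) pair."""
    headers = headers or {}
    body = body or {}
    if status == 402:
        # 402 body carries {error, credits, required}; detail = shortfall.
        return ("buy_credits", body.get("required", 0) - body.get("credits", 0))
    if status == 429:
        # Honor Retry-After; detail = seconds to wait (assumed default 1s).
        return ("retry", float(headers.get("Retry-After", 1)))
    if status in (502, 503):
        return ("retry", 1.0)  # transient upstream/internal error
    return ("fail", None)      # 400/401 etc.: fix the request, don't retry
```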

Rate Limits

Window: 60 seconds (sliding)
Paid: 1.5x base (e.g. chat: 135/min)
Free: 0.6x base (e.g. chat: 54/min)

Endpoint | Base limit per 60s
chat | 90
compare | 45
blend | 30
judge | 30
uploads | 30
copilot | 30
default | 180

Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
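The tier arithmetic above (base x 1.5 paid, x 0.6 free) in a small lookup:

```python
# Base per-60s limits from the table above.
BASE_LIMITS = {
    "chat": 90, "compare": 45, "blend": 30, "judge": 30,
    "uploads": 30, "copilot": 30, "default": 180,
}

def effective_limit(endpoint, tier="free"):
    """Requests allowed per 60s window for a given endpoint and tier."""
    multiplier = {"paid": 1.5, "free": 0.6}[tier]
    return round(BASE_LIMITS.get(endpoint, BASE_LIMITS["default"]) * multiplier)
```

For example, chat is 135/min for paid users and 54/min for free users, matching the figures quoted above.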

Circuit Breaker & Auto-Router

Circuit Breaker (Mesh Failover)

Per-model health tracking for automatic failover in Mesh mode

Threshold: 3 consecutive failures. Open duration: 30 seconds.

State | Condition | Effect
Healthy | consecutive_failures < 3 | Model available for requests
Open | 3+ consecutive failures | Model skipped for 30 seconds
Half-Open | 30s elapsed since circuit opened | One probe request allowed
Recovered | Probe succeeds | Reset to Healthy, consecutive_failures = 0
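The state machine above can be sketched in a few lines. The thresholds come from this page; the class itself is a client-side illustration, not LLMWise's implementation:

```python
import time

class CircuitBreaker:
    """Per-model breaker: 3 consecutive failures open it for 30 s."""
    THRESHOLD = 3
    OPEN_SECONDS = 30.0

    def __init__(self, clock=time.monotonic):
        self.failures = 0
        self.opened_at = None  # None => circuit closed
        self.clock = clock     # injectable for testing

    def allow_request(self):
        if self.opened_at is None:
            return True        # Healthy
        if self.clock() - self.opened_at >= self.OPEN_SECONDS:
            return True        # Half-Open: allow one probe
        return False           # Open: skip this model, try the fallback

    def record_success(self):
        self.failures = 0      # Recovered: reset to Healthy
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.THRESHOLD:
            self.opened_at = self.clock()
```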

Auto-Router (model="auto")

Zero-latency regex-based query classification when model='auto'. No LLM call overhead.

Query type | Routed model
Code | gpt-5.2
Math | claude-sonnet-4.5
Creative | claude-sonnet-4.5
Translation | gemini-3-flash
Quick fact | gemini-3-flash
Analysis | gpt-5.2
Vision | gpt-5.2
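A hypothetical reconstruction of this router: the category-to-model mapping follows the table above, but the regex patterns themselves are illustrative guesses, not LLMWise's actual classification rules:

```python
import re

# First matching pattern wins; patterns are assumptions for illustration.
ROUTES = [
    (re.compile(r"\b(code|function|debug|regex|script)\b", re.I), "gpt-5.2"),
    (re.compile(r"\b(solve|equation|integral|prove)\b", re.I), "claude-sonnet-4.5"),
    (re.compile(r"\b(poem|story|haiku|lyrics)\b", re.I), "claude-sonnet-4.5"),
    (re.compile(r"\b(translate|translation)\b", re.I), "gemini-3-flash"),
]

def route(prompt, default="gpt-5.2"):
    """Pick a model for a prompt with plain regex matching (no LLM call)."""
    for pattern, model in ROUTES:
        if pattern.search(prompt):
            return model
    return default
```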

Streaming Protocol (SSE)

All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.

data: {"model": "gpt-5.2", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.2", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
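The framing above can be consumed line by line: each `data:` line carries one JSON chunk until the literal `[DONE]` sentinel. A minimal parser:

```python
import json

def parse_sse_line(line):
    """Return the chunk dict for a data line, or None for blanks/[DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end of stream
    return json.loads(payload)
```

Accumulating the `delta` fields of chunks until one arrives with `done: true` yields the full response text.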


Pricing

Plan | Details
Free Trial | 40 credits, 7-day expiry, no credit card required
Pay-per-use | Add credits anytime; paid credits never expire
Auto Top-up | Automatic refill with a monthly safety cap
Enterprise | Custom limits, team billing, SLAs

Credits per mode

Chat: 1 credit | Compare: 3 credits | Blend: 4 credits | Judge: 5 credits | Failover: 1 credit
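The per-mode costs make budgeting a session straightforward; a small helper (name is illustrative):

```python
# Credit cost per orchestration mode, from the table above.
CREDITS = {"chat": 1, "compare": 3, "blend": 4, "judge": 5, "failover": 1}

def total_credits(calls):
    """Sum credit cost for an iterable of mode names, e.g. ["chat", "judge"]."""
    return sum(CREDITS[mode] for mode in calls)
```

For example, one judge run plus two chats costs 5 + 1 + 1 = 7 credits.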