LLMWise — AI-Readable Overview

This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.

Platform Identity

name: LLMWise
tagline: Multi-model LLM API orchestration platform
url: https://llmwise.ai
apiBase: https://llmwise.ai/api/v1
auth: Bearer mm_sk_... (API key) or Clerk JWT
streaming: Server-Sent Events (SSE)
compatibility: OpenAI-style messages format (role + content)

Models Catalog

31 models across 14 providers. Pass model: "auto" for smart routing.

| ID | Name | Provider | Vision | Tier |
|---|---|---|---|---|
| gpt-5.2 | GPT-5.2 | OpenAI | Yes | pro |
| gpt-5.2-codex | GPT-5.2 Codex | OpenAI | No | pro |
| claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | Yes | pro |
| claude-sonnet-4.6 | Claude Sonnet 4.6 | Anthropic | Yes | pro |
| gemini-3.1-pro-preview | Gemini 3.1 Pro Preview | Google | Yes | pro |
| zai-glm-5 | GLM 5 | Z.ai | No | pro |
| claude-opus-4.5 | Claude Opus 4.5 | Anthropic | Yes | pro |
| claude-opus-4.6 | Claude Opus 4.6 | Anthropic | Yes | pro |
| gpt-5.3-chat | GPT-5.3 Chat | OpenAI | Yes | pro |
| gemini-3.1-flash-lite | Gemini 3.1 Flash Lite | Google | Yes | pro |
| qwen3.5-27b | Qwen 3.5 27B | Qwen | Yes | pro |
| qwen3.5-35b-a3b | Qwen 3.5 35B A3B | Qwen | Yes | pro |
| qwen3.5-122b-a10b | Qwen 3.5 122B A10B | Qwen | Yes | pro |
| grok-3 | Grok 3 | xAI | No | pro |
| grok-3-mini | Grok 3 Mini | xAI | No | pro |
| grok-code-fast-1 | Grok Code Fast 1 | xAI | No | pro |
| deepseek-chat | DeepSeek Chat | DeepSeek | No | pro |
| deepseek-r1 | DeepSeek R1 | DeepSeek | No | pro |
| qwen3-coder-next | Qwen3 Coder Next | Qwen | No | pro |
| codestral-2508 | Codestral 2508 | Mistral | No | pro |
| lfm-2-24b-a2b | LFM-2 24B A2B | Liquid | No | pro |
| gemini-3.1-pro-customtools | Gemini 3.1 Pro CustomTools | Google | Yes | pro |
| llama-70b-groq | Llama 3.3 70B (Groq) | Groq | No | turbo |
| llama-8b-groq | Llama 3.1 8B (Groq) | Groq | No | turbo |
| llama-8b-cerebras | Llama 3.1 8B (Cerebras) | Cerebras | No | turbo |
| llama-70b-cerebras | Llama 3.1 70B (Cerebras) | Cerebras | No | turbo |
| arcee-trinity-large-preview-free | Arcee Trinity Large | Arcee AI | No | free |
| nvidia-nemotron-3-nano-30b-a3b-free | Nemotron 3 Nano 30B | NVIDIA | No | free |
| gpt-oss-120b-free | GPT OSS 120B | OpenAI | No | free |
| llama-3.3-70b-instruct-free | Llama 3.3 70B Instruct | Meta | No | free |
| gpt-oss-20b-free | GPT OSS 20B | OpenAI | No | free |

Orchestration Modes

Chat

1 credit

Send a prompt to one model with OpenAI-style messages and streaming SSE.

POST /api/v1/chat
Example request
{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Compare

2 credits

Same prompt hits 2-9 models simultaneously. See which performs best.

POST /api/v1/compare
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "stream": true
}

Blend

4 credits

Models answer; a synthesizer model combines the strongest parts into one response.

POST /api/v1/blend
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "synthesizer": "claude-sonnet-4.5",
  "strategy": "consensus",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about AI"
    }
  ]
}

Judge

5 credits

Models compete. A judge scores, ranks, and explains.

POST /api/v1/judge
Example request
{
  "contestants": [
    "gpt-5.2",
    "claude-sonnet-4.5"
  ],
  "judge": "gemini-3.1-pro-preview",
  "messages": [
    {
      "role": "user",
      "content": "Explain recursion"
    }
  ]
}

Failover

1 credit

Chat with automatic fallback chain on 429/500/timeout. Same 1 credit.

POST /api/v1/chat (with routing)
Example request
{
  "model": "gpt-5.2",
  "routing": {
    "strategy": "rate-limit",
    "fallback": [
      "claude-sonnet-4.5",
      "gemini-3.1-pro-preview"
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Authentication

Authorization: Bearer <token>

| Method | Format | Obtain |
|---|---|---|
| API Key | mm_sk_ followed by 64 hex characters | Dashboard → API Keys → Generate |
| Clerk JWT | RS256-signed JWT from Clerk session | Automatic via Clerk session (web app) |
BYOK

Bring Your Own Key — add your provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Groq, Cerebras. Cost: 0 credits (billed to your provider directly).

Blend Strategies (6)

| Strategy | Models | Description |
|---|---|---|
| consensus | 2-6 | Default strategy. Synthesizer combines the strongest points from all responses and resolves contradictions by weighing the majority view. |
| council | 2-6 | Structured deliberation. Synthesizer produces a final answer, agreement points, disagreement points, and follow-up questions. |
| best_of | 2-6 | Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting. |
| chain | 2-6 | Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally. |
| moa | 2-6 | Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see the previous layer's answers as references and refine. Final synthesis of the last layer. Reference budget: 12,000 chars total, 3,200 per answer. |
| self_moa | 1 (exactly) | Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism. |
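The self_moa temperature rule ("base +/- offsets clamped to [0.2, 1.4]") can be sketched as follows. The exact offset schedule is not documented, so the offsets below are purely illustrative:

```python
def candidate_temperatures(base: float, samples: int) -> list[float]:
    """Spread `samples` candidate temperatures around `base`.

    The docs specify only "base +/- offsets clamped to [0.2, 1.4]";
    this offset ladder is an assumed example, not the real schedule.
    """
    offsets = [0.0, 0.3, -0.3, 0.6, -0.6, 0.9, -0.9, 1.2][:samples]
    # Clamp every candidate into the documented [0.2, 1.4] range.
    return [min(1.4, max(0.2, base + o)) for o in offsets]
```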

API Endpoints — Request Schemas

POST /api/v1/chat · 1 credit

Single-model chat with OpenAI-style messages and streaming SSE.

Request parameters
model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
POST /api/v1/chat (with routing) · 1 credit

Mesh mode — automatic failover across model chain with circuit breakers.

Request parameters
model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)
POST /api/v1/compare · 2 credits

Run 2-9 models concurrently, stream responses side-by-side.

Request parameters
models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)
POST /api/v1/blend · 4 credits

Multi-model synthesis — gather responses then synthesize into one answer.

Request parameters
models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)
POST /api/v1/judge · 5 credits

Competitive evaluation — contestants answer, judge scores and ranks.

Request parameters
contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality

Error Codes

| Code | Name | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors. |
| 401 | Unauthorized | Missing Authorization header, invalid API key, invalid or expired JWT token. |
| 402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}. |
| 429 | Too Many Requests | Rate limit exceeded. Check the Retry-After header. Applies per-user and per-IP. |
| 502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to the next model. |
| 503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting. |
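A client can sort these codes into "retry" vs. "fix the request" buckets. A sketch under the error table above (the function name and return strings are illustrative):

```python
import json

def classify_error(status: int, body: str) -> str:
    """Map an LLMWise error response to a suggested client action."""
    if status in (429, 502, 503):
        return "retry"            # transient: back off and try again
    if status == 402:
        info = json.loads(body)   # documented shape: {error, credits, required}
        return f"top_up: have {info['credits']}, need {info['required']}"
    return "fail"                 # 400/401: fix the request or credentials
```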

Rate Limits

Window: 60 seconds (sliding window)
Paid: 1.5x base (e.g. chat: 135/min)
Starter: 0.6x base (e.g. chat: 54/min)

| Endpoint | Base limit / 60s |
|---|---|
| chat | 90 |
| compare | 45 |
| blend | 30 |
| judge | 30 |
| uploads | 30 |
| copilot | 30 |
| default | 180 |

Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
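When a 429 arrives, honor Retry-After if the server sent it, and fall back to exponential backoff otherwise. A sketch (helper name and base delay are illustrative):

```python
def retry_delay(headers: dict, attempt: int, base: float = 1.0) -> float:
    """Seconds to wait before retrying attempt number `attempt` (0-based)."""
    ra = headers.get("Retry-After")
    if ra is not None:
        return float(ra)          # server-specified delay wins
    return base * (2 ** attempt)  # otherwise: 1s, 2s, 4s, 8s, ...
```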

Circuit Breaker & Auto-Router

Circuit Breaker (Mesh Failover)

Per-model health tracking for automatic failover in Mesh mode

Threshold: 3 consecutive failures · Open duration: 30 seconds

| State | Condition | Behavior |
|---|---|---|
| Healthy | consecutive_failures < 3 | Model available for requests |
| Open | 3+ consecutive failures | Model skipped for 30 seconds |
| Half-Open | 30s elapsed since circuit opened | One probe request allowed |
| Recovered | Probe succeeds | Reset to Healthy, consecutive_failures = 0 |
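The state machine above can be sketched as a small class; this is a client-side illustration of the documented behavior, not LLMWise's internal implementation:

```python
import time

class CircuitBreaker:
    """Per-model breaker: 3 failures open the circuit for 30 seconds."""
    THRESHOLD = 3      # consecutive failures before opening
    OPEN_SECONDS = 30  # how long an open circuit skips the model

    def __init__(self, clock=time.monotonic):
        self.clock = clock         # injectable clock, for testing
        self.failures = 0
        self.opened_at = None      # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                   # Healthy
        if self.clock() - self.opened_at >= self.OPEN_SECONDS:
            return True                                   # Half-Open: probe
        return False                                      # Open: skip model

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0                             # Recovered
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.THRESHOLD:
                self.opened_at = self.clock()             # trip the circuit
```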

Auto-Router (model="auto")

Zero-latency regex-based query classification when model='auto'. No LLM call overhead.

| Query type | Routed model |
|---|---|
| Code | gpt-5.2 |
| Math | claude-sonnet-4.5 |
| Creative | claude-sonnet-4.5 |
| Translation | gemini-3.1-pro-preview |
| Quick fact | gemini-3.1-pro-preview |
| Analysis | gpt-5.2 |
| Vision | gpt-5.2 |

Streaming Protocol (SSE)

All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.

data: {"model": "gpt-5.2", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.2", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
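Given the chunk format above, a consumer only needs to strip the `data: ` prefix, stop on `[DONE]`, and collect non-empty deltas. A minimal parsing sketch over already-decoded lines:

```python
import json

def parse_sse(lines):
    """Yield delta text from LLMWise SSE `data:` lines until [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return                        # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("delta"):            # final chunk has an empty delta
            yield chunk["delta"]
```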


Pricing

| Plan | Details |
|---|---|
| Free Trial | 20 credits, no expiry, no credit card |
| Pay-per-use | Add credits anytime, paid credits never expire |
| Auto Top-up | Automatic refill with monthly safety cap |
| Enterprise | Custom limits, team billing, SLAs |

Credits per mode

Chat: 1cr · Compare: 2cr · Blend: 4cr · Judge: 5cr · Failover: 1cr