LLMWise — AI-Readable Overview

This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.

Platform Identity

name: LLMWise
tagline: Multi-model LLM API orchestration platform
url: https://llmwise.ai
apiBase: https://llmwise.ai/api/v1
auth: Bearer mm_sk_... (API key) or Clerk JWT
streaming: Server-Sent Events (SSE)
compatibility: OpenAI-style messages format (role + content)

Models Catalog

31 models across 14 providers. Pass model: "auto" for smart routing.

| ID | Name | Provider | Vision | Tier |
|---|---|---|---|---|
| gpt-5.2 | GPT-5.2 | OpenAI | Yes | pro |
| gpt-5.2-codex | GPT-5.2 Codex | OpenAI | No | pro |
| claude-sonnet-4.5 | Claude Sonnet 4.5 | Anthropic | Yes | pro |
| claude-sonnet-4.6 | Claude Sonnet 4.6 | Anthropic | Yes | pro |
| gemini-3.1-pro-preview | Gemini 3.1 Pro Preview | Google | Yes | pro |
| zai-glm-5 | GLM 5 | Z.ai | No | pro |
| claude-opus-4.5 | Claude Opus 4.5 | Anthropic | Yes | pro |
| claude-opus-4.6 | Claude Opus 4.6 | Anthropic | Yes | pro |
| gpt-5.3-chat | GPT-5.3 Chat | OpenAI | Yes | pro |
| gemini-3.1-flash-lite | Gemini 3.1 Flash Lite | Google | Yes | pro |
| qwen3.5-27b | Qwen 3.5 27B | Qwen | Yes | pro |
| qwen3.5-35b-a3b | Qwen 3.5 35B A3B | Qwen | Yes | pro |
| qwen3.5-122b-a10b | Qwen 3.5 122B A10B | Qwen | Yes | pro |
| grok-3 | Grok 3 | xAI | No | pro |
| grok-3-mini | Grok 3 Mini | xAI | No | pro |
| grok-code-fast-1 | Grok Code Fast 1 | xAI | No | pro |
| deepseek-chat | DeepSeek Chat | DeepSeek | No | pro |
| deepseek-r1 | DeepSeek R1 | DeepSeek | No | pro |
| qwen3-coder-next | Qwen3 Coder Next | Qwen | No | pro |
| codestral-2508 | Codestral 2508 | Mistral | No | pro |
| lfm-2-24b-a2b | LFM-2 24B A2B | Liquid | No | pro |
| gemini-3.1-pro-customtools | Gemini 3.1 Pro CustomTools | Google | Yes | pro |
| llama-70b-groq | Llama 3.3 70B (Groq) | Groq | No | turbo |
| llama-8b-groq | Llama 3.1 8B (Groq) | Groq | No | turbo |
| llama-8b-cerebras | Llama 3.1 8B (Cerebras) | Cerebras | No | turbo |
| llama-70b-cerebras | Llama 3.1 70B (Cerebras) | Cerebras | No | turbo |
| arcee-trinity-large-preview-free | Arcee Trinity Large | Arcee AI | No | free |
| nvidia-nemotron-3-nano-30b-a3b-free | Nemotron 3 Nano 30B | NVIDIA | No | free |
| gpt-oss-120b-free | GPT OSS 120B | OpenAI | No | free |
| llama-3.3-70b-instruct-free | Llama 3.3 70B Instruct | Meta | No | free |
| gpt-oss-20b-free | GPT OSS 20B | OpenAI | No | free |

Orchestration Modes

Chat

1 credit

Send a prompt to one model with OpenAI-style messages and streaming SSE.

POST /api/v1/chat
Example request
{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Compare

2 credits

Same prompt hits 2-9 models simultaneously. See which performs best.

POST /api/v1/compare
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "stream": true
}

Blend

4 credits

Models answer; a synthesizer model combines the strongest parts into one response.

POST /api/v1/blend
Example request
{
  "models": [
    "gpt-5.2",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "synthesizer": "claude-sonnet-4.5",
  "strategy": "consensus",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about AI"
    }
  ]
}

Judge

5 credits

Models compete. A judge scores, ranks, and explains.

POST /api/v1/judge
Example request
{
  "contestants": [
    "gpt-5.2",
    "claude-sonnet-4.5"
  ],
  "judge": "gemini-3.1-pro-preview",
  "messages": [
    {
      "role": "user",
      "content": "Explain recursion"
    }
  ]
}

Failover

1 credit

Chat with automatic fallback chain on 429/500/timeout. Same 1 credit.

POST /api/v1/chat (with routing)
Example request
{
  "model": "gpt-5.2",
  "routing": {
    "strategy": "rate-limit",
    "fallback": [
      "claude-sonnet-4.5",
      "gemini-3.1-pro-preview"
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Authentication

Authorization: Bearer <token>

| Method | Format | Obtain |
|---|---|---|
| API Key | mm_sk_ followed by 64 hex characters | Dashboard → API Keys → Generate |
| Clerk JWT | RS256-signed JWT from Clerk session | Automatic via Clerk session (web app) |
BYOK

Bring Your Own Key — add your provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Groq, Cerebras. Cost: 0 credits (billed to your provider directly).

Blend Strategies (6)

| Strategy | Models | Description |
|---|---|---|
| consensus | 2-6 | Default strategy. Synthesizer combines the strongest points from all responses and resolves contradictions by weighing the majority view. |
| council | 2-6 | Structured deliberation. Synthesizer produces a final answer, agreement points, disagreement points, and follow-up questions. |
| best_of | 2-6 | Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting. |
| chain | 2-6 | Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally. |
| moa | 2-6 | Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see the previous layer's answers as references and refine. Final synthesis of the last layer. Reference budget: 12,000 chars total, 3,200 per answer. |
| self_moa | 1 (exactly) | Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism. |
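The self_moa temperature rule ("base +/- offsets clamped to [0.2, 1.4]") can be sketched as follows. The exact offset schedule is not documented, so the offsets below are purely illustrative:

```python
def candidate_temperatures(base: float, samples: int) -> list[float]:
    """Spread `samples` candidate temperatures around `base`.

    The docs specify only "base +/- offsets clamped to [0.2, 1.4]";
    this offset ladder is an assumed example, not the real schedule.
    """
    offsets = [0.0, 0.3, -0.3, 0.6, -0.6, 0.9, -0.9, 1.2][:samples]
    # Clamp every candidate into the documented [0.2, 1.4] range.
    return [min(1.4, max(0.2, base + o)) for o in offsets]
```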

API Endpoints — Request Schemas

POST /api/v1/chat · 1 credit

Single-model chat with OpenAI-style messages and streaming SSE.

Request parameters
model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
POST /api/v1/chat (with routing) · 1 credit

Mesh mode — automatic failover across model chain with circuit breakers.

Request parameters
model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)
POST /api/v1/compare · 2 credits

Run 2-9 models concurrently, stream responses side-by-side.

Request parameters
models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)
POST /api/v1/blend · 4 credits

Multi-model synthesis — gather responses then synthesize into one answer.

Request parameters
models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)
POST /api/v1/judge · 5 credits

Competitive evaluation — contestants answer, judge scores and ranks.

Request parameters
contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality

Error Codes

| Code | Name | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body, unknown model ID, invalid conversation_id format, or validation errors. |
| 401 | Unauthorized | Missing Authorization header, invalid API key, invalid or expired JWT token. |
| 402 | Payment Required | Insufficient credits. Response includes {error, credits: current_balance, required: cost}. |
| 429 | Too Many Requests | Rate limit exceeded. Check the Retry-After header. Applies per-user and per-IP. |
| 502 | Bad Gateway | Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to the next model. |
| 503 | Service Unavailable | Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting. |
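A client can sort these codes into "retry" vs. "fix the request" buckets. A sketch under the error table above (the function name and return strings are illustrative):

```python
import json

def classify_error(status: int, body: str) -> str:
    """Map an LLMWise error response to a suggested client action."""
    if status in (429, 502, 503):
        return "retry"            # transient: back off and try again
    if status == 402:
        info = json.loads(body)   # documented shape: {error, credits, required}
        return f"top_up: have {info['credits']}, need {info['required']}"
    return "fail"                 # 400/401: fix the request or credentials
```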

Rate Limits

Window: 60 seconds (sliding window)
Paid: 1.5x base (e.g. chat: 135/min)
Starter: 0.6x base (e.g. chat: 54/min)

| Endpoint | Base limit / 60s |
|---|---|
| chat | 90 |
| compare | 45 |
| blend | 30 |
| judge | 30 |
| uploads | 30 |
| copilot | 30 |
| default | 180 |

Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
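When a 429 arrives, honor Retry-After if the server sent it, and fall back to exponential backoff otherwise. A sketch (helper name and base delay are illustrative):

```python
def retry_delay(headers: dict, attempt: int, base: float = 1.0) -> float:
    """Seconds to wait before retrying attempt number `attempt` (0-based)."""
    ra = headers.get("Retry-After")
    if ra is not None:
        return float(ra)          # server-specified delay wins
    return base * (2 ** attempt)  # otherwise: 1s, 2s, 4s, 8s, ...
```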

Circuit Breaker & Auto-Router

Circuit Breaker (Mesh Failover)

Per-model health tracking for automatic failover in Mesh mode

Threshold: 3 consecutive failures · Open duration: 30 seconds

| State | Condition | Behavior |
|---|---|---|
| Healthy | consecutive_failures < 3 | Model available for requests |
| Open | 3+ consecutive failures | Model skipped for 30 seconds |
| Half-Open | 30s elapsed since circuit opened | One probe request allowed |
| Recovered | Probe succeeds | Reset to Healthy, consecutive_failures = 0 |
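The state machine above can be sketched as a small class; this is a client-side illustration of the documented behavior, not LLMWise's internal implementation:

```python
import time

class CircuitBreaker:
    """Per-model breaker: 3 failures open the circuit for 30 seconds."""
    THRESHOLD = 3      # consecutive failures before opening
    OPEN_SECONDS = 30  # how long an open circuit skips the model

    def __init__(self, clock=time.monotonic):
        self.clock = clock         # injectable clock, for testing
        self.failures = 0
        self.opened_at = None      # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                   # Healthy
        if self.clock() - self.opened_at >= self.OPEN_SECONDS:
            return True                                   # Half-Open: probe
        return False                                      # Open: skip model

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0                             # Recovered
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.THRESHOLD:
                self.opened_at = self.clock()             # trip the circuit
```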

Auto-Router (model="auto")

Zero-latency regex-based query classification when model='auto'. No LLM call overhead.

| Query type | Routed model |
|---|---|
| Code | gpt-5.2 |
| Math | claude-sonnet-4.5 |
| Creative | claude-sonnet-4.5 |
| Translation | gemini-3.1-pro-preview |
| Quick fact | gemini-3.1-pro-preview |
| Analysis | gpt-5.2 |
| Vision | gpt-5.2 |

Streaming Protocol (SSE)

All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.

data: {"model": "gpt-5.2", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.2", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
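Given the chunk format above, a consumer only needs to strip the `data: ` prefix, stop on `[DONE]`, and collect non-empty deltas. A minimal parsing sketch over already-decoded lines:

```python
import json

def parse_sse(lines):
    """Yield delta text from LLMWise SSE `data:` lines until [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return                        # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("delta"):            # final chunk has an empty delta
            yield chunk["delta"]
```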


Pricing

| Plan | Details |
|---|---|
| Free Trial | 20 credits, no expiry, no credit card |
| Pay-per-use | Add credits anytime, paid credits never expire |
| Auto Top-up | Automatic refill with monthly safety cap |
| Enterprise | Custom limits, team billing, SLAs |

Credits per mode

Chat: 1cr · Compare: 2cr · Blend: 4cr · Judge: 5cr · Failover: 1cr