LLMWise — AI-Readable Overview

This page is optimized for AI agents, crawlers, and bots — including GPTBot, OpenClaw, ClaudeBot, PerplexityBot, and others. It contains structured API schemas, endpoint specifications, parameter types, blend strategies, rate limits, error codes, streaming protocol details, and everything an agent needs to integrate with LLMWise programmatically.

/llms.txt /llms-full.txt

Platform Identity

name: LLMWise
tagline: Auto-first AI routing and orchestration platform
url: https://llmwise.ai
apiBase: https://llmwise.ai/api/v1
auth: Bearer mm_sk_... (API key) or Clerk JWT
streaming: Server-Sent Events (SSE)
compatibility: OpenAI-style messages format (role + content)

Models Catalog

19 models across 9 providers, plus model: "auto" for smart routing.

ID	Name	Provider	Vision	Tier
gemini-3.1-flash-lite	Gemini Flash Lite	Google	Yes	cheap
gemma-4-31b-it	Gemma 4 31B	Google	No	cheap
arcee-trinity-large-thinking	Arcee Thinking	Arcee AI	No	balanced
deepseek-v3.2	DeepSeek V3.2	DeepSeek	No	balanced
nvidia-nemotron-3-super-120b-a12b	Nemotron 120B	NVIDIA	No	strong
gpt-oss-120b	GPT OSS 120B	OpenAI	No	balanced
kimi-k2.5	Kimi K2.5	Moonshot AI	No	balanced
kimi-k2.6	Kimi K2.6	Moonshot AI	Yes	premium
gpt-5.3-chat	GPT-5.3 Chat	OpenAI	No	premium
gpt-4o	GPT-4o	OpenAI	No	premium
gpt-5.4	GPT-5.4	OpenAI	No	premium
claude-sonnet-4.5	Claude Sonnet 4.5	Anthropic	No	premium
claude-sonnet-4.6	Claude Sonnet 4.6	Anthropic	No	premium
claude-opus-4.5	Claude Opus 4.5	Anthropic	No	premium
claude-opus-4.6	Claude Opus 4.6	Anthropic	No	premium
minimax-m2.7	MiniMax M2.7	MiniMax	No	premium
grok-4.20	Grok 4.20	xAI	No	premium
grok-4.20-multi-agent	Grok 4.20 Multi-Agent	xAI	No	premium
gemini-3.1-pro-preview	Gemini 3.1 Pro	Google	No	premium

Orchestration Modes

Chat

actual tokens

Billed from the actual input and output tokens used by the selected model and response length.

POST /api/v1/chat

Example request

{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}
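
A minimal sketch of turning the example above into an authenticated HTTP request, using only the Python standard library. The base URL and Bearer scheme come from the Platform Identity section; the Content-Type and Accept headers and the function name build_chat_request are assumptions, not documented values. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://llmwise.ai/api/v1"  # apiBase from the Platform Identity section


def build_chat_request(api_key: str, prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with Bearer auth."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed headers: the page does not list required content headers.
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body would then be an SSE stream.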

Compare

actual tokens

The same prompt is sent to 2-9 models, and billing reflects the combined input and output tokens actually used.

POST /api/v1/compare

Example request

{
  "models": [
    "gpt-5.4",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "stream": true
}

Blend

actual tokens

Models answer, a synthesizer combines the strongest parts, and billing reflects the actual tokens consumed across the workflow.

POST /api/v1/blend

Example request

{
  "models": [
    "gpt-5.4",
    "claude-sonnet-4.5",
    "gemini-3.1-pro-preview"
  ],
  "synthesizer": "claude-sonnet-4.5",
  "strategy": "consensus",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about AI"
    }
  ]
}

Judge

actual tokens

Contestants and judge usage are billed from the actual tokens consumed after the evaluation finishes.

POST /api/v1/judge

Example request

{
  "contestants": [
    "gpt-5.4",
    "claude-sonnet-4.5"
  ],
  "judge": "gemini-3.1-pro-preview",
  "messages": [
    {
      "role": "user",
      "content": "Explain recursion"
    }
  ]
}

Failover

actual tokens

Billing reflects the actual model path and token usage after failover completes.

POST /api/v1/chat (with routing)

Example request

{
  "model": "gpt-5.4",
  "routing": {
    "strategy": "rate-limit",
    "fallback": [
      "claude-sonnet-4.5",
      "gemini-3.1-pro-preview"
    ]
  },
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "stream": true
}

Authentication

Authorization: Bearer <token>

API Key: mm_sk_ followed by 64 hex characters. Obtain: Dashboard → API Keys → Generate

Clerk JWT: RS256-signed JWT from Clerk session. Obtain: Automatic via Clerk session (web app)
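
A small sketch of a client-side check for the API key format described above, plus the header both token types share. The regex accepts upper- and lowercase hex digits, which is an assumption (the page only says "64 hex characters"); the helper names are hypothetical.

```python
import re

# Assumption: hex digits may be upper- or lowercase.
API_KEY_RE = re.compile(r"mm_sk_[0-9a-fA-F]{64}")


def looks_like_api_key(token: str) -> bool:
    """True if the token matches mm_sk_ followed by exactly 64 hex characters."""
    return API_KEY_RE.fullmatch(token) is not None


def auth_header(token: str) -> dict[str, str]:
    """Authorization header used for both API keys and Clerk JWTs."""
    return {"Authorization": f"Bearer {token}"}
```

A check like this only catches malformed keys early; whether a key is valid is decided by the server (401 on failure).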

BYOK

Bring Your Own Key — add your provider API keys to route directly. Providers: OpenAI, Anthropic, Google, Groq, Cerebras. Cost: 0 credits (billed to your provider directly).

Blend Strategies (6)

Strategy	Models	Description
consensus	2-6	Default strategy. Synthesizer combines strongest points from all responses and resolves contradictions by weighing majority view.
council	2-6	Structured deliberation. Synthesizer produces: final answer, agreement points, disagreement points, and follow-up questions.
best_of	2-6	Synthesizer picks the single best response, then enhances it with useful additions from the others. Minimal rewriting.
chain	2-6	Iterative integration. Synthesizer works through each response sequentially, building a comprehensive answer incrementally.
moa	2-6	Multi-layer refinement inspired by the Mixture-of-Agents paper. Layer 0: independent answers. Layer 1+: models see previous layer's answers as references and refine. Final synthesis of last layer. Reference budget: 12,000 chars total, 3,200 per answer.
self_moa	1 (exactly)	Single model generates 2-8 diverse candidates via temperature variation and agent prompt rotation. Temperatures: base +/- offsets clamped to [0.2, 1.4]. Six agent perspectives: Correctness, Structure, Edge Cases, Examples, Clarity, Skepticism.
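
Two of the numeric rules in the table can be sketched in a few lines: the self_moa temperature clamp to [0.2, 1.4] and the moa reference budget (12,000 chars total, 3,200 per answer). The offsets passed in, the order in which answers consume the budget, and both function names are assumptions; the page does not describe how the service applies these limits internally.

```python
def self_moa_temperatures(base: float, offsets: list[float]) -> list[float]:
    """Apply each offset to the base temperature and clamp to [0.2, 1.4]."""
    return [min(1.4, max(0.2, base + off)) for off in offsets]


def trim_references(answers: list[str], per_answer: int = 3200,
                    total: int = 12000) -> list[str]:
    """Cap each answer at per_answer chars and the whole set at total chars.

    Assumption: answers are consumed in order and later ones are dropped
    once the total budget is spent.
    """
    trimmed: list[str] = []
    used = 0
    for answer in answers:
        room = min(per_answer, total - used)
        if room <= 0:
            break
        piece = answer[:room]
        trimmed.append(piece)
        used += len(piece)
    return trimmed
```

With five 5,000-character answers, this keeps three answers at 3,200 chars, a fourth at 2,400, and drops the fifth.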

API Endpoints — Request Schemas

POST /api/v1/chat

Billing: actual tokens

Single-model chat with OpenAI-style messages and streaming SSE.

Request parameters

model: string (required) — model ID or 'auto'
messages: array (required) — [{role, content}]. Roles: system, user, assistant
stream: boolean (default: true) — enable SSE streaming
temperature: number (0-2, default: 0.7)
cost_saver: boolean (optional) — forces model='auto' and optimization_goal='cost'
optimization_goal: string (optional) — balanced|latency|cost|reliability
semantic_memory: boolean (optional) — semantic recall toggle
semantic_top_k: number (optional) — 1..12
semantic_min_score: number (optional) — 0..1
conversation_id: string (optional) — for conversation threading
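
The documented ranges above can be checked before a request is sent. The following is a sketch of such a payload builder, not the service's own validation; the function name and the decision to omit unset optional fields are assumptions. It mirrors the documented cost_saver behavior locally so the payload stays self-consistent.

```python
VALID_ROLES = {"system", "user", "assistant"}
VALID_GOALS = {"balanced", "latency", "cost", "reliability"}


def build_chat_payload(model, messages, *, stream=True, temperature=0.7,
                       cost_saver=False, optimization_goal=None,
                       semantic_top_k=None, semantic_min_score=None):
    """Validate documented ranges and return a JSON-serializable chat payload."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"invalid role: {message.get('role')!r}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be within 0-2")
    if optimization_goal is not None and optimization_goal not in VALID_GOALS:
        raise ValueError(f"invalid optimization_goal: {optimization_goal!r}")
    if semantic_top_k is not None and not 1 <= semantic_top_k <= 12:
        raise ValueError("semantic_top_k must be within 1..12")
    if semantic_min_score is not None and not 0 <= semantic_min_score <= 1:
        raise ValueError("semantic_min_score must be within 0..1")

    if cost_saver:
        # Documented: cost_saver forces model='auto' and optimization_goal='cost'.
        model, optimization_goal = "auto", "cost"

    payload = {"model": model, "messages": messages, "stream": stream,
               "temperature": temperature}
    optional = {"cost_saver": cost_saver or None,
                "optimization_goal": optimization_goal,
                "semantic_top_k": semantic_top_k,
                "semantic_min_score": semantic_min_score}
    # Assumption: unset optional fields are simply left out of the body.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```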

POST /api/v1/chat (with routing)

Billing: actual tokens

Mesh mode — automatic failover across model chain with circuit breakers.

Request parameters

model: string (required) — primary model ID
routing: object (required) — {strategy: string, fallback: string[]}
messages: array (required) — [{role, content}]
stream: boolean (default: true)

POST /api/v1/compare

Billing: actual tokens

Run 2-9 models concurrently, stream responses side-by-side.

Request parameters

models: string[] (required, 2-9) — model IDs to compare
messages: array (required) — [{role, content}]
stream: boolean (default: true)
temperature: number (optional)

POST /api/v1/blend

Billing: actual tokens

Multi-model synthesis — gather responses then synthesize into one answer.

Request parameters

models: string[] (required, 2-6 for most strategies; exactly 1 for self_moa)
synthesizer: string (required) — model ID for synthesis step
strategy: string (default: 'consensus') — consensus|council|best_of|chain|moa|self_moa
messages: array (required) — [{role, content}]
layers: number (1-3, MoA only) — refinement layers
samples: number (2-8, Self-MoA only, default: 4) — candidate count
temperature: number (optional)
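
The model-count rule depends on the strategy, which is easy to get wrong. A sketch of a pre-flight check for blend payloads follows; it encodes only the constraints listed above, and the function name and error wording are hypothetical.

```python
STRATEGIES = {"consensus", "council", "best_of", "chain", "moa", "self_moa"}


def validate_blend(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    strategy = payload.get("strategy", "consensus")  # documented default
    models = payload.get("models", [])

    if strategy not in STRATEGIES:
        problems.append(f"unknown strategy: {strategy}")
    elif strategy == "self_moa":
        if len(models) != 1:
            problems.append("self_moa requires exactly 1 model")
        if not 2 <= payload.get("samples", 4) <= 8:
            problems.append("samples must be within 2-8")
    elif not 2 <= len(models) <= 6:
        problems.append(f"{strategy} requires 2-6 models")

    if strategy == "moa" and "layers" in payload and not 1 <= payload["layers"] <= 3:
        problems.append("layers must be within 1-3")
    if not payload.get("synthesizer"):
        problems.append("synthesizer is required")
    if not payload.get("messages"):
        problems.append("messages is required")
    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem at once.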

POST /api/v1/judge

Billing: actual tokens

Competitive evaluation — contestants answer, judge scores and ranks.

Request parameters

contestants: string[] (required, 2-4) — model IDs to compete
judge: string (required) — model ID for judging (runs at temperature 0.3)
messages: array (required) — [{role, content}]
criteria: string[] (optional) — custom evaluation criteria. Default: accuracy, completeness, clarity, helpfulness, code quality

Error Codes

Code	Name	Description
400	Bad Request	Invalid request body, unknown model ID, invalid conversation_id format, or validation errors.
401	Unauthorized	Missing Authorization header, invalid API key, invalid or expired JWT token.
402	Payment Required	Insufficient credits. Response includes {error, credits: current_balance, required: cost}.
429	Too Many Requests	Rate limit exceeded. Check Retry-After header. Applies per-user and per-IP.
502	Bad Gateway	Upstream model provider error (timeout, 500, etc.). In mesh mode, triggers failover to next model.
503	Service Unavailable	Internal service unavailable (e.g. rate limiter Redis down). Fail-open: requests may proceed without rate limiting.
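
One way a client might act on these codes is sketched below. Treating 429, 502, and 503 as retryable, the exponential backoff schedule, and reading Retry-After as a number of seconds are all assumptions; the page only says to check the Retry-After header on 429.

```python
RETRYABLE = {429, 502, 503}


def next_action(status: int, headers: dict[str, str], retries_done: int,
                max_retries: int = 3) -> tuple[str, float]:
    """Return ("ok" | "retry" | "fail", delay_seconds) for a response status."""
    if 200 <= status < 300:
        return ("ok", 0.0)
    # 400, 401, and 402 need a changed request, credentials, or balance.
    if status not in RETRYABLE or retries_done >= max_retries:
        return ("fail", 0.0)
    backoff = float(2 ** retries_done)  # 1s, 2s, 4s: hypothetical schedule
    if status == 429:
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            # Assumption: Retry-After carries seconds, not an HTTP date.
            return ("retry", float(lowered["retry-after"]))
        except (KeyError, ValueError):
            return ("retry", backoff)
    return ("retry", backoff)
```

For a 402, the response body's credits and required fields indicate how large the shortfall is.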

Rate Limits

Window: 60 seconds (sliding window)
Paid: 1.5x (e.g. chat: 135/min for paid users)
Starter: 0.6x (e.g. chat: 54/min for free users)

Endpoint	Base limit/60s
chat	90
compare	45
blend	30
judge	30
uploads	30
copilot	30
default	180

Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After
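
The per-tier limits follow from the base table and the multipliers. The sketch below reproduces the two worked figures on this page (135 and 54 for chat). Rounding fractional results down (for example compare at 1.5x) is an assumption, as are the tier keys "paid" and "starter".

```python
import math
from fractions import Fraction
from typing import Optional

BASE_LIMITS = {"chat": 90, "compare": 45, "blend": 30,
               "judge": 30, "uploads": 30, "copilot": 30}
DEFAULT_LIMIT = 180
# Exact fractions avoid floating-point surprises in 0.6 * base.
MULTIPLIERS = {"paid": Fraction(3, 2), "starter": Fraction(3, 5)}


def effective_limit(endpoint: str, tier: Optional[str] = None) -> int:
    """Requests allowed per 60-second window for an endpoint and tier."""
    base = BASE_LIMITS.get(endpoint, DEFAULT_LIMIT)
    # Assumption: fractional limits are rounded down.
    return math.floor(base * MULTIPLIERS.get(tier, Fraction(1)))
```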

Circuit Breaker & Auto-Router

Circuit Breaker (Mesh Failover)

Per-model health tracking for automatic failover in Mesh mode

Threshold: 3 failures
Duration: 30 seconds

Healthy — consecutive_failures < 3 → Model available for requests

Open — 3+ consecutive failures → Model skipped for 30 seconds

Half-Open — 30s elapsed since circuit opened → One probe request allowed

Recovered — Probe succeeds → Reset to Healthy, consecutive_failures = 0
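
The four states above map onto a small state machine. This is an illustrative sketch, not the service's implementation: the class and method names are hypothetical, and restarting the 30-second window after a failed probe is an assumption the page does not confirm.

```python
import time


class CircuitBreaker:
    """Per-model breaker: opens after 3 consecutive failures for 30 seconds."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so tests need not sleep
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def state(self) -> str:
        if self.opened_at is None:
            return "healthy"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        current = self.state()
        if current == "healthy":
            return True
        if current == "half_open" and not self.probe_in_flight:
            self.probe_in_flight = True  # only one probe is let through
            return True
        return False

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        self.probe_in_flight = False
        if self.consecutive_failures >= self.threshold:
            # Assumption: a failed probe restarts the cooldown window.
            self.opened_at = self.clock()
```

A mesh router would keep one breaker per model ID and skip to the next fallback whenever allow_request() returns False.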

Auto-Router (model="auto")

Regex-based query classification when model='auto'. No LLM call is made, so routing adds negligible latency.

Code → gpt-oss-120b

Math → deepseek-v3.2

Creative → gemma-4-31b-it

Translation → gemini-3.1-flash-lite

Quick fact → gemini-3.1-flash-lite

Analysis → nvidia-nemotron-3-super-120b-a12b

Vision → gemini-3.1-flash-lite
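
Regex classification of this kind can be sketched as an ordered list of patterns. The category-to-model mapping below is taken from the list above; the patterns themselves, their order, the has_image flag, and the use of "quick fact" as the fallback are invented for illustration and are not the service's actual rules.

```python
import re

# (category, hypothetical pattern, model from the mapping above)
ROUTES = [
    ("code", re.compile(r"\b(def|class|function|bug|compile|python|javascript)\b", re.I),
     "gpt-oss-120b"),
    ("math", re.compile(r"\b(solve|integral|derivative|equation|prove)\b|\d\s*[-+*/^=]\s*\d", re.I),
     "deepseek-v3.2"),
    ("translation", re.compile(r"\btranslate\b", re.I), "gemini-3.1-flash-lite"),
    ("creative", re.compile(r"\b(poem|haiku|story|lyrics)\b", re.I), "gemma-4-31b-it"),
    ("analysis", re.compile(r"\b(analy[sz]e|compare|evaluate|trade-?offs?)\b", re.I),
     "nvidia-nemotron-3-super-120b-a12b"),
]
FALLBACK = ("quick_fact", "gemini-3.1-flash-lite")


def route(prompt: str, has_image: bool = False) -> tuple[str, str]:
    """Return (category, model_id) using the first matching pattern."""
    if has_image:
        return ("vision", "gemini-3.1-flash-lite")
    for category, pattern, model in ROUTES:
        if pattern.search(prompt):
            return (category, model)
    return FALLBACK
```

Because the first match wins, pattern order decides ambiguous prompts, so a real rule set would need careful ordering.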

Streaming Protocol (SSE)

All LLM endpoints stream via Server-Sent Events. Each chunk is a JSON object on a data: line.

data: {"model": "gpt-5.4", "delta": "text", "done": false, "latency_ms": 123}
data: {"model": "gpt-5.4", "delta": "", "done": true, "latency_ms": 456}
data: [DONE]
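
A sketch of consuming this stream: read each data: line, stop at [DONE], and accumulate delta text per model. Keying the result by the model field also covers the case where several models share one stream (as Compare might do), although that interleaving is an assumption. The function name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict[str, str]:
    """Accumulate delta text per model from an iterable of SSE lines."""
    texts: dict[str, str] = defaultdict(str)
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        texts[chunk["model"]] += chunk.get("delta", "")
    return dict(texts)
```

The function accepts any iterable of decoded lines, so it works equally on a list in a test and on lines read from an HTTP response.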

Pricing

Plan	Details
Free	5 messages total to preview the product
Starter	$29/mo · 10M tokens/month · Auto lane only
Teams	$99/mo · 40M tokens/month · Auto + manual premium models
Add-ons	Available after included plan tokens are exhausted
Auto Top-up	Automatic add-on refill with monthly safety cap
Enterprise	Custom limits, team billing, SLAs

Billing basis by mode

Chat: actual tokens after run
Compare: actual tokens after run
Blend: actual tokens after run
Judge: actual tokens after run
Failover: actual tokens after run