# LLMWise

> Multi-model LLM API orchestration platform. One API key to access GPT, Claude, Gemini, DeepSeek, Llama, Mistral, Grok, and 25+ more models. Compare outputs side-by-side, blend the best parts, let AI judge, and auto-failover with circuit breakers. OpenAI-style messages, credit-based pay-per-use, no subscription.

Base URL: https://llmwise.ai
API base: https://llmwise.ai/api/v1
Auth: Bearer token (API key with `mm_sk_` prefix) or Clerk JWT
Streaming: Server-Sent Events (SSE)
Compatibility: OpenAI-style messages (role + content)

## API Endpoints

- `POST /api/v1/chat` — Single-model chat (1 credit). OpenAI-style messages + SSE streaming. Supports `model="auto"` for heuristic routing.
- `POST /api/v1/compare` — Multi-model comparison (3 credits). Sends the same prompt to 2-9 models simultaneously, streamed side-by-side.
- `POST /api/v1/blend` — Multi-model synthesis (4 credits). Six strategies: consensus, council, best_of, chain, moa (multi-layer), self_moa (single-model diversity).
- `POST /api/v1/judge` — AI evaluation (5 credits). 2-4 contestants compete; the judge model scores each 0-10 and ranks them.
- `POST /api/v1/chat` with `routing` — Mesh failover (1 credit). Circuit breaker with auto-failover on 429/500/timeout.

## Blend Strategies

- **consensus** (default): Combine the strongest points, resolve contradictions. 2-6 models, 1 layer.
- **council**: Structured deliberation — agreements, disagreements, follow-ups. 2-6 models, 1 layer.
- **best_of**: Pick the best response, enhance it with the others. 2-6 models, 1 layer.
- **chain**: Iterative sequential integration. 2-6 models, 1 layer.
- **moa**: Mixture-of-Agents multi-layer refinement. 2-6 models, 1-3 layers; models see the previous layer's answers.
- **self_moa**: Single model producing 2-8 diverse candidates via temperature variation + agent prompts. 1 model, 1 layer.

## SSE Streaming Format

All endpoints stream via SSE. Each line: `data: {JSON}`. Terminator: `data: [DONE]`.
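As an illustration, the framing above can be consumed with a few lines of Python. This is a sketch, not official client code: the `data: ` prefix and `[DONE]` sentinel come from the format description, the `delta` field from the chunk schema, and `parse_sse_line`/`accumulate` are hypothetical helper names. Feeding the parser lines from an HTTP response is left to the client.

```python
import json

DONE = object()  # sentinel representing the `data: [DONE]` terminator


def parse_sse_line(line: str):
    """Parse one line of the SSE stream.

    Returns the decoded chunk dict for `data: {JSON}` lines, DONE for the
    `data: [DONE]` terminator, and None for blank or non-data lines
    (e.g. comment/keep-alive lines).
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return DONE
    return json.loads(payload)


def accumulate(lines):
    """Join `delta` fragments from a sequence of SSE lines into full text."""
    parts = []
    for line in lines:
        chunk = parse_sse_line(line)
        if chunk is DONE:
            break
        if chunk and chunk.get("delta"):
            parts.append(chunk["delta"])
    return "".join(parts)
```

For example, two chunks carrying `"Hel"` and `"lo"` followed by the terminator accumulate to `"Hello"`.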
Standard chunk: `{model, delta, done, latency_ms, content_length}`
Final chunk adds: `{ttft_ms, prompt_tokens, completion_tokens, tokens_per_second, cost, finish_reason, full_content}`
Mesh events: `route` (trying/failed/skipped), `chunk` (content), `trace` (summary with final_model, attempts, total_ms)

## Error Codes

- 400: Bad Request (invalid body, unknown model)
- 401: Unauthorized (missing/invalid auth)
- 402: Payment Required (insufficient credits — response: `{error, credits, required}`)
- 429: Too Many Requests (rate limited — check the Retry-After header)
- 502: Bad Gateway (upstream provider error)

## Rate Limits

Token bucket, 60s window. Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After.
Buckets (requests per 60s): chat=90, compare=45, blend=30, judge=30, uploads=30, default=180.
Multipliers: paid=1.5x, free=0.6x. IP limits: free=120/60s, paid=360/60s.

## Docs

- [Getting Started](https://llmwise.ai/docs/getting-started): Quickstart guide — API key setup, first request, streaming
- [Dashboard User Guide](https://llmwise.ai/docs/dashboard-user-guide): Web UI walkthrough — chat, compare, settings
- [Authentication & API Keys](https://llmwise.ai/docs/authentication-and-api-keys): Auth methods, API key management, BYOK
- [Chat API Reference](https://llmwise.ai/docs/chat-api-reference): POST /api/v1/chat — models, parameters, streaming format
- [Compare, Blend & Judge Reference](https://llmwise.ai/docs/compare-blend-judge-reference): Multi-model endpoints — compare, blend, judge
- [Blend Strategies & Algorithms](https://llmwise.ai/docs/blend-strategies-and-algorithms): Deep dive into all six blend strategies, MoA, circuit breaker, auto-router, optimization scoring
- [Mesh Mode Tutorial](https://llmwise.ai/docs/mesh-mode-tutorial): Failover routing, circuit breakers, strategies
- [Billing & Credits](https://llmwise.ai/docs/billing-and-credits): Credit system, pricing, auto top-up, settlement
- [Rate Limits & Reliability](https://llmwise.ai/docs/rate-limits-and-reliability): Rate limiting, concurrency, error handling
- [Privacy, Security & Data Controls](https://llmwise.ai/docs/privacy-security-and-data-controls): Zero-retention mode, data policies, BYOK encryption

## Guides

- [Replay Lab Tutorial](https://llmwise.ai/docs/replay-lab-tutorial): Test model switches against historical requests
- [Regression Testing](https://llmwise.ai/docs/regression-testing-tutorial): Automated quality checks across model versions
- [Semantic Memory API](https://llmwise.ai/docs/semantic-memory-api): Per-user conversation context via embeddings
- [Webhooks & Sync](https://llmwise.ai/docs/webhooks-and-sync): Clerk and Stripe webhook integration
- [API Explorer](https://llmwise.ai/docs/api-explorer-playground): Interactive API playground

## Optional

- [Full compiled docs](https://llmwise.ai/llms-full.txt): Complete platform documentation in plain text
- [Machine-readable view](https://llmwise.ai/ai): Structured HTML overview for AI agents with full API schemas
- [Landing page machine mode](https://llmwise.ai): Toggle the "Machine" button for a structured API reference
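Putting the request shape described above together (Bearer `mm_sk_` key, OpenAI-style `messages`, SSE responses, Retry-After on 429), a minimal Python sketch might look like the following. These are illustrative assumptions, not an official SDK: `build_chat_request` and `retry_after_seconds` are hypothetical helpers, and the body is limited to the `model` and `messages` fields; consult the Chat API Reference for the full parameter set.

```python
import json

API_BASE = "https://llmwise.ai/api/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> tuple:
    """Assemble URL, headers, and JSON body for POST /api/v1/chat.

    Messages follow the OpenAI-style convention (role + content).
    """
    url = f"{API_BASE}/chat"
    headers = {
        "Authorization": f"Bearer {api_key}",  # mm_sk_-prefixed API key
        "Content-Type": "application/json",
        "Accept": "text/event-stream",         # responses stream via SSE
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return url, headers, body


def retry_after_seconds(headers: dict, default: float = 1.0) -> float:
    """On a 429 response, honor the Retry-After header before retrying."""
    try:
        return float(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default
```

A client would send the built request with any HTTP library, read the response line by line as SSE, and back off by `retry_after_seconds(response_headers)` whenever a 429 is returned.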