
Rate Limits and Reliability

Per-endpoint limits, burst protection, dual-layer enforcement, response headers, circuit breaker failover, and retry strategy.

Updated 2026-02-15


Quick Start

Set top-up and minimum credit policy.
Enable per-user and per-key rate limits.
Test 429 handling and retry behavior in staging.
Monitor Usage to confirm that charged credits stay consistent.

Reliability stack

Protection layers:

App rate limits: per-user and per-IP windows
Burst protection: short-window spike detection
Mesh fallback: provider failover on saturation
Circuit breaker: auto-skip unhealthy models

Per-endpoint limits

All limits are requests per 60-second window. Paid users (any account with purchase history) get a 1.5x multiplier on the base limit; free-tier users get 0.6x. Fractional results are rounded, so 45 × 1.5 = 67.5 becomes 68.

Endpoint	Bucket	Base limit	Free (0.6x)	Paid (1.5x)
/api/v1/chat	chat	90	54	135
/api/v1/compare	compare	45	27	68
/api/v1/blend	blend	30	18	45
/api/v1/judge	judge	30	18	45
/api/v1/uploads	upload	30	18	45
Copilot	copilot	30	18	45
All other routes	default	180	108	270
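The table can be reproduced from the base limits and the two multipliers. The sketch below is a minimal illustration that makes two assumptions. First, results are rounded to the nearest integer; the table only shows that 67.5 becomes 68. Second, routes map to buckets by path prefix. `BASE_LIMITS`, `bucketFor`, and `effectiveLimit` are illustrative names, not part of the LLMWise API. The copilot bucket has a limit, but no route maps to it here because the table gives no path for it.

```javascript
// Base per-minute limits per bucket, copied from the table above.
const BASE_LIMITS = {
  chat: 90,
  compare: 45,
  blend: 30,
  judge: 30,
  upload: 30,
  copilot: 30,
  default: 180,
};

// Tier multipliers: free-tier accounts get 0.6x, paid accounts 1.5x.
const MULTIPLIERS = { free: 0.6, paid: 1.5 };

// Hypothetical route-to-bucket mapping by path prefix (assumption).
const ROUTE_BUCKETS = [
  ["/api/v1/chat", "chat"],
  ["/api/v1/compare", "compare"],
  ["/api/v1/blend", "blend"],
  ["/api/v1/judge", "judge"],
  ["/api/v1/uploads", "upload"],
];

// Return the bucket for a request path; unmatched paths use "default".
function bucketFor(path) {
  for (const [prefix, bucket] of ROUTE_BUCKETS) {
    if (path === prefix || path.startsWith(prefix + "/")) return bucket;
  }
  return "default";
}

// Effective limit per 60-second window; round-to-nearest is an assumption.
function effectiveLimit(bucket, tier) {
  return Math.round(BASE_LIMITS[bucket] * MULTIPLIERS[tier]);
}

console.log(effectiveLimit(bucketFor("/api/v1/compare"), "paid")); // 68
```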

Dual-layer enforcement

Every request is checked against two independent counters:

Per-user: keyed by your user ID
Per-IP: keyed by your client IP address (taken from the X-Forwarded-For header)

The per-IP counter has its own limits, separate from the per-user limits above. Default IP limits: free = 120 req/min, paid = 360 req/min.
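Below is a minimal in-memory sketch of dual-layer enforcement. It assumes fixed 60-second windows, per-user counters scoped to the endpoint bucket, and that the client IP is the first address in X-Forwarded-For. The real service keeps its counters in Redis. `FixedWindowCounter`, `clientIp`, and `checkDualLayer` are hypothetical stand-ins. A request passes only if both counters are within their limits.

```javascript
// Hypothetical in-memory fixed-window counter; a stand-in for the Redis counters
// the service actually uses. Window boundaries align to multiples of windowMs.
class FixedWindowCounter {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.windows = new Map(); // key -> { start, count }
  }

  // Record one hit for `key` and return the count in the current window.
  hit(key, nowMs) {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const current = this.windows.get(key);
    if (!current || current.start !== start) {
      this.windows.set(key, { start, count: 1 });
      return 1;
    }
    current.count += 1;
    return current.count;
  }
}

// Assumption: the left-most X-Forwarded-For entry is the original client.
function clientIp(xForwardedFor, socketIp) {
  if (!xForwardedFor) return socketIp;
  return xForwardedFor.split(",")[0].trim() || socketIp;
}

// Default per-IP limits (requests per minute) from the text above.
const IP_LIMITS = { free: 120, paid: 360 };

const counter = new FixedWindowCounter(60_000);

// Both counters are incremented on every request. The request is allowed only
// if neither the per-user (per bucket) nor the per-IP count exceeds its limit.
function checkDualLayer({ userId, bucket, ip, tier, userLimit, nowMs }) {
  const userCount = counter.hit(`user:${userId}:${bucket}`, nowMs);
  const ipCount = counter.hit(`ip:${ip}`, nowMs);
  const userOk = userCount <= userLimit;
  const ipOk = ipCount <= IP_LIMITS[tier];
  return { allowed: userOk && ipOk, userOk, ipOk };
}
```

The left-most X-Forwarded-For entry is supplied by the client. It can be spoofed unless a trusted proxy in front of the service overwrites or validates the header.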

Burst protection

A second short-window layer prevents request spikes. Within any 10-second window:

Free users: 30 requests max
Paid users: 90 requests max

If you exceed the burst limit, you receive a 429 with the message "Request burst detected."
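A short-window burst check can be sketched as a sliding-window log: keep the timestamps of recently accepted requests and reject once a full 10-second window already holds the limit. This is one common technique, assumed here for illustration. The page does not say how the service implements its burst layer, and `BurstLimiter` is a hypothetical name.

```javascript
// Burst limits per 10-second window, from the text above.
const BURST_WINDOW_MS = 10_000;
const BURST_LIMITS = { free: 30, paid: 90 };

// Sliding-window log: timestamps of accepted requests per key (in-memory sketch).
class BurstLimiter {
  constructor(windowMs, limits) {
    this.windowMs = windowMs;
    this.limits = limits;
    this.log = new Map(); // key -> timestamps in ms, oldest first
  }

  // Returns true if the request is allowed, false if it should get a 429.
  tryAcquire(key, tier, nowMs) {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have slid out of the last windowMs milliseconds.
    const stamps = (this.log.get(key) || []).filter((t) => t > cutoff);
    if (stamps.length >= this.limits[tier]) {
      this.log.set(key, stamps);
      return false; // "Request burst detected."
    }
    stamps.push(nowMs);
    this.log.set(key, stamps);
    return true;
  }
}
```

Rejected requests are not logged in this sketch. A client that backs off therefore recovers as soon as its oldest accepted request leaves the window.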

Response headers

Every API response includes the X-RateLimit-* headers below; 429 responses also include Retry-After:

Header	Description
X-RateLimit-Limit	Maximum requests allowed in current window
X-RateLimit-Remaining	Requests remaining in current window
X-RateLimit-Reset	Seconds until the window resets
Retry-After	Seconds to wait before retrying (on 429)
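A client can read these headers to pace itself before it ever hits a 429. The sketch below accepts any object with a `get(name)` method, such as the `Headers` on a fetch `Response`, whose lookups are case-insensitive. It returns `null` for missing or non-numeric values. `readRateLimit` and `pauseMsBeforeNextRequest` are illustrative names. Pausing until the reset when nothing remains is one possible client strategy, not something the API requires.

```javascript
// Read the rate-limit headers from any object with a `get(name)` method,
// such as the `Headers` instance on a fetch `Response`.
function readRateLimit(headers) {
  // Parse a header as a number; missing or non-numeric values become null.
  const num = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined) return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  return {
    limit: num("X-RateLimit-Limit"),
    remaining: num("X-RateLimit-Remaining"),
    resetSeconds: num("X-RateLimit-Reset"),
    retryAfterSeconds: num("Retry-After"),
  };
}

// Hypothetical client-side pacing: if the window is exhausted, wait until it resets.
function pauseMsBeforeNextRequest(info) {
  if (info.remaining !== null && info.remaining <= 0 && info.resetSeconds !== null) {
    return info.resetSeconds * 1000;
  }
  return 0;
}
```

Typical use after a request: `const info = readRateLimit(res.headers);`, then sleep for `pauseMsBeforeNextRequest(info)` milliseconds before the next call.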

Fail-open mode

By default, rate limiting runs in fail-open mode: if Redis is unavailable, requests are allowed through rather than blocked, so a Redis outage does not take down your API access. Critical routes can instead be configured to fail closed, rejecting requests while Redis is unavailable.
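The fail-open versus fail-closed choice comes down to what the limiter does when its counter store throws. Here is a minimal sketch. It assumes an async, Redis-like `store.incr(key)` that resolves to the new count; `store`, `isAllowed`, and the `failOpen` option are hypothetical names, not the service's configuration API.

```javascript
// Hypothetical limiter check that tolerates a store outage.
// `store.incr(key)` stands in for an async Redis-like counter call (assumption).
async function isAllowed(store, key, limit, { failOpen = true } = {}) {
  try {
    const count = await store.incr(key);
    return count <= limit;
  } catch (_err) {
    // Store unavailable: fail-open lets the request through, fail-closed blocks it.
    return failOpen;
  }
}
```

A critical route would call `isAllowed(store, key, limit, { failOpen: false })`. Every other route keeps the default and stays reachable during an outage.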

Circuit breaker (Mesh mode)

When using Mesh/failover routing, a per-model circuit breaker protects against cascading failures:

3 consecutive failures → circuit opens for 30 seconds
During open state, the model is skipped and the next fallback is tried
After 30 seconds, half-open: one test request is allowed through
A successful test closes the circuit; a failure reopens it
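The rules above describe a three-state machine: closed, open, and half-open. The sketch below implements them with an injectable clock so the timing can be tested. `CircuitBreaker`, `callWithFailover`, and `callModel` are illustrative names, not the Mesh implementation. Treating any thrown error as a failure is an assumption.

```javascript
// Thresholds from the text above.
const FAILURE_THRESHOLD = 3;
const OPEN_MS = 30_000;

// Per-model circuit breaker sketch; `now` is injectable for testing.
class CircuitBreaker {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.state = "closed";
    this.failures = 0; // consecutive failures
    this.openedAt = 0;
    this.probeInFlight = false;
  }

  // Whether a request may be sent to this model right now.
  canRequest() {
    if (this.state === "open") {
      if (this.now() - this.openedAt < OPEN_MS) return false;
      this.state = "half-open"; // cooldown elapsed: allow one test request
      this.probeInFlight = false;
    }
    if (this.state === "half-open") {
      if (this.probeInFlight) return false;
      this.probeInFlight = true;
      return true;
    }
    return true; // closed
  }

  recordSuccess() {
    this.state = "closed";
    this.failures = 0;
    this.probeInFlight = false;
  }

  recordFailure() {
    this.failures += 1;
    // A failed test request, or the third consecutive failure, (re)opens the circuit.
    if (this.state === "half-open" || this.failures >= FAILURE_THRESHOLD) {
      this.state = "open";
      this.openedAt = this.now();
      this.probeInFlight = false;
    }
  }
}

// Try models in fallback order, skipping any whose circuit is open.
// `callModel(model)` is a hypothetical async function that throws on failure.
async function callWithFailover(models, breakers, callModel) {
  for (const model of models) {
    const breaker = breakers.get(model);
    if (!breaker.canRequest()) continue;
    try {
      const result = await callModel(model);
      breaker.recordSuccess();
      return { model, result };
    } catch {
      breaker.recordFailure();
    }
  }
  throw new Error("All models unavailable");
}
```

The transition from open to half-open happens lazily on the next `canRequest()` call, so no timer is needed.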

Client retry baseline

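async function fetchWithRetry(url, init, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch(url, init);
    if (res.ok) return res;
    // Only 429s and 5xx errors are worth retrying; other statuses fail fast,
    // and the last attempt throws instead of silently falling out of the loop.
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt === maxRetries) {
      throw new Error("HTTP " + res.status);
    }
    // Prefer the server's Retry-After (seconds); otherwise back off exponentially.
    const retryAfter = parseInt(res.headers.get("Retry-After") ?? "", 10);
    const delay = Number.isNaN(retryAfter)
      ? 300 * (2 ** attempt)
      : retryAfter * 1000;
    await new Promise((r) => setTimeout(r, delay));
  }
}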

Use Retry-After

Always prefer the Retry-After header value over fixed backoff. It tells you exactly when your window resets.
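HTTP semantics (RFC 9110) allow Retry-After to carry either a number of seconds or an HTTP-date. This page describes only the seconds form, but a defensive parser that accepts both costs little. The sketch below does this; `retryAfterMs` is an illustrative name.

```javascript
// Convert a Retry-After header value to milliseconds, or null if absent or invalid.
// Accepts both delta-seconds ("2") and HTTP-date forms.
function retryAfterMs(value, nowMs = Date.now()) {
  if (value === null || value === undefined) return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10) * 1000;
  const at = Date.parse(trimmed);
  if (Number.isNaN(at)) return null;
  // A date in the past means "retry now".
  return Math.max(0, at - nowMs);
}
```

When this returns `null`, fall back to exponential backoff, as in the retry loop above.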
