Rate Limits and Reliability

Per-endpoint limits, burst protection, dual-layer enforcement, response headers, circuit breaker failover, and retry strategy.

9 min · Updated 2026-02-15
Quick Start
  1. Set top-up and minimum credit policy.
  2. Enable per-user and per-key rate limits.
  3. Test 429 + retry behavior in staging.
  4. Monitor charged credits for consistency in Usage.

Reliability stack

Protection layers

  • App rate limits: per-user and per-IP windows
  • Burst protection: short-window spike detection
  • Mesh fallback: provider failover on saturation
  • Circuit breaker: auto-skip unhealthy models

Per-endpoint limits

All limits apply per 60-second window. Each bucket has a base limit that is scaled by tier: paid users (any purchase history) get a 1.5x multiplier; free-tier users get a 0.6x multiplier.

Endpoint            Bucket     Base limit   Free (0.6x)   Paid (1.5x)
/api/v1/chat        chat       90           54            135
/api/v1/compare     compare    45           27            68
/api/v1/blend       blend      30           18            45
/api/v1/judge       judge      30           18            45
/api/v1/uploads     upload     30           18            45
Copilot             copilot    30           18            45
All other routes    default    180          108           270
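
The table values can be reproduced from the base limits and tier multipliers. The sketch below is illustrative only: `effectiveLimit` is a hypothetical helper, not part of the API, and rounding to the nearest integer is an assumption inferred from the compare row (45 × 1.5 = 67.5, listed as 68).

```javascript
// Base limits per 60-second window, copied from the table above.
const BASE_LIMITS = {
  chat: 90,
  compare: 45,
  blend: 30,
  judge: 30,
  upload: 30,
  copilot: 30,
  default: 180,
};

// Tier multipliers stated in the text.
const MULTIPLIERS = { free: 0.6, paid: 1.5 };

// Hypothetical helper: effective per-minute limit for a bucket and tier.
// Unknown buckets fall back to "default" (the "All other routes" row).
// Math.round is an assumption; the docs do not state the rounding rule.
function effectiveLimit(bucket, tier) {
  const base = BASE_LIMITS[bucket] ?? BASE_LIMITS.default;
  return Math.round(base * MULTIPLIERS[tier]);
}

console.log(effectiveLimit("chat", "free"));    // 54
console.log(effectiveLimit("compare", "paid")); // 68
```

A client can use numbers like these to size its own request queue, but the X-RateLimit-* headers on each response remain the authoritative source.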

Dual-layer enforcement

Every request is checked against two independent counters:

  1. Per-user — keyed by your user ID
  2. Per-IP — keyed by your client IP address (via X-Forwarded-For)

IP-level limits are separate from user limits. Default IP limits: free = 120 req/min, paid = 360 req/min.
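
To make the two-counter check concrete, here is a minimal in-memory sketch. It assumes a simple fixed 60-second window; the service's actual windowing algorithm and storage are not described here, and `WindowCounter` and `checkRequest` are hypothetical names used only for illustration.

```javascript
// Minimal in-memory fixed-window counter (illustrative only).
class WindowCounter {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.entries = new Map(); // key -> { windowStart, count }
  }

  // Counts one hit for `key` and returns the total in the current window.
  hit(key, now) {
    const entry = this.entries.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.entries.set(key, { windowStart: now, count: 1 });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

const counter = new WindowCounter(60000);

// A request passes only if BOTH the per-user and the per-IP counter
// are within their limits; the two counters are tracked independently.
function checkRequest({ userId, ip, userLimit, ipLimit }, now = Date.now()) {
  const userCount = counter.hit(`user:${userId}`, now);
  const ipCount = counter.hit(`ip:${ip}`, now);
  if (userCount > userLimit) return { allowed: false, layer: "user" };
  if (ipCount > ipLimit) return { allowed: false, layer: "ip" };
  return { allowed: true, layer: null };
}
```

One practical consequence: several users behind the same NAT or proxy share a per-IP counter, so a request can be rejected at the IP layer even when the user's own counter has headroom.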

Burst protection

A second short-window layer prevents request spikes. Within any 10-second window:

  • Free users: 30 requests max
  • Paid users: 90 requests max

If you exceed the burst limit, you receive a 429 with the message "Request burst detected."
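
Clients that send requests in batches may want to stay under the burst limit proactively rather than wait for a 429. The following is a client-side sketch using a sliding window of timestamps; `BurstGuard` is a hypothetical helper, and the server's own burst-detection algorithm may differ.

```javascript
// Client-side sliding-window guard (a sketch, not the server's algorithm).
class BurstGuard {
  constructor(maxRequests, windowMs = 10000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = []; // send times of recent requests, oldest first
  }

  // Returns true and records the request if it fits in the window;
  // returns false if sending now would exceed maxRequests.
  tryAcquire(now = Date.now()) {
    // Drop timestamps that have left the window.
    while (this.timestamps.length && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Free tier: at most 30 requests in any 10-second window.
const freeGuard = new BurstGuard(30);
```

Call `tryAcquire()` before each request and delay the request when it returns false.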

Response headers

Every API response includes rate-limit headers:

Header                   Description
X-RateLimit-Limit        Maximum requests allowed in the current window
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Seconds until the window resets
Retry-After              Seconds to wait before retrying (on 429)
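
A small helper can turn these headers into numbers for logging or client-side throttling. This is a sketch: `readRateLimit` is a hypothetical name, and it assumes all four headers carry integer values as described in the table.

```javascript
// Reads the rate-limit headers from a fetch Response (or any object whose
// `headers.get(name)` returns a string or null).
// Missing or non-numeric values are returned as null.
function readRateLimit(res) {
  const num = (name) => {
    const raw = res.headers.get(name);
    if (raw === null) return null;
    const n = Number.parseInt(raw, 10);
    return Number.isNaN(n) ? null : n;
  };
  return {
    limit: num("X-RateLimit-Limit"),
    remaining: num("X-RateLimit-Remaining"),
    resetSeconds: num("X-RateLimit-Reset"),
    retryAfterSeconds: num("Retry-After"), // normally present only on 429
  };
}
```

For example, a client might pause new requests when `remaining` reaches 0 and resume after `resetSeconds`.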

Fail-open mode

By default, rate limiting runs in fail-open mode. If Redis is unavailable, requests are allowed through rather than blocked. This prevents a Redis outage from taking down your API access. Critical routes can be configured for fail-closed if needed.
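
The difference between fail-open and fail-closed comes down to what happens when the limiter backend throws. The sketch below illustrates the idea; `guardedCheck` and the `failClosed` option are hypothetical names, not actual configuration keys.

```javascript
// Wraps a limiter check so that backend errors (for example, Redis being
// unreachable) allow the request unless the route is marked fail-closed.
// `check` is a hypothetical async function that resolves to true (allow)
// or false (deny) and rejects when the backend is unavailable.
async function guardedCheck(check, { failClosed = false } = {}) {
  try {
    return await check();
  } catch (err) {
    // Backend unavailable: fail-open allows, fail-closed blocks.
    return !failClosed;
  }
}
```

Fail-open trades strict enforcement for availability, which is why it suits most routes, while fail-closed suits routes where an unmetered request is costlier than a rejected one.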

Circuit breaker (Mesh mode)

When using Mesh/failover routing, a per-model circuit breaker protects against cascading failures:

  • 3 consecutive failures → circuit opens for 30 seconds
  • During open state, the model is skipped and the next fallback is tried
  • After 30 seconds, half-open: one test request is allowed through
  • A successful test closes the circuit; a failure reopens it
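
The state machine described above can be sketched in a few lines. This is an illustrative model using the thresholds from the list (3 failures, 30 seconds), not the service's actual implementation; the class and method names are hypothetical.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, openMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.failures = 0;    // consecutive failures
    this.openedAt = null; // time the circuit opened, or null when closed
    this.probing = false; // true while the half-open test request is in flight
  }

  // Returns true if a request may be sent to this model right now.
  canRequest(now = Date.now()) {
    if (this.openedAt === null) return true;             // closed
    if (now - this.openedAt < this.openMs) return false; // open: skip model
    if (this.probing) return false;                      // half-open: one probe only
    this.probing = true;
    return true;
  }

  // A success (including a successful probe) closes the circuit.
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
    this.probing = false;
  }

  // A failed probe, or the Nth consecutive failure, opens the circuit.
  recordFailure(now = Date.now()) {
    this.failures += 1;
    if (this.probing || this.failures >= this.failureThreshold) {
      this.openedAt = now;
      this.probing = false;
    }
  }
}
```

A router would keep one breaker per model and move on to the next fallback whenever `canRequest()` returns false.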

Client retry baseline

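// Retries on 429 and 5xx responses: up to 3 retries (4 attempts in total).
async function fetchWithRetry(url, init, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch(url, init);
    if (res.ok) return res;
    const retryable = res.status === 429 || res.status >= 500;
    // Give up on non-retryable errors or once the retry budget is spent.
    if (!retryable || attempt === maxRetries) {
      throw new Error("HTTP " + res.status);
    }
    // Prefer the server's Retry-After (seconds); otherwise back off exponentially.
    const retryAfter = parseInt(res.headers.get("Retry-After"), 10);
    const delay = Number.isNaN(retryAfter)
      ? 300 * (2 ** attempt)
      : retryAfter * 1000;
    await new Promise((r) => setTimeout(r, delay));
  }
}

Use Retry-After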

Always prefer the Retry-After header value over fixed backoff. It tells you exactly when your window resets.
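
The header table documents Retry-After as a number of seconds. The HTTP specification also permits an HTTP-date in this header, so a defensive client may want to handle both forms. The helper below is a sketch under that assumption; `retryAfterMs` is a hypothetical name.

```javascript
// Converts a Retry-After header value to a delay in milliseconds.
// Accepts delay-seconds ("12") or an HTTP-date; returns null when the
// header is absent or cannot be parsed, so the caller can fall back to
// exponential backoff.
function retryAfterMs(value, now = Date.now()) {
  if (value === null || value === undefined) return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now); // dates in the past mean "retry now"
}
```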
