Rate Limits and Reliability
Per-endpoint limits, burst protection, dual-layer enforcement, response headers, circuit breaker failover, and retry strategy.
- Set top-up and minimum credit policy.
- Enable per-user and per-key rate limits.
- Test 429 + retry behavior in staging.
- Monitor charged credits consistency in Usage.
Reliability stack
Per-endpoint limits
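All limits apply per 60-second window. Each base limit is scaled by a tier multiplier: paid users (any purchase history) get 1.5x; free-tier users get 0.6x. Fractional results appear rounded in the table (for example, 45 x 1.5 = 67.5 is listed as 68).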
| Endpoint | Bucket | Base limit | Free (0.6x) | Paid (1.5x) |
|---|---|---|---|---|
| /api/v1/chat | chat | 90 | 54 | 135 |
| /api/v1/compare | compare | 45 | 27 | 68 |
| /api/v1/blend | blend | 30 | 18 | 45 |
| /api/v1/judge | judge | 30 | 18 | 45 |
| /api/v1/uploads | upload | 30 | 18 | 45 |
| Copilot | copilot | 30 | 18 | 45 |
| All other routes | default | 180 | 108 | 270 |
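The arithmetic behind the table can be sketched as a small lookup. This is a client-side illustration only: `effectiveLimit` is a hypothetical helper, and rounding to the nearest integer is an assumption inferred from the compare row rather than documented behavior.

```javascript
// Base limits per 60-second window, copied from the table above.
const BASE_LIMITS = {
  chat: 90,
  compare: 45,
  blend: 30,
  judge: 30,
  upload: 30,
  copilot: 30,
  default: 180,
};

// Tier multipliers from the text above.
const TIER_MULTIPLIER = { free: 0.6, paid: 1.5 };

// effectiveLimit is a hypothetical helper, not part of the API.
// Rounding to the nearest integer is an assumption inferred from the
// compare row (45 * 1.5 = 67.5, listed as 68).
function effectiveLimit(bucket, tier) {
  // Unknown buckets fall back to the "All other routes" limit.
  const base = BASE_LIMITS[bucket] ?? BASE_LIMITS.default;
  return Math.round(base * TIER_MULTIPLIER[tier]);
}
```

For example, `effectiveLimit("compare", "paid")` evaluates to 68, matching the table.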
Dual-layer enforcement
Every request is checked against two independent counters:
- Per-user: keyed by your user ID
- Per-IP: keyed by your client IP address (via X-Forwarded-For)
IP-level limits are separate from user limits. Default IP limits: free = 120 req/min, paid = 360 req/min.
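To make the two-counter idea concrete, here is a minimal in-memory sketch. It assumes a fixed 60-second window and counts a request against both layers only when both allow it; the service's actual storage and window algorithm are not described here, and `checkDualLayer` is a hypothetical name.

```javascript
// Fixed 60-second window counters kept in memory. This only illustrates
// two independent counters; it is not the service's implementation.
const WINDOW_MS = 60_000;
const counters = new Map(); // key -> { windowStart, count }

function currentCount(key, now) {
  const entry = counters.get(key);
  if (!entry || now - entry.windowStart >= WINDOW_MS) return 0;
  return entry.count;
}

function increment(key, now) {
  const entry = counters.get(key);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(key, { windowStart: now, count: 1 });
  } else {
    entry.count += 1;
  }
}

// checkDualLayer is a hypothetical name. userLimit would come from the
// per-endpoint table; ipLimit defaults to 120 (free) or 360 (paid).
function checkDualLayer({ userId, ip, userLimit, ipLimit }, now = Date.now()) {
  const userKey = `user:${userId}`;
  const ipKey = `ip:${ip}`;
  if (currentCount(userKey, now) >= userLimit) return { allowed: false, layer: "user" };
  if (currentCount(ipKey, now) >= ipLimit) return { allowed: false, layer: "ip" };
  // Count the request against both layers only when both allow it.
  increment(userKey, now);
  increment(ipKey, now);
  return { allowed: true, layer: null };
}
```

One practical consequence: several users behind the same IP address can exhaust the IP counter even when each user is well under their own limit.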
Burst protection
A second short-window layer prevents request spikes. Within any 10-second window:
- Free users: 30 requests max
- Paid users: 90 requests max
If you exceed the burst limit, you receive a 429 with the message "Request burst detected."
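A burst check of this kind can be sketched with a sliding-window log of recent timestamps. The phrase "within any 10-second window" suggests a sliding window, but that is an assumption; `checkBurst` and the in-memory `recent` map are hypothetical.

```javascript
const BURST_WINDOW_MS = 10_000;
const BURST_LIMIT = { free: 30, paid: 90 }; // from the list above
const recent = new Map(); // key -> array of request timestamps (ms)

// checkBurst is a hypothetical helper using a sliding-window log.
function checkBurst(key, tier, now = Date.now()) {
  // Keep only timestamps that fall inside the last 10 seconds.
  const log = (recent.get(key) ?? []).filter((t) => now - t < BURST_WINDOW_MS);
  if (log.length >= BURST_LIMIT[tier]) {
    recent.set(key, log);
    return { allowed: false, status: 429, message: "Request burst detected." };
  }
  log.push(now);
  recent.set(key, log);
  return { allowed: true };
}
```

Clients that batch work can use the same numbers in reverse: spacing free-tier requests so that no more than 30 fall in any 10-second span keeps them under this limit.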
Response headers
Every API response includes rate-limit headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in current window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Seconds until the window resets |
| Retry-After | Seconds to wait before retrying (on 429) |
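On the client side, these headers can be read into a plain object. The sketch below assumes all four values are integer strings, as the table describes; `readRateLimit` is a hypothetical helper, and `headers` is anything with a `get(name)` method, such as the `headers` property of a `fetch` response.

```javascript
// readRateLimit is a hypothetical helper. `headers` is anything with a
// get(name) method, such as the headers of a fetch Response.
function readRateLimit(headers) {
  const toInt = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined) return null;
    const n = Number.parseInt(raw, 10);
    return Number.isNaN(n) ? null : n;
  };
  return {
    limit: toInt("X-RateLimit-Limit"),
    remaining: toInt("X-RateLimit-Remaining"),
    resetSeconds: toInt("X-RateLimit-Reset"),
    retryAfterSeconds: toInt("Retry-After"), // expected on 429 responses
  };
}
```

A caller might check `readRateLimit(res.headers).remaining` after each response and pause for `resetSeconds` when it reaches 0, rather than waiting for a 429.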
Fail-open mode
By default, rate limiting runs in fail-open mode. If Redis is unavailable, requests are allowed through rather than blocked. This prevents a Redis outage from taking down your API access. Critical routes can be configured for fail-closed if needed.
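The difference between fail-open and fail-closed comes down to what happens when the counter store throws. The following is a simplified synchronous sketch: `guardedCheck` is a hypothetical wrapper, and `check` stands in for the counter lookup. A real Redis call would be asynchronous, but the decision logic would be the same.

```javascript
// guardedCheck is a hypothetical wrapper. `check` stands in for the
// counter lookup and is expected to throw when the store is unreachable.
function guardedCheck(check, { failOpen = true } = {}) {
  try {
    return { allowed: check(), degraded: false };
  } catch (err) {
    // Store unavailable: allow in fail-open mode, block in fail-closed mode.
    return { allowed: failOpen, degraded: true };
  }
}
```

The `degraded` flag is an illustrative addition so that callers can log or alert when limits are not actually being enforced.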
Circuit breaker (Mesh mode)
When using Mesh/failover routing, a per-model circuit breaker protects against cascading failures:
- 3 consecutive failures → circuit opens for 30 seconds
- During open state, the model is skipped and the next fallback is tried
- After 30 seconds, half-open: one test request is allowed through
- A successful test closes the circuit; a failure reopens it
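The state transitions listed above can be sketched as a small state machine. This is an illustration of the described behavior, not the service's code: `ModelCircuitBreaker` is a hypothetical class (one instance per model), and the clock is passed in as `now` to keep the logic easy to follow.

```javascript
const FAILURE_THRESHOLD = 3; // consecutive failures before opening
const OPEN_MS = 30_000;      // how long the circuit stays open

// ModelCircuitBreaker is a hypothetical class; one instance per model.
class ModelCircuitBreaker {
  constructor() {
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
    this.probeInFlight = false;
  }

  // Returns true if a request may be sent to this model right now.
  canRequest(now = Date.now()) {
    if (this.state === "open" && now - this.openedAt >= OPEN_MS) {
      this.state = "half-open";
      this.probeInFlight = false;
    }
    if (this.state === "closed") return true;
    if (this.state === "half-open" && !this.probeInFlight) {
      this.probeInFlight = true; // allow exactly one test request
      return true;
    }
    return false; // open, or a probe is already running: try the next fallback
  }

  recordSuccess() {
    this.state = "closed";
    this.failures = 0;
    this.probeInFlight = false;
  }

  recordFailure(now = Date.now()) {
    this.failures += 1;
    // A failed probe reopens immediately; otherwise wait for the threshold.
    if (this.state === "half-open" || this.failures >= FAILURE_THRESHOLD) {
      this.state = "open";
      this.openedAt = now;
      this.probeInFlight = false;
    }
  }
}
```

A router would call `canRequest()` for each candidate model in fallback order, send the request to the first model that returns true, and then report the outcome with `recordSuccess()` or `recordFailure()`.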
Client retry baseline
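    // Retry on 429 and 5xx, honoring Retry-After when the server sends it.
    async function fetchWithRetry(url, init, maxRetries = 3) {
      for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
        const res = await fetch(url, init);
        if (res.ok) return res;
        const retryable = res.status === 429 || res.status >= 500;
        // Give up on non-retryable errors or once retries are exhausted.
        if (!retryable || attempt === maxRetries) {
          throw new Error("HTTP " + res.status);
        }
        // Prefer the server-provided wait; fall back to exponential backoff.
        const retryAfter = parseInt(res.headers.get("Retry-After") ?? "", 10);
        const delay = Number.isNaN(retryAfter)
          ? 300 * (2 ** attempt)
          : retryAfter * 1000;
        await new Promise((r) => setTimeout(r, delay));
      }
    }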
Always prefer the Retry-After header value over a locally computed backoff. It tells you how many seconds to wait before your window resets.