Billing, Plans, and Add-ons

Free preview, Starter and Teams allowances, token-settled billing, add-on credits, and BYOK.

10 min · Updated 2026-02-15

Quick Start

Set top-up and minimum credit policy.
Enable per-user and per-key rate limits.
Test 429 + retry behavior in staging.
Monitor token and billing consistency in Usage.
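
To exercise the 429 + retry step in staging, a client-side helper along these lines can be useful. This is a generic exponential-backoff sketch, not documented LLMWise client behavior: `RateLimited`, `call_with_retry`, the delay defaults, and the idea that a response may carry a retry hint are all assumptions made for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function when the API answers HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited (HTTP 429)")
        # Seconds to wait, if the response supplied a hint (assumed, not documented here).
        self.retry_after = retry_after


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's hint when present
            else:
                # 0.5s, 1s, 2s, ... plus up to 25% random jitter
                delay = base_delay * (2 ** attempt) * (1 + random.random() / 4)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper easy to test in staging without real waits.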

Billing principle

LLMWise is now subscription-first with token-settled billing.

Free gives you a 5-message preview
Starter includes 10M tokens/month and stays on the Auto lane only
Teams includes 20M tokens/month and unlocks manual GPT, Claude, and Gemini Pro access
When included plan tokens are exhausted, add-on credits can extend usage
Final billing still settles from the actual tokens used by the selected model(s)

Free preview and plan access

Every new account starts with 5 free messages total. No card required. New users move to Starter or Teams for ongoing usage.

New-user billing policy

New users must have an active Starter or Teams subscription before buying add-on credits.
Legacy wallet users keep their existing wallet path.
BYOK still bypasses LLMWise billing entirely.

Plans

Plan	Allowance	What it means
Free	5 messages total	Preview the product before subscribing
Starter	10M tokens / month	Auto lane only, curated cheap routing pool
Teams	20M tokens / month	Auto + manual GPT, Claude, Gemini Pro, plus Compare / Blend / Judge

How settlement works

Included plan tokens or add-on credits are reserved before the request starts, then settled after execution:

Reserve → Execute → Settle

Reserve: Reserve plan usage or add-on credits upfront
Execute: Send request to model provider
Settle: Compare actual token usage to the starting reserve
Adjust: Return unused reserve or charge the remainder

If actual usage exceeds the starting reserve, the difference is charged. If usage is lower, unused reserve is returned. All adjustments appear in request history and billing views.
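
The adjustment rule above is simple arithmetic. A minimal sketch, assuming the reserve and the actual usage are already expressed in the same unit (how tokens convert to that unit is not specified here, and `settle` is a hypothetical name):

```python
def settle(reserved, actual):
    """Return (refund, extra_charge) for one request, in the unit of the inputs."""
    if reserved < 0 or actual < 0:
        raise ValueError("reserved and actual must be non-negative")
    if actual > reserved:
        # Usage exceeded the starting reserve: charge the difference.
        return 0, actual - reserved
    # Usage at or below the reserve: return the unused part.
    return reserved - actual, 0
```

For example, a request that reserved 4 and used 6 yields `(0, 2)`, while one that reserved 4 and used 1 yields `(3, 0)`.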

Mode reserves

These are starting reserves, not flat final prices:

Mode	Starting reserve	Typical use
Chat	1 credit	Single-model or Auto chat
Compare	2 credits	Parallel model comparison (Teams)
Blend	4 credits	Synthesis workflow (Teams)
Judge	5 credits	Contest + judge scoring (Teams)
Mesh	1 credit	Failover routing
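
For client-side estimates, the table can be mirrored as a lookup. This is an illustrative sketch only: the lowercase mode and plan identifiers and the `starting_reserve` function are hypothetical, and the values are starting reserves copied from the table, not final prices.

```python
# Starting reserves in credits, copied from the table above.
STARTING_RESERVE = {"chat": 1, "compare": 2, "blend": 4, "judge": 5, "mesh": 1}

# Modes the table marks as Teams-only.
TEAMS_ONLY = {"compare", "blend", "judge"}


def starting_reserve(mode, plan):
    """Return the starting reserve for a mode, or raise if the plan lacks access."""
    if mode not in STARTING_RESERVE:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in TEAMS_ONLY and plan != "teams":
        raise PermissionError(f"{mode} requires the Teams plan")
    return STARTING_RESERVE[mode]
```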

Add-on flow

Minimum add-on purchase is $3. Maximum single purchase is $10,000.
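
A client can check these bounds before creating a checkout. The sketch below assumes amounts are handled in US dollars and that both limits are inclusive; `validate_purchase_amount` is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

MIN_PURCHASE_USD = Decimal("3")
MAX_PURCHASE_USD = Decimal("10000")


def validate_purchase_amount(amount_usd):
    """Return the amount as a Decimal, or raise ValueError if it is out of range."""
    amount = Decimal(str(amount_usd))  # str() avoids binary float artifacts
    if not (MIN_PURCHASE_USD <= amount <= MAX_PURCHASE_USD):
        raise ValueError(
            f"amount must be between ${MIN_PURCHASE_USD} and ${MAX_PURCHASE_USD}"
        )
    return amount
```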

Checkout to add-on crediting

Create checkout: POST /api/v1/credits/purchase
Pay in Stripe: Customer completes checkout
Webhook settle: POST /api/webhooks/stripe
Refresh balance: GET /api/v1/credits/balance
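
Because credits are applied when the Stripe webhook is processed, the balance may lag the end of checkout by a moment. A hedged sketch of the final "refresh balance" step: `fetch_balance` is a caller-supplied function assumed to wrap GET /api/v1/credits/balance and return a number, and `wait_for_credit`, the attempt count, and the delay are hypothetical.

```python
import time


def wait_for_credit(fetch_balance, previous_balance, attempts=5, delay=1.0,
                    sleep=time.sleep):
    """Poll until the balance rises above previous_balance.

    Returns the new balance, or None if it did not change within the attempts.
    """
    for i in range(attempts):
        balance = fetch_balance()
        if balance > previous_balance:
            return balance
        if i < attempts - 1:
            sleep(delay)  # give the webhook time to be processed
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether to keep waiting or show a "payment pending" state.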

Auto top-up

Enable automatic add-on refills so requests do not fail for lack of credits once the plan allowance is exhausted:

Complete one Stripe checkout to save a payment method
Enable auto top-up in /settings and set your preferred amount
Set a balance threshold; when add-on credits drop below it, a top-up is triggered
Set a monthly spending cap to control costs

Auto top-ups are processed as off-session Stripe PaymentIntents using your saved payment method. Monthly spending is tracked and capped to prevent runaway charges.
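
The threshold and cap interact roughly as follows. This is a sketch of the decision only, not the service's implementation: `top_up_amount` is a hypothetical name, and skipping (rather than shrinking) a top-up that would exceed the monthly cap is an assumption.

```python
def top_up_amount(balance, threshold, amount, spent_this_month, monthly_cap):
    """Return the amount to charge now, or 0 if no top-up should happen."""
    if balance >= threshold:
        return 0  # balance has not dropped below the threshold
    if spent_this_month + amount > monthly_cap:
        return 0  # assumption: a top-up that would exceed the cap is skipped
    return amount
```

With a threshold of 5, a top-up amount of 20, and a cap of 100, a balance of 2 triggers a 20 top-up until 80 has been spent in the month; after that, the next top-up would exceed the cap and is not made.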

BYOK (Bring Your Own Key)

When a BYOK provider key is configured, requests route directly to the provider using your key. BYOK requests skip credit charges entirely, and you pay the provider directly. This is useful when customer contracts require provider-direct billing.

Open catalog models

Some models in the catalog are marked is_free=true (provider-side free tier).
On LLMWise billing, requests to these models still settle through the normal token-based billing flow, including the normal minimum request charge, unless you are using BYOK.

Purpose of open catalog models

Provider-free models are best used for:

Prompt and UX prototyping before spending paid credits
Fallback paths for non-critical traffic during provider spikes
A/B checks against paid models so you only pay where quality difference matters

Catalog updates are synced from OpenRouter, so available is_free=true models can change over time.

You can always fetch the current live list from:

GET /api/v1/models

Filter rows where is_free=true.
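
A minimal filtering sketch, assuming the response body has already been parsed into a list of dicts. The `id` field name and the sample rows are hypothetical; only `is_free` comes from the text above.

```python
def free_model_ids(models):
    """Return the ids of catalog rows flagged is_free=true."""
    return [m["id"] for m in models if m.get("is_free") is True]


# Hypothetical sample rows standing in for a parsed GET /api/v1/models response.
sample = [
    {"id": "example/model-a", "is_free": True},
    {"id": "example/model-b", "is_free": False},
    {"id": "example/model-c"},  # flag missing: treated as not free
]
free_ids = free_model_ids(sample)
```

Since the catalog changes over time, it is safer to run this filter against a fresh response than to hard-code model ids.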
