Billing, Plans, and Add-ons

Free preview, Starter and Teams allowances, token-settled billing, add-on credits, and BYOK.

10 min · Updated 2026-02-15

Quick Start
  1. Set top-up and minimum credit policy.
  2. Enable per-user and per-key rate limits.
  3. Test 429 + retry behavior in staging.
  4. Monitor token and billing consistency in Usage.
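
For step 3, one way to exercise 429 handling in staging is a small retry wrapper with exponential backoff. This is a minimal sketch, not LLMWise client code: `call_with_retry`, its parameters, and the `send` callable (which returns a status code and a body) are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter
            # so that many clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))

    # Every attempt was rate limited; return the last 429 to the caller.
    return status, body
```

Passing `sleep` as a parameter lets a staging test replace it with a stub, so the retry path can be checked without real waiting.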

Billing principle

LLMWise is now subscription-first with token-settled billing.

  • Free gives you a 5-message preview
  • Starter includes 10M tokens/month and stays on the Auto lane only
  • Teams includes 20M tokens/month and unlocks manual GPT, Claude, and Gemini Pro access
  • When included plan tokens are exhausted, add-on credits can extend usage
  • Final billing still settles from the actual tokens used by the selected model(s)

Free preview and plan access

Every new account starts with 5 free messages in total, with no card required. For ongoing usage, new users move to Starter or Teams.

New-user billing policy

New users must have an active Starter or Teams subscription before buying add-on credits.
Legacy wallet users keep their existing wallet path.
BYOK still bypasses LLMWise billing entirely.

Plans

Plan      Allowance            What it means
Free      5 messages total     Preview the product before subscribing
Starter   10M tokens / month   Auto lane only, curated cheap routing pool
Teams     20M tokens / month   Auto + manual GPT, Claude, Gemini Pro, plus Compare / Blend / Judge

How settlement works

Included plan tokens or add-on credits are reserved before the request starts, then settled after execution:

Reserve → Execute → Settle

  1. Reserve: Reserve plan usage or add-on credits upfront
  2. Execute: Send request to model provider
  3. Settle: Compare actual token usage to the starting reserve
  4. Adjust: Return unused reserve or charge the remainder

If actual usage exceeds the starting reserve, the difference is charged. If usage is lower, unused reserve is returned. All adjustments appear in request history and billing views.
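
The adjustment rule above can be sketched as simple arithmetic. This is an illustration only, assuming the reserve and the actual usage are expressed in the same unit (credits here). The `Settlement` type and `settle` function are hypothetical names, not part of the LLMWise API.

```python
from dataclasses import dataclass


@dataclass
class Settlement:
    reserved: float       # amount held before the request started
    actual: float         # amount the request really cost
    refunded: float       # unused reserve returned to the balance
    extra_charged: float  # remainder charged when usage exceeded the reserve


def settle(reserved: float, actual: float) -> Settlement:
    """Compare actual usage to the starting reserve and compute the adjustment."""
    if reserved < 0 or actual < 0:
        raise ValueError("reserved and actual must be non-negative")
    if actual > reserved:
        # Usage exceeded the reserve: charge the difference.
        return Settlement(reserved, actual, 0.0, actual - reserved)
    # Usage was at or below the reserve: return what was not used.
    return Settlement(reserved, actual, reserved - actual, 0.0)


# A request that reserved 1 credit but used 0.25 gets 0.75 back;
# one that used 1.5 is charged an extra 0.5.
under = settle(1.0, 0.25)
over = settle(1.0, 1.5)
```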

Mode reserves

These are starting reserves, not flat final prices:

Mode      Starting reserve   Typical use
Chat      1 credit           Single-model or Auto chat
Compare   2 credits          Parallel model comparison (Teams)
Blend     4 credits          Synthesis workflow (Teams)
Judge     5 credits          Contest + judge scoring (Teams)
Mesh      1 credit           Failover routing
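
As a rough worked example, the reserves above can be used to estimate how many requests a given credit balance could start in each mode. The reserve values come from the table. The `requests_startable` helper is hypothetical, and the estimate ignores settlement, so the final cost of each request may be higher or lower.

```python
# Starting reserves per mode, in credits (from the table above).
STARTING_RESERVE = {
    "chat": 1,
    "compare": 2,
    "blend": 4,
    "judge": 5,
    "mesh": 1,
}


def requests_startable(balance_credits: int, mode: str) -> int:
    """Estimate how many requests of `mode` the balance can reserve for.

    This only counts starting reserves; it is not a final price.
    """
    if mode not in STARTING_RESERVE:
        raise KeyError(f"unknown mode: {mode}")
    return balance_credits // STARTING_RESERVE[mode]


# With 10 credits: 10 chat, 5 compare, 2 blend, 2 judge, 10 mesh.
estimate = {mode: requests_startable(10, mode) for mode in STARTING_RESERVE}
```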

Add-on flow

Minimum add-on purchase is $3. Maximum single purchase is $10,000.

Checkout to add-on crediting

  1. Create checkout: POST /api/v1/credits/purchase
  2. Pay in Stripe: Customer completes checkout
  3. Webhook settle: POST /api/webhooks/stripe
  4. Refresh balance: GET /api/v1/credits/balance
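
A client-side sketch of steps 1 and 4 might look like the following. The two endpoint paths and the $3 / $10,000 limits come from this page. Everything else is an assumption made for illustration: the placeholder base URL, the `amount_usd` body field, and the bearer-token header. Check the API reference for the real request shape.

```python
import json
import urllib.request

BASE_URL = "https://llmwise.example"  # placeholder host, not the real one
MIN_PURCHASE_USD = 3
MAX_PURCHASE_USD = 10_000


def build_purchase_request(amount_usd: float, api_key: str) -> urllib.request.Request:
    """Build the checkout-creation request after validating the amount."""
    if not MIN_PURCHASE_USD <= amount_usd <= MAX_PURCHASE_USD:
        raise ValueError(
            f"amount must be between ${MIN_PURCHASE_USD} and ${MAX_PURCHASE_USD}"
        )
    # "amount_usd" is a hypothetical field name.
    body = json.dumps({"amount_usd": amount_usd}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/credits/purchase",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def build_balance_request(api_key: str) -> urllib.request.Request:
    """Build the balance refresh request to send once the webhook has settled."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/credits/balance",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Validating the amount locally avoids a round trip for purchases that fall outside the documented limits. The requests are only built here; sending them with `urllib.request.urlopen` is left to the caller.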

Auto top-up

Enable automatic add-on refills so requests keep working after included plan tokens are exhausted:

  1. Complete one Stripe checkout to save a payment method
  2. Enable auto top-up in /settings and set your preferred amount
  3. Set a balance threshold — when add-on credits drop below it, a top-up is triggered
  4. Set a monthly spending cap to control costs

Auto top-ups are processed as off-session Stripe PaymentIntents using your saved payment method. Monthly spending is tracked and capped to prevent runaway charges.
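
The threshold and cap settings interact roughly as in the sketch below. This is a hypothetical model of the decision, not the actual LLMWise implementation. In particular, skipping a top-up that would exceed the monthly cap (rather than charging a partial amount) is an assumption.

```python
def next_top_up(
    balance: float,
    threshold: float,
    top_up_amount: float,
    spent_this_month: float,
    monthly_cap: float,
) -> float:
    """Return the amount to charge now, or 0.0 if no top-up should run.

    `balance` and `threshold` share one unit (add-on credits); the other
    three values share another (for example USD).
    """
    if balance >= threshold:
        # Balance has not dropped below the threshold yet.
        return 0.0
    remaining_budget = monthly_cap - spent_this_month
    if top_up_amount > remaining_budget:
        # Assumption: a top-up that would break the cap is skipped entirely.
        return 0.0
    return top_up_amount


# Balance 2 is below threshold 5, and 10 fits in the 50 cap: charge 10.
charge = next_top_up(balance=2, threshold=5, top_up_amount=10,
                     spent_this_month=0, monthly_cap=50)
```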

BYOK (Bring Your Own Key)

When a BYOK provider key is configured, requests route directly to the provider using your key. BYOK requests skip credit charges entirely — you pay the provider directly. This is useful when customer contracts require provider-direct billing.

Open catalog models

Some models in the catalog are marked is_free=true (provider-side free tier).
On LLMWise billing, requests to these models still settle through the normal token-based billing flow, including the normal minimum request charge, unless you are using BYOK.

Purpose of open catalog models

Provider-free models are best used for:

  1. Prompt and UX prototyping before spending paid credits
  2. Fallback paths for non-critical traffic during provider spikes
  3. A/B checks against paid models so you only pay where quality difference matters

Catalog updates are synced from OpenRouter, so available is_free=true models can change over time.

You can always fetch the current live list from:

GET /api/v1/models

Filter rows where is_free=true.
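
The filter step could look like this once the response has been parsed. The response shape is an assumption: the sketch treats it as a list of objects that each carry a boolean `is_free` field. The model ids in the sample are made up.

```python
from typing import Any, Dict, List


def free_models(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only rows explicitly marked is_free=true."""
    return [row for row in rows if row.get("is_free") is True]


# Sample rows with hypothetical ids, standing in for a parsed
# GET /api/v1/models response.
sample = [
    {"id": "example/free-model", "is_free": True},
    {"id": "example/paid-model", "is_free": False},
    {"id": "example/unlabeled-model"},
]
free_ids = [row["id"] for row in free_models(sample)]
```

Because the catalog changes over time, it is safer to run this filter against a fresh response than to hard-code model ids.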
