Glossary

What Is a Context Window?

A context window is the maximum number of tokens an LLM can process in a single request, including both input and output.

Definition

A context window (also called context length or context limit) defines the maximum amount of text a large language model can consider in a single API call. It includes everything: your system prompt, conversation history, user message, and the model's generated response. When you exceed the context window, the model either truncates the oldest content or returns an error. Context windows range from 4K tokens for older models to 1M+ tokens for the latest frontier models.
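To make the budget concrete, here is a minimal sketch that estimates whether a request fits a given window before you send it. It assumes the tiktoken library's cl100k_base encoding as a stand-in tokenizer; each provider tokenizes slightly differently, so treat the counts as estimates.

```python
# Sketch: estimate whether a request fits a model's context window.
# Assumes tiktoken's cl100k_base encoding as a stand-in tokenizer;
# each provider uses its own tokenizer, so counts are approximate.
import tiktoken

def fits_context_window(messages, max_output_tokens, context_window):
    enc = tiktoken.get_encoding("cl100k_base")
    input_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    # The window must hold the prompt *and* the generated response.
    return input_tokens + max_output_tokens <= context_window

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the attached report."},
]
print(fits_context_window(messages, max_output_tokens=1024, context_window=128_000))
```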

Context window sizes by model

Gemini models lead with 1M+ token context windows, suitable for processing entire codebases or book-length documents. Claude Sonnet 4.5 and Claude Opus 4.6 offer 200K token windows, enough for most production use cases. GPT-5.2 and DeepSeek V3 each support 128K tokens. Filling a larger context window costs more per request because you pay for every input token the model processes, so matching the window to your workload matters for cost control.
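As a rough illustration of picking the smallest window that still fits a request, the sketch below uses the figures quoted above as approximate lookup values; real limits and model identifiers vary by provider and release.

```python
# Sketch: choose the smallest context window that still fits a request;
# picking a model whose window matches the workload helps control cost.
# Window sizes mirror the figures in the text above and are approximate.
CONTEXT_WINDOWS = {
    "gemini": 1_000_000,
    "claude-sonnet-4.5": 200_000,
    "claude-opus-4.6": 200_000,
    "gpt-5.2": 128_000,
    "deepseek-v3": 128_000,
}

def smallest_sufficient_model(required_tokens: int) -> str | None:
    candidates = [(window, model) for model, window in CONTEXT_WINDOWS.items()
                  if window >= required_tokens]
    return min(candidates)[1] if candidates else None

print(smallest_sufficient_model(150_000))  # a 200K-window Claude model
```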

Practical implications

In chat applications, context windows determine how much conversation history the model remembers. A 128K window holds roughly 200 pages of text — enough for long multi-turn conversations. For document analysis, the window determines the maximum document size you can process in one pass. For code generation, it limits how much of a codebase the model can reference. When you hit limits, strategies include summarizing older messages, using RAG (retrieval-augmented generation), or selecting a model with a larger window.
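One common trimming strategy is to drop the oldest turns while keeping the system prompt. The sketch below illustrates the idea with a hypothetical count_tokens heuristic; a real implementation would use the provider's tokenizer and might summarize dropped turns instead of discarding them.

```python
# Sketch: keep a conversation under a token budget by dropping the oldest
# turns while preserving the system prompt. count_tokens is a rough
# placeholder heuristic, not a real tokenizer.
def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # assume roughly 4 characters per token

def trim_history(messages, budget):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(count_tokens(m["content"]) for m in system)
    for msg in reversed(rest):            # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break                         # older turns are dropped (or summarized)
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```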

Context windows and LLMWise

LLMWise Auto routing considers context window requirements when selecting a model. If your request needs a large context, Auto will avoid models that cannot handle it. You can also use LLMWise semantic memory to persist key conversation context across sessions without consuming window space — the memory system retrieves relevant past context and injects it as a compact summary rather than replaying entire conversation histories.
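The sketch below illustrates that idea in generic Python rather than through the actual LLMWise API: retrieve_summary is a hypothetical stand-in for a semantic-memory lookup, and the point is simply that a compact summary is injected in place of the full conversation history.

```python
# Illustrative sketch only, not the LLMWise API: inject a compact summary
# of past sessions instead of replaying the entire conversation history.
def retrieve_summary(user_id: str) -> str:
    # Hypothetical semantic-memory lookup; in practice this would return
    # stored snippets relevant to the current request.
    return "User prefers concise answers; is migrating a Django app to Postgres."

def build_messages(user_id: str, user_message: str) -> list[dict]:
    summary = retrieve_summary(user_id)
    return [
        {"role": "system",
         "content": f"You are a helpful assistant. Relevant context from past sessions: {summary}"},
        {"role": "user", "content": user_message},
    ]
```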

How LLMWise implements this

LLMWise gives you five orchestration modes — Chat, Compare, Blend, Judge, and Mesh — with a built-in optimization policy, failover routing, and a replay lab. No monthly subscription is required, and paid credits do not expire.

Common questions

What is the largest context window available?
As of early 2026, Google Gemini models offer the largest context windows at over 1 million tokens, capable of processing entire codebases or multiple books in a single request. Claude models offer 200K tokens, and GPT-5.2 supports 128K tokens. Context window sizes continue to expand with each model generation.
What happens when I exceed the context window?
The behavior depends on the provider. Most APIs return an error when the total tokens (input + output) would exceed the window. Some providers silently truncate the oldest messages. LLMWise returns a clear error so you can adjust your request. Strategies include summarizing conversation history, reducing system prompt length, or switching to a model with a larger window.
Does a larger context window cost more?
Yes. Token costs are charged per token, so sending more context means higher costs per request. A 100K token input costs roughly 100x more than a 1K token input at the same per-token rate. This is why efficient context management — summarizing history, using RAG, and trimming unnecessary context — is important for production applications.
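A quick sketch of that arithmetic, using an assumed example rate of $3 per million input tokens rather than any published price:

```python
# Sketch of the cost scaling above; the rate is an assumed example value.
rate_per_input_token = 3.00 / 1_000_000   # e.g. $3 per million input tokens

print(f"1K-token prompt:   ${1_000 * rate_per_input_token:.4f}")    # $0.0030
print(f"100K-token prompt: ${100_000 * rate_per_input_token:.4f}")  # $0.3000
# Same per-token rate, 100x the input tokens, 100x the cost.
```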
How does context window relate to memory?
Context window is the model's short-term memory for a single request. LLMWise semantic memory adds long-term memory across sessions by storing and retrieving relevant conversation snippets. This means the model can reference past interactions without consuming context window space, extending effective memory far beyond any single model's window limit.

One wallet, enterprise AI controls built in

You only pay credits per request. No monthly subscription. Paid credits never expire.

Replace multiple AI subscriptions with one wallet that includes routing, failover, and optimization.

Chat, Compare, Blend, Judge, Mesh
Policy routing + replay lab
Failover without extra subscriptions