Glossary

What Is a Context Window?

A context window is the maximum number of tokens an LLM can process in a single request, including both input and output.

I want to try now Read routing guide Open docs

Free preview, Starter for the Auto lane, Teams for manual GPT, Claude, and Gemini Pro access. Add-on credits kick in after included plan tokens are used.

Start on cheap auto-routed models first, then move up only when your workload truly needs premium manual control.

First success in 60 seconds

Step 01Sign up in 10 secondsTry the free preview Step 02Choose your laneStarter Auto or Teams Step 03Send first requestUse Auto first

Why teams start here first

Free preview

5 messages to try it

No card required to see how Auto routing feels before you commit.

Starter

Auto lane only

Curated cheap model pool with no manual premium-model selection.

Teams

Premium when you need it

Manual GPT, Claude, and Gemini Pro access starts here.

Billing

Plan tokens first

Add-on credits only extend usage after included plan tokens are exhausted.

Definition

A context window (also called context length or context limit) defines the maximum amount of text a large language model can consider in a single API call. It includes everything: your system prompt, conversation history, user message, and the model's generated response. When you exceed the context window, the model either truncates the oldest content or returns an error. Context windows range from 4K tokens for older models to 1M+ tokens for the latest frontier models.

Context window sizes by model

Gemini models lead with 1M+ token context windows, suitable for processing entire codebases or book-length documents. Claude Sonnet 4.5 and Claude Opus 4.6 offer 200K token windows - enough for most production use cases. GPT-5.2 supports 128K tokens. DeepSeek V3 also supports 128K. Larger context windows cost more per request because the model processes more input tokens, so choosing the right window size matters for cost control.

Practical implications

In chat applications, context windows determine how much conversation history the model remembers. A 128K window holds roughly 200 pages of text - enough for long multi-turn conversations. For document analysis, the window determines the maximum document size you can process in one pass. For code generation, it limits how much of a codebase the model can reference. When you hit limits, strategies include summarizing older messages, using RAG (retrieval-augmented generation), or selecting a model with a larger window.

Context windows and LLMWise

LLMWise Auto routing considers context window requirements when selecting a model. If your request needs a large context, Auto will avoid models that cannot handle it. You can also use LLMWise semantic memory to persist key conversation context across sessions without consuming window space - the memory system retrieves relevant past context and injects it as a compact summary rather than replaying entire conversation histories.

How LLMWise implements this

LLMWise gives you five orchestration modes — Chat, Compare, Blend, Judge, and Mesh — with built-in optimization policy, failover routing, and replay lab. Start on the free preview, move to Starter for the Auto lane, and use Teams for premium manual access.

Start free

Evidence snapshot

What Is a Context Window? concept coverage

Knowledge depth for this concept and direct paths to adjacent terms.

Core sections

concept angles covered

Related terms

connected topics linked

FAQs

common confusion resolved

Term type

Glossary

intro + practical implementation

Related concepts

what is token cost llm router llm orchestration

Common questions

What is the largest context window available?

As of early 2026, Google Gemini models offer the largest context windows at over 1 million tokens, capable of processing entire codebases or multiple books in a single request. Claude models offer 200K tokens, and GPT-5.2 supports 128K tokens. Larger windows continue to expand with each model generation.

What happens when I exceed the context window?

The behavior depends on the provider. Most APIs return an error when the total tokens (input + output) would exceed the window. Some providers silently truncate the oldest messages. LLMWise returns a clear error so you can adjust your request. Strategies include summarizing conversation history, reducing system prompt length, or switching to a model with a larger window.

Does a larger context window cost more?

Yes. Token costs are charged per token, so sending more context means higher costs per request. A 100K token input costs roughly 100x more than a 1K token input at the same per-token rate. This is why efficient context management - summarizing history, using RAG, and trimming unnecessary context - is important for production applications.

How does context window relate to memory?

Context window is the model's short-term memory for a single request. LLMWise semantic memory adds long-term memory across sessions by storing and retrieving relevant conversation snippets. This means the model can reference past interactions without consuming context window space, extending effective memory far beyond any single model's window limit.

Start on Auto, move up only when you need it

Free preview, Starter for the Auto lane, Teams for manual GPT, Claude, and Gemini Pro access. Add-on credits kick in after included plan tokens are used.

Start on cheap auto-routed models first, then move up only when your workload truly needs premium manual control.

Starter Auto laneTeams premium manual accessPlan tokens + add-ons

Start free See pricing examples

Get LLM insights in your inbox

Pricing changes, new model launches, and optimization tips. No spam.

AI Gateway: One API for Every LLM Generic LLM Gateways Separate Provider Accounts Cheapest LLM API: Best Value AI Models for Developers How to Compare LLM Models Side by Side Best LLM for Coding and Software Development