A context window is the maximum number of tokens an LLM can process in a single request, including both input and output.
You only pay credits per request. No monthly subscription. Paid credits never expire.
Replace multiple AI subscriptions with one wallet that includes routing, failover, and optimization.
A context window (also called context length or context limit) defines the maximum amount of text a large language model can consider in a single API call. It includes everything: your system prompt, conversation history, user message, and the model's generated response. When you exceed the context window, the model either truncates the oldest content or returns an error. Context windows range from 4K tokens for older models to 1M+ tokens for the latest frontier models.
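Since the window must hold the prompt, the history, the user turn, and the generated reply all at once, it helps to budget tokens before sending a request. A minimal sketch, using the common rough heuristic of ~4 characters per token for English text (a real tokenizer such as tiktoken gives exact counts; the function names and the 128K default here are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough approximation: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(system_prompt: str, history: list[str], user_message: str,
                   max_output_tokens: int, context_window: int = 128_000) -> bool:
    """The window must hold the prompt, history, user turn, AND the reply."""
    used = (estimate_tokens(system_prompt)
            + sum(estimate_tokens(m) for m in history)
            + estimate_tokens(user_message)
            + max_output_tokens)
    return used <= context_window

print(fits_in_window("You are helpful.", ["Hi!", "Hello!"],
                     "Summarize this.", max_output_tokens=1024))  # True
```

Note that `max_output_tokens` is reserved up front: a request whose input alone fits can still fail if there is no room left for the response.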
Gemini models lead with 1M+ token context windows, suitable for processing entire codebases or book-length documents. Claude Sonnet 4.5 and Claude Opus 4.6 offer 200K token windows — enough for most production use cases. GPT-5.2 supports 128K tokens. DeepSeek V3 also supports 128K. A larger window does not cost more by itself, but filling it does: you are billed for every input token the model processes, so choosing the right window size — and trimming what you send into it — matters for cost control.
In chat applications, context windows determine how much conversation history the model remembers. A 128K window holds roughly 200 pages of text — enough for long multi-turn conversations. For document analysis, the window determines the maximum document size you can process in one pass. For code generation, it limits how much of a codebase the model can reference. When you hit limits, strategies include summarizing older messages, using RAG (retrieval-augmented generation), or selecting a model with a larger window.
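The "summarize or drop older messages" strategy above can be sketched as a trimming pass that keeps the newest turns within a token budget and collapses everything older into a compact placeholder. This is a minimal illustration (the `estimate_tokens` heuristic of ~4 characters per token and the placeholder string are assumptions; production systems would use the model's real tokenizer and an actual LLM-generated summary):

```python
def estimate_tokens(text: str) -> int:
    """Rough approximation: ~4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget; collapse the rest."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = estimate_tokens(msg)
        if used + cost > budget_tokens:
            # Older turns no longer fit: stand in for them with a summary slot.
            kept.append("[summary of earlier conversation]")
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

RAG takes the complementary approach: instead of trimming, it stores everything externally and retrieves only the passages relevant to the current turn.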
LLMWise Auto routing considers context window requirements when selecting a model. If your request needs a large context, Auto will avoid models that cannot handle it. You can also use LLMWise semantic memory to persist key conversation context across sessions without consuming window space — the memory system retrieves relevant past context and injects it as a compact summary rather than replaying entire conversation histories.
LLMWise gives you five orchestration modes — Chat, Compare, Blend, Judge, and Mesh — with built-in optimization policy, failover routing, and replay lab. No monthly subscription is required and paid credits do not expire.