A context window is the maximum number of tokens an LLM can process in a single request, including both input and output.
Free preview, Starter for the Auto lane, Teams for manual GPT, Claude, and Gemini Pro access. Add-on credits kick in after included plan tokens are used.
Start on cheap auto-routed models first, then move up only when your workload truly needs premium manual control.
A context window (also called context length or context limit) defines the maximum amount of text a large language model can consider in a single API call. It includes everything: your system prompt, conversation history, user message, and the model's generated response. When a request exceeds the context window, one of two things happens: the API rejects it with an error, or the application or provider drops the oldest content so that the rest fits. Context windows range from 4K tokens for older models to 1M+ tokens for the latest frontier models.
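The accounting described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual tokenizer: the 4-characters-per-token heuristic and the names estimate_tokens and fits_in_window are assumptions made for the example. For real limits, count tokens with the tokenizer that matches your model.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token (a heuristic only)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def fits_in_window(system_prompt, history, user_message,
                   max_output_tokens, context_window):
    """Return (fits, total_tokens) for one request.

    Everything counts against the window: system prompt, prior messages,
    the new user message, and the tokens reserved for the response.
    """
    input_tokens = (
        estimate_tokens(system_prompt)
        + sum(estimate_tokens(m) for m in history)
        + estimate_tokens(user_message)
    )
    total = input_tokens + max_output_tokens
    return total <= context_window, total


if __name__ == "__main__":
    ok, total = fits_in_window(
        system_prompt="You are a helpful assistant.",
        history=["Hi there.", "Hello! How can I help?"],
        user_message="Summarize our conversation so far.",
        max_output_tokens=1000,
        context_window=4096,
    )
    print(ok, total)
```

Reserving the output budget up front matters: a prompt that fits exactly leaves no room for the model to respond.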
Gemini models lead with 1M+ token context windows, suitable for processing entire codebases or book-length documents. Claude Sonnet 4.5 and Claude Opus 4.6 offer 200K token windows - enough for most production use cases. GPT-5.2 supports 128K tokens. DeepSeek V3 also supports 128K. Filling a larger window costs more per request because billing scales with the number of tokens processed, so choosing the right window size, and sending only the context a task needs, matters for cost control.
In chat applications, context windows determine how much conversation history the model remembers. A 128K window holds roughly 200 pages of text (assuming about 0.75 English words per token and 500 words per page) - enough for long multi-turn conversations. For document analysis, the window determines the maximum document size you can process in one pass. For code generation, it limits how much of a codebase the model can reference. When you hit limits, strategies include summarizing older messages, using RAG (retrieval-augmented generation), or selecting a model with a larger window.
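The simplest way to stay under the limit in a chat application is to drop the oldest messages, which can be sketched as below. This is a hedged example of plain truncation, not summarization or RAG; the names approx_tokens and trim_history and the 4-characters-per-token heuristic are assumptions made for illustration.

```python
def approx_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token (a heuristic only)."""
    return len(text) // 4


def trim_history(messages, budget_tokens, count_tokens=approx_tokens):
    """Keep the most recent messages whose combined size fits the budget.

    Walks backward from the newest message and stops at the first one that
    would overflow, so the kept history is always a contiguous recent run.
    """
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
    print(len(trim_history(history, budget_tokens=25)))
```

In practice the budget would be the context window minus the system prompt, the new user message, and the tokens reserved for the response.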
LLMWise Auto routing considers context window requirements when selecting a model. If your request needs a large context, Auto will avoid models that cannot handle it. You can also use LLMWise semantic memory to persist key conversation context across sessions without consuming window space - the memory system retrieves relevant past context and injects it as a compact summary rather than replaying entire conversation histories.
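To make the idea of window-aware routing concrete, here is a minimal sketch of filtering candidate models by required context size. It is not LLMWise's implementation, which this page does not describe; the function eligible_models and the dictionary keys are hypothetical labels, and the window sizes are simply the figures quoted above.

```python
# Window sizes as quoted in this article. The keys are illustrative labels,
# not real API model identifiers.
CONTEXT_WINDOWS = {
    "gemini": 1_000_000,
    "claude-sonnet-4.5": 200_000,
    "gpt-5.2": 128_000,
    "deepseek-v3": 128_000,
}


def eligible_models(required_tokens, windows=CONTEXT_WINDOWS):
    """Return models whose window can hold the request, smallest window first."""
    candidates = [name for name, size in windows.items() if size >= required_tokens]
    # Sort by window size, then name, so the result is deterministic.
    return sorted(candidates, key=lambda name: (windows[name], name))


if __name__ == "__main__":
    print(eligible_models(150_000))
```

A real router would weigh price, latency, and quality among the eligible models; this sketch covers only the step of excluding models that cannot hold the request.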
LLMWise gives you five orchestration modes — Chat, Compare, Blend, Judge, and Mesh — with built-in optimization policy, failover routing, and replay lab. Start on the free preview, move to Starter for the Auto lane, and use Teams for premium manual access.