21 Feb 2026

"The context window is an LLM's 'working memory,' defining the maximum amount of input (prompt + conversation history) it can process and 'remember' at once." - Context window -

“The context window is an LLM’s ‘working memory,’ defining the maximum amount of input (prompt + conversation history) it can process and ‘remember’ at once.” – Context window

What is a Context Window?

The context window is an LLM’s short-term working memory: the maximum amount of information, measured in tokens, that it can process in a single interaction. This includes the input prompt, conversation history, system instructions, uploaded files, and even the output it generates.

A token is approximately three-quarters of an English word, or about four characters. A ‘128k-token’ model can therefore handle roughly 96,000 words, equivalent to a 300-page book. Crucially, that budget covers every element of the exchange: in a conversation, tokens accumulate (and are billed) each turn until the history is trimmed or summarised.
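A rough conversion makes the arithmetic concrete. The sketch below is a minimal illustration using the common rule of thumb of roughly four characters (or three-quarters of a word) per English token; the function names are illustrative, and real tokeniser counts vary by model and text.

    # Back-of-the-envelope token arithmetic using the ~4 characters
    # (~0.75 words) per token rule of thumb; real tokenisers vary.

    def estimate_tokens(text: str) -> int:
        """Estimate token count from character length (~4 chars per token)."""
        return max(1, len(text) // 4)

    def words_that_fit(window_tokens: int, words_per_token: float = 0.75) -> int:
        """Estimate how many English words fit in a given context window."""
        return int(window_tokens * words_per_token)

    print(words_that_fit(128_000))  # ~96,000 words, roughly a 300-page book
    print(estimate_tokens("The context window is an LLM's working memory."))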

Key Characteristics and Limitations

  • Total Scope: Encompasses the prompt, history, instructions, and generated response, distinct from the model’s vast pre-training data.
  • Performance Degradation: As the window fills, LLMs may forget earlier details, repeat rejected ideas, or lose coherence, akin to human short-term memory limits (see the trimming sketch after this list).
  • Growth Trends: Early models handled only a few thousand tokens; by mid-2023, 100,000-token windows became common, and models like Google’s Gemini now handle two million tokens (over 3,000 pages).
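To keep a long conversation inside the window, a common pattern is to drop (or summarise) the oldest turns once a token budget would be exceeded. The sketch below is a minimal illustration, not any vendor’s API: Message, trim_history, and the 4-characters-per-token estimate are all assumptions.

    from dataclasses import dataclass

    @dataclass
    class Message:
        role: str      # "system", "user", or "assistant"
        content: str

    def estimate_tokens(text: str) -> int:
        # Rough heuristic: ~4 characters per token.
        return max(1, len(text) // 4)

    def trim_history(history: list[Message], budget: int) -> list[Message]:
        """Keep any system messages, then retain the newest turns that
        still fit within the token budget, dropping the oldest first."""
        system = [m for m in history if m.role == "system"]
        turns = [m for m in history if m.role != "system"]
        used = sum(estimate_tokens(m.content) for m in system)
        kept: list[Message] = []
        for m in reversed(turns):  # newest-first, so recent context survives
            cost = estimate_tokens(m.content)
            if used + cost > budget:
                break
            kept.append(m)
            used += cost
        return system + list(reversed(kept))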

Implications for AI Applications

Larger context windows enable complex tasks like processing lengthy documents, debugging codebases, or analysing product reviews. However, models often attend most reliably to the beginning and end of a prompt (the so-called ‘lost in the middle’ effect), though recent advancements improve full-window coherence via expanded training data, optimised architectures, and scaled hardware.

When limits are hit, strategies include chunking documents, summarising history, or using external memory such as scratchpads, which persist notes outside the window for agents to retrieve.
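As a concrete sketch of the chunking strategy, the function below splits a long document into overlapping, token-bounded pieces that can be processed one at a time; chunk_text and its overlap parameter are illustrative choices rather than a standard API.

    def chunk_text(text: str, max_tokens: int = 2000, overlap_tokens: int = 200) -> list[str]:
        """Split text into overlapping chunks that each fit a token budget,
        using the ~4 characters per token heuristic. The overlap preserves
        context across chunk boundaries."""
        max_chars = max_tokens * 4
        overlap_chars = overlap_tokens * 4
        chunks, start = [], 0
        while start < len(text):
            end = min(start + max_chars, len(text))
            chunks.append(text[start:end])
            if end == len(text):
                break
            start = end - overlap_chars  # step back to create the overlap
        return chunks

Each chunk can then be summarised independently and the summaries combined in a final pass, a simple map-reduce pattern that trades one oversized call for several smaller ones.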

Best Related Strategy Theorist: Andrej Karpathy

Andrej Karpathy is the foremost theorist linking context windows to strategic AI engineering, famously likening LLMs to operating systems in which the model acts as the CPU and the context window as RAM: a limited working memory requiring careful curation.

Born in 1986 in Slovakia and raised in Canada, Karpathy completed his undergraduate studies at the University of Toronto before earning a PhD at Stanford University under Fei-Fei Li, working at the intersection of computer vision and natural language. His widely read work on recurrent neural networks (RNNs) for sequence modelling helped establish the memory mechanisms behind early language models. A founding member of OpenAI (2015-2017), he focused on deep learning and reinforcement learning research; at Tesla (2017-2022), he led Autopilot vision, advancing neural networks for autonomous driving.

Now founder of Eureka Labs, an AI-education venture, after a second stint at OpenAI, Karpathy popularised the context window analogy in lectures and blog posts, emphasising ‘context engineering’: optimising a model’s inputs much as an operating system manages RAM. His insights guide agent design, advocating scratchpads and external memory to extend effective capacity, and directly inform frameworks like LangChain and Anthropic’s published guidance on context engineering.

Karpathy’s biography embodies the shift from vision to language AI, making him uniquely positioned to strategise around memory constraints in production-scale systems.

 

References

1. https://forum.cursor.com/t/context-window-must-know-if-you-dont-know/86786

2. https://www.producttalk.org/glossary-ai-context-window/

3. https://platform.claude.com/docs/en/build-with-claude/context-windows

4. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-a-context-window

5. https://www.blog.langchain.com/context-engineering-for-agents/

6. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

 
