It's called a context window because it's all an LLM can "see"
You remember your first kiss, your first day of college, your last promotion, your last birthday, the taste of a chocolate chip cookie. Your brain (like every human brain) has virtually unlimited long-term memory, estimated at 2.5 million GB or more. Yet you can hold only a few random numbers in mind at once: short-term working memory tops out at roughly 5-10 items, more if they're encoded efficiently (say, as part of a story).
LLMs have "minds" that operate very differently from ours. They have a combined short- and long-term memory called a context window (you may have heard of it).
What Is a Context Window, Really?
Large Language Models are fundamentally next-token generators. They predict the next small chunk of text (a token) based on everything that came before it, using something called an attention mechanism to decide which parts of the conversation matter most.
Think of it like trying to follow a book while only being able to hold a certain number of passages in mind, with no long-term memory to fall back on: if someone brought up an earlier passage outside that span, you would have zero recollection of it.
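To make the attention mechanism a little more concrete, here is a minimal sketch of scaled dot-product attention in Python with NumPy. The token embeddings and dimensions are made up for illustration; real models use learned projections, many attention heads, and many layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores every key,
    # and those scores decide how much of each value gets blended in.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # shape: (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 4 "tokens" with 8-dimensional embeddings, random for illustration.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# In a real transformer, Q, K, and V come from learned linear projections of the
# token embeddings; here we reuse the embeddings directly to keep the sketch short.
output, weights = attention(tokens, tokens, tokens)
print(weights.round(2))  # each row shows how much one token "attends" to the others
```

The point that matters for context windows: every token scores every other token, which is part of why attention gets more expensive as the window grows.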
The Numbers Are Deceptive
Modern LLMs have billions of parameters (roughly analogous to simple synapses) that control how the model thinks. Despite that, they can only remember tens to hundreds of thousands of words at a time. (For comparison, the Bible has roughly 800,000 words.)
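Tokens, not words, are what actually fill the window: a token is usually a word fragment, and in English one token works out to roughly three-quarters of a word. If you want to see how much of a context window your own text consumes, a tokenizer library can count it. The sketch below assumes the tiktoken package and its cl100k_base encoding; other models and vendors use different tokenizers.

```python
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models;
# other models and providers tokenize text differently.
enc = tiktoken.get_encoding("cl100k_base")

text = "It's called a context window because it's all an LLM can see."
tokens = enc.encode(text)

print(len(text.split()), "words")   # word count, for comparison
print(len(tokens), "tokens")        # what actually counts against the window
```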
Because they come pre-trained with general information from the internet, the memory problem is far less obvious. It's the equivalent of having lost any ability to form long-term memories beyond the last few hours of conversation. Every new conversation starts fresh.
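This is also why chat APIs are stateless: the model doesn't "remember" earlier messages unless the application sends them again on every turn. Here's a sketch of that pattern using the OpenAI chat format; the model name is just an example, and the same idea applies to other providers.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# The entire conversation lives in this list. The model only "knows" what is
# inside it on a given call; drop a message and that memory is simply gone.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Dana and I prefer short answers."},
]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # Every call re-sends the full history, because the API keeps no state.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What's my name?"))  # works only because the earlier message is still in history
```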
| Model | Context Window | Roughly Equivalent To |
|---|---|---|
| GPT-3.5 | ~4K tokens | A short story |
| GPT-4 | ~8K-128K tokens | A novella to a novel |
| Claude 3 | ~200K tokens | A few novels |
Even the largest context windows are tiny compared to human memory and experience.
What This Means for You
Systems built on top of LLMs (including the chatbots you use) try to work around this limitation. They summarize, reorganize, and retrieve information from external databases. But at the end of the day, you can only pack so much into that window.
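As a rough illustration of one such workaround, here's a sketch of simple history trimming: keep the system prompt, then keep as many recent messages as fit a token budget, and let everything older fall away (or get summarized into an external store). The budget is arbitrary, and the token counting reuses the assumed tiktoken approach from earlier.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(message: dict) -> int:
    # Rough per-message count; real chat APIs add a few tokens of overhead per message.
    return len(enc.encode(message["content"]))

def trim_history(history: list[dict], budget: int = 3000) -> list[dict]:
    """Keep the system prompt plus as many recent messages as fit the budget.

    Assumes the first message in `history` is the system prompt.
    """
    system, rest = history[:1], history[1:]
    kept, used = [], count_tokens(system[0])
    # Walk backwards from the newest message so recent context survives.
    for message in reversed(rest):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # anything older than this point is forgotten (or summarized elsewhere)
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```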
Practical Tips
- Repeat important context: Don't assume the AI still "remembers" everything you said, especially in long conversations.
- Front-load critical information: Put the most important details at the beginning of your prompt, or repeat them near the end.
- Use structured formats: Bullet points and clear sections help the model's attention mechanism focus on what matters.
- Break up complex tasks: Instead of one massive conversation, use multiple focused sessions.
- Leverage system prompts: If you're building applications, use the system prompt for persistent instructions that should always be "remembered" (see the sketch after this list).
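Putting a few of these tips together, here's a sketch of how an application might assemble a prompt: persistent instructions live in the system prompt, critical facts are front-loaded in a structured block, and the most important instruction is repeated at the end. The company name, field names, and wording are all hypothetical.

```python
SYSTEM_PROMPT = """You are a support assistant for Acme Billing (a hypothetical company).
Always answer in plain language and say when you don't have enough information."""

def build_user_prompt(key_facts: dict, question: str) -> str:
    # Front-load the critical details in a structured block, state the task,
    # then repeat the single most important instruction at the end.
    facts = "\n".join(f"- {name}: {value}" for name, value in key_facts.items())
    return (
        "Key facts (do not ignore):\n"
        f"{facts}\n\n"
        f"Question:\n{question}\n\n"
        "Reminder: base your answer only on the key facts above."
    )

prompt = build_user_prompt(
    {"plan": "Pro", "billing cycle": "monthly", "account status": "past due"},
    "Why was my card charged twice this month?",
)
print(prompt)  # send SYSTEM_PROMPT as the system message and `prompt` as the user message
```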