Context Windows: Why Your AI 'Forgets' Mid-Conversation

Understanding the limits of LLM memory and how to work around them. Learn what context windows really are and why they matter for building AI systems.

[Figure: the context window. It's called a context window because it's all an LLM can "see."]

You remember your first kiss, your first day of college, your last promotion, your last birthday, the taste of a chocolate chip cookie. Your brain (like all human brains) has a virtually unlimited long-term memory, estimated at 2.5 million GB or more. Your short-term working memory, on the other hand, can hold only about 5-10 items at once, more if they are encoded efficiently (like in a story).

LLMs have "minds" that operate very differently from ours. They have a single, combined short- and long-term memory called a context window (you might have heard of it).

What Is a Context Window, Really?

Large Language Models are fundamentally next-token generators. They predict the next chunk of text (a token) based on everything that came before, using something called an attention mechanism to decide which parts of the conversation matter most.
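To make that concrete, here is a minimal sketch of the generation loop. The `predict_next_token` function is a hypothetical stand-in for the model itself; real models operate on token IDs and sample from a probability distribution, but the shape of the loop, and the hard cut-off at the context limit, is the same.

```python
def generate(predict_next_token, prompt_tokens, max_new_tokens=5, context_limit=4096):
    """Generate text one token at a time, only ever 'seeing' the last context_limit tokens."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        visible = tokens[-context_limit:]          # everything older has fallen out of view
        next_token = predict_next_token(visible)   # hypothetical model call
        if next_token is None:                     # stand-in for an end-of-sequence signal
            break
        tokens.append(next_token)
    return tokens

# Toy stand-in "model" that always predicts the same word:
print(generate(lambda visible: "banana", ["The", "next", "word", "is"], max_new_tokens=3))
# -> ['The', 'next', 'word', 'is', 'banana', 'banana', 'banana']
```

The important detail is the slice: whatever falls outside the last `context_limit` tokens simply never reaches the model.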

Think of it like trying to follow a book while only being allowed to remember the most recent passages. Everything earlier is simply gone: if someone mentioned one of those passages to you, you would have zero recollection of it.

The Numbers Are Deceptive

Modern LLMs have billions of parameters (roughly analogous to simple synapses) that control how the model "thinks." Despite that, they can only attend to tens to hundreds of thousands of words at a time. (For comparison, the Bible has about 800,000 words.)
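Context limits are measured in tokens, not words; in English a token is roughly three-quarters of a word. A quick way to check how much of a document actually fits is to count tokens with a tokenizer library. The sketch below assumes the tiktoken package is installed; the encoding name is the one used by several OpenAI chat models.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI chat models

text = "In the beginning God created the heaven and the earth."
tokens = enc.encode(text)
print(f"{len(text.split())} words -> {len(tokens)} tokens")

# Compare a token count against a model's window to see whether the text fits.
context_limit = 128_000
print("fits in a 128K window:", len(tokens) <= context_limit)
```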

Because they come pre-trained on general information from the internet, the memory problem is far less obvious than it sounds. But it's the equivalent of having lost the ability to form long-term memories beyond the last few hours of conversation: every new conversation starts fresh.

| Model | Context Window | Roughly Equivalent To |
| --- | --- | --- |
| GPT-3.5 | ~4K tokens | A short story |
| GPT-4 | ~8K-128K tokens | A novella to a novel |
| Claude 3 | ~200K tokens | A few novels |

Even the largest context windows are tiny compared to human memory and experience.

What This Means for You

Systems built on top of LLMs try to work around this limitation. They summarize, reorganize, and retrieve information from external databases. But at the end of the day, you can only pack so much into that window.
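The simplest of those workarounds is to trim the conversation so it always fits. The sketch below is illustrative rather than a production recipe: it approximates token counts with a word count (a real system would use the model's tokenizer), always keeps the system prompt, and drops the oldest messages once the budget is exceeded.

```python
def trim_to_window(messages, token_budget=4000):
    """Keep the system prompt plus the most recent messages that fit in the budget."""
    def approx_tokens(msg):
        return int(len(msg["content"].split()) * 1.3)  # rough words-to-tokens ratio

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    kept, used = [], sum(approx_tokens(m) for m in system)
    for msg in reversed(rest):              # walk backwards from the newest message
        cost = approx_tokens(msg)
        if used + cost > token_budget:
            break                           # everything older is dropped ("forgotten")
        kept.append(msg)
        used += cost

    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question " * 50},
    {"role": "assistant", "content": "First answer " * 50},
    {"role": "user", "content": "Latest question?"},
]
print([m["role"] for m in trim_to_window(history, token_budget=50)])
# -> ['system', 'user']  (the older exchange no longer fits and is dropped)
```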

Practical Tips

  1. Repeat important context — Don't assume the AI still "remembers" everything you said, especially in long conversations.

  2. Front-load critical information — Put the most important details at the beginning of your prompt, or repeat them near the end.

  3. Use structured formats — Bullet points and clear sections help the model's attention mechanism focus on what matters.

  4. Break up complex tasks — Instead of one massive conversation, use multiple focused sessions.

  5. Leverage system prompts — If you're building applications, use the system prompt for persistent instructions that should always be "remembered."
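For tip 5, here is a sketch of what "persistent instructions" look like in practice. The role/content message format below is the structure most chat-model APIs share; the company name and the commented-out client call are placeholders, not any specific vendor's SDK.

```python
messages = [
    # The system prompt is sent with every request, so these instructions
    # stay inside the context window no matter how long the chat gets.
    {
        "role": "system",
        "content": (
            "You are a support assistant for Acme Corp. "  # hypothetical instructions
            "Answer in plain English and cite the relevant policy section."
        ),
    },
    {"role": "user", "content": "How do I reset my password?"},
]

# reply = chat_client.complete(model="some-chat-model", messages=messages)  # placeholder call
```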

Need Help Implementing This?

Our team can help you build custom solutions tailored to your business needs.