What Context Assembly Is
Before generating a response, an AI system assembles everything it can reference into a single context window.
This window has a fixed size (measured in tokens). Everything inside the window can influence the response. Everything outside does not exist to the model.
The Context Window
Modern LLMs have context windows ranging from 128K to 1M+ tokens.
| Model | Approximate Context Window |
|---|---|
| GPT 5.2 | 200K tokens |
| Claude 4.5 | 200K tokens (1M in beta) |
| Gemini 3.0 | 1M+ tokens |
| Llama 4 | 128K tokens |
Larger windows allow more information, but attention quality degrades with length.
How Context Gets Filled
For a typical AI assistant query, context assembly includes:
- System prompt (instructions, persona)
- Conversation history (prior messages)
- Retrieved documents (from RAG pipeline)
- User query (the current question)
The order matters. Models tend to attend most reliably to content near the beginning and end of the window; material buried in the middle is more likely to be overlooked (the "lost in the middle" effect).
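The assembly step is, at its core, ordered concatenation under a token budget. A minimal sketch (the priority order, the drop policy, and the 4-characters-per-token heuristic are illustrative assumptions, not any vendor's actual format):

```python
def assemble_context(system_prompt, history, retrieved_docs, user_query,
                     max_tokens=8000):
    """Concatenate context sources in priority order, dropping the
    oldest history turns and lowest-ranked documents when the budget runs out."""
    def est_tokens(text):
        return len(text) // 4  # rough heuristic: ~4 characters per token

    budget = max_tokens - est_tokens(system_prompt) - est_tokens(user_query)
    parts = [system_prompt]

    # Retrieved docs are assumed ranked best-first; keep what fits.
    for doc in retrieved_docs:
        cost = est_tokens(doc)
        if cost <= budget:
            parts.append(doc)
            budget -= cost

    # Keep the most recent conversation turns that still fit.
    kept = []
    for turn in reversed(history):
        cost = est_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    parts.extend(reversed(kept))

    parts.append(user_query)
    return "\n\n".join(parts)
```

Note that the system prompt and user query are reserved first; retrieved documents and history compete for whatever budget remains, which is exactly the competition the rest of this section describes.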
How AI Finds Information to Cite
When you ask ChatGPT or Perplexity a question, it doesn't just guess from memory.
Modern AI systems search for relevant content first, then use what they find to answer you. This process is called Retrieval-Augmented Generation (RAG).
The simple version:
- You ask a question
- The system searches a database of content
- It pulls the best matches into the AI's working memory
- The AI answers based on what it retrieved
Think of it like a research assistant. You ask a question. They search the filing cabinet, pull relevant documents, and answer based on what they found.
The technical version:
- Embed the user query into a vector
- Search a vector database using similarity metrics
- Retrieve top-k matching document chunks
- Inject retrieved content into the context window
- Generate a response grounded in retrieved context
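The five steps above can be sketched end to end. The `embed` function here is a deliberately crude stand-in (a hashed bag of words) for a real embedding model such as a sentence transformer; the surrounding pipeline structure is the part that mirrors production systems:

```python
import re
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy stand-in for a learned embedding model: hash each word
    into a fixed-size vector, then L2-normalize it."""
    vec = np.zeros(dim)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query, chunks, k=2):
    """Return the top-k chunks by cosine similarity to the query.
    On unit vectors, cosine similarity is just a dot product."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: float(q @ embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Gong records and analyzes B2B sales calls.",
    "Our pricing page lists three subscription tiers.",
    "The company was founded in 2015.",
]
query = "a tool that analyzes sales calls"
context = "\n".join(retrieve(query, chunks))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would now be sent to the LLM for the generation step.
```

In production, the chunk embeddings would be precomputed and stored in a vector database rather than recomputed per query, but the ranking logic is the same.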
The critical point remains the same:
If your content isn't indexed, it can't be retrieved. If it's not retrieved, it's not in the context. If it's not in the context, you don't exist.
What Gets Retrieved
Retrieval systems use semantic similarity.
They compare:
- Query embedding
- Document chunk embeddings
Top matches (by cosine similarity) are retrieved.
Documents compete for retrieval slots.
If your content's embedding is:
- Too generic → matches everything weakly
- Too specific → matches only narrow queries
- Ambiguous → matches unpredictably
Then you lose to competitors with cleaner signals.
Chunking and Boundaries
Long documents are split into chunks before embedding.
Chunking decisions affect retrieval:
| Chunk Strategy | Effect |
|---|---|
| Fixed-size (512 tokens) | May split mid-sentence |
| Sentence-based | Preserves semantic units |
| Paragraph-based | Captures local context |
| Section-based | Maintains document structure |
Bad chunking can fragment your claims.
A claim like "Gong serves 4,900+ customers and is SOC 2 certified" might be split across two chunks, weakening the retrievability of both facts.
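The fragmentation risk is easy to reproduce. A minimal sketch comparing fixed-size character chunking with sentence-based chunking (the 40-character limit is artificially small to force a mid-claim split; real systems split on token counts, but the failure mode is the same):

```python
import re

def fixed_size_chunks(text, size=40):
    """Naive fixed-size chunking: cut every `size` characters,
    with no regard for sentence or claim boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text):
    """Sentence-based chunking: split on sentence-ending punctuation,
    keeping each claim intact."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

text = ("Gong serves 4,900+ customers and is SOC 2 certified. "
        "The platform analyzes B2B sales calls.")

for chunk in fixed_size_chunks(text):
    print(repr(chunk))   # the certification claim is severed from "Gong"
for chunk in sentence_chunks(text):
    print(repr(chunk))   # each chunk is a self-contained claim
```

With fixed-size chunking, no single chunk contains both the entity name and the certification, so neither fragment can be retrieved as a complete claim about Gong.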
Why Retrievability Determines Visibility
The retrieval phase is binary for each document:
- Retrieved → enters context → can influence response
- Not retrieved → invisible → cannot be cited
There is no partial retrieval. There is no "almost made it."
Retrieval is the gatekeeping layer.
What Makes Content Retrievable
High retrievability:
- Self-contained claims (don't require external context)
- Clear entity identification in each chunk
- Explicit statement of what, who, why
- Consistent terminology matching likely queries
Low retrievability:
- Claims spread across multiple sections
- Entity names only in headers, not body text
- Assumed context never stated
- Jargon that doesn't match user vocabulary
Practical Example
Low retrievability:
"Our platform helps you do more with less. With AI-powered insights, teams can work smarter and achieve better outcomes."
This matches almost any productivity query weakly, and no specific query strongly.
High retrievability:
"Gong's revenue intelligence platform records and analyzes B2B sales calls. It automatically identifies deal risks, successful talk patterns, and competitor mentions across your pipeline."
This will retrieve for queries about:
- Revenue intelligence platforms
- Sales call recording tools
- Deal risk analysis
- Conversation analytics
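The difference is measurable even with a toy model. Here bag-of-words cosine similarity stands in for real dense embeddings (which capture far more than word overlap, but reward specificity in the same direction):

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

query = bow("platform that analyzes B2B sales calls")
generic = bow("Our platform helps you do more with less. With AI-powered "
              "insights, teams can work smarter and achieve better outcomes.")
specific = bow("Gong's revenue intelligence platform records and analyzes "
               "B2B sales calls.")

print(cosine(query, generic))    # weak match
print(cosine(query, specific))   # much stronger match
```

The generic copy shares only "platform" with the query; the specific copy shares the query's distinctive terms, which is what wins the retrieval slot.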
Key Takeaway
Context assembly is zero-sum.
Every token in the window is a slot. Your content competes for those slots against everything else the system could retrieve.
If you're not retrieved, you're not considered. Retrievability is the first gate to AI visibility.