RAG Document Chunking: How to Split Text for Retrieval-Augmented Generation

By Rui Barreira · Last updated: 13 June 2026

Chunking is the process of splitting long documents into smaller pieces before indexing them in a vector database for Retrieval-Augmented Generation (RAG). The chunk size and overlap you choose directly affect retrieval quality, and therefore the accuracy of your LLM's answers. Use the brevio RAG Chunk Previewer to visualise how your documents will be split before building your pipeline.

Why Chunking Matters

RAG systems work by embedding text chunks into a vector space and retrieving the chunks most semantically similar to a query. If your chunks are too large, the retrieved chunk contains the answer but also a lot of irrelevant content — the model receives noisy context and may generate less accurate answers. If chunks are too small, they may lack the surrounding context needed for the model to generate a coherent response.

The goal is to find the sweet spot where each chunk is semantically coherent (covers one idea or topic), small enough for precise retrieval, and large enough to provide sufficient context for generation.
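To make the retrieval step concrete, here is a minimal Python sketch. It assumes a simple bag-of-words vector as a stand-in for a real embedding model, and the names `embed`, `cosine`, and `retrieve` are illustrative, not part of any library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Refunds are processed within five business days.",
    "Our office is open Monday to Friday.",
    "Shipping is free on orders over fifty dollars.",
]
print(retrieve("How long do refunds take?", chunks))
```

In this toy setup, padding a chunk with unrelated sentences lowers its cosine score for the same query, which is one way to see why oversized chunks can hurt retrieval precision.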

Chunk Size vs Retrieval Quality

Chunk Size	Retrieval Precision	Context Quality	Best For
100–256 chars	Very high	Low (too brief)	Short facts, Q&A pairs
256–512 chars	High	Medium	FAQ sections, product specs
512–1000 chars	Medium	Good	General prose, documentation
1000–2000 chars	Lower	High	Dense technical content, code

What Is Overlap?

Overlap is the number of characters repeated between adjacent chunks. Without overlap, a sentence or concept that falls at a chunk boundary may be split across two chunks. If a query matches only the second half of the sentence, the retrieved chunk starts mid-sentence — missing the first half entirely.

With 100 characters of overlap, the last 100 characters of chunk N are also the first 100 characters of chunk N+1. This ensures that content near boundaries is represented in multiple chunks, increasing the likelihood that a relevant query retrieves complete context.
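The overlap behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of fixed-size character chunking, not necessarily how the Previewer implements it; the function name `chunk_text` and its parameters are assumptions made for this example.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    stride = chunk_size - overlap  # how far the window advances each step
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += stride
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk after the first begins with the last `overlap` characters of the previous one, so the "d" and "g" at the boundaries appear in two chunks each.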

Typical overlap values: 50–200 characters for character-based chunking, or 10–20% of the chunk size. Overlap above 30% starts producing redundant retrievals with minimal benefit.

Character vs Token Chunking

Character-based chunking (what this tool does) is simpler and portable across providers. Token-based chunking is more precise because LLM context windows are measured in tokens, not characters. For most use cases, character chunking with a rough 4 chars/token conversion works well:

500 chars ≈ 125 tokens
1,000 chars ≈ 250 tokens
4,000 chars ≈ 1,000 tokens

Use token-based chunking when you are approaching the model's context window limit and need exact counts. Use character-based chunking when portability and simplicity matter more than precision.
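The conversions above are simple arithmetic, sketched here in Python. The 4 chars/token ratio is the rough average quoted in this guide and varies by tokenizer and language; the helper names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough average from this guide; real tokenizers vary


def estimate_tokens(num_chars, chars_per_token=CHARS_PER_TOKEN):
    """Approximate token count for a given character count."""
    return round(num_chars / chars_per_token)


def max_chunks_in_budget(chunk_size_chars, token_budget,
                         chars_per_token=CHARS_PER_TOKEN):
    """How many chunks of a given character size fit in a token budget."""
    tokens_per_chunk = max(1, estimate_tokens(chunk_size_chars, chars_per_token))
    return token_budget // tokens_per_chunk


for chars in (500, 1000, 4000):
    print(chars, "chars is about", estimate_tokens(chars), "tokens")
```

When the estimate lands close to the model's limit, counting with the model's actual tokenizer is the safer choice.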

Optimal Chunk Sizes by Content Type

Content Type	Recommended Chunk Size	Recommended Overlap	Rationale
FAQ documents	256–512 chars	50 chars	Each Q&A is self-contained
Product descriptions	400–600 chars	100 chars	One product per chunk
Technical documentation	800–1,200 chars	150 chars	Concepts span multiple sentences
Legal/compliance text	1,000–1,500 chars	200 chars	Context required for interpretation
Source code	Function or class level	0 chars	Semantic unit = function
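For the source code row, one possible way to chunk at function or class level is to parse the file rather than count characters. The sketch below assumes the source is Python and uses the standard library `ast` module; the function name `chunk_python_source` is made up for this example, and other languages would need their own parser.

```python
import ast


def chunk_python_source(source):
    """Return (name, code) pairs, one per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text of the node;
            # decorators and module-level statements are not included.
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


example = (
    "import os\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def hello(self):\n"
    "        return 'hi'\n"
)
for name, code in chunk_python_source(example):
    print(name, len(code))
```

No overlap is needed here because each chunk is already a complete semantic unit.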

How to Preview Chunking

1. Open brevio RAG Chunk Previewer.
2. Paste a representative sample from your document. 2,000–5,000 characters is enough to see the chunking pattern.
3. Adjust the chunk size slider and watch how the chunks change. Look for chunks that start mid-sentence or cut off mid-idea.
4. Adjust overlap until boundary content appears in adjacent chunks.
5. Check the chunk count. A 10,000-word document is roughly 50,000–60,000 characters, so at 500 chars/chunk expect about 100–150 chunks depending on overlap, a manageable size for most vector databases.
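The chunk count in the last step can also be estimated before you paste anything. The Python sketch below assumes fixed-size character chunks where each new chunk starts `chunk_size - overlap` characters after the previous one and chunking stops once a chunk reaches the end of the text; `estimate_chunk_count` is a hypothetical helper, and a given tool may round slightly differently.

```python
def estimate_chunk_count(total_chars, chunk_size, overlap=0):
    """Estimate how many fixed-size chunks a document of total_chars produces."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if total_chars <= 0:
        return 0
    if total_chars <= chunk_size:
        return 1
    stride = chunk_size - overlap
    remaining = total_chars - chunk_size
    # Integer ceiling division: number of extra strides needed to cover the rest.
    return 1 + -(-remaining // stride)


print(estimate_chunk_count(60_000, 500, overlap=0))    # 120
print(estimate_chunk_count(60_000, 500, overlap=100))  # 150
```

Raising the overlap shrinks the stride, so the same document produces more chunks and a larger index.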

Alternatives: Semantic and Recursive Chunking

Character-based chunking is the simplest approach. Production RAG pipelines often use more sophisticated methods:

Recursive character splitting (LangChain): Tries to split at paragraph boundaries first, then sentences, then words, then characters. Produces more semantically coherent chunks than pure character splitting.
Semantic chunking: Uses sentence embeddings to detect topic shifts and split there. More compute-intensive but produces the best semantic coherence.
Document structure splitting: Uses Markdown headings, HTML tags, or PDF section markers as natural split points. Best for well-structured documents.
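To illustrate the recursive idea, here is a simplified pure-Python sketch. It is not LangChain's implementation: it has no overlap, it drops the separator wherever it splits, and the function name and default separators are assumptions chosen for this example.

```python
def recursive_split(text, chunk_size, separators=("\n\n", ". ", " ")):
    """Split at the coarsest separator that yields chunks within chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard character splits.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece is still too long: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two has a first sentence. And a second sentence."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

On this sample the splitter first breaks at the blank line, then splits the second paragraph at the sentence boundary because it is longer than 30 characters.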

DevTools Verification

The RAG Chunk Previewer runs entirely in JavaScript in your browser. To verify this, open your browser's DevTools, switch to the Network tab, and paste text or adjust the sliders: no POST requests are made. Your document content never leaves your browser.

Frequently Asked Questions

What chunk size should I use for RAG?
A good starting point is 500–800 characters (roughly 100–200 words) with 10–15% overlap. Shorter documents (FAQs, product specs) work well at 256–512 chars. Dense technical documents benefit from larger chunks (800–1,500 chars). Always evaluate retrieval quality on your specific dataset; there is no universally optimal size.

What does overlap do in chunking?
Overlap ensures that sentences or concepts that fall at a chunk boundary are included in both adjacent chunks. Without overlap, a retrieval query matching content that spans two chunks may retrieve only one chunk, missing half the context. An overlap of 50–100 characters covers most sentence boundaries.

Should I chunk by characters or tokens?
Token-based chunking is more precise because LLMs have token limits, not character limits. However, characters are simpler and more portable across models. For most use cases, character-based chunking with a ~4 chars/token conversion factor works well. Use token-based chunking when you are close to context limits.

Does LangChain use the same chunking as this tool?
LangChain's RecursiveCharacterTextSplitter uses a hierarchy of separators (paragraph, sentence, word, character) to try to split at semantic boundaries before falling back to character splits. This tool uses pure character-based splitting for simplicity and to demonstrate the core concept. For production RAG, LangChain's recursive splitter usually produces better semantic chunks.