Token Counting for LLMs: What Counts, Why It Matters, and How to Check

By Rui Barreira · Last updated: 13 June 2026

You can count tokens in any text instantly using brevio Token Counter — paste your text and see an approximate token count alongside context window usage for Claude, GPT-4o, and Gemini. All processing runs in your browser with no data sent to any server.

Token counting matters for two reasons: cost and context limits. Every token in a request is billed by the provider. Every token also occupies space in the model's context window — exceed it and either the request fails or the oldest content gets truncated. Understanding token counts before you build prevents surprises in both your invoice and your application behaviour.

What Is a Token?

A token is the basic unit that LLMs process. Most current tokenisers build their vocabulary with a byte pair encoding (BPE) algorithm, or a close variant, run over a large text corpus; that vocabulary is then used to split any input text into tokens. Common short words become single tokens (“the”, “in”, “is”). Less common or longer words split into multiple tokens (for example, “extraordinary” may become [“extra”, “ordinary”], though the exact split depends on the tokeniser). Punctuation, whitespace, and special characters have their own token representations.
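
To make the idea concrete, here is a toy sketch of how BPE learns its merges: it repeatedly finds the most frequent adjacent pair of symbols in a corpus and fuses that pair into a new symbol. This is an illustration only, not any provider's production tokeniser (real ones work on bytes, train on far larger corpora, and learn tens of thousands of merges). The function names and the sample corpus are made up for the example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none.

    `words` maps a tuple of symbols (one word) to how often that word occurs.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated toy corpus."""
    # Start from single characters: "low" -> ("l", "o", "w").
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, vocab = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(vocab)
```

After two merges on this tiny corpus, the frequent word “low” has become a single symbol, while the rarer “lower” and “lowest” still split into several pieces. That mirrors the common-versus-rare behaviour described above.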

The practical rule of thumb for English prose: 1 token ≈ 0.75 words, or about 4 characters. So 1,000 words ≈ 1,333 tokens, and a standard 500-word blog post ≈ 667 tokens. These ratios hold for typical English content — code, non-English text, and dense technical content may tokenise differently.
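
The rule of thumb translates directly into a few lines of Python. This is a rough estimator for English prose only, using the 0.75 words-per-token and 4 characters-per-token ratios from the paragraph above; the function names are illustrative.

```python
def tokens_from_words(word_count: int) -> int:
    """1 token is roughly 0.75 words, so divide the word count by 0.75."""
    return round(word_count / 0.75)


def tokens_from_chars(char_count: int) -> int:
    """1 token is roughly 4 characters of English text."""
    return round(char_count / 4)


def estimate_tokens(text: str) -> int:
    """Conservative estimate: the larger of the word-based and character-based figures."""
    return max(tokens_from_words(len(text.split())), tokens_from_chars(len(text)))


print(tokens_from_words(1000))  # 1333
print(tokens_from_words(500))   # 667
```

Taking the larger of the two figures is a deliberate choice here: when budgeting, overestimating slightly is usually safer than underestimating.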

Why Token Count Matters for Pricing

All major LLM providers bill per token, with separate rates for input (prompt) and output (completion). The distinction matters because output tokens cost significantly more: generating text requires more compute than reading it. GPT-4o charges $2.50 per 1M input tokens but $10.00 per 1M output tokens. Claude Sonnet 4.6 charges $3.00 per 1M input tokens and $15.00 per 1M output tokens.

A 1,000-word system prompt sent on every request adds ≈1,333 input tokens to every API call. At $3.00 per 1M input tokens on Claude Sonnet, that is about $0.004 per request: trivial for 100 requests, but about $400/day at 100,000 daily requests. Prompt caching can cut the cost of that repeated prompt substantially, by as much as 80–90% on some providers.
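
The arithmetic above is easy to reproduce. The sketch below assumes a flat per-token input price and models caching as a simple percentage discount on the prompt; real caching schemes bill cache writes and reads differently, so treat `cache_discount` as a simplification introduced for this example.

```python
def daily_cost_usd(tokens_per_request, requests_per_day, usd_per_million, cache_discount=0.0):
    """Daily input cost of a prompt that is resent on every request.

    cache_discount is the fraction saved by prompt caching (0.9 means 90% off).
    """
    full_price = tokens_per_request * requests_per_day * usd_per_million / 1_000_000
    return full_price * (1.0 - cache_discount)


# A 1,333-token system prompt at $3.00 per 1M input tokens.
print(round(daily_cost_usd(1_333, 1, 3.00), 4))                            # 0.004
print(round(daily_cost_usd(1_333, 100_000, 3.00), 2))                      # 399.9
print(round(daily_cost_usd(1_333, 100_000, 3.00, cache_discount=0.9), 2))  # 39.99
```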

Context Window Comparison

Model | Provider | Context window | Approx. word equivalent
Claude Opus 4.8 | Anthropic | 200,000 tokens | ~150,000 words (~500 pages)
Claude Sonnet 4.6 | Anthropic | 200,000 tokens | ~150,000 words (~500 pages)
Claude Haiku 4.5 | Anthropic | 200,000 tokens | ~150,000 words (~500 pages)
GPT-4o | OpenAI | 128,000 tokens | ~96,000 words (~320 pages)
GPT-4o mini | OpenAI | 128,000 tokens | ~96,000 words (~320 pages)
Gemini 2.5 Pro | Google | 1,000,000 tokens | ~750,000 words (~2,500 pages)
Gemini 2.5 Flash | Google | 1,000,000 tokens | ~750,000 words (~2,500 pages)
Llama 3.3 70B (Groq) | Groq | 128,000 tokens | ~96,000 words (~320 pages)

How to Verify No Data Is Transmitted

  1. Open DevTools with F12 (Windows/Linux) or ⌘⌥I (Mac).
  2. Go to the Network tab and filter to Fetch/XHR.
  3. Paste text into the brevio Token Counter.
  4. Observe: no network requests fire. Token counting uses a JavaScript regex-based algorithm that runs entirely in your browser. Your text never leaves your device.
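
The tool's actual algorithm is not reproduced here. As a rough idea of what a regex-based heuristic can look like, the sketch below (written in Python rather than JavaScript) splits text into word-like runs and single punctuation marks, then charges one token per six characters of each word. The pattern, the function name, and the six-character figure are all assumptions chosen for illustration; a heuristic like this needs no network access, which is why it can run entirely on the client.

```python
import math
import re

# A word-like run of ASCII letters, digits, or apostrophes, or else any
# single non-whitespace character (punctuation, symbols, non-ASCII characters).
_PIECE = re.compile(r"(?P<word>[A-Za-z0-9']+)|(?P<other>\S)")

# Assumed average: short words cost one token, longer words cost more.
CHARS_PER_WORD_TOKEN = 6


def heuristic_token_count(text: str) -> int:
    """Approximate token count using only a regular expression and arithmetic."""
    total = 0
    for match in _PIECE.finditer(text):
        word = match.group("word")
        if word:
            total += math.ceil(len(word) / CHARS_PER_WORD_TOKEN)
        else:
            total += 1  # each punctuation mark or symbol counts as one token
    return total


print(heuristic_token_count("Hello, world!"))  # 4
```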

Strategies for Staying Within Context Limits

  • Summarise conversation history. Long multi-turn conversations accumulate tokens fast. After 5–10 turns, summarise the conversation so far into a 200-token summary rather than sending the full transcript.
  • Chunk large documents. For document QA, split documents into 500–1,000 token chunks and retrieve only the relevant chunks using semantic search (RAG pattern). Send 3–5 relevant chunks instead of the entire document.
  • Use system prompts efficiently. Keep system prompts short and specific. Every word in the system prompt is billed on every API call.
  • Structured output reduces tokens. Asking for JSON output instead of prose descriptions can reduce output tokens by 30–50% for classification or extraction tasks.
  • Choose the right model. For tasks where context window matters more than cost, the 1M token window of Gemini 2.5 Pro and Gemini 2.5 Flash is the largest in the comparison above. For most applications, Claude's 200K or GPT-4o's 128K is sufficient.
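
Two of the strategies above, chunking documents and summarising older history, can be sketched as follows. Both functions rely on the rough 4-characters-per-token estimate rather than a real tokeniser, so the budgets are approximate. The function names are illustrative, and the summarisation step itself (a model call) is deliberately left out.

```python
def approx_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token, at least 1 for non-empty text."""
    return max(1, round(len(text) / 4)) if text else 0


def chunk_by_tokens(text: str, max_tokens: int = 800) -> list:
    """Greedily pack whole words into chunks of at most ~max_tokens estimated tokens."""
    chunks, current, current_tokens = [], [], 0
    for word in text.split():
        cost = approx_tokens(word + " ")
        if current and current_tokens + cost > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(word)
        current_tokens += cost
    if current:
        chunks.append(" ".join(current))
    return chunks


def split_history(messages: list, budget_tokens: int) -> tuple:
    """Split messages into (older, recent), where `recent` fits the token budget.

    The `older` part is what you would summarise instead of resending verbatim.
    """
    recent, used = [], 0
    for message in reversed(messages):
        cost = approx_tokens(message)
        if used + cost > budget_tokens:
            break
        recent.append(message)
        used += cost
    recent.reverse()
    older = messages[: len(messages) - len(recent)]
    return older, recent


chunks = chunk_by_tokens("word " * 100, max_tokens=10)
print(len(chunks))

older, recent = split_history(["a" * 40, "b" * 40, "c" * 40], budget_tokens=20)
print(len(older), len(recent))
```

Splitting on word boundaries keeps the example short; a production chunker would more likely split on sentences or headings so that each retrieved chunk reads coherently on its own.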

Frequently Asked Questions

Do different providers count tokens the same way?

No. OpenAI uses tiktoken (the o200k_base encoding for GPT-4o; cl100k_base for GPT-4 and GPT-3.5 Turbo). Anthropic uses its own tokeniser and provides a token-counting API. Google has its own tokeniser. For English prose the counts are usually close (within roughly 5–10%), but they diverge for code, non-Latin scripts, and very long words. Use the provider's official tokeniser for billing-sensitive counts.

How do I count tokens exactly in code?

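For OpenAI: pip install tiktoken, then import tiktoken; enc = tiktoken.encoding_for_model("gpt-4o"); len(enc.encode(text)). GPT-4o uses the o200k_base encoding, not cl100k_base. For Anthropic: use the token-counting endpoint, exposed in current versions of the Python SDK as client.messages.count_tokens(model=..., messages=[...]), which returns an object with an input_tokens field; older SDK versions offered client.count_tokens(text). For Google: model.count_tokens(text) from the google.generativeai SDK, which returns a response with a total_tokens field. SDK method names change between releases, so check the version you have installed.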
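
A minimal sketch of the OpenAI case, written so that it still runs when tiktoken is not installed. It asks tiktoken to pick the encoding for the model name via encoding_for_model rather than hard-coding an encoding, and falls back to the 4-characters-per-token estimate if the library is missing, the installed version does not know the model, or the vocabulary file cannot be downloaded. The function name and the fallback behaviour are choices made for this example.

```python
def count_tokens_openai(text: str, model: str = "gpt-4o") -> tuple:
    """Return (token_count, is_exact).

    is_exact is True when tiktoken produced the count, and False when the
    rough 4-characters-per-token fallback was used instead.
    """
    try:
        import tiktoken  # third-party: pip install tiktoken

        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text)), True
    except Exception:
        # tiktoken is missing, too old for this model name, or could not
        # fetch its vocabulary file; fall back to the rule of thumb.
        estimate = max(1, round(len(text) / 4)) if text else 0
        return estimate, False


count, exact = count_tokens_openai("How many tokens is this sentence?")
print(count, "exact" if exact else "estimated")
```

Returning the is_exact flag alongside the count lets calling code decide whether the number is good enough for billing-sensitive work or only for a rough budget check.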

What happens if I exceed the context window?

The API returns an error (context length exceeded) or, for some models in some configurations, silently truncates the oldest content. The specific behaviour depends on the model and provider. The brevio Token Counter shows a warning at 80% of the context window and an error at 100%, which gives you advance notice before you hit this limit.
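
The 80% and 100% thresholds mentioned above are simple to reproduce in your own code. The sketch below copies three context window sizes from the comparison earlier in this guide; the function name, the status strings, and the choice to treat exactly 80% as a warning are assumptions for illustration, not the tool's source code.

```python
# Context window sizes in tokens, taken from the comparison in this guide.
CONTEXT_WINDOWS = {
    "Claude Sonnet 4.6": 200_000,
    "GPT-4o": 128_000,
    "Gemini 2.5 Pro": 1_000_000,
}


def context_status(token_count: int, model: str) -> str:
    """Return "ok", "warning" (80% or more used), or "error" (100% or more used)."""
    window = CONTEXT_WINDOWS[model]
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if token_count >= window:
        return "error"
    if token_count * 100 >= window * 80:
        return "warning"
    return "ok"


print(context_status(100_000, "GPT-4o"))  # ok
print(context_status(110_000, "GPT-4o"))  # warning
print(context_status(130_000, "GPT-4o"))  # error
```

Remember that the window has to hold the model's output as well as your input, so in practice it is worth leaving headroom for the expected completion length.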

Is the token counter accurate for code?

Only approximately. Code tokenises differently from prose. Common keywords and short identifiers often map to single tokens, so simple Python source can tokenise efficiently (fewer tokens per character). However, complex variable names, string literals, and indentation may tokenise less efficiently. The heuristic approximation in brevio Token Counter is within ±15% for typical code samples.
