LLM API Cost Calculator: How to Estimate Your Bill Before You Build
By Rui Barreira · Last updated: 13 June 2026
You can estimate your LLM API bill before writing a single line of code using the brevio LLM Cost Calculator: enter your expected input and output token counts and compare Claude, GPT-4o, Gemini, and Llama side by side. All calculations run in your browser with no API calls.
Underestimating LLM costs is the most common mistake in AI product development. A feature that costs $0.001 per request seems free until you have 100,000 daily active users running five requests each. That is 500,000 requests, or $500 per day and roughly $182,500 per year. Accurate cost estimation before building prevents architecture decisions that are expensive to reverse.
How LLM Pricing Works
Every major LLM provider charges separately for input tokens (the prompt you send) and output tokens (the response generated). Output tokens are almost always more expensive, because generating text requires more compute than reading it. The output-to-input price ratio varies by model: GPT-4o charges $2.50 per 1M input tokens and $10 per 1M output tokens (a 4:1 ratio), while Claude Opus 4.8 charges $15 input and $75 output (a 5:1 ratio).
Understanding this asymmetry changes how you architect prompts. A 500-token system prompt sent on every request is billed as 500 input tokens each time, so it scales linearly with request volume. A 2,000-token response costs four times as much as a 500-token response, and each of those tokens is billed at the higher output rate. Designing your application to reduce output verbosity (telling the model to be concise, using structured output formats like JSON instead of prose) is often the highest-leverage cost optimisation.
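To make the asymmetry concrete, here is a minimal sketch of the per-request arithmetic. The function name `request_cost` is hypothetical, and the prices are the GPT-4o figures quoted above, hard-coded as assumptions rather than fetched from any provider.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# GPT-4o prices as quoted in this article (assumed; check the current price list).
GPT4O_INPUT, GPT4O_OUTPUT = 2.50, 10.00

# Same 500-token prompt, short reply versus long reply.
short_reply = request_cost(500, 500, GPT4O_INPUT, GPT4O_OUTPUT)
long_reply = request_cost(500, 2_000, GPT4O_INPUT, GPT4O_OUTPUT)

print(f"500-token reply:   ${short_reply:.5f}")
print(f"2,000-token reply: ${long_reply:.5f}")
```

In this example the prompt contributes the same $0.00125 to both requests; the difference in total cost comes entirely from the output side.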
What Counts as a Token
One token is roughly 0.75 English words, or about 4 characters of typical prose. Short words like "the", "a", "in" each count as one token. Longer or rarer words such as "extraordinary" may be split into several sub-word tokens, with the exact count depending on the tokeniser (GPT-4o uses o200k_base; GPT-4 and GPT-3.5 Turbo use cl100k_base). Non-English languages and code tokenise differently: Python code typically needs fewer tokens per character than Japanese text.
A quick rule of thumb: 1,000 words ≈ 1,333 tokens. A typical API response of 500 words ≈ 667 tokens. A full A4 page of text ≈ 500–700 tokens depending on formatting density.
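The rule of thumb can be sketched as a pair of small helpers. The names `tokens_for_words` and `estimate_tokens` are hypothetical, and the 0.75 ratio is the English-prose heuristic from this section, not a tokeniser result.

```python
WORDS_PER_TOKEN = 0.75  # heuristic for English prose; code and other languages differ


def tokens_for_words(word_count: int) -> int:
    """Estimate tokens from a word count using the 0.75 words-per-token heuristic."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens(text: str) -> int:
    """Estimate tokens for a piece of text by counting whitespace-separated words."""
    return tokens_for_words(len(text.split()))


print(tokens_for_words(1_000))  # 1,000 words
print(tokens_for_words(500))    # a typical 500-word response
```

This is only suitable for planning; for billing-grade numbers, count with the provider's own tokeniser.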
Estimating Tokens by Use Case
| Use case | Typical input tokens | Typical output tokens | Notes |
|---|---|---|---|
| Customer support chatbot (per turn) | 500–1,500 | 200–500 | Includes system prompt + conversation history |
| RAG document QA (per query) | 2,000–8,000 | 300–800 | Large context from retrieved chunks |
| Batch document summarisation | 3,000–10,000 | 200–500 | Summarisation is output-light |
| Code generation (per function) | 300–800 | 200–600 | Output can be longer than prompt |
| JSON extraction from unstructured text | 500–2,000 | 100–400 | Structured output reduces token count |
| Email classification (per email) | 300–600 | 10–50 | Classification only — very output-light |
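The ranges in the table translate directly into a low and high cost per request. The sketch below assumes the GPT-4o prices quoted in this article ($2.50 input, $10.00 output per 1M tokens); the `USE_CASES` dictionary and `cost_range` function are hypothetical names introduced for illustration.

```python
# (min_input, max_input, min_output, max_output) token ranges from the table above.
USE_CASES = {
    "Customer support chatbot (per turn)": (500, 1_500, 200, 500),
    "RAG document QA (per query)": (2_000, 8_000, 300, 800),
    "Batch document summarisation": (3_000, 10_000, 200, 500),
    "Code generation (per function)": (300, 800, 200, 600),
    "JSON extraction from unstructured text": (500, 2_000, 100, 400),
    "Email classification (per email)": (300, 600, 10, 50),
}


def cost_range(use_case: str, input_price_per_m: float,
               output_price_per_m: float) -> tuple[float, float]:
    """Return (lowest, highest) dollar cost per request for a use case."""
    lo_in, hi_in, lo_out, hi_out = USE_CASES[use_case]
    low = (lo_in * input_price_per_m + lo_out * output_price_per_m) / 1_000_000
    high = (hi_in * input_price_per_m + hi_out * output_price_per_m) / 1_000_000
    return low, high


# Assumed GPT-4o prices per 1M tokens, as quoted in this article.
for name in USE_CASES:
    low, high = cost_range(name, 2.50, 10.00)
    print(f"{name}: ${low:.5f} to ${high:.5f} per request")
```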
Real Cost Examples
Processing 1 million words (a large book) as input tokens (≈1.33M tokens) costs: GPT-4o $3.33, Claude Sonnet 4.6 $3.99, Gemini 2.5 Flash $0.20, Claude Haiku 4.5 $1.06. For batch processing workloads with high input-to-output ratios, Gemini Flash and GPT-4o mini are usually the clear winners on cost.
For a production chatbot handling 10,000 conversations per day with average input of 1,200 tokens and output of 400 tokens: daily cost at GPT-4o = (12M × $2.50 + 4M × $10) / 1M = $30 + $40 = $70/day ($25,550/year). The same load on GPT-4o mini: (12M × $0.15 + 4M × $0.60) / 1M = $1.80 + $2.40 = $4.20/day ($1,533/year) — a 17× cost reduction for equivalent volume, at the price of reduced reasoning capability.
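The chatbot calculation above can be reproduced in a few lines. This is a sketch with a hypothetical `daily_cost` helper; the volumes and prices are the ones used in the paragraph, and a 365-day year is assumed.

```python
def daily_cost(conversations_per_day: int, input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost per day for a fixed per-conversation token profile."""
    total_input = conversations_per_day * input_tokens
    total_output = conversations_per_day * output_tokens
    return (total_input * input_price_per_m
            + total_output * output_price_per_m) / 1_000_000


# 10,000 conversations/day, 1,200 input and 400 output tokens each.
gpt4o = daily_cost(10_000, 1_200, 400, 2.50, 10.00)
gpt4o_mini = daily_cost(10_000, 1_200, 400, 0.15, 0.60)

print(f"GPT-4o:      ${gpt4o:.2f}/day, ${gpt4o * 365:,.0f}/year")
print(f"GPT-4o mini: ${gpt4o_mini:.2f}/day, ${gpt4o_mini * 365:,.0f}/year")
print(f"Ratio: {gpt4o / gpt4o_mini:.1f}x")
```

Swapping in your own traffic and token figures is usually the quickest way to see whether a cheaper model changes the order of magnitude of the bill.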
Model Pricing Comparison (June 2026)
| Model | Provider | Input $/1M | Output $/1M | Best for |
|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | $15.00 | $75.00 | Complex reasoning, long context, agentic tasks |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Balanced capability and cost, most general tasks |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | High-volume, latency-sensitive, simple tasks |
| GPT-4o | OpenAI | $2.50 | $10.00 | General tasks, multimodal |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | High-volume classification and extraction |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | Long context, reasoning |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | Speed-sensitive, cost-sensitive workloads |
| Llama 3.3 70B (Groq) | Groq | $0.59 | $0.79 | Open-weights, fast inference |
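One way to use the table is to rank every model for a single workload. The sketch below copies the prices from the table into a hypothetical `PRICES` dictionary and sorts by total cost; the workload (12M input and 4M output tokens per day) is an assumption chosen to match the chatbot example.

```python
# (input $/1M, output $/1M) copied from the table above.
PRICES = {
    "Claude Opus 4.8": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Llama 3.3 70B (Groq)": (0.59, 0.79),
}


def rank_models(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, dollar cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        for model, (in_price, out_price) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Assumed workload: 12M input tokens and 4M output tokens per day.
ranked = rank_models(12_000_000, 4_000_000)
for model, cost in ranked:
    print(f"{model:<22} ${cost:>8.2f}/day")
```

Cost is only one axis; the "Best for" column still matters when two models sit close together in the ranking.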
Cost Optimisation Strategies
- Prompt caching. Anthropic and OpenAI both offer prompt caching for repeated system prompts or fixed context. Cached input tokens typically cost 10–25% of the normal input price, depending on provider and model. If your system prompt is 2,000 tokens and you send 100,000 requests, that is 200M input tokens, or $600 at Sonnet 4.6 pricing; caching saves roughly $450–$540 of that, before any cache-write surcharge.
- Route by complexity. Use a cheap, fast model (Haiku, GPT-4o mini) for simple classification or retrieval decisions, and reserve expensive models for reasoning-heavy tasks. A routing layer that sends 80% of requests to a cheaper model and 20% to the premium model can reduce total cost by roughly 60–75%, depending on the price gap between the two models.
- Control output length. Instruct the model to be concise. "Answer in 2–3 sentences" can reduce output tokens by 50–70% with minimal quality loss for factual queries. For summarisation, specify target word counts.
- Batch API. Both Anthropic and OpenAI offer batch processing APIs at 50% discount for non-real-time workloads (processing takes up to 24 hours). For bulk document classification, batch API halves your cost with no quality tradeoff.
- Reduce context window. Every token in the conversation history is charged as input. Summarising old turns rather than sending the full transcript reduces costs for long conversations. A sliding window that keeps the last 3 turns plus a running summary is a common pattern.
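The savings from caching, routing, and batching are all simple arithmetic, sketched below. The helper names are hypothetical. The caching helper assumes every request hits the cache and ignores any cache-write surcharge, and the routing example assumes 1,200 input and 400 output tokens per request at the GPT-4o and GPT-4o mini prices quoted in this article.

```python
def prompt_cost(tokens: int, requests: int, price_per_m: float) -> float:
    """Dollars billed for sending `tokens` input tokens on each of `requests` calls."""
    return tokens * requests * price_per_m / 1_000_000


def caching_savings(tokens: int, requests: int, price_per_m: float,
                    cached_price_fraction: float) -> float:
    """Saving when the repeated prompt is billed at a fraction of the input price."""
    return prompt_cost(tokens, requests, price_per_m) * (1 - cached_price_fraction)


def routed_cost(cheap_cost: float, premium_cost: float, cheap_share: float) -> float:
    """Blended per-request cost when `cheap_share` of traffic uses the cheap model."""
    return cheap_share * cheap_cost + (1 - cheap_share) * premium_cost


# Caching: 2,000-token system prompt, 100,000 requests, $3.00 per 1M input tokens.
save_low = caching_savings(2_000, 100_000, 3.00, 0.25)   # cached tokens billed at 25%
save_high = caching_savings(2_000, 100_000, 3.00, 0.10)  # cached tokens billed at 10%
print(f"Caching saves ${save_low:.0f} to ${save_high:.0f}")

# Routing: 80% of requests to GPT-4o mini, 20% to GPT-4o.
premium = (1_200 * 2.50 + 400 * 10.00) / 1_000_000
cheap = (1_200 * 0.15 + 400 * 0.60) / 1_000_000
blended = routed_cost(cheap, premium, cheap_share=0.8)
reduction = 1 - blended / premium
print(f"Routing cuts cost by {reduction:.0%}")

# Batch API: a 50% discount halves the per-request price for non-real-time work.
batch_price = premium * 0.5
print(f"Batch price per request: ${batch_price:.4f}")
```

The routing saving is sensitive to the price gap: the closer the cheap model's price is to the premium model's, the further the reduction falls below the 80% share of traffic being rerouted.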
How to Verify No Network Request Is Made
- Open DevTools with F12 (Windows/Linux) or ⌘⌥I (Mac).
- Go to the Network tab and filter to Fetch/XHR.
- Enter token counts in the LLM Cost Calculator.
- Observe: no network requests fire. All arithmetic runs locally in your browser using JavaScript.
Frequently Asked Questions
How are tokens different from words?
Tokens are sub-word units produced by a byte pair encoding (BPE) algorithm trained on a large text corpus. Common words like "the" are a single token. Less common words split into multiple tokens — "tokenization" becomes ["token", "ization"] under most tokenisers. The 0.75 words-per-token heuristic is a reasonable estimate for English prose; code and non-English text vary.
Do all providers use the same tokeniser?
No. OpenAI uses tiktoken (cl100k_base for GPT-4 and GPT-3.5 Turbo, o200k_base for GPT-4o and newer models). Anthropic uses its own tokeniser but provides a counting API and Python/JS libraries. The token counts are similar for English text (within 5–10%) but can diverge significantly for code or non-Latin scripts.
Is there a free tier for LLM APIs?
Anthropic offers $5 in free credits on signup. OpenAI varies by region — some accounts get $5 in credits. Google offers a free tier for Gemini API through Google AI Studio. Groq offers a generous free rate-limited tier. All free tiers have rate limits that make them unsuitable for production but fine for development.
How do I get an accurate token count before sending a request?
Use the provider's official tooling: tiktoken for OpenAI (pip install tiktoken), the count_tokens method in Anthropic's anthropic Python SDK (client.messages.count_tokens), or the count_tokens method in Google's Gen AI SDK (client.models.count_tokens). For estimates during design, the 0.75 words-per-token ratio is accurate enough for planning purposes.
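As a minimal sketch for the OpenAI case, the function below counts tokens with tiktoken when it is available and otherwise falls back to the 0.75 words-per-token heuristic. The wrapper name `count_tokens` and the fallback behaviour are assumptions for illustration; `tiktoken.get_encoding` and `Encoding.encode` are the library's own calls.

```python
def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens with tiktoken if possible, else estimate from the word count."""
    try:
        import tiktoken  # pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is not installed, or its encoding file could not be loaded:
        # fall back to the 0.75 words-per-token planning heuristic.
        return round(len(text.split()) / 0.75)


prompt = "Summarise the following support ticket in two sentences."
print(count_tokens(prompt))
```

The fallback keeps a planning script usable offline, but only the tokeniser path reflects what the provider will actually bill.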