Common LLM API Errors Explained: 400, 401, 429, Context Length, and More
By Rui Barreira · Last updated: 13 June 2026
You can decode any LLM API error in seconds using brevio LLM Error Decoder — search by HTTP status code, error name, or provider and get the cause and fix instantly. All lookups run in your browser with no API call required.
LLM API errors fall into two categories: client errors (4xx) that you caused and can fix immediately, and server errors (5xx) that the provider caused and require retry logic. Getting this distinction right saves hours of debugging. A 401 means your key is wrong. A 429 means you are sending too fast. A 500 means the provider is having a bad day — retry and move on.
HTTP Status Code Refresher
4xx errors are client errors: the request you sent is invalid in some way, and you need to change something before retrying. The one exception is 429, which signals a rate limit rather than a malformed request and can be retried after waiting. 5xx errors are server errors: your request was valid but the provider failed to process it. These are usually transient and should be retried with backoff. Never retry a 400 or 401 in a loop. The request will keep failing until you fix it, and the repeated calls can trigger rate limiting.
| Code | Category | Retry? | Common cause |
|---|---|---|---|
| 400 | Client error | No — fix request first | Invalid model name, empty messages array |
| 401 | Client error | No — fix API key first | Missing or expired API key |
| 403 | Client error | No — upgrade account | Account tier lacks model access |
| 404 | Client error | No — fix model name | Typo in model ID |
| 429 | Rate limit | Yes — with backoff | Too many requests or too many tokens |
| 500 | Server error | Yes — with backoff | Provider-side transient failure |
| 503 | Server error | Yes — with backoff | Provider overloaded or in maintenance |
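The table above can be encoded as a small lookup so that retry decisions live in one place. This is a minimal sketch: the function name `retryAction` and the returned labels are hypothetical, and it covers only the status codes listed in the table.

```javascript
// Hypothetical helper: maps an HTTP status code to the action suggested
// by the table above. Names and return values are illustrative only.
const RETRYABLE_STATUSES = new Set([429, 500, 503])

function retryAction(status) {
  if (RETRYABLE_STATUSES.has(status)) return 'retry-with-backoff'
  // Every other 4xx needs a change to the request, key, or account first
  if (status >= 400 && status < 500) return 'fix-request'
  // Codes the table does not cover are left for the caller to decide
  return 'unknown'
}
```

Keeping the retryable codes in a single set makes it easy to extend the list later without touching the retry loop itself.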
Deep Dive: 429 Rate Limiting
The 429 error is one of the most common errors in production LLM applications. Most providers enforce several independent rate limits at once. OpenAI, for example, enforces requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), and tokens per day (TPD). Hitting any single limit returns a 429 even if you are well under the others. The error message body usually states which limit you hit, so always log and parse it.
The retry-after header, when present, tells you how long to wait, either as a number of seconds or as an HTTP date. When it is absent, use exponential backoff with jitter. Jitter, a small random delay added to each retry, prevents the thundering herd problem, where hundreds of clients retry at the same moment and immediately re-trigger the rate limit.
Here is a production-ready retry function in JavaScript:

```javascript
async function callWithRetry(fn, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      const status = err?.status ?? err?.response?.status
      const isRetryable = status === 429 || status === 500 || status === 503
      if (!isRetryable || attempt === maxRetries) throw err
      // Respect retry-after when it is a number of seconds. Depending on the
      // client, headers may be a plain object or a Headers instance.
      const headers = err?.headers ?? err?.response?.headers
      const retryAfter = Number(headers?.get?.('retry-after') ?? headers?.['retry-after'])
      // A missing or non-numeric header yields NaN, so fall back to backoff
      const waitMs = Number.isFinite(retryAfter) && retryAfter >= 0
        ? retryAfter * 1000
        : Math.pow(2, attempt) * 1000 + Math.random() * 500
      await new Promise(resolve => setTimeout(resolve, waitMs))
    }
  }
}
```

This pattern retries up to 3 times on 429/500/503, respects a numeric retry-after header when present, otherwise uses exponential backoff (1s, 2s, 4s) plus up to 500ms of random jitter, and surfaces non-retryable errors immediately without wasting time.
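The HTTP specification allows the Retry-After header to carry either a delay in seconds or an HTTP date. The sketch below handles both forms. The function name `parseRetryAfterMs` is hypothetical, and accepting fractional seconds is an assumption made for leniency rather than something the specification requires.

```javascript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// Returns null when the value is missing or unparseable, so the caller can
// fall back to exponential backoff.
function parseRetryAfterMs(value, now = Date.now()) {
  if (value === undefined || value === null) return null
  const text = String(value).trim()
  if (text === '') return null
  // Form 1: delay in seconds, e.g. "20"
  if (/^\d+(\.\d+)?$/.test(text)) return Number(text) * 1000
  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const when = Date.parse(text)
  if (Number.isNaN(when)) return null
  // A date in the past means there is no need to wait
  return Math.max(0, when - now)
}
```

Returning null instead of 0 for an unparseable value keeps the caller from retrying immediately by accident.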
Context Length Exceeded
Context length errors occur when the total token count of your messages array (system prompt, conversation history, and the current message combined) exceeds the model's context window. On many APIs the tokens reserved for the response count toward the same window, so leave headroom for the output. This is a client error, typically returned as a 400, and cannot be retried without modification.
The three strategies to handle context length errors:
- Truncation. Remove the oldest messages from the conversation history until you are under the limit. The simplest approach — drop messages from index 1 (keeping the system prompt at index 0) until the token count is within budget.
- Summarisation. Instead of discarding messages, summarise them with a separate cheap model call. "Summarise the following conversation in 200 words" and replace the history with the summary. Preserves context better than truncation at the cost of one extra API call.
- Switch models. If you genuinely need the full context, move to a larger context window. GPT-4o supports 128K tokens, Claude supports 200K, and Gemini 1.5 Pro supports 1 million tokens. Verify the cost difference — larger context windows are sometimes (not always) more expensive per token.
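The truncation strategy above can be sketched as follows. The names `estimateTokens` and `truncateHistory` are hypothetical, and the estimate of roughly 4 characters per token is an assumption, not a real tokenizer. For accurate counts, pass in a counter backed by the provider's own tokenizer.

```javascript
// Rough token estimate. The 4-characters-per-token ratio is an assumption;
// replace it with the provider's tokenizer for accurate budgeting.
function estimateTokens(message) {
  return Math.ceil(String(message.content ?? '').length / 4)
}

// Drops the oldest non-system messages until the estimated total fits the
// budget. Assumes messages[0] is the system prompt, as described above.
function truncateHistory(messages, maxTokens, countTokens = estimateTokens) {
  const kept = [...messages] // copy so the caller's array is not mutated
  const total = () => kept.reduce((sum, m) => sum + countTokens(m), 0)
  // Always keep the system prompt and the most recent message
  while (kept.length > 2 && total() > maxTokens) {
    kept.splice(1, 1) // remove the oldest message after the system prompt
  }
  return kept
}
```

If the system prompt and the latest message alone exceed the budget, this sketch returns them anyway, so the caller still needs to handle a context length error in that case.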
401 and 403: Key Management Best Practices
A 401 means the API key is missing or unrecognised. A 403 means the key is valid but lacks permission for the resource you requested. Both are client errors that require zero retries.
Common 401 causes in production:
- Environment variable not loaded (process.env.OPENAI_API_KEY is undefined in certain deployment environments).
- Key was rotated in the dashboard but the new key was not deployed to the application.
- The Bearer prefix was accidentally included in the key value itself: setting the key to "Bearer sk-abc123" instead of just "sk-abc123".
- Using an Anthropic key with the OpenAI SDK endpoint or vice versa.
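Some of the causes above can be caught at startup, before the first request is sent. The sketch below is a hypothetical pre-flight check (`checkApiKey` is not part of any SDK). It detects a missing value and an embedded Bearer prefix. The whitespace check is an extra assumption not listed above, added because a trailing newline in a secret file produces the same symptom.

```javascript
// Hypothetical pre-flight check for an API key value.
// Returns a list of problems; an empty list means nothing obvious was found.
// It cannot tell whether the key is valid, rotated, or for the wrong provider.
function checkApiKey(key) {
  const problems = []
  if (key === undefined || key === null || key === '') {
    problems.push('key is missing (environment variable not loaded?)')
    return problems
  }
  const text = String(key)
  if (text !== text.trim()) {
    problems.push('key has leading or trailing whitespace')
  }
  if (/^bearer\s/i.test(text.trim())) {
    problems.push('key value includes the "Bearer" prefix')
  }
  return problems
}
```

Calling `checkApiKey(process.env.OPENAI_API_KEY)` during application startup and logging the result surfaces these mistakes before they appear as 401 responses.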
For 403, check your account tier first. GPT-4o access requires a paid OpenAI account with a billing method on file. Some models (GPT-4o with structured outputs, fine-tuned model access) require specific account configurations. The error message body will usually specify exactly which permission is missing.
500 and 503: Retry Patterns with Jitter
Both 500 (Internal Server Error) and 503 (Service Unavailable) are provider-side failures. They are almost always transient — caused by a brief overload, a deployment, or a transient infrastructure issue. The correct response is to retry with exponential backoff.
Check the provider status page before assuming your code is the problem:
- OpenAI: status.openai.com
- Anthropic: status.anthropic.com
- Google (Gemini): cloud.google.com/support/docs/dashboard
- Groq: groqstatus.com
If the status page shows an active incident, do not retry aggressively — you will generate noise and potentially rate-limit yourself further. Wait for the all-clear, then resume normal operation.
Error Format Comparison: OpenAI vs Anthropic vs Gemini
| Provider | Error wrapper key | Type field | Example message key |
|---|---|---|---|
| OpenAI | error | error.type | error.message |
| Anthropic | error (or top-level type: "error") | error.type | error.message |
| Gemini | error | error.status | error.message |
| Groq | error | error.type | error.message |
The OpenAI error format is the most widely adopted — Groq and many other providers use it directly. Anthropic's format is similar but uses a slightly different envelope for streaming errors: a top-level type: "error" object containing the nested error object. When writing a unified error handler for multiple providers, check both err.error.message and err.message to cover all cases.
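A unified handler along these lines might look like the sketch below. It assumes the error body has already been parsed from JSON and follows the field layout in the table above. The function name `normaliseError` and the `'unknown'` fallbacks are hypothetical.

```javascript
// Hypothetical normaliser for a parsed JSON error body.
// Follows the field layout in the comparison table above.
function normaliseError(body) {
  const inner = body?.error
  if (inner && typeof inner === 'object') {
    return {
      // Gemini reports the category in error.status; the others use error.type
      type: inner.type ?? inner.status ?? 'unknown',
      message: inner.message ?? 'unknown error',
    }
  }
  // Fallback for bodies with a top-level message or a plain string error
  return {
    type: 'unknown',
    message: body?.message ?? (typeof inner === 'string' ? inner : 'unknown error'),
  }
}
```

Because the Anthropic envelope nests the same `error` object under a top-level `type: "error"`, the first branch handles it without a provider-specific case.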
Anthropic-Specific: overloaded_error
Anthropic returns a named overloaded_error type, sent with HTTP status 529 rather than the generic 503. This appears during high-traffic periods on the Anthropic API. It is transient, so treat it identically to a 503 and retry with backoff. Note that a retry handler that checks only 429, 500, and 503 will not retry it unless you add 529 to the list. Claude Haiku is typically less affected by overload events than Claude Sonnet or Opus during peak periods, making it a viable fallback model for latency-sensitive applications.
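The fallback idea can be sketched as a small wrapper. This is illustrative only: `callWithFallback` is a hypothetical name, and the error shape it inspects (a `status` property plus a parsed body on `err.error`) is an assumption that should be checked against the client library in use. It shows only the fallback step and does not replace retrying with backoff.

```javascript
// Hypothetical wrapper: runs `primary`, and if it fails with an overload
// error, runs `fallback` instead (for example, the same request against a
// smaller model). The error shape checked here is an assumption.
async function callWithFallback(primary, fallback) {
  try {
    return await primary()
  } catch (err) {
    const type = err?.error?.error?.type ?? err?.error?.type
    const isOverloaded = err?.status === 529 || type === 'overloaded_error'
    if (!isOverloaded) throw err // other errors surface unchanged
    return await fallback()
  }
}
```

Both arguments are zero-argument async functions, so the wrapper stays independent of any particular SDK.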
How to Verify the Fix Is Working
- Add structured logging to your retry handler — log the status code, attempt number, and wait time on each retry event.
- Confirm 4xx errors are not being retried (they should surface immediately as thrown exceptions).
- Test the non-retryable path by temporarily using an invalid API key: you should see a single 401 thrown with no retries.
- Test your retry logic by temporarily using an exhausted test key that returns 429 — you should see backoff logs and eventual failure after max retries.
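Where an exhausted test key is not available, the same checks can be run against a simulated call. The helper below is a hypothetical sketch: `makeFlakyCall` returns an async function that fails with each listed status in turn and then succeeds. The `status` property on the thrown error is an assumption about what the retry wrapper inspects.

```javascript
// Hypothetical test helper: returns an async function that throws an error
// carrying each listed status in turn, then resolves with 'ok'.
function makeFlakyCall(statuses) {
  let calls = 0
  const fn = async () => {
    const status = statuses[calls++]
    if (status !== undefined) {
      const err = new Error(`simulated HTTP ${status}`)
      err.status = status // assumed to be the property the retry wrapper reads
      throw err
    }
    return 'ok'
  }
  // Exposes how many times the function was invoked, for assertions
  fn.callCount = () => calls
  return fn
}
```

Passing `makeFlakyCall([429, 429])` to a retry wrapper should produce two backoff log entries followed by a successful result, while `makeFlakyCall([401])` should fail after exactly one call.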
Frequently Asked Questions
What does a 429 error mean in LLM APIs?
A 429 error means you have exceeded a rate limit. LLM APIs enforce limits on requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), and tokens per day (TPD). Use exponential backoff with jitter to retry. Parse the error message body to determine which specific limit you hit and adjust your request rate accordingly.
How do I fix context length exceeded errors?
Reduce your prompt length, summarise conversation history, or switch to a model with a larger context window (GPT-4o: 128K, Claude: 200K, Gemini: 1M). For document question-answering, use RAG to retrieve only the relevant chunks rather than sending the full document on every request.
Why am I getting a 401 error even with a valid API key?
Check that the key is not revoked, that you are passing it as Authorization: Bearer YOUR_KEY (not including "Bearer" inside the key value itself), and that the key has the correct permissions for the model you are requesting. Also confirm the environment variable is actually being loaded in your deployment environment — undefined env vars are a common production-only 401 source.
Should I retry on 500 errors?
Yes. 500 and 503 errors from LLM providers are typically transient. Implement 3 retry attempts with exponential backoff (1s, 2s, 4s) and add random jitter to avoid thundering herd problems. If errors persist after 3 retries, surface the failure and check the provider status page before investigating your code further.