LLM Memory Calculator

Calculate how much VRAM you need to run an LLM. Covers fp32, fp16, int8, and int4 precision, plus the KV cache.

Inputs: model preset, parameters (billions), precision, context window (tokens), batch size, number of layers, hidden dimension.

Model weights: 140.00 GB

KV cache: 5.37 GB

Activations: 28.00 GB

Recommended VRAM: 209 GB

Total GPU memory needed: 173.37 GB (weights + KV cache + activations) + 20% headroom = 208.04 GB, rounded up to 209 GB recommended
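The arithmetic behind this total can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the function name `recommended_vram_gb` and its parameters are hypothetical, the 70-billion-parameter fp16 input is an assumption chosen because it matches the 140.00 GB weight figure, decimal gigabytes (10^9 bytes) are assumed, and rounding up to a whole GB is inferred from the displayed result.

```python
import math

# Bytes per parameter for each precision the calculator lists.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def recommended_vram_gb(params_billions, precision, kv_cache_gb,
                        activation_ratio=0.20, headroom=0.20):
    """Return (weights, activations, total, recommended) in decimal GB."""
    # One billion parameters at one byte each is 1 GB (10^9 bytes).
    weights = params_billions * BYTES_PER_PARAM[precision]
    # Activations are estimated as a fixed fraction of the weight memory.
    activations = weights * activation_ratio
    total = weights + kv_cache_gb + activations
    # Assumption: the recommendation is the total plus headroom, rounded up.
    recommended = math.ceil(total * (1 + headroom))
    return weights, activations, total, recommended


# Assumed inputs: 70B parameters, fp16, and the 5.37 GB KV cache shown above.
weights, activations, total, recommended = recommended_vram_gb(70, "fp16", 5.37)
print(f"{weights:.2f} GB weights, {activations:.2f} GB activations")
print(f"{total:.2f} GB total, {recommended} GB recommended")
```

Switching `precision` to `"int4"` in this sketch cuts the weight term to a quarter of the fp16 figure, which is why quantization is usually the largest lever on the total.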

GPU compatibility

GPU	VRAM	Fits?
RTX 3060	12 GB	❌
RTX 3090	24 GB	❌
RTX 4090	24 GB	❌
A100 40GB	40 GB	❌
A100 80GB	80 GB	❌
H100 80GB	80 GB	❌
H200 141GB	141 GB	❌
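The table above amounts to comparing each card's VRAM with the required figure. The sketch below assumes the comparison is made against the 209 GB recommendation; the page does not say whether the tool uses that or the 173.37 GB raw total, although both exceed every card listed. The names `GPUS` and `fits` are hypothetical.

```python
# GPU names and VRAM sizes (GB) copied from the compatibility table.
GPUS = [
    ("RTX 3060", 12),
    ("RTX 3090", 24),
    ("RTX 4090", 24),
    ("A100 40GB", 40),
    ("A100 80GB", 80),
    ("H100 80GB", 80),
    ("H200 141GB", 141),
]


def fits(vram_gb, required_gb):
    """A single GPU fits the model if its VRAM covers the requirement."""
    return vram_gb >= required_gb


required_gb = 209  # assumed: the recommended VRAM figure from the results
for name, vram_gb in GPUS:
    mark = "yes" if fits(vram_gb, required_gb) else "no"
    print(f"{name}\t{vram_gb} GB\t{mark}")
```

The check covers a single card only; splitting a model across several GPUs is outside what the table reports.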

Memory estimates are approximations. The KV cache calculation uses the fp16 byte width (2 bytes per value) for every precision other than fp32. Activation memory is estimated at 20% of model weight memory. All calculations run in your browser; no API calls are made.
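The byte-width rule for the KV cache can be illustrated with the commonly used estimate of two cached tensors (keys and values) per layer. This is a sketch under stated assumptions: the page does not show the tool's exact formula or the input values, so the function `kv_cache_gb` is hypothetical, and the inputs below (80 layers, hidden dimension 8192, 2048 tokens, batch size 1) are just one combination that reproduces the 5.37 GB figure. The estimate also ignores grouped-query attention, which shrinks the cache on models that use it.

```python
def kv_cache_gb(layers, hidden_dim, context_tokens, batch_size, precision):
    """Estimate KV cache size in decimal GB (10^9 bytes)."""
    # Per the note above: fp32 uses 4 bytes per value; every other
    # precision is treated as fp16 (2 bytes), even int8 and int4.
    bytes_per_value = 4 if precision == "fp32" else 2
    # Factor of 2: one key tensor and one value tensor per layer.
    values = 2 * layers * hidden_dim * context_tokens * batch_size
    return values * bytes_per_value / 1e9


# Assumed inputs; the page does not show the values actually entered.
size = kv_cache_gb(layers=80, hidden_dim=8192, context_tokens=2048,
                   batch_size=1, precision="fp16")
print(f"{size:.2f} GB")
```

Because the cache grows linearly with both context length and batch size, doubling either one doubles this term while leaving the weight memory unchanged.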

guide

How Much VRAM Do You Need to Run an LLM Locally?

Calculate GPU memory requirements for any LLM. Learn how model size, quantization (fp16, int8, int4), and KV cache affect VRAM needs.
