How to Calculate A/B Test Statistical Significance — Free (2026)
By Rui Barreira · Last updated: 18 June 2026
Determine whether your A/B test result is statistically significant with the brevio A/B Test Calculator — free, no signup, runs entirely in your browser. Enter visitors and conversions for each variant to get the Z-score, p-value, and confidence level instantly.
Running an A/B test and seeing that variant B converted at 3.6% versus control A at 3.0% does not automatically mean B is better. With a small sample, a difference of this size could easily be due to random chance. Statistical significance testing quantifies how likely a difference at least this large would be if the two variants actually performed the same.
How to Use the Tool
- Enter visitors and conversions for Control (A). These are the numbers for the unchanged baseline version.
- Enter visitors and conversions for Variant (B). These are the numbers for the version being tested.
- Click Calculate Significance. The tool returns conversion rates, relative lift, Z-score, p-value, and whether the result is significant at 95% confidence.
How Statistical Significance Is Calculated
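The tool uses the two-proportion Z-test. Let n₁ and c₁ be the visitors and conversions for Control (A), n₂ and c₂ the same for Variant (B), and p₁ = c₁ / n₁ and p₂ = c₂ / n₂ the two conversion rates. The formula is: Z = (p₂ − p₁) / SE, where the standard error is SE = √(p_pool × (1 − p_pool) × (1/n₁ + 1/n₂)) and the pooled proportion is p_pool = (c₁ + c₂) / (n₁ + n₂).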
The p-value is the probability of observing a difference at least as large as this one if the null hypothesis (no real difference) were true. A p-value below 0.05 means that a difference this large would show up less than 5% of the time if the variants truly performed the same. This is the 95% confidence threshold used by most product teams.
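The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the function name `two_proportion_z_test` and the sample inputs are made up for the example, and it assumes a two-sided p-value, which the description above does not state explicitly.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n1, c1, n2, c2):
    """Pooled two-proportion Z-test.

    n1, c1: visitors and conversions for Control (A)
    n2, c2: visitors and conversions for Variant (B)
    Returns (z, p_value), where p_value is two-sided (an assumption).
    """
    p1 = c1 / n1
    p2 = c2 / n2
    p_pool = (c1 + c2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided: probability of a |Z| at least this large under the null.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs: 3.0% vs 3.6% at two different sample sizes.
for visitors, conv_a, conv_b in [(1_000, 30, 36), (10_000, 300, 360)]:
    z, p = two_proportion_z_test(visitors, conv_a, visitors, conv_b)
    print(f"n={visitors}: Z={z:.2f}, p={p:.4f}, significant at 95%: {p < 0.05}")
```

With these illustrative numbers, the same 3.0% versus 3.6% split is not significant at 1,000 visitors per variant but is significant at 10,000, which is the sample-size effect described at the top of this page.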
The relative lift is (p₂ − p₁) / p₁ × 100%. A 3.0% control rate and a 3.6% variant rate produce a 20% relative lift — meaningful even if the absolute difference is only 0.6 percentage points.
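As a quick check of the arithmetic, here is a small Python sketch of the lift formula using the 3.0% and 3.6% rates from the example. The helper name `relative_lift` is hypothetical.

```python
def relative_lift(control_rate, variant_rate):
    """Relative lift of the variant over the control, in percent."""
    return (variant_rate - control_rate) / control_rate * 100


control_rate = 0.030  # 3.0% control conversion rate
variant_rate = 0.036  # 3.6% variant conversion rate

lift = relative_lift(control_rate, variant_rate)
absolute_diff = (variant_rate - control_rate) * 100  # in percentage points

print(f"Relative lift: {lift:.1f}%")
print(f"Absolute difference: {absolute_diff:.1f} percentage points")
```

Reporting both figures side by side helps avoid confusing a 20% relative lift with a 20 percentage point change.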
Minimum Sample Size
The tool also shows the minimum recommended sample size per variant, based on a 10% relative minimum detectable effect (MDE) at 80% power and 95% confidence. If your current sample is below this number, your test may not have enough power to detect real differences reliably — you risk false negatives (missing a real improvement).
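The exact formula the tool uses for this number is not stated here. As a rough sketch, one common normal-approximation formula for comparing two proportions gives a per-variant sample size as follows. The function name `min_sample_size` and the 3% baseline rate are assumptions for illustration, and other calculators may use slightly different variants of the formula and return somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(baseline_rate, relative_mde=0.10, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided test).

    Uses n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2,
    a standard approximation; not necessarily the tool's own formula.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate if the MDE is achieved
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


# Hypothetical baseline: 3% conversion rate, 10% relative MDE.
n_required = min_sample_size(0.03)
print(f"Visitors needed per variant: {n_required}")
```

Under these assumptions the answer is roughly 53,000 visitors per variant. Low baseline rates combined with small relative effects need large samples, and doubling the MDE cuts the requirement to roughly a quarter.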
Frequently Asked Questions
- What does p < 0.05 actually mean?
- If the null hypothesis were true (the variants perform identically), you would expect to see a result this extreme or more extreme less than 5% of the time. It does not mean there is a 95% probability that your variant is better — it means the data is unlikely under the assumption of no difference.
- Can I stop my test early if it reaches significance?
- Not safely. Checking repeatedly and stopping at the first significant result is called "peeking", and it pushes the false positive rate above the nominal 5%, more so with every extra look. If you plan to check results multiple times, use a sequential testing method or apply a Bonferroni correction. The safest approach is to decide your sample size in advance and only read results once.
- What is statistical power?
- Power is the probability that your test will detect a real difference if one exists. At 80% power, you will miss a real improvement of the size the test was designed to detect 20% of the time (false negative). Higher power requires larger sample sizes but reduces the risk of inconclusive tests.
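The effect of peeking described in the answers above can be illustrated with a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look with reading the result once at the end. All names and parameters (`simulate_aa_tests`, 10 looks, batches of 200 visitors, a 10% conversion rate) are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(n1, c1, n2, c2):
    """Two-sided p-value from the pooled two-proportion Z-test."""
    p_pool = (c1 + c2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:  # no conversions (or only conversions) so far: no evidence
        return 1.0
    z = (c2 / n2 - c1 / n1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(experiments=300, looks=10, batch=200, rate=0.10, seed=0):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    flagged_with_peeking = 0
    flagged_at_final_look = 0
    for _ in range(experiments):
        n = c1 = c2 = 0
        any_significant = False
        significant = False
        for _ in range(looks):
            # Both arms receive `batch` new visitors with the same true rate.
            n += batch
            c1 += sum(rng.random() < rate for _ in range(batch))
            c2 += sum(rng.random() < rate for _ in range(batch))
            significant = p_value(n, c1, n, c2) < 0.05
            any_significant = any_significant or significant
        flagged_with_peeking += any_significant
        flagged_at_final_look += significant  # result of the last look only
    return flagged_with_peeking / experiments, flagged_at_final_look / experiments


peeking_rate, single_look_rate = simulate_aa_tests()
print(f"False positives when stopping at first significant look: {peeking_rate:.1%}")
print(f"False positives when reading the result once at the end: {single_look_rate:.1%}")
```

The single-look rate should land near 5%, while the peeking rate is typically noticeably higher. The exact figures depend on the seed and the number of looks.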
Related tools: Sample Size Calculator · Conversion Rate Calculator