A/B test hypothesis

Last tested 2026-04-26. Part of AI Prompts for Product Managers.
I'm planning an A/B test. Help me write a rigorous hypothesis. Use this exact structure:

1. Hypothesis: stated as a directional prediction with a magnitude. Format: 'Doing X will change metric Y by at least Z%, because [mechanism].'

2. Experiment design: audience, split, duration (or sample-size threshold), exclusions.

3. Success criteria:
- Primary: the metric and threshold.
- Secondary: a sanity-check metric.
- Guardrails: 1-2 metrics that must NOT degrade (downstream funnel, refund rate, support load).

4. Expected lift: a range with reasoning. Why this number, not a different one?

5. What would falsify: the specific result that would tell us we were wrong - not just 'no effect' but the direction and threshold.

The test I'm planning:
{{test_description}}

Why it works

Writing the hypothesis is where most PMs accidentally do bad statistics. The typical mistake is writing 'we'll test a new button color' and skipping the falsifiability question. This prompt forces five things: a hypothesis stated as a directional prediction (not just 'we'll see if'), an experiment design including audience and duration, success criteria with thresholds (not a vague 'better'), an expected lift with reasoning, and a statement of what would falsify the hypothesis. The falsifiability section is what separates an experiment from a guess. PMs who run this prompt before kicking off an A/B test find their tests sharper, their results easier to act on, and their disagreements with growth and data teams shorter. Tested cleanest on Claude Opus 4.7.

Example output

Hypothesis: Replacing the 'Add to cart' button label with 'Buy now' will increase the cart-add rate by at least 8% on the product detail page, because the 'Buy now' framing reduces the perceived commitment cost (no separate checkout step implied).

Experiment design:
- Audience: 50/50 split of US visitors to product detail pages.
- Duration: 14 days minimum, or until 4,000 visitors per arm (whichever comes later).
- Exclusions: returning logged-in customers (already past the perceived-commitment-cost barrier).

Success criteria:
- Primary: cart-add rate increase >= 8%, p < 0.05.
- Secondary: no decrease in checkout-completion rate (sanity check).
- Guardrail: refund rate within 1% of baseline (lower-friction click could mean lower-intent buyers).

Expected lift: 8-12%. Reasoning: similar label changes in published e-commerce A/B test corpora show 5-15% lifts; our audience is closer to the higher end because our 'Add to cart' copy is unusually formal.

What would falsify: cart-add rate lift < 3% (no detectable effect), OR checkout-completion drops 5%+ (we shifted clicks but lost intent), OR refund rate climbs 2%+ (we're winning bad customers).
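
A sample-size threshold like the one in the experiment design can be sanity-checked with a standard two-proportion power calculation. The sketch below is a minimal version using only the Python standard library. The 10% baseline cart-add rate, the 80% power target, and the function name visitors_per_arm are assumptions for illustration; only the 8% lift and the 0.05 significance level come from the example.

```python
import math
from statistics import NormalDist


def visitors_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate                        # control conversion rate
    p2 = baseline_rate * (1 + relative_lift)  # variant rate if the hypothesis holds
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical baseline: 10% of visitors add to cart. The 8% relative lift
# is the minimum effect stated in the hypothesis.
n = visitors_per_arm(baseline_rate=0.10, relative_lift=0.08)
print(n)
```

Under these assumed inputs the calculation lands at roughly 23,000 visitors per arm, well above 4,000, so it is worth rerunning with your real baseline rate before committing to a threshold.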

Common mistakes

First, don't write a hypothesis that says 'we'll see if X has any effect.' That's not a hypothesis; it's a survey. Force a directional prediction with a magnitude. Second, don't skip the guardrail metrics. Many A/B tests 'win' on the primary metric while breaking something downstream that the team only notices three weeks later. The guardrail is cheap insurance. Third, don't state an 'expected lift' without a reasoning trail. If you can't say why you expect X%, you're guessing, and you'll have no way to update your priors when the test result comes back.
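
One way to keep a primary-metric 'win' from hiding downstream damage is to write the decision rule down before the test starts. The sketch below is a hypothetical encoding in Python, not part of the prompt itself. The function name, the argument names, and the reading of every threshold as a relative change against control are assumptions; the numbers are borrowed from the example output above.

```python
def verdict(cart_add_lift, checkout_change, refund_change, p_value):
    """Classify a finished test. Changes are relative to control (0.08 means +8%)."""
    # Falsifiers are checked first so a primary-metric win cannot mask them.
    if checkout_change <= -0.05:
        return "falsified: shifted clicks but lost intent"
    if refund_change >= 0.02:
        return "falsified: winning bad customers"
    if cart_add_lift < 0.03:
        return "falsified: no detectable effect"
    # Support requires the primary threshold, the secondary sanity check,
    # and the guardrail to hold at the same time.
    primary_ok = cart_add_lift >= 0.08 and p_value < 0.05
    secondary_ok = checkout_change >= 0
    guardrail_ok = refund_change <= 0.01
    if primary_ok and secondary_ok and guardrail_ok:
        return "supported"
    return "inconclusive"


# A 12% lift in cart adds that costs 6% of checkout completions is not a win.
print(verdict(cart_add_lift=0.12, checkout_change=-0.06, refund_change=0.0, p_value=0.01))
```

Results that land between the falsification and success thresholds come back as inconclusive, which keeps a 5% lift from being argued into either a win or a loss after the fact.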

Curated by Ivan Terechin
