Introduction to A/B testing with Python.

From the Data Science from Scratch book.

Libraries and helper functions

import math as m
from typing import Tuple

def normal_probability_below(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2

def normal_probability_above(lo: float, mu: float = 0, sigma: float = 1) -> float:
    return 1 - normal_probability_below(lo, mu, sigma)

def two_sided_p_value(x: float, mu: float = 0, sigma: float = 1) -> float:
    """Return the probability of observing a value at least as extreme as `x`,
    assuming the values come from a normal distribution with mean `mu` and
    standard deviation `sigma`.
    """
    # If x is at or above the mean, the tail is everything above x
    if x >= mu:
        return 2 * normal_probability_above(x, mu, sigma)
    # If x is below the mean, the tail is everything below x
    else:
        return 2 * normal_probability_below(x, mu, sigma)
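
As a sanity check on the helpers above, the erf-based formula can be compared against `statistics.NormalDist` from the standard library. This is only a sketch: the name `normal_cdf_erf` is introduced here for illustration and simply repeats the formula used in `normal_probability_below`.

```python
import math
from statistics import NormalDist

def normal_cdf_erf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    # Same erf-based formula as normal_probability_below (illustrative copy)
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

# The hand-rolled CDF should agree with the standard library implementation
for x in (-1.96, 0.0, 1.96):
    assert math.isclose(normal_cdf_erf(x), NormalDist().cdf(x), abs_tol=1e-12)

# Two-sided p-value for the familiar z = 1.96 cutoff
p_two_sided = 2 * (1 - NormalDist().cdf(1.96))
print(round(p_two_sided, 3))  # 0.05
```

If the two implementations agree at a few points, the p-values computed later can be trusted to the same degree as the standard library's normal CDF.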

A/B test

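def estimate_parameters(N: int, true: int) -> Tuple[float, float]:
    """Estimate the success probability and its standard error
    from `true` successes out of `N` trials."""
    p = true / N
    sigma = m.sqrt(p * (1 - p) / N)
    return p, sigma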

$H_0$: $p_a$ and $p_b$ are the same.

Equivalently, $p_a - p_b = 0$.
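
To make the hypothesis concrete, the test statistic can be written out as below. This is a sketch under the usual normal approximation to the binomial, assuming the two groups are independent and the samples are large. The hats and the symbols $n_a$, $N_a$ (successes and trials in group $a$, and likewise for $b$) are notation introduced here for illustration.

$$
\hat{p}_a = \frac{n_a}{N_a}, \qquad
\hat{\sigma}_a = \sqrt{\frac{\hat{p}_a (1 - \hat{p}_a)}{N_a}}, \qquad
\hat{p}_b = \frac{n_b}{N_b}, \qquad
\hat{\sigma}_b = \sqrt{\frac{\hat{p}_b (1 - \hat{p}_b)}{N_b}}
$$

$$
z = \frac{\hat{p}_b - \hat{p}_a}{\sqrt{\hat{\sigma}_a^2 + \hat{\sigma}_b^2}}
$$

Under $H_0$ the difference $\hat{p}_b - \hat{p}_a$ has mean $0$, so $z$ is approximately standard normal and can be passed directly to `two_sided_p_value` with the default `mu` and `sigma`.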

def a_b_test_statistic(N_A: int, A: int, N_B: int, B: int) -> float:
    p_A, sigma_A = estimate_parameters(N_A, A)
    p_B, sigma_B = estimate_parameters(N_B, B)

    # Difference of the estimated rates, scaled by its standard error
    return (p_B - p_A) / m.sqrt(sigma_A ** 2 + sigma_B ** 2)

Example 1

z = a_b_test_statistic(1000, 200, 1000, 180)
z, two_sided_p_value(z)
(-1.1403464899034472, 0.2541419765422359)

That is, assuming the two probabilities are the same, the probability of observing a difference at least this large is ~0.25.
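
One way to build intuition for this p-value is a small simulation. The sketch below assumes both groups share a single success rate of 0.19 (the pooled rate from Example 1, chosen here as an assumption) and counts how often a difference of at least 0.02 shows up by chance. The function name `simulate_p_value`, the trial count, and the seed are all illustrative choices.

```python
import random

def simulate_p_value(n_a: int, n_b: int, p_shared: float,
                     observed_diff: float, trials: int = 2000,
                     seed: int = 0) -> float:
    """Estimate P(|p_B - p_A| >= observed_diff) when both groups share p_shared."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw the number of successes in each group under the shared rate
        conv_a = sum(rng.random() < p_shared for _ in range(n_a))
        conv_b = sum(rng.random() < p_shared for _ in range(n_b))
        # Small tolerance guards against floating-point error at the boundary
        if abs(conv_b / n_b - conv_a / n_a) >= observed_diff - 1e-12:
            hits += 1
    return hits / trials

estimate = simulate_p_value(1000, 1000, p_shared=0.19, observed_diff=0.02)
print(estimate)
```

The simulated fraction should land in the neighbourhood of the analytic value, though not exactly on it, because the counts are discrete and the number of trials is finite.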

Example 2

Let's decrease the occurrence of $b$ even more.

z = a_b_test_statistic(1000, 200, 1000, 150)
z, two_sided_p_value(z)
(-2.948839123097944, 0.003189699706216853)

That is, if the two probabilities are the same, the probability of observing a difference at least this large is ~0.003.
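
To turn the two p-values into a decision, they can be compared against a significance level. The sketch below assumes a level of 0.05, which the text above does not specify. The helper `z_and_p` is a hypothetical, self-contained restatement of the statistic and p-value computed earlier.

```python
import math

def z_and_p(n_a: int, conv_a: int, n_b: int, conv_b: int):
    """Recompute the test statistic and its two-sided p-value in one place."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    var_a = p_a * (1 - p_a) / n_a
    var_b = p_b * (1 - p_b) / n_b
    z = (p_b - p_a) / math.sqrt(var_a + var_b)
    # For a standard normal, 2 * P(Z >= |z|) equals erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # assumed significance level, not taken from the text

for label, args in [("Example 1", (1000, 200, 1000, 180)),
                    ("Example 2", (1000, 200, 1000, 150))]:
    z, p = z_and_p(*args)
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{label}: z = {z:.3f}, p = {p:.4f} -> {verdict}")
```

At this level, Example 1 does not give enough evidence to reject $H_0$, while Example 2 does.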