Below is a short script demonstrating the process of p-hacking.

Adapted from the book Data Science from Scratch.

import random
from typing import List


First we define a single experiment consisting of 1000 fair-coin flips (Bernoulli trials with probability 0.5 of heads).

def run_experiment(trials: int) -> List[bool]:
    """Flip a fair coin `trials` times; True = heads."""
    return [random.random() < 0.5 for _ in range(trials)]

experiment = run_experiment(1000)

print("Proportion of heads:", sum(experiment) / len(experiment))
print("First 10 elements:", experiment[:10])

Proportion of heads: 0.51
First 10 elements: [True, True, True, False, False, False, False, True, True, True]


Then we examine whether the outcome of an experiment falls outside the 95% confidence interval around p = 0.5, i.e., whether we would reject the null hypothesis that the coin is fair.

def reject_fairness(experiment: List[bool]) -> bool:
    """Reject the fair-coin hypothesis at the 5% significance level."""
    num_heads = sum(experiment)
    return num_heads < 469 or num_heads > 531

reject_fairness(experiment)

False
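The thresholds 469 and 531 come from the normal approximation to Binomial(1000, 0.5): mean 500, standard deviation sqrt(1000 · 0.5 · 0.5) ≈ 15.8, and a two-sided 95% interval of roughly mean ± 1.96 standard deviations. A quick sketch (my addition, not part of the book's code) recomputing them:

```python
import math

n, p = 1000, 0.5
mu = n * p                          # 500.0
sigma = math.sqrt(n * p * (1 - p))  # ~15.81

# Two-sided 95% interval: mu +/- 1.96 * sigma
lower = mu - 1.96 * sigma  # ~469
upper = mu + 1.96 * sigma  # ~531

print(round(lower), round(upper))  # -> 469 531
```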

We run 1000 independent experiments with the exact same parameters.

random.seed(42)
experiments = [run_experiment(1000) for _ in range(1000)]


Now we can simply pick out the experiments that fall outside the confidence interval.

number_of_unfair = sum(reject_fairness(experiment) for experiment in experiments)

print("Number of experiments 'showing' that the coin is unfair:", number_of_unfair)
print("\nProbabilities:")
print("\t".join([str(sum(experiment) / len(experiment)) for experiment in experiments if reject_fairness(experiment)]))

Number of experiments 'showing' that the coin is unfair: 42

Probabilities:
0.532	0.539	0.535	0.461	0.466	0.539	0.467	0.468	0.54	0.458	0.468	0.463	0.467	0.46	0.461	0.463	0.541	0.464	0.538	0.542	0.461	0.465	0.468	0.538	0.466	0.46	0.468	0.534	0.535	0.468	0.537	0.468	0.535	0.538	0.451	0.537	0.463	0.466	0.46	0.536	0.466	0.467
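The 42 rejections are close to what the test's false-positive rate predicts: by construction, a 95% interval wrongly rejects a truly fair coin about 5% of the time. A sketch (my addition, not from the book) computing the exact rejection probability for a fair coin with the same 469/531 thresholds:

```python
from math import comb

n = 1000
# P(num_heads < 469 or num_heads > 531) for a fair coin, computed exactly
# by summing binomial probabilities over the rejection region.
p_reject = sum(comb(n, k) for k in range(n + 1) if k < 469 or k > 531) / 2 ** n

print(f"Rejection probability: {p_reject:.4f}")  # just under 5%
print(f"Expected rejections in 1000 experiments: {1000 * p_reject:.1f}")
```

So even with a perfectly fair coin, roughly 50 of the 1000 experiments are expected to "show" unfairness, which is exactly what a p-hacker exploits by reporting only those.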