Sometimes we would like to find the $z$ values belonging to a particular probability. For this we use an inverse cumulative distribution function.

Because the cumulative distribution function is strictly increasing and continuous, we can find the $z$ by narrowing down with binary bisecting the distance between high and low $z$ values.

The code is based upon the respective example from Data Science from Scratch.

Libraries, helper functions

import math as m
import numpy as np
def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2
def to_string(number: float, column_width: int = 20) -> str:
    # return str(number).ljust(column_width)
    return f"{str(number): <{column_width}}"

Inverse normal CDF

The inverse normal cumulative distribution function.

Takes a probability and optionaly a mean and standard deviation of the distribution, and a tolerance value.

  1. If the distribution is not standard normal, it calculates $z$ for the standard normal one and then rescales the result

  2. Take a very low and a very high $z$ value

  3. While the distance between the high and low $z$ values are greater than our tolerance

    a. Take their midpoint
    b. Calculate the probability belonging to that midpoint
    c. Based on the position of the probability belonging of the midpoint in respect to the probabilty for which we try to find out the $z$ value, assign the midpoint to the low or high $z$ value

def calc_inverse_normal_cdf(p: float, mu:float = 0, sigma: float = 1, tolerance: float = 1E-5, show_steps=False) -> float:

    # In case it is not a standard normal distribution, calculate the standard normal first and then rescale
    if mu != 0 or sigma != 1:
        return mu + sigma * calc_inverse_normal_cdf(p, tolerance=tolerance)

    low_z = -10
    hi_z = 10

    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        mid_p = calc_normal_cdf(mid_z)

        if mid_p < p:
            low_z = mid_z
        else:
            hi_z = mid_z

        if show_steps: print("\t".join(map(to_string, [low_z, mid_z, hi_z])))

    return mid_z

We can examine the steps as the $z$ closes down to its final value.

calc_inverse_normal_cdf(p=.3, show_steps=True)
-10                 	0.0                 	0.0                 
-5.0                	-5.0                	0.0                 
-2.5                	-2.5                	0.0                 
-1.25               	-1.25               	0.0                 
-0.625              	-0.625              	0.0                 
-0.625              	-0.3125             	-0.3125             
-0.625              	-0.46875            	-0.46875            
-0.546875           	-0.546875           	-0.46875            
-0.546875           	-0.5078125          	-0.5078125          
-0.52734375         	-0.52734375         	-0.5078125          
-0.52734375         	-0.517578125        	-0.517578125        
-0.52734375         	-0.5224609375       	-0.5224609375       
-0.52490234375      	-0.52490234375      	-0.5224609375       
-0.52490234375      	-0.523681640625     	-0.523681640625     
-0.52490234375      	-0.5242919921875    	-0.5242919921875    
-0.52459716796875   	-0.52459716796875   	-0.5242919921875    
-0.524444580078125  	-0.524444580078125  	-0.5242919921875    
-0.524444580078125  	-0.5243682861328125 	-0.5243682861328125 
-0.5244064331054688 	-0.5244064331054688 	-0.5243682861328125 
-0.5244064331054688 	-0.5243873596191406 	-0.5243873596191406 
-0.5244064331054688 	-0.5243968963623047 	-0.5243968963623047 
-0.5243968963623047

Finally, we get a number of reference values for some different probabilities

for i in np.arange(0, 1.1, .1):
    print(f"{i:.1f}: {calc_inverse_normal_cdf(i) : >20}")
0.0:   -9.999990463256836
0.1:  -1.2815570831298828
0.2:  -0.8416271209716797
0.3:  -0.5243968963623047
0.4:  -0.2533435821533203
0.5: -9.5367431640625e-06
0.6:   0.2533435821533203
0.7:   0.5243968963623047
0.8:   0.8416271209716797
0.9:   1.2815570831298828
1.0:    8.244009017944336