Experiments

[1]:
from prayas import *

Experiments provide a simple way to continuously monitor the performance of a running experiment and to decide when to stop it, based on the maximum potential loss of the variants. We first set up the experiment by defining the model, the baseline, and the measures of interest:

[2]:
e = Experiment("moonshot1")

e.setup = OneOptionModel(["apollo", "gemini"], baseline="apollo")
e.setup.add_measure("revenue", [90, 80])
e.setup.primary_measure = "revenue"

The specification of the experiment is:

[3]:
print(e)
Experiment with a One option model
Variants              : apollo, gemini
Baseline              : apollo
Measures              : conversion, revenue
Primary measure       : revenue
Maximum loss threshold: 5

Then we add the first daily data we collected:

[4]:
e.add_data("20190730", [ 0,  0], [ 186,  180])
e.add_data("20190731", [ 7,  1], [ 714,  652])

We can compute the daily scoring against the baseline:

[5]:
e.monitor_score_baseline()
[5]:
       Date  Variant     Measure  ProbabilityToBeBest  ProbabilityToBeatBaseline  UpliftFromBaseline  PotentialLossFromBaseline   MaxUplift  MaxPotentialLoss
0  20190730   apollo     revenue              0.52400                     0.0000            0.000000                   0.000000    9.784740         47.546597
1  20190730   gemini  conversion              0.50270                     0.5119            3.599060                  49.631761    2.128436         48.418039
2  20190730   apollo  conversion              0.49730                     0.0000            0.000000                   0.000000   -2.084078         50.932533
3  20190730   gemini     revenue              0.47600                     0.4783           -7.189932                  52.351470   -8.912659         52.510509
4  20190731   apollo     revenue              0.98215                     0.0000            0.000000                   0.000000  314.532699          0.888369
5  20190731   apollo  conversion              0.97250                     0.0000            0.000000                   0.000000  265.239365          1.642050
6  20190731   gemini  conversion              0.02750                     0.0257          -72.822973                  73.236028  -72.620695         72.950166
7  20190731   gemini     revenue              0.01785                     0.0183          -75.656182                  75.981442  -75.876451         75.889115
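
To make columns such as ProbabilityToBeatBaseline and PotentialLossFromBaseline more concrete, here is a minimal sketch using only the Python standard library. It is not the library's implementation. It reads the two lists passed to add_data for 20190731 as conversions and visitors per variant, assumes a Beta(1, 1) prior on the conversion rate, and defines the potential loss as the average relative shortfall against the baseline. All three are assumptions made for illustration, so the numbers will be close to the table above but will not match it exactly.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed reading of the 20190731 data: (conversions, visitors) per variant.
data = {"apollo": (7, 714), "gemini": (1, 652)}
baseline = "apollo"
n_samples = 20000


def posterior_samples(conversions, visitors, n):
    """Draw n conversion rates from a Beta posterior (Beta(1, 1) prior assumed)."""
    alpha = 1 + conversions
    beta = 1 + visitors - conversions
    return [random.betavariate(alpha, beta) for _ in range(n)]


samples = {name: posterior_samples(c, v, n_samples) for name, (c, v) in data.items()}
base = samples[baseline]
challenger = samples["gemini"]

# Share of draws in which gemini's conversion rate exceeds apollo's.
prob_beat_baseline = sum(g > a for g, a in zip(challenger, base)) / n_samples

# Average relative shortfall against the baseline in percent,
# counting a draw as zero whenever gemini is ahead (assumed definition).
potential_loss = 100 * sum(max(a - g, 0) / a for g, a in zip(challenger, base)) / n_samples

print(f"P(gemini beats apollo): {prob_beat_baseline:.4f}")
print(f"Potential loss of gemini vs. apollo: {potential_loss:.1f}%")
```

The qualitative picture matches the 20190731 rows: gemini has only a small chance of beating the baseline, and choosing it carries a large potential loss.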

To compare the performance of the variants over time, we suggest looking at the maximum potential loss per variant:

[6]:
e.monitor_plot();
../_images/notebooks_05-experiments_11_0.png

The plot shows the maximum potential loss over time; the dashed line is the maximum loss threshold set for the experiment (default: 5, as printed in the experiment specification above).

As the experiment continues, we add more daily data:

[7]:
e.add_data("20190730", [ 0,  0], [ 186,  180])
e.add_data("20190731", [ 7,  1], [ 714,  652])
e.add_data("20190801", [13,  5], [1233, 1141])
e.add_data("20190802", [15,  8], [1744, 1681])
e.add_data("20190803", [21, 13], [2304, 2146])
e.add_data("20190804", [26, 16], [2835, 2719])
e.add_data("20190805", [29, 20], [3275, 3260])
e.add_data("20190806", [36, 23], [3741, 3805])
e.add_data("20190807", [43, 26], [4343, 4354])
e.add_data("20190808", [51, 32], [4863, 4921])

And we continue to investigate the result:

[8]:
e.monitor_plot();
../_images/notebooks_05-experiments_15_0.png

In this example, the loss of the ‘apollo’ variant drops below the threshold after a few days and stays there. We stop an experiment and declare a result once a variant has stayed below the threshold for a given number of consecutive days:

[9]:
e.monitor_decision(days=5)
[9]:
   Timestamp  Loss apollo  Loss gemini  Runs apollo  Runs gemini  Max loss apollo  Max loss gemini  Stop apollo  Stop gemini
0   20190730    48.266853    51.491613          NaN          NaN              NaN              NaN        False        False
1   20190731     1.069647    75.996100          NaN          NaN              NaN              NaN        False        False
2   20190801     0.717000    59.209678          NaN          NaN              NaN              NaN        False        False
3   20190802     1.208869    48.692807          NaN          NaN              NaN              NaN        False        False
4   20190803     1.305198    40.086982          4.0          0.0        48.266853        75.996100        False        False
5   20190804     0.598810    42.075643          5.0          0.0         1.305198        75.996100         True        False
6   20190805     0.677079    37.922837          5.0          0.0         1.305198        59.209678         True        False
7   20190806     0.154403    43.471300          5.0          0.0         1.305198        48.692807         True        False
8   20190807     0.052977    45.669064          5.0          0.0         1.305198        45.669064         True        False
9   20190808     0.037551    44.192609          5.0          0.0         0.677079        45.669064         True        False
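
The stopping rule can be sketched in a few lines of plain Python. This is a reconstruction from the table above, not the library's code: it assumes that a variant is flagged once the largest loss within the last `days` observations is below the threshold, which is consistent with the "Max loss" and "Stop" columns shown. The helper name `stop_flags` is hypothetical.

```python
# Daily loss of 'apollo', copied from the "Loss apollo" column above.
dates = ["20190730", "20190731", "20190801", "20190802", "20190803",
         "20190804", "20190805", "20190806", "20190807", "20190808"]
loss_apollo = [48.266853, 1.069647, 0.717000, 1.208869, 1.305198,
               0.598810, 0.677079, 0.154403, 0.052977, 0.037551]

threshold = 5  # maximum loss threshold from the experiment specification
days = 5       # required number of consecutive days below the threshold


def stop_flags(losses, threshold, days):
    """Return one flag per day: True if the last `days` losses are all below `threshold`."""
    flags = []
    for i in range(len(losses)):
        window = losses[max(0, i - days + 1): i + 1]
        # A window shorter than `days` can never trigger a stop.
        flags.append(len(window) == days and max(window) < threshold)
    return flags


flags = stop_flags(loss_apollo, threshold, days)
first_stop = next((d for d, f in zip(dates, flags) if f), None)
print(first_stop)  # prints 20190804
```

On 20190803 the five-day window still contains the first day's loss of 48.266853, so the flag stays False; one day later that value drops out of the window and the flag turns True.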

Following this rule, we could have stopped the experiment as early as 2019-08-04. We can access the last model of the experiment and run the same analysis as shown in the other example notebooks:

[10]:
e.result.score_baseline()
[10]:
   Variant     Measure  ProbabilityToBeBest  ProbabilityToBeatBaseline  UpliftFromBaseline  PotentialLossFromBaseline   MaxUplift  MaxPotentialLoss
0   apollo     revenue              0.99670                    0.00000            0.000000                   0.000000   79.254505          0.037169
1   apollo  conversion              0.98365                    0.00000            0.000000                   0.000000   59.268571          0.186129
2   gemini  conversion              0.01635                    0.01495          -37.287659                  37.448505  -37.212973         37.387003
3   gemini     revenue              0.00330                    0.00380          -44.270473                  44.282816  -44.213397         44.089087
[11]:
e.result.plot();
../_images/notebooks_05-experiments_20_0.png
../_images/notebooks_05-experiments_20_1.png
[12]:
e.result.decision()
[12]:
   Variant  Measure  ProbabilityToBeBest  ProbabilityToBeatBaseline  UpliftFromBaseline  PotentialLossFromBaseline  MaxUplift  MaxPotentialLoss
0   apollo  revenue              0.99595                        0.0                 0.0                        0.0  79.510021           0.03168

Based on the primary measure defined for this experiment, we would go with the variant ‘apollo’.