Experiments
[1]:
from prayas import Experiment, OneOptionModel
An experiment provides a simple way to monitor performance continuously and to decide when to stop, based on the maximum potential loss of each variant. We first set up the experiment by defining the model, the baseline, and the measures of interest:
[2]:
e = Experiment("moonshot1")
e.setup = OneOptionModel(["apollo", "gemini"], baseline="apollo")
e.setup.add_measure("revenue", [90, 80])
e.setup.primary_measure = "revenue"
The specification of the experiment is:
[3]:
print(e)
Experiment with a One option model
Variants : apollo, gemini
Baseline : apollo
Measures : conversion, revenue
Primary measure : revenue
Maximum loss threshold: 5
Then we add the data collected on the first two days:
[4]:
e.add_data("20190730", [ 0, 0], [ 186, 180])
e.add_data("20190731", [ 7, 1], [ 714, 652])
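The counts passed to `add_data` in this notebook grow from day to day, which suggests they are running totals rather than per-day increments. If your own logs record per-day increments, a small helper can build the running totals first. This is a minimal sketch under that assumption; `running_totals` and the per-day numbers are hypothetical, chosen so that the totals match the first three days used in this notebook.

```python
# Hypothetical per-day increments: (date, new conversions, new visitors),
# one entry per variant in the order ["apollo", "gemini"].
daily = [
    ("20190730", [0, 0], [186, 180]),
    ("20190731", [7, 1], [528, 472]),
    ("20190801", [6, 4], [519, 489]),
]


def running_totals(rows):
    """Turn per-day increments into cumulative totals per variant."""
    totals = []
    conv_sum = None
    vis_sum = None
    for date, conv, vis in rows:
        # Start from the first day, then add each later day element-wise.
        conv_sum = list(conv) if conv_sum is None else [a + b for a, b in zip(conv_sum, conv)]
        vis_sum = list(vis) if vis_sum is None else [a + b for a, b in zip(vis_sum, vis)]
        totals.append((date, list(conv_sum), list(vis_sum)))
    return totals


cumulative = running_totals(daily)
for date, conv, vis in cumulative:
    print(date, conv, vis)
```

Each resulting tuple has the same shape as the arguments of the `add_data` calls shown above.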
We can compute the daily scoring against the baseline:
[5]:
e.monitor_score_baseline()
[5]:
 | Date | Variant | Measure | ProbabilityToBeBest | ProbabilityToBeatBaseline | UpliftFromBaseline | PotentialLossFromBaseline | MaxUplift | MaxPotentialLoss |
---|---|---|---|---|---|---|---|---|---|
0 | 20190730 | apollo | revenue | 0.52400 | 0.0000 | 0.000000 | 0.000000 | 9.784740 | 47.546597 |
1 | 20190730 | gemini | conversion | 0.50270 | 0.5119 | 3.599060 | 49.631761 | 2.128436 | 48.418039 |
2 | 20190730 | apollo | conversion | 0.49730 | 0.0000 | 0.000000 | 0.000000 | -2.084078 | 50.932533 |
3 | 20190730 | gemini | revenue | 0.47600 | 0.4783 | -7.189932 | 52.351470 | -8.912659 | 52.510509 |
4 | 20190731 | apollo | revenue | 0.98215 | 0.0000 | 0.000000 | 0.000000 | 314.532699 | 0.888369 |
5 | 20190731 | apollo | conversion | 0.97250 | 0.0000 | 0.000000 | 0.000000 | 265.239365 | 1.642050 |
6 | 20190731 | gemini | conversion | 0.02750 | 0.0257 | -72.822973 | 73.236028 | -72.620695 | 72.950166 |
7 | 20190731 | gemini | revenue | 0.01785 | 0.0183 | -75.656182 | 75.981442 | -75.876451 | 75.889115 |
To compare the performance of the variants over time, we suggest looking at the maximum potential loss per variant:
[6]:
e.monitor_plot();
The plot shows the maximum potential loss over time; the dashed line is the maximum loss threshold set for the experiment (here the default, printed as 5 in the specification above, i.e. 0.05 expressed as a percentage).
As the experiment continues, we add more daily data:
[7]:
e.add_data("20190730", [ 0, 0], [ 186, 180])
e.add_data("20190731", [ 7, 1], [ 714, 652])
e.add_data("20190801", [13, 5], [1233, 1141])
e.add_data("20190802", [15, 8], [1744, 1681])
e.add_data("20190803", [21, 13], [2304, 2146])
e.add_data("20190804", [26, 16], [2835, 2719])
e.add_data("20190805", [29, 20], [3275, 3260])
e.add_data("20190806", [36, 23], [3741, 3805])
e.add_data("20190807", [43, 26], [4343, 4354])
e.add_data("20190808", [51, 32], [4863, 4921])
And we continue to investigate the result:
[8]:
e.monitor_plot();
In this example, the loss of the ‘apollo’ variant drops below the threshold after a few days and stays there. We stop an experiment and declare a result once a variant has stayed below the threshold for a given number of consecutive days:
[9]:
e.monitor_decision(days=5)
[9]:
 | Timestamp | Loss apollo | Loss gemini | Runs apollo | Runs gemini | Max loss apollo | Max loss gemini | Stop apollo | Stop gemini |
---|---|---|---|---|---|---|---|---|---|
0 | 20190730 | 48.266853 | 51.491613 | NaN | NaN | NaN | NaN | False | False |
1 | 20190731 | 1.069647 | 75.996100 | NaN | NaN | NaN | NaN | False | False |
2 | 20190801 | 0.717000 | 59.209678 | NaN | NaN | NaN | NaN | False | False |
3 | 20190802 | 1.208869 | 48.692807 | NaN | NaN | NaN | NaN | False | False |
4 | 20190803 | 1.305198 | 40.086982 | 4.0 | 0.0 | 48.266853 | 75.996100 | False | False |
5 | 20190804 | 0.598810 | 42.075643 | 5.0 | 0.0 | 1.305198 | 75.996100 | True | False |
6 | 20190805 | 0.677079 | 37.922837 | 5.0 | 0.0 | 1.305198 | 59.209678 | True | False |
7 | 20190806 | 0.154403 | 43.471300 | 5.0 | 0.0 | 1.305198 | 48.692807 | True | False |
8 | 20190807 | 0.052977 | 45.669064 | 5.0 | 0.0 | 1.305198 | 45.669064 | True | False |
9 | 20190808 | 0.037551 | 44.192609 | 5.0 | 0.0 | 0.677079 | 45.669064 | True | False |
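The columns of the table above are consistent with a rolling-window rule: for each day, look at the last `days` losses, count how many are below the threshold, and stop once all of them are. The following plain-Python sketch reproduces the ‘apollo’ columns under that reading. It is a reconstruction from the table, not the library's implementation; `rolling_stop` is a hypothetical helper, and the strict `<` comparison is an assumption.

```python
# Daily 'Loss apollo' values copied from the table above.
dates = ["20190730", "20190731", "20190801", "20190802", "20190803",
         "20190804", "20190805", "20190806", "20190807", "20190808"]
loss_apollo = [48.266853, 1.069647, 0.717000, 1.208869, 1.305198,
               0.598810, 0.677079, 0.154403, 0.052977, 0.037551]


def rolling_stop(losses, threshold=5.0, days=5):
    """Return one (runs, max_loss, stop) tuple per day.

    runs and max_loss are None until a full window of `days` values exists.
    """
    rows = []
    for i in range(len(losses)):
        if i + 1 < days:
            rows.append((None, None, False))
            continue
        window = losses[i + 1 - days:i + 1]
        runs = sum(1 for x in window if x < threshold)  # days below threshold
        rows.append((runs, max(window), runs == days))
    return rows


rows = rolling_stop(loss_apollo)
first_stop = next(d for d, (_, _, stop) in zip(dates, rows) if stop)
print("first day with a stop signal:", first_stop)
```

A larger `days` value makes the rule more conservative, because a single day above the threshold delays the stop signal until that day has left the window.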
Following this rule, we could have stopped the experiment as early as 2019-08-04. We can access the last model of the experiment and run the same analysis as shown in the other example notebooks:
[10]:
e.result.score_baseline()
[10]:
 | Variant | Measure | ProbabilityToBeBest | ProbabilityToBeatBaseline | UpliftFromBaseline | PotentialLossFromBaseline | MaxUplift | MaxPotentialLoss |
---|---|---|---|---|---|---|---|---|
0 | apollo | revenue | 0.99670 | 0.00000 | 0.000000 | 0.000000 | 79.254505 | 0.037169 |
1 | apollo | conversion | 0.98365 | 0.00000 | 0.000000 | 0.000000 | 59.268571 | 0.186129 |
2 | gemini | conversion | 0.01635 | 0.01495 | -37.287659 | 37.448505 | -37.212973 | 37.387003 |
3 | gemini | revenue | 0.00330 | 0.00380 | -44.270473 | 44.282816 | -44.213397 | 44.089087 |
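To build some intuition for the conversion rows in the table above, here is a minimal Monte Carlo sketch using only the standard library. It rests on several assumptions that this notebook does not confirm: that the two lists passed to `add_data` are cumulative conversions and visitors, that conversion is modelled with a Beta posterior under a uniform prior, and that uplift and potential loss are relative differences in percent. The library may compute these quantities differently, so expect numbers in the same region as the table rather than identical ones.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Final counts from the last add_data call (20190808), read here as
# (conversions, visitors) per variant. This reading is an assumption.
conversions = {"apollo": 51, "gemini": 32}
visitors = {"apollo": 4863, "gemini": 4921}


def posterior_samples(successes, trials, n):
    """Draw n conversion rates from a Beta posterior (uniform prior assumed)."""
    return [random.betavariate(1 + successes, 1 + trials - successes)
            for _ in range(n)]


n = 20000
apollo = posterior_samples(conversions["apollo"], visitors["apollo"], n)
gemini = posterior_samples(conversions["gemini"], visitors["gemini"], n)

# Relative uplift of gemini over the baseline apollo, in percent, per draw.
uplift = [100.0 * (g - a) / a for a, g in zip(apollo, gemini)]

# Share of draws in which gemini converts better than the baseline.
prob_beat_baseline = sum(u > 0 for u in uplift) / n
# Average relative uplift over all draws.
mean_uplift = sum(uplift) / n
# Average shortfall against the baseline, counting only draws where gemini is worse.
potential_loss = sum(max(-u, 0.0) for u in uplift) / n

print(f"P(gemini beats apollo): {prob_beat_baseline:.4f}")
print(f"mean uplift from baseline (%): {mean_uplift:.2f}")
print(f"potential loss from baseline (%): {potential_loss:.2f}")
```

The potential loss can never be smaller than the negated mean uplift, because it ignores the draws in which the variant comes out ahead.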
[11]:
e.result.plot();
[12]:
e.result.decision()
[12]:
 | Variant | Measure | ProbabilityToBeBest | ProbabilityToBeatBaseline | UpliftFromBaseline | PotentialLossFromBaseline | MaxUplift | MaxPotentialLoss |
---|---|---|---|---|---|---|---|---|
0 | apollo | revenue | 0.99595 | 0.0 | 0.0 | 0.0 | 79.510021 | 0.03168 |
Based on the primary measure defined for this experiment, we would go with the variant ‘apollo’.