Bandit Algorithm for Testing

A/B testing arrangement
  1. To deal with this we need to handle the exploit/explore dilemma: exploit means sending traffic to the option that currently looks best, while explore means continuing to try the other options so we keep learning about them.
  2. A Bandit algorithm setup is shown below
  1. There are various variants of Bandit algorithms, such as Epsilon Greedy and Upper Confidence Bound (UCB).
    In Epsilon Greedy, most traffic is diverted to the variant that currently looks best (exploit), while a small fraction epsilon of the traffic is spread across the other variants (explore). Metrics are captured for every variant and fed back into the system.
    Based on these metrics, the epsilon value can be adjusted over time, so the share of traffic each variant receives may change as the experiment runs.
  2. The main benefit of a Bandit algorithm over classical A/B testing is that it adapts the traffic split while the test is still running.
  3. Earn while you learn — a bandit shifts traffic toward the better variant during the experiment itself, so fewer conversions are lost while waiting for a final result.
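The Epsilon Greedy loop described above can be sketched as below. This is a minimal illustration, not code from the article: the two-variant setup, the conversion rates, and the fixed epsilon of 0.1 are assumptions chosen for the simulation.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit: with probability epsilon show a random
    variant (explore); otherwise show the variant with the best
    observed mean reward (exploit)."""

    def __init__(self, n_variants, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_variants    # times each variant was shown
        self.values = [0.0] * n_variants  # running mean reward per variant

    def select_variant(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        # exploit: variant with highest observed mean
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def update(self, variant, reward):
        # Feed the captured metric back into the system (running mean).
        self.counts[variant] += 1
        n = self.counts[variant]
        self.values[variant] += (reward - self.values[variant]) / n

# Simulated traffic (hypothetical rates): variant 1 converts at 15%,
# variant 0 at 5%. Rewards are 1 for a conversion, 0 otherwise.
random.seed(42)
true_rates = [0.05, 0.15]
bandit = EpsilonGreedy(n_variants=2, epsilon=0.1)
for _ in range(5000):
    v = bandit.select_variant()
    bandit.update(v, 1.0 if random.random() < true_rates[v] else 0.0)
```

After the loop, most of the 5000 impressions should have gone to the better variant — the "earn while you learn" effect, since those impressions converted at the higher rate during the test itself.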
