Bandit Problems

Bandit problems are a simple RL scenario that is useful to think about before exploring more complicated ones. In a multi-armed bandit problem, the agent chooses one of N actions at each time step. Each action gives either a deterministic reward or a reward sampled from an underlying distribution.
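To make the setup concrete, here is a minimal sketch of such an environment. The class name `GaussianBandit`, its parameters, and the choice of Gaussian rewards are my assumptions for illustration; the description above does not fix a particular distribution.

```python
import random


class GaussianBandit:
    """Hypothetical N-armed bandit: each arm pays a reward around a fixed mean."""

    def __init__(self, n_arms=10, noise_std=1.0, seed=None):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        # True mean reward of each arm (assumed standard normal), hidden from the agent.
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]

    def pull(self, arm):
        """Take action `arm` and return its reward."""
        if self.noise_std == 0:
            # Deterministic case: the arm always pays exactly its mean.
            return self.means[arm]
        # Stochastic case: the reward is sampled around the arm's mean.
        return self.rng.gauss(self.means[arm], self.noise_std)
```

Setting `noise_std=0` gives the deterministic variant, while any positive value gives rewards drawn from a distribution, so one class covers both cases mentioned above.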

See the end of this notebook, where I show the average returns for a 10-armed bandit problem using the greedy vs. epsilon-greedy approaches.
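For readers who want to try the comparison directly, the following is a rough sketch of how it could be set up. It is not the code from the notebook: the function names, the Gaussian rewards, the sample-average value estimates, and the default of 1000 steps are all assumptions made for illustration.

```python
import random


def run_bandit(epsilon, n_arms=10, steps=1000, seed=0):
    """Average reward per step of an epsilon-greedy agent on one random bandit."""
    rng = random.Random(seed)
    # Assumed setup: true arm means and reward noise are both standard normal.
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    estimates = [0.0] * n_arms  # current estimate of each arm's value
    counts = [0] * n_arms       # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate, ties broken randomly.
            best = max(estimates)
            arm = rng.choice([a for a, q in enumerate(estimates) if q == best])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample-average estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


def average_return(epsilon, runs=200, **kwargs):
    """Average the per-step reward over several independently seeded bandits."""
    return sum(run_bandit(epsilon, seed=s, **kwargs) for s in range(runs)) / runs


if __name__ == "__main__":
    print("greedy (epsilon=0):        ", average_return(0.0))
    print("epsilon-greedy (epsilon=0.1):", average_return(0.1))
```

Setting `epsilon=0` recovers the purely greedy agent, so the same function covers both approaches and the only thing that differs between the two printed numbers is the amount of exploration.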