Randomized Algorithms, Carnegie Mellon: Spring 2011
Lecture #13: Learning Theory II
Today we talked about online learning. We discussed the Weighted Majority and Randomized Weighted Majority (RWM) algorithms for the problem of “combining expert advice”, showing for instance that the RWM algorithm satisfies the bound $E[\#\text{mistakes}] \le (1+\epsilon)\,OPT + \frac{1}{\epsilon}\log N$, where $N$ is the number of “experts”, $OPT$ is the number of mistakes of the best expert in hindsight, and $\epsilon$ is the penalty parameter of the algorithm. The same approach applies when the experts are not predictors but rather just different options (like whether to play Rock, Paper, or Scissors in the Rock-Paper-Scissors game). In this case, “# mistakes” becomes “total cost”, with all costs scaled to lie in the range [0,1] each round.
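To make the cost version of RWM concrete, here is a minimal sketch in Python. It assumes the common multiplicative update $w_i \leftarrow w_i (1 - \epsilon c_i)$ for a cost $c_i \in [0,1]$, which may differ in details from the variant presented in lecture. The names `rwm`, `rwm_bound`, `cost_rounds`, and `epsilon` are illustrative choices, not taken from the notes.

```python
import math
import random


def rwm(cost_rounds, epsilon=0.1, rng=None):
    """Run Randomized Weighted Majority on a sequence of cost vectors.

    cost_rounds: list of rounds; each round is a list with one cost in [0, 1]
                 per expert (hypothetical input format).
    Returns (expected_cost, realized_cost, final_weights).
    """
    rng = rng or random.Random(0)
    n = len(cost_rounds[0])
    weights = [1.0] * n
    expected_cost = 0.0
    realized_cost = 0.0
    for costs in cost_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Pick an expert with probability proportional to its weight.
        choice = rng.choices(range(n), weights=probs)[0]
        realized_cost += costs[choice]
        expected_cost += sum(p * c for p, c in zip(probs, costs))
        # Full information: every expert's cost is seen, so all are updated.
        weights = [w * (1.0 - epsilon * c) for w, c in zip(weights, costs)]
    return expected_cost, realized_cost, weights


def rwm_bound(cost_rounds, epsilon):
    """(1 + epsilon) * OPT + ln(N) / epsilon, with OPT the best expert's cost."""
    n = len(cost_rounds[0])
    opt = min(sum(r[i] for r in cost_rounds) for i in range(n))
    return (1 + epsilon) * opt + math.log(n) / epsilon
```

For `epsilon` at most 1/2, the expected cost returned by `rwm` should not exceed `rwm_bound` on the same input. The realized cost depends on the random draws and can land on either side of its expectation.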
We then discussed the “multiarmed bandit” problem, which is like the experts problem except that you only find out the payoff for the expert you chose and not for those you didn’t choose. For motivation, we discussed this in the context of selling lemonade to an online series of buyers, where the “experts” correspond to the different prices you might charge for your lemonade. We then went through an analysis of the EXP3 algorithm (though we did a simpler version of the analysis that gets a dependence on $T^{2/3}$ in the regret bound rather than the optimal $T^{1/2}$, where $T$ is the number of rounds).
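The bandit setting can be sketched in the same style. The code below follows the commonly stated form of EXP3 (mix the weight distribution with uniform exploration, then feed an importance-weighted payoff estimate back to the chosen arm only), which may differ from the exact variant analyzed in lecture. The names `exp3`, `reward_fn`, `gamma`, `PRICES`, and `lemonade_reward` are hypothetical, and the toy buyer with a fixed valuation of 0.5 is an assumption made purely for illustration.

```python
import math
import random


def mixed_distribution(weights, gamma):
    """Weight-proportional distribution mixed with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3(num_arms, reward_fn, num_rounds, gamma=0.1, rng=None):
    """Run EXP3 with payoffs in [0, 1]; reward_fn(t, arm) is a hypothetical callback.

    Returns (total_reward, final_probabilities).
    """
    rng = rng or random.Random(0)
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        probs = mixed_distribution(weights, gamma)
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)  # only the chosen arm's payoff is revealed
        total_reward += reward
        # Dividing by the selection probability keeps the estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; this avoids floating-point overflow
        # and leaves the distribution unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, mixed_distribution(weights, gamma)


# Toy lemonade example (assumed): every buyer is willing to pay at most 0.5.
PRICES = [0.25, 0.5, 0.75, 1.0]


def lemonade_reward(t, arm):
    """Revenue from offering PRICES[arm] to a buyer with valuation 0.5."""
    return PRICES[arm] if PRICES[arm] <= 0.5 else 0.0
```

Calling `exp3(len(PRICES), lemonade_reward, 3000)` on this toy instance should concentrate most of the probability on the price 0.5, while each of the other prices keeps at least `gamma / len(PRICES)` probability for exploration.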
See the lecture notes (2nd half)