CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Category Archives: Avrim

Lecture #18: Oblivious routing on a hypercube

In celebration of Les Valiant’s winning of the Turing award, today we discussed a classic result of his on the problem of oblivious routing on the hypercube. (Valiant, “A scheme for fast parallel communication”, SIAM J. Computing 1982, and Valiant and Brebner, “Universal schemes for parallel communication”, STOC 1981). The high-level idea is that rather than routing immediately to your final destination, go to a random intermediate point first. The analysis is really beautiful — you define the right quantities and it just magically works out nicely. See today’s class notes. We also discussed the Butterfly and Benes networks as well. (See the excellent notes of Satish Rao for more on them).

At the end, we also briefly discussed Martingales: their definition, Azuma’s inequality, and McDiarmid’s inequality (which doesn’t talk about Martingales directly but is very convenient and can be proven using Azuma). The discussion at the end was extremely rushed but the key point was: suppose you have a complicated random variable \phi you care about like “the running time of my randomized algorithm” where the random variable is a function of a series of random choices z_1, z_2, .... Then the sequence X_0, X_1, \ldots where X_i = E[\phi | z_1, \ldots, z_i] is a Martingale. E.g., the expected running time of quicksort given that the first i-1 pivots are z_1, \ldots z_{i-1} is the expected value, over the possible choices of z_i of the expected running time of quicksort given that the first i pivots are z_1, \ldots, z_i.

Lecture #13: Learning Theory II

Today we talked about online learning.  We discussed the Weighted Majority and Randomized Weighted Majority algorithms for the problem of “combining expert advice”, showing for instance that the RWM algorithm satisfies the bound E[\# mistakes] \leq (1+\epsilon)OPT + \frac{1}{\epsilon}\log n, where n is the number of “experts” and  OPT is the number of mistakes of the best expert in hindsight.  Also, this can be used when the experts are not predictors but rather just different options (like whether to play Rock, Paper, or Scissors in the Rock-Paper-Scissors game).  In this case, “# mistakes” becomes “total cost” and all costs are scaled to be in the range [0,1] each round.

We then discussed the “multiarmed bandit” problem, which is like the experts problem except you only find out the payoff for the expert you chose and not for those you didn’t choose.  For motivation, we discussed this in the context of the problem of selling lemonade to an online series of buyers, where the “experts” correspond to different possible prices you might choose for selling your lemonade.  We then went through an analysis of the EXP3 algorithm (though we did a simpler version of the analysis that gets a T^{2/3} dependence on T in the regret bound rather than the optimal T^{1/2}).

See the lecture notes (2nd half)



Lecture #12: Learning Theory 1

Today we talked about the problem of learning a target function from examples, where examples are drawn from some distribution D, and the goal is to use a labeled sample S (a set of examples drawn from D and labeled by the target f) to produce a function h such that Pr_{x \sim D}[h(x)\neq f(x)] is low. We gave a simple efficient algorithm for learning decision-lists in this setting, a basic “Occam’s razor” bound, and then a more interesting bound using the notion of shatter coefficients and a “ghost sample” argument. See 1st half of these lecture notes.

A few additional comments:

  • One way to interpret the basic Occam bound is that in principle, anything you can represent in a polynomial number of bits you can learn from a polynomial number of examples (if running time is not a concern). Also “data compression implies learning”: if you can take a set of m examples and find a prediction rule that is correct on the sample and requires < m/10 bits to write down, then you can be confident it will have low error on future points.
  • On the other hand, we would really like to learn from as few examples as possible, which is the reason for wanting bounds based on more powerful notions of the “underlying complexity” of the target function, such as shatter coefficients. Other very interesting bounds are based on a notion called “Rademacher complexity” which is even tighter.
  • For more info, see notes for 15-859(B) machine learning theory

Lecture #11: Online algorithms

Today we discussed randomized online algorithms, and in particular, algorithms for the ski-rental (elevator-or-stairs) and paging problems. See lecture notes as well as Chapter 13 of the MR book. Also, Claire Mathieu has a very nice set of notes on the randomized ski-rental problem.