CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Lecture #8: Oh, and one more thing

I forgot to mention something about the two-choices paradigm. Recall from HW #2 that if you throw ${m}$ balls into ${n}$ bins uniformly at random and ${m \gg n}$, the maximum load is about ${\frac{m}{n} + O(\sqrt{\frac{m}{n} \log n})}$. In fact, you can show that this deviation term is about right: with high probability, the most heavily loaded bin will indeed be ${\Theta(\sqrt{\frac{m}{n} \log n})}$ above the average.
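To see this deviation term empirically, here is a small simulation sketch (not from the lecture). It throws ${m}$ balls into ${n}$ bins one at a time and reports the gap between the maximum and average loads, divided by ${\sqrt{\frac{m}{n} \ln n}}$. If the bound is tight, that ratio should stay roughly constant as ${m/n}$ grows. The function names, the seed, and the parameter values are illustrative choices, not anything from the course.

```python
import math
import random


def one_choice_loads(m: int, n: int, rng: random.Random) -> list[int]:
    """Throw m balls into n bins, each ball into an independent uniformly random bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads


def gap_above_average(loads: list[int], m: int) -> float:
    """Maximum load minus the average load m/n."""
    return max(loads) - m / len(loads)


if __name__ == "__main__":
    rng = random.Random(2011)  # arbitrary fixed seed, for reproducibility only
    n = 1000
    for ratio in (10, 100, 400):  # ratio = m/n, the average load
        m = ratio * n
        gap = gap_above_average(one_choice_loads(m, n, rng), m)
        scale = math.sqrt(ratio * math.log(n))  # sqrt((m/n) ln n), natural log
        print(f"m/n={ratio:4d}  gap={gap:6.1f}  gap/sqrt((m/n) ln n)={gap / scale:.2f}")
```

A single run only gives one sample of the maximum. Averaging over several seeds gives a steadier picture of the ratio.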

On the other hand, if you throw ${m}$ balls into ${n}$ bins using two choices, then you can show that the maximum load is ${\frac{m}{n} + \log \log n + O(1)}$ with high probability. So not only do we gain in the lightly loaded case (when ${m \approx n}$), we also get more control over the deviation in the heavily loaded case (when ${m \gg n}$): the additive gap between the average and the maximum load is now independent of the number of balls! Proving this requires new ideas; see the paper "Balanced Allocations: The Heavily Loaded Case" by Berenbrink, Czumaj, Steger and Vöcking for details.
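A quick way to see the contrast is to run both processes side by side and compare the gap (maximum load minus ${m/n}$) as ${m/n}$ grows. This sketch assumes the standard two-choice rule: sample two bins independently and uniformly (possibly the same bin), and place the ball in the less loaded one. Breaking ties toward the first sample is an arbitrary choice. The helper names and the seed are hypothetical.

```python
import math
import random


def one_choice_loads(m: int, n: int, rng: random.Random) -> list[int]:
    """Each ball goes into one independent uniformly random bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads


def two_choice_loads(m: int, n: int, rng: random.Random) -> list[int]:
    """Each ball samples two bins independently and uniformly (possibly the same bin)
    and joins the less loaded one; ties go to the first sample."""
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        loads[i if loads[i] <= loads[j] else j] += 1
    return loads


def gap(loads: list[int], m: int) -> float:
    """Maximum load minus the average load m/n."""
    return max(loads) - m / len(loads)


if __name__ == "__main__":
    rng = random.Random(8)  # arbitrary fixed seed
    n = 1000
    print(f"n={n}, log2(log2(n)) = {math.log2(math.log2(n)):.2f}")
    for ratio in (1, 10, 100):  # ratio = m/n
        m = ratio * n
        g1 = gap(one_choice_loads(m, n, rng), m)
        g2 = gap(two_choice_loads(m, n, rng), m)
        print(f"m/n={ratio:4d}  one-choice gap={g1:6.1f}  two-choice gap={g2:5.1f}")
```

The one-choice gap should keep growing with ${m/n}$, while the two-choice gap should stay small and roughly flat. That flatness is the independence from ${m}$ claimed above.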

A recent paper of Peres, Talwar and Wieder gives an analysis of the ${(1 + \epsilon)}$-choice process. In this process, each ball independently uses the two-choices rule with probability ${\epsilon}$ and otherwise goes into a single random bin, so only about an ${\epsilon}$ fraction of the balls use two choices. The paper also points to more recent work in the area (weighted balls, weighted bins, etc.), in case you're interested.
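The ${(1 + \epsilon)}$-choice process is easy to simulate. The sketch below interpolates between the two earlier settings: ${\epsilon = 0}$ is plain one-choice and ${\epsilon = 1}$ is plain two-choice. It only illustrates the process and does not reproduce any bounds from the paper. The function names, the seed, and the chosen ${\epsilon}$ values are assumptions made for illustration.

```python
import random


def one_plus_eps_loads(m: int, n: int, eps: float, rng: random.Random) -> list[int]:
    """(1+eps)-choice process: each ball, independently with probability eps,
    samples two uniform bins and joins the less loaded one (ties: first sample);
    otherwise it joins a single uniform bin."""
    loads = [0] * n
    for _ in range(m):
        i = rng.randrange(n)
        if rng.random() < eps:  # random() is in [0, 1): eps=0 never, eps=1 always
            j = rng.randrange(n)
            if loads[j] < loads[i]:
                i = j
        loads[i] += 1
    return loads


def gap(loads: list[int], m: int) -> float:
    """Maximum load minus the average load m/n."""
    return max(loads) - m / len(loads)


if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary fixed seed
    n, m = 1000, 100_000
    for eps in (0.0, 0.1, 0.5, 1.0):
        print(f"eps={eps:.1f}  gap={gap(one_plus_eps_loads(m, n, eps, rng), m):.1f}")
```

Sweeping ${\epsilon}$ and ${m/n}$ together is a cheap way to build intuition for how much of the two-choice benefit survives when only some balls get a second choice.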