CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Lecture #8: Oh, and one more thing

I forgot to mention something about the two-choices paradigm: recall from HW #2 that if you throw {m} balls into {n} bins randomly and {m \gg n}, the maximum load is about {\frac{m}{n} + O(\sqrt{\frac{m}{n} \log n})}. In fact, you can show that this deviation term is essentially tight: with high probability, the most heavily loaded bin will indeed be about {O(\sqrt{\frac{m}{n} \log n})} above the average.

On the other hand, if you throw {m} balls into {n} bins using the two-choices scheme, then you can show that the maximum load is about {\frac{m}{n} + \log \log n + O(1)} with high probability. So not only do we gain in the low-load case (when {m \approx n}), we also get more control over the deviations in the high-load case (when {m \gg n}): the additive gap between the average and maximum loads is now independent of the number of balls! The proofs require new ideas: check out the paper Balanced Allocations: The Heavily Loaded Case by Berenbrink, Czumaj, Steger and Vöcking for more details.
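To see the contrast for yourself, here is a small simulation sketch of the {d}-choice process (not part of the lecture; the function name and parameters are mine): each ball samples {d} bins uniformly at random and goes into the least loaded of them. Setting {d = 1} gives the plain random-throwing process, and {d = 2} gives the two-choices process.

```python
import random

def max_load(m, n, d, rng=random):
    """Throw m balls into n bins; each ball samples d bins uniformly
    at random and is placed in the least loaded of them.
    Returns the maximum load over all bins."""
    bins = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(d)]
        best = min(candidates, key=lambda i: bins[i])  # least loaded candidate
        bins[best] += 1
    return max(bins)

if __name__ == "__main__":
    random.seed(0)
    m, n = 100_000, 1_000
    # Expect roughly m/n + O(sqrt((m/n) log n)) for d = 1,
    # and roughly m/n + log log n + O(1) for d = 2.
    print("one choice :", max_load(m, n, 1))
    print("two choices:", max_load(m, n, 2))
```

Running this for, say, {m = 100{,}000} and {n = 1000} (average load {100}) should show the two-choice maximum hugging the average much more tightly than the one-choice maximum.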

Here is a recent paper of Peres, Talwar and Wieder that gives an analysis of the {(1 + \epsilon)}-choice process (where you invoke the two-choices paradigm only on an {\epsilon} fraction of the balls). It also points to more recent work in the area (weighted balls, weighted bins, etc.), in case you’re interested.
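A quick way to get a feel for this process is to simulate it; the sketch below (my own, not from the paper) flips an {\epsilon}-biased coin for each ball to decide whether that ball gets one choice or two:

```python
import random

def max_load_eps(m, n, eps, rng=random):
    """(1+eps)-choice process: each ball independently uses two
    random choices with probability eps, and a single uniformly
    random bin otherwise. Returns the maximum load."""
    bins = [0] * n
    for _ in range(m):
        if rng.random() < eps:
            i, j = rng.randrange(n), rng.randrange(n)
            best = i if bins[i] <= bins[j] else j  # less loaded of the two
        else:
            best = rng.randrange(n)  # plain one-choice throw
        bins[best] += 1
    return max(bins)
```

Even a small constant {\epsilon} already pulls the maximum load noticeably below the one-choice behavior, which is the phenomenon the paper quantifies.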
