CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Notes on Lecture #5

A couple of things I remembered about Chernoff bounds when sitting in Avrim’s lecture, which thought I’d post here.

The first thing is a (slightly different) bound that you may find easier to remember. To begin, let us state the multiplicative bounds mentioned in the handout, altered ever so slightly:

Theorem 1

Let {X_1, X_2, \ldots, X_n} be independent {\{0,1\}} variables, and let {X = \sum_i X_i}. Let {\mu = E[X] = \sum_i E[X_i]}. Then for any {\beta \in [0,1]},

\displaystyle  \Pr[ X \leq (1 - \beta) \mu ] \leq \exp( -\frac{\beta^2}{2} \mu)  \ \ \ \ \ (1)


\displaystyle  \Pr[ X \geq (1 + \beta) \mu ] \leq \exp( -\frac{\beta^2}{3} \mu).  \ \ \ \ \ (2)

Moreover, for all {\beta \geq 0},

\displaystyle  \Pr[ X \geq (1 + \beta) \mu ] \leq (e^{\beta}/(1+\beta)^{1 + \beta})^\mu.  \ \ \ \ \ (3)

Now, here’s the version of the general multiplicative bounds I find easiest to remember:

Theorem 2
Let {X_1, X_2, \ldots, X_n} be independent variables taking values in the range {[0,1]}, and let {X = \sum_i X_i}. Let {\mu = E[X] = \sum_i E[X_i]}. Then for any {\beta \in [0,1]},

\displaystyle  \Pr[ X \leq (1 - \beta) \mu ] \leq \exp( -\frac{\beta^2}{2} \mu). \ \ \ \ \ (4)

Moreover, for any {\beta \geq 0}, we have

\displaystyle  \Pr[ X \geq (1 + \beta) \mu ] \leq \exp( -\frac{\beta^2}{2+\beta} \mu). \ \ \ \ \ (5)

Note that we’ve relaxed the assumption that {X_i}‘s are {\{0,1\}}, and are allowing ourselves to lie in {[0,1]}. (This is often super-useful, you may want to go over the proof of Theorem 1 in the book or the handout and think about how you may prove it.)

Moreover, (4) is exactly the same as (1). And if we constrain {\beta \in [0,1]}, the exponent on the right of (5) is at most {-\beta^2 \mu /3}, which is the same as in (2), so (5) is a better bound than (2) for {\beta}‘s in this range. So we’re doing at least as well as the first two tail bounds in Theorem 1.

Cool! So does this subsume (3) as well? Sadly, no. We lose something in the translation. Here’s an example. Let the {X_i}‘s be i.i.d., taking on value {1} with probability {1/n} and {0} otherwise, so that {\mu = 1}. Now, what is the value of {\beta} which gives us a tail probability of {1/n^{10}}? If we set {\beta = c\log n/\log \log n} for suitably large {c} then (3) gives us {1/n^{10}}, but for (5) to fall below {1/n^{10}} we need {\beta = c \log n} for some constant {c}.

However, often the bound from (5) is in the “right ballpark”, and I find it easy to remember. Chances are, if I use a Chernoff bound in lecture, it’ll be the one in Theorem 2.

Secondly, some jargon: these bounds are often called concentration inequalities (since they show that “most of the probability mass is concentrated around the mean”) and tail bounds (they show that the “tails of the distribution are small”) or large deviation bounds (since they show that “large deviations from the mean are unlikely”). There are tons of other tail bounds, some relaxing the requirements of independence, others relaxing the boundedness requirements, others looking at functions other than the sum of the random variables—you’ll see more in the homeworks and future lectures.


Comments are closed.

%d bloggers like this: