CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Belated Notes on Lecture #2

(Sorry for the delayed notes: it made sense to wait until we’d seen Chernoff bounds before I posted these. — AG)

We saw in Lecture #2 that the expected number of comparisons for the randomized version of quicksort is at most ${2nH_n}$: the proof went by defining just the right random variables ${X_{ij}}$ (the indicator for whether ${i}$ and ${j}$ are ever compared), computing ${E[X_{ij}] = 2/(j-i+1)}$, and concluding that the expected number of comparisons is ${E[\sum_{i < j} X_{ij}] \leq 2nH_n}$.

However, it often helps to get a quick ballpark estimate before going for the sharpest analysis: here's a quick-and-dirty way to see that the number of comparisons for Quicksort is at most ${c n \log n}$ with probability at least ${1 - 1/n^{c'}}$, for some constant ${c'}$ that depends on ${c}$. Not only does this imply that the expected number of comparisons is ${O(n \log n)}$ (why?), but also that the number of comparisons is very unlikely to be much larger than its expectation.
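Before diving in, here's a quick empirical sketch (in Python; the helper name and the choices of ${n}$ and trial count are ours, not from the lecture) that counts the comparisons made by randomized quicksort, charging ${m-1}$ comparisons per partition of a size-${m}$ set as in the analysis:

```python
import random


def quicksort_comparisons(arr):
    """Comparisons made by randomized quicksort on distinct elements,
    charging m - 1 comparisons per partition of a size-m set (the
    quantity bounded by 2 n H_n in the lecture; name is ours)."""
    if len(arr) <= 1:
        return 0
    pivot = random.choice(arr)
    left = [x for x in arr if x < pivot]   # elements below the pivot
    right = [x for x in arr if x > pivot]  # elements above the pivot
    return (len(arr) - 1) + quicksort_comparisons(left) + quicksort_comparisons(right)


n, trials = 1000, 50
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
bound = 2 * n * sum(1 / k for k in range(1, n + 1))  # 2 n H_n
print(f"average comparisons: {avg:.0f}, upper bound 2nH_n: {bound:.0f}")
```

Running this, the empirical average sits comfortably below the ${2nH_n}$ bound.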

Here’s the theorem for today:

Lemma 1 With probability at least ${1 - n^{-2}}$, the number of levels of recursion in Quicksort is at most ${28 \ln n}$.

Proof:
First, let level ${1}$ comparisons be those done when we partition the entire input into two parts, level ${2}$ comparisons be those when we partition these two parts into two parts each, and so on. Note that the number of comparisons made in level ${i}$ is at most ${n}$, so it suffices to show that the number of levels before all the parts have only a single element each is ${O(\log n)}$ with high probability.

Let’s prove this claim. Consider a single element ${e}$, and let ${S_i}$ be the set containing ${e}$ at level ${i}$; note that ${S_1}$ is the entire input. Call level ${i}$ ${e}$-good if either ${|S_i| = 1}$, or the level-${i}$ pivot falls in the middle half of ${S_i}$ and hence causes ${|S_{i+1}| \leq \frac{3}{4}|S_i|}$. Since we pick a pivot uniformly at random, the chance that any level ${i}$ is ${e}$-good is at least ${1/2}$, independently of all other levels. Hence the probability of seeing at least ${t}$ ${e}$-good levels within the top ${\tau}$ levels is at least the probability of seeing ${t}$ heads in ${\tau}$ flips of an unbiased coin. (I.e., the number of ${e}$-good levels stochastically dominates a Binomial(${\tau, 1/2}$) random variable.)

Note that if there are at least ${3.5 \ln n \geq \log_{4/3} n}$ ${e}$-good levels, then since each good level shrinks the set containing ${e}$ to at most ${3/4}$ of its size, the element ${e}$ must end up in a set of unit size. Setting ${\tau = 28 \ln n}$, the chance of seeing fewer than ${3.5 \ln n}$ good levels is at most the chance of seeing fewer than ${3.5 \ln n}$ heads when we expect to see ${\mu = 14 \ln n}$ heads. By a multiplicative Chernoff bound on the lower tail (with ${\delta = 3/4}$), the chance of this is at most

$\displaystyle \exp(-\frac{(3/4)^2 \times 14 \ln n}{2}) \leq n^{-3}$

Thus the branch of the recursion containing the element ${e}$ terminates before ${28 \ln n}$ levels except with probability ${1/n^3}$. There are ${n}$ possible elements, and we can take a trivial union bound over these ${n}$ elements: with probability at least ${1 - 1/n^2}$, the Quicksort recursion must have bottomed out before ${28 \ln n}$ levels.
$\Box$
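The constants in the proof can be sanity-checked with a few lines of throwaway arithmetic (the value ${n = 10^6}$ below is picked arbitrarily):

```python
import math

# Sanity check of the constants in Lemma 1 (n = 10**6 is arbitrary).
n = 10**6

# 3.5 ln n good levels suffice: 3.5 >= 1/ln(4/3) = 3.476..., so 3.5 ln n >= log_{4/3} n.
assert 3.5 * math.log(n) >= math.log(n) / math.log(4 / 3)

# Chernoff lower tail with delta = 3/4 and mu = 14 ln n:
# exp(-delta^2 * mu / 2) = n^{-(9/16)*7} = n^{-63/16}, and 63/16 >= 3 gives <= n^{-3}.
exponent = (3 / 4) ** 2 * 14 / 2
print(exponent)  # 3.9375
```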

Note that since each level makes at most ${n}$ comparisons, this shows the total number of comparisons is bounded by ${28 n \ln n}$ with probability at least ${1 - 1/n^2}$.
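One can also watch the lemma in action: only the sizes of the subproblems matter for the recursion depth, so we can simulate sizes directly instead of sorting real data (a rough sketch; the function name and the choices of ${n}$ and run count are ours):

```python
import math
import random


def recursion_depth(size):
    """Depth of randomized quicksort's recursion tree on `size`
    distinct elements, simulated via subproblem sizes alone."""
    if size <= 1:
        return 0
    k = random.randrange(size)  # uniform pivot leaves parts of sizes k and size-1-k
    return 1 + max(recursion_depth(k), recursion_depth(size - 1 - k))


n = 2000
depths = [recursion_depth(n) for _ in range(100)]
print("max depth over 100 runs:", max(depths), "  28 ln n ~", round(28 * math.log(n)))
```

In practice the observed depth is far below ${28 \ln n}$; the constant 28 is chosen for a clean proof, not for tightness.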

But what about the expected number of comparisons? One can bound this in several ways. Here’s one: with probability ${1 - 1/n^2}$ the “good event” happens and the number of comparisons is at most ${28 n \ln n}$. If the good event does not happen (w.p. ${1/n^2}$), Quicksort makes at most ${{n \choose 2}}$ comparisons in the worst case, and hence the expected number of comparisons is at most

$\displaystyle 28 n \ln n + \frac{1}{n^2} {n \choose 2} \leq 28 n \ln n + 1.$

This is much worse than the bound of ${\mu = 2nH_n}$ we showed in class, but it has two advantages: (a) it was quick and dirty, as promised, and (b) it also shows a concentration result—Quicksort is unlikely to do many more comparisons than you’d expect.
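To see concretely how loose the quick-and-dirty bound is, here's a small numeric comparison (with ${n = 10^6}$ chosen arbitrarily):

```python
import math

# Compare the quick-and-dirty bound 28 n ln n + 1 against mu = 2 n H_n.
n = 10**6
H_n = sum(1 / k for k in range(1, n + 1))  # harmonic number H_n
quick = 28 * n * math.log(n) + 1
mu = 2 * n * H_n
print(f"quick/mu = {quick / mu:.1f}")  # about 14, since H_n = ln n + O(1)
```

So the ballpark bound is off by roughly a factor of 14, the price of the crude union bound and Chernoff constants.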

In fact, one can prove much better bounds: a result of McDiarmid and Hayward from SODA 1992 shows that if ${Q}$ is the r.v. denoting the number of comparisons made by Quicksort, then ${Q}$ falls outside the interval ${(1 \pm \epsilon)\, 2nH_n}$ with probability

$\displaystyle n^{-2\epsilon(\ln \ln n + O(\ln \ln \ln n))}.$

That’s pretty well-concentrated! (One class project could be to understand this paper, and the concentration results they use.)