CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Belated Notes on Lecture #2

(Sorry for the delayed notes: it made sense to wait until we’d seen Chernoff bounds before I posted these. — AG)

We saw in Lecture #2 that the expected number of comparisons for the randomized version of quicksort is at most {2nH_n}. The proof went by defining just the right random variables {X_{ij}} (the indicator for "do we ever compare {i} and {j}"): one computes {E[X_{ij}] = 2/(j-i+1)}, and hence the expected number of comparisons is {E[\sum_{i < j} X_{ij}] \leq 2nH_n}.
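(A quick empirical check, not in the original notes: the sketch below counts comparisons made by a simple randomized quicksort and compares the average over a few runs against the {2nH_n} bound. The function name and parameters are illustrative, not from the lecture.)

```python
import random

def quicksort_comparisons(arr):
    """Count comparisons made by randomized quicksort on arr."""
    if len(arr) <= 1:
        return 0
    pivot = random.choice(arr)
    less = [x for x in arr if x < pivot]
    greater = [x for x in arr if x > pivot]
    # Partitioning compares every non-pivot element to the pivot once.
    comparisons = len(arr) - 1
    return comparisons + quicksort_comparisons(less) + quicksort_comparisons(greater)

n = 1000
harmonic = sum(1.0 / k for k in range(1, n + 1))  # H_n
trials = 50
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
print(f"average comparisons: {avg:.0f}, bound 2nH_n: {2 * n * harmonic:.0f}")
```

The average should land comfortably below {2nH_n} (the exact expectation is {2(n+1)H_n - 4n}, a bit smaller than the bound).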

However, often it helps to get a quick ballpark estimate before going for the best analysis: here’s a quick-and-dirty way to see that the number of comparisons for Quicksort is {c n \log n} with probability at least {1 - 1/n^{c'}} for some other constant {c'} that depends on {c}. Not only does this imply that the expected number of comparisons is {O(n \log n)} (why?), but also that the number of comparisons is very unlikely to be much larger than the expectation.

Here’s the theorem for today:

Lemma 1 With probability at least {1 - n^{-2}}, the number of levels of recursion in Quicksort is at most {28 \ln n}.

First, let level {1} comparisons be those done when we partition the entire input into two parts, level {2} comparisons be those when we partition these two parts into two parts each, and so on. Note that the number of comparisons made in level {i} is at most {n}, so it suffices to show that the number of levels before all the parts have only a single element each is {O(\log n)} with high probability.

Let’s prove this claim. Consider a single element {e}. Let {S_i} be the set containing {e} at level {i}; note that {S_1} is the entire input. We say level {i} is {e}-good if either {|S_i| = 1}, or the level-{i} pivot falls in the middle half of {S_i} and hence causes {|S_{i+1}|} to be at most {(3/4)|S_i|}. Since we pick a pivot uniformly at random, the chance that any level {i} is {e}-good is at least {1/2}, independent of all other levels. Hence the probability of seeing at least {t} {e}-good levels within the top {\tau} levels is at least the probability of seeing {t} heads in {\tau} flips of an unbiased coin. (I.e., the number of {e}-good levels stochastically dominates a Binomial({\tau, 1/2}) random variable.)
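(Not in the original notes: the following sketch simulates the recursion from the point of view of a single element {e}, tracking only the size of its containing set. A uniformly random pivot splits the set, and {e} falls on one side; the simulation checks that the branch depth stays well below {28 \ln n}. The function name is made up for illustration.)

```python
import math
import random

def levels_for_element(n):
    """Levels of recursion until the set containing a fixed element e
    shrinks to a single element, with uniformly random pivots."""
    size = n
    levels = 0
    while size > 1:
        pivot_rank = random.randrange(size)  # rank of the pivot within the set
        e_rank = random.randrange(size)      # e's rank, uniform by symmetry
        if e_rank == pivot_rank:
            return levels + 1                # e itself was the pivot; done
        # e keeps only its side of the partition.
        size = pivot_rank if e_rank < pivot_rank else size - pivot_rank - 1
        levels += 1
    return levels

n = 10000
depths = [levels_for_element(n) for _ in range(1000)]
print(f"max depth over 1000 trials: {max(depths)}, 28 ln n: {28 * math.log(n):.0f}")
```

In practice the typical depth is close to {2 \ln n}; the {28 \ln n} in the lemma is a loose constant chosen to make the Chernoff calculation clean.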

Note that if there are {3.5 \ln n \geq \log_{4/3} n} {e}-good levels, then since each good level shrinks the set to at most {3/4} of its previous size, the element {e} must lie in a set of unit size. Setting {\tau = 28 \ln n}, the chance of seeing fewer than {3.5 \ln n} good levels is at most the chance of seeing fewer than {3.5 \ln n} heads when we expect to see {\mu = 14 \ln n} heads, i.e., a deviation of {\delta = 3/4} below the mean. By the multiplicative Chernoff bound on the lower tail, {\Pr[X < (1-\delta)\mu] \leq \exp(-\delta^2 \mu/2)}, the chance of this is at most

\displaystyle  \exp\left(-\frac{(3/4)^2 \times 14 \ln n}{2}\right) = n^{-63/16} \leq n^{-3}.
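(A numeric sanity check, not in the original notes: the exponent in the Chernoff bound is {(3/4)^2 \cdot 14 / 2 = 63/16 = 3.9375}, which exceeds {3}, so the failure probability is indeed at most {n^{-3}}.)

```python
import math

# Exponent in the Chernoff bound: (3/4)^2 * 14 / 2 = 63/16 = 3.9375 > 3.
exponent = (3 / 4) ** 2 * 14 / 2
for n in (10, 100, 10**6):
    chernoff = math.exp(-exponent * math.log(n))  # = n^(-63/16)
    print(n, chernoff, n**-3)
```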

Thus the branch of the recursion containing the element {e} terminates before {28 \ln n} levels except with probability {1/n^3}. There are {n} possible elements, and we can take a trivial union bound over these {n} elements: with probability at least {1 - 1/n^2}, the Quicksort recursion must have bottomed out before {28 \ln n} levels.

Note that this shows the total number of comparisons is bounded by {28 n \ln n} with probability {1 - 1/n^2}.

But what about the expected number of comparisons? One can bound this in several ways. Here’s one: with probability {1 - 1/n^2} the “good event” happens and the number of comparisons is at most {28 n \ln n}. If the good event does not happen (w.p. {1/n^2}), Quicksort makes at most {{n \choose 2}} comparisons in the worst case, and hence the expected number of comparisons is at most

\displaystyle  28 n \ln n + \frac{1}{n^2} {n \choose 2} \leq 28 n \ln n + 1.
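(Again not in the original notes: the last inequality holds because {\binom{n}{2}/n^2 = (n-1)/(2n) < 1/2 < 1}, which the snippet below verifies numerically.)

```python
import math
from math import comb

# The "bad event" term: comb(n, 2) / n^2 = (n - 1) / (2n) < 1/2,
# so it adds less than one comparison to the expectation.
n = 1000
bad_term = comb(n, 2) / n**2
total = 28 * n * math.log(n) + bad_term
print(f"bad-event term: {bad_term}, total bound: {total:.0f}")
```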

This is much worse than the bound of {\mu = 2nH_n} we showed in class, but it has two advantages: (a) it was quick and dirty, as promised, and (b) it also shows a concentration result—Quicksort is unlikely to do many more comparisons than you’d expect.

In fact, one can prove much better bounds: a result of McDiarmid and Hayward from SODA 1992 shows that if {Q} is the r.v. denoting the number of comparisons for Quicksort, then {Q} falls outside the interval {(1 \pm \epsilon) 2nH_n} with probability

\displaystyle  n^{-2\epsilon(\ln \ln n + O(\ln \ln \ln n))}.

That’s pretty well-concentrated! (One class project could be to understand this paper, and the concentration results they use.)

