Notes on Lecture 6
1. The Derandomization for Independent Sets
Here’s the derandomization of the probabilistic argument for Turan’s theorem. It’s just a little trickier than I’d figured, but it also leads to a different algorithm! (See Section 2.)
Let’s recall the randomized version: Given graph and permutation , call a vertex a winner if it appears before all its neighbors in the order . Let be the indicator of being a winner according to permutation , and hence . Now the number of winners , and hence . The greedy algorithm which considers the vertices in the order will at least pick all the winners. Hence,
The issue now is that we now don’t have an exact expression for the expected size of the independent set found by greedy that we can track, we only have a lower bound on it, namely the expected number of winners. But that’s OK: we’ll just track this lower bound instead, which we will call the “pessimistic estimator”. Let’s now pin this down.
First, given any set of vertices , if we run the random permutation algorithm on the induced graph , we can compute the expected number of winners exactly — it’s just , where is the degree of in graph .
So let’s do the following: having added some nodes to our independent set, let be the “rest” of the nodes which are not already chosen in nor already ruled out because they are adjacent to some node in . Now define the “pessimistic estimator” to be
Note that we start off with empty, then we get . And finally, , then this is just . So if we could ensure that at every step the value never decreases, we’d be done. One can do it in two ways (at least), here’s the approach some of you mentioned at the end of lecture.
1.1. Showing a Good Vertex to Add
Consider all the possible vertices in : we’ll show one of them can be added to to cause . For vertex , let be its neighbors (including itself) in . If we add in to the I.S., the new value would be
The part that originally threw me off in the calculations was that averaging over the ‘s could potentially lead to a greater potential than what we started off . But that is only better for us, so all’s well.
1.2. So To Summarize
The final idea is this: for the probabilistic analysis, we didn’t know how to compute the expectation we wanted (i.e., the expected number of vertices picked by random-permutation greedy). So we used a lower bound—the expected number of winners.
And then, that lower bound was useful for the derandomization. We show that we could add in one vertex at a time to the independent set while never decreasing plus the expected number of winners in the rest.
2. BTW, A Simpler Algorithm
If we go back and stare at inequality (1) for a few minutes, we realize that this analysis also gives an analysis for the following greedy algorithm:
While there is a vertex left in the graph, add the vertex with lowest remaining degree into , and remove it and all its neighbors.
(Do you see why?) We’ve shown that this will give an independent set of size at least .
3. Pessimistic Estimators for Low Congestion
The general method of pessimistic estimators is attributed to Prabhakar Raghavan, whose book is the text for the course, and who’s currently the head of Yahoo! Research. (He did this work while he was a graduate student in the mid-80s.)
His original paper used pessimistic estimators to derandomize, among other problems, the randomized-rounding algorithm for low-congestion routing example we saw in the previous lecture (due to Raghavan and Thompson). Aravind Srinivasan has some great notes on the derandomization; here’s a high-level summary.
Prabhakar’s use of these estimators is very neat, since looking at the proof it’s not obvious what to use as the pessimistic estimator, since there’s the Chernoff bound, which seems somewhat mysterious and awkward to handle. So he unrolls the first few steps of the Chernoff-bound calculation, which you recall goes as follows: if is the congestion on an edge , then
where the sum over edges represents the union bound, we use Markov’s inequality in the middle, all the m.g.f. calculations are omitted here, and is suitably chosen to ensure the bound of .
He then notes that the expression upper bounds the actual probability of error, and also can be tracked efficiently. In particular, fixing flow-paths for some of the source-sink pairs, we can calculate the expectation over random choices for the rest of the pairs, since these ‘s are sums of independent random variables. And we can make choices that causes this quantity to not increase. So if the estimator starts off below , we ensure it always stays below that quantity.
And since the final probabilities are integers or (since we’ve chose one flow-path for each source-sink pair, either an edge has more than congestion, or it does not), and they sum to less than , they must all be zero. This gives the deterministic algorithm with congestion at most .
Again, Aravind’s notes, or the original paper, give you more details, and a more general properties that suffice for pessimistic estimators.