# CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

## Category Archives: Anupam

## Lecture #8: Balls and Bins

**1. Balls and Bins **

The setting is simple: $n$ balls, $n$ bins. When you consider a ball, you pick a bin independently and uniformly at random, and add the ball to that bin. In HW #2 you proved:

Theorem 1. The max-loaded bin has $O(\log n/\log\log n)$ balls with probability at least $1 - 1/n$.

One could use a Chernoff bound to prove this, but here is a more direct calculation: the chance that bin $i$ has at least $k$ balls is at most
$$\binom{n}{k}\left(\frac{1}{n}\right)^k \le \left(\frac{e}{k}\right)^k,$$
which is (say) at most $1/n^2$ for $k = \Theta(\log n/\log\log n)$. To see this, note that $\binom{n}{k} \le \left(\frac{en}{k}\right)^k$.

So union bounding over all $n$ bins, the chance of some bin having more than $k$ balls is at most $1/n$. (I’ve been sloppy with constants; you can do better constants by using Stirling’s approximation.)

Here is a semantically identical way of looking at this calculation: let $X_i$ be the indicator r.v. for bin $i$ having $k$ or more balls. Then $E[X_i] \le 1/n^2$. And hence if $X = \sum_i X_i$, then $E[X] \le 1/n$. So by Markov, $\Pr[X \ge 1] \le E[X] \le 1/n$. In other words, we again have that with probability at least $1 - 1/n$, no bin has $k$ or more balls.

This idea of bounding the expectation of some variable $X$, and using that to upper bound some quantity (in this case the max-load), is said to use the *first moment method*.
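To get a feel for these bounds, here is a quick simulation sketch in Python (the constant $3$ in the comparison is just a generous choice; the true max load is $(1+o(1))\frac{\ln n}{\ln\ln n}$):

```python
import math
import random

def max_load(n, seed=0):
    """Throw n balls into n bins uniformly at random; return the max load."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return max(loads)

n = 100_000
bound = 3 * math.log(n) / math.log(math.log(n))  # ~ 3 ln n / ln ln n
print(max_load(n), bound)
```

Running this for a few seeds, the observed max load sits comfortably below the $3\ln n/\ln\ln n$ line, as the first-moment calculation predicts.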

** 1.1. Tightness of the Bound **

In fact, $\Theta(\log n/\log\log n)$ is indeed the right answer for the max-load with $n$ balls and $n$ bins.

Theorem 2. The max-loaded bin has $\Omega(\log n/\log\log n)$ balls with probability at least $1 - o(1)$.

Here is one way to show this, via the *second moment method*. To begin, let us now lower bound the probability that bin $i$ has at least $k$ balls:
$$\Pr[\text{bin } i \text{ gets at least } k \text{ balls}] \ge \binom{n}{k}\left(\frac{1}{n}\right)^k\left(1 - \frac{1}{n}\right)^{n-k} \ge \frac{1}{e}\left(\frac{1}{k}\right)^k,$$
which for $k = \frac{\log n}{3\log\log n}$ is at least $n^{-1/2}$ (say), since $k^k \le n^{1/3}$. And so we expect at least $n^{1/2}$ bins to have at least $k$ balls.

Let us define some random variables: if $X_i$ is the indicator for bin $i$ having at least $k$ balls, and $X = \sum_i X_i$ is the number of bins with at least $k$ balls, we get that $E[X] \ge n^{1/2}$.

Alas, in general, just knowing that $E[X]$ is large will not imply that $X > 0$ with decent probability. Indeed, consider a random variable that is $M^2$ w.p. $1/M$, and $0$ otherwise—while its expectation is $M$, it is more and more likely to be zero as $M$ increases. So we need some more information about $X$ to prove our claim. And that comes from the second moment.

Let’s appeal to Chebyshev’s inequality:
$$\Pr[X = 0] \le \Pr\big[|X - E[X]| \ge E[X]\big] \le \frac{\mathrm{Var}(X)}{E[X]^2}.$$

You have probably seen *covariance* before: $\mathrm{Cov}(X_i, X_j) = E[X_i X_j] - E[X_i]E[X_j]$. But since the bins are negatively correlated (some bin having more balls makes it less likely for another bin to do so), the covariance is at most $0$. Moreover, since each $X_i$ is a $\{0,1\}$ variable, $\mathrm{Var}(X_i) \le E[X_i]$; by the above calculations, $\sum_i E[X_i] = E[X]$. So summarizing, we get
$$\mathrm{Var}(X) = \sum_i \mathrm{Var}(X_i) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j) \le E[X].$$

In other words, there is at least a $1 - 1/E[X] \ge 1 - n^{-1/2}$ chance that some bin contains at least $k$ balls:
$$\Pr[X = 0] \le \frac{\mathrm{Var}(X)}{E[X]^2} \le \frac{1}{E[X]} \le n^{-1/2}.$$

(Later, you will see how to use martingale arguments and Azuma-Hoeffding bounds to give guarantees on the max-load of bins. You can also use the “Poisson approximation” to show such a result, that’s yet another cool technique.)

** 1.2. So, in Summary **

If you want to show that some non-negative integer-valued random variable is zero with high probability, show that its expectation tends to zero, and use Markov—the *first moment method*. If you want to show that it is non-zero with high probability, show that its variance divided by the squared mean tends to zero, and use Chebyshev—the *second moment method*.

** 1.3. Taking it to the Threshold **

Such calculations often arise when you have a random process, and a random variable $X_p$ defined in terms of a parameter $p$. Often you want to show that $X_p$ is zero whp when $p$ lies much below some “threshold” $p^*$, and that $X_p$ is non-zero whp when $p$ is far above $p^*$. The first things you should try are to see if the first and second moment methods give you rough answers. E.g., take $n$ vertices and add each of the $\binom{n}{2}$ edges independently with probability $p$ (also called the Erdős–Rényi graph $G(n,p)$), and define $X_p$ to be the number of cliques on $4$ vertices. Show that $p^* = n^{-2/3}$ is such a threshold for $X_p$.
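For concreteness, here is a brute-force experiment (a sketch; $n = 40$ and the factor-of-$8$/factor-of-$4$ separations are arbitrary choices) illustrating the threshold behavior for $4$-cliques, whose threshold is $p^* = n^{-2/3}$:

```python
import random
from itertools import combinations

def count_k4(n, p, rng):
    """Sample G(n, p) and count 4-cliques by brute force."""
    adj = [[False] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u][v] = adj[v][u] = True
    # a 4-set is a clique iff all 6 of its pairs are edges
    return sum(1 for q in combinations(range(n), 4)
               if all(adj[a][b] for a, b in combinations(q, 2)))

n = 40
thresh = n ** (-2 / 3)
print(count_k4(n, thresh / 8, random.Random(0)),   # far below threshold
      count_k4(n, 4 * thresh, random.Random(1)))   # far above threshold
```

Below the threshold the expected count $\binom{n}{4} p^6$ is tiny (first moment: no $K_4$ whp); above it the count is large and concentrated (second moment: some $K_4$ whp).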

**2. The Power of Two Choices **

The setting now is: $n$ balls, $n$ bins. However, when you consider a ball, you pick *two* bins (or in general, $d$ bins) independently and uniformly at random, and put the ball in the less loaded of the two bins. The main theorem is:

Theorem 3. The two-choices process gives a maximum load of $\frac{\log\log n}{\log 2} + O(1)$ with probability at least $1 - o(1)$.
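Before the proof, a quick simulation sketch contrasting one random choice per ball with two:

```python
import random

def one_choice(n, rng):
    """Classic balls-and-bins: each ball goes to one random bin."""
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return max(loads)

def two_choice(n, rng):
    """Power of two choices: each ball goes to the lighter of two random bins."""
    loads = [0] * n
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        if loads[j] < loads[i]:
            i = j
        loads[i] += 1
    return max(loads)

n = 100_000
print(one_choice(n, random.Random(0)), two_choice(n, random.Random(0)))
```

The drop from $\Theta(\log n/\log\log n)$ to $\Theta(\log\log n)$ is already visible at this modest $n$.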

The intuition behind the proof is the following: if only a small fraction of bins have load at least $i$, then a quadratically smaller fraction have load at least $i+1$, so the fractions collapse doubly exponentially fast.

The actual proof is not far from this intuition. The following lemma says that if at most a $\beta$ fraction of the bins have at least $i$ balls, then the number of bins having at least $i+1$ balls can indeed be upper bounded using $B(n, \beta^2)$, where $B(n, p)$ is the Binomial random variable.

Lemma 4. Let $\nu_i$ be the number of bins with load at least $i$. If $\nu_i \le \beta n$, then (in the stochastic-dominance sense made precise in the proof) $\nu_{i+1}$ is bounded by the Binomial $B(n, \beta^2)$.

*Proof:* For the proof, let us consider the “heights” of balls: this is the position of the ball in its bin when it comes in; if it is the first ball in its bin then its height is $1$, etc. Observe that if there are $t$ bins with load at least $i+1$, then there must be at least $t$ balls with height at least $i+1$. I.e., if $\mu_{i+1}$ is the number of balls with height at least $i+1$, then $\nu_{i+1} \le \mu_{i+1}$, and so we’ll now upper bound $\mu_{i+1}$.

Consider the following experiment: just before a ball comes in, an adversary is allowed to “mark” at most bins. Call a ball marked if both its random bins are marked. Note that when we condition on , we know that the final number of bins with load at least is at most . In this case, we can imagine the adversary marking the bins with load at least (and maybe some more). Now the chance that a ball is marked is at least the chance that it has height and there are at most bins with height at least . Hence, if is the number of marked balls, we get

The second equality follows from the fact that .

*If you’d like to be more precise about proving (*) above, see the details in the notes from Mitzenmacher–Upfal. (CMU/Pitt access only.)*

Now we can use Chernoff bounds to prove tail bounds on the Binomial distribution.

Lemma 5. For $\mu = np$, we have $\Pr[B(n,p) \ge 2\mu] \le e^{-\mu/3}$. Moreover, if $\mu \le \ln n$, then $\Pr[B(n,p) \ge 6\ln n] \le 1/n^2$.

*Proof:* We’re interested in where each w.p. , and otherwise. The expectation . And the chance that this number exceeds is at most

which proves the first part. For the second part, , and the probability that exceeds is at most

as claimed.

So, now let us define $\beta_i$ to be the fraction of bins we’re aiming to show have load at least $i$. Define $\beta_4 = 1/4$, and $\beta_{i+1} = 2\beta_i^2$. (The reason it is $2\beta_i^2$ instead of $\beta_i^2$, which is the expectation, is for some breathing room to apply Chernoff: in particular, the factor of $2$ comes from the first part of Lemma 5.)

For each $i$, let $\mathcal{E}_i$ be the good event “$\nu_i \le \beta_i n$”; recall that $\nu_i$ is the number of bins with load at least $i$. We want to upper bound the probability that this good event does not happen.

*Proof:* We prove this by induction. The base case is $i = 4$: since there are only $n$ balls, at most $n/4$ bins can have load at least $4$ (by Markov). So $\Pr[\neg\mathcal{E}_4] = 0$.

For the induction,
$$\Pr[\neg\mathcal{E}_{i+1}] \le \Pr[\neg\mathcal{E}_{i+1} \mid \mathcal{E}_i] + \Pr[\neg\mathcal{E}_i].$$
By Lemma 4 the former term is at most $\Pr[B(n, \beta_i^2) > 2\beta_i^2 n]$, which by Lemma 5 is at most $e^{-\beta_i^2 n/3}$. And by induction we can bound $\Pr[\neg\mathcal{E}_i]$, which completes the inductive step.

Consider the first index $i^*$ at which $\beta_{i^*} n$ becomes logarithmically small; note that since the $\beta_i$’s square at each step, $i^* = \log_2\log_2 n + O(1)$. By Lemma 6, $\Pr[\neg\mathcal{E}_{i^*}]$ is small. Hence, with high probability, the number of bins with more than $i^*$ balls in them is at most $\beta_{i^*} n$.

We’re almost done, but there’s one more step to do. If this number were small, say $O(1)$, then we could have done a union bound, but this number may still be polylogarithmic. So apply Lemma 4 and the second part of Lemma 5 once more to get:

Finally, since this count is so small whp, use Lemma 4 and a union bound to conclude that whp no bin has load more than $i^* + 2$.

The calculations in Section 2.1 then show that $i^* = \frac{\log\log n}{\log 2} + O(1)$, which completes the proof.

** 2.1. A Calculation **

Since , and , we calculate

So, for , it suffices to set

**3. A Random Graphs Proof **

Another way to show that the maximum load is $O(\log\log n)$—note that the constant here is worse—is to use a first-principles analysis based on properties of random graphs. We build a random graph $G$ as follows: the vertices of $G$ correspond to the bins, and the edges correspond to balls—each time we probe two bins we connect them with an edge in $G$. For technical reasons, we’ll just consider what happens if we throw fewer balls (only $cn$ balls, for a small constant $c$) into the $n$ bins—also, let’s imagine that each ball chooses two distinct bins each time.

Theorem 7. If we throw $cn$ balls into $n$ bins using the best-of-two-bins method, the maximum load of any bin is $O(\log\log n)$ whp.

Hence for $n$ balls and $n$ bins, the maximum load should be at most $1/c$ times as much, whp. (It’s as though after every $cn$ balls, we forget about the current loads and zero out our counters—not zeroing out these counters can only give us a more evenly balanced allocation; I’ll try to put in a formal proof later.)

To prove the theorem, we need two results about the random graph $G$ obtained by throwing $cn$ random edges into $n$ vertices. Both proofs are simple but surprisingly effective counting arguments; they appear at the end.

Lemma 9. There exists a suitably large constant $K$ such that, whp, every subset $S$ of the vertex set with $|S| \ge K$ induces at most $5|S|/2$ edges, and hence has average degree at most $5$.

Given the graph $G$, suppose we repeatedly perform the following operation in rounds:

In each round, remove all vertices of degree at most $10$ in the current graph.

We stop when there are no more vertices of small degree.

Lemma 10. This process ends after $O(\log\log n)$ rounds whp, and each remaining component then has at most a constant number of vertices.

*Proof:* Condition on the events in the two previous lemmas. Any large-enough component in the current graph has average degree at most $5$; by Markov at least half its vertices have degree at most $10$ and will be removed. So as long as a component has more than a constant number of nodes, each round halves its size. But the size of each component was $O(\log n)$ to begin with, so this takes $O(\log\log n)$ rounds before each component has constant size.

Lemma 11. If a node/bin is deleted in round $i$, its load due to edges that have already been deleted is at most $10i$. If a node/bin is never deleted, its load is at most $10t + O(1)$, where $t$ is the total number of rounds.

*Proof:* Consider the nodes removed in round $1$: their degree was at most $10$, so even if all those balls went to such nodes, their final load would be at most $10$. Now, consider any node $v$ that survived this round. While many edges incident to it might have been removed in this round, we claim that at most $10$ of those would have contributed to $v$’s load. Indeed, each of the other endpoints of those edges went to bins with final load at most $10$. So at most $10$ of them would choose $v$ as their less loaded bin before it is better for them to go elsewhere.

Now, suppose $v$ is deleted in round $i$: then again its load can be at most $10i$: $10(i-1)$ because it survived the previous round, and $10$ from its own degree in this round. OTOH, if $v$ survives round $i$, then consider all the edges incident to $v$ that were deleted in previous rounds. Each of them went to nodes that were deleted in round $i$ or earlier, and hence had maximum load at most $10i$. Thus at most $10i$ of these edges could contribute to $v$’s load before it was better for them to go to the other endpoint. The same inductive argument holds for any round $i$.

Finally, the process ends when each component has constant size, so the degree of any node is also at most a constant. Even if all these edges contribute to the load of a bin, they add only $O(1)$.

By Lemma 10, the number of rounds is $O(\log\log n)$ whp, so by Lemma 11 the maximum load is also $O(\log\log n)$ whp.

** 3.1. Missing Proofs of Lemmas **

*Proof:* We have a graph with $n$ vertices and $cn$ edges, where each edge connects a uniformly random pair of vertices. For a set of $k$ vertices to form a connected component, it must contain at least $k-1$ edges within it. Now the probability that any such set exists can be bounded above by the union bound

which proves the claim.

Lemma 13. There exists a suitably large constant $K$ such that, whp, every subset $S$ of the vertex set with $|S| \ge K$ induces at most $5|S|/2$ edges, and hence has average degree at most $5$.

*Proof:*

By a union bound over all sets, the probability that such a set exists is

where . Now, we can break this sum into two: for small values of , the term would be very small, else the term would be small. Indeed, for , we know that , so

Now for the rest:

for , say.

**Bibliographic Notes:** The layered induction appears in *Balanced Allocations* by Azar, Broder, Karlin, and Upfal. The random graph analysis is in the paper *Efficient PRAM Simulation on a Distributed Memory Machine* by Karp, Luby, and Meyer auf der Heide; I learned it from Satish Rao. The *always-go-left* algorithm and analysis appear in *How Asymmetry Helps Load Balancing* by Berthold Vöcking.

**Update:** Here’s a survey on the various proof techniques by Mitzenmacher, Richa, and Sitaraman.

## Lecture #7: The Local Lemma

**1. The Local Lemma: Various Forms **

The symmetric version of the local lemma we saw in class:

Theorem 1 (Symmetric Form I)

Given a collection of events $A_1, \ldots, A_n$ such that $\Pr[A_i] \le p$ for each $i$, and each $A_i$ is independent of all but $d$ of these events, if $ep(d+1) \le 1$ then $\Pr[\wedge_i \overline{A_i}] > 0$.

Very often, the sample space is defined by some experiment which involves sampling from a bunch of *independent* random variables, and the bad events each depend on some subset of these random variables. In such cases we can state the local lemma thus:

Theorem 2 (Symmetric Form II)

Suppose $X_1, \ldots, X_m$ are independent random variables, and $A_1, \ldots, A_n$ are events such that each $A_i$ depends only on some subset $S_i$ of these variables. Moreover, suppose $\Pr[A_i] \le p$, and each $S_i$ intersects at most $d$ of the $S_j$’s. If $ep(d+1) \le 1$ then $\Pr[\wedge_i \overline{A_i}] > 0$.

Note that both the examples from class (the $k$-SAT and the Leighton–Maggs–Rao results) fall into this setting.

Finally, here’s the asymmetric form of the local lemma:

Theorem 3 (Asymmetric Form)

Given events $A_1, \ldots, A_n$, with each $A_i$ independent of all but the set $\Gamma(i)$ of these events, suppose there exist $x_i \in (0,1)$ such that
$$\Pr[A_i] \le x_i \prod_{j \in \Gamma(i)} (1 - x_j).$$

Then $\Pr[\wedge_i \overline{A_i}] \ge \prod_i (1 - x_i) > 0$.

Occasionally one needs to use the asymmetric form of the local lemma: one example is Uri Feige’s result showing a constant integrality gap for the Santa Claus problem, and the resulting approximation algorithm due to Haeupler, Saha and Srinivasan.

** 1.1. Proofs of the Local Lemma **

The original proof of the local lemma was based on an inductive argument. This was a non-constructive proof, and the work of Beck gave the first techniques to make some of the existential proofs algorithmic.

In 2009, Moser, and then Moser and Tardos gave new, intuitive, and more algorithmic proofs of the lemma for the case where there is an underlying set of independent random variables, and the bad events are defined over subsets of these variables. (E.g., the version of the Local Lemma given in Theorem 2, and its asymmetric counterpart.) Check out notes on the proofs of the Local Lemma by Joel Spencer and Uri Feige. The paper of Haeupler, Saha and Srinivasan gives algorithmic versions for some cases where the number of events is exponentially large.
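The Moser–Tardos algorithm itself is strikingly simple: sample all the underlying variables independently, and while some bad event holds, resample just the variables of that one event. Here is a sketch for $k$-SAT, where the bad events are unsatisfied clauses (the tiny instance and the encoding are my own, for illustration):

```python
import random

def moser_tardos_sat(clauses, n_vars, rng, max_steps=100_000):
    """Moser-Tardos resampling for k-SAT: bad events = unsatisfied clauses.
    A literal is an int: +i means variable i is true, -i means it is false."""
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # 1-indexed

    def violated(clause):
        # a clause is violated iff every one of its literals is false
        return all(assign[abs(l)] != (l > 0) for l in clause)

    for _ in range(max_steps):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign
        for l in bad[0]:  # resample only the variables of one bad event
            assign[abs(l)] = rng.random() < 0.5
    return None

# A small satisfiable 3-SAT instance; the resampling converges quickly here.
clauses = [(1, 2, 3), (-1, 4, 5), (-2, -4, 6), (3, -5, -6)]
print(moser_tardos_sat(clauses, 6, random.Random(0)) is not None)
```

The remarkable content of the Moser–Tardos theorem is that when the local lemma condition holds, the expected number of resampling steps is small (linear in the number of events).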

** 1.2. Lower Bounds **

The local lemma implies that a $k$-SAT formula in which each clause shares variables with fewer than about $2^k/e$ other clauses is satisfiable. This is complemented by the existence of unsatisfiable E$k$-SAT formulas with degree about $2^k/e$: this is proved in a paper of Gebauer, Szabó and Tardos (SODA 2011). This shows that the constant factor in the local lemma bound cannot be reduced, even for the special case of E$k$-SAT.

The fact that the factor of $1/e$ was tight for the symmetric form of the local lemma was known earlier, due to a result of Shearer (1985).

**2. Local Lemma: The E-SAT Version **

Let me be clearer, and tease apart the existence question from the algorithmic one. (I’ve just sketched the main ideas in the “proofs”, will try to fill in details later; let me know if you see any bugs.)

Theorem 4

If $\varphi$ is an E$k$-SAT formula with $m$ clauses and $n$ variables, where the degree of each clause is at most $2^{k-3}$ (say), then $\varphi$ is satisfiable.

*Proof:* Assume there is no satisfying assignment. Then the algorithm we saw in class will run forever, no matter what random bits it reads. Let us fix a large number of iterations $T$. So for every string of bits the algorithm reads from the random source, it will run for at least $T$ iterations.

But now one can encode the string thus: use bits to encode the clauses at the roots of the recursion trees, bits to encode the clauses lying within these recursion trees, and $n$ bits for the final settings of the variables. As we argued, this is a lossless encoding: we can recover the random bits from it. How long is this encoding? It is strictly shorter than the random string we started with, once the degree bound and $T$ are as above.

So this would give us a way to encode *every* string of some length $\ell$ into a strictly shorter string. But since there are $2^\ell$ strings of length $\ell$ and only $2^\ell - 1$ strings of length strictly less than $\ell$, this is impossible. So this contradicts our assumption that there is no satisfying assignment.

Now we can alter the proof to show that the expected running time of the algorithm is small:

Theorem 5

If $\varphi$ is an E$k$-SAT formula with $m$ clauses and $n$ variables, where the degree of each clause is at most $2^{k-3}$ (say), then the algorithm FindSat finds a satisfying assignment in expected polynomial time.

*Proof:* Assume that we run for at least steps with probability at least . (Again, think of .) Then for at least of the strings, we compress each of these strings into strings of length .

But if we have any set of strings, we must use at least bits to represent at least one of them. So

If , we have , and

or

So we get that the probability of taking more than steps is at most , which implies an expected running time of .

## And Lecture #6 references

Some references from class yesterday: a short-and-sweet analysis of Johnson’s algorithm for MAX-SAT (satisfying a $2/3$ fraction of the optimal number of clauses) is by Engebretsen; this simplifies the original proof by Chen, Friesen and Zhang. Recall that this analysis is tight: there are instances where, breaking ties badly, the algorithm satisfies only $2/3$ as many clauses as OPT.

While it would seem that we’ve done all we can with this algorithm, the latest SODA conference has two interesting papers on the subject:

- The paper *Randomized greedy: new variants of some classic approximation algorithms* by Costello, Shapira, and Tetali, which shows that considering the variables in a random order in Johnson’s algorithm achieves a performance of $2/3 + c$ for some constant $c > 0$.
- Another paper, *Randomized Variants of Johnson’s Algorithm for MAX SAT* by Poloczek and Schnitger, shows that this random-order idea cannot achieve $3/4$, sadly. However, they give a variant of Johnson’s algorithm that does get $3/4$.

Several questions remain: can we pin down more precisely the performance of Johnson’s algorithm with a random order? It would be interesting to understand these papers, and perhaps to simplify the algorithm or the analysis of the latter. Can it be viewed as implicitly solving a linear program?

**Update:** And here are some lecture notes for MAX-3-SAT from the previous time Avrim taught the course.

## Notes on Lecture 6

**1. The Derandomization for Independent Sets **

Here’s the derandomization of the probabilistic argument for Turán’s theorem. It’s just a little trickier than I’d figured, but it also leads to a different algorithm! (See Section 2.)

Let’s recall the randomized version: given a graph $G$ and a permutation $\pi$ of its vertices, call a vertex $v$ a *winner* if it appears before all its neighbors in the order $\pi$. Let $W_v$ be the indicator of $v$ being a winner according to a random permutation $\pi$, and hence $E[W_v] = \frac{1}{d_v + 1}$, where $d_v$ is the degree of $v$. Now the number of winners is $W = \sum_v W_v$, and hence $E[W] = \sum_v \frac{1}{d_v + 1}$. The greedy algorithm which considers the vertices in the order $\pi$ will pick at least all the winners into its independent set $I$. Hence,
$$E[|I|] \ge E[W] = \sum_v \frac{1}{d_v + 1}.$$

The issue now is that we don’t have an exact expression for the expected size of the independent set found by greedy that we can track; we only have a *lower bound* on it, namely the expected number of winners. But that’s OK: we’ll just track this lower bound instead, which we will call the “pessimistic estimator”. Let’s now pin this down.

First, given any set of vertices $S$, if we run the random permutation algorithm on the induced graph $G[S]$, we can compute the expected number of winners exactly — it’s just $\sum_{v \in S} \frac{1}{d_S(v) + 1}$, where $d_S(v)$ is the degree of $v$ in the graph $G[S]$.

So let’s do the following: having added some nodes $I$ to our independent set, let $R$ be the “rest” of the nodes: those not already chosen in $I$ nor already ruled out because they are adjacent to some node in $I$. Now define the “*pessimistic estimator*” to be
$$\Phi(I, R) = |I| + \sum_{v \in R} \frac{1}{d_R(v) + 1}.$$

Note that we start off with $I$ empty, when we get $\Phi = \sum_v \frac{1}{d_v + 1}$. And finally, when $R$ is empty, this is just $|I|$. So if we could ensure that at every step the value of $\Phi$ never decreases, we’d be done. One can do it in (at least) two ways; here’s the approach some of you mentioned at the end of lecture.

** 1.1. Showing a Good Vertex to Add **

Consider all the possible vertices in $R$: we’ll show one of them can be added to $I$ to cause $\Phi$ not to decrease. For a vertex $v \in R$, let $N(v)$ be its neighbors (including $v$ itself) in $R$. If we add $v$ to the I.S., the new value would be
$$\Phi(I \cup \{v\}, R \setminus N(v)) = |I| + 1 + \sum_{u \in R \setminus N(v)} \frac{1}{d_{R \setminus N(v)}(u) + 1} \;\ge\; |I| + 1 + \sum_{u \in R \setminus N(v)} \frac{1}{d_R(u) + 1}.$$

The inequality is because the degrees in $R \setminus N(v)$ can never be larger than those in $R$. Now rewriting, the change in the estimator is at least
$$1 - \sum_{u \in N(v)} \frac{1}{d_R(u) + 1}.$$

If we average over all $v \in R$, we get
$$\frac{1}{|R|} \sum_{v \in R} \left(1 - \sum_{u \in N(v)} \frac{1}{d_R(u) + 1}\right) = \frac{1}{|R|}\left(|R| - \sum_{u \in R} \frac{d_R(u) + 1}{d_R(u) + 1}\right) = 0,$$
since each $u$ appears in $N(v)$ for exactly the $d_R(u) + 1$ vertices $v \in N(u)$. And hence, there is at least one $v$ to add so that $\Phi$ does not decrease.

The part that originally threw me off in the calculations was that averaging over the $v$’s could potentially lead to a greater potential than what we started off with. But that is only better for us, so all’s well.
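To make this concrete, here is a sketch that tracks the estimator $|I| + \sum_{v} 1/(d(v)+1)$ over the surviving vertices, and at each step adds a vertex whose addition keeps the estimator from decreasing (the averaging argument above guarantees one exists):

```python
def phi(adj, alive, k):
    """Pessimistic estimator: k + expected number of winners among `alive`,
    where degrees are taken within the induced graph on `alive`."""
    return k + sum(1 / (len(adj[v] & alive) + 1) for v in alive)

def derandomized_greedy(adj):
    """Greedily add a vertex that does not decrease the estimator."""
    alive, indep = set(adj), []
    while alive:
        cur = phi(adj, alive, len(indep))
        # the vertex maximizing the new estimator value suffices
        best = max(alive, key=lambda v: phi(adj, alive - adj[v] - {v},
                                            len(indep) + 1))
        assert phi(adj, alive - adj[best] - {best}, len(indep) + 1) >= cur - 1e-9
        indep.append(best)
        alive -= adj[best] | {best}
    return indep

# 5-cycle: the estimator starts at 5 * (1/3), so we must end with |I| >= 2.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(derandomized_greedy(cycle))
```

Since the estimator starts at $\sum_v \frac{1}{d_v+1}$ and ends at $|I|$, the returned set always meets the Turán-style bound.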

** 1.2. So To Summarize **

The final idea is this: for the probabilistic analysis, we didn’t know how to compute the expectation we wanted (i.e., the expected number of vertices picked by random-permutation greedy). So we used a lower bound—the expected number of winners.

And then, that lower bound was useful for the derandomization: we showed that we could add one vertex at a time to the independent set while never decreasing $|I|$ plus the expected number of winners in the rest.

**2. BTW, A Simpler Algorithm **

If we go back and stare at inequality (1) for a few minutes, we realize that this analysis also gives an analysis for the following greedy algorithm:

While there is a vertex left in the graph, add the vertex with lowest remaining degree into the independent set, and remove it and all its neighbors.

(Do you see why?) We’ve shown that this will give an independent set of size at least $\sum_v \frac{1}{d_v + 1}$.
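Here’s a quick sketch of this greedy algorithm, checking the guarantee $\sum_v \frac{1}{d_v+1}$ on a small random graph (the graph parameters are arbitrary):

```python
import random

def min_degree_greedy(adj):
    """Repeatedly take a lowest-degree vertex, drop it and its neighbors."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    indep = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        indep.append(v)
        dead = adj[v] | {v}
        for u in dead:
            adj.pop(u, None)
        for u in adj:
            adj[u] -= dead
    return indep

# Build a random graph G(n, p) and compare against the guarantee.
rng = random.Random(0)
n, p = 60, 0.1
adj = {v: set() for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

guarantee = sum(1 / (len(adj[v]) + 1) for v in adj)
indep = min_degree_greedy(adj)
print(len(indep), round(guarantee, 2))
```

The returned set is always independent, and its size is never below $\sum_v \frac{1}{d_v+1} \ge \frac{n}{\bar d + 1}$.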

**3. Pessimistic Estimators for Low Congestion **

The general method of pessimistic estimators is attributed to Prabhakar Raghavan, whose book is the text for the course, and who’s currently the head of Yahoo! Research. (He did this work while he was a graduate student in the mid-80s.)

His original paper used pessimistic estimators to derandomize, among other problems, the randomized-rounding algorithm for low-congestion routing example we saw in the previous lecture (due to Raghavan and Thompson). Aravind Srinivasan has some great notes on the derandomization; here’s a high-level summary.

Prabhakar’s use of these estimators is very neat: looking at the proof, it’s not obvious what to use as the pessimistic estimator, since the quantity at hand is a Chernoff bound, which seems somewhat mysterious and awkward to handle. So he unrolls the first few steps of the Chernoff-bound calculation, which you recall goes as follows: if $X_e$ is the congestion on an edge $e$, then

where the sum over edges represents the union bound, we use Markov’s inequality in the middle, all the m.g.f. calculations are omitted here, and the parameter in the exponent is suitably chosen to ensure the final bound.

He then notes that this expression upper bounds the actual probability of error, and also can be tracked efficiently. In particular, having fixed flow-paths for some of the source-sink pairs, we can calculate the expectation over random choices for the rest of the pairs, since these $X_e$’s are sums of independent random variables. And we can make choices that cause this quantity to not increase. So if the estimator starts off below $1$, we ensure it always stays below that quantity.

And since the final probabilities are $0$ or $1$ (since we’ve chosen one flow-path for each source-sink pair, either an edge has more than the target congestion, or it does not), and they sum to less than $1$, they must all be zero. This gives the deterministic algorithm with the claimed congestion.

Again, Aravind’s notes, or the original paper, give you more details, and the more general properties that suffice for pessimistic estimators.

## Belated Notes on Lecture #2

*(Sorry for the delayed notes: it made sense to wait until we’d seen Chernoff bounds before I posted these. — AG)*

We saw in Lecture #2 that the expected number of comparisons for the randomized version of quicksort was at most $2n\ln n$; the proof was via defining just the right random variables (*“do we compare the $i$-th smallest and $j$-th smallest elements at any step?”*): we could compute that this probability is $\frac{2}{j - i + 1}$, and hence the expected number of comparisons is $\sum_{i < j} \frac{2}{j - i + 1} \le 2n\ln n$.

However, often it helps to get a quick ballpark estimate before going for the best analysis: here’s a quick-and-dirty way to see that the number of comparisons for Quicksort is $c\,n\log n$ with probability at least $1 - 1/n^{c'}$ for some other constant $c'$ that depends on $c$. Not only does this imply that the expected number of comparisons is $O(n\log n)$ (why?), but also that the number of comparisons is very unlikely to be much larger than the expectation.

Here’s the theorem for today:

Lemma 1. With probability at least $1 - 1/n$, the number of levels of recursion in Quicksort is at most $c\log n$ for a suitable constant $c$.

*Proof:*

First, let level-$1$ comparisons be those done when we partition the entire input into two parts, level-$2$ comparisons be those when we partition these two parts into two parts each, and so on. Note that the number of comparisons made in any one level is at most $n$, so it suffices to show that the number of levels before all the parts have only a single element each is $O(\log n)$ with high probability.

Let’s prove this claim. Consider a single element $e$. Let $S_t$ be the set containing $e$ at level $t$; note that $S_0$ is the entire input. We call level $t$ *$e$-good* if either $|S_t| = 1$, or the level-$t$ pivot falls in the middle half of $S_t$ and hence causes $|S_{t+1}|$ to be at most $\frac{3}{4}|S_t|$. Since we pick a pivot uniformly at random, the chance of any level being $e$-good is at least $1/2$, independent of all other levels. Hence, the probability of seeing at least $\log_{4/3} n$ $e$-good levels within the top $t$ levels is at least the probability of seeing $\log_{4/3} n$ heads in $t$ flips of an unbiased coin.

Note that if there are $\log_{4/3} n$ $e$-good levels, then since each good level reduces the size of the set by a $3/4$ factor (or the set already has size one), the element $e$ must lie in a set of unit size. Setting $t = c\log n$ for a large enough constant $c$, the chance of seeing fewer than $\log_{4/3} n$ good levels is at most the chance of seeing fewer than $\log_{4/3} n$ heads when we expect $(c/2)\log n$ heads. By a multiplicative Chernoff bound on the lower tail, the chance of this is at most $1/n^2$ (say).

Thus the branch of the recursion containing the element $e$ terminates within $c\log n$ levels except with probability $1/n^2$. There are $n$ possible elements, and we can take a trivial union bound over these elements: with probability at least $1 - 1/n$, the Quicksort recursion must have bottomed out within $c\log n$ levels.

Note that this shows the total number of comparisons is bounded by $cn\log n$ with probability at least $1 - 1/n$.

But what about the **expected** number of comparisons? One can bound this in several ways. Here’s one: with probability at least $1 - 1/n$ the “good event” happens and the number of comparisons is at most $cn\log n$. If the good event does not happen (w.p. at most $1/n$), we know that Quicksort takes at most $\binom{n}{2}$ comparisons in the worst case, and hence the expected number of comparisons is at most
$$cn\log n + \frac{1}{n}\binom{n}{2} = O(n\log n).$$

This is much worse than the bound of $2n\ln n$ we showed in class, but it has two advantages: (a) it was quick and dirty, as promised, and (b) it also shows a concentration result—Quicksort is unlikely to do many more comparisons than you’d expect.
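A quick experiment matches this: the recursion depth of randomized quicksort stays within a small multiple of $\log_2 n$ (a sketch; the input and seed are arbitrary):

```python
import random

def quicksort_depth(arr, rng):
    """Randomized quicksort; returns (sorted list, recursion depth)."""
    if len(arr) <= 1:
        return arr, 0
    pivot = rng.choice(arr)                     # uniformly random pivot
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    ls, ld = quicksort_depth(left, rng)
    rs, rd = quicksort_depth(right, rng)
    return ls + mid + rs, 1 + max(ld, rd)

n = 10_000
out, depth = quicksort_depth(list(range(n)), random.Random(0))
print(depth)  # O(log n) with high probability
```

Typical runs give a depth of a few dozen for $n = 10{,}000$, i.e., a small constant times $\log_2 n \approx 13.3$.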

In fact, one can prove much better bounds: a result of McDiarmid and Hayward from SODA 1992 shows that if $X$ is the r.v. denoting the number of comparisons for Quicksort, then $X$ falls outside the interval $(1 \pm \epsilon)E[X]$ with probability

That’s pretty well-concentrated! (One class project could be to understand this paper, and the concentration results they use.)

## Notes on Lecture #5

A couple of things I remembered about Chernoff bounds when sitting in Avrim’s lecture, which I thought I’d post here.

The first thing is a (slightly different) bound that you may find easier to remember. To begin, let us state the multiplicative bounds mentioned in the handout, altered ever so slightly:

Theorem 1

Let $X_1, \ldots, X_n$ be independent $\{0,1\}$-valued random variables, and let $X = \sum_{i=1}^n X_i$. Let $\mu = E[X]$. Then for any $\delta \in (0,1]$,
$$\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2\mu/2}, \qquad (1)$$
$$\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2\mu/3}, \qquad (2)$$
and for all $\delta > 0$,
$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}. \qquad (3)$$

Now, here’s the version of the general multiplicative bounds I find easiest to remember:

Theorem 2

Let $X_1, \ldots, X_n$ be independent random variables taking values in the range $[0,1]$, and let $X = \sum_{i=1}^n X_i$. Let $\mu = E[X]$. Then for any $\delta > 0$,
$$\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2\mu/2}, \qquad (4)$$
$$\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2\mu/(2+\delta)}. \qquad (5)$$

Note that we’ve relaxed the assumption that the $X_i$’s are $\{0,1\}$-valued, and are allowing them to lie in $[0,1]$. (This is often super-useful; you may want to go over the proof of Theorem 1 in the book or the handout and think about how you would prove it.)

Moreover, (4) is exactly the same as (1). And if we constrain $\delta \le 1$, the exponent on the right of (5) is at most $-\delta^2\mu/3$, which is the same as in (2), so (5) is at least as good a bound as (2) for $\delta$’s in this range. So we’re doing at least as well as the first two tail bounds in Theorem 1.

Cool! So does this subsume (3) as well? Sadly, no. We lose something in the translation. Here’s an example. Let the $X_i$’s be i.i.d., taking on value $1$ with probability $1/n$ and $0$ otherwise, so that $\mu = 1$. Now, what is the value of $(1+\delta)$ which gives us a tail probability of $1/n^c$? If we set $(1+\delta) = \Theta\!\left(\frac{\log n}{\log\log n}\right)$ with a suitably large constant, then (3) gives us $1/n^c$, but for (5) to fall below $1/n^c$ we need $(1+\delta) = \Omega(c\log n)$.

However, often the bound from (5) is in the “right ballpark”, and I find it easy to remember. Chances are, if I use a Chernoff bound in lecture, it’ll be the one in Theorem 2.
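As a numeric sanity check, one can compare the exact Binomial tail against the multiplicative bound $e^{-\delta^2\mu/3}$ (the standard upper-tail form for $\delta \le 1$); the bound is loose but in the right ballpark:

```python
import math

def binom_tail(n, p, k):
    """Exact P[Binomial(n, p) >= k]."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

n, p = 100, 0.5
mu = n * p
delta = 0.2
exact = binom_tail(n, p, math.ceil((1 + delta) * mu))    # P[X >= 60]
chernoff = math.exp(-delta**2 * mu / 3)                  # e^{-delta^2 mu / 3}
print(exact, chernoff)
```

The exact tail is far below the Chernoff estimate, consistent with the bound being valid but not tight.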

Secondly, some jargon: these bounds are often called *concentration inequalities* (since they show that “most of the probability mass is concentrated around the mean”) and *tail bounds* (they show that the “tails of the distribution are small”) or *large deviation bounds* (since they show that “large deviations from the mean are unlikely”). There are tons of other tail bounds, some relaxing the requirements of independence, others relaxing the boundedness requirements, others looking at functions other than the sum of the random variables—you’ll see more in the homeworks and future lectures.

## Another Algorithm for SAT

BTW, when solving HW#1 Problem #2, it may be interesting to try to solve the problem for $k$-SAT. The algorithm remains the same, and you should be able to show a success probability of approximately $\left(\frac{k}{2(k-1)}\right)^n$. As a sanity check, for $k = 3$ this is again $(3/4)^n$.

So here’s a different randomized algorithm to solve $k$-SAT in less than $2^n$ time. This one is from a paper titled *Satisfiability Coding Lemma* due to Ramamohan Paturi, Pavel Pudlak and Francis Zane. Hence, we’ll call this the PPZ algorithm.

Pick a random permutation $\pi$ on the variables. Pick a uniformly random “start” assignment $A$ for the variables. Now consider the variables in the order $\pi$, and do the following:

If the current variable $x$ is the only remaining variable for some clause, set it to satisfy that clause. (If $x$ occurs positively in some such unit clause and negatively in some other, you can abort the process at this point.) If the current variable only occurs in clauses with more than one remaining variable, set it equal to its value in $A$.

Now, setting this variable to $0$ or $1$ will cause some clauses to be satisfied: remove these clauses. In other clauses, this variable will just disappear. E.g., if a clause was $(x \vee y \vee \bar{z})$ and we set $x$ to $0$, then the new clause will just be $(y \vee \bar{z})$. Do all these simplifications, and then go on to the next variable.

That’s it. To recap: we just consider the variables in a random order; if a variable is forced (due to some clause of length $1$), we set it to the forced value, else we set it randomly. Continue until we have a complete assignment.
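Here is a sketch of the whole algorithm, repeating random trials of the process above (the clause encoding and helper names are my own):

```python
import random

def ppz_trial(clauses, n_vars, rng):
    """One trial of PPZ: random variable order, random start assignment,
    forced values for unit clauses. Literals: +i / -i for variable i."""
    order = list(range(1, n_vars + 1))
    rng.shuffle(order)
    start = {v: rng.random() < 0.5 for v in order}
    live = [set(c) for c in clauses]          # remaining literals per clause
    assign = {}
    for v in order:
        forced = {l for c in live for l in c
                  if abs(l) == v and len(c) == 1}      # unit clauses on v
        if len(forced) == 2:
            return None                       # contradictory forced values
        val = (forced.pop() > 0) if forced else start[v]
        assign[v] = val
        lit = v if val else -v
        # drop satisfied clauses; delete the falsified literal elsewhere
        live = [c - {-lit} for c in live if lit not in c]
        if any(not c for c in live):
            return None                       # some clause became empty
    return assign

def ppz(clauses, n_vars, seed=0, max_trials=10_000):
    rng = random.Random(seed)
    for _ in range(max_trials):
        a = ppz_trial(clauses, n_vars, rng)
        if a is not None:
            return a
    return None

clauses = [(1, 2, 3), (-1, 2), (-2, 3), (-3, -1)]
print(ppz(clauses, 3))
```

Per the theorem below, each trial succeeds with probability at least $2^{-n(1-1/k)}$ on a satisfiable instance, so repeating about $2^{n(1-1/k)}$ trials finds an assignment with good probability.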

A theorem due to PPZ says:

Theorem 1

Given a satisfiable $k$-SAT formula on $n$ variables, the probability that this algorithm finds a satisfying assignment is at least $2^{-n(1 - 1/k)}$.

Hence for $k = 3$, this gives an algorithm with expected running time approximately $2^{2n/3} \approx 1.59^n$. Not quite as good as Schöning’s algorithm, but not too bad! And the proof is also very clean. In this post we’ll just consider the case where there is a unique satisfying assignment $A^*$. (The general case is not much harder; see Section 4.1 in the PPZ paper.)

So yeah, there’s a unique satisfying assignment $A^*$. This means that for every variable $x_i$, flipping the value of $x_i$ in $A^*$ will give us an unsatisfying assignment, and some clause will complain because it is not satisfied any more; if there are several, pick any one. Call this a *critical clause* for variable $x_i$, and name it $C_i$. OK, $n$ variables, each with its very own critical clause.

Why these critical clauses? Suppose in the clause $C_i$, we’ve set the other variables to be the same as in $A^*$. Then the only way to satisfy $C_i$ will be to also set $x_i$ as in $A^*$. And if it turns out that in the random order in which we’re considering the variables, $x_i$ comes after all the other variables in $C_i$, this is precisely what the PPZ algorithm will do! Indeed, it will not have to make a random choice for $x_i$: our hand will be forced to choose $A^*$’s value.

So suppose, for a random permutation $\pi$, we call a variable $x_i$ *lucky* if among the variables in its critical clause $C_i$, $x_i$ places last in the ordering $\pi$. Let $L(\pi)$ be the set of lucky variables for permutation $\pi$. Then, assuming that we lucked out and our random choices agreed with $A^*$ for all the variables not in $L(\pi)$, the algorithm will certainly also set the lucky variables as in $A^*$. So, for the permutation $\pi$, the chance (over the choice of the start assignment) that we find the satisfying assignment is $2^{-(n - |L(\pi)|)}$.

Finally, what is $E_\pi\!\left[2^{-(n - |L(\pi)|)}\right]$, the expectation taken over random permutations $\pi$? Rewriting, it is $2^{-n}\,E_\pi\!\left[2^{|L(\pi)|}\right]$. By convexity, this is at least $2^{-n}\,2^{E_\pi[|L(\pi)|]}$. And what is the expectation? See, there are $n$ variables. Each one has at least a $1/k$ chance to be lucky and placed last in its critical clause, and so the expected size of $L(\pi)$ is at least $n/k$. So our success probability is at least $2^{-n(1 - 1/k)}$, proving the theorem for the unique assignment case.

A couple of notes: Firstly, you can think of the PPZ algorithm as a randomized non-backtracking Davis-Putnam procedure. You’ll probably hear about DPLL algorithms (named after Martin Davis, Hilary Putnam, George Logemann, and Donald Loveland) sooner or later, if you haven’t already; so if you’re interested, check out these notes by Satish Rao.

Secondly, the PPZ algorithm can be improved. After their paper, Paturi, Pudlak and Zane teamed up with Mike Saks to give us the PPSZ paper (here, here). Their new algorithm first does some simple preprocessing (a few steps of resolution, since you ask), and then runs the PPZ algorithm. The new PPSZ analysis beats Schöning’s performance for $k = 4$ and higher.

Moreover, as we mentioned in this post, the two algorithms work better in somewhat orthogonal cases: Schöning’s algorithm works better when there are many “well-spread” satisfying assignments, and PPSZ works better with “few” solutions. And the papers of Iwama and Tamaki, and Hertli et al., combine these two algorithms carefully to get the current world-record algorithms. Can you beat this world record?

## Lecture 1 notes and references

The reference to the best known algorithm for 3SAT we mentioned in class is this paper: Improving PPSZ for 3-SAT using Critical Variables, by Hertli, Moser, and Scheder.

It gives an algorithm with an expected runtime of $O(1.321^n)$; the idea is to cleverly combine Schöning’s algorithm (the one you saw today and will analyze in HW1) and a previous algorithm of Paturi, Pudlak, Saks and Zane (PPSZ).

At a super-high level, Schöning’s algorithm does better if there are lots of (well-separated) solutions that the randomized process can make a beeline for. The PPSZ algorithm, another very simple randomized algorithm that we may see in a later post, works better if there are few satisfying assignments. The idea in the paper of Hertli and others is how to trade off the two guarantees to do better than either one individually; this idea of combining Schöning and PPSZ goes back to a paper of Iwama and Tamaki (SODA ’04).