# CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

## Lecture #7: The Local Lemma

1. The Local Lemma: Various Forms

The symmetric version of the local lemma we saw in class:

Theorem 1 (Symmetric Form I)
Given a collection of events ${\{B_1, B_2, \ldots, B_m\}}$ such that ${\Pr[B_i] \leq p}$ for all ${i}$, and each ${B_i}$ is mutually independent of the set of all other events except at most ${d}$ of them, if ${e p d < 1}$ then ${\Pr[ \cap \overline{B_i} ] > 0}$.

Very often, the sample space is defined by some experiment which involves sampling from a bunch of independent random variables, and the bad events each depend on some subset of these random variables. In such cases we can state the local lemma thus:

Theorem 2 (Symmetric Form II)

Suppose ${X_1, X_2, \ldots, X_n}$ are independent random variables, and ${B_1, B_2, \ldots, B_m}$ are events such that each ${B_i}$ depends only on some subset ${\{X_j : j \in S_i\}}$ of these variables. Moreover, suppose ${\Pr[ B_i ] \leq p}$ for all ${i}$, and each ${S_i}$ intersects at most ${d}$ of the other ${S_j}$'s. If ${e p d < 1}$ then ${\Pr[ \cap \overline{B_i} ] > 0}$.

Note that both examples from class (the ${k}$-SAT result and the result of Leighton, Maggs and Rao) fall into this setting.
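For E${k}$-SAT, Theorem 2 applies with the boolean variables as the ${X_j}$'s and one bad event per clause (the event that the clause is falsified), so ${p = 2^{-k}}$ and ${d}$ is the largest number of other clauses sharing a variable with a given clause. Here is a minimal Python sketch that checks ${e p d < 1}$ for a concrete formula. It assumes clauses are DIMACS-style integer lists (${+v}$ for ${x_v}$, ${-v}$ for its negation) with exactly ${k}$ distinct variables each; the function names are illustrative, not from the notes.

```python
import math
from itertools import combinations
from typing import List, Sequence

Clause = Sequence[int]  # DIMACS-style literals: +v means x_v, -v means NOT x_v


def dependency_degree(clauses: List[Clause]) -> int:
    """Largest number of *other* clauses that share a variable with a given clause."""
    var_sets = [{abs(lit) for lit in c} for c in clauses]
    degree = [0] * len(clauses)
    for i, j in combinations(range(len(clauses)), 2):
        if var_sets[i] & var_sets[j]:  # S_i and S_j intersect
            degree[i] += 1
            degree[j] += 1
    return max(degree, default=0)


def symmetric_lll_applies(clauses: List[Clause]) -> bool:
    """Check e * p * d < 1 with p = 2**-k (assumes every clause has k distinct variables)."""
    k = len(clauses[0])
    if any(len(c) != k for c in clauses):
        raise ValueError("expected an E_k-SAT formula: all clauses of width k")
    p = 2.0 ** -k  # a clause is falsified by exactly one of the 2**k settings of its variables
    d = dependency_degree(clauses)
    return math.e * p * d < 1


# Four width-4 clauses arranged in a cycle: each overlaps exactly two others.
cycle = [[1, 2, 3, 4], [-4, 5, 6, 7], [7, 8, 9, -10], [-1, 10, 11, 12]]
# Four width-3 clauses on the same three variables: each overlaps all three others.
dense = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [1, 2, -3]]
print(dependency_degree(cycle), symmetric_lll_applies(cycle))
print(dependency_degree(dense), symmetric_lll_applies(dense))
```

For the cycle, ${e \cdot 2^{-4} \cdot 2 \approx 0.34 < 1}$, so the lemma applies; for the dense formula, ${e \cdot 2^{-3} \cdot 3 \approx 1.02}$, so this check fails (which says nothing either way about satisfiability).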

Finally, here’s the asymmetric form of the local lemma:

Theorem 3 (Asymmetric Form)

Given events ${B_1, B_2, \ldots, B_m}$ such that each ${B_i}$ is mutually independent of the events ${\{B_j : j \notin \Gamma_i \cup \{i\}\}}$, suppose there exist ${x_i \in (0,1)}$ such that, for every ${i}$,

$\displaystyle \Pr[ B_i ] \leq x_i \prod_{j \in \Gamma_i \setminus \{i\}} (1 - x_j).$

Then ${\Pr[ \cap \overline{B_i} ] \geq \prod_i (1 - x_i) > 0}$.
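To make the asymmetric condition concrete, here is a small Python sketch that checks it for given probability bounds, index sets ${\Gamma_i}$ and weights ${x_i}$, and returns the guaranteed lower bound ${\prod_i (1 - x_i)}$ when the condition holds. The function name and the input format (lists indexed by event, with ${\Gamma_i}$ as a Python set of indices) are assumptions made for illustration.

```python
from typing import List, Optional, Set


def asymmetric_lll_bound(
    probs: List[float], gamma: List[Set[int]], x: List[float]
) -> Optional[float]:
    """Return prod_i (1 - x_i) if every event satisfies the asymmetric condition, else None.

    probs[i] is an upper bound on Pr[B_i], gamma[i] is the index set Gamma_i,
    and x[i] is the weight chosen for event i.
    """
    if not all(0 < xi < 1 for xi in x):
        raise ValueError("each x_i must lie in the open interval (0, 1)")
    for i, p in enumerate(probs):
        rhs = x[i]
        for j in gamma[i] - {i}:  # product over Gamma_i \ {i}
            rhs *= 1 - x[j]
        if p > rhs:
            return None  # the condition fails for B_i
    bound = 1.0
    for xi in x:
        bound *= 1 - xi
    return bound


# Four events on a cycle: event i depends only on its two neighbours.
gamma = [{1, 3}, {0, 2}, {1, 3}, {0, 2}]
print(asymmetric_lll_bound([0.10] * 4, gamma, [0.2] * 4))  # 0.2 * 0.8**2 = 0.128 >= 0.10
print(asymmetric_lll_bound([0.15] * 4, gamma, [0.2] * 4))  # 0.128 < 0.15, so None
```

The first call returns roughly ${0.8^4 = 0.4096}$. A failed check only means this particular choice of ${x_i}$'s does not work; another choice might.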

Occasionally one needs to use the asymmetric form of the local lemma: one example is Uri Feige’s result showing a constant integrality gap for the Santa Claus problem, and the resulting approximation algorithm due to Haeupler, Saha and Srinivasan.

1.1. Proofs of the Local Lemma

The original proof of the local lemma was based on an inductive argument. This was a non-constructive proof, and the work of Beck gave the first techniques to make some of the existential proofs algorithmic.

In 2009, Moser, and then Moser and Tardos, gave new, intuitive, and more algorithmic proofs of the lemma for the case where there is an underlying set of independent random variables and the bad events are defined over subsets of these variables (e.g., the version of the local lemma given in Theorem 2, and its asymmetric counterpart). Check out the notes on proofs of the local lemma by Joel Spencer and by Uri Feige. The paper of Haeupler, Saha and Srinivasan gives algorithmic versions for some cases where the number of events is exponentially large.
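The Moser–Tardos algorithm in this variable setting is short enough to write down: start from a uniformly random assignment, and while some bad event holds, resample just the variables it depends on. Below is a minimal Python sketch specialised to ${k}$-SAT (bad event = falsified clause). Picking the first violated clause is one arbitrary choice; the names, the DIMACS-style clause format and the step budget are assumptions of this sketch, and it is not the recursive algorithm from class.

```python
import random
from typing import List, Sequence, Tuple

Clause = Sequence[int]  # DIMACS-style literals: +v means x_v, -v means NOT x_v


def is_violated(clause: Clause, assignment: List[bool]) -> bool:
    """A clause is violated iff every one of its literals evaluates to False."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)


def moser_tardos(
    clauses: List[Clause], n: int, seed: int = 0, max_steps: int = 1_000_000
) -> Tuple[List[bool], int]:
    """Resample the variables of a violated clause until no clause is violated.

    Returns (assignment, number_of_resamplings); assignment[v] is the value of x_v
    for v = 1..n (index 0 is unused).
    """
    rng = random.Random(seed)
    assignment = [False] + [rng.random() < 0.5 for _ in range(n)]
    steps = 0
    while True:
        bad = next((c for c in clauses if is_violated(c, assignment)), None)
        if bad is None:
            return assignment, steps
        if steps == max_steps:
            raise RuntimeError("step budget exhausted; the formula may be unsatisfiable")
        for lit in bad:  # fresh uniform values for this clause's variables only
            assignment[abs(lit)] = rng.random() < 0.5
        steps += 1


formula = [[1, 2, 3], [-1, 2, -4], [3, -4, 5], [-2, -3, -5]]
solution, resamplings = moser_tardos(formula, n=5, seed=1)
print(resamplings, solution[1:])
```

The step budget is only a safety net for this sketch: on an unsatisfiable formula the loop would otherwise never stop.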

1.2. Lower Bounds

For E${k}$-SAT, each clause is falsified with probability ${p = 2^{-k}}$ under a uniformly random assignment, so the local lemma implies that if each clause shares variables with fewer than ${2^k/e}$ other clauses (i.e., ${d < 2^k/e}$), then the formula is satisfiable. This is complemented by the existence of unsatisfiable E${k}$-SAT formulas with degree ${d = 2^k(\frac1e + O(\frac{1}{\sqrt{k}}))}$, proved in a paper of Gebauer, Szabó and Tardos (SODA 2011). This shows that the factor of ${e}$ in the local lemma cannot be reduced, even for the special case of E${k}$-SAT.

That the factor ${e}$ is tight for the symmetric form of the local lemma in general was known earlier, from a result of Shearer (1985).

2. Local Lemma: The E${k}$-SAT Version

Let me be clearer, and tease apart the existence question from the algorithmic one. (I’ve just sketched the main ideas in the “proofs”, will try to fill in details later; let me know if you see any bugs.)

Theorem 4
If ${\varphi}$ is an E${k}$-SAT formula with ${m}$ clauses and ${n}$ variables, where the degree of each clause (the number of other clauses it shares a variable with) is at most ${d \le 2^{k-3}}$, then ${\varphi}$ is satisfiable.

Proof: Assume there is no satisfying assignment. Then the algorithm we saw in class will run forever, no matter what random bits it reads. Let us fix ${M = m \log m + 1}$. Then for every string ${R}$ of ${n+Mk}$ bits the algorithm reads from the random source (${n}$ bits for the initial assignment and ${k}$ bits for each resampling), it will run for ${M}$ iterations.

But now one can encode the string ${R}$ as follows: use ${m \log m}$ bits to encode the clauses at the roots of the recursion trees, ${M(\log d + 2)}$ bits to encode the clauses lying within these recursion trees, and ${n}$ bits for the final settings of the variables. As we argued, this encoding is lossless: we can recover the ${n+Mk}$ bits from it. How long is this encoding? It is ${M(\log d + 2) + n + m \log m}$, which is strictly less than ${n+Mk}$ for ${M = m \log m + 1}$ and ${d \leq 2^{k-3}}$.
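Spelled out, the length comparison uses ${\log d \leq k-3}$ (which is what ${d \leq 2^{k-3}}$ gives) in the first step and ${M = m \log m + 1}$ in the last equality:

$\displaystyle M(\log d + 2) + n + m \log m \leq M(k-1) + n + m \log m = n + Mk - (M - m \log m) = n + Mk - 1 < n + Mk.$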

So this would give a lossless (injective) encoding of every string of length ${n+Mk}$ by a strictly shorter string. But for every length ${\ell}$ there are ${2^\ell}$ strings of length ${\ell}$ and only ${1 + 2 + \ldots + 2^{\ell - 1} = 2^{\ell} - 1}$ strings of length strictly less than ${\ell}$, so this is impossible. This contradicts our assumption that there is no satisfying assignment.$\Box$
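The proof above reasons about the recursive algorithm from class, which these notes do not spell out. As a rough aid, here is a Python sketch of Moser's recursive fixing procedure as it is commonly presented: pick a random assignment, and for each violated clause call a routine that resamples that clause and then recursively fixes any violated clause in its neighbourhood (taken here to include the clause itself). The names `find_sat` and `fix`, the DIMACS-style clause format and the call budget are assumptions of this sketch; it is not necessarily the exact FindSat from class. The top-level calls play the role of the roots of the recursion trees in the encoding argument.

```python
import random
from typing import List, Sequence

Clause = Sequence[int]  # DIMACS-style literals: +v means x_v, -v means NOT x_v


def is_violated(clause: Clause, assignment: List[bool]) -> bool:
    """A clause is violated iff every one of its literals evaluates to False."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)


def find_sat(clauses: List[Clause], n: int, seed: int = 0, max_calls: int = 10_000) -> List[bool]:
    """Recursive fixing in the style of Moser; returns assignment[1..n] (index 0 unused)."""
    rng = random.Random(seed)
    m = len(clauses)
    var_sets = [{abs(lit) for lit in c} for c in clauses]
    # neighbors[i]: clauses sharing a variable with clause i, including clause i itself
    neighbors = [[j for j in range(m) if var_sets[i] & var_sets[j]] for i in range(m)]
    assignment = [False] + [rng.random() < 0.5 for _ in range(n)]  # n initial random bits
    calls = 0

    def fix(i: int) -> None:
        nonlocal calls
        calls += 1
        if calls > max_calls:
            raise RuntimeError("call budget exhausted; the formula may be unsatisfiable")
        for v in var_sets[i]:  # k fresh random bits for this clause
            assignment[v] = rng.random() < 0.5
        while True:
            # Recurse on any violated clause in the neighbourhood of clause i.
            bad = next((j for j in neighbors[i] if is_violated(clauses[j], assignment)), None)
            if bad is None:
                return  # clause i and all its neighbours are now satisfied
            fix(bad)

    for i in range(m):  # each top-level call is the root of one recursion tree
        if is_violated(clauses[i], assignment):
            fix(i)
    return assignment


# Width-5 clauses, each overlapping the other two, so d = 2 <= 2**(5-3).
formula = [[1, 2, 3, 4, 5], [-5, 6, 7, 8, 9], [-1, -6, 10, 11, 12]]
print(find_sat(formula, n=12, seed=3)[1:])
```

Because `fix` recurses, very deep recursion on large inputs can hit Python's recursion limit; the sketch is meant for small examples only.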

Now we can alter the proof to show that the expected running time of the algorithm is small:

Theorem 5
If ${\varphi}$ is an E${k}$-SAT formula with ${m}$ clauses and ${n}$ variables, where the degree of each clause is at most ${d \le 2^{k-3}}$, then the algorithm FindSat finds a satisfying assignment within an expected ${O(m \log m)}$ steps.

Proof: Assume that we run for at least ${M + t}$ steps with probability at least ${2^{-s}}$. (Again, think of ${M = m \log m}$.) Then at least a ${2^{-s}}$ fraction of the ${2^{n+(M+t)k}}$ random strings make the algorithm run this long, and, as before, we can losslessly compress each of them into a string of length ${(M+t)(\log d + 2) + n + m \log m}$.

But any lossless encoding of a set of ${2^{n+(M+t)k} \cdot 2^{-s}}$ strings must use at least ${n + (M+t)k-s}$ bits for at least one of them, by the same counting argument as before. So, writing ${M}$ for ${m \log m}$,

$\displaystyle n + (M+t)k - s \leq n + (M+t)(\log d + 2) + M.$

If ${d \leq 2^{k-3}}$, we have ${k - \log d - 2 \geq 1}$, and

$\displaystyle (M+t)(k - \log d - 2) - s \leq M$

or

$\displaystyle M+t-s \leq M \implies s \geq t.$

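So the probability of taking at least ${M+t}$ steps is at most ${2^{-t}}$. Hence the number of steps ${T}$ satisfies ${\mathbb{E}[T] = \sum_{j \geq 1} \Pr[T \geq j] \leq M + \sum_{t \geq 1} 2^{-t} = M + 1}$, i.e., the expected running time is ${M + O(1)}$. $\Box$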