# Lecture #19: Martingales

**1. Some definitions**

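Recall that a *martingale* is a sequence of r.v.s $Z_0, Z_1, Z_2, \ldots$ (denoted by $(Z_i)$) if each $Z_i$ satisfies $E[|Z_i|] < \infty$, and
$$E[Z_{i+1} \mid Z_0, Z_1, \ldots, Z_i] = Z_i.$$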

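Somewhat more generally, given a sequence $X_1, X_2, \ldots$ of random variables, a *martingale with respect to $(X_i)$* is another sequence of r.v.s $Z_0, Z_1, Z_2, \ldots$ (denoted by $(Z_i)$) if each $Z_i$ satisfies

- $E[|Z_i|] < \infty$,
- there exist functions $g_i$ such that $Z_i = g_i(X_1, X_2, \ldots, X_i)$, and
- $E[Z_{i+1} \mid X_1, X_2, \ldots, X_i] = Z_i$.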

One can define things even more generally, but for the purposes of this course, let’s just proceed with this. (If you’d like more details, check out, say, the books by Grimmett and Stirzaker, or Durrett, or many others.)

**1.1. The Azuma-Hoeffding Inequality**

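**Theorem 1 (Azuma-Hoeffding)** If $(Z_i)$ is a martingale such that for each $i$, $|Z_i - Z_{i-1}| \le c_i$, then
$$\Pr\left[|Z_n - Z_0| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$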

(Apparently Bernstein had essentially figured this one out as well, in addition to the Chernoff-Hoeffding bounds, back in 1937.) The proof of this bound can be found in most texts; we’ll skip it here. BTW, if you just want the upper or lower tail, replace the factor $2$ by $1$ on the right hand side.
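A quick way to see Theorem 1 in action is the simple $\pm 1$ random walk, which is a martingale with every $c_i = 1$. The Python sketch below (the function names are our own, hypothetical choices) compares an empirical tail estimate with the Azuma-Hoeffding right-hand side; it is a sanity check under these assumptions, not a proof.

```python
import math
import random


def azuma_bound(lam, cs, two_sided=True):
    """Right-hand side of Azuma-Hoeffding for increments bounded by cs.

    Capped at 1 since it bounds a probability; two_sided=False gives one tail.
    """
    total = sum(c * c for c in cs)
    factor = 2.0 if two_sided else 1.0
    return min(1.0, factor * math.exp(-lam * lam / (2.0 * total)))


def empirical_tail(n, lam, trials, seed=0):
    """Estimate Pr[|Z_n - Z_0| >= lam] for the simple +/-1 random walk (c_i = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(z) >= lam:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, lam = 100, 25
    print("empirical tail:", empirical_tail(n, lam, trials=5000))
    print("Azuma bound:   ", azuma_bound(lam, [1] * n))
```

The bound is loose for this particular walk (whose tail is close to Gaussian), but it holds for every martingale with these increment bounds.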

**2. The Doob Martingale**

Most often, the case we will be concerned with is where our entire space is defined by a sequence of random variables $X_1, X_2, \ldots, X_n$, where each $X_i$ takes values in the set $\Omega$. Moreover, we will be interested in some *bounded* function $f: \Omega^n \to \mathbb{R}$, and will want to understand how $f(X_1, \ldots, X_n)$ behaves when $(X_1, \ldots, X_n)$ is drawn from the underlying distribution. (Very often these $X_i$’s will be drawn from a “product distribution”, i.e., they will be independent of each other, but they need not be.) Specifically, we ask:

How concentrated is $f(X_1, \ldots, X_n)$ around its mean $E[f] := E[f(X_1, \ldots, X_n)]$?

To this end, define for every $i \in \{0, 1, \ldots, n\}$ the random variable
$$Z_i := E[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_i].$$

(At this point, it is useful to remember the definition of a random variable as a function from the sample space to the reals: so this r.v. is also such a function, obtained by taking averages of $f$ over parts of the sample space.)

How does the random variable $Z_0$ behave? It’s just the constant $E[f]$: the expected value of the function given random settings for $X_1$ through $X_n$. What about $Z_1$? It is a function that depends only on its first variable, namely $X_1$: instead of averaging over the entire sample space, we partition it according to the value of the first variable, and average over each part of the partition. Likewise, $Z_2$ is a function of $X_1$ and $X_2$ that averages over the other variables. And so on to $Z_n$, which is the same as the function $f$ itself. So, as we go from $i = 0$ to $i = n$, the random variables $Z_i$ go from the constant function $E[f]$ to the function $f$.
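To make the averaging concrete, here is a small Python sketch that computes $Z_0, \ldots, Z_n$ exactly by enumeration. It assumes, purely for illustration, that the $X_i$ are independent and uniform on a tiny finite $\Omega$ (the construction itself needs neither assumption); `doob_martingale` and `empty_bins` are hypothetical helper names. The example throws 3 balls into 3 bins, with $f$ the number of empty bins.

```python
from fractions import Fraction
from itertools import product


def doob_martingale(f, omega, n, path):
    """Exact Z_0..Z_n for f(X_1..X_n), X_i i.i.d. uniform on omega, along `path`.

    Z_i averages f over every completion of the first i coordinates of `path`.
    """
    zs = []
    for i in range(n + 1):
        prefix = tuple(path[:i])
        completions = list(product(omega, repeat=n - i))
        total = sum(Fraction(f(prefix + rest)) for rest in completions)
        zs.append(total / len(completions))
    return zs


def empty_bins(num_bins):
    """f(x_1..x_m) = number of bins with no ball, where x_j is ball j's bin."""
    return lambda xs: num_bins - len(set(xs))


if __name__ == "__main__":
    zs = doob_martingale(empty_bins(3), range(3), 3, (0, 0, 1))
    print([str(z) for z in zs])  # prints ['8/9', '8/9', '4/3', '1']
```

Here $Z_0 = E[f] = 3(2/3)^3 = 8/9$, and $Z_3 = f(0, 0, 1) = 1$, matching the description above.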

Of course, we’re defining this for a reason: $(Z_i)_{i=0}^n$ is a martingale with respect to $(X_i)$.

**Lemma 2** For a bounded function $f$, the sequence $(Z_i)_{i=0}^n$ is a martingale with respect to $(X_i)$. (It’s called the *Doob martingale* for $f$.)

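*Proof:* The first two properties of $(Z_i)$ being a martingale with respect to $(X_i)$ follow from $f$ being bounded, and from the definition of $Z_i$ itself. For the last property,
$$E[Z_{i+1} \mid X_1, \ldots, X_i] = E\big[\, E[f \mid X_1, \ldots, X_{i+1}] \;\big|\; X_1, \ldots, X_i \big] = E[f \mid X_1, \ldots, X_i] = Z_i.$$

The first equality is the definition of $Z_{i+1}$, the second follows from the fact that $E[A \mid B] = E\big[E[A \mid B, C] \mid B\big]$ for random variables $A, B, C$, and the last is the definition of $Z_i$.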
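The tower-property argument can be checked mechanically on small cases. The sketch below again assumes i.i.d. uniform $X_i$ on a tiny $\Omega$ and uses hypothetical helper names; it verifies $E[Z_{i+1} \mid X_1, \ldots, X_i] = Z_i$ exactly, with rational arithmetic, for every $i$ and every prefix.

```python
from fractions import Fraction
from itertools import product


def cond_exp(f, omega, n, prefix):
    """E[f | X_1..X_i = prefix] for i.i.d. uniform X_j on omega (exact)."""
    rest = list(product(omega, repeat=n - len(prefix)))
    return sum(Fraction(f(tuple(prefix) + r)) for r in rest) / len(rest)


def check_martingale(f, omega, n):
    """Return True iff E[Z_{i+1} | X_1..X_i] == Z_i for all i < n and all prefixes."""
    omega = list(omega)
    for i in range(n):
        for prefix in product(omega, repeat=i):
            z_i = cond_exp(f, omega, n, prefix)
            # Average Z_{i+1} over the uniformly random next coordinate.
            avg_next = sum(cond_exp(f, omega, n, prefix + (a,)) for a in omega) / len(omega)
            if avg_next != z_i:
                return False
    return True


def max_load(xs):
    """Largest number of balls in any one bin."""
    return max(xs.count(b) for b in set(xs))


if __name__ == "__main__":
    print(check_martingale(max_load, range(3), 4))
```

Any $f$ passes this check, which is exactly the content of Lemma 2.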

Assuming that $f$ is bounded is not necessary; one can work with weaker assumptions (see the texts for more details).

Before we continue on this thread, let us show some Doob martingales which arise in CS/Math-y applications.

- Throw $m$ balls into $n$ bins, and let $f$ be some function of the load: the number of empty bins, the max load, the load of the second-most-loaded bin, or some similar function. Let $X_i$ be the index of the bin into which ball $i$ lands. Then $Z_0 = E[f]$ and $Z_i = E[f \mid X_1, \ldots, X_i]$ for $i = 1, \ldots, m$ define a martingale $(Z_i)$ with respect to $(X_i)$.
- Consider the random graph $G(n, p)$: $n$ vertices, each of the $\binom{n}{2}$ edges chosen independently with probability $p$. Let $\chi(G)$ be the chromatic number of the graph, the minimum number of colors needed to properly color the graph. There are two natural Doob martingales associated with this, depending on how we choose the variables $X_i$.
In the first one, let $X_i$ be the indicator of whether the $i$-th edge is present, and $Z_i = E[\chi(G) \mid X_1, \ldots, X_i]$, which gives us a martingale sequence of length $\binom{n}{2}$. This is called the *edge-exposure martingale*. For the second one, let $X_i$ be the collection of edges going from the $i$-th vertex to vertices $1, 2, \ldots, i-1$: the new martingale has length $n$ and is called the *vertex-exposure martingale*.
- Consider a run of quicksort on a particular input: let $f$ be the number of comparisons. Let $X_1$ be the first pivot, $X_2$ the second, etc. Then $Z_i = E[f \mid X_1, \ldots, X_i]$ is a Doob martingale with respect to $(X_i)$.
BTW, are these $X_i$’s independent of each other? Naively, they might depend on the size of the current set, which makes them dependent on the past. One way you can make them independent is by letting the $X_i$’s be, say, independent uniformly random permutations on all $n$ elements, and when you want to choose the $i$-th pivot, pick the first element of the current set according to the permutation $X_i$. (Or, you could let $X_i$ be an independent uniformly random real in $[0, 1]$ and use it to pick a random element from the current set, etc.)

- Suppose we have $r$ red and $b$ blue balls in a bin. We draw $n \le r + b$ balls *without replacement* from this bin: what is the number of red balls drawn? Let $X_i$ be the indicator for whether the $i$-th ball drawn is red, and let $f = X_1 + \cdots + X_n$ be the number of red balls drawn. Then $Z_i = E[f \mid X_1, \ldots, X_i]$ is a martingale with respect to $(X_i)$. However, in this example, the $X_i$’s are not independent. Nonetheless, the sequence $(Z_i)$ is a Doob martingale. (As in the quicksort example, one can define it with respect to a different set of variables which are independent of each other.)
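For the red/blue example the Doob martingale has a closed form: after $i$ draws containing $R_i$ red balls, each of the remaining $n - i$ draws is red with probability $(r - R_i)/(r + b - i)$ by symmetry, so $Z_i = R_i + (n - i)\frac{r - R_i}{r + b - i}$. The Python sketch below (helper names are hypothetical) computes this and checks the martingale property exactly over all reachable states, even though the $X_i$ are dependent.

```python
import random
from fractions import Fraction


def z_value(r, b, n, i, reds):
    """Z_i = E[# red among n draws | the first i draws contained `reds` red balls]."""
    remaining = n - i
    if remaining == 0:
        return Fraction(reds)
    # Each remaining draw is red with probability (r - reds) / (r + b - i).
    return reds + Fraction(remaining * (r - reds), r + b - i)


def sample_path(r, b, n, rng):
    """Draw n balls without replacement; return the 0/1 draws and Z_0, ..., Z_n."""
    urn = [1] * r + [0] * b
    rng.shuffle(urn)
    draws = urn[:n]
    zs = [z_value(r, b, n, 0, 0)]
    reds = 0
    for i, x in enumerate(draws, start=1):
        reds += x
        zs.append(z_value(r, b, n, i, reds))
    return draws, zs


def is_martingale_step(r, b, n, i, reds):
    """Check E[Z_{i+1} | first i draws had `reds` reds] == Z_i exactly."""
    p_red = Fraction(r - reds, r + b - i)
    expected_next = (p_red * z_value(r, b, n, i + 1, reds + 1)
                     + (1 - p_red) * z_value(r, b, n, i + 1, reds))
    return expected_next == z_value(r, b, n, i, reds)


def all_steps_ok(r, b, n):
    """Run the check over every reachable state (i, reds) with i < n."""
    for i in range(n):
        for reds in range(max(0, i - b), min(i, r) + 1):
            if not is_martingale_step(r, b, n, i, reds):
                return False
    return True


if __name__ == "__main__":
    draws, zs = sample_path(5, 4, 6, random.Random(1))
    print(draws, [str(z) for z in zs], all_steps_ok(5, 4, 6))
```

Note that $Z_0 = nr/(r + b)$ is the familiar hypergeometric mean, and $Z_n$ is the actual red count.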

So yeah, if we want to study the concentration of $f$ around $E[f]$, we can now apply Azuma-Hoeffding to the Doob martingale, which gives us the concentration of $Z_n$ (i.e., $f$) around $Z_0$ (i.e., $E[f]$). Good, good.

Next step: to apply Azuma-Hoeffding to the Doob martingale $(Z_i)$, we need to bound $|Z_i - Z_{i-1}| \le c_i$ for all $i$. This just says that if we can go from $E[f]$ to $f$ in a “small” number of steps ($n$), and each time we’re not smoothing out “too aggressively” (each $c_i$ is small, so that $\sum_i c_i^2$ is small), then $f$ is concentrated about its mean.

**2.1. Independence and Lipschitz-ness**

One case when it’s easy to bound the $c_i$’s is when the $X_i$’s are independent of each other, and also $f$ is not too sensitive in any coordinate: changing any single coordinate does not change the value of $f$ by much. Let’s see this in detail.

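**Definition 3** Given values $(c_i)_{i=1}^n$, the function $f: \Omega^n \to \mathbb{R}$ is *$(c_i)$-Lipschitz* if for all $j$ and $x_j, x_j' \in \Omega$, for all $i \ne j$ and for all $x_i \in \Omega$, it holds that
$$\left|f(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_n) - f(x_1, \ldots, x_{j-1}, x_j', x_{j+1}, \ldots, x_n)\right| \le c_j.$$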

If $c_i = c$ for all $i$, then we just say $f$ is $c$-Lipschitz.
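For tiny finite domains, the best constants $c_j$ in Definition 3 can be found by brute force. A minimal Python sketch, exponential in $n$ and so only for toy checks (`lipschitz_constants` is a hypothetical helper):

```python
from itertools import product


def lipschitz_constants(f, omega, n):
    """Smallest c_j with |f(x) - f(x')| <= c_j whenever x, x' differ only in coordinate j."""
    omega = list(omega)
    cs = [0] * n
    for x in product(omega, repeat=n):
        fx = f(x)
        for j in range(n):
            for a in omega:
                y = x[:j] + (a,) + x[j + 1:]  # change coordinate j only
                cs[j] = max(cs[j], abs(fx - f(y)))
    return cs


if __name__ == "__main__":
    # Number of empty bins when 4 balls go into 3 bins: 1-Lipschitz in each ball.
    print(lipschitz_constants(lambda xs: 3 - len(set(xs)), range(3), 4))
```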

**Lemma 4** If $f$ is $(c_i)$-Lipschitz and $X_1, X_2, \ldots, X_n$ are independent, then the Doob martingale of $f$ with respect to $(X_i)$ satisfies
$$|Z_i - Z_{i-1}| \le c_i.$$

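*Proof:* Let us use $X_{[i]}$ to denote the sequence $X_1, \ldots, X_i$, etc., and write sums as if $\Omega$ were discrete. Recall that
$$Z_i = E[f(X) \mid X_{[i]}] = \sum_{a_{i+1}, \ldots, a_n} f(X_{[i]}, a_{i+1}, \ldots, a_n) \Pr[X_{i+1} = a_{i+1}, \ldots, X_n = a_n \mid X_{[i]}] = \sum_{a_{i+1}, \ldots, a_n} f(X_{[i]}, a_{i+1}, \ldots, a_n) \prod_{k=i+1}^n \Pr[X_k = a_k],$$

where the last equality is from independence. Similarly,
$$Z_{i-1} = \sum_{a_i, \ldots, a_n} f(X_{[i-1]}, a_i, \ldots, a_n) \prod_{k=i}^n \Pr[X_k = a_k].$$
Multiplying the expression for $Z_i$ by $\sum_{a_i} \Pr[X_i = a_i] = 1$ puts both sums over the same range. Hence
$$|Z_i - Z_{i-1}| = \left|\sum_{a_i, \ldots, a_n} \prod_{k=i}^n \Pr[X_k = a_k] \Big( f(X_{[i]}, a_{i+1}, \ldots, a_n) - f(X_{[i-1]}, a_i, a_{i+1}, \ldots, a_n) \Big)\right| \le \sum_{a_i, \ldots, a_n} \prod_{k=i}^n \Pr[X_k = a_k] \cdot c_i = c_i,$$

where the inequality is from the fact that changing the $i$-th coordinate from $X_i$ to $a_i$ cannot change the function value by more than $c_i$, and that $\sum_{a_i, \ldots, a_n} \prod_{k=i}^n \Pr[X_k = a_k] = 1$.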
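Lemma 4 can also be checked numerically. Assuming a product distribution given as a list of per-coordinate probability tables (a representation chosen here only for illustration), the Python sketch below computes the largest Doob increment $|Z_i - Z_{i-1}|$ over all outcomes; for a $1$-Lipschitz $f$ it should never exceed $1$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cond_exp(f, dists, prefix):
    """E[f(X) | X_1..X_i = prefix] when X_j ~ dists[j] independently.

    dists[j] maps each value to its probability (a Fraction).
    """
    i = len(prefix)
    total = Fraction(0)
    for rest in product(*(list(d.items()) for d in dists[i:])):
        weight = Fraction(1)
        values = []
        for value, prob in rest:
            weight *= prob  # independence: probabilities multiply
            values.append(value)
        total += weight * f(tuple(prefix) + tuple(values))
    return total


def max_increment(f, dists):
    """For each i = 1..n, the max over all outcomes of |Z_i - Z_{i-1}|."""
    n = len(dists)
    incs = [Fraction(0)] * n
    for path in product(*(list(d) for d in dists)):
        for i in range(1, n + 1):
            diff = abs(cond_exp(f, dists, path[:i]) - cond_exp(f, dists, path[:i - 1]))
            incs[i - 1] = max(incs[i - 1], diff)
    return incs


if __name__ == "__main__":
    biased = {0: Fraction(1, 3), 1: Fraction(2, 3)}
    print(max_increment(sum, [biased] * 3))  # each |X_i - E[X_i]| <= 2/3 <= c_i = 1
```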

Now applying Azuma-Hoeffding, we immediately get:

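**Corollary 5 (McDiarmid’s Inequality)** If $f$ is $(c_i)$-Lipschitz, and $X_1, X_2, \ldots, X_n$ are independent, then
$$\Pr\left[|f - E[f]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$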

(Disclosure: I am cheating. McDiarmid’s inequality has better constants: the constant $2$ in the denominator moves to the numerator, giving $2\exp\left(-2\lambda^2 / \sum_i c_i^2\right)$.) And armed with this inequality, we can give concentration results for some applications we mentioned above.
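Both versions of the right-hand side are easy to evaluate and invert. The Python sketch below (all helper names hypothetical) computes the bound as stated in Corollary 5, the sharper McDiarmid constant, and the deviation $\lambda$ at which either equals a target failure probability $\delta$; its demo compares the stated bound against simulated balls-and-bins runs with $m = n = 100$ and $f$ the number of empty bins.

```python
import math
import random


def mcdiarmid_as_stated(lam, cs):
    """Corollary 5 as stated in these notes: 2 exp(-lam^2 / (2 sum c_i^2))."""
    return 2 * math.exp(-lam ** 2 / (2 * sum(c * c for c in cs)))


def mcdiarmid_sharp(lam, cs):
    """McDiarmid's sharper constant: 2 exp(-2 lam^2 / sum c_i^2)."""
    return 2 * math.exp(-2 * lam ** 2 / sum(c * c for c in cs))


def deviation_for(delta, cs, sharp=False):
    """Smallest lam at which the chosen bound equals delta."""
    s = sum(c * c for c in cs)
    # Solve 2 exp(-k lam^2 / s) = delta, with k = 1/2 (as stated) or k = 2 (sharp).
    k = 2.0 if sharp else 0.5
    return math.sqrt(s * math.log(2 / delta) / k)


def empty_bins_sample(m, n, rng):
    """Throw m balls into n bins uniformly; return the number of empty bins."""
    return n - len({rng.randrange(n) for _ in range(m)})


if __name__ == "__main__":
    m = n = 100
    mean = n * (1 - 1 / n) ** m
    lam = deviation_for(0.05, [1] * m)
    rng = random.Random(0)
    samples = [empty_bins_sample(m, n, rng) for _ in range(2000)]
    frac = sum(abs(s - mean) >= lam for s in samples) / len(samples)
    print(f"lam = {lam:.1f}, empirical deviation rate = {frac}")
```

For this instance the guaranteed deviation is far larger than the typical spread of the simulated counts, which is expected: the inequality is general, not tight for every $f$.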

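- For the balls-and-bins example, say $f$ is the number of empty bins: hence $E[f] = n(1 - 1/n)^m$. Also, changing the location of any one ball changes $f$ by at most $1$. So $f$ is $1$-Lipschitz, and hence
$$\Pr\left[|f - E[f]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2m}\right).$$
Hence, whp, $f = n(1 - 1/n)^m \pm O(\sqrt{m \log n})$.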

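- For the case where $f = \chi(G)$ is the chromatic number of a random graph $G(n, p)$, and we define the edge-exposure martingale $(Z_i)$, clearly $f$ is $1$-Lipschitz (adding or removing a single edge changes the chromatic number by at most $1$). Hence
$$\Pr\left[|\chi(G) - E[\chi(G)]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2\binom{n}{2}}\right).$$
This is not very interesting, since the right hand side is only small when $\lambda = \Omega(n)$, but the chromatic number itself lies in $[1, n]$, so we get almost no concentration at all.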

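Instead, we could use a vertex-exposure martingale, where at the $i$-th step we expose the $i$-th vertex and its edges going to vertices $1, \ldots, i-1$. Even with respect to these variables, the function $\chi$ is $1$-Lipschitz (changing the edges at one vertex changes $\chi$ by at most $1$, since that vertex can always be given a fresh color), and hence
$$\Pr\left[|\chi(G) - E[\chi(G)]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2n}\right).$$

And hence the chromatic number of the random graph is concentrated to within $O(\sqrt{n})$ around its mean: deviations of $c\sqrt{n}$ have probability at most $2e^{-c^2/2}$.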
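The difference between the two exposure martingales is just arithmetic on $\sum_i c_i^2$. The sketch below (hypothetical helper names) solves $2\exp(-\lambda^2/(2N)) = \delta$ for $N = \binom{n}{2}$ steps and for $N = n$ steps, all with $c_i = 1$; the edge-exposure width exceeds $n$, which says nothing since $\chi \le n$ anyway.

```python
import math


def azuma_width(num_steps, delta, c=1.0):
    """lam at which 2 exp(-lam^2 / (2 * num_steps * c^2)) equals delta."""
    return math.sqrt(2 * num_steps * c * c * math.log(2 / delta))


def exposure_widths(n, delta):
    """Deviation guaranteed w.p. 1 - delta for chi(G(n, p)) by each exposure martingale."""
    edge = azuma_width(n * (n - 1) // 2, delta)  # binom(n, 2) steps, 1-Lipschitz
    vertex = azuma_width(n, delta)               # n steps, 1-Lipschitz
    return edge, vertex


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        edge, vertex = exposure_widths(n, 0.01)
        print(f"n={n}: edge-exposure width {edge:.0f}, vertex-exposure width {vertex:.0f}")
```

The ratio of the two widths is exactly $\sqrt{(n-1)/2}$, independent of $\delta$.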

**3. Concentration for Random Geometric TSP**

McDiarmid’s inequality is convenient to use, but Lipschitz-ness often does not get us as far as we’d like (even with independence). Sometimes you need to bound $|Z_i - Z_{i-1}|$ directly to get the full power of Azuma-Hoeffding. Here’s one example:

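Let $X_1, X_2, \ldots, X_n$ be points picked independently and uniformly at random from the unit square $[0,1]^2$. Let $\mathrm{TSP}(X_1, \ldots, X_n)$ be the length of the shortest traveling salesman tour on these $n$ points, and let $f = \mathrm{TSP}(X_1, \ldots, X_n)$. How closely is $f$ concentrated around its mean $E[f]$?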

In the HW, you will show that $E[f] = \Theta(\sqrt{n})$; in fact, one can pin down $E[f]$ up to the leading constant. (See the work of Rhee and others.)
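To get a feel for $f$ on small instances, one can compute the TSP exactly. The sketch below uses the standard Held-Karp dynamic program (our own choice of tool, not something the notes rely on), which takes $O(2^n n^2)$ time and so is only practical for small $n$; `tsp_length` and `random_tsp_samples` are hypothetical names, and we take the tour on two points to go there and back.

```python
import itertools
import math
import random
import statistics


def tsp_length(points):
    """Exact shortest tour length via Held-Karp dynamic programming, O(2^n n^2) time."""
    n = len(points)
    if n <= 1:
        return 0.0
    if n == 2:
        return 2 * math.dist(points[0], points[1])  # there and back
    d = [[math.dist(p, q) for q in points] for p in points]
    # best[(mask, j)]: shortest path that starts at point 0, visits exactly the
    # points in `mask` (a bitmask over 1..n-1), and ends at point j.
    best = {(1 << j, j): d[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in itertools.combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)
                best[(mask, j)] = min(best[(prev, k)] + d[k][j] for k in subset if k != j)
    full = (1 << n) - 2  # every point except point 0
    return min(best[(full, j)] + d[j][0] for j in range(1, n))


def random_tsp_samples(n, trials, seed=0):
    """TSP lengths of `trials` independent sets of n uniform points in [0,1]^2."""
    rng = random.Random(seed)
    return [tsp_length([(rng.random(), rng.random()) for _ in range(n)])
            for _ in range(trials)]


if __name__ == "__main__":
    for n in (6, 8, 10):
        samples = random_tsp_samples(n, 30)
        print(f"n={n}: mean {statistics.mean(samples):.3f}, "
              f"std {statistics.pstdev(samples):.3f}, sqrt(n) {math.sqrt(n):.3f}")
```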

**3.1. Using McDiarmid: a weak first bound**

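Note that $f$ is $2\sqrt{2}$-Lipschitz: moving a single point changes the tour length by at most twice the diameter $\sqrt{2}$ of the square. By Corollary 5 we get that
$$\Pr\left[|f - E[f]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2 \cdot n \cdot 8}\right) = 2\exp\left(-\frac{\lambda^2}{16n}\right).$$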

If we want the deviation probability to be at most $1/n$, we would have to set $\lambda = 4\sqrt{n \ln(2n)} = \Theta(\sqrt{n \log n})$. Not so great, since this is large compared to the expectation $E[f] = \Theta(\sqrt{n})$ itself; we’d like a tighter bound.
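Plugging in numbers shows why this is disappointing. A minimal sketch (hypothetical helper name), assuming Corollary 5 exactly as stated above with $c_i = 2\sqrt{2}$, so that $\sum_i c_i^2 = 8n$:

```python
import math


def weak_tsp_deviation(n, delta):
    """lam solving 2 exp(-lam^2 / (16 n)) = delta (Corollary 5 with all c_i = 2*sqrt(2))."""
    return math.sqrt(16 * n * math.log(2 / delta))


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        lam = weak_tsp_deviation(n, 1 / n)
        # Compare with sqrt(n), the order of E[f]; the ratio grows like sqrt(log n).
        print(f"n={n}: lam={lam:.1f}, lam/sqrt(n)={lam / math.sqrt(n):.2f}")
```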

**3.2. So let’s be more careful: an improved bound**

And in fact, we’ll get a better bound using the very same Doob martingale associated with $f$:
$$Z_i = E[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_i].$$

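But instead of just using the $2\sqrt{2}$-Lipschitzness of $f$, let us bound $|Z_i - Z_{i-1}|$ better.

**Lemma 6** There is a constant $C$ such that for all $i$,
$$|Z_i - Z_{i-1}| \le \min\left\{2\sqrt{2},\ \frac{C}{\sqrt{n-i}}\right\},$$
where the second term is read as $+\infty$ when $i = n$.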

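Before we prove this lemma, let us complete the concentration bound for TSP using this. Setting $c_i = \min\{2\sqrt{2}, C/\sqrt{n-i}\}$ gives us
$$\sum_{i=1}^n c_i^2 \le 8 + \sum_{k=1}^{n-1} \frac{C^2}{k} = O(\log n),$$
and hence Azuma-Hoeffding gives:
$$\Pr\left[|f - E[f]| \ge \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{O(\log n)}\right).$$

So deviations of $\lambda = \Theta(\log n)$ already have probability at most $1/\mathrm{poly}(n)$, compared to $\Theta(\sqrt{n \log n})$ from the weak bound.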

Much better!
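The gain comes entirely from $\sum_i c_i^2$. The sketch below (hypothetical helper names) evaluates $\sum_i c_i^2$ for $c_i = \min\{2\sqrt{2}, C/\sqrt{n-i}\}$ and the resulting Azuma-Hoeffding deviation at failure probability $1/n$, next to the weak bound with $\sum_i c_i^2 = 8n$. The value of $C$ is an assumption, since Lemma 6 only promises some constant.

```python
import math


def improved_sum_sq(n, C):
    """sum_{i=1}^n c_i^2 with c_i = min(2*sqrt(2), C / sqrt(n - i)); c_n = 2*sqrt(2)."""
    total = 0.0
    for i in range(1, n + 1):
        k = n - i
        c = 2 * math.sqrt(2) if k == 0 else min(2 * math.sqrt(2), C / math.sqrt(k))
        total += c * c
    return total


def deviation(sum_sq, delta):
    """lam solving 2 exp(-lam^2 / (2 * sum_sq)) = delta (Azuma-Hoeffding)."""
    return math.sqrt(2 * sum_sq * math.log(2 / delta))


if __name__ == "__main__":
    C = 1.0  # hypothetical constant; Lemma 6 only guarantees that *some* C exists
    for n in (10**2, 10**4, 10**6):
        weak = deviation(8 * n, 1 / n)
        improved = deviation(improved_sum_sq(n, C), 1 / n)
        print(f"n={n}: weak lam={weak:.1f}, improved lam={improved:.1f}")
```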

**3.3. Some useful lemmas**

To prove Lemma 6, we’ll need a simple geometric lemma:

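**Lemma 7** Let $x \in [0,1]^2$ be any point. Pick $k$ random points $R$ independently and uniformly from $[0,1]^2$; then the expected distance of point $x$ to its closest point in $R$ is $O(1/\sqrt{k})$.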

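*Proof:* Define the random variable $D = d(x, R) := \min_{y \in R} \|x - y\|$. Hence, $D > r$ exactly when the ball $B(x, r)$ contains no point of $R$. For $r \le \sqrt{2}$, the area of $B(x, r) \cap [0,1]^2$ is at least $c_0 r^2$ for some constant $c_0 > 0$.

Define $B_r := B(x, r) \cap [0,1]^2$. For any $r \le \sqrt{2}$, the chance that the $k$ points all miss $B_r$, and hence $D > r$, is at most
$$(1 - c_0 r^2)^k \le e^{-c_0 k r^2}.$$

Of course, for $r > \sqrt{2}$, $\Pr[D > r] = 0$. And hence
$$E[D] = \int_0^{\sqrt{2}} \Pr[D > r]\, dr \le \int_0^\infty e^{-c_0 k r^2}\, dr = \frac{1}{2}\sqrt{\frac{\pi}{c_0 k}} = O\left(\frac{1}{\sqrt{k}}\right).$$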
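A quick Monte Carlo check of Lemma 7 (a sanity check, not a proof; the helper name is hypothetical): if $E[D] = O(1/\sqrt{k})$, then $\sqrt{k}\cdot E[D]$ should stay bounded as $k$ grows. Heuristically one expects values near $1/2$ at the center of the square, where $x$ sees a full disk, and near $1$ at a corner, where it sees only a quarter disk.

```python
import math
import random


def mean_nearest_distance(x, k, trials, rng):
    """Average over trials of the min distance from x to k uniform points in [0,1]^2."""
    total = 0.0
    for _ in range(trials):
        total += min(math.dist(x, (rng.random(), rng.random())) for _ in range(k))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (16, 64, 256, 1024):
        center = math.sqrt(k) * mean_nearest_distance((0.5, 0.5), k, 400, rng)
        corner = math.sqrt(k) * mean_nearest_distance((0.0, 0.0), k, 400, rng)
        print(f"k={k}: sqrt(k)*E[D] center~{center:.2f}, corner~{corner:.2f}")
```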

Secondly, here is another lemma about how the TSP behaves:

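**Lemma 8** For any set of points $S$ and any point $x$, writing $d(x, S) := \min_{y \in S} \|x - y\|$, we get
$$\mathrm{TSP}(S) \le \mathrm{TSP}(S \cup \{x\}) \le \mathrm{TSP}(S) + 2\, d(x, S).$$

*Proof:* Follows from the triangle inequality $d(u, w) \le d(u, v) + d(v, w)$, for any points $u, v, w$. For the left inequality, shortcut $x$ out of an optimal tour on $S \cup \{x\}$; for the right one, splice $x$ into an optimal tour on $S$ right after its closest point $y \in S$, which adds at most $2\, d(x, y)$.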
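Lemma 8 is easy to sanity-check on small random instances with an exact brute-force TSP. The helpers below are our own (hypothetical names), practical only for a handful of points, and use a small tolerance for floating-point error.

```python
import itertools
import math
import random


def tsp_brute(points):
    """Exact tour length by trying every order with points[0] fixed (tiny inputs only)."""
    if len(points) <= 1:
        return 0.0
    first, rest = points[0], points[1:]
    best = math.inf
    for order in itertools.permutations(rest):
        tour = (first,) + order + (first,)
        length = sum(math.dist(tour[j], tour[j + 1]) for j in range(len(tour) - 1))
        best = min(best, length)
    return best


def check_lemma8(num_sets, size, seed=0, eps=1e-12):
    """Test TSP(S) <= TSP(S + {x}) <= TSP(S) + 2 d(x, S) on random S, x in [0,1]^2."""
    rng = random.Random(seed)
    for _ in range(num_sets):
        s = [(rng.random(), rng.random()) for _ in range(size)]
        x = (rng.random(), rng.random())
        base = tsp_brute(s)
        with_x = tsp_brute(s + [x])
        gap = 2 * min(math.dist(x, y) for y in s)
        if not (base <= with_x + eps and with_x <= base + gap + eps):
            return False
    return True


if __name__ == "__main__":
    print(check_lemma8(200, 5))
```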

**3.4. Proving Lemma 6**

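OK, now to the proof of Lemma 6. Recall that we want to bound $|Z_i - Z_{i-1}|$; since $f$ is $2\sqrt{2}$-Lipschitz, we get $|Z_i - Z_{i-1}| \le 2\sqrt{2}$ immediately. For the second bound of $C/\sqrt{n-i}$, note that
$$Z_{i-1} = E[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_{i-1}] = E[f(X_1, \ldots, X_{i-1}, \hat{X}_i, X_{i+1}, \ldots, X_n) \mid X_1, \ldots, X_i],$$

where $\hat{X}_i$ is an independent copy of the random variable $X_i$. (The second equality uses independence: once $X_i$ has been replaced by $\hat{X}_i$, additionally conditioning on $X_i$ changes nothing.) Hence
$$|Z_i - Z_{i-1}| = \left|E\left[f(X_1, \ldots, X_i, \ldots, X_n) - f(X_1, \ldots, \hat{X}_i, \ldots, X_n) \,\middle|\, X_1, \ldots, X_i\right]\right| \le E\left[\left|f(X_1, \ldots, X_i, \ldots, X_n) - f(X_1, \ldots, \hat{X}_i, \ldots, X_n)\right| \,\middle|\, X_1, \ldots, X_i\right].$$

Then, if we define the set $S = \{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$ and $T = \{X_{i+1}, \ldots, X_n\} \subseteq S$, so that $f(X_1, \ldots, X_n) = \mathrm{TSP}(S \cup \{X_i\})$, we get
$$\left|\mathrm{TSP}(S \cup \{X_i\}) - \mathrm{TSP}(S \cup \{\hat{X}_i\})\right| \le 2\, d(X_i, S) + 2\, d(\hat{X}_i, S) \le 2\, d(X_i, T) + 2\, d(\hat{X}_i, T),$$

where the first inequality uses Lemma 8 and the second uses the fact that the minimum distance to a set only increases when the set gets smaller. But now, since $T$ consists of $n - i$ independent uniform points that are independent of both $X_i$ and $\hat{X}_i$, we can invoke Lemma 7 to bound the conditional expectation of each of the terms by $O(1/\sqrt{n-i})$. This completes the proof of Lemma 6.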

**3.5. Some more about Geometric TSP**

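For constant dimension $d \ge 3$, one can consider the same problems in $[0,1]^d$: the expected TSP length is now $\Theta(n^{(d-1)/d})$, and using similar arguments, you can show that deviations of $\lambda$ have probability at most $2\exp\left(-\Omega\left(\lambda^2 / n^{(d-2)/d}\right)\right)$.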
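The “similar arguments” come down to the same sum as in Section 3.2. Assuming the $d$-dimensional analogue of Lemma 7 (the expected distance to the nearest of $k$ random points is $O(k^{-1/d})$), and using that $f$ is now $2\sqrt{d}$-Lipschitz, the analogue of Lemma 6 reads $|Z_i - Z_{i-1}| \le \min\{2\sqrt{d}, C(n-i)^{-1/d}\}$, and for $d \ge 3$:
$$\sum_{i=1}^n c_i^2 \le 4d + C^2 \sum_{k=1}^{n-1} k^{-2/d} \le 4d + C^2\left(1 + \int_1^n x^{-2/d}\,dx\right) = 4d + C^2\left(1 + \frac{n^{1 - 2/d} - 1}{1 - 2/d}\right) = O\left(n^{(d-2)/d}\right).$$
Plugging this into Azuma-Hoeffding gives the tail above. For $d = 2$ the exponent $-2/d$ equals $-1$, the sum is harmonic, and we recover the $O(\log n)$ of Section 3.2.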

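The result we just proved was by Rhee and Talagrand, but it was not the last result about TSP concentration. Rhee and Talagrand subsequently improved this bound to show that the TSP has subgaussian tails: deviations of $\lambda$ have probability $e^{-\Omega(\lambda^2)}$, with no dependence on $n$!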

We’ll show a proof of this using Talagrand’s inequality, in a later lecture.

If you’re interested in this line of research, here is a survey article by Michael Steele on concentration properties of optimization problems in Euclidean space, and another one by Alan Frieze and Joe Yukich on many aspects of probabilistic TSP.

**4. Citations**

As mentioned in a previous post, McDiarmid and Hayward use martingales to give extremely strong concentration results for QuickSort. The book by Dubhashi and Panconesi sketches this result, and also contains many other examples and extensions of the use of martingales.

Other resources for concentration using martingales: this survey by Colin McDiarmid, or this article by Fan Chung and Linyuan Lu.

Apart from giving us powerful concentration results, martingales and “stopping times” combine to give very surprising and powerful results: see this survey by Yuval Peres at SODA 2010, or these course notes by Yuval and Eyal Lubetzky.