CMU Randomized Algorithms
Randomized Algorithms, Carnegie Mellon: Spring 2011
Lecture #19: Martingales
1. Some definitions
Recall that a martingale is a sequence of r.v.s $Z_0, Z_1, \ldots, Z_n$ (denoted by $(Z_i)$) such that each $Z_i$ satisfies $E[|Z_i|] < \infty$, and
$$E[Z_{i+1} \mid Z_0, Z_1, \ldots, Z_i] = Z_i.$$
Somewhat more generally, given a sequence $X_1, X_2, \ldots, X_n$ of random variables, a martingale with respect to $(X_i)$ is another sequence of r.v.s $Z_0, Z_1, \ldots, Z_n$ (denoted by $(Z_i)$) such that each $Z_i$ satisfies
- $E[|Z_i|] < \infty$,
- there exist functions $g_i$ such that $Z_i = g_i(X_1, X_2, \ldots, X_i)$, and
- $E[Z_{i+1} \mid X_1, X_2, \ldots, X_i] = Z_i$.
One can define things even more generally, but for the purposes of this course, let's just proceed with this. (If you'd like more details, check out, say, books by Grimmett and Stirzaker, or Durrett, or many others.)
1.1. The Azuma-Hoeffding Inequality
Theorem 1 (Azuma-Hoeffding) If $(Z_i)$ is a martingale such that for each $i$, $|Z_i - Z_{i-1}| \le c_i$, then
$$\Pr[\,|Z_n - Z_0| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$
(Apparently Bernstein had essentially figured this one out as well, in addition to the Chernoff-Hoeffding bounds, back in 1937.) The proof of this bound can be found in most texts; we'll skip it here. BTW, if you just want the upper or lower tail, replace the $2$ by $1$ on the right-hand side.
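To get a feel for the bound, here is a quick numerical sanity check (my own sketch in Python, not part of the lecture): the sum of $n$ independent $\pm 1$ steps is a martingale with $c_i = 1$ for all $i$, so the theorem predicts $\Pr[|Z_n - Z_0| \ge \lambda] \le 2e^{-\lambda^2/(2n)}$. The parameters below are arbitrary.

```python
import math
import random

def empirical_tail(n: int, lam: float, trials: int = 20000) -> float:
    """Estimate Pr[|Z_n| >= lam] where Z_n is a sum of n independent +/-1 steps."""
    hits = 0
    for _ in range(trials):
        z = sum(random.choice((-1, 1)) for _ in range(n))
        if abs(z) >= lam:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n = 400
    for lam in (20, 40, 60):
        # Azuma-Hoeffding with c_i = 1 for all i: 2 * exp(-lam^2 / (2n)).
        bound = 2 * math.exp(-lam ** 2 / (2 * n))
        print(f"lam={lam}: empirical={empirical_tail(n, lam):.4f}  bound={bound:.4f}")
```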
2. The Doob Martingale
Most often, the case we will be concerned with is where our entire space is defined by a sequence of random variables $X_1, X_2, \ldots, X_n$, where each $X_i$ takes values in the set $\Omega_i$. Moreover, we will be interested in some bounded function $f: \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n \to \mathbb{R}$, and will want to understand how $f(X_1, X_2, \ldots, X_n)$ behaves, when $(X_1, \ldots, X_n)$ is drawn from the underlying distribution. (Very often these $X_i$'s will be drawn from a "product distribution", i.e., they will be independent of each other, but they need not be.) Specifically, we ask:

How concentrated is $f(X_1, \ldots, X_n)$ around its mean $E[f(X_1, \ldots, X_n)]$?

To this end, define for every $i \in \{0, 1, \ldots, n\}$ the random variable
$$Z_i = E[\,f(X_1, X_2, \ldots, X_n) \mid X_1, X_2, \ldots, X_i\,].$$
(At this point, it is useful to remember the definition of a random variable as a function from the sample space to the reals: so this r.v. is also such a function, obtained by taking averages of $f$ over parts of the sample space.)
How does the random variable $Z_0$ behave? It's just the constant $E[f(X_1, \ldots, X_n)]$: the expected value of the function $f$ given random settings for $X_1$ through $X_n$. What about $Z_1$? It is a function that depends only on its first variable, namely $X_1$: instead of averaging $f$ over the entire sample space, we partition the sample space according to the value of the first variable, and average over each part in the partition. And $Z_2$ is a function of $X_1, X_2$, and averages over the other variables. And so on to $Z_n$, which is the same as the function $f$ itself. So, as we go from $i = 0$ to $i = n$, the random variables $Z_i$ go from the constant function $E[f]$ to the function $f$.

Of course, we're defining this for a reason: $(Z_i)$ is a martingale with respect to $(X_i)$.
Lemma 2 For a bounded function $f$, the sequence $Z_0, Z_1, \ldots, Z_n$ is a martingale with respect to $X_1, X_2, \ldots, X_n$. (It's called the Doob martingale for $f$.)
Proof: The first two properties of being a martingale with respect to $(X_i)$ follow from $f$ being bounded, and from the definition of $Z_i$ itself. For the last property,
$$E[Z_{i+1} \mid X_1, \ldots, X_i] = E\big[\,E[f \mid X_1, \ldots, X_{i+1}]\;\big|\;X_1, \ldots, X_i\,\big] = E[f \mid X_1, \ldots, X_i] = Z_i.$$
The first equality is the definition of $Z_{i+1}$, the second from the fact that $E[\,E[V \mid U, W]\mid U\,] = E[V \mid U]$ for random variables $U, V, W$, and the last from the definition of $Z_i$.
Assuming that $f$ is bounded is not necessary; one can work with weaker assumptions (see the texts for more details).
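Here is a tiny brute-force check of Lemma 2 (an illustration of mine, not from the notes): with three independent fair bits and an arbitrary bounded function $f$, we compute each $Z_i$ by direct averaging over the remaining coordinates and verify the martingale property exactly. The particular $f$ below is an arbitrary choice.

```python
from itertools import product

def f(x):                       # any bounded function of (x1, x2, x3)
    return x[0] * 3 + x[1] * x[2] - 2 * x[2]

def Z(i, prefix):
    """E[f(X) | X_1..X_i = prefix], averaging uniformly over the rest."""
    vals = [f(prefix + rest) for rest in product([0, 1], repeat=3 - i)]
    return sum(vals) / len(vals)

# Check E[Z_{i+1} | X_1..X_i] == Z_i for every conditioning prefix.
for i in range(3):
    for prefix in product([0, 1], repeat=i):
        lhs = (Z(i + 1, prefix + (0,)) + Z(i + 1, prefix + (1,))) / 2
        assert abs(lhs - Z(i, prefix)) < 1e-12
print("Z_0, ..., Z_3 satisfy the martingale property on this example.")
```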
Before we continue on this thread, let us show some Doob martingales which arise in CS/Math-y applications.
- Throw $m$ balls into $n$ bins, and let $f$ be some function of the load: the number of empty bins, the max load, the load of the second-most-loaded bin, or some similar function. Let $\Omega_i = \{1, 2, \ldots, n\}$, and let $X_i$ be the index of the bin into which ball $i$ lands. For $Z_i = E[f(X_1, \ldots, X_m) \mid X_1, \ldots, X_i]$, the sequence $(Z_i)$ is a martingale with respect to $(X_i)$. (A small simulation of this particular martingale appears after this list.)
- Consider the random graph $G_{n,p}$: $n$ vertices, each of the $\binom{n}{2}$ possible edges chosen independently with probability $p$. Let $f = \chi(G)$ be the chromatic number of the graph, the minimum number of colors needed to properly color the graph. There are two natural Doob martingales associated with this, depending on how we choose the variables $X_i$.

In the first one, let $X_i$ be the indicator of whether the $i$-th potential edge is present, which gives us a martingale sequence of length $\binom{n}{2}$. This is called the edge-exposure martingale. For the second one, let $X_i$ be the collection of edges going from the vertex $i$ to vertices $1, 2, \ldots, i-1$: the new martingale has length $n$ and is called the vertex-exposure martingale.
- Consider a run of quicksort on a particular input of $n$ elements: let $f$ be the number of comparisons. Let $X_1$ be the first pivot, $X_2$ the second, etc. Then $Z_i = E[f \mid X_1, \ldots, X_i]$ is a Doob martingale with respect to $(X_i)$.

BTW, are these $X_i$'s independent of each other? Naively, the pivot might depend on the size of the current set, which makes it dependent on the past. One way you can make these independent is by letting the $X_i$'s be, say, independent random permutations of all $n$ elements, and when you want to choose the $i$-th pivot, pick the first element of the current set according to the permutation $X_i$. (Or, you could let $X_i$ be an independent uniform random real in $[0,1]$ and use that to pick a random element from the current set, etc.)
- Suppose we have $r$ red and $b$ blue balls in a bin. We draw $m$ balls without replacement from this bin: what is the number of red balls drawn? Let $X_i$ be the indicator for whether the $i$-th ball drawn is red, and let $f = \sum_{i=1}^m X_i$ be the number of red balls. Then $Z_i = E[f \mid X_1, \ldots, X_i]$ is a martingale with respect to $(X_i)$.

However, in this example, the $X_i$'s are not independent. Nonetheless, the sequence is a Doob martingale. (As in the quicksort example, one can define it with respect to a different set of variables which are independent of each other.)
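Here is the small simulation promised in the first example above (my own sketch, not from the notes). With $f$ = number of empty bins and independent ball throws, the conditional expectation has a closed form: after the first $i$ balls land, each still-empty bin stays empty with probability $(1 - 1/n)^{m-i}$, so $Z_i = (\#\text{ bins empty after } i \text{ balls}) \cdot (1 - 1/n)^{m-i}$. We can then watch a sample path of the Doob martingale go from $Z_0 = E[f]$ to $Z_m = f$.

```python
import random

def doob_path(m, n):
    """One sample path Z_0, Z_1, ..., Z_m of the Doob martingale for #empty bins."""
    empty = n                          # bins untouched so far
    seen = set()
    path = [n * (1 - 1 / n) ** m]      # Z_0 = E[f]
    for i in range(1, m + 1):
        bin_i = random.randrange(n)    # X_i: where ball i lands
        if bin_i not in seen:
            seen.add(bin_i)
            empty -= 1
        # Z_i = (# currently empty bins) * Pr[a given empty bin stays empty]
        path.append(empty * (1 - 1 / n) ** (m - i))
    return path                        # Z_m is the actual number of empty bins

if __name__ == "__main__":
    path = doob_path(m=50, n=20)
    print("Z_0 =", round(path[0], 3), "  Z_m =", path[-1])
    print("max one-step change:",
          round(max(abs(b - a) for a, b in zip(path, path[1:])), 3))
```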
So yeah, if we want to study the concentration of $f$ around $E[f]$, we can now apply Azuma-Hoeffding to the Doob martingale, which gives us the concentration of $Z_n$ (i.e., $f$) around $Z_0$ (i.e., $E[f]$). Good, good.
Next step: to apply Azuma-Hoeffding to the Doob martingale $(Z_i)$, we need to bound $|Z_i - Z_{i-1}| \le c_i$ for all $i$. This just says that if we can go from $Z_0 = E[f]$ to $Z_n = f$ in a "small" number of steps ($n$ of them), and each time we're not smoothing out "too aggressively" (each $c_i$ is small), then $f$ is concentrated about its mean.
2.1. Independence and Lipschitz-ness
One case when it's easy to bound the $c_i$'s is when the $X_i$'s are independent of each other, and also $f$ is not too sensitive in any coordinate; namely, changing any single coordinate does not change the value of $f$ by much. Let's see this in detail.
Definition 3 Given values $c_1, c_2, \ldots, c_n$, the function $f$ is $(c_1, c_2, \ldots, c_n)$-Lipschitz if for all $i$, for all $x_j \in \Omega_j$ (for $j \ne i$), and for all $x_i, x_i' \in \Omega_i$, it holds that
$$\big| f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \big| \le c_i.$$
If $c_i = c$ for all $i$, then we just say $f$ is $c$-Lipschitz.
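Here is a small helper (my own sketch, not from the notes) that recovers the best per-coordinate constants in Definition 3 by brute force over a small finite domain; the balls-and-bins test function below is an arbitrary choice.

```python
from itertools import product

def lipschitz_constants(f, domains):
    """domains[i] is the finite set Omega_i; returns the smallest valid [c_1,...,c_n]."""
    cs = []
    for i, omega_i in enumerate(domains):
        c_i = 0
        for x in product(*domains):
            for xi_prime in omega_i:               # resample coordinate i only
                y = x[:i] + (xi_prime,) + x[i + 1:]
                c_i = max(c_i, abs(f(x) - f(y)))
        cs.append(c_i)
    return cs

# Example: f = number of empty bins when 3 balls land in 3 bins; moving a single
# ball changes the count by at most 1, so each c_i comes out as 1.
domains = [range(3)] * 3
f = lambda x: sum(1 for b in range(3) if b not in x)
print(lipschitz_constants(f, domains))   # each per-coordinate constant is 1 here
```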
Lemma 4 If $f$ is $(c_1, \ldots, c_n)$-Lipschitz and $X_1, X_2, \ldots, X_n$ are independent, then the Doob martingale of $f$ with respect to $X_1, X_2, \ldots, X_n$ satisfies
$$|Z_i - Z_{i-1}| \le c_i.$$
Proof: Let us use $X_{\le i}$ to denote the sequence $(X_1, \ldots, X_i)$, $x_{>i}$ to denote a setting $(x_{i+1}, \ldots, x_n)$ of the later variables, etc. Recall that
$$Z_i = E[f \mid X_{\le i}] = \sum_{x_{>i}} f(X_{\le i}, x_{>i}) \cdot \Pr[X_{>i} = x_{>i} \mid X_{\le i}] = \sum_{x_{>i}} f(X_{\le i}, x_{>i}) \cdot \Pr[X_{>i} = x_{>i}],$$
where the last equality is from independence. Similarly for $Z_{i-1}$:
$$Z_{i-1} = \sum_{x_i,\, x_{>i}} f(X_{<i}, x_i, x_{>i}) \cdot \Pr[X_i = x_i] \cdot \Pr[X_{>i} = x_{>i}].$$
Hence
$$|Z_i - Z_{i-1}| \le \sum_{x_i,\, x_{>i}} \big| f(X_{<i}, X_i, x_{>i}) - f(X_{<i}, x_i, x_{>i}) \big| \cdot \Pr[X_i = x_i] \cdot \Pr[X_{>i} = x_{>i}] \le c_i,$$
where the inequality is from the fact that changing the $i$-th coordinate from $X_i$ to $x_i$ cannot change the function value by more than $c_i$, and that $\sum_{x_i,\, x_{>i}} \Pr[X_i = x_i] \cdot \Pr[X_{>i} = x_{>i}] = 1$.
Now applying Azuma-Hoeffding, we immediately get:
Corollary 5 (McDiarmid's Inequality) If $f$ is $(c_1, \ldots, c_n)$-Lipschitz and $X_1, \ldots, X_n$ are independent, then
$$\Pr[\,|f(X_1, \ldots, X_n) - E[f]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$
(Disclosure: I am cheating. McDiarmid's inequality has better constants: the constant $2$ in the denominator moves to the numerator.) And armed with this inequality, we can give concentration results for some applications we mentioned above.
- For the $m$ balls and $n$ bins example, say $f$ is the number of empty bins: hence $E[f] = n(1 - 1/n)^m$. Also, changing the location of any single ball changes $f$ by at most $1$. So $f$ is $1$-Lipschitz, and hence
$$\Pr[\,|f - E[f]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{2m}\right).$$
Hence, whp, $f = n(1 - 1/n)^m \pm O(\sqrt{m \log m})$. (A quick numerical check of this bound appears right after this list.)
- For the case where $f = \chi(G)$ is the chromatic number of a random graph $G \sim G_{n,p}$, and we define the edge-exposure martingale (one variable per potential edge), clearly $\chi$ is $1$-Lipschitz. Hence
$$\Pr[\,|\chi - E[\chi]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{2\binom{n}{2}}\right).$$
This is not very interesting, since the right-hand side is $o(1)$ only when $\lambda = \omega(n)$, but the chromatic number itself lies in $[1, n]$, so we get almost no concentration at all.

Instead, we could use the vertex-exposure martingale, where at the $i$-th step we expose the vertex $i$ and its edges going to vertices $1, 2, \ldots, i-1$. Even with respect to these variables, the function $\chi$ is $1$-Lipschitz, and hence
$$\Pr[\,|\chi - E[\chi]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{2n}\right).$$
And hence the chromatic number of the random graph $G_{n,p}$ is concentrated to within $O(\sqrt{n \log n})$ around its mean, whp.
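Here is the Monte Carlo check promised in the empty-bins example above (my own sketch; the parameters $m = 200$, $n = 100$ and the choices of $\lambda$ are arbitrary): we compare the empirical deviation probability of the number of empty bins against the bound $2e^{-\lambda^2/(2m)}$ from Corollary 5.

```python
import math
import random

def empty_bins(m: int, n: int) -> int:
    """Throw m balls into n bins uniformly; return the number of empty bins."""
    return n - len({random.randrange(n) for _ in range(m)})

if __name__ == "__main__":
    m, n, trials = 200, 100, 20000
    mean = n * (1 - 1 / n) ** m
    samples = [empty_bins(m, n) for _ in range(trials)]
    for lam in (10, 20, 30):
        emp = sum(abs(s - mean) >= lam for s in samples) / trials
        bound = 2 * math.exp(-lam ** 2 / (2 * m))
        print(f"lam={lam}: empirical={emp:.4f}  McDiarmid bound={bound:.4f}")
```

Unsurprisingly, the empirical deviations are much smaller than what the bound guarantees: McDiarmid only uses the $1$-Lipschitz structure and nothing else about $f$.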
3. Concentration for Random Geometric TSP
McDiarmid's inequality is convenient to use, but Lipschitz-ness often does not get us as far as we'd like (even with independence). Sometimes you need to bound $|Z_i - Z_{i-1}|$ directly to get the full power of Azuma-Hoeffding. Here's one example:
Let $X_1, X_2, \ldots, X_n$ be $n$ points picked independently and uniformly at random from the unit square $[0,1]^2$. Let $TSP(X_1, \ldots, X_n)$ be the length of the shortest traveling salesman tour on these $n$ points. How closely is $TSP(X_1, \ldots, X_n)$ concentrated around its mean $E[TSP(X_1, \ldots, X_n)]$?
In the HW, you will show that $E[TSP(X_1, \ldots, X_n)] = \Theta(\sqrt{n})$; in fact, one can pin down $E[TSP]$ up to the leading constant. (See the work of Rhee and others.)
3.1. Using McDiarmid: a weak first bound
Note that $TSP(\cdot)$ is $O(1)$-Lipschitz in each coordinate: moving a single point changes the optimal tour length by at most a constant. By Corollary 5 we get that
$$\Pr[\,|TSP - E[TSP]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{O(n)}\right).$$
If we want the deviation probability to be $1/\mathrm{poly}(n)$, we would have to set $\lambda = \Theta(\sqrt{n \log n})$. Not so great, since this is pretty large compared to the expectation $\Theta(\sqrt{n})$ itself; we'd like a tighter bound.
3.2. So let’s be more careful: an improved bound
And in fact, we'll get a better bound using the very same Doob martingale associated with $TSP$:
$$Z_i = E[\,TSP(X_1, \ldots, X_n) \mid X_1, \ldots, X_i\,].$$
But instead of just using the $O(1)$-Lipschitzness of $TSP$, let us bound $|Z_i - Z_{i-1}|$ better.

Lemma 6 For each $i$, $|Z_i - Z_{i-1}| \le O\big(\min\{1, 1/\sqrt{n-i}\}\big)$.

Before we prove this lemma, let us complete the concentration bound for TSP using it. Setting $c_i = O(\min\{1, 1/\sqrt{n-i}\})$ gives us
$$\sum_{i=1}^{n} c_i^2 \le O(1) + \sum_{i=1}^{n-1} O\!\left(\frac{1}{n-i}\right) = O(\log n),$$
and hence Azuma-Hoeffding gives:
$$\Pr[\,|TSP - E[TSP]| \ge \lambda\,] \le 2\exp\left(-\frac{\lambda^2}{O(\log n)}\right).$$
So, whp, $TSP = E[TSP] \pm O(\log n)$. Much better!
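As a quick numerical check of the harmonic-sum step (my own sketch, with the $O(1)$ constants from Lemma 6 suppressed): with $c_i = \min\{1, 1/\sqrt{n-i}\}$, the sum $\sum_i c_i^2$ indeed grows like $\log n$, which is what drives the $\exp(-\lambda^2/O(\log n))$ bound.

```python
import math

def sum_ci_squared(n: int) -> float:
    """Sum of c_i^2 for c_i = min(1, 1/sqrt(n - i)), with c_n treated as 1."""
    return sum(min(1.0, 1.0 / (n - i)) if i < n else 1.0 for i in range(1, n + 1))

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(sum_ci_squared(n), 2), "vs ln n =", round(math.log(n), 2))
```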
3.3. Some useful lemmas
To prove Lemma 6, we’ll need a simple geometric lemma:
Lemma 7 Let $x \in [0,1]^2$. Pick $k$ random points $Y_1, Y_2, \ldots, Y_k$ independently and uniformly from $[0,1]^2$; then the expected distance of the point $x$ to its closest point in $\{Y_1, \ldots, Y_k\}$ is $O(1/\sqrt{k})$.
Proof: Define the random variable $W = d(x, \{Y_1, \ldots, Y_k\})$, the distance from $x$ to its closest point among the $Y_j$'s. Hence, $W \ge r$ exactly when none of the $Y_j$'s lands in the ball $B(x, r)$. For $r \le \sqrt{2}$, the area of $B(x, r) \cap [0,1]^2$ is at least $c\, r^2$ for some constant $c > 0$.

Define $p_r = \Pr[W \ge r]$. For such $r$, the chance that the $k$ points all miss this ball, and hence $p_r$, is at most
$$(1 - c\, r^2)^k \le e^{-c\, r^2 k}.$$
Of course, for $r > \sqrt{2}$, $p_r = 0$. And hence
$$E[W] = \int_0^\infty \Pr[W \ge r]\, dr \le \int_0^{\sqrt{2}} e^{-c\, r^2 k}\, dr \le \int_0^\infty e^{-c\, r^2 k}\, dr = O(1/\sqrt{k}).$$
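A quick Monte Carlo check of Lemma 7 (my own sketch, not from the notes): the expected distance from a fixed point to the nearest of $k$ uniform points should scale like $1/\sqrt{k}$, so $\sqrt{k} \cdot E[W]$ should stay roughly constant. The point $x = (0.3, 0.7)$ is an arbitrary choice.

```python
import math
import random

def expected_nearest(x, k: int, trials: int = 2000) -> float:
    """Estimate E[distance from x to the closest of k uniform points in [0,1]^2]."""
    total = 0.0
    for _ in range(trials):
        total += min(math.dist(x, (random.random(), random.random()))
                     for _ in range(k))
    return total / trials

x = (0.3, 0.7)                       # an arbitrary fixed point in the unit square
for k in (10, 100, 1000):
    ew = expected_nearest(x, k)
    print(f"k={k}: E[W] ~ {ew:.4f},  sqrt(k)*E[W] ~ {math.sqrt(k) * ew:.3f}")
```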
Secondly, here is another lemma about how the TSP behaves:

Lemma 8 For any set of points $S = \{x_1, x_2, \ldots, x_k\} \subseteq [0,1]^2$ and any point $x$, we get
$$TSP(S) \;\le\; TSP(S \cup \{x\}) \;\le\; TSP(S) + 2\, d(x, S).$$

Proof: Follows from the fact that shortcutting a tour past a point does not increase its length (by the triangle inequality), and that a tour on $S$ can be extended to visit $x$ via a detour of length at most $2\, d(x, S)$, for any point $x$.
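Lemma 8 is easy to verify by brute force on tiny instances (my own sketch, not from the notes): the code below computes exact TSP lengths by enumerating permutations, so the instance sizes are kept very small.

```python
import itertools
import math
import random

def tsp_length(points) -> float:
    """Exact length of the shortest closed tour through the given points."""
    if len(points) <= 2:
        return 2 * sum(math.dist(a, b)
                       for a, b in itertools.combinations(points, 2))
    best = float("inf")
    first, rest = points[0], points[1:]
    for perm in itertools.permutations(rest):   # fix the first point to kill rotations
        tour = (first,) + perm
        length = sum(math.dist(tour[j], tour[(j + 1) % len(tour)])
                     for j in range(len(tour)))
        best = min(best, length)
    return best

random.seed(0)
for _ in range(20):
    S = [(random.random(), random.random()) for _ in range(6)]
    x = (random.random(), random.random())
    d = min(math.dist(x, s) for s in S)
    assert tsp_length(S) <= tsp_length(S + [x]) + 1e-9
    assert tsp_length(S + [x]) <= tsp_length(S) + 2 * d + 1e-9
print("Lemma 8 held on all random instances tried.")
```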
3.4. Proving Lemma 6
OK, now to the proof of Lemma 6. Recall that we want to bound $|Z_i - Z_{i-1}|$; since $TSP$ is $O(1)$-Lipschitz, we get $|Z_i - Z_{i-1}| \le O(1)$ immediately. For the second bound of $O(1/\sqrt{n-i})$, note that
$$Z_{i-1} = E\big[\, TSP(X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n) \;\big|\; X_1, \ldots, X_i \,\big],$$
where $X_i'$ is an independent copy of the random variable $X_i$. Hence
$$|Z_i - Z_{i-1}| \le E\big[\, \big| TSP(X_1, \ldots, X_i, \ldots, X_n) - TSP(X_1, \ldots, X_i', \ldots, X_n) \big| \;\big|\; X_1, \ldots, X_i \,\big].$$
Then, if we define the sets $S = \{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$ and $S' = \{X_{i+1}, \ldots, X_n\}$, then we get
$$\big| TSP(S \cup \{X_i\}) - TSP(S \cup \{X_i'\}) \big| \le 2\, d(X_i, S) + 2\, d(X_i', S) \le 2\, d(X_i, S') + 2\, d(X_i', S'),$$
where the first inequality uses Lemma 8 and the second uses the fact that the minimum distance to a set only increases when the set gets smaller. But now we can invoke Lemma 7 (applied to the $n-i$ independent uniform points in $S'$) to bound the expectation of each of the terms by $O(1/\sqrt{n-i})$. This completes the proof of Lemma 6.
3.5. Some more about Geometric TSP
For constant dimension $d$, one can consider the same problems in $[0,1]^d$: the expected TSP length is now $\Theta(n^{1-1/d})$, and using similar arguments, you can show that deviations of $\lambda$ have probability at most $\exp\big(-\Omega(\lambda^2 / n^{1-2/d})\big)$ for $d \ge 3$.
The result we just proved was by Rhee and Talagrand, but it was not the last result about TSP concentration. Rhee and Talagrand subsequently improved this bound to show that the TSP has subgaussian tails: $\Pr[\,|TSP - E[TSP]| \ge \lambda\,] \le c\, e^{-\lambda^2/C}$ for some constants $c, C$!
We’ll show a proof of this using Talagrand’s inequality, in a later lecture.
If you’re interested in this line of research, here is a survey article by Michael Steele on concentration properties of optimization problems in Euclidean space, and another one by Alan Frieze and Joe Yukich on many aspects of probabilistic TSP.
4. Citations
As mentioned in a previous post, McDiarmid and Hayward use martingales to give extremely strong concentration results for QuickSort. The book by Dubhashi and Panconesi (preliminary version here) sketches this result, and also contains many other examples and extensions of the use of martingales.
Other resources for concentration using martingales: this survey by Colin McDiarmid, or this article by Fan Chung and Linyuan Lu.
Apart from giving us powerful concentration results, martingales and “stopping times” combine to give very surprising and powerful results: see this survey by Yuval Peres at SODA 2010, or these course notes by Yuval and Eyal Lubetzky.