# CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

## Lecture #15: Distance-preserving trees (part I)

1. Metric Spaces

A metric space is a set ${V}$ of points, with a distance function ${d: V \times V \rightarrow {\mathbb R}_{\geq 0}}$ that satisfies ${d(x,x) = 0}$ for all ${x \in V}$, symmetry (i.e., ${d(x,y) = d(y,x)}$), and the triangle inequality (i.e., ${d(x,y) + d(y,z) \geq d(x,z)}$ for all ${x,y,z \in V}$). Most of the computer science applications deal with finite metrics, and then ${n}$ denotes the number of points ${|V|}$.
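For a finite point set the three axioms can be checked by brute force. The following is a minimal sketch, not part of the original notes; the helper names `is_metric` and `cycle_distance` are ours, and the cycle metric is used only as a convenient example.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Brute-force check of the metric axioms on a finite point set."""
    points = list(points)
    # d(x, x) = 0 for every point
    if any(abs(d(x, x)) > tol for x in points):
        return False
    # non-negativity and symmetry
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or abs(d(x, y) - d(y, x)) > tol:
            return False
    # triangle inequality: d(x, y) + d(y, z) >= d(x, z)
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False
    return True

def cycle_distance(n):
    """Shortest-path distance on the unit-length cycle C_n (vertices 0..n-1)."""
    return lambda x, y: min((x - y) % n, (y - x) % n)

print(is_metric(range(6), cycle_distance(6)))
# Squared distance on the line violates the triangle inequality.
print(is_metric(range(6), lambda x, y: (x - y) ** 2))
```

The check takes ${O(n^3)}$ time, which is fine for sanity-testing small examples.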

There are many popular problems which are defined on metric spaces:

• The Traveling Salesman Problem (TSP): the input is a metric space, and the goal is to find a tour ${v_1, v_2, \ldots, v_n, v_{n+1} = v_1}$ on all the ${n}$ nodes whose total length ${\sum_{i = 1}^n d(v_i, v_{i+1})}$ is as small as possible. This problem is sometimes defined on non-metrics as well, but most of the time we consider the metric version.

The best approximation algorithm for the problem is a ${(3/2 - \epsilon)}$-approximation due to Oveis-Gharan, Saberi and Singh (2010). Their paper uses randomization to beat the ${3/2}$-approximation of Christofides (1976), and makes progress on this long-standing open problem. The best hardness result for this problem is something like ${1.005}$ due to Papadimitriou and Vempala.

• The ${k}$-center/${k}$-means/${k}$-median problems: the input is a metric space ${(V,d)}$, and the goal is to choose some ${k}$ positions ${F}$ from ${V}$ as “facilities”, to minimize some objective function. In ${k}$-center, we minimize ${\max_{v \in V} d(v, F)}$, the largest distance from any client to its closest facility; here, we define the distance from a point ${v}$ to a set ${S}$ as ${d(v,S) := \min_{s \in S} d(v,s)}$. In ${k}$-median, we minimize ${\sum_{v \in V} d(v, F)}$, the total (or equivalently, the average) distance from a client to its closest facility. In ${k}$-means, we minimize ${\sum_{v \in V} d(v, F)^2}$, the total (or equivalently, the average) squared distance from a client to its closest facility. (Note: to see why these problems are called what they are, consider what happens for the ${1}$-means/medians problem on the line.)

The best algorithms for ${k}$-center give us a ${2}$-approximation, and this is the best possible unless P=NP. The best ${k}$-median algorithm gives a ${(3+\epsilon)}$-approximation, whereas the best hardness known for the version of the problem stated above is ${(1 + 1/e)}$ unless P=NP. For ${k}$-means, the gap between the best algorithm and the best hardness result is worse for general metric spaces. For geometric spaces, better algorithms are known for ${k}$-means/medians.

• The ${k}$-server problem: this is a classic online problem, where the input is a metric space (given up-front); a sequence of requests ${\sigma_1, \sigma_2, \cdots}$ arrives online, each request being some point in the metric space. The algorithm maintains ${k}$ servers, one each at some ${k}$ positions in the metric space. When the request ${\sigma_t}$ arrives, one of the servers must be moved to ${\sigma_t}$ to serve the request. The cost incurred by the algorithm in this step is the distance moved by the server, and the total cost is the sum of these per-step costs. The goal is to give a strategy that minimizes the total cost of the algorithm.

The best algorithm for ${k}$-server is a ${(2k-1)}$-competitive deterministic algorithm due to Koutsoupias and Papadimitriou. Since ${k}$-server contains paging as a special case (why?), no deterministic algorithm can do better than ${k}$-competitive. It is a long-standing open problem whether we can do better than ${2k-1}$ deterministically, but far more interesting is the question of whether randomization can help beat ${2k-1}$; the best lower bound against oblivious adversaries is ${\Omega(\log k)}$, again from the paging problem.
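To make the three clustering objectives above concrete, here is a small sketch that evaluates them and finds the best facility set by brute force over all ${k}$-subsets. This is only an illustration for tiny inputs, not one of the approximation algorithms mentioned above; the function names and the example point set are our own.

```python
from itertools import combinations

def dist_to_set(d, v, F):
    """d(v, F): distance from point v to its closest facility in F."""
    return min(d(v, f) for f in F)

def k_center_cost(points, d, F):
    return max(dist_to_set(d, v, F) for v in points)

def k_median_cost(points, d, F):
    return sum(dist_to_set(d, v, F) for v in points)

def k_means_cost(points, d, F):
    return sum(dist_to_set(d, v, F) ** 2 for v in points)

def best_facilities(points, d, k, cost):
    """Exhaustive search over all k-subsets of the points (exponential in k)."""
    return min(combinations(points, k), key=lambda F: cost(points, d, F))

# Points on the line; facilities must be chosen among the points themselves.
line = [0, 1, 2, 3, 14]
d = lambda x, y: abs(x - y)

print(best_facilities(line, d, 1, k_median_cost))  # the median of the points
print(best_facilities(line, d, 1, k_means_cost))   # the point closest to the mean
print(best_facilities(line, d, 1, k_center_cost))
```

On this example the ${1}$-median solution is the median point ${2}$, while the ${1}$-means solution is ${3}$, the input point closest to the mean ${4}$, which is the hint in the note above.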

1.1. Approximating Metrics by Trees: Attempt I

A special kind of metric space is a tree metric: here we are given a tree ${T = (V,E)}$ where each edge ${e \in E}$ has a length ${\ell_e}$. This defines a metric ${(V, d_T)}$, where the distance ${d_T(x,y)}$ is the length of the (unique) shortest path between ${x}$ and ${y}$, according to the edge lengths ${\ell_e}$. In general, given any graph ${G = (V,E)}$ with edge lengths, we get a metric ${(V, d_G)}$.
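As a quick illustration, the distances ${d_T}$ can be computed by a traversal from every vertex, since the path between two vertices of a tree is unique. The sketch below assumes the tree is given as a list of `(u, v, length)` triples; that input format and the name `tree_metric` are our own choices.

```python
from collections import defaultdict

def tree_metric(edges):
    """Return d_T as a dict of dicts, for a tree given as (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    d_T = {}
    for source in list(adj):
        # Depth-first traversal: in a tree the first path found is the only path.
        dist = {source: 0}
        stack = [source]
        while stack:
            u = stack.pop()
            for v, length in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + length
                    stack.append(v)
        d_T[source] = dist
    return d_T

edges = [("a", "b", 2), ("b", "c", 3), ("b", "d", 1)]
d_T = tree_metric(edges)
print(d_T["a"]["c"], d_T["c"]["d"])
```

For a general graph ${G}$ one would replace the traversal by a shortest-path computation (e.g., Dijkstra) to get ${d_G}$.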

Tree metrics are especially nice because we can use the graph-theoretic fact that such a metric is “generated” by a tree to understand its structure better, and hence give better algorithms for problems on tree metrics. For instance:

• TSP on tree metrics can be solved exactly: just take an Euler tour of the points in the tree.
• ${k}$-median can be solved exactly on tree metrics using dynamic programming.
• ${k}$-server on trees admits a simple ${k}$-competitive deterministic algorithm.

So if all metric spaces were well-approximable by trees, e.g., if there were some small factor ${\alpha}$ such that for every metric ${M = (V,d)}$ we could find a tree ${T}$ such that

$\displaystyle d(x,y) \leq d_T(x,y) \leq \alpha d(x,y) \ \ \ \ \ (1)$

for every ${x,y \in V}$, then we would have an ${\alpha}$-approximation for TSP and ${k}$-median, and an ${\alpha k}$-competitive algorithm for ${k}$-server on all metrics. Sadly, this is not the case: for the metric generated by the cycle graph ${C_n}$ (with unit-length edges), the best factor we can get in (1) is ${\alpha = n-1}$. This is what we would get if we just approximated the cycle by a line.

So even for simple metrics like that generated by the cycle (on which we can solve these problems really easily), this approach hits a dead-end really fast. Pity.
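The factor ${n-1}$ for the line is easy to see numerically. The sketch below deletes one edge of the unit-length cycle and computes the worst stretch over all pairs. It only examines the ${n}$ lines obtained this way; the stronger claim that no tree at all does better is not checked here. The helper names are ours.

```python
from fractions import Fraction

def cycle_dist(n, x, y):
    """Distance on the unit-length cycle C_n with vertices 0..n-1."""
    return min((x - y) % n, (y - x) % n)

def line_dist(n, cut, x, y):
    """Distance on the line left after deleting edge (cut, cut+1 mod n) from C_n."""
    # Relabel vertices by their position along the line, which starts at cut+1.
    return abs((x - cut - 1) % n - (y - cut - 1) % n)

def worst_stretch(n, cut):
    """Largest ratio d_T(x, y) / d(x, y) over all pairs, for the given deleted edge."""
    return max(Fraction(line_dist(n, cut, x, y), cycle_dist(n, x, y))
               for x in range(n) for y in range(x + 1, n))

n = 8
print([int(worst_stretch(n, cut)) for cut in range(n)])
```

Whichever edge we delete, its two endpoints end up at distance ${n-1}$ on the line, so every one of these lines has stretch exactly ${n-1}$.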

1.2. Approximating Metrics by Trees: Attempt II

Here’s where randomization will come to our help: let’s illustrate the idea on a cycle with unit-length edges. If we delete a uniformly random edge of the cycle, we get a tree ${T}$ (in fact, a line). Note that the distances in the line are at least those in the cycle.

How much more? For two vertices ${x, y}$ adjacent in the cycle, the edge ${(x,y)}$ still exists in the tree with probability ${1 - 1/n}$, in which case ${d_T(x,y) = d(x,y)}$; else, with probability ${1/n}$, ${x}$ and ${y}$ lie at distance ${n-1}$ from each other. So the expected distance between the endpoints of an edge ${(x,y)}$ of the cycle is

$\displaystyle (1 - 1/n) \cdot 1 + \frac{1}{n} \cdot (n-1) = 2(1-1/n) = 2(1-1/n) \cdot d(x,y)$

And indeed, this also holds for any pair ${x, y\in V}$ (check!),

$\displaystyle E_T[ d_T(x,y) ] \leq 2(1-1/n) \cdot d(x,y)$
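The “check!” can also be done by machine. For a pair at cycle distance ${d}$, the deleted edge lies on the shorter arc with probability ${d/n}$, in which case the tree distance is ${n - d}$; otherwise it stays ${d}$. This gives ${E_T[d_T(x,y)] = 2d(1 - d/n)}$. The sketch below verifies this closed form and the bound above with exact rational arithmetic; the function names are ours.

```python
from fractions import Fraction

def cycle_dist(n, x, y):
    """Distance on the unit-length cycle C_n with vertices 0..n-1."""
    return min((x - y) % n, (y - x) % n)

def expected_line_dist(n, x, y):
    """E_T[d_T(x, y)] when a uniformly random edge (cut, cut+1 mod n) is deleted."""
    total = sum(abs((x - cut - 1) % n - (y - cut - 1) % n) for cut in range(n))
    return Fraction(total, n)

def check_expected_stretch(n):
    """True if every pair satisfies the closed form and the 2(1 - 1/n) bound."""
    bound = 2 * (1 - Fraction(1, n))
    for x in range(n):
        for y in range(x + 1, n):
            d = cycle_dist(n, x, y)
            e = expected_line_dist(n, x, y)
            if e != 2 * d * (1 - Fraction(d, n)) or e > bound * d:
                return False
    return True

print(all(check_expected_stretch(n) for n in range(3, 30)))
```

Note that the bound is tight only for adjacent pairs (${d = 1}$); pairs that are farther apart have smaller expected stretch.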

But is this any good for us?

Suppose we wanted to solve ${k}$-median on the cycle, and let ${F^*}$ be the optimal solution. For each ${x}$, let ${f_x^*}$ be the closest facility in ${F^*}$ to ${x}$; hence the cost of the solution is:

$\displaystyle OPT = \sum_{x \in V} d(x, f_x^*).$

By the expected stretch guarantee, we get that

$\displaystyle \sum_{x \in V} E_T[ d_T(x, f_x^*) ] \leq 2 \sum_{x \in V} d(x, f_x^*) = 2\, OPT.$

I.e., the expected cost of this solution ${F^*}$ on the random tree is at most ${2\, OPT}$. And hence, if ${OPT_T}$ is the cost of the optimal solution on ${T}$, we get

$\displaystyle E_T[ OPT_T ] \leq 2\, OPT$

Great—we know that the optimal solution on the random tree does not cost too much. And we know we can find the optimal solution on trees in poly-time.

Let’s say ${F_T \subseteq V}$ is the optimal solution for the tree ${T}$, where the closest facility in ${F_T}$ to ${x}$ is ${f_x^T}$, giving ${OPT_T = \sum_x d_T(x, f_x^T)}$. How does this solution ${F_T}$ perform back on the cycle? Well, each distance in the cycle is at most that in the tree ${T}$, so the expected cost of solution ${F_T}$ on the cycle will be

$\displaystyle E \left[ \sum_x d(x,f_x^T) \right] = \sum_x E[ d(x,f_x^T) ] \leq \sum_x E[ d_T(x, f_x^T) ] = E_T [ OPT_T ] \leq 2\, OPT .$

And we have a randomized ${2}$-approximation for ${k}$-median on the cycle!

1.3. Popping the Stack

To recap, here’s the algorithm: pick a random tree ${T}$ from some nice distribution. Find an optimal solution ${F_T}$ for the problem, using distances according to the tree ${T}$, and output this set as the solution for the original metric.
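The recap can be sketched end-to-end for ${k}$-median on the unit-length cycle. In this sketch, brute force over all ${k}$-subsets stands in for the dynamic program on trees (an assumption made to keep the code short), and instead of sampling we average exactly over all ${n}$ equally likely deleted edges. All names are ours.

```python
from fractions import Fraction
from itertools import combinations

def cycle_dist(n, x, y):
    return min((x - y) % n, (y - x) % n)

def line_dist(n, cut, x, y):
    """Distance on the line left after deleting edge (cut, cut+1 mod n) from C_n."""
    return abs((x - cut - 1) % n - (y - cut - 1) % n)

def k_median_cost(n, F, dist):
    return sum(min(dist(v, f) for f in F) for v in range(n))

def brute_force_k_median(n, k, dist):
    """Optimal facility set under `dist`; a stand-in for the tree DP."""
    return min(combinations(range(n), k), key=lambda F: k_median_cost(n, F, dist))

def expected_cost_of_tree_algorithm(n, k):
    """Exact expectation, over the random deleted edge, of the cycle cost of F_T."""
    on_cycle = lambda x, y: cycle_dist(n, x, y)
    total = 0
    for cut in range(n):
        on_tree = lambda x, y: line_dist(n, cut, x, y)
        F_T = brute_force_k_median(n, k, on_tree)   # solve on the tree T
        total += k_median_cost(n, F_T, on_cycle)    # evaluate on the original metric
    return Fraction(total, n)

n, k = 9, 2
opt = k_median_cost(n, brute_force_k_median(n, k, lambda x, y: cycle_dist(n, x, y)),
                    lambda x, y: cycle_dist(n, x, y))
print(opt, expected_cost_of_tree_algorithm(n, k))
```

The second printed value is the expected cost of the randomized algorithm, which the analysis above guarantees is at most ${2\, OPT}$.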

And what did we use to show this was a good solution? That we had a distribution over trees such that

• every tree in the distribution had distances no less than those in the original metric, and
• the expected tree distance between any pair ${x, y \in V}$ satisfies ${E_T[ d_T(x,y) ] \leq \alpha \cdot d(x,y)}$ for some small ${\alpha}$; here ${\alpha = 2}$.

And last but not least

• that the objective function was linear in the distances, and so we could use linearity of expectations.

Note that TSP, ${k}$-median, ${k}$-server, and many other metric problems have cost functions that are linear in the distances, so as long as the metrics we care about can be “embedded into random trees” with small ${\alpha}$, we can translate algorithms on trees for these problems into (randomized) algorithms for general metrics! This approach gets used all the time, and is worth remembering. (BTW, note that this general approach does not work for non-linear objective functions, like ${k}$-center, or ${k}$-means.)

But can we get a small ${\alpha}$ in general? In the next section, we show that for any ${n}$-point metric with aspect ratio ${\frac{\max d(x,y)}{\min d(x,y)} = \Delta}$, we can get ${\alpha = O(\log n \log \Delta)}$; and we indicate how to improve this to ${O(\log n)}$, which is the best possible!

2. Embeddings into Trees

In this section, we prove the following theorem using tree embeddings (and then, in the following section, we improve it further to ${O(\log n)}$).

Theorem 1 Given any metric ${(V,d)}$ with ${|V| = n}$ and aspect ratio ${\Delta}$, there exists an efficiently sampleable distribution ${\mathcal{D}}$ over spanning trees of ${V}$ such that for all ${u,v\in V}$:

1. For all ${T \in \textrm{Support}(\mathcal{D})}$, ${d_T(u,v) \geq d(u,v)}$, and
2. ${\mathop{\mathbb E}_{T\sim \mathcal{D}}[d_T(u,v)] \leq O(\log n \log \Delta) \; d(u,v)}$.

To prove this theorem, we will use the idea of a low diameter decomposition. Given a metric space ${(V, d)}$ on ${|V| = n}$ points and a parameter ${r \in {\mathbb R}_+}$, a (randomized) low-diameter decomposition is an efficiently sampleable probability distribution over partitions of ${V}$ into ${S_1 \uplus S_2 \uplus S_3 \uplus \dots \uplus S_t}$ such that

1. (Low Radius/Diameter) For all ${S_i}$, there exists ${c_i \in V}$ such that for all ${u \in S_i}$, ${d(c_i, u) \leq r/2}$. Hence, for any ${u, v \in S_i}$, ${d(u,v) \leq r}$.
2. (Low Cutting Probability) For each pair ${u,v}$, ${\Pr[u,v \text{ lie in different } S_i's ] \leq \beta \; \frac{d(u,v)}{r}}$ with ${\beta = O(\log n)}$.
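To get a feel for the two properties, here is a toy example on a very special metric: integer points on the line, partitioned by a randomly shifted grid of cell width ${r/2}$. This is not the general construction (which is deferred to the next lecture); it is only a sketch of what a low-diameter decomposition looks like, and on the line it achieves ${\beta = 2}$. The names below are ours, and ${r}$ is assumed to be an even integer.

```python
from fractions import Fraction

def shifted_grid_partition(points, r, shift):
    """Partition integer points on the line into cells of width r // 2, offset by shift."""
    width = r // 2
    cells = {}
    for p in points:
        cells.setdefault((p + shift) // width, []).append(p)
    return list(cells.values())

def has_low_radius(partition, r):
    """Property 1, checked with the center taken inside the part (a stronger condition)."""
    return all(any(all(abs(c - u) <= r / 2 for u in part) for c in part)
               for part in partition)

def cut_probability(points, r, u, v):
    """Exact Pr[u and v are separated] over a uniform integer shift in 0..r//2 - 1."""
    width = r // 2
    cuts = 0
    for shift in range(width):
        partition = shifted_grid_partition(points, r, shift)
        part_of = {p: i for i, part in enumerate(partition) for p in part}
        cuts += part_of[u] != part_of[v]
    return Fraction(cuts, width)

points, r = list(range(20)), 8
print(all(has_low_radius(shifted_grid_partition(points, r, s), r) for s in range(r // 2)))
print(cut_probability(points, r, 3, 5), Fraction(2 * (5 - 3), r))
```

Nearby points are rarely separated and far-apart points almost always are, which is exactly the trade-off the parameter ${r}$ controls.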

We’ll show how to construct such a decomposition in the next section (next lecture), and use such a decomposition to prove Theorem 1.