Back from Boston! This is the final installment in my series about expanders. I’ll post a pdf containing the whole text in the next day or two. Thanks to everyone who’s contributed in the comments!
Today’s post explains how expander graphs can be used to reduce the number of random bits needed by a randomized algorithm in order to achieve a desired success probability. This post is the culmination of the series: we make use of the fact, proved in the last post, that random walks on an expander are exponentially unlikely to remain localized in any sufficiently large subset of vertices, a fact that relies in turn on the connection, developed in earlier posts, between the eigenavlue gap and the expansion parameter.
Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.
Reducing the number of random bits required by an algorithm
One surprising application of expanders is that they can be used to reduce the number of random bits needed by a randomized algorithm in order to achieve a desired success probability.
Suppose, for example, that we are trying to compute a function [tex]f(x)[/tex] that can take the values [tex]f(x) = 0[/tex] or [tex]f(x) = 1[/tex]. Suppose we have a randomized algorithm [tex]A(x,Y)[/tex] which takes as input [tex]x[/tex] and an [tex]m[/tex]-bit uniformly distributed random variable [tex]Y[/tex], and outputs either [tex]0[/tex] or [tex]1[/tex]. We assume that:
- [tex]f(x) = 0[/tex] implies [tex]A(x,Y) = 0[/tex] with certainty.
- [tex]f(x) = 1[/tex] implies [tex]A(x,Y) = 1[/tex] with probability at least [tex]1-p_f[/tex].
That is, [tex]p_f[/tex] is the maximal probability that the algorithm fails, in the case when [tex]f(x) = 1[/tex], but [tex]A(x,Y) = 0[/tex] is output by the algorithm.
An algorithm of this type is called a one-sided randomized algorithm, since it can only fail when [tex]f(x) = 1[/tex], not when [tex]f(x) = 0[/tex]. I won’t give any concrete examples of one-sided randomized algorithms here, but the reader unfamiliar with them should rest assured that they are useful and important – see, e.g., the book of Motwani and Raghavan (Cambridge University Press, 1995) for examples.
As an aside, the discussion of one-sided algorithms in this post can be extended to the case of randomized algorithms which can fail when either [tex]f(x) = 0[/tex] or [tex]f(x) = 1[/tex]. The details are a little more complicated, but the basic ideas are the same. This is described in Linial and Wigderson’s lecture notes. Alternately, extending the discussion to this case is a good problem.
How can we descrease the probability of failure for a one-sided randomized algoerithm? One obvious way of decreasing the failure probability is to run the algorithm [tex]k[/tex] times, computing [tex]A(x,Y_0),A(x,Y_1),\ldots,A(x,Y_{k-1})[/tex]. If we get [tex]A(x,Y_j) = 0[/tex] for all [tex]j[/tex] then we output [tex]0[/tex], while if [tex]A(x,Y_j) = 1[/tex] for at least one value of [tex]J[/tex], then we output [tex]f(x) = 1[/tex]. This algorithm makes use of [tex]km[/tex] bits, and reduces the failure probability to at most [tex]p_f^k[/tex].
Expanders can be used to substantially decrease the number of random bits required to achieve such a reduction in the failure probability. We define a new algorithm [tex]\tilde A[/tex] as follows. It requires a [tex]d[/tex]-regular expander graph [tex]G[/tex] whose vertex set [tex]V[/tex] contains [tex]2^m[/tex] vertices, each of which can represent a possible [tex]m[/tex]-bit input [tex]y[/tex] to [tex]A(x,y)[/tex]. The modified algorithm [tex]\tilde A[/tex] works as follows:
- Input [tex]x[/tex].
- Sample uniformly at random from [tex]V[/tex] to generate [tex]Y_0[/tex].
- Now do a [tex]k-1[/tex] step random walk on the expander, generating random variables [tex]Y_1,\ldots, Y_{k-1}[/tex].
- Compute [tex]A(x,Y_0),\ldots,A(x,Y_{k-1})[/tex]. If any of these are [tex]1[/tex], output [tex]1[/tex], otherwise output [tex]0[/tex].
We see that the basic idea of the algorithm is similar to the earlier proposal for running [tex]A(x,Y)[/tex] repeatedly, but the sequence of independent and uniformly distributed samples [tex]Y_0,\ldots,Y_{k-1}[/tex] is replaced by a random walk on the expander. The advantage of doing this is that only [tex]m+k \log(d)[/tex] random bits are required – [tex]m[/tex] to sample from the initial uniform distribution, and then [tex]\log(d)[/tex] for each step in the random walk. When [tex]d[/tex] is a small constant this is far fewer than the [tex]km[/tex] bits used when we simply repeatedly run the algorithm [tex]A(x,Y_j)[/tex] with uniform and independently generated random bits [tex]Y_j[/tex].
With what probability does this algorithm fail? Define [tex]B_x[/tex] to be the set of values of [tex]y[/tex] such that [tex]A(x,y) = 0[/tex], yet [tex]f(x) = 1[/tex]. This is the “bad” set, which we hope our algorithm will avoid. The algorithm will fail only if the steps in the random walk [tex]Y_0,Y_1,\ldots,Y_{k-1}[/tex] all fall within [tex]B_x[/tex]. From our earlier theorem we see that this occurs with probability at most:
[tex] \left( \frac{|B_x|}{2^m} + \frac{\lambda_2(G)}{d} \right)^{k-1}. [/tex]
But we know that [tex]|B_x|/2^m \leq p_f[/tex], and so the failure probability is at most
[tex] \left( p_f + \frac{\lambda_2(G)}{d} \right)^{k-1}. [/tex]
Thus, provided [tex]p_f+\lambda_2(G)/d < 1[/tex], we again get an exponential decrease in the failure probability as the number of repetitions [tex]k[/tex] is increased. Conclusion These notes have given a pretty basic introduction to expanders, and there's much we haven't covered. More detail and more applications can be found in the online notes of Linial and Wigderson, or in the research literature. Still, I hope that these notes have given some idea of why these families of graphs are useful, and of some of the powerful connections between graph theory, linear algebra, and random walks.
It’d be helpful to have the Mathcad analytics and sub-routines in addition to the numeric proofs for a full comparison. To be honest I lose tract of bounded or defined variables if they’re left unstated… my problem, not yours.