Expander graphs V: random walks on expanders
Back to expanders again. Today’s post looks at a simple application of expanders, showing that a random walk on an expander graph is likely to quickly escape from any sufficiently small subset of vertices. Intuitively, of course, this result is not surprising, but the exact quantitative form of the result turns out to be extremely useful in the next post, which is about decreasing the number of random bits used by a randomized algorithm.
Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.
Random walks on expanders
Many applications of expanders involve doing a random walk on the expander. We start at some chosen vertex, and then repeatedly move to any one of the neighbours, each time choosing a neighbour uniformly at random, and independently of prior choices.
To describe this random walk, suppose at some given time we have a probability distribution describing the probability of being at any given vertex in the graph . We then apply one step of the random walk procedure described above, i.e., selecting a neighbour of the current vertex uniformly at random. The updated probability distribution is easily verified to be:
That is, the Markov transition matrix describing this random walk is just , i.e., up to a constant of proportionality the transition matrix is just the adjacency matrix. This relationship between the adjacency matrix and random walks opens up a whole new world of connections between graphs and Markov chains.
One of the most important connections is between the eigenvalues of Markov transition matrices and the rate at which the Markov chain converges to its stationary distribution. In particular, the following beautiful theorem tells us that when the uniform distribution is a stationary distribution for the chain, then the Markov chain converges to the uniform distribution exponentially quickly, at a rate determined by the second largest eigenvalue of .
Exercise: Show that if is a normal transition matrix for a Markov chain then .
Theorem: Suppose is a normal transition matrix for a Markov chain on states, with the uniform distribution as a stationary point, . Then for any starting distribution ,
where denotes the norm.
The normality condition in this theorem may appear a little surprising. The reason it’s there is to ensure that can be diagonalized. The theorem can be made to work for general , with the second largest eigenvalue replaced by the second largest singular value. However, in our situation is symmetric, and thus automatically normal, and we prefer the statement in terms of eigenvalues, since it allows us to make a connection to the expansion parameter of a graph. In particular, when we obtain:
Combining this with our earlier results connecting the gap to the expansion parameter, we deduce that
Thus, for a family of expander graphs, the rate of convergence of the Markov chain is exponentially fast in the number of time steps .
Exercise: Suppose is a transition matrix for a Markov chain. Show that the uniform distribution is a stationary point point for the chain, i.e., , if and only if is doubly stochastic, i.e., has non-zero entries, and all rows and columns of the matrix sum to .
Proof: We start by working with the norm . Since we have , and so:
A computation shows that is orthogonal to . But is an eigenvector of with the maximum eigenvalue, , and thus must lie in the span of the eigenspaces with eigenvalues . It follows that
where we used the fact that , easily established by observing that is convex in , and thus must be maximized at an extreme point in the space of probability distributions; the symmetry of ensures that without loss of generality we may take . To convert this into a result about the norm, we use the fact that in dimensions , and thus we obtain
which was the desired result. QED
What other properties do random walks on expanders have? We now prove another beautiful theorem which tells us that they “move around quickly”, in the sense that they are exponentially unlikely to stay for long within a given subset of vertices, , unless is very large.
More precisely, suppose is a subset of vertices, and we choose some vertex uniformly at random from the graph. Suppose we use as the starting point for a random walk, , where is the vertex after the th step. Let be the event that for all in the range . Then we will prove that:
Provided , we get the desired exponential decrease in probability. For a family of expander graphs it follows that there is some constant such that we get an exponential decrease for any such that . These results are special cases of the following more general theorem about Markov chains. Theorem: Let be uniformly distributed on states, and let be a time-homogeneous Markov chain with transition matrix . Suppose the uniform distribution is a stationary point of , i.e., . Let be a subset of the states, and let be the event that for all . Then Proof: The first step in the proof is to observe that: where the operator projects onto the vector space spanned by those basis vectors corresponding to elements of . This equation is not entirely obvious, and proving it is a good exercise for the reader. The next step is to prove that , where the norm here is the operator norm. We will do this below, but note first that once this is done, the result follows, for we have by the standard inequality relating and norms, and thus by definition of the operator norm, and finally where we used the assumed inequality for the operator norm, and the observation that . To prove the desired operator norm inequality, , suppose is a normalized state such that . Decompose , where is a normalized state orthogonal to . Since we must have . Furthermore, multiplying on the left by shows that . It follows that is maximized by choosing to be uniformly distributed over , from which it follows that . A little algebra shows that Applying , , , and gives which completes the proof. QED The results in today's post are elegant, but qualitatively unsurprising. (Of course, having elegant quantitative statements of results that are qualitatively clear is worthwhile in its own right!) In the next post we'll use these ideas to develop a genuinely surprising application of expanders, to reducing the number of random bits required by a probabilistic algorithm in order to achieve a desired success probability.