Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.
Today’s post digresses to explain one of the most interesting ideas in graph theory: describing a graph through its adjacency matrix. By describing a graph in terms of matrices we bring all the powerful tools of linear algebra to bear on graph theory, and we’ll see that there are all sorts of interesting connections between graphs and linear algebraic concepts such as eigenvalues. The connection to expanders will be explained in later posts, where we’ll see that the expansion parameter is connected to the second largest eigenvalue of the adjacency matrix. I believe (but have not checked) that it is this result which is used to establish that the graphs described in the previous post are actually expanders.
Graphs and their adjacency matrices
How can we prove that a family of graphs is an expander? Stated another way, how does the expansion parameter [tex]h(G)[/tex] vary as the graph [tex]G[/tex] is varied over all graphs in the family?
One way of tackling the problem of computing [tex]h(G)[/tex] is to do a brute force calculation of the ratio [tex]|\partial S|/|S|[/tex] for every subset [tex]S[/tex] of vertices containing no more than half the vertices in the graph. Doing this is a time-consuming task, since if there are [tex]n[/tex] vertices in the graph, then there are exponentially many such subsets [tex]S[/tex].
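To make the brute force approach concrete, here’s a minimal sketch in Python (using only numpy and itertools; recall from the earlier posts that [tex]\partial S[/tex] is the set of edges with exactly one endpoint in [tex]S[/tex], and note that the function names are my own invention):

```python
import itertools
import numpy as np

def boundary_size(A, S):
    """Number of edges with exactly one endpoint in the vertex subset S."""
    S = set(S)
    n = A.shape[0]
    return sum(A[v, w] for v in S for w in range(n) if w not in S)

def expansion(A):
    """Brute-force expansion parameter h(G): minimize |boundary(S)|/|S|
    over all nonempty subsets S with |S| <= n/2.  Exponential time, so
    only usable for very small graphs."""
    n = A.shape[0]
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), k):
            best = min(best, boundary_size(A, S) / len(S))
    return best

# Example: the 4-cycle.  The worst subset is a pair of adjacent
# vertices, with 2 boundary edges, so h(C4) = 1.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
print(expansion(C4))  # 1.0
```

Even at this small scale the nested loop over subsets makes the exponential blowup obvious: for [tex]n = 30[/tex] there are already more than [tex]5 \times 10^8[/tex] subsets to check.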
Problem: In general, how hard is it to find the subset [tex]S[/tex] minimizing [tex]|\partial S|/|S|[/tex]? Can we construct an NP-complete variant of this problem? I don’t know the answer to this question, and I don’t know if anyone else does, either.
Fortunately, there is an extraordinarily beautiful approach to the problem of determining [tex]h(G)[/tex] which is far less computationally intensive. It involves the adjacency matrix [tex]A(G)[/tex] of the graph [tex]G[/tex]. By definition, the rows and columns of the adjacency matrix are labelled by the vertices of [tex]G[/tex]. For vertices [tex]v[/tex] and [tex]w[/tex] the entry [tex]A(G)_{vw}[/tex] is defined to be [tex]1[/tex] if [tex](v,w)[/tex] is an edge, and [tex]0[/tex] if it is not.
It is a marvellous fact that properties of the eigenvalue spectrum of the adjacency matrix [tex]A(G)[/tex] can be used to understand properties of the graph [tex]G[/tex]. This occurs so frequently that we refer to the spectrum of [tex]A(G)[/tex] as the spectrum of the graph [tex]G[/tex]. The spectral point of view is useful because the entire eigenvalue spectrum can be computed quickly, and certain spectral quantities, such as the largest and smallest eigenvalues, the determinant and the trace, can be computed extremely quickly.
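For instance, here’s a small numpy sketch (the complete graph [tex]K_4[/tex] is just an arbitrary example) showing how easily the spectrum, trace and determinant can be computed:

```python
import numpy as np

# Adjacency matrix of the complete graph K4: all off-diagonal entries 1.
A = np.ones((4, 4)) - np.eye(4)

# A is real and symmetric, so eigvalsh applies; it returns the
# eigenvalues in non-decreasing order.
print(np.linalg.eigvalsh(A))  # [-1. -1. -1.  3.]

# The trace and determinant are the sum and product of the eigenvalues.
print(np.trace(A))       # 0.0
print(np.linalg.det(A))  # -3.0 (up to floating point error)
```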
More generally, by recasting graphs in terms of adjacency matrices, we open up the possibility of using tools from linear algebra to study the properties of graphs. Although we’re most interested in studying expanders, for the rest of this post I’m going to digress from the study of expanders, studying how the linear algebraic point of view can help us understand graphs, without worrying about how this connects to expanders. This digression is partially motivated by the fact that this is beautiful stuff (at least in my opinion), and is partially because our later discussion of expanders will be based on this linear algebraic point of view, and so it’s good to get comfortable with this point of view.
The following exercise provides a good example of how graph properties can be related to the eigenvalues of the graph.
Exercise: Prove that if two graphs are isomorphic, then they have the same spectrum.
This result is often useful in proving that two graphs are not isomorphic: simply compute their eigenvalues, and show that they are different. A useful extension of the exercise is to find an example of two graphs which have the same spectra, but which are not isomorphic.
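One standard such pair (skip this if you’d rather hunt for your own example) is the star [tex]K_{1,4}[/tex] and the disjoint union of a [tex]4[/tex]-cycle with an isolated vertex. A short numpy sketch confirms that they share the spectrum [tex]\{2,0,0,0,-2\}[/tex], although they are obviously not isomorphic:

```python
import numpy as np

# The star K_{1,4}: vertex 0 joined to each of the vertices 1,...,4.
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1

# The 4-cycle on vertices 0,...,3, plus the isolated vertex 4.
cycle_plus_point = np.zeros((5, 5))
for v in range(4):
    cycle_plus_point[v, (v + 1) % 4] = 1
    cycle_plus_point[(v + 1) % 4, v] = 1

# Both spectra are {2, 0, 0, 0, -2}, yet the graphs are not
# isomorphic: the star is connected and the other graph is not.
print(np.linalg.eigvalsh(star))              # [-2.  0.  0.  0.  2.]
print(np.linalg.eigvalsh(cycle_plus_point))  # [-2.  0.  0.  0.  2.]
```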
Note that the adjacency matrix may be considered as a matrix over any field, and the result of the exercise is true over any field. (I’ve often wondered whether a converse might hold, say, that graphs with the same spectrum over every field must be isomorphic, but I don’t know the answer.) Nonetheless, by and large, we’ll consider the adjacency matrix as a matrix over the field [tex]\mathbb{R}[/tex] of real numbers. Assuming that [tex]G[/tex] is an undirected graph, we see that [tex]A(G)[/tex] is a real symmetric matrix, and thus can be diagonalized. We will find it convenient to write the eigenvalues of a graph [tex]G[/tex] in non-increasing order, as [tex]\lambda_1(G) \geq \lambda_2(G) \geq \ldots \geq \lambda_n(G)[/tex].
A fact we’ll make a lot of use of is that when [tex]G[/tex] is [tex]d[/tex]-regular the largest eigenvalue of [tex]G[/tex] is just [tex]d[/tex]. To see this, note that the vector [tex]\vec 1 \equiv (1,1,\ldots,1)[/tex] is an eigenvector of [tex]A(G)[/tex] with eigenvalue [tex]d[/tex]. Proving that [tex]d[/tex] is the largest eigenvalue seems to be a little harder, and we’ll just sketch a proof. It is sufficient to show that [tex]v^T A(G) v \leq d[/tex] for all normalized vectors [tex]v[/tex]. From the [tex]d[/tex]-regularity of [tex]G[/tex] it follows that [tex]A(G)/d[/tex] is a doubly stochastic matrix, i.e., has non-negative entries, and all rows and columns sum to one. A theorem of Birkhoff ensures that [tex]A(G)/d[/tex] can be written as a convex combination of permutation matrices, so [tex]A(G) = d \sum_j p_j P_j[/tex], where the [tex]p_j[/tex] are probabilities, and the [tex]P_j[/tex] are permutation matrices. This gives [tex]v^T A(G) v = d \sum_j p_j v^T P_j v[/tex]. But for any permutation matrix [tex]P_j[/tex] the Cauchy-Schwarz inequality gives [tex]v^T P_j v \leq \|v\| \|P_j v\| = 1[/tex], since [tex]P_j[/tex] preserves the norm, which gives the desired result.
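As a quick numerical sanity check of this fact, here’s a sketch using the [tex]3[/tex]-regular complete bipartite graph [tex]K_{3,3}[/tex] (an arbitrary choice of [tex]d[/tex]-regular graph):

```python
import numpy as np

# K_{3,3}, the 3-regular complete bipartite graph with parts
# {0, 1, 2} and {3, 4, 5}.
A = np.zeros((6, 6))
A[:3, 3:] = A[3:, :3] = 1

# The all-ones vector is an eigenvector with eigenvalue d = 3 ...
print(A @ np.ones(6))         # [3. 3. 3. 3. 3. 3.]

# ... and 3 is indeed the largest eigenvalue.
print(np.linalg.eigvalsh(A))  # [-3.  0.  0.  0.  0.  3.]
```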
The following proposition gives another example of the relationships one can find between a graph and its spectrum.
Proposition: A [tex]d[/tex]-regular graph [tex]G[/tex] is connected if and only if [tex]\lambda_1(G) > \lambda_2(G)[/tex].
Proof: The easy direction is the reverse implication, for which we prove the contrapositive, namely, that a [tex]d[/tex]-regular disconnected graph has [tex]\lambda_1(G) = \lambda_2(G)[/tex]. This follows by breaking [tex]G[/tex] up into disconnected components [tex]G_1[/tex] and [tex]G_2[/tex], and observing that [tex]A(G) = A(G_1) \oplus A(G_2)[/tex], where [tex]\oplus[/tex] is the matrix direct sum. Since both [tex]G_1[/tex] and [tex]G_2[/tex] are [tex]d[/tex]-regular it follows that they both have maximal eigenvalue [tex]d[/tex], and so [tex]d[/tex] appears at least twice in the spectrum of [tex]A(G)[/tex].
At the moment, I don’t see an easy way of proving the forward implication. One not very satisfying proof is to observe that [tex]A(G)/d[/tex] is the Markov transition matrix for a random walk on the graph, and that since the graph is connected, the random walk has a unique stationary distribution. It follows that, up to scaling, there is only one vector [tex]v[/tex] such that [tex](A(G)/d) v = v[/tex], which means that the largest eigenvalue [tex]d[/tex] of [tex]A(G)[/tex] is non-degenerate. This is a sketch, but it can all be established rigorously with a little work and the aid of well-known theorems on Markov chains.
The proof sketched in the previous paragraph is not really satisfactory, since it involves an appeal to theorems which are in some sense less elementary than the result under discussion. Another possibility which I’ve explored but haven’t made work with complete rigour is to investigate [tex]A(G)^n/d^n[/tex] more explicitly. With a little thought one can prove that the entry [tex](A(G)^n)_{vw}[/tex] is just the number of walks of length [tex]n[/tex] between [tex]v[/tex] and [tex]w[/tex] (walks being paths in which vertices and edges may repeat). Since [tex]G[/tex] is connected, we’d expect that in the limit of large [tex]n[/tex] this number would be dominated by a term which does not depend on [tex]w[/tex], and would just scale like the total number of walks of length [tex]n[/tex] starting at [tex]v[/tex] (which is [tex]d^n[/tex]), divided by the total number of possible destinations [tex]w[/tex] (which is [tex]n[/tex]), giving [tex](A(G)^n)_{vw}/d^n \rightarrow 1/n[/tex]. (This convergence requires the walk to be aperiodic, which is guaranteed if, for example, [tex]G[/tex] has self-loops [tex](v,v)[/tex]; it fails for bipartite graphs.) Of course, the matrix whose entries are all [tex]1/n[/tex] has a single eigenvalue [tex]1[/tex], with all the rest [tex]0[/tex], which would suffice to establish the theorem.
QED
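Here’s a small numerical illustration of the proposition, together with a check of the walk-counting fact used in the sketch above (the choice of [tex]2[/tex]-regular graphs is arbitrary):

```python
import numpy as np

def cycle(n):
    """Adjacency matrix of the n-cycle, a connected 2-regular graph."""
    A = np.zeros((n, n))
    for v in range(n):
        A[v, (v + 1) % n] = 1
        A[(v + 1) % n, v] = 1
    return A

# Connected and 2-regular: the 6-cycle has lambda_1 = 2 > lambda_2 = 1.
print(np.linalg.eigvalsh(cycle(6))[-2:])       # [1. 2.]

# Disconnected and 2-regular: two disjoint triangles have
# lambda_1 = lambda_2 = 2.
two_triangles = np.block([
    [cycle(3), np.zeros((3, 3))],
    [np.zeros((3, 3)), cycle(3)],
])
print(np.linalg.eigvalsh(two_triangles)[-2:])  # [2. 2.]

# The walk-counting fact: entries of A^n count length-n walks.  From
# any vertex of the 6-cycle there are exactly 2 closed walks of
# length 2 (step to either neighbour and back).
print(np.linalg.matrix_power(cycle(6), 2)[0, 0])  # 2.0
```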
Problem: How should we interpret the determinant of a graph? What about the trace?
Problem: If we consider [tex]A(G)[/tex] as a matrix over the field [tex]Z_2 = \{0,1\}[/tex], then it is possible to define a graph sum [tex]G_1+G_2[/tex], whose adjacency matrix is just [tex]A(G_1)+A(G_2)[/tex], and a graph product [tex]G_1 \times G_2[/tex], whose adjacency matrix is just [tex]A(G_1) A(G_2)[/tex]. Many questions naturally suggest themselves: (1) when is there an edge between [tex]v[/tex] and [tex]w[/tex] in [tex]G_1+G_2[/tex]; (2) when is there an edge between [tex]v[/tex] and [tex]w[/tex] in [tex]G_1 \times G_2[/tex] (these first two questions are easy to answer); (3) for which graphs is [tex]A(G)[/tex] invertible, so that a natural inverse graph [tex]G^{-1}[/tex] exists; (4) how can we interpret the inverse graph; (5) when do two graphs commute?
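For anyone who’d like to experiment with questions (1) and (2), here’s a sketch (the helper names and example graphs are my own choices):

```python
import numpy as np

def graph_sum(A1, A2):
    """A(G1 + G2) over Z_2: an edge (v, w) is present exactly when it
    lies in one of the graphs but not both (symmetric difference)."""
    return (A1 + A2) % 2

def graph_product(A1, A2):
    """A(G1 x G2) over Z_2: entry (v, w) is 1 exactly when the number
    of vertices u with (v, u) in G1 and (u, w) in G2 is odd."""
    return (A1 @ A2) % 2

# A path 0-1-2 and the single edge 1-2, on the same vertex set.
P = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
E = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])

print(graph_sum(P, E))      # only the edge 0-1 survives; 1-2 cancels
print(graph_product(P, E))  # not symmetric, and has diagonal entries
```

Printing the product also hints at a wrinkle relevant to questions (3)-(5): the product of two symmetric matrices need not be symmetric, so [tex]G_1 \times G_2[/tex] is most naturally thought of as a directed graph, possibly with self-loops.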
Problem: Along similar lines to the previous problem, it’s possible to define a tensor product of graphs. What are the properties of the graph tensor product?
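One natural definition takes the adjacency matrix of the tensor product to be the Kronecker product [tex]A(G_1) \otimes A(G_2)[/tex]; here’s a sketch for experimenting with it. A property that’s easy to check, both by hand and numerically, is that the spectrum of the product consists of all pairwise products of eigenvalues of the factors:

```python
import numpy as np

# Tensor (Kronecker) product of adjacency matrices: vertices of the
# product graph are pairs (v, w), with ((v, w), (v', w')) an edge
# exactly when (v, v') is an edge of G1 and (w, w') is an edge of G2.
K3 = np.ones((3, 3)) - np.eye(3)  # triangle, spectrum {2, -1, -1}
K2 = np.array([[0., 1.],
               [1., 0.]])         # single edge, spectrum {1, -1}

T = np.kron(K3, K2)

# The spectrum of the product is all pairwise products of the
# factors' eigenvalues: {2, 1, 1, -1, -1, -2}.
print(np.linalg.eigvalsh(T))  # [-2. -1. -1.  1.  1.  2.]
```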
The ideas I’ve described in this post are examples of the important general principle that once you’ve defined a mathematical object, you should seek out alternative representations (or even just partial representations) of that object in terms of mathematical objects that you already understand. By recasting graphs as matrices, we open up the possibility of using all the tools of linear algebra to answer questions about graphs. This can work in one of two ways: we can ask a question about graphs, and try to see if it’s possible to give a linear algebraic answer, or we can ask what implications known results of linear algebra have for graphs – what does the Gaussian elimination procedure correspond to, or the spectral decomposition, or two matrices commuting, or the wedge product, or whatever. Exploring such connections has the potential to greatly enrich both subjects.
In this post we’ve started to get a feel for how the properties of graphs can be studied using linear algebra. In the next post we’ll turn our attention back to expanders, and understand how the expansion coefficient can be related to the gap [tex]\lambda_1(G)-\lambda_2(G)[/tex] between the largest and second largest eigenvalues of [tex]G[/tex].