Fermions and the Jordan-Wigner transform III: Consequences of the Canonical Commutation Relations

Back to Fermions again! In today’s post we’ll show that the CCRs uniquely determine the action of the fermionic operators [tex]a_j[/tex], up to a choice of basis. Mathematically, the argument is somewhat detailed, but it’s also the kind of argument that rewards detailed study, especially if studied in conjunction with related topics, such as the representation theory of [tex]su(2)[/tex]. You’ll need to look elsewhere for that, however!

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Consequences of the fermionic CCRs

We will assume that the vector space [tex]V[/tex] is finite dimensional, and that there are [tex]n[/tex] operators [tex]a_1,\ldots,a_n[/tex] acting on [tex]V[/tex] and satisfying the Fermionic CCRs. Following this paragraph we give a broad outline of the steps we'll go through. Upon a first read, some of these steps may appear a little mysterious to the reader not familiar with representation theory. In particular, please don’t worry if you get a little stuck in your understanding of the outline at some points, as the exposition is very much at the bird’s-eye level, and not all detail is visible at that level. Nonetheless, the reason for including this broad outline is the belief that repeated study will pay substantial dividends, if it is read in conjunction with the more detailed exposition to follow, or similar material on, e.g., representations of the Lie algebra [tex]su(2)[/tex]. Indeed, the advantage of operating at the bird’s-eye level is that it makes it easier to see the connections between these ideas, and the use of similar ideas in other branches of representation theory.

  • We’ll start by showing that the operators [tex]a_j^\dagger a_j[/tex] are positive Hermitian operators with eigenvalues [tex]0[/tex] and [tex]1[/tex].
  • We’ll show that [tex]a_j[/tex] acts as a lowering operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j |\psi\rangle[/tex] vanishes.
  • Similarly, [tex]a_j^\dagger[/tex] acts as a raising operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] vanishes.
  • We prove that the operators [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of Hermitian matrices, and thus there exists a state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of [tex]a_j^\dagger a_j[/tex] for all values [tex]j=1,\ldots,n[/tex].
  • By raising and lowering the state [tex]|\psi\rangle[/tex] in all possible combinations, we’ll construct a set of [tex]2^n[/tex] orthonormal states which are simultaneous eigenstates of the [tex]a_j^\dagger a_j[/tex]. The corresponding vector of eigenvalues uniquely labels each state in this orthonormal basis.
  • Suppose the vector space spanned by these [tex]2^n[/tex] simultaneous eigenstates is [tex]W[/tex]. At this point, we know that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W[/tex] into [tex]W[/tex], and, indeed, we know everything about the action [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] have on [tex]W[/tex].
  • Suppose we define [tex]W_\perp[/tex] to be the orthocomplement of [tex]W[/tex] in [tex]V[/tex]. Then we’ll show that the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W_\perp[/tex] into itself, and their restrictions to [tex]W_\perp[/tex] satisfy the Fermionic CCRs. We can then repeat the above procedure, and identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we know the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] exactly.
  • We iterate this procedure until [tex]W_\perp[/tex] is the trivial vector space, at which point it is no longer possible to continue. At this point we have established an orthonormal basis for the whole of [tex]V[/tex] with respect to which we can explicitly write down the action of both [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

Let’s go through each of these steps in more detail.

The [tex]a_j^\dagger a_j[/tex] are positive Hermitian with eigenvalues [tex]0[/tex] and [tex]1[/tex]: Observe that the [tex]a_j^\dagger a_j[/tex] are positive (and thus Hermitian) matrices. We will show that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], and thus the eigenvalues of [tex]a_j^\dagger a_j[/tex] are all [tex]0[/tex] or [tex]1[/tex].

To see this, observe that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j a_j^\dagger a_j = -(a_j^\dagger)^2 a_j^2 + a_j^\dagger a_j[/tex], where we used the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex]. Note also that [tex]a_j^2 = 0[/tex] by the CCR [tex]\{a_j,a_j\} = 0[/tex]. It follows that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], as claimed.
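For readers who like to check such identities numerically, here is a minimal sketch in Python (my own scaffolding, not part of the argument), using the single-mode operator [tex]a = |0\rangle\langle 1|[/tex] on one qubit, which satisfies the Fermionic CCRs for [tex]n = 1[/tex]:

```python
import numpy as np

# Single fermionic mode: a = |0><1| in the basis {|0>, |1>}.
a = np.array([[0., 1.],
              [0., 0.]])
ad = a.conj().T                      # a^dagger
I = np.eye(2)

def anticomm(A, B):
    return A @ B + B @ A

# The CCRs for a single mode: {a, a^dagger} = I and {a, a} = 0.
assert np.allclose(anticomm(a, ad), I)
assert np.allclose(anticomm(a, a), np.zeros((2, 2)))

# The consequence proved above: (a^dagger a)^2 = a^dagger a, so the
# eigenvalues of the number operator a^dagger a are 0 and 1.
num = ad @ a
assert np.allclose(num @ num, num)
print(np.linalg.eigvalsh(num))       # [0. 1.]
```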

The [tex]a_j[/tex] are lowering operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Then we claim that [tex]a_j|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. To see that [tex]a_j|\psi\rangle[/tex] is normalized, note that [tex]\langle \psi|a_j^\dagger a_j |\psi\rangle = \langle \psi|\psi\rangle = 1[/tex], where we used the fact that [tex]|\psi\rangle[/tex] is an eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex] to establish the first equality. To see that it has eigenvalue [tex]0[/tex], note that [tex]a_j^\dagger a_j a_j|\psi\rangle = 0[/tex], since [tex]\{ a_j,a_j \} = 0[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Show that [tex]a_j |\psi\rangle = 0[/tex].

The [tex]a_j^\dagger[/tex] are raising operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Then we claim that [tex]a_j^\dagger|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex].

To see the normalization, we use the CCR [tex]\{ a_j,a_j^\dagger \} = I[/tex] to deduce [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = -\langle \psi|a_j^\dagger a_j|\psi\rangle + \langle \psi|\psi\rangle[/tex]. But [tex]a_j^\dagger a_j|\psi\rangle = 0[/tex], by the eigenvalue assumption, and [tex]\langle \psi|\psi\rangle = 1[/tex], whence [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = 1[/tex], which is the desired normalization condition.

To see that [tex]a_j^\dagger |\psi\rangle[/tex] is an eigenstate with eigenvalue [tex]1[/tex], use the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex] to deduce that [tex]a_j^\dagger a_j a_j^\dagger |\psi\rangle = -a_j^\dagger a_j^\dagger a_j|\psi\rangle + a_j^\dagger|\psi\rangle = a_j^\dagger |\psi\rangle[/tex], where the final equality can be deduced either from the assumption that [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex], or from the CCR [tex]\{ a_j^\dagger,a_j^\dagger \} = 0[/tex]. This is the desired eigenvalue equation for [tex]a_j^\dagger|\psi\rangle[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Show that [tex]a_j^\dagger |\psi\rangle = 0[/tex].

The [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of observables: To see this, let [tex]j \neq k[/tex], and apply the CCRs repeatedly to obtain [tex]a_j^\dagger a_j a_k^\dagger a_k = a_k^\dagger a_k a_j^\dagger a_j[/tex], which is the desired commutativity.
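To check this for more than one mode we need concrete matrices satisfying the CCRs for [tex]n > 1[/tex]. One way to build them, which I'm previewing here and which is exactly the Jordan-Wigner construction this series builds toward, is [tex]a_j = Z^{\otimes(j-1)} \otimes a \otimes I^{\otimes(n-j)}[/tex], with [tex]Z[/tex] the Pauli [tex]Z[/tex] matrix. A sketch verifying the CCRs and the commutativity for [tex]n = 3[/tex]:

```python
import numpy as np
from functools import reduce
from itertools import combinations

a = np.array([[0., 1.], [0., 0.]])   # single-mode lowering operator
Z = np.diag([1., -1.])               # Pauli Z
I2 = np.eye(2)

def mode(j, n):
    """Jordan-Wigner form of a_j on n modes: Z^(j-1) (x) a (x) I^(n-j)."""
    return reduce(np.kron, [Z] * (j - 1) + [a] + [I2] * (n - j))

n_modes = 3
A = [mode(j, n_modes) for j in range(1, n_modes + 1)]
Id = np.eye(2 ** n_modes)

def anticomm(X, Y):
    return X @ Y + Y @ X

# CCRs: {a_j, a_k^dagger} = delta_{jk} I and {a_j, a_k} = 0.
for j in range(n_modes):
    for k in range(n_modes):
        target = Id if j == k else 0 * Id
        assert np.allclose(anticomm(A[j], A[k].conj().T), target)
        assert np.allclose(anticomm(A[j], A[k]), 0 * Id)

# The number operators a_j^dagger a_j mutually commute.
N = [Aj.conj().T @ Aj for Aj in A]
for Nj, Nk in combinations(N, 2):
    assert np.allclose(Nj @ Nk, Nk @ Nj)
print("CCRs and commutativity verified for", n_modes, "modes")
```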

Existence of a common eigenstate: It is well known that a mutually commuting set of Hermitian operators possesses a common eigenbasis. This fact is usually taught in undergraduate quantum mechanics courses; for completeness, I’ve included a simple proof in an appendix to these notes. We won’t make use of the full power of this result here, but instead simply use the fact that there exists a normalized state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of all the [tex]a_j^\dagger a_j[/tex] operators. In particular, for all [tex]j[/tex] we have:

[tex] a_j^\dagger a_j |\psi\rangle = \alpha_j |\psi\rangle, [/tex]

where for each [tex]j[/tex] either [tex]\alpha_j = 0[/tex] or [tex]\alpha_j = 1[/tex]. It will be convenient to assume that [tex]\alpha_j = 0[/tex] for all [tex]j[/tex]. This assumption can be made without loss of generality, by applying lowering operators to [tex]|\psi\rangle[/tex] for each [tex]j[/tex] such that [tex]\alpha_j = 1[/tex], resulting in a normalized state [tex]|\mbox{vac}\rangle[/tex] such that [tex]a_j^\dagger a_j |\mbox{vac}\rangle = 0[/tex] for all [tex]j[/tex].

Defining an orthonormal basis: For any vector [tex]\alpha = (\alpha_1,\ldots,\alpha_n)[/tex], where each [tex]\alpha_j = 0[/tex] or [tex]1[/tex], define a corresponding state:

[tex] |\alpha \rangle \equiv (a_1^\dagger)^{\alpha_1} \ldots (a_n^\dagger)^{\alpha_n}|\mbox{vac}\rangle. [/tex]

It is clear that there are [tex]2^n[/tex] such states [tex]|\alpha\rangle[/tex], and that they form an orthonormal set spanning a subspace of [tex]V[/tex] that we shall call [tex]W[/tex].

The action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W[/tex]: How do [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] act on [tex]W[/tex]? Stated another way, how do they act on the orthonormal basis we have constructed for [tex]W[/tex], the states [tex]|\alpha\rangle[/tex]? Applying the CCRs and the definition of the states [tex]|\alpha\rangle[/tex] it is easy to verify that the action of [tex]a_j[/tex] is as follows:

  • Suppose [tex]\alpha_j = 0[/tex]. Then [tex]a_j|\alpha\rangle = 0[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]0[/tex]. Then [tex]a_j|\alpha\rangle = (-1)^{s_\alpha^j} |\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex]. (This sign rule is checked numerically in the sketch following the next list.)

The action of [tex]a_j^\dagger[/tex] on [tex]W[/tex] is similar:

  • Suppose [tex]\alpha_j = 0[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]1[/tex]. Then [tex]a_j^\dagger|\alpha\rangle = (-1)^{s_\alpha^j}|\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Then [tex]a_j^\dagger |\alpha\rangle = 0[/tex].
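Here is a sketch checking these rules, sign included, on all occupation basis states for three modes. The Jordan-Wigner matrices from the earlier sketch serve as scaffolding; they are one concrete realization of the CCRs, not part of the argument itself:

```python
import numpy as np
from functools import reduce
from itertools import product

a = np.array([[0., 1.], [0., 0.]])
Z = np.diag([1., -1.])
I2 = np.eye(2)

def mode(j, n):
    return reduce(np.kron, [Z] * (j - 1) + [a] + [I2] * (n - j))

n = 3
A = [mode(j, n) for j in range(1, n + 1)]
vac = np.zeros(2 ** n); vac[0] = 1.0          # |vac> = |00...0>

def state(alpha):
    """|alpha> = (a_1^dag)^{alpha_1} ... (a_n^dag)^{alpha_n} |vac>."""
    v = vac
    for j in reversed(range(n)):              # a_n^dag acts first, a_1^dag last
        if alpha[j]:
            v = A[j].conj().T @ v
    return v

for alpha in product([0, 1], repeat=n):
    for j in range(n):
        lhs = A[j] @ state(alpha)
        if alpha[j] == 0:
            assert np.allclose(lhs, 0 * vac)            # a_j |alpha> = 0
        else:
            tilde = list(alpha); tilde[j] = 0
            sign = (-1) ** sum(alpha[:j])               # (-1)^{s_alpha^j}
            assert np.allclose(lhs, sign * state(tilde))
print("lowering rules (with signs) verified")
```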

Action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W_\perp[/tex]: We have described the action of the [tex]a_j[/tex] and the [tex]a_j^\dagger[/tex] on the subspace [tex]W[/tex]. What of the action of these operators on the remainder of [tex]V[/tex]? To answer that question, we first show that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map the orthocomplement [tex]W_\perp[/tex] into itself.

To see this, let [tex]|\psi\rangle \in W_\perp[/tex], and consider [tex]a_j|\psi\rangle[/tex]. We wish to show that [tex]a_j|\psi\rangle \in W_\perp[/tex] also, i.e., that for any [tex]|\phi\rangle \in W[/tex] we have [tex]\langle \phi|a_j|\psi\rangle = 0[/tex]. This follows easily by considering the complex conjugate quantity [tex]\langle \psi|a_j^\dagger |\phi\rangle[/tex], and observing that [tex]a_j^\dagger |\phi\rangle \in W[/tex], since [tex]|\phi\rangle \in W[/tex], and thus [tex]\langle \psi|a_j^\dagger |\phi\rangle = 0[/tex]. A similar argument shows that [tex]a_j^\dagger[/tex] maps [tex]W_\perp[/tex] into itself.

Consider now the operators [tex]\tilde a_j[/tex] obtained by restricting [tex]a_j[/tex] to [tex]W_\perp[/tex]. Provided [tex]W_\perp[/tex] is nontrivial it is clear that these operators satisfy the CCRs on [tex]W_\perp[/tex]. Repeating the above argument, we can therefore identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we can compute the action of [tex]\tilde a_j[/tex] and [tex]\tilde a_j^\dagger[/tex], and thus of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

We may iterate this procedure many times, but the fact that [tex]V[/tex] is finite dimensional means that the process must eventually terminate. At the point of termination we will have broken up [tex]V[/tex] as a direct sum of some finite number [tex]d[/tex] of mutually orthogonal [tex]2^n[/tex]-dimensional vector spaces, [tex]W_1,W_2,\ldots,W_d[/tex], and on each vector space we will have an orthonormal basis with respect to which the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is known precisely.

Stated another way, we can introduce an orthonormal basis [tex]|\alpha,k\rangle[/tex] for [tex]V[/tex], where [tex]\alpha[/tex] runs over all [tex]n[/tex]-bit vectors, and [tex]k = 1,\ldots,d[/tex], and such that the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is to leave [tex]k[/tex] invariant, and to act on [tex]|\alpha\rangle[/tex] as described above. In this representation it is clear that [tex]V[/tex] can be regarded as a tensor product [tex]C^{2^n} \otimes C^d[/tex], with the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] trivial on the [tex]C^d[/tex] component. We will call this the occupation number representation for the Fermi algebra [tex]a_j[/tex].

It’s worth pausing to appreciate what has been achieved here: starting only from the CCRs for [tex]a_1,\ldots,a_n[/tex] we have proved that [tex]V[/tex] can be broken down into a tensor product of a [tex]2^n[/tex]-dimensional vector space and a [tex]d[/tex]-dimensional vector space, with the [tex]a_j[/tex]s acting nontrivially only on the [tex]2^n[/tex]-dimensional component. Furthermore, the action of the [tex]a_j[/tex]s is completely known. I think it’s quite remarkable that we can say so much: at the outset it wasn’t even obvious that the dimension of [tex]V[/tex] should be a multiple of [tex]2^n[/tex]!

When [tex]d=1[/tex] we will call this the fundamental representation for the Fermionic CCRs. (This is the terminology I use, but I don’t know if it is standard or not.) Up to a change of basis it is clear that all other representations can be obtained by taking a tensor product of the fundamental representation with the trivial action on a [tex]d[/tex]-dimensional vector space.

We’ve now understood the fundamental mathematical fact about fermions: the mere existence of operators satisfying the Fermionic canonical commutation relations completely determines the action of those operators with respect to some suitable orthonormal occupation number basis. That’s a very strong statement! In the next post we’ll use this technology to study a problem of direct physical interest: finding the energy spectrum and eigenstates of a Hamiltonian which is quadratic in terms of Fermi operators.


Expander graphs V: random walks on expanders

Back to expanders again. Today’s post looks at a simple application of expanders, showing that a random walk on an expander graph is likely to quickly escape from any sufficiently small subset of vertices. Intuitively, of course, this result is not surprising, but the exact quantitative form of the result turns out to be extremely useful in the next post, which is about decreasing the number of random bits used by a randomized algorithm.

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Random walks on expanders

Many applications of expanders involve doing a random walk on the expander. We start at some chosen vertex, and then repeatedly move to any one of the [tex]d[/tex] neighbours, each time choosing a neighbour uniformly at random, and independently of prior choices.

To describe this random walk, suppose at some given time we have a probability distribution [tex]p[/tex] describing the probability of being at any given vertex in the graph [tex]G[/tex]. We then apply one step of the random walk procedure described above, i.e., selecting a neighbour of the current vertex uniformly at random. The updated probability distribution is easily verified to be:

[tex] \tilde p = \frac{A(G)}{d} p. [/tex]

That is, the Markov transition matrix describing this random walk is just [tex]\hat A(G) \equiv A(G)/d[/tex], i.e., up to a constant of proportionality the transition matrix is just the adjacency matrix. This relationship between the adjacency matrix and random walks opens up a whole new world of connections between graphs and Markov chains.
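As a tiny concrete illustration, here is a sketch of a single step of the walk on the cycle [tex]C_5[/tex] (a 2-regular graph, chosen only for simplicity):

```python
import numpy as np

n, d = 5, 2
A = np.zeros((n, n))
for v in range(n):                   # cycle C_5: v is joined to v +- 1 (mod 5)
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0

M = A / d                            # the Markov transition matrix \hat A(G)
p = np.zeros(n); p[0] = 1.0          # start at vertex 0
print(M @ p)                         # [0. 0.5 0. 0. 0.5]: uniform over the neighbours
```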

One of the most important connections is between the eigenvalues of Markov transition matrices and the rate at which the Markov chain converges to its stationary distribution. In particular, the following beautiful theorem tells us that when the uniform distribution is a stationary distribution for the chain, then the Markov chain converges to the uniform distribution exponentially quickly, at a rate determined by the second largest eigenvalue of the transition matrix [tex]M[/tex].

Exercise: Show that if [tex]M[/tex] is a symmetric transition matrix for a Markov chain then [tex]1 = \lambda_1(M) \geq \lambda_2(M) \geq \ldots[/tex].

Theorem: Suppose [tex]M[/tex] is a normal transition matrix for a Markov chain on [tex]n[/tex] states, with the uniform distribution [tex]u = \vec 1/n[/tex] as a stationary point, [tex]M u = u[/tex], and suppose also that [tex]\lambda_2(M)[/tex] is the second largest eigenvalue in absolute value, i.e., [tex]\lambda_2(M) \geq |\lambda_j(M)|[/tex] for all [tex]j \geq 2[/tex]. Then for any starting distribution [tex]p[/tex],

[tex] \| M^t p - u \|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

where [tex]\| \cdot \|_1[/tex] denotes the [tex]l_1[/tex] norm.

The normality condition in this theorem may appear a little surprising. The reason it’s there is to ensure that [tex]M[/tex] can be diagonalized. The theorem can be made to work for general [tex]M[/tex], with the second largest eigenvalue replaced by the second largest singular value. However, in our situation [tex]M[/tex] is symmetric, and thus automatically normal, and we prefer the statement in terms of eigenvalues, since it allows us to make a connection to the expansion parameter of a graph. In particular, when [tex]M = \hat A(G)[/tex], and assuming [tex]\lambda_2(G)[/tex] dominates the most negative eigenvalue in magnitude (which can always be arranged by adding self-loops to the graph), we obtain:

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(\frac{\lambda_2(G)}{d}\right)^t. [/tex]

Combining this with our earlier results connecting the gap to the expansion parameter, we deduce that

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(1-\frac{h(G)^2}{2d^2}\right)^t. [/tex]

Thus, for a family of expander graphs, the rate of convergence of the Markov chain is exponentially fast in the number of time steps [tex]t[/tex].
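Here is a sketch checking the bound numerically. The test case is my own: the Petersen graph with a self-loop added at each vertex (making it 4-regular), which ensures that the second largest eigenvalue of the walk matrix is also the second largest in absolute value, as the theorem requires:

```python
import numpy as np

def petersen():
    """Adjacency matrix of the Petersen graph: 3-regular, 10 vertices."""
    A = np.zeros((10, 10))
    for i in range(5):
        A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0                  # outer cycle
        A[i, i + 5] = A[i + 5, i] = 1.0                              # spokes
        A[5 + i, 5 + (i + 2) % 5] = A[5 + (i + 2) % 5, 5 + i] = 1.0  # inner 5-cycle
    return A

n = 10
M = (petersen() + np.eye(n)) / 4     # add self-loops: eigenvalues now in [-1/4, 1]
u = np.ones(n) / n
assert np.allclose(M @ u, u)         # the uniform distribution is stationary

lam2 = np.sort(np.linalg.eigvalsh(M))[::-1][1]   # = 0.5, dominant in absolute value

p = np.zeros(n); p[0] = 1.0          # worst case: start from a point mass
for t in range(1, 11):
    p = M @ p
    assert np.linalg.norm(p - u, 1) <= np.sqrt(n) * lam2 ** t
print("convergence bound verified; lambda_2 =", lam2)
```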

Exercise: Suppose [tex]M[/tex] is a transition matrix for a Markov chain. Show that the uniform distribution [tex]u[/tex] is a stationary point for the chain, i.e., [tex]Mu = u[/tex], if and only if [tex]M[/tex] is doubly stochastic, i.e., has non-negative entries, and all rows and columns of the matrix sum to [tex]1[/tex].

Proof: We start by working with the [tex]l_2[/tex] norm [tex]\| \cdot \|_2[/tex]. Since [tex]Mu = u[/tex] we have [tex]M^t u = u[/tex], and so:

[tex] \|M^t p - u \|_2 = \|M^t(p-u) \|_2. [/tex]

A computation shows that [tex]p-u[/tex] is orthogonal to [tex]u[/tex]. But [tex]u[/tex] is an eigenvector of [tex]M[/tex] with the maximum eigenvalue, [tex]1[/tex], and thus [tex]p-u[/tex] must lie in the span of the eigenspaces with eigenvalues [tex]\lambda_2(M),\lambda_3(M),\ldots[/tex]. It follows that

[tex] \|M^t(p-u)\|_2 \leq \lambda_2(M)^t \|p-u\|_2 \leq \lambda_2(M)^t, [/tex]

where we used the fact that [tex]\| p-u\|_2 \leq 1[/tex], easily established by observing that [tex]\|p-u\|_2[/tex] is convex in [tex]p[/tex], and thus must be maximized at an extreme point in the space of probability distributions; the symmetry of [tex]u[/tex] ensures that without loss of generality we may take [tex]p = (1,0,\ldots,0)[/tex]. To convert this into a result about the [tex]l_1[/tex] norm, we use the fact that in [tex]n[/tex] dimensions [tex]\|v\|_1 \leq \sqrt{n} \|v\|_2[/tex], and thus we obtain

[tex] \|M^t(p-u)\|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

which was the desired result. QED

What other properties do random walks on expanders have? We now prove another beautiful theorem which tells us that they “move around quickly”, in the sense that they are exponentially unlikely to stay for long within a given subset of vertices, [tex]B[/tex], unless [tex]B[/tex] is very large.

More precisely, suppose [tex]B[/tex] is a subset of vertices, and we choose some vertex [tex]X_0[/tex] uniformly at random from the graph. Suppose we use [tex]X_0[/tex] as the starting point for a random walk, [tex]X_0,\ldots,X_t[/tex], where [tex]X_t[/tex] is the vertex after the [tex]t[/tex]th step. Let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j[/tex] in the range [tex]0,\ldots,t[/tex]. Then we will prove that:

[tex] \mbox{Pr}(B(t)) \leq \left( \frac{\lambda_2(G)}{d} + \frac{|B|}{n} \right)^t [/tex]

Provided [tex]\lambda_2(G)/d + |B|/n < 1[/tex], we get the desired exponential decrease in probability. For a family of expander graphs it follows that there is some constant [tex]\epsilon > 0[/tex] such that we get an exponential decrease for any [tex]B[/tex] such that [tex]|B|/n < \epsilon[/tex].

These results are special cases of the following more general theorem about Markov chains.

Theorem: Let [tex]X_0[/tex] be uniformly distributed on [tex]n[/tex] states, and let [tex]X_0,\ldots,X_t[/tex] be a time-homogeneous Markov chain with transition matrix [tex]M[/tex]. Suppose the uniform distribution [tex]u[/tex] is a stationary point of [tex]M[/tex], i.e., [tex]Mu = u[/tex], and suppose also that [tex]\lambda_2(M)[/tex] is the second largest eigenvalue in absolute value. Let [tex]B[/tex] be a subset of the states, and let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j = 0,\ldots,t[/tex]. Then

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M) + \frac{|B|}{n} \right)^t. [/tex]

Proof: The first step in the proof is to observe that

[tex] \mbox{Pr}(B(t)) = \|(PMP)^t P u \|_1, [/tex]

where the operator [tex]P[/tex] projects onto the vector space spanned by those basis vectors corresponding to elements of [tex]B[/tex]. This equation is not entirely obvious, and proving it is a good exercise for the reader.

The next step is to prove that [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], where the norm here is the operator norm. We will do this below, but note first that once this is done, the result follows, for we have

[tex] \mbox{Pr}(B(t)) = \| (PMP)^t P u \|_1 \leq \sqrt{n} \| (PMP)^t P u \|_2 [/tex]

by the standard inequality relating [tex]l_1[/tex] and [tex]l_2[/tex] norms, and thus

[tex] \mbox{Pr}(B(t)) \leq \sqrt{n} \| PMP \|^t \| P u \|_2, [/tex]

by definition of the operator norm, and finally

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M)+\frac{|B|}{n} \right)^t, [/tex]

where we used the assumed inequality for the operator norm, and the observation that [tex]\| P u \|_2 = \sqrt{|B|}/n \leq 1/\sqrt{n}[/tex].

To prove the desired operator norm inequality, [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], suppose [tex]v[/tex] is a normalized state such that [tex]\| PMP \| = |v^T PMP v|[/tex]. Decompose [tex]Pv = \alpha u + \beta u_\perp[/tex], where [tex]u_\perp[/tex] is a normalized state orthogonal to [tex]u[/tex]. Since [tex]\|P v \|_2 \leq \|v \|_2 = 1[/tex] we must have [tex]|\beta| \leq 1[/tex]. Furthermore, multiplying [tex]Pv = \alpha u + \beta u_\perp[/tex] on the left by [tex]nu^T[/tex] shows that [tex]\alpha = n u^T P v[/tex]. It follows that [tex]|\alpha|[/tex] is maximized by choosing [tex]v[/tex] to be uniformly distributed over [tex]B[/tex], from which it follows that [tex]|\alpha| \leq \sqrt{|B|}[/tex]. A little algebra shows that

[tex] v^T PMP v = \alpha^2 u^T M u + \beta^2 u_\perp^T M u_\perp. [/tex]

Applying [tex]|\alpha| \leq \sqrt{|B|}[/tex], [tex]u^T M u = u^Tu = 1/n[/tex], [tex]|\beta| \leq 1[/tex], and [tex]|u_\perp^T M u_\perp| \leq \lambda_2(M)[/tex] gives

[tex] |v^T P M P v| \leq \frac{|B|}{n} + \lambda_2(M), [/tex]

which completes the proof. QED

The results in today's post are elegant, but qualitatively unsurprising. (Of course, having elegant quantitative statements of results that are qualitatively clear is worthwhile in its own right!) In the next post we'll use these ideas to develop a genuinely surprising application of expanders, to reducing the number of random bits required by a probabilistic algorithm in order to achieve a desired success probability.
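As a postscript, here is a sketch checking the escape-probability bound numerically, reusing the self-looped Petersen graph from the earlier sketch and computing [tex]\mbox{Pr}(B(t)) = \|(PMP)^t P u\|_1[/tex] directly:

```python
import numpy as np

A = np.zeros((10, 10))               # the Petersen graph again
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
    A[i, i + 5] = A[i + 5, i] = 1.0
    A[5 + i, 5 + (i + 2) % 5] = A[5 + (i + 2) % 5, 5 + i] = 1.0

n = 10
M = (A + np.eye(n)) / 4              # walk matrix with self-loops added
u = np.ones(n) / n
lam2 = np.sort(np.linalg.eigvalsh(M))[::-1][1]

B = [0, 1, 2, 3]                     # the subset the walk tries to stay inside
P = np.diag([1.0 if v in B else 0.0 for v in range(n)])

v = P @ u                            # Pr(B(t)) = || (PMP)^t P u ||_1
for t in range(1, 11):
    v = P @ (M @ v)                  # one more step, projected back onto B
    pr = np.linalg.norm(v, 1)
    assert pr <= (lam2 + len(B) / n) ** t
    print(t, round(pr, 6), round((lam2 + len(B) / n) ** t, 6))
```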


Quote

“People don’t always want what they need.” – Alan Kay


Fermions and the Jordan-Wigner transform II: the Canonical Commutation Relations

Back to fermions today, with a post introducing the canonical commutation relations for fermions, and discussing something of their mathematical and physical significance. The next post will get more into the meat, describing the consequences these commutation relations have.

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Fermions

The canonical commutation relations for Fermions

Suppose we have a set of operators [tex]a_1,\ldots,a_n[/tex] acting on some Hilbert space [tex]V[/tex]. Then we say that these operators satisfy the canonical commutation relations (CCRs) for Fermions if they satisfy the equations

[tex] \{ a_j, a_k^\dagger \} = \delta_{jk} I; \,\,\,\, \{ a_j,a_k \} = 0, [/tex]

where [tex]\{ A, B \} \equiv AB+BA[/tex] is the anticommutator. Note that when we take the conjugate of the second of these relations we obtain [tex]\{ a_j^\dagger, a_k^\dagger \} = 0[/tex], which is sometimes also referred to as one of the CCRs. It is also frequently useful to set [tex]j = k[/tex], giving [tex]a_j^2 = (a_j^\dagger)^2 = 0[/tex].

How should one understand the CCRs? One way of thinking about the CCRS is in an axiomatic mathematical sense. In this way of thinking they are purely mathematical conditions that can be imposed on a set of matrices: for a given set of matrices, we can simply check and verify whether those matrices satisfy or do not satisfy the CCRs. For example, when the state space [tex]V[/tex] is that of a single qubit, we can easily verify that the operator [tex]a = |0\rangle \langle 1|[/tex] satisfies the Fermionic CCRs. From this axiomatic point of view the question to ask is what consequences about the structure of [tex]V[/tex] and the operators [tex]a_j[/tex] can be deduced from the fact that the CCRs hold.

There’s also a more sophisticated (but still entirely mathematical) way of understanding the CCRs, as an instance of the relationship between abstract algebraic objects (such as groups, Lie algebras, or Hopf algebras), and their representations as linear maps on vector spaces. My own knowledge of representation theory is limited to a little representation theory of finite groups and of Lie algebras, and I certainly do not see the full context in the way an expert on representation theory would. However, even with that limited background, one can see that there are common themes and techniques: what may appear to be an isolated technique or trick is often really an instance of a much deeper idea or set of ideas that become obvious once one has enough broad familiarity with representation theory. I’m not going to pursue this point of view in these notes, but thought it worth mentioning for the sake of giving context and motivation to the study of other topics.

Finally, there’s a physical way in which we can understand the CCRs. When we want to describe a system containing Fermions, one way to begin is to start by writing down a set of operators satisfying the CCRs, and then to try to guess what sort of Hamiltonian involving those operators could describe the interactions observed in the system, often motivated by classical considerations, or other rules of thumb. This is, for example, the sort of point of view pursued in the BCS theory of superconductivity, and which people are trying to pursue in understanding high temperature superconductors.

Of course, one can ask why physicists want to use operators satisfying the Fermionic CCRs to describe a system of Fermions, or why anyone, mathematician or physicist, would ever write down the CCRs in the first place. These are good questions, which I’m not going to try to answer here, although one or both questions might make a good subject for some future notes. (It is, of course, a lot easier to answer these questions once you understand the material I present here.)

Instead, I’m going to approach the Fermionic CCRs from a purely mathematical point of view, asking the question “What can we deduce from the fact that a set of operators satisfying the CCRs exists?” The surprising answer is that we can deduce quite a lot about the structure of [tex]V[/tex] and the operators [tex]a_j[/tex] simply from the fact that the [tex]a_j[/tex] satisfy the canonical commutation relations!

We’ll take this subject up in the next post, where we’ll show that the CCRs essentially uniquely determine the action of the [tex]a_j[/tex]s, up to a choice of basis.


Expander graphs IV: Expansion and the eigenvalue gap

I return to expanders today, with a discussion of expansion and eigenvalue gaps. Thanks to all the commenters on earlier posts for all the proof ideas and simplifications!

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

In today’s post we really reap the benefits of recasting graph theory in linear algebraic terms, showing that the expansion parameter can be related to the gap between the largest and second largest eigenvalue of the graph. This ought to surprise you, perhaps even shock you. It’s a highly non-trivial result, with the big payoff of relating something we didn’t really understand all that well (the expansion parameter) to eigenvalue gaps, about which an enormous amount is known.

Expansion and the eigenvalue gap

Let’s return our attention to expander graphs, and see what the eigenvalues of a graph have to do with its expansion parameter. We define the gap for the graph [tex]G[/tex] to be the difference [tex]\Delta(G) \equiv \lambda_1(G)-\lambda_2(G)[/tex] between the largest and second-largest eigenvalues. The expansion parameter and the gap are connected by the following theorem:

Theorem: The expansion parameter [tex]h(G)[/tex] for a [tex]d[/tex]-regular graph [tex]G[/tex] is related to the gap [tex]\Delta(G)[/tex] by:

[tex] \frac{\Delta(G)}{2} \leq h(G) \leq \sqrt{2d \Delta(G)}. [/tex]

Thus, properties of the eigenvalue gap can be used to deduce properties of the expansion parameter. For example, if the eigenvalue gap for a family of [tex]d[/tex]-regular graphs is bounded below by a positive constant, then the expansion parameter must also be bounded below by a positive constant, and so the family is an expander.

One reason for finding the connection between the gap and the expansion parameter interesting is that it is far easier to estimate the gap of an [tex]n[/tex] by [tex]n[/tex] matrix than it is to enumerate the exponentially many subsets [tex]S[/tex] of the vertex set [tex]V[/tex], and compute [tex]|\partial S|/|S|[/tex] for each one.
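For a small graph we can check the theorem by exactly this kind of brute force. A sketch (the Petersen graph, 3-regular on 10 vertices, is my choice of test case; its [tex]2^{10}[/tex] subsets are no obstacle):

```python
import numpy as np
from itertools import combinations

A = np.zeros((10, 10))               # the Petersen graph
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
    A[i, i + 5] = A[i + 5, i] = 1.0
    A[5 + i, 5 + (i + 2) % 5] = A[5 + (i + 2) % 5, 5 + i] = 1.0

n, d = 10, 3
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
gap = lam[0] - lam[1]                # Delta(G) = d - lambda_2(G)

def boundary(S):
    """Number of edges leaving the vertex subset S."""
    S = set(S)
    return sum(A[v, w] for v in S for w in range(n) if w not in S)

# Brute-force expansion parameter: min |dS|/|S| over subsets with |S| <= n/2.
h = min(boundary(S) / len(S)
        for k in range(1, n // 2 + 1)
        for S in combinations(range(n), k))

# The theorem: Delta/2 <= h <= sqrt(2 d Delta), up to floating-point slack.
assert gap / 2 <= h + 1e-9 and h <= np.sqrt(2 * d * gap) + 1e-9
print("h(G) =", h, " Delta/2 =", gap / 2, " sqrt(2dDelta) =", np.sqrt(2 * d * gap))
```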

Proof discussion: We already understand that [tex]\lambda_1(G) = d[/tex] for this graph, with eigenvector [tex]\vec 1 = (1,1,\ldots,1)[/tex]. So we’ll concentrate on trying to understand the behaviour of the second largest eigenvalue, [tex]\lambda_2(G)[/tex]. The theorem tells us that the difference between [tex]d[/tex] and [tex]\lambda_2(G)[/tex] is controlled both above and below by the expansion parameter [tex]h(G)[/tex].

How can we get control over the second largest eigenvalue of [tex]G[/tex]? One way is to observe that [tex]\lambda_2(G)[/tex] is just the maximum of the expression [tex]v^T A v / v^T v[/tex], where [tex]A[/tex] is the adjacency matrix of [tex]G[/tex], and we maximize over all vectors [tex]v[/tex] orthogonal to the eigenvector [tex]\vec 1[/tex]. An encouraging fact is that this expression is quite easy to deal with, because the condition that [tex]v[/tex] be orthogonal to [tex]\vec 1[/tex] is actually equivalent to the sum of [tex]v[/tex]’s entries being equal to [tex]0[/tex], so we have

[tex] \lambda_2(G) = \max_{v: {\rm tr}(v) = 0} \frac{v^T A v}{v^T v}, [/tex]

where [tex]{\rm tr}(v)[/tex] is just the sum of the entries of the vector [tex]v[/tex].

We’re going to provide a lower bound on [tex]\lambda_2(G)[/tex] by simply guessing a good choice of [tex]v[/tex] satisfying [tex]\mbox{tr}(v) = 0[/tex], and using the fact that

[tex] \lambda_2(G) \geq \frac{v^T A v}{v^T v}. [/tex]

To make a good guess, it helps to have a way of thinking about expressions like [tex]v^T A v[/tex], where [tex]\mbox{tr}(v) = 0[/tex]. A convenient way of thinking is to rewrite [tex]v[/tex] as the difference of two disjoint probability distributions, [tex]p[/tex] and [tex]q[/tex], i.e., [tex]v = p-q[/tex], where [tex]p[/tex] and [tex]q[/tex] are non-negative vectors each summing to [tex]1[/tex], and with disjoint support. This results in terms like [tex]p^T A q[/tex], which we can think of in terms of transition probabilities between [tex]q[/tex] and [tex]p[/tex]. This will allow us to apply the expansion properties of the graph.

Let’s make these ideas a little more concrete. The key is to define [tex]\vec 1_S[/tex] to be the vector whose entries are [tex]1[/tex] on [tex]S[/tex], and [tex]0[/tex] elsewhere, and to observe that

[tex] \vec 1_S^T A \vec 1_T = |E(S,T)|, [/tex]

where [tex]|E(S,T)|[/tex] is the number of edges between the vertex sets [tex]S[/tex] and [tex]T[/tex]. This suggests that we should choose [tex]p[/tex] and [tex]q[/tex] in terms of vectors like [tex]\vec 1_S[/tex], since it will enable us to relate expressions like [tex]v^T A v[/tex] to the sizes of various edge sets, which, in turn, can be related to the expansion parameter.

Suppose in particular that we choose

[tex] v = \frac{\vec 1_S}{|S|}-\frac{\vec 1_{\overline S}}{|\overline S|}. [/tex]

This satisfies the condition [tex]\mbox{tr}(v) = 0[/tex], and gives

[tex] v^T v = \frac{1}{|S|}+\frac{1}{|\overline S|} [/tex]

and

[tex] v^T A v = \frac{1}{|S|^2} E(S,S) + \frac{1}{|\overline S|^2} E(\overline S,\overline S) - \frac{2}{|S||\overline S|} E(S,\overline S). [/tex]

The definition of an expander graph gives us control over [tex]E(S,\overline S)[/tex], so it is convenient to rewrite [tex]E(S,S)[/tex] and [tex]E(\overline S,\overline S)[/tex] in terms of [tex]E(S,\overline S)[/tex], using the [tex]d[/tex]-regularity of the graph:

[tex] E(S,S)+E(S,\overline S) = d |S|; \,\,\, E(\overline S,\overline S)+ E(S,\overline S) = d |\overline S|. [/tex]

A straightforward substitution and a little algebra gives:

[tex] v^T A v = d \left( \frac{1}{|S|}+\frac{1}{|\overline S|} \right) - \left(\frac{1}{|S|}+\frac{1}{|\overline S|}\right)^2 E(S,\overline S). [/tex]

Comparing with the earlier expression for the denominator [tex]v^T v[/tex], we obtain

[tex] \lambda_2(G) \geq d-\left(\frac{1}{|S|}+\frac{1}{|\overline S|}\right) E(S,\overline S). [/tex]

Now choose [tex]S[/tex] so that [tex]E(S,\overline S) = h(G) |S|[/tex], and [tex]|S| \leq n/2[/tex], giving after a little algebra

[tex] \lambda_2(G) \geq d-2 h(G), [/tex]

and thus

[tex] \frac{\Delta(G)}{2} \leq h(G), [/tex]

which was the first of the two desired inequalities in the theorem.

The proof of the second inequality is a little more complicated. Unfortunately, I haven’t managed to boil the proof down to a form that I’m really happy with, and for this reason I won’t describe the details. If you’re interested, you should try to prove it yourself, or refer to the notes of Linial and Wigderson.

QED

Problem: Can we generalize this result so that it applies to a general undirected graph [tex]G[/tex], not just to [tex]d[/tex]-regular graphs? Can we prove an analogous statement for directed graphs, perhaps in terms of singular values? Can we define a generalized notion of “expansion” which can be applied to any symmetric matrix [tex]A[/tex] with non-negative entries, and connect that notion of expansion to the eigenvalue gap? Can we generalize even further? What happens if we change the field over which the matrix is considered?


Interdisciplinary research

Interdisciplinary research: Building off a provocative post by Eszter Hargittai at Crooked Timber, Cosma Shalizi posts some interesting thoughts on how to do (and how not to do) interdisciplinary scientific research. Look for the followups too, including the link to Aaron Clauset’s post.

How does quantum information and computation stack up in this vein? One healthy thing is that quite a few computer scientists attend conferences that are predominantly physics-oriented. Conversely, so far as I know there’s only one regular conference in the field that’s predominantly computer science oriented (QIP), and a reasonable number of physicists attend that.


Religious wars

I like Macs. I always have, ever since the day I first saw MacPaint running on one of the original Macs (I believe it had 128k!)

But I do have a question for all the Mac zealots out there. If System 10 is so terrific, then why did it take the brilliant engineers at Apple two decades to figure out that running Unix was a good idea? Unix hasn’t changed so much in that time.


Fermions and the Jordan-Wigner transform I: Introduction

I’m going to switch topics now, moving to the more physical topic of Fermions and the Jordan-Wigner transform. I’ve actually written papers using this stuff, but it’s only in writing these notes that I really feel like I’ve understood it well.

The series on expanders isn’t finished – I’ll be back with another installment in a day or two, and there will be several more installments after that.

General introductory note and prerequisites: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Introduction

When you learn undergraduate quantum mechanics, it starts out being all about wavefunctions and Hamiltonians, finding energy eigenvalues and eigenstates, calculating measurement probabilities, and so on.

If your physics education was anything like mine, at some point a mysterious jump occurs. People teaching more advanced subjects, like quantum field theory, condensed matter physics, or quantum optics, start “imposing canonical commutation relations” on various field operators.

Any student quickly realizes that “imposing canonical commutation relations” is extremely important, but, speaking personally, at the time I found it quite mysterious exactly what people meant by this phrase. It’s only in the past few years that I’ve obtained a satisfactory understanding of how this works, and understood why I had such trouble in the first place.

These notes contain two parts. The first part is a short tutorial explaining the Fermionic canonical commutation relations (CCRs) from an elementary point of view: the different meanings they can have, both mathematical and physical, and what mathematical consequences they have. I concentrate more on the mathematical consequences than the physical in these notes, since having a good grasp of the former seems to make it relatively easy to appreciate the latter, but not so much vice versa. I may come back to the physical aspect in some later notes.

The second part of the notes describes a beautiful application of the Fermionic CCRs known as the Jordan-Wigner transform. This powerful tool allows us to map a system of interacting qubits onto an equivalent system of interacting Fermions, or, vice versa, to map a system of Fermions onto a system of qubits.

Why is this kind of mapping interesting? It’s interesting because it means that anything we understand about one type of system (e.g., Fermions) can be immediately applied to learn something about the other type of system (e.g., qubits).

I’ll describe an application of this idea, taking what appears to be a very complicated one-dimensional model of interacting spin-[tex]\frac 12[/tex] particles, and showing that it is equivalent to a simple model of non-interacting Fermions. This enables us to solve for the energy spectrum and eigenstates of the original Hamiltonian. This has, of course, intrinsic importance, since we’d like to understand such spin models – they’re important for a whole bundle of reasons, not the least of which is that they’re perhaps the simplest systems in which quantum phase transitions occur. But this example is only the tip of a much larger iceberg: the idea that the best way of understanding some physical systems may be to map those systems onto mathematically equivalent but physically quite different systems, whose properties we already understand. Physically, we say that we introduce a quasiparticle description of the original system, in order to simplify its understanding. This idea has been of critical importance in much of modern physics, including the understanding of superconductivity and the quantum Hall effect.

Another application of the Jordan-Wigner transform, which I won’t describe in detail here, but which might be of interest to quantum computing people, is to the quantum simulation of a system of Fermions. In particular, the Jordan-Wigner transform allows us to take a system of interacting Fermions, and map it onto an equivalent model of interacting spins, which can then, in principle, be simulated using standard techniques on a quantum computer. This enables us to use quantum computers to efficiently simulate systems of interacting Fermions. This is not a trivial problem, as can be seen from the following quote from Feynman, in his famous 1982 paper on quantum computing:

“[with Feynman’s proposed quantum computing device] could we imitate every quantum mechanical system which is discrete and has a finite number of degrees of freedom? I know, almost certainly, that we could do that for any quantum mechanical system which involves Bose particles. I’m not sure whether Fermi particles could be described by such a system. So I leave that open.”

It wasn’t until 20 years later, and the work by Somma, Ortiz, Gubernatis, Knill and Laflamme (Physical Review A, 2002) that this problem was resolved, by making use of the Jordan-Wigner transform.

The next post will introduce the canonical commutation relations for fermions, and discuss something of their mathematical and physical significance.


Expander graphs III: graphs and their adjacency matrices

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Today’s post digresses to explain one of the most interesting ideas in graph theory, namely, to describe a graph through its adjacency matrix. By describing a graph in terms of matrices we enable all the powerful tools of linear algebra to be brought to bear on graph theory, and we’ll see that there are all sorts of interesting connections between graphs and linear algebraic concepts such as eigenvalues. The connection to expanders will be explained in later posts, where we’ll see that the expansion parameter is connected to the second largest eigenvalue of the adjacency matrix. I believe (but have not checked) that it is this result which is used to establish that the graphs described in the previous post are actually expanders.

Graphs and their adjacency matrices

How can we prove that a family of graphs is an expander? Stated another way, how does the expansion parameter [tex]h(G)[/tex] vary as the graph [tex]G[/tex] is varied over all graphs in the family?

One way of tackling the problem of computing [tex]h(G)[/tex] is to do a brute force calculation of the ratio [tex]|\partial S|/|S|[/tex] for every subset [tex]S[/tex] of vertices containing no more than half the vertices in the graph. Doing this is a time-consuming task, since if there are [tex]n[/tex] vertices in the graph, then there are exponentially many such subsets [tex]S[/tex].

Problem: In general, how hard is it to find the subset [tex]S[/tex] minimizing [tex]|\partial S|/|S|[/tex]? Can we construct an NP-Complete variant of this problem? I don’t know the answer to this question, and I don’t know if anyone else does, either.

Fortunately, there is an extraordinarily beautiful approach to the problem of determining [tex]h(G)[/tex] which is far less computationally intensive. It involves the adjacency matrix [tex]A(G)[/tex] of the graph [tex]G[/tex]. By definition, the rows and columns of the adjacency matrix are labelled by the vertices of [tex]V[/tex]. For vertices [tex]v[/tex] and [tex]w[/tex] the entry [tex]A(G)_{vw}[/tex] is defined to be [tex]1[/tex] if [tex](v,w)[/tex] is an edge, and [tex]0[/tex] if it is not an edge.

It is a marvellous fact that properties of the eigenvalue spectrum of the adjacency matrix [tex]A(G)[/tex] can be used to understand properties of the graph [tex]G[/tex]. This occurs so frequently that we refer to the spectrum of [tex]A(G)[/tex] as the spectrum of the graph [tex]G[/tex]. It is useful because the eigenvalue spectrum can be computed quickly, and certain properties, such as the largest and smallest eigenvalue, the determinant and trace, can be computed extremely quickly.

More generally, by recasting graphs in terms of adjacency matrices, we open up the possibility of using tools from linear algebra to study the properties of graphs. Although we’re most interested in studying expanders, for the rest of this post I’m going to digress from the study of expanders, studying how the linear algebraic point of view can help us understand graphs, without worrying about how this connects to expanders. This digression is partially motivated by the fact that this is beautiful stuff (at least in my opinion), and is partially because our later discussion of expanders will be based on this linear algebraic point of view, and so it’s good to get comfortable with this point of view.

The following exercise provides a good example of how graph properties can be related to the eigenvalues of the graph.

Exercise: Prove that if two graphs are isomorphic, then they have the same spectrum.

This result is often useful in proving that two graphs are not isomorphic: simply compute their eigenvalues, and show that they are different. A useful extension of the exercise is to find an example of two graphs which have the same spectra, but which are not isomorphic.
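If you’d rather not hunt for such a pair from scratch, here is a standard example on five vertices (a spoiler, so skip this sketch if you want to find your own): the star [tex]K_{1,4}[/tex] and the disjoint union of a square [tex]C_4[/tex] with an isolated vertex are cospectral but plainly not isomorphic.

```python
import numpy as np

star = np.zeros((5, 5))              # K_{1,4}: vertex 0 joined to 1, 2, 3, 4
for v in range(1, 5):
    star[0, v] = star[v, 0] = 1.0

square = np.zeros((5, 5))            # C_4 on vertices 0..3, vertex 4 isolated
for v in range(4):
    square[v, (v + 1) % 4] = square[(v + 1) % 4, v] = 1.0

print(np.round(np.linalg.eigvalsh(star), 6))    # spectrum {-2, 0, 0, 0, 2}
print(np.round(np.linalg.eigvalsh(square), 6))  # the same spectrum
# Same spectrum, but not isomorphic: K_{1,4} has a vertex of degree 4,
# while C_4 plus an isolated vertex has maximum degree 2.
```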

Note that the adjacency matrix may be considered as a matrix over any field, and the result of the exercise is true over any field. (I’ve often wondered if the converse is true, but don’t know the answer.) Nonetheless, by and large, we’ll consider the adjacency matrix as a matrix over the field [tex]R[/tex] of real numbers. Assuming that [tex]G[/tex] is an undirected graph, we see that [tex]A(G)[/tex] is a real symmetric matrix, and thus can be diagonalized. We will find it convenient to write the eigenvalues of a graph [tex]G[/tex] in non-increasing order, as [tex]\lambda_1(G) \geq \lambda_2(G) \geq \ldots \geq \lambda_n(G)[/tex].

A fact we’ll make a lot of use of is that when [tex]G[/tex] is [tex]d[/tex]-regular the largest eigenvalue of [tex]G[/tex] is just [tex]d[/tex]. To see this, note that the vector [tex]\vec 1 \equiv (1,1,\ldots,1)[/tex] is an eigenvector of [tex]G[/tex] with eigenvalue [tex]d[/tex]. To prove that [tex]d[/tex] is the largest eigenvalue seems to be a little bit harder. We’ll just sketch a proof. To prove this it is sufficient to show that [tex]v^T A(G) v \leq d[/tex] for all normalized vectors [tex]v[/tex]. From the [tex]d[/tex]-regularity of [tex]G[/tex] it follows that [tex]A(G)/d[/tex] is a doubly stochastic matrix, i.e., has non-negative entries, and all rows and columns sum to one. A theorem of Birkhoff ensures that [tex]A(G)/d[/tex] can be written as a convex combination of permutation matrices, so [tex]A(G) = d \sum_j p_j P_j[/tex], where [tex]p_j[/tex] are probabilities, and the [tex]P_j[/tex] are permutation matrices. This gives [tex]v^T A(G) v = d \sum_j p_j v^T P_j v[/tex]. But [tex]v^T P_j v \leq 1[/tex] for any permutation [tex]P_j[/tex], which gives the desired result.

The following proposition gives another example of the relationships one can find between a graph and its spectrum.

Proposition: A [tex]d[/tex]-regular graph [tex]G[/tex] is connected if and only if [tex]\lambda_1(G) > \lambda_2(G)[/tex].

Proof: The easy direction is the reverse implication, for which we prove the contrapositive, namely, that a [tex]d[/tex]-regular disconnected graph has [tex]\lambda_1(G) = \lambda_2(G)[/tex]. This follows by breaking [tex]G[/tex] up into disconnected components [tex]G_1[/tex] and [tex]G_2[/tex], and observing that [tex]A(G) = A(G_1) \oplus A(G_2)[/tex], where [tex]\oplus[/tex] is the matrix direct sum. Since both [tex]G_1[/tex] and [tex]G_2[/tex] are [tex]d[/tex]-regular it follows that they both have maximal eigenvalue [tex]d[/tex], and so [tex]d[/tex] appears at least twice in the spectrum of [tex]A(G)[/tex].

At the moment, I don’t see an easy way of proving the forward implication. One not very satisfying proof is to observe that [tex]A(G)/d[/tex] is the Markov transition matrix for a random walk on the graph, and that since the graph is connected, the random walk must converge to a unique distribution, which implies that in the limit of large [tex]t[/tex] there can only be one vector [tex]v[/tex] such that [tex](A(G)^t/d^t) v = v[/tex]. This means that the largest eigenvalue of [tex]A(G)^t[/tex] is non-degenerate, from which it follows that [tex]G[/tex]’s largest eigenvalue is non-degenerate. This is a sketch, but it can all be established rigorously with a little work and the aid of well-known theorems on Markov chains.

The proof sketched in the previous paragraph is not really satisfactory, since it involves an appeal to theorems which are in some sense less elementary than the result under discussion. Another possibility which I’ve explored but haven’t made work with complete rigour is to investigate [tex]A(G)^t/d^t[/tex] more explicitly. With a little thought one can prove that the entry [tex](A(G)^t)_{vw}[/tex] is just the number of paths of length [tex]t[/tex] between [tex]v[/tex] and [tex]w[/tex]. Since [tex]G[/tex] is connected, we’d expect in the limit of large [tex]t[/tex] this number would be dominated by a term which does not depend on [tex]w[/tex], and would just scale like the total number of paths of length [tex]t[/tex] starting at [tex]v[/tex] (which is [tex]d^t[/tex]), divided by the total number of possible destinations [tex]w[/tex], which is [tex]n[/tex], giving [tex](A(G)^t)_{vw}/d^t \rightarrow 1/n[/tex]. (This would only be true if [tex]G[/tex] has self-loops [tex](v,v)[/tex].) Of course, the matrix whose entries are all [tex]1/n[/tex] has a single eigenvalue [tex]1[/tex], with all the rest [tex]0[/tex], which would suffice to establish the theorem.

QED
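Both facts, that [tex]\lambda_1(G) = d[/tex] for a [tex]d[/tex]-regular graph and that connectivity shows up as a strict gap below [tex]d[/tex], are easy to see numerically. A sketch with two 2-regular examples (my choice of test cases):

```python
import numpy as np

def cycle(k):
    """Adjacency matrix of the k-cycle (2-regular)."""
    A = np.zeros((k, k))
    for v in range(k):
        A[v, (v + 1) % k] = A[(v + 1) % k, v] = 1.0
    return A

C6 = cycle(6)                                        # connected
two_C3 = np.block([[cycle(3), np.zeros((3, 3))],
                   [np.zeros((3, 3)), cycle(3)]])    # two components

for name, A in [("C_6", C6), ("C_3 + C_3", two_C3)]:
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    print(name, np.round(lam[:2], 6))
# C_6       [2. 1.]   lambda_1 = d = 2 with a strict gap: connected
# C_3 + C_3 [2. 2.]   d is a repeated eigenvalue: disconnected
```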

Problem: How should we interpret the determinant of a graph? What about the trace?

Problem: If we consider [tex]A(G)[/tex] as a matrix over the field [tex]Z_2 = \{0,1\}[/tex], then it is possible to define a matrix sum [tex]G_1+G_2[/tex], whose adjacency matrix is just [tex]A(G_1)+A(G_2)[/tex], and a matrix product [tex]G_1 \times G_2[/tex] whose adjacency matrix is just [tex]A(G_1) A(G_2)[/tex]. Many questions naturally suggest themselves: (1) when is there an edge between [tex]v[/tex] and [tex]w[/tex] in [tex]G_1+G_2[/tex]; (2) when is there an edge between [tex]v[/tex] and [tex]w[/tex] in [tex]G_1 \times G_2[/tex] (these first two questions are easy to answer); (3) for which graphs is [tex]A(G)[/tex] invertible, and thus a natural inverse graph [tex]G^{-1}[/tex] exists; (4) how can we interpret the inverse graph; (5) when do two graphs commute?

Problem: Along similar lines to the previous problem, it’s possible to define a tensor product of graphs. What are the properties of the graph tensor product?

The ideas I’ve described in this post are examples of the important general principle that once you’ve defined a mathematical object, you should seek out alternate representations (or even just partial representations) of that object in terms of mathematical objects that you already understand. By recasting graphs as matrices, we open up the possibility of using all the tools of linear algebra to answer questions about graphs. This can work in one of two ways: we can ask a question about graphs, and try to see if it’s possible to give a linear algebraic answer, or we can ask what implication known results of linear algebra have for graphs – what does the Gaussian elimination procedure correspond to, or the spectral decomposition, or two matrices commuting, or the wedge product, or whatever. Exploring such connections has the potential to greatly enrich both subjects.

In this post we’ve started to get a feel for how the properties of graphs can be studied using linear algebra. In the next post we’ll turn our attention back to expanders, and understand how the expansion coefficient can be related to the gap [tex]\lambda_1(G)-\lambda_2(G)[/tex] between the largest and second largest eigenvalues of [tex]G[/tex].


Expander graphs II: Examples

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Today’s post is a short post, giving some extremely useful explicit examples of expander graphs.

Explicit examples of expanders

We’ve seen previously that a family of [tex]d[/tex]-regular random graphs on [tex]n[/tex] vertices defines an expander. For applications it is often more useful to have more explicit constructions for expanders. In particular, for applications to algorithms it is often useful to construct expanders on [tex]O(2^n)[/tex] vertices, where [tex]n[/tex] is some parameter describing problem size. Just to store a description of a random graph on so many vertices requires exponentially much time and space, and so is not feasible. Fortunately, more parsimonious constructions are possible, which we now describe.

Example: In this example the family of graphs is indexed by a prime number, [tex]p[/tex]. The set of vertices for the graph [tex]G_p[/tex] is just the set of points in [tex]Z_p[/tex], the field of integers modulo [tex]p[/tex]. We construct a [tex]3[/tex]-regular graph by connecting each vertex [tex]x \neq 0[/tex] to [tex]x-1,x+1[/tex] and [tex]x^{-1}[/tex]. The vertex [tex]x=0[/tex] is connected to [tex]p-1,0[/tex] and [tex]1[/tex]. According to the lecture notes by Linial and Wigderson, this was proved to be a family of expanders by Lubotzky, Phillips and Sarnak in 1988, but I don’t know a lower bound on the expansion parameter. Note that even for [tex]p = O(2^n)[/tex] we can do basic operations with this graph (e.g., random walking along its vertices), using computational resources that are only polynomial in time and space. This makes this graph potentially far more useful in applications than the random graphs considered earlier.
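Here is a sketch of how cheap it is to walk on [tex]G_p[/tex] (the particular prime is my arbitrary choice); each step costs only polynomial time and space in [tex]\log p[/tex], even though writing down the whole graph would be hopeless:

```python
import random

p = 1000003                          # a prime; the graph has about 10^6 vertices

def neighbours(x):
    """The three neighbours of x in G_p, as defined above."""
    if x == 0:
        return [p - 1, 0, 1]
    return [(x - 1) % p, (x + 1) % p, pow(x, p - 2, p)]   # x^{-1} by Fermat

x = random.randrange(p)              # random walk, never materializing the graph
for _ in range(100):
    x = random.choice(neighbours(x))
print(x)
```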

Example: A similar but slightly more complex example is as follows. The vertex set is [tex]Z_m \times Z_m[/tex], where [tex]m[/tex] is some positive integer, and [tex]Z_m[/tex] is the additive group of integers modulo [tex]m[/tex]. The degree is [tex]4[/tex], and the vertex [tex](x,y)[/tex] has edges to [tex](x\pm y,y)[/tex], and [tex](x,x \pm y)[/tex], where all addition is done modulo [tex]m[/tex]. Something which concerns me a little about this definition, but which I haven’t resolved, is what happens when [tex]m[/tex] is even and we choose [tex]y = m/2[/tex] so that, e.g., the vertices [tex](x+y,y)[/tex] and [tex](x-y,y)[/tex] coincide with one another. We would expect this duplication to have some effect on the expansion parameter, but I haven’t thought through exactly what.
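And a matching sketch for this second family (the value of [tex]m[/tex] is again my arbitrary choice):

```python
m = 2 ** 20                          # the graph has m^2, roughly 10^12, vertices

def neighbours(x, y):
    """The four neighbours of (x, y): (x +- y, y) and (x, x +- y), mod m."""
    return [((x + y) % m, y), ((x - y) % m, y),
            (x, (x + y) % m), (x, (x - y) % m)]

print(neighbours(3, 7))              # [(10, 7), (1048572, 7), (3, 10), (3, 1048572)]
```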

Unfortunately, I don’t know any more examples of expanders, although I’m sure there are some! Still, hopefully you will agree that these examples are pretty nice: they are easy to describe, and allow us to work with expander graphs on exponentially large vertex sets, without expending too much effort. I haven’t proved that these families are expanders: doing that requires some extra technical tools based on the adjacency matrix of a graph, which is the subject of the next post.
