Fermions and the Jordan-Wigner transform V: Diagonalization Continued

In today’s post we continue with the train of thought begun in the last post, learning how to find the energy spectrum of any Hamiltonian quadratic in Fermi operators. Although we don’t dwell in this post on the connection to specific physical models, this class of Hamiltonians covers an enormous number of models of relevance in condensed matter physics. In later posts we’ll apply these results (together with the Jordan-Wigner transform) to understand the energy spectrum of some models of interacting spins in one dimension, which are prototypes for an understanding of quantum magnets and many, many other phenomena, including many systems which undergo quantum phase transitions.

Update: Thanks to Steve Mayer for fixing a bug in the way matrices are displayed.

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

The Hamiltonian [tex]H = \sum_{jk} \alpha_{jk} a_j^\dagger a_k[/tex] we diagonalized in the last post can be generalized to any Hamiltonian which is quadratic in Fermi operators, by which we mean it may contain terms of the form [tex]a^\dagger_j a_k, a_j a_k^\dagger, a_j a_k[/tex] and [tex]a_j^\dagger a_k^\dagger[/tex]. We will not allow linear terms like [tex]a_j+a_j^\dagger[/tex]. Additive constant terms [tex]\gamma I[/tex] are easily incorporated, since they simply displace all elements of the spectrum by an amount [tex]\gamma[/tex]. There are several ways one can write such a Hamiltonian, but the following form turns out to be especially convenient for our purposes:

[tex] H = \sum_{jk} \left( \alpha_{jk} a_j^\dagger a_k -\alpha^*_{jk} a_j a_k^\dagger + \beta_{jk} a_j a_k - \beta^*_{jk} a_j^\dagger a_k^\dagger \right). [/tex]

The reader should spend a little time convincing themselves that for the class of Hamiltonians we have described, it is always possible to write the Hamiltonian in this form, up to an additive constant [tex]\gamma I[/tex], and with [tex]\alpha[/tex] hermitian and [tex]\beta[/tex] antisymmetric.
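Here is a small numerical sketch of this claim (Python with numpy; not part of the original discussion, just a check). The explicit matrices below are one concrete representation of the Fermionic CCRs, essentially the Jordan-Wigner matrices that are the subject of this series; the random [tex]\alpha[/tex] and [tex]\beta[/tex] are purely illustrative. The sketch verifies the CCRs and checks that the canonical form above is hermitian whenever [tex]\alpha[/tex] is hermitian and [tex]\beta[/tex] antisymmetric.

```python
import numpy as np
from functools import reduce

n = 3  # number of modes; the full space has dimension 2^n

# An explicit matrix representation of the Fermionic CCRs (Jordan-Wigner
# style): a_j = Z x ... x Z x s x I x ... x I, with j-1 factors of Z and
# s the 2x2 lowering operator.
I2, Z = np.eye(2), np.diag([1.0, -1.0])
s = np.array([[0.0, 1.0], [0.0, 0.0]])
a = [reduce(np.kron, [Z] * j + [s] + [I2] * (n - j - 1)) for j in range(n)]
ad = [m.conj().T for m in a]

# Check the CCRs: {a_j, a_k^dag} = delta_jk I and {a_j, a_k} = 0.
for j in range(n):
    for k in range(n):
        assert np.allclose(a[j] @ ad[k] + ad[k] @ a[j], (j == k) * np.eye(2**n))
        assert np.allclose(a[j] @ a[k] + a[k] @ a[j], 0)

# A random hermitian alpha and antisymmetric beta.
rng = np.random.default_rng(0)
G = rng.normal(size=(2, n, n)) + 1j * rng.normal(size=(2, n, n))
alpha = (G[0] + G[0].conj().T) / 2
beta = (G[1] - G[1].T) / 2

# H in the canonical form above; hermiticity follows from the symmetries.
H = sum(alpha[j, k] * ad[j] @ a[k] - alpha[j, k].conj() * a[j] @ ad[k]
        + beta[j, k] * a[j] @ a[k] - beta[j, k].conj() * ad[j] @ ad[k]
        for j in range(n) for k in range(n))
assert np.allclose(H, H.conj().T)
print("CCRs hold and H is hermitian")
```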

This class of Hamiltonian appears to have first been diagonalized in an appendix to a famous Annals of Physics paper by Lieb, Schultz and Mattis, dating to 1961 (volume 16, pages 407-466), and the procedure we follow is inspired by theirs. We begin by defining operators [tex]b_1,\ldots,b_n[/tex]:

[tex] b_j \equiv \sum_k \left( \gamma_{jk} a_k + \mu_{jk} a_k^\dagger \right). [/tex]

We will try to choose the complex numbers [tex]\gamma_{jk}[/tex] and [tex]\mu_{jk}[/tex] to ensure that: (1) the operators [tex]b_j[/tex] satisfy Fermionic CCRs; and (2) when expressed in terms of the [tex]b_j[/tex], [tex]H[/tex] has the same form as [tex]H_{\rm free}[/tex], and so can be diagonalized.

A calculation shows that the condition [tex]\{ b_j, b_k^\dagger \} = \delta_{jk} I[/tex] is equivalent to the condition

[tex] \gamma \gamma^\dagger + \mu \mu^\dagger = I, [/tex]

while the condition [tex]\{ b_j, b_k \} = 0[/tex] is equivalent to the condition

[tex] \gamma \mu^T+\mu \gamma^T = 0. [/tex]

These are straightforward enough equations, but their meaning is perhaps a little mysterious. More insight into their structure is obtained by rewriting the connection between the [tex]a_j[/tex]s and the [tex]b_j[/tex]s in an equivalent form using vectors whose individual entries are not numbers, but rather are operators such as [tex]a_j[/tex] and [tex]b_j[/tex], and using a block matrix with blocks [tex]\gamma, \mu, \mu^*[/tex] and [tex]\gamma^*[/tex]:

[tex] \left[ \begin{array}{c} b_1 \\ \vdots \\ b_n \\ b_1^\dagger \\ \vdots \\ b_n^\dagger \end{array} \right] = \left[ \begin{array}{cc} \gamma & \mu \\ \mu^* & \gamma^* \end{array} \right] \left[ \begin{array}{c} a_1 \\ \vdots \\ a_n \\ a_1^\dagger \\ \vdots \\ a_n^\dagger \end{array} \right]. [/tex]

The conditions derived above for the [tex]b_j[/tex]s to satisfy the CCRs are equivalent to the condition that the transformation matrix

[tex] T \equiv \left[ \begin{array}{cc} \gamma & \mu \\ \mu^* & \gamma^* \end{array} \right] [/tex]

is unitary, which is perhaps a somewhat less mysterious condition than the earlier equations involving [tex]\gamma[/tex] and [tex]\mu[/tex]. One advantage of this representation is that it makes it easy to find an expression for the [tex]a_j[/tex] in terms of the [tex]b_j[/tex], simply by inverting this unitary transformation, to obtain:

[tex] \left[ \begin{array}{c} a_1 \\ \vdots \\ a_n \\ a_1^\dagger \\ \vdots \\ a_n^\dagger \end{array} \right] = T^\dagger \left[ \begin{array}{c} b_1 \\ \vdots \\ b_n \\ b_1^\dagger \\ \vdots \\ b_n^\dagger \end{array} \right]. [/tex]
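As a quick sanity check on the unitarity claim (a sketch in Python with numpy; [tex]\gamma[/tex] and [tex]\mu[/tex] below are arbitrary illustrative matrices), one can verify that the blocks of [tex]TT^\dagger[/tex] are built exactly from the two expressions derived earlier, so that unitarity of [tex]T[/tex] is precisely the pair of conditions on [tex]\gamma[/tex] and [tex]\mu[/tex]:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
gamma = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
mu = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# The block transformation matrix T.
T = np.block([[gamma, mu], [mu.conj(), gamma.conj()]])
TTdag = T @ T.conj().T

# The top row of blocks of T T^dag is exactly the two CCR expressions, so
# T T^dag = I  <=>  gamma gamma^dag + mu mu^dag = I  and
#                   gamma mu^T + mu gamma^T = 0.
assert np.allclose(TTdag[:n, :n], gamma @ gamma.conj().T + mu @ mu.conj().T)
assert np.allclose(TTdag[:n, n:], gamma @ mu.T + mu @ gamma.T)
print("block identity verified")
```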

The next step is to rewrite the Hamiltonian in terms of the [tex]b_j[/tex] operators. To do this, observe that:

[tex] H = [ a_1^\dagger \ldots a_n^\dagger a_1 \ldots a_n ] \left[ \begin{array}{cc} \alpha & -\beta^* \\ \beta & -\alpha^* \end{array} \right] \left[ \begin{array}{c} a_1 \\ \vdots \\ a_n \\ a_1^\dagger \\ \vdots \\ a_n^\dagger \end{array} \right]. [/tex]

It is actually this expression for [tex]H[/tex] which motivated the original special form which we chose for [tex]H[/tex]. The expression is convenient, for it allows us to easily transform back and forth between [tex]H[/tex] expressed in terms of the [tex]a_j[/tex] and [tex]H[/tex] in terms of the [tex]b_j[/tex]. We already have an expression in terms of the [tex]b_j[/tex] operators for the column vector containing the [tex]a[/tex] and [tex]a^\dagger[/tex] terms. With a little algebra this gives rise to a corresponding expression for the row vector containing the [tex]a^\dagger[/tex] and [tex]a[/tex] terms:

[tex] [a_1^\dagger \ldots a_n^\dagger a_1 \ldots a_n] = [b_1^\dagger \ldots b_n^\dagger b_1 \ldots b_n] T. [/tex]

This allows us to rewrite the Hamiltonian as

[tex] H = [b^\dagger b] T M T^\dagger \left[ \begin{array}{c} b \\ b^\dagger \end{array} \right], [/tex]

where we have used the shorthand [tex][b^\dagger b][/tex] to denote the vector with entries [tex]b_1^\dagger, \ldots, b_n^\dagger,b_1,\ldots,b_n[/tex], and

[tex] M = \left[ \begin{array}{cc} \alpha & -\beta^* \\ \beta & -\alpha^* \end{array} \right]. [/tex]

Supposing we can choose [tex]T[/tex] such that [tex]TMT^\dagger[/tex] is diagonal, we see that the Hamiltonian can be expressed in the form of [tex]H_{\rm free}[/tex], and the energy spectrum found, following our earlier methods.

Since [tex]\alpha[/tex] is hermitian and [tex]\beta[/tex] antisymmetric it follows that [tex]M[/tex] also is hermitian, and so can be diagonalized for some choice of unitary [tex]T[/tex]. However, the fact that the [tex]b_j[/tex]s must satisfy the CCRs constrains the class of [tex]T[/tex]s available to us. We need to show that such a [tex]T[/tex] can be used to do the diagonalization.
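Before giving the argument, here is a numerical check of the end result (a sketch in Python with numpy; the explicit matrices for the [tex]a_j[/tex] are the Jordan-Wigner-type representation used in the earlier sketch, and the random [tex]\alpha, \beta[/tex] are illustrative). If [tex]TMT^\dagger = \mathrm{diag}(d,-d)[/tex] then [tex]H = \sum_j d_j (b_j^\dagger b_j - b_j b_j^\dagger) = \sum_j d_j (2 b_j^\dagger b_j - I)[/tex], so the [tex]2^n[/tex] many-body eigenvalues of [tex]H[/tex] should be all the sign-sums [tex]\sum_j \pm d_j[/tex] over the non-negative eigenvalues [tex]d_j[/tex] of [tex]M[/tex]:

```python
import numpy as np
from itertools import product
from functools import reduce

n = 3
rng = np.random.default_rng(2)
G = rng.normal(size=(2, n, n)) + 1j * rng.normal(size=(2, n, n))
alpha = (G[0] + G[0].conj().T) / 2      # hermitian
beta = (G[1] - G[1].T) / 2              # antisymmetric

# Explicit representation of the CCRs (as in the earlier sketch).
I2, Z = np.eye(2), np.diag([1.0, -1.0])
s = np.array([[0.0, 1.0], [0.0, 0.0]])
a = [reduce(np.kron, [Z] * j + [s] + [I2] * (n - j - 1)) for j in range(n)]
ad = [m.conj().T for m in a]

# H in the canonical quadratic form.
H = sum(alpha[j, k] * ad[j] @ a[k] - alpha[j, k].conj() * a[j] @ ad[k]
        + beta[j, k] * a[j] @ a[k] - beta[j, k].conj() * ad[j] @ ad[k]
        for j in range(n) for k in range(n))

# The 2n x 2n hermitian matrix M; its spectrum comes in pairs +/- d_j.
M = np.block([[alpha, -beta.conj()], [beta, -alpha.conj()]])
d = np.sort(np.linalg.eigvalsh(M))[n:]  # the n non-negative eigenvalues

# Predicted many-body spectrum: every sign-sum of the d_j.
predicted = np.sort([np.dot(signs, d) for signs in product([-1, 1], repeat=n)])
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), predicted)
print("spectrum of H = all sign-sums of M's non-negative eigenvalues")
```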

We will give a heuristic and somewhat incomplete proof that this is possible, before making some brief remarks about what is required for a rigorous proof. I’ve omitted the rigorous proof, since the way I understand it uses a result from linear algebra that, while beautiful, I don’t want to explain in full detail here.

Suppose [tex]T[/tex] is any unitary such that

[tex] T M T^\dagger = \left[ \begin{array}{cc} d & 0 \\ 0 & -d \end{array} \right], [/tex]

where [tex]d[/tex] is diagonal, and we used the special form of [tex]M[/tex] to deduce that the eigenvalues are real and appear in matched pairs [tex]\pm \lambda[/tex]. We’d like to show that [tex]T[/tex] can be chosen to be of the desired special form. To see that this is plausible, consider the map [tex]X \rightarrow S X^* S^\dagger[/tex], where [tex]S[/tex] is a block matrix:

[tex] S = \left[ \begin{array}{cc} 0 & I \\ I & 0 \end{array} \right]. [/tex]

Applying this map to both sides of the earlier equation we obtain

[tex] ST^* M^* T^T S^\dagger = \left[ \begin{array}{cc} -d & 0 \\ 0 & d \end{array} \right] = -TMT^\dagger. [/tex]

But [tex]M^* = -S^\dagger M S[/tex], and so we obtain:

[tex] -ST^* S^\dagger M S T^T S^\dagger = -TMT^\dagger. [/tex]

It is at least plausible that we can choose [tex]T[/tex] such that [tex]ST^*S^\dagger = T[/tex], which would imply that [tex]T[/tex] has the required form. What this actually shows is, of course, somewhat weaker, namely that [tex]T^\dagger ST^* S^\dagger[/tex] commutes with [tex]M[/tex].

One way of obtaining a rigorous proof is to find a [tex]T[/tex] satisfying

[tex] T M T^\dagger = \left[ \begin{array}{cc} d & 0 \\ 0 & -d \end{array} \right], [/tex]

and then to apply the cosine-sine (or CS) decomposition from linear algebra, which provides a beautiful way of representing block unitary matrices, and which, in this instance, allows us to obtain a [tex]T[/tex] of the desired form with just a little more work. The CS decomposition may be found, for example, as Theorem VII.1.6 on page 196 of Bhatia’s book “Matrix Analysis” (Springer-Verlag, New York, 1997).

Problem: Can we extend these results to allow terms in the Hamiltonian which are linear in the Fermi operators?

In this post we’ve seen how to diagonalize a general Fermi quadratic Hamiltonian. We’ve treated this as a purely mathematical problem, although most physicists will probably have little trouble believing that these techniques are useful in a wide range of physical problems. In the next post we’ll explain a surprising connection between these ideas and one-dimensional spin systems: a tool known as the Jordan-Wigner transform can be used to establish an equivalence between a large class of one-dimensional spin systems and the type of Fermi systems we have been considering. This is interesting because included in this class of one-dimensional spin systems are important models such as the transverse Ising model, which serve as a general prototype for quantum magnetism, are a good basis for understanding some naturally occurring physical systems, and which also serve as prototypes for the understanding of quantum phase transitions.


Fermions and the Jordan-Wigner transform IV: Diagonalizing Fermi Quadratic Hamiltonians

I’ve finally had a chance to get back to Fermions. Today’s post explains how to diagonalize a Hamiltonian which is quadratic in operators satisfying the Fermionic CCRs. Remarkably, we’ll do this using only the CCRs: the operators could arise in many different ways physically, but, as we shall see, it is only the CCRs that matter for determining the spectrum! This class of Hamiltonians arises in a lot of realistic physical systems, and we’ll see an explicit example later on, when we show that a particular spin model (the X-Y model) is equivalent to a Fermi quadratic Hamiltonian.

(Unfortunately, there seems to be a bug in WordPress that required me to strip most of the tags denoting emphasis (e.g. bold or italics) out of this post. Weird.)

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Diagonalizing a Fermi quadratic Hamiltonian

Suppose [tex]a_1,\ldots,a_n[/tex] satisfy the Fermionic CCRs, and we have a system with Hamiltonian

[tex] H_{\rm free} = \sum_j \alpha_j a_j^\dagger a_j, [/tex]

where [tex]\alpha_j \geq 0[/tex] for each value of [tex]j[/tex]. In physical terms, this is the Hamiltonian used to describe a system of free, i.e., non-interacting, Fermions.

Such Hamiltonians are used, for example, in the simplest possible quantum mechanical model of a metal, the Drude-Sommerfeld model, which treats the conduction electrons as free Fermions. Such a model may appear pretty simplistic (especially after we solve it, below), but actually there’s an amazing amount of physics one can get out of such simple models. I won’t dwell on these physical consequences here, but if you’re unfamiliar with the Drude-Sommerfeld theory, you could profitably spend a couple of hours looking at the first couple of chapters in a good book on condensed matter physics, like Ashcroft and Mermin’s “Solid State Physics”, which explains the Drude-Sommerfeld model and its consequences in detail. (Why such a simplistic model does such a great job of describing metals is another long story, which I may come back to in a future post.)

Returning to the abstract Hamiltonian [tex]H_{\rm free}[/tex], the positivity of the operators [tex]a_j^\dagger a_j[/tex] implies that [tex]\langle \psi |H_{\rm free} |\psi\rangle \geq 0[/tex] for any state [tex]|\psi\rangle[/tex], and thus the ground state energy of [tex]H_{\rm free}[/tex] is non-negative. However, our earlier construction also shows that we can find at least one state [tex]|\mbox{vac}\rangle[/tex] such that [tex]a_j^\dagger a_j|\mbox{vac}\rangle = 0[/tex] for all [tex]j[/tex], and thus [tex]H_{\rm free}|\mbox{vac}\rangle = 0[/tex]. It follows that the ground state energy of [tex]H_{\rm free}[/tex] is exactly [tex]0[/tex].

This result is easily generalized to the case where the [tex]\alpha_j[/tex] have any sign, with the result that the ground state energy is [tex]\sum_j \min(0,\alpha_j)[/tex], and the ground state [tex]|\psi\rangle[/tex] is obtained from [tex]|\mbox{vac}\rangle[/tex] by applying the raising operator [tex]a_j^\dagger[/tex] for all [tex]j[/tex] with [tex]\alpha_j < 0[/tex]. More generally, the allowed energies of the excited states of this system correspond to sums over subsets of the [tex]\alpha_j[/tex].

Exercise: Express the excited states of the system in terms of [tex]|\mbox{vac}\rangle[/tex].

Just by the way, readers with an interest in computational complexity theory may find it interesting to note a connection between the spectrum of [tex]H_{\rm free}[/tex] and the Subset-Sum problem from computer science. The Subset-Sum problem is this: given a set of integers [tex]x_1,\ldots,x_n[/tex], with repetition allowed, is there a subset of those integers which adds up to a desired target, [tex]t[/tex]? Obviously, the problem of determining whether [tex]H_{\rm free}[/tex] has a particular energy is equivalent to the Subset-Sum problem, at least in the case where the [tex]\alpha_j[/tex] are integers. What is interesting is that the Subset-Sum problem is known to be NP-Complete, in the language of computational complexity theory, and thus is regarded as computationally intractable. As a consequence, we deduce that the problem of determining whether a particular value for energy is in the spectrum of [tex]H_{\rm free}[/tex] is in general NP-Hard, i.e., at least as difficult as the NP-Complete problems. Similar results hold for the more general Fermi Hamiltonians considered below. Furthermore, this observation suggests the possibility of an interesting link between the physical problem of estimating the density of states, and classes of problems in computational complexity theory, such as the counting classes (e.g., #P), and also to approximation problems.

Let's generalize our results about the spectrum of [tex]H_{\rm free}[/tex]. Suppose now that we have the Hamiltonian

[tex] H = \sum_{jk} \alpha_{jk} a_j^\dagger a_k. [/tex]

Taking the adjoint of this equation we see that in order for [tex]H[/tex] to be hermitian, we must have [tex]\alpha_{jk}^* = \alpha_{kj}[/tex], i.e., the matrix [tex]\alpha[/tex] whose entries are the [tex]\alpha_{jk}[/tex] is itself hermitian.

Suppose we introduce new operators [tex]b_1,\ldots,b_n[/tex] defined by

[tex] b_j \equiv \sum_{k=1}^n \beta_{jk} a_k, [/tex]

where [tex]\beta_{jk}[/tex] are complex numbers. We are going to try to choose the [tex]\beta_{jk}[/tex] so that (1) the operators [tex]b_j[/tex] satisfy the Fermionic CCRs, and (2) when expressed in terms of the [tex]b_j[/tex], the Hamiltonian [tex]H[/tex] takes on the same form as [tex]H_{\rm free}[/tex], and thus can be diagonalized.

We begin by looking for conditions on the complex numbers [tex]\beta_{jk}[/tex] such that the [tex]b_j[/tex] operators satisfy Fermionic CCRs. Computing anticommutators we find

[tex] \{ b_j, b_k^\dagger \} = \sum_{lm} \beta_{jl} \beta_{km}^* \{ a_l,a_m^\dagger \}. [/tex]

Substituting the CCR [tex]\{ a_l,a_m^\dagger \} = \delta_{lm} I[/tex] and writing [tex]\beta_{km}^* = \beta_{mk}^\dagger[/tex] gives

[tex] \{ b_j, b_k^\dagger \} = \sum_{lm} \beta_{jl} \delta_{lm} \beta_{mk}^\dagger I = (\beta \beta^\dagger)_{jk} I, [/tex]

where [tex]\beta \beta^\dagger[/tex] denotes the matrix product of the matrix [tex]\beta[/tex] with entries [tex]\beta_{jl}[/tex] and its adjoint [tex]\beta^\dagger[/tex]. To compute [tex]\{b_j,b_k\}[/tex] we use the linearity of the anticommutator bracket in each term to express [tex]\{b_j,b_k\}[/tex] as a sum over terms of the form [tex]\{ a_l,a_m \}[/tex], each of which is [tex]0[/tex], by the CCRs. As a result, we have:

[tex] \{ b_j,b_k \} = 0. [/tex]

It follows that provided [tex]\beta \beta^\dagger = I[/tex], i.e., provided [tex]\beta[/tex] is unitary, the operators [tex]b_j[/tex] satisfy the Fermionic CCRs.

Let's assume that [tex]\beta[/tex] is unitary, and change our notation, writing [tex]u_{jk} \equiv \beta_{jk}[/tex] in order to emphasize the unitarity of this matrix. We now have

[tex] b_j = \sum_k u_{jk} a_k. [/tex]

Using the unitarity of [tex]u[/tex] we can invert this equation to obtain

[tex] a_j = \sum_k u^\dagger_{jk} b_k. [/tex]

Substituting this expression and its adjoint into [tex]H[/tex] and doing some simplification gives us

[tex] H = \sum_{lm} (u \alpha u^\dagger)_{lm} b_l^\dagger b_m. [/tex]

Since [tex]\alpha[/tex] is hermitian, we can choose [tex]u[/tex] so that [tex]u \alpha u^\dagger[/tex] is diagonal, with entries [tex]\lambda_j[/tex], the eigenvalues of [tex]\alpha[/tex], giving us

[tex] H = \sum_j \lambda_j b_j^\dagger b_j. [/tex]

This is of the same form as [tex]H_{\rm free}[/tex], and thus the ground state energy and excitation energies may be computed in the same way as we described earlier.

What about the ground state of [tex]H[/tex]? Assuming that all the [tex]\lambda_j[/tex] are non-negative, it turns out that a state [tex]|\psi\rangle[/tex] satisfies [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex] for all [tex]j[/tex] if and only if [tex]b_j^\dagger b_j|\psi\rangle = 0[/tex] for all [tex]j[/tex], and so the ground state for the two sets of Fermi operators is the same. This follows from a more general observation, namely, that [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex] if and only if [tex]a_j|\psi\rangle = 0[/tex]. In one direction, this is trivial: just multiply [tex]a_j|\psi\rangle = 0[/tex] on the left by [tex]a_j^\dagger[/tex]. In the other direction, we multiply [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex] on the left by [tex]a_j[/tex] to obtain [tex]a_j a_j^\dagger a_j |\psi\rangle = 0[/tex]. Substituting the CCR [tex]a_j a_j^\dagger = -a_j^\dagger a_j + I[/tex], we obtain

[tex] (-a_j^\dagger a_j^2+a_j)|\psi\rangle = 0. [/tex]

But [tex]a_j^2 = 0[/tex], so this simplifies to [tex]a_j|\psi\rangle = 0[/tex], as desired.

Returning to the question of determining the ground state, supposing [tex]a_j^\dagger a_j|\psi\rangle = 0[/tex] for all [tex]j[/tex], we immediately have [tex]a_j|\psi\rangle = 0[/tex] for all [tex]j[/tex], and thus [tex]b_j|\psi\rangle = 0[/tex] for all [tex]j[/tex], since the [tex]b_j[/tex] are linear functions of the [tex]a_j[/tex], and thus [tex]b_j^\dagger b_j|\psi\rangle = 0[/tex] for all [tex]j[/tex]. This shows that the ground state for the two sets of Fermi operators, [tex]a_j[/tex] and [tex]b_j[/tex], is in fact the same. The excitations for [tex]H[/tex] may be obtained by applying raising operators [tex]b_j^\dagger[/tex] to the ground state.
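Here is a small numerical sketch tying the above together (Python with numpy; the explicit matrices for the [tex]a_j[/tex] are the Jordan-Wigner-type representation discussed in this series, and the random [tex]\alpha[/tex] is purely illustrative). It checks that the spectrum of [tex]H = \sum_{jk} \alpha_{jk} a_j^\dagger a_k[/tex] consists of all subset sums of the eigenvalues [tex]\lambda_j[/tex] of [tex]\alpha[/tex], with ground state energy [tex]\sum_j \min(0,\lambda_j)[/tex]:

```python
import numpy as np
from itertools import product
from functools import reduce

n = 3
rng = np.random.default_rng(3)
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
alpha = (A + A.conj().T) / 2  # hermitian coefficient matrix

# Explicit matrices satisfying the CCRs (as used earlier in this series).
I2, Z = np.eye(2), np.diag([1.0, -1.0])
s = np.array([[0.0, 1.0], [0.0, 0.0]])
a = [reduce(np.kron, [Z] * j + [s] + [I2] * (n - j - 1)) for j in range(n)]

H = sum(alpha[j, k] * a[j].conj().T @ a[k] for j in range(n) for k in range(n))
lam = np.linalg.eigvalsh(alpha)  # eigenvalues of alpha

# Spectrum of H = all subset sums of the lambda_j;
# ground state energy = sum of the negative lambda_j.
subset_sums = np.sort([np.dot(bits, lam) for bits in product([0, 1], repeat=n)])
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), subset_sums)
assert np.isclose(np.linalg.eigvalsh(H).min(), sum(min(0, l) for l in lam))
print("spectrum = subset sums; ground energy =", sum(min(0, l) for l in lam))
```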
Exercise: Suppose some of the [tex]\lambda_j[/tex] are negative. Express the ground state of [tex]H[/tex] in terms of the simultaneous eigenstates of the [tex]a_j^\dagger a_j[/tex].

Okay, that's enough for one day! We've learnt how to diagonalize a fairly general class of Hamiltonians quadratic in Fermi operators. In the next post we'll go further, learning how to cope with additional terms like [tex]a_j a_k[/tex] and [tex]a_j^\dagger a_k^\dagger[/tex].


Expander graphs: the complete notes

The full pdf text of my series of posts about expander graphs. Thank you very much to all the people who commented on the posts; if you’re reading this text, and haven’t seen the comments on earlier posts, I recommend you look through them to see all the alternate proofs, generalizations and so on that people have offered.

Expander graphs VI: reducing randomness

Back from Boston! This is the final installment in my series about expanders. I’ll post a pdf containing the whole text in the next day or two. Thanks to everyone who’s contributed in the comments!

Today’s post explains how expander graphs can be used to reduce the number of random bits needed by a randomized algorithm in order to achieve a desired success probability. This post is the culmination of the series: we make use of the fact, proved in the last post, that random walks on an expander are exponentially unlikely to remain localized in any sufficiently large subset of vertices, a fact that relies in turn on the connection, developed in earlier posts, between the eigenvalue gap and the expansion parameter.

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Reducing the number of random bits required by an algorithm

One surprising application of expanders is that they can be used to reduce the number of random bits needed by a randomized algorithm in order to achieve a desired success probability.

Suppose, for example, that we are trying to compute a function [tex]f(x)[/tex] that can take the values [tex]f(x) = 0[/tex] or [tex]f(x) = 1[/tex]. Suppose we have a randomized algorithm [tex]A(x,Y)[/tex] which takes as input [tex]x[/tex] and an [tex]m[/tex]-bit uniformly distributed random variable [tex]Y[/tex], and outputs either [tex]0[/tex] or [tex]1[/tex]. We assume that:

  • [tex]f(x) = 0[/tex] implies [tex]A(x,Y) = 0[/tex] with certainty.
  • [tex]f(x) = 1[/tex] implies [tex]A(x,Y) = 1[/tex] with probability at least [tex]1-p_f[/tex].

That is, [tex]p_f[/tex] is the maximum probability that the algorithm fails, i.e., that [tex]f(x) = 1[/tex] but the algorithm outputs [tex]A(x,Y) = 0[/tex].

An algorithm of this type is called a one-sided randomized algorithm, since it can only fail when [tex]f(x) = 1[/tex], not when [tex]f(x) = 0[/tex]. I won’t give any concrete examples of one-sided randomized algorithms here, but the reader unfamiliar with them should rest assured that they are useful and important – see, e.g., the book of Motwani and Raghavan (Cambridge University Press, 1995) for examples.

As an aside, the discussion of one-sided algorithms in this post can be extended to the case of randomized algorithms which can fail when either [tex]f(x) = 0[/tex] or [tex]f(x) = 1[/tex]. The details are a little more complicated, but the basic ideas are the same. This is described in Linial and Wigderson’s lecture notes. Alternately, extending the discussion to this case is a good problem.

How can we decrease the probability of failure for a one-sided randomized algorithm? One obvious way is to run the algorithm [tex]k[/tex] times, computing [tex]A(x,Y_0),A(x,Y_1),\ldots,A(x,Y_{k-1})[/tex]. If we get [tex]A(x,Y_j) = 0[/tex] for all [tex]j[/tex] then we output [tex]0[/tex], while if [tex]A(x,Y_j) = 1[/tex] for at least one value of [tex]j[/tex], then we output [tex]1[/tex]. This algorithm makes use of [tex]km[/tex] bits, and reduces the failure probability to at most [tex]p_f^k[/tex].

Expanders can be used to substantially decrease the number of random bits required to achieve such a reduction in the failure probability. We define a new algorithm [tex]\tilde A[/tex] as follows. It requires a [tex]d[/tex]-regular expander graph [tex]G[/tex] whose vertex set [tex]V[/tex] contains [tex]2^m[/tex] vertices, each of which can represent a possible [tex]m[/tex]-bit input [tex]y[/tex] to [tex]A(x,y)[/tex]. The modified algorithm [tex]\tilde A[/tex] works as follows:

  • Input [tex]x[/tex].
  • Sample uniformly at random from [tex]V[/tex] to generate [tex]Y_0[/tex].
  • Now do a [tex]k-1[/tex] step random walk on the expander, generating random variables [tex]Y_1,\ldots, Y_{k-1}[/tex].
  • Compute [tex]A(x,Y_0),\ldots,A(x,Y_{k-1})[/tex]. If any of these are [tex]1[/tex], output [tex]1[/tex], otherwise output [tex]0[/tex].

We see that the basic idea of the algorithm is similar to the earlier proposal for running [tex]A(x,Y)[/tex] repeatedly, but the sequence of independent and uniformly distributed samples [tex]Y_0,\ldots,Y_{k-1}[/tex] is replaced by a random walk on the expander. The advantage of doing this is that only [tex]m+(k-1) \log(d)[/tex] random bits are required – [tex]m[/tex] to sample from the initial uniform distribution, and then [tex]\log(d)[/tex] for each of the [tex]k-1[/tex] steps in the random walk. When [tex]d[/tex] is a small constant this is far fewer than the [tex]km[/tex] bits used when we simply repeatedly run the algorithm [tex]A(x,Y_j)[/tex] with uniform and independently generated random bits [tex]Y_j[/tex].
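Here is a minimal sketch of [tex]\tilde A[/tex] (Python; the algorithm `A` and the adjacency list `nbrs` are hypothetical placeholders, not part of any real library, and the graph is assumed to be a [tex]d[/tex]-regular expander on [tex]2^m[/tex] vertices):

```python
import random

def amplified(A, x, nbrs, k, rng=random):
    # A    : one-sided randomized algorithm, A(x, y) -> 0 or 1 (placeholder)
    # nbrs : adjacency list of a d-regular expander on 2^m vertices (placeholder)
    # k    : number of (correlated) samples Y_0, ..., Y_{k-1}
    y = rng.randrange(len(nbrs))      # uniform Y_0: m random bits
    for i in range(k):
        if A(x, y) == 1:              # a single 1 certifies f(x) = 1
            return 1
        if i < k - 1:
            y = rng.choice(nbrs[y])   # one walk step: log2(d) random bits
    return 0                          # all k samples returned 0
```

The failure probability of this sketch is analysed next.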

With what probability does this algorithm fail? Define [tex]B_x[/tex] to be the set of values of [tex]y[/tex] such that [tex]A(x,y) = 0[/tex], yet [tex]f(x) = 1[/tex]. This is the “bad” set, which we hope our algorithm will avoid. The algorithm will fail only if the steps in the random walk [tex]Y_0,Y_1,\ldots,Y_{k-1}[/tex] all fall within [tex]B_x[/tex]. From our earlier theorem we see that this occurs with probability at most:

[tex] \left( \frac{|B_x|}{2^m} + \frac{\lambda_2(G)}{d} \right)^{k-1}. [/tex]

But we know that [tex]|B_x|/2^m \leq p_f[/tex], and so the failure probability is at most

[tex] \left( p_f + \frac{\lambda_2(G)}{d} \right)^{k-1}. [/tex]
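To put in some illustrative numbers (hypothetical values, just to get a feel for the bound): for a Ramanujan-quality expander with [tex]d = 8[/tex] we have [tex]\lambda_2(G) \leq 2\sqrt{7} \approx 5.29[/tex], so [tex]\lambda_2(G)/d \approx 0.66[/tex]; if [tex]p_f = 0.1[/tex] the failure probability is at most roughly [tex]0.76^{k-1}[/tex], while the number of random bits grows by only [tex]3[/tex] per repetition.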

Thus, provided [tex]p_f+\lambda_2(G)/d < 1[/tex], we again get an exponential decrease in the failure probability as the number of repetitions [tex]k[/tex] is increased.

Conclusion

These notes have given a pretty basic introduction to expanders, and there's much we haven't covered. More detail and more applications can be found in the online notes of Linial and Wigderson, or in the research literature. Still, I hope that these notes have given some idea of why these families of graphs are useful, and of some of the powerful connections between graph theory, linear algebra, and random walks.


Adios

I’m off to Boston for a few days, so blogging will be on hold until the middle of next week.


New paper: Operator quantum error correction

A new paper (postscript), joint with David Poulin, all about operator quantum error correction, a clever way (alas, not my idea!) of protecting quantum states against the effects of noise. The paper provides several sets of easily checkable conditions (both algebraic and information-theoretic) characterizing when operator error-correction is feasible.

Postscript: Dave Bacon has just put out a nice paper related to operator quantum error-correction, constructing some physically motivated examples of self-correcting quantum memories. This idea – that a quantum system can correct itself – is absolutely fascinating, and runs completely counter to the standard folk “wisdom” about quantum mechanics, namely that quantum states are delicate objects that are easily (and irreversibly) destroyed.


Appendix to my posts on Fermions and the Jordan-Wigner transform

This is a little appendix to my post about the consequences of the fermionic CCRs. The results described in the appendix are very well-known – they are taught in any undergrad quantum course – but I’m rather fond of the little proof given, and so am indulging myself by including it here. The results are used in the previous post.

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Appendix on mutually commuting observables

Any undergraduate quantum mechanics course covers the fact that a mutually commuting set of Hermitian operators possesses a common eigenbasis. Unfortunately, in my experience this fact is usually proved rather early on, and suffers from being presented in a slightly too elementary fashion, with inductive constructions of explicit basis sets and so on. The following proof is still elementary, but from a slightly more sophisticated perspective. It is, I like to imagine, rather more like what would be given in an advanced course in linear algebra, were linear algebraists to actually cover this kind of material. (They don’t, so far as I know, having other fish to fry.)

Suppose [tex]H_1,\ldots,H_n[/tex] are commuting Hermitian (indeed, normal suffices) operators with spectral decompositions:

[tex] H_j = \sum_{k} E_{jk} P_{jk}, [/tex]

where [tex]E_{jk}[/tex] are the eigenvalues of [tex]H_j[/tex], and [tex]P_{jk}[/tex] are the corresponding projectors. Since the [tex]H_j[/tex] commute, it is not difficult to verify that for any quadruple [tex]j,k,l,m[/tex] the operators [tex]P_{jk}[/tex] and [tex]P_{lm}[/tex] also commute. For a vector [tex]\vec k = (k_1,\ldots,k_n)[/tex] define the operator

[tex] P_{\vec k} \equiv P_{1 k_1} P_{2 k_2} \ldots P_{n k_n}. [/tex]

Note that the order of the operators on the right-hand side does not matter, since they all commute with one another. The following equations all follow easily by direct computation, the mutual commutativity of the [tex]P_{jk}[/tex] operators, and standard properties of the spectral decomposition:

[tex] P_{\vec k}^\dagger = P_{\vec k}; \,\,\,\, \sum_{\vec k} P_{\vec k} = I; \,\,\,\, P_{\vec k} P_{\vec l} = \delta_{\vec k \vec l} P_{\vec k}. [/tex]

Thus, the operators [tex]P_{\vec k}[/tex] form a complete set of orthogonal projectors. Furthermore, suppose we have [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex]. Then we will show that for any [tex]j[/tex] we have [tex]P_{jk_j} |\psi\rangle = |\psi\rangle[/tex], so [tex]|\psi\rangle[/tex] is an eigenstate of [tex]H_j[/tex] with eigenvalue [tex]E_{jk_j}[/tex]. This shows that the operators [tex]P_{\vec k}[/tex] project onto a complete set of orthogonal simultaneous eigenspaces for the [tex]H_j[/tex], and will complete the proof.

Our goal is to show that if [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex] then for any [tex]j[/tex] we have [tex]P_{jk_j} |\psi\rangle = |\psi\rangle[/tex]. To see this, simply multiply both sides of [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex] by [tex]P_{jk_j}[/tex], and observe that [tex]P_{jk_j} P_{\vec k} = P_{\vec k}[/tex]. This gives [tex]P_{\vec k}|\psi\rangle = P_{jk_j}|\psi\rangle[/tex]. But [tex]P_{\vec k}|\psi\rangle = |\psi\rangle[/tex], so we obtain [tex]|\psi\rangle = P_{j k_j}|\psi\rangle[/tex], which completes the proof.
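For concreteness, here is a small numerical illustration (Python with numpy; the two commuting Hermitian matrices are manufactured by conjugating diagonal matrices with a shared random unitary, purely for the sake of example):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
dim = 6

# A shared eigenbasis Q makes H1 and H2 commute by construction.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
e1 = np.array([0, 0, 1, 1, 2, 2])   # eigenvalues of H1 (degenerate on purpose)
e2 = np.array([0, 1, 0, 1, 0, 1])   # eigenvalues of H2
H1 = Q @ np.diag(e1) @ Q.conj().T
H2 = Q @ np.diag(e2) @ Q.conj().T
assert np.allclose(H1 @ H2, H2 @ H1)

# Spectral projectors P_{j,k}: sum of |q><q| over eigenvectors with eigenvalue k.
def projectors(evals):
    return {int(k): Q[:, evals == k] @ Q[:, evals == k].conj().T
            for k in np.unique(evals)}

P1, P2 = projectors(e1), projectors(e2)

# Products P_vec{k} = P_{1,k1} P_{2,k2}: hermitian, complete, orthogonal.
Pvec = {(k1, k2): P1[k1] @ P2[k2] for k1, k2 in product(P1, P2)}
assert np.allclose(sum(Pvec.values()), np.eye(dim))
for K in Pvec:
    assert np.allclose(Pvec[K], Pvec[K].conj().T)
    for L in Pvec:
        assert np.allclose(Pvec[K] @ Pvec[L], Pvec[K] if K == L else 0)
print("the products form a complete set of orthogonal projectors")
```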


Fermions and the Jordan-Wigner transform III: Consequences of the Canonical Commutation Relations

Back to Fermions again! In today’s post we’ll show that the CCRs uniquely determine the action of the fermionic operators [tex]a_j[/tex], up to a choice of basis. Mathematically, the argument is somewhat detailed, but it’s also the kind of argument that rewards detailed study, especially if studied in conjunction with related topics, such as the representation theory of [tex]su(2)[/tex]. You’ll need to look elsewhere for that, however!

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Consequences of the fermionic CCRs

We will assume that the vector space [tex]V[/tex] is finite dimensional, and that there are [tex]n[/tex] operators [tex]a_1,\ldots,a_n[/tex] acting on [tex]V[/tex] and satisfying the Fermionic CCRs. At the end of this paragraph we’re going to give a broad outline of the steps we go through. Upon a first read, some of these steps may appear a little mysterious to the reader not familiar with representation theory. In particular, please don’t worry if you get a little stuck in your understanding of the outline at some points, as the exposition is very much at the bird’s-eye level, and not all detail is visible at that level. Nonetheless, the reason for including this broad outline is the belief that repeated study will pay substantial dividends, if it is read in conjunction with the more detailed exposition to follow, or similar material on, e.g., representations of the Lie algebra [tex]su(2)[/tex]. Indeed, the advantage of operating at the bird’s-eye level is that it makes it easier to see the connections between these ideas, and the use of similar ideas in other branches of representation theory.

  • We’ll start by showing that the operators [tex]a_j^\dagger a_j[/tex] are positive Hermitian operators with eigenvalues [tex]0[/tex] and [tex]1[/tex].
  • We’ll show that [tex]a_j[/tex] acts as a lowering operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j |\psi\rangle[/tex] vanishes.
  • Similarly, [tex]a_j^\dagger[/tex] acts as a raising operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] vanishes.
  • We prove that the operators [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of Hermitian matrices, and thus there exists a state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of [tex]a_j^\dagger a_j[/tex] for all values [tex]j=1,\ldots,n[/tex].
  • By raising and lowering the state [tex]|\psi\rangle[/tex] in all possible combinations, we’ll construct a set of [tex]2^n[/tex] orthonormal states which are simultaneous eigenstates of the [tex]a_j^\dagger a_j[/tex]. The corresponding vector of eigenvalues uniquely labels each state in this orthonormal basis.
  • Suppose the vector space spanned by these [tex]2^n[/tex] simultaneous eigenstates is [tex]W[/tex]. At this point, we know that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W[/tex] into [tex]W[/tex], and, indeed, we know everything about the action [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] have on [tex]W[/tex].
  • Suppose we define [tex]W_\perp[/tex] to be the orthocomplement of [tex]W[/tex] in [tex]V[/tex]. Then we’ll show that the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W_\perp[/tex] into itself, and their restrictions to [tex]W_\perp[/tex] satisfy the Fermionic CCRs. We can then repeat the above procedure, and identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we know the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] exactly.
  • We iterate this procedure until [tex]W_\perp[/tex] is the trivial vector space, at which point it is no longer possible to continue. At this point we have established an orthonormal basis for the whole of [tex]V[/tex] with respect to which we can explicitly write down the action of both [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

Let’s go through each of these steps in more detail.

The [tex]a_j^\dagger a_j[/tex] are positive Hermitian with eigenvalues [tex]0[/tex] and [tex]1[/tex]: Observe that the [tex]a_j^\dagger a_j[/tex] are positive (and thus Hermitian) matrices. We will show that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], and thus the eigenvalues of [tex]a_j^\dagger a_j[/tex] are all [tex]0[/tex] or [tex]1[/tex].

To see this, observe that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j a_j^\dagger a_j = -(a_j^\dagger)^2 a_j^2 + a_j^\dagger a_j[/tex], where we used the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex]. Note also that [tex]a_j^2 = 0[/tex] by the CCR [tex]\{a_j,a_j\} = 0[/tex]. It follows that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], as claimed.

The [tex]a_j[/tex] are lowering operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Then we claim that [tex]a_j|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. To see that [tex]a_j|\psi\rangle[/tex] is normalized, note that [tex]\langle \psi|a_j^\dagger a_j |\psi\rangle = \langle \psi|\psi\rangle = 1[/tex], where we used the fact that [tex]|\psi\rangle[/tex] is an eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex] to establish the first equality. To see that it has eigenvalue [tex]0[/tex], note that [tex]a_j^\dagger a_j a_j|\psi\rangle = 0[/tex], since [tex]\{ a_j,a_j \} = 0[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Show that [tex]a_j |\psi\rangle = 0[/tex].

The [tex]a_j^\dagger[/tex] are raising operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Then we claim that [tex]a_j^\dagger|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex].

To see the normalization, we use the CCR [tex]\{ a_j,a_j^\dagger \} = I[/tex] to deduce [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = -\langle \psi|a_j^\dagger a_j|\psi\rangle + \langle \psi|\psi\rangle[/tex]. But [tex]a_j^\dagger a_j|\psi\rangle = 0[/tex], by the eigenvalue assumption, and [tex]\langle \psi|\psi\rangle = 1[/tex], whence [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = 1[/tex], which is the desired normalization condition.

To see that [tex]a_j^\dagger |\psi\rangle[/tex] is an eigenstate with eigenvalue [tex]1[/tex], use the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex] to deduce that [tex]a_j^\dagger a_j a_j^\dagger |\psi\rangle = -a_j^\dagger a_j^\dagger a_j|\psi\rangle + a_j^\dagger|\psi\rangle = a_j^\dagger |\psi\rangle[/tex], where the final equality can be deduced either from the assumption that [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex], or from the CCR [tex]\{ a_j^\dagger,a_j^\dagger \} = 0[/tex]. This is the desired eigenvalue equation for [tex]a_j^\dagger|\psi\rangle[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Show that [tex]a_j^\dagger |\psi\rangle = 0[/tex].

The [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of observables: To see this, let [tex]j \neq k[/tex], and apply the CCRs repeatedly to obtain [tex]a_j^\dagger a_j a_k^\dagger a_k = a_k^\dagger a_k a_j^\dagger a_j[/tex], which is the desired commutativity.

Existence of a common eigenstate: It is well known that a mutually commuting set of Hermitian operators possesses a common eigenbasis. This fact is usually taught in undergraduate quantum mechanics courses; for completeness, I’ve included a simple proof in an appendix to these notes. We won’t make use of the full power of this result here, but instead simply use the fact that there exists a normalized state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of all the [tex]a_j^\dagger a_j[/tex] operators. In particular, for all [tex]j[/tex] we have:

[tex] a_j^\dagger a_j |\psi\rangle = \alpha_j |\psi\rangle, [/tex]

where for each [tex]j[/tex] either [tex]\alpha_j = 0[/tex] or [tex]\alpha_j = 1[/tex]. It will be convenient to assume that [tex]\alpha_j = 0[/tex] for all [tex]j[/tex]. This assumption can be made without loss of generality, by applying lowering operators to the [tex]|\psi\rangle[/tex] for each [tex]j[/tex] such that [tex]\alpha_j = 1[/tex], resulting in a normalized state [tex]|\mbox{vac}\rangle[/tex] such that [tex]a_j^\dagger a_j |\mbox{vac}\rangle = 0[/tex] for all [tex]j[/tex].

Defining an orthonormal basis: For any vector [tex]\alpha = (\alpha_1,\ldots,\alpha_n)[/tex], where each [tex]\alpha_j = 0[/tex] or [tex]1[/tex], define a corresponding state:

[tex] |\alpha \rangle \equiv (a_1^\dagger)^{\alpha_1} \ldots (a_n^\dagger)^{\alpha_n}|\mbox{vac}\rangle. [/tex]

It is clear that there are [tex]2^n[/tex] such states [tex]|\alpha\rangle[/tex], and that they form an orthonormal set spanning a subspace of [tex]V[/tex] that we shall call [tex]W[/tex].

The action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W[/tex]: How do [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] act on [tex]W[/tex]? Stated another way, how do they act on the orthonormal basis we have constructed for [tex]W[/tex], the states [tex]|\alpha\rangle[/tex]? Applying the CCRs and the definition of the states [tex]|\alpha\rangle[/tex] it is easy to verify that the action of [tex]a_j[/tex] is as follows (a numerical check of these rules appears after the two lists below):

  • Suppose [tex]\alpha_j = 0[/tex]. Then [tex]a_j|\alpha\rangle = 0[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]0[/tex]. Then [tex]a_j|\alpha\rangle = (-1)^{s_\alpha^j} |\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex].

The action of [tex]a_j^\dagger[/tex] on [tex]W[/tex] is similar:

  • Suppose [tex]\alpha_j = 0[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]1[/tex]. Then [tex]a_j^\dagger|\alpha\rangle = (-1)^{s_\alpha^j}|\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Then [tex]a_j^\dagger |\alpha\rangle = 0[/tex].
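The following sketch (Python with numpy) makes these rules concrete in an explicit representation of the CCRs (the Jordan-Wigner-type matrices, a concrete instance of the occupation number representation described in this post), building the states [tex]|\alpha\rangle[/tex] from an explicit [tex]|\mbox{vac}\rangle[/tex] and checking the sign factor [tex](-1)^{s_\alpha^j}[/tex]:

```python
import numpy as np
from itertools import product
from functools import reduce

n = 3
I2, Z = np.eye(2), np.diag([1.0, -1.0])
s = np.array([[0.0, 1.0], [0.0, 0.0]])
a = [reduce(np.kron, [Z] * j + [s] + [I2] * (n - j - 1)) for j in range(n)]
ad = [m.conj().T for m in a]

vac = np.zeros(2**n); vac[0] = 1.0           # a_j |vac> = 0 for all j
assert all(np.allclose(m @ vac, 0) for m in a)

def state(alpha):
    # |alpha> = (a_1^dag)^{alpha_1} ... (a_n^dag)^{alpha_n} |vac>
    v = vac
    for j in reversed(range(n)):             # apply a_n^dag first
        if alpha[j]:
            v = ad[j] @ v
    return v

for alpha in product([0, 1], repeat=n):
    for j in range(n):
        sign = (-1) ** sum(alpha[:j])        # s_alpha^j = alpha_1+...+alpha_{j-1}
        if alpha[j] == 1:                    # a_j lowers, with the sign factor
            tilde = alpha[:j] + (0,) + alpha[j+1:]
            assert np.allclose(a[j] @ state(alpha), sign * state(tilde))
            assert np.allclose(ad[j] @ state(alpha), 0)   # raising annihilates
        else:                                # a_j^dag raises, same sign factor
            tilde = alpha[:j] + (1,) + alpha[j+1:]
            assert np.allclose(ad[j] @ state(alpha), sign * state(tilde))
            assert np.allclose(a[j] @ state(alpha), 0)    # lowering annihilates
print("raising/lowering rules and sign factors verified")
```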

Action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W_\perp[/tex]: We have described the action of the [tex]a_j[/tex] and the [tex]a_j^\dagger[/tex] on the subspace [tex]W[/tex]. What of the action of these operators on the remainder of [tex]V[/tex]? To answer that question, we first show that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map the orthocomplement [tex]W_\perp[/tex] into itself.

To see this, let [tex]|\psi\rangle \in W_\perp[/tex], and consider [tex]a_j|\psi\rangle[/tex]. We wish to show that [tex]a_j|\psi\rangle \in W_\perp[/tex] also, i.e., that for any [tex]|\phi\rangle \in W[/tex] we have [tex]\langle \phi|a_j|\psi\rangle = 0[/tex]. This follows easily by considering the complex conjugate quantity [tex]\langle \psi|a_j^\dagger |\phi\rangle[/tex], and observing that [tex]a_j^\dagger |\phi\rangle \in W[/tex], since [tex]|\phi\rangle \in W[/tex], and thus [tex]\langle \psi|a_j^\dagger |\phi\rangle = 0[/tex]. A similar argument shows that [tex]a_j^\dagger[/tex] maps [tex]W_\perp[/tex] into itself.

Consider now the operators [tex]\tilde a_j[/tex] obtained by restricting [tex]a_j[/tex] to [tex]W_\perp[/tex]. Provided [tex]W_\perp[/tex] is nontrivial it is clear that these operators satisfy the CCRs on [tex]W_\perp[/tex]. Repeating the above argument, we can therefore identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we can compute the action of [tex]\tilde a_j[/tex] and [tex]\tilde a_j^\dagger[/tex], and thus of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

We may iterate this procedure many times, but the fact that [tex]V[/tex] is finite dimensional means that the process must eventually terminate. At the point of termination we will have broken up [tex]V[/tex] as a direct sum of some finite number [tex]d[/tex] of orthonormal [tex]2^n[/tex]-dimensional vector spaces, [tex]W_1,W_2,\ldots,W_d[/tex], and on each vector space we will have an orthonormal basis with respect to which the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is known precisely.

Stated another way, we can introduce an orthonormal basis [tex]|\alpha,k\rangle[/tex] for [tex]V[/tex], where [tex]\alpha[/tex] runs over all [tex]n[/tex]-bit vectors, and [tex]k = 1,\ldots,d[/tex], and such that the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is to leave [tex]k[/tex] invariant, and to act on [tex]|\alpha\rangle[/tex] as described above. In this representation it is clear that [tex]V[/tex] can be regarded as a tensor product [tex]C^{2^n} \otimes C^d[/tex], with the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] trivial on the [tex]C^d[/tex] component. We will call this the occupation number representation for the Fermi algebra [tex]a_j[/tex].

It’s worth pausing to appreciate what has been achieved here: starting only from the CCRs for [tex]a_1,\ldots,a_n[/tex] we have proved that [tex]V[/tex] can be broken down into a tensor product of a [tex]2^n[/tex]-dimensional vector space and a [tex]d[/tex]-dimensional vector space, with the [tex]a_j[/tex]s acting nontrivially only on the [tex]2^n[/tex]-dimensional component. Furthermore, the action of the [tex]a_j[/tex]s is completely known. I think it’s quite remarkable that we can say so much: at the outset it wasn’t even obvious that the dimension of [tex]V[/tex] should be a multiple of [tex]2^n[/tex]!

When [tex]d=1[/tex] we will call this the fundamental representation for the Fermionic CCRs. (This is the terminology I use, but I don’t know if it is standard or not.) Up to a change of basis it is clear that all other representations can be obtained by taking a tensor product of the fundamental representation with the trivial action on a [tex]d[/tex]-dimensional vector space.

We’ve now understood the fundamental mathematical fact about fermions: the mere existence of operators satisfying Fermionic canonical commutation relations completely determines the action of those operators with respect to some suitable orthonormal occupation number basis. That’s a very strong statement! In the next post we’ll use this technology to study a problem of direct physical interest: finding the energy spectrum and eigenstates of a Hamiltonian which is quadratic in terms of Fermi operators.


Expander graphs V: random walks on expanders

Back to expanders again. Today’s post looks at a simple application of expanders, showing that a random walk on an expander graph is likely to quickly escape from any sufficiently small subset of vertices. Intuitively, of course, this result is not surprising, but the exact quantitative form of the result turns out to be extremely useful in the next post, which is about decreasing the number of random bits used by a randomized algorithm.

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Random walks on expanders

Many applications of expanders involve doing a random walk on the expander. We start at some chosen vertex, and then repeatedly move to any one of the [tex]d[/tex] neighbours, each time choosing a neighbour uniformly at random, and independently of prior choices.

To describe this random walk, suppose at some given time we have a probability distribution [tex]p[/tex] describing the probability of being at any given vertex in the graph [tex]G[/tex]. We then apply one step of the random walk procedure described above, i.e., selecting a neighbour of the current vertex uniformly at random. The updated probability distribution is easily verified to be:

[tex] \tilde p = \frac{A(G)}{d} p. [/tex]

That is, the Markov transition matrix describing this random walk is just [tex]\hat A(G) \equiv A(G)/d[/tex], i.e., up to a constant of proportionality the transition matrix is just the adjacency matrix. This relationship between the adjacency matrix and random walks opens up a whole new world of connections between graphs and Markov chains.

One of the most important connections is between the eigenvalues of Markov transition matrices and the rate at which the Markov chain converges to its stationary distribution. In particular, the following beautiful theorem tells us that when the uniform distribution is a stationary distribution for the chain, then the Markov chain converges to the uniform distribution exponentially quickly, at a rate determined by the second largest eigenvalue of the transition matrix [tex]M[/tex].

Exercise: Show that if [tex]M[/tex] is a normal transition matrix for a Markov chain then [tex]1 = \lambda_1(M) \geq \lambda_2(M) \geq \ldots[/tex].

Theorem: Suppose [tex]M[/tex] is a normal transition matrix for a Markov chain on [tex]n[/tex] states, with the uniform distribution [tex]u = \vec 1/n[/tex] as a stationary point, [tex]M u = u[/tex]. Then for any starting distribution [tex]p[/tex],

[tex] \| M^t p - u \|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

where [tex]\| \cdot \|_1[/tex] denotes the [tex]l_1[/tex] norm.

The normality condition in this theorem may appear a little surprising. The reason it’s there is to ensure that [tex]M[/tex] can be diagonalized. The theorem can be made to work for general [tex]M[/tex], with the second largest eigenvalue replaced by the second largest singular value. However, in our situation [tex]M[/tex] is symmetric, and thus automatically normal, and we prefer the statement in terms of eigenvalues, since it allows us to make a connection to the expansion parameter of a graph. In particular, when [tex]M = \hat A(G)[/tex] we obtain:

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(\frac{\lambda_2(G)}{d}\right)^t. [/tex]

Combining this with our earlier results connecting the gap to the expansion parameter, we deduce that

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(1-\frac{h(G)^2}{2d^2}\right)^t. [/tex]

Thus, for a family of expander graphs, the rate of convergence of the Markov chain is exponentially fast in the number of time steps [tex]t[/tex].
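Here is a quick numerical illustration (Python with numpy; the small circulant graph below is just a convenient symmetric example, not a true expander family), comparing the actual [tex]l_1[/tex] distance to the bound [tex]\sqrt{n}\, (\lambda_2(G)/d)^t[/tex]:

```python
import numpy as np

n, d = 12, 4
# A 4-regular circulant graph: each vertex joined to i-2, i-1, i+1, i+2.
A = np.zeros((n, n))
for i in range(n):
    for j in (i - 2, i - 1, i + 1, i + 2):
        A[i, j % n] = 1.0
Ahat = A / d                                   # Markov transition matrix

u = np.ones(n) / n
lam2 = np.sort(np.linalg.eigvalsh(Ahat))[-2]   # second largest eigenvalue

p = np.zeros(n); p[0] = 1.0                    # start the walk at vertex 0
for t in range(1, 16):
    p = Ahat @ p
    gap = np.linalg.norm(p - u, 1)
    bound = np.sqrt(n) * lam2 ** t
    assert gap <= bound + 1e-12
print(f"after t=15 steps: ||p - u||_1 = {gap:.2e} <= bound {bound:.2e}")
```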

Exercise: Suppose [tex]M[/tex] is a transition matrix for a Markov chain. Show that the uniform distribution [tex]u[/tex] is a stationary point for the chain, i.e., [tex]Mu = u[/tex], if and only if [tex]M[/tex] is doubly stochastic, i.e., has non-negative entries, and all rows and columns of the matrix sum to [tex]1[/tex].

Proof: We start by working with the [tex]l_2[/tex] norm [tex]\| \cdot \|_2[/tex]. Since [tex]Mu = u[/tex] we have [tex]M^t u = u[/tex], and so:

[tex] \|M^t p - u \|_2 = \|M^t(p-u) \|_2. [/tex]

A computation shows that [tex]p-u[/tex] is orthogonal to [tex]u[/tex]. But [tex]u[/tex] is an eigenvector of [tex]M[/tex] with the maximum eigenvalue, [tex]1[/tex], and thus [tex]p-u[/tex] must lie in the span of the eigenspaces with eigenvalues [tex]\lambda_2(M),\lambda_3(M),\ldots[/tex]. It follows that

[tex] \|M^t(p-u)\|_2 \leq \lambda_2(M)^t \|p-u\|_2 \leq \lambda_2(M)^t, [/tex]

where we used the fact that [tex]\| p-u\|_2 \leq 1[/tex], easily established by observing that [tex]\|p-u\|_2[/tex] is convex in [tex]p[/tex], and thus must be maximized at an extreme point in the space of probability distributions; the symmetry of [tex]u[/tex] ensures that without loss of generality we may take [tex]p = (1,0,\ldots,0)[/tex]. To convert this into a result about the [tex]l_1[/tex] norm, we use the fact that in [tex]n[/tex] dimensions [tex]\|v\|_1 \leq \sqrt{n} \|v\|_2[/tex], and thus we obtain

[tex] \|M^t(p-u)\|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

which was the desired result. QED

What other properties do random walks on expanders have? We now prove another beautiful theorem which tells us that they “move around quickly”, in the sense that they are exponentially unlikely to stay for long within a given subset of vertices, [tex]B[/tex], unless [tex]B[/tex] is very large.

More precisely, suppose [tex]B[/tex] is a subset of vertices, and we choose some vertex [tex]X_0[/tex] uniformly at random from the graph. Suppose we use [tex]X_0[/tex] as the starting point for a random walk, [tex]X_0,\ldots,X_t[/tex], where [tex]X_t[/tex] is the vertex after the [tex]t[/tex]th step. Let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j[/tex] in the range [tex]0,\ldots,t[/tex]. Then we will prove that:

[tex] \mbox{Pr}(B(t)) \leq \left( \frac{\lambda_2(G)}{d} + \frac{|B|}{n} \right)^t. [/tex]
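As a numerical check of this bound (Python with numpy; the same small circulant graph as in the earlier sketch stands in for an expander, and the subset [tex]B[/tex] is an arbitrary illustrative choice), using the projection identity [tex]\mbox{Pr}(B(t)) = \|(PMP)^t P u \|_1[/tex] from the proof below:

```python
import numpy as np

n, d = 12, 4
# The same small circulant graph as before (illustrative stand-in only).
A = np.zeros((n, n))
for i in range(n):
    for j in (i - 2, i - 1, i + 1, i + 2):
        A[i, j % n] = 1.0
M = A / d                                    # transition matrix A(G)/d
lam2 = np.sort(np.linalg.eigvalsh(M))[-2]    # lambda_2(G)/d

B = [0, 1]                                   # the subset we hope to escape
P = np.diag([1.0 if i in B else 0.0 for i in range(n)])

v = P @ (np.ones(n) / n)                     # uniform start, restricted to B
for t in range(1, 11):
    v = P @ (M @ v)                          # keep only walks staying in B
    prob_stay = v.sum()                      # Pr(B(t)) = ||(PMP)^t P u||_1
    bound = (lam2 + len(B) / n) ** t
    assert prob_stay <= bound + 1e-12
print(f"Pr(walk stays in B for 10 steps) = {prob_stay:.2e} <= {bound:.2e}")
```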

Provided [tex]\lambda_2(G)/d + |B|/n < 1[/tex], we get the desired exponential decrease in probability. For a family of expander graphs it follows that there is some constant [tex]\epsilon > 0[/tex] such that we get an exponential decrease for any [tex]B[/tex] such that [tex]|B|/n < \epsilon[/tex].

These results are special cases of the following more general theorem about Markov chains.

Theorem: Let [tex]X_0[/tex] be uniformly distributed on [tex]n[/tex] states, and let [tex]X_0,\ldots,X_t[/tex] be a time-homogeneous Markov chain with transition matrix [tex]M[/tex]. Suppose the uniform distribution [tex]u[/tex] is a stationary point of [tex]M[/tex], i.e., [tex]Mu = u[/tex]. Let [tex]B[/tex] be a subset of the states, and let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j \in 0,\ldots,t[/tex]. Then

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M) + \frac{|B|}{n} \right)^t. [/tex]

Proof: The first step in the proof is to observe that:

[tex] \mbox{Pr}(B(t)) = \|(PMP)^t P u \|_1, [/tex]

where the operator [tex]P[/tex] projects onto the vector space spanned by those basis vectors corresponding to elements of [tex]B[/tex]. This equation is not entirely obvious, and proving it is a good exercise for the reader.

The next step is to prove that [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], where the norm here is the operator norm. We will do this below, but note first that once this is done, the result follows, for we have

[tex] \mbox{Pr}(B(t)) = \| (PMP)^t P u \|_1 \leq \sqrt{n} \| (PMP)^t P u \|_2 [/tex]

by the standard inequality relating [tex]l_1[/tex] and [tex]l_2[/tex] norms, and thus

[tex] \mbox{Pr}(B(t)) \leq \sqrt{n} \| PMP \|^t \| P u \|_2, [/tex]

by definition of the operator norm, and finally

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M)+\frac{|B|}{n} \right)^t, [/tex]

where we used the assumed inequality for the operator norm, and the observation that [tex]\| P u \|_2 = \sqrt{|B|}/n \leq 1/\sqrt{n}[/tex].

To prove the desired operator norm inequality, [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], suppose [tex]v[/tex] is a normalized state such that [tex]\| PMP \| = |v^T PMP v|[/tex]. Decompose [tex]Pv = \alpha u + \beta u_\perp[/tex], where [tex]u_\perp[/tex] is a normalized state orthogonal to [tex]u[/tex]. Since [tex]\|P v \|_2 \leq \|v \|_2 = 1[/tex] we must have [tex]|\beta| \leq 1[/tex]. Furthermore, multiplying [tex]Pv = \alpha u + \beta u_\perp[/tex] on the left by [tex]nu^T[/tex] shows that [tex]\alpha = n u^T P v[/tex]. It follows that [tex]|\alpha|[/tex] is maximized by choosing [tex]v[/tex] to be uniformly distributed over [tex]B[/tex], from which it follows that [tex]|\alpha| \leq \sqrt{|B|}[/tex]. A little algebra shows that

[tex] v^T PMP v = \alpha^2 u^T M u + \beta^2 u_\perp^T M u_\perp. [/tex]

Applying [tex]|\alpha| \leq \sqrt{|B|}[/tex], [tex]u^T M u = u^Tu = 1/n[/tex], [tex]|\beta| \leq 1[/tex], and [tex]u_\perp^T M u_\perp \leq \lambda_2(M)[/tex] gives

[tex] v^T P M P v \leq \frac{|B|}{n} + \lambda_2(M), [/tex]

which completes the proof. QED

The results in today's post are elegant, but qualitatively unsurprising. (Of course, having elegant quantitative statements of results that are qualitatively clear is worthwhile in its own right!) In the next post we'll use these ideas to develop a genuinely surprising application of expanders, to reducing the number of random bits required by a probabilistic algorithm in order to achieve a desired success probability.


Quote

“People don’t always want what they need.” – Alan Kay
