Adios

I’m off to Boston for a few days, so blogging will be on hold until the middle of next week.

Published
Categorized as General

New paper: Operator quantum error correction

A new paper (postscript), joint with David Poulin, all about operator quantum error correction, a clever way (alas, not my idea!) of protecting quantum states against the effects of noise. The paper provides several sets of easily checkable conditions (both algebraic and information-theoretic) characterizing when operator error-correction is feasible.

Postscript: Dave Bacon has just put out a nice paper related to operator quantum error-correction, constructing some physically motivated examples of self-correcting quantum memories. This idea – that a quantum system can correct itself – is absolutely fascinating, and runs completely counter to the standard folk “wisdom” about quantum mechanics, namely that quantum states are delicate objects that are easily (and irreversibly) destroyed.


Appendix to my posts on Fermions and the Jordan-Wigner transform

This is a little appendix to my post about the consequences of the fermionic CCRs. The results described in the appendix are very well-known – they are taught in any undergrad quantum course – but I’m rather fond of the little proof given, and so am indulging myself by including it here. The results are used in the previous post.

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Appendix on mutually commuting observables

Any undergraduate quantum mechanics course covers the fact that a mutually commuting set of Hermitian operators possesses a common eigenbasis. Unfortunately, in my experience this fact is usually proved rather early on, and suffers from being presented in a slightly too elementary fashion, with inductive constructions of explicit basis sets and so on. The following proof is still elementary, but from a slightly more sophisticated perspective. It is, I like to imagine, rather more like what would be given in an advanced course in linear algebra, were linear algebraists to actually cover this kind of material. (They don’t, so far as I know, having other fish to fry.)

Suppose [tex]H_1,\ldots,H_n[/tex] are commuting Hermitian (indeed, normal suffices) operators with spectral decompositions:

[tex] H_j = \sum_{k} E_{jk} P_{jk}, [/tex]

where [tex]E_{jk}[/tex] are the eigenvalues of [tex]H_j[/tex], and [tex]P_{jk}[/tex] are the corresponding projectors. Since the [tex]H_j[/tex] commute, it is not difficult to verify that for any quadruple [tex]j,k,l,m[/tex] the operators [tex]P_{jk}[/tex] and [tex]P_{lm}[/tex] also commute. For a vector [tex]\vec k = (k_1,\ldots,k_n)[/tex] define the operator

[tex] P_{\vec k} \equiv P_{1 k_1} P_{2 k_2} \ldots P_{n k_n}. [/tex]

Note that the order of the operators on the right-hand side does not matter, since they all commute with one another. The following equations all follow easily by direct computation, the mutual commutativity of the [tex]P_{jk}[/tex] operators, and standard properties of the spectral decomposition:

[tex] P_{\vec k}^\dagger = P_{\vec k}; \,\,\,\, \sum_{\vec k} P_{\vec k} = I; \,\,\,\, P_{\vec k} P_{\vec l} = \delta_{\vec k \vec l} P_{\vec k}. [/tex]

Thus, the operators [tex]P_{\vec k}[/tex] form a complete set of orthogonal projectors. Furthermore, suppose we have [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex]. Then we will show that for any [tex]j[/tex] we have [tex]P_{jk_j} |\psi\rangle = |\psi\rangle[/tex], so [tex]|\psi\rangle[/tex] is an eigenstate of [tex]H_j[/tex] with eigenvalue [tex]E_{jk_j}[/tex]. This shows that the operators [tex]P_{\vec k}[/tex] project onto a complete set of orthogonal simultaneous eigenspaces for the [tex]H_j[/tex], and will complete the proof.

Our goal is to show that if [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex] then for any [tex]j[/tex] we have [tex]P_{jk_j} |\psi\rangle = |\psi\rangle[/tex]. To see this, simply multiply both sides of [tex]P_{\vec k} |\psi\rangle = |\psi\rangle[/tex] by [tex]P_{jk_j}[/tex], and observe that [tex]P_{jk_j} P_{\vec k} = P_{\vec k}[/tex]. This gives [tex]P_{\vec k}|\psi\rangle = P_{jk_j}|\psi\rangle[/tex]. But [tex]P_{\vec k}|\psi\rangle = |\psi\rangle[/tex], so we obtain [tex]|\psi\rangle = P_{j k_j}|\psi\rangle[/tex], which completes the proof.
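
For concreteness, here is a short numerical sketch of this construction in numpy (an illustrative check, with an arbitrary choice of commuting matrices built as polynomials in a common Hermitian matrix): it forms the products [tex]P_{\vec k}[/tex] and verifies the three displayed equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two commuting Hermitian operators: polynomials in a common
# Hermitian matrix H always commute with one another.
H = rng.normal(size=(4, 4))
H = H + H.T
H1 = H @ H + 2 * H
H2 = 3 * H @ H @ H - H
assert np.allclose(H1 @ H2, H2 @ H1)

def spectral_projectors(A, tol=1e-8):
    """Eigenvalues and spectral projectors of a Hermitian matrix,
    grouping numerically equal eigenvalues."""
    w, V = np.linalg.eigh(A)
    vals, projs = [], []
    i = 0
    while i < len(w):
        j = i
        while j < len(w) and abs(w[j] - w[i]) < tol:
            j += 1
        vecs = V[:, i:j]
        vals.append(w[i])
        projs.append(vecs @ vecs.conj().T)
        i = j
    return vals, projs

_, P1 = spectral_projectors(H1)
_, P2 = spectral_projectors(H2)

# The products P_{1 k1} P_{2 k2} are Hermitian, complete, and
# mutually orthogonal, exactly as claimed in the text.
prods = [p @ q for p in P1 for q in P2]
assert np.allclose(sum(prods), np.eye(4))            # completeness
for r, p in enumerate(prods):
    assert np.allclose(p, p.conj().T)                # Hermiticity
    for s, q in enumerate(prods):
        target = p if r == s else np.zeros_like(p)
        assert np.allclose(p @ q, target)            # orthogonality
```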


Fermions and the Jordan-Wigner transform III: Consequences of the Canonical Commutation Relations

Back to Fermions again! In today’s post we’ll show that the CCRs uniquely determine the action of the fermionic operators [tex]a_j[/tex], up to a choice of basis. Mathematically, the argument is somewhat detailed, but it’s also the kind of argument that rewards detailed study, especially if studied in conjunction with related topics, such as the representation theory of [tex]su(2)[/tex]. You’ll need to look elsewhere for that, however!

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Consequences of the fermionic CCRs

We will assume that the vector space [tex]V[/tex] is finite dimensional, and that there are [tex]n[/tex] operators [tex]a_1,\ldots,a_n[/tex] acting on [tex]V[/tex] and satisfying the Fermionic CCRs. We begin with a broad outline of the steps we’ll go through. On a first read, some of these steps may appear a little mysterious to the reader not familiar with representation theory. In particular, please don’t worry if you get a little stuck in your understanding of the outline at some points, as the exposition is very much at the bird’s-eye level, and not all detail is visible at that level. Nonetheless, the reason for including this broad outline is the belief that repeated study will pay substantial dividends, if it is read in conjunction with the more detailed exposition to follow, or similar material on, e.g., representations of the Lie algebra [tex]su(2)[/tex]. Indeed, the advantage of operating at the bird’s-eye level is that it makes it easier to see the connections between these ideas, and the use of similar ideas in other branches of representation theory.

  • We’ll start by showing that the operators [tex]a_j^\dagger a_j[/tex] are positive Hermitian operators with eigenvalues [tex]0[/tex] and [tex]1[/tex].
  • We’ll show that [tex]a_j[/tex] acts as a lowering operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j |\psi\rangle[/tex] vanishes.
  • Similarly, [tex]a_j^\dagger[/tex] acts as a raising operator for [tex]a_j^\dagger a_j[/tex], in the sense that if [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. If [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex], then [tex]a_j^\dagger |\psi\rangle[/tex] vanishes.
  • We prove that the operators [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of Hermitian matrices, and thus there exists a state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of [tex]a_j^\dagger a_j[/tex] for all values [tex]j=1,\ldots,n[/tex].
  • By raising and lowering the state [tex]|\psi\rangle[/tex] in all possible combinations, we’ll construct a set of [tex]2^n[/tex] orthonormal states which are simultaneous eigenstates of the [tex]a_j^\dagger a_j[/tex]. The corresponding vector of eigenvalues uniquely labels each state in this orthonormal basis.
  • Suppose the vector space spanned by these [tex]2^n[/tex] simultaneous eigenstates is [tex]W[/tex]. At this point, we know that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W[/tex] into [tex]W[/tex], and, indeed, we know everything about the action [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] have on [tex]W[/tex].
  • Suppose we define [tex]W_\perp[/tex] to be the orthocomplement of [tex]W[/tex] in [tex]V[/tex]. Then we’ll show that the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map [tex]W_\perp[/tex] into itself, and their restrictions to [tex]W_\perp[/tex] satisfy the Fermionic CCRs. We can then repeat the above procedure, and identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we know the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] exactly.
  • We iterate this procedure until [tex]W_\perp[/tex] is the trivial vector space, at which point it is no longer possible to continue. At this point we have established an orthonormal basis for the whole of [tex]V[/tex] with respect to which we can explicitly write down the action of both [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

Let’s go through each of these steps in more detail.

The [tex]a_j^\dagger a_j[/tex] are positive Hermitian with eigenvalues [tex]0[/tex] and [tex]1[/tex]: Observe that the [tex]a_j^\dagger a_j[/tex] are positive (and thus Hermitian) matrices. We will show that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], and thus the eigenvalues of [tex]a_j^\dagger a_j[/tex] are all [tex]0[/tex] or [tex]1[/tex].

To see this, observe that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j a_j^\dagger a_j = -(a_j^\dagger)^2 a_j^2 + a_j^\dagger a_j[/tex], where we used the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex]. Note also that [tex]a_j^2 = 0[/tex] by the CCR [tex]\{a_j,a_j\} = 0[/tex]. It follows that [tex](a_j^\dagger a_j)^2 = a_j^\dagger a_j[/tex], as claimed.

The [tex]a_j[/tex] are lowering operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Then we claim that [tex]a_j|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. To see that [tex]a_j|\psi\rangle[/tex] is normalized, note that [tex]\langle \psi|a_j^\dagger a_j |\psi\rangle = \langle \psi|\psi\rangle = 1[/tex], where we used the fact that [tex]|\psi\rangle[/tex] is an eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex] to establish the first equality. To see that it has eigenvalue [tex]0[/tex], note that [tex]a_j^\dagger a_j a_j|\psi\rangle = 0[/tex], since [tex]\{ a_j,a_j \} = 0[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Show that [tex]a_j |\psi\rangle = 0[/tex].

The [tex]a_j^\dagger[/tex] are raising operators: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]0[/tex]. Then we claim that [tex]a_j^\dagger|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex].

To see the normalization, we use the CCR [tex]\{ a_j,a_j^\dagger \} = I[/tex] to deduce [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = -\langle \psi|a_j^\dagger a_j|\psi\rangle + \langle \psi|\psi\rangle[/tex]. But [tex]a_j^\dagger a_j|\psi\rangle = 0[/tex], by the eigenvalue assumption, and [tex]\langle \psi|\psi\rangle = 1[/tex], whence [tex]\langle \psi|a_j a_j^\dagger |\psi\rangle = 1[/tex], which is the desired normalization condition.

To see that [tex]a_j^\dagger |\psi\rangle[/tex] is an eigenstate with eigenvalue [tex]1[/tex], use the CCR [tex]\{a_j,a_j^\dagger \} = I[/tex] to deduce that [tex]a_j^\dagger a_j a_j^\dagger |\psi\rangle = - a_j^\dagger a_j^\dagger a_j|\psi\rangle + a_j^\dagger|\psi\rangle = a_j^\dagger |\psi\rangle[/tex], where the final equality can be deduced either from the assumption that [tex]a_j^\dagger a_j |\psi\rangle = 0[/tex], or from the CCR [tex]\{ a_j^\dagger,a_j^\dagger \} = 0[/tex]. This is the desired eigenvalue equation for [tex]a_j^\dagger|\psi\rangle[/tex].

Exercise: Suppose [tex]|\psi\rangle[/tex] is a normalized eigenstate of [tex]a_j^\dagger a_j[/tex] with eigenvalue [tex]1[/tex]. Show that [tex]a_j^\dagger |\psi\rangle = 0[/tex].
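
For a single mode all of these facts can be checked concretely in the [tex]2 \times 2[/tex] representation [tex]a = |0\rangle\langle 1|[/tex]; the following numpy sketch is an illustrative check, not part of the proof:

```python
import numpy as np

# Single fermionic mode: a = |0><1| acting on C^2.
a = np.array([[0.0, 1.0],
              [0.0, 0.0]])
ad = a.conj().T                     # a† = |1><0|
I = np.eye(2)

# One-mode CCRs: {a, a†} = I and a² = 0.
assert np.allclose(a @ ad + ad @ a, I)
assert np.allclose(a @ a, 0)

# num = a†a satisfies num² = num, so its eigenvalues are 0 and 1.
num = ad @ a
assert np.allclose(num @ num, num)
assert np.allclose(np.linalg.eigvalsh(num), [0.0, 1.0])

# Lowering: a maps the eigenvalue-1 state |1> to the normalized
# eigenvalue-0 state |0>, and annihilates |0>.
zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.allclose(a @ one, zero)
assert np.allclose(a @ zero, 0)

# Raising: a† maps |0> to |1>, and annihilates |1>.
assert np.allclose(ad @ zero, one)
assert np.allclose(ad @ one, 0)
```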

The [tex]a_j^\dagger a_j[/tex] form a mutually commuting set of observables: To see this, let [tex]j \neq k[/tex], and apply the CCRs repeatedly to obtain [tex]a_j^\dagger a_j a_k^\dagger a_k = a_k^\dagger a_k a_j^\dagger a_j[/tex], which is the desired commutativity.

Existence of a common eigenstate: It is well known that a mutually commuting set of Hermitian operators possesses a common eigenbasis. This fact is usually taught in undergraduate quantum mechanics courses; for completeness, I’ve included a simple proof in an appendix to these notes. We won’t make use of the full power of this result here, but instead simply use the fact that there exists a normalized state [tex]|\psi\rangle[/tex] which is a simultaneous eigenstate of all the [tex]a_j^\dagger a_j[/tex] operators. In particular, for all [tex]j[/tex] we have:

[tex] a_j^\dagger a_j |\psi\rangle = \alpha_j |\psi\rangle, [/tex]

where for each [tex]j[/tex] either [tex]\alpha_j = 0[/tex] or [tex]\alpha_j = 1[/tex]. It will be convenient to assume that [tex]\alpha_j = 0[/tex] for all [tex]j[/tex]. This assumption can be made without loss of generality, by applying the lowering operator [tex]a_j[/tex] to [tex]|\psi\rangle[/tex] for each [tex]j[/tex] such that [tex]\alpha_j = 1[/tex], resulting in a normalized state [tex]|\mbox{vac}\rangle[/tex] such that [tex]a_j^\dagger a_j |\mbox{vac}\rangle = 0[/tex] for all [tex]j[/tex].

Defining an orthonormal basis: For any vector [tex]\alpha = (\alpha_1,\ldots,\alpha_n)[/tex], where each [tex]\alpha_j = 0[/tex] or [tex]1[/tex], define a corresponding state:

[tex] |\alpha \rangle \equiv (a_1^\dagger)^{\alpha_1} \ldots (a_n^\dagger)^{\alpha_n}|\mbox{vac}\rangle. [/tex]

It is clear that there are [tex]2^n[/tex] such states [tex]|\alpha\rangle[/tex], and that they form an orthonormal set spanning a subspace of [tex]V[/tex] that we shall call [tex]W[/tex].

The action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W[/tex]: How do [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] act on [tex]W[/tex]? Stated another way, how do they act on the orthonormal basis we have constructed for [tex]W[/tex], the states [tex]|\alpha\rangle[/tex]? Applying the CCRs and the definition of the states [tex]|\alpha\rangle[/tex] it is easy to verify that the action of [tex]a_j[/tex] is as follows:

  • Suppose [tex]\alpha_j = 0[/tex]. Then [tex]a_j|\alpha\rangle = 0[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]0[/tex]. Then [tex]a_j|\alpha\rangle = (-1)^{s_\alpha^j} |\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex].

The action of [tex]a_j^\dagger[/tex] on [tex]W[/tex] is similar:

  • Suppose [tex]\alpha_j = 0[/tex]. Let [tex]\tilde \alpha[/tex] be that vector which results when the [tex]j[/tex]th entry of [tex]\alpha[/tex] is changed to [tex]1[/tex]. Then [tex]a_j^\dagger|\alpha\rangle = (-1)^{s_\alpha^j}|\tilde \alpha\rangle[/tex], where [tex]s_\alpha^j \equiv \sum_{k=1}^{j-1} \alpha_k[/tex].
  • Suppose [tex]\alpha_j = 1[/tex]. Then [tex]a_j^\dagger |\alpha\rangle = 0[/tex].
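
These rules can be verified numerically. The sketch below is an illustrative check: it uses the concrete representation [tex]a_j = Z^{\otimes (j-1)} \otimes |0\rangle\langle 1| \otimes I^{\otimes (n-j)}[/tex] (a preview of the Jordan-Wigner form this series is building towards), constructs the states [tex]|\alpha\rangle[/tex] for [tex]n = 3[/tex], and checks the sign rule [tex](-1)^{s_\alpha^j}[/tex]:

```python
import itertools
from functools import reduce
import numpy as np

n = 3
lower = np.array([[0.0, 1.0], [0.0, 0.0]])   # single-mode a = |0><1|
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def kron_all(ops):
    return reduce(np.kron, ops)

# a_j = Z ⊗ ... ⊗ Z ⊗ |0><1| ⊗ I ⊗ ... ⊗ I, with j-1 leading Z's.
a = [kron_all([Z] * j + [lower] + [I2] * (n - j - 1)) for j in range(n)]

# Sanity check: these operators satisfy the fermionic CCRs.
for j in range(n):
    for k in range(n):
        assert np.allclose(a[j] @ a[k].conj().T + a[k].conj().T @ a[j],
                           np.eye(2 ** n) * (j == k))
        assert np.allclose(a[j] @ a[k] + a[k] @ a[j], 0)

vac = np.zeros(2 ** n); vac[0] = 1.0         # |vac> = |00...0>

def state(alpha):
    """|alpha> = (a_1†)^{alpha_1} ... (a_n†)^{alpha_n} |vac>."""
    psi = vac
    for j in reversed(range(n)):             # rightmost operator acts first
        if alpha[j]:
            psi = a[j].conj().T @ psi
    return psi

# Check the action of a_j on every basis state |alpha>.
for alpha in itertools.product([0, 1], repeat=n):
    for j in range(n):
        if alpha[j] == 0:
            assert np.allclose(a[j] @ state(alpha), 0)
        else:
            tilde = list(alpha); tilde[j] = 0
            s = sum(alpha[:j])               # s_alpha^j
            assert np.allclose(a[j] @ state(alpha),
                               (-1) ** s * state(tilde))
```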

Action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] on [tex]W_\perp[/tex]: We have described the action of the [tex]a_j[/tex] and the [tex]a_j^\dagger[/tex] on the subspace [tex]W[/tex]. What of the action of these operators on the remainder of [tex]V[/tex]? To answer that question, we first show that [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] map the orthocomplement [tex]W_\perp[/tex] into itself.

To see this, let [tex]|\psi\rangle \in W_\perp[/tex], and consider [tex]a_j|\psi\rangle[/tex]. We wish to show that [tex]a_j|\psi\rangle \in W_\perp[/tex] also, i.e., that for any [tex]|\phi\rangle \in W[/tex] we have [tex]\langle \phi|a_j|\psi\rangle = 0[/tex]. This follows easily by considering the complex conjugate quantity [tex]\langle \psi|a_j^\dagger |\phi\rangle[/tex], and observing that [tex]a_j^\dagger |\phi\rangle \in W[/tex], since [tex]|\phi\rangle \in W[/tex], and thus [tex]\langle \psi|a_j^\dagger |\phi\rangle = 0[/tex]. A similar argument shows that [tex]a_j^\dagger[/tex] maps [tex]W_\perp[/tex] into itself.

Consider now the operators [tex]\tilde a_j[/tex] obtained by restricting [tex]a_j[/tex] to [tex]W_\perp[/tex]. Provided [tex]W_\perp[/tex] is nontrivial it is clear that these operators satisfy the CCRs on [tex]W_\perp[/tex]. Repeating the above argument, we can therefore identify a [tex]2^n[/tex]-dimensional subspace of [tex]W_\perp[/tex] on which we can compute the action of [tex]\tilde a_j[/tex] and [tex]\tilde a_j^\dagger[/tex], and thus of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex].

We may iterate this procedure many times, but the fact that [tex]V[/tex] is finite dimensional means that the process must eventually terminate. At the point of termination we will have broken up [tex]V[/tex] as a direct sum of some finite number [tex]d[/tex] of mutually orthogonal [tex]2^n[/tex]-dimensional vector spaces, [tex]W_1,W_2,\ldots,W_d[/tex], and on each vector space we will have an orthonormal basis with respect to which the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is known precisely.

Stated another way, we can introduce an orthonormal basis [tex]|\alpha,k\rangle[/tex] for [tex]V[/tex], where [tex]\alpha[/tex] runs over all [tex]n[/tex]-bit vectors, and [tex]k = 1,\ldots,d[/tex], and such that the action of the [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] is to leave [tex]k[/tex] invariant, and to act on [tex]|\alpha\rangle[/tex] as described above. In this representation it is clear that [tex]V[/tex] can be regarded as a tensor product [tex]C^{2^n} \otimes C^d[/tex], with the action of [tex]a_j[/tex] and [tex]a_j^\dagger[/tex] trivial on the [tex]C^d[/tex] component. We will call this the occupation number representation for the Fermi algebra [tex]a_j[/tex].

It’s worth pausing to appreciate what has been achieved here: starting only from the CCRs for [tex]a_1,\ldots,a_n[/tex] we have proved that [tex]V[/tex] can be broken down into a tensor product of a [tex]2^n[/tex]-dimensional vector space and a [tex]d[/tex]-dimensional vector space, with the [tex]a_j[/tex]s acting nontrivially only on the [tex]2^n[/tex]-dimensional component. Furthermore, the action of the [tex]a_j[/tex]s is completely known. I think it’s quite remarkable that we can say so much: at the outset it wasn’t even obvious that the dimension of [tex]V[/tex] should be a multiple of [tex]2^n[/tex]!

When [tex]d=1[/tex] we will call this the fundamental representation for the Fermionic CCRs. (This is the terminology I use, but I don’t know if it is standard or not.) Up to a change of basis it is clear that all other representations can be obtained by taking a tensor product of the fundamental representation with the trivial action on a [tex]d[/tex]-dimensional vector space.

We’ve now understood the fundamental mathematical fact about fermions: the mere existence of operators satisfying Fermionic canonical commutation relations completely determines the action of those operators with respect to some suitable orthonormal occupation number basis. That’s a very strong statement! In the next post we’ll use this technology to study a problem of direct physical interest: finding the energy spectrum and eigenstates of a Hamiltonian which is quadratic in terms of Fermi operators.


Expander graphs V: random walks on expanders

Back to expanders again. Today’s post looks at a simple application of expanders, showing that a random walk on an expander graph is likely to quickly escape from any sufficiently small subset of vertices. Intuitively, of course, this result is not surprising, but the exact quantitative form of the result turns out to be extremely useful in the next post, which is about decreasing the number of random bits used by a randomized algorithm.

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

Random walks on expanders

Many applications of expanders involve doing a random walk on the expander. We start at some chosen vertex, and then repeatedly move to any one of the [tex]d[/tex] neighbours, each time choosing a neighbour uniformly at random, and independently of prior choices.

To describe this random walk, suppose at some given time we have a probability distribution [tex]p[/tex] describing the probability of being at any given vertex in the graph [tex]G[/tex]. We then apply one step of the random walk procedure described above, i.e., selecting a neighbour of the current vertex uniformly at random. The updated probability distribution is easily verified to be:

[tex] \tilde p = \frac{A(G)}{d} p. [/tex]

That is, the Markov transition matrix describing this random walk is just [tex]\hat A(G) \equiv A(G)/d[/tex], i.e., up to a constant of proportionality the transition matrix is just the adjacency matrix. This relationship between the adjacency matrix and random walks opens up a whole new world of connections between graphs and Markov chains.
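
As a minimal illustration (the [tex]4[/tex]-cycle is an arbitrary choice of [tex]2[/tex]-regular graph), the following numpy sketch confirms that [tex]\hat A(G) = A(G)/d[/tex] is doubly stochastic, and shows one step of the walk:

```python
import numpy as np

# Adjacency matrix of the 4-cycle 0-1-2-3-0, a 2-regular graph.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = 2
M = A / d                                  # transition matrix A(G)/d

# Doubly stochastic: every row and column sums to 1.
assert np.allclose(M.sum(axis=0), 1)
assert np.allclose(M.sum(axis=1), 1)

# Starting at vertex 0, one step puts probability 1/2 on each neighbour.
p = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(M @ p, [0.0, 0.5, 0.0, 0.5])

# The uniform distribution is a stationary point of the walk.
u = np.ones(4) / 4
assert np.allclose(M @ u, u)
```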

One of the most important connections is between the eigenvalues of Markov transition matrices and the rate at which the Markov chain converges to its stationary distribution. In particular, the following beautiful theorem tells us that when the uniform distribution is a stationary distribution for the chain, then the Markov chain converges to the uniform distribution exponentially quickly, at a rate determined by the second largest eigenvalue of the transition matrix [tex]M[/tex].

Exercise: Show that if [tex]M[/tex] is a normal transition matrix for a Markov chain then [tex]1 = \lambda_1(M) \geq \lambda_2(M) \geq \ldots[/tex].

Theorem: Suppose [tex]M[/tex] is a normal transition matrix for a Markov chain on [tex]n[/tex] states, with the uniform distribution [tex]u = \vec 1/n[/tex] as a stationary point, [tex]M u = u[/tex]. Then for any starting distribution [tex]p[/tex],

[tex] \| M^t p - u \|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

where [tex]\| \cdot \|_1[/tex] denotes the [tex]l_1[/tex] norm.

The normality condition in this theorem may appear a little surprising. The reason it’s there is to ensure that [tex]M[/tex] can be diagonalized. The theorem can be made to work for general [tex]M[/tex], with the second largest eigenvalue replaced by the second largest singular value. However, in our situation [tex]M[/tex] is symmetric, and thus automatically normal, and we prefer the statement in terms of eigenvalues, since it allows us to make a connection to the expansion parameter of a graph. In particular, when [tex]M = \hat A(G)[/tex] we obtain:

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(\frac{\lambda_2(G)}{d}\right)^t. [/tex]

Combining this with our earlier results connecting the gap to the expansion parameter, we deduce that

[tex] \| \hat A(G)^t p-u\|_1 \leq \sqrt{n} \left(1-\frac{h(G)^2}{2d^2}\right)^t. [/tex]

Thus, for a family of expander graphs, the rate of convergence of the Markov chain is exponentially fast in the number of time steps [tex]t[/tex].
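
This convergence is easy to see numerically. The sketch below uses the complete graph [tex]K_4[/tex] (an arbitrary choice), with the caveat that the bound is applied using the largest nontrivial eigenvalue in absolute value, which is the form needed for graphs whose adjacency matrices have negative eigenvalues:

```python
import numpy as np

n, d = 4, 3
A = np.ones((n, n)) - np.eye(n)        # adjacency matrix of K4
M = A / d
u = np.ones(n) / n

w = np.sort(np.linalg.eigvalsh(M))     # eigenvalues: -1/3, -1/3, -1/3, 1
mu = max(abs(w[:-1]))                  # largest nontrivial |eigenvalue|
assert np.isclose(mu, 1 / 3)

p = np.zeros(n); p[0] = 1.0            # walk starts at vertex 0
for t in range(1, 11):
    p = M @ p
    err = np.abs(p - u).sum()          # l1 distance to uniform
    assert err <= np.sqrt(n) * mu ** t + 1e-12
```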

Exercise: Suppose [tex]M[/tex] is a transition matrix for a Markov chain. Show that the uniform distribution [tex]u[/tex] is a stationary point for the chain, i.e., [tex]Mu = u[/tex], if and only if [tex]M[/tex] is doubly stochastic, i.e., has non-negative entries, and all rows and columns of the matrix sum to [tex]1[/tex].

Proof: We start by working with the [tex]l_2[/tex] norm [tex]\| \cdot \|_2[/tex]. Since [tex]Mu = u[/tex] we have [tex]M^t u = u[/tex], and so:

[tex] \|M^t p - u \|_2 = \|M^t(p-u) \|_2. [/tex]

A computation shows that [tex]p-u[/tex] is orthogonal to [tex]u[/tex]. But [tex]u[/tex] is an eigenvector of [tex]M[/tex] with the maximum eigenvalue, [tex]1[/tex], and thus [tex]p-u[/tex] must lie in the span of the eigenspaces with eigenvalues [tex]\lambda_2(M),\lambda_3(M),\ldots[/tex]. It follows that

[tex] \|M^t(p-u)\|_2 \leq \lambda_2(M)^t \|p-u\|_2 \leq \lambda_2(M)^t, [/tex]

where we used the fact that [tex]\| p-u\|_2 \leq 1[/tex], easily established by observing that [tex]\|p-u\|_2[/tex] is convex in [tex]p[/tex], and thus must be maximized at an extreme point in the space of probability distributions; the symmetry of [tex]u[/tex] ensures that without loss of generality we may take [tex]p = (1,0,\ldots,0)[/tex]. To convert this into a result about the [tex]l_1[/tex] norm, we use the fact that in [tex]n[/tex] dimensions [tex]\|v\|_1 \leq \sqrt{n} \|v\|_2[/tex], and thus we obtain

[tex] \|M^t(p-u)\|_1 \leq \sqrt{n} \lambda_2(M)^t, [/tex]

which was the desired result. QED

What other properties do random walks on expanders have? We now prove another beautiful theorem which tells us that they “move around quickly”, in the sense that they are exponentially unlikely to stay for long within a given subset of vertices, [tex]B[/tex], unless [tex]B[/tex] is very large.

More precisely, suppose [tex]B[/tex] is a subset of vertices, and we choose some vertex [tex]X_0[/tex] uniformly at random from the graph. Suppose we use [tex]X_0[/tex] as the starting point for a random walk, [tex]X_0,\ldots,X_t[/tex], where [tex]X_t[/tex] is the vertex after the [tex]t[/tex]th step. Let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j[/tex] in the range [tex]0,\ldots,t[/tex]. Then we will prove that:

[tex] \mbox{Pr}(B(t)) \leq \left( \frac{\lambda_2(G)}{d} + \frac{|B|}{n} \right)^t. [/tex]

Provided [tex]\lambda_2(G)/d + |B|/n < 1[/tex], we get the desired exponential decrease in probability. For a family of expander graphs it follows that there is some constant [tex]\epsilon > 0[/tex] such that we get an exponential decrease for any [tex]B[/tex] such that [tex]|B|/n < \epsilon[/tex]. These results are special cases of the following more general theorem about Markov chains.

Theorem: Let [tex]X_0[/tex] be uniformly distributed on [tex]n[/tex] states, and let [tex]X_0,\ldots,X_t[/tex] be a time-homogeneous Markov chain with transition matrix [tex]M[/tex]. Suppose the uniform distribution [tex]u[/tex] is a stationary point of [tex]M[/tex], i.e., [tex]Mu = u[/tex]. Let [tex]B[/tex] be a subset of the states, and let [tex]B(t)[/tex] be the event that [tex]X_j \in B[/tex] for all [tex]j \in 0,\ldots,t[/tex]. Then

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M) + \frac{|B|}{n} \right)^t. [/tex]

Proof: The first step in the proof is to observe that:

[tex] \mbox{Pr}(B(t)) = \|(PMP)^t P u \|_1, [/tex]

where the operator [tex]P[/tex] projects onto the vector space spanned by those basis vectors corresponding to elements of [tex]B[/tex]. This equation is not entirely obvious, and proving it is a good exercise for the reader.

The next step is to prove that [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], where the norm here is the operator norm. We will do this below, but note first that once this is done, the result follows, for we have

[tex] \mbox{Pr}(B(t)) = \| (PMP)^t P u \|_1 \leq \sqrt{n} \| (PMP)^t P u \|_2 [/tex]

by the standard inequality relating [tex]l_1[/tex] and [tex]l_2[/tex] norms, and thus

[tex] \mbox{Pr}(B(t)) \leq \sqrt{n} \| PMP \|^t \| P u \|_2, [/tex]

by definition of the operator norm, and finally

[tex] \mbox{Pr}(B(t)) \leq \left( \lambda_2(M)+\frac{|B|}{n} \right)^t, [/tex]

where we used the assumed inequality for the operator norm, and the observation that [tex]\| P u \|_2 = \sqrt{|B|}/n \leq 1/\sqrt{n}[/tex].
To prove the desired operator norm inequality, [tex]\| PMP \| \leq \lambda_2(M)+|B|/n[/tex], suppose [tex]v[/tex] is a normalized state such that [tex]\| PMP \| = |v^T PMP v|[/tex]. Decompose [tex]Pv = \alpha u + \beta u_\perp[/tex], where [tex]u_\perp[/tex] is a normalized state orthogonal to [tex]u[/tex]. Since [tex]\|P v \|_2 \leq \|v \|_2 = 1[/tex] we must have [tex]|\beta| \leq 1[/tex]. Furthermore, multiplying [tex]Pv = \alpha u + \beta u_\perp[/tex] on the left by [tex]nu^T[/tex] shows that [tex]\alpha = n u^T P v[/tex]. It follows that [tex]|\alpha|[/tex] is maximized by choosing [tex]v[/tex] to be uniformly distributed over [tex]B[/tex], from which it follows that [tex]|\alpha| \leq \sqrt{|B|}[/tex]. A little algebra shows that

[tex] v^T PMP v = \alpha^2 u^T M u + \beta^2 u_\perp^T M u_\perp. [/tex]

Applying [tex]|\alpha| \leq \sqrt{|B|}[/tex], [tex]u^T M u = u^Tu = 1/n[/tex], [tex]|\beta| \leq 1[/tex], and [tex]u_\perp^T M u_\perp \leq \lambda_2(M)[/tex] gives

[tex] v^T P M P v \leq \frac{|B|}{n} + \lambda_2(M), [/tex]

which completes the proof. QED

The results in today's post are elegant, but qualitatively unsurprising. (Of course, having elegant quantitative statements of results that are qualitatively clear is worthwhile in its own right!) In the next post we'll use these ideas to develop a genuinely surprising application of expanders, to reducing the number of random bits required by a probabilistic algorithm in order to achieve a desired success probability.
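
Before moving on, both the identity [tex]\mbox{Pr}(B(t)) = \|(PMP)^t P u\|_1[/tex] and the final bound can be checked numerically; the sketch below uses the walk on [tex]K_4[/tex] with [tex]B = \{0,1\}[/tex] (arbitrary choices for illustration), again taking [tex]\lambda_2(M)[/tex] in absolute value since [tex]K_4[/tex] has negative eigenvalues:

```python
import numpy as np

n, d = 4, 3
M = (np.ones((n, n)) - np.eye(n)) / d     # walk on K4
u = np.ones(n) / n

B = [0, 1]                                # the subset of vertices
P = np.zeros((n, n))
for b in B:
    P[b, b] = 1.0                         # projector onto B

w = np.sort(np.linalg.eigvalsh(M))
lam2 = max(abs(w[:-1]))                   # |lambda_2(M)| = 1/3 for K4

for t in range(1, 8):
    pr = np.abs(np.linalg.matrix_power(P @ M @ P, t) @ (P @ u)).sum()
    # Direct reasoning for K4: the walk starts in B with probability
    # 1/2 and stays in B with probability 1/3 at each step.
    assert np.isclose(pr, 0.5 * (1 / 3) ** t)
    assert pr <= (lam2 + len(B) / n) ** t
```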


Quote

“People don’t always want what they need.” – Alan Kay


Fermions and the Jordan-Wigner transform II: the Canonical Commutation Relations

Back to fermions today, with a post introducing the canonical commutation relations for fermions, and discussing something of their mathematical and physical significance. The next post will get more into the meat, describing the consequences these commutation relations have.

Note: This post is one in a series describing fermi algebras, and a powerful tool known as the Jordan-Wigner transform, which allows one to move back and forth between describing a system as a collection of qubits, and as a collection of fermions. The posts assume familiarity with elementary quantum mechanics, comfort with elementary linear algebra (but not advanced techniques), and a little familiarity with the basic nomenclature of quantum information science (qubits, the Pauli matrices).

Fermions

The canonical commutation relations for Fermions

Suppose we have a set of operators [tex]a_1,\ldots,a_n[/tex] acting on some Hilbert space [tex]V[/tex]. Then we say that these operators satisfy the canonical commutation relations (CCRs) for Fermions if they satisfy the equations

[tex] \{ a_j, a_k^\dagger \} = \delta_{jk} I; \,\,\,\, \{ a_j,a_k \} = 0, [/tex]

where [tex]\{ A, B \} \equiv AB+BA[/tex] is the anticommutator. Note that when we take the Hermitian conjugate of the second of these relations we obtain [tex]\{ a_j^\dagger, a_k^\dagger \} = 0[/tex], which is sometimes also referred to as one of the CCRs. It is also frequently useful to set [tex]j = k[/tex] in the second relation, giving [tex]a_j^2 = (a_j^\dagger)^2 = 0[/tex].

How should one understand the CCRs? One way of thinking about the CCRs is in an axiomatic mathematical sense. In this way of thinking they are purely mathematical conditions that can be imposed on a set of matrices: for a given set of matrices, we can simply check and verify whether those matrices satisfy or do not satisfy the CCRs. For example, when the state space [tex]V[/tex] is that of a single qubit, we can easily verify that the operator [tex]a = |0\rangle \langle 1|[/tex] satisfies the Fermionic CCRs. From this axiomatic point of view the question to ask is what consequences about the structure of [tex]V[/tex] and the operators [tex]a_j[/tex] can be deduced from the fact that the CCRs hold.
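The single-qubit example is easy to verify directly. Here's a quick numerical sketch, writing [tex]|0\rangle \langle 1|[/tex] as a matrix in the computational basis:

```python
import numpy as np

# Single fermionic mode on one qubit: a = |0><1| in the basis {|0>, |1>}.
a = np.array([[0.0, 1.0],
              [0.0, 0.0]])

def anticomm(A, B):
    """Anticommutator {A, B} = AB + BA."""
    return A @ B + B @ A

# Check the Fermionic CCRs for n = 1: {a, a^dagger} = I and {a, a} = 0.
print(np.allclose(anticomm(a, a.conj().T), np.eye(2)))  # True
print(np.allclose(anticomm(a, a), 0))                   # True
```

The second check is just the statement [tex]a^2 = 0[/tex] in disguise, since [tex]\{a,a\} = 2a^2[/tex].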

There’s also a more sophisticated (but still entirely mathematical) way of understanding the CCRs, as an instance of the relationship between abstract algebraic objects (such as groups, Lie algebras, or Hopf algebras), and their representations as linear maps on vector spaces. My own knowledge of representation theory is limited to a little representation theory of finite groups and of Lie algebras, and I certainly do not see the full context in the way an expert on representation theory would. However, even with that limited background, one can see that there are common themes and techniques: what may appear to be an isolated technique or trick is often really an instance of a much deeper idea or set of ideas that become obvious once one has enough broad familiarity with representation theory. I’m not going to pursue this point of view in these notes, but thought it worth mentioning for the sake of giving context and motivation to the study of other topics.

Finally, there’s a physical way in which we can understand the CCRs. When we want to describe a system containing Fermions, one way to begin is to write down a set of operators satisfying the CCRs, and then to try to guess what sort of Hamiltonian involving those operators could describe the interactions observed in the system, often motivated by classical considerations, or other rules of thumb. This is, for example, the sort of point of view pursued in the BCS theory of superconductivity, and which people are trying to pursue in understanding high temperature superconductors.

Of course, one can ask why physicists want to use operators satisfying the Fermionic CCRs to describe a system of Fermions, or why anyone, mathematician or physicist, would ever write down the CCRs in the first place. These are good questions, which I’m not going to try to answer here, although one or both questions might make a good subject for some future notes. (It is, of course, a lot easier to answer these questions once you understand the material I present here.)

Instead, I’m going to approach the Fermionic CCRs from a purely mathematical point of view, asking the question “What can we deduce from the fact that a set of operators satisfying the CCRs exists?” The surprising answer is that we can deduce quite a lot about the structure of [tex]V[/tex] and the operators [tex]a_j[/tex] simply from the fact that the [tex]a_j[/tex] satisfy the canonical commutation relations!
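As a concrete sanity check that such operators exist for every [tex]n[/tex], the sketch below builds the operators given by the Jordan-Wigner construction that this series of posts is heading towards ([tex]\sigma^-[/tex] on the [tex]j[/tex]th qubit, preceded by [tex]Z[/tex] matrices on the earlier qubits), and verifies the CCRs numerically. The construction itself is described properly in later posts; take this as a preview, not a derivation:

```python
import numpy as np
from functools import reduce
from itertools import product

Z = np.diag([1.0, -1.0])                      # Pauli Z
I2 = np.eye(2)
sm = np.array([[0.0, 1.0], [0.0, 0.0]])       # sigma^- = |0><1|

def jw(j, n):
    """Jordan-Wigner operator a_j on n qubits: Z on qubits < j, sigma^- on qubit j."""
    factors = [Z] * j + [sm] + [I2] * (n - j - 1)
    return reduce(np.kron, factors)

def anticomm(A, B):
    return A @ B + B @ A

# Verify {a_j, a_k^dagger} = delta_jk I and {a_j, a_k} = 0 for n = 3 modes.
n = 3
a = [jw(j, n) for j in range(n)]
ok = all(
    np.allclose(anticomm(a[j], a[k].conj().T), (j == k) * np.eye(2 ** n))
    and np.allclose(anticomm(a[j], a[k]), 0)
    for j, k in product(range(n), repeat=2)
)
print(ok)  # True
```

The string of [tex]Z[/tex]s is what makes operators on *different* qubits anticommute rather than commute, since [tex]Z \sigma^- = -\sigma^- Z[/tex] on a single qubit.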

We’ll take this subject up in the next post, where we’ll show that the CCRs essentially uniquely determine the action of the [tex]a_j[/tex]s, up to a choice of basis.

Published
Categorized as General

Expander graphs IV: Expansion and the eigenvalue gap

I return to expanders today, with a discussion of expansion and eigenvalue gaps. Thanks to all the commenters on earlier posts for all the proof ideas and simplifications!

Note: This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses. The posts require a little background in graph theory, computer science, linear algebra and Markov chains (all at about the level of a first course) to be comprehensible. I am not an expert on expanders, and the posts are just an introduction. They are mostly based on some very nice 2003 lecture notes by Nati Linial and Avi Wigderson, available on the web at http://www.math.ias.edu/~boaz/ExpanderCourse/.

In today’s post we really reap the benefits of recasting graph theory in linear algebraic terms, showing that the expansion parameter can be related to the gap between the largest and second largest eigenvalue of the graph. This ought to surprise you, perhaps even shock you. It’s a highly non-trivial result, with the big payoff of relating something we didn’t really understand all that well (the expansion parameter) to eigenvalue gaps, about which an enormous amount is known.

Expansion and the eigenvalue gap

Let’s return our attention to expander graphs, and see what the eigenvalues of a graph have to do with its expansion parameter. We define the gap for the graph [tex]G[/tex] to be the difference [tex]\Delta(G) \equiv \lambda_1(G)-\lambda_2(G)[/tex] between the largest and second-largest eigenvalues. The expansion parameter and the gap are connected by the following theorem:

Theorem: The expansion parameter [tex]h(G)[/tex] for a [tex]d[/tex]-regular graph [tex]G[/tex] is related to the gap [tex]\Delta(G)[/tex] by:

[tex] \frac{\Delta(G)}{2} \leq h(G) \leq \sqrt{2d \Delta(G)}. [/tex]

Thus, properties of the eigenvalue gap can be used to deduce properties of the expansion parameter. For example, if the eigenvalue gap for a family of [tex]d[/tex]-regular graphs is bounded below by a positive constant, then the expansion parameter must also be bounded below by a positive constant, and so the family is an expander.

One reason for finding the connection between the gap and the expansion parameter interesting is that it is far easier to estimate the gap of an [tex]n[/tex] by [tex]n[/tex] matrix than it is to enumerate the exponentially many subsets [tex]S[/tex] of the vertex set [tex]V[/tex], and compute [tex]|\partial S|/|S|[/tex] for each one.
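The contrast is easy to see in code. The sketch below (using a 6-cycle as the example graph, my choice rather than one from the notes) computes the gap from the adjacency matrix's eigenvalues, brute-forces the expansion parameter over all subsets, and checks the theorem's two-sided bound:

```python
import numpy as np
from itertools import combinations

# A 6-cycle: a 2-regular graph, small enough to brute-force.
n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

# The gap: one eigenvalue computation on an n x n matrix.
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
gap = eigs[0] - eigs[1]                       # Delta(G) = lambda_1 - lambda_2

# The expansion parameter: minimize |boundary(S)|/|S| over exponentially
# many subsets with |S| <= n/2 -- feasible only for tiny graphs.
def boundary(S):
    S = set(S)
    return sum(1 for i in S for j in range(n) if A[i, j] and j not in S)

h = min(boundary(S) / len(S)
        for k in range(1, n // 2 + 1)
        for S in combinations(range(n), k))

# Theorem: Delta/2 <= h <= sqrt(2 d Delta).
print(gap / 2 <= h <= np.sqrt(2 * d * gap))  # True
```

For the 6-cycle the minimizing subset is three contiguous vertices, giving [tex]h(G) = 2/3[/tex], while the gap is [tex]\Delta(G) = 2 - 1 = 1[/tex].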

Proof discussion: We already understand that [tex]\lambda_1(G) = d[/tex] for a [tex]d[/tex]-regular graph [tex]G[/tex], with eigenvector [tex]\vec 1 = (1,1,\ldots,1)[/tex]. So we’ll concentrate on trying to understand the behaviour of the second largest eigenvalue, [tex]\lambda_2(G)[/tex]. The theorem tells us that the difference between [tex]d[/tex] and [tex]\lambda_2(G)[/tex] is controlled both above and below by the expansion parameter [tex]h(G)[/tex].

How can we get control over the second largest eigenvalue of [tex]G[/tex]? One way is to observe that [tex]\lambda_2(G)[/tex] is just the maximum of the expression [tex]v^T A v / v^T v[/tex], where [tex]A[/tex] is the adjacency matrix of [tex]G[/tex], and we maximize over all vectors [tex]v[/tex] orthogonal to the eigenvector [tex]\vec 1[/tex]. An encouraging fact is that this expression is quite easy to deal with, because the condition that [tex]v[/tex] be orthogonal to [tex]\vec 1[/tex] is actually equivalent to the sum of [tex]v[/tex]’s entries being equal to [tex]0[/tex], so we have

[tex] \lambda_2(G) = \max_{v: {\rm tr}(v) = 0} \frac{v^T A v}{v^T v}, [/tex]

where [tex]{\rm tr}(v)[/tex] is just the sum of the entries of the vector [tex]v[/tex].

We’re going to provide a lower bound on [tex]\lambda_2(G)[/tex] by simply guessing a good choice of [tex]v[/tex] satisfying [tex]\mbox{tr}(v) = 0[/tex], and using the fact that

[tex] \lambda_2(G) \geq \frac{v^T A v}{v^T v}. [/tex]

To make a good guess, it helps to have a way of thinking about expressions like [tex]v^T A v[/tex], where [tex]\mbox{tr}(v) = 0[/tex]. A convenient way of thinking is to rewrite [tex]v[/tex] as the difference of two disjoint probability distributions, [tex]p[/tex] and [tex]q[/tex], i.e., [tex]v = p-q[/tex], where [tex]p[/tex] and [tex]q[/tex] are non-negative vectors each summing to [tex]1[/tex], and with disjoint support. This results in terms like [tex]p^T A q[/tex], which we can think of in terms of transition probabilities between [tex]q[/tex] and [tex]p[/tex]. This will allow us to apply the expansion properties of the graph.

Let’s make these ideas a little more concrete. The key is to define [tex]\vec 1_S[/tex] to be the vector whose entries are [tex]1[/tex] on [tex]S[/tex], and [tex]0[/tex] elsewhere, and to observe that

[tex] \vec 1_S^T A \vec 1_T = |E(S,T)|, [/tex]

where [tex]|E(S,T)|[/tex] is the number of edges between the vertex sets [tex]S[/tex] and [tex]T[/tex]. This suggests that we should choose [tex]p[/tex] and [tex]q[/tex] in terms of vectors like [tex]\vec 1_S[/tex], since it will enable us to relate expressions like [tex]v^T A v[/tex] to the sizes of various edge sets, which, in turn, can be related to the expansion parameter.
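This identity is easy to check numerically when [tex]S[/tex] and [tex]T[/tex] are disjoint (if they overlap, edges inside the intersection get counted twice). A sketch on a 6-cycle, my choice of example:

```python
import numpy as np

# 6-cycle with edges 0-1-2-3-4-5-0.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

def indicator(S):
    """The vector 1_S: entries 1 on S, 0 elsewhere."""
    v = np.zeros(n)
    v[list(S)] = 1
    return v

# The edges between S and T are (2,3) and (5,0), so |E(S,T)| = 2.
S, T = {0, 1, 2}, {3, 4, 5}
print(indicator(S) @ A @ indicator(T))  # 2.0
```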

Suppose in particular that we choose

[tex] v = \frac{\vec 1_S}{|S|}-\frac{\vec 1_{\overline S}}{|\overline S|}. [/tex]

This satisfies the condition [tex]\mbox{tr}(v) = 0[/tex], and gives

[tex] v^T v = \frac{1}{|S|}+\frac{1}{|\overline S|} [/tex]

and

[tex] v^T A v = \frac{1}{|S|^2} E(S,S) + \frac{1}{|\overline S|^2} E(\overline S,\overline S) - \frac{2}{|S||\overline S|} E(S,\overline S). [/tex]

The definition of an expander graph gives us control over [tex]E(S,\overline S)[/tex], so it is convenient to rewrite [tex]E(S,S)[/tex] and [tex]E(\overline S,\overline S)[/tex] in terms of [tex]E(S,\overline S)[/tex], using the [tex]d[/tex]-regularity of the graph:

[tex] E(S,S)+E(S,\overline S) = d |S|; \,\,\, E(\overline S,\overline S)+ E(S,\overline S) = d |\overline S|. [/tex]

A straightforward substitution and a little algebra gives:

[tex] v^T A v = d \left( \frac{1}{|S|}+\frac{1}{|\overline S|} \right) - \left(\frac{1}{|S|}+\frac{1}{|\overline S|}\right)^2 E(S,\overline S). [/tex]

Comparing with the earlier expression for the denominator [tex]v^T v[/tex], we obtain

[tex] \lambda_2(G) \geq d-\left(\frac{1}{|S|}+\frac{1}{|\overline S|}\right) E(S,\overline S). [/tex]

Now choose [tex]S[/tex] so that [tex]E(S,\overline S) = h(G) |S|[/tex], and [tex]|S| \leq n/2[/tex], giving after a little algebra

[tex] \lambda_2(G) \geq d-2 h(G), [/tex]

and thus

[tex] \frac{\Delta(G)}{2} \leq h(G), [/tex]

which was the first of the two desired inequalities in the theorem.
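The steps above are easy to verify numerically. The sketch below (again on a 6-cycle, my choice of example) builds the test vector [tex]v[/tex], checks that its Rayleigh quotient equals [tex]d - (1/|S|+1/|\overline S|) E(S,\overline S)[/tex], and confirms that this quantity lower-bounds [tex]\lambda_2(G)[/tex]:

```python
import numpy as np

# 6-cycle (2-regular), S = first three vertices.
n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

S = [0, 1, 2]
Sbar = [i for i in range(n) if i not in S]

# Test vector v = 1_S/|S| - 1_Sbar/|Sbar|; note tr(v) = 0.
v = np.zeros(n)
v[S] = 1 / len(S)
v[Sbar] = -1 / len(Sbar)

E_cross = sum(A[i, j] for i in S for j in Sbar)      # |E(S, Sbar)| = 2
rayleigh = (v @ A @ v) / (v @ v)
bound = d - (1 / len(S) + 1 / len(Sbar)) * E_cross   # d - (1/|S| + 1/|Sbar|) E

lam2 = np.sort(np.linalg.eigvalsh(A))[-2]
print(abs(rayleigh - bound) < 1e-9, bound <= lam2)  # True True
```

Here [tex]\lambda_2 = 1[/tex] while the bound is [tex]2 - (2/3) \cdot 2 = 2/3[/tex], which is also exactly [tex]d - 2h(G)[/tex] for this graph, since the minimizing [tex]S[/tex] satisfies [tex]|S| = n/2[/tex].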

The proof of the second inequality is a little more complicated. Unfortunately, I haven’t managed to boil the proof down to a form that I’m really happy with, and for this reason I won’t describe the details. If you’re interested, you should try to prove it yourself, or refer to the notes of Linial and Wigderson.

QED

Problem: Can we generalize this result so that it applies to a general undirected graph [tex]G[/tex], not just to [tex]d[/tex]-regular graphs? Can we prove an analogous statement for directed graphs, perhaps in terms of singular values? Can we define a generalized notion of “expansion” which can be applied to any symmetric matrix [tex]A[/tex] with non-negative entries, and connect that notion of expansion to the eigenvalue gap? Can we generalize even further? What happens if we change the field over which the matrix is considered?

Published
Categorized as General

Interdisciplinary research

Interdisciplinary research: Building off a provocative post by Eszter Hargittai at Crooked Timber, Cosma Shalizi posts some interesting thoughts on how to do (and how not to do) interdisciplinary scientific research. Look for the followups too, including the link to Aaron Clauset’s post.

How does quantum information and computation stack up in this regard? One healthy thing is that quite a few computer scientists attend conferences that are predominantly physics-oriented. Conversely, so far as I know there’s only one regular conference in the field that’s predominantly computer science oriented (QIP), but a reasonable number of physicists attend it.

Published
Categorized as General

Religious wars

I like Macs. I always have, ever since the day I first saw MacPaint running on one of the original Macs (I believe it had 128k of memory!)

But I do have a question for all the Mac zealots out there. If System 10 is so terrific, then why did it take the brilliant engineers at Apple two decades to figure out that running Unix was a good idea? Unix hasn’t changed so much in that time.

Published
Categorized as General