The Solovay-Kitaev algorithm

New paper: “The Solovay-Kitaev algorithm”, jointly with Chris Dawson, submitted as a tutorial / review article to the journal “Quantum Information and Computation”. As I write, I expect the paper to appear on the arXiv preprint server this coming Monday.

The Solovay-Kitaev theorem is an important result in quantum computing. In a classical computer, it’s well-known that the same computation can be built up out of many different basic gate sets. For example, you can build a circuit to add two numbers either out of OR and NOT gates, or out of NAND gates alone. Furthermore, you can use OR and NOT gates to simulate NAND gates (or vice versa), so, roughly speaking, these two different circuits can have the same size, up to some constant of proportionality.
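To make the gate-simulation idea concrete, here is a tiny Python check (my own illustration, not from the paper) that a NAND gate can be built from OR and NOT via De Morgan’s law, so any NAND circuit can be rewritten in terms of OR and NOT with constant overhead:

    # NAND(a, b) = OR(NOT a, NOT b): De Morgan's law, verified over all inputs.
    def NOT(a): return 1 - a
    def OR(a, b): return a | b
    def NAND(a, b): return 1 - (a & b)

    for a in (0, 1):
        for b in (0, 1):
            assert NAND(a, b) == OR(NOT(a), NOT(b))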

In the case of quantum computers, things are more complicated. The set of possible quantum gates forms a continuum, and it’s not necessarily possible to use one gate set to simulate another exactly. Instead, some approximation may be necessary.

The Solovay-Kitaev theorem guarantees that different universal gate sets can simulate one another exceedingly quickly, to a very good approximation. From a practical point of view, this is important in compiling quantum algorithms (like Shor’s) into a form that can be implemented fault-tolerantly. From a more mathematical point of view, the Solovay-Kitaev theorem is a remarkable general statement about how quickly the group SU(d) is “filled in” by a universal set of gates.

This paper aims to provide a simple exposition of the Solovay-Kitaev theorem, which we hope is accessible to people who’ve just started working on quantum computing. We explain the proof in the form of an algorithm (just nine simple lines of pseudocode!) which, given a goal unitary operation U, builds up a good approximation to U using some prespecified set of universal gates. Chris has also prepared an online implementation of the algorithm.
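To give the flavour of those nine lines, here is a Python-style sketch of the recursion at the heart of the algorithm. The helpers basic_approximation (a lookup of the closest short sequence of basic gates) and gc_decompose (the balanced group commutator decomposition described in the paper) are left as placeholders, so treat this as a sketch rather than a working implementation:

    import numpy as np

    def solovay_kitaev(U, n):
        # Base case: the closest product of basic gates, found e.g. by
        # exhaustive search over short sequences (placeholder).
        if n == 0:
            return basic_approximation(U)
        U_prev = solovay_kitaev(U, n - 1)
        # Write the residual U U_prev^dagger as a balanced group
        # commutator V W V^dagger W^dagger (placeholder).
        V, W = gc_decompose(U @ U_prev.conj().T)
        V_prev = solovay_kitaev(V, n - 1)
        W_prev = solovay_kitaev(W, n - 1)
        # The improved approximation at depth n.
        return V_prev @ W_prev @ V_prev.conj().T @ W_prev.conj().T @ U_prev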

Small world networks

Attention conservation notice: Just over 2000 words, requires a little graph theory.

Just for fun, I’ve been doing a little reading on small world networks, a topic that started to attract a lot of interest after a 1998 paper in Nature, by Duncan Watts and Steven Strogatz.

The idea of small world networks goes back to (among other places) the well-known experiments demonstrating “six degrees of separation”, conducted in the 1960s by the astonishing psychologist Stanley Milgram. This is the idea that while you personally only know a relatively small sample of the total population, your extended friendship network – the people who are “friends of a friend of a…” – grows extremely rapidly as the number of links in the chain increases. Milgram’s results suggested that most of the US population is within six degrees of separation from one another, i.e., that it’s truly a small world.

(It is worth noting that doubts have been cast recently on Milgram’s results, notably by Judith Kleinfeld. See the account in Duncan Watts’ recent book “Six Degrees”.)

The use of “networks” in the term “small world networks” seems to me a bit peculiar, since, at least among the people I know, the mathematical object being referred to as a network is what we’d call a graph, consisting of a set of vertices, V, and a set of edges, E, which are just pairs of vertices. I’ll use the terms “small world network” and “small world graph” interchangeably, usually preferring the former in applications, and the latter in describing mathematical objects.

What is a small world network? Stated another way, what does it mean to say that a graph has the small world property?

It’s a graph G = (V,E) with two properties that a priori might appear to be contradictory: (1) typical pairs of vertices in the graph have short paths between them, and (2) the graph is highly clustered.

To make these ideas precise, Watts and Strogatz introduce mathematical definitions for the average path length and clustering in a graph.

The average path length is defined as you would expect: define the length [tex]l(v,w)[/tex] between two vertices, v and w, as the minimum number of edges that need to be traversed to pass from v to w (or vice versa). Then the average path length L is simply the average value of [tex]l(v,w)[/tex], with v and w chosen uniformly at random,

[tex]L \equiv \frac{\sum_{vw} l(v,w)}{{n \choose 2}}, [/tex]

where the sum is over all distinct pairs of vertices, v and w, and n is the number of vertices in G. Note that for these definitions to make sense we require that the graph be connected.
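In code, storing the graph as a dict mapping each vertex to its set of neighbours (a representation I’ll reuse below), L can be computed by a breadth-first search from each vertex. This is my own sketch, and it assumes the graph is connected:

    from collections import deque

    def average_path_length(adj):
        # adj: dict mapping vertex -> set of neighbouring vertices.
        # Assumes the graph is connected.
        n = len(adj)
        total = 0
        for v in adj:
            # Breadth-first search from v gives l(v, w) for every w.
            dist = {v: 0}
            queue = deque([v])
            while queue:
                u = queue.popleft()
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        queue.append(w)
            total += sum(dist.values())
        # Each unordered pair {v, w} is counted twice, once from each end,
        # and there are n(n-1)/2 such pairs.
        return total / (n * (n - 1))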

Watts and Strogatz’ clustering coefficient is defined in a less intuitive fashion. The goal is to define a quantity C which varies between 0 (unclustered) and 1 (highly clustered).

To do this, first fix a vertex, v. Suppose we consider the set of vertices S which are neighbours of v. Suppose S has k elements. How connected are the vertices in S to one another? If every vertex in S is connected to every other vertex in S, then there are [tex]{k \choose 2}[/tex] edges between the elements of S. In such a case we would say that the graph is highly clustered about the vertex v. We capture this idea by defining the clustering coefficient near v, [tex]C_v[/tex], to be the number of edges between vertices in S, divided by the maximum possible number, [tex]{k \choose 2}[/tex]. The clustering coefficient for the entire graph is then defined to be the average of [tex]C_v[/tex] over all vertices in the graph.

Obviously, the clustering coefficient varies between 0 and 1, with a value near 0 meaning that most of the vertices connected to any given vertex v are not connected to each other; conversely, a value near 1 means that most of the neighbours of any given vertex v will tend to be connected to one another.
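Here is the corresponding computation for the clustering coefficient, again as a sketch; vertices with fewer than two neighbours, for which [tex]C_v[/tex] is undefined, are simply skipped:

    def clustering_coefficient(adj):
        # Average of C_v over vertices v with at least two neighbours.
        total, counted = 0.0, 0
        for v, S in adj.items():
            k = len(S)
            if k < 2:
                continue
            # Count edges among the neighbours of v, each unordered pair once.
            edges = sum(1 for u in S for w in S if u < w and w in adj[u])
            total += edges / (k * (k - 1) / 2)
            counted += 1
        return total / counted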

I must admit, I harbour doubts both about how mathematically natural the clustering coefficient is, and about how well it captures the concept of the graph being highly clustered. In particular, I would not be surprised to discover that there are other quantities which capture the same intuition, but which are far more powerful mathematically as tools for investigating small world graphs. Nonetheless, this was the definition that Watts and Strogatz used, and I will follow their lead here.

Several questions about small world graphs naturally suggest themselves:

  1. Do graphs with the small world property exist? The answer is yes, or I wouldn’t be writing these notes, but it’s not a priori obvious that this ought to be the case. How can we construct examples of graphs with the small world property?
  2. How commonly do small world networks arise in nature? Might they arise in collaboration networks for movie actors, musicians, or scientists, in friendship networks, or in link patterns on the internet? Under what general circumstances can small world networks arise?
  3. What, if any, consequences follow from a network being a small world network? E.g., if people’s network of acquaintances forms a small world network, does this have any implications for the spread of disease, or of memes?
  4. Can we formulate a mathematically interesting theory of small world graphs?

Constructing small world graphs

Watts and Strogatz suggest a simple method for constructing graphs with the small world property. The method is as follows.

We start by imagining a set of n vertices arranged in a ring in the plane.

Now connect each vertex to its k nearest-neighbours, in both the clockwise and anti-clockwise directions. Here, k is just some small constant, e.g., k = 2 or k = 3, whose exact value doesn’t much matter.

We complete the construction of the graph by adding a small amount of disorder. More precisely, let p > 0 be a small parameter. For every single edge in the lattice, with probability 1-p we leave it alone, and with probability p we “rewire” the edge, changing one of the end vertices to a randomly chosen vertex.
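Here is the whole construction in Python, using the same adjacency-list representation as above. The details of how self-loops and duplicate edges are avoided during rewiring are my own choices; they don’t affect the behaviour discussed below:

    import random

    def watts_strogatz(n, k, p, seed=None):
        # Ring of n vertices, each joined to its k nearest neighbours on each
        # side; each ring edge is then rewired with probability p.
        rng = random.Random(seed)
        adj = {v: set() for v in range(n)}
        for v in range(n):
            for j in range(1, k + 1):
                w = (v + j) % n
                adj[v].add(w)
                adj[w].add(v)
        for v in range(n):
            for j in range(1, k + 1):
                w = (v + j) % n
                if rng.random() < p:
                    # Move the far endpoint of the edge (v, w) to a random
                    # vertex, avoiding self-loops and duplicate edges.
                    candidates = [u for u in range(n) if u != v and u not in adj[v]]
                    if candidates:
                        adj[v].discard(w); adj[w].discard(v)
                        u = rng.choice(candidates)
                        adj[v].add(u); adj[u].add(v)
        return adj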

What happens to the average length and clustering coefficient in this graph as p is increased away from zero?

At p = 0 we see that the average path length between two vertices is roughly n/(4k), where n is the number of vertices in the graph. The reason is that if we choose two vertices at random on the ring, then typically they’ll be about a quarter of the way around the ring from one another, and we can get from one to the other in roughly n/(4k) hops, hopping k vertices at a time.

As p increases away from zero, however, the average path length drops extremely quickly.

One way of seeing this is to do a computer simulation. Watts and Strogatz do this, finding that the average path length plummets, quickly becoming logarithmic in n, rather than linear in n.
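A toy version of that simulation, using the functions sketched earlier, is just a few lines (note that average_path_length assumes the rewired graph stays connected, which is very likely at these parameters):

    # Watch the average path length plummet as disorder is added, while the
    # clustering coefficient barely moves.
    for p in [0.0, 0.01, 0.05, 0.1]:
        adj = watts_strogatz(n=400, k=3, p=p, seed=1)
        print(p, average_path_length(adj), clustering_coefficient(adj))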

Another way of seeing that path lengths should drop very quickly when disorder is introduced is the following heuristic argument, inspired by the renormalization group technique from statistical physics.

Suppose we divide the vertices in the graph up into groups of [tex]m \approx 2/p[/tex] vertices. We will treat those groups as single vertices in a new graph, G’, and define two vertices in G’ to be connected if the corresponding groups of vertices in G are connected through at least one edge.

A little thought about the “rewiring” prescription above shows that vertices in G’ are typically connected to approximately 2 other randomly chosen vertices in G’. There will also be some additional edges between neighbouring elements of G’, but these won’t make any difference in the subsequent analysis, and we’ll simply ignore them.

It follows from the presence of these random edges in G’ that the average path length in G’ scales roughly as [tex]\log_2(q)[/tex], where [tex]q = n / m[/tex] is the number of vertices in G’. Hence average path lengths in G must scale roughly as [tex]m \log_2 (n/m) \approx 2 \log_2(np/2) / p[/tex].

Define [tex]N \equiv np[/tex], so, up to a small constant of proportionality, N scales as the average number of rewirings in the graph. When [tex]N > 2[/tex] we see that the average path length scales as:

[tex]\frac{2 \log N }{N} n. [/tex]

Thus, as N rises, we see that the linear term in the average path length is very rapidly suppressed. This occurs for any value of p much larger than 1/n.
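Explicitly, the algebra behind this scaling is just the substitution [tex]m \approx 2/p[/tex], [tex]N \equiv np[/tex]:

[tex] m \log_2(n/m) \approx \frac{2}{p} \log_2\left( \frac{np}{2} \right) = \frac{2n}{N} \log_2\left( \frac{N}{2} \right) \approx \frac{2 \log N}{N} n. [/tex]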

It ought to be noted that this analysis is quite general: we didn’t make use of any special properties of the initial graph. In general, then, it follows that adding a small amount of disorder to any graph will strongly suppress any linear component in the average path length. In the event it hasn’t already been done, it’d be fun to extend this analysis using a more detailed renormalization group style analysis.

The behaviour of the clustering coefficient is quite different. At p = 0 the clustering coefficient is a constant of order one (roughly 3/4 for large k), independent of n. But as disorder is added, and p is increased, the clustering coefficient stays close to its initial value, changing only very slowly.

Watts and Strogatz show this using computer simulations, but the fundamental reason isn’t difficult to understand: even when a little bit of disorder is added, most edges in the graph are not changed, and so most clusters in the graph are not affected.

In other words, when p is small but non-zero, the resulting graph is highly clustered, but also has short average path lengths. In short, it has the small world property.

Why is this interesting?

Why is this type of graph interesting? Sure, the average path length drops rapidly while the clustering coefficient remains high. But so what? It’s surely not that difficult to come up with other graph properties, some of which change precipitously when disorder is added to a regular graph, and others of which change only very slightly.

There are a couple of reasons to care.

First, Watts and Strogatz (and many others) have shown that there are a lot of small world networks which arise in nature. Many different types of human (and non-human) relationship network have the small world property. So too do many artificial networks, like the power grid, or the pattern of links on the internet.

This is not surprising: based on the argument above we’d expect that any network which starts out highly clustered, and which has a little disorder added, will have the small world property. Most natural systems are noisy, so we’d expect that pretty much any highly clustered network arising in nature would possess the small world property.

Of course, the fact that small world networks are common doesn’t necessarily make them interesting. Are there any interesting consequences which follow from a network having the small world property?

As an example where this is the case, Watts and Strogatz present a simple model of the spread of infectious diseases through a network, and argue that diseases ought to spread much more rapidly in a small world network than they would through a network that does not have the small world property. It’s not difficult to think up other potential consequences of a network having the small world property.

More generally, there’s an extremely interesting idea here: the idea that we can understand things about the function of a network (i.e., its dynamics), simply by looking at its static structure. This is, I gather, quite an old idea, in certain specific contexts – epidemiology, social networks – but the Watts and Strogatz paper emphasizes that it may be possible to study this connection from a more abstract point of view, and reach more general conclusions, while at the same time making the connection to concrete examples.

Analogous ideas are, of course, used in physics, where simple static properties of a system (e.g., its dimensionality) can be used to make predictions about its dynamical properties (e.g., whether it can undergo a phase transition at nonzero temperature). But it’s by no means so clear a priori that the same idea can be successfully applied to more complex systems, and I think that one nice thing about the Watts and Strogatz paper is that it suggests a wide class of examples where this can potentially be done.

Indeed, these ideas suggest an entire broad research program, with several interlocking tasks:

  • Building a taxonomy of static and dynamic properties whose presence or absence in a network can be easily checked. Some of these properties might be rather specialized, while others might be quite generally applicable in all networks.
  • Collecting data about real networks.
  • Guessing at possible connections between static and dynamic properties.
  • Checking whether these connections actually hold, both through empirical work, and through theoretical argument.
  • Ideally, this will then allow us to make predictions about the properties of other networks, and perhaps exert some kind of control over those properties.

For me personally, I can see a lot of interesting avenues over on the theoretical side, developing the appropriate mathematical tools to illuminate these kinds of questions. Of course, I’m very ignorant of what has already been done in this vein, and it would be useful to spend some time (a) figuring out what the important problems to address are, and (b) learning a bit about what has already been done.

RSS feed

One unanticipated effect of my move from Movable Type to WordPress was that the old RSS feed stopped working. There is a new feed at:

http://www.qinfo.org/people/nielsen/blog/wp-rss2.php

that seems to function correctly, at least so far as Bloglines is concerned.

Comments or technical assistance from anyone who uses RSS would be very greatly appreciated! It’d be good to know if other people see the feed working properly, or not.


The Planck length

Thanks to all those people who commented on the significance of the Planck length. Putting it all together, one plausible argument for the significance of the Planck length seems to be something like this.

First, suppose we have a particle of mass [tex]m[/tex] whose wavefunction is localized within a length [tex]L[/tex], in each co-ordinate. For each co-ordinate the standard deviation in position satisfies:

[tex]\Delta x^2 \leq L^2/4.[/tex]

The uncertainty principle tells us that the uncertainty in each momentum co-ordinate satisfies:

[tex]\Delta p^2 \geq \hbar^2/ L^2.[/tex]

It follows that:

[tex]\langle P^2 \rangle \geq 3 \hbar^2 / L^2,[/tex]

where now [tex]\langle P^2 \rangle[/tex] is the average of the square of the total momentum, i.e., not a single co-ordinate. Using the usual formula connecting energy and momentum for a free particle in special relativity, we obtain [tex]E^2 = P^2 c^2 + m^2 c^4[/tex]. If we assume that [tex]P^2[/tex] and [tex]\langle P^2 \rangle[/tex] can be identified then we obtain:

[tex]E^2 \geq 3 \hbar^2 c^2 / L^2.[/tex]

In short, strong localization in position forces a large momentum, which creates a large energy density.

It’s plausible that if we make the energy density large enough, we’ll create a black hole. Equating [tex]E[/tex] to [tex]M c^2[/tex] for some notional black hole mass M, and asking at what L the corresponding Schwarzschild radius becomes comparable to L itself, we get the Planck length, up to a small constant. I haven’t put the details in because this stage of the argument is even more hokey than the rest: energy density depends on what frame you’re in. However, it is at least plausible.
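To sketch that last step, with all order-one constants dropped: localization gives [tex]E \sim \hbar c / L[/tex], so the notional black hole mass is [tex]M = E/c^2 \sim \hbar/(cL)[/tex], and the corresponding Schwarzschild radius [tex]2GM/c^2 \sim G\hbar/(c^3 L)[/tex] becomes comparable to L itself when

[tex] L \sim \sqrt{\frac{\hbar G}{c^3}} \approx 1.6 \times 10^{-35} \mbox{ meters}, [/tex]

which is the Planck length.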

If you take the “no hair” theorem of general relativity seriously, then you’d believe that such a black hole should have no internal structure. But the detailed wavefunction would seem to be such an internal structure, and so we have a problem, which seems likely to signify a breakdown in one or both of general relativity or quantum mechanics. How that breakdown would manifest itself, I don’t know.

There’s lots wrong with this argument: the sign ambiguity for E, the use of the free particle formula for energy, and so on. No doubt many of these difficulties disappear in some more sophisticated approaches to quantum gravity; I’ve just been using undergrad quantum and GR.

Nonetheless, this argument does seem suggestive that particles having wavefunctions with structure on the Planck scale would, indeed, be very interesting objects, and that it is at least somewhat likely that general relativity, quantum mechanics, or both, would break down at that level.


Journal club on quantum gravity

The following post is based on some notes I prepared for a journal club talk I’m going to give on quantum gravity in a few hours. A postscript equivalent is here, with a few modifications.

Disclaimer: The whole point of our journal club talks is to give talks on interesting topics about which we are not experts! For me, quantum gravity fits this description in spades. Caveat emptor.

Introduction

Every physicist learns as an undergraduate (if not before) that we don’t yet have a single theory unifying quantum mechanics and general relativity, i.e., a theory of quantum gravity. What is often not explained is why it is difficult to come up with such a theory. In this journal club I want to ask and partially answer two questions: (1) what makes it so difficult to put quantum mechanics and general relativity together; and (2) what approaches might one take to developing a theory of quantum gravity?

You might wonder if this is an appropriate topic for a forum such as this. After all, none of us here, including myself, are experts on string theory, loop quantum gravity, twistors, or any of the other approaches to quantum gravity that have been proposed and are currently being pursued.

However, we don’t yet know that any of these approaches is correct, and so there’s no harm in going back and thinking through some of the basic aspects of the problem, from an elementary point of view. This can be done by anyone who knows the rudiments of quantum mechanics and of the general theory of relativity.

If you like, you can view it as trying to solve the problem of quantum gravity without first “looking in the back of the book” to see the best attempted answers that other people have come up with. This procedure of first thinking things through for yourself has the advantage that it is likely to greatly increase the depth of your understanding of other people’s work if you later do investigate topics such as string theory, etc.

A related disclaimer is that I personally know only a minuscule fraction of all the modern thinking on quantum gravity. I prepared this lecture to force myself to think through in a naive way some of the problems involved in constructing a quantum theory of gravity, only pausing occasionally to peek in the back of the book. I won’t try to acknowledge my sources, which were many, but suffice to say that I doubt there’s anything here that hasn’t been thought before. Furthermore, people who’ve thought hard about quantum gravity over an extended period are likely to find much of what I say obvious, naive, absurd, or some combination thereof. Frankly, I don’t recommend that such people look through these notes — they’ll likely find it rather frustrating! For those less expert even than myself, perhaps you’ll find these notes a useful entertainment, and maybe they’ll stimulate you to think further on the subject.

Standard formulations of quantum mechanics and general relativity

Let’s start off by reminding ourselves of the standard formulations used for quantum mechanics and general relativity. I expect that most attendees at this journal club are extremely familiar with the basic principles of quantum mechanics, and, indeed, use them every day of their working lives. You may be rather less familiar with general relativity. I’ve tried to construct the lecture so you can follow the overall gist, even so.

Recall that the standard formulation of quantum mechanics contains the following elements:

  • The postulate that for every physical system there is a state vector in a Hilbert space which provides the most complete possible description of that system.
  • The postulate that the dynamics of a closed quantum system are described by a Hamiltonian and Schroedinger’s equation.
  • The postulate that a measurement on a system is described using an observable, a Hermitian operator acting on state space, which is used to describe measurement according to some rule for: (1) calculating measurement probabilities; and (2) describing the relationship between prior and posterior states.
  • The postulate that the state space for a composite quantum system is built up by taking the tensor product of individual state spaces. In the special case when those systems are indistinguishable, the postulate is modified so that the state space is either the symmetric or antisymmetric subspace of the total tensor product, depending on whether the systems are bosons or fermions.

It’s worth pointing out that this is merely the most common formulation of quantum mechanics. Other formulations are possible, and may be extremely valuable. It’s certainly possible that the right way of constructing a quantum theory of gravity is to start from some different formulation of quantum mechanics. My reason for describing this formulation of quantum mechanics — probably the most commonly used formulation — is so that we’re all working off the same page.

Let’s switch now to discuss general relativity. Recall that the standard formulation of general relativity contains the following elements:

  • The postulate that spacetime is a four-dimensional pseudo-Riemannian manifold, with metric signature (+1,-1,-1,-1).
  • The postulate that material in spacetime is described by a two-index tensor T known as the stress-energy tensor. The stress-energy tensor describes not only things like mass and energy, but also the transport of mass and energy, so it has aspects that are both static and dynamic.
  • The postulate known as the Einstein field equations: [tex]G = 8\pi T[/tex]. This postulate connects the stress-energy tensor T to the Einstein tensor, G. In its mathematical definition G is fundamentally a geometric object, i.e., it is determined by the “shape” of spacetime. The physical content of the Einstein field equations is therefore that the shape of spacetime is determined by the matter distribution, and vice versa. An interesting point is that because the stress-energy tensor contains components describing the transport of matter, the transport properties of matter are actually determined by the geometry. For example, it can easily be shown that, as a consequence of the Einstein field equations, test particles follow geodesics of spacetime.
  • Since 1998 it has been thought that the Einstein equations need to be modified, becoming [tex]G+\Lambda g = 8 \pi T[/tex], where g is the metric tensor, and [tex]\Lambda[/tex] is a non-zero constant known as the cosmological constant. Rather remarkably, it turns out that, once again, test particles follow geodesics of spacetime. However, for a given stress-energy tensor, the shape of spacetime will itself be different, and so the geodesics will be different.

In an ideal world, of course, we wouldn’t just unify quantum mechanics and general relativity. We’d actually construct a single theory which incorporates both general relativity and the entire standard model of particle physics. So it’s arguable that we shouldn’t just be thinking about the standard formulation of quantum mechanics, but rather about the entire edifice of the standard model. I’m not going to do that here, because: (1) talking about vanilla quantum mechanics is plenty enough for one lecture; (2) it illustrates many of the problems that arise in the standard model, anyway; and (3) I’m a lot more comfortable with elementary quantum mechanics than I am with the standard model, and I expect much of my audience is, too.

Comparing the elements of general relativity and quantum mechanics

Let’s go through and look at each element in the standard formulations of general relativity and quantum mechanics, attempting as we do to understand some of the problems which arise when we try to unify the two theories.

Before getting started with the comparisons, let me make an aside on my presentation style. Conventionally, a good lecture is much like a good movie or a good book, in that a problem or situation is set up, preferably one involving high drama, the tension mounts, and then the problem is partially or fully resolved. Unfortunately, today is going to be a litany of problems, with markedly little success in resolution, and so the lecture may feel a little unsatisfying for those hoping, consciously or unconsciously, for a resolution.

Spacetime: In standard quantum mechanics, we usually work with respect to a fixed background spacetime of allowed configurations. By contrast, in general relativity, the metric tensor specifying the structure of spacetime is one of the physical variables of the theory. If we follow the usual prescriptions of quantum mechanics, we conclude that the metric tensor itself ought to be replaced by some suitable quantum mechanical observable, or set of observables. If one does this, it is no longer so clear that space and time can be treated as background parameters in quantum mechanics. How, for example, are we supposed to treat Schroedinger’s equation, when the physical structure of time itself is variable? Perhaps we ought to aim for an effective equation of the form

[tex]i \hbar \frac{d|\psi\rangle}{d\langle t \rangle} = H |\psi\rangle [/tex]

derived from some deeper underlying theory?

Stress-energy tensor: In general relativity T is used to describe the configuration of material bodies. Standard quantum mechanics tells us that T needs to be replaced by a suitable set of observables. In and of itself this is not obviously a major problem. However, a problem arises (again) in connection with the possible quantization of space and time. As usually understood in general relativity, T is a function of location p on the underlying four-dimensional manifold. The natural analogue in a quantized version is an observable [tex]\hat T(p)[/tex] which is again a function of position on the manifold. However, as described above, it seems likely that p itself should be replaced by some quantum equivalent, and it is not so clear how [tex]\hat T[/tex] ought to be constructed then. One possibility is that [tex]\hat T[/tex] becomes a function of some suitable [tex]\hat p[/tex]. A related problem is that the standard definition of the components of T often involves tangent vectors (essentially, velocity 4-vectors) to the underlying manifold. As for the position, p, perhaps such tangent vectors should be replaced by quantized equivalents.

Einstein field equations (with and without the cosmological constant): Consider the usual general relativistic formulation of the field equations: [tex]G+\Lambda g = 8\pi T[/tex]. The problem with constructing a quantum version ought by now to be obvious: quantum mechanics tells us that the quantities on the left — geometric quantities, to do with the shape of spacetime — are all associated with some notion of a background configuration, ordinarily left unquantized, while the quantities on the right are physical variables that ought to be quantized.

One natural speculation in this vein is that in any quantum theory of gravity we ought to have

[tex] G+\Lambda g = 8 \pi \langle T \rangle[/tex]

as an effective equation of the theory.

Hilbert space and quantum states: There is no obvious incompatibility with general relativity, perhaps because it is so unclear which Hilbert space or quantum state one might use in a description of gravitation.

The Hamiltonian and Schroedinger’s equation: As already mentioned, this presents a challenge because it is not so clear how to describe time in quantum gravity. Something else which is of concern is that for many standard forms of the Hamiltonian, Schroedinger’s equation gives rise to faster-than-light effects. In order to alleviate this problem we must move to a relativistic wave equation, or to a quantum field theory.

In this vein, let me mention one natural candidate description for the dynamics of a free (quantum) test particle moving in the background of a fixed (classical) spacetime. First, start with a relativistically invariant wave equation such as the Klein-Gordon equation, which can be used to describe a free spin zero particle,

[tex] -\hbar^2 \frac{\partial^2 \psi}{\partial t^2} = -\hbar^2 c^2 \nabla^2 \psi + m^2 c^4 \psi,[/tex]

or the Dirac wave equation, which can be used to describe a free spin 1/2 particle,

[tex] i \hbar \frac{\partial \psi}{\partial t} = \left( -i \hbar c \, \alpha \cdot \nabla + \beta m c^2 \right) \psi,[/tex]

where [tex]\alpha_x,\alpha_y,\alpha_z[/tex] and [tex]\beta[/tex] are the four Dirac matrices. In the case of the Klein-Gordon equation there is a clear prescription for how to take this over to a curved spacetime: simply replace derivatives by appropriate covariant derivatives, giving:

[tex] -\hbar^2 \nabla^\mu \nabla_\mu \psi = m^2 c^2 \psi.[/tex]

In flat spacetime this will have the same behaviour as the Klein-Gordon equation. In a fixed background curved spacetime we would expect this equation to describe a free spin zero test particle.
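For concreteness, for a scalar field the covariant d’Alembertian appearing here has the explicit coordinate form

[tex] \nabla^\mu \nabla_\mu \psi = \frac{1}{\sqrt{-g}} \partial_\mu \left( \sqrt{-g} \, g^{\mu \nu} \partial_\nu \psi \right), [/tex]

where g is the determinant of the metric, so in any fixed background the curved-space Klein-Gordon equation is an ordinary (if messy) partial differential equation in the coordinates.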

The same basic procedure can be followed in the case of the Dirac equation, replacing derivatives wherever necessary by covariant derivatives. I have not explicitly checked that the resulting equation is invariantly defined, but expect that it is (exercise!), and can be used to describe a free spin 1/2 test particle in a fixed background curved spacetime. It would be interesting to study the solutions of such equations for some simple nontrivial geometries, such as the Schwarzschild geometry. For metrics with sufficient symmetry, it may be possible to obtain analytic (or at least perturbative) solutions; in any case, it should be possible to investigate these problems numerically.

Of course, although it would be interesting to study this prescription, we should expect it to be inadequate in various ways. We have described a means of studying a quantum test particle moving against a fixed classical background spacetime. In reality: (1) the background may not be classical; (2) the particle itself modifies the background; and (3) because of quantum indeterminacy, the particle may modify the background in different ways. In the language of the many-worlds interpretation, it seems reasonable to expect that which branch of the wavefunction we are in (representing different particle positions) may have some bearing on the structure of spacetime itself: in particular, different branches will correspond to different spacetimes.

This discussion highlights another significant incompatibility between general relativity and quantum mechanics. In general relativity, we know that test particles follow well-defined trajectories — geodesics of spacetime. This is a simple consequence of the field equations themselves. In quantum mechanics, no particle can follow a well-defined trajectory: the only way this could happen is if the Hamiltonian commuted with the position variables, in which case the particle would be stationary. In any case, this commutation condition cannot occur when the momentum contributes to the Hamiltonian, as is typically the case.

Observables: One striking difference between quantum mechanics and general relativity is that the description of measurement is much more complex in the former. Several questions that might arise include:

  • Should wave function collapse occur instantaneously? This depends on how one interprets the wave function.
  • Should measurement be a purely local phenomenon, or can we make a measurement across an entire slice of spacetime? Across all of spacetime?
  • Should we worry that in the usual description of measurement, time and space are treated in a manifestly unsymmetric manner?
  • What observables would one expect to have in a quantum theory of gravity?

The tensor product structure and indistinguishable particles: One cause for concern here is that the notion of distinguishability itself is often framed in terms of the spatial separation of particles. If the structure of space itself really ought to be thought of in quantum terms, it is perhaps not so clear that the concepts of distinguishable, indistinguishable, and spatially separated particles even make sense. This may be a hint that in a quantum theory of gravity such concepts may be absent at the foundation, though they would need to emerge as consequences of the theory.

Quantum field theory: So far, we’ve concentrated on identifying incompatibilities between general relativity and quantum mechanics. Of course, fundamental modern physics is cast in terms of an extension of quantum mechanics known as quantum field theory, and it is worth investigating what problems arise when one attempts to unify general relativity with the entire edifice of quantum field theory. We won’t do this in any kind of fullness here, but will make one comment in relation to the canonical quantization procedure usually used to construct quantum field theories. The standard procedure is to start from some classical field equation, such as the wave equation, [tex](\nabla^2 - (1/c^2) \partial^2 / \partial t^2 ) \phi = 0[/tex], to expand the solution as a linear combination of solutions for individual field modes, to regard the different mode coefficients as dynamical variables, and to then quantize by imposing canonical commutation relationships on those variables. This procedure can be carried out for many of the standard field equations, such as the wave equation, the Dirac equation, and the Klein-Gordon equation, because in each case the equation is linear, and thus the solution space has a linear structure. In the case of general relativity, the field equations are nonlinear in what seems like the natural field variables — the metric tensor — and it is not possible to even get started with this procedure. One could, of course, try linearizing the field equations, and starting from there. My understanding is that when this is done the resulting quantum field theory is nonrenormalizable (?), and thus unsatisfactory.
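To make the canonical procedure concrete in the simplest case: for the wave equation with periodic boundary conditions one expands the field in modes (normalization factors omitted),

[tex] \phi(x,t) = \sum_k \left( a_k e^{i(kx - \omega_k t)} + a_k^* e^{-i(kx - \omega_k t)} \right), \qquad \omega_k = c|k|, [/tex]

and then quantizes by promoting the coefficients to operators obeying [tex][a_k, a_{k'}^\dagger] = \delta_{kk'}[/tex]. It is the linearity of the wave equation that makes the mode expansion possible; nothing similar is available for the full nonlinear Einstein equations.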

Discussion

Perhaps the most striking feature of the above discussion is an asymmetry between general relativity and quantum mechanics. Quantum mechanics, like Newton’s laws of motion, is not so much a physical theory as a framework for constructing physical theories, with many important quantities (the state, the state space, the Hamiltonian, the relevant observables) left unspecified. General relativity is much more prescriptive, specifying as it does an equation relating the distribution of material entities to the shape of spacetime, and, as a consequence, controlling the matter-energy dynamics. Once we’ve set up the initial matter-energy distribution and structure of spacetime, general relativity gives us no further control. In the analogous quantum mechanical situation we still have to specify the dynamics, and the measurements to be performed.

There is therefore a sense in which quantum mechanics is a more wide-ranging and flexible framework than general relativity. This is arguably a bug, not a feature, since one of general relativity’s most appealing points is its prescriptiveness; once we have the Einstein equations, we get everything else for free, in some sense. However, it also suggests that while the right approach may be to extend the quantum mechanical framework to incorporate general relativity, it is exceedingly unlikely that the right approach is to extend general relativity to incorporate quantum mechanics. On the other hand, it may also be that some extension or reformulation of quantum mechanics is necessary to incorporate gravity. Such an extension would have to be carried out rather carefully: results such as Gleason’s theorem show that quantum mechanics is surprisingly sensitive to small changes.

As an aside, let me also take this opportunity to point out something which often bugs me: the widely-made assertion that quantum gravity effects will become important at the Planck length — about [tex]10^{-35}[/tex] meters — and the notion of spacetime will break down at that length. Anyone claiming this, in my opinion, ought to be asked why the notion of mass doesn’t break down at the Planck mass, which has the rather hefty value of about [tex]10^{-8}[/tex] kilograms.

A toy model

Just for fun, let me propose a simple toy model for quantum gravity, inspired by the Klein-Gordon equation. I’m sure this is wrong or inadequate somehow, but after an hour or so’s thought, I can’t yet see why. I include it here primarily as a stimulant to further thought.

The idea is to look for a four-dimensional pseudo-Riemannian manifold M, with metric signature (-,+,+,+), and a function [tex]\psi : M \rightarrow C[/tex], such that the following equations have a solution:

[tex]G + \Lambda g = 8 \pi T [/tex]

[tex] T^{\mu \nu} = v^\mu v^\nu [/tex]

[tex] v^0 = \frac{i\hbar}{2mc^2}( \psi^* \psi^{;0}- \psi \psi^{;0 *})[/tex]

[tex] v^j = \frac{-i\hbar}{2m}( \psi^* \psi^{;j}- \psi \psi^{;j *}),[/tex]

where m, c, [tex]\Lambda[/tex] are all constants with their usual meanings, j = 1,2,3, and the expression for [tex]T^{\mu \nu}[/tex] may need a proportionality constant, probably related to m, out the front. The expressions for [tex]v^0[/tex] and [tex]v^j[/tex] are covariant versions of the corresponding expressions for the charge and current densities associated to the Klein-Gordon equation — see Chapter 13 of Schiff’s well-known text on quantum mechanics (3rd ed., McGraw-Hill, 1968); note that Schiff calls this equation the “relativistic Schroedinger equation”. A subtlety is that the covariant derivative itself depends on the metric g, and so these equations are potentially extremely restrictive; it is by no means obvious that a solution ever exists. However, if we take seriously the idea that [tex]T^{\mu \nu}[/tex] needs a proportionality constant related to m, then we can see that in the test particle limit, [tex]m \rightarrow 0[/tex], these equations have as a solution any [tex]\psi[/tex], and flat spacetime, which is not unreasonable.

Conclusion

The picture I have painted is somewhat bleak, which is perhaps not surprising: finding a quantum theory of gravity is not a trivial problem! However, the good news is that many further steps naturally suggest themselves:

  • At many points, my analysis has been incomplete, in that I haven’t thoroughly mapped out a catalogue of all the possible alternatives. A more thorough analysis of the possibilities should be done.
  • The analysis needs to be extended to incorporate modern relativistic quantum field theory.
  • Computer pioneer Alan Kay has said “A change of perspective is worth 80 IQ points”. It would be fruitful to repeat this exercise from the point of view of some of the other formulations people have of general relativity and quantum mechanics. I’d particularly like to do this for the initial value and action formulations of general relativity, and for the quasidistribution and nonlocal hidden variable formulations of quantum mechanics. It may also be useful to attempt to construct modifications of either or both theories in order to solve some of the problems that we’ve described here.
  • Read up on some of the work that other people have done on quantum gravity, from a variety of points of view. Things to learn might include: supersymmetry, string theory, loop quantum gravity, twistors, Euclidean quantum gravity, Hawking radiation, the Unruh effect, the Wheeler-de Witt equation, Penrose’s gravitational collapse, 1- and 2-dimensional quantum gravity, gravitational wave astronomy, work on the cosmological constant, …

Acknowledgments

Thanks to David Poulin for comments and encouragement.

Cluster-state quantum computation

One of my main areas of research interest for the past few years has been measurement-based models of quantum computation. In the standard accounts of quantum computing, a quantum computer is presented as a device that gets its power by performing coherent manipulations of superpositions of computational states, before a final measurement step destroys the superposition, singling out a single computational state to be read out.

Measurement-based quantum computing turns this picture on its head. In such a model, there are no coherent manipulations of superpositions. Instead, it’s just all measurements, all the time. In Debbie Leung’s memorable phrase, we compute by “pinging Nature”.

I’ve just written a short pedagogical review of one of these models, the so-called “one-way quantum computer”, or “cluster-state model” of quantum computation. A simple version of this model was recently implemented in the lab, as reported in Nature by the Zeilinger group.

The review is written to be accessible to anyone with a thorough grounding in basic quantum mechanics. The main part of the review is spent explaining what a quantum computer is, what the cluster state model is, and how clusters can be used to simulate an ordinary quantum computer. At the end, I also explain two new results: (1) a no-go theorem which, subject to some caveats (see paper), forbids us from obtaining the cluster experimentally by cooling a physical system to the ground state; and (2) a proof that clusters must be prepared in two or more dimensions if they are to be useful for quantum computing.

The review was written for a festschrift in honour of Tony Bracken and Angas Hurst, two well-known Australian mathematical physicists. Tony was my fourth-year honours thesis advisor, for which he suggested a wonderful topic: whether there is any connection between sometimes-negative “probability” distribution functions, like the Wigner function, and the Bell inequalities. It was a great topic for a student, combining a fundamental physical aspect with beautiful mathematics, and requiring only a little background to get started.


The so-called breakdown of spacetime

An assertion which is often made is that quantum gravity effects will become important at the Planck length — about [tex]10^{-35}[/tex] meters — and the notion of spacetime will break down at that length. People like to wax lyrical about spacetime turning into some kind of “quantum foam” at that level.

This bugs me. If it really is the case, then why doesn’t the notion of mass break down at the Planck mass, which has the rather hefty value of about [tex]10^{-8}[/tex] kilograms? What’s the critical difference between mass and length?


Finite quantum de Finetti theorems

Attention conservation notice: This post needs knowledge of some elementary quantum mechanics to make sense.

David Poulin gave a seminar today, in which he described some beautiful results in quant-ph/0410229, by Koenig and Renner.

The main result is, I think, quite incredible.

Suppose we have N quantum systems, all with identical state spaces. Suppose rho is some fixed but completely arbitrary quantum state of those N systems.

Now suppose we pick out M of those systems. M is fixed, but we choose the subset of systems completely at random. Let’s call the resulting quantum state when we disregard the remaining systems rho’.

What can we say about rho’? Koenig and Renner prove that, to a very good approximation, the state can be written in the form:

(*) [tex] \int P(\tau) \, \tau^{\otimes M} \, d\tau, [/tex]

where the integral is over single-copy density matrices tau, and P(tau) is a normalized probability distribution.

How good an approximation? They prove that the trace distance between rho’ and the form (*) is at most [tex]c M / \sqrt{N-M}[/tex], where c is some constant. As N goes to infinity, this goes to zero. (This gives rise to the result known as the quantum de Finetti theorem.)

Why is this representation theorem incredible?

From a mathematical point of view, the reason is that states of the form (*) are very special. The way we constructed rho’, it’s not difficult to see that the resulting state ought to be symmetric, but symmetry alone is only enough to ensure that rho’ has a representation like (*) with P(tau) possibly taking negative values, or with the tau operators not being density matrices. States of the form (*) are far more special.

There is also a reason that is more physical, almost philosophical, in nature. Imagine we have some very large number of systems, all with identical state spaces, but otherwise arbitrary. For example, consider the set of all electrons in the Universe.

Now pick out a small number of those systems, at random. The representation (*) tells us that to a very good approximation we can imagine that the state of the system was prepared by first sampling tau from the distribution P(tau), and then preparing M identical copies of tau. Furthermore, this is true no matter what the original state was! The only way the original state enters the picture is that it controls what the distribution P(.) is.

In the N -> infinity limit all of this has been known for years; in recent years it’s received a lot of attention, due to the work of Caves, Fuchs, Schack and collaborators. I must admit, though, that until David’s talk I hadn’t appreciated how remarkable the results were, perhaps because I am always suspicious of any result involving infinite tensor products.


Geodesics and the cosmological constant

Attention conservation notice: This post requires a little general relativity to understand the early bits, and quite a bit of general relativity later on.

One of the most exciting developments in physics over the past twenty years was the discovery that the cosmological constant (“Einstein’s greatest mistake”) was not, in fact, a mistake, but appears to be very real.

In standard general relativity – the type where the cosmological constant is zero – it can be shown that test particles follow geodesics of spacetime, i.e., locally, they simply fall freely, with relative acceleration between different particles due to spacetime curvature.

(Note that this geodesic behaviour is sometimes presented as though it’s a fundamental assumption of Einstein’s theory. Actually, it’s not – it follows as a consequence of the Einstein field equations, as we’ll see below.)

I got to wondering recently what paths test particles take when the cosmological constant is non-zero. Do they still follow geodesics, or is their behaviour modified?

The answer, which I’m sure is well-known to relativists, is that they still follow geodesics. So even when the cosmological constant is non-zero, particles still fall freely. What does change is the geometry, since the modification of the field equations means that the same distribution of matter will give rise to a different geometry, and thus to different geodesics.

Here’s an outline of the argument deriving geodesic paths, which is a simple variation on the argument given in Dirac’s text on general relativity for the case where the cosmological constant [tex]\Lambda = 0[/tex].

Start with the field equations, [tex]G + \Lambda g = 8 \pi T[/tex]. Suppose we evaluate the divergence of both sides. In index notation we obtain:

[tex] G^{\mu \nu}{}_{;\nu} + \Lambda g^{\mu \nu}{}_{;\nu} = 8 \pi T^{\mu \nu}{}_{;\nu}. [/tex]

The first term on the left vanishes because of the Bianchi identities, while the second term vanishes because of the metric compatibility condition imposed on the connection. This tells us that the divergence of the stress-energy tensor vanishes:

[tex] T^{\mu \nu}{}_{;\nu} = 0. [/tex]

Now, suppose the stress-energy tensor has the form [tex]T^{\mu \nu} = \rho v^{\mu} v^{\nu}[/tex], where [tex]\rho[/tex] is the matter density, and [tex]v^{\mu}[/tex] is the four-velocity. Using the product rule to evaluate the divergence, and conservation of mass-energy, [tex](\rho v^{\mu})_{;\mu} = 0[/tex], we end up with the equation [tex]v^{\nu} v^{\mu}{}_{;\nu} = 0[/tex], which is the geodesic equation.
