Introduction to Yang-Mills theories

Yang-Mills theories are a class of classical field theories generalizing Maxwell’s equations. When quantized, Yang-Mills theories form the basis for all successful modern quantum field theories, including the standard model of particle physics, and grand unified theories (GUTs) that attempt to go beyond the standard model.

This post contains working notes (32 pages, pdf only) that I wrote in an attempt to come to a satisfactory personal understanding of Yang-Mills theory. They are part of a larger project of understanding the standard models of particle physics and of cosmology – some related earlier notes are here.

Caveat: The current notes take a geometric approach to Yang-Mills theory, and include quite a bit of background on differential geometry. After completing a first draft, I realized that if I was to write either a pedagogical introduction or a review of Yang-Mills theory, this geometric approach is not the approach I’d prefer to take. Rather, I’d start with a bare statement of the Yang-Mills equations, considered as a generalization of Maxwell’s equations, and then work through a series of examples, only gradually mixing in the geometric approach. This would have the advantage of bringing readers up to speed much more quickly, without needing to absorb reams of differential geometry upfront.

Because of this, I haven’t polished these notes – they remain primarily my personal working notes, and there are various inaccuracies and shortcomings in the notes. I’m content to ignore these – why spend time polishing when you know a better approach is possible – but would appreciate hearing from you if you spot any serious misconceptions.

Despite these caveats, I believe the notes may be useful to some readers. In particular, if you’d like to understand the approach to Yang-Mills theory from differential geometry, these notes may serve as a useful first step, to be supplemented by additional reading such as the book by Baez and Muniain (“Knots, Gauge Fields and Gravity”, World Scientific 1994) on which the notes are primarily based.


Update: If you’re reading the notes in detail, then you might want to take a look at the comments, especially those by David Speyer and Aaron Bergman, who provide some important corrections and extensions.

What is the Universe made of? Part I

This post is, more than usual, a work in progress. It is the first draft of the first installment of a longer article on the subject “What is the Universe made of”. I intend to revise this draft and finish the longer article over the next few weeks, posting it to my blog as I make progress.

This first installment gives a bird’s-eye view of the subject, describing in very broad terms how ideas from particle physics, from cosmology, and from quantum gravity have contributed to our current understanding of what the Universe is made of. It’s really just a warm-up – subsequent installments will be meatier, describing in more detail each of these ideas, how they fit together, and some of the big questions that remain. The next installment will describe the standard model of particle physics in some detail.

The article is intended for a general audience, albeit one with a good grounding in basic science. Physicists hoping for a technical treatment will be disappointed. I certainly can’t claim any great expertise in the subject; while I’m a theoretical physicist, my work has been mostly on quantum information, not particle physics, cosmology, or quantum gravity, and I’m far from being expert on the topics discussed here. If you are an expert, and spot any errors, I’d appreciate hearing about them.

Here’s a link to the article. (PDF only, I’m afraid).

Limits to collective decision making: Arrow’s theorem

What’s the best way for a group of people to make collective decisions? Democracy may be, as Churchill said, the worst system that’s better than anything else, but in practice there’s a lot of variation possible in democratic voting systems. How should we set up voting systems that result in effective collective decisions?

Designing a good voting system seems like a simple task, but it’s surprisingly complicated. A famous result known as Arrow’s theorem, proved by the economist Kenneth Arrow in 1950, vividly demonstrates just how difficult it is. Arrow picked out three properties that (arguably) you would like any good voting system to have – and then mathematically proved that no voting system with all three properties can possibly exist! What is especially remarkable about Arrow’s achievement is that there is no obvious a priori reason to suppose that these three requirements are incompatible. Arrow was a joint recipient of the 1972 Nobel Memorial Prize for economics, in part because of this work.

These notes explain what Arrow’s theorem says, why it’s true (i.e., I give a proof), and discuss briefly what it means for collective cognition. I’ve tried to keep the notes easy to read, favouring a relaxed and discursive discussion over the terse presentation favoured in most mathematical works. I have also tried to assume minimal mathematical background, no more than some facility with basic mathematical notation and mathematical argument. The main drawback of this approach is that the notes have become rather lengthy, clocking in at nearly [tex]3000[/tex] words. Hopefully, however, they’ll be a rewarding read.

My notes are based on a very nice paper by John Geanakoplos (Cowles Foundation discussion paper No.1123RRRR, available at, and reprinted in Economic Theory (2005), 26: 211-215), which gives three proofs of Arrow’s theorem. The proof I give is Geanakoplos’ first (and simplest) proof. Note that there are other (related) results in the literature which go by the name Arrow’s theorem. Understanding the version presented here should enable you to understand the variants with a minimum of fuss. Arrow’s original paper appeared as “A Difficulty in the Concept of Social Welfare” in The Journal of Political Economy, Volume 58, page 328 (1950).

Voting systems

To explain what Arrow’s theorem says, we need to define more precisely what we mean by a voting system. We imagine a population of [tex]n[/tex] “voters”, to whom we assign convenient labels, like [tex]1, \ldots, n[/tex]. This population might be, for example, all the mentally competent people over the age of [tex]18[/tex] in some country (e.g., Australia).

These voters are going to vote amongst [tex]m[/tex] alternative options, which we’ll label with capital letters, [tex]A, B, C, \ldots, Z[/tex], to keep distinct from the voters. These voting options might be the political parties running for control of the Federal Government, for example.

What each voter does is produce an ordered ranking of the options, with ties allowed. For example, voter number [tex]3[/tex] might rank [tex]B[/tex] as number [tex]1[/tex], [tex]A[/tex] and [tex]C[/tex] as a tie for number [tex]2[/tex], [tex]D[/tex] as number [tex]3[/tex], and so on.

(You might wonder whether [tex]D[/tex] shouldn’t really be ranked number [tex]4[/tex], in view of the tie between [tex]A[/tex] and [tex]C[/tex] for the ranking of [tex]2[/tex]. Below we’ll make an assumption about the voting system that means that only the relative ranking matters, not the exact value of the number assigned, and so we’re justified in ignoring this issue.)

A helpful shorthand is to write [tex]S >_v T[/tex] to indicate that voter [tex]v[/tex] ranked [tex]S[/tex] strictly ahead of [tex]T[/tex]. For example, maybe voter [tex]v[/tex] ranked [tex]S[/tex] as [tex]2[/tex] and [tex]T[/tex] as [tex]4[/tex]. It’s worth keeping in mind that this notation for ordering is the reverse of the conventional numerical order – politicians usually campaign to be “number [tex]1[/tex]” rather than “number [tex]10[/tex]”! Similarly, we’ll write [tex]S \geq_v T[/tex] to indicate that voter [tex]v[/tex] ranked [tex]S[/tex] at least as highly as [tex]T[/tex]; this allows that maybe [tex]S[/tex] and [tex]T[/tex] had the same rank. By the way, we’ve been using [tex]S[/tex] and [tex]T[/tex] here simply to indicate generic voting options; we’ll do this throughout, occasionally using [tex]U[/tex] as well.

As an example, imagine there are [tex]5[/tex] voters choosing among [tex]3[/tex] alternatives, [tex]A, B, C[/tex]. The overall voting profile might look something like this:

  • Voter [tex]1[/tex]: [tex]A \rightarrow 1[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 2[/tex].
  • Voter [tex]2[/tex]: [tex]A \rightarrow 3[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 2[/tex].
  • Voter [tex]3[/tex]: [tex]A \rightarrow 2[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 1[/tex].
  • Voter [tex]4[/tex]: [tex]A \rightarrow 1[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 1[/tex].
  • Voter [tex]5[/tex]: [tex]A \rightarrow 3[/tex]; [tex]B \rightarrow 2[/tex]; [tex]C \rightarrow 1[/tex].

A voting system is a function which takes the profile as input, and produces a single social ranking of the options as its output. For example, we might add up the numerical value of the votes associated to each voting option, and then order the options accordingly. So, for example, option [tex]A[/tex] above has a total vote [tex]1+3+2+1+3=10[/tex], option [tex]B[/tex] a total [tex]1+1+1+1+2=6[/tex] and option [tex]C[/tex] a total [tex]2+2+1+1+1=7[/tex]. As a result, in this voting system the social ranking has [tex]A[/tex] as [tex]3[/tex], [tex]B[/tex] as [tex]1[/tex], and [tex]C[/tex] as [tex]2[/tex].
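To make the rank-sum example concrete, here is a minimal Python sketch of it, applied to the five-voter profile above. The function names (`rank_totals`, `social_ranking`) and the choice to represent each ballot as a dictionary are my own illustrative assumptions, not anything fixed by the definition of a voting system.

```python
# Rank-sum voting system from the text: a lower total means a higher social rank.
# Each ballot maps an option to the rank that voter assigned it (ties allowed).
profile = [
    {"A": 1, "B": 1, "C": 2},  # voter 1
    {"A": 3, "B": 1, "C": 2},  # voter 2
    {"A": 2, "B": 1, "C": 1},  # voter 3
    {"A": 1, "B": 1, "C": 1},  # voter 4
    {"A": 3, "B": 2, "C": 1},  # voter 5
]


def rank_totals(profile):
    """Sum the numerical rank each voter gave to each option."""
    totals = {}
    for ballot in profile:
        for option, rank in ballot.items():
            totals[option] = totals.get(option, 0) + rank
    return totals


def social_ranking(profile):
    """Map each option to its social rank; options with equal totals share a rank."""
    totals = rank_totals(profile)
    distinct = sorted(set(totals.values()))
    return {option: distinct.index(total) + 1 for option, total in totals.items()}


totals = rank_totals(profile)
ranking = social_ranking(profile)
print(totals)   # {'A': 10, 'B': 6, 'C': 7}
print(ranking)  # {'A': 3, 'B': 1, 'C': 2}
```

The output reproduces the totals and the social ranking worked out by hand above.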

Once a voting system has been fixed, we write [tex]S > T[/tex] to indicate that [tex]S[/tex] has a higher social rank than [tex]T[/tex]. Similarly, we write [tex]S \geq T[/tex] to indicate that [tex]S[/tex]’s social ranking is at least as high as [tex]T[/tex]’s.

What makes a good voting system?

What properties should a good voting system have?

Obviously, this is a potentially controversial question! Arrow’s theorem is based on several reasonable properties that, arguably, we should expect to be satisfied in any good voting system.

The first property is that the voting system should respect unanimity, in the sense that if [tex]S >_v T[/tex] for all voters, then we should have [tex]S > T[/tex]. Restating this verbally instead of symbolically, if every voter prefers [tex]S[/tex] to [tex]T[/tex], then the social ranking should place [tex]S[/tex] ahead of [tex]T[/tex]. This is pretty obviously a highly desirable property for any voting system to have – imagine if the electoral commission announced that candidate [tex]A[/tex] had won, despite the fact that every single voter ranked [tex]B[/tex] ahead of [tex]A[/tex]!

The second property is that the voting system should respect the independence of irrelevant alternatives, meaning that the relative social ranking of [tex]S[/tex] and [tex]T[/tex] (higher, lower, or the same) depends only on the relative ranking of [tex]S[/tex] and [tex]T[/tex] by individual voters, and isn’t influenced by the position of other options (e.g., [tex]U[/tex]).

It’s rather less obvious that any good voting system will respect the independence of irrelevant alternatives. For example, the simple voting system I described above (where we summed the votes) turns out not to have this property, yet it’s not so obviously a bad voting system.
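Here is a small sketch of why the rank-sum system fails this property. The two profiles are hypothetical ones I made up for illustration: every voter keeps the same relative ranking of [tex]A[/tex] and [tex]B[/tex] in both, and only the position of [tex]C[/tex] moves, yet the social ranking of [tex]A[/tex] and [tex]B[/tex] flips.

```python
def rank_totals(profile):
    """Sum the numerical rank each voter gave to each option."""
    totals = {}
    for ballot in profile:
        for option, rank in ballot.items():
            totals[option] = totals.get(option, 0) + rank
    return totals


def relative(ranks, s, t):
    """Return +1 if s is ahead of t (smaller number), -1 if behind, 0 if tied."""
    return (ranks[t] > ranks[s]) - (ranks[t] < ranks[s])


# Hypothetical two-voter profiles; only C's position differs between them.
before = [{"A": 1, "B": 2, "C": 3}, {"B": 1, "C": 2, "A": 3}]
after = [{"A": 1, "C": 2, "B": 3}, {"B": 1, "A": 2, "C": 3}]

# Every voter ranks A against B the same way in both profiles...
voters_unchanged = all(
    relative(b, "A", "B") == relative(a, "A", "B") for b, a in zip(before, after)
)

# ...but the social ranking of A against B changes.
social_before = relative(rank_totals(before), "A", "B")
social_after = relative(rank_totals(after), "A", "B")

print(voters_unchanged, social_before, social_after)  # True -1 1
```

In the first profile the totals are [tex]A = 4[/tex], [tex]B = 3[/tex], so [tex]B > A[/tex]; in the second they are [tex]A = 3[/tex], [tex]B = 4[/tex], so [tex]A > B[/tex].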

However, for the purposes of this discussion we’re going to assume that this property of respecting the independence of irrelevant alternatives is regarded as highly desirable. Can we design a voting system which respects both unanimity and the independence of irrelevant alternatives?

It turns out that if there are just two voting options, [tex]A[/tex] and [tex]B[/tex], then it’s possible to design a voting system with both properties. We could, for example, just sum up the respective votes of the two voting options, and declare the winner to be whichever option has the lower total.

What happens if there are three voting options, [tex]A, B[/tex] and [tex]C[/tex]? What Arrow’s theorem shows in this case is that in any voting system which respects unanimity and the independence of irrelevant alternatives there must automatically be a voter, [tex]d[/tex], who can act as a dictator for the voting system, in the sense that if [tex]S >_d T[/tex], then [tex]S > T[/tex], no matter how the other voters rank their options!

Stated another way, what Arrow’s theorem shows is that the requirements of respecting unanimity and the independence of irrelevant alternatives are incompatible with a third desirable requirement, namely that the voting system not be a dictatorship.
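To make the definition of a dictator concrete, here is a sketch that checks, by brute force over every possible profile, whether a given voter is a dictator in the two-option rank-sum system described earlier. The helper names and the three-voter population are illustrative assumptions of mine; the point is only that with two options no voter dictates, so the theorem really does need three or more options.

```python
from itertools import product

# With two options, each voter has three possible ballots:
# A ahead of B, B ahead of A, or a tie.
BALLOTS = [{"A": 1, "B": 2}, {"A": 2, "B": 1}, {"A": 1, "B": 1}]


def social_winner(profile):
    """Return the option with the lower rank total, or None for a social tie."""
    total_a = sum(ballot["A"] for ballot in profile)
    total_b = sum(ballot["B"] for ballot in profile)
    if total_a < total_b:
        return "A"
    if total_b < total_a:
        return "B"
    return None


def is_dictator(d, n_voters):
    """True if voter d's strict preference always becomes the strict social preference."""
    for profile in product(BALLOTS, repeat=n_voters):
        ballot = profile[d]
        if ballot["A"] < ballot["B"] and social_winner(profile) != "A":
            return False
        if ballot["B"] < ballot["A"] and social_winner(profile) != "B":
            return False
    return True


dictators = [d for d in range(3) if is_dictator(d, 3)]
print(dictators)  # []
```

With a single voter, `is_dictator(0, 1)` returns `True`, as you’d expect: a population of one is trivially a dictatorship.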

In full generality, what Arrow’s theorem shows is as follows:

Arrow’s theorem: Suppose we have a voting system to rank [tex]3[/tex] or more voting options. Suppose that system respects unanimity and the independence of irrelevant alternatives. Then there must be a dictator for the voting system.

This theorem ought to shock you. How can the assumptions of unanimity and independence of irrelevant alternatives possibly imply the existence of a dictator?

I’ll now give a short proof of Arrow’s theorem, to give you some feeling for why the theorem is true. However, much of interest can be said about Arrow’s theorem even if you don’t understand the proof, so if you’re so inclined you should feel free to skim or skip the following proof. Of course, those who want to understand Arrow’s theorem deeply should spend some time trying to prove it themselves, before reading the proof in detail.

Proof of Arrow’s theorem: The first step of the proof is to argue that if every voter ranks [tex]S[/tex] either strictly first or strictly last, then [tex]S[/tex]’s social ranking must be either strictly first or strictly last. To see this, we use a proof by contradiction, supposing that in fact [tex]S[/tex]’s social ranking is neither strictly first nor strictly last. That is, we suppose that there exist [tex]T[/tex] and [tex]U[/tex] such that [tex]T \geq S \geq U[/tex]. Suppose we rearrange each voter’s rankings so that [tex]U >_v T[/tex], but without changing the relative ranking of [tex]S[/tex] and [tex]T[/tex], or of [tex]S[/tex] and [tex]U[/tex], so the rearrangement doesn’t affect the fact that [tex]T \geq S[/tex] and [tex]S \geq U[/tex]. (Understanding why such a rearrangement can always be done requires a little thought, and maybe some working out on a separate sheet of paper, which you should do.) Unanimity then requires that [tex]U > T[/tex], which is inconsistent with the fact that [tex]T \geq S \geq U[/tex]. This is the desired contradiction.

Suppose now that we start out with a profile in which [tex]S[/tex] is ranked strictly first by every voter, and thus by unanimity must be ranked strictly first in the social ranking. Suppose we move through the voting population, and for each voter in turn change their ranking for [tex]S[/tex] from strictly first to strictly last. When this has been done for all the voters, unanimity implies that [tex]S[/tex] must be ranked strictly last, and so at some point we see that there must be a voter (who will turn out to be the dictator, [tex]d[/tex]), whose vote change causes [tex]S[/tex] to move from first to last in the social ranking.
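The sweep described in this paragraph can be sketched in code. Since the theorem says the only systems satisfying both properties are dictatorships, I use a hypothetical dictatorship (the social ranking simply copies one voter’s ballot) as the test system; `dictatorship`, `position` and `find_pivotal_voter` are names I’ve invented for illustration.

```python
def dictatorship(d):
    """Hypothetical voting system: the social ranking copies voter d's ballot."""
    def system(profile):
        return dict(profile[d])
    return system


def position(ranking, s):
    """'first' if s is strictly ahead of every other option, 'last' if strictly behind."""
    others = [rank for option, rank in ranking.items() if option != s]
    if all(ranking[s] < rank for rank in others):
        return "first"
    if all(ranking[s] > rank for rank in others):
        return "last"
    return "middle"


def find_pivotal_voter(system, n_voters, options, s):
    """Move s from strictly first to strictly last, one voter at a time, and
    return the voter whose change moves s from first to last socially."""
    others = [o for o in options if o != s]
    s_first = {s: 1, **{o: i + 2 for i, o in enumerate(others)}}
    s_last = {**{o: i + 1 for i, o in enumerate(others)}, s: len(options)}
    profile = [dict(s_first) for _ in range(n_voters)]
    for v in range(n_voters):
        before = position(system(profile), s)
        profile[v] = dict(s_last)
        after = position(system(profile), s)
        if before == "first" and after == "last":
            return v
    return None


pivot = find_pivotal_voter(dictatorship(2), 4, ["A", "B", "C"], "A")
print(pivot)  # 2
```

For the dictatorship the pivotal voter is, unsurprisingly, the dictator. The content of the proof is that the same sweep picks out a dictator in any system respecting unanimity and the independence of irrelevant alternatives.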

Consider the voting profile immediately before [tex]d[/tex] changes their vote. We will say that any profile which has the same rankings for [tex]S[/tex] as this profile is an [tex]S[/tex]-profile. Similarly, an [tex]S'[/tex]-profile is one which has the same rankings for [tex]S[/tex] as the profile just after [tex]d[/tex] changes their vote. It follows from the independence of irrelevant alternatives that the overall ranking of [tex]S[/tex] must be first in any [tex]S[/tex]-profile, and last in any [tex]S'[/tex]-profile.

Suppose now that [tex]T[/tex] and [tex]U[/tex] are voting options that are not equal to [tex]S[/tex]. Suppose [tex]d[/tex] ranks [tex]T[/tex] higher than [tex]U[/tex], i.e., [tex]T >_d U[/tex]. We will show that we must have [tex]T > U[/tex], no matter how the other voters rank [tex]T[/tex] and [tex]U[/tex]. That is, [tex]d[/tex]’s vote dictates that [tex]T[/tex] be ranked above [tex]U[/tex] in the social ranking. To see this, first we rearrange the profile so that it becomes an [tex]S[/tex]-profile, without changing the relative ordering of [tex]T[/tex] and [tex]U[/tex] anywhere, and so not affecting whether [tex]T > U[/tex] or not. We can do this simply by changing each voter’s ranking for [tex]S[/tex] in an appropriate way, placing it either at the top or the bottom of their ranking. Second, we change [tex]d[/tex]’s rankings so that [tex]T >_d S >_d U[/tex]. This can be done without changing the relative ranking of [tex]T[/tex] and [tex]U[/tex], and so does not affect whether [tex]T > U[/tex]. We call the resulting voting profile after these two rearrangements the final profile. Observe that in the final profile we have [tex]S > U[/tex], since if [tex]d[/tex] changes [tex]S[/tex] to be ranked strictly first then we have an [tex]S[/tex]-profile. Similarly, in the final profile we have [tex]T > S[/tex], since if [tex]d[/tex] changes [tex]S[/tex] to be strictly last, then we have an [tex]S'[/tex]-profile. As a result, we have [tex]T > S > U[/tex] in the final profile, and thus [tex]T > U[/tex]. But we constructed the final profile so that [tex]T > U[/tex] holds only if [tex]T > U[/tex] also in the actual voting profile, and so we must have had [tex]T > U[/tex] in the actual voting profile, as we desired to show.

The final step of the proof is to argue that [tex]d[/tex] can also dictate the order of [tex]S[/tex] and [tex]T[/tex], for an arbitrary [tex]T \neq S[/tex]. To see this, pick an element [tex]U[/tex] which is neither [tex]S[/tex] nor [tex]T[/tex], and consider an [tex]S[/tex]-profile in which [tex]U[/tex] is placed strictly last everywhere that [tex]S[/tex] is first, and vice versa. The results of the first and second paragraph of this proof imply that [tex]d[/tex] can change the rank of [tex]U[/tex] from last to first simply by changing their vote. We now apply the same argument as in the paragraph before this, but with [tex]U[/tex] taking the place of [tex]S[/tex], to argue that [tex]d[/tex] can dictate the relative ordering of [tex]S[/tex] and [tex]T[/tex]. QED

Arrow’s theorem is a striking result in the theory of collective decision making. It shows the great advantages that can come by formalizing ideas in a simple mathematical model – the fact that the model can lead to striking unforeseen conclusions that entirely change our perception of the phenomenon. More prosaically, by thinking closely about our values and our desired social outcomes, we can try to formalize models of those properties, and then study which models of collective decision making best respect those properties (if, indeed, any such models exist).

What of the implications of Arrow’s theorem for voting? It is, of course, true that the restrictions in Arrow’s theorem can be relaxed, and there is a large literature studying other models of voting and the extent to which they can be made fair. I’m not an expert on this literature, and so won’t comment other than to point out that economists have, of course, not been silent on the matter in the 50 plus years since Arrow’s paper!

My own interest in the subject is due to a more general hobby interest in the problem of collective cognition. In particular, I’m interested in the question of how we can design institutions which result in good collective decision making. It seems that the subject of institutional design is still in its infancy, and I find it remarkable how small a fraction of “institution space” humans have explored. One of the most interesting things about the web, in my opinion, is that it has greatly cut the cost of developing new institutions, and as a result we’re seeing new institutional models, and new types of collective cognition, develop at an incredible rate.

Expander graphs: the complete notes

The full pdf text of my series of posts about expander graphs. Thank you very much to all the people who commented on the posts; if you’re reading this text, and haven’t seen the comments on earlier posts, I recommend you look through them to see all the alternate proofs, generalizations and so on that people have offered.

Optimal photons

Optimal photons for quantum information processing, joint with Peter Rohde and Tim Ralph.

Producing single photons is hard work: there are no really good single-photon sources available, although there is a plethora of theoretical proposals for sources.

Perhaps somewhat surprisingly, not all photons are created equal. In particular, getting the kind of interference effects necessary for quantum information processing depends a lot on the shape of the wavepacket produced by the source: if you have the wrong shape wavepacket, even a tiny amount of noise may destroy the interference effects. For this reason, it’s important to understand which sources produce photons for which interference is stable against the noise.

That’s the subject of this paper. In particular, the main result is to show that for a wide variety of possible applications, Gaussian wavepackets produce the most stable interference effects, with the implication that people designing sources should look for sources which produce Gaussian or near Gaussian wavepackets.

Journal club on quantum gravity

The following post is based on some notes I prepared for a journal club talk I’m going to give on quantum gravity in a few hours. A postscript equivalent is here, with a few modifications.

Disclaimer: The whole point of our journal club talks is to give talks on interesting topics about which we are not experts! For me, quantum gravity fits this description in spades. Caveat emptor.


Every physicist learns as an undergraduate (if not before) that we don’t yet have a single theory unifying quantum mechanics and general relativity, i.e., a theory of quantum gravity. What is often not explained is why it is difficult to come up with such a theory. In this journal club I want to ask and partially answer two questions: (1) what makes it so difficult to put quantum mechanics and general relativity together; and (2) what approaches might one take to developing a theory of quantum gravity?

You might wonder if this is an appropriate topic for a forum such as this. After all, none of us here, including myself, are experts on string theory, loop quantum gravity, twistors, or any of the other approaches to quantum gravity that have been proposed and are currently being pursued.

However, we don’t yet know that any of these approaches is correct, and so there’s no harm in going back and thinking through some of the basic aspects of the problem, from an elementary point of view. This can be done by anyone who knows the rudiments of quantum mechanics and of the general theory of relativity.

If you like, you can view it as trying to solve the problem of quantum gravity without first “looking in the back of the book” to see the best attempted answers that other people have come up with. This procedure of first thinking things through for yourself has the advantage that it is likely to greatly increase the depth of your understanding of other people’s work if you later do investigate topics such as string theory, etc.

A related disclaimer is that I personally know only a minuscule fraction of all the modern thinking on quantum gravity. I prepared this lecture to force myself to think through in a naive way some of the problems involved in constructing a quantum theory of gravity, only pausing occasionally to peek in the back of the book. I won’t try to acknowledge my sources, which were many, but suffice to say that I doubt there’s anything here that hasn’t been thought before. Furthermore, people who’ve thought hard about quantum gravity over an extended period are likely to find much of what I say obvious, naive, absurd, or some combination thereof. Frankly, I don’t recommend that such people look through these notes; they’ll likely find it rather frustrating! For those less expert even than myself, perhaps you’ll find these notes a useful entertainment, and maybe they’ll stimulate you to think further on the subject.

Standard formulations of quantum mechanics and general relativity

Let’s start off by reminding ourselves of the standard formulations used for quantum mechanics and general relativity. I expect that most attendees at this journal club are extremely familiar with the basic principles of quantum mechanics, and, indeed, use them every day of their working lives. You may be rather less familiar with general relativity. I’ve tried to construct the lecture so you can follow the overall gist, even so.

Recall that the standard formulation of quantum mechanics contains the following elements:

  • The postulate that for every physical system there is a state vector in a Hilbert space which provides the most complete possible description of that system.
  • The postulate that the dynamics of a closed quantum system are described by a Hamiltonian and Schroedinger’s equation.
  • The postulate that a measurement on a system is described using an observable, a Hermitian operator acting on state space, which is used to describe measurement according to some rule for: (1) calculating measurement probabilities; and (2) describing the relationship between prior and posterior states.
  • The postulate that the state space for a composite quantum system is built up by taking the tensor product of individual state spaces. In the special case when those systems are indistinguishable, the postulate is modified so that the state space is either the symmetric or antisymmetric subspace of the total tensor product, depending on whether the systems are bosons or fermions.
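As a small illustration of the last postulate, here is a sketch that counts dimensions. It assumes each individual system has a finite-dimensional state space of dimension `d` (an assumption I’m adding; the postulates above don’t require it), and uses the standard counting results for the symmetric and antisymmetric subspaces of `n` copies. The function name is hypothetical.

```python
from math import comb  # available in Python 3.8 and later


def composite_dimensions(d, n):
    """Dimensions of the state space for n systems, each of dimension d."""
    return {
        # Full tensor product: distinguishable systems.
        "distinguishable": d ** n,
        # Symmetric subspace (bosons): multisets of n levels chosen from d.
        "bosons": comb(d + n - 1, n),
        # Antisymmetric subspace (fermions): n distinct levels chosen from d.
        "fermions": comb(d, n),
    }


dims = composite_dimensions(2, 2)
print(dims)  # {'distinguishable': 4, 'bosons': 3, 'fermions': 1}
```

For two qubits this is the familiar split of the four-dimensional tensor product into a three-dimensional triplet (symmetric) subspace and a one-dimensional singlet (antisymmetric) subspace.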

It’s worth pointing out that this is merely the most common formulation of quantum mechanics. Other formulations are possible, and may be extremely valuable. It’s certainly possible that the right way of constructing a quantum theory of gravity is to start from some different formulation of quantum mechanics. My reason for describing this formulation of quantum mechanics — probably the most commonly used formulation — is so that we’re all working off the same page.

Let’s switch now to discuss general relativity. Recall that the standard formulation of general relativity contains the following elements:

  • The postulate that spacetime is a four-dimensional pseudo-Riemannian manifold, with metric signature (+1,-1,-1,-1).
  • The postulate that material in spacetime is described by a two-index tensor T known as the stress-energy tensor. The stress-energy tensor describes not only things like mass and energy, but also describes the transport of mass and energy, so it has aspects that are both static and dynamic.
  • The postulate known as the Einstein field equations: [tex]G = 8\pi T[/tex]. This postulate connects the stress-energy tensor T to the Einstein tensor, G. In its mathematical definition G is fundamentally a geometric object, i.e., it is determined by the “shape” of spacetime. The physical content of the Einstein field equations is therefore that the shape of spacetime is determined by the matter distribution, and vice versa. An interesting point is that because the stress-energy tensor contains components describing the transport of matter, the transport properties of matter are actually determined by the geometry. For example, it can easily be shown that, as a consequence of the Einstein field equations, test particles follow geodesics of spacetime.
  • Since 1998 it has been thought that the Einstein equations need to be modified, becoming [tex]G+\Lambda g = 8 \pi T[/tex], where g is the metric tensor, and [tex]\Lambda[/tex] is a non-zero constant known as the cosmological constant. Rather remarkably, it turns out that, once again, test particles follow geodesics of spacetime. However, for a given stress-energy tensor, the shape of spacetime will itself be different, and so the geodesics will be different.

In an ideal world, of course, we wouldn’t just unify quantum mechanics and general relativity. We’d actually construct a single theory which incorporates both general relativity and the entire standard model of particle physics. So it’s arguable that we shouldn’t just be thinking about the standard formulation of quantum mechanics, but rather about the entire edifice of the standard model. I’m not going to do that here, because: (1) talking about vanilla quantum mechanics is plenty enough for one lecture; (2) it illustrates many of the problems that arise in the standard model, anyway; and (3) I’m a lot more comfortable with elementary quantum mechanics than I am with the standard model, and I expect much of my audience is, too.

Comparing the elements of general relativity and quantum mechanics

Let’s go through and look at each element in the standard formulations of general relativity and quantum mechanics, attempting as we do to understand some of the problems which arise when we try to unify the two theories.

Before getting started with the comparisons, let me make an aside on my presentation style. Conventionally, a good lecture is much like a good movie or a good book, in that a problem or situation is set up, preferably one involving high drama, the tension mounts, and then the problem is partially or fully resolved. Unfortunately, today is going to be a litany of problems, with markedly little success in resolution, and so the lecture may feel a little unsatisfying for those hoping, consciously or unconsciously, for a resolution.

Spacetime: In standard quantum mechanics, we usually work with respect to a fixed background spacetime of allowed configurations. By contrast, in general relativity, the metric tensor specifying the structure of spacetime is one of the physical variables of the theory. If we follow the usual prescriptions of quantum mechanics, we conclude that the metric tensor itself ought to be replaced by some suitable quantum mechanical observable, or set of observables. If one does this, it is no longer so clear that space and time can be treated as background parameters in quantum mechanics. How, for example, are we supposed to treat Schroedinger’s equation, when the physical structure of time itself is variable? Perhaps we ought to aim for an effective equation of the form

[tex]i \frac{d|\psi\rangle}{d\langle t \rangle} = H |\psi\rangle [/tex]

derived from some deeper underlying theory?

Stress-energy tensor: In general relativity T is used to describe the configuration of material bodies. Standard quantum mechanics tells us that T needs to be replaced by a suitable set of observables. In and of itself this is not obviously a major problem. However, a problem arises (again) in connection with the possible quantization of space and time. As usually understood in general relativity, T is a function of location p on the underlying four-dimensional manifold. The natural analogue in a quantized version is an observable [tex]\hat T(p)[/tex] which is again a function of position on the manifold. However, as described above, it seems likely that p itself should be replaced by some quantum equivalent, and it is not so clear how [tex]\hat T[/tex] ought to be constructed then. One possibility is that [tex]\hat T[/tex] becomes a function of some suitable [tex]\hat p[/tex]. A related problem is that the standard definition of the components of T often involves tangent vectors (essentially, velocity 4-vectors) to the underlying manifold. As for the position, p, perhaps such tangent vectors should be replaced by quantized equivalents.

Einstein field equations (with and without the cosmological constant): Consider the usual general relativistic formulation of the field equations: [tex]G+\Lambda g = 8\pi T[/tex]. The problem with constructing a quantum version ought by now to be obvious: quantum mechanics tells us that the quantities on the left — geometric quantities, to do with the shape of spacetime — are all associated with some notion of a background configuration, ordinarily left unquantized, while the quantities on the right are physical variables that ought to be quantized.

One natural speculation in this vein is that in any quantum theory of gravity we ought to have

[tex] G+\Lambda g = 8 \pi \langle T \rangle[/tex]

as an effective equation of the theory.

Hilbert space and quantum states: There is no obvious incompatibility with general relativity, perhaps because it is so unclear which Hilbert space or quantum state one might use in a description of gravitation.

The Hamiltonian and Schroedinger’s equation: As already mentioned, this presents a challenge because it is not so clear how to describe time in quantum gravity. Another concern is that, for many standard physical forms of the Hamiltonian, Schroedinger’s equation gives rise to faster-than-light effects. To alleviate this problem we must move to a relativistic wave equation, or to a quantum field theory.

In this vein, let me mention one natural candidate description for the dynamics of a free (quantum) test particle moving in the background of a fixed (classical) spacetime. First, start with a relativistically invariant wave equation such as the Klein-Gordon equation, which can be used to describe a free spin zero particle,

[tex] -\hbar^2 \frac{\partial^2 \psi}{\partial t^2} = -\hbar^2 c^2 \nabla^2 \psi + m^2 c^4 \psi,[/tex]

or the Dirac wave equation, which can be used to describe a free spin 1/2 particle,

[tex] i \hbar \frac{\partial \psi}{\partial t} = \left(i \hbar c \alpha \cdot \nabla - \beta mc^2 \right) \psi,[/tex]

where [tex]\alpha_x,\alpha_y,\alpha_z[/tex] and [tex]\beta[/tex] are the four Dirac matrices. In the case of the Klein-Gordon equation there is a clear prescription for how to take this over to a curved spacetime: simply replace derivatives by appropriate covariant derivatives, giving:

[tex] -\hbar^2 \nabla^2_; \psi = m^2 c^2 \psi.[/tex]

In flat spacetime this will have the same behaviour as the Klein-Gordon equation. In a fixed background curved spacetime we would expect this equation to describe a free spin zero test particle.
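
To make the flat-spacetime claim explicit, here is a short check. It reads [tex]\nabla^2_; \psi[/tex] as the covariant d’Alembertian [tex]g^{\mu\nu}\psi_{;\mu\nu}[/tex] and assumes the signature (+,-,-,-) with [tex]x^0 = ct[/tex]; both are assumptions about the notation, and with the opposite signature the left-hand side picks up an overall sign. In flat spacetime covariant derivatives reduce to partial derivatives, so

[tex] \nabla^2_; \psi = \frac{1}{c^2}\frac{\partial^2 \psi}{\partial t^2} - \nabla^2 \psi, [/tex]

and the curved-spacetime equation becomes

[tex] -\hbar^2 \left( \frac{1}{c^2}\frac{\partial^2 \psi}{\partial t^2} - \nabla^2 \psi \right) = m^2 c^2 \psi. [/tex]

Multiplying through by [tex]c^2[/tex] recovers the Klein-Gordon equation as written above.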

The same basic procedure can be followed in the case of the Dirac equation, replacing derivatives wherever necessary by covariant derivatives. I have not explicitly checked that the resulting equation is invariantly defined, but expect that it is (exercise!), and can be used to describe a free spin 1/2 test particle in a fixed background curved spacetime. It would be interesting to study the solutions of such equations for some simple nontrivial geometries, such as the Schwarzschild geometry. For metrics with sufficient symmetry, it may be possible to obtain analytic (or at least perturbative) solutions; in any case, it should be possible to investigate these problems numerically.
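
As a minimal sketch of the kind of numerical investigation suggested above, the following Python computes the effective potential seen by a scalar (Klein-Gordon) field on a Schwarzschild background. It assumes the standard separation of variables [tex]\psi = e^{-i\omega t} Y_{lm}\, u(r)/r[/tex] in units G = c = [tex]\hbar[/tex] = 1, under which the radial equation takes the form [tex]d^2u/dr_*^2 + (\omega^2 - V(r))u = 0[/tex], with [tex]r_*[/tex] the tortoise coordinate. The function and parameter names are illustrative and do not come from the notes.

```python
def schwarzschild_scalar_potential(r, M=1.0, l=0, mu=0.0):
    """Effective potential V(r) for a scalar field of mass mu and angular
    momentum number l outside a Schwarzschild black hole of mass M.

    Units: G = c = hbar = 1. Only valid outside the horizon, r > 2M.
    V(r) = (1 - 2M/r) * (l(l+1)/r^2 + 2M/r^3 + mu^2)
    """
    if r <= 2.0 * M:
        raise ValueError("r must lie outside the horizon, r > 2M")
    f = 1.0 - 2.0 * M / r
    return f * (l * (l + 1) / r**2 + 2.0 * M / r**3 + mu**2)


def potential_peak(M=1.0, l=0, mu=0.0, r_max=50.0, steps=100000):
    """Locate the maximum of V on a uniform grid in (2M, r_max] by brute force.

    Returns the pair (r_peak, V_peak). A grid search is crude but keeps the
    sketch free of external dependencies.
    """
    best_r, best_v = None, float("-inf")
    for i in range(1, steps + 1):
        r = 2.0 * M + (r_max - 2.0 * M) * i / steps
        v = schwarzschild_scalar_potential(r, M, l, mu)
        if v > best_v:
            best_r, best_v = r, v
    return best_r, best_v


if __name__ == "__main__":
    r_peak, v_peak = potential_peak(M=1.0, l=0, mu=0.0)
    print("massless l=0 barrier peaks near r =", r_peak, "with V =", v_peak)
```

For the massless l = 0 case the barrier peaks at r = 8M/3, which gives a quick sanity check on the grid search. Solving the radial equation itself (for example, for scattering off this barrier) would be the natural next step, and is not attempted here.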

Of course, although it would be interesting to study this prescription, we should expect it to be inadequate in various ways. We have described a means of studying a quantum test particle moving against a fixed classical background spacetime. In reality: (1) the background may not be classical; (2) the particle itself modifies the background; and (3) because of quantum indeterminacy, the particle may modify the background in different ways. In the language of the many-worlds interpretation, it seems reasonable to expect that which branch of the wavefunction we are in (representing different particle positions) may have some bearing on the structure of spacetime itself: in particular, different branches will correspond to different spacetimes.

This discussion highlights another significant incompatibility between general relativity and quantum mechanics. In general relativity, we know that test particles follow well-defined trajectories, namely geodesics of spacetime. This is a simple consequence of the field equations themselves. In quantum mechanics, no particle can follow a well-defined trajectory: the only way this could happen is if the Hamiltonian commuted with the position variables, in which case the particle would be stationary. In any case, this commutation condition cannot occur when the momentum contributes to the Hamiltonian, as is typically the case.
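
To spell out the commutation condition in the simplest case, assume a one-dimensional Hamiltonian [tex]H = p^2/2m + V(x)[/tex] (an illustrative choice, not one singled out in the discussion above) together with [tex][x,p] = i\hbar[/tex]. Then

[tex] [x, H] = \frac{1}{2m}[x, p^2] = \frac{1}{2m}\left( p[x,p] + [x,p]p \right) = \frac{i\hbar}{m}\, p, [/tex]

which is nonzero, and in the Heisenberg picture

[tex] \frac{dx}{dt} = \frac{1}{i\hbar}[x, H] = \frac{p}{m}. [/tex]

So position commutes with the Hamiltonian only if the kinetic term is absent, in which case [tex]dx/dt = 0[/tex] and the particle is stationary.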

Observables: One striking difference between quantum mechanics and general relativity is that the description of measurement is much more complex in the former. Several questions that might arise include:

  • Should wave function collapse occur instantaneously? This depends on how one interprets the wave function.
  • Should measurements be a purely local phenomenon, or can we make a measurement across an entire slice of spacetime? Across all of spacetime?
  • Should we worry that in the usual description of measurement, time and space are treated in a manifestly unsymmetric manner?
  • What observables would one expect to have in a quantum theory of gravity?

The tensor product structure and indistinguishable particles: One cause for concern here is that the notion of distinguishability itself is often framed in terms of the spatial separation of particles. If the structure of space itself really ought to be thought of in quantum terms, it is perhaps not so clear that the concepts of distinguishable, indistinguishable, and spatially separated particles even make sense. This may be a hint that in a quantum theory of gravity such concepts may be absent at the foundation, though they would need to emerge as consequences of the theory.

Quantum field theory: So far, we’ve concentrated on identifying incompatibilities between general relativity and quantum mechanics. Of course, fundamental modern physics is cast in terms of an extension of quantum mechanics known as quantum field theory, and it is worth investigating what problems arise when one attempts to unify general relativity with the entire edifice of quantum field theory. We won’t do this in any kind of fullness here, but will make one comment in relation to the canonical quantization procedure usually used to construct quantum field theories. The standard procedure is to start from some classical field equation, such as the wave equation, [tex](\nabla^2 - 1/c^2 \partial^2 / \partial t^2 ) \phi = 0[/tex], to expand the solution as a linear combination of solutions for individual field modes, to regard the different mode coefficients as dynamical variables, and to then quantize by imposing canonical commutation relations on those variables. This procedure can be carried out for many of the standard field equations, such as the wave equation, the Dirac equation, and the Klein-Gordon equation, because in each case the equation is linear, and thus the solution space has a linear structure. In the case of general relativity, the field equations are nonlinear in what seem like the natural field variables (the components of the metric tensor), and it is not possible to even get started with this procedure. One could, of course, try linearizing the field equations, and starting from there. My understanding is that when this is done the resulting quantum field theory is nonrenormalizable (?), and thus unsatisfactory.
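
For the wave equation, the canonical quantization procedure described above can be sketched as follows. The notation ([tex]a_k[/tex], [tex]N_k[/tex], a discrete set of wavevectors k, as for a field in a box) is introduced here for illustration, and [tex]N_k[/tex] stands for a convention-dependent normalization factor that is left unspecified. The general real solution is a superposition of plane-wave modes,

[tex] \phi(x,t) = \sum_k N_k \left( a_k e^{i(k \cdot x - \omega_k t)} + a_k^* e^{-i(k \cdot x - \omega_k t)} \right), \qquad \omega_k = c|k|, [/tex]

and quantization promotes the mode coefficients to operators satisfying

[tex] [a_k, a_{k'}^\dagger] = \delta_{k k'}, \qquad [a_k, a_{k'}] = 0. [/tex]

The superposition in the first step is exactly where linearity is used: a sum of solutions of a nonlinear equation is not, in general, a solution.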


Perhaps the most striking feature of the above discussion is an asymmetry between general relativity and quantum mechanics. Quantum mechanics, like Newton’s laws of motion, is not so much a physical theory as a framework for constructing physical theories, with many important quantities (the state, the state space, the Hamiltonian, the relevant observables) left unspecified. General relativity is much more prescriptive, specifying as it does an equation relating the distribution of material entities to the shape of spacetime, and, as a consequence, controlling the matter-energy dynamics. Once we’ve set up the initial matter-energy distribution and structure of spacetime, general relativity gives us no further control. In the analogous quantum mechanical situation we still have to specify the dynamics, and the measurements to be performed.

There is therefore a sense in which quantum mechanics is a more wide-ranging and flexible framework than general relativity. This is arguably a bug, not a feature, since one of general relativity’s most appealing points is its prescriptiveness; once we have the Einstein equations, we get everything else for free, in some sense. However, it also suggests that while the right approach may be to extend the quantum mechanical framework to incorporate general relativity, it is exceedingly unlikely that the right approach is to extend general relativity to incorporate quantum mechanics. On the other hand, it may also be that some extension or reformulation of quantum mechanics is necessary to incorporate gravity. Such an extension would have to be carried out rather carefully: results such as Gleason’s theorem show that quantum mechanics is surprisingly sensitive to small changes.

As an aside, let me also take this opportunity to point out something which often bugs me: the widely-made assertion that quantum gravity effects will become important at the Planck length — about [tex]10^{-35}[/tex] meters — and the notion of spacetime will break down at that length. Anyone claiming this, in my opinion, ought to be asked why the notion of mass doesn’t break down at the Planck mass, which has the rather hefty value of about [tex]10^{-8}[/tex] kilograms.
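
For reference, the two numbers quoted above follow from the standard definitions of the Planck length, [tex]\sqrt{\hbar G/c^3}[/tex], and the Planck mass, [tex]\sqrt{\hbar c/G}[/tex]. The short Python sketch below just does the arithmetic; the constant values are CODATA figures in SI units, and the function names are illustrative.

```python
import math

# Physical constants in SI units (CODATA values).
HBAR = 1.054571817e-34  # reduced Planck constant, J s
G = 6.67430e-11         # Newton's gravitational constant, m^3 kg^-1 s^-2
C = 299792458.0         # speed of light, m / s


def planck_length():
    """Planck length sqrt(hbar * G / c^3), in meters."""
    return math.sqrt(HBAR * G / C**3)


def planck_mass():
    """Planck mass sqrt(hbar * c / G), in kilograms."""
    return math.sqrt(HBAR * C / G)


if __name__ == "__main__":
    print("Planck length (m): ", planck_length())
    print("Planck mass   (kg):", planck_mass())
```

The results are of order [tex]10^{-35}[/tex] meters and [tex]10^{-8}[/tex] kilograms respectively, matching the figures in the text.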

A toy model

Just for fun, let me propose a simple toy model for quantum gravity, inspired by the Klein-Gordon equation. I’m sure this is wrong or inadequate somehow, but after an hour or so’s thought, I can’t yet see why. I include it here primarily as a stimulant to further thought.

The idea is to look for a four-dimensional pseudo-Riemannian manifold M, with metric signature (-,+,+,+), and a function [tex]\psi : M \rightarrow C[/tex], such that the following equations have a solution:

[tex]G + \Lambda g = 8 \pi T [/tex]

[tex] T^{\mu \nu} = v^\mu v^\nu [/tex]

[tex] v^0 = \frac{i\hbar}{2mc^2}( \psi^* \psi^{;0}- \psi \psi^{;0 *})[/tex]

[tex] v^j = \frac{-i\hbar}{2m}( \psi^* \psi^{;j}- \psi \psi^{;j *}),[/tex]

where m, c, [tex]\Lambda[/tex] are all constants with their usual meanings, j = 1,2,3, and the expression for [tex]T^{\mu \nu}[/tex] may need a proportionality constant, probably related to m, out the front. The expressions for [tex]v^0[/tex] and [tex]v^j[/tex] are covariant versions of the corresponding expressions for the charge and current densities associated to the Klein-Gordon equation (see Chapter 13 of Schiff’s well-known text on quantum mechanics, 3rd ed., McGraw-Hill, 1968; note that Schiff calls this equation the “relativistic Schroedinger equation”). A subtlety is that the covariant derivative itself depends on the metric g, and so these equations are potentially extremely restrictive; it is by no means obvious that a solution ever exists. However, if we take seriously the idea that [tex]T^{\mu \nu}[/tex] needs a proportionality constant related to m, then we can see that in the test particle limit, [tex]m \rightarrow 0[/tex], these equations have as a solution any [tex]\psi[/tex], and flat spacetime, which is not unreasonable.
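
To get a feel for the quantities entering [tex]v^0[/tex] and [tex]v^j[/tex], here is a minimal Python sketch that evaluates the flat-spacetime Klein-Gordon density [tex]P = \frac{i\hbar}{2mc^2}(\psi^* \partial_t \psi - \psi\, \partial_t \psi^*)[/tex] and current [tex]S = \frac{-i\hbar}{2m}(\psi^* \partial_x \psi - \psi\, \partial_x \psi^*)[/tex] for a plane wave in one spatial dimension. It uses ordinary rather than covariant derivatives, approximated by central finite differences, so it illustrates only the flat limit and not the toy model itself. All function names and default parameters (including the choice [tex]\hbar[/tex] = m = c = 1) are assumptions made for the example.

```python
import cmath
import math


def plane_wave(k, hbar=1.0, m=1.0, c=1.0, amplitude=1.0):
    """Return (psi, omega) for a positive-frequency Klein-Gordon plane wave.

    psi(x, t) = amplitude * exp(i (k x - omega t)), with the dispersion
    relation omega = sqrt(c^2 k^2 + m^2 c^4 / hbar^2).
    """
    omega = math.sqrt((c * k) ** 2 + (m * c**2 / hbar) ** 2)

    def psi(x, t):
        return amplitude * cmath.exp(1j * (k * x - omega * t))

    return psi, omega


def kg_density_and_current(psi, x, t, hbar=1.0, m=1.0, c=1.0, h=1e-5):
    """Flat-spacetime Klein-Gordon density P and current S at the point (x, t).

    Derivatives are approximated by central differences with step h.
    """
    dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2 * h)
    dpsi_dx = (psi(x + h, t) - psi(x - h, t)) / (2 * h)
    p = psi(x, t)
    # P = (i hbar / 2 m c^2) (psi* d_t psi - psi d_t psi*)
    density = (1j * hbar / (2 * m * c**2)) * (
        p.conjugate() * dpsi_dt - p * dpsi_dt.conjugate()
    )
    # S = (-i hbar / 2 m) (psi* d_x psi - psi d_x psi*)
    current = (-1j * hbar / (2 * m)) * (
        p.conjugate() * dpsi_dx - p * dpsi_dx.conjugate()
    )
    # Both combinations are real up to rounding error.
    return density.real, current.real


if __name__ == "__main__":
    psi, omega = plane_wave(k=0.75)
    P, S = kg_density_and_current(psi, x=0.3, t=0.2)
    print("omega =", omega, " P =", P, " S =", S, " S/P =", S / P)
```

For a plane wave of unit amplitude the analytic values are [tex]P = \hbar\omega/mc^2[/tex] and [tex]S = \hbar k/m[/tex], so the ratio S/P equals the group velocity [tex]c^2 k/\omega[/tex], which is always less than c.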


The picture I have painted is somewhat bleak, which is perhaps not surprising: finding a quantum theory of gravity is not a trivial problem! However, the good news is that many further steps naturally suggest themselves:

  • At many points, my analysis has been incomplete, in that I haven’t thoroughly mapped out a catalogue of all the possible alternatives. A more thorough analysis of the possibilities should be done.
  • The analysis needs to be extended to incorporate modern relativistic quantum field theory.
  • Computer pioneer Alan Kay has said “A change of perspective is worth 80 IQ points”. It would be fruitful to repeat this exercise from the point of view of some of the other formulations people have of general relativity and quantum mechanics. I’d particularly like to do this for the initial value and action formulations of general relativity, and for the quasidistribution and nonlocal hidden variable formulations of quantum mechanics. It may also be useful to attempt to construct modifications of either or both theories in order to solve some of the problems that we’ve described here.
  • Read up on some of the work that other people have done on quantum gravity, from a variety of points of view. Things to learn might include: supersymmetry, string theory, loop quantum gravity, twistors, Euclidean quantum gravity, Hawking radiation, the Unruh effect, the Wheeler-DeWitt equation, Penrose’s gravitational collapse, 1- and 2-dimensional quantum gravity, gravitational wave astronomy, work on the cosmological constant, …


Thanks to David Poulin for comments and encouragement.