Introduction to Yang-Mills theories

Yang-Mills theories are a class of classical field theory generalizing Maxwell’s equations. When quantized, Yang-Mills theories form the basis for all successful modern quantum field theories, including the standard model of particle physics, and grand unified theories (GUTs) that attempt to go beyond the standard model.

This post contains working notes (32 pages, pdf only) that I wrote in an attempt to come to a satisfactory personal understanding of Yang-Mills theory. They are part of a larger project of understanding the standard models of particle physics and of cosmology – some related earlier notes are here.

Caveat: The current notes take a geometric approach to Yang-Mills theory, and include quite a bit of background on differential geometry. After completing a first draft, I realized that if I was to write either a pedagogical introduction or a review of Yang-Mills theory, this geometric approach is not the approach I’d prefer to take. Rather, I’d start with a bare statement of the Yang-Mills equations, considered as a generalization of Maxwell’s equations, and then work through a series of examples, only gradually mixing in the geometric approach. This would have the advantage of bringing readers up to speed much more quickly, without needing to absorb reams of differential geometry upfront.

Because of this, I haven’t polished these notes – they remain primarily my personal working notes, and various inaccuracies and shortcomings remain. I’m content to leave those as they are – why spend time polishing when you know a better approach is possible? – but I’d appreciate hearing about any serious misconceptions you spot.

Despite these caveats, I believe the notes may be useful to some readers. In particular, if you’d like to understand the approach to Yang-Mills theory from differential geometry, these notes may serve as a useful first step, to be supplemented by additional reading such as the book by Baez and Muniain (“Knots, Gauge Fields and Gravity”, World Scientific 1994) on which the notes are primarily based.


Update: If you’re reading the notes in detail, then you might want to take a look at the comments, especially those by David Speyer and Aaron Bergman, who provide some important corrections and extensions.

What is the Universe made of? Part I

This post is, more than usual, a work in progress. It is the first draft of the first installment of a longer article on the subject “What is the Universe made of”. I intend to revise this draft and finish the longer article over the next few weeks, posting it to my blog as I make progress.

This first installment gives a bird’s-eye view of the subject, describing in very broad terms how ideas from particle physics, from cosmology, and from quantum gravity have contributed to our current understanding of what the Universe is made of. It’s really just a warm-up – subsequent installments will be meatier, describing in more detail each of these ideas, how they fit together, and some of the big questions that remain. The next installment will describe the standard model of particle physics in some detail.

The article is intended for a general audience, albeit one with a good grounding in basic science. Physicists hoping for a technical treatment will be disappointed. I certainly can’t claim any great expertise in the subject; while I’m a theoretical physicist, my work has been mostly on quantum information, not particle physics, cosmology, or quantum gravity, and I’m far from being expert on the topics discussed here. If you are an expert, and spot any errors, I’d appreciate hearing about them.

Here’s a link to the article. (PDF only, I’m afraid).

Limits to collective decision making: Arrow’s theorem

What’s the best way for a group of people to make collective decisions? Democracy may be, as Churchill said, the worst system that’s better than anything else, but in practice there’s a lot of variation possible in democratic voting systems. How should we set up voting systems that result in effective collective decisions?

Designing a good voting system seems like a simple task, but it’s surprisingly complicated. A famous result known as Arrow’s theorem, proved by the economist Kenneth Arrow in 1950, vividly demonstrates just how difficult it is. Arrow picked out three properties that (arguably) you would like any good voting system to have – and then mathematically proved that no voting system with all three properties can possibly exist! What is especially remarkable about Arrow’s achievement is that there is no obvious a priori reason to suppose that these three requirements are incompatible. Arrow was a joint recipient of the 1972 Nobel Memorial Prize for economics, in part because of this work.

These notes explain what Arrow’s theorem says, why it’s true (i.e., I give a proof), and discuss briefly what it means for collective cognition. I’ve tried to keep the notes easy to read, favouring a relaxed and discursive discussion over the terse presentation favoured in most mathematical works. I have also tried to assume minimal mathematical background, no more than some facility with basic mathematical notation and mathematical argument. The main drawback of this approach is that the notes have become rather lengthy, clocking in at nearly [tex]3000[/tex] words. Hopefully, however, they’ll be a rewarding read.

My notes are based on a very nice paper by John Geanakoplos (Cowles Foundation discussion paper No. 1123RRRR, reprinted in Economic Theory (2005), 26: 211-215), which gives three proofs of Arrow’s theorem. The proof I give is Geanakoplos’ first (and simplest) proof. Note that there are other (related) results in the literature which go by the name Arrow’s theorem. Understanding the version presented here should enable you to understand the variants with a minimum of fuss. Arrow’s original paper appeared as “A Difficulty in the Concept of Social Welfare” in The Journal of Political Economy, Volume 58, page 328 (1950).

Voting systems

To explain what Arrow’s theorem says, we need to define more precisely what we mean by a voting system. We imagine a population of [tex]n[/tex] “voters”, to whom we assign convenient labels, like [tex]1, \ldots, n[/tex]. This population might be, for example, all the mentally competent people over the age of [tex]18[/tex] in some country (e.g., Australia).

These voters are going to vote amongst [tex]m[/tex] alternative options, which we’ll label with capital letters, [tex]A, B, C, \ldots, Z[/tex], to keep distinct from the voters. These voting options might be the political parties running for control of the Federal Government, for example.

What each voter does is produce an ordered ranking of the options, with ties allowed. For example, voter number [tex]3[/tex] might rank [tex]B[/tex] as number [tex]1[/tex], [tex]A[/tex] and [tex]C[/tex] as a tie for number [tex]2[/tex], [tex]D[/tex] as number [tex]3[/tex], and so on.

(You might wonder whether [tex]D[/tex] shouldn’t really be ranked number [tex]4[/tex], in view of the tie between [tex]A[/tex] and [tex]C[/tex] for the ranking of [tex]2[/tex]. Below we’ll make an assumption about the voting system that means that only the relative ranking matters, not the exact value of the number assigned, and so we’re justified in ignoring this issue.)

A helpful shorthand is to write [tex]S >_v T[/tex] to indicate that voter [tex]v[/tex] ranked [tex]S[/tex] strictly ahead of [tex]T[/tex]. For example, maybe voter [tex]v[/tex] ranked [tex]S[/tex] as [tex]2[/tex] and [tex]T[/tex] as [tex]4[/tex]. It’s worth keeping in mind that this notation for ordering is the reverse of the conventional numerical order – politicians usually campaign to be “number [tex]1[/tex]” rather than “number [tex]10[/tex]”! Similarly, we’ll write [tex]S \geq_v T[/tex] to indicate that voter [tex]v[/tex] ranked [tex]S[/tex] at least as highly as [tex]T[/tex]; this allows that maybe [tex]S[/tex] and [tex]T[/tex] had the same rank. By the way, we’ve been using [tex]S[/tex] and [tex]T[/tex] here simply to indicate generic voting options; we’ll do this throughout, occasionally using [tex]U[/tex] as well.

As an example, imagine there are [tex]5[/tex] voters choosing among [tex]3[/tex] alternatives, [tex]A, B, C[/tex]. The overall voting profile might look something like this:

  • Voter [tex]1[/tex]: [tex]A \rightarrow 1[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 2[/tex].
  • Voter [tex]2[/tex]: [tex]A \rightarrow 3[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 2[/tex].
  • Voter [tex]3[/tex]: [tex]A \rightarrow 2[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 1[/tex].
  • Voter [tex]4[/tex]: [tex]A \rightarrow 1[/tex]; [tex]B \rightarrow 1[/tex]; [tex]C \rightarrow 1[/tex].
  • Voter [tex]5[/tex]: [tex]A \rightarrow 3[/tex]; [tex]B \rightarrow 2[/tex]; [tex]C \rightarrow 1[/tex].

A voting system is a function which takes the profile as input, and produces a single social ranking of the options as its output. For example, we might add up the numerical value of the votes associated to each voting option, and then order the options accordingly. So, for example, option [tex]A[/tex] above has a total vote [tex]1+3+2+1+3=10[/tex], option [tex]B[/tex] a total [tex]1+1+1+1+2=6[/tex] and option [tex]C[/tex] a total [tex]2+2+1+1+1=7[/tex]. As a result, in this voting system the social ranking has [tex]A[/tex] as [tex]3[/tex], [tex]B[/tex] as [tex]1[/tex], and [tex]C[/tex] as [tex]2[/tex].
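For concreteness, here’s a short Python sketch (my own illustration, not part of the original notes) of this sum-the-votes system, applied to the example profile above:

```python
# A sketch of the sum-the-votes system described above.  A profile is a
# list of dictionaries, one per voter, mapping each option to the rank
# number that voter assigned it (1 = best, ties allowed).

def social_ranking(profile):
    """Sum each option's rank numbers; lower totals get higher social rank."""
    totals = {opt: sum(votes[opt] for votes in profile)
              for opt in profile[0]}
    ordered = sorted(totals, key=lambda opt: totals[opt])
    # Assign social ranks 1, 2, ...; ties in the totals are not handled
    # carefully here -- the sketch is only meant to make the idea concrete.
    return totals, {opt: i + 1 for i, opt in enumerate(ordered)}

# The example profile from the text: 5 voters, options A, B, C.
profile = [
    {"A": 1, "B": 1, "C": 2},  # voter 1
    {"A": 3, "B": 1, "C": 2},  # voter 2
    {"A": 2, "B": 1, "C": 1},  # voter 3
    {"A": 1, "B": 1, "C": 1},  # voter 4
    {"A": 3, "B": 2, "C": 1},  # voter 5
]

totals, ranks = social_ranking(profile)
print(totals)  # {'A': 10, 'B': 6, 'C': 7}
print(ranks)   # {'B': 1, 'C': 2, 'A': 3}
```

The point of the sketch is simply that the voting system is a function from profiles to social rankings; any such function counts as a voting system for the purposes of Arrow’s theorem.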

Once a voting system has been fixed, we write [tex]S > T[/tex] to indicate that [tex]S[/tex] has a higher social rank than [tex]T[/tex]. Similarly, we write [tex]S \geq T[/tex] to indicate that [tex]S[/tex]’s social ranking is at least as high as [tex]T[/tex]’s.

What makes a good voting system?

What properties should a good voting system have?

Obviously, this is a potentially controversial question! Arrow’s theorem is based on several reasonable properties that, arguably, we should expect to be satisfied in any good voting system.

The first property is that the voting system should respect unanimity, in the sense that if [tex]S >_v T[/tex] for all voters, then we should have [tex]S > T[/tex]. Restating this verbally instead of symbolically, if every voter prefers [tex]S[/tex] to [tex]T[/tex], then the social ranking should place [tex]S[/tex] ahead of [tex]T[/tex]. This is pretty obviously a highly desirable property for any voting system to have – imagine if the electoral commission announced that candidate [tex]A[/tex] had won, despite the fact that every single voter ranked [tex]B[/tex] ahead of [tex]A[/tex]!

The second property is that the voting system should respect the independence of irrelevant alternatives, meaning that the relative social ranking of [tex]S[/tex] and [tex]T[/tex] (higher, lower, or the same) depends only on the relative ranking of [tex]S[/tex] and [tex]T[/tex] by individual voters, and isn’t influenced by the position of other options (e.g., [tex]U[/tex]).

It’s rather less obvious that any good voting system will respect the independence of irrelevant alternatives. For example, the simple voting system I described above (where we summed the votes) turns out not to have this property, yet it’s not so obviously a bad voting system.
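To see the failure concretely, here’s a small Python sketch (my own example, not from the text) exhibiting two profiles in which no voter changes their relative ranking of [tex]A[/tex] and [tex]B[/tex] – only [tex]C[/tex] moves – and yet the relative social ranking of [tex]A[/tex] and [tex]B[/tex] changes:

```python
# The sum-the-votes system violates the independence of irrelevant
# alternatives: moving only C changes the social ordering of A and B,
# even though each voter's relative ranking of A and B is unchanged.

def totals(profile):
    """Sum each option's rank numbers across voters (lower = better)."""
    return {opt: sum(v[opt] for v in profile) for opt in profile[0]}

# Profile 1: voter 1 has A ahead of B; voter 2 has B ahead of A.
p1 = [{"A": 1, "B": 2, "C": 3},
      {"B": 1, "A": 2, "C": 3}]

# Profile 2: each voter's A-vs-B relative ranking is the same as in
# profile 1, but voter 2 has moved C in between B and A.
p2 = [{"A": 1, "B": 2, "C": 3},
      {"B": 1, "C": 2, "A": 3}]

t1, t2 = totals(p1), totals(p2)
print(t1["A"], t1["B"])  # 3 3 -> A and B are socially tied
print(t2["A"], t2["B"])  # 4 3 -> now B is strictly ahead of A
```

The “irrelevant” alternative [tex]C[/tex] has changed the social ordering of [tex]A[/tex] and [tex]B[/tex], which is exactly what independence of irrelevant alternatives forbids.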

However, for the purposes of this discussion we’re going to assume that this property of respecting the independence of irrelevant alternatives is regarded as highly desirable. Can we design a voting system which respects both unanimity and the independence of irrelevant alternatives?

It turns out that if there are just two voting options, [tex]A[/tex] and [tex]B[/tex], then it’s possible to design a voting system with this property. We could, for example, just sum up the respective votes of the two voting options, and declare the winner to be whichever option has the lower total.

What happens if there are three voting options, [tex]A, B[/tex] and [tex]C[/tex]? What Arrow’s theorem shows in this case is that in any voting system which respects unanimity and the independence of irrelevant alternatives there must automatically be a voter, [tex]d[/tex], who can act as a dictator for the voting system, in the sense that if [tex]S >_d T[/tex], then [tex]S > T[/tex], no matter how the other voters rank their options!

Stated another way, what Arrow’s theorem shows is that the requirements of respecting unanimity and the independence of irrelevant alternatives are incompatible with a third desirable requirement, namely that the voting system not be a dictatorship.

In full generality, what Arrow’s theorem shows is as follows:

Arrow’s theorem: Suppose we have a voting system to rank [tex]3[/tex] or more voting options. Suppose that system respects unanimity and the independence of irrelevant alternatives. Then there must be a dictator for the voting system.

This theorem ought to shock you. How can the assumptions of unanimity and independence of irrelevant alternatives possibly imply the existence of a dictator?

I’ll now give a short proof of Arrow’s theorem, to give you some feeling for why the theorem is true. However, much of interest can be said about Arrow’s theorem even if you don’t understand the proof, so if you’re so inclined you should feel free to skim or skip the following proof. Of course, those who want to understand Arrow’s theorem deeply should spend some time trying to prove it themselves, before reading the proof in detail.

Proof of Arrow’s theorem: The first step of the proof is to argue that if every voter ranks [tex]S[/tex] either strictly first or strictly last, then [tex]S[/tex]’s social ranking must be either strictly first or strictly last. To see this, we use a proof by contradiction, supposing that in fact [tex]S[/tex]’s social ranking is neither strictly first nor strictly last. That is, we suppose that there exist [tex]T[/tex] and [tex]U[/tex] such that [tex]T \geq S \geq U[/tex]. Suppose we rearrange each voter’s rankings so that [tex]U >_v T[/tex], but without changing the relative ranking of [tex]S[/tex] and [tex]T[/tex], or of [tex]S[/tex] and [tex]U[/tex], so the rearrangement doesn’t affect the fact that [tex]T \geq S[/tex] and [tex]S \geq U[/tex]. (Understanding why such a rearrangement can always be done requires a little thought, and maybe some working out on a separate sheet of paper, which you should do.) Unanimity then requires that [tex]U > T[/tex], which is inconsistent with the fact that [tex]T \geq S \geq U[/tex]. This is the desired contradiction.

Suppose now that we start out with a profile in which [tex]S[/tex] is ranked strictly first by every voter, and thus by unanimity must be ranked strictly first in the social ranking. Suppose we move through the voting population, and for each voter in turn change their ranking for [tex]S[/tex] from strictly first to strictly last. When this has been done for all the voters, unanimity implies that [tex]S[/tex] must be ranked strictly last, and so at some point we see that there must be a voter (who will turn out to be the dictator, [tex]d[/tex]), whose vote change causes [tex]S[/tex] to move from first to last in the social ranking.

Consider the voting profile immediately before [tex]d[/tex] changes their vote. We will say that any profile which has the same rankings for [tex]S[/tex] as this profile is an [tex]S[/tex]-profile. Similarly, an [tex]S'[/tex]-profile is one which has the same rankings for [tex]S[/tex] as the profile just after [tex]d[/tex] changes their vote. It follows from the independence of irrelevant alternatives that the overall ranking of [tex]S[/tex] must be first in any [tex]S[/tex]-profile, and last in any [tex]S'[/tex]-profile.

Suppose now that [tex]T[/tex] and [tex]U[/tex] are voting options that are not equal to [tex]S[/tex]. Suppose [tex]d[/tex] ranks [tex]T[/tex] higher than [tex]U[/tex], i.e., [tex]T >_d U[/tex]. We will show that we must have [tex]T > U[/tex], no matter how the other voters rank [tex]T[/tex] and [tex]U[/tex]. That is, [tex]d[/tex]’s vote dictates that [tex]T[/tex] be ranked above [tex]U[/tex] in the social ranking. To see this, first we rearrange the profile so that it becomes an [tex]S[/tex]-profile, without changing the relative ordering of [tex]T[/tex] and [tex]U[/tex] anywhere, and so not affecting whether [tex]T > U[/tex] or not. We can do this simply by changing each voter’s ranking for [tex]S[/tex] in an appropriate way, placing it either at the top or the bottom of their ranking. Second, we change [tex]d[/tex]’s rankings so that [tex]T >_d S >_d U[/tex]. This can be done without changing the relative ranking of [tex]T[/tex] and [tex]U[/tex], and so does not affect whether [tex]T > U[/tex]. We call the resulting voting profile after these two rearrangements the final profile. Observe that in the final profile we have [tex]S > U[/tex], since if [tex]d[/tex] changes [tex]S[/tex] to be ranked strictly first then we have an [tex]S[/tex]-profile. Similarly, in the final profile we have [tex]T > S[/tex], since if [tex]d[/tex] changes [tex]S[/tex] to be strictly last, then we have an [tex]S'[/tex]-profile. As a result, we have [tex]T > S > U[/tex] in the final profile, and thus [tex]T > U[/tex]. But we constructed the final profile so that [tex]T > U[/tex] holds only if [tex]T > U[/tex] also in the actual voting profile, and so we must have had [tex]T > U[/tex] in the actual voting profile, as we desired to show.

The final step of the proof is to argue that [tex]d[/tex] can also dictate the order of [tex]S[/tex] and [tex]T[/tex], for an arbitrary [tex]T \neq S[/tex]. To see this, pick an element [tex]U[/tex] which is neither [tex]S[/tex] nor [tex]T[/tex], and consider an [tex]S[/tex]-profile in which [tex]U[/tex] is placed strictly last everywhere that [tex]S[/tex] is first, and vice versa. The results of the first and second paragraph of this proof imply that [tex]d[/tex] can change the rank of [tex]U[/tex] from last to first simply by changing their vote. We now apply the same argument as in the paragraph before this, but with [tex]U[/tex] taking the place of [tex]S[/tex], to argue that [tex]d[/tex] can dictate the relative ordering of [tex]S[/tex] and [tex]T[/tex]. QED
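As a complement to the proof, here’s a brute-force Python sketch (my own, not part of the original notes) that checks the three properties for any given voting system over all profiles of strict rankings (ties omitted, for simplicity) of three options by two voters. Applied to a dictatorship it confirms, consistent with the theorem, that a dictatorship does satisfy unanimity and the independence of irrelevant alternatives:

```python
# Brute-force checks of unanimity, independence of irrelevant
# alternatives, and dictatorship, over all strict-ranking profiles of
# three options by two voters.  A "system" is any function from a
# profile (tuple of rankings) to a single social ranking.

from itertools import permutations, product

OPTIONS = "ABC"
RANKINGS = list(permutations(OPTIONS))        # all 6 strict rankings
PROFILES = list(product(RANKINGS, repeat=2))  # all 36 two-voter profiles

def ahead(ranking, s, t):
    """True if s is ranked strictly ahead of t in the given ranking."""
    return ranking.index(s) < ranking.index(t)

def respects_unanimity(system):
    return all(
        ahead(system(p), s, t)
        for p in PROFILES
        for s in OPTIONS for t in OPTIONS
        if s != t and all(ahead(r, s, t) for r in p)
    )

def respects_iia(system):
    # If two profiles agree on every voter's relative ranking of s and t,
    # the social relative ranking of s and t must also agree.
    return all(
        ahead(system(p), s, t) == ahead(system(q), s, t)
        for p in PROFILES for q in PROFILES
        for s in OPTIONS for t in OPTIONS
        if s != t and all(ahead(p[v], s, t) == ahead(q[v], s, t)
                          for v in range(2))
    )

def dictator(system):
    """Return a voter d such that S >_d T implies S > T, if one exists."""
    for d in range(2):
        if all(ahead(system(p), s, t)
               for p in PROFILES
               for s in OPTIONS for t in OPTIONS
               if s != t and ahead(p[d], s, t)):
            return d
    return None

def dictatorship(profile):
    # The social ranking is always voter 0's ranking.
    return profile[0]

print(respects_unanimity(dictatorship),
      respects_iia(dictatorship),
      dictator(dictatorship))
# prints: True True 0
```

Of course, this only exhibits one system satisfying all three of Arrow’s hypotheses plus dictatorship; enumerating every possible system to verify the theorem itself is infeasible (there are [tex]6^{36}[/tex] functions from profiles to rankings), which is why we need the proof above.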

Arrow’s theorem is a striking result in the theory of collective decision making. It shows the great advantages that can come from formalizing ideas in a simple mathematical model – the fact that the model can lead to striking unforeseen conclusions that entirely change our perception of the phenomenon. More prosaically, by thinking closely about our values and our desired social outcomes, we can try to formalize models of those properties, and then study which models of collective decision making best respect those properties (if, indeed, any such models exist).

What of the implications of Arrow’s theorem for voting? It is, of course, true that the restrictions in Arrow’s theorem can be relaxed, and there is a large literature studying other models of voting and the extent to which they can be made fair. I’m not an expert on this literature, and so won’t comment other than to point out that economists have, of course, not been silent on the matter in the 50 plus years since Arrow’s paper!

My own interest in the subject is due to a more general hobby interest in the problem of collective cognition. In particular, I’m interested in the question of how we can design institutions which result in good collective decision making. It seems that the subject of institutional design is still in its infancy, and I find it remarkable how small a fraction of “institution space” humans have explored. One of the most interesting things about the web, in my opinion, is that it has greatly cut the cost of developing new institutions, and as a result we’re seeing new institutional models, and new types of collective cognition, develop at an incredible rate.

Expander graphs: the complete notes

The full pdf text of my series of posts about expander graphs. Thank you very much to all the people who commented on the posts; if you’re reading this text, and haven’t seen the comments on earlier posts, I recommend you look through them to see all the alternate proofs, generalizations and so on that people have offered.

Journal club on quantum gravity

The following post is based on some notes I prepared for a journal club talk I’m going to give on quantum gravity in a few hours. A postscript equivalent is here, with a few modifications.

Disclaimer: The whole point of our journal club talks is to give talks on interesting topics about which we are not experts! For me, quantum gravity fits this description in spades. Caveat emptor.


Every physicist learns as an undergraduate (if not before) that we don’t yet have a single theory unifying quantum mechanics and general relativity, i.e., a theory of quantum gravity. What is often not explained is why it is difficult to come up with such a theory. In this journal club I want to ask and partially answer two questions: (1) what makes it so difficult to put quantum mechanics and general relativity together; and (2) what approaches might one take to developing a theory of quantum gravity?

You might wonder if this is an appropriate topic for a forum such as this. After all, none of us here, including myself, are experts on string theory, loop quantum gravity, twistors, or any of the other approaches to quantum gravity that have been proposed and are currently being pursued.

However, we don’t yet know that any of these approaches is correct, and so there’s no harm in going back and thinking through some of the basic aspects of the problem, from an elementary point of view. This can be done by anyone who knows the rudiments of quantum mechanics and of the general theory of relativity.

If you like, you can view it as trying to solve the problem of quantum gravity without first “looking in the back of the book” to see the best attempted answers that other people have come up with. This procedure of first thinking things through for yourself has the advantage that it is likely to greatly increase the depth of your understanding of other people’s work if you later do investigate topics such as string theory, etc.

A related disclaimer is that I personally know only a minuscule fraction of all the modern thinking on quantum gravity. I prepared this lecture to force myself to think through in a naive way some of the problems involved in constructing a quantum theory of gravity, only pausing occasionally to peek in the back of the book. I won’t try to acknowledge my sources, which were many, but suffice to say that I doubt there’s anything here that hasn’t been thought before. Furthermore, people who’ve thought hard about quantum gravity over an extended period are likely to find much of what I say obvious, naive, absurd, or some combination thereof. Frankly, I don’t recommend that such people look through these notes — they’ll likely find it rather frustrating! For those less expert even than myself, perhaps you’ll find these notes a useful entertainment, and maybe they’ll stimulate you to think further on the subject.

Standard formulations of quantum mechanics and general relativity

Let’s start off by reminding ourselves of the standard formulations used for quantum mechanics and general relativity. I expect that most attendees at this journal club are extremely familiar with the basic principles of quantum mechanics, and, indeed, use them every day of their working lives. You may be rather less familiar with general relativity. I’ve tried to construct the lecture so you can follow the overall gist, even so.

Recall that the standard formulation of quantum mechanics contains the following elements:

  • The postulate that for every physical system there is a state vector in a Hilbert space which provides the most complete possible description of that system.
  • The postulate that the dynamics of a closed quantum system are described by a Hamiltonian and Schroedinger’s equation.
  • The postulate that a measurement on a system is described using an observable, a Hermitian operator acting on state space, which is used to describe measurement according to some rule for: (1) calculating measurement probabilities; and (2) describing the relationship between prior and posterior states.
  • The postulate that the state space for a composite quantum system is built up by taking the tensor product of individual state spaces. In the special case when those systems are indistinguishable, the postulate is modified so that the state space is either the symmetric or antisymmetric subspace of the total tensor product, depending on whether the systems are bosons or fermions.
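As a small numerical illustration of the composite-systems postulate (my own toy example, not part of the talk notes), here’s the tensor-product construction and the antisymmetric subspace for two identical fermions, worked out for a pair of two-level systems:

```python
# The joint state space of two systems is the tensor product of the
# individual state spaces; for two identical fermions the physical state
# lives in the antisymmetric subspace of that tensor product.

import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Tensor product of the single-particle states |0> and |1>.
product_state = np.kron(ket0, ket1)

# SWAP operator exchanging the two subsystems.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# Antisymmetrize and normalize: this yields the singlet (|01>-|10>)/sqrt(2).
anti = product_state - SWAP @ product_state
anti /= np.linalg.norm(anti)

print(anti)         # amplitudes (0, 1/sqrt(2), -1/sqrt(2), 0)
print(SWAP @ anti)  # exchanging the particles gives minus the state
```

The final line exhibits the defining property of the antisymmetric (fermionic) subspace: exchanging the two particles multiplies the state by [tex]-1[/tex].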

It’s worth pointing out that this is merely the most common formulation of quantum mechanics. Other formulations are possible, and may be extremely valuable. It’s certainly possible that the right way of constructing a quantum theory of gravity is to start from some different formulation of quantum mechanics. My reason for describing this formulation of quantum mechanics — probably the most commonly used formulation — is so that we’re all working off the same page.

Let’s switch now to discuss general relativity. Recall that the standard formulation of general relativity contains the following elements:

  • The postulate that spacetime is a four-dimensional pseudo-Riemannian manifold, with metric signature (+1,-1,-1,-1).
  • The postulate that material in spacetime is described by a two-index tensor T known as the stress-energy tensor. The stress-energy tensor describes not only things like mass and energy, but also describes the transport of mass and energy, so it has aspects that are both static and dynamic.
  • The postulate known as the Einstein field equations: [tex]G = 8\pi T[/tex]. This postulate connects the stress-energy tensor T to the Einstein tensor, G. In its mathematical definition G is fundamentally a geometric object, i.e., it is determined by the “shape” of spacetime. The physical content of the Einstein field equations is therefore that the shape of spacetime is determined by the matter distribution, and vice versa. An interesting point is that because the stress-energy tensor contains components describing the transport of matter, the transport properties of matter are actually determined by the geometry. For example, it can easily be shown that, as a consequence of the Einstein field equations, test particles follow geodesics of spacetime.
  • Since 1998 it has been thought that the Einstein equations need to be modified, becoming [tex]G+\Lambda g = 8 \pi T[/tex], where g is the metric tensor, and [tex]\Lambda[/tex] is a non-zero constant known as the cosmological constant. Rather remarkably, it turns out that, once again, test particles follow geodesics of spacetime. However, for a given stress-energy tensor, the shape of spacetime will itself be different, and so the geodesics will be different.

In an ideal world, of course, we wouldn’t just unify quantum mechanics and general relativity. We’d actually construct a single theory which incorporates both general relativity and the entire standard model of particle physics. So it’s arguable that we shouldn’t just be thinking about the standard formulation of quantum mechanics, but rather about the entire edifice of the standard model. I’m not going to do that here, because: (1) talking about vanilla quantum mechanics is plenty enough for one lecture; (2) it illustrates many of the problems that arise in the standard model, anyway; and (3) I’m a lot more comfortable with elementary quantum mechanics than I am with the standard model, and I expect much of my audience is, too.

Comparing the elements of general relativity and quantum mechanics

Let’s go through and look at each element in the standard formulations of general relativity and quantum mechanics, attempting as we do to understand some of the problems which arise when we try to unify the two theories.

Before getting started with the comparisons, let me make an aside on my presentation style. Conventionally, a good lecture is much like a good movie or a good book, in that a problem or situation is set up, preferably one involving high drama, the tension mounts, and then the problem is partially or fully resolved. Unfortunately, today is going to be a litany of problems, with markedly little success in resolution, and so the lecture may feel a little unsatisfying for those hoping, consciously or unconsciously, for a resolution.

Spacetime: In standard quantum mechanics, we usually work with respect to a fixed background spacetime of allowed configurations. By contrast, in general relativity, the metric tensor specifying the structure of spacetime is one of the physical variables of the theory. If we follow the usual prescriptions of quantum mechanics, we conclude that the metric tensor itself ought to be replaced by some suitable quantum mechanical observable, or set of observables. If one does this, it is no longer so clear that space and time can be treated as background parameters in quantum mechanics. How, for example, are we supposed to treat Schroedinger’s equation, when the physical structure of time itself is variable? Perhaps we ought to aim for an effective equation of the form

[tex]i \hbar \frac{d|\psi\rangle}{d\langle t \rangle} = H |\psi\rangle [/tex]

derived from some deeper underlying theory?

Stress-energy tensor: In general relativity T is used to describe the configuration of material bodies. Standard quantum mechanics tells us that T needs to be replaced by a suitable set of observables. In and of itself this is not obviously a major problem. However, a problem arises (again) in connection with the possible quantization of space and time. As usually understood in general relativity, T is a function of location p on the underlying four-dimensional manifold. The natural analogue in a quantized version is an observable [tex]\hat T(p)[/tex] which is again a function of position on the manifold. However, as described above, it seems likely that p itself should be replaced by some quantum equivalent, and it is not so clear how [tex]\hat T[/tex] ought to be constructed then. One possibility is that [tex]\hat T[/tex] becomes a function of some suitable [tex]\hat p[/tex]. A related problem is that the standard definition of the components of T often involves tangent vectors (essentially, velocity 4-vectors) to the underlying manifold. As for the position, p, perhaps such tangent vectors should be replaced by quantized equivalents.

Einstein field equations (with and without the cosmological constant): Consider the usual general relativistic formulation of the field equations: [tex]G+\Lambda g = 8\pi T[/tex]. The problem with constructing a quantum version ought by now to be obvious: quantum mechanics tells us that the quantities on the left — geometric quantities, to do with the shape of spacetime — are all associated with some notion of a background configuration, ordinarily left unquantized, while the quantities on the right are physical variables that ought to be quantized.

One natural speculation in this vein is that in any quantum theory of gravity we ought to have

[tex] G+\Lambda g = 8 \pi \langle T \rangle[/tex]

as an effective equation of the theory.

Hilbert space and quantum states: There is no obvious incompatibility with general relativity, perhaps because it is so unclear which Hilbert space or quantum state one might use in a description of gravitation.

The Hamiltonian and Schroedinger’s equation: As already mentioned, this presents a challenge because it is not so clear how to describe time in quantum gravity. A further concern is that for many standard choices of Hamiltonian, Schroedinger’s equation gives rise to faster-than-light effects. To alleviate this problem we must move to a relativistic wave equation, or to a quantum field theory.

In this vein, let me mention one natural candidate description for the dynamics of a free (quantum) test particle moving in the background of a fixed (classical) spacetime. First, start with a relativistically invariant wave equation such as the Klein-Gordon equation, which can be used to describe a free spin zero particle,

[tex] -\hbar^2 \frac{\partial^2 \psi}{\partial t^2} = -\hbar^2 c^2 \nabla^2 \psi + m^2 c^4 \psi,[/tex]

or the Dirac wave equation, which can be used to describe a free spin 1/2 particle,

[tex] i \hbar \frac{\partial \psi}{\partial t} = \left(-i \hbar c \, \alpha \cdot \nabla + \beta mc^2 \right) \psi,[/tex]

where [tex]\alpha_x,\alpha_y,\alpha_z[/tex] and [tex]\beta[/tex] are the four Dirac matrices. In the case of the Klein-Gordon equation there is a clear prescription for how to take this over to a curved spacetime: simply replace derivatives by appropriate covariant derivatives, giving:

[tex] -\hbar^2 \nabla^2_; \psi = m^2 c^2 \psi.[/tex]

In flat spacetime this will have the same behaviour as the Klein-Gordon equation. In a fixed background curved spacetime we would expect this equation to describe a free spin zero test particle.
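As a sanity check on the flat-spacetime Klein-Gordon equation above, a plane wave [tex]\psi = e^{i(kx - \omega t)}[/tex] should satisfy it exactly when the relativistic dispersion relation [tex]\hbar^2 \omega^2 = \hbar^2 c^2 k^2 + m^2 c^4[/tex] holds. Here is a minimal numerical sketch in pure Python; the electron mass, the wavenumber, the sample point and the finite-difference step sizes are all arbitrary illustrative choices:

```python
import cmath
import math

# Constants in SI units: reduced Planck constant, speed of light, electron mass
hbar, c, m = 1.0546e-34, 2.998e8, 9.109e-31
k = 1e10  # wavenumber, 1/m (arbitrary choice)

# Relativistic dispersion relation: hbar^2 w^2 = hbar^2 c^2 k^2 + m^2 c^4
w = math.sqrt((c * k) ** 2 + (m * c ** 2 / hbar) ** 2)

def psi(t, x):
    """Plane-wave solution exp(i(kx - wt))."""
    return cmath.exp(1j * (k * x - w * t))

# Second derivatives via central finite differences at a sample point
t0, x0 = 1e-18, 1e-10
dt, dx = 1e-22, 1e-14
d2psi_dt2 = (psi(t0 + dt, x0) - 2 * psi(t0, x0) + psi(t0 - dt, x0)) / dt ** 2
d2psi_dx2 = (psi(t0, x0 + dx) - 2 * psi(t0, x0) + psi(t0, x0 - dx)) / dx ** 2

# The two sides of the Klein-Gordon equation should agree
lhs = -hbar ** 2 * d2psi_dt2
rhs = -hbar ** 2 * c ** 2 * d2psi_dx2 + m ** 2 * c ** 4 * psi(t0, x0)
```

With these step sizes the two sides agree to well under a percent, the residual being finite-difference truncation error.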

The same basic procedure can be followed in the case of the Dirac equation, replacing derivatives wherever necessary by covariant derivatives. I have not explicitly checked that the resulting equation is invariantly defined, but expect that it is (exercise!), and can be used to describe a free spin 1/2 test particle in a fixed background curved spacetime. It would be interesting to study the solutions of such equations for some simple nontrivial geometries, such as the Schwarzschild geometry. For metrics with sufficient symmetry, it may be possible to obtain analytic (or at least perturbative) solutions; in any case, it should be possible to investigate these problems numerically.

Of course, although it would be interesting to study this prescription, we should expect it to be inadequate in various ways. We have described a means of studying a quantum test particle moving against a fixed classical background spacetime. In reality: (1) the background may not be classical; (2) the particle itself modifies the background; and (3) because of quantum indeterminacy, the particle may modify the background in different ways. In the language of the many-worlds interpretation, it seems reasonable to expect that which branch of the wavefunction we are in (representing different particle positions) may have some bearing on the structure of spacetime itself: in particular, different branches will correspond to different spacetimes.

This discussion highlights another significant incompatibility between general relativity and quantum mechanics. In general relativity, we know that test particles follow well-defined trajectories — geodesics of spacetime. This is a simple consequence of the field equations themselves. In quantum mechanics, no particle can follow a well-defined trajectory: the only way this could happen would be if the Hamiltonian commuted with the position variables, in which case the particle would be stationary. In any case, this commutation condition cannot hold when the momentum contributes to the Hamiltonian, as is typically the case.
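To spell out the reasoning: in the Heisenberg picture the position observable evolves according to

[tex] \frac{d \hat x}{dt} = \frac{i}{\hbar} [\hat H, \hat x], [/tex]

so [tex][\hat H, \hat x] = 0[/tex] would force [tex]\hat x[/tex] to be constant in time. For a standard Hamiltonian [tex]\hat H = \hat p^2/2m + V(\hat x)[/tex] the canonical commutation relation gives [tex][\hat H, \hat x] = -i\hbar \hat p / m[/tex], which is nonzero precisely because the momentum contributes to the Hamiltonian.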

Observables: One striking difference between quantum mechanics and general relativity is that the description of measurement is much more complex in the former. Several questions that might arise include:

  • Should wave function collapse occur instantaneously? This depends on how one interprets the wave function.
  • Should measurement be a purely local phenomenon, or can we make a measurement across an entire slice of spacetime? Across all of spacetime?
  • Should we worry that in the usual description of measurement, time and space are treated in a manifestly unsymmetric manner?
  • What observables would one expect to have in a quantum theory of gravity?

The tensor product structure and indistinguishable particles: One cause for concern here is that the notion of distinguishability itself is often framed in terms of the spatial separation of particles. If the structure of space itself really ought to be thought of in quantum terms, it is perhaps not so clear that the concepts of distinguishable, indistinguishable, and spatially separated particles even make sense. This may be a hint that in a quantum theory of gravity such concepts may be absent at the foundation, though they would need to emerge as consequences of the theory.

Quantum field theory: So far, we’ve concentrated on identifying incompatibilities between general relativity and quantum mechanics. Of course, fundamental modern physics is cast in terms of an extension of quantum mechanics known as quantum field theory, and it is worth investigating what problems arise when one attempts to unify general relativity with the entire edifice of quantum field theory. We won’t do this in any kind of fullness here, but will make one comment in relation to the canonical quantization procedure usually used to construct quantum field theories. The standard procedure is to start from some classical field equation, such as the wave equation, [tex](\nabla^2 - 1/c^2 \partial^2 / \partial t^2 ) \phi = 0[/tex], to expand the solution as a linear combination of solutions for individual field modes, to regard the different mode coefficients as dynamical variables, and to then quantize by imposing canonical commutation relationships on those variables. This procedure can be carried out for many of the standard field equations, such as the wave equation, the Dirac equation, and the Klein-Gordon equation, because in each case the equation is linear, and thus the solution space has a linear structure. In the case of general relativity, the field equations are nonlinear in what seem like the natural field variables — the metric tensor — and it is not possible to even get started with this procedure. One could, of course, try linearizing the field equations, and starting from there. My understanding is that when this is done the resulting quantum field theory is nonrenormalizable (?), and thus unsatisfactory.
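To make the mode-expansion step concrete, consider the wave equation with (as a simplifying assumption) a discrete set of spatial mode functions [tex]u_k[/tex] and frequencies [tex]\omega_k[/tex]. One writes

[tex] \phi(x,t) = \sum_k \left( a_k u_k(x) e^{-i \omega_k t} + a_k^* u_k^*(x) e^{i \omega_k t} \right), [/tex]

and quantizes by promoting the coefficients to operators satisfying [tex][a_k, a_{k'}^\dagger] = \delta_{k k'}[/tex], with all other commutators vanishing. The crucial point is the first step: it is linearity that guarantees every solution can be written as such a superposition of modes. For the nonlinear Einstein field equations no superposition principle holds, so the expansion is unavailable from the outset.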


Perhaps the most striking feature of the above discussion is an asymmetry between general relativity and quantum mechanics. Quantum mechanics, like Newton’s laws of motion, is not so much a physical theory as a framework for constructing physical theories, with many important quantities (the state, the state space, the Hamiltonian, the relevant observables) left unspecified. General relativity is much more prescriptive, specifying as it does an equation relating the distribution of material entities to the shape of spacetime, and, as a consequence, controlling the matter-energy dynamics. Once we’ve set up the initial matter-energy distribution and structure of spacetime, general relativity gives us no further control. In the analogous quantum mechanical situation we still have to specify the dynamics, and the measurements to be performed.

There is therefore a sense in which quantum mechanics is a more wide-ranging and flexible framework than general relativity. This is arguably a bug, not a feature, since one of general relativity’s most appealing points is its prescriptiveness; once we have the Einstein equations, we get everything else for free, in some sense. However, it also suggests that while the right approach may be to extend the quantum mechanical framework to incorporate general relativity, it is exceedingly unlikely that the right approach is to extend general relativity to incorporate quantum mechanics. On the other hand, it may also be that some extension or reformulation of quantum mechanics is necessary to incorporate gravity. Such an extension would have to be carried out rather carefully: results such as Gleason’s theorem show that quantum mechanics is surprisingly sensitive to small changes.

As an aside, let me also take this opportunity to point out something which often bugs me: the widely-made assertion that quantum gravity effects will become important at the Planck length — about [tex]10^{-35}[/tex] meters — and that the notion of spacetime will break down at that length. Anyone claiming this, in my opinion, ought to be asked why the notion of mass doesn’t break down at the Planck mass, which has the rather hefty value of about [tex]10^{-8}[/tex] kilograms.
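For concreteness, the Planck length and Planck mass are the combinations of [tex]G[/tex], [tex]\hbar[/tex] and [tex]c[/tex] with dimensions of length and mass: [tex]\ell_P = \sqrt{\hbar G / c^3}[/tex] and [tex]m_P = \sqrt{\hbar c / G}[/tex]. A quick computation in Python:

```python
import math

# Physical constants in SI units
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
hbar = 1.0546e-34  # reduced Planck constant, J s
c = 2.998e8        # speed of light, m/s

planck_length = math.sqrt(hbar * G / c ** 3)  # ~1.6e-35 m
planck_mass = math.sqrt(hbar * c / G)         # ~2.2e-8 kg

print(f"Planck length: {planck_length:.3e} m")
print(f"Planck mass:   {planck_mass:.3e} kg")
```

The script gives a length near [tex]1.6 \times 10^{-35}[/tex] meters and a mass near [tex]2.2 \times 10^{-8}[/tex] kilograms, the values quoted above.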

A toy model

Just for fun, let me propose a simple toy model for quantum gravity, inspired by the Klein-Gordon equation. I’m sure this is wrong or inadequate somehow, but after an hour or so’s thought, I can’t yet see why. I include it here primarily as a stimulant to further thought.

The idea is to look for a four-dimensional pseudo-Riemannian manifold M, with metric signature (-,+,+,+), and a function [tex]\psi : M \rightarrow \mathbb{C}[/tex], such that the following equations have a solution:

[tex]G + \Lambda g = 8 \pi T [/tex]

[tex] T^{\mu \nu} = v^\mu v^\nu [/tex]

[tex] v^0 = \frac{i\hbar}{2mc^2}( \psi^* \psi^{;0}- \psi \psi^{;0 *})[/tex]

[tex] v^j = \frac{-i\hbar}{2m}( \psi^* \psi^{;j}- \psi \psi^{;j *}),[/tex]

where m, c, [tex]\hbar[/tex] and [tex]\Lambda[/tex] are all constants with their usual meanings, j = 1,2,3, and the expression for [tex]T^{\mu \nu}[/tex] may need a proportionality constant, probably related to m, out the front. The expressions for [tex]v^0[/tex] and [tex]v^j[/tex] are covariant versions of the corresponding expressions for the charge and current densities associated to the Klein-Gordon equation — see Chapter 13 of Schiff’s well-known text on quantum mechanics (3rd ed., McGraw-Hill, 1968); note that Schiff calls this equation the “relativistic Schroedinger equation”. A subtlety is that the covariant derivative itself depends on the metric g, and so these equations are potentially extremely restrictive; it is by no means obvious that a solution ever exists. However, if we take seriously the idea that [tex]T^{\mu \nu}[/tex] needs a proportionality constant related to m, then we can see that in the test particle limit, [tex]m \rightarrow 0[/tex], these equations have as a solution any [tex]\psi[/tex], and flat spacetime, which is not unreasonable.
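For reference, the flat-spacetime expressions that the [tex]v^0[/tex] and [tex]v^j[/tex] above covariantize are the usual Klein-Gordon charge and current densities,

[tex] \rho = \frac{i\hbar}{2mc^2}\left( \psi^* \frac{\partial \psi}{\partial t} - \psi \frac{\partial \psi^*}{\partial t} \right), \qquad \mathbf{j} = \frac{-i\hbar}{2m}\left( \psi^* \nabla \psi - \psi \nabla \psi^* \right), [/tex]

which, for any solution of the Klein-Gordon equation, satisfy the continuity equation [tex]\partial \rho / \partial t + \nabla \cdot \mathbf{j} = 0[/tex].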


The picture I have painted is somewhat bleak, which is perhaps not surprising: finding a quantum theory of gravity is not a trivial problem! However, the good news is that many further steps naturally suggest themselves:

  • At many points, my analysis has been incomplete, in that I haven’t mapped out a catalogue of all the possible alternatives. A more thorough analysis of the possibilities should be done.
  • The analysis needs to be extended to incorporate modern relativistic quantum field theory.
  • Computer pioneer Alan Kay has said “A change of perspective is worth 80 IQ points”. It would be fruitful to repeat this exercise from the point of view of some of the other formulations people have of general relativity and quantum mechanics. I’d particularly like to do this for the initial value and action formulations of general relativity, and for the quasidistribution and nonlocal hidden variable formulations of quantum mechanics. It may also be useful to attempt to construct modifications of either or both theories in order to solve some of the problems that we’ve described here.
  • Read up on some of the work that other people have done on quantum gravity, from a variety of points of view. Things to learn might include: supersymmetry, string theory, loop quantum gravity, twistors, Euclidean quantum gravity, Hawking radiation, the Unruh effect, the Wheeler-DeWitt equation, Penrose’s gravitational collapse, 1- and 2-dimensional quantum gravity, gravitational wave astronomy, work on the cosmological constant, …


Thanks to David Poulin for comments and encouragement.

Principles of Effective Research

Note: The entire essay is here in postscript format.

Principles of Effective Research

By Michael A. Nielsen

July 2004


This essay is intended as a letter to both myself and others, to hold up in the sharpest possible terms an ideal of research I believe is worth working toward. I’ve deliberately limited the essay to 10 pages, hoping that the resulting omissions are compensated by the forced brevity. This is a rather personal essay; it’s not the sort of thing I’d usually make publicly available. I’ve made the essay public in order to heighten my commitment to the project, and in the hope that other people will find it stimulating, and perhaps offer some thoughts of their own.

A few words of warning. My primary audience is myself, and some of the advice is specific to my career situation [*], and therefore may not be directly applicable to others. And, of course, it’s all just my opinion anyway. I hope, however, that it’ll still be stimulating and helpful.

[*] I’m a theoretical physicist; I lead a small research group at a large Australian University; I have a permanent position, with no teaching duties for the next few years; I have several colleagues on the faculty with closely related interests.

The philosophy underlying the essay is based on a famous quote attributed to Aristotle: “We are what we repeatedly do. Excellence, then, is not an act but a habit.” Underlying all our habits are models (often unconscious) of how the world works. I’m writing this essay to develop an improved personal model of how to be an effective researcher, a model that can be used as the basis for concrete actions leading to the development of new habits.

Fundamental principles

The fundamental principles of effective research are extremely similar to those for effectiveness in any other part of life. Although the principles are common sense, that doesn’t mean they’re common practice, nor does it mean that they’re easy to internalize. Personally, I find it a constant battle to act in accord with these principles, a battle requiring ongoing reflection, rediscovery and renewed commitment.

Integrating research into the rest of your life

Research is, of course, only a part of life, and must be understood in relation to the rest of life. The foundation of effective research is a strong motivation or desire to do research. If research is not incredibly exciting, rewarding and enjoyable, at least some of the time, then why not do something else that is? For the purposes of this essay, I’ll assume that you already have a strong desire to do research [*].

[*] People sometimes act or talk as though desire and motivation cannot be changed. Within limits, I think that’s wrong, and we can mold our own motivations. But that’s a subject for another essay.

Motivation and desire alone are not enough. You also need to have the rest of your life in order if you’re to be an effective researcher. Make sure you’re fit. Look after your health. Spend high quality time with your family. Have fun. These things require a lot of thought and effort to get right. If you don’t get them right, not only will your life as a whole be less good, but your research will suffer. So get these things right, and make sure they’re integrated with your research life.

As an example, I once spent three years co-authoring a technical book, and for the final eighteen months I concentrated on the book almost exclusively, to the neglect of my health, relationships, and other research. It is tempting to ask the question “Was the neglect worth the benefits?” But that is the wrong question, for while the neglect paid short-term dividends in increased productivity, over the total period of writing the book I believe it probably cost me productivity, and it certainly did after the book was complete. So not only did I become less fit and healthy, and see my relationships suffer, the book took longer to complete than if I’d had my life in better order.

Principles of personal behaviour: proactivity, vision, and discipline

I believe that the foundation of effective research is to internalize a strong vision of what you want to achieve, to work proactively towards that vision, taking personal responsibility for successes and failures. You need to develop disciplined work habits, and to achieve balance between self-development and the actual creative research process.

Proactivity and personal responsibility

Effective people are proactive and take personal responsibility for the events in their lives. They form a vision of how they want their life to be, and work toward achieving that vision. They identify problems in their lives, and work toward solutions to those problems.

Isn’t this obvious, banal advice? I heard a story years ago in which a representative from McDonald’s was asked what gave McDonald’s the edge in the fast food industry. They replied that McDonald’s took care of the little things, like making sure that their restaurants and surrounds were always extremely clean. Representatives of other fast food companies replied incredulously that surely that was not the reason McDonald’s did so well, for “anyone could do that”. “But only McDonald’s does” was the response. The heart of personal effectiveness is not necessarily any special knowledge or secret: it is doing the basics consistently well.

When it comes to proactivity and responsibility, it seems to be incredibly difficult to internalize these principles and act on them consistently. Almost everyone says and thinks they are proactive and responsible, but how many of us truly respond to the force of external circumstance in the most proactive manner?

My belief is that the reason it is difficult to be consistently proactive and responsible is that over the short term it is often significantly easier to abdicate responsibility and behave in a reactive fashion. In my opinion, there are three basic ways this can occur.

The first way is to blame external circumstances for our problems. “We don’t have enough grant money.” “I have to teach too much.” “My supervisor is no good.” “My students are no good.” “I don’t have enough time for research.” When challenged on what actions we are taking to rectify the situation, we will claim that it’s the fault of other people, or of circumstances beyond our control, relieving ourselves of the burden of doing anything to solve the problem.

In short, we abdicate responsibility, preferring to blame others. This is easier over the short term, since it’s easier to complain than it is to take action, but is not a recipe for long-term happiness or effectiveness. Furthermore, we will usually deny that it is within our power to take actions to improve our situation. After all, if it were in our power, then we ourselves would be responsible, and our entire worldview is based upon blaming others for our own problems.

The second way of abdicating responsibility is to get caught up in displacement activities. These may give us a short-term fix, especially if they win us the approbation of other people, perhaps for responding to requests that they label urgent. Over the long run such displacement activities are ultimately unfulfilling, representing time lost from our lives.

The third way of abdicating responsibility is by getting down on yourself, worrying and feeling bad about not overcoming your difficulties. Winston Churchill spoke of the “black dog” of depression that overtook him during times when his political career was in eclipse. Personally, I sometimes get really down when things are not going well, and get caught up in a cycle of worry and analysis, without constructively addressing my problems. Of course, the right way to respond to a bad situation is not to beat yourself up, but rather to admit that, yes, things are going badly, to figure out exactly what problems you are facing, write out possible solutions, prioritize and implement them, without getting too worried or hamstrung by the whole process.

Why are these three options so attractive? Why do we so often choose to respond in this way to the challenges of life rather than taking things on with a proactive attitude that acknowledges that we’re responsible for our own life? What all three options share in common is that over the short-term abdicating responsibility for our problems is easier than taking responsibility for meeting the challenges of life.

A specific example that I believe speaks to many of us is when we’re having some sort of difficulty or conflict with another person. How many of us put off confronting the problem, preferring instead to hope that the problem will resolve itself? Yet, properly managed – a difficult thing to do, most likely requiring considerable preparation and forethought – it’s nearly always better to talk with the person about the problem until you arrive at a mutual understanding of both your points of view, both sets of interests, and can resolve the issue on a basis of shared trust.

How can we learn to become proactive? I don’t know of any easy way. One powerful way is to be inspired by examples of proactive people. This can either be through direct personal contact, or indirectly through biographies, history, movies and so on. I like to set aside regular time for such activities. Another powerful tool for learning proactivity is to remind ourselves regularly of the costs and benefits of proactivity and responsibility versus reactivity and irresponsibility. These costs and benefits are easy to forget, unless you’re constantly being reminded that complaints, self-doubt, blame of others and of self are actually the easy short-term way out, and that chances are that you can construct a better life for yourself, at the cost of needing to do some hard work over the short term.

In the context of research, this means constantly reminding yourself that you are the person ultimately responsible for your research effectiveness. Not the institution you find yourself in. Not your colleagues, or supervisor. Not the society you are living in. All these things influence your research career, and may be either a help or a hindrance (more on that later), but in the final analysis if things are not working well it is up to you to take charge and change them.


Vision

Effective people have a vision of what they’d like to achieve. Ideally, such a vision incorporates both long-term values and goals, as well as shorter-term goals. A good vision answers questions like: What sort of researcher would I like to become? What areas of research am I interested in? How am I going to achieve competence in those areas? Why are those areas interesting? How am I going to continue growing and expanding my horizons? What short-term steps will I take to achieve those goals? How will I balance the long-term goals with the short-term realities of the situation I find myself in? For example, if you’re in a temporary job and need to get another job soon, it’s probably not such a great idea to devote all your time to learning some new subject, without any visible outcome.

A vision is not something you develop overnight. You need to work at it, putting time aside for the process, and learning to integrate it into your everyday life. It’s a challenging process, but over the long run it’s also extremely rewarding. History shows that great actions usually are the outcome of great purpose, even if the action that resulted was not the original purpose. Your vision doesn’t always need to be of a great purpose; it’s good to work on the little stuff, some of the time. But you should occasionally set yourself some big, ambitious goal, a goal that gets you excited, that makes you want to get up in the morning, and where you’ve developed a confidence in your own mind that you have a chance of achieving that goal. Such a great purpose inspires in a way that the humdrum cannot; it makes things exciting and worthwhile if you feel you’re working towards some genuinely worthy end. I believe this is particularly important in the more abstract parts of research (like theoretical physics), where it can require some work to make a personal, emotional connection to one’s own research. Having a clear vision of a great end is one very good way of making such a connection. When you don’t do this, you can get stuck in the rut of the everyday; you need to get out of that rut, to develop a bigger vision.

Finally, a good vision is not inflexible. It’s something that gets changed as you go along, never lightly, but frequently. The importance of having the vision is that it informs your everyday and every week decisions, giving you a genuinely exciting goal to work towards.


Discipline

Effective people are self-disciplined. They work both hard and smart, in the belief that you reap what you sow. How does one achieve such self-discipline? It’s a difficult problem. Wayne Bennett, one of the most successful coaches in the history of the sport of Rugby League, sums the problem up well when he says “I’ve had more trouble with myself than any other man I’ve ever met”.

It is a tempting but ultimately counterproductive fallacy to believe that self-discipline is merely a matter of will, of deciding what it is that you want to do, and then doing it. Many other factors affect self-discipline, and it’s important to understand those other factors. Furthermore, if you believe that it’s all a matter of willpower then you’re likely to get rather depressed when you fall short, sapping your confidence, and resulting in less disciplined behaviour.

I now describe three factors important in achieving self-discipline.

The first factor is having clarity about what one wants to achieve, why one wants to achieve it, and how to go about achieving it. It’s easy to work hard if you’re clear about these three things, and you’re excited about what you’re doing. Conversely, I think the main cause of aimlessness and procrastination is when you lack clarity on one or more of these points.

The second factor affecting self-discipline is one’s social environment. Researchers are typically under little immediate social pressure to produce research results. Contrast this with the example of professional athletes, who often have an entire support system of coaches, managers and trainers in place, focused around the task of increasing their effectiveness. When a researcher stays out late, sleeps in, and gets a late start, no-one minds; when a professional athlete does, they’re likely to receive a blast from their coach.

Access to a social environment which encourages and supports the development of research skills and research excellence can make an enormous difference to all aspects of one’s research, including self-discipline. The key is to be accountable to other people. Some simple ways of achieving such accountability are to take on students, to collaborate with colleagues, or to set up mentoring relationships with colleagues.

The third factor affecting self-discipline is a special kind of honesty, honesty to oneself, about oneself. It’s extremely easy to kid ourselves about what we do and who we are. A colleague once told me of a friend of his who for some time used a stopwatch to keep track of how much research work he did each week. He was shocked to discover that after factoring in all the other activities he engaged in each day – interruptions, email, surfing the net, the phone, fruitless meetings, chatting with friends, and so on – he was averaging only half an hour of research per day. I wouldn’t be surprised if this was typical of many researchers. The good news, of course, is that building this kind of awareness lays the foundation for personal change, for achieving congruence between our behavioural goals and how we actually behave, in short, for achieving self-discipline.

Aspects of research: self-development and the creative process

Research involves two main aspects, self-development and the creative process of research. We’ll discuss the specifics of each aspect below, but for now I want to concentrate on the problem of achieving balance between the two, for I believe it is a common and significant mistake to concentrate too much on one aspect to the exclusion of the other.

People who concentrate mostly on self-development usually make early exits from their research careers. They may be brilliant and knowledgeable, but they fail to realize their responsibility to make a contribution to the wider community. The academic system usually ensures that this failure is recognized, and they consequently have great difficulty getting jobs. Although this is an important problem, in this essay I will focus mostly on the converse problem, the problem of focusing too much on creative research, to the exclusion of self-development.

There are a lot of incentives for people to concentrate on creative research to the exclusion of self-development. Throughout one’s research career, but particularly early on, there are many advantages to publishing lots of papers. Within limits, this is a good thing, especially for young researchers: it brings you into the community of researchers; it gives you the opportunity to learn how to write well, and give good presentations; it can help keep you motivated. I believe all researchers should publish at least a few papers each year, essentially as an obligation to the research and wider community; they should make some contribution, even if only a small one, on a relatively unimportant topic.

However, some people end up obsessed with writing as many papers as possible, as quickly as possible. While the short-term rewards of this are attractive (jobs, grants, reputation and prizes), the long-term costs are significant. In particular, it can lead to stagnation, and plateauing as a researcher. To achieve one’s full potential requires a balancing act: making a significant and regular enough research contribution to enable oneself to get and keep good jobs, while continuing to develop one’s talents, constantly renewing and replenishing oneself. In particular, once one has achieved a certain amount of job security (a long-term or permanent job) it may make sense to shift the balance so that self-development takes on a larger role.

For many people (myself included) who have concentrated mainly on making creative research contributions earlier in their careers, this can be a difficult adjustment to make, as it requires changing one’s sense of what is important. Furthermore, there is a constant pull towards concentrating on research over self-development, since there are often short-term incentives to sacrifice self-development for research (“I’ve got to get this paper out now”), but rarely vice versa. To balance these tendencies, we need to remember that nobody, no matter how talented, is born an effective researcher; that distinction can only be obtained after a considerable amount of hard work and personal change, and there is no reason to suppose that, just because one is now able to publish lots of papers, one has peaked as a researcher.

In my opinion, creative research is best viewed as an extension of self-development, especially an extension of a well-developed reading program. I don’t believe the two can be completely pried apart, as the two interact in interesting non-linear ways. I’m now going to talk in a little more detail about both processes, keeping in mind that the ultimate goal of research is new ideas, insights, tools and technologies, and this goal must inform the process of self-development.

Developing research strengths

The foundation is a plan for the development of research strengths. What are you interested in? Given your interests, what are you going to try to learn? The plan needs to be driven by your research goals, but should balance short-term and long-term considerations. Some time should be spent on things that appear very likely to lead to short-term research payoff. Equally well, some time needs to be allocated to the development of strengths that may not have much immediate pay-off, but over the longer-term will have a considerable payoff.

In targeting areas of development, an important goal to keep in mind is that you want to develop unique combinations of abilities. You need to develop unique combinations of talents which give you a comparative advantage over other people. Do what you can do better than anybody; to mangle a quote from Lincoln, nobody can be better than everybody all of the time, but anybody can be better than everybody some of the time.

In my opinion the reason most people fail to do great research is that they are not willing to pay the price in self-development. Say some new field opens up that combines field X and field Y. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.

Finally, a note on how to go about developing some new research strength. A mistake I’m prone to make, and I know some others are as well, is to feel as though some degree of completeness is required in understanding a research field. In fact, in any given research field there are usually only a tiny number of papers that are really worth reading. You are almost certainly better off reading deeply in the ten most important papers of a research field than you are skimming the top five hundred.

These ideas carry over to the problem of staying current in your fields of interest: I believe that you can stay quite current by (a) quickly skimming a great deal of work, to keep track of what is known, and what sort of problems people are thinking about, and (b) based on that skimming, picking a dozen or so papers each year to read deeply, in the belief that they contain the most important research results of the year. This is not the only deep reading you’ll need to do; you’ll also need to do some which is related to the immediate problems that you’re working on. But you certainly should do some such deep reading.

Develop a high-quality research environment

There is a considerable amount of research showing that people consistently underestimate the effect of the environment on personal effectiveness. This is particularly important in an academic environment where there are usually many short-term social pressures that are not directly related to research effectiveness – teaching, writing letters of recommendation and referee reports, committee work, academic politics. By contrast, in most institutions there are few short-term social pressures to do great research work.

Some of the highest-leverage work you can do involves improving your environment so that social pressures work for you as a researcher, rather than against you. Discussing this in detail would require another essay of length at least equal to that of the present one, but I will make a few remarks.

The first is that improving your environment is something anyone can do; students, in particular, often underestimate the magnitude of the changes they can bring about. Anyone can start a seminar series, develop a discussion area, create a lounge, organize a small workshop, or organize a reading group. Furthermore, although all these things are hard to do well, if you’re willing to do critical evaluations, experiment and try radical changes, preferably in partnership with equally committed people, things are likely to improve a great deal.

Second, institutions have long memories, so changes that you make in your environment will stick around for a long time. This means that once something is working well, chances are it’ll continue to work well without much help from you – and you can move on to improve some other aspect of your environment. Furthermore, each positive change you make actually improves your leverage with other people. I’ve known undergraduate students who had made so many creative positive contributions to their departments that their influence with canny senior faculty was comparable to the influence of other senior faculty.

The creative process

The problem-solver and the problem-creator

Different people have different styles of creative work. I want to discuss two different styles that I think are particularly useful in understanding the creative process. I call these the problem-solver and the problem-creator styles. They’re not really disjoint or exclusive styles of working, but rather idealizations which are useful ways of thinking about how people go about creative work.

The problem-solver: This is the person who works intensively on well-posed technical problems, often problems known (and sometimes well-known) to the entire research community in which they work. The best problem-solvers are often extremely technically proficient and hard-working. Problem-solvers often attach great social cachet to the level of difficulty of the problem they solve, without necessarily worrying so much about other indicators of the importance of the problem.

The problem-creator: This is a rarer working style. Problem-creators may often write papers that are technically rather simple, but ask an interesting new question, or pose an old problem in a new way, or demonstrate a simple but fruitful connection that no-one previously realized existed.

Of course, the problem-solver and the problem-creator are idealizations; all researchers exemplify both styles, to some extent. But they are also useful models to clarify our thinking about the creative process. One distinction between the two styles is how proactive one is in identifying problems, with the problem-solver being much more passive, while the problem-creator is extremely proactive. By contrast, the problem-solver needs to be much more proactive in developing their problem-solving skills. Both styles of research can be extremely successful.

Problem-solvers have numerous social advantages in research, and for that reason I believe they tend to be more common. In particular, it is relatively easy to recognize (and then reward) people who solve problems that are of medium or high levels of difficulty. This has rewards both in terms of the immediate esteem of one’s peers – physicists love to trade legends about brilliant colleagues who immediately see through to the solution of some difficult problem or another – and also in the hunt for jobs and other tangible forms of recognition. It takes more time (and thus can be more difficult) to recognize people whose work is technically rather simple, but whose questions may eventually open up whole new lines of enquiry.

The advantage in being a problem-creator is that there is a sizeable comparative advantage in opening up an entirely new problem area, and thus being the first into that problem area. You can work hard to get a basic foundation in the skills needed in that problem area, and then clean up many of the fundamental problems.

The skills of the problem-creator

Our training as physicists focuses pretty heavily on becoming problem-solvers; we tend not to get much training as problem-creators. One reason I’m discussing these two working styles at some length is to dispel the common idea that creative research is necessarily primarily about problem-solving. It’s true that many people have very successful research careers as problem-solvers. But you can also consciously decide to invest more time and effort into developing as a problem-creator. I now describe some of the skills involved in problem-creation.

Developing a taste for what’s important: What do you think are the characteristics of important science? What makes one area thrive, while another dies away? What sorts of unifying ideas are the most useful? What have been the most important developments in your field? Why are they important? What were the apparently promising ideas that didn’t pan out? Why didn’t they pan out? You need to be thinking constantly about these issues, both in concrete terms, and also in the abstract, developing both a general feeling for what is important (and what is not), and also some specific beliefs about what is important and what is not in your fields of interest. Richard Hamming describes setting aside time each week for “Great Thoughts”, time in which he would focus on and discuss with others only things that he believed were of the highest importance. Systematically setting aside time to think (and talk with colleagues) about where the important problems are is an excellent way of developing as a problem-creator.

On this topic, let me point out one myth that exerts a powerful influence (often subconsciously) on people: the idea that difficulty is a good indicator of the importance of a problem. It is true that an elegant solution to a difficult problem (even one not a priori important) often contains important ideas. However, I believe that most people consistently overrate the importance of difficulty. Often far more important is what your work enables, the connections that it makes apparent, the unifying themes uncovered, the new questions asked, and so on.

Internal and external standards for what is important: Some of the most thought-provoking advice on physics that I ever heard was at a colloquium given by eminent physicist Max Dresden. He advised young people in the audience not to work towards a Nobel Prize, but instead to aim their research in directions that they personally find fun and interesting. I thought his advice quite sound in some regards: for some people it is extremely tempting to regard external recognition as the be-all and end-all of research success, and the Nobel Prize is perhaps the highest form of external recognition in physics. Dresden is right, in the sense that working with a primary goal of winning a Nobel Prize would be pointless and degrading; far better to work in an area one personally finds enjoyable.

On the other hand, the Nobel Prizes are usually given for very good reasons: they reward some of the most interesting work in all of physics. There is, admittedly, a political element, with certain fields being favoured, and so on. Nonetheless, imagine a world in which one of these discoveries had not been awarded a Prize for some reason. Would you be proud to have your name associated with that discovery, even so, and regard the work on it as time well spent? In every case I can think of, that certainly is the case for me, and I suspect it’s true for most other physicists.

I believe this highlights an interesting point about what makes something interesting and important. A person working toward a Nobel Prize or some other form of external recognition has, in some sense, decided to abdicate their personal decision about what is important and interesting. The external community of physicists (in this case, represented by the Nobel Committee) is what makes their decision: if it might win a Nobel, it’s important.

Balancing this observation, this is not to say that your decision about what is interesting and important should be yours alone. People who work in isolation rarely end up making contributions that are all that significant. Your decision about what is important should be informed by others: talk to your peers, find out what they think is important, look in the textbooks and history books and biographies, and, yes, look at what wins prizes (of all sorts).

But at the end of the day you’ve got to form your own independent standards for what is interesting and important and worth doing, and make judgments about where you should be making a contribution, based on those standards. I think better advice from Dresden would have been to aim to produce work of the highest possible caliber, but according to what you have come to believe is important.

Exploring for problems: Obviously, all researchers do some of this. For the problem-solver, the process of exploring for problems often works along the following lines: keep moving around, looking for problems that you consider (a) well-posed, or able to be well-posed after some work on your part, (b) likely to fall within a reasonable time to the arsenal of tools at your disposal (perhaps with some small expansion of that arsenal), and (c) above some minimum thresholds of interest and difficulty. Once you’ve found a problem of this sort, you work hard on the problem, solve it, and publish.

Problem-creators may be rather more systematic about exploring for problems. For example, they may occasionally set time aside to survey the landscape of a field, looking not just for problems, but trying to identify larger patterns. What types of questions do people in the field tend to ask? Can we abstract away patterns in those questions? What other fields might there be links to? What are the few most important problems in the field? Problem-creators set aside time for doing this kind of systematic exploration, and do it in a disciplined way, often with feedback from others.

Surveying the landscape can be particularly revealing. A lot of people work in fashionable subfields of a larger field primarily because there are lots of other people working in that subfield. The problems they work on may be technically complicated, especially after a few years, when the most basic questions have been answered. This is compensated by the fact that it’s extremely comforting to work within a field where there is a standard narrative explaining the importance of the field, some canonical models for what problems are interesting, and a willing audience of people ready to appreciate your work. In addition, working in such subfields gives younger people a chance to show off their technical prowess (sometimes, not unlike elk spoiling for a fight) to peers in a position to recommend them for valuable faculty positions.

Getting ahead of the game: There are many important problems, and sometimes an entire field comes to some agreement about what is important: proving the Riemann Hypothesis, or understanding high temperature superconductivity. Sometimes, however, there is a problem either not appreciated at all, or only dimly appreciated, that is equal in importance to such gems. Consider the creation of the scanning tunneling microscope – the basic idea had been around for years, yet nobody had ever seriously tried to build the device. The inventors put it together on a shoestring, and created one of the major tools of modern physics. Or consider David Deutsch and Richard Feynman’s creation of the field of quantum computing, by framing the right questions (“What would a quantum mechanical computer be capable of?” and “Would it be faster than a classical computer?”). One of the big ways you can get ahead as a researcher is by identifying and then solving problems that are important, but perhaps not terribly difficult, ahead of everyone else.

Identify the messes: In a nice article about how he does research, physicist Steven Weinberg emphasized the importance of identifying the messes. What areas of physics appear to be in a state of mess? Funnily enough, one sign of a mess can be that the area is very hard to understand. For a long time – and to some extent this persists today – physics texts on general relativity were very difficult to understand, and the tensor calculus in them was often confusing. There was a good reason for this: the basic definitions of differential geometry, although laid down in the 19th century, didn’t really reach their modern form until the mid part of the twentieth century, and then took considerable time to migrate to physics. A lot of the discussion of tensor calculus in physics texts is confusing because, very often, it is confused, being written by people who don’t have quite the right definitions (meaning, in this case, the simplest, most elegant and natural ones) in mind.

When you identify such a mess, the natural inclination of many people is to shy away, to find something that is easier to understand. But a field that is a mess is really an opportunity. Chances are good that there are deep unifying and simplifying concepts still waiting to be understood and developed by someone – perhaps you.

The skills of the problem-solver

As I’ve already said, our technical training as physicists focuses a lot more on problem-solving than problem-creation, so I’m not going to say a lot about the skills needed to be a problem-solver. But I will make a few general remarks that I find helpful.

Clarity, goals, and forward momentum: In my opinion, there is little that is more important in research than building forward momentum. Being clear about some goal, even if that goal is the wrong goal, or the clarity is illusory, is tremendously powerful. For the most part, it’s better to be doing something, rather than nothing, provided, of course, that you set time aside frequently for reflection and reconsideration of your goals. Much of the time in research is spent in a fog, and taking the time to set clear goals can really help lift the fog.

Have multiple formulations: One of the most common mistakes made by researchers is to hold on too tightly to a particular problem formulation, without asking whether they can achieve insight on related problems. The important thing is to be able to make some progress: if you can find a related problem, or reformulate a problem in a way that permits you to move forward, that is progress.

Spontaneous discovery as the outcome of self-development: For me this is one of the most common ways of making discoveries. Many people’s basic research model is to identify a problem they find interesting, and then spend a lot of time working on just that problem. In fact, if you keep your mind open while engaging in exploration, and are working at the edge of what is known, you’ll often see huge opportunities open wide in front of you, provided you keep developing your range of skills.

Working on important problems

It’s important that you work towards being able to solve important problems. This sounds silly, but people fail to do this for any number of reasons. I want to talk a little about those reasons, and how to avoid these traps.

Reason 1: Lack of self-development. Many people don’t spend enough time on self-development. If you stop your development at the level which resulted in your first paper, it’s unlikely you’ll solve any major problems. More realistically, for many people self-development is an incidental thing, something that happens while they’re on the treadmill of trying to solve problems, generate papers, and so on, or while teaching. While such people will develop, it’s unlikely that doing so in such an ad hoc way will let them address the most important problems.

Reason 2: The treadmill of small problems. Social factors such as the need to publish, get grants, and so on, encourage people to work only on unimportant problems, without addressing the important problems. This can be a difficult treadmill to get off.

My belief is that the way to start out in a research career is by working primarily on small and relatively tractable problems, where you have a good chance of success. You then continue the process of self-development, gradually working up to more important problems (which also tend to be more difficult, although, as noted above, difficulty is most emphatically not the same as importance). The rare exception is important problems that are also likely to be technically easy; if you’re lucky you may find such a problem early in your career, or be handed one. If so, solve it quickly!

Even later on, when you’ve developed to the point that you can realistically expect to be able to attack important problems, it’s still useful to tackle a mixture of more and less important problems. The reason is that tackling smaller problems ensures that you make a reasonable contribution to science, and that you continue to take an active part in the research community. Even Andrew Wiles continued to publish papers and work on other problems during his work on Fermat’s Last Theorem, albeit at a rather low rate. If he had not, he would have lost contact with an entire research community, and losing such contact would likely have made a significant negative difference to his work on Fermat’s Last Theorem.

Reason 3: The intimidation factor. Even if people have spent enough time on self-development that they have a realistic chance of attacking big problems, they still may not. The reason is that they have a fear of working on something unsuccessfully. Imagine how Andrew Wiles would have felt if he had worked on Fermat’s Last Theorem for several decades and completely failed. For most people, the fear of ending up in such a situation is enough to discourage them from ever making the attempt.

The great mathematician Andrei Kolmogorov described an interesting trick that he used to get around this problem. Rather than investing all his time and effort on attacking the problem, he’d put the problem into a larger context. He’d announce a seminar series in which he’d lecture on material that he thought would be related to the problem. He’d write a set of lecture notes (often turning into a book) on material related to the problem. That way, he lowered the psychological pressure on himself. Rather than investing all his effort in an attack on the problem – which might ultimately be a complete waste of time – he knew that he’d produce something of value. By making the research process part of a larger endeavour, he ensured that the process was a success no matter how it came out, even if he failed to solve the problem, or was scooped by someone else. It’s a case of not putting all of one’s psychological eggs in one basket.

Richard Feynman described a related trick. He said that when he started working on a problem he would try to convince himself that he had some kooky insight into the problem that it was very unlikely anybody else had. He admitted that very often this belief was erroneous, or that, even if original, his initial insight often wasn’t very good. But he claimed that he found that he could fool himself into thinking that he had the “inside track” on the problem as a result, and this was essential to getting up the forward momentum necessary to really make a big dint in a difficult problem.

Committing to work on an important problem: For the difficult problems, I think commitment is really a process rather than a moment. You may decide to prepare a lecture to talk about a problem. If that is interesting, you enjoy it, and you feel like you have some insight, you might decide to prepare a few lectures. If that goes well, perhaps you’ll start to prepare more lectures, write a review, and maybe make a really big contribution. It’s all about building up more and more insight. Ideally, you’ll do this as part of some larger process, with social support around you.

People who only attack difficult problems: There is a converse to the problem I’ve been talking about, which is people who are only interested in attacking problems that are both difficult and important. This affliction can affect people at any stage of their career, but it manifests itself in somewhat different ways at different stages.

In the case of the beginner, this is like a beginning pole vaulter insisting on putting the bar at 5 meters from the time they begin, rather than starting at some more reasonable height. Unless exceptionally pigheaded, such a person will never learn to vault 5 meters successfully, simply because they will never learn anything from failure at a more realistic starting height. This sounds prima facie ridiculous, but I have seen people burn out by following exactly this strategy.

The case of the more experienced researcher is more difficult. As I’ve emphasized, once you’ve reached an appropriate level of development I think it’s important to spend some time working on the most important problems. But if that’s all you do, there are some very significant drawbacks. In particular, by attacking only the most important and most difficult problems an experienced researcher (a) takes themselves out of circulation, (b) stops making ongoing contributions, (c) loses the habit of success, and (d) risks losing morale, which is so important to research success. I think the solution is to balance one’s work on the more and less important problems: you need to schedule time to do the more important stuff, but should also make sure that you spend some time on less risky activities.

In both cases, the explanation is often, at least in part, intellectual macho. Theorists can be a pretty judgmental lot about the value of their own work, and the work of others. This helps lead some into the error of only working on big problems, and not sometimes working on little problems, for the fun of it, for the contact it brings with colleagues, and for the rewarding and enjoyable sense of making a real contribution of some significance.

This essay has been translated into Chinese by Buhua Liu.

Technical notes on Bloch’s theorem and Bravais lattices

[pdf]: Bloch’s theorem and Bravais lattices

More technical notes, this time on a completely different topic: Bloch’s theorem. Bloch’s theorem gives some powerful general information about the eigenstates of a Hamiltonian which respects the symmetry of some lattice. I’m trying to learn the basic principles of condensed matter physics at the moment, and Bloch’s theorem appears to be one of the foundation stones, thus these notes.

Technical notes on linear matrix equations

[pdf]: On linear matrix equations

These are technical notes. As the title indicates, they’re not terribly likely to cause spontaneous celebrations in the streets, but contain material that may be of interest to people interested in quantum process tomography, the Solovay-Kitaev theorem, or quantum state estimation. Despite the format, these are not intended as a paper, and there’s little original content; they are primarily notes about work by other people that I think is interesting.