Zeroth order approximation

My friend Ben Schumacher has started a blog, Zeroth order approximation. Actually, Ben started it a few months back, and has been merrily posting away; I only discovered it yesterday. This was great for me, because it meant I got to read several months of back posts.

Here’s a little sample:

But you know, so much of academic writing is bad. It is banal, orotund, unmusical, and stuffed with wads of unnecessary jargon. It is the sort of writing that does more to obscure meaning than to convey it. I see this stuff almost every day. I swim in it. OK, maybe I do exaggerate, a little. After all, I teach at a liberal arts college that is moderately well-known for teaching people how to write. Our faculty is full of novelists and poets and whatnot. But let me tell you, it’s here, too. It’s everywhere. It is like a fungus growing over all things, blurring their shapes — the verbal equivalent, maybe, of the ivy on academic buildings. And like the ivy, I guess, its main purpose is to conceal the shabby edifices beneath.

I live in fear that one day I will write ponderous, weedy, soporific academic prose. And worse, I fear that a day will come when I will simply not know the difference.


Changing times

What’s the greatest single human artifact in the world?

At different points in history I think my answer would probably be objects like the Pyramids of Egypt, the Great Wall, the Suez Canal, or the Hoover Dam.

Nowadays, though, it’s not even an object, exactly. I can’t think of anything that tops the Google database.


The Solovay-Kitaev algorithm

New paper: “The Solovay-Kitaev algorithm”, jointly with Chris Dawson, submitted as a tutorial / review article to the journal “Quantum Information and Computation”. As I write, I expect the paper to appear on the preprint arXiv this coming Monday.

The Solovay-Kitaev theorem is an important result in quantum computing. In a classical computer, it’s well-known that the same computation can be built up out of many different basic gate sets. For example, you can build up a circuit to add two numbers either using OR and NOT gates, or out of NAND gates alone. Furthermore, you can use OR and NOT gates to simulate NAND gates (or vice versa), so, roughly speaking, these two different circuits can have the same size, up to some constant of proportionality.
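As a tiny sanity check of that first claim (my own illustration, with plain Python booleans standing in for gates): De Morgan’s law gives NAND from OR and NOT, and OR and NOT can in turn be rebuilt from NAND alone.

```python
# NAND from OR and NOT, via De Morgan's law: NOT(a AND b) = (NOT a) OR (NOT b).
def nand(a: bool, b: bool) -> bool:
    return (not a) or (not b)

# Conversely, NOT and OR from NAND alone.
def not_(a: bool) -> bool:
    return nand(a, a)

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

# Check both constructions on all possible inputs.
bits = (False, True)
assert all(nand(a, b) == (not (a and b)) for a in bits for b in bits)
assert all(or_(a, b) == (a or b) for a in bits for b in bits)
```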

In the case of quantum computers, things are more complicated. The set of possible quantum gates forms a continuum, and it’s not necessarily possible to use one gate set to simulate another exactly. Instead, some approximation may be necessary.

The Solovay-Kitaev theorem guarantees that different universal gate sets can simulate one another exceedingly quickly, to a very good approximation. From a practical point of view, this is important in compiling quantum algorithms (like Shor’s) into a form that can be implemented fault-tolerantly. From a more mathematical point of view, the Solovay-Kitaev theorem is a remarkable general statement about how quickly the group SU(d) is “filled in” by a universal set of gates.

This paper aims to provide a simple exposition of the Solovay-Kitaev theorem, which we hope is accessible to people who’ve just started working on quantum computing. We explain the proof in the form of an algorithm (just nine simple lines of pseudocode!) which, given a target unitary operation U, builds up a good approximation to U using some prespecified set of universal gates. Chris has also prepared an online implementation of the algorithm.
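To give the flavour, here’s the recursive core of the algorithm in Python. This is my paraphrase, not the paper’s pseudocode, and the two helpers are assumptions standing in for machinery described in the paper: basic_approximation (the base-case lookup in a precomputed net of short gate sequences) and gc_decompose (a balanced group-commutator decomposition V = W X W† X†).

```python
import numpy as np

def dagger(M):
    return M.conj().T  # conjugate transpose (the adjoint)

def solovay_kitaev(U, n):
    """Sketch: return a product of gates from the universal set approximating
    the unitary U, with accuracy improving rapidly in the recursion depth n.
    (In practice one tracks the gate sequence alongside the matrix.)"""
    if n == 0:
        return basic_approximation(U)        # assumed helper: net lookup
    U_prev = solovay_kitaev(U, n - 1)        # coarse approximation to U
    # The residual U U_prev^dagger is close to the identity; decompose it as
    # a group commutator (assumed helper), then approximate the two factors
    # at the next level down.
    W, X = gc_decompose(U @ dagger(U_prev))
    W_prev = solovay_kitaev(W, n - 1)
    X_prev = solovay_kitaev(X, n - 1)
    return W_prev @ X_prev @ dagger(W_prev) @ dagger(X_prev) @ U_prev
```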

Small world networks

Attention conservation notice: Just over 2000 words, requires a little graph theory.

Just for fun, I’ve been doing a little reading on small world networks, a topic that started to attract a lot of interest after a 1998 paper in Nature, by Duncan Watts and Steven Strogatz.

The idea of small world networks goes back to (among other places) the well-known experiments demonstrating “six degrees of separation”, conducted in the 1960s by the astonishing psychologist Stanley Milgram. This is the idea that while you personally only know a relatively small sample of the total population, your extended friendship network – the people who are “friends of a friend of a…” – grows extremely rapidly as the number of links in the chain increases. Milgram’s results suggested that most of the US population is within six degrees of separation from one another, i.e., that it’s truly a small world.

(It is worth noting that doubts have recently been cast on Milgram’s results, notably by Judith Kleinfeld. See the account in Duncan Watts’ recent book “Six Degrees”.)

The use of “networks” in the term “small world networks” seems to me a bit peculiar, since, at least among the people I know, the mathematical object being referred to as a network is what we’d call a graph, consisting of a set of vertices, V, and a set of edges, E, which are just pairs of vertices. I’ll use the terms “small world network” and “small world graph” interchangeably, usually preferring the former in applications, and the latter in describing mathematical objects.

What is a small world network? Stated another way, what does it mean to say that a graph has the small world property?

It’s a graph G = (V,E) with two properties that a priori might appear to be contradictory: (1) typical pairs of vertices in the graph have short paths between them, and (2) the graph is highly clustered.

To make these ideas precise, Watts and Strogatz introduce mathematical definitions for the average path length and clustering in a graph.

The average path length is defined as you would expect: define the length [tex]l(v,w)[/tex] between two vertices, v and w, as the minimum number of edges that need to be traversed to pass from v to w (or vice versa). Then the average path length L is simply the average value of [tex]l(v,w)[/tex], with v and w chosen uniformly at random,

[tex]L \equiv \frac{\sum_{vw} l(v,w)}{{n \choose 2}}, [/tex]

where the sum is over all distinct pairs of vertices, v and w, and n is the number of vertices in G. Note that for these definitions to make sense we require that the graph be connected.
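As a quick numerical illustration (my own, using the networkx library; the choice of graph is arbitrary):

```python
import networkx as nx

G = nx.petersen_graph()  # a small connected graph on 10 vertices
L = nx.average_shortest_path_length(G)  # average of l(v, w) over distinct pairs
print(L)  # 5/3: each vertex has 3 neighbours at distance 1 and 6 at distance 2
```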

Watts and Strogatz’ clustering coefficient is defined in a less intuitive fashion. The goal is to define a quantity C which varies between 0 (unclustered) and 1 (highly clustered).

To do this, first fix a vertex, v. Suppose we consider the set of vertices S which are neighbours of v. Suppose S has k elements. How connected are the vertices in S to one another? If every vertex in S is connected to every other vertex in S, then there are [tex]{k \choose 2}[/tex] edges between the elements of S. In such a case we would say that the graph is highly clustered about the vertex v. We capture this idea by defining the clustering coefficient near v, [tex]C_v[/tex], to be the number of edges between vertices in S, divided by the maximum possible number, [tex]{k \choose 2}[/tex]. The clustering coefficient for the entire graph is then defined to be the average of [tex]C_v[/tex] over all vertices in the graph.

Obviously, the clustering coefficient varies between 0 and 1, with a value near 0 meaning that most of the vertices connected to any given vertex v are not connected to each other; conversely, a value near 1 means that most of the neighbours of any given vertex v are connected to one another.
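Here’s the definition turned directly into code (a sketch of mine; networkx’s built-in nx.average_clustering computes the same quantity):

```python
from itertools import combinations
import networkx as nx

def clustering_at(G, v):
    """C_v: fraction of pairs of neighbours of v that are themselves joined."""
    S = list(G.neighbors(v))
    k = len(S)
    if k < 2:
        return 0.0  # convention for vertices with fewer than two neighbours
    edges_in_S = sum(1 for x, y in combinations(S, 2) if G.has_edge(x, y))
    return edges_in_S / (k * (k - 1) / 2)  # divide by (k choose 2)

G = nx.watts_strogatz_graph(20, 4, 0)  # ring lattice, 4 neighbours per vertex
C = sum(clustering_at(G, v) for v in G) / G.number_of_nodes()
print(C, nx.average_clustering(G))  # both give 0.5 for this lattice
```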

I must admit, I harbour doubts both about how mathematically natural the clustering coefficient is, and about how well it captures the concept of the graph being highly clustered. In particular, I would not be surprised to discover that there are other quantities which capture the same intuition, but which are far more powerful mathematically as tools for investigating small world graphs. Nonetheless, this was the definition that Watts and Strogatz used, and I will follow their lead here.

Several questions about small world graphs naturally suggest themselves:

  1. Do graphs with the small world property exist? The answer is yes, or I wouldn’t be writing these notes, but it’s not a priori obvious that this ought to be the case. How can we construct examples of graphs with the small world property?
  2. How commonly do small world networks arise in nature? Might they arise in collaboration networks for movie actors, musicians, or scientists, in friendship networks, or in the link patterns of the internet? Under what general circumstances can small world networks arise?
  3. What, if any, consequences follow from a network being a small world network? E.g., if people’s network of acquaintances forms a small world network, does this have any implications for the spread of disease, or of memes?
  4. Can we formulate a mathematically interesting theory of small world graphs?

Constructing small world graphs

Watts and Strogatz suggest a simple method for constructing graphs with the small world property. The method is as follows.

We start by imagining a set of n vertices arranged in a ring in the plane.

Now connect each vertex to its k nearest-neighbours, in both the clockwise and anti-clockwise directions. Here, k is just some small constant, e.g., k = 2 or k = 3, whose exact value doesn’t much matter.

We complete the construction of the graph by adding a small amount of disorder. More precisely, let p > 0 be a small parameter. For every single edge in the lattice, with probability 1-p we leave it alone, and with probability p we “rewire” the edge, changing one of the end vertices to a randomly chosen vertex.
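Here is the construction written out directly (my own sketch; networkx’s watts_strogatz_graph is an off-the-shelf equivalent, though its k parameter counts neighbours on both sides, i.e. twice the k used here):

```python
import random
import networkx as nx

def ring_with_rewiring(n, k, p, seed=None):
    """Ring of n vertices, each joined to its k nearest neighbours on each
    side; each edge is then rewired with probability p."""
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for v in range(n):
        for j in range(1, k + 1):
            G.add_edge(v, (v + j) % n)  # clockwise edges; the anti-clockwise
                                        # ones are added on other vertices' turns
    for u, v in list(G.edges()):
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and not G.has_edge(u, w):  # avoid self-loops, duplicates
                G.remove_edge(u, v)
                G.add_edge(u, w)  # move one endpoint to a random vertex
    return G
```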

What happens to the average length and clustering coefficient in this graph as p is increased away from zero?

At p = 0 we see that the average path length between two vertices is roughly n/(4k), where n is the number of vertices in the graph. The reason is that if we choose two vertices at random on the ring, then typically they’ll be about a quarter of the way around the ring from one another, and we can get from one to the other in roughly n/(4k) hops, hopping k vertices at a time.
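Numerically (using networkx, whose k parameter is twice the k in the text, since it counts neighbours on both sides):

```python
import networkx as nx

n, k = 1000, 5
G = nx.watts_strogatz_graph(n, 2 * k, 0)  # ring lattice, no rewiring
print(nx.average_shortest_path_length(G), n / (4 * k))  # about 50.5 versus 50
```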

As p increases away from zero, however, the average path length drops extremely quickly.

One way of seeing this is to do a computer simulation. Watts and Strogatz do this, finding that the average path length plummets, quickly becoming logarithmic in n, rather than linear in n.

Another way of seeing that path lengths should drop very quickly when disorder is introduced is the following heuristic argument, inspired by the renormalization group technique from statistical physics.

Suppose we subdivide the vertices of the graph into groups of [tex]m \approx 2/p[/tex] vertices. We will treat those groups as single vertices in a new graph, G’, and define two vertices in G’ to be connected if the corresponding groups of vertices in G are connected through at least one edge.

A little thought about the “rewiring” prescription above shows that vertices in G’ are typically connected to approximately 2 other randomly chosen vertices in G’. There will also be some additional edges between neighbouring elements of G’, but these won’t make any difference in the subsequent analysis, and we’ll simply ignore them.

It follows from the presence of these random edges in G’ that the average path length in G’ scales roughly as [tex]\log_2(q)[/tex], where [tex]q = n / m[/tex] is the number of vertices in G’. It follows that average path lengths in G must scale roughly as [tex]m \log_2 (n/m) \approx 2 \log_2(np/2) / p[/tex].

Define [tex]N \equiv np[/tex], so, up to a small constant of proportionality, N scales as the average number of rewirings in the graph. When [tex]N > 2[/tex] we see that the average path length scales as:

[tex]\frac{2 \log N }{N} n. [/tex]
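(Filling in the arithmetic: substituting [tex]m \approx 2/p[/tex] and then [tex]2/p = 2n/N[/tex] into the previous estimate gives

[tex] m \log_2(n/m) \approx \frac{2}{p} \log_2\left(\frac{np}{2}\right) = \frac{2 \log_2(N/2)}{N} \, n, [/tex]

which is the displayed scaling, up to the constant inside the logarithm.)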

Thus, as N rises, we see that the linear term in the average path length is very rapidly suppressed. This occurs for any value of p much larger than 1/n.

It ought to be noted that this analysis is quite general: we didn’t make use of any special properties of the initial graph. In general, then, it follows that adding a small amount of disorder to any graph will strongly suppress any linear component in the average path length. In the event it hasn’t already been done, it’d be fun to extend this analysis using a more detailed renormalization group style analysis.

The behaviour of the clustering coefficient is quite different. At p = 0 the clustering coefficient is large: a short computation shows that for the ring lattice above it equals [tex]3(k-1)/(2(2k-1))[/tex], which approaches 3/4 as k grows. And as disorder is added, and p is increased, the clustering coefficient stays near this value, changing only very slowly.

Watts and Strogatz show this using computer simulations, but the fundamental reason isn’t difficult to understand: even when a little bit of disorder is added, most edges in the graph are not changed, and so most clusters in the graph are not affected.

In other words, when p is small but non-zero, the resulting graph is highly clustered, but also has short average path lengths. In short, it has the small world property.
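The whole story can be reproduced in a few lines (a sketch of mine, using networkx; the exact numbers depend on the random seed): the normalized path length collapses while the normalized clustering barely moves.

```python
import networkx as nx

n, k = 1000, 10  # networkx's k = total neighbours, twice the k in the text
G0 = nx.watts_strogatz_graph(n, k, 0)
L0 = nx.average_shortest_path_length(G0)
C0 = nx.average_clustering(G0)

for p in [0.001, 0.01, 0.1]:
    # connected_watts_strogatz_graph retries until the result is connected,
    # so that average path lengths are well-defined
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=1)
    L = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    print(f"p = {p}: L/L(0) = {L / L0:.2f}, C/C(0) = {C / C0:.2f}")
```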

Why is this interesting?

Why is this type of graph interesting? Sure, the average path length drops rapidly while the clustering coefficient remains high. But so what? It’s surely not that difficult to come up with other graph properties, some of which change precipitously when disorder is added to a regular graph, and others of which change only very slightly.

There are a couple of reasons to care.

First, Watts and Strogatz (and many others) have shown that there are a lot of small world networks which arise in nature. Many different types of human (and non-human) relationship network have the small world property. So too do many artificial networks, like the power grid, or the pattern of links on the internet.

This is not surprising: based on the argument above we’d expect that any network which starts out highly clustered, and which has a little disorder added, will have the small world property. Most natural systems are noisy, so we’d expect that pretty much any highly clustered network arising in nature would possess the small world property.

Of course, the fact that small world networks are common doesn’t necessarily make them interesting. Are there any interesting consequences which follow from a network having the small world property?

As an example where this is the case, Watts and Strogatz present a simple model of the spread of infectious diseases through a network, and argue that diseases ought to spread much more rapidly in a small world network than they would through a network that does not have the small world property. It’s not difficult to think up other potential consequences of a network having the small world property.
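As a toy version of this (my own minimal caricature, not the model in the paper): suppose one vertex starts off infected, and at each time step every infected vertex infects all of its neighbours. The number of steps needed to infect the whole network is then just the eccentricity of the starting vertex, so short path lengths translate directly into fast spreading.

```python
import networkx as nx

n, k = 1000, 10  # networkx's k = total neighbours, twice the k in the text
for p in [0.0, 0.01, 0.1]:
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=42)
    steps = nx.eccentricity(G, v=0)  # rounds until every vertex is infected
    print(f"p = {p}: whole network infected after {steps} steps")
```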

More generally, there’s an extremely interesting idea here: the idea that we can understand things about the function of a network (i.e., its dynamics), simply by looking at its static structure. This is, I gather, quite an old idea, in certain specific contexts – epidemiology, social networks – but the Watts and Strogatz paper emphasizes that it may be possible to study this connection from a more abstract point of view, and reach more general conclusions, while at the same time making the connection to concrete examples.

Analogous ideas are, of course, used in physics, where simple static properties of a system (e.g., its dimensionality) can be used to make predictions about its dynamical properties (e.g., whether it can undergo a phase transition at nonzero temperature). But it’s by no means so clear a priori that the same idea can be successfully applied to more complex systems, and I think that one nice thing about the Watts and Strogatz paper is that it suggests a wide class of examples where this can potentially be done.

Indeed, these ideas suggest an entire broad research program, with several interlocking tasks:

  • Building a taxonomy of static and dynamic properties whose presence or absence in a network can be easily checked. Some of these properties might be rather specialized, while others might be quite generally applicable in all networks.
  • Collecting data about real networks.
  • Guessing at possible connections between static and dynamic properties.
  • Checking whether these connections actually hold, both through empirical work, and through theoretical argument.
  • Using these connections, once established, to make predictions about the properties of other networks, and perhaps to exert some kind of control over those properties.

Personally, I can see a lot of interesting avenues on the theoretical side, developing the appropriate mathematical tools to illuminate these kinds of questions. Of course, I’m very ignorant of what has already been done in this vein, and it would be useful to spend some time (a) figuring out what the important problems to address are, and (b) learning a bit about what has already been done.