Attention conservation notice: Just over 2000 words, requires a little graph theory.
Just for fun, I’ve been doing a little reading on small world networks, a topic that started to attract a lot of interest after a 1998 paper in Nature, by Duncan Watts and Stephen Strogatz.
The idea of small world networks goes back to (among other places) the well-known experiments demonstrating “six degrees of separationâ€, conducted in the 1960s by the astonishing psychologist Stanley Milgram.  This is the idea that while you personally only know a relatively small sample of the total population, your extended friendship network – the people who are “friends of a friend of a…†– grows extremely rapidly as the number of links in the chain increases.   Milgram’s results suggested that most of the US population is within six degrees of separation from one another, i.e., that it’s truly a small world.
(It is worth noting that doubts have been cast recently on Milgram’s results, notably by Judith Kleinfield.  See the account in Duncan Watts’ recent book “Six Degreesâ€.)
The use of “networks†in the term “small world networks†seems to me a bit peculiar, since, at least among the people I know, the mathematical object being referred to as a network is what we’d call a graph, consisting of a set of vertices, V, and a set of edges, E, which are just pairs of vertices.  I’ll use the terms “small world network†and “small world graph†interchangeably, usually preferring the former in applications, and the latter in describing mathematical objects.
What is a small world network?  Stated another way, what does it mean to say that a graph has the small world property?
It’s a graph G = (V,E) with two properties that a priori might appear to be contradictory: (1) typical pairs of vertices in the graph have short paths between them, and (2) the graph is highly clustered.
To make these ideas precise, Watts and Strogatz introduce mathematical definitions for the average path length and clustering in a graph.
The average path length is defined as you would expect: define the length [tex]l(v,w)[/tex] between two vertices, v and w, as the minimum number of edges that need to be traversed to pass from v to w (or vice versa).  Then the average path length L is simply the average value of [tex]l(v,w)[/tex], with v and w chosen uniformly at random,
[tex]L \equiv \frac{\sum_{vw} l(v,w)}{{n \choose 2}}, [/tex]
where the sum is over all distinct pairs of vertices, v and w, and n is the number of vertices in G.  Note that for these definitions to make sense we require that the graph be connected.
Watts and Strogatz’ clustering coefficient is defined in a less intuitive fashion. The goal is to define a quantity C which varies between 0 (unclustered) and 1 (highly clustered).
To do this, first fix a vertex, v.  Suppose we consider the set of vertices S which are neighbours of v.  Suppose S has k elements.  How connected are the vertices in S to one another?  If every vertex in S is connected to every other vertex in S, then there are [tex]{k \choose 2}[/tex] edges between the elements of S.  In such a case we would say that the graph is highly clustered about the vertex v.   We capture this idea by defining the clustering coefficient near v, [tex]C_v[/tex], to be the number of edges between vertices in S, divided by the maximum possible number, [tex]{k \choose 2}[/tex].   The clustering coefficient for the entire graph is then defined to be the average of [tex]C_v[/tex] over all vertices in the graph.
Obviously, the clustering coefficient varies between 0 and 1, with a value near 0 meaning that most of the vertices connected to any given vertex v are not connected to each other; conversely, a value near 1 mean that most of the neighbours of any given vertex v will tend to be connected to one another.
I must admit, I harbour doubts both about how mathematically natural the clustering coefficient is, andabout how well it captures the concept of the graph being highly clustered.  In particular, I would not be surprised to discover that there are other quantities which capture the same intuition, but which are far more powerful mathematically as tools for investigating small world graphs.  Nonetheless, this was the definition that Watts and Strogatz used, and I will follow their lead here.
Several questions about small world graphs naturally suggest themselves:
- Do graphs with the small world property exist?  The answer is yes, or I wouldn’t be writing these notes, but it’s not a priori obvious that this ought to be the case.  How can we construct examples of graphs with the small world property?
- How commonly do small networks arise in nature?  Might they arise in collaboration networks for movie actors, musicians, or scientists, in friendship networks, or link patterns in the internet? Under what general circumstances can small world networks arise?
- What, if any, consequences follow from a network being a small world network?  E.g., if people’s network of acquaintances forms a small world network, does this have any implications for the spread of disease, or of memes?
- Can we formulate a mathematically interesting theory of small world graphs?
Constructing small world graphs
Watts and Strogatz suggest a simple method for constructing graphs with the small world property.   The method is as follows.
We start by imagining a set of n vertices arranged in a ring in the plane.
Now connect each vertex to its k nearest-neighbours, in both the clockwise and anti-clockwise directions.  Here, k is just some small constant, e.g., k = 2 or k = 3, whose exact value doesn’t much matter.
We complete the construction of the graph by adding a small amount of disorder.  More precisely, let p > 0 be a small parameter.  For every single edge in the lattice, with probability p we leave it alone, and with probability 1-p we “rewire†the edge, changing one of the end vertices to a randomly chosen vertex.
What happens to the average length and clustering coefficient in this graph as p is increased away from zero?
At p = 0 we see that the average path length between two vertices is roughly n / 4k, where n is the number of vertices in the graph.  The reason is that if we choose two vertices at random on the ring, then typically they’ll be about a quarter of the way around the ring from one another, and we can get from one to the other in roughly n / 4k hops, hopping k vertices at a time.
As p increases away from zero, however, the average path length drops extremely quickly.
One way of seeing this is to do a computer simulation.  Watts and Strogatz do this, finding that the average path length plummets, quickly becoming logarithmic in n, rather than linear in n.
Another way of seeing that path lengths should drop very quickly when disorder is introduced is the following heuristic argument, inspired by the renormalization group technique from statistical physics.
Suppose we subdivide the vertices in the graph up into groups of [tex]m \approx 2 / p,[/tex] vertices.   We will treat those groups as single vertices in a new graph, G’, and define two vertices in G’ to be connected if the corresponding groups of vertices in G are connected through at least one edge.
A little thought about the “rewiring†prescription above shows that vertices in G’ are typically connected to approximately 2 other randomly chosen vertices in G’.  There will also be some additional edges between neighbouring elements of G’, but these won’t make any difference in the subsequent analysis, and we’ll simply ignore them.
It follows from the presence of these random edges in G’ that the average path length in G’ scales roughly as [tex]\log_2(q)[/tex], where [tex]q = n / m[/tex] is the number of vertices in G’.  It follows that average path lengths in G must scale roughly as [tex]m \log_2 (n/m) \approx 2 \log_2(np/2) / p[/tex].
Define [tex]N \equiv np[/tex], so, up to a small constant of proportionality, N scales as the average number of rewirings in the graph.  When [tex]N > 2[/tex] we see that the average path length scales as:
[tex]\frac{2 \log N }{N} n. [/tex]
Thus, as N rises, we see that the linear term in the average path length is very rapidly suppressed.  This occurs for any value of p much larger than 1/n.
It ought to be noted that this analysis is quite general: we didn’t make use of any special properties of the initial graph.  In general, then, it follows that adding a small amount of disorder to any graph will strongly suppress any linear component in the average path length.  In the event it hasn’t already been done, it’d be fun to extend this analysis using a more detailed renormalization group style analysis.
The behaviour of the clustering coefficient is quite different.  At p = 0 the clustering coefficient is obviously quite close to one.  But as disorder is added, and p is increased, the clustering coefficient remains close to one, changing only very slowly.
Watts and Strogatz show this using computer simulations, but the fundamental reason isn’t difficult to understand: even when a little bit of disorder is added, most edges in the graph are not changed, and so most clusters in the graph are not affected.
In other words, when p is small but non-zero, the resulting graph is highly clustered, but also has short average path lengths.  In short, it has the small world property.
Why is this interesting?
Why is this type of graph interesting?  Sure, the average path length drops rapidly while the clustering coefficient remains high.  But so what?  It’s surely not that difficult to come up with other graph properties, some of which change precipitously when disorder is added to a regular graph, and others of which change only very slightly.
There’s a couple of reasons to care.
First, Watts and Strogatz (and many others) have shown that there are a lot of small world networks which arise in nature.  Many different types of human (and non-human) relationship network have the small world property.  So too do many artificial networks, like the power grid, or the pattern of links on the internet.
This is not surprising: based on the argument above we’d expect that any network which starts out highly clustered, and which has a little disorder added, will have the small world property.  Most natural systems are noisy, so we’d expect that pretty much any highly clustered network arising in nature would possess the small world property.
Of course, the fact that small world networks are common doesn’t necessarily make them interesting.  Are there any interesting consequences which follow from a network having the small world property?
As an example where this is the case, Watts and Strogatz present a simple model of the spread of infectious diseases through a network, and argue that diseases ought to spread much more rapidly in a small world network than they would through a network that does not have the small world property.  It’s not difficult to think up other potential consequences of a network having the small world property.
More generally, there’s an extremely interesting idea here: the idea that we can understand things about the function of a network (i.e., its dynamics), simply by looking at its static structure.   This is, I gather, quite an old idea, in certain specific contexts – epidemiology, social networks – but the Watts and Strogatz paper emphasizes that it may be possible to study this connection from a more abstract point of view, and reach more general conclusions, while at the same time making the connection to concrete examples.
Analogous ideas are, of course, used in physics, where simple static properties of a system (e.g., its dimensionality) can be used to make predictions about its dynamical properties (e.g., whether it can undergo a phase transition at nonzero temperature).  But it’s by no means so clear a priori that the same idea can be successfully applied to more complex systems, and I think that one nice thing about the Watts and Strogatz paper is that it suggests a wide class of examples where this can potentially be done.
Indeed, these ideas suggest an entire broad research program, with several interlocking tasks:
-  Building a taxonomy of static and dynamic properties whose presence or absence in a network can be easily checked.  Some of these properties might be rather specialized, while others might be quite generally applicable in all networks.  
-  Collecting data about real networks.
- Guessing at possible connections between static and dynamic properties.
-  Checking whether these connections actually hold, both through empirical work, and through theoretical argument.
- Ideally, this will then allow us to make predictions about the properties of other networks, and perhaps exert some kind of control over those properties.
For me personally, I can see a lot of interesting avenues over on the theoretical side, developing the appropriate mathematical tools to illuminate these kinds of questions.  Of course, I’m very ignorant of what has already been done in this vein, and it would be useful to spend some time (a) figuring out what the important problems to address are, and (b) learning a bit about what has already been done.