Concentration of measure
Abstract discussion
Concentration of measure can be loosely described as the following principle.
- If [math]\displaystyle{ f }[/math] is a function that depends sufficiently continuously on random variables [math]\displaystyle{ X_1,\dots,X_n }[/math] that are sufficiently bounded and sufficiently independent, then the probability that [math]\displaystyle{ f-\mathbb{E}f }[/math] is large is very small.
Examples
Here are some examples, which give an idea of the principle in action.
Example 1: sums of independent Bernoulli variables
Let [math]\displaystyle{ X_1,\dots,X_n }[/math] be independent Bernoulli random variables, with probability p of equalling 1. Let [math]\displaystyle{ f(X_1,\dots,X_n)=n^{-1}(X_1+\dots+X_n). }[/math] Then the expectation of [math]\displaystyle{ f }[/math] is p. The distribution of f is binomial (up to the scaling factor of [math]\displaystyle{ n^{-1} }[/math]). The variance of f is [math]\displaystyle{ n^{-1}p(1-p), }[/math] so we expect deviations from the mean of order of magnitude [math]\displaystyle{ n^{-1/2}. }[/math] It turns out that deviations of significantly more than this are extremely unlikely: it can be shown that the probability that [math]\displaystyle{ n^{-1}(X_1+\dots+X_n) }[/math] differs from p by more than [math]\displaystyle{ tn^{-1/2} }[/math] is at most [math]\displaystyle{ \exp(-ct^2) }[/math] for some absolute constant [math]\displaystyle{ c\gt 0. }[/math] In particular, the probability that [math]\displaystyle{ n^{-1}(X_1+\dots+X_n) }[/math] differs from p by more than a constant amount [math]\displaystyle{ \delta }[/math] is at most [math]\displaystyle{ \exp(-c\delta^2n). }[/math]
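As a quick numerical sanity check of this bound, here is a minimal Monte Carlo sketch in Python/NumPy; the particular values of n, p, t and the number of trials are illustrative choices rather than anything fixed by the discussion above, and the comparison uses the explicit Hoeffding bound [math]\displaystyle{ 2\exp(-2t^2) }[/math] as a concrete instance of the [math]\displaystyle{ \exp(-ct^2) }[/math] bound.
```python
# Monte Carlo check of Example 1: how often does the empirical mean of n
# Bernoulli(p) variables deviate from p by more than t/sqrt(n)?
import numpy as np

rng = np.random.default_rng(0)

n, p, t, trials = 1000, 1 / 3, 1.5, 100_000   # illustrative parameters

X = rng.random((trials, n)) < p               # each row: X_1, ..., X_n
f = X.mean(axis=1)                            # f = (X_1 + ... + X_n) / n

empirical = np.mean(np.abs(f - p) > t / np.sqrt(n))
hoeffding = 2 * np.exp(-2 * t**2)             # explicit exp(-c t^2)-type bound

print(f"empirical tail probability:    {empirical:.5f}")
print(f"Hoeffding bound 2 exp(-2 t^2): {hoeffding:.5f}")
```
Raising t makes both the empirical frequency and the bound fall off very rapidly, which is the content of the concentration statement.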
Many of the comments have implicitly used this fact. For example, if you pick a random point of [math]\displaystyle{ [3]^n, }[/math] then the number of 1s is a sum of n independent Bernoulli variables with [math]\displaystyle{ p=1/3, }[/math] as are the number of 2s and the number of 3s. Therefore, by the union bound, if you choose a random sequence, the probability that any of these three counts differs from [math]\displaystyle{ n/3 }[/math] by more than [math]\displaystyle{ \delta n }[/math] is at most [math]\displaystyle{ 3\exp(-c\delta^2n). }[/math]
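The [math]\displaystyle{ [3]^n }[/math] statement can be checked in the same way; the sketch below (again Python/NumPy, with illustrative values of n, [math]\displaystyle{ \delta }[/math] and the number of trials) estimates how often some letter count strays from [math]\displaystyle{ n/3 }[/math] by more than [math]\displaystyle{ \delta n. }[/math]
```python
# Monte Carlo check: fraction of random points of [3]^n in which some letter
# count differs from n/3 by more than delta*n.
import numpy as np

rng = np.random.default_rng(1)

n, delta, trials = 300, 0.07, 20_000          # illustrative parameters

seqs = rng.integers(1, 4, size=(trials, n))   # uniform random points of [3]^n
counts = np.stack([(seqs == s).sum(axis=1) for s in (1, 2, 3)], axis=1)

bad = np.any(np.abs(counts - n / 3) > delta * n, axis=1)
print(f"fraction with some count off by more than delta*n: {bad.mean():.4f}")
```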
A considerably more general fact than this follows from Azuma's inequality.
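For reference, one standard form of Azuma's inequality: if [math]\displaystyle{ Y_0,Y_1,\dots,Y_n }[/math] is a martingale whose increments satisfy [math]\displaystyle{ |Y_k-Y_{k-1}|\leq c_k }[/math] for every k, then [math]\displaystyle{ \mathbb{P}[|Y_n-Y_0|\geq t]\leq 2\exp\left(-\frac{t^2}{2\sum_{k=1}^nc_k^2}\right). }[/math] Applying this to the martingale of conditional expectations [math]\displaystyle{ Y_k=\mathbb{E}[f(X_1,\dots,X_n)\mid X_1,\dots,X_k] }[/math] gives exponential concentration for any f that changes by at most [math]\displaystyle{ c_k }[/math] when the kth coordinate is changed; Example 1 is the case [math]\displaystyle{ c_k=n^{-1}. }[/math]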
Example 2: Lipschitz functions on the sphere
Let x be a uniformly random point on the unit sphere of [math]\displaystyle{ \mathbb{R}^n, }[/math] and let f be a 1-Lipschitz function on the sphere (with respect to the spherical distance, say). Then it is a consequence of Lévy's isoperimetric inequality on the sphere that [math]\displaystyle{ \mathbb{P}[|f(x)-\mathbb{E}f|\geq\delta]\leq 2\exp(-c\delta^2n). }[/math]
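For a concrete feel for this, here is a small simulation sketch (Python/NumPy) using the particular 1-Lipschitz function [math]\displaystyle{ f(x)=x_1, }[/math] the first coordinate, whose mean is 0 by symmetry; the dimensions, the value of [math]\displaystyle{ \delta }[/math] and the number of trials are illustrative, and the comparison uses the standard spherical cap estimate [math]\displaystyle{ 2\exp(-n\delta^2/2) }[/math] as an explicit stand-in for [math]\displaystyle{ 2\exp(-c\delta^2n). }[/math]
```python
# Concentration of the 1-Lipschitz function f(x) = x_1 on the unit sphere of R^n.
import numpy as np

rng = np.random.default_rng(2)


def sphere_sample(n: int, trials: int) -> np.ndarray:
    """Uniform random points on the unit sphere of R^n (normalized Gaussians)."""
    g = rng.standard_normal((trials, n))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


delta, trials = 0.1, 100_000                  # illustrative parameters
for n in (50, 200, 800):
    f = sphere_sample(n, trials)[:, 0]        # f(x) = x_1, and E f = 0 by symmetry
    tail = np.mean(np.abs(f) >= delta)
    bound = 2 * np.exp(-n * delta**2 / 2)     # explicit exp(-c delta^2 n)-type bound
    print(f"n = {n:4d}:  empirical tail {tail:.4f},  cap bound {bound:.4f}")
```
The tail probability collapses as the dimension grows, even though the function and [math]\displaystyle{ \delta }[/math] are held fixed.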
We can relate this to the general principle above by thinking of f as a function of the coordinates [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] of x. These coordinates are by no means independent, but they do enjoy a high degree of approximate independence.
A quick sketch of how this statement relates to the isoperimetric inequality: the set of x such that f(x) is at most the median of f has measure at least 1/2. If we now expand this set by [math]\displaystyle{ \delta }[/math] (that is, we take all points within [math]\displaystyle{ \delta }[/math] of the set) then, since f is 1-Lipschitz, we obtain a set on which f(x) is at most [math]\displaystyle{ \delta }[/math] above its median. The isoperimetric inequality tells us that among sets of measure at least 1/2, a hemisphere has the smallest expansion, and a simple calculation shows that the measure of the complement of the expansion of a hemisphere is at most [math]\displaystyle{ \exp(-c\delta^2n). }[/math] Therefore, with very high probability f is at most its median plus [math]\displaystyle{ \delta. }[/math] A similar argument gives a bound in the other direction. And if f is concentrated about its median, then its mean must be close to its median, so we also get concentration about the mean.
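The "simple calculation" can also be checked numerically. Take the hemisphere [math]\displaystyle{ H=\{x:x_1\leq 0\}; }[/math] the points at spherical distance more than [math]\displaystyle{ \delta }[/math] from H are exactly those with [math]\displaystyle{ x_1\gt \sin\delta, }[/math] and the sketch below (Python/NumPy, with illustrative n, [math]\displaystyle{ \delta }[/math] and trial count) estimates the measure of that cap and compares it with the cap estimate [math]\displaystyle{ \exp(-n\sin^2\delta/2). }[/math]
```python
# Measure of the complement of the delta-expansion of a hemisphere on the unit
# sphere of R^n, i.e. the cap {x : x_1 > sin(delta)}.
import numpy as np

rng = np.random.default_rng(3)

n, delta, trials = 1000, 0.1, 200_000              # illustrative parameters

g = rng.standard_normal((trials, n))
x = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform points on the sphere

outside = np.mean(x[:, 0] > np.sin(delta))         # misses the expanded hemisphere
bound = np.exp(-n * np.sin(delta) ** 2 / 2)        # standard cap estimate

print(f"empirical measure {outside:.5f},  cap bound {bound:.5f}")
```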