# Concentration of measure

## Contents

## Abstract discussion

Concentration of measure can be loosely described as the following principle.

- If [math]f[/math] is a function that depends sufficiently continuously on random variables [math]X_1,\dots,X_n[/math] that are sufficiently bounded and sufficiently independent, then the probability that [math]f-\mathbb{E}f[/math] is large is very small.

## Examples

Here are some examples, which give an idea of the principle in action.

#### Example 1: sums of independent Bernoulli variables

Let [math]X_1,\dots,X_n[/math] be independent Bernoulli random variables, with probability p of equalling 1. Let [math]f(X_1,\dots,X_n)=n^{-1}(X_1+\dots+X_n).[/math] Then the expectation of [math]f[/math] is p. The distribution of f is binomial (up to the scaling factor of [math]n^{-1}[/math]). The variance of f is [math]n^{-1}p(1-p),[/math] so we expect deviations from the mean of order of magnitude [math]n^{-1/2}.[/math] It turns out that deviations of significantly more than this are extremely unlikely: it can be shown that the probability that [math]n^{-1}(X_1+\dots+X_n)[/math] differs from p by more than [math]tn^{-1/2}[/math] is at most [math]\exp(-ct^2)[/math] for some absolute constant [math]c\gt0.[/math] In particular, the probability that [math]n^{-1}(X_1+\dots+X_n)[/math] differs from p by more than a constant amount [math]\delta[/math] is at most [math]\exp(-c\delta^2n).[/math]

Many of the comments have implicitly used this fact. For example, if you pick a random point of [math][3]^n,[/math] then the number of 1s will be a sum of n independent Bernoulli variables with [math]p=1/3,[/math] as will the number of 2s and the number of 3s. Therefore, if you choose a random sequence, the probability is at most [math]3\exp(-c\delta^2n)[/math] that any of these numbers will differ from [math]n/3[/math] by more than [math]\delta n.[/math]

A considerably more general fact than this follows from Azuma's inequality.

#### Example 2: Lipschitz functions on the sphere

Let x be a random point in the unit sphere of [math]\mathbb{R}^n,[/math] and let f be a 1-Lipschitz function on the sphere (with respect to the spherical distance, say). Then it is a consequence of Lévy's isoperimetric inequality on the sphere that [math]\mathbb{P}[|f(x)-\mathbb{E}f|\geq\delta]\leq 2\exp(-c\delta^2n).[/math]

We can relate this to the general principle above by thinking of f as a function of the coordinates [math]X_1,X_2,\dots,X_n[/math] of x. These coordinates are by no means independent, but they do enjoy a high degree of approximate independence.

A quick sketch of how this statement relates to the isoperimetric inequality: the set of x such that f(x) is at most the *median* of f has measure at least 1/2. If we now expand this set by [math]\delta[/math] (that is, we take all points within [math]\delta[/math] of the set) then we obtain a set on which f(x) is not more than [math]\delta[/math] above its median. The isoperimetric inequality tells us that the set of measure at least 1/2 with smallest expansion is a hemisphere, and a simple calculation shows that the measure of the complement of the expansion of a hemisphere is at most [math]\exp(-c\delta^2n).[/math] Therefore, with very high probability f is at most its median plus [math]\delta.[/math] A similar argument gives a bound in the other direction. And if f is concentrated about its median, then its mean must be close to its median.