Concentration of measure

==Abstract discussion==

Concentration of measure can be loosely described as the following principle.

* If <math>f</math> is a function that depends sufficiently continuously on random variables <math>X_1,\dots,X_n</math> that are sufficiently bounded and sufficiently independent, then the probability that <math>f-\mathbb{E}f</math> is large is very small.

==Examples==

Here are some examples, which give an idea of the principle in action.

====Example 1: sums of independent Bernoulli variables====

Let <math>X_1,\dots,X_n</math> be independent Bernoulli random variables, each with probability p of equalling 1. Let <math>f(X_1,\dots,X_n)=n^{-1}(X_1+\dots+X_n).</math> Then the expectation of <math>f</math> is p. The distribution of f is binomial (up to the scaling factor of <math>n^{-1}</math>). The variance of f is <math>n^{-1}p(1-p),</math> so we expect deviations from the mean of order of magnitude <math>n^{-1/2}.</math> It turns out that deviations of significantly more than this are extremely unlikely: it can be shown that the probability that <math>n^{-1}(X_1+\dots+X_n)</math> differs from p by more than <math>tn^{-1/2}</math> is at most <math>\exp(-ct^2)</math> for some absolute constant <math>c>0.</math> In particular, the probability that <math>n^{-1}(X_1+\dots+X_n)</math> differs from p by more than a constant amount <math>\delta</math> is at most <math>\exp(-c\delta^2n).</math>
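As a quick numerical illustration (not part of the original discussion), the following Python sketch estimates the tail probability by simulation and compares it with the explicit Hoeffding bound <math>2\exp(-2t^2),</math> one concrete instance of the <math>\exp(-ct^2)</math> estimate above; the values of n, p and the number of trials are arbitrary choices.

<pre>
# A minimal Monte Carlo sketch (illustrative only): estimate the probability that
# the mean of n Bernoulli(p) variables deviates from p by more than t/sqrt(n),
# and compare with the explicit Hoeffding bound 2*exp(-2*t^2).
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 1000, 1 / 3, 200_000

# X_1 + ... + X_n for each trial, drawn directly as a binomial count.
counts = rng.binomial(n, p, size=trials)
f = counts / n

for t in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(f - p) > t / np.sqrt(n))
    print(f"t = {t}: empirical tail {empirical:.5f}, Hoeffding bound {2 * np.exp(-2 * t**2):.5f}")
</pre>

Drawing the binomial count directly avoids materialising the full trials-by-n array of individual Bernoulli variables.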

Many of the comments have implicitly used this fact. For example, if you pick a random point of <math>[3]^n,</math> then the number of 1s will be a sum of n independent Bernoulli variables with <math>p=1/3,</math> as will the number of 2s and the number of 3s. Therefore, by the union bound, if you choose a random sequence, the probability is at most <math>3\exp(-c\delta^2n)</math> that any of these numbers will differ from <math>n/3</math> by more than <math>\delta n.</math>
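A similar illustrative sketch (again with arbitrary choices of n and <math>\delta</math>) checks the three counts simultaneously; each count is individually a sum of n Bernoulli(1/3) variables, and the factor 3 in the bound comes from the union bound over the three symbols.

<pre>
# Illustrative sketch: for a random point of [3]^n, estimate the probability that
# some symbol count differs from n/3 by more than delta*n.
import numpy as np

rng = np.random.default_rng(1)
n, delta, trials = 300, 0.05, 20_000

points = rng.integers(1, 4, size=(trials, n))          # coordinates uniform in {1, 2, 3}
counts = np.stack([(points == s).sum(axis=1) for s in (1, 2, 3)], axis=1)

# Event that at least one of the three counts deviates by more than delta*n.
bad = np.any(np.abs(counts - n / 3) > delta * n, axis=1)
print("empirical probability that some count deviates:", bad.mean())
</pre>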

A considerably more general fact than this follows from Azuma's inequality.

====Example 2: Lipschitz functions on the sphere====

Let x be a random point in the unit sphere of <math>\mathbb{R}^n,</math> and let f be a 1-Lipschitz function on the sphere (with respect to the spherical distance, say). Then it is a consequence of Lévy's isoperimetric inequality on the sphere that <math>\mathbb{P}[|f(x)-\mathbb{E}f|\geq\delta]\leq 2\exp(-c\delta^2n).</math>
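To see the phenomenon numerically, here is a small illustrative sketch (not from the article). It takes f(x) to be the first coordinate of x, which is 1-Lipschitz for the Euclidean distance and hence also for the (longer) spherical distance, and estimates the probability of a fixed deviation <math>\delta</math> from the mean as n grows; the particular values of n, <math>\delta</math> and the number of samples are arbitrary.

<pre>
# Illustrative sketch: concentration of the 1-Lipschitz function f(x) = x_1
# for a uniform random point x on the unit sphere of R^n.  By symmetry, E f = 0.
import numpy as np

rng = np.random.default_rng(2)
trials, delta = 20_000, 0.2

for n in (10, 100, 1000):
    # Uniform point on the sphere: normalise a standard Gaussian vector.
    g = rng.standard_normal(size=(trials, n))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)
    f = x[:, 0]
    print(f"n = {n}: P(|f - Ef| >= {delta}) ~ {np.mean(np.abs(f) >= delta):.4f}")
</pre>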

We can relate this to the general principle above by thinking of f as a function of the coordinates <math>X_1,X_2,\dots,X_n</math> of x. These coordinates are by no means independent, but they do enjoy a high degree of approximate independence.

A quick sketch of how this statement relates to the isoperimetric inequality: the set of x such that f(x) is at most the median of f has measure at least 1/2. If we now expand this set by <math>\delta</math> (that is, we take all points within spherical distance <math>\delta</math> of the set) then, since f is 1-Lipschitz, we obtain a set on which f(x) is not more than <math>\delta</math> above its median. The isoperimetric inequality tells us that the set of measure at least 1/2 with smallest expansion is a hemisphere, and a simple calculation shows that the measure of the complement of the expansion of a hemisphere is at most <math>\exp(-c\delta^2n).</math> Therefore, with very high probability f is at most its median plus <math>\delta.</math> A similar argument gives a bound in the other direction. And if f is concentrated about its median, then its mean must be close to its median.
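To flesh out the "simple calculation" slightly (this is one standard route, not necessarily the one intended above): take the hemisphere <math>H=\{x\in S^{n-1}:x_1\leq 0\}.</math> A point at spherical distance at most <math>\delta</math> from H satisfies <math>x_1\leq\sin\delta,</math> so the complement of the <math>\delta</math>-expansion of H is the cap <math>\{x:x_1>\sin\delta\}.</math> The standard estimate for the measure of a spherical cap,

<math>\sigma(\{x\in S^{n-1}:x_1\geq t\})\leq \exp(-nt^2/2),</math>

together with the inequality <math>\sin\delta\geq 2\delta/\pi</math> for <math>0\leq\delta\leq\pi/2,</math> then gives

<math>\sigma(S^{n-1}\setminus H_\delta)\leq \exp(-n\sin^2\delta/2)\leq \exp(-2\delta^2 n/\pi^2),</math>

which is of the claimed form <math>\exp(-c\delta^2n).</math>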