Passing between measures

In this article we show how to pass from dense sets under the uniform measure on <math>[3]^n</math> to dense sets under the [[equal-slices measure]] on <math>[3]^n</math>, at the expense of localising to a [[combinatorial subspace]]. The other direction is possible too; it should be filled in at some point.
 
----


Let's first show how to pass between two product distributions.  Given <math>p_1 + p_2 + p_3 = 1</math>, let <math>\mu^n_{p_1,p_2,p_3}</math> denote the product distribution on <math>[3]^n</math> in which each coordinate is independently chosen to be <math>j</math> with probability <math>p_j</math>, <math>j=1,2,3</math>.

Let <math>A \subseteq [3]^n</math>, which we can think of as a set or as an event.  We are interested in <math>A</math>'s probability under different product distributions; say, <math>\mu = \mu^n_{p_1,p_2,p_3}</math> and <math>\mu' = \mu^n_{p'_1,p'_2,p'_3}</math>.

By definition, <math>|\mu(A) - \mu'(A)| \leq d_{TV}(\mu, \mu')</math>, where <math>d_{TV}</math> denotes [[total variation distance]].  Since the measures are product measures, it's more convenient to work with [[Hellinger distance]] <math>d_{H}</math> (see [http://arxiv.org/abs/math.PR/0209021 this article] for background information).  It is known that <math>d_{TV} \leq d_{H}</math>.
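
(For concreteness, we normalise Hellinger distance as

<math>d_H^2(\nu, \lambda) = \sum_x \left(\sqrt{\nu(x)} - \sqrt{\lambda(x)}\right)^2;</math>

some references include an extra factor of <math>1/2</math>.  With this normalisation, both <math>d_{TV} \leq d_H</math> and the bound <math>d_H \leq \sqrt{2 d_{TV}}</math> used below can be checked directly; the latter holds since <math>(\sqrt{\nu(x)} - \sqrt{\lambda(x)})^2 \leq |\sqrt{\nu(x)} - \sqrt{\lambda(x)}| (\sqrt{\nu(x)} + \sqrt{\lambda(x)}) = |\nu(x) - \lambda(x)|</math>.)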


The convenient aspect of Hellinger distance is that if <math>\nu = \nu_1 \times \nu_2</math>, <math>\lambda = \lambda_1 \times \lambda_2</math> are any two product distributions then we have the "triangle inequality"
The convenient aspect of Hellinger distance is that if <math>\nu = \nu_1 \times \nu_2</math>, <math>\lambda = \lambda_1 \times \lambda_2</math> are any two product distributions then we have the


'''"Triangle inequality":''' <math>d_H^2(\nu, \lambda) \leq d_H^2(\nu_1, \lambda_1) + d_H^2(\nu_2, \lambda_2)</math>.


Hence <math>d_H(\mu, \mu') \leq \sqrt{n} d_H(\mu^1_{p_1,p_2,p_3}, \mu^1_{p'_1,p'_2,p'_3})</math>.  In particular, if the Hellinger distance between the two 1-coordinate distributions is small compared to <math>1/\sqrt{n}</math>, the overall total variation distance is small.
Hence <math>d_H(\mu, \mu') \leq \sqrt{n} d_H(\mu^1_{p_1,p_2,p_3}, \mu^1_{p'_1,p'_2,p'_3})</math>.  In particular, if the Hellinger distance between the two 1-coordinate distributions is small compared to <math>1/\sqrt{n}</math>, the overall total variation distance is small.
To bound these 1-coordinate Hellinger distances we can use the fact that <math>d_H \leq \sqrt{2 d_{TV}}</math>.  Or if you are desperate to save small constant factors, one can check:


'''Fact:''' If the 1-coordinate distribution <math>(p'_1, p'_2, p'_3)</math> is a mixture of the 1-coordinate distributions <math>(p_1,p_2,p_3)</math> and <math>(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})</math>, with mixing weights <math>1-\epsilon</math> and <math>\epsilon</math>, then <math>d_{H}(\mu^1_{p_1,p_2,p_3}, \mu^1_{p'_1,p'_2,p'_3}) \sim \sqrt{\epsilon}</math> for small <math>\epsilon</math>, and is always at most <math>\sqrt{2\epsilon}</math>.
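
(The "always at most <math>\sqrt{2\epsilon}</math>" part is quick to see: the mixture satisfies <math>p'_j = (1-\epsilon) p_j + \epsilon/3</math>, so

<math>d_{TV}(\mu^1_{p_1,p_2,p_3}, \mu^1_{p'_1,p'_2,p'_3}) = \epsilon \, d_{TV}\bigl((p_1,p_2,p_3), (\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3})\bigr) \leq \epsilon,</math>

and then <math>d_H \leq \sqrt{2 d_{TV}} \leq \sqrt{2\epsilon}</math>.)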
 
Combining these results, we conclude:
 
'''Proposition:''' Let <math>\mu</math> denote the <math>(p_1,p_2,p_3)</math> product distribution on <math>[3]^n</math>.  Let <math>\mu''</math> denote the product distribution on <math>\{1,2,3,\star\}^n</math> in which each coordinate is chosen independently to be <math>\star</math> with probability <math>\epsilon</math> and is chosen to be <math>j</math> with probability <math>p_j</math> otherwise, <math>j = 1,2,3</math>.  Further, let <math>\mu'</math> denote the distribution gotten by first drawing from <math>\mu''</math> and then filling in the <math>\star</math>'s with the uniform distribution on 1's, 2's, and 3's.  Then <math>d_H(\mu,\mu') \leq \sqrt{2 \epsilon n}</math>.
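
(This follows from the preceding results: <math>\mu'</math> is itself the product distribution whose 1-coordinate marginal is the mixture <math>p'_j = (1-\epsilon)p_j + \epsilon/3</math>, so the "triangle inequality" and the Fact give <math>d_H(\mu,\mu') \leq \sqrt{n} \cdot \sqrt{2\epsilon} = \sqrt{2\epsilon n}</math>.)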
 
It is also a fact that if we take a mixture of distributions <math>(\mu_\alpha)_\alpha</math> and the ''same'' mixture of distributions <math>(\mu'_\alpha)_\alpha</math> (i.e., the "mixing weight" on each component <math>\alpha</math> is the same), then the Hellinger distance between the two mixture distributions is upper-bounded by the average of the Hellinger distances between corresponding components, weighted according to the mixing weights.
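
In symbols, writing <math>(w_\alpha)_\alpha</math> for the mixing weights:

<math>d_H\Bigl(\sum_\alpha w_\alpha \mu_\alpha, \; \sum_\alpha w_\alpha \mu'_\alpha\Bigr) \leq \sum_\alpha w_\alpha \, d_H(\mu_\alpha, \mu'_\alpha).</math>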
 
In particular, we know:
 
'''Fact:''' The [[equal-slices measure]] on <math>[3]^n</math> is achieved by first drawing <math>(p_1,p_2,p_3)</math> uniformly from the simplex and then drawing from the resulting product distribution.
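
Concretely: under the simplex mixture, a string with <math>a_j</math> coordinates equal to <math>j</math> (so <math>a_1 + a_2 + a_3 = n</math>) has probability

<math>\int_{\Delta} p_1^{a_1} p_2^{a_2} p_3^{a_3} \, d\mathbf{p} = \frac{2 \, a_1! \, a_2! \, a_3!}{(n+2)!},</math>

where <math>\Delta</math> carries the uniform probability measure.  This matches equal-slices, which gives each of the <math>\binom{n+2}{2}</math> slices weight <math>1/\binom{n+2}{2}</math>, split evenly among the <math>\binom{n}{a_1,a_2,a_3}</math> strings in the slice.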
 
This fact is a consequence of the [http://en.wikipedia.org/wiki/Dirichlet_distribution "type 1 Dirichlet integral"].  We conclude:
 
'''Proposition:''' <math>d_{TV}(\text{equal-slices}, \nu_\epsilon) \leq \sqrt{2\epsilon n}</math>,
 
where
 
'''Definition:''' <math>\nu'_\epsilon</math> denotes the distribution on <math>\{1,2,3,\star\}^n</math> gotten by first drawing <math>(p_1,p_2,p_3)</math> uniformly from the simplex, then drawing from the product distribution in which each coordinate is <math>\star</math> with probability <math>\epsilon</math> and is drawn from <math>(p_1,p_2,p_3)</math> with probability <math>1-\epsilon</math>.  <math>\nu_\epsilon</math> denotes the distribution on <math>[3]^n</math> gotten by drawing from <math>\nu'_\epsilon</math> and then filling in the <math>\star</math>'s randomly from the uniform distribution.
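
For intuition, here is a minimal Python sketch of the two-stage sampling in this Definition (the function names are ours and purely illustrative; a uniform draw from the simplex is obtained from the gaps of two sorted uniform random numbers):

<pre>
import random

def draw_simplex_point():
    # Uniform (p1, p2, p3) on the simplex: gaps of two sorted uniforms.
    u, v = sorted((random.random(), random.random()))
    return (u, v - u, 1 - v)

def draw_nu_prime(n, eps):
    # One draw from nu'_eps on {1,2,3,*}^n: pick (p1,p2,p3) uniformly
    # from the simplex, then let each coordinate independently be '*'
    # with probability eps, else a draw from (p1,p2,p3).
    p = draw_simplex_point()
    return [('*' if random.random() < eps
             else random.choices([1, 2, 3], weights=p)[0])
            for _ in range(n)]

def draw_nu(n, eps):
    # One draw from nu_eps on [3]^n: fill in the *'s uniformly.
    return [random.choice([1, 2, 3]) if x == '*' else x
            for x in draw_nu_prime(n, eps)]
</pre>

Sampling many strings from <math>\nu_\epsilon</math> and from equal-slices and comparing the empirical probability of any fixed event gives a numerical sanity check of the Proposition above.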
