Passing between measures

In this article we show how to pass from dense sets under the [[equal-slices measure]] on <math>[3]^n</math> to dense sets under the uniform measure on <math>[3]^n</math>, at the expense of localising to a combinatorial subspace. The other direction is possible too; it should be filled in at some point.


Let's first show how to pass between two product distributions. Given <math>p_1 + p_2 + p_3 = 1</math>, let <math>\mu^n_{p_1,p_2,p_3}</math> denote the product distribution on <math>[3]^n</math> in which each coordinate is independently chosen to be <math>j</math> with probability <math>p_j</math>, <math>j=1,2,3</math>.

Let <math>A \subseteq [3]^n</math>, which we can think of as a set or as an event. We are interested in <math>A</math>'s probability under different product distributions; say, <math>\mu = \mu^n_{p_1,p_2,p_3}</math> and <math>\mu' = \mu^n_{p'_1,p'_2,p'_3}</math>.

By definition, <math>|\mu(A) - \mu'(A)| \leq d_{TV}(\mu, \mu')</math>, where <math>d_{TV}</math> denotes [[total variation distance]]. The following basic fact about total variation between product distributions appears, e.g., as [http://www.stat.yale.edu/~pollard/Books/Asymptopia/Metrics.pdf Exercise 21 here]:

'''Fact:''' If we write <math>p'_j = (1+\Delta_j) p_j</math>, then <math>d_{TV}(\mu, \mu')^2 \leq (1+\sum_{j=1}^3 p_j \Delta_j^2)^n - 1</math>, and hence <math>d_{TV}(\mu,\mu') \leq \sqrt{\sum_{j=1}^3 p_j \Delta_j^2}\sqrt{n}</math>.
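As a quick numerical sanity check (our own illustration, not part of the argument), the following Python sketch computes <math>d_{TV}(\mu,\mu')</math> exactly by enumeration for a small <math>n</math> and compares it with the bound <math>\sqrt{\sum_j p_j \Delta_j^2}\sqrt{n}</math>:

<pre>
# Illustrative sketch only: brute-force check of the total variation bound.
from itertools import product
from math import sqrt

def tv_product(p, q, n):
    # Exact d_TV between the product measures p^n and q^n on [3]^n
    # (enumeration, so keep n small).
    total = 0.0
    for x in product(range(3), repeat=n):
        px, qx = 1.0, 1.0
        for i in x:
            px *= p[i]
            qx *= q[i]
        total += abs(px - qx)
    return total / 2.0

p = (0.5, 0.3, 0.2)
q = (0.45, 0.35, 0.2)
n = 6
delta = [q[j] / p[j] - 1 for j in range(3)]              # Delta_j
bound = sqrt(sum(p[j] * d * d for j, d in enumerate(delta))) * sqrt(n)
print(tv_product(p, q, n), "<=", bound)
</pre>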

In particular, suppose that <math>(p'_1, p'_2, p'_3) = (1 - \epsilon)(p_1, p_2, p_3) + \epsilon(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})</math>. Then we get

'''Proposition 1:''' <math>d_{TV}(\mu,\mu') \leq \sqrt{c(p_1,p_2,p_3)} \cdot \epsilon \sqrt{n}</math>, where

<math>c(p_1,p_2,p_3) = \frac{1}{9p_1} + \frac{1}{9p_2} + \frac{1}{9p_3} - 1.</math>
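To see this, note that here <math>\Delta_j = \epsilon\left(\frac{1}{3p_j} - 1\right)</math>, so

<math>\sum_{j=1}^3 p_j \Delta_j^2 = \epsilon^2 \sum_{j=1}^3 p_j \left(\frac{1}{3p_j} - 1\right)^2 = \epsilon^2 \left(\sum_{j=1}^3 \frac{1}{9p_j} - 2 + 1\right) = \epsilon^2 \, c(p_1,p_2,p_3),</math>

and the Fact gives the claim.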

Next, suppose that we define <math>\nu</math> to be the distribution on <math>[3]^n</math> gotten by first drawing <math>(p_1, p_2, p_3)</math> uniformly from the simplex and then drawing from <math>\mu^n_{p_1,p_2,p_3}</math>. Define <math>\nu'</math> in the analogous way, using the perturbed frequencies <math>(p'_1, p'_2, p'_3) = (1 - \epsilon)(p_1, p_2, p_3) + \epsilon(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})</math> in place of <math>(p_1,p_2,p_3)</math>.

'''Fact:''' The distribution <math>\nu</math> is the [[equal-slices measure]] on <math>[3]^n</math>.

This fact is a consequence of the [http://en.wikipedia.org/wiki/Dirichlet_distribution "type 1 Dirichlet integral"].
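Concretely, if <math>x \in [3]^n</math> contains <math>a</math> 1's, <math>b</math> 2's and <math>c</math> 3's (so <math>a+b+c=n</math>), then the Dirichlet integral gives

<math>\Pr_\nu[x] = \mathbf{E}\left[p_1^a p_2^b p_3^c\right] = \frac{2\, a!\, b!\, c!}{(n+2)!},</math>

which depends only on the slice <math>(a,b,c)</math>. Multiplying by the <math>\frac{n!}{a!\,b!\,c!}</math> strings in that slice shows that each of the <math>\binom{n+2}{2}</math> slices has probability exactly <math>1/\binom{n+2}{2}</math>, which is the [[equal-slices measure]].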

Clearly,

<math>d_{TV}(\nu, \nu') \leq \left(\int_{p_1+p_2+p_3 = 1} \sqrt{c(p_1,p_2,p_3)}\right) \cdot \epsilon \sqrt{n}</math>,

where the integral is with respect to the uniform probability measure on the simplex.

It is elementary to check that the integral is finite; in fact, it should be <math>O(\sqrt{k})</math> when we have the analogous situation over <math>[k]^n</math>. So:

'''Proposition 2:''' <math>d_{TV}(\nu, \nu') \leq O(\epsilon \sqrt{n})</math>.
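(As for the finiteness: <math>\sqrt{c(p_1,p_2,p_3)} \leq \frac{1}{3}\left(p_1^{-1/2} + p_2^{-1/2} + p_3^{-1/2}\right)</math>, and another Dirichlet integral gives <math>\mathbf{E}[p_j^{-1/2}] = \frac{\Gamma(3)\Gamma(1/2)}{\Gamma(1)\Gamma(5/2)} = \frac{8}{3}</math> under the uniform measure on the simplex, so the integral above is at most <math>8/3</math>.)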


We can view a draw from <math>\nu'</math> as follows: First, we choose a random set of wildcards <math>S</math> by including each coordinate independently with probability <math>\epsilon</math>. Next, choose <math>(p_1,p_2,p_3)</math> uniformly at random from the simplex. Next, choose a string <math>x \in [3]^{[n] \setminus S}</math> from <math>\mu_{p_1,p_2,p_3}</math>. Finally, choose <math>y \in [3]^S</math> uniformly, and output the combined string <math>(x,y)</math>.
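Here is a minimal Python sketch of this two-stage draw (our own illustration; the function names are ours). It uses the standard fact that two sorted uniform points split <math>[0,1]</math> into three gaps whose lengths are uniformly distributed on the simplex:

<pre>
# Illustrative sketch of one draw from nu' via the wildcard view.
import random

def uniform_simplex():
    # Gaps between two sorted uniforms give a uniform simplex point.
    u, v = sorted((random.random(), random.random()))
    return (u, v - u, 1 - v)

def draw_nu_prime(n, eps):
    S = {i for i in range(n) if random.random() < eps}   # wildcard set
    p = uniform_simplex()
    out = []
    for i in range(n):
        if i in S:
            out.append(random.randrange(1, 4))           # y: uniform fill
        else:
            out.append(random.choices([1, 2, 3], weights=p)[0])  # x ~ mu_p
    return out, S
</pre>

Conditioned on <math>S</math> and <math>(p_1,p_2,p_3)</math>, each wildcard coordinate is uniform on <math>\{1,2,3\}</math> and each other coordinate has distribution <math>(p_1,p_2,p_3)</math>; hence each coordinate is independently distributed according to <math>(p'_1,p'_2,p'_3)</math>, which is exactly <math>\nu'</math>.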

By Proposition 2,

<math>\Pr_\nu[A] - O(\epsilon \sqrt{n}) \leq \Pr_{\nu'}[A] = \mathbf{E}_{S} \mathbf{E}_{p,x} \Pr_y [A].</math>

By a large-deviation bound, the probability that <math>|S| < \epsilon n / 2</math> is at most <math>\exp(-\Omega(\epsilon n))</math>. Renaming <math>\epsilon</math> as <math>2\epsilon</math> for simplicity, we conclude that

<math>\Pr_\nu[A] - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n)) \leq \mathbf{E}_{S}[\mathbf{E}_{p,x} \Pr_y [A] \mid |S| \geq \epsilon n].</math>
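(The large-deviation bound is the standard multiplicative Chernoff bound: after the renaming, <math>|S|</math> is distributed as <math>\mathrm{Binomial}(n, 2\epsilon)</math>, so <math>\Pr[|S| < \epsilon n] \leq \exp(-\epsilon n/4)</math>; and since <math>\mathbf{E}_{p,x} \Pr_y[A] \leq 1</math> always, discarding the event <math>|S| < \epsilon n</math> costs at most this much in the expectation.)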

Averaging, some particular outcome of <math>S</math> with <math>|S| \geq \epsilon n</math>, and then of <math>x</math>, must achieve at least the left-hand side. Hence:

'''Theorem:''' Suppose <math>A</math> has equal-slices density <math>\delta</math>. Then for any <math>\epsilon</math>, there exists a combinatorial subspace <math>(x,S)</math> with <math>|S| \geq \epsilon n</math> on which <math>A</math> has uniform-distribution density at least <math>\delta - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n))</math>.
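For instance, taking <math>\epsilon = n^{-3/4}</math> gives a combinatorial subspace with at least <math>n^{1/4}</math> wildcards on which <math>A</math> has uniform-distribution density at least <math>\delta - O(n^{-1/4})</math>.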

As mentioned, if this is done over <math>[k]^n</math>, we should only have to change <math>O(\epsilon \sqrt{n})</math> to <math>O(\epsilon \sqrt{kn})</math>.