Passing between measures

In this article we show how to pass between dense sets under the uniform measure on <math>[3]^n</math> and dense sets under the [[equal-slices measure]] on <math>[3]^n</math>, at the expense of localising to a [[combinatorial subspace]].
 
== Passing between product distributions ==


Let's first show how to pass between two product distributions.  Given <math>p_1 + p_2 + p_3 = 1</math>, let <math>\mu^n_{p_1,p_2,p_3}</math> denote the product distribution on <math>[3]^n</math> where each coordinate is independently chosen to be <math>j</math> with probability <math>p_j</math>, <math>j=1,2,3</math>.

Let <math>A \subseteq [3]^n</math>, which we can think of as a set or as an event.  We are interested in <math>A</math>'s probability under different product distributions; say, <math>\mu = \mu^n_{p_1,p_2,p_3}</math> and  <math>\mu' = \mu^n_{p'_1,p'_2,p'_3}</math>.


By definition, <math>|\mu(A) - \mu'(A)| \leq d_{TV}(\mu, \mu')</math>, where <math>d_{TV}</math> denotes [[total variation distance]].  The following basic fact about total variation between product distributions appears, e.g., as [http://www.stat.yale.edu/~pollard/Books/Asymptopia/Metrics.pdf Exercise 21 here]:
 
'''Fact:''' If we write <math>p'_j = (1+\Delta_j) p_j</math>, then <math>d_{TV}(\mu, \mu')^2 \leq (1+(\sum_{j=1}^3 p_j \Delta_j^2))^n - 1</math>, and hence <math>d_{TV}(\mu,\mu') \leq \sqrt{\sum_{j=1}^3 p_j \Delta_j^2}\sqrt{n}</math>.
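
For small <math>n</math> one can check this bound against the exact total variation distance by brute force.  Here is a quick Python sketch (an illustration added here, not part of the original argument):

<pre>
# Compare the exact d_TV between two product distributions on [3]^n
# with the bound sqrt(sum_j p_j Delta_j^2) * sqrt(n) from the Fact.
import itertools
import math

def product_prob(x, p):
    """Probability of the string x under the product measure mu^n_p."""
    prob = 1.0
    for c in x:
        prob *= p[c]
    return prob

def tv_distance(p, q, n):
    """Exact d_TV(mu^n_p, mu^n_q), enumerating all of [3]^n."""
    return 0.5 * sum(abs(product_prob(x, p) - product_prob(x, q))
                     for x in itertools.product(range(3), repeat=n))

n = 8
p = (0.5, 0.3, 0.2)
p_prime = (0.45, 0.35, 0.2)
deltas = [(ppj - pj) / pj for pj, ppj in zip(p, p_prime)]  # p'_j = (1+Delta_j) p_j
bound = math.sqrt(sum(pj * d * d for pj, d in zip(p, deltas))) * math.sqrt(n)
print(tv_distance(p, p_prime, n), "<=", bound)
</pre>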
 
== A product distribution slightly corrupted ==
 
Suppose <math>\mu = \mu_{p_1,p_2,p_3}</math> is our basic product measure, and <math>\mu' = \mu_{p_1',p_2',p_3'}</math> is defined by <math>(p_1',p_2',p_3') = (1-\epsilon)(p_1,p_2,p_3) + \epsilon(q_1,q_2,q_3)</math>.  Then one can calculate that the <math>\Delta_i</math> above equals <math>-\epsilon(1-q_i/p_i)</math>, and hence one gets:
 
: '''Proposition:''' <math>d_{TV}(\mu, \mu') \leq a(p,q) \cdot \epsilon\sqrt{n}</math>, where
 
<center><math>a(p,q) = \sqrt{-1 + \sum_{i=1}^3 q_i^2/p_i}</math>.</center>
 
One can bound <math>a(p,q)</math> crudely by
 
<center><math>a(p,q) = \sqrt{-1 + \sum_{i=1}^3 q_i^2/p_i} \leq \sqrt{\sum_{i=1}^3 1/p_i} \leq \sum_{i=1}^3 1/\sqrt{p_i}</math>.</center>
 
Recall that the Equal Slices distribution <math>\overline{\nu}</math> is given by first choosing <math>(p_1, p_2, p_3)</math> from the simplex, and then drawing from the product distribution <math>\mu_{p_1,p_2,p_3}</math>.  Since  
 
<center><math>\int_{p_1+p_2+p_3 = 1} (\sum_{i=1}^3 1/\sqrt{p_i}) = O(1)</math></center>
 
(it's <math>48/5</math>, I believe), we conclude the following:
 
: '''Proposition:''' Let <math>\gamma</math> be the distribution on <math>[3]^n</math> given by the following procedure: (a) pick a random subset <math>S \subseteq [n]</math> by including each coordinate with probability <math>\epsilon</math>; (b) draw from the product distribution <math>\mu_{q_1, q_2, q_3}</math> on the <math>S</math> coordinates; (c) draw from <math>\mu_{p_1, p_2, p_3}</math> on the <math>[n] \setminus S</math> coordinates, where <math>(p_1,p_2,p_3)</math> is first drawn from the simplex as in <math>\overline{\nu}</math>.  Then <math>d_{TV}(\overline{\nu}, \gamma) \leq O(\epsilon \sqrt{n})</math>.
 
: '''Corollary:''' The same would be true if in (b) we drew from Equal Slices on <math>[3]^S</math>; this is because Equal Slices is a mixture of product distributions.
 
In fact, I'm quite certain it would be true if we put <i>any</i> distribution whatsoever on the <math>S</math> coordinates, since <math>a(p,q)</math> could be upper-bounded independently of the <math>q_i</math>'s.  If we need this later, we can fill in the details.
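
To make the procedure (a)&ndash;(c) concrete, here is a minimal sampler sketch in Python (the parameter values are just illustrative); drawing <math>(p_1,p_2,p_3)</math> uniformly from the simplex amounts to a Dirichlet(1,1,1) sample, obtained by normalizing independent exponentials:

<pre>
# Sketch of the procedure (a)-(c) defining gamma.  The simplex point
# (p1,p2,p3) is drawn first, as in the equal-slices distribution nu-bar.
import random

def draw_simplex():
    """Uniform point of {p1+p2+p3=1}, via normalized exponentials."""
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    return [v / s for v in e]

def draw_gamma(n, eps, q):
    p = draw_simplex()
    x = []
    for _ in range(n):
        if random.random() < eps:  # (a): this coordinate goes into S
            x.append(random.choices([0, 1, 2], weights=q)[0])  # (b): mu_q on S
        else:
            x.append(random.choices([0, 1, 2], weights=p)[0])  # (c): mu_p off S
    return x

print(draw_gamma(n=20, eps=0.1, q=[0.2, 0.3, 0.5]))
</pre>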
 
 
 
== Passing from uniform to equal-slices ==
 
We will be interested in the case when <math>(p_1,p_2,p_3) = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})</math> generates the uniform distribution and <math>(p'_1, p'_2, p'_3)</math> is a slight mixture with another distribution.  Specifically, say
 
'''Definition:''' <math>(p'_1, p'_2, p'_3) = (1-\epsilon)(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}) + \epsilon(q_1, q_2, q_3)</math>.
 
Then we can calculate:
 
<center><math>\sum_{j=1}^3 p_j \Delta_j^2 = 3(q_1^2 + q_2^2 + q_3^2 - 1/3) \epsilon^2 \leq 3\epsilon^2.</math></center>
 
Hence:
 
'''Proposition:''' In this scenario, <math>d_{TV}(\mu, \mu') \leq \sqrt{3} \epsilon\sqrt{n}</math> independently of <math>(q_1, q_2, q_3)</math>. 
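
If one wants to double-check the calculation above, a short sympy sketch (added here for verification only) confirms it on the simplex <math>q_1+q_2+q_3=1</math>:

<pre>
# Symbolic check that, with p_j = 1/3 and
# Delta_j = -eps*(1 - q_j/p_j) = eps*(3*q_j - 1),
# sum_j p_j Delta_j^2 equals 3*(q1^2+q2^2+q3^2 - 1/3)*eps^2.
import sympy as sp

eps, q1, q2, q3 = sp.symbols('eps q1 q2 q3')
expr = sum(sp.Rational(1, 3) * (eps * (3 * q - 1))**2 for q in (q1, q2, q3))
target = 3 * (q1**2 + q2**2 + q3**2 - sp.Rational(1, 3)) * eps**2
diff = sp.simplify(expr - target)
# The difference vanishes once we impose q1 + q2 + q3 = 1:
print(sp.simplify(diff.subs(q3, 1 - q1 - q2)))   # -> 0
</pre>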
 
Recall that [[equal-slices measure]] is gotten by first drawing <math>(q_1, q_2, q_3)</math> from the simplex and then drawing from the product distribution generated by <math>\mu^1_{q_1,q_2,q_3}</math>.  Since the above proposition holds for every choice of <math>(q_1, q_2, q_3)</math>, it holds when we average these choices over the simplex.  It would also hold if we averaged over <math>(q_1, q_2, 0)</math> drawn from the 2-simplex <math>q_1 + q_2 = 1</math>.  We conclude:
 
'''Theorem:'''  Suppose <math>\lambda</math> is the distribution on <math>[3]^n</math> generated as follows: First we draw a random restriction in which each coordinate is free with probability <math>\epsilon</math> and fixed uniformly with probability <math>1-\epsilon</math>. Next, we draw the free coordinates from the equal-slices measure.  Then <math>d_{TV}(\mathrm{uniform}, \lambda) \leq \sqrt{3}\epsilon \sqrt{n}</math>.  Furthermore, this is also true for <math>\lambda'</math>, defined in the same way except that the free coordinates are drawn from the equal-slices measure on <math>[2]^n</math>.
 
'''Corollary:''' Suppose we change <math>\lambda</math> slightly by also conditioning on at least <math>\epsilon n/2</math> coordinates being left free initially and hence drawn from an equal-slices distribution.  Then <math>d_{TV}(\mathrm{uniform}, \lambda) \leq \sqrt{3}\epsilon \sqrt{n} + \exp(-\Omega(\epsilon n))</math>.  The analogous statement is true for <math>\lambda'</math>.
 
'''Proof:''' This follows because the probability of getting too few free coordinates is at most <math>\exp(-\Omega(\epsilon n))</math>, by a standard large-deviation bound.
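
Here is a minimal sketch of a draw from <math>\lambda</math> (illustrative Python; the function names are made up).  The restriction is chosen coordinate-by-coordinate, and the free coordinates are then filled in from equal-slices via a single uniform simplex point:

<pre>
# Sketch of a draw from lambda: each coordinate is free with probability
# eps and fixed uniformly otherwise; the free coordinates are then filled
# in from equal-slices, i.e. from mu_q with q uniform on the simplex.
import random

def draw_simplex():
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    return [v / s for v in e]

def draw_lambda(n, eps):
    q = draw_simplex()            # one simplex point for all free coordinates
    x, num_free = [], 0
    for _ in range(n):
        if random.random() < eps:           # free coordinate
            num_free += 1
            x.append(random.choices([0, 1, 2], weights=q)[0])
        else:                               # fixed uniformly
            x.append(random.randrange(3))
    return x, num_free

x, num_free = draw_lambda(n=30, eps=0.1)
print(num_free, x)
</pre>

For the Corollary's conditioned version one would simply redraw until <code>num_free</code> is at least <math>\epsilon n/2</math>.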
 
== Passing from equal-slices to uniform ==
 
This is a bit easier, which is convenient as we'll want to prove something stronger.
 
Suppose <math>A \subseteq [3]^n</math> has <math>\Pr_{\nu}[A] = \delta</math>, where again <math>\nu</math> denotes equal-slices measure.  Recall that we can think of <math>\nu</math> as first choosing <math>(p_1,p_2,p_3)</math> from the simplex and then drawing from the product distribution <math>\mu^n_{p_1,p_2,p_3}</math>.  The following is easy to check:
 
'''Fact:''' Any distribution <math>\mu^1_{p_1,p_2,p_3}</math> on <math>[3]</math> can be written as a mixture distribution <math>\alpha \mu^1_{1/3,1/3,1/3} + (1-\alpha) \lambda</math>, where <math>\alpha = 3\min\{p_1,p_2,p_3\}</math> and <math>\lambda</math> is some other distribution.
 
'''Corollary:''' We can express a draw from <math>\mu^n_{p_1,p_2,p_3}</math> as: (i) forming a set <math>S \subseteq [n]</math> by including each coordinate with probability <math>1-\alpha</math>, independently; (ii) fixing the coordinates in <math>S</math> to some string <math>x</math>, according to the product distribution <math>\lambda</math>; (iii) drawing <math>y</math> uniformly at random from the combinatorial subspace formed by <math>(x,S)</math>.
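
The decomposition in the Fact is easy to implement; the following sketch (added for illustration) computes <math>\alpha</math> and <math>\lambda</math> from <math>(p_1,p_2,p_3)</math> and verifies the mixture identity componentwise:

<pre>
# The Fact's decomposition: mu_p = alpha * uniform + (1 - alpha) * lam,
# with alpha = 3 * min(p).  Note lam is nonnegative since p_j >= min(p).
def decompose(p):
    alpha = 3 * min(p)
    if alpha == 1.0:                        # p is already uniform
        return alpha, [1/3, 1/3, 1/3]
    lam = [(pj - alpha / 3) / (1 - alpha) for pj in p]
    return alpha, lam

p = (0.5, 0.3, 0.2)
alpha, lam = decompose(p)
for pj, lj in zip(p, lam):
    assert abs(pj - (alpha * (1/3) + (1 - alpha) * lj)) < 1e-12
print(alpha, lam)   # 0.6, [0.75, 0.25, 0.0]
</pre>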
 
Hence, writing <math>A_x</math> for the restriction of <math>A</math> to the combinatorial subspace defined by fixing the coordinates in <math>S</math> to <math>x</math>, we have
 
<center><math>\delta = \Pr_{\nu}[A] = \mathbf{E}_{p_1+p_2+p_3 = 1} \mathbf{E}_{S,x} \Pr_{y \text{ uniform}}[y \in A_{x}]</math>.</center>
 
Hence for any <math>\eta > 0</math> we have that with probability at least <math>\eta</math> over the formation of the combinatorial subspace determined by <math>x</math>, the uniform measure of <math>A_x</math> is at least <math>\delta - \eta</math>.  (A <math>[0,1]</math>-valued random variable with mean <math>\delta</math> must exceed <math>\delta - \eta</math> with probability at least <math>\eta</math>.)
 
The only thing that could go somewhat wrong here is that <math>S</math> might be all of <math>[n]</math>, or nearly so.  But this is quite unlikely.  Specifically, one easily calculates that the probability that <math>\min\{p_1,p_2,p_3\} \leq \eta/24</math> is at most <math>\eta/4</math>.  Assuming this doesn't happen we have <math>\alpha \geq \eta/8</math>; thus we expect the number of free coordinates in our combinatorial subspace to be at least <math>(\eta/8) n</math>, and it will be at least <math>(\eta/16) n</math> except with probability <math>\exp(-\Omega(\eta n))</math>.  Overall, the probability that everything goes right is at least <math>\eta - \eta/4 - \exp(-\Omega(\eta n))</math>.  This is positive if <math> \eta \gg (\log n)/n</math>.
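
As a sanity check on the first of these estimates (an addition here, not part of the original): for a uniform simplex point, <math>\Pr[\min\{p_1,p_2,p_3\} \leq t] = 1 - (1-3t)^2</math>, which at <math>t = \eta/24</math> is <math>\eta/4 - \eta^2/64 \leq \eta/4</math>.  One can also confirm it by Monte Carlo:

<pre>
# Monte Carlo check that Pr[min(p1,p2,p3) <= eta/24] <= eta/4
# when (p1,p2,p3) is uniform on the simplex.
import random

def draw_simplex():
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    return [v / s for v in e]

eta = 0.2
trials = 200_000
hits = sum(min(draw_simplex()) <= eta / 24 for _ in range(trials))
print(hits / trials, "<=", eta / 4)   # empirically about eta/4 - eta^2/64
</pre>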
 
We conclude:
 
'''Theorem:'''  Suppose the set <math>A</math> has equal-slices density <math>\delta</math>.  Assume that <math>\eta \geq O(\log n / n)</math>.  Then there exists a combinatorial subspace with at least <math>(\eta/16) n</math> free coordinates on which <math>A</math> has uniform density at least <math>\delta - \eta</math>.
 
=== Relative density version ===
 
It will also be useful to have a "relative density" version of this:
 
'''Theorem:'''  Let <math>\nu</math> denote equal-slices measure and let <math>\mu</math> denote uniform measure. Suppose that <math>A \subseteq B \subseteq [3]^n</math>, with <math>\nu(B) = \gamma</math>, <math>\nu(A) \geq \delta \nu(B)</math>.  Assume <math>\eta</math> is such that <math>\eta \gamma \geq O(\log n / n)</math>.  Then there exists a combinatorial subspace with at least <math>(\eta \gamma / 16) n</math> free coordinates on which <math>\mu(B) \geq \eta \gamma/2</math> and <math>\mu(A) \geq (\delta - \eta) \mu(B)</math>.
 
'''Proof:''' As in the previous section, think of <math>\nu</math> as first drawing the combinatorial subspace defined by <math>x</math> and then drawing from <math>\mu</math> on this subspace.  With regard to the draw of <math>x</math>, let <math>H</math> be the event that ''either'' <math>\mu(B_x) \leq \eta \gamma/2</math> ''or'' <math>x</math> leaves fewer than <math>(\eta \gamma/16) n</math> coordinates free.  We have
 
<center><math>\delta \gamma \leq \mathbf{E}_x[\mu(A_x)] = \Pr[H] \cdot \mathbf{E}_x[\mu(A_x) \mid H] + \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].</math></center>
 
The first summand on the right contributes at most <math>\eta \gamma/2</math> (from when <math>\mu(B_x) \leq \eta \gamma/2</math>) plus <math>\eta\gamma/4 + \exp(-\Omega(\eta \gamma n)) \leq \eta \gamma/2</math> (from when <math>x</math> leaves too few coordinates free, by the same argument as in the previous section).  Subtracting <math>\eta \gamma</math> from both sides we get
 
<center><math>(\delta - \eta) \gamma \leq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].</math></center>
 
But since <math>\gamma = \mathbf{E}_x[\mu(B_x)] \geq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(B_x) \mid \overline{H}]</math>, we have
 
<center><math>\Pr[\overline{H}] \cdot \mathbf{E}_x[(\delta - \eta) \mu(B_x) \mid \overline{H}] \leq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].</math></center>
 
Clearly, then, there exists some <math>x</math> for which <math>H</math> does not occur and
 
<center><math>\mu(A_x) \geq (\delta - \eta) \mu(B_x)</math>.</center>
 
This completes the proof.
 
== An older version of passing from equal-slices to uniform ==
 
Suppose we define <math>\nu</math> to be the distribution on <math>[3]^n</math> gotten by first drawing <math>(p_1, p_2, p_3)</math> from the simplex and then drawing from <math>\mu^n_{p_1,p_2,p_3}</math>.   
 
'''Fact:''' The distribution <math>\nu</math> is the [[equal-slices measure]] on <math>[3]^n</math>.
 
This fact is a consequence of the [http://en.wikipedia.org/wiki/Dirichlet_distribution "type 1 Dirichlet integral"].
 
Next, define <math>\nu'</math> as follows: first draw <math>(p_1,p_2,p_3)</math> from the simplex, then draw from <math>\mu^n_{p'_1,p'_2,p'_3}</math>, where <math>(p'_1, p'_2, p'_3) = (1 - \epsilon)(p_1, p_2, p_3) + \epsilon(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})</math>. 
 
Using the Fact in the first section, we get
 
'''Proposition:''' <math>d_{TV}(\mu,\mu') \leq \sqrt{c(p_1,p_2,p_3)} \cdot \epsilon \sqrt{n}</math>, where
 
<center><math>c(p_1,p_2,p_3) = \frac{1}{9p_1} +  \frac{1}{9p_2} +  \frac{1}{9p_3} - 1</math>.</center>
 
Clearly then,
 
<center><math>d_{TV}(\nu, \nu') \leq \left(\int_{p_1+p_2+p_3 = 1} \sqrt{c(p_1,p_2,p_3)}\right) \cdot \epsilon \sqrt{n}.</math></center>
 
It is elementary to check that the integral is finite; in fact, it should be <math>O(\sqrt{k})</math> when we have the analogous situation over <math>[k]^n</math>. So:
 
'''Proposition 2:''' <math>d_{TV}(\nu, \nu') \leq O(\epsilon \sqrt{n})</math>.
 
We can view a draw from <math>\nu'</math> as follows: First, we choose a random set of wildcards <math>S</math> by including each coordinate independently with probability <math>\epsilon</math>.  Next, choose <math>(p_1,p_2,p_3)</math> at random from the simplex.  Next, choose a string <math>x \in [3]^{[n] \setminus S}</math> from <math>\mu_{p_1,p_2,p_3}</math>. Finally, choose <math>y \in [3]^S</math> uniformly, and output <math>(x,y)</math>.
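
In code, this wildcard view of <math>\nu'</math> looks as follows (an illustrative sketch; names are made up):

<pre>
# Sketch of the wildcard view of nu': S collects the eps-wildcards,
# which are filled in uniformly; the other coordinates follow mu_p
# with p uniform on the simplex.
import random

def draw_simplex():
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    return [v / s for v in e]

def draw_nu_prime(n, eps):
    p = draw_simplex()
    out = []
    for _ in range(n):
        if random.random() < eps:           # wildcard coordinate (in S)
            out.append(random.randrange(3))  # uniform on the S part
        else:
            out.append(random.choices([0, 1, 2], weights=p)[0])
    return out

print(draw_nu_prime(n=20, eps=0.1))
</pre>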
 
By Proposition 2,
 
<center><math>\Pr_\nu[A] - O(\epsilon \sqrt{n}) \leq \Pr_{\nu'}[A] = \mathbf{E}_{S} \mathbf{E}_{p,x} \Pr_y [A].</math></center>


By a large-deviation bound, the probability that <math>|S| < \epsilon n / 2</math> is at most <math>\exp(-\Omega(\epsilon n))</math>.  Renaming <math>\epsilon</math> as <math>2\epsilon</math> for simplicity, we conclude that


<center><math>\Pr_\nu[A] - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n)) \leq \mathbf{E}_{S}[\mathbf{E}_{p,x} \Pr_y [A] \mid |S| \geq \epsilon n].</math></center>


Hence:


'''Theorem:''' (Weaker than in the previous section.) Suppose <math>A</math> has equal-slices density <math>\delta</math>.  Then for any <math>\epsilon</math>, there exists a combinatorial subspace <math>(x,S)</math> with <math>|S| \geq \epsilon n</math> on which <math>A</math> has uniform-distribution density at least <math>\delta - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n))</math>.


By the way, if this is done over <math>[k]^n</math>, we should only have to change <math>O(\epsilon \sqrt{n})</math> to <math>O(\epsilon \sqrt{kn})</math>.
