Measures.tex: Difference between revisions

From Polymath Wiki
Jump to navigationJump to search
No edit summary
No edit summary
 
(7 intermediate revisions by 3 users not shown)
Line 1: Line 1:
\section{Probability measures}
\section{Passing between probability measures} \label{sec:measures}


{\tiny [[Ryan: \bf{The quickest and clearest way to marshal all the requisite facts about equal-slices involves describing it as a uniform mixture of product distributions.  However, this has the slight downside of introducing non-discrete probability distributions into an otherwise entirely finitary paperIn particular, an extremely pedantic mathematician might not enjoy the casual treatment I employ in dealing with continuous distributions. However, I think getting hyper-formal would badly obscure things for no real gain.}]]}
The goal of this section is to work out bounds for the error arising when passing back and forth between $\unif_k$ and $\ens{k}$, as described in Section~\ref{sec:outline-dist}.  Lemma~\ref{lem:distributions} below gives the bounds we needThe reader will not lose much by just reading its statement; the proof is just technical calculations.


\subsection{Comparing product distributions}
Before stating Lemma~\ref{lem:distributions} we need some definitions.


We begin by recording some simple facts about distances between probability distributions.  Throughout this subsection we assume $\pi$ and $\nu$ are probability distributions on the same finite set $\Omega$.\noteryan{Perhaps change $\Omega$ to $[k]$ throughout.}
\ignore{
\begin{definition} Given a set $A \subseteq [k]^n$ and a restriction $(J,x_\barJ)$, we write $A_{x_\barJ}$ for the subset of $[k]^{J}$ defined by $A_{x_\barJ} = \{y \in [k]^J : (x_{\barJ}, y_J) \in A\}$.
\end{definition}}


\begin{definition} Given a distribution $\pi$ on $\Omega$, write $\min(\pi) = \min_{x \in \Omega} \pi[x]$.
\begin{definition} \label{def:r4r} For $0 \leq \ppn \leq 1$, we say that $J$ is a \emph{$\ppn$-random subset} of $[n]$ if $J$ is formed by including each coordinate $i \in [n]$ independently with probability $\ppn$.  Assuming $r \leq n/2$, we say that $J$ is an \emph{$[r,4r]$-random subset} of $[n]$ if $J$ is a $\ppn$-random subset of $[n]$ conditioned on $r \leq \abs{J} \leq 4r$, with $\ppn = 2r/n$.
\end{definition}
\begin{definition} A \emph{distribution family} $(\distra^m)_{m \in \N}$ (over $[k]$) is a sequence of probability distributions, where $\distra^m$ is a distribution on $[k]^m$.  In this paper the families we consider will either be the equal-(nondegenerate-)slices family $\distra^m = \ens{k}^m$ or $\distra^m = \eqs{k}^m$, or will be the product distributions based on a single distribution $\prd$ on $[k]$, $\distra^m = \prd^{\otimes m}$.
\end{definition}
\end{definition}


\begin{definition}   The \emph{total variation distance} $\dtv{\pi}{\nu}$ is defined by  
 
\begin{lemma} \label{lem:distributions} Let $(\distra^m)$ and $(\distrb^m)$ be distribution families.  Assume $2 \ln n \leq r \leq n/2$.  Let $J$ be an $[r,4r]$-random subset of $[n]$, let $x$ be drawn from $[k]^{\barJ}$ according to $\distra^{\abs{\barJ}}$, and let $y$ be drawn from $[k]^J$ according to $\distrb^{\abs{J}}$.  The resulting distribution on the composite string $(x,y) \in [k]^n$ has total variation distance from $\distra^n$ which can be bounded as follows:
\begin{enumerate}
\item (Product to equal-slices.)  \label{eqn:distrs-prd-eqs} If $\distra^m = \prd^{\otimes m}$ and $\distrb^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is \noteryan{You know, we only need this result for the uniform distribution, in which case we can bound the below by the simpler $2k \cdot r/\sqrt{n}$.}
\[
\[
\dtv{\pi}{\nu} = \frac12 \sum_{x \in \Omega} |\pi[x] - \nu[x]| = \max_{A \subseteq X} |\pi[A] - \nu[A]|.
(2{\textstyle \sqrt{\frac{1}{\min(\prd)}-1}})+2) \cdot r / \sqrt{n}.
\]
\]
\end{definition}
\item (Equal-slices to product.) \label{eqn:distrs-eqs-prd} If $\distra^m = \eqs{k}^m$ and $\distrb^m = \prd^{\otimes m}$, the bound is $4k \cdot r/\sqrt{n}$, independent of $\prd$.
\begin{fact}  \label{fact:tv-mix} Let $(\nu_\kappa)_{\kappa \in K}$ be a family of distributions on $\Omega$, let $\lambda$ be a distribution on $K$, and let $\mu$ be the associated mixture distribution, given by drawing $\kappa \sim \lambda$ and then drawing from $\nu_\kappa$. Then\noteryan{need to prove? It's just the triangle inequality}
\item (Equal-slices to equal-slices.) \label{eqn:distrs-eqs-eqs} If $\distra^m = \eqs{k}^m$ and $\distrb^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is $4k \cdot r/\sqrt{n}$.
\[
\end{enumerate}
\dtv{\pi}{\mu} \leq \Ex_{\kappa \sim \lambda}[\dtv{\pi}{\nu_\kappa}].
\end{lemma}
\]
 
\end{fact}
Although Lemma~\ref{lem:distributions} involves the equal-slices distribution, one can convert to equal-nondegenerate-slices if desired using Proposition~\ref{prop:degen}.
\begin{definition}
 
The \emph{$\chi^2$ distance} $\dchi{\pi}{\nu}$ is defined by\noteryan{I assume the reader is savvy enough to realize we mean to take the positive square root\dots}
 
Since $\eqs{k}^n$ is a mixture of product distributions (Proposition~\ref{prop:eqs-mix}), the main work in proving Lemma~\ref{lem:distributions} involves comparing product distributions.
 
 
\subsection{Comparing product distributions}
\begin{definition}   For $\distra$ and $\distrb$ probability distributions on $\Omega^n$, the \emph{$\chi^2$ distance} $\dchi{\pi}{\nu}$ is defined by
\[
\[
\dchi{\pi}{\nu}^2 = \Varx_{x \sim \pi}\left[\frac{\nu[x]}{\pi[x]}\right].
\dchi{\distra}{\distrb} = \sqrt{\Varx_{x \sim \distra}\left[\frac{\distrb[x]}{\distra[x]}\right]}.
\]
\]
Note that $\dchi{\pi}{\nu}$ is \emph{not} symmetric in $\pi$ and $\nu$.
Note that $\dchi{\distra}{\distrb}$ is \emph{not} symmetric in $\distra$ and $\distrb$.
\end{definition}
\end{definition}


The $\chi^2$ distance is introduced to help us prove the following fact:
The $\chi^2$ distance is introduced to help us prove the following fact:
\begin{proposition} \label{prop:mix-distance} Assume $\min(\pi) \neq 0$; i.e., $\pi$ has full support.\noteryan{A bit of a drag to bring this up, but technically, it's needed for the first deduction in the proof.}  Suppose $\pi$ is slightly mixed with $\nu$, forming $\hat{\pi}$; specifically, $\hat{\pi} = (1-\eps) \pi + \eps \nu$.  Then the associated product distributions $\pi^{\otimes n}$, $\hat{\pi}^{\otimes n}$ on $\Omega^{n}$ satisfy
\begin{proposition} \label{prop:mix-distance} Let $\prd$ be a distribution on $\Omega$ with full support; i.e., $\min(\pi) \neq 0$. Suppose $\prd$ is slightly mixed with $\distrb$, forming $\wh{\prd}$; specifically, $\wh{\prd} = (1-\ppn) \prd + \ppn \distrb$.  Then the associated product distributions $\prd^{\otimes n}$, $\wh{\prd}^{\otimes n}$ on $\Omega^{n}$ satisfy
\[
\[
\dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\nu} \cdot \eps \sqrt{n}.
\dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \dchi{\prd}{\distrb} \cdot \ppn \sqrt{n}.
\]
\]
\end{proposition}
\end{proposition}
\begin{proof}  It is a straightforward consequence of Cauchy-Schwarz (see, e.g.~\cite[p.\ 101]{Rei89}) that  
\begin{proof}  It is a straightforward consequence of Cauchy-Schwarz (see, e.g.~\cite[p.\ 101]{Rei89})\noteryan{This is the part using $\min(\prd) \neq 0$, by the way.} that  
\[
\[
\dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\hat{\pi}} \cdot \sqrt{n},
\dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \dchi{\prd}{\wh{\prd}} \cdot \sqrt{n},
\]
\]
and the identity $\dchi{\pi}{\hat{\pi}} = \eps \dchi{\pi}{\nu}$ follows easily from the definitions.
and the identity $\dchi{\prd}{\wh{\prd}} = \ppn \cdot \dchi{\prd}{\distrb}$ follows easily from the definitions.
\end{proof}
\end{proof}
This can be bounded independently of $\nu$, as follows:
This can be bounded independently of $\distrb$, as follows:
\begin{corollary} \label{cor:mix-distance} In the setting of Proposition~\ref{prop:mix-distance},  
\begin{corollary} \label{cor:mix-distance} In the setting of Proposition~\ref{prop:mix-distance},  
\[
\[
\dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \sqrt{{\textstyle \frac{1}{\min(\pi)}} - 1} \cdot \eps \sqrt{n},
\dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \sqrt{{\textstyle \frac{1}{\min(\prd)}} - 1} \cdot \ppn \sqrt{n},
\]
\]
\end{corollary}
\end{corollary}
\begin{proof}  It is easy to check that the distribution $\nu$ maximizing $\dchi{\pi}{\nu}$ is the one putting all its mass on the $x$ minimizing $\pi[x]$.  In this case one calculates $\dchi{\pi}{\nu} = \sqrt{\frac{1}{\min(\pi)} - 1}$.
\begin{proof}  It is easy to check that the distribution $\distrb$ maximizing $\dchi{\prd}{\distrb}$ is the one putting all its mass on the $x$ minimizing $\prd[x]$.  In this case one calculates $\dchi{\prd}{\distrb} = \sqrt{\frac{1}{\min(\pi)} - 1}$.
\end{proof}
\end{proof}




\subsection{Equal-slices}


\begin{definition}  \label{def:spacings} We say that $\nu$ is a \emph{random distribution} on $[k]$ if the vector of probabilities $(\nu[1], \dots, \nu[k])$ is formed as follows: $k$ points are chosen on the circle of unit circumference, uniformly and independently, and $\nu[a]$ is set to the length of the arc emanating clockwise from the $a$th point.
\end{definition}


\begin{remark}  It is well known (see, e.g.~\cite[Ch.\ 1]{Rei89})\noteryan{I think the traditional citation here is Pyke '65, but we can save a spot in the biblio by citing Reiss again (which is also a text, whereas Pyke is a bit painful to read)} that a random distribution $\nu$ can equivalently be defined as choosing the vector of probabilities $(\nu[1], \dots, \nu[k])$ uniformly from the simplex $\{(p_1, \dots, p_k) : \sum p_i = 1, p_i \geq 0\ \forall i\}$. \noteryan{Or, by taking the spacings of $k-1$ random points in a unit segment.  Since we don't use this fact, I think it's fair to not prove it.}
\end{remark}


\begin{definition}  The \emph{equal-slices} distribution on $[k]^{n}$, denoted $\eqs{k}^{n}$, is defined as follows: First, choose a random distribution $\nu$ on $[k]$; then draw from the product distribution $\nu^{\otimes n}$ on $[k]^{n}$.  We sometimes write $\eqs{k}$ in place of $\eqs{k}^n$ when $n$ is clear from context.
\subsection{Proof of Lemma~\ref{lem:distributions}}
\end{definition}
 


The terminology ``equal-slices'' comes from the following:\noteryan{So far we don't actually need the observations that follow}
\begin{definition} \label{def:compos-distr} Let $0 \leq \ppn \leq 1$ and let $(\distra^m)$, $(\distrb^m)$ be distribution families. Drawing from the \emph{$(\ppn, \distra, \distrb)$-composite distribution} on $[k]^n$ entails the following: $J$ is taken to be a $\ppn$-random subset of~$[n]$; $x$ is drawn from $[k]^{\barJ}$ according to $\distra^{\abs{\barJ}}$; and, $y$ is drawn from $[k]^J$ according to $\distrb^{\abs{J}}$.  We sometimes think of this distribution as just being a distribution on composite strings $z = (x, y) \in [k]^n$.
\begin{definition} Given a vector $C \in \N^k$ whose components sum to $n$, the associated \emph{slice} of $[k]^{n}$ is the set
\[
\{x \in [k]^{n} : x \text{ has $C_1$ many $1$'s, $C_2$ many $2$'s, \dots, $C_k$ many $k$'s}\}.
\]
We abuse terminology by also referring to $C$ as the slice.  A slice is \emph{nondegenerate} when $C_i \neq 0$ for all $i$.  There are $\binom{n+k-1}{k-1}$ different slices, of which $\binom{n-1}{k-1}$ are nondegenerate.\noteryan{Justify with bars and dots?}
\end{definition}
\end{definition}
\begin{proposition} The equal-slices distribution $\eqs{k}^{n}$ is equivalent to first choosing a slice uniformly from the $\binom{n+k-1}{k-1}$ many possibilities, then drawing a string uniformly from that slice. \noteryan{Include proof?  If so, it's by throwing $n+k-1$ points into the interval and looking at the result in two ways.}
\end{proposition}
The following observation will be crucial:
\begin{lemma}  Let $x \sim \eqs{k}^n$, and let $a \sim [k-1]$ be uniformly random.  Then $\chg{x}{k}{a}$ is distributed as~$\eqs{k-1}^n$.
\end{lemma}
\begin{proof}
From the definition, the string $\chg{x}{k}{a}$ is distributed as follows: first $\nu$ is taken to be a random distribution on $[k]$; then $\wt{\nu}$ is taken to be the distribution on $[k-1]$ with $\wt{\nu}[b] = \nu[b]$ for $b \neq a$ and $\wt{\nu}[a] = \nu[a] + \nu[k]$; then a string is drawn from the product distribution $\wt{\nu}^{\otimes n}$.  Thus it suffices to show that $\wt{\nu}$ is a random distribution on $[k-1]$ in the sense of Definition~\ref{def:spacings}.  This is easy to see: having chosen $k$ points on the circle, thus forming $\nu$, we get the correct distribution $\wt{\nu}$ simply by removing the $k$th point. \noteryan{This is not especially well-explained; will try to think how to say it with the maximum combination of concision and correctness.}
\end{proof}
\subsection{Composite distributions}
\begin{definition} A \emph{restriction} on $[k]^n$ is a pair $(J, x_\barJ)$, where $J \subseteq [n]$ and $x_\barJ \in [k]^{\barJ}$, where $\barJ = [n] \setminus J$.  It is thought of as a ``partial string'', where the $\barJ$-coordinates are ``fixed'' according to $x_\barJ$ and the $J$-coordinates are still ``free''.  Given a fixing $y_J \in [k]^J$, we write $(x_\barJ, y_J)$ for the complete composite string in $[k]^n$.\noteryan{Although this notation looks awkward, I believe it's useful\dots}
\end{definition}


\begin{definition} \label{def:restriction} Let $0 \leq \eps \leq 1$ and let $(\pi^m)$ be a family of probability distributions, where $\pi^m$ is a distribution on $[k]^m$.  We say that $(J, x_\barJ)$ is an \emph{$(\epsilon,\pi)$-random restriction} if $J$ is chosen by including each coordinate of $[n]$ independently with probability $\eps$, and then $x_\barJ$ is drawn from $[k]^{\barJ}$ according to $\pi^{|\barJ|}$.
Note that the distribution described in Lemma~\ref{lem:distributions} is very similar to the $(\ppn, \distra, \distrb)$-composite distribution, except that it uses an $[r, 4r]$-random subset rather than a $\ppn$-random subset. We can account for this difference with a standard Chernoff (large-deviation) bound:\noteryan{Citation needed?}
\end{definition}
\begin{fact}  \label{fact:dev} If $J$ is a $\ppn$-random subset of $[n]$ with $\ppn = 2r/n$ as in Definition~\ref{def:r4r}, then $r \leq \abs{J} \leq 4r$ holds except with probability at most $2\exp(-r/4)$.
In the above definition, $\pi^m$ is typically either equal-slices $\eqs{k}^m$ or a product distribution $\nu^{\otimes m}$ based on some distribution $\nu$ on $[k]$. We briefly record here a useful, standard large-deviation bound:
\begin{fact}  \label{fact:large-dev} If $(J,x_\barJ)$ is an $(\epsilon,\pi)$-random restriction, then $(\eps/2)n \leq |J| \leq 2\eps n$ except with probability at most $2\exp(-\eps n/8)$.
\end{fact}
\end{fact}


\begin{definition}  In the setting of Definition~\ref{def:restriction}, suppose that $(\nu^m)$ is another family of distributions.  The \emph{$(\eps, \pi, \nu)$-composite distribution} generates an $(\eps,\pi)$-random restriction $(J, x_\barJ)$ and then generates $y_J \in [k]^J$ according to $\nu^{|J|}$.  We sometimes think of this distribution as just being a distribution on composite strings $z = (x_\barJ, y_J) \in [k]^n$.
The utility of using $\ppn$-random subsets in Definition~\ref{def:compos-distr} is the following observation:
\end{definition}
\begin{fact}  If $\prd$ and $\distrb$ are distributions on $[k]$, thought of also as product distribution families, then the $(\ppn, \prd, \distrb)$-composite distribution on $[k]^n$ is precisely the product distribution $\wh{\prd}^{\otimes n}$, where $\wh{\prd}$ is the mixture distribution $(1-\ppn)\prd + \ppn \distrb$ on $[k]$.
 
\begin{fact}  If $\pi$ and $\nu$ are distributions on $[k]$, thought of also as product distribution families, then the $(\eps, \pi, \nu)$-composite distribution on $[k]^n$ is precisely the product distribution $\hat{\pi}^{\otimes n}$, where $\hat{\pi}$ is the mixture distribution $(1-\eps)\pi + \eps \nu$ on $[k]$.
\end{fact}
\end{fact}


Because of this, we can use Corollary~\ref{cor:mix-distance} to bound the total variation distance between $\pi^{\otimes n}$ and a composite distribution.  For example, we conclude:
Because of this, we can use Corollary~\ref{cor:mix-distance} to bound the total variation distance between $\prd^{\otimes n}$ and a composite distribution.  We conclude:
\begin{proposition}  \label{prop:unif-composite} Let $\mu_k$ denote the uniform distribution $[k]$, and let $\nu$ be any distribution on $[k]$.  Writing $\mu_k^{(\nu)}$ for the $(\eps,\mu_k,\nu)$-composite distribution on strings in $[k]^n$, we have
\begin{proposition}  \label{prop:prod-composite} Let $\prd$ and $\distrb$ be any distributions on $[k]$, thought of also as product distribution families.  Writing $\wt{\prd}$ for the $(\ppn,\prd,\distrb)$-composite distribution on strings in $[k]^n$, we have
\[
\[
\dtv{\mu_k^{\otimes n}}{\mu_k^{(\nu)}} \leq \sqrt{k-1} \cdot \eps \sqrt{n}.
\dtv{\prd^{\otimes n}}{\wt{\prd}} \leq {\textstyle \sqrt{\frac{1}{\min(\prd)}-1}} \cdot \ppn \sqrt{n}.
\]
\]
\end{proposition}
\end{proposition}
For any $\ell \leq k$, the equal-slices distribution $\eqs{\ell}^{m}$ on $m$ coordinates is a mixture of product distributions $\nu^{\otimes m}$ on $[k]^m$.  Hence we conclude from Proposition~\ref{prop:unif-composite} and Fact~\ref{fact:tv-mix}:\noteryan{this proposition lets us pass from uniform to eq-slices at the very beginning}
 
\begin{proposition}  Let $\mu_k$ denote the uniform distribution $[k]$, and let $\ell \leq k$.  Writing $\mu'$ for the $(\eps,\mu_k,\eqs{\ell})$-composite distribution on strings in $[k]^n$, we have
Recall that for any $\ell \leq k$, the equal-slices distribution $\eqs{\ell}^{m}$ on $m$ coordinates is a mixture of product distributions $\spac^{\otimes m}$ on $[k]^m$.  We can therefore average Proposition~\ref{prop:prod-composite} over $\distrb$ to obtain:
\begin{proposition}  \label{prop:prod-eqs} If $\wt{\pi}$ denotes the $(\ppn,\pi,\eqs{\ell})$-composite distribution on strings in $[k]^n$, where $\ell \leq k$, then we have
\[
\[
\dtv{\mu_k^{\otimes n}}{\mu'} \leq \sqrt{k-1} \cdot \eps \sqrt{n}.
\dtv{\pi^{\otimes n}}{\wt{\pi}} \leq {\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot \ppn \sqrt{n}.
\]
\]
\end{proposition}
\end{proposition}
 
Here we have used the following basic bound, based on the triangle inequality:
Using Corollary~\ref{cor:mix-distance} more generally, we obtain:\noteryan{this lets us pass from an arbitrary product distribution to equal-slices}
\begin{fact}  \label{fact:tv-mix} Let $(\distrb_\kappa)_{\kappa \in K}$ be a family of distributions on $\Omega^n$, let $\varsigma$ be a distribution on $K$, and let $\overline{\distrb}$ denote the associated mixture distribution, given by drawing $\kappa \sim \varsigma$ and then drawing from $\distrb_\kappa$.  Then
\begin{proposition}  \label{prop:prod-composite} Let $\nu$ be any distribution on $[k]$, and $\ell \leq k$.  Writing $\nu'$ for the $(\eps,\nu,\eqs{\ell})$-composite distribution on strings in $[k]^n$, we have
\[
\[
\dtv{\mu_k^{\otimes n}}{\nu'} \leq {\textstyle \sqrt{\frac{1}{\min(\nu)}-1}} \cdot \eps \sqrt{n}.
\dtv{\distra}{\overline{\distrb}} \leq \Ex_{\kappa \sim \varsigma}[\dtv{\distra}{\distrb_\kappa}].
\]
\]
\end{proposition}
\end{fact}
Hence:\noteryan{this proposition lets us do restrictions under equal slices; even to equal-slices on fewer symbols}
 
\begin{proposition}  \label{prop:eqs-compos} Let $2 \leq \ell \leq k$ and let $\wt{\mu}$ denote the $(\eps, \eqs{k}, \eqs{\ell})$-composite distribution on strings in $[k]^n$.  Then
If we instead use this fact to average Proposition~\ref{prop:prod-composite} over $\prd$, we can obtain:
\begin{proposition}  \label{prop:eqs-prod} Let $\distrb$ be any distribution on $[k]$.  Writing $\distra$ for the $(\ppn, \eqs{k}, \distrb)$-composite distribution on strings in $[k]^n$, we have
\[
\[
\dtv{\eqs{k}^n}{\wt{\mu}} \leq 2k \eps \sqrt{n}.
\dtv{\eqs{k}^n}{\distra} \leq (2k-1)\ppn \sqrt{n}.
\]
\]
\end{proposition}
\end{proposition}
\begin{proof}
\begin{proof}
Recalling again that $\eqs{k}^m$ is the mixture of product distributions $\nu^{\otimes m}$, where $\nu$ is a random distribution on $[k]$, we apply Fact~\ref{fact:tv-mix} and Proposition~\ref{prop:prod-composite} to conclude
Thinking of $\eqs{k}^m$ as the mixture of product distributions $\spac^{\otimes m}$, where $\spac$ is a random spacing on $[k]$, Fact~\ref{fact:tv-mix} and Proposition~\ref{prop:prod-composite} imply
\[
\[
\dtv{\eqs{k}^n}{\wt{\mu}} \leq \Ex_{\nu}\left[{\textstyle \sqrt{\frac{1}{\min(\nu)}-1}}\right] \cdot \eps \sqrt{n}.
\dtv{\eqs{k}^n}{\distra} \leq \Ex_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}-1}}\right] \cdot \ppn \sqrt{n}.
\]
\]
We can upper-bound the expectation\noteryan{Undoubtedly someone has worked hard on this $-1/2$th moment of the least spacing before (Devroye '81 or '86 perhaps), but I think it's probably okay to do the following simple thing here} by  
We can upper-bound the expectation\noteryan{Undoubtedly someone has worked hard on this $-1/2$th moment of the least spacing before (Devroye '81 or '86 perhaps), but I think it's probably okay to do the following simple thing here} by  
\begin{multline*}
\Ex_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}}}\right] \quad=\quad \int_{0}^\infty \Pr_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}}} \geq t\right]\,dt
\quad=\quad  \int_{0}^\infty \Pr_{\spac}[\min(\spac) \leq 1/t^2]\,dt \\
\leq\quad k +  \int_{k}^\infty \Pr_{\spac}[\min(\spac) \leq 1/t^2]\,dt \quad\leq\quad k + \int_{k}^\infty (k(k-1)/t^2) \,dt \quad=\quad 2k-1,
\end{multline*}
where in the second-to-last step we used Proposition~\ref{prop:rand-min}.
\end{proof}
Averaging now once more in the second component, we obtain the following:
\begin{proposition}  \label{prop:eqs-eqs} Let $2 \leq \ell \leq k$ and let $\distra'$ denote the $(\ppn, \eqs{k}, \eqs{\ell})$-composite distribution on strings in $[k]^n$.  Then
\[
\[
\Ex_{\nu}\left[{\textstyle \sqrt{\frac{1}{\min(\nu)}}}\right] =  \int_{0}^\infty \Pr_{\nu}\left[{\textstyle \sqrt{\frac{1}{\min(\nu)}}} \geq t\right]\,dt = \int_{0}^\infty \Pr_{\nu}\left[\min(\nu) \leq \frac{1}{t^2}\right]\,dt \leq k +  \int_{k}^\infty \Pr_{\nu}\left[\min(\nu) \leq \frac{1}{t^2}\right]\,dt. \label{eqn:eq-dist-integral}
\dtv{\eqs{k}^n}{\distra'} \leq (2k-1) \ppn \sqrt{n}.
\]
\]
\ignore{
\end{proposition}
\begin{eqnarray}
 
\Ex_{\nu}\left[{\textstyle \sqrt{\frac{1}{\min(\nu)}}}\right] & = & \int_{0}^\infty \Pr_{\nu}\left[{\textstyle \sqrt{\frac{1}{\min(\nu)}}} \geq t\right]\,dt\nonumber\\
 
& = & \int_{0}^\infty \Pr_{\nu}[\min(\nu) \leq 1/t^2]\,dt\nonumber\\
 
& \leq & k +  \int_{k}^\infty \Pr_{\nu}[\min(\nu) \leq 1/t^2]\,dt. \label{eqn:eq-dist-integral}
 
\end{eqnarray}
We can now obtain the proof of Lemma~\ref{lem:distributions}:
}
 
From Definition~\ref{def:spacings}, the event $\min(\nu) \leq 1/t^2$ occurs only if some two out of $k$ points chosen uniformly from the unit circle are within distance $1/t^2$.  By a union bound, this is at most $\binom{k}{2}$ times the probability that two random points on the unit circle are within distance $1/t^2$, which is precisely $2/t^2$.  Hence the integral above is bounded by
\begin{proof} The three statements  in Lemma~\ref{lem:distributions} essentially follow from Propositions~\ref{prop:prod-eqs}, \ref{prop:eqs-prod}, and \ref{prop:eqs-eqs}, taking $\ppn = 2r/n$.  This would give bounds of $2{\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot r / \sqrt{n}$, $(4k-2) \cdot r/\sqrt{n}$, and $(4k-2) \cdot r/\sqrt{n}$, respectively.  However we need to account for conditioning on $r \leq \abs{J} \leq 4r$. By Fact~\ref{fact:dev}, this conditioning increases the total variation distance by at most $2\exp(-r/4)$.  Using the lower bound $r \geq 2 \ln n$ from the lemma's hypothesis, this quantity is at most $2r/\sqrt{n}$, completing the proof.
\[
\int_{k}^\infty \binom{k}{2} \cdot \frac{2}{t^2}\,dt \leq \int_{k}^\infty \frac{k^2}{t^2}\,dt = k,
\]
completing the proof.
\end{proof}
\end{proof}

Latest revision as of 12:25, 8 July 2009

\section{Passing between probability measures} \label{sec:measures}

The goal of this section is to work out bounds for the error arising when passing back and forth between $\unif_k$ and $\ens{k}$, as described in Section~\ref{sec:outline-dist}. Lemma~\ref{lem:distributions} below gives the bounds we need. The reader will not lose much by just reading its statement; the proof is just technical calculations.

Before stating Lemma~\ref{lem:distributions} we need some definitions.

\ignore{ \begin{definition} Given a set $A \subseteq [k]^n$ and a restriction $(J,x_\barJ)$, we write $A_{x_\barJ}$ for the subset of $[k]^{J}$ defined by $A_{x_\barJ} = \{y \in [k]^J : (x_{\barJ}, y_J) \in A\}$. \end{definition}}

\begin{definition} \label{def:r4r} For $0 \leq \ppn \leq 1$, we say that $J$ is a \emph{$\ppn$-random subset} of $[n]$ if $J$ is formed by including each coordinate $i \in [n]$ independently with probability $\ppn$. Assuming $r \leq n/2$, we say that $J$ is an \emph{$[r,4r]$-random subset} of $[n]$ if $J$ is a $\ppn$-random subset of $[n]$ conditioned on $r \leq \abs{J} \leq 4r$, with $\ppn = 2r/n$. \end{definition} \begin{definition} A \emph{distribution family} $(\distra^m)_{m \in \N}$ (over $[k]$) is a sequence of probability distributions, where $\distra^m$ is a distribution on $[k]^m$. In this paper the families we consider will either be the equal-(nondegenerate-)slices family $\distra^m = \ens{k}^m$ or $\distra^m = \eqs{k}^m$, or will be the product distributions based on a single distribution $\prd$ on $[k]$, $\distra^m = \prd^{\otimes m}$. \end{definition}


\begin{lemma} \label{lem:distributions} Let $(\distra^m)$ and $(\distrb^m)$ be distribution families. Assume $2 \ln n \leq r \leq n/2$. Let $J$ be an $[r,4r]$-random subset of $[n]$, let $x$ be drawn from $[k]^{\barJ}$ according to $\distra^{\abs{\barJ}}$, and let $y$ be drawn from $[k]^J$ according to $\distrb^{\abs{J}}$. The resulting distribution on the composite string $(x,y) \in [k]^n$ has total variation distance from $\distra^n$ which can be bounded as follows: \begin{enumerate} \item (Product to equal-slices.) \label{eqn:distrs-prd-eqs} If $\distra^m = \prd^{\otimes m}$ and $\distrb^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is \noteryan{You know, we only need this result for the uniform distribution, in which case we can bound the below by the simpler $2k \cdot r/\sqrt{n}$.} \[ (2{\textstyle \sqrt{\frac{1}{\min(\prd)}-1}})+2) \cdot r / \sqrt{n}. \] \item (Equal-slices to product.) \label{eqn:distrs-eqs-prd} If $\distra^m = \eqs{k}^m$ and $\distrb^m = \prd^{\otimes m}$, the bound is $4k \cdot r/\sqrt{n}$, independent of $\prd$. \item (Equal-slices to equal-slices.) \label{eqn:distrs-eqs-eqs} If $\distra^m = \eqs{k}^m$ and $\distrb^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is $4k \cdot r/\sqrt{n}$. \end{enumerate} \end{lemma}

Although Lemma~\ref{lem:distributions} involves the equal-slices distribution, one can convert to equal-nondegenerate-slices if desired using Proposition~\ref{prop:degen}.


Since $\eqs{k}^n$ is a mixture of product distributions (Proposition~\ref{prop:eqs-mix}), the main work in proving Lemma~\ref{lem:distributions} involves comparing product distributions.


\subsection{Comparing product distributions} \begin{definition} For $\distra$ and $\distrb$ probability distributions on $\Omega^n$, the \emph{$\chi^2$ distance} $\dchi{\pi}{\nu}$ is defined by \[ \dchi{\distra}{\distrb} = \sqrt{\Varx_{x \sim \distra}\left[\frac{\distrb[x]}{\distra[x]}\right]}. \] Note that $\dchi{\distra}{\distrb}$ is \emph{not} symmetric in $\distra$ and $\distrb$. \end{definition}

The $\chi^2$ distance is introduced to help us prove the following fact: \begin{proposition} \label{prop:mix-distance} Let $\prd$ be a distribution on $\Omega$ with full support; i.e., $\min(\pi) \neq 0$. Suppose $\prd$ is slightly mixed with $\distrb$, forming $\wh{\prd}$; specifically, $\wh{\prd} = (1-\ppn) \prd + \ppn \distrb$. Then the associated product distributions $\prd^{\otimes n}$, $\wh{\prd}^{\otimes n}$ on $\Omega^{n}$ satisfy \[ \dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \dchi{\prd}{\distrb} \cdot \ppn \sqrt{n}. \] \end{proposition} \begin{proof} It is a straightforward consequence of Cauchy-Schwarz (see, e.g.~\cite[p.\ 101]{Rei89})\noteryan{This is the part using $\min(\prd) \neq 0$, by the way.} that \[ \dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \dchi{\prd}{\wh{\prd}} \cdot \sqrt{n}, \] and the identity $\dchi{\prd}{\wh{\prd}} = \ppn \cdot \dchi{\prd}{\distrb}$ follows easily from the definitions. \end{proof} This can be bounded independently of $\distrb$, as follows: \begin{corollary} \label{cor:mix-distance} In the setting of Proposition~\ref{prop:mix-distance}, \[ \dtv{\prd^{\otimes n}}{\wh{\prd}^{\otimes n}} \leq \sqrt{{\textstyle \frac{1}{\min(\prd)}} - 1} \cdot \ppn \sqrt{n}, \] \end{corollary} \begin{proof} It is easy to check that the distribution $\distrb$ maximizing $\dchi{\prd}{\distrb}$ is the one putting all its mass on the $x$ minimizing $\prd[x]$. In this case one calculates $\dchi{\prd}{\distrb} = \sqrt{\frac{1}{\min(\pi)} - 1}$. \end{proof}



\subsection{Proof of Lemma~\ref{lem:distributions}}


\begin{definition} \label{def:compos-distr} Let $0 \leq \ppn \leq 1$ and let $(\distra^m)$, $(\distrb^m)$ be distribution families. Drawing from the \emph{$(\ppn, \distra, \distrb)$-composite distribution} on $[k]^n$ entails the following: $J$ is taken to be a $\ppn$-random subset of~$[n]$; $x$ is drawn from $[k]^{\barJ}$ according to $\distra^{\abs{\barJ}}$; and, $y$ is drawn from $[k]^J$ according to $\distrb^{\abs{J}}$. We sometimes think of this distribution as just being a distribution on composite strings $z = (x, y) \in [k]^n$. \end{definition}

Note that the distribution described in Lemma~\ref{lem:distributions} is very similar to the $(\ppn, \distra, \distrb)$-composite distribution, except that it uses an $[r, 4r]$-random subset rather than a $\ppn$-random subset. We can account for this difference with a standard Chernoff (large-deviation) bound:\noteryan{Citation needed?} \begin{fact} \label{fact:dev} If $J$ is a $\ppn$-random subset of $[n]$ with $\ppn = 2r/n$ as in Definition~\ref{def:r4r}, then $r \leq \abs{J} \leq 4r$ holds except with probability at most $2\exp(-r/4)$. \end{fact}

The utility of using $\ppn$-random subsets in Definition~\ref{def:compos-distr} is the following observation: \begin{fact} If $\prd$ and $\distrb$ are distributions on $[k]$, thought of also as product distribution families, then the $(\ppn, \prd, \distrb)$-composite distribution on $[k]^n$ is precisely the product distribution $\wh{\prd}^{\otimes n}$, where $\wh{\prd}$ is the mixture distribution $(1-\ppn)\prd + \ppn \distrb$ on $[k]$. \end{fact}

Because of this, we can use Corollary~\ref{cor:mix-distance} to bound the total variation distance between $\prd^{\otimes n}$ and a composite distribution. We conclude: \begin{proposition} \label{prop:prod-composite} Let $\prd$ and $\distrb$ be any distributions on $[k]$, thought of also as product distribution families. Writing $\wt{\prd}$ for the $(\ppn,\prd,\distrb)$-composite distribution on strings in $[k]^n$, we have \[ \dtv{\prd^{\otimes n}}{\wt{\prd}} \leq {\textstyle \sqrt{\frac{1}{\min(\prd)}-1}} \cdot \ppn \sqrt{n}. \] \end{proposition}

Recall that for any $\ell \leq k$, the equal-slices distribution $\eqs{\ell}^{m}$ on $m$ coordinates is a mixture of product distributions $\spac^{\otimes m}$ on $[k]^m$. We can therefore average Proposition~\ref{prop:prod-composite} over $\distrb$ to obtain: \begin{proposition} \label{prop:prod-eqs} If $\wt{\pi}$ denotes the $(\ppn,\pi,\eqs{\ell})$-composite distribution on strings in $[k]^n$, where $\ell \leq k$, then we have \[ \dtv{\pi^{\otimes n}}{\wt{\pi}} \leq {\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot \ppn \sqrt{n}. \] \end{proposition} Here we have used the following basic bound, based on the triangle inequality: \begin{fact} \label{fact:tv-mix} Let $(\distrb_\kappa)_{\kappa \in K}$ be a family of distributions on $\Omega^n$, let $\varsigma$ be a distribution on $K$, and let $\overline{\distrb}$ denote the associated mixture distribution, given by drawing $\kappa \sim \varsigma$ and then drawing from $\distrb_\kappa$. Then \[ \dtv{\distra}{\overline{\distrb}} \leq \Ex_{\kappa \sim \varsigma}[\dtv{\distra}{\distrb_\kappa}]. \] \end{fact}

If we instead use this fact to average Proposition~\ref{prop:prod-composite} over $\prd$, we can obtain: \begin{proposition} \label{prop:eqs-prod} Let $\distrb$ be any distribution on $[k]$. Writing $\distra$ for the $(\ppn, \eqs{k}, \distrb)$-composite distribution on strings in $[k]^n$, we have \[ \dtv{\eqs{k}^n}{\distra} \leq (2k-1)\ppn \sqrt{n}. \] \end{proposition} \begin{proof} Thinking of $\eqs{k}^m$ as the mixture of product distributions $\spac^{\otimes m}$, where $\spac$ is a random spacing on $[k]$, Fact~\ref{fact:tv-mix} and Proposition~\ref{prop:prod-composite} imply \[ \dtv{\eqs{k}^n}{\distra} \leq \Ex_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}-1}}\right] \cdot \ppn \sqrt{n}. \] We can upper-bound the expectation\noteryan{Undoubtedly someone has worked hard on this $-1/2$th moment of the least spacing before (Devroye '81 or '86 perhaps), but I think it's probably okay to do the following simple thing here} by \begin{multline*} \Ex_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}}}\right] \quad=\quad \int_{0}^\infty \Pr_{\spac}\left[{\textstyle \sqrt{\frac{1}{\min(\spac)}}} \geq t\right]\,dt \quad=\quad \int_{0}^\infty \Pr_{\spac}[\min(\spac) \leq 1/t^2]\,dt \\ \leq\quad k + \int_{k}^\infty \Pr_{\spac}[\min(\spac) \leq 1/t^2]\,dt \quad\leq\quad k + \int_{k}^\infty (k(k-1)/t^2) \,dt \quad=\quad 2k-1, \end{multline*} where in the second-to-last step we used Proposition~\ref{prop:rand-min}. \end{proof} Averaging now once more in the second component, we obtain the following: \begin{proposition} \label{prop:eqs-eqs} Let $2 \leq \ell \leq k$ and let $\distra'$ denote the $(\ppn, \eqs{k}, \eqs{\ell})$-composite distribution on strings in $[k]^n$. Then \[ \dtv{\eqs{k}^n}{\distra'} \leq (2k-1) \ppn \sqrt{n}. \] \end{proposition}



We can now obtain the proof of Lemma~\ref{lem:distributions}:

\begin{proof} The three statements in Lemma~\ref{lem:distributions} essentially follow from Propositions~\ref{prop:prod-eqs}, \ref{prop:eqs-prod}, and \ref{prop:eqs-eqs}, taking $\ppn = 2r/n$. This would give bounds of $2{\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot r / \sqrt{n}$, $(4k-2) \cdot r/\sqrt{n}$, and $(4k-2) \cdot r/\sqrt{n}$, respectively. However we need to account for conditioning on $r \leq \abs{J} \leq 4r$. By Fact~\ref{fact:dev}, this conditioning increases the total variation distance by at most $2\exp(-r/4)$. Using the lower bound $r \geq 2 \ln n$ from the lemma's hypothesis, this quantity is at most $2r/\sqrt{n}$, completing the proof. \end{proof}