Measures.tex: Difference between revisions

Revision as of 05:44, 18 May 2009

\section{Passing between probability measures}

\noteryan{As mentioned at the end, I think this section would be better if the master lemma at the end came first}

\subsection{Comparing product distributions}

We begin by recording some simple facts about distances between probability distributions. Throughout this subsection we assume $\pi$ and $\nu$ are probability distributions on the same finite set $\Omega$.\noteryan{Perhaps change $\Omega$ to $[k]$ throughout.}

\begin{fact} \label{fact:tv-mix} Let $(\nu_\kappa)_{\kappa \in K}$ be a family of distributions on $\Omega$, let $\lambda$ be a distribution on $K$, and let $\mu$ be the associated mixture distribution, given by drawing $\kappa \sim \lambda$ and then drawing from $\nu_\kappa$. Then\noteryan{need to prove? It's just the triangle inequality} \[ \dtv{\pi}{\mu} \leq \Ex_{\kappa \sim \lambda}[\dtv{\pi}{\nu_\kappa}]. \] \end{fact} \begin{definition} The \emph{$\chi^2$ distance} $\dchi{\pi}{\nu}$ is defined by\noteryan{I assume the reader is savvy enough to realize we mean to take the positive square root\dots} \[ \dchi{\pi}{\nu}^2 = \Varx_{x \sim \pi}\left[\frac{\nu[x]}{\pi[x]}\right]. \] Note that $\dchi{\pi}{\nu}$ is \emph{not} symmetric in $\pi$ and $\nu$. \end{definition}

The $\chi^2$ distance is introduced to help us prove the following fact: \begin{proposition} \label{prop:mix-distance} Assume $\min(\pi) \neq 0$; i.e., $\pi$ has full support. Suppose $\pi$ is slightly mixed with~$\nu$, forming $\hat{\pi}$; specifically, $\hat{\pi} = (1-\eps) \pi + \eps \nu$. Then the associated product distributions $\pi^{\otimes n}$, $\hat{\pi}^{\otimes n}$ on $\Omega^{n}$ satisfy \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\nu} \cdot \eps \sqrt{n}. \] \end{proposition} \begin{proof} It is a straightforward consequence of Cauchy-Schwarz (see, e.g.~\cite[p.\ 101]{Rei89})\noteryan{This is the part using $\min(\pi) \neq 0$, by the way.} that \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\hat{\pi}} \cdot \sqrt{n}, \] and the identity $\dchi{\pi}{\hat{\pi}} = \eps \dchi{\pi}{\nu}$ follows easily from the definitions. \end{proof} This can be bounded independently of $\nu$, as follows: \begin{corollary} \label{cor:mix-distance} In the setting of Proposition~\ref{prop:mix-distance}, \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \sqrt{{\textstyle \frac{1}{\min(\pi)}} - 1} \cdot \eps \sqrt{n}, \] \end{corollary} \begin{proof} It is easy to check that the distribution $\nu$ maximizing $\dchi{\pi}{\nu}$ is the one putting all its mass on the $x$ minimizing $\pi[x]$. In this case one calculates $\dchi{\pi}{\nu} = \sqrt{\frac{1}{\min(\pi)} - 1}$. \end{proof}

\subsection{Restrictions and composite distributions}

\begin{definition} A \emph{restriction} on $[k]^n$ is a pair $(J, x_\barJ)$, where $J \subseteq [n]$ and $x_\barJ \in [k]^{\barJ}$, where $\barJ = [n] \setminus J$. It is thought of as a ``partial string, where the $\barJ$-coordinates are ``fixed according to $x_\barJ$ and the $J$-coordinates are still ``free. Given a fixing $y_J \in [k]^J$, we write $(x_\barJ, y_J)$ for the complete composite string in $[k]^n$.\noteryan{Although this notation looks awkward, I believe it's useful. Also, I suppose we should point out that a restriction is a special kind of combinatorial subspace.} \end{definition}

\begin{definition} Given a set $A \subseteq [k]^n$ and a restriction $(J,x_\barJ)$, we write $A_{x_\barJ}$ for the subset of $[k]^{J}$ defined by $A_{x_\barJ} = \{y \in [k]^J : (x_{\barJ}, y_J) \in A\}$. \end{definition}

\begin{definition} If $0 \leq \eps \leq 1$, we say that $J$ is an \emph{$\eps$-random subset} of $[n]$ if $J$ is formed by including each coordinate $i \in [n]$ independently with probability $\eps$. \end{definition}

\begin{definition} \label{def:rand-restriction} Let $0 \leq \eps \leq 1$ and let $(\pi^m)$ be a family of probability distributions, where $\pi^m$ is a distribution on $[k]^m$. An \emph{$(\epsilon,\pi)$-random restriction} $(J, x_\barJ)$ is formed by letting $J$ be an $\eps$-random subset of $[n]$ and then drawing $x_\barJ$ from $[k]^{\barJ}$ according to $\pi^{|\barJ|}$. \end{definition} In the above definition, $\pi^m$ will either be equal-slices $\eqs{k}^m$ or a product distribution $\nu^{\otimes m}$ based on some distribution $\nu$ on $[k]$.

\begin{definition} In the setting of Definition~\ref{def:rand-restriction}, suppose that $(\nu^m)$ is another family of distributions. The \emph{$(\eps, \pi, \nu)$-composite distribution} generates an $(\eps,\pi)$-random restriction $(J, x_\barJ)$ and then generates $y_J \in [k]^J$ according to $\nu^{|J|}$. We sometimes think of this distribution as just being a distribution on composite strings $z = (x_\barJ, y_J) \in [k]^n$. \end{definition}

\begin{fact} If $\pi$ and $\nu$ are distributions on $[k]$, thought of also as product distribution families, then the $(\eps, \pi, \nu)$-composite distribution on $[k]^n$ is precisely the product distribution $\hat{\pi}^{\otimes n}$, where $\hat{\pi}$ is the mixture distribution $(1-\eps)\pi + \eps \nu$ on $[k]$. \end{fact}

Because of this, we can use Corollary~\ref{cor:mix-distance} to bound the total variation distance between $\pi^{\otimes n}$ and a composite distribution. We conclude:

\begin{proposition} \label{prop:prod-composite} Let $\pi$ and $\nu$ be any distributions on $[k]$. Writing $\wt{\pi}$ for the $(\eps,\pi,\nu)$-composite distribution on strings in $[k]^n$, we have \[ \dtv{\pi^{\otimes n}}{\wt{\pi}} \leq {\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot \eps \sqrt{n}. \] \end{proposition} For any $\ell \leq k$, the equal-slices distribution $\eqs{\ell}^{m}$ on $m$ coordinates is a mixture of product distributions $\nu^{\otimes m}$ on $[k]^m$. In light of Fact~\ref{fact:tv-mix}, we can average Proposition~\ref{prop:prod-composite} over $\nu$ to obtain: \begin{proposition} \label{prop:prod-eqs} If $\wt{\pi}$ denotes the $(\eps,\pi,\eqs{\ell})$-composite distribution on strings in $[k]^n$, we have \[ \dtv{\pi^{\otimes n}}{\wt{\pi}} \leq {\textstyle \sqrt{\frac{1}{\min(\pi)}-1}} \cdot \eps \sqrt{n}. \] \end{proposition} If we instead average Proposition~\ref{prop:prod-composite} over $\pi$, we can obtain: \begin{proposition} \label{prop:eqs-prod} Let $\nu$ be any distribution on $[k]$. Writing $\wt{\mu}$ for the $(\eps, \eqs{k}, \nu)$-composite distribution on strings in $[k]^n$, we have \[ \dtv{\eqs{k}^n}{\wt{\mu}} \leq (2k-1)\eps \sqrt{n}. \] \end{proposition} \begin{proof} Thinking of $\eqs{k}^m$ as the mixture of product distributions $\pi^{\otimes m}$, where $\pi$ is a random distribution on $[k]$, Fact~\ref{fact:tv-mix} and Proposition~\ref{prop:prod-composite} imply \[ \dtv{\eqs{k}^n}{\wt{\mu}} \leq \Ex_{\pi}\left[{\textstyle \sqrt{\frac{1}{\min(\pi)}-1}}\right] \cdot \eps \sqrt{n}. \] We can upper-bound the expectation\noteryan{Undoubtedly someone has worked hard on this $-1/2$th moment of the least spacing before (Devroye '81 or '86 perhaps), but I think it's probably okay to do the following simple thing here} by \begin{multline*} \Ex_{\pi}\left[{\textstyle \sqrt{\frac{1}{\min(\pi)}}}\right] \quad=\quad \int_{0}^\infty \Pr_{\pi}\left[{\textstyle \sqrt{\frac{1}{\min(\pi)}}} \geq t\right]\,dt \quad=\quad \int_{0}^\infty \Pr_{\pi}[\min(\pi) \leq 1/t^2]\,dt \\ \leq\quad k + \int_{k}^\infty \Pr_{\pi}[\min(\pi) \leq 1/t^2]\,dt \quad\leq\quad k + \int_{k}^\infty (k(k-1)/t^2) \,dt \quad=\quad 2k-1, \end{multline*} where in the second-to-last step we used Proposition~\ref{prop:rand-min}. \end{proof} Averaging now once more in the second component, we obtain the following: \begin{proposition} \label{prop:eqs-eqs} Let $2 \leq \ell \leq k$ and let $\wt{\mu}$ denote the $(\eps, \eqs{k}, \eqs{\ell})$-composite distribution on strings in $[k]^n$. Then \[ \dtv{\eqs{k}^n}{\wt{\mu}} \leq (2k-1) \eps \sqrt{n}. \] \end{proposition}

Finally, it is inconvenient that in an $(\eps, \pi, \nu)$-composite distribution, the number of restricted coordinates $|J|$ can occasionally deviate significantly from its expectation $\eps n$. We can prevent this with another small loss in total variation distance. \begin{definition} \label{def:r4r} Given $r \leq n/2$, define $\eps = 2r/n$. We say that $J$ is an \emph{$[r,4r]$-random subset} of $[n]$ if $J$ is an $\eps$-random subset of $[n]$ conditioned on $r \leq |J| \leq 4r$. \end{definition} The following large-deviation bound is standard:\noteryan{Citation needed?} \begin{fact} \label{fact:dev} In Definition~\ref{def:r4r}, the probability that $r \leq |J| \leq 4r$ fails is at most $2\exp(-r/4)$. \end{fact}

We can combine this fact with the previous propositions to get a master lemma for passing between distributions:\noteryan{I think it would be better if in this section we: 1. Had the equal-slices subsection. 2. Then had one more subsection which: a) begins with the statement of this lemma; then, b) says, ``Psst, reader, this lemma's proof is very boring. We wouldn't be disappointed if you skipped reading it.} \begin{lemma} \label{lem:distributions} Let $(\pi^m)$ and $(\nu^m)$ be distribution families. Assume $2 \ln n \leq r \leq n/2$. Let $J$ be an $[r,4r]$-random subset of $[n]$, let $x$ be drawn from $[k]^{\barJ}$ according to $\pi^{|\barJ|}$, and let $y$ be drawn from $[k]^J$ according to $\nu^{|J|}$. The resulting distribution on $(x,y)$ has total variation distance from $\pi^n$ which can be bounded as follows: \begin{enumerate} \item (Product to equal-slices.) If $\pi^m = \pi^{\otimes m}$ and $\nu^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is \[ (2{\textstyle \sqrt{\frac{1}{\min(\pi)}-1}})+2) \cdot r / \sqrt{n}. \] \item (Equal-slices to product.) If $\pi^m = \eqs{k}^m$ and $\nu^m = \nu^{\otimes m}$, the bound is $4k \cdot r/\sqrt{n}$. \item (Equal-slices to equal-slices.) \label{eqn:distrs-eqs-eqs} If $\pi^m = \eqs{k}^m$ and $\nu^m = \eqs{\ell}^m$ for $\ell \leq k$, the bound is $4k \cdot r/\sqrt{n}$. \end{enumerate} \end{lemma} \begin{proof} We apply Propositions~\ref{prop:prod-eqs}, \ref{prop:eqs-prod}, and \ref{prop:eqs-eqs} with $\eps = 2r/n$, but condition on $r \leq |J| \leq 4r$. This increases the total variation distance by at most $2\gamma$, where $\gamma$ is the probability that $r \leq |J| \leq 4r$ fails. \textbf{ACTUALLY, JUST BY $\gamma$.} By Fact~\ref{fact:dev}, the increase is at most $4\exp(-r/4)$, which is at most $2 r / \sqrt{n}$ using the lower bound $r \geq 2 \ln n$. The claimed bounds now follow.\noteryan{This is explained messily.} \end{proof}

@@ Line 1: / Line 1: @@
-\section{Probability measures}
+\section{Passing between probability measures}
-{\tiny [[Ryan: \bf{The quickest and clearest way to marshal all the requisite facts about equal-slices involves describing it as a uniform mixture of product distributions.  However, this has the slight downside of introducing non-discrete probability distributions into an otherwise entirely finitary paper.  In particular, an extremely pedantic mathematician might not enjoy the casual treatment I employ in dealing with continuous distributions.  However, I think getting hyper-formal would badly obscure things for no real gain.}]]}
-\subsection{Equal-slices}
-\begin{definition}  \label{def:spacings} We say that $\nu$ is a \emph{random distribution} on $[k]$ if the vector of probabilities $(\nu[1], \dots, \nu[k])$ is formed as follows: $k$ points $q_1, \dots, q_k$ are chosen uniformly and independently on the circle of unit circumference; then $\nu[a]$ is set to the length of the arc emanating clockwise from $q_a$.
-\end{definition}
-\begin{remark}  It is well known (see, e.g.~\cite[Ch.\ 1]{Rei89})\noteryan{I think the traditional citation here is Pyke '65, but we can save a spot in the biblio by citing Reiss again (which is also a text, whereas Pyke is a bit painful to read)} that a random distribution $\nu$ can equivalently be defined as choosing the vector of probabilities $(\nu[1], \dots, \nu[k])$ uniformly from the simplex $\{(p_1, \dots, p_k) : \sum p_i = 1, p_i \geq 0\ \forall i\}$. \noteryan{Or, by taking the spacings of $k-1$ random points in a unit segment.  Since we don't use this fact, I think it's fair to not prove it.}
-\end{remark}
-We will occasionally use the following simple fact:
-\begin{proposition}  \label{prop:rand-min} If $\nu$ is a random distribution on $[k]$ as in Definition~\ref{def:spacings}, then
-\[
-\Pr[\min(\nu) \leq \gamma] \leq k(k-1) \gamma.
-\]
-\end{proposition}
-\begin{proof}
-By a union bound and symmetry, $\Pr[\min(\nu) \leq \gamma] \leq k \Pr[\nu[1] \leq \gamma]$.  The event $\nu[1] \leq \gamma$ occurs only if one of $q_2, \dots, q_k$ falls within the arc of length $\gamma$ clockwise of $q_1$.  By another union bound, this has probability at most $(k-1)\gamma$.
-\end{proof}
-The following distribution on strings is important for our proof.
-\begin{definition}  The \emph{equal-slices} distribution on $[k]^{n}$, denoted $\eqs{k}^{n}$, is defined as follows: First, choose a random distribution $\nu$ on $[k]$; then draw from the product distribution $\nu^{\otimes n}$ on $[k]^{n}$.  More generally we write $\eqs{k}^J$ for the analogous distribution on $[k]^{J}$, and we write simply $\eqs{k}$ when $n$ or $J$ is clear from context.
-\end{definition}
-The terminology ``equal-slices'' comes from the following:\noteryan{So far we don't actually need the observations that follow --- updated: or do we? See Prop~\ref{prop:degen} below}
-\begin{definition}  Given a vector $C \in \N^k$ whose components sum to $n$, the associated \emph{slice} of $[k]^{n}$ is the set
-\[
-\{x \in [k]^{n} : x \text{ has $C_1$ many $1$'s, $C_2$ many $2$'s, \dots, $C_k$ many $k$'s}\}.
-\]
-We abuse terminology by also referring to $C$ as the slice.  A slice is \emph{nondegenerate} when $C_i \neq 0$ for all $i$.  There are $\binom{n+k-1}{k-1}$ different slices, of which $\binom{n-1}{k-1}$ are nondegenerate.\noteryan{Justify with bars and dots?}
-\end{definition}
-\begin{proposition} \label{prop:eqs-equiv} The equal-slices distribution $\eqs{k}^{n}$ is equivalent to first choosing a slice uniformly from the $\binom{n+k-1}{k-1}$ many possibilities, then drawing a string uniformly from that slice. \noteryan{Include proof?  If so, it's by throwing $n+k-1$ points into the interval and looking at the result in two ways.}
-\end{proposition}
-The following observation will be crucial:
-\begin{lemma}  Let $x \sim \eqs{k}^n$, and let $i \sim [k-1]$ be uniformly random.  Then $\chg{x}{k}{i}$ is distributed as~$\eqs{k-1}^n$.
-\end{lemma}
-\begin{proof}
-From the definition, the string $\chg{x}{k}{a}$ is distributed as follows: first $\nu$ is taken to be a random distribution on $[k]$; then $\wt{\nu}$ is taken to be the distribution on $[k-1]$ with $\wt{\nu}[j] = \nu[j]$ for $j \neq i$ and $\wt{\nu}[i] = \nu[i] + \nu[k]$; then a string is drawn from the product distribution $\wt{\nu}^{\otimes n}$.  Thus it suffices to show that $\wt{\nu}$ is a random distribution on $[k-1]$ in the sense of Definition~\ref{def:spacings}.  This is easy to see: having chosen $k$ points on the circle, thus forming $\nu$, we get the correct distribution $\wt{\nu}$ simply by removing the $k$th point. \noteryan{This is not especially well-explained; will try to think how to say it with the maximum combination of concision and correctness.}
-\end{proof}
-We will also need the following simple bound:
-\begin{proposition} \label{prop:degen} Let $x \sim \eqs{k}^n$.  The probability $x$ is in a degenerate slice is at most $k(k-1)/n$.
-\end{proposition}
-\begin{proof}  We can view a draw $x \sim \eqs{k}^n$ as follows: Pick $q_1, \dots, q_k$ on the unit circle as in Definition~\ref{def:spacings} to define the random distribution $\nu$.  Next, pick $n$ more points $r_1, \dots, r_n$ independently and uniformly on the circle, and define $x_j$ to be $i$ iff $r_j$ is in the arc emanating clockwise from $q_i$.\noteryan{Not worth it to point out that there are certain $0$-probability events involved here\dots}.  The string $x$ is degenerate iff after the $n+k$ points are drawn there are two $q_i$ points appearing ``consecutively'' on the circle.
-Given the locations but not the labels of the $n+k$ points ultimately drawn, the identity of the $q_i$'s is uniformly random from the $(n+k)(n+k-1) \cdots (n+1)$ possibilities.  By symmetry, the probability that two $q_i$'s appear consecutively is, by a union bound, at most $\binom{k}{2}$ times the probability that $q_1$ and $q_2$ appear consecutively.  We can imagine that $q_1$ is chosen first and $q_2$ second; then the probability they appear consecutively is precisely $2/(n+k-1)$.  Hence the probability that $x$ is degenerate is at most $\binom{k}{2} \cdot 2/(n+k-1) \geq k(k-1)/n$. \noteryan{Again, I'd prefer if this proof was a little more elegantly phrased.}
-\end{proof}
+\noteryan{As mentioned at the end, I think this section would be better if the master lemma at the end came first}
@@ Line 58: / Line 8: @@
 We begin by recording some simple facts about distances between probability distributions.  Throughout this subsection we assume $\pi$ and $\nu$ are probability distributions on the same finite set $\Omega$.\noteryan{Perhaps change $\Omega$ to $[k]$ throughout.}
-\begin{definition} Given a distribution $\pi$ on $\Omega$, write $\min(\pi) = \min_{x \in \Omega} \pi[x]$.
-\end{definition}
-\begin{definition}   The \emph{total variation distance} $\dtv{\pi}{\nu}$ is defined by
-\[
-\dtv{\pi}{\nu} = \frac12 \sum_{x \in \Omega} |\pi[x] - \nu[x]| = \max_{A \subseteq X} |\pi[A] - \nu[A]|.
-\]
-\end{definition}
 \begin{fact}  \label{fact:tv-mix} Let $(\nu_\kappa)_{\kappa \in K}$ be a family of distributions on $\Omega$, let $\lambda$ be a distribution on $K$, and let $\mu$ be the associated mixture distribution, given by drawing $\kappa \sim \lambda$ and then drawing from $\nu_\kappa$.  Then\noteryan{need to prove? It's just the triangle inequality}
 \[
@@ Line 183: / Line 125: @@
 \end{lemma}
 \begin{proof}
-We apply Propositions~\ref{prop:prod-eqs}, \ref{prop:eqs-prod}, and \ref{prop:eqs-eqs} with $\eps = 2r/n$, but condition on $r \leq |J| \leq 4r$.  This increases the total variation distance by at most $2\gamma$, where $\gamma$ is the probability that $r \leq |J| \leq 4r$ fails.  By Fact~\ref{fact:dev}, the increase is at most $4\exp(-r/4)$, which is at most $2 r / \sqrt{n}$ using the lower bound $r \geq 2 \ln n$.  The claimed bounds now follow.\noteryan{This is explained messily.}
+We apply Propositions~\ref{prop:prod-eqs}, \ref{prop:eqs-prod}, and \ref{prop:eqs-eqs} with $\eps = 2r/n$, but condition on $r \leq |J| \leq 4r$.  This increases the total variation distance by at most $2\gamma$, where $\gamma$ is the probability that $r \leq |J| \leq 4r$ fails. \textbf{ACTUALLY, JUST BY $\gamma$.} By Fact~\ref{fact:dev}, the increase is at most $4\exp(-r/4)$, which is at most $2 r / \sqrt{n}$ using the lower bound $r \geq 2 \ln n$.  The claimed bounds now follow.\noteryan{This is explained messily.}
 \end{proof}

Measures.tex: Difference between revisions

Revision as of 05:44, 18 May 2009

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools