Measures.tex

From Polymath Wiki
Revision as of 17:59, 1 May 2009 by Ryanworldwide (talk | contribs)

\section{Probability measures}


\subsection{Comparing product distributions} We begin by recording some simple facts about distances between probability distributions. Throughout, we assume $\pi$ and $\nu$ are probability distributions on the same finite set $\Omega$.\noteryan{Perhaps change $\Omega$ to $[k]$ throughout.}

\begin{definition} Given a distribution $\pi$ on $\Omega$, write $\min(\pi) = \min_{x \in \Omega} \pi[x]$. \end{definition}

\begin{definition} The \emph{total variation distance} $\dtv{\pi}{\nu}$ is defined by \[ \dtv{\pi}{\nu} = \frac12 \sum_{x \in \Omega} |\pi[x] - \nu[x]| = \max_{A \subseteq \Omega} |\pi[A] - \nu[A]|. \] The \emph{$\chi^2$ distance} $\dchi{\pi}{\nu}$ is defined by\noteryan{I assume the reader is savvy enough to realize we mean to take the positive square root\dots} \[ \dchi{\pi}{\nu}^2 = \Varx_{x \sim \pi}\left[\frac{\nu[x]}{\pi[x]}\right]. \] Note that $\dchi{\pi}{\nu}$ is \emph{not} symmetric in $\pi$ and $\nu$. \end{definition}
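As a quick numerical illustration of these definitions (not part of the original notes; the example distributions are arbitrary), the following Python sketch computes both distances on $\Omega = \{0,1,2\}$ and confirms by enumeration that the two expressions for total variation distance agree.

```python
from itertools import chain, combinations
from math import sqrt

def tv_distance(pi, nu):
    # d_TV(pi, nu) = (1/2) * l1 distance between the probability vectors.
    return 0.5 * sum(abs(p - q) for p, q in zip(pi, nu))

def chi2_distance(pi, nu):
    # sqrt of Var_{x ~ pi}[nu[x]/pi[x]]; requires pi to have full support.
    # Since E_{x ~ pi}[nu[x]/pi[x]] = sum_x nu[x] = 1, the variance is E[r^2] - 1.
    second_moment = sum(q * q / p for p, q in zip(pi, nu))
    return sqrt(second_moment - 1)

pi = [0.5, 0.3, 0.2]
nu = [0.1, 0.6, 0.3]

# max over events A of |pi[A] - nu[A]| agrees with the half-l1 formula.
subsets = chain.from_iterable(combinations(range(3), r) for r in range(4))
best = max(abs(sum(pi[i] - nu[i] for i in A)) for A in subsets)
assert abs(best - tv_distance(pi, nu)) < 1e-12

# The chi^2 distance is genuinely asymmetric in its arguments.
assert abs(chi2_distance(pi, nu) - chi2_distance(nu, pi)) > 0.1
```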

The $\chi^2$ distance is introduced to help us prove the following fact:

\begin{proposition} \label{prop:mix-distance} Assume $\min(\pi) \neq 0$; i.e., $\pi$ has full support.\noteryan{A bit of a drag to bring this up, but technically, it's needed for the first deduction in the proof.} Suppose $\pi$ is slightly mixed with $\nu$, forming $\hat{\pi}$; specifically, $\hat{\pi} = (1-\eps) \pi + \eps \nu$. Then the associated product distributions $\pi^{\otimes n}$, $\hat{\pi}^{\otimes n}$ on $\Omega^{n}$ satisfy \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\nu} \cdot \eps \sqrt{n}. \] \end{proposition}

\begin{proof} It is a straightforward consequence of Cauchy--Schwarz (see, e.g.~\cite[p.\ 101]{Rei89}) that \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \dchi{\pi}{\hat{\pi}} \cdot \sqrt{n}, \] and the identity $\dchi{\pi}{\hat{\pi}} = \eps \dchi{\pi}{\nu}$ follows easily from the definitions. \end{proof}

This can be bounded independently of $\nu$, as follows:

\begin{corollary} \label{cor:mix-distance} In the setting of Proposition~\ref{prop:mix-distance}, \[ \dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}} \leq \sqrt{{\textstyle \frac{1}{\min(\pi)}} - 1} \cdot \eps \sqrt{n}. \] \end{corollary}

\begin{proof} It is easy to check that the distribution $\nu$ maximizing $\dchi{\pi}{\nu}$ is the one putting all its mass on the $x$ minimizing $\pi[x]$. In this case one calculates $\dchi{\pi}{\nu} = \sqrt{\frac{1}{\min(\pi)} - 1}$. \end{proof}
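Both bounds can be sanity-checked exactly for small $n$ by enumerating all of $\Omega^n$. The Python sketch below (illustrative, with arbitrarily chosen $\pi$, $\nu$, $\eps$, $n$; not from the original notes) computes $\dtv{\pi^{\otimes n}}{\hat{\pi}^{\otimes n}}$ by brute force and checks it against the Proposition's bound and the weaker $\nu$-free Corollary bound.

```python
from itertools import product
from math import sqrt, prod

def tv_product(pi, rho, n):
    # Exact d_TV(pi^{(x)n}, rho^{(x)n}) by enumerating Omega^n (small n only).
    k = len(pi)
    return 0.5 * sum(
        abs(prod(pi[i] for i in x) - prod(rho[i] for i in x))
        for x in product(range(k), repeat=n)
    )

def chi2(pi, nu):
    # sqrt of Var_{x ~ pi}[nu[x]/pi[x]] = E[r^2] - 1, with r = nu[x]/pi[x].
    return sqrt(sum(q * q / p for p, q in zip(pi, nu)) - 1)

pi, nu, eps, n = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3], 0.05, 6
mixed = [(1 - eps) * p + eps * q for p, q in zip(pi, nu)]  # pi-hat

lhs = tv_product(pi, mixed, n)
prop_bound = chi2(pi, nu) * eps * sqrt(n)           # Proposition
cor_bound = sqrt(1 / min(pi) - 1) * eps * sqrt(n)   # Corollary (nu-free)
assert lhs <= prop_bound <= cor_bound
```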


\subsection{Equal-slices}

\begin{definition} We say that $\pi$ is a \emph{random distribution} on $[k]$ if the vector $(\pi[1], \dots, \pi[k])$ is chosen uniformly from the simplex $\{(p_1, \dots, p_k) : \sum p_i = 1, p_i \geq 0\ \forall i\}$. \end{definition}

\begin{remark} It is well known (see, e.g.~\cite[Ch.\ 1]{Rei89}) that a random distribution $\pi$ can equivalently be defined by choosing $k-1$ points in $[0,1]$ uniformly and independently and letting $\pi[i]$ be the length of the $i$th spacing thus formed, $i = 1, \dots, k$. \end{remark}

\begin{definition} The \emph{equal-slices} distribution on $[k]^{n}$, denoted $\eqs{k}^{n}$, is defined as follows: First, choose a random distribution $\pi$ on $[k]$; then draw from the product distribution $\pi^{\otimes n}$ on $[k]^{n}$. We sometimes write $\eqs{k}$ in place of $\eqs{k}^n$ when $n$ is clear from context. \end{definition}
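The spacings construction from the Remark gives a direct sampler for equal-slices. A minimal Python sketch (function names are my own, for illustration):

```python
import random

def random_distribution(k, rng=random):
    # k-1 uniform cut points in [0,1]; the k spacings are the probabilities.
    cuts = sorted(rng.random() for _ in range(k - 1))
    points = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(points, points[1:])]

def equal_slices_sample(n, k, rng=random):
    # Draw a random distribution pi on [k], then n i.i.d. symbols from pi.
    pi = random_distribution(k, rng)
    return rng.choices(range(1, k + 1), weights=pi, k=n)

x = equal_slices_sample(12, 3)  # a draw from the equal-slices distribution
```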

The terminology ``equal-slices'' comes from the following:

\begin{definition} Given a vector $C \in \N^k$ whose components sum to $n$, the associated \emph{slice} of $[k]^{n}$ is the set \[ \{x \in [k]^{n} : x \text{ has $C_1$ many $1$'s, $C_2$ many $2$'s, \dots, $C_k$ many $k$'s}\}. \] We abuse terminology by also referring to $C$ as the slice. A slice is \emph{nondegenerate} when $C_i \neq 0$ for all $i$. There are $\binom{n+k-1}{k-1}$ different slices, of which $\binom{n-1}{k-1}$ are nondegenerate.\noteryan{Justify with bars and dots?} \end{definition}

\begin{proposition} The equal-slices distribution $\eqs{k}^{n}$ is equivalent to first choosing a slice uniformly from the $\binom{n+k-1}{k-1}$ many possibilities, then drawing a string uniformly from that slice. \noteryan{Include proof? If so, it's by throwing $n+k-1$ points into the interval and looking at the result in two ways.} \end{proposition}
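The stars-and-bars counts in the definition can be confirmed by brute-force enumeration; this small Python check (an illustration, not from the original notes) lists every count vector for a specific $n$ and $k$:

```python
from math import comb

def slices(n, k):
    # All count vectors C in N^k with nonnegative entries summing to n.
    if k == 1:
        return [(n,)]
    return [(c,) + rest for c in range(n + 1) for rest in slices(n - c, k - 1)]

n, k = 7, 3
all_slices = slices(n, k)
nondeg = [C for C in all_slices if all(c > 0 for c in C)]
assert len(all_slices) == comb(n + k - 1, k - 1)  # stars and bars
assert len(nondeg) == comb(n - 1, k - 1)
```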

\subsection{Random restrictions}

\begin{definition} A \emph{restriction} on $[k]^n$ is a pair $(T, x_\barT)$, where $T \subseteq [n]$ and $x_\barT \in [k]^{\barT}$, where $\barT = [n] \setminus T$. It is thought of as a ``partial string'', where the $\barT$-coordinates are ``fixed'' according to $x_\barT$ and the $T$-coordinates are still ``free''. Given a fixing $y_T \in [k]^T$, we write $(x_\barT, y_T)$ for the complete composite string in $[k]^n$.\noteryan{Some may say this is bad notation, but I believe it is very useful notation. We can address this point.} \end{definition}

\begin{definition} Let $0 \leq \eps \leq 1$ and let $(\pi^m)$ be a family of probability distributions, where $\pi^m$ is a distribution on $[k]^m$. We say that $(T, x_\barT)$ is an \emph{$(\eps,\pi)$-random restriction} if $T$ is chosen by including each coordinate of $[n]$ independently with probability $\eps$, and then $x_\barT$ is drawn from $[k]^{\barT}$ according to $\pi^{|\barT|}$. \end{definition} In the above definition, $\pi^m$ is typically either equal-slices $\eqs{k}^m$ or a product distribution $\mu^{\otimes m}$ based on some distribution $\mu$ on $[k]$.
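The two-step sampling procedure in this definition can be sketched in Python; the function takes the distribution family as a callable that draws from $\pi^m$ given $m$ (names and interface are my own, for illustration):

```python
import random

def random_restriction(n, eps, draw_fixed, rng=random):
    # T: each coordinate of [n] is free independently with probability eps.
    T = sorted(i for i in range(1, n + 1) if rng.random() < eps)
    Tset = set(T)
    Tbar = [i for i in range(1, n + 1) if i not in Tset]
    # Fix the Tbar-coordinates by a single draw from the family pi^{|Tbar|}.
    x_fixed = dict(zip(Tbar, draw_fixed(len(Tbar))))
    return T, x_fixed

# Example family: the product distribution mu^{(x)m}, mu uniform on [3].
uniform_family = lambda m: [random.randrange(1, 4) for _ in range(m)]
T, x_fixed = random_restriction(20, 0.1, uniform_family)
```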

\begin{definition} In the setting of the previous definition, suppose that $(\nu^m)$ is another family of distributions. The \emph{$(\eps, \pi, \nu)$-composite distribution} generates an $(\eps,\pi)$-random restriction $(T, x_\barT)$ and then generates $y_T \in [k]^T$ according to $\nu^{|T|}$. We sometimes think of this distribution as just being a distribution on composite strings $z = (x_\barT, y_T) \in [k]^n$. \end{definition}

\begin{fact} If $\pi$ and $\nu$ are distributions on $[k]$, each thought of also as a product distribution family, then the $(\eps, \pi, \nu)$-composite distribution on $[k]^n$ is precisely the product distribution $\hat{\pi}^{\otimes n}$, where $\hat{\pi}$ is the mixture distribution $(1-\eps)\pi + \eps \nu$ on $[k]$. \end{fact}
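The Fact amounts to the distributive law: summing over which coordinates land in $T$ factors the composite probability of a string into a product of per-coordinate mixtures. A brute-force Python check (illustrative values of $\pi$, $\nu$, $\eps$; not from the text):

```python
from itertools import product

def composite_prob(z, pi, nu, eps):
    # Sum over all choices of the free set T of the probability of producing z:
    # a free coordinate contributes eps*nu[z_i], a fixed one (1-eps)*pi[z_i].
    total = 0.0
    for mask in product([0, 1], repeat=len(z)):  # 1 = coordinate is in T
        p = 1.0
        for free, zi in zip(mask, z):
            p *= eps * nu[zi] if free else (1 - eps) * pi[zi]
        total += p
    return total

pi, nu, eps = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3], 0.25
mixture = [(1 - eps) * p + eps * q for p, q in zip(pi, nu)]

# Composite distribution == product of the mixture, string by string.
for z in product(range(3), repeat=4):
    prod_prob = 1.0
    for zi in z:
        prod_prob *= mixture[zi]
    assert abs(composite_prob(z, pi, nu, eps) - prod_prob) < 1e-12
```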

Because of this, we can use Corollary~\ref{cor:mix-distance} to bound the total variation distance between $\pi^{\otimes n}$ and a composite distribution. For example, we conclude:

\begin{lemma} Let $\mu_k$ denote the uniform distribution on $[k]$, and let $\nu$ be any distribution on $[k]$. If $\mu_k'$ denotes the $(\eps,\mu_k,\nu)$-composite distribution on strings in $[k]^n$, then \[ \dtv{\mu_k^{\otimes n}}{\mu_k'} \leq \sqrt{k-1} \cdot \eps \sqrt{n}. \] \end{lemma}