Furstenberg-Katznelson argument

From Polymath Wiki
This proof of [[DHJ(3)]] is being pursued in our [http://terrytao.wordpress.com/2009/02/11/a-reading-seminar-on-density-hales-jewett/ reading seminar].  Below is an ongoing "combinatorial" translation of the argument.


'''Theorem [[DHJ(3)]]'''.  For every <math>\delta > 0</math> there exists n such that every subset <math>A \subset [3]^n</math> of density at least <math>\delta</math> contains a [[combinatorial line]].
'''Proof'''.  Suppose for contradiction that the claim failed; then there exists <math>\delta > 0</math>, a sequence of n going to infinity, and a sequence <math>A^{(n)} \subset [3]^n</math> of line-free sets, all with density at least <math>\delta</math>.
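To make the notion concrete, here is a small Python sketch (not part of the argument itself; the function names are ours) that enumerates the combinatorial lines of <math>[3]^n</math> by brute force and tests a set of words for line-freeness:

```python
from itertools import product

def combinatorial_lines(n):
    """Yield every combinatorial line of [3]^n as a triple (w^0, w^1, w^2).

    A line is given by a template in {0,1,2,'*'}^n with at least one
    wildcard '*'; the three points of the line are obtained by replacing
    every wildcard simultaneously by 0, 1, and 2.
    """
    for template in product((0, 1, 2, '*'), repeat=n):
        if '*' in template:
            yield tuple(tuple(a if c == '*' else c for c in template)
                        for a in (0, 1, 2))

def is_line_free(A, n):
    """True if the set A of words (tuples) in [3]^n contains no combinatorial line."""
    return not any(all(w in A for w in line) for line in combinatorial_lines(n))
```

For example, in <math>[3]^1</math> the only line is (0, 1, 2), so any two-element subset is line-free; in general there are <math>4^n - 3^n</math> combinatorial lines in <math>[3]^n</math>.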


== Step 1: Random sampling ==




For any fixed integer m, and with any n larger than m, we can then form a random line-free subset <math>A^{(n)}_m \subset [3]^m</math> by randomly embedding <math>[3]^m</math> inside <math>[3]^n</math>. More precisely,


# Pick m distinct positions <math>a_1,\ldots,a_m \in [n]</math> uniformly at random.
# Outside of these m positions, pick a random word x in <math>[3]^{[n] \backslash \{a_1,\ldots,a_m\}}</math> uniformly at random.
# Define the random sampling map <math>\phi_m^{(n)}: [3]^m \to [3]^n</math> by mapping <math>w_1 \ldots w_m</math> to the string that is equal to x outside of the positions <math>a_1,\ldots,a_m</math>, and then setting the position at <math>a_i</math> equal to <math>w_i</math>.
# Define <math>A^{(n)}_m := (\phi_m^{(n)})^{-1}(A^{(n)})</math>.

For instance, if n=5, m=2, <math>a_1 = 1, a_2 = 4</math>, and x places the letters 1, 0, 2 at the remaining positions 2, 3, 5, then <math>\phi_m^{(n)}(ab) = a10b2</math> for all <math>a,b \in \{0,1,2\}</math>.
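One sample of the four-step construction above might be coded as follows (an illustrative Python sketch; the function names are our own, not from the text):

```python
import random
from itertools import product

def random_sampling_map(n, m, rng=random):
    """Sample the data of steps 1-2 and return the map phi_m^{(n)}: [3]^m -> [3]^n."""
    positions = rng.sample(range(n), m)            # step 1: m distinct positions a_1..a_m
    x = {i: rng.randrange(3)                       # step 2: a uniform word on the
         for i in range(n) if i not in positions}  #         remaining n - m positions
    def phi(w):                                    # step 3: fill position a_i with w_i
        letters = dict(x)
        for a_i, w_i in zip(positions, w):
            letters[a_i] = w_i
        return tuple(letters[i] for i in range(n))
    return phi

def pullback(A, phi, m):
    """Step 4: one sample of the random set A_m^{(n)} = phi^{-1}(A)."""
    return {w for w in product(range(3), repeat=m) if phi(w) in A}
```

Note that <math>\phi_m^{(n)}</math> sends combinatorial lines of <math>[3]^m</math> to combinatorial lines of <math>[3]^n</math> (the wildcard coordinates of a template go to the positions <math>a_1,\ldots,a_m</math>), which is why the pullback of a line-free set is again line-free.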


One can think of <math>A^{(n)}_m</math> as a random variable taking values in the power set <math>X_m := 2^{[3]^m}</math>, so its probability distribution is a probability measure <math>\mu^{(n)}_m</math> on <math>X_m</math>.  Since <math>A^{(n)}</math> is line-free, and <math>\phi_m^{(n)}</math> maps combinatorial lines to combinatorial lines, the <math>A^{(n)}_m</math> are also line-free.


These random variables have two notable stationarity properties.  The first is permutation symmetry: the obvious action of the permutation group <math>S_m</math> on <math>X_m</math> preserves <math>\mu^{(n)}_m</math>, thus <math>A^{(n)}_m</math> is stationary with respect to this action.  The other symmetry comes from the ''slicing maps'' <math>U_{m+|w|}^w: X_{m+|w|} \to X_m</math> defined for any m, and any word <math>w \in [3]^{|w|}</math> of length |w| by the formula


:<math>U_{m+|w|}^w(A_{m+|w|}) := \{ v \in [3]^m: vw \in A_{m+|w|} \}</math>.


Observe the semigroupoid law


:<math>U_{m+|w|}^w U_{m+|w|+|u|}^u = U_{m+|wu|}^{wu}</math> (1)


for all words w, u.
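The slicing maps and the law (1) are easy to check numerically. In this hypothetical Python sketch, words are tuples and the concatenation vw is tuple addition:

```python
import random
from itertools import product

def slice_by(A, w):
    """The slicing map U^w: sends A a subset of [3]^{m+|w|} to {v in [3]^m : vw in A}."""
    k = len(w)
    if k == 0:
        return set(A)
    return {v[:-k] for v in A if v[-k:] == w}

# Spot-check of the semigroupoid law (1), U^w U^u = U^{wu}, on a random set.
rng = random.Random(2009)
A = {v for v in product(range(3), repeat=4) if rng.random() < 0.5}  # A in X_4
w, u = (2,), (0, 1)   # U^u: X_4 -> X_2 is applied first, then U^w: X_2 -> X_1
assert slice_by(slice_by(A, u), w) == slice_by(A, w + u)
```

The law holds for every set A, since v belongs to U^w(U^u(A)) exactly when vwu belongs to A.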


One has the <i>almost stationarity</i> property: for fixed m and w, the random variables <math>A^{(n)}_m</math> and <math>U_{m+|w|}^w(A^{(n)}_{m+|w|})</math> have asymptotically the same distribution in the limit <math>n \to \infty</math>.  This is because both of these random subsets of <math>[3]^m</math> are formed by pulling back <math>A^{(n)}</math> by a random m-dimensional combinatorial subspace; the probability distributions of these two random subspaces differ slightly, but one can compute that the total variation distance between them goes to zero in the limit <math>n \to \infty</math> (holding m, w fixed).


Taking a subsequence limit as <math>n \to \infty</math> (and using the Helly selection principle, a.k.a. the vague sequential compactness of probability measures), we can then find random line-free sets <math>A_m \subset [3]^m</math> (or equivalently, probability measures <math>\mu_m</math> on <math>X_m</math>) which are stationary with respect to the permutation groups <math>S_m</math> acting on <math>X_m</math>, and also with respect to the slicing maps <math>U_{m+|w|}^w: X_{m+|w|} \to X_m</math>.  Also, since <math>A^{(n)}_0</math> was non-empty with probability at least <math>\delta</math>, the same is true of <math>A_0</math>; by stationarity, this implies that for any <math>v \in [3]^m</math>, <math>v\in A_m</math> holds with probability at least <math>\delta</math> also.


== Step 2: Inverting the translations ==


This is all very well and good, but there is one minor technical problem which will cause difficulty in later parts of the argument: the slicing maps <math>U_{m+r}^w: X_{m+r} \to X_m</math> are not invertible.  We would like to (artificially) upgrade them to be invertible (thus upgrading the semigroupoid of transformations <math>U_{m+r}^w</math> to a groupoid). This can be done, but requires extending the measure spaces <math>X_m</math> a bit (or in probabilistic language, by adding random number generators).


Let's turn to the details.  It suffices to understand what is going on with the hyperplane slicing maps <math>U_{m+1}^i: X_{m+1} \to X_m</math> for <math>i=0,1,2</math>, since the more general maps can be expressed as compositions of these maps by (1).  The key lemma is


'''Lemma 1'''.  Let <math>(X,\mu_X)</math> and <math>(Y,\mu_Y)</math> be finite probability spaces, and let <math>U: X \to Y</math> be a surjective map which is probability-preserving (thus <math>U_* \mu_X = \mu_Y</math>, or equivalently <math>\mu_X(U^{-1}(E)) = \mu_Y(E)</math> for every <math>E \subset Y</math>).  Then one can lift U to an (almost surely) invertible map <math>\tilde U: X \times [0,1] \to Y \times [0,1]</math> on the product probability spaces (giving [0,1] Lebesgue measure) which forms a commuting square with the projections <math>\pi_X: X \times [0,1] \to X</math>, <math>\pi_Y: Y \times [0,1] \to Y</math>, such that <math>\tilde U, \tilde U^{-1}</math> are measure-preserving.


'''Proof''': By working one fibre of U at a time, one can reduce to the case when Y is a point (so that U is trivial, and <math>Y \times [0,1]</math> can be identified with <math>[0,1]</math>).  Order X as <math>x_1,\ldots,x_k</math>, with probabilities <math>p_1,\ldots,p_k</math>.  One can then take


:<math>\tilde U( x_j, t ) := p_1 + \ldots + p_{j-1} + p_j t</math>


for <math>j=1,\ldots,k</math> and <math>t \in [0,1]</math>; one easily verifies the desired properties.  <math>\Box</math>
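The lift in the proof can be written out explicitly. The following Python sketch (with Y a point, identifying <math>Y \times [0,1]</math> with <math>[0,1]</math>) builds <math>\tilde U</math> and its almost-everywhere inverse:

```python
def lift(ps):
    """Given probabilities p_1,...,p_k on the points x_1,...,x_k (Y a point),
    return the map U~(x_j, t) = p_1 + ... + p_{j-1} + p_j * t together with its
    inverse, which is defined off the (measure-zero) set of cut points."""
    cum = [0.0]
    for p in ps:
        cum.append(cum[-1] + p)
    def U(j, t):                         # j is 1-based, t in [0,1]
        return cum[j - 1] + ps[j - 1] * t
    def U_inv(s):
        for j in range(1, len(ps) + 1):  # find the cell [cum_{j-1}, cum_j) containing s
            if cum[j - 1] <= s < cum[j]:
                return j, (s - cum[j - 1]) / ps[j - 1]
        raise ValueError("s lies outside [0,1) or on a zero-mass cell")
    return U, U_inv

# With p = (1/2, 1/3, 1/6): U~ maps the cell {x_2} x [0,1] affinely onto [1/2, 5/6].
U, U_inv = lift([0.5, 1/3, 1/6])
assert U(1, 0.5) == 0.25
assert U_inv(0.25) == (1, 0.5)
```

Measure preservation is visible here: the cell <math>\{x_j\} \times [0,1]</math>, of mass <math>p_j</math>, is mapped affinely onto an interval of length <math>p_j</math>, so a uniformly chosen t together with <math>x_j</math> drawn with probability <math>p_j</math> produces a uniform point of [0,1].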


Applying this lemma, we may now construct invertible measure-preserving slicing maps <math>\tilde U_{m+1}^i: \tilde X_{m+1} \to \tilde X_m</math> that commute with the original slicing maps, where <math>\tilde X_m := X_m \times [0,1]</math> with product measure.  We can also extend these maps to negative m, by declaring <math>X_m</math> to be a point in these cases (and the slicing map to be trivial).  We then have the invertible measure-preserving slicing map <math>\tilde U_{m+r}^w: \tilde X_{m+r} \to \tilde X_m</math> for all words <math>w \in [3]^r</math> of length r, defined by the semigroupoid law


:<math>\tilde U_{m+r}^{w_1 \ldots w_r} = \tilde U_{m+1}^{w_1} \ldots \tilde U_{m+r}^{w_r}.</math>


In fact, we can now extend this groupoid action not just to words, but to any element w of the free group <math>F_3</math> generated by the alphabet 0,1,2 and the formal inverses <math>0^{-1}, 1^{-1}, 2^{-1}</math>, defining <math>\tilde U_{m+|w|}^w: \tilde X_{m+|w|} \to \tilde X_m</math> using the semigroupoid law (1) and the inversion law


:<math>(\tilde U_{m+|w|}^w)^{-1} := \tilde U_m^{w^{-1}}</math>.


Here we adopt the convention that the inverses <math>0^{-1}, 1^{-1}, 2^{-1}</math> have length -1.  Thus for instance <math>\tilde U_5^{012^{-1}1}: \tilde X_5 \to \tilde X_3</math> is the map


:<math>\tilde U_5^{012^{-1}1} = \tilde U_4^0 \tilde U_5^1 (\tilde U_5^2)^{-1} \tilde U_5^1</math>,
since by the inversion law the factor for <math>2^{-1}</math> is <math>\tilde U_4^{2^{-1}} = (\tilde U_5^2)^{-1}</math>.
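This index bookkeeping can be mechanized as a sanity check. The following illustrative Python sketch assigns each letter its signed length and computes, for each factor, its subscript, which we take (as in the displayed formulas above) to be the factor's domain index:

```python
def expand(word, m):
    """word: list of (letter, sign) pairs, sign = +1 for a letter of the
    alphabet and -1 for its formal inverse; m: the target index.  Returns the
    factors of U^word : X_{m+|word|} -> X_m as (letter, sign, subscript)
    triples, left to right, where a forward factor U_k^a maps X_k -> X_{k-1}
    and an inverse factor (U_k^a)^{-1} maps X_{k-1} -> X_k."""
    length = sum(sign for _, sign in word)
    k = m + length                       # domain index of the rightmost factor
    factors = []
    for letter, sign in reversed(word):  # factors apply right to left
        factors.append((letter, sign, k if sign > 0 else k + 1))
        k -= sign
    assert k == m                        # the chain must land on the target X_m
    return factors[::-1]

# The word 0 1 2^{-1} 1 with target m = 3 gives a map X_5 -> X_3:
print(expand([(0, +1), (1, +1), (2, -1), (1, +1)], 3))
# -> [(0, 1, 4), (1, 1, 5), (2, -1, 5), (1, 1, 5)]
```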

Revision as of 11:41, 14 February 2009
