Furstenberg-Katznelson argument

From Polymath Wiki
Jump to navigationJump to search

This proof of DHJ(3) is being pursued in our reading seminar. Below is an ongoing "combinatorial" translation of the argument.


Step 1: Random sampling

Suppose for contradiction that DHJ(3); then there exists [math]\displaystyle{ \delta \gt 0 }[/math], a sequence of n going to infinity, and a sequence [math]\displaystyle{ A^{(n)} \subset [3]^n }[/math] of line-free sets, all with density at least [math]\displaystyle{ \delta }[/math].

For any fixed integer m, and with any n larger than m, we can then form a random line-free subset [math]\displaystyle{ A^{(n)}_m \subset [3]^m }[/math] by randomly embedding [math]\displaystyle{ [3]^m }[/math] inside [math]\displaystyle{ [3]^n }[/math]. More precisely,

  1. Pick m distinct positions [math]\displaystyle{ a_1,\ldots,a_m \in [n] }[/math] uniformly at random.
  2. Outside of these m positions, pick a random word x in [math]\displaystyle{ [3]^{[n] \backslash \{a_1,\ldots,a_m\}} }[/math] uniformly at random.
  3. Define the random sampling map [math]\displaystyle{ \phi_m^{(n)}: [3]^m \to [3]^n }[/math] by mapping [math]\displaystyle{ w_1 \ldots w_m }[/math] to the string that is equal to x outside of the positions [math]\displaystyle{ a_1,\ldots,a_m }[/math], and then setting the position at [math]\displaystyle{ a_i }[/math] equal to [math]\displaystyle{ w_i }[/math].
  4. Define [math]\displaystyle{ A^{(n)}_m := (\phi_m^{(n)})^{-1}(A^{(n)}) }[/math].

One can think of [math]\displaystyle{ A^{(n)}_m }[/math] as a random variable taking values in the power set [math]\displaystyle{ X_m := 2^{[3]^m} }[/math], so its probability distribution is a probability measure [math]\displaystyle{ \mu^{(n)}_m }[/math] on [math]\displaystyle{ X_m }[/math]. Since [math]\displaystyle{ A^{(n)} }[/math] is line-free, the [math]\displaystyle{ A^{(n)}_m }[/math] are also.

These random variables have two notable stationarity properties. The first is permutation symmetry: the obvious action of the permutation group [math]\displaystyle{ S_m }[/math] on [math]\displaystyle{ X_m }[/math] preserves [math]\displaystyle{ \mu^{(n)}_m }[/math], thus [math]\displaystyle{ A^{(n)}_m }[/math] is stationary with respect to this action. The other symmetry comes from the slicing maps [math]\displaystyle{ U_{m+|w|}^w: X_{m+|w|} \to X_m }[/math] defined for any m, and any word [math]\displaystyle{ w \in [3]^{|w|} }[/math] of length |w| by the formula

[math]\displaystyle{ U_{m+|w|}^w(A_{m+|w|}) := \{ v \in [3]^m: vw \in A_{m+|w|} \} }[/math].

Observe the semigroupoid law

[math]\displaystyle{ U_{m+|w|}^w U_{m+|w|+|u|}^u = U_{m+|wu|}^{wu} }[/math] (1)

for all words w, u.

One has the almost stationary property: for fixed m and w, the random variables [math]\displaystyle{ A^{(n)}_m }[/math] and [math]\displaystyle{ U_{m+|w|}^w(A^{(n)}_{m+|w|}) }[/math] have asymptotically the same distribution in the limit [math]\displaystyle{ n \to \infty }[/math]. This is because both of these random subsets of [math]\displaystyle{ [3]^m }[/math] are formed by pulling back [math]\displaystyle{ A^{(n)} }[/math] by a random m-dimensional subspace; the probability distributions of these random subspaces differ slightly, but one can compute that the total variation of the difference goes to zero in the limit [math]\displaystyle{ n \to \infty }[/math] (holding m,w fixed).

Taking a subsequence limit as [math]\displaystyle{ n \to \infty }[/math] (and using the Helly selection principle, a.k.a. the vague sequential compactness of probability measures), we can then find random line-free sets [math]\displaystyle{ A_m \subset [3]^m }[/math] (or equivalently, a probability measure [math]\displaystyle{ \mu_m }[/math] on [math]\displaystyle{ X_m }[/math]) which is stationary with respect to the permutation groups [math]\displaystyle{ S_m }[/math] acting on [math]\displaystyle{ X_m }[/math], and also with respect to the slicing maps [math]\displaystyle{ U_{m+|w|}^w: X_{m+|w|} \to X_m }[/math]. Also, since [math]\displaystyle{ A^{(n)}_0 }[/math] was non-empty with probability at least [math]\displaystyle{ \delta }[/math], then so does [math]\displaystyle{ A_0 }[/math]; by stationarity, this implies that for any [math]\displaystyle{ v \in [3]^m }[/math], that [math]\displaystyle{ v\in A_m }[/math] with probability at least [math]\displaystyle{ \delta }[/math] also.

Step 2. Inverting the translations

This is all very well and good, but there is one minor technical problem which will cause difficulty in later parts of the argument: the slicing maps [math]\displaystyle{ U_{m+|w|}^w: X_{m+r} \to X_m }[/math] are not invertible. We would like to (artificially) upgrade them to be invertible (thus upgrading the semigroupoid of transformations [math]\displaystyle{ U_{m+r}^w }[/math] to a groupoid). This can be done, but requires extending the measure spaces $latex X_m$ a bit (or in probabilistic language, by adding random number generators).

Let's turn to the details. It suffices to understand what is going on with the hyperplane slicing maps [math]\displaystyle{ U_{m+1}^i: X_{m+1} \to X_m }[/math] for [math]\displaystyle{ i=0,1,2 }[/math], since the more general maps can be expressed as compositions of these maps by (1). The key lemma is

Lemma 1 Let [math]\displaystyle{ (X,\mu_X) }[/math] and [math]\displaystyle{ (Y,\mu_Y) }[/math] be finite probability spaces, and let [math]\displaystyle{ U: X \to Y }[/math] be a surjective map which is probability-preserving (thus [math]\displaystyle{ U_* \mu_X = \mu_Y }[/math], or equivalently that [math]\displaystyle{ \mu_X(U^{-1}(E)) = \mu_Y(E) }[/math] for every [math]\displaystyle{ E \subset Y }[/math]). Then one can lift U to an (almost surely) invertible map [math]\displaystyle{ \tilde U: X \times [0,1] \to Y \times [0,1] }[/math] on the product probability spaces which forms a commuting square with the projections [math]\displaystyle{ \pi_X: X \times [0,1] \to X }[/math], [math]\displaystyle{ \pi_Y: Y \times [0,1] \to Y }[/math], such that [math]\displaystyle{ \tilde U, \tilde U^{-1} }[/math] are measure-preserving.

Proof: By working one fibre of U at a time, one can reduce to the case when Y is a point (so that U is trivial). Order X as [math]\displaystyle{ x_1,\ldots,x_k }[/math], with probabilities [math]\displaystyle{ p_1,\ldots,p_k }[/math]. One can then take

[math]\displaystyle{ \tilde U( x_j, t ) := p_1 + \ldots + p_{j-1} + p_j t }[/math]

for [math]\displaystyle{ j=1,\ldots,k }[/math] and [math]\displaystyle{ t \in [0,1] }[/math]; one easily verifies the desired properties. [math]\displaystyle{ \Box }[/math]

Applying this lemma, we may now construct invertible measure-preserving slicing maps [math]\displaystyle{ \tilde U_{m+1}^i: \tilde X_{m+1} \to \tilde X_m }[/math] that commute with the original slicing maps, where [math]\displaystyle{ \tilde X_m := X_m \times [0,1] }[/math] with product measure. We can also extend these maps to negative m, by declaring [math]\displaystyle{ X_m }[/math] to be a point in these cases (and the slicing map to be trivial). We then have the invertible measure-preserving slicing map [math]\displaystyle{ \tilde U_{m+r}^w: \tilde X_{m+r} \to \tilde X_m }[/math] for all words [math]\displaystyle{ w \in [3]^r }[/math] of length r, defined by the semigroupoid law

[math]\displaystyle{ \tilde U_{m+r}^{w_1 \ldots w_r} = \tilde U_{m+1}^{w_1} \ldots \tilde U_{m+r}^{w_r}. }[/math]

In fact, we can now extend this groupoid action not just to words, but to any element w of the free group [math]\displaystyle{ F_3 }[/math] generated by the alphabet 0,1,2 and the formal inverses [math]\displaystyle{ 0^{-1}, 1^{-1}, 2^{-1} }[/math], defining [math]\displaystyle{ \tilde U_{m+|w|}^w: \tilde X_{m+|w|} \to \tilde X_m }[/math] using the semigroupoid law (1) and the inversion law

[math]\displaystyle{ (\tilde U_{m+|w|}^w)^{-1} := \tilde U_m^{w^{-1}} }[/math].

Here we adopt the convention that the inverses [math]\displaystyle{ 0^{-1}, 1^{-1}, 2^{-1} }[/math] have length -1. Thus for instance [math]\displaystyle{ \tilde U_5^{012^{-1}1}: \tilde X_5 \to \tilde X_3 }[/math] is the map

[math]\displaystyle{ \tilde U_5^{012^{-1}1} = \tilde U_4^0 \tilde U_5^1 (\tilde U_4^2)^{-1} \tilde U_5^1 }[/math].

Step 3: To be continued...