Furstenberg-Katznelson argument
This proof of DHJ(3) is being pursued in our reading seminar. Below is an ongoing "combinatorial" translation of the argument.
Step 1: Random sampling
Suppose for contradiction that DHJ(3) is false; then there exists [math]\displaystyle{ \delta > 0 }[/math], a sequence of dimensions n going to infinity, and a sequence [math]\displaystyle{ A^{(n)} \subset [3]^n }[/math] of line-free sets, all with density at least [math]\displaystyle{ \delta }[/math].
For any fixed integer m, and any n larger than m, we can then form a random line-free subset [math]\displaystyle{ A^{(n)}_m \subset [3]^m }[/math] by randomly embedding [math]\displaystyle{ [3]^m }[/math] inside [math]\displaystyle{ [3]^n }[/math]. More precisely (see the sketch after this list),
- Pick m distinct positions [math]\displaystyle{ a_1,\ldots,a_m \in [n] }[/math] uniformly at random.
- Outside of these m positions, pick a random word x in [math]\displaystyle{ [3]^{[n] \backslash \{a_1,\ldots,a_m\}} }[/math] uniformly at random.
- Define the random sampling map [math]\displaystyle{ \phi_m^{(n)}: [3]^m \to [3]^n }[/math] by mapping [math]\displaystyle{ w_1 \ldots w_m }[/math] to the string that is equal to x outside of the positions [math]\displaystyle{ a_1,\ldots,a_m }[/math], and then setting the position at [math]\displaystyle{ a_i }[/math] equal to [math]\displaystyle{ w_i }[/math].
- Define [math]\displaystyle{ A^{(n)}_m := (\phi_m^{(n)})^{-1}(A^{(n)}) }[/math].
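To make the sampling construction concrete, here is a minimal Python sketch (the helper names `random_sampling_map`, `pullback`, and `is_line_free` are ours, purely for illustration, and are not from the source):

```python
import itertools
import random

def random_sampling_map(n, m):
    """One draw of the map phi_m^(n): [3]^m -> [3]^n described in the list above."""
    positions = random.sample(range(n), m)        # the m distinct positions a_1,...,a_m
    x = [random.randrange(3) for _ in range(n)]   # the random filler word (used off those positions)
    def phi(w):                                   # plant w_1,...,w_m at the chosen positions
        s = list(x)
        for a_i, w_i in zip(positions, w):
            s[a_i] = w_i
        return tuple(s)
    return phi

def pullback(A, n, m):
    """One sample of A_m := (phi_m^(n))^{-1}(A), where A is a set of tuples in [3]^n."""
    phi = random_sampling_map(n, m)
    return {w for w in itertools.product(range(3), repeat=m) if phi(w) in A}

def is_line_free(A, m):
    """True iff A, a subset of [3]^m, contains no combinatorial line."""
    for tmpl in itertools.product(range(4), repeat=m):  # the value 3 marks wildcard positions
        if 3 not in tmpl:
            continue                                    # a line needs at least one wildcard
        line = [tuple(i if c == 3 else c for c in tmpl) for i in range(3)]
        if all(p in A for p in line):
            return False
    return True
```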
One can think of [math]\displaystyle{ A^{(n)}_m }[/math] as a random variable taking values in the power set [math]\displaystyle{ X_m := 2^{[3]^m} }[/math], so its probability distribution is a probability measure [math]\displaystyle{ \mu^{(n)}_m }[/math] on [math]\displaystyle{ X_m }[/math]. Since [math]\displaystyle{ A^{(n)} }[/math] is line-free, and [math]\displaystyle{ \phi_m^{(n)} }[/math] maps combinatorial lines in [math]\displaystyle{ [3]^m }[/math] to combinatorial lines in [math]\displaystyle{ [3]^n }[/math], the [math]\displaystyle{ A^{(n)}_m }[/math] are also line-free.
These random variables have two notable stationarity properties. The first is permutation symmetry: the obvious action of the permutation group [math]\displaystyle{ S_m }[/math] on [math]\displaystyle{ X_m }[/math] preserves [math]\displaystyle{ \mu^{(n)}_m }[/math], thus [math]\displaystyle{ A^{(n)}_m }[/math] is stationary with respect to this action. The other symmetry comes from the slicing maps [math]\displaystyle{ U_{m+|w|}^w: X_{m+|w|} \to X_m }[/math] defined for any m, and any word [math]\displaystyle{ w \in [3]^{|w|} }[/math] of length |w| by the formula
- [math]\displaystyle{ U_{m+|w|}^w(A_{m+|w|}) := \{ v \in [3]^m: vw \in A_{m+|w|} \} }[/math].
Observe the semigroupoid law
- [math]\displaystyle{ U_{m+|w|}^w U_{m+|w|+|u|}^u = U_{m+|wu|}^{wu} }[/math] (1)
for all words w, u.
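Continuing the hypothetical Python sketch, the slicing maps are easy to express, and the semigroupoid law (1) can be spot-checked on an arbitrary test set:

```python
import itertools

def slice_map(A, w):
    """U^w(A) := { v : vw in A }, where vw denotes concatenation; w must be nonempty."""
    k = len(w)
    return {a[:-k] for a in A if a[-k:] == tuple(w)}

# Spot-check of (1): U^w U^u = U^{wu} (the right-hand factor acts first).
A = {a for a in itertools.product(range(3), repeat=5) if sum(a) % 2 == 0}
w, u = (0, 2), (1,)
assert slice_map(slice_map(A, u), w) == slice_map(A, w + u)
```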
One has the following almost-stationarity property: for fixed m and w, the random variables [math]\displaystyle{ A^{(n)}_m }[/math] and [math]\displaystyle{ U_{m+|w|}^w(A^{(n)}_{m+|w|}) }[/math] have asymptotically the same distribution in the limit [math]\displaystyle{ n \to \infty }[/math]. This is because both of these random subsets of [math]\displaystyle{ [3]^m }[/math] are formed by pulling back [math]\displaystyle{ A^{(n)} }[/math] by a random m-dimensional subspace; the probability distributions of these random subspaces differ slightly, but one can compute that the total variation distance between them goes to zero in the limit [math]\displaystyle{ n \to \infty }[/math] (holding m and w fixed).
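This asymptotic agreement can also be probed numerically. The following rough Monte Carlo sketch (reusing `random_sampling_map` and `slice_map` from the snippets above, with a predicate version of `pullback` so that large n is feasible; all names again ours) estimates the total variation distance between the two laws:

```python
from collections import Counter
import itertools

def pullback_pred(in_A, n, m):
    """Like `pullback` above, but with A given by a membership predicate."""
    phi = random_sampling_map(n, m)
    return frozenset(w for w in itertools.product(range(3), repeat=m) if in_A(phi(w)))

def empirical_dist(sampler, trials):
    counts = Counter(sampler() for _ in range(trials))
    return {S: c / trials for S, c in counts.items()}

def tv(p, q):
    return 0.5 * sum(abs(p.get(S, 0.0) - q.get(S, 0.0)) for S in set(p) | set(q))

# Law of A_m^(n) versus law of U^w(A_{m+|w|}^(n)), for an arbitrary test set
# whose slices depend on coordinate counts.
n, m, w, trials = 60, 2, (1,), 4000
in_A = lambda a: 3 * a.count(1) > len(a)
p = empirical_dist(lambda: pullback_pred(in_A, n, m), trials)
q = empirical_dist(lambda: frozenset(slice_map(pullback_pred(in_A, n, m + len(w)), w)), trials)
print(tv(p, q))   # small, and decreasing as n grows, up to sampling noise
```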
Taking a subsequence limit as [math]\displaystyle{ n \to \infty }[/math] (and using the Helly selection principle, a.k.a. the vague sequential compactness of probability measures), we can then find random line-free sets [math]\displaystyle{ A_m \subset [3]^m }[/math] (or equivalently, probability measures [math]\displaystyle{ \mu_m }[/math] on [math]\displaystyle{ X_m }[/math]) which are stationary with respect to the permutation groups [math]\displaystyle{ S_m }[/math] acting on [math]\displaystyle{ X_m }[/math], and also with respect to the slicing maps [math]\displaystyle{ U_{m+|w|}^w: X_{m+|w|} \to X_m }[/math]. Also, since [math]\displaystyle{ A^{(n)}_0 }[/math] was non-empty with probability at least [math]\displaystyle{ \delta }[/math], so is [math]\displaystyle{ A_0 }[/math]; by stationarity, this implies that for any [math]\displaystyle{ v \in [3]^m }[/math], [math]\displaystyle{ v\in A_m }[/math] with probability at least [math]\displaystyle{ \delta }[/math] also.
Step 2: Inverting the translations
This is all very well and good, but there is one minor technical problem which will cause difficulty in later parts of the argument: the slicing maps [math]\displaystyle{ U_{m+r}^w: X_{m+r} \to X_m }[/math] (for words w of length r) are not invertible. We would like to (artificially) upgrade them to be invertible (thus upgrading the semigroupoid of transformations [math]\displaystyle{ U_{m+r}^w }[/math] to a groupoid). This can be done, but requires extending the measure spaces [math]\displaystyle{ X_m }[/math] a bit (or in probabilistic language, by adding random number generators).
Let's turn to the details. It suffices to understand what is going on with the hyperplane slicing maps [math]\displaystyle{ U_{m+1}^i: X_{m+1} \to X_m }[/math] for [math]\displaystyle{ i=0,1,2 }[/math], since the more general maps can be expressed as compositions of these maps by (1). The key lemma is
Lemma 1 Let [math]\displaystyle{ (X,\mu_X) }[/math] and [math]\displaystyle{ (Y,\mu_Y) }[/math] be finite probability spaces, and let [math]\displaystyle{ U: X \to Y }[/math] be a surjective map which is probability-preserving (thus [math]\displaystyle{ U_* \mu_X = \mu_Y }[/math], or equivalently that [math]\displaystyle{ \mu_X(U^{-1}(E)) = \mu_Y(E) }[/math] for every [math]\displaystyle{ E \subset Y }[/math]). Then one can lift U to an (almost surely) invertible map [math]\displaystyle{ \tilde U: X \times [0,1] \to Y \times [0,1] }[/math] on the product probability spaces which forms a commuting square with the projections [math]\displaystyle{ \pi_X: X \times [0,1] \to X }[/math], [math]\displaystyle{ \pi_Y: Y \times [0,1] \to Y }[/math], such that [math]\displaystyle{ \tilde U, \tilde U^{-1} }[/math] are measure-preserving.
Proof: By working one fibre of U at a time, one can reduce to the case when Y is a point (so that U is trivial, and [math]\displaystyle{ Y \times [0,1] }[/math] may be identified with [math]\displaystyle{ [0,1] }[/math]). Order X as [math]\displaystyle{ x_1,\ldots,x_k }[/math], with probabilities [math]\displaystyle{ p_1,\ldots,p_k }[/math]. One can then take
- [math]\displaystyle{ \tilde U( x_j, t ) := p_1 + \ldots + p_{j-1} + p_j t }[/math]
for [math]\displaystyle{ j=1,\ldots,k }[/math] and [math]\displaystyle{ t \in [0,1] }[/math]; one easily verifies the desired properties. [math]\displaystyle{ \Box }[/math]
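For the case where Y is a single point (to which the proof reduces), the construction is just the inverse-CDF trick. Here is a small numerical sketch of it, with all names ours; we assume the [math]\displaystyle{ p_j }[/math] are strictly positive, discarding any zero-mass atoms:

```python
import bisect
import random

def make_lift(probs):
    """Return the lift U~(x_j, t) = p_1 + ... + p_{j-1} + p_j * t of Lemma 1
    (with Y a point, so Y x [0,1] is identified with [0,1]) and its a.s. inverse."""
    assert all(p > 0 for p in probs)       # zero-mass atoms are discarded up to null sets
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)            # cumulative sums p_1 + ... + p_j

    def U(j, t):                           # j is 1-based, t in [0,1]
        return cum[j - 1] + probs[j - 1] * t

    def U_inv(s):                          # well-defined off a measure-zero set of s
        j = bisect.bisect_right(cum, s)    # locate the interval [cum[j-1], cum[j]) containing s
        j = min(max(j, 1), len(probs))
        return j, (s - cum[j - 1]) / probs[j - 1]

    return U, U_inv

# Round-trip check: U~ composed with U~^{-1} is the identity off a null set.
U, U_inv = make_lift([0.2, 0.3, 0.5])
s = random.random()
assert abs(U(*U_inv(s)) - s) < 1e-12
```

If (j, t) is drawn from the product measure (mass [math]\displaystyle{ p_j }[/math] on the atoms, Lebesgue measure on t), then U(j, t) is uniform on [0,1], which is the measure preservation asserted in the lemma.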
Applying this lemma, we may now construct invertible measure-preserving slicing maps [math]\displaystyle{ \tilde U_{m+1}^i: \tilde X_{m+1} \to \tilde X_m }[/math] that lift the original slicing maps (i.e., form commuting squares with the projections [math]\displaystyle{ \tilde X_m \to X_m }[/math]), where [math]\displaystyle{ \tilde X_m := X_m \times [0,1] }[/math] with the product measure. We can also extend these maps to negative m, by declaring [math]\displaystyle{ X_m }[/math] to be a point in these cases (and the slicing map to be trivial). We then have the invertible measure-preserving slicing maps [math]\displaystyle{ \tilde U_{m+r}^w: \tilde X_{m+r} \to \tilde X_m }[/math] for all words [math]\displaystyle{ w \in [3]^r }[/math] of length r, defined by the semigroupoid law
- [math]\displaystyle{ \tilde U_{m+r}^{w_1 \ldots w_r} = \tilde U_{m+1}^{w_1} \ldots \tilde U_{m+r}^{w_r}. }[/math]
In fact, we can now extend this groupoid action not just to words, but to any element w of the free group [math]\displaystyle{ F_3 }[/math] generated by the alphabet 0,1,2 and the formal inverses [math]\displaystyle{ 0^{-1}, 1^{-1}, 2^{-1} }[/math], defining [math]\displaystyle{ \tilde U_{m+|w|}^w: \tilde X_{m+|w|} \to \tilde X_m }[/math] using the semigroupoid law (1) and the inversion law
- [math]\displaystyle{ (\tilde U_{m+|w|}^w)^{-1} := \tilde U_m^{w^{-1}} }[/math]. (2)
Here we adopt the convention that the inverses [math]\displaystyle{ 0^{-1}, 1^{-1}, 2^{-1} }[/math] have length -1. Thus for instance [math]\displaystyle{ \tilde U_5^{012^{-1}1}: \tilde X_5 \to \tilde X_3 }[/math] is the map
- [math]\displaystyle{ \tilde U_5^{012^{-1}1} = \tilde U_4^0 \tilde U_5^1 (\tilde U_5^2)^{-1} \tilde U_5^1 }[/math].
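The index bookkeeping in such expressions can be mechanized. The following hypothetical Python helper (names ours) factors a word of the free group into generators and inverse generators using the convention above, and checks that consecutive domains and codomains chain up:

```python
def factor_word(word, m):
    """Factor U~^w_{m+|w|}: X~_{m+|w|} -> X~_m into hyperplane slicing maps.

    `word` is a list of (letter, sign) pairs, with sign -1 denoting a formal
    inverse; returns (name, domain index, codomain index) triples, leftmost
    factor first, with composition read right to left.
    """
    factors, k = [], m
    for letter, sign in word:
        k += sign                          # running index m + |l_1 ... l_i|
        if sign == +1:
            factors.append((f"U_{k}^{letter}", k, k - 1))          # X~_k -> X~_{k-1}
        else:                              # by the inversion law (2)
            factors.append((f"(U_{k + 1}^{letter})^-1", k, k + 1)) # X~_k -> X~_{k+1}
    for (_, dom_left, _), (_, _, cod_right) in zip(factors, factors[1:]):
        assert dom_left == cod_right       # consecutive factors must chain up
    assert factors[0][2] == m and factors[-1][1] == k  # overall map X~_{m+|w|} -> X~_m
    return factors

# The worked example above, w = 0 1 2^{-1} 1 with m = 3:
print(factor_word([('0', +1), ('1', +1), ('2', -1), ('1', +1)], m=3))
# [('U_4^0', 4, 3), ('U_5^1', 5, 4), ('(U_5^2)^-1', 4, 5), ('U_5^1', 5, 4)]
```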
Observe that for any word [math]\displaystyle{ w \in [3]^n }[/math], the event "[math]\displaystyle{ w \in A_n }[/math]" (viewed as an event in [math]\displaystyle{ \tilde X_n }[/math]) can be expressed as [math]\displaystyle{ (\tilde U_n^w)^{-1}(E) }[/math], where E is the event in [math]\displaystyle{ \tilde X_0 }[/math] that [math]\displaystyle{ A_0 }[/math] is non-empty.
To prove DHJ(3), it now suffices to show
DHJ(3)(c*). Suppose one has a family of probability spaces [math]\displaystyle{ \tilde X_n }[/math], together with measure-preserving invertible maps [math]\displaystyle{ \tilde U_{m+|w|}^w: \tilde X_{m+|w|} \to \tilde X_m }[/math] obeying the groupoid laws (1), (2). Let [math]\displaystyle{ E \subset \tilde X_0 }[/math] be a set of positive measure. Then there exists a combinatorial line [math]\displaystyle{ w^0, w^1, w^2 }[/math] in some [math]\displaystyle{ [3]^n }[/math] such that the triple intersection of [math]\displaystyle{ (\tilde U_n^{w^0})^{-1}(E) }[/math], [math]\displaystyle{ (\tilde U_n^{w^1})^{-1}(E) }[/math], [math]\displaystyle{ (\tilde U_n^{w^2})^{-1}(E) }[/math] has positive measure.
We shall first prove the following weaker statement:
DHJ(2.6). Suppose one has a family of probability spaces [math]\displaystyle{ \tilde X_n }[/math], together with measure-preserving invertible maps [math]\displaystyle{ \tilde U_{m+|w|}^w: \tilde X_{m+|w|} \to \tilde X_m }[/math] obeying the groupoid laws (1), (2). Let [math]\displaystyle{ E \subset \tilde X_0 }[/math] be a set of positive measure. Then there exists a combinatorial line [math]\displaystyle{ w^0, w^1, w^2 }[/math] in some [math]\displaystyle{ [3]^n }[/math] such that each pairwise intersection of [math]\displaystyle{ (\tilde U_n^{w^0})^{-1}(E) }[/math], [math]\displaystyle{ (\tilde U_n^{w^1})^{-1}(E) }[/math], [math]\displaystyle{ (\tilde U_n^{w^2})^{-1}(E) }[/math] has positive measure.
Step 3: To be continued...
...
Commentary
Austin.6: At no point does an action of an easy-to-work-with _group_ of measure-preserving transformations emerge during the DHJ proof. Furstenberg and Katznelson introduce an enormous family of measure-preserving transformations indexed by all finite-length words in an alphabet of size k (and having some algebraic structure in terms of that family of words) in order to encode their notion of stationarity, but they neither use nor prove anything about the group generated by these transformations. Rather, they identify within their collection a great many 'IP systems' of transformations, a rather weaker sort of algebraic structure, and exploit that. In order to reach this point, they first obtain a kind of 'stationarity' for a family of positive-measure sets indexed by these words before introducing the transformations; expressing that stationarity in terms of these transformations then in effect locks the notion in place, and also gives a way to transport various multicorrelations between functions around on the underlying probability space.
In this connexion, it’s in obtaining this notion of stationarity that Furstenberg and Katznelson first use the Carlson-Simpson Theorem, and they then use it repeatedly throughout the paper in order to recover some similar kind of symmetry among a collection of objects indexed by the finite-length words. At this point, I can’t see where the same need would arise in the approach to DHJ under discussion here.