Furstenberg-Katznelson argument

This proof of [[DHJ(3)]] is being pursued in our [http://terrytao.wordpress.com/2009/02/11/a-reading-seminar-on-density-hales-jewett/ reading seminar].  Below is an ongoing "combinatorial" translation of the argument. [Very informal!]


We write <math>[3] = \{0,1,2\}</math> for our alphabet.  We let <math>Lines(n)</math> be the space of all combinatorial lines <math>\ell: [3] \to [3]^n</math> with "few" wildcards, where we will be vague about what "few" means.  We also have <math>Planes(n)</math>, the space of combinatorial planes <math>\pi: [3]^2 \to [3]^n</math>, and <math>Spaces(n)</math>, the space of combinatorial 3-spaces <math>\sigma: [3]^3 \to [3]^n</math>.  Note that one can think of a combinatorial plane in <math>[3]^n</math> as a combinatorial line in Lines(n), etc.
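
Here is a small Python sketch (not part of the argument) that enumerates these lines for small n; since the text is deliberately vague about what "few" means, the cutoff <code>max_wild</code> below is an arbitrary choice.

<pre>
# Sketch (not from the original text): enumerate the combinatorial lines of
# [3]^n with at most max_wild wildcards.  A line is represented by a template
# with None at the wildcard coordinates; ell(template, a) substitutes a for the Nones.
import itertools

def lines(n, max_wild=2):
    for r in range(1, max_wild + 1):
        for wild in itertools.combinations(range(n), r):
            rest = [i for i in range(n) if i not in wild]
            for vals in itertools.product((0, 1, 2), repeat=len(rest)):
                template = [None] * n
                for i, v in zip(rest, vals):
                    template[i] = v
                yield tuple(template)

def ell(template, a):
    """The point of the line obtained by setting every wildcard to a."""
    return tuple(a if t is None else t for t in template)

# Example: [3]^2 has 7 combinatorial lines in total.
print(sum(1 for _ in lines(2, 2)))
</pre>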


[[DHJ(3)]] is then the claim that if <math>f: [3]^n \to [0,1]</math> has large mean, then


:<math>{\Bbb E} f(\ell(0)) f(\ell(1)) f(\ell(2))</math> is large,


where <math>\ell</math> ranges over Lines(n).
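
Here is a small Monte Carlo sketch of this quantity (not part of the argument; the text leaves the distribution on Lines(n) vague, so the sampling below is an ad hoc choice).

<pre>
# Sketch (not from the original text): a Monte Carlo estimate of
# E f(l(0)) f(l(1)) f(l(2)) over random lines with few wildcards.
import itertools
import random
from statistics import mean

def random_line(n, max_wild=2):
    r = random.randint(1, max_wild)
    wild = set(random.sample(range(n), r))
    fixed = [random.choice((0, 1, 2)) for _ in range(n)]
    return lambda a: tuple(a if i in wild else fixed[i] for i in range(n))

def triple_average(f, n, samples=20000):
    return mean(f(l(0)) * f(l(1)) * f(l(2))
                for l in (random_line(n) for _ in range(samples)))

# For the indicator of a random set of density about 1/2, the three points of
# a random line behave almost independently, so the estimate is close to 1/8.
n = 7
A = {x for x in itertools.product((0, 1, 2), repeat=n) if random.random() < 0.5}
print(triple_average(lambda x: 1.0 if x in A else 0.0, n))
</pre>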


== 12-low influence ==


We say that a function <math>f: [3]^n \to [-1,1]</math> is ''12-low influence'' if


:<math>{\Bbb E} |f(x)-f(y)|^2</math> is small,


where x is drawn uniformly from <math>[3]^n</math> and y is formed from x by randomly flipping a 1 to a 2 or vice versa. Equivalently, f is 12-low influence if


<math>{\Bbb E} |f(\ell(1)) - f(\ell(2))|^2</math> is small,


where <math>\ell</math> is drawn randomly from Lines(n).
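
A numerical sketch of this definition (not part of the text; the flip is implemented as changing one uniformly chosen coordinate equal to 1 or 2 to the other value):

<pre>
# Sketch (not from the original text): estimate E |f(x)-f(y)|^2, where y is x
# with one uniformly chosen coordinate equal to 1 or 2 flipped to the other
# value (the text describes the flip only loosely, so this is an interpretation).
import random
from statistics import mean

def influence_12(f, n, samples=20000):
    vals = []
    while len(vals) < samples:
        x = tuple(random.choice((0, 1, 2)) for _ in range(n))
        pos = [i for i, xi in enumerate(x) if xi != 0]
        if not pos:                      # no 1s or 2s to flip; resample
            continue
        i = random.choice(pos)
        y = tuple((3 - xj) if j == i else xj for j, xj in enumerate(x))
        vals.append((f(x) - f(y)) ** 2)
    return mean(vals)

n = 10
print(influence_12(lambda x: (-1) ** x.count(0), n))  # 0: depends only on the 0-set
print(influence_12(lambda x: (-1) ** x.count(1), n))  # 4: every flip changes the sign
</pre>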


The following claim is, technically, not true, but is a useful first approximation to the truth:


:'''Lemma 1''' (A lie)  Let <math>f: [3]^n \to [-1,1]</math>.  Then the function <math>F(x) := {\Bbb E}_{\ell: \ell(1)=x} f(\ell(2))</math> has 12-low influence, as does the function <math>G(x) := {\Bbb E}_{\ell: \ell(2) = x} f(\ell(1))</math>.
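
To unpack the statement: a line <math>\ell</math> with <math>\ell(1)=x</math> has its wildcard set inside the 1-coordinates of x, so F(x) is an average of f over points obtained from x by switching some 1s to 2s.  Here is a Monte Carlo sketch (not part of the argument; the distribution over such lines is an ad hoc choice).

<pre>
# Sketch (not from the original text): Monte Carlo estimate of
# F(x) = E_{l : l(1) = x} f(l(2)).  A line l with l(1) = x has its wildcard
# set inside the 1-coordinates of x, and l(2) is x with those 1s turned into 2s.
import random
from statistics import mean

def F(f, x, max_wild=2, samples=2000):
    ones = [i for i, xi in enumerate(x) if xi == 1]
    if not ones:
        return f(x)   # no line has l(1) = x; the value here is a convention
    vals = []
    for _ in range(samples):
        r = random.randint(1, min(max_wild, len(ones)))
        wild = set(random.sample(ones, r))
        vals.append(f(tuple(2 if i in wild else xi for i, xi in enumerate(x))))
    return mean(vals)
</pre>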


Let us say that a function <math>f: [3]^n \to [-1,1]</math> is ''12-uniform'' if <math>{\Bbb E} g(\ell(1)) f(\ell(2))</math> is small for all <math>g: [3]^n \to [-1,1]</math>, where <math>\ell</math> is again ranging over Lines(n).  Assuming Lemma 1, we see that 12-uniform functions are basically orthogonal to 12-low influence functions.  In fact any function can essentially be orthogonally decomposed into 12-uniform and 12-low influence components (see the [[Fourier-analytic proof of Sperner]] for some more discussion).
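
The decomposition is not spelled out here; one natural candidate (my reading of the Sperner discussion, so treat it as an assumption) for the 12-low influence component is the conditional expectation of f given the set of coordinates equal to 0:

<pre>
# Sketch (assumption, not from the original text): one candidate for the
# "12-low influence component" of f is the conditional expectation of f given
# the set of coordinates where x equals 0, i.e. the average of f over all ways
# of reassigning values in {1,2} to the nonzero coordinates.  Exact and
# exponential in the number of nonzero coordinates, so only for small n.
import itertools

def project_12(f, x):
    free = [i for i, xi in enumerate(x) if xi != 0]
    total = 0.0
    for vals in itertools.product((1, 2), repeat=len(free)):
        y = list(x)
        for i, v in zip(free, vals):
            y[i] = v
        total += f(tuple(y))
    return total / (2 ** len(free))

# project_12(f, .) is exactly 12-low influence (flipping a 1 to a 2 does not
# change the 0-set), and f - project_12(f, .) averages to zero on each 0-set.
</pre>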


We can also define 12-uniformity and 12-low influence on functions on Lines(n) rather than <math>[3]^n</math>, where now <math>\ell</math> is ranging over lines in Lines(n) (i.e. in elements of Planes(n)).  Similarly for functions on Planes(n), Spaces(n), etc.


== 01-uniformity relative to 12-low influence ==


We can define 01-uniformity in exact analogy to 12-uniformity; a function f is 01-uniform if <math>{\Bbb E} g(\ell(0)) f(\ell(1))</math> is small for all <math>g: [3]^n \to [-1,1]</math>.  We need a more sophisticated notion (analogous to <math>U^2({\Bbb Z}/N{\Bbb Z})</math> Gowers uniformity): a function f is 01-uniform relative to 12-low influence if <math>{\Bbb E} g(\ell(0)) f(\ell(1)) h(\ell)</math> is small for all <math>g: [3]^n \to [-1,1]</math> and all 12-low influence <math>h: Lines(n) \to [-1,1]</math>.


'''Example''' Random functions are 01-uniform relative to 12-low influence.


:'''Lemma 2''' (von Neumann theorem)  If <math>f: [3]^n \to [-1,1]</math> is 01-uniform relative to 12-low influence and <math>g, h: [3]^n \to [-1,1]</math>, then <math>{\Bbb E} h(\ell(0)) f(\ell(1)) g(\ell(2))</math> is small.


'''Proof''': An "IP van der Corput lemma" allows one to ensure that <math>{\Bbb E} h(\ell(0)) f(\ell(1)) g(\ell(2))</math> is small if we can show that


:<math>{\Bbb E} f(\pi(01)) g(\pi(02)) f(\pi(11)) g(\pi(22))</math> (1)


is small, where <math>\pi</math> ranges over Planes(n).  (Rough sketch of proof: we can rearrange <math>{\Bbb E} h(\ell(0)) f(\ell(1)) g(\ell(2))</math> as


:<math>{\Bbb E}_{\phi,i} h(\phi(0^r)) f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i}))</math>


where r is a medium size number, <math>\phi: [3]^r \to [3]^n</math> ranges over r-dimensional subspaces with few wildcards, and i ranges from 1 to r.  Using Cauchy-Schwarz in i, we reduce to bounding


:<math>{\Bbb E}_{\phi,i,j} f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i})) f(\phi(1^j 0^{r-j})) g(\phi(2^j 0^{r-j}))</math>


which can be rearranged to give (1)).   
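
(Not part of the original sketch, but for concreteness: since <math>|h| \leq 1</math>, the Cauchy-Schwarz step reads

:<math>\left|{\Bbb E}_{\phi,i} h(\phi(0^r)) f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i}))\right| \leq {\Bbb E}_{\phi} \left|{\Bbb E}_i f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i}))\right|</math>

:<math>\leq \left({\Bbb E}_{\phi} \left|{\Bbb E}_i f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i}))\right|^2\right)^{1/2} = \left({\Bbb E}_{\phi,i,j} f(\phi(1^i 0^{r-i})) g(\phi(2^i 0^{r-i})) f(\phi(1^j 0^{r-j})) g(\phi(2^j 0^{r-j}))\right)^{1/2},</math>

where the second inequality is Cauchy-Schwarz in <math>\phi</math>.)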


We can rearrange (1) further as
 
:<math>{\Bbb E} F(\pi(1)) G(\pi(2))</math>
 
where <math>\pi(1), \pi(2)</math> are thought of as lines and <math>F(\ell) := f(\ell(0)) f(\ell(1))</math> and <math>G(\ell) := g(\ell(0)) g(\ell(2))</math>.  But by Lemma 1, we can rewrite this as
 
:<math>{\Bbb E} F(\ell) \overline{G}(\ell)</math>
 
where <math>\overline{G}</math> is 12-low influence.  But as f is 01-uniform relative to 12-low influence, this expression is small.  <math>\Box</math>
 
Thus, when dealing with expressions of the form
 
:<math>{\Bbb E} h(\ell(0)) f(\ell(1)) g(\ell(2))</math>,
 
functions which are 01-uniform relative to 12-low influence can be discarded from the second factor.  For similar reasons, functions which are 20-uniform relative to 12-low influence (defined in the obvious manner) can be discarded from the third factor.
 
== Obstructions to uniformity ==
 
To continue, we need to understand what the obstructions are to 01-uniformity relative to 12-low influence.  The answer turns out to be 01-almost periodicity relative to 12-low influence.  Roughly speaking, a function f is 01-almost periodic relative to 12-low influence if any shift from 0s to 1s distorts f in a manner which can be described by 12-low influence functions.  More precisely, there exists a bounded number of functions <math>g_h: [3]^n \to [-1,1]</math> and 12-low influence functions <math>c_h: Lines(n) \to [-1,1]</math> for <math>h = 1,\ldots,H</math> such that the equation
 
:<math>f(\ell(1)) = {\Bbb E}_h c_h(\ell) g_h(\ell(0))</math>
 
is approximately true for most lines <math>\ell</math> in Lines(n).
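
One can measure numerically how good such a representation is; here is a sketch (not part of the text; the sampling of lines and the representation of a line by its template are my choices).

<pre>
# Sketch (not from the original text): estimate E_l | f(l(1)) - E_h c_h(l) g_h(l(0)) |^2
# over random lines with few wildcards.  `pairs` is a list of (c_h, g_h);
# a line is passed to c_h as a template tuple with None at the wildcards.
# A small value is what "01-almost periodic relative to 12-low influence" asks for.
import random
from statistics import mean

def almost_periodicity_defect(f, pairs, n, max_wild=2, samples=5000):
    vals = []
    for _ in range(samples):
        r = random.randint(1, max_wild)
        wild = set(random.sample(range(n), r))
        fixed = [random.choice((0, 1, 2)) for _ in range(n)]
        template = tuple(None if i in wild else fixed[i] for i in range(n))
        point = lambda a: tuple(a if i in wild else fixed[i] for i in range(n))
        approx = mean(c(template) * g(point(0)) for c, g in pairs)
        vals.append((f(point(1)) - approx) ** 2)
    return mean(vals)
</pre>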
 
'''Example 1''' If f is 01-low influence, then <math>f(\ell(1)) \approx f(\ell(0))</math>, so f is certainly 01-almost periodic relative to 12-low influence.
 
'''Example 2''' Let <math>f(x) \in \{-1,+1\}</math> be the parity of the number of 1s in x, then <math>f(\ell(1)) = c(\ell) f(\ell(0))</math> where <math>c(\ell)</math> depends only on the number of wildcards in <math>\ell</math>, and is in particular 12-low influence.  Thus this function is also 01-almost periodic relative to 12-low influence.
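
(A quick numerical check of this example, with an ad hoc sampling of lines:)

<pre>
# Sketch (not from the original text): numerical check of Example 2.  For a
# line l with r wildcards, l(1) has exactly r more 1s than l(0), so
# f(l(1)) = (-1)^r f(l(0)), and (-1)^r depends only on the wildcard count.
import random

def f(x):
    return (-1) ** x.count(1)

n = 10
for _ in range(1000):
    r = random.randint(1, 3)
    wild = set(random.sample(range(n), r))
    fixed = [random.choice((0, 1, 2)) for _ in range(n)]
    l = lambda a: tuple(a if i in wild else fixed[i] for i in range(n))
    assert f(l(1)) == (-1) ** r * f(l(0))
</pre>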
 
'''Example 3''' If <math>f: [3]^n \to [-1,1]</math> is 12-low influence, then trivially <math>f(\ell(1)) = c(\ell)</math> where <math>c(\ell) := f(\ell(1))</math>.  Since c is 12-low influence, we conclude that f is 01-almost periodic relative to 12-low influence.
 
'''Example 4''' Any bounded linear combination of 01-almost periodic functions relative to 12-low influence is also 01-almost periodic relative to 12-low influence; also, any product of 01-almost periodic functions relative to 12-low influence is also 01-almost periodic relative to 12-low influence.
 
These almost periodic functions obstruct uniformity:
 
'''Lemma 3''': If <math>f: [3]^n \to [-1,1]</math> is 01-uniform relative to 12-low influence, and <math>g: [3]^n \to [-1,1]</math> is 01-almost periodic relative to 12-low influence, then f and g have small inner product.
 
'''Proof''': The inner product of f and g can be expressed as
 
:<math> {\Bbb E} f(\ell(1)) g(\ell(1))</math>
 
where <math>\ell</math> ranges over Lines(n).  On the other hand, from the almost periodicity of g we can write
 
:<math>g(\ell(1)) = {\Bbb E}_h c_h(\ell) G_h(\ell(0))</math>
 
for some 12-low influence <math>c_h</math> and some bounded <math>G_h</math>.  So we can reexpress the inner product as
 
:<math> {\Bbb E}_h {\Bbb E} f(\ell(1)) G_h(\ell(0)) c_h(\ell).</math>
 
But as f is 01-uniform relative to 12-low influence, the inner expectation is small, and the claim follows.  <math>\Box</math>
 
Conversely, these are the only obstructions to uniformity:
 
'''Lemma 4''': If <math>f: [3]^n \to [-1,1]</math> is not 01-uniform relative to 12-low influence, then there exists a <math>g: [3]^n \to [-1,1]</math> which is 01-almost periodic relative to 12-low influence, such that f and g have large inner product.
 
'''Proof''': By hypothesis, we can find a 12-low influence c and a bounded G such that
 
:<math>{\Bbb E} f(\ell(1)) G(\ell(0)) c(\ell)</math>
 
is large.  We then take
 
:<math> g(x) := {\Bbb E}_{\ell: \ell(1) = x} G(\ell(0)) c(\ell)</math>.
 
Clearly f correlates with g.
 
If the expression inside the expectation were constant, then we would have
:<math> g(\ell(1)) = G(\ell(0)) c(\ell)</math>
and g would be 01-almost periodic relative to 12-low influence.  Of course, the expression here is not necessarily constant, but if it oscillates too much, then g would be very small and the claim would be easy.  I think we can use some sort of statistical sampling argument to handle the general case (I did this in [http://front.math.ucdavis.edu/math.CO/0405251 my finitisation of Furstenberg's proof of Szemeredi's theorem]); I'll come back to this point later.  <math>\Box</math>
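
For concreteness, the averaged function g here can be estimated by sampling lines through a given point; a sketch (the distribution over such lines and the representation of a line are my choices):

<pre>
# Sketch (not from the original text): Monte Carlo evaluation of the averaged
# function g(x) = E_{l : l(1) = x} G(l(0)) c(l) used in the proof above.
# A line l with l(1) = x has its wildcard set W inside the 1-coordinates of x;
# it is passed to c as a template with None at the wildcards, and l(0) is x
# with the coordinates in W switched from 1 to 0.
import random
from statistics import mean

def dual_g(G, c, x, max_wild=2, samples=2000):
    ones = [i for i, xi in enumerate(x) if xi == 1]
    if not ones:
        return 0.0        # no line has l(1) = x; the value here is a convention
    vals = []
    for _ in range(samples):
        r = random.randint(1, min(max_wild, len(ones)))
        W = set(random.sample(ones, r))
        template = tuple(None if i in W else xi for i, xi in enumerate(x))
        l0 = tuple(0 if i in W else xi for i, xi in enumerate(x))
        vals.append(G(l0) * c(template))
    return mean(vals)
</pre>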
 
In view of this lemma, I believe we have
 
'''Corollary 5''': Any function <math>f: [3]^n \to [0,1]</math> can be decomposed into a function <math>f_{01}: [3]^n \to [0,1]</math> with the same mean as f which is 01-almost periodic relative to 12-low influence, plus an error <math>f-f_{01}</math> which is 01-uniform relative to 12-low influence.
 
We have a similar decomposition <math>f = f_{20} + (f-f_{20})</math> into a function <math>f_{20}: [3]^n \to [0,1]</math> with the same mean as f which is 20-almost periodic relative to 12-low influence, plus an error <math>f-f_{20}</math> which is 20-uniform relative to 12-low influence.  Using the von Neumann theorem, we can thus replace
 
:<math>{\Bbb E} f(\ell(0)) f(\ell(1)) f(\ell(2))</math>
 
by
 
:<math>{\Bbb E} f(\ell(0)) f_{01}(\ell(1)) f_{20}(\ell(2))</math>.  (2)
 
== Conclusion of proof ==
 
Let <math>f_{12}</math> be the 12-low influence component of f; then <math>f_{12}</math> is non-negative, has large density, and correlates with f.  If we then let E be the set where <math>f_{12}</math> is large, then E is also 12-low influence, and f is large on E.
 
By [[DHJ(2.5)]], the 12-low influence set E contains large combinatorial subspaces.  Passing to such a subspace, we may assume that E is everything, i.e. <math>f_{12}</math> is large everywhere. To put it another way, this implies that f has large density inside any 12-low influence set.
 
Having done this, we now perform the reductions in the previous section to get to the expression (2).  From the almost periodicity of <math>f_{01}, f_{20}</math>, we have
 
:<math>f_{01}(\ell(1)) = {\Bbb E}_h c_h(\ell) g_{01,h}(\ell(0))</math> (3)
 
where <math>c_h</math> is 12-low influence, and similarly
 
:<math>f_{20}(\ell(2)) = {\Bbb E}_h c'_h(\ell) g_{20,h}(\ell(0))</math> (4)

where <math>c'_h</math> is 12-low influence.
 
Direct substitution of (3) and (4) into (2) does not quite work; we have to do something a bit sneakier.  Let r be a medium size number and consider an r-dimensional subspace <math>\phi: [3]^r \to [3]^n</math>.  By the [[Graham-Rothschild theorem]], if r is large enough, one can find a three-dimensional subspace <math>\sigma: [3]^3 \to [3]^n</math> of <math>\phi</math> on which <math>c_h, c'_h</math> are essentially constant.  Call such a subspace ''monochromatic''; thus the collection of monochromatic 3-dimensional subspaces has positive density in Spaces(n).  Furthermore, since this collection is defined using the 12-low influence functions <math>c_h, c'_h</math>, the collection of 3-dimensional monochromatic spaces is itself 12-low influence.
 
We now rewrite (2) as
 
:<math>{\Bbb E} f(\sigma(012)) f_{01}(\sigma(112)) f_{20}(\sigma(212))</math>
 
where <math>\sigma</math> ranges over Spaces(n).
 
Suppose that <math>\sigma</math> is monochromatic.  Then from (3) we have
:<math>f_{01}(\sigma(112)) = {\Bbb E}_h c_h(\sigma(**2)) g_{01,h}(\sigma(002))</math>
and
:<math>f_{01}(\sigma(012)) = {\Bbb E}_h c_h(\sigma(0*2)) g_{01,h}(\sigma(002))</math>
and hence by monochromaticity
:<math>f_{01}(\sigma(112)) = f_{01}(\sigma(012))</math>
and similarly
:<math>f_{20}(\sigma(212)) = f_{20}(\sigma(012))</math>.
Thus we can bound (2) from below by
:<math>{\Bbb E} 1_{mono}(\sigma) [f f_{01} f_{20}](\sigma(012))</math>
where <math>1_{mono}</math> is the indicator function on the class of monochromatic spaces.
 
Now, we claim that <math>f_{01}</math> is large on almost all of the support of f.  Indeed, if there were a large set in the support of f on which <math>f_{01}</math> was small, then, denoting by f' the restriction of f to that set, <math>{\Bbb E} f'</math> would be large but <math> {\Bbb E} f' f_{01}</math> would be small.  But if <math>f'_{01}</math> denotes the component of f' which is 01-almost periodic relative to 12-low influence, then <math>f'_{01}</math> is dominated by <math>f_{01}</math> (since f' is dominated by f), and so <math> {\Bbb E} f' f'_{01}</math> would be small.  But by orthogonality of <math>f'_{01}</math> and <math>f'-f'_{01}</math>, this is equal to <math> {\Bbb E} (f'_{01})^2 </math>, which by Cauchy-Schwarz is at least <math>({\Bbb E} f'_{01})^2 = ({\Bbb E} f')^2</math>, since <math>f'_{01}</math> has the same mean as f'.  As <math>{\Bbb E} f'</math> is large, this is a contradiction.  Thus <math>f_{01}</math> is large on almost all the support of f, and similarly for <math>f_{20}</math>.  Thus we can bound (2) from below by some multiple of
 
:<math>{\Bbb E} 1_{mono}(\sigma) 1_{supp(f)}(\sigma(012))</math>.
 
But because f has positive density inside any 12-low influence set, this expression is large, and we are done.
