# Austin's proof

This is an informal combinatorial translation of Austin's proof of DHJ(3).

## Step 0: strong stationarity

Let $A \subset [3]^n$ be a dense set with no combinatorial lines. Using the density increment argument, we may assume that $A$ has no significant density increment on medium-dimensional subspaces (e.g. subspaces of dimension between $\log \log n$ and $n$). This implies in particular that on "most" medium-dimensional subspaces, $A$ has about the same local density as its global density. Thus the density of $A$ is "strongly stationary" in some sense, in that it does not change much under passage to medium-dimensional subspaces.

In the course of the argument, we are going to create a number of other sets B out of A (think of these as "cells" arising from a "Szemeredi partition" of A). We will need to also assume strong stationarity of these sets B, i.e. that the local density of B on most medium-dimensional subspaces is close to their global density. There will only be a bounded (i.e. independent of n) number of such B that we will use, although the later B's will depend on the earlier B's. Nevertheless, I feel confident that one can pass to a subspace in which strong stationarity for all of these B's is attained. This can be done either by iterating the density increment argument, or by appealing to the Graham-Rothschild theorem. But for now, I will ignore this issue and assume that every set we actually construct in the ensuing argument has the strong stationarity property (local density is approximately equal to global density).

## Step 1: Reformulate the problem

Now that everybody is strongly stationary, we reformulate DHJ as

Reformulated DHJ(3) Let $f_0, f_1, f_2: [3]^n \to [0,1]$ be "strongly stationary" functions such that the form

$\Lambda(f_0,f_1,f_2) := {\Bbb E}_{\ell} f_0(\ell(0)) f_1(\ell(1)) f_2(\ell(2))$

is of size o(1), where $\ell$ ranges over combinatorial lines $\ell: [3] \to [3]^n$ with a distribution that will intentionally be kept vague. Then we have

${\Bbb E}_{x \in [3]^n} f_0(x) f_1(x) f_2(x) = o(1).$

Indeed, by applying this reformulation to $f_0=f_1=f_2=1_A$ we obtain the claim.
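For concreteness, the form $\Lambda$ can be computed by brute force for small $n$ by averaging uniformly over all line templates (strings in $\{0,1,2,*\}^n$ with at least one wildcard); this uniform choice is just one of the "reasonable" line distributions the statement deliberately leaves vague, and the helper names below are mine:

```python
from itertools import product

def lines(n):
    # A combinatorial line is encoded by a template in {0,1,2,'*'}^n with at
    # least one wildcard; evaluating at i in {0,1,2} replaces each '*' by i.
    for tpl in product((0, 1, 2, '*'), repeat=n):
        if '*' in tpl:
            yield tpl

def eval_line(tpl, i):
    return tuple(i if c == '*' else c for c in tpl)

def Lambda(f0, f1, f2, n):
    # E_l f0(l(0)) f1(l(1)) f2(l(2)), averaged uniformly over line templates.
    vals = [f0(eval_line(t, 0)) * f1(eval_line(t, 1)) * f2(eval_line(t, 2))
            for t in lines(n)]
    return sum(vals) / len(vals)

one = lambda x: 1.0
```

With $f_0 = f_1 = f_2 = 1_A$ this measures the proportion of the $4^n - 3^n$ line templates whose three points all land in $A$.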

The idea is to establish this reformulation by starting with very simple functions $f_0, f_1, f_2$, and then build up to general functions.

We will also adopt the "cheat" that all reasonable choices of generating random lines give the same statistics. I think we can get away with this cheat by enough Poisson or Cesaro averaging, but I will completely gloss over this technical issue.

## Step 2: An independence lemma

A key point here is that strong stationarity implies certain crucial independence properties [note: in the ergodic world, this is only true assuming an additional property of "strong satedness", but this turns out to be automatic in the finitary world].

Call a function $f: [3]^n \to {\Bbb R}$ "01-insensitive" (or "01-low influence") if it does not change much after swapping 0s to 1s or vice versa. Thus, if I is a medium-sized index set in [n] and we let $\rho^I_{0 \to 1}: [3]^n \to [3]^n$ be the operation of swapping all the 0s in I to 1s, then we have $f( \rho^I_{0 \to 1}(x) ) \approx f(x)$ for most x and I, and similarly $f( \rho^I_{1 \to 0}(x) ) \approx f(x)$. (This notion would correspond to the notion of a "2-set" used elsewhere on this wiki.) Similarly define 12-insensitive and 02-insensitive functions. (One could also define 012-insensitive functions, but any function which is 012-insensitive and strongly stationary is basically constant.)
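A toy Python illustration of these swap maps (the name `rho` and the parity example are mine, not the wiki's): the function counting 2s mod 2 is exactly 01-insensitive, since changing 0s to 1s (or back) in any index set never touches the 2s.

```python
from itertools import product

def rho(x, I, frm, to):
    # rho^I_{frm -> to}: change every coordinate j in I with x_j == frm to `to`.
    return tuple(to if (j in I and v == frm) else v for j, v in enumerate(x))

# Exactly 01-insensitive: depends only on the positions of the 2s.
f = lambda x: x.count(2) % 2
```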

Lemma 1 (independence lemma) If $f_{01}: [3]^n \to [-1,1]$ is strongly stationary and 01-insensitive, and $f_{02}: [3]^n \to [-1,1]$ is 02-insensitive, then

${\Bbb E}_{x \in [3]^n} f_{01}(x) f_{02}(x) \approx ({\Bbb E}_{x \in [3]^n} f_{01}(x)) ({\Bbb E}_{x \in [3]^n} f_{02}(x))$.

More informally: strongly stationary 1-sets and 2-sets are independent of each other (and similarly for permutations, of course).

Proof By subtracting off the mean, we may assume that $f_{01}$ has a global mean of zero, which by strong stationarity implies that it has a local mean of essentially zero on most medium-dimensional subspaces.

Now let I be a random medium-sized subset of [n]. Observe that if x is chosen randomly from $[3]^n$ (and independently of I), then $\rho^I_{1 \to 0}(x)$ is still basically uniformly distributed in $[3]^n$. Thus

${\Bbb E}_{x \in [3]^n} f_{01}(x) f_{02}(x) \approx {\Bbb E}_{I, x \in [3]^n} f_{01}(\rho^I_{1 \to 0}(x)) f_{02}(\rho^I_{1 \to 0}(x))$.

Since $f_{01}$ is 01-insensitive, we have $f_{01}(\rho^I_{1 \to 0}(x)) \approx f_{01}(x)$. Also, as $f_{02}$ is 02-insensitive, we have

$f_{02}(\rho^I_{1 \to 0}(x)) = f_{02}(\rho^I_{0,1 \to 0}(x)) \approx f_{02}(\rho^I_{0,1 \to 2}(x)) = f_{02}(\rho^I_{0,1,2 \to 2}(x))$

where $\rho^I_{0,1,2 \to 2}(x)$ is formed from x by changing all 0s, 1s, and 2s in I to 2, etc. Putting this all together, we have

${\Bbb E}_{x \in [3]^n} f_{01}(x) f_{02}(x) \approx {\Bbb E}_{I, x \in [3]^n} f_{01}(x) f_{02}(\rho^I_{0,1,2 \to 2}(x))$.

But for fixed I, and fixing the coordinates of x outside of I (thus restricting x to a random medium dimensional subspace), the quantity $f_{02}(\rho^I_{0,1,2 \to 2}(x))$ is constant. On the other hand, by hypothesis $f_{01}$ has mean close to zero on most medium-dimensional subspaces. So the above expectation is essentially zero, as required. $\Box$
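The lemma can be sanity-checked numerically in a toy case (my own example, using exactly insensitive parity functions rather than merely approximately insensitive ones): over uniform $x \in [3]^n$, the parity of the number of 2s (01-insensitive) and the parity of the number of 1s (02-insensitive) decorrelate up to an $O(3^{-n})$ error.

```python
from itertools import product

def mean(g, n):
    # Exact average of g over {0,1,2}^n by brute force.
    pts = list(product((0, 1, 2), repeat=n))
    return sum(g(x) for x in pts) / len(pts)

# f01 depends only on how many 2s x has, so it is exactly 01-insensitive;
# f02 depends only on how many 1s x has, so it is exactly 02-insensitive.
f01 = lambda x: 1.0 if x.count(2) % 2 == 0 else 0.0
f02 = lambda x: 1.0 if x.count(1) % 2 == 0 else 0.0

n = 8
lhs = mean(lambda x: f01(x) * f02(x), n)
rhs = mean(f01, n) * mean(f02, n)
```

Here one can compute the discrepancy `lhs - rhs` in closed form as $(-1/3)^n/4 - (1/3)^{2n}/4$, so it decays exponentially in $n$.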

## Step 3: The case of basic 12-sets, etc.

Now we prove the reformulated DHJ(3) in the special case

$f_0(x) = 1_{A_{01}}(x) 1_{A_{02}}(x); f_1(x) = 1_{A_{01}}(x) 1_{A_{12}}(x); f_2(x) = 1_{A_{02}}(x) 1_{A_{12}}(x)$

where $A_{01}, A_{02}, A_{12}$ are strongly stationary sets that are 01-insensitive, 02-insensitive, and 12-insensitive respectively (i.e. they are 2-sets, 1-sets, and 0-sets basically). This is like triangle removal in the case of complete bipartite graphs between three cells, and corresponds to the trivial "pair removal lemma" (if a pair of sets has small Cartesian product, then one of them is small).

In this case, the expression

$\Lambda(f_0,f_1,f_2) := {\Bbb E}_{\ell} f_0(\ell(0)) f_1(\ell(1)) f_2(\ell(2))$

simplifies to

${\Bbb E}_{\ell} 1_{A_{01}}(\ell(1)) 1_{A_{02}}(\ell(2)) 1_{A_{12}}(\ell(2))$

because $1_{A_{01}}(\ell(0)) \approx 1_{A_{01}}(\ell(1))$ (up to small errors, by the 01-insensitivity of $A_{01}$), etc.

One way to generate a random line $\ell$ is to pick a random point x and a random medium-sized index set I and set $\ell(i) := \rho^I_{1 \to i}(x)$ for i=0,1,2. If we use the cheat that this random line generation gives the same statistics as any other random line generation, we can then write the above expression as

${\Bbb E}_{I, x} 1_{A_{01}}(x) 1_{A_{02}}(\rho^I_{1 \to 2}(x)) 1_{A_{12}}(\rho^I_{1 \to 2}(x))$.

Now let's fix I, and the coordinates of x outside of I, thus restricting x to a random subspace. The latter two factors in the above expression then become 12-insensitive. Thus, one is now taking the local average of the strongly stationary and 01-insensitive function $1_{A_{01}}$ with a 12-insensitive function. Applying Lemma 1, we see that this expression is independent, thus $\Lambda(f_0,f_1,f_2)$ is approximately

$({\Bbb E} 1_{A_{01}}) ( {\Bbb E}_{I, x} 1_{A_{02}}(\rho^I_{1 \to 2}(x)) 1_{A_{12}}(\rho^I_{1 \to 2}(x)) ).$

Now $\rho^I_{1 \to 2}(x)$ is basically uniformly distributed in $[3]^n$, so the second term here is just the density of $A_{02} \cap A_{12}$. Applying Lemma 1 again, we thus see that $\Lambda(f_0,f_1,f_2)$ is basically the product of the densities of $A_{01}, A_{02}, A_{12}$.

Thus if $\Lambda(f_0,f_1,f_2) = o(1)$, then one of $A_{01}, A_{02}, A_{12}$ has density o(1), and the reformulated DHJ(3) holds in this case.
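The line-generation recipe used in this step is easy to make concrete (a sketch under the same conventions as above; the helper names are mine): each sample satisfies $\ell(1) = x$, and every coordinate either stays constant along the line or runs through $0, 1, 2$.

```python
import random

def rho(x, I, frm, to):
    return tuple(to if (j in I and v == frm) else v for j, v in enumerate(x))

def random_line(n, k, rng):
    # Pick x uniform in {0,1,2}^n and a random k-element index set I, then set
    # l(i) := rho^I_{1 -> i}(x): the moving coordinates of the line are exactly
    # the positions in I where x equals 1 (the line is degenerate if none do).
    x = tuple(rng.randrange(3) for _ in range(n))
    I = set(rng.sample(range(n), k))
    return tuple(rho(x, I, 1, i) for i in (0, 1, 2)), x, I

rng = random.Random(0)
(l0, l1, l2), x, I = random_line(12, 4, rng)
```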

## Step 4: The case of non-basic 12-sets, etc.

We can very easily extend Step 3 to the case where each $f_i(x)$ is a "non-basic jk-set" (with {i,j,k}={0,1,2}) in the sense that each $f_i$ is a finite linear combination of products $1_{A_{ij}}(x) 1_{A_{ik}}(x)$ of a strongly stationary ij-insensitive set and a strongly stationary ik-insensitive set, so long as the total complexity of these combinations is O(1). Indeed, by passing to a common refinement we may express $f_0, f_1, f_2$ simultaneously in terms of common partitions of $[3]^n$ into 01-insensitive sets, 02-insensitive sets, and 12-insensitive sets. We then do the standard "cleaning" step of zeroing out each $f_i$ on a cell pair $A_{ij} \cap A_{ik}$ for which the density is small, leaving only the cell pairs where the density is large. We can also zero out cells that are too small.

By working on each triple of 01-insensitive cells, 02-insensitive cells, and 12-insensitive cells separately, and applying the Step 3 result to each triple for which the densities of $f_0, f_1, f_2$ are simultaneously large, we obtain the claim in this case.
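The "common refinement" step is the usual one: intersect the cells of the given partitions. A small sketch (the labelings are my own illustrative choices; counting 2s gives a 01-insensitive labeling, counting 0s a 12-insensitive one):

```python
from itertools import product

def common_refinement(points, *labelings):
    # Cells of the refinement are the nonempty fibers of the label tuple,
    # so every input labeling is constant on each refined cell.
    cells = {}
    for x in points:
        key = tuple(lab(x) for lab in labelings)
        cells.setdefault(key, []).append(x)
    return cells

pts = list(product((0, 1, 2), repeat=4))
cells = common_refinement(pts, lambda x: x.count(2), lambda x: x.count(0))
```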

## Step 5: The case of measurable 12-sets, etc.

Let $0 \lt \varepsilon \lt 0.001$. We now consider the case where each $f_i(x)$ is a "measurable jk-set" in the sense that there exists a bounded complexity approximation $\tilde f_i(x)$ to $f_i(x)$ (thus there is a partitioning of $[3]^n$ into cells $A_{ij} \cap A_{ik}$ formed by intersecting strongly stationary ij-insensitive sets with strongly stationary ik-insensitive sets (i.e. basic jk-sets) such that $\tilde f_i$ is constant on these cells), such that $\tilde f_i$ is within $\varepsilon$ of $f_i$, say in $L^2$ norm. This is broadly analogous to triangle removal in the case when each graph has already been replaced by its Szemeredi partition average, with a small amount of bad cells and other noise remaining. We also assume that the restriction of $f_i$ to any of the cells $A_{ij} \cap A_{ik}$ is strongly stationary.

The first step is again to clean up the $f_i$, by zeroing out any cells that are too small, too low density, or in which the average fluctuation of $|\tilde f_i - f_i|$ is too large, thus leaving ourselves only with the dense and "$\varepsilon$-regular" (or maybe $\sqrt{\varepsilon}$-regular) cells. Since $\tilde f_i$ is large and constant on each surviving cell pair, we may assume that $\tilde f_i$ pointwise dominates $f_i$.
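This cleaning step, in toy form (the thresholds and helper names are my own choices): project $f_i$ onto cell averages, then zero out any cell that is too small, too sparse, or on which $f_i$ fluctuates too much in $L^2$.

```python
from itertools import product

def clean(f, cells, total):
    # Cell-average projection, with small / low-density / irregular cells zeroed.
    MIN_DENSITY, MAX_L2 = 0.05, 0.1   # illustrative thresholds
    tilde = {}
    for key, cell in cells.items():
        avg = sum(f(x) for x in cell) / len(cell)
        l2 = (sum((f(x) - avg) ** 2 for x in cell) / len(cell)) ** 0.5
        bad = len(cell) / total < MIN_DENSITY or avg < MIN_DENSITY or l2 > MAX_L2
        tilde[key] = 0.0 if bad else avg
    return tilde

pts = list(product((0, 1, 2), repeat=4))
cells = {}
for x in pts:
    cells.setdefault(x[0], []).append(x)   # a toy 3-cell partition by x_0
```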

We are given that $\Lambda(f_0,f_1,f_2) = o(1)$. We would like to conclude that $\Lambda(\tilde f_0,\tilde f_1,\tilde f_2) = o(1)$, since the claim then follows from the Step 4 analysis. To do this, we shall show that

$\Lambda(\tilde f_0 - f_0,\tilde f_1,\tilde f_2) \leq 0.1 \Lambda(\tilde f_0,\tilde f_1,\tilde f_2)$ (1)

and similarly for permutations; the claim then follows from the triangle inequality.

By decomposing $\tilde f_0, \tilde f_1, \tilde f_2$ into component cells and using the linearity of (1), it suffices to verify (1) in the special case

$\tilde f_0 = 1_{A_{01}} 1_{A_{02}}; \tilde f_1 = 1_{A_{01}} 1_{A_{12}}; \tilde f_2 = 1_{A_{02}} 1_{A_{12}}$

where $A_{01}, A_{02}, A_{12}$ are strongly stationary 01-sets, 02-sets, 12-sets respectively. Thus, in this case, $f_0$ is something between 0 and $1_{A_{01}} 1_{A_{02}}$ which has density at least 99% (say) on $A_{01} \cap A_{02}$.

By arguing as in Step 3, we can write $\Lambda(\tilde f_0 - f_0,\tilde f_1,\tilde f_2)$ as

${\Bbb E}_{I, x} (\tilde f_0 - f_0)(x) \tilde f_1(\rho^I_{0 \to 1}(x)) \tilde f_2(\rho^I_{0 \to 2}(x))$

and then expanding out $\tilde f_1, \tilde f_2$ and using insensitivity we end up at

${\Bbb E}_{I, x} (\tilde f_0 - f_0)(x) 1_{A_{12}}( \rho^I_{0,1,2 \to 2}(x) ).$ (2)

Meanwhile, $\Lambda(\tilde f_0, \tilde f_1, \tilde f_2)$ is similarly (approximately) equal to

${\Bbb E}_{I, x} \tilde f_0(x) 1_{A_{12}}( \rho^I_{0,1,2 \to 2}(x) ).$ (3)

But the global density of $\tilde f_0 - f_0$ is at most 0.01 of that of $\tilde f_0$; by strong stationarity, the same is true for (most) local densities. But fixing I, and fixing x outside of I (thus localising x to a medium-sized subspace) we see that the second terms in (2), (3) are constant, and so the expressions (2) and (3) are linear combinations of local densities of $\tilde f_0 - f_0$ and $\tilde f_0$ respectively, and the claim follows.

## Step 6: The case of regularised 12-sets, etc.

Now, we generalise further, to the case where each $f_i$ admits a decomposition $f_i = f_{i,U^\perp} + f_{i,U}$, where $f_{i,U^\perp}$ is of the form in Step 5 (i.e. it is within $\varepsilon$ of a function $\tilde f_{i,U^\perp}$ which is constant on a bounded number M = O(1) of cells coming from intersecting strongly stationary ij-insensitive sets with strongly stationary ik-insensitive sets), and $f_{i,U}$ is "jk-uniform" in the sense that it is essentially orthogonal to all jk-sets. (Actually we will need local orthogonality rather than global orthogonality, but I think this can be accomplished by requiring a certain "Gowers uniformity norm" type expression of $f_{i,U}$ to be strongly stationary. I have not thought about this issue too much.) The uniformity of $f_{i,U}$ has to be of extremely high quality; in particular, it has to be substantially better than the complexity M of the partition that regularises $\tilde f_{i,U^\perp}$. This type of partition is similar to the one used in my interpretation of the Szemeredi regularity lemma in this paper. We assume that $f_{i,U^\perp}$ is non-negative (and bounded between 0 and 1), but $f_{i,U}$ is signed. (This is analogous to the triangle removal lemma in which each bipartite graph has been regularised by the Szemeredi regularity lemma.)

This case follows from the previous case because the influence of the $f_{i,U}$ is negligible. The reason for this is closely related to the fact that line-free sets correlate locally with complexity-1 sets. For instance, consider the expression $\Lambda(f_{0,U}, g, h)$ where g, h are bounded but otherwise arbitrary. We can express this as

${\Bbb E}_{I, x} f_{0,U}(x) g( \rho^I_{0,1 \to 1}(x) ) h( \rho^I_{0,2 \to 2}(x) )$.

For fixed I (and fixing x outside of I, so that x is now uniform inside a random medium-sized subspace) this is the local average of $f_{0,U}$ with respect to the product of a locally 01-insensitive function and a locally 02-insensitive function. This should be controlled by some local Gowers-type uniformity norm of $f_{0,U}$, which by some suitable strong stationarity hypothesis should be close to the corresponding global Gowers-type uniformity norm, which is very small by hypothesis. So I believe we can basically ignore $f_{i,U}$ and repeat the Step 5 arguments verbatim.

## Step 7: the regularity lemma

To finish the job, we need an analogue of the Szemeredi regularity lemma; basically we need to decompose each (strongly stationary) $f_i$ in the form $f_i = f_{i,U^\perp} + f_{i,U}$ as above. But I think this can be done by mimicking standard double-induction proofs of the regularity lemma, such as the one in my paper, using a coarse and a fine approximation to $f_i$ by a partition into (strongly stationary) jk-sets, and refining either the coarse or fine approximation whenever there is a failure of uniformity in the error. Each such refinement increments an "energy" or "index" of the approximation, and so the algorithm must terminate (after a tower-exponential number of steps).
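The "energy" here is the standard one: the squared $L^2$ norm of the conditional expectation of $f_i$ onto the partition, which can only increase under refinement (Pythagoras) and is bounded by 1, which is what halts the double induction. A toy numerical check (partitions chosen by me for illustration):

```python
from itertools import product

def partition(pts, lab):
    cells = {}
    for x in pts:
        cells.setdefault(lab(x), []).append(x)
    return cells

def energy(f, cells, total):
    # ||E(f | partition)||_2^2: nondecreasing under refinement and bounded,
    # so only boundedly many energy-increment steps are possible.
    return sum(len(c) / total * (sum(f(x) for x in c) / len(c)) ** 2
               for c in cells.values())

pts = list(product((0, 1, 2), repeat=3))
f = lambda x: 1.0 if x[1] == 2 else 0.0
coarse = partition(pts, lambda x: x[0])           # 3 cells
fine = partition(pts, lambda x: (x[0], x[1]))     # refines the coarse partition
```

For this $f$, refining by the second coordinate captures $f$ exactly, so the energy jumps from $1/9$ to its maximum possible value $1/3$ (the density of $f$).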