Szemerédi's original proof of Szemerédi's theorem
Szemerédi's combinatorial proof of Szemerédi's theorem is highly intricate. Below are some of the ingredients used in the proof.
Maximum asymptotic density
In the standard proof of Roth’s theorem (as well as Szemeredi’s proof), we may assume without loss of generality that the set A does not have significantly increased density on any reasonably long subprogression, since otherwise we could pass to that subprogression and apply some sort of induction hypothesis. We can do the same here: to prove DHJ for some set with density , we may assume without loss of generality that every slice of A of any reasonable size (m-dimensional, where m grows slowly with n) has density at most (where I will be vague about exactly what o(1) means). By Markov’s inequality, this also implies that almost all (i.e. 1-o(1)) of all such slices will have density at least as well. It is not clear to me how one can exploit such a fact, but it is something worth keeping in mind. (Admittedly, this structure is not used in the graph-theoretic proofs of Roth or corners, but as I said in my previous post, we are not bound to slavishly follow the graph-theoretic model.)
In a somewhat similar spirit, we can use the coloring Hales-Jewett theorem to stabilise various low-complexity quantities. Indeed, if we have some quantity X that depends on a medium number m of the coordinates of , and takes a bounded number k of values, then (if m is sufficiently large depending on d and k), we can reduce the m coordinates to d, passing to a slice, and force X to now be constant on these d coordinates. (I already used this argument in 85.) Again, I don’t see an immediate way to utilise this freedom, but if we are encountering a problem that the “richness” of various slices varies too much with the slice, this is presumably the thing to do to fix that. And there is also the Carlson-Simpson theorem to hold in reserve to soup this up even further.
In Endre’s big paper (the proof of Szemeredi’s theorem), he uses a very nice double counting argument to say a much stronger statement: given any specified dense subset of , A also has density close to on “most” copies of . (He eventually applies this fact to sets that are the cells of a Szemeredi partition of a certain graph he constructs on .) This trick does not appear to be as well known as it should be (and is absolutely crucial to Endre’s proof), so I am reproducing it here.
Here is the formal claim: let (where means that “y is sufficiently large depending on x”, and let . Then one can find a copy of in A such that for every x in and all , the relative density of A in is .
Proof. We need some intermediate numbers . Pick a random copy of in A. By the previous discussion, every x in has a probability of being such that A has density in . By the union bound (assuming m is large depending on l), we may thus select a copy of such that A has density in every .
For each j, let f_j(x) be the density of A in , rounded to the nearest multiple of 1/r. Applying coloring Hales-Jewett (because l is large compared to k, d, r), we can pass from to a subspace such that the are constant on this subspace for all j: . But by double counting the density of A on we see that ; similarly by double counting the density of A on we have . The claim follows.