Ajtai-Szemerédi's proof of the corners theorem
Introduction
Ajtai and Szemerédi proved the corners theorem with a clever use of three basic principles: the density-increment strategy, which allowed them to assume that the set A was of density almost [math]\displaystyle{ \delta }[/math] in almost all subgrids; averaging arguments, which implied that any dense structured set in which A was sparse must be compensated for by a structured set in which it was of increased density, and Szemerédi's theorem, which allowed them to pass from dense subsets of [n] to long arithmetic progressions. Their original paper does not seem to be available online, but here you can find a quantitative version of their argument due to Van Vu. On this page, we give a detailed sketch of the argument: the aim is to present the key ideas in enough detail that the reader who wishes to can go away and make it fully precise.
Statement of the theorem and some basic definitions
For every [math]\displaystyle{ \delta\gt 0 }[/math] there exists n such that every subset [math]\displaystyle{ A\subset [n]^2 }[/math] of density at least [math]\displaystyle{ \delta }[/math] (i.e., of cardinality at least [math]\displaystyle{ \delta n^2 }[/math]) contains three points of the form (x,y), (x+d,y) and (x,y+d) with [math]\displaystyle{ d\gt 0. }[/math]
Definitions. A grid of side m is a Cartesian product [math]\displaystyle{ P\times Q, }[/math] where P and Q are arithmetic progressions of length m with the same common difference. A vertical line is a subset of [math]\displaystyle{ [n]^2 }[/math] of the form [math]\displaystyle{ VL_x=\{(x,y):y\in[n]\}, }[/math] and a horizontal line is a subset of the form [math]\displaystyle{ HL_y=\{(x,y):x\in[n]\}. }[/math]
First step: A is dense on almost all grids
This is a very standard move in density-increment arguments. We are trying to do an iteration in which we pass to a subgrid where A has increased density. Suppose we aim for a density increase from [math]\displaystyle{ \delta }[/math] to [math]\displaystyle{ \delta+\eta. }[/math] Then if we find a subgrid (of width tending to infinity with n) where the density is at least [math]\displaystyle{ \delta+\eta }[/math] then we have our density increase, obviously enough. But if we do not have such a subgrid, then we can argue as follows. Pick a random point in [math]\displaystyle{ [n]^2 }[/math] by first picking a random grid of width m, and then picking a random point in that grid. If m is small enough, then it is easy to argue that the distribution of the resulting point is approximately uniform, from which it follows that the average density in a random subgrid is approximately [math]\displaystyle{ \delta. }[/math] But if the average density in a subgrid is approximately [math]\displaystyle{ \delta }[/math] and the maximum density is at most [math]\displaystyle{ \delta+\eta, }[/math] then there can be only a small proportion of subgrids inside which A has density substantially smaller than [math]\displaystyle{ \delta. }[/math] (Roughly, the proportion where the density is less than [math]\displaystyle{ \delta-\mu }[/math] is at most [math]\displaystyle{ \eta/\mu, }[/math] by a simple averaging argument).
Second step: find a large set of "full" vertical lines
We shall now prove that almost all vertical lines have density close to [math]\displaystyle{ \delta. }[/math] (What does this mean? It means that the number of points of A inside the line is at least [math]\displaystyle{ \delta n. }[/math] Of course, this is really saying that A is dense in the line, and not that the line is dense, but it is such a convenient way of talking that we shall adopt it from now on.) First, we observe that if some positive proportion [math]\displaystyle{ \mu }[/math] of vertical lines have density at most [math]\displaystyle{ \delta-\eta, }[/math] then the average density in the remaining vertical lines is at least [math]\displaystyle{ \delta+\mu\eta, }[/math] so there must be at least [math]\displaystyle{ \mu\eta n/2 }[/math] vertical lines with density at least [math]\displaystyle{ \delta+\mu\eta/2. }[/math] So if a positive proportion of lines have density substantially different from [math]\displaystyle{ \delta, }[/math] then we know that a positive proportion of vertical lines have density that is substantially larger than [math]\displaystyle{ \delta. }[/math] (Here, the word "substantially" means that the difference is a constant that depends on [math]\displaystyle{ \delta }[/math] and not on n.)
If that is the case, then we can apply Szemerédi's theorem to obtain an arithmetic progression P of length m (which tends to infinity as n does) such that for every [math]\displaystyle{ x\in P }[/math] the vertical line [math]\displaystyle{ VL_x }[/math] has density substantially larger than [math]\displaystyle{ \delta. }[/math] Furthermore, we can choose P so that the common difference is small compared with n.
This last condition is useful because it enables us to partition almost all of [n] into translates [math]\displaystyle{ Q_i }[/math] of P. And then the average density of the grids [math]\displaystyle{ P\times Q_i }[/math] is substantially larger than [math]\displaystyle{ \delta. }[/math] Therefore, there must be a density increase on some subgrid and we are done, unless all but at most [math]\displaystyle{ \eta n }[/math] vertical lines have density at least [math]\displaystyle{ \delta-\eta, }[/math] for some positive [math]\displaystyle{ \eta }[/math] that we can make as small as we like.
Third step: find a dense diagonal
Since there are at most 2n possible values of x+y when [math]\displaystyle{ (x,y)\in[n]^2, }[/math] there must exist z such that for at least [math]\displaystyle{ \delta n/2 }[/math] pairs [math]\displaystyle{ (x,y)\in A }[/math] we have [math]\displaystyle{ x+y=z. }[/math] Assuming we have ensured that [math]\displaystyle{ \eta\lt \delta/4, }[/math] this gives us at least [math]\displaystyle{ \delta n/4 }[/math] points [math]\displaystyle{ (x,y)\in A }[/math] such that [math]\displaystyle{ x+y=z }[/math] and the vertical line [math]\displaystyle{ VL_x }[/math] has density at least [math]\displaystyle{ \delta-\eta. }[/math]
Fourth step: find a long arithmetic progression of good points in that diagonal
Pick a diagonal as given by Step 3. Amongst the first [math]\displaystyle{ \delta n/8 }[/math] points on that diagonal (counting from the top left) there must be an arithmetic progression of points of length m tending to infinity and small common difference (much smaller than [math]\displaystyle{ \sqrt{n}, }[/math] say). That is, we can find an arithmetic progression P of length m and common difference d such that for every [math]\displaystyle{ x\in P }[/math] the point (x,z-x) belongs to A, and the vertical line [math]\displaystyle{ VL_x }[/math] has density at least [math]\displaystyle{ \delta-\eta. }[/math]
Fifth step: find a subgrid with several empty horizontal lines
Now let us partition almost all of [n] into translates [math]\displaystyle{ Q_1,\dots,Q_s }[/math] of P (which is easy to do since the common difference is small). On almost all the grids [math]\displaystyle{ P\times Q_i }[/math] the density of A is almost [math]\displaystyle{ \delta, }[/math] since otherwise there would have to be some grids with increased density. Also, the average density in [math]\displaystyle{ Q_i }[/math] of numbers y such that (z-y,y) belongs to A and is not one of the first [math]\displaystyle{ \delta n/8 }[/math] such points is at least [math]\displaystyle{ \delta/8. }[/math] If [math]\displaystyle{ x\in P }[/math] and y is such a number, then (x,y) cannot belong to A, since otherwise the three points (x,y), (x,z-x) and (z-y,y) would form a corner in A, and we are assuming that A is corner free. (Remark: z-y is greater than x because the point (z-y,y) is further down the diagonal than the point (x,z-x).) Therefore, we can find some [math]\displaystyle{ Q_i }[/math] such that the density of A in [math]\displaystyle{ P\times Q_i }[/math] is almost as large as [math]\displaystyle{ \delta }[/math] and several horizontal lines inside [math]\displaystyle{ P\times Q_i }[/math] are empty.
Conclusion
And now we are done by the second step: we have a dense grid with some sparse (in fact, empty) lines, so it has several lines of increased density, and the argument of Step 2 leads to a density increase on a further subgrid (after another application of Szemerédi's theorem).
An old blog comment that used to be the main article
Ajtai and Szemerédi found a clever proof of the corners theorem that used Szemerédi’s theorem as a lemma. This gives us a direction to explore that we have hardly touched on. Very roughly, to prove the corners theorem you first use an averaging argument to find a dense diagonal (that is, subset of the form x+y=r that contains many points of your set A). For any two such points (x,r-x) and (y,r-y) you know that the point (x,r-y) does not lie in A (if A is corner free), which gives you some kind of non-quasirandomness. (The Cartesian semi-product arguments discussed in the 400s thread are very similar to this observation.) Indeed, it allows you to find a large Cartesian product [math]\displaystyle{ X\times Y }[/math] in which A is a bit too dense. In order to exploit this, Ajtai and Szemerédi used Szemerédi’s theorem to find long arithmetic progressions [math]\displaystyle{ P\subset X }[/math] and [math]\displaystyle{ Q\subset Y }[/math] of the same common difference such that A has a density increment in [math]\displaystyle{ P\times Q }[/math]. (I may have misremembered, but I think this must have been roughly what they did.) So we could think about what the analogue would be in the DHJ setting. Presumably it would be some kind of multidimensional Sperner statement of sufficient depth to imply Szemerédi’s theorem. A naive suggestion would be that in a dense subset of [math]\displaystyle{ [2]^n }[/math] you can find a large-dimensional combinatorial subspace in which all the variable sets have the same size. If you apply this to a union of layers, then you find an arithmetic progression of layer-cardinalities. But this feels rather artificial, so here’s a question we could think about.
Question 1. Can anyone think what the right common generalization of Szemerédi’s theorem and Sperner’s theorem ought to be? (Sperner’s theorem itself would correspond to the case of progressions of length 2.)