Ajtai-Szemerédi's proof of the corners theorem

From Polymath Wiki
Revision as of 08:43, 2 March 2009 by Gowers (talk | contribs) (→‎Using the dictionary)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Introduction

Ajtai and Szemerédi proved the corners theorem with a clever use of three basic principles: the density-increment strategy, which allowed them to assume that the set A was of density almost [math]\displaystyle{ \delta }[/math] in almost all subgrids; averaging arguments, which implied that any dense structured set in which A was sparse must be compensated for by a structured set in which it was of increased density; and Szemerédi's theorem, which allowed them to pass from dense subsets of [n] to long arithmetic progressions. Their original paper does not seem to be available online, but here you can find a quantitative version of their argument due to Van Vu. On this page, we give a detailed sketch of the argument: the aim is to present the key ideas in enough detail that the reader who wishes to can go away and make it fully precise.

Statement of the theorem and some basic definitions

For every [math]\displaystyle{ \delta\gt 0 }[/math] there exists n such that every subset [math]\displaystyle{ A\subset [n]^2 }[/math] of density at least [math]\displaystyle{ \delta }[/math] (i.e., of cardinality at least [math]\displaystyle{ \delta n^2 }[/math]) contains three points of the form (x,y), (x+d,y) and (x,y+d) with [math]\displaystyle{ d\gt 0. }[/math]

Definitions. A grid of side m is a Cartesian product [math]\displaystyle{ P\times Q, }[/math] where P and Q are arithmetic progressions of length m with the same common difference. A vertical line is a subset of [math]\displaystyle{ [n]^2 }[/math] of the form [math]\displaystyle{ VL_x=\{(x,y):y\in[n]\}, }[/math] and a horizontal line is a subset of the form [math]\displaystyle{ HL_y=\{(x,y):x\in[n]\}. }[/math]

First step: A is dense on almost all grids

This is a very standard move in density-increment arguments. We are trying to do an iteration in which we pass to a subgrid where A has increased density. Suppose we aim for a density increase from [math]\displaystyle{ \delta }[/math] to [math]\displaystyle{ \delta+\eta. }[/math] Then if we find a subgrid (of width tending to infinity with n) where the density is at least [math]\displaystyle{ \delta+\eta }[/math] then we have our density increase, obviously enough. But if we do not have such a subgrid, then we can argue as follows. Pick a random point in [math]\displaystyle{ [n]^2 }[/math] by first picking a random grid of width m, and then picking a random point in that grid. If m is small enough, then it is easy to argue that the distribution of the resulting point is approximately uniform, from which it follows that the average density in a random subgrid is approximately [math]\displaystyle{ \delta. }[/math] But if the average density in a subgrid is approximately [math]\displaystyle{ \delta }[/math] and the maximum density is at most [math]\displaystyle{ \delta+\eta, }[/math] then there can be only a small proportion of subgrids inside which A has density substantially smaller than [math]\displaystyle{ \delta. }[/math] (Roughly, the proportion where the density is less than [math]\displaystyle{ \delta-\mu }[/math] is at most [math]\displaystyle{ \eta/\mu, }[/math] by a simple averaging argument).

Second step: find a large set of "full" vertical lines

We shall now prove that almost all vertical lines have density close to [math]\displaystyle{ \delta. }[/math] (What does this mean? It means that the number of points of A inside the line is at least [math]\displaystyle{ \delta n. }[/math] Of course, this is really saying that A is dense in the line, and not that the line is dense, but it is such a convenient way of talking that we shall adopt it from now on.) First, we observe that if some positive proportion [math]\displaystyle{ \mu }[/math] of vertical lines have density at most [math]\displaystyle{ \delta-\eta, }[/math] then the average density in the remaining vertical lines is at least [math]\displaystyle{ \delta+\mu\eta, }[/math] so there must be at least [math]\displaystyle{ \mu\eta n/2 }[/math] vertical lines with density at least [math]\displaystyle{ \delta+\mu\eta/2. }[/math] So if a positive proportion of lines have density substantially different from [math]\displaystyle{ \delta, }[/math] then we know that a positive proportion of vertical lines have density that is substantially larger than [math]\displaystyle{ \delta. }[/math] (Here, the word "substantially" means that the difference is a constant that depends on [math]\displaystyle{ \delta }[/math] and not on n.)

If that is the case, then we can apply Szemerédi's theorem to obtain an arithmetic progression P of length m (which tends to infinity as n does) such that for every [math]\displaystyle{ x\in P }[/math] the vertical line [math]\displaystyle{ VL_x }[/math] has density substantially larger than [math]\displaystyle{ \delta. }[/math] Furthermore, we can choose P so that the common difference is small compared with n.

This last condition is useful because it enables us to partition almost all of [n] into translates [math]\displaystyle{ Q_i }[/math] of P. And then the average density of the grids [math]\displaystyle{ P\times Q_i }[/math] is substantially larger than [math]\displaystyle{ \delta. }[/math] Therefore, there must be a density increase on some subgrid and we are done, unless all but at most [math]\displaystyle{ \eta n }[/math] vertical lines have density at least [math]\displaystyle{ \delta-\eta, }[/math] for some positive [math]\displaystyle{ \eta }[/math] that we can make as small as we like.

Third step: find a dense diagonal

Since there are at most 2n possible values of x+y when [math]\displaystyle{ (x,y)\in[n]^2, }[/math] there must exist z such that for at least [math]\displaystyle{ \delta n/2 }[/math] pairs [math]\displaystyle{ (x,y)\in A }[/math] we have [math]\displaystyle{ x+y=z. }[/math] Assuming we have ensured that [math]\displaystyle{ \eta\lt \delta/4, }[/math] this gives us at least [math]\displaystyle{ \delta n/4 }[/math] points [math]\displaystyle{ (x,y)\in A }[/math] such that [math]\displaystyle{ x+y=z }[/math] and the vertical line [math]\displaystyle{ VL_x }[/math] has density at least [math]\displaystyle{ \delta-\eta. }[/math]

Fourth step: find a long arithmetic progression of good points in that diagonal

Pick a diagonal as given by Step 3. Amongst the first [math]\displaystyle{ \delta n/8 }[/math] points on that diagonal (counting from the top left) there must be an arithmetic progression of points of length m tending to infinity and small common difference (much smaller than [math]\displaystyle{ \sqrt{n}, }[/math] say). That is, we can find an arithmetic progression P of length m and common difference d such that for every [math]\displaystyle{ x\in P }[/math] the point (x,z-x) belongs to A, and the vertical line [math]\displaystyle{ VL_x }[/math] has density at least [math]\displaystyle{ \delta-\eta. }[/math]

Fifth step: find a subgrid with several empty horizontal lines

Now let us partition almost all of [n] into translates [math]\displaystyle{ Q_1,\dots,Q_s }[/math] of P (which is easy to do since the common difference is small). On almost all the grids [math]\displaystyle{ P\times Q_i }[/math] the density of A is almost [math]\displaystyle{ \delta, }[/math] since otherwise there would have to be some grids with increased density. Also, the average density in [math]\displaystyle{ Q_i }[/math] of numbers y such that (z-y,y) belongs to A and is not one of the first [math]\displaystyle{ \delta n/8 }[/math] such points is at least [math]\displaystyle{ \delta/8. }[/math] If [math]\displaystyle{ x\in P }[/math] and y is such a number, then (x,y) cannot belong to A, since otherwise the three points (x,y), (x,z-x) and (z-y,y) would form a corner in A, and we are assuming that A is corner free. (Remark: z-y is greater than x because the point (z-y,y) is further down the diagonal than the point (x,z-x).) Therefore, we can find some [math]\displaystyle{ Q_i }[/math] such that the density of A in [math]\displaystyle{ P\times Q_i }[/math] is almost as large as [math]\displaystyle{ \delta }[/math] and several horizontal lines inside [math]\displaystyle{ P\times Q_i }[/math] are empty.

Conclusion

And now we are done by the second step: we have a dense grid with some sparse (in fact, empty) lines, so it has several lines of increased density, and the argument of Step 2 leads to a density increase on a further subgrid (after another application of Szemerédi's theorem).

Where next?

The reason for being interested in the Ajtai-Szemerédi corners proof is, of course, that it provides another argument that one can try to use as a model for a proof of DHJ(3). In an ideal, but uninteresting, world, we would be able to prove DHJ(3) by modifying each of the steps of the above argument in a routine way. To do that, it helps to have some kind of dictionary in the back of one's mind. The following one seems to be appropriate, as far as it goes.

Dictionary

1. A vertical line corresponds to a set of sequences (U,V,W) where you fix the 1-set U.

2. A horizontal line corresponds to a set of sequences (U,V,W) where you fix the 2-set V.

3. A line of slope -1, or diagonal, corresponds to a set of sequences (U,V,W) where you fix the 3-set W.

4. An arithmetic progression in [n] with length tending to infinity corresponds to a combinatorial subspace in [math]\displaystyle{ [2]^n }[/math] with\de dimension tending to infinity.

5. A subgrid with width tending to infinity corresponds to a combinatorial subspace of [math]\displaystyle{ [3]^n }[/math] with dimension tending to infinity.

6. A corner corresponds to a combinatorial line.

Using the dictionary

That's about it. So now let us see where the first problem arises when one tries to translate the AS proof to the DHJ(3) context. Obviously step 1 is fine: with any sensible model of a random combinatorial subspace you can assume that the density of A in that subspace is almost always at least [math]\displaystyle{ \delta-\eta }[/math], where [math]\displaystyle{ \delta }[/math] is the density of A.

Step 2 says that if you have a positive density of vertical lines inside which A has density noticeably less than [math]\displaystyle{ \delta }[/math], then you also have a positive density of vertical lines inside which A has density noticeably greater than [math]\displaystyle{ \delta. }[/math] If we let B be the set of x-coordinates associated with these vertical lines, then we can use Szemerédi's theorem to get an arithmetic progression P of these "over-full" vertical lines, with small common difference. Then we can partition the horizontal lines into arithmetic progressions [math]\displaystyle{ Q_i, }[/math] each of which is a translate of P, and by averaging we find a subgrid [math]\displaystyle{ P\times Q_i }[/math] inside which A has density noticeably larger than [math]\displaystyle{ \delta, }[/math] which gives us a density increment and we're done. (Of course, then one has to go on and say what happens if we don't have a positive density of sparse vertical lines -- this is just the first step of the argument.)

Let us gloss over the question of which measure we should use: it is probably just a technicality and the main point is to get the conceptual argument working. If you accept item 1 in the dictionary, then the obvious first step is to define U to be underfull if the density of pairs (V,W) (out of all pairs such that (U,V,W) is a point in [math]\displaystyle{ [3]^n }[/math]) is noticeably less than [math]\displaystyle{ \delta. }[/math] Then if there is a positive density of underfull U, then we have a positive density of overfull U as well. At this point one would like (following the dictionary) to be able to say that there must be some combinatorial subspace on which [math]\displaystyle{ A }[/math] has a density increment. One can easily use the multidimensional Sperner theorem to find a combinatorial subspace consisting entirely of overfull Us (which is what item 4 suggests we should be looking for) but there is not an obvious analogue of the fact that a Cartesian product of two APs with the same common difference is a grid, so this is the first point at which the translation ceases to be routine. A solution to this problem can be found on a separate page.

An old blog comment that used to be the main article

Ajtai and Szemerédi found a clever proof of the corners theorem that used Szemerédi’s theorem as a lemma. This gives us a direction to explore that we have hardly touched on. Very roughly, to prove the corners theorem you first use an averaging argument to find a dense diagonal (that is, subset of the form x+y=r that contains many points of your set A). For any two such points (x,r-x) and (y,r-y) you know that the point (x,r-y) does not lie in A (if A is corner free), which gives you some kind of non-quasirandomness. (The Cartesian semi-product arguments discussed in the 400s thread are very similar to this observation.) Indeed, it allows you to find a large Cartesian product [math]\displaystyle{ X\times Y }[/math] in which A is a bit too dense. In order to exploit this, Ajtai and Szemerédi used Szemerédi’s theorem to find long arithmetic progressions [math]\displaystyle{ P\subset X }[/math] and [math]\displaystyle{ Q\subset Y }[/math] of the same common difference such that A has a density increment in [math]\displaystyle{ P\times Q }[/math]. (I may have misremembered, but I think this must have been roughly what they did.) So we could think about what the analogue would be in the DHJ setting. Presumably it would be some kind of multidimensional Sperner statement of sufficient depth to imply Szemerédi’s theorem. A naive suggestion would be that in a dense subset of [math]\displaystyle{ [2]^n }[/math] you can find a large-dimensional combinatorial subspace in which all the variable sets have the same size. If you apply this to a union of layers, then you find an arithmetic progression of layer-cardinalities. But this feels rather artificial, so here’s a question we could think about.

Question 1. Can anyone think what the right common generalization of Szemerédi’s theorem and Sperner’s theorem ought to be? (Sperner’s theorem itself would correspond to the case of progressions of length 2.)