Density increment method
The basic idea of the density increment method is as follows. Suppose that you have some structure [math]\displaystyle{ S }[/math] that has many substructures of a similar general nature to [math]\displaystyle{ S }[/math]. (Examples of such structures are combinatorial subspaces, grids, arithmetic progressions, complete graphs, etc.) Suppose also that you want to prove that every dense subset [math]\displaystyle{ A }[/math] of [math]\displaystyle{ S }[/math] contains a configuration of a certain kind. You try to prove it using a three-stage argument of the following kind.

  • If [math]\displaystyle{ A }[/math] is quasirandom (in some sense that you have to devise), then [math]\displaystyle{ A }[/math] contains a configuration of the desired kind. Therefore, if [math]\displaystyle{ A }[/math] does not contain a configuration of the desired kind, then [math]\displaystyle{ A }[/math] is not quasirandom.
  • If [math]\displaystyle{ A }[/math] is not quasirandom, then there is a set [math]\displaystyle{ T }[/math] of a kind one can describe such that [math]\displaystyle{ |A\cap T|/|T| }[/math] is significantly larger than the density of [math]\displaystyle{ A }[/math]. (More generally, we can find a function [math]\displaystyle{ f }[/math] of average zero that we can describe such that the average of [math]\displaystyle{ f }[/math] over [math]\displaystyle{ A }[/math] is bounded away from zero.)
  • The characteristic function of [math]\displaystyle{ T }[/math] can be approximated by a positive linear combination of characteristic functions of substructures of [math]\displaystyle{ S }[/math]. Therefore, by averaging, there is a substructure [math]\displaystyle{ S' }[/math] of [math]\displaystyle{ S }[/math] such that [math]\displaystyle{ |A\cap S'|/|S'| }[/math] is significantly greater than the density of [math]\displaystyle{ A }[/math].

With these three steps in place, a simple iteration argument completes the proof.
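
To get a feel for why the iteration terminates, here is a back-of-envelope count, under the illustrative assumption (which is what Roth's proof below delivers) that each step replaces a set of density [math]\displaystyle{ \delta }[/math] by a set of density at least [math]\displaystyle{ \delta+c\delta^2 }[/math] on a substructure. After at most [math]\displaystyle{ \frac{1}{c\delta} }[/math] steps the density has doubled; after at most [math]\displaystyle{ \frac{1}{2c\delta} }[/math] further steps it has doubled again; and so on. Since the density can never exceed 1, the total number of steps is at most [math]\displaystyle{ \frac{1}{c\delta}(1+\tfrac12+\tfrac14+\cdots)=\frac{2}{c\delta} }[/math], so the iteration must halt, and it can halt only by finding the desired configuration. (One also has to check that the substructures remain large enough for the earlier steps to apply.)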

A simple example of the method

A basic example of this method is Roth's original proof of Roth's theorem. Here the structure [math]\displaystyle{ S }[/math] is an arithmetic progression of length [math]\displaystyle{ n }[/math], the similar substructures are subprogressions of length [math]\displaystyle{ m }[/math] (for some [math]\displaystyle{ m }[/math] that tends to infinity with [math]\displaystyle{ n }[/math]), and the desired configuration is an arithmetic progression of length 3. A brief outline of the proof then runs as follows. (For convenience I am expressing it in a slightly different way from the standard one.)

  • Let [math]\displaystyle{ A }[/math] be a subset of [math]\displaystyle{ \mathbb{Z}_N }[/math] of density [math]\displaystyle{ \delta }[/math]. Then [math]\displaystyle{ A }[/math] is said to be quasirandom if the number of quadruples [math]\displaystyle{ (x,x+a,x+b,x+a+b) }[/math] such that all four terms lie in [math]\displaystyle{ A }[/math] is not significantly larger than [math]\displaystyle{ \delta^4N^3 }[/math] (which can be shown with the help of the Cauchy-Schwarz inequality to be a lower bound -- the lower bound is approximately attained if [math]\displaystyle{ A }[/math] is random).
  • If [math]\displaystyle{ A }[/math] is not quasirandom, then there is a trigonometric function [math]\displaystyle{ \tau(x)=\exp(2\pi irx/N) }[/math] such that [math]\displaystyle{ |\mathbb{E}_x1_A(x)\tau(x)| }[/math] is not small (see the numerical sketch after this list). From this it is not hard to deduce that there is a linear-sized mod-[math]\displaystyle{ N }[/math] arithmetic progression [math]\displaystyle{ P }[/math] such that [math]\displaystyle{ |A\cap P|/|P| }[/math] is significantly larger than the density of [math]\displaystyle{ A }[/math]. This follows because we can partition [math]\displaystyle{ \mathbb{Z}_N }[/math] into linear-sized mod-[math]\displaystyle{ N }[/math] arithmetic progressions on each of which [math]\displaystyle{ \tau }[/math] is approximately constant, and then an averaging argument does the rest.
  • It can be shown that every mod-[math]\displaystyle{ N }[/math] arithmetic progression can be approximately partitioned into genuine arithmetic progressions of length proportional to [math]\displaystyle{ \sqrt{N} }[/math]. So by averaging we can find a genuine arithmetic progression [math]\displaystyle{ Q }[/math] such that [math]\displaystyle{ |A\cap Q|/|Q| }[/math] is significantly larger than the density of [math]\displaystyle{ A }[/math].
  • Now a simple iteration argument finishes off the proof.
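
The first two steps can be checked numerically. Here is a small Python sketch (purely illustrative, with ad hoc names; not part of the proof) that counts the quadruples [math]\displaystyle{ (x,x+a,x+b,x+a+b) }[/math] inside a set via the discrete Fourier transform, using the identity that the count equals [math]\displaystyle{ N^{-1}\sum_r|\hat{1}_A(r)|^4 }[/math], and then locates the large Fourier coefficient that supplies the function [math]\displaystyle{ \tau }[/math].

    import numpy as np

    def quadruple_count(indicator):
        # indicator: 0/1 array over Z_N; counts quadruples (x, x+a, x+b, x+a+b)
        # with all four terms in A, via N^{-1} * sum_r |F[r]|^4
        N = len(indicator)
        F = np.fft.fft(indicator)          # F[r] = sum_x 1_A(x) exp(-2*pi*i*r*x/N)
        return (np.abs(F) ** 4).sum() / N

    def largest_fourier_bias(indicator):
        # the frequency r != 0 maximising |E_x 1_A(x) exp(-2*pi*i*r*x/N)|
        N = len(indicator)
        F = np.fft.fft(indicator - indicator.mean())   # mean subtraction kills r = 0
        r = int(np.argmax(np.abs(F)))
        return r, np.abs(F[r]) / N

    rng = np.random.default_rng(0)
    N, delta = 10007, 0.3
    random_set = (rng.random(N) < delta).astype(float)
    structured_set = (np.arange(N) % 10 < 3).astype(float)   # density 0.3, periodic
    print(quadruple_count(random_set) / (delta ** 4 * N ** 3))      # close to 1
    print(quadruple_count(structured_set) / (delta ** 4 * N ** 3))  # noticeably larger
    print(largest_fourier_bias(structured_set))   # a large bias near r = N/10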

Difficulties in applying the method to density Hales-Jewett

In applications of the density-increment method, it tends to be possible to prove a stronger form of the first statement. Instead of showing that every quasirandom subset of the structure [math]\displaystyle{ S }[/math] contains a configuration of the given kind, one shows that every quasirandom subset of [math]\displaystyle{ S }[/math] contains approximately the same number of configurations as a typical random subset of the same density. Indeed, this is why the word "quasirandom" is used.

If we try to imagine how this principle might apply to density Hales-Jewett then we get some odd consequences. To begin with, let us think what a typical combinatorial line in [math]\displaystyle{ [3]^n }[/math] looks like. If we choose one randomly, then for each coordinate we have four choices: it can be 1, 2, 3 or a wildcard. Because the binomial distribution is concentrated about its mean, we deduce that with very high probability a random combinatorial line has approximately [math]\displaystyle{ n/4 }[/math] each of fixed 1s, fixed 2s, fixed 3s and wildcards. But this means that if we remove all points from our subset [math]\displaystyle{ A }[/math] apart from points that take two values roughly [math]\displaystyle{ n/4 }[/math] times and the third value roughly [math]\displaystyle{ n/2 }[/math] times, then we will hardly affect the number of combinatorial lines. Thus, it might seem that an appropriate definition of quasirandomness would have to apply just to these rather special sorts of points.
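
This concentration is easy to see by sampling. The following sketch (a toy illustration; the names are ad hoc) draws a random combinatorial line by choosing each coordinate to be a fixed 1, 2, 3 or a wildcard uniformly at random, and tallies the four frequencies, each of which comes out close to [math]\displaystyle{ 1/4 }[/math].

    import random
    from collections import Counter

    def random_line_shape(n):
        # each coordinate of a random combinatorial line is a fixed 1, 2 or 3,
        # or a wildcard '*', each with probability 1/4
        symbols = [random.choice("123*") for _ in range(n)]
        if "*" not in symbols:     # a line needs at least one wildcard; resample
            return random_line_shape(n)
        return Counter(symbols)

    n = 100000
    shape = random_line_shape(n)
    print({s: shape[s] / n for s in "123*"})   # all four frequencies near 0.25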

Now consider the set [math]\displaystyle{ A }[/math] that consists of all sequences for which the numbers of 1s, 2s and 3s all lie between [math]\displaystyle{ 3n/10 }[/math] and [math]\displaystyle{ 3n/8 }[/math]. The proportion of sequences that do not have this property is exponentially small in [math]\displaystyle{ n }[/math], so the density of [math]\displaystyle{ A }[/math] is almost exactly 1. However, this set contains no typical combinatorial lines, so the number of combinatorial lines it contains is exponentially smaller than it would be for the set [math]\displaystyle{ [3]^n }[/math] itself. If a density-increment strategy is to succeed, this should imply first that [math]\displaystyle{ A }[/math] is highly non-quasirandom, and then that we can find a substantial density increase on some substructure. But obviously we cannot do the latter, since the density is already within a whisker of the trivial maximum. (What really matters here is that the difference tends to zero with [math]\displaystyle{ n }[/math].)
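
For a concrete check, the density of this set can be computed exactly from multinomial coefficients; the sketch below (illustrative only) does the sum in log space to avoid overflow.

    from math import exp, lgamma, log

    def middle_layers_density(n):
        # exact proportion of sequences in [3]^n whose numbers of 1s, 2s and 3s
        # all lie between 3n/10 and 3n/8, summed from multinomial coefficients
        lo, hi = (3 * n) // 10, (3 * n) // 8
        density = 0.0
        for a in range(lo, hi + 1):
            for b in range(lo, hi + 1):
                c = n - a - b
                if lo <= c <= hi:
                    density += exp(lgamma(n + 1) - lgamma(a + 1) - lgamma(b + 1)
                                   - lgamma(c + 1) - n * log(3))
        return density

    print(middle_layers_density(3000))   # about 0.9998, and it tends to 1 with n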

Various strategies have been proposed for dealing with this problem, and there could be more. (In fact, I'll mention one that has not yet been proposed.)

1. Restriction to central layers

The idea here is to fix some [math]\displaystyle{ m }[/math], a popular choice being [math]\displaystyle{ C\sqrt{n} }[/math], and just look at sequences where the numbers of 1s, 2s and 3s are all within [math]\displaystyle{ m }[/math] of [math]\displaystyle{ n/3 }[/math]. If we do this, then the number of wildcards in a combinatorial line cannot be more than [math]\displaystyle{ 3m }[/math] (each of the three values must be fixed at least [math]\displaystyle{ n/3-m }[/math] times, which accounts for all but at most [math]\displaystyle{ 3m }[/math] of the coordinates), so the typical shape of a combinatorial line is radically different from what it is if we do not restrict.

2. Alternative measures

In this case the idea is to give different weights to different sequences in order to bring the distribution of a random point in a random combinatorial line closer to the distribution of a random point. A choice that is natural in many ways is the slices-equal measure. Here, instead of choosing a point at random, you first choose, uniformly at random, a triple [math]\displaystyle{ (a,b,c) }[/math] of non-negative integers that add up to [math]\displaystyle{ n }[/math]. Then you choose randomly a sequence with [math]\displaystyle{ a }[/math] 1s, [math]\displaystyle{ b }[/math] 2s and [math]\displaystyle{ c }[/math] 3s. This measure allows certain averaging arguments to run very smoothly, but it also has disadvantages: for example, if you restrict to a combinatorial subspace, then the measure on the restriction is not proportional to the slices-equal measure on that subspace.
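
To make the sampling procedure concrete, here is a short sketch (illustrative, with ad hoc names) that draws a point from the slices-equal measure: first a uniformly random triple [math]\displaystyle{ (a,b,c) }[/math] with [math]\displaystyle{ a+b+c=n }[/math] via the stars-and-bars bijection, then a uniformly random sequence with those letter counts.

    import random

    def random_slices_equal_point(n):
        # a uniformly random triple (a, b, c) of non-negative integers with
        # a + b + c = n, via the stars-and-bars bijection with 2-element
        # subsets of {0, 1, ..., n + 1}
        i, j = sorted(random.sample(range(n + 2), 2))
        a, b, c = i, j - i - 1, n + 1 - j
        # then a uniformly random sequence with a 1s, b 2s and c 3s
        seq = [1] * a + [2] * b + [3] * c
        random.shuffle(seq)
        return seq

    print(random_slices_equal_point(20))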

3. Extreme localization

Here one chooses a very small [math]\displaystyle{ m }[/math], possibly even so small that it depends on the density [math]\displaystyle{ \delta }[/math] only, and restricts to a random m-dimensional combinatorial subspace (where this can be given a variety of sensible meanings -- one would be to fix [math]\displaystyle{ n-m }[/math] coordinates randomly and let the others be the variable coordinates). A strategy of this kind is implicit in the Furstenberg-Katznelson argument.
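
As a concrete rendering of the simplest of these meanings (a toy sketch with ad hoc names, not the Furstenberg-Katznelson construction itself): fix [math]\displaystyle{ n-m }[/math] randomly chosen coordinates at random values and let the remaining [math]\displaystyle{ m }[/math] coordinates be the variable coordinates.

    import random

    def random_restriction(n, m):
        # fix n - m randomly chosen coordinates at random values in {1, 2, 3};
        # the remaining m coordinates become the variable coordinates
        variable = sorted(random.sample(range(n), m))
        fixed = {i: random.choice((1, 2, 3)) for i in range(n) if i not in variable}

        def embed(y):
            # embed a point y of [3]^m as the corresponding point of [3]^n
            values = dict(zip(variable, y))
            return [fixed[i] if i in fixed else values[i] for i in range(n)]

        return embed

    embed = random_restriction(10, 3)
    print(embed([1, 2, 3]))   # a point of the chosen 3-dimensional subspace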

Obstructions to uniformity

An obstruction to uniformity can be loosely defined as a simple global explanation for a set not being quasirandom. In practice, it tends to be something more precise than this. It is a "structured subset" [math]\displaystyle{ T }[/math] of the basic structure [math]\displaystyle{ S }[/math] with the property that every set [math]\displaystyle{ A }[/math] of density [math]\displaystyle{ \delta }[/math] such that [math]\displaystyle{ |A\cap T|/|T| }[/math] is at least [math]\displaystyle{ \delta+c }[/math] fails to be quasirandom (or rather, fails to be [math]\displaystyle{ \eta }[/math]-quasirandom for some [math]\displaystyle{ \eta\gt 0 }[/math] that depends on [math]\displaystyle{ c }[/math]).

More generally, an obstruction to uniformity can be a "structured function" [math]\displaystyle{ f }[/math] from [math]\displaystyle{ S }[/math] to [math]\displaystyle{ \mathbb{R} }[/math] or [math]\displaystyle{ \mathbb{C} }[/math] that is bounded above by 1 in modulus, averages zero, and has the property that if [math]\displaystyle{ A }[/math] is any set such that [math]\displaystyle{ |\mathbb{E}_x1_A(x)f(x)|\geq c }[/math] then [math]\displaystyle{ A }[/math] fails to be quasirandom.

The dream, when one is trying to develop a density-increment strategy, is to find a simple complete set of obstructions to uniformity. This is a collection [math]\displaystyle{ \mathcal{F} }[/math] of functions with the following properties.

  • Each function [math]\displaystyle{ f\in\mathcal{F} }[/math] has average 0.
  • If [math]\displaystyle{ A }[/math] is a set such that [math]\displaystyle{ \mathbb{E}_x1_A(x)f(x)\geq c }[/math] for some function [math]\displaystyle{ f\in\mathcal{F} }[/math], then [math]\displaystyle{ A }[/math] is easily seen not to be [math]\displaystyle{ \eta(c) }[/math]-quasirandom.
  • If [math]\displaystyle{ A }[/math] is a set that is not [math]\displaystyle{ c }[/math]-quasirandom, then there exists [math]\displaystyle{ f\in\mathcal{F} }[/math] such that [math]\displaystyle{ \mathbb{E}_x1_A(x)f(x)\geq\beta(c). }[/math]
  • Every function [math]\displaystyle{ f\in\mathcal{F} }[/math] can be closely approximated by a linear combination [math]\displaystyle{ \sum_i\lambda_i1_{S_i} }[/math], where each [math]\displaystyle{ S_i }[/math] is a substructure of [math]\displaystyle{ S }[/math] of density [math]\displaystyle{ \sigma_i }[/math], and [math]\displaystyle{ \sum_i|\lambda_i|1_{S_i}\equiv 1. }[/math]

To see the point of the last condition, let [math]\displaystyle{ g=1_A-\delta }[/math] be the balanced function of a set [math]\displaystyle{ A }[/math] of density [math]\displaystyle{ \delta }[/math], and let us see what happens if [math]\displaystyle{ |\mathbb{E}_xg(x)f(x)|\geq c }[/math]. Assuming that the approximation above is good, we deduce from the triangle inequality that [math]\displaystyle{ \sum_i|\lambda_i||\mathbb{E}_xg(x)1_{S_i}(x)|\geq c/2 }[/math]. But [math]\displaystyle{ \sum_i|\lambda_i|1_{S_i}(x)=1 }[/math] for every [math]\displaystyle{ x }[/math], and [math]\displaystyle{ g }[/math] averages zero, from which it follows that [math]\displaystyle{ \sum_i|\lambda_i|\mathbb{E}_xg(x)1_{S_i}(x)=0 }[/math]. Therefore, [math]\displaystyle{ \sum_i|\lambda_i|(|\mathbb{E}_xg(x)1_{S_i}(x)|+\mathbb{E}_xg(x)1_{S_i}(x))\geq c/2=(c/2)\sum_i|\lambda_i|\mathbb{E}_x1_{S_i}(x) }[/math]. By averaging, there exists [math]\displaystyle{ i }[/math] such that [math]\displaystyle{ |\mathbb{E}_xg(x)1_{S_i}(x)|+\mathbb{E}_xg(x)1_{S_i}(x)\geq(c/2)\sigma_i }[/math]; since [math]\displaystyle{ |t|+t=2t }[/math] when [math]\displaystyle{ t\gt 0 }[/math] (and is 0 otherwise), this gives [math]\displaystyle{ \mathbb{E}_xg(x)1_{S_i}(x)\geq(c/4)\sigma_i }[/math], i.e. [math]\displaystyle{ \mathbb{E}_{x\in S_i}g(x)\geq c/4 }[/math], which in turn implies that [math]\displaystyle{ |A\cap S_i|/|S_i|\geq\delta+c/4. }[/math]
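
The following toy sketch (purely illustrative; the partition, weights and set are all made up for the example) runs this argument numerically on [math]\displaystyle{ \mathbb{Z}_N }[/math], taking the [math]\displaystyle{ S_i }[/math] to be a partition into intervals and all [math]\displaystyle{ |\lambda_i|=1 }[/math], so that [math]\displaystyle{ \sum_i|\lambda_i|1_{S_i}\equiv 1 }[/math] holds exactly.

    import numpy as np

    rng = np.random.default_rng(1)
    N, delta = 12000, 0.3
    blocks = np.array_split(np.arange(N), 12)    # S_1, ..., S_12 partition Z_N

    # a set A that is denser on the even-numbered blocks, so that it correlates
    # with the structured function f = sum_i lambda_i 1_{S_i}, lambda_i = +/-1
    A = np.concatenate([rng.random(len(S)) < delta + (0.1 if k % 2 == 0 else -0.1)
                        for k, S in enumerate(blocks)]).astype(float)
    f = np.concatenate([np.full(len(S), 1.0 if k % 2 == 0 else -1.0)
                        for k, S in enumerate(blocks)])

    g = A - A.mean()                     # the balanced function of A
    c = abs((g * f).mean())              # the correlation |E_x g(x) f(x)|
    best = max(A[S].mean() - A.mean() for S in blocks)
    print(c, best, best >= c / 4)        # some S_i carries an increment >= c/4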