Proof of DHJ(3) via density-increment

From Polymath1Wiki

Jump to: navigation, search

Contents

Introduction

This article is intended to contain a proof outline of DHJ(3) via the density-increment method. The proofs of each step in the outline will appear in separate articles. The goal is to rigorously codify the "second outline" proof. A secondary goal is to write the proof in such a way that it becomes "easy" to see the generalization to DHJ(k). Each step of the outline has a "Proofread By" section to which people can add their names.


Notation and definitions

We'll call the main probability distribution we work with the Pólya Urn distribution. As a distribution on [k]n, we'll write it as \nu_k^n, or as νk or ν if it's clear from context. (Note: even if one only wants to prove DHJ(3), one needs to define \nu_k^n for general k.) One way to define it is as follows: Order the coordinates [n] in a list according to a random permutation. Now choose k − 1 distinct "partition points" uniformly at random from the \binom{n+1}{k-1} possibilities. This partitions the list of coordinates into k sublists; take the 1st sublist of coordinates to be 1's, the 2nd sublist to be 2's, etc. Note that any string drawn from the Pólya Urn distribution is nondegenerate, meaning it contains at least one of each character in [3].

A closely related distribution is called the Equal Slices distribution. We will denote this distribution by \overline{\nu}_k^n. It has the same definition as the Pólya Urn definition just given, except that the k − 1 partition points are "allowed to be equal"; more precisely, we choose the 1st partition point uniformly from the n + 1 possibilities, then we choose the 2nd partition point uniformly from the n + 2 possibilities, etc. It should be clear from these definitions that the total variation distance between \nu_k^n and \overline{\nu}_k^n is small. Here is a proof that it is O(k2 / n).

We identify combinatorial lines in [3]n with strings in [4]n, where the 4 character is treated as identifying the "wildcard" set. Similarly, we identify 2-dimensional combinatorial subspaces in [3]n with strings in [5]n, where the characters 4 and 5 identify the 2 wildcard sets. Etc. Note that a combinatorial line in [3]n drawn from \nu_4^n is automatically nondegenerate. Henceforth, whenever we say "combinatorial line" or "combinatorial subspace" without qualification, the object is assumed to be nondegenerate.

Although we will prove DHJ(3) for sets that are dense under ν3, this also proves it for sets that are dense under the usual uniform distribution; this should be explicated in the article on passing between measures.


Proof Outline

Here is the sequence of statements we prove:

1. Dense sets in [2]n contain lines (i.e., DHJ(2)).

2. Dense sets in [2]n contain large subspaces.

3. Dense ij-insensitive sets in [3]n contain large subspaces.

4. Dense ij-insensitive sets in [3]n are large-subspace rich.

5. Dense ij-insensitive sets in [3]n are large-subspace partitionable.

6. Dense intersections of ij-insensitive sets in [3]n are large-subspace partitionable.

7. Line-free sets in [3]n have density increments on dense intersections of ij-insensitive sets.

8. Line-free sets in [3]n have density increments on large subspaces.

9. Dense sets in [3]n contain lines (i.e., DHJ(3)).


1. Sets in [2]n have lines

Proofread By: no one yet

This is simply Sperner's Theorem. Our version, with the Pólya Urn distribution, is:

Theorem 1: Let A \subseteq [2]^n have ν2-density at least δ. Assume that n \geq n_1(\delta), where n1(δ) = O(1 / δ2). Then A contains a combinatorial (2-)line.
Proof of Sperner's Theorem. For now, the proof more or less appears in the article on Sperner's Theorem; we might want to use the writeup ideas in the current article on Equal Slices. The easiest proof involves first passing from Pólya to Equal Slices. I think you only need n1(δ) > 1 / δ2 + 1 or something.


2. Sets in [2]n have subspaces

Proofread By: no one yet

The next step is to extend DHJ(2) from lines to subspaces of arbitrary dimension.

Lemma 2: Let A \subseteq [2]^n have ν2-density at least δ, and let d \geq 1. Assume that n \geq n_2(\delta,d). Then A contains a d-dimensional combinatorial subspace. Here n2(δ,d) probably only needs to be O(1/\delta)^{2^{d+1}}

This sort of result has been shown for the uniform distribution case by [[1]], from which one can easily conclude the Pólya distribution case by passing between measures. Alternatively, one can prove the result with a (much) worse bound on n2 using an argument which generalizes to [k]n. A few technical bits need to be filled into this argument; namely, showing that if one does Pólya on some small random subset S \subseteq [n], and then combines this with an independent Pólya draw on [n] \setminus S, the overall draw is very close in total variation distance to a normal Pólya draw. This is proved rather explicitly in the passing between measures article. (Presumably there is also literature from the Pólya Urn studiers on this, too.)


3. ij-insensitive sets in [3]n have subspaces

Proofread By: no one yet

Let i,j \in [k] be distinct characters. We define A \subseteq [k]^n to be an "ij-insensitive set" if for all strings x, changing i's to j's in x or vice versa does not change whether or not x \in A.

An 23-insensitive in [3]n is elsewhere on the blog called a "1-set", since it depends only on the set of coordinates where the value 1 is taken. It is a special case of a special set of complexity 1. See also this article discussing the complexity of sets.

The statement we prove here is:

Lemma 3: Let A \subseteq [3]^n be an ij-insensitive set with ν3-density at least δ, and let d \geq 1. Assume that n \geq n_3(\delta,d) = \sqrt{2/\delta} \cdot n_2(\delta/2, d). Then A contains a d-dimensional combinatorial subspace.

We sketch the proof here, for now.

Proof: (Sketch.) Assume without loss of generality that i = 2, j = 3. We can think of a draw x \sim \nu_3^n by first conditioning on the set of coordinates S where x has its 3's, and then drawing the remainder of the string as y \sim \nu_2^{|\overline{S}|}, where \overline{S} = [n] \setminus S. (This requires a quick justification.) Inventing some notation, we have \mathbf{E}_S[\nu_2(A_S)] = \delta. Further, it should be easy to check that \mathbf{Pr}[|\overline{S}| <\gamma n] \leq \gamma^2 (or perhaps only a tiny bit higher). Hence \mathbf{E}_S[\nu_2(A_S) \mid |\overline{S}| \geq \gamma n] \geq \delta - \gamma^2. We set \gamma = \sqrt{\delta/2} and conclude that there exists a particular set of 3-coordinates S0 such that \nu_2(A_{S_0}) \geq \delta/2 and |\overline{S_0}| \geq \sqrt{\delta/2} n. By choice of n2, this lets us apply Lemma 2 to conclude that A_{S_0} \subseteq [2]^{S_0} contains a d-dimensional combinatorial subspace; call it σ.
Pulling σ back into [3]n by putting 3's into the S coordinates, we get a d-dimensional "2-subspace" in A \subseteq [3]^n. But in fact, all of the strings necessary to fill this out into a complete d-dimensional subspace in [3]n must also be present in A, because A is 23-insensitive. \Box (somewhat poorly explained)


4. ij-insensitive sets are subspace rich

Proofread By: no one yet

Lemma 4: Let A \subseteq [3]^n be an ij-insensitive set with ν-density at least δ, and let d \geq 1. Assume that n \geq n_4(\delta,d) = n_3(\delta/2,d). Then a random d-dimensional subspace drawn from \nu^n_{3+d} is in A with probability at least \eta_4(\delta, d) = \delta/(O(d))^{n_4(\delta,d)}.

This is sketched originally here. Here is a more complete argument:

Proof: (Sketch.) For any 3 \leq m \leq n one can draw a string h \sim \nu_3^n as follows: First, draw f \sim \nu_m^n; then independently draw g \sim \nu_3^m; finally, set h = g \circ f. This notation means that h_i = g_{f_i}. The fact that this indeed gives the distribution \nu_3^n is justified in [document], but should be simplified and ported to the wiki.
By assumption, \mathbf{Pr}[h \in A] \geq \delta; hence with probability at least δ / 2 over the choice of f we get that A_f \subseteq [3]^m has ν3-density at least δ / 2. Here Af of course denotes \{g \in [3]^m : g \circ f \in A. Crucially, Af is an ij-insensitive set because A. (This takes a slight bit of reflection but is, I think, easily confirmed.) Call an f "good" if indeed \nu_3(A_f) \geq \delta/2.
We therefore select m = n3(δ / 2,d) (which is allowable as n \geq n_3(\delta/2,d)) and use Lemma 3 to conclude that for each good f, the set Af contains some d-dimensional subspace. In these cases, a randomly chosen subspace σ˜[3 + d]m is in Af with some positive probability. At this point, we should calculate the least probability of any outcome under \nu_k^n; I believe it is \frac{\Theta(1/k)^{k/2}}{n^{(k-1)/2} \cdot k^n}. Lower-bounding this crudely, we have that σ˜[3 + d]m is in Af with probability at least (O(d))m.
So overall we have that for f \sim \nu_3^m and σ˜[3 + d]m, the probability of \sigma \circ f being entirely in A is at least \delta/(O(d))^m = \delta/(O(d))^{n_3(\delta/2,d)}. But \sigma \circ f is distributed as \nu_{3+d}^n, completing the proof. \Box


5. ij-insensitive sets are subspace partitionable

Proofread By: no one yet

Definition: A set A \subseteq [3]^n is "(γ,d)-subspace partitionable" if it can be written as a disjoint union of d-dimensional combinatorial subspaces, along with a single "error" set E satisfying \nu(E)/\nu(A) \leq \gamma.
Lemma 5: Let A \subseteq [3]^n be an ij-insensitive set with ν-density at least δ, let d \geq 1, and let \gamma \in (0,1) be a parameter. Assume that n \geq n_5(\delta,\gamma,d) = XXX. Then A is (γ,d)-subspace partitionable.

This is currently written fairly thoroughly here, although the parameters need to be worked out to fill in the "XXX" in the statement.

6. Intersections of ij-insensitive sets are subspace partitionable

Proofread By: no one yet

Currently written here.


7. Line-free sets increment on intersections of ij-insensitive sets

Proofread By: no one yet

Written mostly completely here, although some of the "passing between measures" tricks that were done for the uniform distribution should be done for the Pólya distribution instead.


8. Line-free sets increment on subspaces

Proofread By: no one yet

Sketch here.


9. Sets in [3]n have lines

Proofread By: no one yet

The basic idea is described in the article on the density increment method.

Personal tools