Sperner's theorem
Statement of the theorem
Sperner's theorem as originally stated is a result about set systems. Suppose that you want to find the largest collection [math]\displaystyle{ \mathcal{A} }[/math] of subsets of [math]\displaystyle{ [n] }[/math] such that no set in [math]\displaystyle{ \mathcal{A} }[/math] is a proper subset of any other. Then the best you can do is to choose all the sets of some fixed size, and of course the best size to pick is [math]\displaystyle{ \lfloor n/2\rfloor }[/math], since the binomial coefficient [math]\displaystyle{ \binom nm }[/math] is maximized when [math]\displaystyle{ m=\lfloor n/2\rfloor. }[/math]
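For very small n the statement can be checked by exhaustive search. The sketch below is our own illustration, not part of the original argument: it encodes subsets of [n] as bitmasks (an assumption of this sketch) and uses a hypothetical helper `max_antichain_size` that runs a pruned backtracking search. It is exponential, so it is only practical up to about n = 4.

```python
from math import comb


def is_subset(a: int, b: int) -> bool:
    """True if the set encoded by bitmask a is a subset of the set encoded by b."""
    return a & b == a


def max_antichain_size(n: int) -> int:
    """Size of the largest family of subsets of [n] with no set inside another.

    Subsets are encoded as bitmasks 0 .. 2**n - 1. Exhaustive, so exponential.
    """
    universe = list(range(1 << n))
    best = 0

    def search(i: int, chosen: list) -> None:
        nonlocal best
        # Prune: even adding every remaining set could not beat the record.
        if len(chosen) + (len(universe) - i) <= best:
            return
        if i == len(universe):
            best = len(chosen)  # strictly larger than the old record, by the prune above
            return
        s = universe[i]
        # Branch 1: take s, if it is incomparable with everything chosen so far.
        if all(not is_subset(s, t) and not is_subset(t, s) for t in chosen):
            chosen.append(s)
            search(i + 1, chosen)
            chosen.pop()
        # Branch 2: skip s.
        search(i + 1, chosen)

    search(0, [])
    return best


for n in range(1, 5):
    # Columns: n, brute-force maximum, central binomial coefficient C(n, n // 2).
    print(n, max_antichain_size(n), comb(n, n // 2))
```

For n = 1, 2, 3, 4 the last two columns agree (1, 2, 3, 6), as the theorem predicts.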
Sperner's theorem is closely related to the density Hales-Jewett theorem: in fact, it is nothing other than DHJ(2) with the best possible bound. To see this, we associate each set [math]\displaystyle{ A\subset[n] }[/math] with its characteristic function (that is, the sequence that is 1 in A and 0 outside A). If we have a pair of distinct sets [math]\displaystyle{ A\subset B, }[/math] then the two sequences form a combinatorial line in [math]\displaystyle{ [2]^n }[/math] whose wildcard set is [math]\displaystyle{ B\setminus A. }[/math] For example, if n=6 and A and B are the sets [math]\displaystyle{ \{2,3\} }[/math] and [math]\displaystyle{ \{2,3,4,6\} }[/math], then we get the combinatorial line that consists of the two points 011000 and 011101, which we can denote by 011*0* (so the wildcard set is [math]\displaystyle{ \{4,6\} }[/math]).
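The translation between sets and sequences can be made concrete. In this minimal sketch the helper names `characteristic` and `line_from_pair` are hypothetical, places are numbered from 1 as in the example, and '*' marks the wildcard coordinates.

```python
def characteristic(S: set, n: int) -> str:
    """0/1 string of length n with a 1 in place i (numbered from 1) exactly when i is in S."""
    return "".join("1" if i in S else "0" for i in range(1, n + 1))


def line_from_pair(A: set, B: set, n: int) -> str:
    """Combinatorial line in [2]^n given by a proper inclusion A < B.

    Places in B - A become the wildcard '*': setting * = 0 gives the
    characteristic string of A, and * = 1 gives that of B.
    """
    if not A < B:
        raise ValueError("A must be a proper subset of B")
    return "".join(
        "*" if i in B - A else ("1" if i in A else "0") for i in range(1, n + 1)
    )


A, B = {2, 3}, {2, 3, 4, 6}
print(characteristic(A, 6), characteristic(B, 6), line_from_pair(A, B, 6))
```

Running it prints `011000 011101 011*0*`, matching the example above.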
Proof of the theorem
There are several proofs, but perhaps the most enlightening is a very simple averaging argument that proves a stronger result. Let [math]\displaystyle{ \mathcal{A} }[/math] be a collection of subsets of [n]. For each k, let [math]\displaystyle{ \delta_k }[/math] denote the density of [math]\displaystyle{ \mathcal{A} }[/math] in the kth layer of the cube: that is, the number of sets in [math]\displaystyle{ \mathcal{A} }[/math] of size k divided by [math]\displaystyle{ \binom nk. }[/math] The equal-slices measure of [math]\displaystyle{ \mathcal{A} }[/math] is defined to be [math]\displaystyle{ \delta_0+\dots+\delta_n. }[/math]
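A direct transcription of the definition, assuming the collection is given as a set of frozensets of elements of {1, ..., n}. The name `equal_slices_measure` is our own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def equal_slices_measure(family, n):
    """delta_0 + ... + delta_n, where delta_k is the fraction of k-subsets of [n] in family.

    family: a collection of frozensets, each a subset of {1, ..., n}.
    """
    counts = [0] * (n + 1)  # counts[k] = number of sets of size k in the family
    for S in family:
        counts[len(S)] += 1
    return sum((Fraction(c, comb(n, k)) for k, c in enumerate(counts)), Fraction(0))


# A whole layer has measure 1, and the whole cube [2]^n has measure n + 1.
middle_layer = {frozenset(c) for c in combinations(range(1, 5), 2)}
whole_cube = {frozenset(c) for k in range(4) for c in combinations(range(1, 4), k)}
print(equal_slices_measure(middle_layer, 4), equal_slices_measure(whole_cube, 3))
```

In particular, any single full layer has measure exactly 1, which is the threshold that appears in the argument below.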
The equal-slices measure of [math]\displaystyle{ \mathcal{A} }[/math] is easily seen to be equal to the following quantity. Let [math]\displaystyle{ \pi }[/math] be a random permutation of [n], let [math]\displaystyle{ U_0,U_1,U_2,\dots,U_n }[/math] be the sets [math]\displaystyle{ \emptyset, \{\pi(1)\},\{\pi(1),\pi(2)\},\dots,[n], }[/math] and let [math]\displaystyle{ \mu(\mathcal{A}) }[/math] be the expected number of the sets [math]\displaystyle{ U_i }[/math] that belong to [math]\displaystyle{ \mathcal{A}. }[/math] The two quantities agree by linearity of expectation, because [math]\displaystyle{ U_k }[/math] is a uniformly random subset of [n] of size k and therefore belongs to [math]\displaystyle{ \mathcal{A} }[/math] with probability [math]\displaystyle{ \delta_k. }[/math]
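For small n the identity can be checked exactly by averaging over all n! permutations rather than by sampling. This is an illustrative sketch with function names of our own choosing; it is feasible only for n up to about 7 or 8.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def equal_slices_measure(family, n):
    """delta_0 + ... + delta_n for a family of frozenset subsets of {1, ..., n}."""
    counts = [0] * (n + 1)
    for S in family:
        counts[len(S)] += 1
    return sum((Fraction(c, comb(n, k)) for k, c in enumerate(counts)), Fraction(0))


def expected_chain_hits(family, n):
    """Exact average, over all n! permutations pi of [n], of the number of sets
    U_0 = {}, U_1 = {pi(1)}, ..., U_n = [n] that belong to family."""
    total = 0
    for pi in permutations(range(1, n + 1)):
        total += sum(frozenset(pi[:i]) in family for i in range(n + 1))
    return Fraction(total, factorial(n))


family = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({2, 3, 4})}
print(equal_slices_measure(family, 4), expected_chain_hits(family, 4))  # both 5/3
```

Here the four sets contribute 1, 1/4, 1/6 and 1/4 respectively, one term per layer, exactly as the linearity argument says.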
Therefore, if the equal-slices measure of [math]\displaystyle{ \mathcal{A} }[/math] is greater than 1, then the expected number of sets [math]\displaystyle{ U_i }[/math] in [math]\displaystyle{ \mathcal{A} }[/math] is greater than 1, so there must exist a permutation for which this number is at least 2, and that gives us a pair of sets in [math]\displaystyle{ \mathcal{A} }[/math] with one properly contained in the other.
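The existence argument can be turned into a (very slow) search: scan the chains one permutation at a time until one of them meets the collection twice. The sketch below uses a hypothetical helper `nested_pair_from_chains`, with sets encoded as frozensets; it may enumerate all n! chains, so it is only an illustration.

```python
from itertools import combinations, permutations


def nested_pair_from_chains(family, n):
    """Scan the maximal chains U_0 < U_1 < ... < U_n, one per permutation of [n],
    and return the first pair of chain members that both lie in family.

    Returns None if every chain meets family at most once; by the averaging
    argument this forces the equal-slices measure of family to be at most 1.
    """
    for pi in permutations(range(1, n + 1)):
        hits = [frozenset(pi[:i]) for i in range(n + 1) if frozenset(pi[:i]) in family]
        if len(hits) >= 2:
            return hits[0], hits[1]  # hits[0] is a proper subset of hits[1]
    return None


middle_layer = {frozenset(c) for c in combinations(range(1, 5), 2)}
print(nested_pair_from_chains(middle_layer, 4))  # None: a single layer has no nested pair
bigger = middle_layer | {frozenset({1})}  # equal-slices measure 1 + 1/4 > 1
print(nested_pair_from_chains(bigger, 4))
```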
To see that this implies Sperner's theorem, one just has to make the simple observation that a collection with equal-slices measure at most 1 must have cardinality at most [math]\displaystyle{ \binom n{\lfloor n/2\rfloor}. }[/math] (If n is odd, so that there are two middle layers, then it is not quite so obvious that to have an extremal collection you must pick one or other of the layers, but this is the case.) This stronger statement, that a collection in which no set is properly contained in another has equal-slices measure at most 1, is called the LYM inequality.
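In more detail, the observation uses only two facts: by definition, [math]\displaystyle{ \mathcal{A} }[/math] contains exactly [math]\displaystyle{ \delta_k\binom nk }[/math] sets of size k, and no binomial coefficient [math]\displaystyle{ \binom nk }[/math] exceeds [math]\displaystyle{ \binom n{\lfloor n/2\rfloor}. }[/math] Hence

[math]\displaystyle{ |\mathcal{A}| = \sum_{k=0}^n \delta_k\binom nk \le \binom n{\lfloor n/2\rfloor}\sum_{k=0}^n \delta_k \le \binom n{\lfloor n/2\rfloor}, }[/math]

where the last step is the assumption that the equal-slices measure [math]\displaystyle{ \delta_0+\dots+\delta_n }[/math] is at most 1.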
Multidimensional versions
Basic version
Sperner's theorem allows us to find a combinatorial line in any dense subset of [math]\displaystyle{ [2]^n. }[/math] What if we want to find a higher-dimensional subspace? Here we briefly sketch a proof that this can be done. We make no attempt to optimize any bounds.
The proof uses the argument for the one-dimensional version together with one extra ingredient, which is the following lemma.
Lemma. Let X be a set of size N and let [math]\displaystyle{ A_1,\dots,A_m }[/math] be subsets of X of average size [math]\displaystyle{ \delta N. }[/math] Then the average size of the intersections [math]\displaystyle{ A_i\cap A_j }[/math] with [math]\displaystyle{ i\ne j }[/math] is at least [math]\displaystyle{ (\delta^2-\delta/m)N. }[/math]
Proof. For each [math]\displaystyle{ x\in X }[/math] let f(x) be the number of i such that [math]\displaystyle{ x\in A_i. }[/math] Then the average of f(x) is [math]\displaystyle{ \delta m, }[/math] so, by the Cauchy-Schwarz inequality, the average of [math]\displaystyle{ f(x)^2 }[/math] is at least [math]\displaystyle{ \delta^2m^2. }[/math] But the sum of [math]\displaystyle{ f(x)^2 }[/math] over all x is the sum of [math]\displaystyle{ |A_i\cap A_j| }[/math] over all pairs (i,j), including the pairs with i=j. If we define [math]\displaystyle{ g(A_i,A_j) }[/math] to be [math]\displaystyle{ |A_i\cap A_j| }[/math] when [math]\displaystyle{ i\ne j }[/math] and 0 when [math]\displaystyle{ i=j, }[/math] then, since the pairs with i=j contribute [math]\displaystyle{ \sum_i|A_i|=\delta mN, }[/math] we find that the sum of all [math]\displaystyle{ g(A_i,A_j) }[/math] is at least [math]\displaystyle{ \delta^2m^2N-\delta mN. }[/math] Therefore the average of [math]\displaystyle{ g(A_i,A_j) }[/math] over all [math]\displaystyle{ m^2 }[/math] pairs is at least [math]\displaystyle{ (\delta^2-\delta/m)N, }[/math] and the average over the pairs with [math]\displaystyle{ i\ne j }[/math] is at least as large, because g vanishes on the pairs with i=j. This proves the lemma.
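The lemma is deterministic, but a quick randomized check is a useful sanity test of the inequality and of the bookkeeping in the proof. This sketch is our own: `lemma_sides` is a hypothetical helper that returns both sides exactly, as fractions, for any list of at least two subsets of range(N).

```python
import random
from fractions import Fraction


def lemma_sides(sets, N):
    """Both sides of the lemma for subsets A_1, ..., A_m of an N-element set (m >= 2).

    Returns (average of |A_i & A_j| over ordered pairs i != j,
             (delta**2 - delta/m) * N), with delta the average density.
    """
    m = len(sets)
    delta = Fraction(sum(len(s) for s in sets), m * N)
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    average = Fraction(sum(len(sets[i] & sets[j]) for i, j in pairs), len(pairs))
    return average, (delta ** 2 - delta / m) * N


# Random inputs are only test cases here; the inequality itself is deterministic.
rng = random.Random(0)
for _ in range(1000):
    N, m = rng.randint(1, 30), rng.randint(2, 10)
    p = rng.random()
    sets = [{x for x in range(N) if rng.random() < p} for _ in range(m)]
    average, bound = lemma_sides(sets, N)
    assert average >= bound

# Equality case: m pairwise disjoint sets of equal size.
print(lemma_sides([{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}], 10))
```

The disjoint example shows that the bound cannot be improved in general: there [math]\displaystyle{ \delta=1/m }[/math] and both sides are 0.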
Now let us suppose that [math]\displaystyle{ \mathcal{A} }[/math] is a subset of [math]\displaystyle{ [2]^n }[/math] of density [math]\displaystyle{ \delta, }[/math] and that we want to find a k-dimensional combinatorial subspace inside it, arguing by induction on k. Let n=p+q (where p and q are to be chosen at the stage where one actually tries to optimize what this argument gives) and think of [math]\displaystyle{ [2]^n }[/math] as [math]\displaystyle{ [2]^p\times[2]^q. }[/math] For each [math]\displaystyle{ y\in[2]^p }[/math] let [math]\displaystyle{ \mathcal{A}_y }[/math] be the set of [math]\displaystyle{ z\in[2]^q }[/math] such that [math]\displaystyle{ (y,z)\in\mathcal{A}. }[/math] Then, by averaging, the density of y such that [math]\displaystyle{ \mathcal{A}_y }[/math] has density at least [math]\displaystyle{ \delta/2 }[/math] is at least [math]\displaystyle{ \delta/2. }[/math] From this it is easy to prove, by an averaging argument like the one used for Sperner's theorem and provided p is large enough, that there is a permutation of [p] such that for at least [math]\displaystyle{ 8/\delta }[/math] of the sequences x corresponding to its initial segments the density of [math]\displaystyle{ \mathcal{A}_x }[/math] is at least [math]\displaystyle{ \delta/2. }[/math] Applying the lemma to these sets (with [math]\displaystyle{ m=8/\delta }[/math] and density at least [math]\displaystyle{ \delta/2, }[/math] it gives average intersection density at least [math]\displaystyle{ \delta^2/4-\delta^2/16=3\delta^2/16 }[/math]), we find two such sequences, x and x', say, such that [math]\displaystyle{ \mathcal{A}_x\cap\mathcal{A}_{x'} }[/math] has density at least [math]\displaystyle{ \delta^2/8. }[/math] Now we fix such a pair x and x' and use induction to say that [math]\displaystyle{ \mathcal{A}_x\cap\mathcal{A}_{x'} }[/math] contains a (k-1)-dimensional combinatorial subspace. Since x and x' correspond to nested initial segments, they form a combinatorial line in [math]\displaystyle{ [2]^p, }[/math] and the product of this line with that (k-1)-dimensional subspace is a k-dimensional combinatorial subspace contained in [math]\displaystyle{ \mathcal{A}. }[/math]
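The final step of the induction is a purely formal construction. In this hypothetical encoding (our own, not from the text), a combinatorial subspace of [2]^n is a string over '0', '1' and lowercase wildcard labels, where coordinates sharing a label take the same value; the helper `lift` prepends the line {x, x'}, with a fresh label, to a subspace template for [2]^q.

```python
from itertools import product
from string import ascii_lowercase


def points(template):
    """All points of the combinatorial subspace of [2]^n described by template.

    '0' and '1' are fixed coordinates; each lowercase letter is a wildcard
    label, and coordinates sharing a label always take the same value.
    """
    labels = sorted({c for c in template if c in ascii_lowercase})
    result = []
    for bits in product("01", repeat=len(labels)):
        value = dict(zip(labels, bits))
        result.append("".join(value.get(c, c) for c in template))
    return result


def lift(x, x_prime, sub):
    """Template for the product of the line {x, x'} in [2]^p with a subspace of [2]^q.

    Assumes x <= x' coordinatewise and x != x' (as for nested initial segments);
    the coordinates where they differ receive a label not already used in sub.
    """
    assert len(x) == len(x_prime) and x != x_prime
    assert all(a <= b for a, b in zip(x, x_prime))
    fresh = next(c for c in ascii_lowercase if c not in sub)
    return "".join(a if a == b else fresh for a, b in zip(x, x_prime)) + sub


x, x_prime = "100", "111"  # supports {1} and {1, 2, 3}: nested sets
sub = "a0a"  # a 1-dimensional subspace of [2]^3
t = lift(x, x_prime, sub)
print(t, points(t))
```

Every point of the lifted subspace has the form (x, z) or (x', z) with z a point of the smaller subspace, which is exactly why the smaller subspace lying in both fibres puts the lifted one inside the original set.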
Strong version
An alternative argument deduces the multidimensional Sperner theorem from the density Hales-Jewett theorem. Assuming for simplicity that k divides n, we can think of [math]\displaystyle{ [2]^n }[/math] as [math]\displaystyle{ [2^k]^{n/k} }[/math] by grouping the coordinates into consecutive blocks of size k. If we do so, apply DHJ(2^k) and translate back to [math]\displaystyle{ [2]^n, }[/math] then we find that we have produced a k-dimensional combinatorial subspace. This is obviously a much more sophisticated proof, since DHJ(2^k) is a very hard result, but it gives more information, since the wildcard sets turn out to have the same size. A sign that this strong version is genuinely strong is that it implies Szemerédi's theorem. For instance, suppose you take as your set [math]\displaystyle{ \mathcal{A} }[/math] the set of all sequences such that the number of 1s in even places plus twice the number of 1s in odd places belongs to some dense set in [math]\displaystyle{ [3n]. }[/math] If you have a 2D subspace in [math]\displaystyle{ \mathcal{A} }[/math] with both wildcard sets of size d, one wildcard set consisting of odd numbers and the other of even numbers (which this proof gives), then its four points show that your dense set of integers contains four integers of the form a, a+d, a+2d, a+3d, which form an arithmetic progression of length 4.
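The translation from [2^k]^{n/k} back to [2]^n, and the weighting used in the Szemerédi example, can be sketched as follows. Assumptions of this sketch (all our own): each block of k coordinates encodes an element of [2^k], identified with the integers 0 to 2^k - 1 and written in binary with the most significant bit first; wildcard labels are lowercase letters; and the weight counts one for each 1 in an even place and two for each 1 in an odd place. With k = 2 the first coordinate of every block is in an odd place and the second in an even place, so the two wildcard sets land in odd and even places respectively.

```python
from itertools import product
from string import ascii_lowercase


def expand_line(line, k):
    """Translate a combinatorial line in [2^k]^(n/k) into a subspace template for [2]^n.

    line has one entry per block of k coordinates: an int in range(2**k) for a
    fixed block (written in binary, most significant bit first) or '*' for a
    wildcard block. In every wildcard block the j-th coordinate gets the j-th
    label, so the k wildcard sets all have size (number of '*' entries).
    """
    labels = ascii_lowercase[:k]
    out = []
    for entry in line:
        out.extend(labels if entry == "*" else format(entry, f"0{k}b"))
    return "".join(out)


def points(template):
    """All points of a template: '0'/'1' fixed, equal letters take equal values."""
    labels = sorted({c for c in template if c in ascii_lowercase})
    result = []
    for bits in product("01", repeat=len(labels)):
        value = dict(zip(labels, bits))
        result.append("".join(value.get(c, c) for c in template))
    return result


def weight(point):
    """One for each 1 in an even place, two for each 1 in an odd place (places from 1)."""
    return sum(2 if i % 2 else 1 for i, c in enumerate(point, start=1) if c == "1")


# k = 2: label 'a' sits in odd places and label 'b' in even places, d = 3 times each.
t = expand_line([3, "*", 0, "*", "*", 1], 2)
print(t, sorted(weight(p) for p in points(t)))  # 11ab00abab01 [4, 7, 10, 13]
```

Setting every wildcard block to the same value v in the original line corresponds to giving label j the j-th bit of v, which is why the translated object is a k-dimensional subspace with equal-size wildcard sets. The four weights form a progression with common difference d = 3.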
Further remarks
The k=3 generalisation of the LYM inequality is the hyper-optimistic conjecture.
Sperner's theorem is also related to the Kruskal-Katona theorem.