Fourier-analytic proof of Sperner
Fourier analysis
We identify [math]\displaystyle{ [2]^n }[/math] with the power set [math]\displaystyle{ 2^{[n]} }[/math] of [n]. Given any function [math]\displaystyle{ f: 2^{[n]} \to {\Bbb R} }[/math], we define the Fourier transform [math]\displaystyle{ \hat f: 2^{[n]} \to {\Bbb R} }[/math] by the formula
- [math]\displaystyle{ \hat f(A) := {\Bbb E}_{X \in 2^{[n]}} (-1)^{|A \cap X|} f(X) }[/math];
we have the Plancherel formula
- [math]\displaystyle{ \sum_{A \in 2^{[n]}} |\hat f(A)|^2 = {\Bbb E}_{X \in 2^{[n]}} |f(X)|^2 }[/math] (1)
and the inversion formula
- [math]\displaystyle{ f(X) = \sum_{A \in 2^{[n]}} (-1)^{|A \cap X|} \hat f(A) }[/math]. (2)
For any scale [math]\displaystyle{ 1 \lt r \lt n }[/math], we consider the bilinear form
- [math]\displaystyle{ B_r( f, g ) := {\Bbb E} f(X) g(Y) }[/math] (3)
where X is drawn randomly from [math]\displaystyle{ 2^{[n]} }[/math], and Y is formed from X by setting [math]\displaystyle{ j \in Y }[/math] whenever [math]\displaystyle{ j \in X }[/math], and when [math]\displaystyle{ j \not \in X }[/math], setting [math]\displaystyle{ j \in Y }[/math] with probability [math]\displaystyle{ r/n }[/math]. To avoid significant distortion in the measure, we will be working in the regime [math]\displaystyle{ 1 \leq r \ll \sqrt{n} }[/math].
We also define the influence Inf(f) as
- [math]\displaystyle{ Inf(f) := {\Bbb E} |f(X) - f(X')|^2 }[/math] (4)
where X' is formed from X by randomly flipping one bit (from 0 to 1, or vice versa). In Fourier space, we have
- [math]\displaystyle{ Inf(f) = \sum_{A \in 2^{[n]}} \frac{|A|}{n} |\hat f(A)|^2 }[/math]. (5)
Dichotomy between structure and randomness
For any [math]\displaystyle{ f: 2^{[n]} \to [-1,1] }[/math], define the uniformity norm [math]\displaystyle{ \|f\|_{U(r)} }[/math] at scale r by the formula
- [math]\displaystyle{ \|f\|_{U(r)} := \sup_g \max( |B_r(f,g)|, |B_r(g,f)| ) }[/math] (6)
where g ranges over all functions [math]\displaystyle{ g: 2^{[n]} \to [-1,1] }[/math].
- Lemma 1 Let [math]\displaystyle{ f: 2^{[n]} \to [-1,1] }[/math] be such that [math]\displaystyle{ \|f\|_{U(r)} \geq \eta }[/math], where [math]\displaystyle{ 1 \leq r \ll \sqrt{n} }[/math] and [math]\displaystyle{ \eta \gt 0 }[/math]. Then there exists a function [math]\displaystyle{ F: 2^{[n]} \to [-1,1] }[/math] of influence [math]\displaystyle{ Inf(F) \ll_\eta \frac{1}{r} }[/math] such that
- [math]\displaystyle{ {\Bbb E} f F \gg_\eta 1 }[/math]
It seems that this is more or less proven in Ryan's notes. (May be worth redoing it here on the wiki. Note that the scale r here basically corresponds to [math]\displaystyle{ \epsilon n }[/math] in Ryan's notes.)
Structure theorem
Now we use the general "energy increment" argument (see e.g. Terry's paper on this) to obtain a structural theorem.
- Corollary 2 (Koopman-von Neumann type theorem) Let [math]\displaystyle{ f: 2^{[n]} \to [0,1] }[/math] and [math]\displaystyle{ 0 \lt \eta \lt 1 }[/math]. Then there exists [math]\displaystyle{ k = k(\eta) }[/math] such that whenever [math]\displaystyle{ \sqrt{n} \gg r_k \gt \ldots \gt r_1 \gt 1 }[/math] be a sequence of scales. Then there exists [math]\displaystyle{ 1 \leq i \leq k }[/math] and a decomposition
- [math]\displaystyle{ f = f_{U^\perp} + f_U }[/math] (7)
- where [math]\displaystyle{ f_{U^\perp}, f_U }[/math] take values in [-1,1],
- [math]\displaystyle{ Inf( f_{U^\perp} ) \ll_{\eta} 1/r_i }[/math] (8)
- and
- [math]\displaystyle{ \| f_U \|_{U(r_{i+1})} \leq \eta. }[/math] (9)
- Furthermore, [math]\displaystyle{ f_{U^\perp} }[/math] is non-negative and
- [math]\displaystyle{ {\Bbb E} f = {\Bbb E} f_{U^\perp}. }[/math] (10)
Proof (sketch) It's likely that there is now a slick proof of this type of thing using the duality arguments of Gowers or of Reingold-Trevisan-Tulsiani-Vadhan. But here is the "old school" approach from my long AP paper with Ben, running the energy increment algorithm:
- Initialise i=1, and let [math]\displaystyle{ {\mathcal B} }[/math] be the trivial partition of [math]\displaystyle{ [2]^n }[/math].
- Let [math]\displaystyle{ f_{U^\perp} := {\Bbb E}(f|{\mathcal B}) }[/math] be the conditional expectation of f with respect to the partition; initially, this is just the constant function [math]\displaystyle{ {\Bbb E} f }[/math], but becomes more complicated as the partition gets finer. Set [math]\displaystyle{ f_U := f - f_{U^\perp} }[/math]
- If [math]\displaystyle{ \| f_U \|_{U(r_{i+1})} \leq \eta }[/math] then STOP.
- Otherwise, we apply Lemma 1 to find a function F of influence [math]\displaystyle{ O_\eta(1/r_{i+1}) }[/math] that [math]\displaystyle{ f_U }[/math] correlates with. Partition F into level sets (discretising in multiples of [math]\displaystyle{ \eta }[/math] or so, randomly shifting the cut-points to avoid edge effects) and use this to augment the partition [math]\displaystyle{ {\mathcal B} }[/math]. In doing so, the energy [math]\displaystyle{ {\Bbb E} |f_{U^\perp}|^2 }[/math] increases by some non-negligible amount [math]\displaystyle{ c(\eta) \gt 0 }[/math], thanks to Pythagoras' theorem: this is why this process will stop before k steps.
- Now return to Step 1.
A certain tedious amount of verification is needed to show that all the hypotheses are satisfied. One technical step is that one needs to use the Weierstrass approximation theorem to approximate [math]\displaystyle{ f_{U^\perp} }[/math] by a polynomial combination of low-influence functions, and one needs to show that polynomial combinations of low-influence functions are low-influence. The complexity of the polynomial is controlled by some function of [math]\displaystyle{ \eta }[/math] and this explains the losses in that parameter in (8). [math]\displaystyle{ \Box }[/math]
DHJ(2)
Now we prove DHJ(2). Let [math]\displaystyle{ f = 1_A }[/math] have density [math]\displaystyle{ \delta }[/math]. We let [math]\displaystyle{ \eta }[/math] be a small number depending on [math]\displaystyle{ \delta }[/math]. Now we pick some scales [math]\displaystyle{ 1 \leq r_1 \lt \ldots \lt r_k }[/math] between 1 and [math]\displaystyle{ \sqrt{n} }[/math] (the exact choices are not so important, so long as the scales are widely separated), and apply the above corollary, to decompose [math]\displaystyle{ f = f_{U^\perp} + f_U }[/math], where [math]\displaystyle{ f_U }[/math] is uniform at some scale [math]\displaystyle{ r_i }[/math] and [math]\displaystyle{ f_{U^\perp} }[/math] is low-influence at a coarser scale [math]\displaystyle{ r_{i+1} }[/math].
We then look at [math]\displaystyle{ B_{r_i}(f, f) }[/math]. Decomposing into four pieces and using the uniformity of [math]\displaystyle{ f_U }[/math], this expression is equal to [math]\displaystyle{ B_{r_i}(f_{U^\perp}, f_{U^\perp}) + O(\eta) }[/math]. But the low influence of [math]\displaystyle{ f_{U^\perp} }[/math] means that when one randomly flips a bit from a 0 to a 1, [math]\displaystyle{ f_{U^\perp} }[/math] changes by [math]\displaystyle{ O_\eta(1/r_{i+1}) }[/math] on the average. Iterating this [math]\displaystyle{ r_i }[/math] times, we see that [math]\displaystyle{ B_{r_i}(f_{U^\perp}, f_{U^\perp}) = {\Bbb E}( f_{U^\perp}^2 ) + O_\eta( r_i / r_{i+1} ) }[/math]. But [math]\displaystyle{ f_{U^\perp} }[/math] has density [math]\displaystyle{ \delta,/math\gt and so the main term is at least \lt math\gt \delta^2 }[/math]. Choosing [math]\displaystyle{ \eta }[/math] small enough and the [math]\displaystyle{ r_i }[/math] widely separated enough we obtain a nontrivial lower bound on [math]\displaystyle{ B_{r_i}(f_{U^\perp}, f_{U^\perp}) }[/math], which gives DHJ(2).