A Hilbert space lemma
Motivation
The Khintchine recurrence theorem has a Hilbert space formulation:
- Theorem If [math]\displaystyle{ U: H \to H }[/math] is a unitary operator, and 1, f are elements of H of norm at most 1 with [math]\displaystyle{ U^* 1 = 1 }[/math] and [math]\displaystyle{ \langle f, 1 \rangle = \delta }[/math], then there exist arbitrarily large n with [math]\displaystyle{ \langle f, U^n f \rangle \geq \delta^2 - o(1) }[/math].
Proof The von Neumann ergodic theorem, that asserts that [math]\displaystyle{ \frac{1}{N} \sum_{n=1}^N U^n f }[/math] converges strongly to Pf for some orthogonal projection P, and any f. In particular, [math]\displaystyle{ \langle f, U^n f \rangle }[/math] converges in the Cesaro sense to [math]\displaystyle{ \langle f, Pf \rangle = \|Pf\|^2 }[/math]. On the other hand, as [math]\displaystyle{ U^* 1 = 1 }[/math], we have [math]\displaystyle{ P^* 1 = 1 }[/math], and hence [math]\displaystyle{ \langle 1, Pf \rangle = \langle 1, f \rangle = \delta }[/math]. By Cauchy-Schwarz we thus have [math]\displaystyle{ \|Pf\| \geq \delta }[/math], and the claim follows. [math]\displaystyle{ \Box }[/math]
The theorem implies in particular that if A is a subset of [N] of density at least [math]\displaystyle{ \delta }[/math], then there are many n for which [math]\displaystyle{ A \cap (A-n) }[/math] has density at least [math]\displaystyle{ \delta^2 - o(1) }[/math].
The argument relies on the ability to iterate a single operator U many times. In the situation relevant to DHJ, U corresponds to an operation such as flipping a 0 to a 1, and we cannot iterate any single operator more than once, let alone n times. Instead, what one has is an "IP system" of operators [math]\displaystyle{ U_1, U_2, \ldots }[/math], each of which can be used exactly once (think of [math]\displaystyle{ U_i }[/math] as the operation of flipping the "i^th digit" from 0 to 1). We are then interested in lower bounds on expressions such as [math]\displaystyle{ \langle f, U_{i_n} \ldots U_{i_1} f \rangle }[/math] for some [math]\displaystyle{ 0 \lt i_1 \lt \ldots \lt i_n }[/math]. In view of the above proof, it is natural to seek some sort of ergodic theorem that makes [math]\displaystyle{ U_{i_n} \ldots U_{i_1} }[/math] converge in some weak sense to an orthogonal projection.
Preliminary lemma
A preliminary lemma in this direction is as follows.
- Lemma Let H be a Hilbert space with nested subspaces
- [math]\displaystyle{ V_0 \subset V_1 \subset \ldots \subset V_m \subset H }[/math].
- Suppose [math]\displaystyle{ V_0 }[/math] contains two elements 1, f.
- For each [math]\displaystyle{ 1 \leq i \leq m }[/math], let [math]\displaystyle{ U_i: H \to H }[/math] be a bounded linear operator. We assume the following axioms:
- (Boundedness) [math]\displaystyle{ 1, f }[/math] have norm at most 1.
- (Positive density) [math]\displaystyle{ \langle 1, f \rangle \geq \delta \gt 0 }[/math].
- (Measure preservation) [math]\displaystyle{ U_i^* 1 = 1 }[/math] for all i.
- (Substationarity) Each [math]\displaystyle{ U_i }[/math] has operator norm at most [math]\displaystyle{ 1+O(\varepsilon) }[/math].
- (Measurability) For each i, [math]\displaystyle{ U_i }[/math] maps [math]\displaystyle{ V_{i-1} }[/math] to [math]\displaystyle{ V_i }[/math].
- (Stability) For each [math]\displaystyle{ 0 \leq i \lt m-1 }[/math], the operators [math]\displaystyle{ \Pi_{V_i} U_j \Pi_{V_i} }[/math] for [math]\displaystyle{ i \lt j \leq m }[/math] and [math]\displaystyle{ \Pi_{V_i} U_k U_j \Pi_{V_i} }[/math] for [math]\displaystyle{ i \lt j \lt k \leq m }[/math] differ from each other by at most [math]\displaystyle{ O(\varepsilon) }[/math] in the operator norm, where [math]\displaystyle{ \Pi_{V_i} }[/math] is the orthogonal projection onto [math]\displaystyle{ V_i }[/math].
- (Parameter sizes) [math]\displaystyle{ \varepsilon }[/math] is sufficiently small depending on [math]\displaystyle{ \delta }[/math], and m is sufficiently large depending on [math]\displaystyle{ \delta, \varepsilon }[/math].
- Then there exists [math]\displaystyle{ 1 \leq i \leq m }[/math] such that [math]\displaystyle{ \langle f, U_i f \rangle \gt \delta^2 - o(1) }[/math], where o(1) goes to zero as [math]\displaystyle{ \varepsilon \to 0 }[/math].
Sketch of infinitary proof
By a compactness argument it suffices to treat the case [math]\displaystyle{ \varepsilon=0, m = \infty }[/math]. We may then restrict H to the closure of [math]\displaystyle{ \bigcup_i V_i }[/math]. The stability hypothesis is then asserting that the [math]\displaystyle{ U_i }[/math] converge in the weak operator topology to some limit P, and that [math]\displaystyle{ U_k U_j }[/math] also converge in this topology. By computing the limit [math]\displaystyle{ \lim_{k \to \infty} \lim_{j \to \infty} \langle U_k U_j v, w \rangle }[/math] in two different ways, we conclude that [math]\displaystyle{ P^2 = P }[/math], i.e. P is idempotent. On the other hand, from substationarity we know that P has operator norm at most 1. The two facts are only compatible when P is an orthonormal projection. It now suffices to show that [math]\displaystyle{ \langle f, Pf \rangle \geq \delta^2, }[/math]. But the left-hand side is [math]\displaystyle{ \|Pf\|^2 }[/math]; meanwhile, positive density and measure preservation give us [math]\displaystyle{ P^* 1 = 1 }[/math] and thus [math]\displaystyle{ \langle Pf, 1 \rangle \geq \delta }[/math], and the claim follows from Cauchy-Schwarz.
Finitary proof
By the stability hypothesis, we have an operator [math]\displaystyle{ P_i = \Pi_{V_i} P_i \Pi_{V_i} }[/math] such that
- [math]\displaystyle{ \Pi_{V_i} U_j \Pi_{V_i} = P_i + o(1) }[/math] (1)
for all [math]\displaystyle{ i \lt j \leq m }[/math] and
- [math]\displaystyle{ \Pi_{V_i} U_k U_j \Pi_{V_i} = P_i + o(1) }[/math] (2)
for all [math]\displaystyle{ i \lt j \lt k \leq m }[/math], where o(1) denotes a quantity that goes to zero as [math]\displaystyle{ \varepsilon \to 0 }[/math] (in the operator norm for operators, or the Hilbert space norm for vectors). By the substationarity hypothesis, [math]\displaystyle{ P_i }[/math] has operator norm [math]\displaystyle{ 1+o(1) }[/math].
From (1) we see that [math]\displaystyle{ \| P_i f \|^2 }[/math] is monotone increasing in i up to errors of [math]\displaystyle{ o(1) }[/math]. By the pigeonhole principle, one can thus find [math]\displaystyle{ i_0 }[/math] such that
- [math]\displaystyle{ \| P_{i_0 + \lfloor 1/\varepsilon \rfloor} f\|^2 = \| P_{i_0} f \|^2 + o(1) }[/math]
which then implies that
- [math]\displaystyle{ P_i f = P_j f + o(1) }[/math] (3)
for [math]\displaystyle{ i_0 \leq i, j \leq i_0 + 1/\varepsilon }[/math]. A second application of the pigeonhole principle then locates [math]\displaystyle{ i_0 \leq i_1 \leq i_0 + 1/\varepsilon - 3 }[/math] such that
- [math]\displaystyle{ \| P_{i_1+3}^* P_{i_0} f\|^2 = \| P_{i_1}^* P_{i_0} f \|^2 + o(1) }[/math]
and thus
- [math]\displaystyle{ P_i^* P_{i_0} f = P_j^* P_{i_0} f + o(1) }[/math] (4)
for [math]\displaystyle{ i_1 \leq i,j \leq i_1+3 }[/math].
Now we compute the expression
- [math]\displaystyle{ \| P_{i_0} f \|^2 = \langle P_{i_0} f, P_{i_0} f \rangle. }[/math]
From (2), this is equal to
- [math]\displaystyle{ \langle U_{i_1+2} U_{i_1+1} f, P_{i_0} f \rangle + o(1). }[/math]
Applying (1), this is equal to
- [math]\displaystyle{ \langle P_{i_1+1} U_{i_1+1} f, P_{i_0} f \rangle + o(1). }[/math]
Moving the [math]\displaystyle{ P_{i_1+1} }[/math] temporarily to the other side of the inner product, applying (4), and then moving it back, this is equal to
- [math]\displaystyle{ \langle P_{i_1} U_{i_1+1} f, P_{i_0} f \rangle + o(1). }[/math]
which by (1) is equal to
- [math]\displaystyle{ \langle P_{i_1} P_{i_1} f, P_{i_0} f \rangle + o(1). }[/math]
Aplying (3), this is equal to
- [math]\displaystyle{ \langle P_{i_1} P_{i_0} f, P_{i_0} f \rangle + o(1). }[/math]
which is in turn equal to
- [math]\displaystyle{ \langle P_{i_0} f, P_{i_1}^* P_{i_0} f \rangle + o(1). }[/math]
On the other hand, by Cauchy-Schwarz and the boundedness of [math]\displaystyle{ P_{i_1} }[/math], this is bounded by [math]\displaystyle{ \|P_{i_0} f \|^2 + o(1) }[/math]. In order for Cauchy-Schwarz to be sharp, we are thus forced to conclude that
- [math]\displaystyle{ P_{i_1}^* P_{i_0} f = P_{i_0} f + o(1) }[/math]. (5)
Now, we look at the expression
- [math]\displaystyle{ \langle f, P_{i_0} f \rangle. }[/math]
By (5), this is
- [math]\displaystyle{ \langle P_{i_1} f, P_{i_0} f \rangle + o(1) }[/math]
which by (3) is equal to
- [math]\displaystyle{ \| P_{i_0} f \|^2 + o(1). }[/math]
On the other hand, from positive density and measure preservation we have
- [math]\displaystyle{ \langle U_{i_0+1} f, 1 \rangle \geq \delta }[/math]
and thus by (1)
- [math]\displaystyle{ \langle P_{i_0} f, 1 \rangle \geq \delta. }[/math]
By Cauchy-Schwarz we thus have
- [math]\displaystyle{ \| P_{i_0} f \|^2 \geq \delta^2 }[/math]
and the claim follows. [math]\displaystyle{ \Box }[/math]