Obtaining the correct bound in Roth's discrepancy theorem
\documentclass[12pt]{amsart} \title{Discrepancy theorems and representing diagonal matrices} \author{D. H. J. Polymath}
\usepackage{amsmath, wrapfig} \usepackage{dsfont, a4wide, amsthm, amssymb, amsfonts, graphicx} \usepackage{fancyhdr, xspace, psfrag, setspace, supertabular, color} %\usepackage{showkeys}
\newtheorem{theorem}{Theorem}[section] \newtheorem{proposition}[theorem]{Proposition} \newtheorem{lemma}[theorem]{Lemma} \newtheorem{claim}[theorem]{Claim} \newtheorem{corollary}[theorem]{Corollary} \newtheorem{conjecture}[theorem]{Conjecture} \newtheorem*{definition}{Definition} \newtheorem{problem}[theorem]{Problem} \newtheorem{example}[theorem]{Example} \newtheorem{question}[theorem]{Question} \newtheorem{remark}[theorem]{Remark}
%\setlength{\parindent}{0cm}
\onehalfspacing
\def\E{\mathbb{E}} \def\Z{\mathbb{Z}} \def\R{\mathbb{R}} \def\T{\mathbb{T}} \def\C{\mathbb{C}} \def\N{\mathbb{N}} \def\P{\mathbb{P}} \def\F{\mathbb{F}} \def\Q{\mathbb{Q}}
\def\A{\mathcal{A}}
\def\a{\alpha} \def\b{\beta} \def\g{\gamma} \def\d{\delta} \def\e{\epsilon}
\def\ra{\rightarrow} \def\seq#1#2{#1_1,\dots,#1_#2} \def\sm#1#2{\sum_{#1=1}^#2} \def\sp#1{\langle #1\rangle} \def\ol{\overline} \def\hf{\hat{f}}
\def\disc{\mathrm{disc}}
\begin{document}
\maketitle
\section{Introduction}
Let $X$ be a finite set and let $\A$ be a collection of subsets of $X$. For every red/blue colouring $\kappa$ of $X$, the $\A$-\textit{discrepancy} of $\kappa$ is defined to be the maximum over all $A\in\A$ of the difference between the number of red points in $A$ and the number of blue points in $A$.There are many important problems of the following kind: given a set-system $\A$, obtain bounds for the smallest $\A$-discrepancy of any red/blue colouring of the ground set $X$.
One of the best-known results of this kind is due to Roth. It deals with the case where $X$ is the set $\{1,2,\dots,n\}$ and $\A$ is the set of all subsets of $X$ that are arithmetic progressions. The smallest possible discrepancy was shown by Roth to be at least $cn^{1/4}$, and this bound was proved to be tight by Matousek and Spencer.
Later, Lov\'asz gave a different proof of Roth's lower bound using semidefinite programming. The purpose of this paper is to give a third proof, related to Lov\'asz's proof but not quite the same.
To explain the relationship, we need to say a little bit more about the semidefinite programming approach. So let us begin by stating a rather abstract criterion that gives a lower bound for discrepancy. The first step is to reformulate the problem analytically. A simple reformulation is to replace the red/blue colouring by a function $f$ from $X$ to $\{-1,1\}$. Then the quantity we are trying to minimize is \begin{equation*} \disc_\A(f)=\max_{A\in\A}|\sum_{x\in A}f(x)|. \end{equation*} Once we have reformulated the problem in this way, it is natural to seek a lower bound that is valid for all functions $f$, and not just $\pm 1$-valued functions. Of course, the lower bound will have to depend on $f$, so a natural idea is to try to prove a bound of the form $\disc_\A(f)\geq\|f\|$ for some suitably chosen norm. Since eventually we would like a uniform bound for all $\pm 1$-valued functions, it is also natural if all such functions have the same norm.
It is also natural to try to prove the result by means of some kind of averaging argument, and to use a Euclidean norm for the lower bound. Moreover, one would like to replace the modulus by a modulus squared. With all these thoughts in mind, it is natural to try to find non-negative coefficients $\lambda_A$ and weights $b(x)$ and to seek a bound of the form \begin{equation*} \sum_{A\in\A}\lambda_A|\sum_{x\in A}f(x)|^2\geq\sum_{x\in X}b(x)|f(x)|^2. \end{equation*} If we have such a bound and $f$ is $\pm 1$-valued, then \begin{equation*} \sum_{A\in\A}\lambda_A|\sum_{x\in A}f(x)|^2\geq \sum_{x\in X}b(x), \end{equation*} which implies by averaging that $\disc_\A(f)^2$ is at least $\sum_{x\in X}b(x)/\sum_{A\in\A}\lambda_A$.
The basic idea behind the semidefinite programming method is to observe that \begin{equation*} Q_\lambda:f\mapsto\sum_{A\in\A}\lambda_A|\sum_{x\in A}f(x)|^2 \end{equation*} is a positive semidefinite quadratic form on $\R^X$, and that the desired inequality will be true if the quadratic form \begin{equation*} Q_{\lambda,b}:f\mapsto\sum_{A\in\A}\lambda_A|\sum_{x\in A}f(x)|^2-\sum_{x\in X}b(x)|f(x)|^2 \end{equation*} is also positive semidefinite. So for a given choice of coefficients $\lambda_A$ our problem is reduced to a \textit{semidefinite programming problem}, or SDP for short: that is, we would like to maximize the sum $\sum_xb(x)$ subject to the positive semidefiniteness of the quadratic form $Q_{\lambda,b}$. A great deal is known about SDPs. In particular, there are efficient algorithms for solving them, a fact that has many ramifications in extremal combinatorics and theoretical computer science.
It is straightforward to show that if $Q_{\lambda,b}$ is positive semidefinite, then we can extend the discrepancy result to vector-valued functions. More precisely, if each $f(x)$ is a vector in some Euclidean space $H$, then we have the lower bound \begin{equation*} \sum_{A\in\A}\lambda_A\|\sum_{x\in A}f(x)\|^2\geq\sum_{x\in X}b(x)\|f(x)\|^2. \end{equation*} In particular, it follows that if the $f(x)$ are unit vectors, then $\disc_\A(f)$ (now defined to be the maximum of $\|\sum_{x\in A}f(x)\|$ over all sets $A\in\A$) is again at least the square root of $\sum_{x\in X}b(x)/\sum_{A\in\A}\lambda_A$. Interestingly, the converse is true as well: if $\disc_\A(f)$ is at least $t$ whenever each $f(x)$ is a unit vector in some Hilbert space $H$, then we can find coefficients $\lambda_A$ and $b(x)$ such that $\sum_{x\in X}b(x)/\sum_{A\in\A}\lambda_A=t^2$ and the quadratic form $Q_{\lambda,b}$ is positive semidefinite.
The approach taken in this paper is different, but it is also based on proving an inequality of the form \begin{equation*} \sum_{A\in\A}\lambda_A|\sum_{x\in A}f(x)|^2\geq\sum_{x\in X}b(x)|f(x)|^2. \end{equation*} This we do as follows. For each set $A\in\A$, let us also write $A$ for its characteristic function. That is, $A(x)=1$ if $x\in A$ and 0 otherwise. If $u,v:X\rightarrow\C$, then we write $u\otimes v$ for the matrix with $xy$ entry equal to $u(x)v(y)$. Suppose we can find coefficients $\lambda_{AB}$ such that $\sum_{A,B\in\A}\lambda_{AB}A\otimes B=D$, where $D$ is a diagonal matrix with $D(x,x)=b(x)$ for every $x\in X$. Then by averaging \begin{equation*} \sum_xb(x)|f(x)|^2=\sp{Df,f}=\sum_{A,B\in\A}\lambda_{AB}\sum_{x\in A}f(x)\sum_{y\in B}f(y). \end{equation*} Therefore, there exist sets $A,B\in\A$ such that \begin{equation*} |\sum_{x\in A}f(x)\sum_{y\in B}f(y)|\geq\frac{\sum_xb(x)}{\sum_{A,B\in\A}|\lambda_{AB}|}, \end{equation*} from which it follows that $\disc_\A(f)^2\geq\sum_xb(x)/\sum_{A,B\in\A}|\lambda_{AB}|.$
To convert this observation into a proof, one must find an efficient way of representing a diagonal matrix as a linear combination of rank-1 matrices $A\otimes B$, where $A$ and $B$ belong to the set $\A$. In this paper, we shall show how to do so when $X=\{1,2,\dots,n\}$ and $\A$ is the set of all subsets of $X$ that are arithmetic progressions.
We remark that it is possible to show that this approach works and gives a lower bound of $t$ if and only if the following stronger discrepancy-type statement holds. Let $f$ and $g$ be two functions from $X$ to a Euclidean space $H$, and suppose that $\sp{f(x),g(x)}=1$ for every $x\in X$. Then there exist $A,B\in\A$ such that $\sp{\sum_{x\in A}f(x),\sum_{y\in B}g(y)}\geq t^2$. Note also that if $g=f$ then we recover the statement that was equivalent to the success of the SDP approach, so this approach is less likely to work. However, if it works, then it gives a stronger result, and searching for an efficient representation of a diagonal matrix may be easier than proving that a certain matrix is positive semidefinite.
\section{Roth's discrepancy theorem}
Now let us specialize to the case that will interest us in this paper. That is, let $X=\{1,2,\dots,n\}$ and let $\A$ be the set of arithmetic subprogressions of $X$. We shall let $D$ be the identity matrix: that is, we shall take $b(x)=1$ for every $x$. Also, we shall identify $X$ with $\Z_N$.
For each $r\in\Z_N$ let $\omega_r$ be the function $\omega_r(x)=e^{2\pi irx/N}=\omega^{rx}$, where $\omega=e^{2\pi i/N}$. Our starting point is the following very simple lemma. We write $\E_r$ to stand for $N^{-1}\sum_r$.
\begin{lemma} $I_N=\E_r\omega_r\otimes\overline{\omega_r}$. \end{lemma}
\begin{proof} Let us evaluate the right-hand side at $(x,y)$. We obtain \begin{equation*} \E_r\omega^{rx}\omega^{-ry}=\E_r\omega^r(x-y)=\d_{xy}, \end{equation*} which proves the lemma. \end{proof}
That was not a completely sensible proof: it is straightforward to show that $I_n=\E_ru_r\otimes\overline{u_r}$ for any orthonormal basis $(u_r)$ (with respect to the inner product $\sp{f,g}=\E_xf(x)\overline{g(x)}$).
Next, we show that each function $\omega_r$ can be decomposed as a linear combination of arithmetic progressions with small common difference. For this we need a standard lemma. Here $\|\a\|$ denotes the distance from $\a$ to its nearest integer.
\begin{lemma} \label{dirichlet} Let $0<m\leq N$ and let $\a\in\R$. Then there exists $0<d\leq m$ such that $\|\a d\|\leq m^{-1}$. \end{lemma}
Let us now take some arbitrary $0<m\leq N$, which we will fix later.
For each $r$ let us fix a $d$ that satisfies the conclusion of the lemma with $\alpha=r/N$. From the lemma it follows that for every $r$ we can find $0<d\leq m$ such that $|1-\omega^{rd}|\leq 2\pi/m$. Let us now define $P$ to be the interval $[-m/4\pi, m/4\pi]$ (meaning the set of all residues in $\Z_N$ that lie in this interval when they are considered as real numbers). For each $0<d\leq m$ let $P_d$ be the mod-$N$ progression $dP$. Note that the diameter of $P_d$ is at most $m^2/2\pi$: this will be important to us later. Let $\pi_d$ be the characteristic measure of $P_d$. That is, $\pi_d(x)=N/|P_d|$ when $x\in P_d$ and 0 otherwise.
In the next lemma, we define the convolution of two functions $f$ and $g$ by the formula \begin{equation*} f*g(x)=\E_{y+z=x}f(y)g(z). \end{equation*}
\begin{lemma} \label{convapprox} Suppose that $|1-\omega^{rd}|\leq 2\pi/m$. Then $\omega_r*\pi_d=\lambda_{r,d}\omega_r$ for some real number $\lambda_{r,d}$ that lies between 1/2 and 1. \end{lemma}
\begin{proof} Let us evaluate the left-hand side at $x$. We have \begin{eqnarray*} \omega_r*\pi_d(x) &=& \E_y\omega^{r(x-y)}\pi_d(y)\\ &=&\omega^{rx}\E_{y\in P_d}\omega^{-ry}.\\ \end{eqnarray*} Since $P_d$ is symmetric, $\lambda_{r,d}=\E_{y\in P_d}\omega^{-ry}$ is real. Note also that for every $y\in P_d$ we have $y=sd$ for some $s$ of modulus at most $m/4\pi$, from which it follows that $|1-\omega^{-ry}|\leq(m/4\pi)|1-\omega^{-rd}|\leq 1/2$. Therefore, $\lambda_{r,d}\geq 1/2$ as claimed. It is trivial that $\lambda_{r,d}\leq 1$ since $\|\pi_d\|_1=1$. \end{proof}
Writing $\mu_{r,d}$ for $\lambda_{r,d}^{-1}$, we may rewrite $\omega_r$ as $\mu_{r,d}\omega_r*\pi_d$. Now $\omega_r*\pi_d(x)$ can also be written as $\E_y\omega^{ry}\pi_d(x-y)$. Writing $\pi_{d,y}(x)$ for $\pi_d(x-y)$, we can write this as $\E_y\omega^{ry}\pi_{d,y}(x)$. That is, $\omega_r*\pi_d=\E_y\omega^{ry}\pi_{d,y}$, so $\omega_r=\mu_{r,d}\E_y\omega^{ry}\pi_{d,y}$. Note that $\pi_{d,y}$ is the characteristic measure of the arithmetic progression $P_{d,y}=P_d+y$, so we can also write this as $\mu_{r,d}(N/|P|)\E_y\omega^{ry}P_{d,y}$.
Since, $\omega_r=\mu_{r,d}(N/|P|)\E_y\omega^{ry}P_{d,y}$ for each $r$, and $I_n=\E_r\omega_r\otimes\ol{\omega_r}$, this gives us the decomposition \begin{equation*} I_n=\frac{N^2}{|P|^2}\E_r\mu_{r,d}^2\E_{y,z}\omega^{r(y-z)}P_{d,y}\otimes P_{d,z}. \end{equation*} Our next task is to show that this decomposition is efficient.
To do this, let us fix $d$ and work out the sum over $y$ and $z$ of the absolute values of the coefficients of $P_{d,y}\otimes P_{d,z}$ in the decomposition. Recall that $d=d(r)$: for each $d$ with $0<d\leq m$ let us write $R(d)$ for the set of $r$ such that $d(r)=d$. Since always choose $d$ such that $\|dr/N\|\leq m^{-1}$, it follows that $|R(d)|\leq 3N/m$ for every $d$.
First we rewrite the decomposition itself as \begin{equation*} I_n=\frac{1}{N|P|^2}\sum_r\mu_{r,d(r)}^2\sum_{y,z}\omega^{r(y-z)}P_{d(r),y}\otimes P_{d(r),z}. \end{equation*} Now it becomes easier to see that the coefficient of $P_{d,y}\otimes P_{d,z}$ is \begin{equation*} \frac{1}{N|P|^2}\sum_{r\in R(d)}\mu_{r,d}^2\omega^{r(y-z)}, \end{equation*} so the sum in question is \begin{equation*} \frac{N}{|P|^2}\E_{y,z}|\sum_{r\in R(d)}\mu_{r,d}^2\omega^{r(y-z)}|, \end{equation*}
Let $\nu(r)=\mu_{r,d}^2$ if $r\in R(d)$ and $0$ otherwise. Then this expression is equal to \begin{equation*} \frac{N^2}{|P|^2}\E_{y,z}|\E_r\nu(r)\omega^{r(y-z)}|=\frac{N^2}{|P|^2}\E_y|\E_r\nu(r)\omega^{ry}|. \end{equation*} Now \begin{eqnarray*} (\E_y|\E_r\nu(r)\omega^{ry}|)^2 &=& N^{-2}(\sum_y|\E_r\nu(r)\omega^{ry}|)^2\\ &=& N^{-2}\|\hat{\nu}\|_1^2\\ &\leq& N^{-1}\|\hat{\nu}\|_2^2\\ &=& N^{-1}\|\nu\|_2^2\\ &\leq& 4N^{-2}|R(d)|.\\ \end{eqnarray*}
Therefore, since $|P|\geq m/16$ and $|R(d)|\leq 3N/m$, the coefficients have absolute values summing to at most $256(N/m)^2.2N^{-1/2}m^{-1/2}=512N^{3/2}m^{-5/2}$. Since the set of possible $d$ has size $m$, the sum of the absolute values of \textit{all} coefficients is at most $512(N/m)^{3/2}$.
This gives us a discrepancy bound of $2^{-9/2}m^{3/4}N^{-1/4}$ if we take $\A$ to be the set of all arithmetic progressions of the form $P_{d,y}$. However, although these are arithmetic progressions in the $\Z_N$ sense, they do not necessarily become arithmetic progressions when we identify $\Z_N$ with $\{1,2,\dots,N\}$.
This is where the diameter estimate comes in. Since the diameter of $P_d$ is at most $m^2/2\pi$, we know that each $P_{d,y}$ can wrap around at most once, provided that $m\leq\sqrt{N}$. If it wraps around once, then we can split it into two genuine arithmetic progressions, and at least half the discrepancy must occur on one of them. Thus, by taking $m=\sqrt{N}$ we obtain a bound of size $cN^{3/8}N^{-1/4}=cN^{1/8}$.
\section{Improving the bound}
Our next aim is to get from a bound of $cN^{1/8}$ to a bound of $cN^{1/4}$. We begin by giving the basic thought behind the improvement.
In the argument above, we bounded $\|\hat{\nu}\|_1$ above by $N^{1/2}\|\hat{\nu}\|_2$. Now this bound is sharp only if $\hat{\nu}$ is evenly spread throughout $\Z_N$, but that is not the case. Speaking rather roughly, $\nu$ is a bit like the characteristic function of $R(d)$, which is a bit like an arithmetic progression of length $N/m$, which means that $\hat{\nu}$ should be a bit like an arithmetic progression of length $m$, which means that $\|\hat{\nu}\|_1/\|\hat{\nu}\|_2$ should be more like $m^{1/2}$ than $N^{1/2}$. So if we can make these thoughts precise, then it is reasonable to aspire to an improvement by a factor $(m/N)^{1/2}$ to the sum of the coefficients in the decomposition, which would improve the discrepancy bound by a factor of $(N/m)^{1/4}$, taking it from $cm^{3/4}N^{-1/4}$ to $cm^{1/2}$. Encouragingly, this is exactly the Roth bound when $m$ is at its maximum of $c\sqrt{n}$, and it gives us at least some information for all smaller (but non-constant) values of $m$.
A standard trick that will enable us to avoid unwanted logarithmic factors is to work not with characteristic functions of arithmetic progressions but with similar functions that have absolutely summable Fourier coefficients. The following lemma gives us the main technical fact that we need.
\begin{lemma} \label{ell1} Let $m\leq N/2$ and let $f_{d,m}:\Z_N\rightarrow\C$ be defined by the formula $f_{d,m}(dx)=\max\{1-|x|/m,0\}$ for every $x$ with $|x|\leq N/2$. Then $\|\hat{f}_{d,m}\|_1=1$. \end{lemma}
\begin{proof} Let $P$ be the mod-$N$ arithmetic progression $\{d,2d,\dots,md\}$, and also the characteristic function of that arithmetic progression, and similarly for $-P$. Then \begin{equation*} P*(-P)(xd)=\E_yP(y)P(y-xd)=N^{-1}|P\cap(P+xd)|=N^{-1}\max\{m-|x|,0\}=N^{-1}mf_{d,m}(dx). \end{equation*} We also know that \begin{equation*} \|P*(-P)\|_1=\sum_r\hat{P}(r)\ol{\hat{P}(r)}=\|\hat{P}\|_2^2=\|P\|_2^2=N^{-1}m. \end{equation*} It follows that $\|\hat{f}_{d,m}\|_1=1,$ as claimed. \end{proof}
Note that $\hat{f}$ is a non-negative real-valued function, since $\hat{f}(r)=|\hat{P}(r)|^2$ for every $r$. This fact will also be useful to us.
The next lemma is a variant of Lemma \ref{convapprox}.
\begin{lemma} \label{convapprox2} Let $d,r\in\Z_N$. Then $\omega_r*\pi_d*\pi_d=\mu_{r,d}\omega_r$ for some non-negative real number $\mu_{r,d}$. Moreover, if $|1-\omega^{rd}|\leq 2\pi/m,$ then $1/4\leq\mu_{r,d}\leq 1$. \end{lemma}
\begin{proof} The second part of the lemma follows directly from Lemma \ref{convapprox}, applied twice. In general, we know in Lemma \ref{convapprox} that $\lambda_{r,d}$ is real, so the positivity of $\mu_{r,d}$ follows from the fact that $\mu_{r,d}=\lambda_{r,d}^2$. \end{proof}
The next lemma is a simple consequence of Lemmas \ref{dirichlet} and \ref{convapprox2}. For the next few lemmas, all sums over $d$ will be from $d=1$ to $d=m$.
\begin{lemma}\label{decomposetrig} Let $s=\lceil N/m\rceil$, and for each $d$ with $0<d\leq m$ and for each $r\in\Z_N$ let $\a_{r,d}=f_{1,2s}(rd)$. Let $\omega_r$ be the function $x\mapsto\omega^{rx}$. Then there is a real number $\lambda_r\geq 1/32$ such that \begin{equation*} \lambda_r\omega_r\otimes\ol{\omega_r}=\sum_d\alpha_{r,d}(\omega_r*\pi_d*\pi_d)\otimes(\ol{\omega_r}*\pi_d*\pi_d). \end{equation*} \end{lemma}
\begin{proof} By Lemma \ref{dirichlet} there exists a positive $d\leq m$ such that $\|rd/N\|\leq m^{-1}$. It follows that $f_{1,2s}(rd)=\alpha_{r,d}\geq 1/2$.
For this $d$ we know that $|1-\omega^{rd}|\leq 2\pi/m$, so by Lemma \ref{convapprox2} we can conclude that \begin{equation*} \alpha_{r,d}(\omega_r*\pi_d*\pi_d)\otimes(\ol{\omega_r}*\pi_d*\pi_d) =\alpha_{r,d}\mu_{r,d}^2\omega_r\otimes\ol{\omega_r} \end{equation*} for some real constant $\mu_{r,d}\geq 1/4$. For all other $d$, we have the same conclusion, but this time all we can guarantee is that $\alpha_{r,d}\mu_{r,d}^2\geq 0$. Summing over all $d$ and setting $\lambda_r=\sum_d\alpha_{r,d}\mu_{r,d}^2$, we obtain the result stated. \end{proof}
We are now ready for the main estimate, after which the proof will be more or less finished.
\begin{lemma} \label{mainlemma} There exist coefficients $\lambda_r$, all at least 1/16, such that the matrix $\E_r\lambda_r\omega_r\otimes\ol{\omega_r}$ can be expressed as a linear combination \begin{equation*} \sum_{0<d\leq m}\sum_{x,y}\gamma_{d,x,y}P_{d,x}\otimes P_{d,y} \end{equation*} where each $P_{d,x}$ and $P_{d,y}$ is an arithmetic progression mod $N$ of diameter at most $m^2$, and $\sum_{d,x,y}|\gamma_{d,x,y}|\leq N/4m$. \end{lemma}
\begin{proof} Lemma \ref{decomposetrig} gives us the decomposition \begin{eqnarray*} \E_r\lambda_r\omega_r\otimes\ol{\omega_r}&=&\E_r\sum_d\alpha_{r,d}(\omega_r*\pi_d*\pi_d)\otimes(\ol{\omega_r}*\pi_d*\pi_d)\\ &=&\frac{N}{|P|^2}\sum_r\sum_d\alpha_{r,d}(\omega_r*\pi_d*P_d)\otimes(\ol{\omega_r}*\pi_d*P_d).\\ \end{eqnarray*} Let us write $Q_d$ for the function $\pi_d*P_d$. Now $\omega_r*Q_d=\E_x\omega^{rx}Q_{d,x}$, where $Q_{d,x}(y)=Q_d(y-x)=Q_d(x-y)$. So we can rewrite our decomposition as \begin{equation*} \frac{N}{|P|^2}\sum_r\sum_d\alpha_{r,d}\E_{x,y}\omega^{r(x-y)}Q_{d,x}\otimes Q_{d,y} =\frac{1}{N|P|^2}\sum_r\sum_d\alpha_{r,d}\sum_{x,y}\omega^{r(x-y)}Q_{d,x}\otimes Q_{d,y}. \end{equation*} The coefficient of $Q_{d,x}\otimes Q_{d,y}$ in this decomposition is \begin{equation*} \frac{1}{N|P|^2}\sum_r\alpha_{r,d}\omega^{r(x-y)}, \end{equation*} so the sum of the absolute values of these coefficients is \begin{eqnarray*} \frac{1}{N|P|^2}\sum_d\sum_{x,y}|\sum_r\alpha_{r,d}\omega^{r(x-y)}|&=& \frac{N}{|P|^2}\sum_d\E_{x,y}|\sum_r\alpha_{r,d}\omega^{r(x-y)}|\\ &=&\frac{N}{|P|^2}\sum_d\E_x|\sum_r\alpha_{r,d}\omega^{rx}|\\ &=&\frac{N}{|P|^2}\sum_d\sum_x|\E_r\alpha_{r,d}\omega^{rx}|\\ &=&\frac{N}{|P|^2}\sum_d\|\hat{\alpha}_d\|_1,\\ \end{eqnarray*} where we define $\alpha_d$ by the formula $\alpha_d(r)=\alpha_{r,d}$.
Now $\alpha_d(r)=f_{1,2s}(dr)=f_{d^{-1},2s}(r)$ for every $r$. Therefore, by Lemma \ref{ell1} we have that $\|\hat{\alpha}_d\|_1=1$ for every $d\leq m$. It follows that the sum of the absolute values of the coefficients is $Nm/|P|^2\leq N/4m$. \end{proof}
To complete the proof, we first observe that the matrix $\E_r\lambda_r\omega_r\otimes\ol{\omega_r}-I_N/16$ is positive semidefinite. That is for the simple reason that it is equal to $\E_r(\lambda_r-1/16)\omega_r\otimes\ol{\omega_r}$, and we have ensured that each $\lambda_r$ is at least $1/16$. Therefore, by Lemma \ref{mainlemma}, given any function $f:\Z_N\rightarrow\C$, we have some pair $(Q_{d,x},Q_{d,y})$ such that $|\sum_zQ_{d,x}(z)f(z)\sum_wQ_{d,y}(w)\ol{f(w)}|\geq m\|f\|_2^2/4$. (This number $m/4$ is the ratio of the trace $N/16$ of $I_N/16$ to the upper bound $N/4m$ for the sum of the coefficients in the decomposition.) It follows that we can find $x$ such that $|\sum_zQ_{d,x}(z)f(z)|\geq m^{1/2}\|f\|_2/2$. Since $Q_{d,x}$ is an average of characteristic functions of translates of the arithmetic progression $P_d$, it follows that we can find some $x$ such that $|\sum_{z\in P_d+x}f(z)|\geq m^{1/2}\|f\|_2/2$. And finally, since $P_d$ has diameter at most $N$ (assuming that $m\leq c\sqrt{N}$), it follows that we can find an arithmetic progression of length $m$ and common difference at most $m$ on which $f$ has discrepancy at least $m^{1/2}\|f\|_2/4$. In particular, if $f$ is $\pm 1$-valued, then we obtain a lower bound for the AP-discrepancy of $m^{1/2}/4$. Once again, this argument is valid when $m\leq c\sqrt{N}$, so we obtain Roth's bound of $cN^{1/4}$.
\end{document}