# Prime counting function

One way to find primes is to find a polynomial time algorithm to compute [math]\pi(x)[/math], the number of primes less than x, to reasonable accuracy. For example, if we can find [math]\pi(x)[/math] to better than [math]10^{k/2}[/math] accuracy for k-digit x, we can break the square root barrier. We don't necessarily have to do this for all x; just having a targeted x for which we can show [math]\pi(x+y)-\pi(x) \gt 0[/math] for some [math]x \sim 10^k[/math] and [math]y=o(10^{k/2})[/math] would suffice.

Now, perhaps instead of trying to prove that intervals like [math][x, x + (\log x)^A][/math] contain primes unconditionally, we should first try to be much less ambitious and aim to show that *some* interval [math][y, y + \sqrt{x}][/math] with [math]y \in [x,2x][/math], contains a prime number that we can discover computationally.

How? Well, let’s start by assuming that we can computationally locate all the [math]O( \sqrt{x} \log x)[/math] zeros in the critical strip up to height [math]\sqrt{x}[/math]. Then what we can do is some kind of “binary search” to locate an interval [math][y, y + \sqrt{x}][/math] containing loads of primes: say at the [math]i[/math]th step in the iteration we have that some interval [math][u,v][/math] has loads of primes. Then, using the explicit formula for [math]\psi(z) = \sum_{n \leq z} \Lambda_0(n) = z - \sum_{\rho : \zeta(\rho)=0} z^\rho/\rho[/math] (actually, the usual quantitative form of the identity), we can decide which of the two intervals [math][u, (u+v)/2][/math] or [math][(u+v)/2, v][/math] contains loads of primes (maybe both do — if so, pick either one for the next step in the iteration). We keep iterating like this until we have an interval of width around [math]x^{1/2 + \epsilon}[/math] or so that we know contains primes.

Ok, but how do you locate the zeros of the zeta function up to height [math]\sqrt{x}[/math]? Well, I’m not sure, but maybe one can try something like the following: if we can understand the value of [math]\zeta(s)[/math] for enough well-spaced, but close together, points up the half-line, then by taking local interpolations with these points, we can locate the zeros to good precision. And then to evaluate [math]\zeta(s)[/math] at these well-spaced points, maybe we can use a Dirichlet polynomial approximations, and then perhaps apply some variant of Fast Fourier Transforms (if this is even possible with Dirichlet polynomials, which are not really polynomials) to evaluate them at lots of values [math]s = 1/2 + \delta[/math] quickly — perhaps FFTs can speed things up enough so that the whole process doesn’t take more than, say, [math]10^{k/2}[/math] polylog(k) bit operations. Keep in mind also that our Dirichlet polynomial approximation only needs to hold “on average” once we are sufficiently high up the half-line, so it seems quite plausible that this could work. Note that for [math]s[/math] near to 1/2 we would need to be more careful, and get the sharpest approximation we can, because those terms contribute more in the explicit formula.

## Computing the parity of [math]\pi(x)[/math]

Interestingly, there is an elementary way to compute the parity of [math]\pi(x)[/math] in [math]x^{1/2+o(1)}[/math] time. The observation is that for square-free n, the divisor function [math]\tau(n)[/math] (the number of divisors of n) is equal to 2 mod 4 if n is prime, and is divisible by 4 otherwise. This gives the identity

- [math] 2 \pi(x) = \sum_{n\ltx} \tau(n) \mu(n)^2 \hbox{ mod } 4.[/math]

Thus, to compute the parity of [math]\pi(x)[/math], it suffices to compute [math]\sum_{n\ltx} \tau(n) \mu(n)^2[/math].

But by Mobius inversion, one can express [math] \tau(n) \mu(n)^2 = \sum_{d^2|n} \mu(d) \tau(n)[/math] and so

- [math]\sum_{n\ltx} \tau(n) \mu(n)^2 = \sum_{d \lt x^{1/2}} \mu(d) \sum_{n\ltx: d^2 | n} \tau(n).[/math]

Since one can compute all the [math]\mu(d)[/math] for [math]d \lt x^{1/2}[/math] in [math]x^{1/2+o(1)}[/math] time, it would suffice to compute [math]\sum_{n\ltx: d^2 | n} \tau(n)[/math] in [math](x/d^2)^{1/2} x^{o(1)}[/math] time for each d. One can use the multiplicativity properties of [math]\tau[/math] to decompose this sum as a combination of [math]x^{o(1)}[/math] sums of the form [math] \sum_{n\lty} \tau(n)[/math] for various [math]y \leq x/d^2[/math], so it suffices to show that [math] \sum_{n\lty} \tau(n)= \sum_{a,b: ab \lt y} 1[/math] can be computed in [math]y^{1/2+o(1)}[/math] time. But this can be done by the Gauss hyperbola method, indeed

- [math] \sum_{a,b: ab \lt y} 1 = 2 \sum_{a \lt \sqrt{y}} \lfloor \frac{y}{a} \rfloor - \lfloor \sqrt{y} \rfloor^2.[/math]

The same method lets us compute [math]\pi(x)[/math] mod 3 efficiently provided one can compute [math] \sum_{n\ltx} \tau_2(n) = \sum_{a,b,c: abc \lt x} 1[/math] efficiently. Unfortunately, so far the best algorithm for this takes time [math]x^{2/3+o(1)}[/math]. If one can compute [math]\pi(x)[/math] mod q for every modulus q up to O(\log x), one can compute [math]\pi(x)[/math] by the Chinese remainder theorem.

Related to this approach, there is a nice identity of Linnik. Let [math]\Lambda(n)[/math] be the van Mangoldt function and [math]t_{j}(n)[/math] the number of representations of n as ordered products of integers greater than 1, then [math]\Lambda(n) = \ln(n) \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} t_{j}(n)[/math]. The sum is rather short since [math]t_{j}(n)=0[/math] for j larger than about \ln(n). Note that the function [math]t_{j}(n)[/math] is related to [math]\tau_{k}(n)[/math] by the relation [math]t_j(n) = \sum_{k=0}^{j}(-1)^{j-k} {j \choose k} \tau_{k}(n)[/math]. Again, [math]t_{2}(n)[/math] is computable in n^{1/2} steps, however [math]t_{j}(n)[/math], for larger j, appears more complicated. Curiously this is a fundamental ingredient in the Friedlander and Iwaniec work.

## Breaking the square root barrier

It is known that breaking the square root barrier for [math]\sum_{n \leq x} \tau(n)[/math] breaks the square root barrier for the parity of [math]\pi(x)[/math] also: specifically, if the former can be computed in time [math]x^{1/2-\epsilon+o(1)}[/math] for some [math]\epsilon \lt 1/4[/math], then the latter can be computed in time [math]x^{1/4+1/(4+16\epsilon/3) + o(1)}[/math]. Details are here.

Using Farey sequences, one can compute the former sum in time [math]x^{1/3+o(1)}[/math] (and hence the latter sum in time [math]x^{5/11+o(1)}[/math]:

The argument is similar to elementary proofs (such as the one waved at in the exercises to Chapter 3 of Vinogradov's Elements of Number Theory) of the fact that the number of points under the hyperbola equals [math](main term) + O(N^{1/3} (\log N)^2)[/math].

What we must do is compute [math]\sum_{n \lt= N^{ 1/2 } } \lfloor N/n \rfloor[/math] in time [math]O(N^{1/3} (\log N)^3)[/math].

**Lemma 1**Let [math] x[/math] be about [math]N^{\theta}[/math]. Assume that [math]N/x^2 = a/q + \eta/q^2, \hbox{gcd}(a,q)=1, \eta\lt=1/\sqrt{5}[/math]. Assume furthermore that

[math]q\lt=Q[/math], where [math]Q=N^{\theta-1/3}/10[/math]. Then the sum [math]\sum_{x\lt=n\ltx+q} \{N/n\}[/math] can be computed in time [math]O(\log x)[/math] with an error term [math]\lt 1/2[/math].

**Proof**

We can write

- [math]N/n = N/x - N/x^2 t + \eta_2 t^2 / N^{3\theta-1} = N/x - (a/q + \eta/q^2) t + \eta_2 t^2 / N^{3\theta-1}[/math],

where [math]n = x + t, 0\lt=t\ltq[/math] an integer, [math]|\eta|\lt=1/\sqrt{5}[/math] and [math]|\eta_2|\lt=1fs[/math] independent of n. Since [math]q\lt=Q[/math] and [math]Q=N^{\theta-1/3}/10[/math], we have [math]\eta t^2 / N^{3\theta-1} \lt 1/1000q[/math]. We also have [math]\eta/q^2 t \lt= 1/\sqrt{5} q[/math]. Thus, [math]\eta t^2 / N^{3\theta-1} + \eta/q^2 t \lt 1/2 q[/math]. It follows that

- [math]|\{N/n\} - \{N/x - at/q\}| \lt 1/(2 q)[/math]

except when [math]\{N/x-at/q\} 1 - 1/(2 q)[/math]. That exception can happen for only one value of [math]t=0 \ldots q-1[/math] (namely, when [math]at[/math] is congruent mod q to the integer closest to [math]\{N/x\} q[/math]) and we can easily find that [math]t[/math] (and isolate it and compute its term exactly) in time [math]O(\log n)[/math] by taking the inverse of [math]a \mod q[/math].

Thus, we get the sum [math]\sum_{x\lt=n\ltx+q} \{N/n\}[/math] in time [math]O(\log n)[/math] with an error term less than [math](1/2 q) * q = 1/2[/math] once we know the sum

- [math]\sum_{0\lt=t\ltq} \{N/x - at/q\}[/math]

exactly. But this sum is equal to [math]\sum_{0\lt=r\ltq} \{r/q + \epsilon/q\}[/math], where [math]\epsilon := \{qN/x\}[/math], and that sum is simply [math](q-1)/2 + \epsilon[/math]. Thus, we have computed the sum [math]\sum_{x\lt=n\ltx+q} \{N/n\}[/math] in time [math]O(\log n)[/math]. QED

Now we show why the lemma is enough for attaining our goal (namely, computing [math]\sum_{n\leq \sqrt{N}} \lbrack N/n\rbrack[/math] with **no** error term). We know that

- [math]\sum_{x\lt=n\lt=x+q} \lbrack N/n \rbrack = \sum_{x\lt=n\ltx+q} N/n - \sum_{x\lt=n\ltx+q} \{N/n\} = N*(\log(x+q)-\log(x)) - \sum_{x\lt=n\ltx+q} \{N/n\}.[/math]

We also know that [math]\sum_{x\lt=n\lt=x+q} \lbrack N/n \rbrack[/math] is an integer. Thus, it is enough to compute [math]\sum_{x\lt=n\ltx+q} \{N/n\}[/math] with an error term <1/2 in order to compute [math]\sum_{x\lt=n\ltx+q} \lbrack N/n\rbrack[/math] exactly.

We now partition the range [math]n=\{N^\theta,\ldots,2 N^\theta\}, 1/3\lt=\theta\lt=1/2[/math], into intervals of the form [math]x\lt=n\ltx+q[/math], where q is the denominator of a good approximation to [math]N/x^2[/math], that is to say, an approximation of the form [math]a/q, \hbox{gcd}(a,q)=1, q\lt=Q[/math] with an error term [math]\lt= 1/\sqrt{5} q^2[/math]. Such good approximations are provided to us by Hurwitz's approximation theorem. Moreover, it shouldn't be hard to show that, as x varies, the q's will be fairly evenly distributed in [1,Q]. (Since Hurwitz's approximation is either one of the ends of the interval containing [math]N/x^2[/math] in the Farey series with upper bound Q/2 or the new Farey fraction produced within that interval, it is enough to show that Dirichlet's more familiar approximations have fairly evenly distributed denominators.) This means that 1/q should be about [math](\log Q)/Q[/math] in the average.

Thus, the number of intervals of the form [math]x\lt=n\ltx+q[/math] into which [math]\{N^\theta,\ldots ,2 N^\theta\}[/math] has been partitioned should be about [math](\log Q) N / Q[/math]. Since the contribution of each interval to the sum [math]\sum_{N^\theta\lt=n\lt=2N^\theta} \lfloor N/n\rfloor[/math] can (by Lemma 1 and the paragraph after its proof) be computed exactly in time [math]O(\log x)[/math], we can compute the entire sum in [math]\sum_{N^\theta n\lt= 2 N^\theta} \lfloor N/n\rfloor[/math] in time [math]O((\log x) (\log Q) N^\theta/Q) = O((\log N)^2 N^{1/3})[/math].

(There are bits of the sum (at the end and the beginning) that belong to two truncated intervals, but those can be computed in time [math]O(Q) \ll O(N^{1/6})[/math].)

We partition [math]\{1,2,\ldots ,\sqrt{N}\}[/math] into [math]O(\log N)[/math] intervals of the form [math]\{N^\theta,\ldots ,2 N^\theta\}[/math], and obtain a total running time of [math]O((\log N)^3 N^{1/3})[/math], as claimed.