Iterated sumsets of log-primes

Let K be the set of logarithms of all primes in some interval [u,u+v] (it seems reasonable to take v comparable to u). If one can show that some iterated sumset of K intersects the logarithm of an interval [n,n+a], then (with a factoring algorithm) one would be able to find a prime in [u,u+v] in an amount a of work. This becomes non-trivial as soon as a is significantly larger than u.

We want u to be somewhat smaller than n, say n^{1/3} or n^{1/log log n}

Note that the Fourier coefficients of K are related to the Riemann Zeta function, and so one expects good bounds on the Riemann hypothesis. One can hope that other tools from additive combinatorics are useful, possibly exploiting the concavity of the logarithm function.

A weaker version of this strategy replaces K by L, the set of logarithms of all _integers_ in the interval [u,u+v]. This gives a weaker result (finding an integer with a factor in [u,u+v]) but the Fourier-analytic behaviour of L is presumably better (one only needs the Lindelhof hypothesis rather than Riemann).

K. Ford has a big paper in Annals (available at http://front.math.ucdavis.edu/0401.5223 ) which gives information about the asymptotic behavior of the number of integers up to x with at least one divisor in an interval [y,z], more or less for arbitrary x, y, z. He does state a result (Th. 2, page 6 of the arXiv PDF) with n restricted to an interval [x-\Delta,x], but \Delta must be quite large (something like n/\sqrt{\log n} in the setting of the previous comment); he does say that the range of \Delta can be improved, but to get a power of log seems hard. However, the techniques he develops could be useful.

An objection

There are only a integers in [n,n+a]. Each integer of size n has at most log n prime factors (log n/log log n, in fact), so there are only a log n primes that contribute to [n,n+a]. If one deletes these primes from K then the sumset of K will now completely miss the log of [n,n+a]. Thus we will not be able to proceed just by using average case information about K (cf. the generic prime discussion, or the oracle counterexample to finding pseudoprimes.

Analysis of K+K+K

Here we try a model in which u ~ S and n ~ S^3 for some S.

Let K be the logarithms of the primes between S and 2S. Thus this set consists of about S/\log S numbers in the interval [\log S, \log S+\log 2]. It's quite uniformly distributed in a Fourier sense (especially if one assumes the Riemann hypothesis).

Experience has shown that double sumsets K+K tend to be well behaved almost everywhere, but triple sumsets K+K+K and higher are well behaved everywhere. (Thus, for instance, the odd Goldbach conjecture is solved for all large odd numbers, but the even Goldbach conjecture is only known for almost all large even numbers, even assuming GRH.) So it seems reasonable to look at the triple sumset K+K+K, which is lying in the interval [3 \log S, 3 \log S + 3 \log 2].

Suppose we are looking to find a non-S-smooth number in time O( S^{0.99} ) (say). It would suffice to show that the interval [T, T + S^{-2.01}] contains an element of K+K+K for some fixed T in [3 \log S, 3 \log S + 3 \log 2], e.g. T = 3 \log S + \frac{1}{2} 3 \log 2.

On the one hand, this is quite a narrow interval to hit. On the other hand, K+K+K has about S^3/\log^3 S triples in it, so probabilistically one has quite a good chance of catching something. But, as always, the difficulty is to get a deterministic result which works even in the worst case scenario.

Hmm, the tininess of the interval [T,T+S^{-2.01}] is quite discouraging. Even if one considers the larger set K + L, where L is the log-integers (and which are very highly uniformly distributed), one can still miss this interval entirely. Undoing the logarithm, the point here is that an interval of the form [N, N+S^{0.99}] could manage, by a perverse conspiracy, to miss all multiples of p for every prime p between S and 2S.

Here is a back-of-the-envelope Fourier calculation which looks a bit discouraging. Suppose one wants to show that the interval [2N^3, 2N^3+N^{0.99}] contains an element of the triple product set [N,2N] \cdot [N,2N] \cdot [N,2N]. If we let \mu be counting measure on the log-integers \{ \log n: N \leq n \leq 2N \}, we are asking that \mu*\mu*\mu gives a non-zero weight to the the interval [\log 2N^3, \log 2N^3 + O( N^{-2.01} ) ].

We express this Fourier-analytically, basically as a Fourier integral of \hat \mu(\xi)^3 over an interval \xi = O(N^{2.01}), multiplied by the normalising factor of N^{-2.01}.

The main term will be coming from the low frequencies \xi=O(1), where \hat \mu(\xi) is about N; this gives the main term of about N^{0.99}, which is what one expects.

What about the error terms? Well, the Dirac spikes of \mu are distance about 1/N apart. For $N \ll |\xi| \ll N^{2.01}$, there’s no particular reason for any coherence in the Fourier sum in \hat \mu(\xi), and so I would expect the sum to behave randomly, i.e. \hat \mu(\xi) = O(\sqrt{N}) in this region. (In fact, RH basically would give this to us). This leads one to an error term of O( \sqrt{N}^3 \times N^{2.01} \times N^{-2.01} ) = O(N^{1.5} ), which is too large compared to the main term.

The situation does not seem to improve with various tweaking of parameters, though maybe I’m missing something.

Analysis of L+L

I can get a non-trivial result on L+L using the Weyl bound for the Gauss circle problem, or more precisely the variant of this circle problem for the hyperbola (essentially the Dirichlet divisor problem).

More precisely, let’s look at the product set [S,2S] \cdot [S,2S] \subset [S^2, 4S^2] in the middle of the interval [S^2,4S^2], say near 2S^2 (this is like considering L+L where L are the log-integers restricted to [\log S, \log S + \log 2]). It’s trivial that any interval of length S near 2S^2 will meet [S,2S] \cdot [S,2S]. I claim though that the same is true for intervals of size about S^{2/3}. The reason is that the number of products of the form [S,2S] \cdot [S,2S] less than a given number x is basically the number of lattice points in the square [S,2S] \times [S,2S] intersect the hyperbolic region \{ (a,b): ab < x \}. The Weyl technology for the Gauss circle problem (Poisson summation, etc.) gives an asymptotic for this with an error of O(S^{2/3}), which implies in particular that this count must increase whenever x increases by more than O(S^{2/3}). So every interval of this length must contain at least one number which factors as a product of two numbers in [S,2S].

Presumably some of the various small improvements to the Weyl bound for the circle problem over the years can be transferred to the hyperbola, allowing one to reduce the 2/3 exponent slightly.

Unfortunately the asymptotics become much much worse if we restrict the numbers in [S,2S] to be prime, so I doubt this gives anything particularly non-trivial for the original primality problem. Also the error term in these lattice point problems is never going to be better than S^{1/2}, so we once again butt our heads against this S^{1/2} barrier.