# Finding primes

This is the main wiki page for the Polymath4 project "Finding primes".

The main aim of the project is to resolve the following conjecture:

(Strong) conjecture. There exists deterministic algorithm which, when given an integer k, is guaranteed to find a prime of at least k digits in length of time polynomial in k. You may assume as many standard conjectures in number theory (e.g. the generalised Riemann hypothesis) as necessary, but avoid powerful conjectures in complexity theory (e.g. P=BPP) if possible.

Since primality is known to be in P by the AKS algorithm, we may assume a primality oracle that decides whether any given number is prime in unit time. Weaker versions of the strong conjecture are proposed below.

Contributions and corrections to this wiki page (and its subpages) are very welcome.

1. Proposal for the project and initial research thread Opened July 27. Inactive.
2. Research thread II Opened Aug 9. Inactive.
3. Research thread III Opened Aug 13. Inactive.
4. Research thread IV Opened Aug 28. Inactive.
5. Research thread V Opened Oct 27. Active.
6. Discussion thread I Opened July 28. Inactive.
7. Discussion thread II Opened Jun 29. Active.

## Easier versions of the problem

Semi-weak conjecture: There exists a probabilistic algorithm which, when given an integer k, has probability at least 1/2 (say) of finding a prime of at least k digits in length in time polynomial in k, using o(k) random bits. (Note that it is known that one can do this with O(k) random bits; and if one can use just $O(\log k)$ random bits, one is done by exhausting over the random bits.)

One way to solve the semi-weak conjecture would be to find an "explorable" set of size $\exp(o(k))$ of integers, a significant fraction (at least $k^{-O(1)}$) of which are primes at least k digits in length. By explorable, we mean that one has a quick way to produce a reasonably uniformly distributed element of that set using o(k) random bits (e.g. the set might be an interval, arithmetic progression, or other explicitly parameterisable set, or one might have an explicit expander graph on this set).

Weak conjecture: There exists a deterministic algorithm, which, when given an integer k, is guaranteed to find a prime of at least $\omega(\log k)$ digits in length in time polynomial in k, where $\omega(\log k)$ grows faster than $\log k$ as $k \to \infty$. (Equivalently, find a deterministic algorithm which, when given k, is guaranteed to find a prime of at least k digits in time $\exp(o(k))$. Note that the semi-weak conjecture implies the weak conjecture).
Strong conjecture with factoring: Same as the original conjecture, but assume that one has a factoring oracle will return the factors of any number given to it (as a stream of bits) in unit time. (This oracle can be given "for free" if factoring is in P, but we can also assume the existence of this oracle even if factoring is not in P.)
Semi-weak conjecture with factoring: Same as the semi-weak conjecture, but with a factoring oracle.
Weak conjecture with factoring: Same as the weak conjecture, but with a factoring oracle. (Note that there are deterministic sieve algorithms, such as the quadratic sieve, which are believed to run in subexponential time $\exp(o(k))$, so this oracle is in some sense available "for free" for this problem assuming this belief.)

The weak conjecture with factoring is the easiest of the six, and so perhaps one should focus on that one first.

It is also of interest to see whether the conjecture becomes easier to solve assuming that P=BPP or P=BQP.

## Partial results

1. P=NP implies a deterministic algorithm to find primes
2. An efficient algorithm exists if Cramer's conjecture holds
3. k-digit primes can be found with high probability using O(k) random bits
1. If pseudorandom number generators exist, the conjecture holds, as one can derandomise the above algorithm by substituting the pseudorandom number generator for the random string of bits.
4. The function field analogue (i.e. finding degree k irreducible polynomials over a finite field) is known [AL1986]
1. The analogue over the rationals is even easier: Eisenstein's criterion provides plenty of irreducible polynomials of degree k, e.g. $x^k-2$.
5. Assuming a suitably quantitative version of Schinzel's hypothesis H (namely, the Bateman-Horn conjecture), the weak conjecture is true, as one can simply search over all k-digit numbers of the form $n^d-2$ (say) for any fixed d to give an algorithm that works in time $O(10^{k/d})$. By randomly sampling such integers, one also establishes the semi-weak conjecture (assuming that one has the expected density for primes in hypothesis H).
6. It's easy to deterministically find large square-free numbers; just multiply lots of small primes together. However, it is not known how to test a number for square-freeness deterministically and quickly (unless one has a factoring oracle, in which case it is trivial).
1. However, it's not so obvious how to find large pairs $n,n+1$ of consecutive square-free numbers, though such pairs must exist by counting arguments (the density of square-free numbers is $6/\pi^2 \approx 60\%$). This could be a good variant problem.
1. It is conjectured that every element of Sylvester's sequence is square-free. If so, this solves the above problem.
7. If a fast algorithm to find primes exists, then an explicit fast algorithm to find primes exists: simply run all algorithms of size at most A simultaneously for time $k^A$, then increment A from 1 onward, until one gets a hit.

Here are the current "world records" for the fastest way to deterministically (and provably) obtain a k-digit prime (ignoring errors of $\exp(o(k))$ in the run time):

1. $O(10^k)$: The trivial algorithm of testing each of the k-digit numbers in turn until one gets a hit.
2. $O(10^k)^{3/4}$: Test the k-digit numbers of the form $a^2+b^4$ for primality (the Friedlander-Iwaniec theorem guarantees a hit for k large enough).
3. $O(10^k)^{2/3}$: Test the k-digit numbers of the form $a^3+2b^3$ for primality (a hit is guaranteed for large k by a result of Heath-Brown).
4. $O(10^k)^{2/3}$: Use the Meissel-Lehmer method for computing $\pi(x)$
5. $O(10^{k})^{6/11+\epsilon} (\approx O(10^{k})^{.5454})$assuming factoring oracle: Factor each number in the interval $[x^{12/11},x^{12/11}+x^{6/11+\epsilon}]$, one will find a prime factor of order x by [HB1995].
6. $O(10^k)^{0.525}$: Test the k-digit numbers from, say, $10^k$ onwards until one gets a hit (here we use the [BHP2001] bound on prime gaps).
7. $O(10^k)^{1/2}$ assuming RH: Test the k-digit numbers from, say, $10^k$ onwards until one gets a hit (here we use the RH bound on prime gaps).
8. $O(10^k)^{1/2}$ using Odlyzko's method.

Note that if one can get $O(10^k)^{\varepsilon}$ for each fixed $\varepsilon \gt 0$, then one has established the weak conjecture.

A slight variant of the problem: assuming a factoring oracle, given an algorithm that runs in k steps, how large a prime is the algorithm guaranteed to produce in the worst-case scenario? (Note that this is very different from what the algorithm is heuristically predicted to produce in the average-case scenario.)

Here is a simple algorithm that produces a prime larger than about $k \log k$ in k steps, based on Euclid's proof of the infinitude of primes:

• Initialise $p_1=2$.
• Once $p_1,\ldots,p_{k-1}$ are computed, define $p_k$ to be the largest prime factor of $p_1 \ldots p_{k-1}+1$.

This generates k distinct primes; the prime number theorem tells us that the k^th prime has size about $k \log k$, so at least one of the primes produced here has size at least $k \log k$.

If one instead works with Sylvester's sequence $a_k = a_1 \ldots a_{k-1}+1$, then it is known [O1985] that the number of primes less than n that can divide any one of the $a_k$ is $O( n / \log n \log \log \log n )$ rather than $O(n/\log n)$ (the prime number theorem bound). If we then factor the first k elements of this sequence, we must get a prime of size at least $k \log k \log \log \log k$ or so.

One can do better still by working with the Fermat numbers, $F_n = 2^{2^n}+1$. It is known [KLS2002] that the number of primes dividing any of the $F_n$ is $O(\sqrt{n}/\log n)$, so in particular if we factor the first k Fermat numbers, we get a prime almost of size $k^2$. In fact, it's a classical result of Euler that the prime divisors of $F_n$ are all at least $2^{n+2}+1$, so we can obtain a large prime simply by factoring a Fermat number.

Assuming RH, one can find a prime larger than $k^2$ in about k steps by the method indicated earlier (start at $k^2$ and increment until one hits a prime). Unconditionally, one gets a prime larger than $k^{1/0.525}$ by this method.

## Variants of the problem

1. Can one derandomise factoring of polynomials over finite fields? Note that if one could do this even in the quadratic case, this would allow for efficient location of quadratic non-residues, which is not known.

### Finding consecutive (or nearby) square-free integers

Toy problem Is there an efficient algorithm to find consecutive square-free integers n, n+1 of at least k digits in length assuming an oracle to determine whether a number is square free (a special case of the factoring oracle)? Or just square-free integers that are close together (e.g. polylog(n)).

Note that finding one large square-free number is easy: just multiply lots of distinct small primes together. Also, as the density of square-free numbers is $6/\pi^2 \approx 0.6$, a counting argument shows that pairs of consecutive square-free numbers exist in abundance, and are thus easy to find probabilistically. The density can be increased to arbitrarily close to 1 using the W-trick, so one could also ask for, say, long arithmetic progressions of square-free numbers also.

If Sylvester's sequence is square-free, then this solves the problem.

Testing a number for square-freeness is trivial with a factoring oracle. No unconditional polynomial-time deterministic algorithm for square-freeness seems to be known. (It is listed as an open problem in this paper from 1994.)

Unconditionally, there are asymptotics for the number of square-free numbers in [1,N] with an error term of $O(N^{1/2})$, which leads to a $O(10^k)^{1/2}$ algorithm (just search an interval of this length for square-freeness). This is the current unconditional world record. Assuming RH, this can be improved to $O(10^k)^{17/54+\varepsilon}$.

There is an asymptotic [HB1984] for the number of consecutive square free numbers with an error term of $O(N^{7/11})$, leading to a $O(10^k)^{7/11+\varepsilon}$, though this is inferior to the above bounds.

It is known [FT1992] that the interval $[N, N+N^{1/5+\epsilon}]$ contains at least one square-free number; Granville [G1998] improved this to $[N,N+N^{\varepsilon}]$ assuming the ABC conjecture. If the density in such intervals can be made to exceed 50% (perhaps after first applying the W-trick) then we would have solved the weak version of this toy problem assuming ABC.

### Finding pseudoprimes

Problem: How quickly can one find a solution to the congruence $2^{n-1}=1 \hbox{ mod } n$ with n at least k digits in length? (a pseudoprime)?

It is conjectured that the largest gap between k-digit pseudoprimes is poly(k) (this is weaker than Cramer's conjecture, and would imply a polynomial time solution to the above problem).

It is conjectured that the number $\mathcal{P}_a(x)$ of pseudoprimes to any base $a \neq \pm1$ satisfies

$\displaystyle \log\mathcal{P}_a(x) = \log(x) - \log(x)\frac{\log\log\log(x)}{\log\log(x)}(1+o(1)).$

However, the best known bounds seem to be

$\displaystyle \log\mathcal{P}_a(x) \leq \log(x) - \frac{1}{2}\log(x)\frac{\log\log\log(x)}{\log\log(x)}(1+o(1)),$

and

$\displaystyle \log\mathcal{P}_a(x) \geq (\frac{\log(x)}{\log a})^{E/(E+1)-o(1)},$

where E is a constant which is conjectured to equal 1, but is only known to have value at least 1-1/(2\sqrt{e}). These results can be found in [P1981], [P1982], [P1989]. Better bounds are known to hold on average [EP1986].

If n is a pseudoprime, then $2^n-1$ is also a pseudoprime (because $2^n = 1 \hbox{ mod } 2^n - 1$, and $2^n-2$ is a multiple of n). This, technically, solves the problem, but can we find denser sets of pseudoprimes? What if we want $2^{n-1}=1 \hbox{ mod } n$ and $3^{n-1} = 1 \hbox{ mod } n$ simultaneously?

### Finding Almost Primes

An almost primes is a number that is the product of at most two primes (counted with multiplicity).

problem How fast can we deterministically write down a $k$ digit almost prime?

Clearly this is a relaxation of the prime problem, so we expect to get better results. In [L1996] it is shown that the interval $[x,x+x^{.436}]$ contains an almost prime. This gives us a $O(10^{k})^{.436}$ algorithm. Assuming a factoring oracle, the problems of deterministically finding a $k$ digit almost prime and a $k$ digit prime in polynomial time are equivalent. If we can find an almost prime in polynomial time we can factor it and get (a slightly smaller) prime. Conversely, if we can find a prime in polynomial time we can square it to find (a slightly larger) almost prime. Using our prime technology this should give a $O(10^{k})^{.262}$ algorithm.

## Observations and strategies

1. For the conjecture with factoring, it would suffice to show that every polylog(N)-length interval in [N,2N] contains a number which is not $N^\varepsilon$-smooth (or $\log^{O(1)} N$ smooth, if one only wants to solve the weak conjecture with factoring).
1. One can combine this with the "W-trick". For instance, if one lets W be the product of all the primes less than k, and if one can show that not every element of the progression $\{ Wn+1: 1 \leq n \leq k^{100} \}$ (say) is $k^A$-smooth, then one has found a prime with at least $A \log k$ digits in polynomial time. If one can make A arbitrarily large, one has solved the baby problem with factoring.
2. One way to proceed is to analyse iterated sumsets of log-primes.
3. Another is to analyse signed sums of prime reciprocals.
4. Probably the best known bound along these lines is given in [HB1995] and states that the interval $[x,x+x^{.5+\epsilon}]$ contains an integer with a prime factor at least $x^{11/12 -\epsilon}$. This only leads to a $O(10^{k})^{.5455}$ algorithm.
2. Another approach is to accurately compute the prime counting function.
1. A variant of this strategy is the polynomial strategy.
3. If one can find explicitly enumerable sets of k-digit numbers of size $O(10^{\varepsilon k})$ for any fixed $\varepsilon \gt 0$ that are guaranteed to contain a prime, then one has solved the weak conjecture. If instead one can find randomly samplable sets of k-digit numbers of size $O(10^{\varepsilon k})$ that contain a large fraction (e.g. $\exp(-o(k))$ of primes), one has solved the semi-weak conjecture. If one can find explicit sets of poly(k) size that are guaranteed to contain a prime, one has solved the strong conjecture.
4. If a primality test can be expressed as a constant-degree polynomial over a finite field from the digits of the number-under-test to {0,1}, then known pseudorandom generators for polynomials can be used to derandomize finding primes the same way as under the full P=BPP assumption. So, instead of requiring a general PRG, the proposal is trying to put the primality test into a fully derandomizable subclass of P.
5. If one can find a sequence of easily constructible, mutually coprime integers whose prime factors are contained in a sparse subset of the primes, this generates (with the aid of a factoring oracle) a sparse set of primes. So far the sparsest set of this type has come from factoring the Fermat numbers.

## Negative results and other obstructions

1. Using sieve theory to find a non-smooth number in a short interval runs into the difficulty that most sieve theory methods are insensitive to the starting point of an interval. Given that every integer in [1,n] is n-smooth, this implies that sieve theory is unlikely to establish the existence of a non-n-smooth number in any interval of length n, and in particular to find a non-$O(k^O(1))$ smooth integer in any interval that can be searched in polynomial time. Indeed, there are heuristic arguments that suggest that even augmented with basic Fourier analysis and assuming RH, this method is unlikely to work, although they haven't yet been fully formalized.
2. Average-case results for the primes cannot be used directly, because one could delete all constructible integers from the set of primes (a negligible set) without significantly impacting the average case results, while causing the conjecture to fail for this modified set of primes.
1. In a similar spirit, one cannot hope to find non-smooth numbers in a short interval by analysing the iterated sumset of the log-primes and using average-case information about that set), because if one deletes all the primes arising from factors of that interval, one still captures most of the primes (and thus most of the average-case behaviour of the log-primes) while losing the information about that interval.
3. It is not possible to prove the conjecture purely by "soft" (i.e. relativisable) complexity theory methods which do not use any additional information on the primes rather than their density. Details are at Oracle counterexample to finding pseudoprimes.

## How could the conjecture be false?

A world in which the conjecture is negative would be very strange, especially with factoring or if the weak conjecture fails also. In particular, the following would have to be true:

1. All numbers with at least k digits that are "constructible" in the sense that they are the output of a Turing machine program of length $O(\log k)$ that runs in time $k^{O(1)}$, are composite. [If the conjecture fails with factoring, then these numbers are furthermore $\exp(\varepsilon k)$ smooth for any fixed $\varepsilon \gt 0$; if even the weak conjecture with factoring fails, they are $k^{O(1)}$-smooth.]
2. In fact, all constructible numbers are not only composite/smooth, but they sit inside intervals of length $k^{100}$, all of whose elements are composite/smooth. (One can replace 100 by any other fixed number.)
1. In particular, Cramer's conjecture fails; the gaps between adjacent k-digit primes exceeds $k^{100}$ around every constructible number.
3. No pseudorandom number generators exist. In particular, this implies that one-way functions do not exist.
4. $P \neq NP$. More specifically, the decision problem "Does there exist a prime in a given interval?" is in NP, but cannot lie in P if the conjecture fails.

## Relevant concepts

Complexity theory concepts:

1. Complexity classes
2. Pseudo-random generators (PRG)
3. Full derandomization
4. one-way function
5. Impagliazzo's five worlds: Algorithmica, Heuristica, Pessiland, Minicrypt, Cryptomania
6. Kolmogorov complexity
7. Oracles

Number theory concepts:

Other:

Note: articles with external links should eventually be replaced by in-house pages that can provide information that is more dedicated to this project.

## Paper

Here is the [arXiv copy] of the research paper, now submitted to Mathematics of Computation.

Here is the [Subversion repository] of the research paper.

Here are the Polymath4 grant acknowledgments.