Find a good configuration of HAPs
[[The Erdős discrepancy problem|Click here to return to the main Polymath5 page]].
==Introduction==
One of the difficulties of thinking about the Erdős discrepancy problem is that one can get hung up on HAPs with small common differences, when ultimately what should matter is the more typical HAPs, which have large common differences. This is less of a problem when one is looking at multiplicative sequences, since then all one cares about is partial sums. However, the strategy to be described on this page concerns a direct attack on the general problem.
The rough idea is this. Recall that a hypergraph H is a collection of subsets of some set X. We borrow graph terminology and call the elements of X vertices and the sets in H edges (or sometimes hyperedges if we want to make it clear that our graph is hyper). If H is a hypergraph with vertex set X and f is a function from X to {-1,1}, then the discrepancy of f is the maximum of <math>|\sum_{x\in A}f(x)|</math> over all edges A of H. The discrepancy of H is the minimum discrepancy of any function <math>f:X\to\{-1,1\}</math>.
The Erdős discrepancy problem is asking us to prove that the discrepancy of the hypergraph whose vertex set is <math>\mathbb{N}</math> and whose edges are all finite HAPs is infinite. One obvious strategy for doing this is to identify within this hypergraph a collection of hypergraphs <math>H_n</math> that have properties that allow us to prove arbitrarily good lower bounds for the discrepancy. What makes this a strategy, rather than a trivial reformulation of the problem (after all, one might suggest that there is no point in taking all HAPs) is that one would be aiming to find subhypergraphs of the HAP hypergraph that lent themselves particularly well to certain kinds of combinatorial arguments.
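To make these definitions concrete, here is a minimal brute-force sketch in Python (not part of the original argument; the function names and the tiny values of n are purely illustrative). It builds the hypergraph of all HAPs inside {1,...,n} and computes its discrepancy by trying every function from {1,...,n} to {-1,1}, which is only feasible for very small n.

<pre>
from itertools import product

def hap_edges(n):
    """All HAPs {d, 2d, ..., kd} with d >= 1 and kd <= n."""
    return [[d * j for j in range(1, k + 1)]
            for d in range(1, n + 1)
            for k in range(1, n // d + 1)]

def discrepancy(n):
    """min over f: {1,...,n} -> {-1,1} of max over HAPs A of |sum_{x in A} f(x)|."""
    edges = hap_edges(n)
    best = None
    for signs in product((-1, 1), repeat=n):
        f = dict(zip(range(1, n + 1), signs))
        worst = max(abs(sum(f[x] for x in A)) for A in edges)
        best = worst if best is None else min(best, worst)
    return best

if __name__ == "__main__":
    for n in (4, 8, 11, 12):
        print("n =", n, " discrepancy of the HAP hypergraph on {1,...,n}:", discrepancy(n))
</pre>

Even at these tiny sizes one can see how slowly the discrepancy grows, which is one way of appreciating why the problem is hard.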
This point will be much clearer if we discuss an actual example of a subhypergraph that might do the trick. (I shall follow the usual practice, when talking informally about asymptotic arguments, of talking about "a subhypergraph" when strictly speaking what I mean is a sequence of larger and larger subhypergraphs. I shall picture a single subhypergraph that is very large.)
==A collection of HAPs that might be useful==
Let us pick three very large integers L, M and N, with N much larger than M and M much larger than L. Let us then take the set of all HAPs of length L that live inside the interval [N,N+M] and have common difference d such that dL is at most o(M). (This o(M) is shorthand for some convenient function we choose later.) I should make clear that what I am calling a HAP here is an arithmetic progression of the form (x,x+d,...,x+(L-1)d) such that d is a factor of x (so that if you continued the progression backwards it would include zero).
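Here is a small sketch of how one might enumerate this collection (the concrete parameter values, and the explicit threshold standing in for the o(M), are illustrative choices rather than anything from the original discussion).

<pre>
def hap_segments(N, M, L, max_dL):
    """Pairs (x, d) with d a factor of x, the progression x, x+d, ..., x+(L-1)d
    contained in [N, N+M], and d*L at most max_dL."""
    segments = []
    for d in range(1, max_dL // L + 1):
        x = ((N + d - 1) // d) * d          # smallest multiple of d that is >= N
        while x + (L - 1) * d <= N + M:
            segments.append((x, d))
            x += d
    return segments

if __name__ == "__main__":
    N, M, L = 10**6, 10**4, 10                      # toy values with N >> M >> L
    segs = hap_segments(N, M, L, max_dL=M // 20)    # an explicit stand-in for "dL = o(M)"
    print(len(segs), "segments; first few:", segs[:3])
</pre>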
One might hope that if L, M and N are sufficiently large, then a general principle along the following lines should hold: almost all numbers behave like average numbers. An example of where this principle does hold is in the number of prime factors. If n is a random number close to N, then not only does n have, on average, roughly log log N prime factors, but the Erdős-Kac theorem tells us that the number of prime factors is approximately normally distributed with mean log log N and standard deviation <math>\sqrt{\log \log N}</math>. Unfortunately, this suggests that the number of factors (as opposed to prime factors) is not concentrated: the number of divisors is roughly exponential in the number of distinct prime factors, so fluctuations of order <math>\sqrt{\log\log N}</math> in that exponent become large multiplicative fluctuations in the divisor count. This could be a problem.
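The contrast is visible numerically even at modest sizes. The sketch below (an illustrative experiment with a toy value of N, using plain trial division) samples integers near N and prints the spread of the number of distinct prime factors alongside the spread of the number of divisors.

<pre>
import math, random

def omega_and_divisors(n):
    """Number of distinct prime factors and number of divisors of n, by trial division."""
    omega, divisors, m, p = 0, 1, n, 2
    while p * p <= m:
        if m % p == 0:
            e = 0
            while m % p == 0:
                m //= p
                e += 1
            omega += 1
            divisors *= e + 1
        p += 1
    if m > 1:
        omega += 1
        divisors *= 2
    return omega, divisors

if __name__ == "__main__":
    N = 10**9                                # toy value of N
    random.seed(0)
    sample = [omega_and_divisors(N + random.randrange(10**6)) for _ in range(200)]
    omegas = [w for w, _ in sample]
    divs = [d for _, d in sample]
    print("log log N is about", round(math.log(math.log(N)), 2))
    print("number of prime factors: min", min(omegas), " max", max(omegas))
    print("number of divisors:      min", min(divs), " max", max(divs))
</pre>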
==A toy model of what is going on==
Let f be a function from [N,N+M] to {-1,1}. We would like to obtain a lower bound for the sum
<math>\sum_{x=N}^{N+M}\sum_{d|x}|f(x)+f(x+d)+\dots+f(x+(L-1)d)|^2.</math>
Here, the second sum should be taken to be over all d such that x+Ld also lives in the interval [N,N+M].
Now let us take the attitude that for each d the probability that d is a factor of x is <math>d^{-1}</math>. (That is because we are imagining that x mod d is uniformly distributed.) According to this very simple model (which ignores all sorts of dependences), we ought to be able to get a good idea of the above sum if we replace it by the sum
<math>\sum_{x=N}^{N+M}\sum_d d^{-1}|f(x)+f(x+d)+\dots+f(x+(L-1)d)|^2.</math>
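As a rough sanity check on this replacement, the following sketch (toy parameter values, and restricted to positive d for simplicity) computes both quantities for a random ±1 function on [N,N+M]: the genuine divisor-weighted sum and the idealized sum with weights <math>d^{-1}</math>.

<pre>
import random

def segment_sq(f, x, d, L):
    """|f(x) + f(x+d) + ... + f(x+(L-1)d)|^2."""
    return sum(f[x + i * d] for i in range(L)) ** 2

def true_and_model_sums(f, N, M, L):
    """The divisor-weighted sum and the 1/d-weighted model sum,
    over positive d with x + L*d still inside [N, N+M]."""
    true_sum, model_sum = 0.0, 0.0
    for x in range(N, N + M + 1):
        d = 1
        while x + L * d <= N + M:
            s = segment_sq(f, x, d, L)
            model_sum += s / d
            if x % d == 0:                   # the genuine condition "d divides x"
                true_sum += s
            d += 1
    return true_sum, model_sum

if __name__ == "__main__":
    random.seed(1)
    N, M, L = 10000, 1000, 5                 # toy sizes, far from the intended regime
    f = {x: random.choice((-1, 1)) for x in range(N, N + M + 1)}
    t, m = true_and_model_sums(f, N, M, L)
    print("divisor-weighted sum:", t, "   1/d-weighted model sum:", round(m, 1))
</pre>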
Expanding out the square and interchanging the order of summation, we get
<math>\sum_{0\leq i,j\leq L-1}\sum_{x=N}^{N+M}\sum_d d^{-1}f(x+id)f(x+jd).</math>
The i=j contribution to this quantity is <math>L\sum_{x=N}^{N+M}\sum_d d^{-1}.</math> For a typical x, we sum over d in an interval [-A,B] with both A and B at least <math>M^{1-\epsilon},</math> so this sum works out at about <math>LM\log M.</math>
If <math>i\ne j,</math> then the pair <math>(x+id,x+jd)</math> can be made to equal a given pair <math>(u,v)</math> (by an appropriate choice of x and d) only if j-i is a factor of v-u, and, give or take a few edge effects, that condition is more or less sufficient as well. The probability of this is <math>|j-i|^{-1}.</math> Also, <math>d=(j-i)^{-1}(v-u),</math> so we can rewrite the inner sum in this case as <math>\sum_{u,v}|v-u|^{-1}f(u)f(v).</math> (This is not supposed to be a rigorous argument.)
What I now hope is that if the partial sums of f remain bounded, then a sum like <math>\sum_{u,v}|v-u|^{-1}f(u)f(v)</math> has to be small, so that in total we find that
<math>\sum_{x=N}^{N+M}\sum_d d^{-1}|f(x)+f(x+d)+\dots+f(x+(L-1)d)|^2\geq LM\log M/2,</math>
from which we can conclude, by averaging against the weights <math>d^{-1}</math> (whose total is of order <math>M\log M</math>), that there exist x and d such that <math>|f(x)+f(x+d)+\dots+f(x+(L-1)d)|^2\geq cL</math> for an absolute constant c>0, giving us the unboundedness we were looking for.
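To get a feel for the size of the bilinear form appearing in the off-diagonal terms, here is a small exploratory sketch (toy sizes, and not a claim that it settles anything): it evaluates <math>\sum_{u\ne v}|v-u|^{-1}f(u)f(v)</math> for a random ±1 function and for the alternating function, whose partial sums are bounded. In the regime of the argument, "small" would mean small compared with the diagonal term of order <math>LM\log M</math>.

<pre>
import math, random

def bilinear_form(f, xs):
    """Sum over u != v in xs of |v - u|^{-1} f(u) f(v)."""
    total = 0.0
    for i, u in enumerate(xs):
        for v in xs[i + 1:]:
            total += 2.0 * f[u] * f[v] / (v - u)
    return total

if __name__ == "__main__":
    random.seed(2)
    N, M = 10000, 800                        # toy sizes
    xs = list(range(N, N + M + 1))
    f_random = {x: random.choice((-1, 1)) for x in xs}
    f_alternating = {x: 1 if x % 2 == 0 else -1 for x in xs}   # bounded partial sums
    print("random f:          ", round(bilinear_form(f_random, xs), 1))
    print("alternating f:     ", round(bilinear_form(f_alternating, xs), 1))
    print("M log M, for scale:", round(M * math.log(M), 1))
</pre>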
For this to work we would need to show that it really is the case that the off-diagonal contribution to the sum is small, and, probably more challengingly (unless the first step is plain wrong), that the probabilistic assumptions we made can be justified. A justification would consist in showing that the actual weights (which are 1 if d divides x and 0 otherwise) are sufficiently quasirandom that they approximate, in an appropriate sense, the idealized weights (which are <math>d^{-1}</math>). And we would need to justify the second probabilistic approximation as well.
==Why the above proof idea does not work==
As with many approaches to the Erdős discrepancy problem, this one runs into problems when one looks at what it would say about character-like functions. One of the remarkable features of functions such as <math>\lambda_3</math> (the completely multiplicative function that takes 3 to 1, and takes m to 1 if m is congruent to 1 mod 3 and to -1 if m is congruent to 2 mod 3) is that they have very small drift: for any m<n one has an inequality of the form <math>|\sum_{x=m+1}^n\lambda_3(x)|\leq\log_3(n-m)+O(1).</math> This implies that they have very small drift along any segment of a homogeneous arithmetic progression (since <math>\lambda_3</math> is completely multiplicative). But the argument above, if it worked, would be giving segments of HAPs of length L along which the drift was <math>\sqrt{L}.</math>
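The small-drift property is easy to check numerically. The sketch below (a quick illustrative check at small sizes) implements <math>\lambda_3</math> and computes the largest drift over all windows of length at most W inside [1,N], for comparison with <math>\log_3 W</math>.

<pre>
import math

def lambda3(n):
    """The completely multiplicative function with lambda3(3) = 1 and
    lambda3(m) = +1 or -1 according as m is 1 or 2 mod 3."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

if __name__ == "__main__":
    N, W = 6000, 243                         # toy sizes; W = 3^5
    S = [0]                                  # partial sums: S[n] = sum of lambda3(x) for x <= n
    for n in range(1, N + 1):
        S.append(S[-1] + lambda3(n))
    worst = max(abs(S[n] - S[m])
                for n in range(1, N + 1)
                for m in range(max(0, n - W), n))
    print("largest drift over windows of length at most", W, ":", worst)
    print("log_3(W) =", round(math.log(W, 3), 2))
</pre>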
So what, one might ask, is wrong with the probabilistic model? To give a complete answer to this it is necessary to say a bit more about quasirandomness. If G is a graph with vertex set X, then one of the properties we most like it to have, if we want it to imitate a random graph of about the same density, is that if A and B are large subsets of X, then the number of edges of G that go from a point in A to a point in B is roughly the same as it would be for the random graph. More generally, we would say that two graphs G and H are "close" if the number of edges from any large set to any other is roughly the same for G as it is for H. And we can use a very similar definition for weighted graphs, where we replace "number of edges" by "sum of the weights of the edges". We had a graph derived from HAPs and were hoping that it would imitate the weighted graph where the edge from x to y has weight <math>|x-y|^{-1}.</math> (That was not made explicit, but it is pretty much what we were doing.)
Now these two graphs do not in fact imitate each other. For example, a notable and highly relevant difference between the two is that if A is any residue class modulo some prime p and is not the set of multiples of p, then A is an independent set in the HAP graph. (Just to be clear, I join x to y in the HAP graph if y-x divides x and y, which is the same as saying that x and y are consecutive elements of some HAP.) By contrast, in the weighted graph, all residue classes mod p look the same.
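This difference is easy to see experimentally. The sketch below (with illustrative toy parameters) counts, for each residue class a mod p inside [N,N+M], the number of HAP-graph edges within the class and the total weight the class receives in the weighted graph; the nonzero classes contain no HAP-graph edges at all, while every class carries a comparable amount of weight.

<pre>
def hap_edge(x, y):
    """x < y are consecutive elements of some HAP exactly when y - x divides x."""
    return x % (y - x) == 0

def class_stats(N, M, p, a):
    """HAP-graph edge count and total 1/(y-x) weight inside the class a mod p."""
    A = [x for x in range(N, N + M + 1) if x % p == a]
    pairs = [(x, y) for i, x in enumerate(A) for y in A[i + 1:]]
    edges = sum(1 for x, y in pairs if hap_edge(x, y))
    weight = sum(1.0 / (y - x) for x, y in pairs)
    return edges, weight

if __name__ == "__main__":
    N, M, p = 10000, 600, 5                  # toy sizes
    for a in range(p):
        e, w = class_stats(N, M, p, a)
        print("class", a, "mod", p, "  HAP-graph edges:", e, "  weighted-graph weight:", round(w, 2))
</pre>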
==What one might be able to salvage from the wreckage==
What we have here is a certain class of sets -- residue classes modulo small primes -- that do not have approximately the number of edges inside them that would be predicted by the naive probabilistic model above. However, these examples have a very particular structure, so they should spoil the proof only for a very special class of functions. The functions that cause the difficulty are likely to be ones that have significant correlation with residue classes, something that is tailor made to be detected by Fourier analysis.
This strongly suggests that we might be able to prove a result that applies to all functions f that satisfy some Fourier condition. The condition should tell us that f does not correlate with residue classes modulo small primes, so it should say something like this: <math>\hat{f}(\alpha)</math> is small whenever <math>\alpha</math> is close to a rational with small denominator. Intuitively, the functions being ruled out are exactly the ones that "can tell the difference between HAPs and ordinary APs".
So it looks as though there is something to do. I don't know exactly what it is, but I think it should be something like looking at the HAP matrix minus the weight matrix and showing that all its eigenvectors with large eigenvalues are functions with Fourier transforms that are large near some rational with small denominator. With a lemma like that, one might be able to do something.
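Here is an exploratory numerical sketch of that suggestion (entirely a toy experiment: the parameters are small, "the HAP matrix" is taken to mean the adjacency matrix of the HAP graph defined above, "the weight matrix" the matrix of weights <math>|x-y|^{-1}</math>, and the discrete Fourier transform on the interval serves as a crude stand-in for the Fourier condition). It forms the two matrices on [N,N+M], takes the difference, and reports the dominant Fourier frequency of the eigenvectors with the largest eigenvalues, together with a nearby rational of small denominator.

<pre>
import numpy as np
from fractions import Fraction

N, M = 10000, 400                            # toy sizes
xs = np.arange(N, N + M + 1)
n = len(xs)

gap = np.abs(xs[:, None] - xs[None, :])
np.fill_diagonal(gap, 1)                     # placeholder: avoids % 0 and 1/0 below
H = ((np.minimum(xs[:, None], xs[None, :]) % gap) == 0).astype(float)  # HAP adjacency
W = 1.0 / gap                                # the |x-y|^{-1} weights
np.fill_diagonal(H, 0.0)
np.fill_diagonal(W, 0.0)

eigvals, eigvecs = np.linalg.eigh(H - W)     # both matrices are symmetric
order = np.argsort(-np.abs(eigvals))         # largest |eigenvalue| first

for idx in order[:3]:
    v = eigvecs[:, idx]
    vhat = np.abs(np.fft.fft(v))
    k = int(np.argmax(vhat[1 : n // 2]) + 1)          # dominant nonzero frequency
    approx = Fraction(k, n).limit_denominator(20)     # nearby rational with small denominator
    print("eigenvalue %8.2f   dominant frequency %d/%d, close to %s" % (eigvals[idx], k, n, approx))
</pre>

Whether the picture this paints persists at realistic sizes is exactly the kind of question the proposed lemma would have to address.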