Polylog parameterizability

In this post I will try to describe what I believe Deolalikar means by "polylog parameterizability", a key concept in his work. (A more accurate term here would be "polylog recursive factorizability".)

Definition

Let [math]\displaystyle{ \mu }[/math] be a probability distribution on the cube [math]\displaystyle{ \{0,1\}^n }[/math]. Then to describe this distribution fully, one needs [math]\displaystyle{ 2^n-1 }[/math] real parameters; one real number [math]\displaystyle{ \mu(\{x\}) }[/math] for each of the [math]\displaystyle{ 2^n }[/math] elements of the cube, minus one for the constraint [math]\displaystyle{ \sum_{x \in \{0,1\}^n} \mu(\{x\}) = 1 }[/math].

Roughly speaking, a probability distribution is "polylog parameterizable" if it requires far fewer than [math]\displaystyle{ 2^n }[/math] real numbers to describe; more precisely, it needs only [math]\displaystyle{ \exp( O( polylog(n) ) ) }[/math] many real numbers. More precisely still, we ask that the distribution function [math]\displaystyle{ \mu(x_1,\ldots,x_n): \{0,1\}^n \to [0,1] }[/math] admit a factorisation into potentials

[math]\displaystyle{ \mu(x_1,\ldots,x_n) = \prod_{j=1}^n p_j( x_j; x_{pa(j)} ) }[/math]

where for each j, pa(j) is a set of O(polylog(n)) "parents" of j (as given by a directed acyclic graph), and [math]\displaystyle{ p_j(\cdot,x_{pa(j)}) }[/math] is a probability distribution for each assignment of the parent literals [math]\displaystyle{ x_{pa(j)} }[/math]. The point is that each factor [math]\displaystyle{ p_j }[/math] only takes [math]\displaystyle{ \exp(O(polylog(n))) }[/math] real numbers to describe, and so the entire distribution also only requires [math]\displaystyle{ \exp(O(polylog(n))) }[/math] real numbers to describe.
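
To make the parameter count concrete, here is a minimal Python sketch of such a factorisation on a toy DAG with four variables. The parent sets and conditional tables below are invented for illustration and do not come from the paper; parent sets of size at most 2 stand in for the O(polylog(n)) bound.

<pre>
import itertools

# Toy DAG on four variables: pa(j) lists the parents of variable j.
parents = {0: [], 1: [0], 2: [0, 1], 3: [2]}

# Conditional tables: p[j][assignment to pa(j)] = Pr[x_j = 1 | parents].
p = {
    0: {(): 0.5},
    1: {(0,): 0.3, (1,): 0.7},
    2: {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.2},
    3: {(0,): 0.6, (1,): 0.4},
}

def mu(x):
    """Evaluate the factored distribution at a point x in {0,1}^4."""
    prob = 1.0
    for j, pa in parents.items():
        pj1 = p[j][tuple(x[i] for i in pa)]
        prob *= pj1 if x[j] == 1 else 1.0 - pj1
    return prob

# Sanity check: the factors multiply out to a genuine distribution.
total = sum(mu(x) for x in itertools.product((0, 1), repeat=4))
print(total)  # ~1.0

# Parameter count: sum over j of 2^{|pa(j)|}, versus 2^n - 1 for a full table.
print(sum(2 ** len(pa) for pa in parents.values()), "vs", 2 ** 4 - 1)
</pre>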

The purpose of the FO(LFP) machinery is (I think) ultimately to obtain a result of the following form:

Claim. If a solution space in [math]\displaystyle{ \{0,1\}^n }[/math] can be "explored by a P algorithm" in a suitable sense, then the uniform distribution [math]\displaystyle{ \mu }[/math] on that space is the _projection_ of a polylog-parameterisable measure [math]\displaystyle{ \mu' }[/math] on a subset of a polynomially larger space [math]\displaystyle{ \{0,1\}^{n^{O(1)}} }[/math].


The following can be easily shown (see proof):

A probability distribution on [math]\displaystyle{ \{0,1\}^n }[/math] that can be sampled by a probabilistic Turing machine in polynomial time is the projection of a polylog parametrizable distribution.

However, a key point that does not seem to be sufficiently addressed in the paper is whether the projection of a polylog-parameterisable measure remains polylog-parameterisable; I believe this could in fact be a serious problem with the proof. Indeed, it is not the case, as shown by the counterexamples below.

Examples

Let me illustrate the above Claim with a simple example, that of a straight line program. Let [math]\displaystyle{ N := n^{100} }[/math], and consider a boolean function [math]\displaystyle{ f: \{0,1\}^n \to \{0,1\} }[/math] obtained by the formula

[math]\displaystyle{ f(x_1,\ldots,x_n) := x_N }[/math]

where [math]\displaystyle{ x_{n+1},\ldots,x_N }[/math] are obtained from previous literals by formulae of the form

[math]\displaystyle{ x_i = f_i( x_{a_i}, x_{b_i} ) }[/math] (1)

for [math]\displaystyle{ 1 \leq a_i,b_i \lt i }[/math] and some logic gates [math]\displaystyle{ f_i : \{0,1\}^2 \to \{0,1\} }[/math]. Now let

[math]\displaystyle{ S := \{ (x_1,\ldots,x_n, f(x_1,\ldots,x_n)): (x_1,\ldots,x_n) \in \{0,1\}^n \} \subset \{0,1\}^{n+1} }[/math]

be the graph of f, and let [math]\displaystyle{ \mu }[/math] be the uniform distribution on S, thus [math]\displaystyle{ \mu }[/math] assigns a probability [math]\displaystyle{ 2^{-n} }[/math] to each element of S:

[math]\displaystyle{ \mu(x_1,\ldots,x_{n+1}) = 2^{-n} 1_{x_{n+1} = f(x_1,\ldots,x_n)}. }[/math] (2)

As f is fairly complicated (indeed, it is not obviously distinguishable from a random function), it does not seem that [math]\displaystyle{ \mu }[/math] itself is polylog parameterisable. However, [math]\displaystyle{ \mu }[/math] can be lifted to a probability measure [math]\displaystyle{ \mu' }[/math] on the larger space [math]\displaystyle{ \{0,1\}^N }[/math] which _is_ polylog parameterisable. Namely, let [math]\displaystyle{ S' \subset \{0,1\}^N }[/math] be the set of all tuples [math]\displaystyle{ (x_1,\ldots,x_N) \in \{0,1\}^N }[/math] for which (1) holds for all [math]\displaystyle{ i=n+1,\ldots,N }[/math]. Clearly, the map [math]\displaystyle{ (x_1,\ldots,x_N) \mapsto (x_1,\ldots,x_n,x_N) }[/math] from [math]\displaystyle{ \{0,1\}^N }[/math] to [math]\displaystyle{ \{0,1\}^{n+1} }[/math] maps S' to S in a bijective manner, and so the uniform measure [math]\displaystyle{ \mu' }[/math] on S' projects to the uniform measure [math]\displaystyle{ \mu }[/math] on S.

On the other hand, [math]\displaystyle{ \mu' }[/math] is polylog-parameterisable. Indeed, we have the explicit factorisation

[math]\displaystyle{ \mu'(x_1,\ldots,x_N) := 2^{-n} \prod_{i=n+1}^N 1_{x_i = f_i(x_{a_i},x_{b_i})} }[/math].
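
Here is a minimal Python sketch of this lifting for an invented three-gate program (the gate list and sizes are illustrative only, with N kept tiny rather than [math]\displaystyle{ n^{100} }[/math]): the set S' of runs consistent with every gate equation (1) projects bijectively onto the graph S.

<pre>
import itertools

# A toy straight line program: n = 3 inputs x_1, x_2, x_3 (indices 0..2),
# gates computing x_4, ..., x_N via x_i = f_i(x_{a_i}, x_{b_i}).
n = 3
gates = [                          # (a_i, b_i, gate), zero-based indices
    (0, 1, lambda a, b: a ^ b),    # x_4 = x_1 XOR x_2
    (2, 3, lambda a, b: a & b),    # x_5 = x_3 AND x_4
    (3, 4, lambda a, b: a | b),    # x_6 = x_4 OR x_5 = f(x_1, x_2, x_3)
]
N = n + len(gates)

def run(x):
    """Extend an input tuple to the full run (x_1, ..., x_N) using (1)."""
    x = list(x)
    for a, b, g in gates:
        x.append(g(x[a], x[b]))
    return tuple(x)

# S' = all tuples in {0,1}^N consistent with every gate equation (1);
# equivalently, the runs of the program on all 2^n inputs.
S_prime = {run(x) for x in itertools.product((0, 1), repeat=n)}

# Projecting (x_1,...,x_N) -> (x_1,...,x_n,x_N) maps S' bijectively to S,
# the graph of f, so uniform measure on S' pushes forward to uniform on S.
S = {x[:n] + (x[-1],) for x in S_prime}
assert len(S) == len(S_prime) == 2 ** n
</pre>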


Next, consider the problem of parameterising a certain distribution [math]\displaystyle{ \nu }[/math] to be defined shortly on the solution space

[math]\displaystyle{ T := \{ (x_1,\ldots,x_n): F(x_1,\ldots,x_n) = 1 \} \subset \{0,1\}^n }[/math]

of some function [math]\displaystyle{ F: \{0,1\}^n \to \{0,1\} }[/math], which we assume to be non-empty. We assume that the partial satisfiability problem is solvable here, thus for any initial partial assignment [math]\displaystyle{ x_1,\ldots,x_i }[/math] of literals, one can quickly solve the problem of whether this partial assignment can be completed to a full solution in T. Indeed, let us suppose that there is a polynomial length straight line program [math]\displaystyle{ f_i: \{0,1\}^i \to \{0,1\} }[/math] such that [math]\displaystyle{ f_i(x_1,\ldots,x_i)=1 }[/math] if and only if there exists [math]\displaystyle{ x_{i+1},\ldots,x_n }[/math] for which [math]\displaystyle{ F(x_1,\ldots,x_n)=1 }[/math].

We now define [math]\displaystyle{ \nu }[/math] to be the probability distribution on T defined by selecting random variables [math]\displaystyle{ x_1,\ldots,x_n }[/math] consecutively in the following manner. Assume that [math]\displaystyle{ x_1,\ldots,x_{i-1} }[/math] have already been selected and are extendible (i.e. [math]\displaystyle{ f_{i-1}(x_1,\ldots,x_{i-1})=1 }[/math]). We then compute [math]\displaystyle{ f_i(x_1,\ldots,x_{i-1},0) }[/math] and [math]\displaystyle{ f_i(x_1,\ldots,x_{i-1},1) }[/math]; at least one of these must equal 1. If only the former equals 1, we set [math]\displaystyle{ x_i:=0 }[/math]; if only the latter equals 1, we set [math]\displaystyle{ x_i:=1 }[/math]; and if both equal 1, we set [math]\displaystyle{ x_i }[/math] equal to either 0 or 1 with a 1/2 probability of each.
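
The following Python sketch implements this sampling procedure for a toy predicate F. Here the extendibility tests [math]\displaystyle{ f_i }[/math] are realized by brute force rather than by the polynomial-length straight line programs assumed above; the predicate itself is invented for illustration.

<pre>
import itertools
import random

n = 4

def F(x):
    """Toy predicate with a nonempty solution space T."""
    return sum(x) % 2 == 0 and any(x)

def f(prefix):
    """Extendibility oracle: can prefix be completed to a solution of F?"""
    k = len(prefix)
    return any(F(prefix + rest)
               for rest in itertools.product((0, 1), repeat=n - k))

def sample_nu():
    """Draw from nu by choosing x_1, ..., x_n left to right."""
    x = ()
    for _ in range(n):
        can0, can1 = f(x + (0,)), f(x + (1,))
        if can0 and can1:
            x += (random.randint(0, 1),)   # both extendible: fair coin
        elif can0:
            x += (0,)                      # forced choice
        else:
            x += (1,)
    return x

print(sample_nu())  # always lands in T = F^{-1}(1)
</pre>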

One can then factor [math]\displaystyle{ \nu }[/math] as

[math]\displaystyle{ \nu(x_1,\ldots,x_n) = \frac{1}{|T|} \prod_{i=1}^n p_i(x_i; f_i( x_1,\ldots,x_{i-1}, 0), f_i( x_1,\ldots,x_{i-1}, 1)) }[/math]

where [math]\displaystyle{ p_i(\cdot; a_0, a_1) }[/math] is the delta distribution [math]\displaystyle{ \delta_0 }[/math] if [math]\displaystyle{ a_0=1, a_1 = 0 }[/math], the delta distribution [math]\displaystyle{ \delta_1 }[/math] if [math]\displaystyle{ a_0=0,a_1=1 }[/math], the uniform distribution [math]\displaystyle{ \frac{1}{2}\delta_0+\frac{1}{2}\delta_1 }[/math] if [math]\displaystyle{ a_0=a_1=1 }[/math], and some arbitrary distribution (e.g. [math]\displaystyle{ \delta_0 }[/math] for concreteness) in the (unattainable) case [math]\displaystyle{ a_0=a_1=0 }[/math].

This is not yet of a polylog parameterizable form (each [math]\displaystyle{ x_i }[/math] has i-1 ancestors rather than polylog-many ancestors).

However, if one expands all the straight line programs [math]\displaystyle{ f_i }[/math] and adds all the literals required for computing such programs, then we can lift [math]\displaystyle{ \nu }[/math] to a polylog parameterizable measure [math]\displaystyle{ \nu' }[/math] similarly to before. Thus [math]\displaystyle{ \nu }[/math] itself is not obviously a polylog parameterizable measure, merely the projection of one that involves additional "hidden variables".

Another remark: since NP is in EXP, I think that the solution space of a k-SAT problem, if it is non-empty, supports the projection of a degree O(1) recursively factorizable measure on exponentially many variables. This already illustrates that the projection operation is powerful enough to convert even the simplest measure into an extremely general one, at least if one is allowed to discard an exponential number of variables.

Counterexamples

The following measures are not polylog parameterisable, though they are projections of polylog parameterisable measures in a higher dimensional space:

  1. uniform (or any positive) measure on [math]\displaystyle{ EVEN:=\{x \in \{0,1\}^n: x_1+\ldots+x_n = 0 \pmod 2\} }[/math]
  2. uniform (or any positive) measure on [math]\displaystyle{ \{0,1\}^n \backslash \{(0,\ldots,0)\} }[/math].

To see the failure of polylog parameterisability, we use the following:

Claim: Let [math]\displaystyle{ S \subset \{0,1\}^n }[/math], let [math]\displaystyle{ \chi_S }[/math] be the characteristic function of S, and let [math]\displaystyle{ \mu }[/math] be a probability measure whose support is exactly S (i.e. [math]\displaystyle{ \mu(x) \gt 0 }[/math] if and only if x is in S). Assume that [math]\displaystyle{ \mu }[/math] is polylog parameterisable. Then there is some permutation (reordering) of [n] such that the Boolean function [math]\displaystyle{ \chi_S(x_1,\ldots,x_n) }[/math] can be expressed/computed as follows:
[math]\displaystyle{ \chi_S(x_1,\ldots,x_n) = \prod_{i=1}^n g_i(x_i; x_{pa(i)}) }[/math], where each [math]\displaystyle{ g_i }[/math] is a Boolean function of [math]\displaystyle{ x_i }[/math] and a set pa(i) of polylog-many indices j < i.

We can see the claim by focusing on the measure being zero or nonzero in the definition of polylog parameterizability. Now consider #1. With respect to any ordering (permutation) of [n], membership in EVEN cannot be expressed by such a product as in the claim, because the last factor [math]\displaystyle{ g_n(x_n; x_{pa(n)}) }[/math] only depends on a *fixed* set of polylog-many [math]\displaystyle{ x_j }[/math]'s. For the same reason, #2 is not polylog parametrizable. (To express #1 or #2 in such a form, the last factor [math]\displaystyle{ g_n(x_n; x_{pa(n)}) }[/math] would have to depend on all n variables.)
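
The key step of this argument can be checked mechanically. In the sketch below (sizes illustrative), for any fixed proper subset pa(n) of the earlier indices one exhibits two prefixes that agree on pa(n) but have opposite parities; since every prefix is extendible in EVEN, the correct value of [math]\displaystyle{ x_n }[/math] differs between them, so no last factor seeing only pa(n) can decide membership.

<pre>
# For EVEN: whatever fixed set pa(n) of fewer than n-1 earlier indices
# the last factor g_n(x_n; x_{pa(n)}) looks at, two prefixes can agree
# on pa(n) yet have opposite parities.
n = 8
pa = (0, 2, 5)   # any proper subset of {0, ..., n-2}, chosen arbitrarily

free = [i for i in range(n - 1) if i not in pa]
x = [0] * (n - 1)
y = list(x)
y[free[0]] ^= 1   # flip one coordinate outside pa(n)

assert all(x[i] == y[i] for i in pa)   # g_n sees identical inputs...
assert sum(x) % 2 != sum(y) % 2        # ...but the required x_n differs
</pre>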

On the other hand, EVEN can be polylog parametrized if we add a number of auxiliary variables serving as placeholders for partial sums, so that we never add more than polylog-many elements at a time (indeed a fixed number suffices), and similarly for #2.
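
A minimal Python sketch of this lifting (sizes illustrative): adding partial-parity variables [math]\displaystyle{ s_i = x_1+\ldots+x_i \pmod 2 }[/math] makes every constraint local, with at most two parents per variable, and projecting out the [math]\displaystyle{ s_i }[/math] recovers exactly EVEN.

<pre>
import itertools

n = 6

def lifted_indicator(x, s):
    """Indicator of the lifted set: every constraint involves <= 2 parents."""
    ok = (s[0] == x[0])
    for i in range(1, n):
        ok &= (s[i] == s[i - 1] ^ x[i])   # local recurrence: s_i = s_{i-1} XOR x_i
    return ok and s[-1] == 0              # final parity constraint

# For each x in EVEN there is exactly one valid s, so the lifted set
# bijects with EVEN and the uniform measure projects to uniform on EVEN.
projected = {x for x in itertools.product((0, 1), repeat=n)
             for s in itertools.product((0, 1), repeat=n)
             if lifted_indicator(x, s)}
EVEN = {x for x in itertools.product((0, 1), repeat=n) if sum(x) % 2 == 0}
assert projected == EVEN
</pre>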

pp vs ppp

For the purposes of discussion, the following abbreviations have been adopted:

  • A probability distribution on [math]\displaystyle{ \{0,1\}^n }[/math] is pp (polylog parameterizable) if it is polylog parameterizable in the above sense, WITHOUT the addition of any variables beyond the existing n literals [math]\displaystyle{ x_1,\ldots,x_n }[/math].
  • A probability distribution on [math]\displaystyle{ \{0,1\}^n }[/math] is ppp (projected polylog parameterizable) if it is the projection of a pp distribution on [math]\displaystyle{ \{0,1\}^N }[/math] for some polynomial size [math]\displaystyle{ N=n^{O(1)} }[/math] under a map that keeps n of the N literals and discards the rest.

Thus, for instance, the above discussion shows that the graph of a straight line program of polynomial length supports a ppp distribution, and if P=NP then the solution space of every NP problem has a ppp distribution supported on it. The sets EVEN and [math]\displaystyle{ \{0,1\}^n \backslash \{(0,\ldots,0)\} }[/math] can also be easily seen to support ppp distributions (e.g. the uniform distribution on either is ppp), but neither of these sets supports pp distributions.

As another example, consider the space of collections of k-clauses on n variables for which the corresponding SAT problem is satisfiable, where k is fixed (e.g. k=10) and n is large. (This is a subset of [math]\displaystyle{ \{0,1\}^C }[/math], where C is the set of k-clauses on n variables.) [Note that this space is NOT the same as the solution space of a single k-SAT instance; the former is the "base" of the k-SAT relation, while the latter is a "fiber".] On the one hand, if P!=NP, we know that this space is not in P. On the other hand, it supports a ppp distribution, because one can randomly select a solution [math]\displaystyle{ (x_1,\ldots,x_n) }[/math] first, and then select a random family of the clauses that are obeyed by this solution. Thus we see that ppp supports can actually be quite "hard" in nature.
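
Here is a sketch of this two-step sampling in Python, with toy sizes (k=2 standing in for the fixed k of the text, and each satisfied clause kept with probability 1/2, an arbitrary illustrative choice): every instance produced is satisfiable, since the drawn assignment witnesses it.

<pre>
import itertools
import random

n, k = 4, 2

def all_clauses():
    """All k-clauses on n variables: k distinct variables, each possibly negated."""
    for vs in itertools.combinations(range(n), k):
        for signs in itertools.product((False, True), repeat=k):
            yield tuple(zip(vs, signs))

def satisfies(assignment, clause):
    """A clause holds if some literal (variable v, negated flag) is true."""
    return any(assignment[v] != neg for v, neg in clause)

# Step 1: draw a random assignment; Step 2: draw a random family of
# the clauses it satisfies.  The result is always a satisfiable instance.
assignment = tuple(random.randint(0, 1) for _ in range(n))
instance = [c for c in all_clauses()
            if satisfies(assignment, c) and random.random() < 0.5]
print(instance)
</pre>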

As a further example of a set supporting a ppp distribution, consider the image of {0,1}^n via a hash function or some other computable "one-way function".

The pp vs ppp distinction appears to be a crucial issue in the Deolalikar argument. As far as can be discerned from the manuscript, the paper establishes the following facts:

  • If P=NP, then the solution space of any k-SAT problem with at least one solution supports a ppp distribution.
  • The hard phase of k-SAT does not support any pp distribution.

However this does not allow us to conclude that P!=NP, because of the substantial gap between pp and ppp.