Polylog parameterizability

In this post I will try to describe what I believe Deolalikar means by "polylog parameterizability", a key concept in his work.

Let [math]\displaystyle{ \mu }[/math] be a probability distribution on the cube [math]\displaystyle{ \{0,1\}^n }[/math]. Then to describe this distribution fully, one needs [math]\displaystyle{ 2^n-1 }[/math] real parameters: one real number [math]\displaystyle{ \mu(\{x\}) }[/math] for each of the [math]\displaystyle{ 2^n }[/math] elements of the cube, minus one for the constraint [math]\displaystyle{ \sum_{x \in \{0,1\}^n} \mu(\{x\}) = 1 }[/math].

Roughly speaking, a probability distribution is "polylog parameterizable" if it requires far fewer than [math]\displaystyle{ 2^n }[/math] real numbers to describe; more precisely, it needs only [math]\displaystyle{ \exp( O( polylog(n) ) ) }[/math] of them. More precisely still, we ask that the distribution function [math]\displaystyle{ \mu(x_1,\ldots,x_n): \{0,1\}^n \to [0,1] }[/math] admit a factorisation into potentials

[math]\displaystyle{ \mu(x_1,\ldots,x_n) = \prod_{j=1}^n p_j( x_j; x_{pa(j)} ) }[/math]

where for each j, pa(j) is a set of O(polylog(n)) "parents" of j (as given by a directed acyclic graph), and [math]\displaystyle{ p_j(\cdot; x_{pa(j)}) }[/math] is a probability distribution for each assignment of the parent literals [math]\displaystyle{ x_{pa(j)} }[/math]. The point is that each factor [math]\displaystyle{ p_j }[/math] takes only [math]\displaystyle{ \exp(O(polylog(n))) }[/math] real numbers to describe, and so the entire distribution also requires only [math]\displaystyle{ \exp(O(polylog(n))) }[/math] real numbers to describe.
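To make the parameter count concrete, here is a small Python sketch of such a factorisation (purely illustrative; the toy DAG and its conditional tables are mine, not taken from the paper). Each variable carries one conditional probability table indexed by the values of its parents, so storing the distribution costs [math]\displaystyle{ \sum_j 2^{|pa(j)|} }[/math] numbers rather than [math]\displaystyle{ 2^n-1 }[/math]:

<pre>
import itertools

# A toy DAG factorisation on n = 4 variables: pa[j] is the parent set of j,
# and cpt[j] maps each assignment of the parents to Pr[x_j = 1 | x_pa(j)].
pa  = {0: (), 1: (0,), 2: (0, 1), 3: (2,)}
cpt = {0: {(): 0.5},
       1: {(0,): 0.9, (1,): 0.2},
       2: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8},
       3: {(0,): 0.3, (1,): 0.7}}

def mu(x):
    """Evaluate mu(x) = prod_j p_j(x_j; x_pa(j))."""
    prob = 1.0
    for j in sorted(pa):
        q = cpt[j][tuple(x[i] for i in pa[j])]   # Pr[x_j = 1 | parents]
        prob *= q if x[j] == 1 else 1.0 - q
    return prob

# Because each p_j is a conditional distribution, the factors automatically
# multiply out to a probability distribution on the cube:
assert abs(sum(mu(x) for x in itertools.product((0, 1), repeat=4)) - 1.0) < 1e-9
</pre>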

The purpose of the FO(LFP) machinery is (I think) ultimately to obtain a result of the following form:

Claim. If a solution space in [math]\displaystyle{ \{0,1\}^n }[/math] can be "explored by a P algorithm" in a suitable sense, then the uniform distribution [math]\displaystyle{ \mu }[/math] on that space is the _projection_ of a polylog-parameterisable measure [math]\displaystyle{ \mu' }[/math] on a subset of a polynomially larger space [math]\displaystyle{ \{0,1\}^{n^{O(1)}} }[/math].

However, a key point that does not seem to be sufficiently addressed in the paper is whether the projection of a polylog-parameterisable measure remains polylog-parameterisable; I believe this could in fact be a serious problem with the proof.

Let me illustrate the above Claim with a simple example, that of a straight line program. Let [math]\displaystyle{ N := n^{100} }[/math], and consider a boolean function [math]\displaystyle{ f: \{0,1\}^n \to \{0,1\} }[/math] obtained by the formula

[math]\displaystyle{ f(x_1,\ldots,x_n) := x_N }[/math]

where [math]\displaystyle{ x_{n+1},\ldots,x_N }[/math] are obtained from previous literals by formulae of the form

[math]\displaystyle{ x_i = f_i( x_{a_i}, x_{b_i} ) }[/math] (1)

for [math]\displaystyle{ 1 \leq a_i, b_i < i }[/math] and some logic gates [math]\displaystyle{ f_i : \{0,1\}^2 \to \{0,1\} }[/math].
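Such programs are trivial to evaluate in code; the following Python sketch is purely illustrative (the gate-list format and the example gates are mine):

<pre>
from operator import and_, xor

def run_slp(x, gates):
    """Evaluate a straight line program on the input literals x = (x_1,...,x_n).

    gates holds one triple (f_i, a_i, b_i) per computed literal
    x_{n+1}, ..., x_N, with 0-based indices a_i, b_i pointing at earlier
    literals.  Returns the full list (x_1,...,x_N); f(x) is its last entry.
    """
    lits = list(x)
    for f_i, a_i, b_i in gates:
        lits.append(f_i(lits[a_i], lits[b_i]))   # x_i = f_i(x_{a_i}, x_{b_i})
    return lits

# Example with n = 3 and two gates:
lits = run_slp((1, 0, 1), [(xor, 0, 2), (and_, 1, 3)])
print(lits[-1])   # the output f(1, 0, 1)
</pre>

Now let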

[math]\displaystyle{ S := \{ (x_1,\ldots,x_n, f(x_1,\ldots,x_n)): (x_1,\ldots,x_n) \in \{0,1\}^n \} \subset \{0,1\}^{n+1} }[/math]

be the graph of f, and let [math]\displaystyle{ \mu }[/math] be the uniform distribution on S, thus [math]\displaystyle{ \mu }[/math] assigns a probability [math]\displaystyle{ 2^{-n} }[/math] to each element of S:

[math]\displaystyle{ \mu(x_1,\ldots,x_{n+1}) = 2^{-n} 1_{x_{n+1} = f(x_1,\ldots,x_n)}. }[/math] (2)

As f is fairly complicated (indeed, it is not obviously distinguishable from a random function), it does not seem that [math]\displaystyle{ \mu }[/math] itself is polylog parameterisable. However, [math]\displaystyle{ \mu }[/math] can be lifted to a probability measure [math]\displaystyle{ \mu' }[/math] on the larger space [math]\displaystyle{ \{0,1\}^N }[/math] which _is_ polylog parameterisable. Namely, let [math]\displaystyle{ S' \subset \{0,1\}^N }[/math] be the set of all tuples [math]\displaystyle{ (x_1,\ldots,x_N) \in \{0,1\}^N }[/math] for which (1) holds for all [math]\displaystyle{ i=n+1,\ldots,N }[/math]. Clearly, the map [math]\displaystyle{ (x_1,\ldots,x_N) \mapsto (x_1,\ldots,x_n,x_N) }[/math] from [math]\displaystyle{ \{0,1\}^N }[/math] to [math]\displaystyle{ \{0,1\}^{n+1} }[/math] maps S' to S in a bijective manner, and so the uniform measure [math]\displaystyle{ \mu' }[/math] on S' projects to the uniform measure [math]\displaystyle{ \mu }[/math] on S.

On the other hand, [math]\displaystyle{ \mu' }[/math] is polylog-parameterisable. Indeed, we have the explicit factorisation

[math]\displaystyle{ \mu'(x_1,\ldots,x_N) = 2^{-n} \prod_{i=n+1}^N 1_{x_i = f_i(x_{a_i},x_{b_i})} }[/math].
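In code terms, the lift simply remembers every intermediate wire of the program: a point of S' is determined by its first n coordinates, which is exactly why the projection is a bijection onto S. A minimal sketch, reusing the hypothetical gate-list format from above, with each factor reading only three literals:

<pre>
import itertools
from operator import and_, xor

n = 3
gates = [(xor, 0, 2), (and_, 1, 3)]   # the same toy program as before

def mu_prime(lits):
    """mu'(x_1,...,x_N) = 2^{-n} prod_{i=n+1}^{N} 1_{x_i = f_i(x_{a_i}, x_{b_i})}.

    Each indicator factor depends on just three literals, so each takes O(1)
    real numbers to describe.
    """
    for k, (f_i, a_i, b_i) in enumerate(gates):
        if lits[n + k] != f_i(lits[a_i], lits[b_i]):
            return 0.0
    return 2.0 ** (-n)

# mu' is the uniform probability measure on S' (one point of S' per input):
N = n + len(gates)
assert abs(sum(mu_prime(l) for l in itertools.product((0, 1), repeat=N)) - 1.0) < 1e-9
</pre>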


Next, consider the problem of parameterising a certain distribution [math]\displaystyle{ \nu }[/math] (to be defined shortly) on the solution space

[math]\displaystyle{ T := \{ (x_1,\ldots,x_n): F(x_1,\ldots,x_n) = 1 \} \subset \{0,1\}^n }[/math]

of some function [math]\displaystyle{ F: \{0,1\}^n \to \{0,1\} }[/math], which we assume to be non-empty. We assume that the partial satisfiability problem is solvable here, thus for any initial partial assignment [math]\displaystyle{ x_1,\ldots,x_i }[/math] of literals, one can quickly solve the problem of whether this partial assignment can be completed to a full solution in T. Indeed, let us suppose that there is a polynomial length straight line program [math]\displaystyle{ f_i: \{0,1\}^i \to \{0,1\} }[/math] such that [math]\displaystyle{ f_i(x_1,\ldots,x_i)=1 }[/math] if and only if there exists [math]\displaystyle{ x_{i+1},\ldots,x_n }[/math] for which [math]\displaystyle{ F(x_1,\ldots,x_n)=1 }[/math].

We now define [math]\displaystyle{ \nu }[/math] to be the probability distribution on T obtained by selecting the random variables [math]\displaystyle{ x_1,\ldots,x_n }[/math] consecutively in the following manner. Assume that [math]\displaystyle{ x_1,\ldots,x_{i-1} }[/math] have already been selected and are extendible (i.e. [math]\displaystyle{ f_{i-1}(x_1,\ldots,x_{i-1})=1 }[/math]). We then compute [math]\displaystyle{ f_i(x_1,\ldots,x_{i-1},0) }[/math] and [math]\displaystyle{ f_i(x_1,\ldots,x_{i-1},1) }[/math]; at least one of these must equal 1. If only the former equals 1, we set [math]\displaystyle{ x_i:=0 }[/math]; if only the latter equals 1, we set [math]\displaystyle{ x_i:=1 }[/math]; and if both equal 1, we set [math]\displaystyle{ x_i }[/math] equal to either 0 or 1 with a 1/2 probability of each.
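This sampling procedure translates directly into code. In the sketch below, extendible is a brute-force stand-in for the oracle [math]\displaystyle{ f_i }[/math] (in the setting above it would instead be a polynomial length straight line program; the brute force is purely for illustration):

<pre>
import itertools
import random

def sample_nu(F, n, rng=random):
    """Draw one sample from nu by fixing x_1, ..., x_n in turn.

    F is the solution predicate on {0,1}^n, assumed satisfiable.
    """

    def extendible(xs):   # stand-in for f_i(x_1,...,x_i)
        rest = n - len(xs)
        return any(F(xs + tail) for tail in itertools.product((0, 1), repeat=rest))

    xs = ()
    for _ in range(n):
        can0, can1 = extendible(xs + (0,)), extendible(xs + (1,))
        if can0 and can1:
            xs += (rng.choice((0, 1)),)      # both extendible: fair coin
        else:
            xs += ((1,) if can1 else (0,))   # forced choice
    return xs

# Example: F accepts strings with an even number of ones; the first three
# bits are free coin flips and the last bit is always forced.
print(sample_nu(lambda xs: sum(xs) % 2 == 0, 4))
</pre>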

One can then factor [math]\displaystyle{ \nu }[/math] as

[math]\displaystyle{ \nu(x_1,\ldots,x_n) = \prod_{i=1}^n p_i(x_i; f_i( x_1,\ldots,x_{i-1}, 0), f_i( x_1,\ldots,x_{i-1}, 1)) }[/math]

where [math]\displaystyle{ p_i(\cdot; a_0, a_1) }[/math] is the delta distribution [math]\displaystyle{ \delta_0 }[/math] if [math]\displaystyle{ a_0=1, a_1 = 0 }[/math], the delta distribution [math]\displaystyle{ \delta_1 }[/math] if [math]\displaystyle{ a_0=0,a_1=1 }[/math], the uniform distribution [math]\displaystyle{ \frac{1}{2}\delta_0+\frac{1}{2}\delta_1 }[/math] if [math]\displaystyle{ a_0=a_1=1 }[/math], and some arbitrary distribution (e.g. [math]\displaystyle{ \delta_0 }[/math] for concreteness) in the (unattainable) case [math]\displaystyle{ a_0=a_1=0 }[/math].
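These cases are easy to transcribe, and one can check numerically that the product of the factors [math]\displaystyle{ p_i }[/math] defines a probability distribution supported on T and agrees with the sampler above. Same brute-force stand-in for [math]\displaystyle{ f_i }[/math] as before:

<pre>
import itertools

def p(x_i, a0, a1):
    """p_i(x_i; a_0, a_1), following the four cases above."""
    if (a0, a1) == (1, 0): return 1.0 if x_i == 0 else 0.0   # forced to 0
    if (a0, a1) == (0, 1): return 1.0 if x_i == 1 else 0.0   # forced to 1
    if (a0, a1) == (1, 1): return 0.5                        # fair coin
    return 1.0 if x_i == 0 else 0.0    # unattainable case: arbitrary delta_0

def nu(x, F, n):
    """nu(x) = prod_i p_i(x_i; f_i(x_<i, 0), f_i(x_<i, 1))."""
    def extendible(xs):   # stand-in for f_i, as in the sampler sketch
        rest = n - len(xs)
        return any(F(xs + tail) for tail in itertools.product((0, 1), repeat=rest))
    prob = 1.0
    for i in range(n):
        prob *= p(x[i], extendible(x[:i] + (0,)), extendible(x[:i] + (1,)))
    return prob

# Each p_i is a conditional distribution, so nu sums to 1 over the whole cube
# and vanishes off T:
F = lambda xs: sum(xs) % 2 == 0
assert abs(sum(nu(x, F, 4) for x in itertools.product((0, 1), repeat=4)) - 1.0) < 1e-9
</pre>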

This is not yet of a polylog parameterisable form (each i has i-1 parents rather than O(polylog(n)) parents).

However, if one expands all the straight line programs [math]\displaystyle{ f_i }[/math] and adds all the literals required for computing such programs, then we can lift [math]\displaystyle{ \nu }[/math] to a polylog parameterisable measure [math]\displaystyle{ \nu' }[/math] similarly to before. Nevertheless, [math]\displaystyle{ \nu }[/math] itself is not obviously a polylog parameterisable measure, merely the projection of one that involves additional "hidden variables".