Random k-SAT
Below are some notes by Dimitris Achlioptas.
I will try below to clarify some issues regarding "dynamic 1-step RSB", "frozen variables", "condensation", and the contrast with "random k-XOR-SAT". I made an effort to keep the discussion rather generic and, less successfully, short.
The setting:
- A set of n variables all with the same finite domain of size d, e.g., d=2 in k-SAT.
- A collection of constraints on the variables, each constraint binding k of the n variables, e.g., k=2 in graph coloring.
Typically, the collection consists of "all possible constraints of a certain kind on the n variables", e.g., all [math]\displaystyle{ 2^k \binom{n}{k} }[/math] clauses of length k in k-SAT, or all [math]\displaystyle{ \binom{n}{2} }[/math] edges, i.e., "not equal" constraints, in graph coloring.
- It will be easier to think of the formation of random instances as a "process", in which constraints are added one by one, by selecting one uniformly random constraint from the collection, among those that have not yet been selected (for technical reasons, it often helps to allow selection with replacement, but this turns out to be inconsequential.)
- Finally, it also helps to think of what the physicists call the "energy landscape", i.e., the function counting the number of violated constraints for each value assignment. So, in particular, solutions correspond to the value assignments (points) where the energy is zero.
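To make the setting above concrete, here is a minimal Python sketch (the names and representation are my own, purely for illustration) of the constraint collection, the one-by-one addition process, and the energy function, specialized to k-SAT:

```python
import itertools
import random

def all_k_clauses(n, k):
    """The collection of all 2^k * C(n, k) clauses of length k over variables 0..n-1.
    A clause is a tuple of literals (variable, sign); sign True means the positive literal."""
    for variables in itertools.combinations(range(n), k):
        for signs in itertools.product([True, False], repeat=k):
            yield tuple(zip(variables, signs))

def energy(assignment, clauses):
    """The 'energy' of a value assignment: the number of violated clauses.
    A clause is violated iff every one of its literals evaluates to False."""
    return sum(all(assignment[v] != s for v, s in clause) for clause in clauses)

def random_instance(n, k, t, seed=0):
    """The 'process' view: t constraints chosen uniformly at random, without replacement."""
    pool = list(all_k_clauses(n, k))
    random.Random(seed).shuffle(pool)
    return pool[:t]
```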
- For any given problem, let [math]\displaystyle{ S_n(t) }[/math] denote the set of solutions of a random instance after t constraints have been added. (Clearly, [math]\displaystyle{ S_n(t) }[/math] is a random variable, so everything below refers to the "typical" behavior of [math]\displaystyle{ S_n }[/math].)
- Say that two value assignments are adjacent if they have Hamming distance 1. Under this notion of adjacency, we will look at the "connected components" of [math]\displaystyle{ S_n }[/math], also called "clusters". (One can probably replace "Hamming distance 1" with "bounded Hamming distance" with no essential difference. Again, this does not seem to be a consequential issue.)
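For very small n one can compute [math]\displaystyle{ S_n }[/math] and its clusters by brute force: enumerate the assignments of energy zero and split them into connected components under Hamming-distance-1 adjacency. A sketch, reusing the `energy` function from the snippet above:

```python
import itertools
from collections import deque

def solution_set(n, clauses):
    """Brute-force S_n: all assignments of energy zero (feasible only for tiny n)."""
    return {a for a in itertools.product([False, True], repeat=n)
            if energy(a, clauses) == 0}

def clusters(sols, n):
    """Connected components of S_n, with assignments adjacent iff they differ in one variable."""
    remaining, components = set(sols), []
    while remaining:
        queue = deque([remaining.pop()])
        component = set()
        while queue:
            a = queue.popleft()
            component.add(a)
            for i in range(n):                        # all Hamming-distance-1 neighbors
                b = a[:i] + (not a[i],) + a[i + 1:]
                if b in remaining:
                    remaining.remove(b)
                    queue.append(b)
        components.append(component)
    return components
```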
With the above in mind, below is a "voiceover" for the video [math]\displaystyle{ S_n(0), S_n(1), S_n(2),\ldots }[/math].
- Clearly, at t=0, every value assignment is a solution, i.e., [math]\displaystyle{ S_n(0) }[/math] is everything, and there is only one cluster.
- Now, fix [math]\displaystyle{ \delta \gt 0 }[/math] and let [math]\displaystyle{ X=X(\delta) }[/math] be the random variable equal to the smallest number of clusters whose union contains all but a [math]\displaystyle{ \delta }[/math]-fraction of [math]\displaystyle{ S_n }[/math].
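Since clusters are disjoint, X(δ) can be read off greedily from the cluster sizes, largest first; for instance (continuing the toy sketch above):

```python
def X(components, delta):
    """Smallest number of clusters whose union contains all but a delta-fraction of S_n."""
    total = sum(len(c) for c in components)
    covered = count = 0
    for c in sorted(components, key=len, reverse=True):   # largest clusters first
        if covered >= (1 - delta) * total:
            break
        covered += len(c)
        count += 1
    return count
```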
d1RSB
- As it turns out, for any fixed [math]\displaystyle{ \delta \gt 0 }[/math], X undergoes a "phase transition" and goes from 1 to "exponentially many" during the addition of [math]\displaystyle{ o(n) }[/math] constraints around a critical, problem-specific constraints-to-variables ratio.
This is the "dynamic 1-step Replica Symmetry Breaking" transition.
- The d1RSB transition seems to be a very generic phenomenon for random CSPs and has been rigorously established for random k-SAT, random graph k-coloring, random NAE k-SAT, and a number of other problems, in each case for all values of [math]\displaystyle{ k \gt k_0 }[/math] (where [math]\displaystyle{ k_0 }[/math] is finite and problem dependent). (For the sake of simplicity, I am slightly overstating here, but not in any manner relevant to the matter at hand.)
- If one thinks in terms of the energy landscape, the d1RSB means that we go from a world with a single big energy-zero valley, to a world with exponentially many small energy-zero valleys (and no big energy-zero valley).
Of course, to have valleys you need mountains. In particular, another thing that has been rigorously established is that the valleys are "landlocked". Namely:
- Every pair of valleys has Hamming distance [math]\displaystyle{ \Omega(n) }[/math], and every path connecting solutions in distinct valleys must go through a point at which the energy function has value [math]\displaystyle{ \Omega(n) }[/math]. [By the Lipschitzness of the energy function, in fact, the path must go through [math]\displaystyle{ \Omega(n) }[/math] such points].
Freezing
- After the d1RSB occurs, the next relevant phenomenon is the emergence of "frozen variables". Concretely, a variable is frozen in a cluster if it takes the same value in all assignments of that cluster. Note that the same variable can be frozen to different values in different clusters and can be frozen in some clusters but not in others.
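In the toy representation used in the sketches above, detecting the frozen variables of a cluster is immediate:

```python
def frozen_variables(cluster, n):
    """Map each frozen variable of the cluster to its frozen value.
    A variable is frozen iff it takes the same value in every assignment of the cluster;
    the same variable may be frozen to a different value (or not frozen) in another cluster."""
    frozen = {}
    for i in range(n):
        values = {a[i] for a in cluster}
        if len(values) == 1:
            frozen[i] = values.pop()
    return frozen
```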
Regarding frozen variables we rigorously know the following:
- Shortly after the d1RSB transition occurs, a uniformly random solution, with high probability, will have [math]\displaystyle{ \Omega(n) }[/math] frozen variables.
- As k is increased (which in k-SAT means the length of the clauses, while in coloring it means the number of colors), the emergence of frozen variables occurs sooner and sooner after the d1RSB.
- (Perhaps most importantly) Significantly after the d1RSB transition, but while [math]\displaystyle{ S_n }[/math] still consists of "exponentially many clusters of roughly equal size", with high probability, EVERY SINGLE SOLUTION contains [math]\displaystyle{ \Omega(n) }[/math] frozen variables. (And indeed, the fraction of variables that are frozen tends to 1 with k.)
To put this a bit more starkly: For a number of random CSP problems, for a sizable fraction of the satisfiable regime of each problem, the space of solutions looks like an "error-correcting code with fuzz".
The "fuzz" is the little cloud that forms by jiggling the non-frozen variables. (While these variables are not completely independent, they only exhibit "local correlations", so they do have non-vanishing entropy.) Again, it's important to remember that the set of frozen variables differs from cluster to cluster and that, moreover, a variable can be frozen to different values in different clusters.
- After [math]\displaystyle{ S_n }[/math] has reached the above "error-correcting code with fuzz" state, it is clear that every single constraint added from now on will "kill", in expectation, a constant fraction of all clusters (since each cluster has a constant fraction of frozen variables and, therefore, a constant probability of being killed).
- Moreover, one point that has not been mentioned so far is that beyond the d1RSB, one can define what the physicists call a "complexity" function (this is actually a reasonable name, but it becomes problematic given the computational aspects of the context). What is this complexity function?
The complexity function C tells us, for every [math]\displaystyle{ \delta\gt 0 }[/math], that there are [math]\displaystyle{ O(2^{C(\delta) n}) }[/math] clusters of size [math]\displaystyle{ O(2^{\delta n}) }[/math].
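To spell out the counting behind "dominates": the clusters of internal entropy [math]\displaystyle{ \delta }[/math] together contain roughly [math]\displaystyle{ 2^{C(\delta) n} \cdot 2^{\delta n} = 2^{(C(\delta)+\delta) n} }[/math] solutions, so under the uniform measure on [math]\displaystyle{ S_n }[/math] the weight carried by each value of [math]\displaystyle{ \delta }[/math] is governed, on the exponential scale, by [math]\displaystyle{ \delta + C(\delta) }[/math].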
As a result, at each density there is a single value [math]\displaystyle{ \delta_0 }[/math] which "dominates" the uniform measure on solutions, namely the one maximizing [math]\displaystyle{ \delta + C(\delta) }[/math]. Now, we are ready to define condensation.
Condensation
Condensation occurs when the value of [math]\displaystyle{ \delta_0 }[/math] dominating the measure corresponds to selecting the clusters of maximum size. In other words, it is the state in which a handful of clusters (finitely many in expectation) capture all but a vanishing fraction of [math]\displaystyle{ S_n }[/math], while exponentially many, smaller clusters still exist.
- Of course, this handful of clusters defining condensation cannot "survive" for long. Indeed, within O(1) steps in expectation they will be killed and the "next biggest" clusters will dominate for another O(1) steps. This process, called the "condensed regime", does last [math]\displaystyle{ \Omega(n) }[/math] steps in total and, naturally, ends with the instance becoming unsatisfiable.
For reasons that are outside the scope of this discussion, the condensed regime is important for certain algorithms (in particular, the Survey Propagation algorithm), but in my opinion is not central to the matter at hand.
- Regarding Cris Moore's point about graph coloring, I think that the short answer is that we shouldn't be bothering at all with k=3 in coloring, where indeed there is an algorithm that works slightly above the d1RSB threshold. The reason, in my mind, is that there are probably no frozen variables at the density at which the algorithm works and, also, the actual number for the point at which the d1RSB transition occurs for k=3 is, in fact, non-rigorous. Whereas, for large k, everything is rigorous (and, indeed, no algorithm has been known to work above the d1RSB -- an open problem for more than 30 years).
Again, in my opinion, the salient (and rigorously established) point is: "for a number of random CSP problems, for a sizable fraction of the satisfiable regime, the energy landscape looks like that of an error-correcting code with fuzz".
- Finally, what about random k-XOR-SAT?
- First, a technical point. If one does indeed generate instances by selecting the variables in each constraint uniformly at random (without insisting that each variable has degree at least 2 as one does in LDPC codes), then there will be variables which appear in 0 or 1 constraints.
Clearly, such variables can be "repeatedly removed" leaving a "core" structure. It is clear that if one finds a solution to the core, then this solution can be trivially extended to the rest. OK, what about solutions to this core?
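(Before turning to the core itself, here is a minimal sketch of the "repeated removal", i.e., the standard peeling procedure, with each constraint represented simply by the set of variables it binds; the parity bits play no role in the peeling.)

```python
from collections import Counter

def peel_to_core(constraints):
    """Repeatedly discard any constraint containing a variable of degree 1
    (such a constraint can always be satisfied last, using that variable alone);
    the constraints that survive form the 'core'."""
    alive = [set(c) for c in constraints]
    while True:
        degree = Counter(v for c in alive for v in c)
        kept = [c for c in alive if all(degree[v] >= 2 for v in c)]
        if len(kept) == len(alive):      # fixed point reached: every variable has degree >= 2
            return kept
        alive = kept
```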
- Well, this core does indeed behave like -- you guessed it -- an error-correcting code, indeed one with absolutely no fuzz: each connected component has size 1. Intuitively, the reason is that if one is at a "codeword" and changes the value of a single variable V, since we are dealing with parity constraints, this will require flipping some other variable in each of the constraints that contain V, and since the underlying (hyper-/factor)graph is an expander, such "forced changes" will proliferate.
Note that such a proliferation does not necessarily occur in random k-SAT or graph coloring, as it's possible to change a single variable and still have every constraint be satisfied. This "redundancy" in the solutions is precisely what gives rise to the aforementioned "fuzz" for those problems.
- Finally, the heart of the matter, and what I believe is meant by Deolalikar when he speaks of "independent parameters".
The cores/codewords in random k-XOR-SAT are linearly related.
In contrast, in random k-SAT we have no reason to think that they should be and one can, indeed, think of them as (more or less) randomly placed points inside the set of all value assignments, thus requiring "many" parameters to be specified. Alternatively, one can think of the random CSP instances as taking the randomness used to define the factor graph of the instance and "lifting it" to the high-dimensional space of solutions, by creating "randomly placed" pockets of solutions (clusters).
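To make "linearly related" concrete: a k-XOR-SAT instance is a linear system over GF(2), so its entire solution set (and hence the set of cores/codewords) is an affine subspace, described by a small number of parameters, namely one free bit per non-pivot variable. A minimal sketch of this via Gaussian elimination over GF(2), with each constraint given as a (variables, parity) pair:

```python
def xor_solution_space(constraints, n):
    """Reduce the system {XOR of variables == parity} to reduced row echelon form over GF(2).
    Returns (pivots, free_vars), or None if the system is unsatisfiable.
    Every solution is obtained by choosing the free variables arbitrarily and reading off
    each pivot variable from its row, so the solutions form an affine subspace of
    dimension len(free_vars) -- in sharp contrast with the clusters of random k-SAT."""
    pivots = {}                                        # pivot variable -> (row bitmask, parity)
    for variables, parity in constraints:
        mask = sum(1 << v for v in variables)
        for v, (pmask, pparity) in pivots.items():     # reduce against existing pivot rows
            if (mask >> v) & 1:
                mask, parity = mask ^ pmask, parity ^ pparity
        if mask:
            v = mask.bit_length() - 1                  # highest remaining variable is the pivot
            for u, (pmask, pparity) in list(pivots.items()):
                if (pmask >> v) & 1:                   # keep earlier rows fully reduced as well
                    pivots[u] = (pmask ^ mask, pparity ^ parity)
            pivots[v] = (mask, parity)
        elif parity:
            return None                                # derived 0 = 1: unsatisfiable
    free_vars = [v for v in range(n) if v not in pivots]
    return pivots, free_vars
```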