Relaxed Random Search for Solving K-Satisfiability and its Information Theoretic Interpretation

The problem of finding satisfying assignments for a conjunctive normal form formula with K literals in each clause, known as K-SAT, has attracted much attention over the past three decades. Because K-SAT is NP-complete, an efficient solution (one found within polynomial time) would be of great interest due to its relation to the best-known open problem in computer science, the P versus NP question. Different strategies have been developed to solve this problem, but none of them escapes exponential worst-case complexity. In this paper, following the recent approach of applying statistical physics methods to analyze the phase transition in the complexity of algorithms for K-SAT, we compute the complexity of a randomized algorithm for finding a solution of K-SAT in more relaxed regions. We show how the probability governing the literal-flipping process can change the complexity of the algorithm substantially, and we give an information-theoretic interpretation of this reduction in time complexity.

Keywords: constraint satisfaction problem; K-SAT; threshold phenomena; randomized algorithm; entropy; NP-completeness


I. INTRODUCTION
In computer science, there is an important family of problems known as constraint satisfaction problems. In this family we look for values of the variables that satisfy a set of constraints simultaneously. Although many of these constraints deal with non-Boolean variables, they can be reduced to the well-known problem of satisfying a canonical form of logical formula called Conjunctive Normal Form (CNF). When each clause in the CNF has K literals, the problem is called the K-Satisfiability problem, or K-SAT. It covers a wide range of theoretical and applied problems, among them timetable scheduling [1], planning in artificial intelligence [2], validating software models [3], routing field-programmable gate arrays [4] and synthesizing consistent network configurations [5].
Furthermore, the important problem of designing and verifying digital circuits can easily be reformulated as the satisfaction of a K-SAT formula [6]-[8].
The theoretical reason behind this wide range of applications was discovered by Stephen Cook [9], who proved that the K-SAT problem is NP-complete (for $K \ge 3$). This means that every problem in NP can be reduced to an instance of K-SAT by an efficient procedure with polynomial time complexity [10]. It is therefore not hard to imagine how advantageous an effective strategy for solving K-SAT would be, from both theoretical and practical perspectives. This paper is organized into four sections. After this introduction to K-SAT, Section II reviews the different strategies that have been designed to solve it. Section III focuses on the randomized algorithm and attempts to improve its time complexity by relaxing the conditions imposed on the random walk in the solution space, inspired by recent studies of the typical time complexity of the K-SAT problem [11]. Finally, Section IV discusses concluding remarks and future work.

II. RELATED WORKS
Classically, constraint satisfaction problems are solved by systematic search algorithms. For the K-SAT problem, this approach is followed in the DPLL algorithm [12], [13].
In this algorithm, after a value is chosen for an unassigned variable, the formula is simplified by propagating the chosen value through it. Since the constraints are represented in conjunctive normal form, two main equivalence rules of propositional logic concerning disjunctions, given as (1) and (2), are used to deduce the consequences of any variable assignment.

   
For any K-SAT formula on $N$ Boolean variables, $2^N$ assignments are possible. This exponentially large state space can be pruned by considering the structural properties of the CNF formula. Sometimes it is better to start assigning values with the highly constrained variables, and sometimes it is better to start with the more relaxed ones.
When the DPLL algorithm is applied, the assignment process starts with the unit clauses (clauses consisting of a single literal). Unit clauses provide a suitable way to reduce the size of the state space, because each one imposes a strict restriction on the value of its literal.
The best scenario occurs when deducing the consequences of one unit-clause assignment produces another unit clause. Consider a formula whose first clause is a unit clause consisting of a single negated variable. To satisfy that clause, its variable must be assigned 0. With this value, the second clause is transformed into a unit clause, so its variable must be assigned 1 (true). With that value in turn, the third clause is transformed into a unit clause, and the process continues by choosing the forced values for the remaining variables of the formula. This condition provides a clear guide for choosing the values of the variables without any doubt and reduces the time complexity of the problem.
Unfortunately, this consecutive formation of unit clauses happens rarely and cannot be used as a general technique. When it does occur, the consecutive emergence of unit clauses while deducing the consequences of a variable assignment reduces the branching factor of the search tree.
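The chain of forced assignments described above can be sketched in a few lines of Python. This is a minimal illustration, not the DPLL implementation of [12], [13]: the function name `unit_propagate`, the DIMACS-style integer encoding of literals, and the three-clause example formula $(\lnot x_1) \land (x_1 \lor x_2) \land (\lnot x_2 \lor x_3)$ are assumptions introduced here for illustration.

```python
from typing import Dict, List, Optional

Clause = List[int]  # assumed encoding: 3 means x3, -3 means NOT x3


def unit_propagate(clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Repeatedly assign the variable of a unit clause and simplify.

    Returns the forced assignments, or None if an empty (conflicting)
    clause appears during propagation.
    """
    assignment: Dict[int, bool] = {}
    clauses = [list(c) for c in clauses]
    while True:
        if any(len(c) == 0 for c in clauses):
            return None  # a clause with all literals false: conflict
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return assignment  # no unit clause left, propagation stops
        assignment[abs(unit)] = unit > 0
        simplified = []
        for c in clauses:
            if unit in c:
                continue  # clause contains a true literal: drop it
            # remove the false literal from the remaining clauses
            simplified.append([lit for lit in c if lit != -unit])
        clauses = simplified


# Hypothetical example: each forced value creates the next unit clause.
forced = unit_propagate([[-1], [1, 2], [-2, 3]])
print(forced)
```

In this example every assignment is forced, so no branching is needed; in general propagation stops early and the search must branch on a free variable.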
Studies have shown a considerable reduction in the time complexity of solving K-SAT whenever the formula can be represented in a way that imposes maximum restriction on variable assignment. This type of forced assignment, which leads to a pruned search tree, is sometimes called implication. The best example of it can be observed in Horn theory [14].
It must be mentioned that, for an unsatisfiable formula, reaching a conflict as soon as possible (in polynomial time) is regarded as the sign of an effective search strategy. A conflict is detected if at some point the formula contains a clause all of whose literals evaluate to zero. Such a clause is called a conflicting clause. A conflict arises as the result of earlier improper assignments.
Different strategies have been developed to escape from conflicts. Backtracking to earlier assignments and changing them in a controlled way is the common theme of all these strategies [11]. DPLL has undergone many significant improvements over the years based on these backtracking techniques. Conflict-Driven Learning and Non-Chronological Backtracking are among the best improvements, and they have enhanced the power of the DPLL algorithm considerably [15], [16]. These improvements are based on the simple strategy of learning as much as possible from any conflict and its source in order to avoid it in subsequent assignments.
Modern algorithms for solving the SAT problem benefit from an improved form of the unit-clause rule, called two-literal watching, and from improved branching and variable-assignment techniques that favor the variables appearing in recently conflicting clauses [17]. Sometimes it is justifiable to apply a random restart technique, owing to the complications associated with Conflict-Driven Learning techniques and the correlations among different clauses [18].
There is an interpretation of K-SAT that places it in the category of discrete optimization problems. From this perspective, we seek to maximize the number of satisfied clauses. This maximum equals the total number of clauses exactly when the formula is satisfied. It is therefore possible to use discrete optimization techniques such as Simulated Annealing [19], Tabu Search [20], Neural Networks [21] and Genetic Algorithms [22] to solve it.
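The objective function of this optimization view is simply the number of satisfied clauses. A minimal sketch follows; the name `satisfied_count` and the DIMACS-style literal encoding are assumptions introduced here, not notation from the cited works.

```python
from typing import Dict, List

Clause = List[int]  # assumed encoding: 3 means x3, -3 means NOT x3


def satisfied_count(clauses: List[Clause], assignment: Dict[int, bool]) -> int:
    """Objective of the optimization view: number of clauses with a true literal."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# Hypothetical 3-clause formula: (x1 or x2) and (not x1 or x2) and (not x2)
formula = [[1, 2], [-1, 2], [-2]]
print(satisfied_count(formula, {1: True, 2: True}))
```

The formula is satisfied exactly when this count reaches `len(clauses)`; the small example above is unsatisfiable, so its maximum stays below 3.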
Viewing K-SAT as an optimization problem reminds us of the general difficulty of finding the maximum of an objective function. The intractable nature of the problem shows itself in the difficulty of bypassing the exponential number of local maxima or local minima of the objective function [23]. Treating K-SAT as an optimization problem, one can use stochastic local search algorithms to bypass the pathological difficulties of finding the global maximum of its objective function.
Stochastic local search algorithms were first used by Minton et al. [24] for constraint satisfaction problems and by Hansen and Jaumard [25] for the MAX-SAT problem. For K-SAT in particular, stochastic local search was used by Gu [26] and Selman et al. [27]. Selman et al. introduced the GSAT algorithm, which was more effective than the DPLL variants in use at the time, and their approach sparked considerable interest in the artificial intelligence community.
Despite all efforts, no polynomial-time algorithm for solving K-SAT is known to date. On the other hand, there is a belief, supported by many practical experiments, that exponential time complexity occurs only for a limited subspace of K-SAT instances [28]. Since the seminal work of Cheeseman et al. [29], we know that the hard instances of K-SAT reside at the threshold between the satisfiable and unsatisfiable phases, which occurs at a specific value of the ratio $\alpha = M/N$ known as $\alpha_s$ (here $M$ is the number of clauses or constraints and $N$ is the number of variables in the K-SAT formula). Theoretical investigations into the source and nature of this phase transition have revolutionized our understanding of the problem and of its state-space geometry. Recently the threshold conjecture has been proved analytically under some specific conditions [30].
The effectiveness of statistical physics methods for analyzing the source of the phase transition in K-SAT has given us a very detailed picture of the solution space, in which several other transition thresholds have been recognized. In addition to $\alpha_s$ (the satisfiable-to-unsatisfiable threshold), an algorithmic threshold $\alpha_a$ is defined such that all known polynomial-time algorithms fail to find a solution for $\alpha > \alpha_a$. Generally it is known that $\alpha_a < \alpha_s$ [31].
The satisfiable phase is therefore partitioned into different regions in light of this detailed picture of the solution space. Generally, as $\alpha$ increases, the clusters of solutions in the solution space shrink and the connectivity among them is lost [32]. Consider two distinct solutions in the set of all satisfying assignments of a specific K-SAT instance. A step [33] is defined as the number of variables that must be inverted in one solution to produce another assignment that is also in the solution space. A path between the two solutions is defined as a sequence of steps starting at the first and ending at the second.
Intuitively, we expect that increasing $\alpha$ decreases the number of solutions until the unsatisfiable region is reached, but this phase transition is accompanied by other micro-transitions, especially in the solution space of the problem. For example, below a threshold $\alpha_d$ we observe connectivity in the solution space. This connectivity shows itself as the existence of a path between any two arbitrarily chosen solutions. The connectivity is lost at the specific value $\alpha_d$, at which the solutions are partitioned into different clusters. At this value, the condition can be described technically by the following equation [33].
www.ijacsa.thesai.org
A further increase of $\alpha$ changes the size of the clusters and makes their number exponentially large (in the problem size $N$), and two solutions belonging to different clusters have a large Hamming distance that scales with the problem size [34].
Several other phase transitions can be defined for the topological transformation of the solutions in the state space [33], [35]. For example, one can also identify a value $\alpha_c$ at which a condensation takes place, such that for any $\alpha > \alpha_c$ the majority of solutions belong to a sub-exponential number of clusters. Generally we have $\alpha_d < \alpha_c < \alpha_s$ [36], [37].
In this paper, building on this picture of the solution space, we attempt to improve the randomized algorithm presented by Schöning [38] in 1999 and to reduce its time complexity.

III. RANDOMIZED ALGORITHM AND ITS ANALYSIS
When the structure of a K-CNF formula provides no insight for pruning the solution space, nothing does better than random search [23]. Theoretically, this strategy extracts the maximum amount of information from the state space of the problem. The first successful randomized local search was introduced by Schöning [38]. His method is based on a random walk in the space of possible assignments, starting from a random truth assignment and flipping suitable literals until the formula is satisfied.
Taking into account the different thresholds mentioned in the previous section, we expect to find a solution in polynomial time when $\alpha < \alpha_d$. The situation is harder for $\alpha$ near $\alpha_d$ and beyond it because of clusterization. Because of the interaction between different clauses, no one has so far given an analytic model that allows us to count the number of unsatisfied clauses during the random walk in the solution space. The performance of Schöning's algorithm is therefore analyzed by focusing on the Hamming distance between the current assignment $a$ and one particular satisfying assignment $a^*$. Let us look at Schöning's algorithm [38].
The algorithm starts at a uniformly random truth assignment $a$. If $a$ satisfies the formula, it is returned. Otherwise the algorithm repeatedly chooses a clause $C$ from the unsatisfied clauses, chooses a variable uniformly from the literals of $C$, and flips it, until the formula is satisfied or the algorithm runs out of time.
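The procedure just described can be sketched as follows. This is a hedged illustration rather than the exact formulation of [38]: the function names, the DIMACS-style literal encoding, the walk length of `3 * n_vars` flips, and the restart wrapper `solve` with its `tries` and `seed` parameters are assumptions made for this example.

```python
import random
from typing import Dict, List, Optional

Clause = List[int]  # assumed encoding: 3 means x3, -3 means NOT x3


def unsatisfied(clauses: List[Clause], a: Dict[int, bool]) -> List[Clause]:
    """Return the clauses that have no true literal under assignment a."""
    return [c for c in clauses if not any(a[abs(lit)] == (lit > 0) for lit in c)]


def random_walk(clauses: List[Clause], n_vars: int, rng: random.Random,
                max_flips: Optional[int] = None) -> Optional[Dict[int, bool]]:
    """One walk: random start, then flip a random variable of an unsatisfied clause."""
    if max_flips is None:
        max_flips = 3 * n_vars  # assumed time budget for a single walk
    a = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for _ in range(max_flips):
        unsat = unsatisfied(clauses, a)
        if not unsat:
            return a
        clause = rng.choice(unsat)         # pick an unsatisfied clause
        var = abs(rng.choice(clause))      # pick one of its variables uniformly
        a[var] = not a[var]                # flip it
    return a if not unsatisfied(clauses, a) else None


def solve(clauses: List[Clause], n_vars: int, tries: int = 1000,
          seed: int = 0) -> Optional[Dict[int, bool]]:
    """Restart the walk from fresh random assignments until one succeeds."""
    rng = random.Random(seed)
    for _ in range(tries):
        result = random_walk(clauses, n_vars, rng)
        if result is not None:
            return result
    return None


# Hypothetical satisfiable 3-SAT instance on three variables.
formula = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [-1, -2, -3]]
print(solve(formula, 3))
```

A return value of `None` only means that no walk succeeded within the budget; it is not a proof of unsatisfiability.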
Starting from a random initial assignment $a$, the algorithm tries to reduce the Hamming distance between the current assignment and the satisfying assignment $a^*$ by randomly flipping variables in unsatisfied clauses. Obviously the Hamming distance $d(a, a^*)$ is at most $N$ (the number of variables in the formula) and greater than or equal to zero:

$$0 \le d(a, a^*) \le N$$
Reaching Hamming distance zero means that the satisfying assignment has been found. Let $q(j)$ denote the probability of reaching distance zero when the current distance is $j$. We know that:

$$q(0) = 1, \qquad \lim_{j \to \infty} q(j) = 0$$

We are interested in the evolution of $q(j)$ during the execution of the randomized algorithm. Considering the algorithm, it is not hard to see that flipping a variable chosen uniformly from an unsatisfied clause is responsible for the evolution of $q(j)$. Let $C$ be the clause chosen from the unsatisfied clauses. Assuming that the formula is ultimately satisfiable at the threshold $\alpha_s$, one can expect that $a^*$ satisfies at least one literal of $C$, that is, $a$ and $a^*$ differ on at least one of its variables. Flipping one variable of $C$ therefore reduces the Hamming distance and moves toward $a^*$ (the satisfying assignment) with probability $1/K$. Consequently, the Hamming distance is increased by this flip with probability $(1 - 1/K)$. We now have enough information to write the governing equation of $q(j)$ in the vicinity of $\alpha_s$:

$$q(j) = \frac{1}{K}\, q(j-1) + \left(1 - \frac{1}{K}\right) q(j+1) \tag{7}$$

Equation (7) reflects the evolution of $q(j)$ when we are dealing with a highly constrained problem near $\alpha_s$. For a more relaxed type of problem, in which $\alpha$ is around $\alpha_d$, the probability that a flip moves toward $a^*$ equals $c/K$, where $c \ge 1$, because of the connectivity among the satisfying assignments in the solution space. Therefore, (7) can be generalized to cover more relaxed problems as (8):

$$q(j) = \frac{c}{K}\, q(j-1) + \left(1 - \frac{c}{K}\right) q(j+1) \tag{8}$$

The boundary conditions ($q(0) = 1$ and $q(j) \to 0$ as $j \to \infty$) are still valid for the relaxed type of problem. Solving (8) gives:

$$q(j) = \left(\frac{c}{K - c}\right)^{j} \tag{9}$$

Equation (9) shows that the success of finding the satisfying assignment is completely controlled by the Hamming distance between the randomly chosen initial assignment and the satisfying assignment, and by the probability of a suitable flip. To calculate $P$ (the probability that the algorithm finds the satisfying assignment), it is enough to divide the $2^N$ possible assignments into partitions with the same Hamming distance $j$ to the desired satisfying assignment and to compute the average of $q(j)$ over $j$.
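The closed form can be checked with a short worked derivation. The sketch below assumes that the governing recurrence has the gambler's-ruin form, with $p$ the probability that a flip reduces the Hamming distance and $q(j)$ the probability of reaching distance zero from distance $j$; the symbols $p$, $q$, $r$ and $c$ are notation chosen here for illustration.

```latex
\begin{align*}
q(j) &= p\,q(j-1) + (1-p)\,q(j+1), \qquad q(0) = 1,\quad q(j) \to 0 \text{ as } j \to \infty \\
\text{Try } q(j) = r^{j}:\quad & (1-p)\,r^{2} - r + p = 0
  \;\Longleftrightarrow\; (1-p)\,(r-1)\left(r - \frac{p}{1-p}\right) = 0 \\
\text{For } p < \tfrac{1}{2}:\quad & r = \frac{p}{1-p} < 1
  \;\Longrightarrow\; q(j) = \left(\frac{p}{1-p}\right)^{j} \\
\text{With } p = \frac{c}{K}:\quad & q(j) = \left(\frac{c}{K-c}\right)^{j},
  \qquad \text{and } c = 1 \text{ gives } q(j) = \left(\frac{1}{K-1}\right)^{j}
\end{align*}
```

The root $r = 1$ is excluded by the decay condition whenever $p < 1/2$. At $p = 1/2$ the two roots coincide and $q(j) = 1$ for every $j$, which is the case in which the walk has no drift away from the satisfying assignment.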
Therefore, to compute $P$ for the generalized form of Schöning's algorithm, in which the random walker reduces the Hamming distance to the desired satisfying assignment with probability $c/K$ and increases it with probability $1 - c/K$, we have the following equation:

$$P = \sum_{j=0}^{N} \binom{N}{j}\, 2^{-N} \left(\frac{c}{K - c}\right)^{j} = \left(\frac{K}{2(K - c)}\right)^{N}$$

By applying the amplification technique to remove the error inherent in a randomized algorithm [39], the time complexity of applying this algorithm to the K-SAT problem is $O\!\left(\mathrm{poly}(N)\left(2\left(1 - \frac{c}{K}\right)\right)^{N}\right)$. In computing this time complexity, it has been assumed that a single run of Schöning's algorithm needs polynomial time.
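The averaging over Hamming distances can be checked numerically. The sketch below assumes that the per-distance success probability has the geometric form $q(j) = (c/(K-c))^j$ and that the initial assignment is uniform, so that distance $j$ occurs with probability $\binom{N}{j} 2^{-N}$; the function names and the parameter `c` are assumptions introduced for this example.

```python
from math import comb, isclose


def success_probability(n: int, k: int, c: float = 1.0) -> float:
    """Average q(j) = (c / (k - c))**j over the binomial distribution of j.

    Assumes c <= k / 2, so that q(j) is a valid probability.
    """
    r = c / (k - c)
    return sum(comb(n, j) * r ** j for j in range(n + 1)) / 2 ** n


def closed_form(n: int, k: int, c: float = 1.0) -> float:
    """Binomial theorem: 2**-n * (1 + r)**n = (k / (2 * (k - c)))**n."""
    return (k / (2 * (k - c))) ** n


def restart_base(k: int, c: float = 1.0) -> float:
    """Base of the exponential number of restarts, 1/P = base**n."""
    return 2 * (1 - c / k)


# Hypothetical parameters: 3-SAT on 20 variables, highly constrained case c = 1.
print(success_probability(20, 3), closed_form(20, 3), restart_base(3))
```

With `c = 1` and `k = 3` the base is 4/3, and with `c = k / 2` the base becomes 1, so the expected number of restarts no longer grows with `n`.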
The boosting technique applied to reduce the error of the randomized algorithm is therefore the source of the exponential time complexity of the resulting algorithm. In the next section we argue how the parameter $c$, which enters the probability of reducing the Hamming distance, can change our usual expectation of the complexity of this method.

IV. CONCLUSION AND FUTURE WORK
In this paper we analyzed the consequences of applying Schöning's randomized search method [38] for values of $\alpha \le \alpha_s$. Usually the performance of an algorithm is analyzed in the worst case, in which the maximum time complexity is observed.
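For K-SAT, as shown by Cheeseman et al. [29], the worst instances occur at the onset of $\alpha_s$, where one can expect each clause to be satisfied by the proper value of just one of its variables. In this highly constrained region we have $c = 1$, and the time complexity of the algorithm is $O\!\left(\mathrm{poly}(N)\left(2\left(1 - \frac{1}{K}\right)\right)^{N}\right)$. By applying an information-theoretic method [40] to calculate the entropy of the random walk in this case (where the two step probabilities are $1/K$ and $1 - 1/K$), we arrive at (12):

$$H(K) = -\left[\frac{1}{K} \log \frac{1}{K} + \left(1 - \frac{1}{K}\right) \log\left(1 - \frac{1}{K}\right)\right] \tag{12}$$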
Fig. 1. The entropy of the random walk in the state space in the vicinity of $\alpha_s$.
Fig. 1 shows the entropy function of the K-SAT problem at the threshold $\alpha_s$, where satisfaction occurs in a highly constrained manner. Recall that for $\alpha > \alpha_s$ the K-SAT formula is unsatisfiable.
A closer look at Fig. 1 shows that the maximum value of the entropy function for K-SAT occurs at $K = 2$. We know that K-SAT is solvable in polynomial time for $K = 2$. This means that the exponentially large solution space is pruned maximally when the amount of information gained from the random walk in the solution space is maximal, at $K = 2$. Fig. 2 depicts the entropy function of the random walk in the solution space for the more relaxed situation, in which the two step probabilities are $c/K$ and $1 - c/K$ with $c > 1$. As can be observed, for larger values of $c$, which arise in more relaxed problems ($\alpha < \alpha_s$), the maximum of the function shifts toward larger values of $K$.
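The shift of the entropy maximum can be reproduced with a short calculation. This sketch assumes that the entropy in question is the binary entropy of the two step probabilities, $c/K$ and $1 - c/K$, measured in bits (the base of the logarithm is an assumption); the function name `walk_entropy` and the parameter `c` are introduced here for illustration.

```python
from math import log2


def walk_entropy(k: int, c: float = 1.0) -> float:
    """Binary entropy (in bits) of a step that succeeds with probability c / k."""
    p = c / k
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a deterministic step carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)


# Highly constrained case (c = 1): scan K = 2, 3, ..., 19.
best_k_constrained = max(range(2, 20), key=walk_entropy)

# Hypothetical relaxed case (c = 2): scan K = 3, 4, ..., 19.
best_k_relaxed = max(range(3, 20), key=lambda k: walk_entropy(k, 2.0))

print(best_k_constrained, best_k_relaxed)
```

Under these assumptions the maximum sits where $c/K = 1/2$, that is at $K = 2c$, which matches the shift toward larger $K$ for larger $c$.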
It has been known for several years that a random walk in the solution space is the best strategy when there is no guide for pruning a large state space [23]. The result of this paper supports this hypothesis. When the entropy of the random walk is maximized, the problem can be solved effectively in polynomial time because the exponential part of the time complexity vanishes. Indeed, $O\!\left(\mathrm{poly}(N)\left(2\left(1 - \frac{c}{K}\right)\right)^{N}\right)$ reduces to $O(\mathrm{poly}(N))$ at $K = 2c$, where the entropy of the random walk is maximal, and any deviation from this maximum entropy is accompanied by the emergence of exponential time complexity.
Although $c$ is known to be larger than 1 for $\alpha < \alpha_s$, finding a strict mathematical bound for it is an open question. This line of study will improve our understanding in the future.

Fig. 2. The entropy of the random walk for different values of $c$ (one curve per value, drawn as red, green, magenta and yellow lines).