A Heuristic Approach for Minimum Set Cover Problem

The Minimum Set Cover Problem has many practical applications in various research areas. The problem is NP-hard. Several approximation algorithms have been proposed to find approximate solutions, and research on improving solution quality is ongoing. This paper studies existing algorithms for the minimum set cover problem and proposes a heuristic approach that solves the problem using a modified hill climbing algorithm. The approach is tested on set cover problem instances from OR-Library, and the experimental results show its effectiveness.

Keywords: Set Cover; Greedy Algorithm; LP Rounding Algorithm; Hill Climbing Method


I. INTRODUCTION
Given a universe of items and a collection of subsets of those items, the Minimum Set Cover Problem (MSCP) [1] asks for the minimum number of sets that cover the whole universe. The problem was shown to be NP-hard by Karp [2]. It has numerous applications in different areas of study and in industry [3]. The applications include multiple sequence alignment in computational biochemistry, manufacturing, network security, service planning and location problems [4]-[7].
Several heuristics and approximation algorithms have been proposed for solving the MSCP [8]. Guanghui Lan et al. proposed Meta-RaPS (Meta-heuristic for Randomized Priority Search) [9]. Fabrizio Grandoni et al. proposed an algorithm based on interleaving the standard greedy algorithm, which selects the min-cost set that covers at least one uncovered element [10]. Amol Deshpande et al. [11] proposed Adaptive Dual Greedy, a generalization of Hochbaum's [12] primal-dual algorithm for the classical Set Cover Problem. This paper studies some popular existing algorithms for MSCP and proposes a heuristic approach that solves MSCP using a modified hill climbing method. To the best of our knowledge, this approach to MSCP has not yet been reported. Although this work implements two popular algorithms, Greedy Minimum Set Cover [14] and the Linear Programming (LP) Rounding algorithm [15], to find solutions to MSCP, it does not focus on the strengths and weaknesses of those algorithms. The proposed approach starts with an initial solution from the Greedy approach or from LP rounding and then optimizes the result using a modified hill climbing technique. The computational results show the effectiveness of the proposed approach.
The rest of the paper is organized as follows: Section II describes the preliminary studies for the proposed approach. Section III describes the proposed algorithm for MSCP. Section IV presents the experimental results. Section V provides the conclusion and future work.

II. BACKGROUND THEORY AND STUDY
This section briefly describes MSCP and presents some preliminary studies. These include the Greedy Algorithm, the LP Rounding Algorithm, the Hill Climbing Algorithm and the OR Library of SCP instances.

A. Minimum Set Cover Problem
Let $U = \{e_1, e_2, \ldots, e_n\}$ be a set of $n$ elements and let $S = \{S_1, S_2, \ldots, S_m\}$ be a collection of $m$ nonempty subsets of $U$, where $\bigcup_{i=1}^{m} S_i = U$. Every $S_i$ is associated with a cost $c(S_i) \ge 0$. The objective is to find a subset $X \subseteq S$ such that $\sum_{S_i \in X} c(S_i)$ is minimized subject to $\bigcup_{S \in X} S = U$.
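The definition can be made concrete with a small check of feasibility and cost. The following is a minimal sketch; the instance, the set names `S1` to `S4`, and the helper names `is_cover` and `cover_cost` are assumptions chosen for illustration and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Set


def is_cover(universe: Set[Hashable],
             subsets: Dict[str, Set[Hashable]],
             chosen: Iterable[str]) -> bool:
    """Return True when the union of the chosen subsets equals the universe."""
    covered: Set[Hashable] = set()
    for name in chosen:
        covered |= subsets[name]
    return covered == universe


def cover_cost(cost: Dict[str, float], chosen: Iterable[str]) -> float:
    """Sum of c(S_i) over the chosen subsets."""
    return sum(cost[name] for name in chosen)


# Hypothetical toy instance with unit costs.
universe = {1, 2, 3, 4, 5}
subsets = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
cost = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}

print(is_cover(universe, subsets, ["S1", "S4"]))  # feasible: covers every element
print(is_cover(universe, subsets, ["S1", "S2"]))  # infeasible: element 5 is missing
print(cover_cost(cost, ["S1", "S4"]))
```

In this toy instance, X = {S1, S4} is feasible with cost 2, while {S1, S2} violates the covering constraint.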

B. Minimum k-Set Cover Problem
An MSCP $(U, S, c)$ is a k-set cover problem [13], represented as $(U, S, c, k)$, if, for some constant $k$, it holds that $|S_i| \le k$ for all $S_i \in S$. For an optimization problem, $x_{OPT}$ denotes an optimal solution of the problem, where $OPT = f(x_{OPT})$. For a feasible solution $x$, the ratio $f(x)/OPT$ is regarded as its approximation ratio. If the approximation ratio of a feasible solution is upper-bounded by some value $k$, that is, $1 \le f(x)/OPT \le k$, the solution is called a k-approximate solution.

return X ▷ Output X, minimum number of subsets
9: end procedure
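For reference, the textbook greedy heuristic for weighted set cover, which repeatedly picks the set with the lowest cost per newly covered element, can be sketched as follows. This is a minimal illustration and is not necessarily identical to the greedy procedure of [14]; the function name, variable names and toy instance are assumptions.

```python
from math import inf
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable],
                     subsets: Dict[str, Set[Hashable]],
                     cost: Dict[str, float]) -> List[str]:
    """Pick sets by lowest cost per newly covered element until all are covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name, best_ratio = None, inf
        for name, members in subsets.items():
            gain = len(members & uncovered)  # elements this set would newly cover
            if gain == 0:
                continue
            ratio = cost[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy instance with unit costs.
universe = {1, 2, 3, 4, 5}
subsets = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
cost = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}

greedy_cover = greedy_set_cover(universe, subsets, cost)
print(greedy_cover)
```

On this instance the first pick is S1 (three new elements), after which S4 is the only set that covers both remaining elements.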

D. LP Rounding Algorithm
The LP formulation [15] of MSCP can be represented as

Algorithm 2 LP Rounding MSCP
1: procedure LPROUND(U, S, c)
2: Get an optimal solution x* by solving the linear program for MSCP defined in Equation 1.
3: for each S_j do ▷ Continue for all members of S
5: end if
end for
9: return X ▷ The minimum number of sets
10: end procedure
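The rounding step can be illustrated with one common deterministic rule: select every set whose fractional value is at least 1/f, where f is the largest number of sets that contain any single element. Whether Algorithm 2 uses exactly this threshold is an assumption. The sketch below also assumes that the fractional optimum `x_star` has already been obtained from an LP solver, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def round_lp_solution(universe: Set[Hashable],
                      subsets: Dict[str, Set[Hashable]],
                      x_star: Dict[str, float],
                      eps: float = 1e-9) -> List[str]:
    """Threshold rounding: keep every set j with x*_j >= 1/f."""
    # f is the maximum number of subsets that contain any single element.
    frequency = max(
        (sum(1 for members in subsets.values() if e in members) for e in universe),
        default=0,
    )
    if frequency == 0:
        return []
    threshold = 1.0 / frequency
    # eps absorbs floating-point noise in the solver output.
    return [name for name in subsets if x_star[name] >= threshold - eps]


# Hypothetical instance: every element lies in exactly two sets, so f = 2.
universe = {1, 2, 3}
subsets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
x_star = {"A": 0.5, "B": 0.5, "C": 0.5}  # assumed feasible fractional solution

rounded = round_lp_solution(universe, subsets, x_star)
print(rounded)
```

The rule yields a cover whenever `x_star` is LP-feasible: each element's covering constraint sums at most f values to at least 1, so at least one of them is no smaller than 1/f.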

E. Hill Climbing Algorithm
Hill climbing [16] is a mathematical optimization technique that belongs to the family of local search methods. It is an iterative algorithm that starts with an arbitrary solution to a problem and then attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, a further incremental change is made to the new solution, and this is repeated until no further improvement can be found.

Algorithm 3 Hill Climbing Algorithm
1: Pick a random point in the search space.
2: Consider all the neighbors of the current state.
3: Choose the neighbor with the best quality and move to that state.
4: Repeat steps 2 and 3 until all the neighboring states are of lower quality.
5: Return the current state as the solution state.

OR-Library, originally described in [17], is a collection of test data sets for a variety of Operations Research (OR) problems. There are currently 87 data files for SCP. The format is
a) number of rows (m), number of columns (n)
b) the cost of each column c(j), j = 1, 2, ..., n
c) for each row i (i = 1, ..., m): the number of columns which cover row i, followed by a list of the columns which cover row i.
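The file layout described above can be read with a short tokenizing parser. The following is a minimal sketch that assumes the file is a whitespace-separated stream of integers in exactly that order; the function name `parse_scp` and the sample text are hypothetical, and column indices are kept exactly as they appear in the input.

```python
from typing import Dict, List, Set, Tuple


def parse_scp(text: str) -> Tuple[int, int, List[int], Dict[int, Set[int]]]:
    """Parse an SCP instance: m, n, the n column costs, then one entry per row."""
    tokens = iter(int(t) for t in text.split())
    m = next(tokens)  # number of rows (elements to cover)
    n = next(tokens)  # number of columns (candidate sets)
    costs = [next(tokens) for _ in range(n)]
    columns: Dict[int, Set[int]] = {}
    for row in range(1, m + 1):
        count = next(tokens)  # how many columns cover this row
        for _ in range(count):
            col = next(tokens)
            # Store the instance column-wise: column -> set of rows it covers.
            columns.setdefault(col, set()).add(row)
    return m, n, costs, columns


# Hypothetical 3-row, 4-column instance in the layout described above.
sample = """3 4
1 2 3 4
2 1 2
2 2 3
1 4
"""

m, n, costs, columns = parse_scp(sample)
print(m, n, costs, columns)
```

Storing the instance column-wise turns each column into a subset of rows, which is the form used in the set cover definition of Section II-A.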

III. PROPOSED ALGORITHM
This work modifies the conventional hill climbing algorithm for the set cover problem. To avoid the local maxima problem, it introduces random re-initialization. For comparison, both the greedy algorithm and the LP rounding algorithm are used to find the initial state for the modified hill climbing algorithm. The evaluation function for the modified hill climbing algorithm is described below.
• Constraint: Universality of X must hold, that is, $\bigcup_{S \in X} S = U$.
1) Minimize the number of sets, $|X|$.

B. OR Library MSCP Formulation
The formulation of MSCP for OR Library is given below.
1) Let $M_{m \times n}$ be a 0/1 matrix in which each column $i$ has a cost $c_i \ge 0$, and let $x_i = 1$ if column $i$ is part of the solution and $x_i = 0$ otherwise.

Minimize:
Subject To:

(IJARAI) International Journal of Advanced Research in Artificial Intelligence, www.ijarai.thesai.org
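For reference, the standard 0/1 integer program for set covering can be written as follows. This is the textbook formulation and is assumed, rather than confirmed, to match the one intended here. The index $i = 1, \ldots, n$ runs over columns with cost $c_i$ and decision variable $x_i$; the row index $j = 1, \ldots, m$ and the entry notation $M_{ji}$ (equal to 1 when column $i$ covers row $j$) are introduced for illustration.

```latex
\begin{align}
\text{Minimize:} \quad & \sum_{i=1}^{n} c_i x_i \\
\text{Subject to:} \quad & \sum_{i=1}^{n} M_{ji} x_i \ge 1, \quad j = 1, \ldots, m, \\
& x_i \in \{0, 1\}, \quad i = 1, \ldots, n.
\end{align}
```

Relaxing the last constraint to $0 \le x_i \le 1$ gives the linear program that LP rounding starts from.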

C. Proposed Algorithm
This section describes our proposed algorithm for MSCP. The algorithm finds an initial solution and then optimizes the result using the modified hill climbing algorithm.
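The two-phase idea can be sketched in code: start from a feasible cover, then repeatedly try to drop a randomly chosen set and, if the cover breaks, try to swap in a randomly chosen unselected set. This is a minimal sketch and not the authors' implementation. The acceptance rule for swaps (accept only if the total cost decreases), the parameter names `m_repeats` and `k_repeats`, and the toy instance are assumptions.

```python
import random
from typing import Dict, Hashable, Iterable, Set, Tuple


def covers(universe: Set[Hashable], subsets: Dict[str, Set[Hashable]],
           chosen: Iterable[str]) -> bool:
    covered: Set[Hashable] = set()
    for name in chosen:
        covered |= subsets[name]
    return covered >= universe


def total_cost(cost: Dict[str, float], chosen: Iterable[str]) -> float:
    return sum(cost[name] for name in chosen)


def modified_hill_climb(universe: Set[Hashable], subsets: Dict[str, Set[Hashable]],
                        cost: Dict[str, float], initial: Iterable[str],
                        m_repeats: int = 100, k_repeats: int = 20,
                        seed: int = 0) -> Tuple[Set[str], float]:
    """Randomized remove-or-swap local search starting from a feasible cover."""
    rng = random.Random(seed)
    current = set(initial)
    if not covers(universe, subsets, current):
        raise ValueError("the initial solution must be a cover")
    best_cost = total_cost(cost, current)
    for _ in range(m_repeats):  # outer loop: set minimization attempts
        if not current:
            break
        removed = rng.choice(sorted(current))  # candidate redundant set
        trial = current - {removed}
        if covers(universe, subsets, trial):
            # The removed set was redundant: keep the smaller cover.
            current, best_cost = trial, total_cost(cost, trial)
            continue
        unselected = sorted(set(subsets) - current)
        for _ in range(k_repeats):  # inner loop: swap attempts
            if not unselected:
                break
            candidate = rng.choice(unselected)
            swapped = trial | {candidate}
            # Assumed rule: accept a swap only if it stays feasible and is cheaper.
            if covers(universe, subsets, swapped) and total_cost(cost, swapped) < best_cost:
                current, best_cost = swapped, total_cost(cost, swapped)
                break
    return current, best_cost


# Hypothetical toy instance; the initial cover deliberately selects every set.
universe = {1, 2, 3, 4, 5}
subsets = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
cost = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}

best_sets, best_cost = modified_hill_climb(universe, subsets, cost, initial=subsets.keys())
print(sorted(best_sets), best_cost)
```

In practice the initial cover would come from the greedy or LP rounding step rather than from selecting every set; the all-sets start is used here only to keep the example self-contained.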

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section presents the computational results of the proposed approach. The effectiveness of the proposed approach is tested on 20 SCP test instances obtained from Beasley's OR Library. These instances are divided into 11 sets as in Table I, in which Density is the percentage of nonzero entries in the SCP matrix. All of these test instances are publicly available via electronic mail from OR Library.
The approach presented in this paper is coded in C on an Intel laptop with a 2.13 GHz processor and 2 GB of RAM under Windows 7, using Code::Blocks version 13.12. No preprocessing was applied to the instance sets obtained from OR-Library. CPU times and running times are not reported, as they vary from machine to machine and from compiler to compiler.

A. Experimental Results of Weighted SCP
Table II presents the experimental results of the proposed approach for weighted SCP instances. The first column gives the name of each instance. The optimal or best-known solution of each instance is given in the 2nd column. The 3rd and 4th columns give the solutions found using the greedy and LP rounding approaches. The 5th and 6th columns give the solutions found in [5] and [7]. The last two columns contain the results found using the proposed approach, started from the greedy approach and the LP rounding approach respectively.

B. Experimental Results of Unweighted SCP
Table III presents the experimental results of the proposed approach for unweighted SCP instances. This paper uses the same 20 weighted SCP instances and makes them unweighted by replacing every weight with 1. The first column gives the name of each instance. The optimal or best-known solution of each instance is given in the 2nd column. The 3rd and 4th columns give the solutions found using the greedy and LP rounding approaches. The 5th and 6th columns give the solutions found in [18] and [19]. The last two columns contain the results found using the proposed approach, started from the greedy approach and the LP rounding algorithm respectively.

C. Result Summary
The optimal solutions presented in Table II and Table III are taken from [7]. The quality of a solution derived by an algorithm is measured by the Quality Ratio, defined as the ratio of the derived solution to the optimal solution. The quality ratios of each instance for the conventional greedy algorithm, LP rounding and the proposed algorithm are shown in Fig. 1, 2, 3 and 4. The figures plot the ratio values as a histogram over all instances used in this work.
Another popular quality measurement reported in the literature is GAP, defined as the percentage deviation of a solution from the optimal or best-known solution. The summarized results, in terms of average quality and average GAP, are presented in Table IV for weighted set covering instances and in Table V for unweighted set covering instances.

Algorithm 4 Proposed Algorithm
1: Preparation: In this step, the elements of the universal set U, the subsets S and the cost c of each set are taken as inputs.
2: Initial Solution: This step finds a solution X using the Greedy method and the LP Rounding algorithm for MSCP. X is considered the initial state for the hill climbing optimization step. This study uses both solutions and optimizes each further for comparison.
3: Hill Climbing Optimization: This phase uses the modified hill climbing algorithm to optimize the cost of the set cover.
4: Find the cost c(X) from X. ▷ Initial best found cost, c(X)
5: Keep this X as the best found sets. ▷ Initial best found set, X
for M times do ▷ Here M is the Set Minimization Repetition Factor
8: Randomly select a set X* from the selected sets. ▷ Random selection of a candidate redundant set
9: Mark this set X* as Unselected Set.
11: Stay with this state and find the cost, C_new.
12: Replace the best found cost C with the current cost, C_new.
13: Remove set X* from the selected sets, X.
for K times do ▷ Here K is the Hill Climbing Repetition Factor
17: Randomly select a set Y from the unselected sets, S − X.
18: Mark this set as Selected.
19: if (X − X*) ∪ Y ≠ U then ▷ Check whether the universality constraint holds
20: Go back to step 17
Replace the best found cost C with the current cost, C_new.
24: Enlist Y in the Selected Sets.
28: end for
29: end for
30: Return the best found list of sets X and the minimum number of sets n(X).

The proposed algorithm uses the conventional greedy algorithm and the LP Rounding algorithm to obtain initial solutions. These results are then optimized with the modified hill climbing method. Table IV and Table V compare the proposed heuristic approach with the original greedy approach and the LP Rounding algorithm.
In Table IV, the average quality ratio and average GAP of the original greedy algorithm are 1.14 and 14.10 respectively for weighted SCP, while for the proposed approach they are 1.00 and 0.09. The average quality ratio and average GAP of LP rounding are 2.22 and 122.57 respectively for weighted SCP, while for the proposed approach they are 1.01 and 1.48. The original greedy and LP Rounding solutions deviate substantially from the optimal solution, whereas the proposed approach hardly deviates from it.
In Table V, the average quality ratio and average GAP of the original greedy algorithm are 1.11 and 10.66 respectively for unweighted SCP, while for the proposed approach they are 1.02 and 1.58. The average quality ratio and average GAP of LP rounding are 5.41 and 441.06 respectively for unweighted SCP, while for the proposed approach they are 1.18 and 17.6. The original greedy and LP Rounding solutions deviate substantially from the optimal solution, whereas the proposed approach stays much closer to it.
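The two measures used in this summary follow directly from their definitions. The sketch below is a minimal illustration; the function names and the sample values (a found cost of 114 against an optimum of 100) are hypothetical and are not taken from the tables.

```python
def quality_ratio(found: float, optimal: float) -> float:
    """Ratio of the derived solution value to the optimal (or best-known) value."""
    return found / optimal


def gap_percent(found: float, optimal: float) -> float:
    """Percentage deviation of the derived solution from the optimal value."""
    return 100.0 * (found - optimal) / optimal


# Hypothetical values chosen only to illustrate the arithmetic.
ratio = quality_ratio(114, 100)
gap = gap_percent(114, 100)
print(ratio, gap)
```

For a single instance the two measures are linked by GAP = 100 × (quality ratio − 1), which is consistent with the averages quoted above (for example, 1.14 and 14.10).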

V. CONCLUSION AND FUTURE WORK
This paper studies the existing approaches to MSCP and proposes a new heuristic approach for solving it. Appropriate theorems and algorithms are presented to clarify the proposed approach. The experimental results are compared with existing results available in the literature, and the comparison shows the effectiveness of the proposed approach. In this work the approach is tested only on OR-Library instances. In future this approach will be

Fig. 1: Quality ratio of weighted problem instances for Greedy and Proposed Algorithm.

Fig. 2: Quality ratio of weighted problem instances for LP Rounding and Proposed Algorithm.

10: if X − X* = U then ▷ Check whether the universality constraint holds

Fig. 3: Quality ratio of unweighted problem instances for Greedy and Proposed Algorithm.

Fig. 4: Quality ratio of unweighted problem instances for LP Rounding and Proposed Algorithm.

TABLE I: Test instance details

TABLE III: Experimental Results for Unweighted SCP

TABLE IV: Average quality ratio and GAP for the Weighted Set Covering Problem