A Graph Theoretic Approach for Minimizing Storage Space using Bin Packing Heuristics

In the age of Big Data, storing a huge volume of data in minimum storage space while properly utilizing the available resources is an open problem and an important research topic. This problem is closely related to the classical NP-hard combinatorial optimization problem known as the “Bin Packing Problem”, where bins represent the available storage space and the task is to store the items or data in the minimum number of bins. This research work focuses on finding a near-optimal solution of the offline one dimensional Bin Packing Problem using two heuristics that take advantage of graph structure. Additionally, extensive computational results on some benchmark instances are reported and compared with the best known solutions and with the solutions produced by four other well-known bin oriented heuristics. Some future directions of the proposed work are also outlined.
Keywords: Bin Packing; Combinatorial Optimization; Graph Theory; Heuristics; Operational Research


I. INTRODUCTION
The storage space minimization problem remains open because both the size and the dimension of data are increasing day by day. So, there is a need to produce a near-optimal solution in a small amount of time. To tackle the problem, this work models storage minimization as the well-known one dimensional Bin Packing Problem, where storage space is represented as bins and the task is to store the items or data in the minimum number of bins. This problem arises in a wide variety of contexts and this popular combinatorial optimization problem has been studied extensively during the past few years. The authors of [1] called it "The Problem That Wouldn't Go Away". The study of the classical one dimensional Bin Packing Problem began in the early 1970's [2]. The problem is stated as follows: given an unlimited number of bins, each with integer capacity $C > 0$, and a set of items with weights $w_i$, $0 < w_i \le C$, the goal is to assign each item to one bin such that the total weight of the items in each bin does not exceed the capacity $C$ and the number of bins used for packing all items is minimized. The problem is known to be NP-hard in the strong sense [3]. Thus, a satisfactory approach is to design an approximation algorithm that constructs a near-optimal packing.
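For reference, the problem statement above is often written as an integer program. The block below is the standard textbook formulation, not notation taken from this paper: the binary variables $y_j$ (bin $j$ is used) and $x_{ij}$ (item $i$ is placed in bin $j$) are assumed names, and $n$ bins are offered because $n$ bins always suffice for $n$ items.

```latex
\begin{align*}
\min \quad & \sum_{j=1}^{n} y_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} w_i \, x_{ij} \le C \, y_j, && j = 1, \dots, n, \\
& \sum_{j=1}^{n} x_{ij} = 1, && i = 1, \dots, n, \\
& x_{ij} \in \{0, 1\}, \; y_j \in \{0, 1\}, && i, j = 1, \dots, n.
\end{align*}
```

The first constraint keeps the load of every used bin within the capacity $C$, and the second assigns each item to exactly one bin.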
The one dimensional Bin Packing Problem has several real world applications, and resource and storage space minimization is one of them. Some formulations of real world storage minimization problems as Bin Packing Problems are as follows:
i) Placing computer files of specified sizes onto identical disks of the same capacity, with the constraint that each file must reside entirely on one disk [4]. The objective is to minimize the number of disks needed for the set of files. This can be formulated as a Bin Packing Problem where the items are files, the bins are disks and the fixed disk capacity is the bin capacity. The problem is to minimize the number of bins.
ii) Server Consolidation [5] is an approach to the efficient usage of computer server resources in order to reduce the total number of servers or server locations that an organization requires. In this case, existing servers are treated as items, resource utilizations are item sizes, bins are destination servers and the bin capacity is the utilization threshold of the destination servers. The goal is to minimize the number of destination servers while maximizing resource utilization. With one resource the problem is the same as the one dimensional Bin Packing Problem. With more than one resource (e.g. CPU, disk, memory and computer network) the problem dimension increases.
iii) The Bin Packing Problem can also be used to minimize the cost of storing data (items) in cloud storage [6]. As buying hard drives in bulk is much cheaper than buying them individually, the goal becomes minimizing the number of hard drives (bins) needed to store the data (items).
Besides these, there are other storage minimization problems where Bin Packing plays a major role, but they are not discussed in this paper.
Not only the Bin Packing Problem but also graph theory has vast real world applications. Graph algorithms provide a unified solution approach to many classical and modern application areas by using the graph as a versatile mathematical tool. With regard to the storage minimization problem, there exist various graph compression mechanisms which can be used to store data compactly [7].
This paper focuses on solving the one dimensional Bin Packing Problem in polynomial time. For this, an algorithm based on two offline bin oriented heuristics is proposed that takes advantage of graph theory. First, a vertex weighted graph is constructed from the set of item weights, with one vertex created for each item weight. Then, the first heuristic chooses a subset of vertices according to the maximum total weight criterion and the second according to the maximum average weight criterion, which ultimately produces a minimal clique partition of the graph with each clique having weight not exceeding the bin capacity. The total number of parts gives the total number of bins. The algorithm runs in polynomial time. Most existing algorithms are not based entirely on a graph algorithm but rather are hybridizations of graph algorithms; this work is based entirely on a graph algorithm that finds a minimum clique partition with a weight constraint, and it can compete with existing algorithms. It can also open a new direction for solving the multi-dimensional Bin Packing Problem. A detailed description of the algorithm can be found in subsequent sections.
The article is organized as follows: Section II contains some preliminary concepts related to the work. Some existing work on tackling the Bin Packing Problem with graphs is described in Section III. Section IV gives a detailed description of the proposed algorithm. Section V contains computational results. Finally, Section VI concludes the article and Section VII gives some future scope.

II. PRELIMINARIES
This section contains some preliminary concepts related to the topic, taken from [8,9,10,11].

Definition 2.1: A Graph G is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices (not necessarily distinct) called its endpoints.
[12] is to find a partition of these $|V|$ vertices into the smallest number of cliques such that each clique has weight not exceeding $C'$.
Proof. The proof is done by transforming an instance of the 3-Partition problem to an instance of MCPCW.
Consider an instance $P$ of the 3-Partition problem: given the set $S = \{a_1, a_2, \dots, a_{3k}\}$ of $3k$ integers satisfying $C'/4 < a_j < C'/2$ for each $1 \le j \le 3k$ and $\sum_{j=1}^{3k} a_j = kC'$, the problem asks whether $S$ can be partitioned into $k$ subsets $s_1, s_2, \dots, s_k$ such that for each $i = 1, 2, \dots, k$, $s_i$ contains exactly three elements of $S$ and $\sum_{a \in s_i} a = C'$. Now we construct a polynomial time reduction $Q$ from $P$ to an instance $Q(P)$ of the MCPCW problem, i.e. a vertex weighted graph with vertex weights $w_1, w_2, \dots, w_{3k}$, where $w_i = a_i$ for each $i = 1, \dots, 3k$, and the bound $C' = \left(\sum_{j=1}^{3k} w_j\right)/k$. We now prove the claim that there exists a feasible solution to an instance $P$ of the 3-Partition problem iff the instance $Q(P)$ of the MCPCW problem has its optimal solution. So, the feasible partition of the instance $Q(P)$ can be constructed in the following way: for each $s_i = \{$

Definition 2.10: A bin oriented heuristic for the Bin Packing Problem constructs a solution bin by bin, i.e. while unpacked items remain, a bin is packed with a maximal subset of the unpacked items, e.g. First Fit Decreasing (FFD), Best-Two-Fit (B2F), Minimum Bin Slack (MBS), MBS′ etc. [13,14].
Definition 2.11: Offline algorithms have all the items available before the packing starts, e.g. First Fit Decreasing (FFD) [4].
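As a concrete point of reference for the offline setting, First Fit Decreasing can be sketched as below. This is the common textbook description of FFD (sort by non-increasing weight, place each item in the first open bin with room), written as a minimal sketch; the function name `first_fit_decreasing` is hypothetical and the code is not taken from the paper or from the implementations compared in Table I.

```python
def first_fit_decreasing(weights, capacity):
    """Pack items with First Fit Decreasing and return the list of bins."""
    bins = []    # each bin is a list of item weights
    loads = []   # loads[b] is the total weight currently in bins[b]
    for w in sorted(weights, reverse=True):   # offline: all items known up front
        if not 0 < w <= capacity:
            raise ValueError("each weight must satisfy 0 < w <= capacity")
        for b in range(len(bins)):
            if loads[b] + w <= capacity:      # first open bin with enough room
                bins[b].append(w)
                loads[b] += w
                break
        else:
            bins.append([w])                  # no bin fits: open a new one
            loads.append(w)
    return bins
```

For example, `first_fit_decreasing([7, 5, 4, 3, 1], 10)` returns `[[7, 3], [5, 4, 1]]`, i.e. two bins.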

III. RELATED WORKS
This section reviews some related work on solving the one dimensional Bin Packing Problem based on graphs. Firstly, in [15] the authors consider the time constrained scheduling problem. For a set of jobs J with execution times t(j) ∈ (0, 1] and an undirected graph (the conflict graph) G = (J, E), they seek a schedule in which adjacent jobs are assigned to different machines (bins) and the total execution time on each machine is at most 1. The objective is to assign all jobs to the minimum number of machines while maintaining the time constraint. To tackle the problem, they proposed six algorithms based on different principles. The first three algorithms are modifications of the classical NF, FF and FFD algorithms. The next algorithm depends on an optimal coloring algorithm, which finds a minimum partition of the item set into independent sets, whose number equals the chromatic number of G, and applies one of the NF, FF and FFD packings to each independent set. The fifth and sixth algorithms are similar, but the fifth is based on a pre-coloring method and the sixth on a general coloring method that works for cographs and k-trees.
Next, the authors of [16] consider the problem of Bin Packing with Conflicts (BPC) using a conflict graph, in its online and offline versions. They mainly improve the upper bounds for BPC on perfect graphs, interval graphs and bipartite graphs. Most of the recent results follow from the adaptation of weighting systems to enable the analysis of algorithms for BPC, and from new algorithms which carefully remove small subgraphs of items causing problematic instances. In [17] the authors considered a restricted problem called Bin Packing with Clique Graph Conflicts. They designed a polynomial time approximation algorithm for constant item sizes and analyzed its performance in the more general case of bounded item sizes. In [18] the authors investigated the following problem: the items to be packed are structured as the leaves of a tree; this is called the Structured Bin Packing Problem. The objective is to pack into the same bin items whose lowest common ancestor has low height. Next, the authors of [19] proposed the problem of packing a graph G, with lower and upper bounds on its edges and weights on its vertices, into a host graph I, and called it the Graph Bin Packing Problem. The vertices of G are the items to be packed and the vertices of I are the bins. A host vertex can accommodate at most L weight in total, and if two items are adjacent in G, then the distance between their host vertices in I must lie between the lower and upper bounds on the edge joining the two items.

IV. THE PROPOSED ALGORITHM
Let $W = \{w_1, w_2, \dots, w_n\}$ be the given sequence of item weights. The items are numbered 1 through n, from left to right in the list, labeling their positions in W, i.e. $w_1$ is the weight of the first item in W, $w_2$ is the weight of the second item in W and so on.
In this section an algorithm based on two bin oriented heuristics is formulated on a graph to cope with the one dimensional Bin Packing Problem.
In this algorithm, the items are first sorted in non-increasing order of weight. Next, a vertex weighted graph is constructed from the sequence of items, with one weighted vertex introduced for each item. Hence, the graph initially consists of n isolated vertices $\{v_1, \dots, v_n\}$ with weights $\{w_1, w_2, \dots, w_n\}$ respectively. Edges are then introduced by the following procedure. For any pair of items with weights $w_i$ and $w_j$ at positions i and j (with i < j) in W, an edge is introduced between the corresponding vertices only if $(i-j)(w_i-(C-w_j)) \ge 0$, where C is the capacity of each bin. In other words, an edge is introduced between two vertices of the graph if they satisfy the condition $w_i + w_j \le C$.
This is explained with an example below. Suppose $W_i \ge W_j \ge W_k$ and vertices i and j are connected. The following equations are satisfied.
If the ordering is a perfect elimination ordering then vertices j and k will also be connected and $W_j + W_k \le C$.
Adding (1) and (2)
This condition is applicable for the whole ordering. So, the ordering is a perfect elimination ordering.
Step 6: Print B.
Step 2: If (i ≤ n) then go to step 3 else go to step 9.
Step 4: If (i ≠ j) go to step 5 else go to step 7.
Step 5: If ($w_i + w_j \le C$) then go to step 6 else go to step 7.
Step 6: Connect items i and j.
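The edge rule used in these steps (connect two distinct items when their weights fit together in one bin) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C implementation: the function name `build_graph` and the adjacency-set representation are hypothetical choices, and the vertices are indexed by position after sorting in non-increasing order.

```python
def build_graph(weights, capacity):
    """Return (sorted_weights, adjacency) for the vertex weighted graph.

    Vertex i carries the i-th largest weight; vertices i and j are joined
    when w[i] + w[j] <= capacity. The double loop takes O(n^2) time.
    """
    w = sorted(weights, reverse=True)      # non-increasing order
    n = len(w)
    adj = [set() for _ in range(n)]        # start from n isolated vertices
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair once, i != j
            if w[i] + w[j] <= capacity:
                adj[i].add(j)
                adj[j].add(i)
    return w, adj
```

With weights `[6, 5, 4, 3]` and capacity 9, the heaviest item is adjacent only to the lightest one, while the lightest item is adjacent to all others.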

Output: Clique Count (CC).
Begin
Step 1: CC ← 0;
Step 2: If (n ≠ 0) then go to step 3 else go to step 7;
Step 3: i ← 1;
Step 4: If vertex i has zero or one neighbor, then delete the vertex along with its neighbor (if any) from the graph (G), CC ← CC + 1 and go to step 2; else go to step 5;
Step 5: Select the subset of vertices consisting of vertex i and its neighbor vertices based on Selection Criteria 1 or Selection Criteria 2;
Step 6: Delete the subset produced in step 5, CC ← CC + 1, go to step 2;
Step 7: End
As the subsets are cliques, Algorithm 4.1.2 returns the number of clique partitions with each partition weight not exceeding the capacity. The critical part of Algorithm 4.1 is step 5 of Algorithm 4.1.2, where the subset of vertices consisting of the current vertex and its neighbors has to be selected. Here, we have adopted two heuristics for selecting the subset. The selection criteria are described below:

A. Selection Criteria 1 (A1):
This criterion selects the subset of the current vertex along with its neighbor vertices which gives maximum total weight not exceeding the capacity (C).
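One way to realize a "maximum total weight not exceeding C" selection is a subset-sum search over the neighbors of the current vertex. The sketch below is only an illustration under stated assumptions: it assumes integer weights, it is pseudo-polynomial (time proportional to the number of neighbors times the capacity), and it is not claimed to be the procedure the authors implemented. The name `select_max_total` and its parameters are hypothetical. Note that any chosen set whose total is at most C is automatically a clique, because every pair inside it then also sums to at most C.

```python
def select_max_total(current_weight, neighbor_weights, capacity):
    """Choose neighbors so that current + chosen has maximum total <= capacity.

    Returns (picked_indices, total_weight). Assumes positive integer weights.
    """
    room = capacity - current_weight
    if room < 0:
        raise ValueError("current vertex does not fit in a bin")
    # reach[s] = indices of neighbors whose weights sum to exactly s
    reach = {0: []}
    for idx, w in enumerate(neighbor_weights):
        # iterate over a snapshot so each neighbor is used at most once
        for s, picks in list(reach.items()):
            t = s + w
            if t <= room and t not in reach:
                reach[t] = picks + [idx]
    best = max(reach)                      # largest reachable sum within room
    return reach[best], current_weight + best
```

For a current vertex of weight 5 with neighbor weights `[4, 3, 1]` and capacity 10, the call returns the neighbors of weight 4 and 1, filling the bin exactly.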

B. Selection Criteria 2 (A2):
This criterion selects the subset of the current vertex along with its neighbor vertices which gives the maximum average weight not exceeding the capacity (C). Here, the total average weight ($T_a$) of the vertex set is calculated first. Suppose the average weight of the current subset is $C_a$ and the average weight of the previous subset is $P_a$. If $C_a \ge P_a$ or $C_a \ge T_a$, and the total weight of the current subset is also greater than that of the previous one, the current subset is selected as the final subset; otherwise the previous subset is selected as the final subset. This process continues for all possible subsets.
Proof. Suppose there is a chordless cycle $v_1, v_2, \dots, v_l$ with $l \ge 4$ in G. According to Lemma 4.2, the graph G has a perfect vertex elimination ordering. Suppose $v_i$ is the vertex in the cycle that occurs first in the perfect elimination ordering and $v_{i+1}$, $v_{i+2}$ are neighbors of $v_i$ that occur later in the ordering. So, there must be an edge between $v_{i+1}$ and $v_{i+2}$. But this contradicts the assumption that the cycle is chordless. So, the graph G is a Chordal Graph.

1) In this case all partitions except the last one have used more than 2C/3 of the total capacity. Otherwise an item from set Z could be put into them.
2) It has to be the last partition.
Suppose the required number of bins = K.
Case 2: There is no Clique Partition with all vertices from Z.
In this case all vertices from set Z can be thrown out without changing the total number of partitions, and the cases below arise.

1) No partition has more than 2 items.
2) Any partition with one vertex from X cannot accommodate any other vertex.
3) Any partition with one vertex from Y can accommodate only one other vertex, from T.
4) Any partition with one vertex from T can accommodate either one vertex from Y or one vertex from T, but not both.
From the conclusions above we know that our algorithm will put at most 2 vertices in a bin. So, it puts each vertex in a partition with maximum total weight (criteria 1) and maximum average weight (criteria 2). So, in this case the solution of the proposed algorithm is optimal.
For the second part, it can be seen from the algorithm that for V number of vertices algorithm 4.

C. Illustration of Algorithm 4.1 (Counting Bins) with examples
Suppose the set W is the set of vertex weights arranged in non-increasing order of size, and capacity = C. The optimal number of bins is calculated as
The graph can be viewed as a null graph, and in that case the number of cliques is the cardinality of the vertex weight set, which is |W|. Here also the total number of bins = 4.

Case 3:
In this case the graph can be viewed as a clique, which is also a Chordal graph.
Here also the number of bins is the number of cliques in a minimal clique partition with total weight of each clique ≤ C.

V. COMPUTATIONAL RESULTS
The proposed algorithm was coded in C, compiled using the Borland C++ 5.0 compiler in Win32 mode, and run on an Intel® Atom™ 1.60 GHz processor with 1.0 GB DDR2 RAM.
The algorithms were tested on six classes of benchmark problem instances, all of which can be downloaded from the web page of the EURO Special Interest Group on Cutting and Packing (ESICUP) (http://paginas.fe.up.pt/~esicup/). The proposed algorithm with heuristic criteria 1 is named A1 and with heuristic criteria 2 is named A2.
The first two, the u class and the t class, were developed by [21] and are named instance "a" in Table I. The u class has item weights drawn from an integer uniform distribution on (20, 100) and bin capacity c = 150. There are four sets in this class, namely u_120, u_250, u_500 and u_1000, each consisting of 20 instances with n = 120, 250, 500 and 1000 items, respectively. The t class has item weights drawn from a uniform distribution on (25, 50) and c = 100. Item weights in this class are real numbers. There are also four sets in this class, namely t_60, t_120, t_249 and t_501, each consisting of 20 instances with n = 60, 120, 249 and 501 items, respectively. The t class is considered difficult because, in an optimal solution of each instance, each bin contains 3 items with zero slack (hence the name "triplets class"). All problem instances in both the u and t classes have been solved to optimality with the exact algorithm of [22]. It can be seen from Table I that the proposed A1 and A2 find better solutions than the FFD heuristic and that A1 gives better solutions than A2.
A third class of benchmark problem instances, developed by [23], contains two sets, was_1 and was_2, and is named instance "b" in Table I. Each set has 100 instances with c = 1000 and item weights from (150, 200). Was_1 has n = 100 items in each instance, while was_2 has n = 120 items. For all instances in this class, optimal solutions are known. The solutions produced by the proposed A1 and A2 are better than those of the FFD heuristic, and A2 produces better solutions than all other heuristics in Table I.
A fourth class of benchmark problem instances, developed by [24], is called gau_1 and contains 17 problem instances with c = 10,000 and various values of n and item weights. It is named instance "c" in Table I. For all instances the optimality gap is one bin. The solutions produced by the proposed A1 and A2 are the same and are better than those of the FFD and B2F heuristics.
Next, a test was performed on a data set of difficult problem instances called hard28, used for example by [25] and named instance "d" in Table I. This set has 28 instances with n ∈ {160, 180, 200}, c = 1000, and item weights drawn from (1, 800). The simplest heuristic, FFD, finds optimal solutions for five instances and solutions worse than optimal by one bin for all the remaining instances. None of the other heuristics, including A1, is able to improve these solutions. In fact, B2F, MBS, MBS′ and A1 find worse solutions for some instances. But the proposed A2 finds the same solutions as FFD.
A sixth class of benchmark problem instances, developed by [26], consists of set_1, set_2 and set_3, named instances "e1", "e2" and "e3" in Table I respectively. Set_1 has 720 instances with c = 100, 120, 150, n = 50, 100, 200, 500, and item weights drawn from an integer uniform distribution on (1, 100), (20, 100) and (30, 100). Set_2 has 480 instances with c = 1000, n = 50, 100, 200, 500 and item weights such that each bin has on average 3 to 9 items. Set_3 has 10 instances with c = 100,000, n = 200, and item weights drawn from a uniform distribution on (20,000, 35,000). Optimal solutions for 1184 instances in this class were found in [26]. For the remaining 26 instances, optimal solutions were found by [27]. From Table I it can be seen that the solutions produced by the proposed A2 are better than those of the other heuristics.
Additionally, Figure 19 shows a time comparison between the two proposed heuristics A1 and A2 for the above instances.
of the bin. The algorithm takes polynomial time and finds near-optimal solutions, as shown in the computational results. It can also be seen that no algorithm is best for all instances: the algorithm with the second criterion (A2) outperforms the other heuristics for one benchmark instance, and in the other cases the solutions of the two heuristics (A1 and A2) deviate only slightly from the best known solutions and from the solutions produced by the four heuristics.

VII. FUTURE WORK
Several directions for further research can be outlined as follows. Firstly, along with the volume of data, its dimension is also increasing. To tackle this problem, good and efficient algorithms are needed for storing high dimensional data. In this case, the multidimensional network graph concept can be used, or a vertex weighted graph can be created for each dimension and the proposed algorithm applied to the graph produced from the intersection of the graphs of each dimension. But this needs further investigation. Secondly, the online partition problem or semi-online partition problem [28] can be applied to generate cliques with constrained weight in the vertex weighted graph. This concept can be used to tackle the online or semi-online Bin Packing Problem but needs further detailed study. Thirdly, the running time of the algorithm can be improved from $O(|V|^2)$ to $O(|V| \log |V|)$ by using an appropriate data structure, namely the red-black tree [29], which is under investigation by the author. Last but not least, though this paper contains a study of the bound on the maximum number of bins needed by the proposed algorithm, the proof of the upper bound involves an extremely detailed case analysis which can be investigated in future.
then obtains a partition of these $3k$ weights of the $|V|$ vertices into $k$ cliques having weight exactly $C'$. Conversely, if the instance $Q(P)$ of the MCPCW problem has an optimal clique partition $\{c_1, c_2, \dots, c_k\}$ with the smallest integer $k$, each clique having weight at most $C'$, and $C'/4 < a_j < C'/2$ for each $1 \le j \le 3k$, we obtain $\sum_{w \in c_i} w = C'$ (as $w_j = a_j$ for each $j = 1, \dots, 3k$) for each $i = 1, 2, \dots, k$, and each clique $c_i$ contains exactly three elements from $S$, which form the subset $s_i$. So, the instance $P$ of the 3-Partition problem has the partition $s_1, s_2, \dots, s_k$.
Most of the above algorithms are not based entirely on a graph algorithm; rather, they hybridize graph algorithms with existing heuristics for solving the Bin Packing Problem. Our work is simple and purely based on a graph algorithm, namely finding a minimum clique partition with a weight constraint, and can compete with existing algorithms. It can also open a new direction for solving the multi-dimensional Bin Packing Problem.

Lemma 4.2:
The graph produced from the sequence W′ obtained by sorting the sequence W in non-increasing order (i.e. $w_1 \ge w_2 \ge \dots \ge w_n$) has a Perfect Elimination Ordering.

Claim 4.5:
There exists a feasible solution to an instance I of the one dimensional Bin Packing Problem if and only if the corresponding instance of the MCPCW problem for the Chordal Graph has its optimal solution with value k.
For any feasible solution of an instance I of the Bin Packing Problem, the set of items B is partitioned into k bins $\{B_1, B_2, \dots, B_k\}$ such that, for all $i = 1, \dots, k$, each $B_i$ contains items of B and $\sum_{a \in B_i} a \le C$ (C = capacity of each bin). Then a feasible partition of the corresponding MCPCW instance can be constructed in the following way: for each $B_i$, the corresponding vertices form a clique with total weight not exceeding C. Likewise, we obtain the partition of all items into k cliques, each having total weight not exceeding C. Conversely, suppose the MCPCW instance has an optimal clique partition $\{C_1, C_2, \dots, C_k\}$ with the smallest integer k and $\sum_{a \in C_i} a \le C$ for all $i = 1, \dots, k$. Then each clique contains items from the set B and defines one bin. So, the instance I of the Bin Packing Problem has the partition $\{B_1, B_2, \dots, B_k\}$.
Algorithm 4.1: Counting Bins
Input: List of vertices (n) with their weights $\{w_1, w_2, \dots, w_n\}$, Capacity (C).
Output: Number of Bins (B).
Begin
Step 1: If (n ≠ 0) then go to step 2 else go to step 7.
Step 2: Sort the vertices in non-increasing order of their weight.
Step 3: Call Algorithm 4.1.1.
Step 4: Call Algorithm 4.1.2.
Step 5: Assign the clique partition number (obtained from step 4) of the vertex weighted graph G (produced by step 3), with total weight of each clique ≤ C, to B (i.e. B ← CC (Clique Count)).

Claim 4.4:
Any induced subgraph of the graph G produced by Algorithm 4.1 is Chordal.
As the graph G produced by Algorithm 4.1 is Chordal and any induced subgraph of a Chordal Graph is Chordal [20], the above claim is also true for the graph produced by Algorithm 4.1.
Lemma 4.6: The Minimum Clique Partition Problem with Constrained Weight (MCPCW) for the Chordal Graph can be solved in $O(|V| + |E|)$ time, where V is the vertex set and E is the edge set.
According to Lemma 4.2, the ordering $w_1 \ge w_2 \ge \dots \ge w_n$ is a perfect vertex elimination ordering. Suppose processing starts with vertex $v_1$ with weight $w_1$. It is added to the first partition. Next, the vertices adjacent to $v_1$ are checked and added along with $v_1$ to the partition, keeping the total weight ≤ C, based on one of the two selection criteria above. If the first partition is $\{v_1, v_2, \dots, v_k\}$, then after deletion of the vertices in the partition Algorithm 4.1 continues with the remaining graph G′, which is also a Chordal Graph according to Claim 4.4. The execution continues until the vertex set is empty. In each iteration, Algorithm 4.1 checks the vertex and its neighbors. So, the overall complexity of the implementation is $O(|V|)$ plus the cost of checking the neighbors of each processed vertex, which is roughly equivalent to $O(|V| + |E|)$.
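The processing order described above (take the heaviest remaining vertex, add compatible neighbors, delete the resulting clique, repeat) can be sketched as follows. This is a minimal sketch under stated assumptions: a simple heaviest-first greedy rule stands in for Selection Criteria 1 and 2, which it does not reproduce, and the graph is kept implicit by testing weights against the capacity directly instead of storing edges. The name `greedy_clique_partition` is hypothetical, and this simple list-based version does not attain the $O(|V| + |E|)$ bound of the lemma.

```python
def greedy_clique_partition(weights, capacity):
    """Partition item weights into cliques (bins) of total weight <= capacity."""
    if any(not 0 < w <= capacity for w in weights):
        raise ValueError("each weight must satisfy 0 < w <= capacity")
    remaining = sorted(weights, reverse=True)   # non-increasing processing order
    cliques = []
    while remaining:                            # until the vertex set is empty
        current = remaining.pop(0)              # heaviest unprocessed vertex
        clique, load = [current], current
        leftover = []
        for w in remaining:                     # candidate neighbors, heaviest first
            if load + w <= capacity:            # w fits, so it is adjacent to all of clique
                clique.append(w)
                load += w
            else:
                leftover.append(w)
        remaining = leftover                    # delete the clique from the graph
        cliques.append(clique)
    return cliques
```

For example, `greedy_clique_partition([7, 5, 4, 3, 1], 10)` returns `[[7, 3], [5, 4, 1]]`, so the clique count (number of bins) is 2.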

Theorem 4.7:
The number of bins produced by Algorithm 4.1 is $K \le \frac{3}{2}\,\mathrm{OPT} + 1$ and the time complexity is $O(|V|^2)$.
Proof. Assume that the ordered list of vertex weights has to be partitioned, where the weights $\{w_1, w_2, \dots, w_n\}$ are distributed in the following sets:
Case 1: There is one Clique Partition with all vertices from set Z.
Algorithm 4.1.1 constructs the graph in $O(|V|^2)$ time, and from Lemma 3.5 it can be concluded that Algorithm 4.1.2 requires $O(|V| + |E|)$ time. As the time complexity of Algorithm 4.1.1 dominates that of Algorithm 4.1.2, the total time required by Algorithm 4.1 is $O(|V|^2)$.

Fig. 2. Vertex weighted graph for case 1
So, with selection criteria 1 and selection criteria 2, the number of bins = 7.
Case 2: Input sequence: $w_1 \ge w_2 \ge \dots \ge w_s > C/2 > w_{s+1} \ge w_{s+2} \ge \dots \ge w_n$
This graph is a Chordal graph. The number of bins can be found by finding a minimum clique partition with total weight of each clique not exceeding C.

Definition 2.2: A vertex weighted graph is a graph where each vertex has been assigned a positive weight.

Definition 2.3: A Null Graph is a graph whose edge set is null.
Definition 2.4: A Clique in a graph G is a set of pairwise adjacent vertices.

TABLE I. COMPARISON BETWEEN PROPOSED HEURISTICS A1, A2 AND FOUR EXISTING HEURISTICS WITH 1615 INSTANCES