Resource Utilization in Cloud Computing as an Optimization Problem

In this paper, a greedy algorithm for the resource utilization problem in cloud computing is presented. A privately-owned cloud that provides services to a huge number of users is assumed. For a given resource, hundreds or thousands of requests to use that resource accumulate over time from different users worldwide via the Internet. Prior knowledge of these requests is also assumed. The main concern is to find the best utilization schedule for a given resource in terms of the profit obtained by utilizing that resource and the number of time slices during which the resource is utilized. The problem is proved to be NP-Complete. A greedy algorithm is proposed and analyzed in terms of its runtime complexity. The proposed solution is based on a combination of the 0/1 Knapsack problem and the activity-selection problem. The algorithm is implemented in Java. Results show good performance with a runtime complexity of $O((F - S)\,n \log n)$.
Keywords: Activity Selection; NP-Complete; Optimization Problem; Resource Utilization; 0/1 Knapsack


I. INTRODUCTION
The term cloud computing has become a buzzword in recent years due to its publicity and its spread into all aspects of life. Cloud computing in its basic form is a model of on-demand provisioning of computing resources to users [1]. Resources such as computers, network servers, storage, applications, and services are shared and reusable among users; this is referred to as multi-tenancy [2]. Cloud computing has a great influence on the cost of operating information technology (IT) infrastructure. Companies no longer need to spend on building on-premises IT departments to support their operations. Adopting the pay-as-you-go strategy, i.e. paying only for resource usage, cuts the costs of IT operations, which include maintenance, employment, and training. In its simplest form, provisioning of resources via clouds is similar to obtaining electricity from power stations without the need for everyone to establish a privately-owned station [3].
Resources lie at the heart of cloud computing. Resource utilization (pooling) is an important topic in computer science and remains a hot research area. The need for resource utilization never stops as long as resources are limited compared to the increasing demand for computers and computing. Resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand [1].
The Internet plays an important role in signifying the importance of resources. The demand for the Internet and for resources is ever increasing. The advent of cloud computing has encouraged companies and computer professionals to use more and more resources even if they are not available on their premises. This might incur fees to be paid by those users; service providers, on the other hand, have to find how to best utilize their resources so that they can serve more users during a specific operation time. The main idea of cloud computing is providing (leasing) services to users. Service providers can therefore lease their services in ways that maximize their overall profit.
In this paper, a privately-owned cloud that provides services to a huge number of users is assumed. For a given resource, hundreds or thousands of requests to use that resource accumulate over time from different users worldwide via the Internet. The main concern is to find the best utilization schedule for a given resource in terms of the profit obtained by utilizing that resource and the number of time slices during which the resource is utilized. Prior knowledge of the requests to use that resource is assumed.
The proposed algorithm, based on a greedy method, combines the solutions of two different problems: the Knapsack Problem and the Activity-Selection Problem. Because it combines these two problems, the utilization problem is shown to be NP-Complete.
After the problem is formalized and defined, a greedy algorithm to solve it is proposed. The proposed algorithm is then analyzed in terms of runtime complexity. Finally, experimental results are recorded and discussed.
The paper is organized as follows. In Section II, a sample of related work is presented. In Section III, a mathematical formulation of the problem, the proposed algorithm, and a detailed discussion of the algorithm design, its complexity, and the NP-Completeness of the problem are introduced. In Section IV, the experimental results are discussed. Finally, the conclusion and future work are presented in Sections V and VI.

II. RELATED WORK
Maya Hristakeva et al. [4] presented a number of methods to solve the 0/1 Knapsack problem. One of the methods presented is the greedy method. The 0/1 Knapsack problem is first identified and formalized; then a greedy algorithm is discussed, analyzed, and compared to the algorithms for the other methods used in the research.
In [5], the authors described an algorithm that generates an optimal solution for the 0/1 integer Knapsack problem on the NCUBE hypercube computer. Experimental data supporting the theoretical claims were provided for large instances of the one- and two-dimensional Knapsack problems.
In the Knapsack problem, a number of items have to be chosen to fill the knapsack without exceeding its capacity so that the knapsack profit is maximized [6]. The 0-1 Knapsack Problem is formulated as follows:
• The knapsack ($K$) has a capacity $C$.
• The item ($T$) is a tuple $T = \langle w, p \rangle$, such that $w$ is the weight of the item and $p$ is the profit.
• The objective is as in (1), where $x_i = 1$ if item $i$ of the $n$ items is placed in the knapsack and $x_i = 0$ otherwise:
$$\text{maximize } \sum_{i=1}^{n} p_i x_i \quad \text{subject to } \sum_{i=1}^{n} w_i x_i \le C, \quad x_i \in \{0, 1\} \tag{1}$$
In [7], the authors considered a setting in which they organized one or several group activities for a group of agents. Their goal was to assign agents to activities in a desirable way. They gave a general model and then studied some existence and optimization problems related to their solutions. Their results were positive, as they found desirable assignments that proved to be tractable for several restrictions of the problem.
The Weighted Activity-Selection problem is an optimization problem [8] and a variant of the Activity-Selection Problem. The components of the problem are as follows:
• An activity ($A$) is a tuple $A = \langle s, f, p \rangle$, such that $s$ is the activity's start time, $f$ is its finish time, and $p$ is the profit of that activity.
• Two activities $A_i$ and $A_j$ are said to be compatible if and only if $s_i \ge f_j$ or $s_j \ge f_i$.
• A feasible schedule ($S$) is a set $S \subseteq \{1, 2, \dots, n\}$, such that every two distinct numbers in $S$ are compatible.
• The profit ($P$) of a schedule ($S$) is $P(S) = \sum_{i \in S} p_i$. The objective is to find a schedule that maximizes the profit.
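The definitions above can be illustrated with a minimal Python sketch. It is not taken from [8]; the `Activity` type and the helper names `compatible`, `is_feasible`, and `schedule_profit` are assumptions introduced here for illustration only.

```python
from typing import NamedTuple, Sequence


class Activity(NamedTuple):
    start: float   # s: start time of the activity
    finish: float  # f: finish time of the activity
    profit: float  # p: profit of the activity


def compatible(a: Activity, b: Activity) -> bool:
    """Two activities are compatible when they do not overlap."""
    return a.start >= b.finish or b.start >= a.finish


def is_feasible(schedule: Sequence[Activity]) -> bool:
    """A schedule is feasible when every pair of distinct activities is compatible."""
    return all(
        compatible(a, b)
        for i, a in enumerate(schedule)
        for b in schedule[i + 1:]
    )


def schedule_profit(schedule: Sequence[Activity]) -> float:
    """P(S): the sum of the profits of the scheduled activities."""
    return sum(a.profit for a in schedule)
```

The pairwise check in `is_feasible` is quadratic in the schedule size; it is written that way to mirror the definition rather than to be efficient.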

III. ALGORITHM
Assume a resource $R$ with a start time $S$, finish time $F$, maximum capacity $C$, and profit per unit of weight $PU$. The resource is expressed as a tuple $R = \langle S, F, C, PU \rangle$. The resource is connected to a network, mainly a public network like the Internet, and receives a huge number of requests. Each request $Q$ is identified by its Id and has a start time $S$, finish time $T$, and weight $W$. The request is expressed as a tuple $Q = \langle S, T, W \rangle$. Two requests $Q_i$ and $Q_j$ are said to be compatible if and only if they do not overlap, i.e. the start time of the latter must be greater than or equal to the finish time of the former.
The goal is to allocate the resource in a way that achieves the best utilization, with the following objective and constraints:
• Maximize the profit of utilization.
• The weight of each request must not exceed the maximum capacity of the resource.
• The start and finish times of selected requests must not go beyond the start and finish times of the resource.
• Requests must be compatible (must not overlap).
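As an illustration of the resource and request tuples and of the per-request constraints, the following Python sketch models both as small data classes. The class and function names are hypothetical, and the profit of a request is assumed here to be its weight multiplied by PU, which the text implies but does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    start: int              # S: start time of the resource
    finish: int             # F: finish time of the resource
    capacity: float         # C: maximum capacity
    profit_per_unit: float  # PU: profit per unit of weight


@dataclass(frozen=True)
class Request:
    req_id: int    # Id of the request
    start: int     # start time of the request
    finish: int    # finish time of the request
    weight: float  # W: weight of the request


def request_profit(q: Request, r: Resource) -> float:
    """Assumed profit model: weight times profit per unit of weight."""
    return q.weight * r.profit_per_unit


def admissible(q: Request, r: Resource,
               remaining_capacity: Optional[float] = None) -> bool:
    """Check the weight and time-boundary constraints for a single request."""
    cap = r.capacity if remaining_capacity is None else remaining_capacity
    return q.weight <= cap and r.start <= q.start and q.finish <= r.finish
```

The compatibility (non-overlap) constraint involves pairs of requests and is therefore not part of `admissible`, which only looks at one request at a time.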

A. Explanation
Figure 1 shows the proposed algorithm. It comprises four phases: (1) filtering, (2) maximum-request selection, (3) fill-right-to-max, and (4) swipe. Lines 7-11 represent the filtering phase. In this phase, all requests that do not meet the constraints of the resource are filtered (removed from the request array). In other words, any request whose weight exceeds the capacity of the resource, or that falls outside the start or finish time of the resource, is filtered. This step is necessary, as it reduces the size of the request array across the iterations of the selection process. Back at line 5, a while statement keeps iterating until the request array is empty.
[Fig. 1: Function: MaxProfitSchedule(); Input: ReqArr]
Lines 13-18 represent the maximum-request selection phase. It starts by sorting the request array in non-increasing order by profit, which places the maximum compatible request at the first location of the request array. As a result of the filtering phase, the first request is guaranteed to be compatible: its weight does not exceed the capacity of the resource (or the remaining capacity in later iterations), and it does not exceed the boundaries of the start and finish time of the resource. The request is added to the schedule in line 15 and removed from the request array in line 16, its weight is deducted from the resource capacity in line 17, and the new start time is set to the finish time of that maximum request.
The third phase is the fill-right-to-max phase. Here, all the time slots to the right of the maximum request selected in the previous phase are filled. This phase starts in line 20 by sorting the request array in non-decreasing order by start time. Lines 21-23 iterate through all requests and pick any request whose weight is less than the remaining capacity of the resource and whose start time is greater than or equal to the new left boundary; at this point, that boundary is the finish time of the maximum request selected in line 14. The request is added to the schedule in line 23. As in the previous phase, any request selected for the schedule (1) is removed from the request array (line 24), (2) has its weight deducted from the available resource capacity (line 25), and (3) has its finish time set temporarily as the new resource start time (line 26).
Lines 30-35 constitute the swipe phase. This phase iterates through the request array and removes all requests whose start time is greater than or equal to the finish time of the maximum request. These requests still exist in the request array because they are incompatible with either the maximum request or the requests to the right of it. They are removed to shrink the array so that a new iteration starts with fewer requests. In line 35, the start time of the maximum request is set as the new resource finish time, and in line 36, the left boundary of the resource is set back to the original start time $S$.
Iterations continue until there are no remaining requests in the request array, i.e. the size of the request array equals zero. The schedule array is then sorted in non-decreasing order by start time in line 39, and the schedule is returned to the calling routine.
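The four phases can be sketched in Python as follows. This is one reading of the prose description, not the authors' Java implementation or the listing in Fig. 1. The `Request` type, the function and parameter names, the assumption that a request's profit is its weight multiplied by PU, and the use of "less than or equal" in the capacity test are all choices made here for illustration.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    req_id: int
    start: int
    finish: int
    weight: float


def max_profit_schedule(requests: List[Request], capacity: float,
                        start: int, finish: int,
                        profit_per_unit: float = 1.0) -> List[Request]:
    pending = list(requests)
    schedule: List[Request] = []
    remaining = capacity
    left, right = start, finish

    while pending:
        # Phase 1 (filtering): drop requests that violate the current
        # capacity or the current time boundaries of the resource.
        pending = [q for q in pending
                   if q.weight <= remaining
                   and q.start >= left and q.finish <= right]
        if not pending:
            break

        # Phase 2 (maximum-request selection): take the most profitable request.
        pending.sort(key=lambda q: q.weight * profit_per_unit, reverse=True)
        best = pending.pop(0)
        schedule.append(best)
        remaining -= best.weight
        left = best.finish

        # Phase 3 (fill-right-to-max): fill the slots to the right of `best`.
        pending.sort(key=lambda q: q.start)
        kept = []
        for q in pending:
            if q.weight <= remaining and q.start >= left:
                schedule.append(q)
                remaining -= q.weight
                left = q.finish
            else:
                kept.append(q)

        # Phase 4 (swipe): discard leftovers to the right of `best`, then
        # restrict the next iteration to the window left of `best`.
        pending = [q for q in kept if q.start < best.finish]
        right = best.start
        left = start

    schedule.sort(key=lambda q: q.start)
    return schedule
```

Each pass of the loop removes at least the selected maximum request from the pending list, so the loop terminates. In this sketch the capacity is shared by all scheduled requests, following the description in which each selected weight is deducted from the resource capacity.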

B. Analysis
All the terms that precede line 5 are constants. Line 5 introduces the term $k$, which is the number of iterations of the outer while loop. The loop runs until the request array is empty. In the worst case, the number of iterations equals the number of intervals of the resource ($k = F - S$). This worst case assumes that all requests have weights less than or equal to the capacity of the resource, that each has start and finish times within the boundaries of the resource, and that the maximum request, i.e. the one with the highest profit, is at the end of the request array.
According to the algorithm and the assumptions above, the filtering phase is not applicable to the initial setting, so no items are removed from the request array. In the maximum-request selection phase, the maximum request is added to the schedule and removed from the request array. The third phase, fill-right-to-max, is not applicable either, since there are no requests to the right of the maximum request that has just been selected. Similarly, the swipe phase is not applicable, because no requests remain to the right of the maximum request that have not been added to the schedule. Repeating the same steps $k$ times yields an empty array.
Sorting an array takes $O(n \log n)$ time when a sorting algorithm with linearithmic runtime, such as merge sort, is used. In the Java implementation, the Collections.sort() method is used, which has $O(n \log n)$ runtime complexity according to the Java documentation [9]. The complexity of the algorithm is evaluated as in (3). The largest term of equation (3) is $k\,n \log n$, so the effort of the algorithm is $O(k\,n \log n)$. As mentioned earlier, $k = F - S$; thus, the effort of the algorithm is expressed as $O((F - S)\,n \log n)$.
Whether the value $F - S$ should be removed from the complexity of the algorithm is arguable. In cloud computing and resource utilization, time slots can be measured in seconds or in fractions of a second. If a time slot of 1 second is assumed, a 24-hour duty for a resource equals 86,400 seconds (time slots), which is approximately 84K slots. This implies that the value $F - S$ might be influential in the complexity of the algorithm, so the complexity is expressed as $O((F - S)\,n \log n)$.
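One way to see where this bound comes from is sketched below in LaTeX (the `align*` environment assumes the amsmath package). The symbol $k$ for the number of outer-loop iterations and the constants $c_1$, $c_2$, $c_3$ are assumptions introduced here; the per-iteration costs follow the phases described in Section III.A (linear filtering, fill, and swipe passes plus two sorts), and the last term is the final sort of the schedule.

```latex
% k: number of outer-loop iterations (assumed symbol); n: number of requests
% c_1, c_2, c_3: hypothetical constants for the linear passes,
% a single sort, and the final sort of the schedule
\begin{align*}
T(n) &\le k\,\bigl(c_1\, n + 2\,c_2\, n \log n\bigr) + c_3\, n \log n \\
     &= O(k\, n \log n) \\
     &= O\bigl((F - S)\, n \log n\bigr), \qquad k = F - S .
\end{align*}
% Number of one-second time slots in a 24-hour duty:
% 24 \times 60 \times 60 = 86{,}400 \approx 84\text{K} \quad (1\text{K} = 1024)
```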

C. Example
Consider a privately-owned cloud with a number of resources available to users, each with a capacity $C$ and with service hours starting at $S$ = 08:00 and ending at $F$ = 18:00. The service provider charges an amount of $PU$ as a profit per unit of weight. Assume 15 requests with weights less than the resource capacity ($C$) and random profits as shown in Fig. 2 (a).
The initial schedule for the resource utilization is shown in Fig. 2 (d). When running the algorithm shown in Fig. 1, the following steps are executed:
1) Step 1: Sort the requests by profit in non-increasing order. The result is shown in Fig. 2 (b).
2) Step 2: Comprises the following steps:
• Add request $Q_{11}$, which has the maximum profit, to the schedule and remove it from the requests array. The schedule will look as in Fig. 2 (d), row MRS.
• Sort the remaining requests in non-decreasing order by start time, as shown in Fig. 2 (c).
3) Step 3: Select a request that fits after $Q_{11}$ in the schedule, i.e. its start time is equal to or greater than the finish time of $Q_{11}$. $Q_2$ is the selected request. The result of adding $Q_2$ to the schedule is shown in Fig. 2 (d), row FRM. Now, $Q_2$ must be removed from the requests array. Any further selections must start after the finish time of $Q_2$.
4) Step 4: Repeat Step 3 for each request that follows $Q_2$, each time changing the new start time to the selected request's finish time, until no further requests can be added. Each time, the selected request is removed from the request array. After this step, $Q_3$ and $Q_7$ are selected into the schedule as in Fig. 2 (d), second row labelled MRS.
5) Step 5: Repeat Step 1 to Step 4 until no requests can be scheduled. The final schedule is shown in Fig. 2 (d), row Final, with a total profit of 1630.

D. NP-Completeness
Proving the NP-Completeness of a problem represented as a language $L$ is a two-step process [10]: (1) prove that $L \in NP$, and (2) prove that $L$ is NP-Hard, i.e. that there exists a language $L'$ such that $L' \in NPC$ and $L'$ is polynomially reducible to $L$ ($L' \le_p L$).
To show that $L \in NP$ for a language $L$ with no known polynomial-time solution, there must be an algorithm $A$ that checks (verifies) a proposed solution in polynomial time. The proposed solution is referred to as the certificate [11].
Figure 3 shows a list of known NP-Complete problems organized hierarchically so that a problem in a lower level of the tree can be polynomially reduced to a problem in a higher level of the hierarchy.
Fig. 3. A family tree of reductions [11]
Theorem 1. The resource utilization problem is an NP-Complete problem.

Proof. According to the two steps discussed earlier:
• The result of running the algorithm shown in Fig. 1 can be taken as a certificate and verified as a solution to the resource utilization problem. Iterating through the requests in the final schedule and checking that all of them are within the start and finish time of the resource working hours takes only $O(n)$. This means that a solution is verifiable in polynomial time, so Resource-Utilization-Problem $\in NP$.
• To prove that the resource utilization problem is NP-Hard, there must be a known NP-Complete language ($L'$) that can be polynomially reduced to $L$; here, that language is the knapsack problem.
To show that Knapsack-Problem $\le_p$ Resource-Utilization-Problem, an instance of the knapsack problem must be cast as an instance of resource utilization. Let the knapsack be the resource $R$ and the knapsack capacity be the capacity of the resource. The objective is to fill the knapsack, or utilize the resource, with requests so that they do not exceed the capacity and the profit is maximized. Under this mapping, Knapsack-Problem $\le_p$ Resource-Utilization-Problem, which means that the resource utilization problem is NP-Hard.
From the previous two steps, it is proven that the resource utilization problem is an NP-Complete problem, Resource-Utilization-Problem $\in NPC$. ∎
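The verification step of the proof can be made concrete with a short Python sketch of a polynomial-time verifier. It is an illustration under stated assumptions rather than the authors' procedure: besides the time-boundary check mentioned in the proof, it also checks non-overlap, total capacity, and a hypothetical `target_profit` threshold (the decision version of the problem), and it assumes profit equals weight multiplied by PU. The names `Request` and `verify_schedule` are introduced here.

```python
from typing import NamedTuple, Sequence


class Request(NamedTuple):
    req_id: int
    start: int
    finish: int
    weight: float


def verify_schedule(schedule: Sequence[Request], capacity: float,
                    start: int, finish: int, target_profit: float,
                    profit_per_unit: float = 1.0) -> bool:
    """Check a proposed schedule (the certificate) in O(n log n) time."""
    ordered = sorted(schedule, key=lambda q: q.start)  # O(n log n)
    total_weight = 0.0
    prev_finish = start
    for q in ordered:  # single O(n) pass
        # Each request must start after the previous one ends (and not
        # before the resource opens) and must end before the resource closes.
        if q.start < prev_finish or q.finish < q.start or q.finish > finish:
            return False
        total_weight += q.weight
        prev_finish = q.finish
    return (total_weight <= capacity
            and total_weight * profit_per_unit >= target_profit)
```

The sort makes this sketch $O(n \log n)$ rather than the $O(n)$ quoted for the boundary check alone; either way the verification is polynomial in the number of requests.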

IV. RESULTS
Tests are conducted on datasets of sizes 32K, 64K, 128K, 256K, 512K, 1M, 2M, and 3M. Larger datasets could not be tested on the test PC due to memory limitations. Tests are performed on an Intel Core(TM) i5-3230M CPU at 2.60 GHz with 3 MB cache, 4 cores, and 4 GB of RAM (only 3.86 GB usable). The PC runs Windows 7 Enterprise edition, 32-bit. The application program was written in Java. Datasets are generated by the application and saved to disk files.
Each dataset is run 10 times, the runtime in milliseconds is recorded, and an average runtime is calculated. The parameters are set as follows: start time: 1, finish time: 86400 (the number of seconds in a 24-hour period), resource capacity: 1048576, PU: 0.001. Results are shown in TABLE I.
Figure 4 shows the experimental runtimes plotted directly from TABLE I.
It is clear from both Fig. 4 and Fig. 5 that the experimental and theoretical results converge. Many terms are dropped from the asymptotic notation of the runtime complexity when it is calculated theoretically, which explains the slight difference in shape between the two graphs. Figure 6 shows the asymptotic $O(n \log n)$ complexity. To plot this graph, the same dataset sizes as in the experiments are used, and the shapes of the graphs are then compared. This step is important because it is used to support the asymptotic notation. The controversial part of the asymptotic notation was the use of $F - S$ in the expression. Some can argue that this term is not influential in the notation. Mathematically, based on the values used above for both $S$ and $F$, the difference is very large, which may cause the two notations to differ significantly. From Fig. 5 and Fig. 6, it is clear that the value $F - S$ is highly influential on the overall performance of the algorithm, which means it cannot be removed from the runtime complexity. Thus, the complexity is asymptotically expressed as $O((F - S)\,n \log n)$. This supports the asymptotic runtime complexity derived for the proposed algorithm.
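The two theoretical curves being compared can be tabulated with a few lines of Python. This is only a sketch of how such values could be computed, not the procedure used to produce the figures: the base-2 logarithm, the reading of K and M as powers of 1024, and the names `SIZES` and `theoretical_costs` are assumptions made here. The start and finish defaults are the parameter values stated for the experiments.

```python
import math

# Dataset sizes from the experiments, assuming 1K = 1024 requests.
SIZES = {
    "32K": 32 * 1024, "64K": 64 * 1024, "128K": 128 * 1024,
    "256K": 256 * 1024, "512K": 512 * 1024,
    "1M": 1024 ** 2, "2M": 2 * 1024 ** 2, "3M": 3 * 1024 ** 2,
}


def theoretical_costs(n: int, start: int = 1, finish: int = 86400):
    """Return (n log n, (F - S) n log n) for a dataset of n requests."""
    base = n * math.log2(n)
    return base, (finish - start) * base


if __name__ == "__main__":
    for label, n in SIZES.items():
        plain, scaled = theoretical_costs(n)
        print(f"{label:>5}  n log n = {plain:.3e}  (F-S) n log n = {scaled:.3e}")
```

With these parameters the two curves differ by the constant factor $F - S = 86399$, so they have the same shape on a normalized axis and differ only in scale.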

V. CONCLUSION
In this paper, an optimization approach to the resource utilization problem in cloud computing is suggested. The solution is based on a combination of the 0/1 Knapsack problem and the activity-selection problem. The problem was introduced and formalized. The proposed greedy algorithm was analyzed and then implemented as a Java program. The problem is proved to be NP-Complete. Asymptotically, the algorithm's runtime is $O((F - S)\,n \log n)$, and the experimental results are consistent with this bound. An important part of that argument was whether to omit the term $F - S$ from the asymptotic notation, which was examined by plotting two charts, one for $O(n \log n)$ and the other for $O((F - S)\,n \log n)$. The second notation matched the experimental runtime results.

VI. FUTURE WORK
As future work, the algorithm could be implemented on a supercomputer. The scheduling could be made online by using preemption to obtain better utilization and higher profits. As an addition to the currently suggested model, different pricing schemes for different periods of the working hours, for example peak time, might be added.

Fig. 2. An example: (a) Array of 15 requests to use the resources. (b) Requests array after sorting it in non-increasing order by request weights. (c) The requests array after removing the request with the maximum profit and sorting it again in non-decreasing order by the start times of requests. (d) Resource utilization schedule; each row represents a phase: Initial: initial setting; MRS: Maximum-Request Selection phase; FRM: Fill-Right-to-Max phase; Final: final resource utilization schedule.

Fig. 1. The proposed algorithm with time complexity for each step. Input: ReqArr: requests array; C, S, F: resource capacity, start time, and finish time.
Fig. 5. Theoretical runtime graph when complexity is expressed as $O((F - S)\,n \log n)$.

TABLE I. RUNTIMES (IN MILLISECONDS) OF 10 EXPERIMENTS CONDUCTED ON DIFFERENT SIZES OF DATASETS