Clustering based Max-Min Scheduling in Cloud Environment

Cloud computing ensures Service Level Agreements (SLAs) by provisioning resources to cloudlets. This provisioning can be achieved through scheduling algorithms that map the given tasks properly, considering heuristics such as execution time and completion time. This paper builds on the max-min algorithm and proposes a unique modification. A novel clustering based max-min scheduling algorithm is introduced to decrease the overall makespan and improve VM utilization for tasks of variable length. Experimental analysis shows that, owing to clustering, it provides better results than the different variations of max-min as well as other heuristic algorithms in terms of effective utilization of faster VMs and proper scheduling of tasks, considering all possible scheduling scenarios and picking the best solution.
Keywords: Cloud computation; cluster; heuristics; batch-mode heuristics; cluster based max-min scheduling


I. INTRODUCTION
Task scheduling is the mechanism that maps users' tasks to appropriately selected resources and executes them. Compared with grid computing, cloud computing has many unique features, including virtualization and flexibility. Through virtualization, all physical resources are virtualized and made transparent to users. Every user has their own virtual device; these devices do not interact with each other and are created according to users' requirements. In addition, one or more virtual machines can run on a single host computer, so the utilization rate of resources is effectively improved. The independence of users' applications ensures the system's information security and enhances the availability of service [1]. Resource supply in the cloud computing environment is flexible: the supply of resources is increased or reduced depending on users' demand. Because of these new features, the original task scheduling mechanisms of grid computing cannot work effectively in cloud computing environments [2].
The goal of task scheduling in cloud computing is to provide optimal task scheduling for users while also sustaining the throughput and QoS of the entire cloud system. Specific goals are load balance, quality of service (QoS), economic principle, optimal operation time and system throughput [3], [4].
A task scheduling algorithm is responsible for mapping jobs submitted to the cloud environment onto available resources in such a way that the total response time, the make-span, is minimized [5]. Many task scheduling algorithms are applied by resource managers in distributed computing to allocate resources to tasks optimally [6]. Some of these algorithms try to minimize the total completion time, where the minimization is not necessarily related to the execution time of each single task; the aim is to minimize the overall completion time of all tasks [7]. For flexible resource allocation, there must be a provision that makes all resources available to the tasks, and this is done according to the SLA (Service Level Agreement) with the help of parallel processing. Owing to the different combinations of these SLA objectives, optimal mapping of workload to resources is found to be NP-hard [8].
This paper focuses on provisioning a full batch of cloudlets. While other research focuses only on achieving a minimal make-span, the proposed idea also achieves better VM utilization by clustering the cloudlets before allocation. The novel idea of dividing an existing batch of tasks into smaller clusters is introduced in this paper. This idea, along with a more effective scheduling algorithm applied to each of the clusters, helps considerably in scheduling tasks to VMs properly, as demonstrated in Section 3 and Section 4, titled Proposed Methodology and Experimental Result. The effectiveness of the newly proposed algorithm is established in Section 5, where the results are compared with the existing algorithms described in Section 2, titled Related Works.

II. RELATED WORKS
Many heuristics have been proposed to obtain a semi-optimal match. Existing scheduling heuristics can be divided into two categories: on-line mode and batch-mode.

A. On-line mode heuristics
A task is mapped to a machine as soon as it arrives at the scheduler. Some heuristic instances of this category follow:

1) Minimum Execution Time
Each task is assigned to the resource that performs it in the least amount of execution time, no matter whether this resource is available or not at that time [9].

2) Opportunistic Load Balancing
Each task is assigned to the resource that becomes ready next, after the task it is currently executing, without any consideration of the execution time of the task on that particular resource. If more than one resource becomes ready at the same time, one resource is chosen randomly [7].
www.ijacsa.thesai.org
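The two on-line rules above can be contrasted with a minimal sketch. The function names and the numbers are hypothetical and serve only as illustration; ties are broken here by the lowest resource index, whereas OLB as described picks randomly among resources that become ready at the same time.

```python
from typing import List

def met_assign(exec_times: List[float]) -> int:
    """MET: pick the resource with the smallest execution time, ignoring availability."""
    return min(range(len(exec_times)), key=lambda j: exec_times[j])

def olb_assign(ready_times: List[float]) -> int:
    """OLB: pick the resource that becomes ready earliest, ignoring execution time."""
    return min(range(len(ready_times)), key=lambda j: ready_times[j])

# Hypothetical numbers for one arriving task and three resources.
exec_times = [2.0, 5.0, 9.0]    # execution time of the task on each resource
ready_times = [7.0, 1.0, 3.0]   # time at which each resource becomes free

met_choice = met_assign(exec_times)    # fastest resource, even though it is busy longest
olb_choice = olb_assign(ready_times)   # first free resource, even though it is slower
```

In this example MET selects resource 0 and OLB selects resource 1, which illustrates why neither rule alone accounts for completion time.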

B. Batch-mode heuristics
The tasks are collected into a set called a meta-task (MT). These sets are mapped at prescheduled times called mapping events. Some instances of this category are as follows:

1) Suffrage
Suffrage [7] is based on the idea that a task should be assigned to a certain resource if, were it not assigned to that resource, it would suffer the most.
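As a rough illustration of the Suffrage idea, the sketch below uses the usual formulation in which a task's suffrage value is the difference between its second-best and best completion times. The matrix `C` and the helper name `suffrage` are hypothetical, and at least two resources are assumed.

```python
from typing import List

def suffrage(completion_times: List[float]) -> float:
    """Suffrage of one task: second-best minus best completion time."""
    best, second = sorted(completion_times)[:2]
    return second - best

# Hypothetical completion times C[i][j] of three tasks on two resources.
C = [[4.0, 5.0],
     [3.0, 9.0],
     [6.0, 6.5]]

values = [suffrage(row) for row in C]
# The task that would suffer most if denied its best resource is served first.
priority = max(range(len(C)), key=lambda i: values[i])
```

Here task 1 has the largest suffrage value (6.0), so it would be mapped before the other two.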

2) Max-Min
Max-Min assigns the task with the maximum expected completion time to the corresponding resource [9].
The Max-Min algorithm is given below.

Algorithm 1: Max-Min Algorithm
The algorithm takes $m$ resources $R_j$ ($R_1, R_2, \ldots, R_m$) and maps $n$ tasks $T_i$ ($T_1, T_2, \ldots, T_n$) onto these resources. The expected execution time $E_{ij}$ of task $T_i$ on resource $R_j$ is defined as the time $R_j$ requires to finish $T_i$, provided that $R_j$ has no load when the assignment occurs.
The expected completion time $C_{ij}$ of task $T_i$ on resource $R_j$, on the other hand, is defined as the overall time consumed until $T_i$ finishes, including any tasks previously assigned to $R_j$. Let $r_j$ denote the time at which execution of task $T_i$ begins on $R_j$. It follows that $C_{ij} = E_{ij} + r_j$.
The make-span of the complete schedule is defined as $\max(C_i)$, where $C_i$ is the completion time of task $T_i$ [5].
Here task $T_m$ has the maximum expected completion time, and it is chosen to be assigned to the corresponding resource $R_j$ that provides the minimum execution time.
Make-span is a measure of the throughput of a heterogeneous computing system such as the cloud computing environment [9], [10].
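The Max-Min procedure can be sketched as follows. This is one common reading, stated here as an assumption: each task's best (minimum) completion time $C_{ij} = E_{ij} + r_j$ is computed first, the task whose best completion time is largest is selected, and it goes to the resource that yields that completion time. The names `max_min`, `E` and `ready`, and the example matrix, are illustrative only.

```python
from typing import Dict, List

def max_min(E: List[List[float]]) -> Dict[int, int]:
    """Return {task index: resource index} for an execution-time matrix E[i][j]."""
    n_resources = len(E[0])
    ready = [0.0] * n_resources          # r_j: time at which R_j becomes free
    unscheduled = set(range(len(E)))     # the meta-task
    mapping: Dict[int, int] = {}
    while unscheduled:
        # Best resource and completion time C_ij = E_ij + r_j for every task.
        best = {}
        for i in unscheduled:
            j = min(range(n_resources), key=lambda j: E[i][j] + ready[j])
            best[i] = (E[i][j] + ready[j], j)
        # Max-Min: pick the task whose best completion time is the largest.
        t = max(sorted(unscheduled), key=lambda i: best[i][0])
        c, j = best[t]
        mapping[t] = j
        ready[j] = c                     # update r_j for the chosen resource
        unscheduled.remove(t)
    return mapping

# Hypothetical example: three tasks, two resources (resource 0 is twice as fast).
E = [[2.0, 4.0],
     [3.0, 6.0],
     [10.0, 20.0]]
mapping = max_min(E)
```

In this example the long task occupies the fast resource while the two short tasks run on the slower one, which is the situation in which Max-Min tends to do well.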

3) Min-Min
Min-Min assigns the task with the minimum expected completion time to the corresponding resource [9].
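A compact sketch of Min-Min, under the assumption that the task and resource pair with the smallest completion time is chosen at every step. The function name and the execution-time matrix are hypothetical.

```python
from typing import List, Tuple

def min_min(E: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return the (task, resource) assignment order and the resulting makespan."""
    ready = [0.0] * len(E[0])            # time at which each resource becomes free
    pending = list(range(len(E)))
    order = []
    while pending:
        # Smallest completion time over all pending tasks and all resources.
        c, i, j = min((E[i][j] + ready[j], i, j)
                      for i in pending for j in range(len(ready)))
        order.append((i, j))
        ready[j] = c
        pending.remove(i)
    return order, max(ready)

E = [[2.0, 4.0], [3.0, 6.0], [10.0, 20.0]]   # hypothetical execution times
order, makespan = min_min(E)
```

With these numbers every task ends up on the fast resource and the long task runs last, which shows how Min-Min can leave slower resources idle when task lengths differ widely.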

4) QoS Guided Min-Min
QoS Guided Min-Min [11] adds a QoS constraint (the QoS of a network is given by its bandwidth) to the basic Min-Min heuristic. The basic idea is that some tasks may require high network bandwidth while others can be satisfied with low network bandwidth. Thus, it assigns tasks with high QoS requests first, according to the Min-Min heuristic.

5) QoS priority grouping scheduling
QoS priority grouping scheduling, proposed by F. Dong et al. [12], is similar to QoS Guided Min-Min. The algorithm considers two major factors: a) the deadline and acceptance rate of the tasks; and b) the makespan of the whole system for task scheduling. Compared to Min-Min and QoS Guided Min-Min, it achieves a better acceptance rate and completion time.

6) Segmented Min-Min
In the Segmented Min-Min heuristic described in [13], tasks are first ordered by their expected completion times. The ordered sequence is then segmented, and finally Min-Min is applied to these segments. This heuristic works better than Min-Min when the lengths of the tasks differ dramatically, because it gives longer tasks a chance to be executed earlier than they would be under the original Min-Min.

7) Improved Max-Min
In the Improved Max-Min algorithm, the largest job is selected and assigned to the resource which gives the minimum completion time [14].

8) Enhanced Max-Min
Here, a task whose execution time is just greater than the average execution time is selected and assigned to the resource which gives the minimum completion time [15].
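The difference between the selection rules of Improved and Enhanced Max-Min can be sketched as below. Task length is used as a stand-in for execution time, and the fallback for the case where no task lies above the average is an assumption of this sketch, not something stated in [15]. The lengths are those of the scenario used later in this paper.

```python
from typing import List

def pick_enhanced(lengths: List[float]) -> float:
    """Pick the task whose length is the smallest one above the average."""
    avg = sum(lengths) / len(lengths)
    above = [x for x in lengths if x > avg]
    # ASSUMPTION: fall back to the largest task if none lies strictly above the average.
    return min(above) if above else max(lengths)

lengths = [1100, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 800]
first_improved = max(lengths)            # Improved Max-Min starts with the largest task
first_enhanced = pick_enhanced(lengths)  # Enhanced Max-Min starts just above the average
```

For these lengths the average is 280, so Improved Max-Min starts with the task of length 1100 while the Enhanced rule starts with the task of length 800.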

9) Resource Aware Scheduling Algorithm
The algorithm presented in [16] is a combination of Max-Min and Min-Min. It covers the disadvantages of both algorithms and uses their advantages.

10) Reliable Scheduling Distributed in Cloud
RSDC [17] is another batch-mode scheduling process that uses processing time as the scheduling factor. It subtracts the request and acknowledge times from the ultimate time in each processor.
The rest of this paper is organized as follows. Section 3 (Proposed Methodology) explains our modification of max-min in detail. Section 4 (Experimental Result) presents the implementation of our algorithm in CloudSim and an analysis of our findings. Section 5 (Result Comparison) compares the results with existing algorithms. Section 6 (Conclusion and Future Work) summarizes our work and lists concerns to address in the future.

III. PROPOSED METHODOLOGY
For reference, the steps of Algorithm 1 (Max-Min) are:
Step 1: For all submitted tasks $T_i$ in the meta-task
Step 2: For all resources $R_j$
Step 3: Compute $C_{ij} = E_{ij} + r_j$
Step 4: While the meta-task is not empty
Step 5: Find the task $T_m$ that consumes the maximum completion time.
Step 6: Assign task $T_m$ to the resource $R_j$ with the minimum execution time.
Step 7: Remove the task $T_m$ from the meta-task set
Step 8: Update $r_j$ for the selected $R_j$
Step 9: Update $C_{ij}$ for all $T_i$
Reviewing max-min and the other batch-mode heuristic algorithms, it can be seen that tasks are always allocated according to their respective lengths or task sizes. Max-min works best when there are a few long tasks and many short tasks, because the long tasks can be executed on one resource while the short tasks run concurrently on other resources. However, the max-min algorithm does not work well for cloudlets of variable length. To overcome this problem, we use the idea of clustering in our proposed method. If we can create groups of cloudlets based on their characteristics, then we can try to allocate those groups according to different SLAs. In this paper, cloudlet length is used to create the clusters. The number of clusters can be the number of resources. Clusters can be created with different approaches, such as the K-means clustering algorithm [18], CURE [19] and FCM [20].
Here we use the standard deviation of the cloudlet lengths to create the clusters.
Next, each cluster is processed separately in simulation to find which cluster takes the longest time to execute. This gives that cluster enough priority to be completed first, given that the batch contains cloudlets of different lengths.
After each cluster has been simulated, the cluster that consumes the most time is scheduled to the VMs using the improved max-min algorithm. Subsequently, the cluster with the next highest time consumption is scheduled on the VMs. This process continues until there are no clusters left to schedule.
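The proposed algorithm (Algorithm 2) is as follows:
1. Populate the list of tasks T
2. Find the average length of the tasks
3. Find the standard deviation of the tasks
4. Find the number of clusters in the standard deviation by dividing the standard deviation into as many parts as there are VMs
5. Place each task in the list T into a specific cluster by finding the minimum distance between the cluster standard deviation and the task length
6. Simulate each task cluster to find the cluster with the highest make-span
7. Choose the cluster with the highest make-span among the batch of clusters
   a. For all submitted tasks $T_i$ in the meta-task
   b. For all resources $R_j$
   c. Compute $E_{ij}$ based on cloudlet lengths and VMs
   d. Compute $C_{ij} = E_{ij} + r_j$
   e. While the meta-task is not empty
   f. Find the task $T_m$ that consumes the maximum execution time
   g. Assign task $T_m$ to the resource $R_j$ with the minimum completion time
   h. Remove the task $T_m$ from the meta-task set
   i. Update $r_j$ for the selected $R_j$
   j. Update $C_{ij}$ for all $T_i$
8. If there are unprocessed clusters in the batch, go to step 7
9. End of algorithm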

C. Flowchart of the Proposed Algorithm
The flowchart in Fig. 1 shows the stepwise process of the algorithm. A simulation of the algorithm for a given scenario is shown below.

D. Scenario for Simulation
Suppose we have 12 cloudlets to be scheduled to the VMs. The respective lengths of the cloudlets are as follows: {1100, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 800}

IV. EXPERIMENTAL RESULT

A. Calculation of Average and Standard Deviation of Tasks
The average length of the tasks is calculated using the simple formula:

$$\text{Average} = \frac{1}{s} \sum_{i=1}^{s} \text{Length}_i$$

Thus the average in our scenario is 280. The standard deviation can be calculated using the following formula:

$$SD = \sqrt{\frac{1}{s} \sum_{i=1}^{s} \left(\text{Length}_i - \text{Average}\right)^2}$$

where $s$ is the number of cloudlets and $\text{Length}_i$ is the length of cloudlet $i$, i.e. the number of instructions of that cloudlet.
Thus the standard deviation of the given scenario is 307.083051.
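The two values quoted above can be checked with a few lines of Python. The sketch uses the population form of the standard deviation (division by $s$), since that is the form that reproduces 307.083051 for this scenario; the variable names are illustrative.

```python
import math

lengths = [1100, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 800]
s = len(lengths)                      # number of cloudlets
average = sum(lengths) / s
# Population standard deviation: divide the squared deviations by s, not s - 1.
sd = math.sqrt(sum((x - average) ** 2 for x in lengths) / s)
print(average, round(sd, 6))
```

The sum of the lengths is 3360, so the average is 280, and the squared deviations sum to 1131600, which gives a variance of 94300.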

B. Creating Clusters on the Basis of Standard Deviation
Now we need to divide our sample tasks into clusters that will be scheduled to the VMs. In our scenario we create three clusters because we have three VMs. If the number of VMs increases, so does the number of clusters. Thus we divide our SD into three equal parts. We now have three clusters, each of which contains tasks of similar size. We are ready to simulate how much time the three clusters need to finish by calculating their estimated execution times, completion times and waiting times.
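The clustering step could look roughly like the sketch below. The exact boundary rule is not spelled out in the text, so the sketch makes an explicit assumption: the SD is divided into equal parts, the reference points SD/3, 2·SD/3 and SD are formed, and each task joins the reference point nearest to its length. The function name `cluster_by_sd` is hypothetical, and the resulting membership is not claimed to match Table 1.

```python
import math
from typing import List

def cluster_by_sd(lengths: List[float], n_vms: int) -> List[List[float]]:
    """Group task lengths around n_vms reference points spaced SD / n_vms apart."""
    avg = sum(lengths) / len(lengths)
    sd = math.sqrt(sum((x - avg) ** 2 for x in lengths) / len(lengths))
    # ASSUMPTION: reference points at SD/n, 2*SD/n, ..., SD (one per VM).
    points = [sd * (k + 1) / n_vms for k in range(n_vms)]
    clusters: List[List[float]] = [[] for _ in range(n_vms)]
    for x in lengths:
        # Minimum distance between the task length and a reference point.
        nearest = min(range(n_vms), key=lambda k: abs(points[k] - x))
        clusters[nearest].append(x)
    return clusters

lengths = [1100, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 800]
clusters = cluster_by_sd(lengths, n_vms=3)
```

Under this assumption the two long cloudlets (1100 and 800) fall into the last cluster, the cloudlets from 160 to 200 into the middle one, and the remaining short cloudlets into the first.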

C. Calculation of Estimated Makespan for Each Cluster
Now we simulate each cluster to see which one gives the maximum make-span. We schedule the cluster that has the highest make-span and remove all tasks of that cluster from our set of cloudlets.
For our scenario, Table 1 gives the make-span of each cluster together with the start time, execution time and finish time of each task and the id of the VM on which the task was executed, as determined with the help of CloudSim. We choose for scheduling the cluster that has the highest make-span among all three clusters, and we go on selecting the cluster with the highest make-span until all clusters are scheduled.

D. Scheduling of tasks of a cluster
Algorithm 3: Cluster Based Max-Min Scheduling Algorithm for each cluster
Next, the tasks within a cluster are scheduled according to Algorithm 3.
This algorithm ensures that a task $T_i$ is assigned to a new VM such that the overall make-span of all the VMs remains at a minimum. That is, the task is assigned to a new VM only if its make-span on the new VM is less than the make-span it would have if it were assigned to the previous VM instead.
In the given scenario, cluster 3, which has the highest make-span, should be executed first to ensure that the fastest VM becomes free sooner than the other VMs. The reason is that, while each cluster is processed, the task with the highest execution time is set to complete as fast as it can. Thus we use the fastest resources for the longest cloudlets, which helps greatly in executing the larger tasks at hand properly rather than clogging the fastest resource with smaller, faster tasks.
For our scenario, the start time, finish time and total operation time are given in Table 2.
The steps of Algorithm 3 are:
Step 1: For all submitted tasks $T_i$ in the meta-task
Step 2: For all resources $R_j$
Step 3: Compute $E_{ij}$ based on cloudlet lengths and VMs
Step 4: Compute $C_{ij} = E_{ij} + r_j$
Step 5: While the meta-task is not empty
Step 6: Find the task $T_m$ that consumes the maximum execution time.
Step 7: Assign task $T_m$ to the resource $R_j$ with the minimum completion time.
Step 8: Remove the task $T_m$ from the meta-task set
Step 9: Update $r_j$ for the selected $R_j$
Step 10: Update $C_{ij}$ for all $T_i$
As seen above, cluster 3 is executed first and ends on VM 0 (the fastest resource) with a finish time of 6.43 seconds. This means the next task scheduled on VM 0 can start at 6.43 seconds. The other two VMs can compute all of the cluster 2 tasks within 5.31 seconds. The results show that we have used the comparatively slower resources to execute the smaller, faster tasks, which results in proper utilization of the VMs. Finally, the task scheduling ends with VM 0 having a finish time of 7.79 seconds, VM 1 7.01 seconds and VM 2 7.89 seconds.
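The per-cluster scheduling and the carry-over of VM ready times between clusters can be sketched as below. Two assumptions are made and should be read as such: execution time is taken as cloudlet length divided by VM MIPS, and the cluster memberships are illustrative rather than copied from Table 1. The function name `schedule_cluster` is hypothetical.

```python
from typing import List, Tuple

def schedule_cluster(cluster: List[float], mips: List[float],
                     ready: List[float]) -> List[Tuple[float, int, float]]:
    """Schedule one cluster; `ready` (r_j per VM) is updated in place."""
    plan = []
    for length in sorted(cluster, reverse=True):     # longest task first
        # C_ij = E_ij + r_j, with E_ij ASSUMED to be length / MIPS.
        finish = [length / m + r for m, r in zip(mips, ready)]
        vm = min(range(len(mips)), key=lambda j: finish[j])
        ready[vm] = finish[vm]                       # the VM is busy until then
        plan.append((length, vm, finish[vm]))
    return plan

mips = [300, 100, 50]                 # VM speeds from the scenario
ready = [0.0, 0.0, 0.0]
# ASSUMPTION: illustrative cluster contents, scheduled in the order the text describes.
plan3 = schedule_cluster([1100, 800], mips, ready)
plan2 = schedule_cluster([160, 170, 180, 200], mips, ready)
```

Under these assumptions both long cloudlets run on VM 0, which becomes free at about 6.33 seconds, and the second cluster finishes on the two slower VMs at about 5.30 seconds. These figures are close to, but not identical with, the 6.43 and 5.31 seconds reported above; the gap presumably reflects simulator overheads that the sketch does not model.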

V. RESULT COMPARISON
In evaluating our results against existing systems, we compare them with several algorithms: Max-Min, Min-Min, Improved Max-Min and Enhanced Max-Min.

A. Result of Improved Max-Min on Given Scenario
We applied the improved max-min algorithm to the given scenario. The results of the simulation are given in Table 3. Comparison with this algorithm alone shows that the make-span of the new algorithm is better than that of the improved max-min algorithm.
A mere (8.17 - 7.79) = 0.38 seconds at VM 0 might not seem a significant improvement. However, this VM is the fastest in the given scenario, and a fraction of a second on the most powerful VM can outperform several slower VMs. Thus, freeing the most powerful VM sooner means the next batch of tasks can be scheduled to the VMs sooner than with the traditional algorithms.

B. Comparison with Improved and Enhanced Max-Min
For the same scenario, the make-span of each algorithm is given in Table 4. The comparison chart between the traditional algorithms (Enhanced Max-Min, Improved Max-Min) and the proposed Cluster based Max-Min scheduling is shown in Fig. 2.

VI. CONCLUSION AND FUTURE WORK
This paper concentrates on the problem of scheduling tasks to VMs effectively in a dynamic manner. The main problem in scheduling tasks to a VM is the diversity in the size of the tasks that arrive for scheduling. The proposed algorithm effectively clusters cloudlets of similar size and schedules them together. As a result, the tasks with the highest make-span are completed as quickly as possible, ensuring that the fastest VMs are freed as soon as possible. This results in the execution of a higher number of tasks in a shorter span of time. Even if the tasks are highly diverse, this algorithm will never perform worse than the improved max-min algorithm.
On comparative analysis, this algorithm can outperform any traditional algorithm in average-case scenarios, and no algorithm can perform better than the proposed algorithm in worst-case scenarios.
In the future, other techniques (K-means clustering, Fuzzy C-means clustering) will be used for clustering, and the proposed algorithm will be compared against metaheuristic and evolutionary algorithms to show its effectiveness. Larger datasets of cloudlets and VMs will also be used to elaborate the findings of the ongoing research.

Algorithm 2: Proposed Algorithm for Cluster based Max-Min Scheduling
1. Populate list of tasks T
2. Find average length of tasks
3. Find standard deviation of tasks
4. Find number of clusters in standard deviation by dividing the standard deviation into as many parts as there are VMs
5. Place each task in the list T into a specific cluster by finding the minimum distance between the cluster standard deviation and the task length

Fig. 1. Flowchart of proposed algorithm.
The three VMs in our scenario have the following highest allocable MIPS: {300, 100, 50}. All of the VMs in the scenario have a 1-core processor, 1000 Mb of bandwidth and 512 Mb of RAM. The complete process of allocating the tasks to the VMs is simulated in the experimental results section.
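To make the effect of the VM speeds concrete, the expected execution times can be tabulated as in the sketch below. It assumes that $E_{ij}$ is simply cloudlet length divided by VM MIPS, ignoring bandwidth, RAM and any simulator overhead; that relation is an assumption of this sketch and is not stated explicitly in the text.

```python
lengths = [1100, 100, 110, 120, 130, 140, 150, 160, 170, 180, 200, 800]
mips = [300, 100, 50]   # VM 0, VM 1, VM 2

# ASSUMPTION: E[i][j] = length_i / MIPS_j (seconds), nothing else is modeled.
E = [[length / m for m in mips] for length in lengths]

# Execution times of the longest cloudlet (length 1100) on each VM.
longest_on_each_vm = [round(t, 2) for t in E[0]]
```

Under this assumption the longest cloudlet needs about 3.67 seconds on VM 0 but 11 and 22 seconds on VM 1 and VM 2, which is why keeping the fastest VM available for long cloudlets matters.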

TABLE I. SIMULATION OF EACH OF THE CLUSTERS

TABLE III. OPERATION TIME OF IMPROVED MAX-MIN ALGORITHM

TABLE IV. COMPARISON CHART OF IMPROVED, ENHANCED AND PROPOSED ALGORITHM