Genetic-Based Task Scheduling Algorithm in Cloud Computing Environment

Cloud computing is now widely used by companies and enterprises, but its adoption raises several challenges. The main one is resource management: Cloud computing provides IT resources (e.g., CPU, memory, network, and storage) based on the virtualization concept and the pay-as-you-go principle, and the management of these resources has been the topic of much research. In this paper, a task scheduling algorithm based on the Genetic Algorithm (GA) is introduced for allocating and executing an application's tasks. The aim of the proposed algorithm is to minimize the completion time and cost of tasks and to maximize resource utilization. The performance of the proposed algorithm has been evaluated using the CloudSim toolkit.

Keywords: Cloud computing; Task Scheduling; Genetic Algorithm; Optimization Algorithm


I. INTRODUCTION
Due to the development of virtualization and Internet technologies, Cloud computing has emerged as a new computing platform [1]. Cloud computing can be defined as a type of distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned. It provides one or more consolidated computing resources based on service-level agreements (SLAs) between service providers and service consumers [2].
Cloud computing faces several challenges (e.g., security, performance, resource management, and reliability) [3]. One of the resource management issues is task scheduling. Task scheduling in Cloud computing refers to allocating users' tasks to the available resources to improve task execution and increase resource utilization [4].
As the allocation of Cloud resources is based on the SLA, the task execution cost is considered one of the main performance parameters of a task scheduling algorithm [5]. Task scheduling is also a complex process because a large number of tasks must be scheduled onto the available resources. In addition, many parameters should be taken into consideration when developing a task scheduling algorithm. Some of these parameters are important from the Cloud user's perspective (i.e., task completion time, cost, and response time). Others are important from the Cloud provider's perspective (i.e., resource utilization, fault tolerance, and power consumption) [6].
The task scheduling problem is considered an NP-complete problem. Therefore, optimization approaches can be used to solve it by considering performance parameters such as completion time, cost, and resource utilization [6]. The aim of this paper is to develop a task scheduling algorithm for the Cloud computing environment, based on the Genetic Algorithm, for allocating and executing independent tasks so as to improve task completion time, decrease the execution cost, and maximize resource utilization.
The rest of the paper is organized as follows. Section 2 discusses the related work. In Section 3, the principles of the modified GA-based task scheduling are described. The configuration of the CloudSim simulator, the implementation of the proposed Genetic Algorithm, and the performance evaluation are discussed in Section 4. Finally, the conclusion and future work are given in Section 5.

II. RELATED WORK
In recent years, the problem of task scheduling in distributed environments has caught the attention of researchers. The main issue is the execution time, which should be minimized. Scheduling of tasks is also considered a critical issue in the Cloud computing environment, where different factors must be considered, such as completion time, the total cost of executing all users' tasks, resource utilization, power consumption, and fault tolerance.
GE Junwei [6] has presented a static genetic algorithm that considers total task completion time, average task completion time, and a cost constraint.
One of the scheduling issues is allocating the correct resource to arriving tasks. The dynamic scheduling process becomes complex if several tasks arrive at the same time. Therefore, S. Ravichandran and D. E. Naganathan [7] have introduced a system that avoids this problem by letting arriving tasks wait in a queue, which the scheduler recomputes and sorts. Scheduling is then done by taking the first task from the queue and allocating it to the best-fitting resource using GA. The objective of this system is to maximize resource utilization and reduce execution time. R. Kaur and S. Kinger [5] have proposed a task scheduling algorithm based on an enhanced GA. They use a new fitness function based on mean and grand mean values, and they claim that this algorithm can be applied to both task and resource scheduling. A comparative study of three task scheduling algorithms in the Cloud computing environment (round-robin, pre-emptive priority, and shortest remaining time first) has been done in [8].
V. V. Kumar and S. Palaniswami [9] have introduced a study focusing on increasing the efficiency of the task scheduling algorithm for real-time Cloud computing services. Additionally, they have introduced an algorithm to utilize the turnaround time by assigning high priority to tasks with an early completion time and lower priority to abortion issues of real-time tasks. Z. Zheng, et al. [10] have proposed a GA-based algorithm, called the Parallel Genetic Algorithm (PGA), to deal with the scheduling problem in the Cloud computing environment and to achieve optimization or sub-optimization of Cloud scheduling problems mathematically. Furthermore, one of the main goals of task scheduling from the perspective of a Cloud provider is to maximize profit by utilizing resources efficiently. Therefore, K. Thyagaarajan, et al. [11] have introduced a task scheduling model for the Cloud computing environment that aims at an effective gain of profits for the Cloud computing service provider.
In [12], S. Singh has provided an elaborate overview of GA by introducing several variants for task scheduling in the Cloud computing environment. He has introduced an algorithm that solves the task scheduling problem by modifying GA so that the initial population is generated by the Max-Min approach, in order to obtain better results in terms of makespan.

III. THE PROPOSED GENETIC-BASED TASK SCHEDULING ALGORITHM
The Cloud provider should guarantee optimal scheduling of users' tasks in the Cloud computing environment according to the SLA. At the same time, the provider should guarantee the best throughput and good utilization of the Cloud resources.
Generally, as the number of users' tasks increases, the complexity of scheduling these tasks in the Cloud computing environment increases as well. Therefore, the Cloud provider needs a good algorithm to schedule the users' tasks on the Cloud so as to satisfy QoS, minimize makespan, and guarantee good utilization of the Cloud resources [5]. Task scheduling is therefore classified as an optimization problem. Fig. 1 illustrates the task scheduling process, where each user submits the tasks of an application and the Cloud provider uses appropriate approaches to schedule these tasks by considering optimization parameters such as minimum makespan, resource utilization, and minimum cost.
This optimization problem can be solved using a heuristic algorithm such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), or Ant Colony Optimization (ACO).
In this work, the proposed task scheduling algorithm for the Cloud environment is based on the default GA with some modifications. According to these modifications, the parents are considered in each population alongside the children produced by the crossover process. Also, Tournament Selection is used to select the best chromosomes in order to overcome the limitation of the population size. Therefore, the proposed algorithm is called the Tournament Selection Genetic Algorithm (TS-GA).

A. Genetic Algorithm
The Genetic Algorithm (GA) is based on the biological concept of generating a population. GA is considered a rapidly growing area of Artificial Intelligence [1] [2]. Genetic Algorithms (GAs) were inspired by Darwin's theory of evolution. Following Darwin's theory, the principle of "survival of the fittest" is used as the method of scheduling: tasks are assigned to resources according to the value of the fitness function for each parameter of the task scheduling process [13]. The main principles of the GA are described as follows [1] [2]:

1) Initial Population
The initial population is the set of all individuals that are used in the GA to find the optimal solution. Every solution in the population is called an individual, and every individual is represented as a chromosome to make it suitable for the genetic operations. Individuals are selected from the initial population, and operations are applied to them to form the next generation. The mating chromosomes are selected based on specific criteria.

2) Fitness Function
The productivity of any individual depends on its fitness value, which measures the superiority and performance of that individual in the population. Individuals therefore survive or die out according to their fitness value. Hence, the fitness function is the motivating factor in the GA.

3) Selection
The selection mechanism is used to select an intermediate solution for the next generation based on Darwin's law of survival. This operation guides the GA based on performance. There are various selection strategies for selecting the best chromosomes, such as roulette wheel, Boltzmann strategy, tournament selection, and rank-based selection.

4) Crossover
The crossover operation is performed by selecting two parent individuals and then creating a new individual by alternating and recombining parts of those parents. Crossover (hybridization) is a guiding process in the GA, and it boosts the search mechanism.

5) Mutation
After crossover, mutation takes place. Mutation is the operator that introduces genetic diversity into the population. It is needed whenever the population tends to become homogeneous due to repeated use of the reproduction and crossover operators. It occurs during evolution according to a user-defined mutation probability, which is usually set fairly low. Mutation alters one or more gene values in the chromosome from their initial state, which can add entirely new gene values to the gene pool. With these new gene values, the genetic algorithm may be able to produce a better solution than was previously possible (see Fig. 2) [1].
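The mutation operator described above can be illustrated with a minimal Python sketch. It assumes a chromosome is a list in which position i holds the index of the VM assigned to task i; the function name `mutate`, the integer encoding, and the default rate of 0.05 are illustrative assumptions, not values taken from this paper.

```python
import random


def mutate(chromosome, num_vms, mutation_rate=0.05, rng=random):
    """Return a mutated copy of a chromosome.

    Assumption: chromosome[i] is the index of the VM that runs task i.
    Each gene is independently reassigned to a random VM with
    probability `mutation_rate` (a user-defined, usually low, value).
    """
    mutated = list(chromosome)  # copy, so the parent is left untouched
    for i in range(len(mutated)):
        if rng.random() < mutation_rate:
            mutated[i] = rng.randrange(num_vms)
    return mutated
```

A low rate keeps most of the parent's genes intact while still allowing new VM assignments to enter the gene pool.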

B. The Proposed Tournament Selection Genetic Algorithm (TS-GA)
In this work, a modified GA is proposed to solve the task scheduling problem in the Cloud computing environment. It aims to improve the completion time for executing all tasks on the VMs while, at the same time, minimizing the total cost of resource usage and maximizing resource utilization. The main idea of the proposed algorithm (TS-GA) is that, after each selection, there may be a solution with a good fitness value that is not selected for the crossover process. In the proposed algorithm, such a solution is not removed from the population; instead, it is kept and added to the population when the next iteration starts. This step is considered useful because some iterations can generate the best solution.

1) Initialize Population
According to the proposed TS-GA algorithm, the population is randomly generated using binary encoding (0, 1). The representation of a solution in task scheduling, for each gene (or chromosome), consists of a VM ID and the IDs of the tasks to be executed on that VM (see Fig. 3).
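As a rough illustration of this representation, the following Python sketch generates a random population and groups task IDs by VM. For readability it stores the VM index of each task as an integer rather than the binary encoding mentioned above; that substitution, and all function names, are assumptions made for this example.

```python
import random


def random_chromosome(num_tasks, num_vms, rng=random):
    """One candidate schedule: gene i is the VM index assigned to task i."""
    return [rng.randrange(num_vms) for _ in range(num_tasks)]


def initial_population(pop_size, num_tasks, num_vms, rng=random):
    """Randomly generate `pop_size` candidate schedules."""
    return [random_chromosome(num_tasks, num_vms, rng) for _ in range(pop_size)]


def tasks_per_vm(chromosome, num_vms):
    """Group task IDs by VM ID, i.e. a 'VM ID -> list of task IDs' view."""
    groups = {vm: [] for vm in range(num_vms)}
    for task, vm in enumerate(chromosome):
        groups[vm].append(task)
    return groups
```

For example, `tasks_per_vm([0, 2, 2, 1], 3)` groups task 0 under VM 0, task 3 under VM 1, and tasks 1 and 2 under VM 2.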

2) The Fitness Function Representation
The main objective of task scheduling in Cloud computing is to reduce the completion time for executing all tasks on the available resources. Therefore, with $CT_{ij}$ as the completion time of task $T_i$ on $VM_j$, the maximum completion time is defined using equation (1) [15]:

$$CT_{\max} = \max \{\, CT_{ij} \mid i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m \,\} \qquad (1)$$

where $CT_{ij}$ denotes the maximum time to complete Task $i$ on $VM_j$, and $n$ and $m$ denote the number of tasks and virtual machines, respectively. Therefore, to reduce the completion time, denoted $CT_{\max}$, the execution time of each task on each virtual machine must be calculated for scheduling purposes. If the processing speed of virtual machine $VM_j$ is $PS_j$, then the processing time for task $P_i$ can be calculated by equation (2) [15]:

$$P_{ij} = \frac{C_i}{PS_j} \qquad (2)$$

where $P_{ij}$ is the processing time for task $P_i$ on $VM_j$ and $C_i$ is the computational complexity of task $P_i$.

3) Selection
In tournament selection, two individuals are chosen at random from the population, and a random number r between 0 and 1 is drawn. If r is smaller than a threshold k (a parameter, for example, 0.75), the fitter of the two individuals is selected to be a parent; otherwise the less fit individual is selected. The non-chosen individuals are then returned to the original population and could be selected again.
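The fitness evaluation and the tournament step can be sketched together in Python. The sketch assumes that tasks assigned to the same VM run one after another, so a VM's completion time is the sum of complexity divided by speed over its tasks, and that lower makespan means fitter. The names `makespan` and `tournament_select`, the integer chromosome encoding, and the sample values are hypothetical.

```python
import random


def makespan(chromosome, complexity, speed):
    """Maximum completion time over all VMs for one schedule.

    Assumptions: chromosome[i] is the VM index of task i, each task
    takes complexity[i] / speed[vm] time units, and tasks on the same
    VM execute sequentially.
    """
    busy = [0.0] * len(speed)
    for task, vm in enumerate(chromosome):
        busy[vm] += complexity[task] / speed[vm]
    return max(busy)


def tournament_select(population, fitness, k=0.75, rng=random):
    """Binary tournament: pick two individuals at random and return the
    fitter one with probability k, otherwise the less fit one.
    Lower fitness (makespan) is treated as better."""
    a, b = rng.sample(population, 2)
    fitter, weaker = (a, b) if fitness(a) <= fitness(b) else (b, a)
    return fitter if rng.random() < k else weaker


# Usage with made-up numbers: three tasks on two equally fast VMs.
complexity = [10, 20, 30]
speed = [10, 10]
schedule = [0, 0, 1]
print(makespan(schedule, complexity, speed))  # 3.0
```

Both individuals stay in the population after the tournament, so either can be drawn again in a later tournament.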

4) Crossover
In the proposed TS-GA algorithm, the crossover differs from the one used in the original GA. The two chromosomes that are selected for the crossover process to generate two offspring are themselves also considered as offspring. The proposed crossover therefore produces four children (see Fig. 4), from which the two best are chosen.
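A minimal Python sketch of this four-candidate crossover follows. It assumes a single-point crossover and a fitness function where lower is better; the paper does not state which crossover operator is used, so both choices and the function names are assumptions.

```python
import random


def single_point_crossover(p1, p2, rng=random):
    """Swap the tails of two equal-length parents (length >= 2)."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def four_candidate_crossover(p1, p2, fitness, rng=random):
    """Treat both parents and both offspring as candidate children
    and keep the two with the best (lowest) fitness."""
    c1, c2 = single_point_crossover(p1, p2, rng)
    ranked = sorted([p1, p2, c1, c2], key=fitness)
    return ranked[0], ranked[1]
```

Because the parents compete with their own offspring, a crossover that produces two poor children cannot displace a good parent.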

5) Initialize Subpopulation
After each iteration, the subpopulations (i.e., the new populations produced by crossover) are added to the old populations (i.e., the parents). This step can enhance the diversity of the population. This idea is introduced by the modified TS-GA algorithm.

6) Keep Best Solution
There may be a solution with a good fitness value that is not selected during the crossover process. In the proposed TS-GA algorithm, such a solution is not removed from the population; instead, it is kept and added to the population when the next iteration starts. This step is considered useful because some iterations can generate the best solution.
In summary, the modified TS-GA algorithm introduces the following set of modifications:
• Tournament selection is used instead of the roulette wheel in the selection process to select the best solution.
• The solutions not chosen in the selection process are kept and added to the new population. This might help in generating the best solution in later generations.
• A new crossover is introduced in which the parent individuals are also considered as new children (see Fig. 4).
• After each iteration, the subpopulations (i.e., the new populations produced by crossover) are added to the old populations (i.e., the parents).
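The modifications listed above can be combined into one generation loop. The Python sketch below is only one possible reading of TS-GA: the integer encoding, single-point crossover, per-gene mutation, truncation of the merged population back to `pop_size`, and every default parameter value are assumptions for illustration and are not specified in this paper.

```python
import random


def makespan(chromosome, complexity, speed):
    """Maximum VM completion time; tasks on one VM run sequentially."""
    busy = [0.0] * len(speed)
    for task, vm in enumerate(chromosome):
        busy[vm] += complexity[task] / speed[vm]
    return max(busy)


def ts_ga(complexity, speed, pop_size=20, generations=50, k=0.75,
          mutation_rate=0.05, seed=0):
    """Sketch of a tournament-selection GA for task-to-VM assignment.

    Requires at least two tasks and pop_size >= 2.
    Returns (best_chromosome, best_makespan).
    """
    rng = random.Random(seed)
    n, m = len(complexity), len(speed)

    def fitness(chromosome):
        return makespan(chromosome, complexity, speed)

    population = [[rng.randrange(m) for _ in range(n)]
                  for _ in range(pop_size)]

    def select():
        # Binary tournament: fitter individual wins with probability k.
        a, b = rng.sample(population, 2)
        fitter, weaker = (a, b) if fitness(a) <= fitness(b) else (b, a)
        return fitter if rng.random() < k else weaker

    for _ in range(generations):
        offspring = []
        for _ in range(pop_size // 2):
            p1, p2 = select(), select()
            point = rng.randrange(1, n)
            c1 = p1[:point] + p2[point:]
            c2 = p2[:point] + p1[point:]
            # Parents count as children too: keep the best two of four.
            for child in sorted([p1, p2, c1, c2], key=fitness)[:2]:
                offspring.append([rng.randrange(m)
                                  if rng.random() < mutation_rate else gene
                                  for gene in child])
        # Add the subpopulation to the old population, so unselected
        # solutions survive; truncation to pop_size is an assumption.
        population = sorted(population + offspring, key=fitness)[:pop_size]

    best = population[0]
    return best, fitness(best)


if __name__ == "__main__":
    # Made-up workload: six tasks, two VMs of equal speed.
    best, span = ts_ga([40, 30, 20, 10, 10, 10], [10, 10])
    print(best, span)
```

Because the old population is merged with the offspring before truncation, the best makespan found so far can never get worse from one generation to the next.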

IV. PERFORMANCE EVALUATION
In this section, the experimental evaluation of the proposed TS-GA algorithm against the original GA and Round-Robin algorithms is presented, starting with a description of the experimental environment.

A. The Experimental Environment
The CloudSim toolkit, released by the Cloud Computing and Distributed Systems Laboratory at the University of Melbourne, helps researchers simulate a Cloud computing environment [17]. In other words, it provides features for modeling and simulating Cloud computing environments. In CloudSim, the user submits requests in the form of cloudlets. Each cloudlet has properties such as file size and the number of instructions to be executed. These cloudlets are submitted to the broker, which schedules them onto VMs according to the scheduling policy. An advantage of CloudSim is that it supports building broker-driven policies. The VM class defined in CloudSim represents a virtual machine that can be created on the hosts. Creation of hosts depends on the broker, which allocates each VM to a different host. A datacenter can hold a maximum number of hosts, and the broker can dynamically change the setup of hosts and VMs (see Fig. 5) [17].

B. Experimental Results
Using the CloudSim toolkit, the proposed TS-GA is implemented, and a comparative study has been made among three algorithms: Round-Robin (RR), the default GA, and the improved TS-GA. Five parameters are considered to evaluate the performance: completion time, cost, resource utilization, speedup, and efficiency.

1) Completion Time:
According to the results in Fig. (6), it is found that the completion time of the proposed TS-GA algorithm is reduced by 41.83% and 39.26% relative to the default GA and RR algorithms, respectively.

2) Execution Cost:
In addition, the total cost of executing all tasks on the available VMs is calculated using equation (4) [18]. Table 2 and Fig. (7) represent the execution cost of the RR, default GA, and proposed TS-GA algorithms using 8 VMs. According to the results in Fig. (7), the cost of the proposed TS-GA algorithm is reduced by 3.6% and 6.07% relative to the default GA and RR algorithms, respectively.

3) Resource Utilization:
Resource utilization is the ratio between the total busy time of a virtual machine and the total finish execution time of the parallel application. It is defined in equation (5) [19]:

$$\text{Utilization} = \frac{\text{total busy time of the VM}}{\text{total finish execution time of the parallel application}} \qquad (5)$$

According to the results in Fig. (8), it is found that the resource utilization of the proposed TS-GA algorithm is improved by 47% and 30.04% relative to the default GA and RR algorithms, respectively.
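As a small worked example of this ratio, the Python sketch below computes the utilization of each VM and an average over all VMs. Averaging across VMs is an assumption made here to obtain a single figure; the function names and sample numbers are hypothetical.

```python
def vm_utilization(busy_time, finish_time):
    """Busy time of one VM divided by the total finish time."""
    if finish_time <= 0:
        raise ValueError("finish_time must be positive")
    return busy_time / finish_time


def average_utilization(busy_times, finish_time):
    """Mean utilization over all VMs (assumed aggregation)."""
    ratios = [vm_utilization(b, finish_time) for b in busy_times]
    return sum(ratios) / len(ratios)


# Two VMs: one busy for the whole 3.0 time units, one busy for half.
print(average_utilization([3.0, 1.5], 3.0))  # 0.75
```

A schedule that balances load evenly pushes every per-VM ratio toward 1, which is why a lower makespan and a higher utilization tend to go together.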

4) Speedup and Efficiency:
The speedup and efficiency for each VM are calculated using equations (6) and (7) [19]. According to the results in Fig. (9), it is found that the speedup of the proposed TS-GA algorithm is improved by 34.03% and 33.65% relative to the default GA and RR algorithms, respectively. Also, the efficiency of the proposed TS-GA algorithm is improved by 34.06% and 33.66% relative to the default GA and RR algorithms, respectively (see Fig. 10).
The average improvements in speedup and efficiency of the proposed TS-GA algorithm relative to the default GA and RR algorithms are presented in Table 5 and Table 6, respectively.
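For readers unfamiliar with these two metrics, the Python sketch below uses the common textbook definitions: speedup is the serial execution time divided by the parallel execution time, and efficiency is speedup divided by the number of processing units. These definitions are an assumption and may not match equations (6) and (7) of this paper exactly; the numbers are made up.

```python
def speedup(serial_time, parallel_time):
    """Textbook speedup: time on one VM over time on several VMs."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, num_vms):
    """Textbook efficiency: speedup per VM, at most 1.0 in the ideal case."""
    return speedup(serial_time, parallel_time) / num_vms


# Hypothetical run: 80 time units on one VM, 20 time units on 8 VMs.
print(speedup(80.0, 20.0))        # 4.0
print(efficiency(80.0, 20.0, 8))  # 0.5
```

Under these definitions, a scheduler that shortens the parallel completion time raises both metrics by the same factor, which is consistent with the nearly identical speedup and efficiency improvements reported above.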

V. CONCLUSION AND FUTURE WORK
This paper proposes an improved Genetic Algorithm for the task scheduling problem in the Cloud computing environment. The proposed algorithm aims to minimize completion time and cost and to maximize resource utilization. The completion time for the proposed TS-GA algorithm is reduced by

Fig. 3. Representation of tasks and VMs

The processing time of each task in the virtual machine can be calculated by equation (3) [15].

Fig. 6. Comparison of the completion time of the three algorithms RR, GA, and TS-GA

Fig. 7. Comparison of the cost of the three algorithms RR, GA, and TS-GA

Table 3 and Fig. (8) represent the resource utilization of the RR, default GA, and proposed TS-GA algorithms using 8 VMs.

Table 4, Fig. (9), and Fig. (10) represent the speedup and efficiency of the RR, default GA, and proposed TS-GA algorithms using 8 VMs.

TABLE I. THE COMPLETION TIME OF RR, GA, AND TS-GA ALGORITHMS USING EIGHT VMS

TABLE II. THE EXECUTION COST OF RR, GA, AND TS-GA ALGORITHMS

TABLE III. THE RESOURCE UTILIZATION OF RR, GA, AND TS-GA ALGORITHMS

TABLE IV. SPEEDUP AND EFFICIENCY FOR RR, GA, AND TS-GA

TABLE V. AVERAGE IMPROVEMENT OF SPEEDUP AND EFFICIENCY FOR TS-GA ALGORITHM RELATIVE TO DEFAULT GA ALGORITHM

TABLE VI. AVERAGE IMPROVEMENT OF SPEEDUP AND EFFICIENCY FOR TS-GA ALGORITHM RELATIVE TO ROUND-ROBIN ALGORITHM