An HC-CSO Algorithm for Workflow Scheduling in Heterogeneous Cloud Computing System

Many researchers use meta-heuristic techniques for dynamic workflow task scheduling in cloud computing systems to obtain near-optimal solutions. Many swarm intelligence algorithms have been designed so far, but they have limitations: some get trapped in local optima, some converge slowly, and some have poor global search capability. There is therefore still a need to design new algorithms or modify existing ones to overcome these limitations. A Hybrid Cat Swarm Optimization algorithm named H-CSO, inspired by the HEFT algorithm, was designed earlier and overcame the initialization problem of Cat Swarm Optimization. However, that algorithm can still get stuck in local minima. To overcome this limitation, a part of the Crow Search Algorithm has been integrated into H-CSO, as described in this paper. Simulation showed that the new hybrid algorithm, named HC-CSO, outperforms CSO and H-CSO.
Keywords: Cloud computing; Crow Search Algorithm (CSA); Cat Swarm Optimization (CSO); H-CSO; HC-CSO; HEFT; Self-Motivated Inertia Weight (SMIW); Virtual Machines (VMs)


I. INTRODUCTION
Information Technology has been reshaped by the evolution of cloud computing, which offers large storage facilities, high-performance computing, and other hardware and software services. This technology is the result of an evolution of computing eras: computers connected via the internet first took the form of distributed computing, which then developed into cluster computing, grid computing, and finally cloud computing [1]. The major aim of cloud computing is to provide high-performance computing services at minimum cost. Cloud technology shifted users' data from client machines to network-enabled machines with high-powered processors and hardware. Cloud computing provides services in the form of Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [15], and any end-user can pick the services that match their requirements. The main benefit of the cloud is that services are provided to end-users without geographical boundaries [2]; users do not need to know the physical locations of the service providers and computing datacenters. Cloud technology is also flexible: because of the pay-per-usage policy, a user can add services or drop them whenever required. Cloud service providers include Amazon Web Services, Microsoft, Google, Rackspace, salesforce.com, etc. Cloud systems can be classified into four categories: private cloud, public cloud, community cloud, and hybrid cloud [3].
The performance of the cloud can be improved at various levels, such as the network level, scheduling level, and database level. The literature shows that numerous algorithms have been designed for workflow task scheduling, load balancing, energy consumption management, etc. in cloud computing. Workflow task scheduling is one of the major areas where a lot of improvement is still required. Task scheduling can be of two types: static and dynamic. In static scheduling, the execution times of the tasks are pre-estimated or known, whereas in dynamic scheduling they are not known in advance. Dynamic task scheduling dominates current cloud workloads, so dynamic task scheduling algorithms need to be improved. Optimization techniques now play an important role in solving the cloud scheduling problem. A few well-known techniques are Ant Colony Optimization, Particle Swarm Optimization, and Cat Swarm Optimization. Cat Swarm Optimization belongs to the Swarm Intelligence (SI) family [4]. The algorithms used in this paper are as follows:
1) Cat swarm optimization: This is an intelligent algorithm originally developed in 2006. It is inspired by the behaviour of real cats, which have two modes known as seeking (resting mode) and tracing (attacking mode). Here, N cats are generated randomly, and each cat represents a solution and carries a position, a flag, and a fitness value. The position is represented by M dimensions in the search space, and each dimension has its own velocity. The flag identifies whether the cat is in seeking mode or in tracing mode, and it is set according to a parameter known as the Mixing Ratio (MR). After the fitness is computed, the best cat is stored in memory at each iteration, and the best solution (cat) is identified at the end of the final iteration.
www.ijacsa.thesai.org
The two modes of Cat Swarm Optimization are described below [1] [21]:
i) Seeking mode: This mode represents the resting mode of a cat, and four parameters play an important role in it: SMP (Seeking Memory Pool), SRD (Seeking Range of the selected Dimension), CDC (Counts of Dimension to Change), and SPC (Self Position Consideration). SMP defines how many candidate positions are generated for a cat, one of which is selected for the move to the next position. For example, if SMP is set to 10, then for every cat 10 new random positions are generated and one of them is selected randomly for movement. SRD and CDC decide the randomization of the new positions: CDC, which lies in the interval [0, 1], decides how many dimensions are mutated, and SRD defines the amount of mutation for the dimensions selected by CDC. The Boolean value SPC decides whether the current position is itself considered as a candidate for the next iteration. If SPC is true, then SMP-1 candidates are generated for each cat instead of SMP, because the current position is one of the candidates. The seeking mode steps are given as:
a) Generate S copies of the seeking CatK, where S is equal to the SMP value.
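b) For each copy, change randomly selected dimensions of the cat, as determined by CDC, by applying the SRD operator. In the standard CSO formulation [1] [21] this update has the form:
$$X_{K,D}^{new} = (1 \pm SRD \times rand) \times X_{K,D} \quad (1)$$
where $X_{K,D}$ is the current position and $X_{K,D}^{new}$ is the next new position, n is the number of cats (K = 1, ..., n), D is the dimension, and rand is a random variable in the interval [0, 1].
ii) Tracing mode: This mode represents the attacking mode of a cat. The tracing mode steps are given as:
a) Compute the velocity of CatK using equation 2.
$$V_{K,D} = V_{K,D} + r_1 \times c_1 \times (X_{BEST,D} - X_{K,D}) \quad (2)$$
where $V_{K,D}$ is the velocity, $c_1$ is an acceleration coefficient, $r_1$ is a random variable in the interval [0, 1], $X_{BEST,D}$ is the best cat position, $X_{K,D}$ is the current cat position, and D is the dimension.
b) If the velocity goes beyond the upper range, set it back within the range.
c) Update the position of CatK by using equation 3.
$$X_{K,D} = X_{K,D} + V_{K,D} \quad (3)$$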
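The seeking and tracing steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the default parameter values, the symmetric velocity bound `v_max`, and the choice of keeping the best candidate in seeking mode (rather than a random one) are all assumptions introduced here, and the fitness is assumed to be minimised.

```python
import random


def seeking_step(position, fitness, smp=5, srd=0.2, cdc=0.8, spc=True, rng=random):
    """One seeking-mode move for a single cat (fitness is minimised).

    Assumption: the best candidate is kept; the text also mentions
    random selection among the candidates.
    """
    dims = len(position)
    # With SPC enabled the current position counts as one of the SMP candidates.
    candidates = [list(position)] if spc else []
    n_copies = smp - 1 if spc else smp
    n_mutate = max(1, round(cdc * dims))  # CDC: share of dimensions to change
    for _ in range(n_copies):
        copy = list(position)
        for d in rng.sample(range(dims), n_mutate):
            sign = rng.choice((-1.0, 1.0))
            # SRD operator: scale the selected dimension by (1 +/- SRD * rand)
            copy[d] = (1.0 + sign * srd * rng.random()) * copy[d]
        candidates.append(copy)
    return min(candidates, key=fitness)


def tracing_step(position, velocity, best_position, c1=2.0, v_max=1.0, rng=random):
    """One tracing-mode move: velocity update, clamping, position update."""
    new_position, new_velocity = [], []
    for x, v, x_best in zip(position, velocity, best_position):
        v_new = v + rng.random() * c1 * (x_best - x)   # velocity update
        v_new = max(-v_max, min(v_max, v_new))         # keep velocity in range
        new_velocity.append(v_new)
        new_position.append(x + v_new)                 # position update
    return new_position, new_velocity
```

Because the current position is kept as a candidate when `spc` is true, a seeking move in this sketch never returns a worse position than the one it started from.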
2) Crow search algorithm: The crow family is regarded as one of the most intelligent groups of birds. Their brain-to-body ratio is considered only slightly lower than that of a human being. Crows are well known as thieves: a crow watches where another bird keeps its food and steals it after the victim bird leaves that place. This intelligent behaviour can be used to solve and optimize real-world problems. The CS Algorithm can be described with the help of the pseudo-code in Fig. 1 [22].

...
5. For i = 1 to N
6. Randomly choose one of the crows to follow, let it be Crowj
7. Define the awareness probability
...
13. End For
14. Check the fitness of new positions
15. Evaluate the new position of the crows
16. Update the memory of crows
17. End while
Fig. 1. Pseudo-code of the Crow Search Algorithm [22].

First of all, the positions of all crows in the group and the other parameters are initialized. N is the number of crows, and ri and rj are random variables between 0 and 1. After the positions of all crows have been evaluated, the memory of each crow is initialized. As described earlier, the behaviour of a crow is to follow another bird in order to steal its food. So, a random crow, say Crowj, is selected from the search space. Crowi follows it and steals its food once Crowj has hidden the food and left the place. If the random variable rj is greater than the awareness probability, the position of the current Crowi is updated by using equation 4; otherwise, the position is updated randomly within the search space. The flight length (fl) of the crow decides whether the search is local or global: if fl is less than 1, a local search takes place; otherwise, a global search takes place. The new positions (solutions) are then stored in memory, and this process continues until the termination condition is satisfied [22].
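One iteration of the crow behaviour described above can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the commonly used form of the CSA position update, in which Crowi moves from its position toward the memory of Crowj by a step of `r_i * fl`, and it assumes a box-shaped search space with hypothetical bounds `lower` and `upper`, a minimised fitness, and default values for `ap` and `fl` chosen only for the example.

```python
import random


def crow_search_step(positions, memories, fitness, ap=0.1, fl=0.5,
                     lower=0.0, upper=1.0, rng=random):
    """One CSA iteration over all crows (fitness is minimised).

    positions and memories are lists of equal-length lists and are
    updated in place; both are also returned for convenience.
    """
    n = len(positions)
    dims = len(positions[0])
    for i in range(n):
        j = rng.randrange(n)  # crow chosen to be followed
        if rng.random() > ap:
            # Crow j is unaware: crow i moves toward crow j's memory.
            r_i = rng.random()
            candidate = [x + r_i * fl * (m - x)
                         for x, m in zip(positions[i], memories[j])]
        else:
            # Crow j is aware: crow i is sent to a random position.
            candidate = [rng.uniform(lower, upper) for _ in range(dims)]
        # Accept the move only if it stays inside the search space.
        if all(lower <= x <= upper for x in candidate):
            positions[i] = candidate
        # A crow's memory keeps the best position it has visited so far.
        if fitness(positions[i]) < fitness(memories[i]):
            memories[i] = list(positions[i])
    return positions, memories
```

With `fl` below 1 the candidate lies between the crow's current position and the followed crow's memory, which is why this setting behaves as a local search.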
In this paper, the tendency of the H-CSO algorithm to get stuck in local minima is addressed by integrating the local search part of the CSA. The details are given in the coming sections.
The rest of the paper is structured as follows: Section II covers the related work. Section III describes the proposed methodology. Section IV describes the simulation setup, and Section V covers the simulation results and discussion. Section VI summarizes the conclusion and future scope. The references are given at the end.

II. RELATED WORK
Existing work in the area of workflow and task scheduling is reviewed below. In [1], the authors proposed a multi-objective Cat Swarm Optimization based on the Simulated Annealing (SA) technique. SA is incorporated in the local search of the proposed algorithm and is enhanced by the Orthogonal Taguchi approach. Execution time and execution cost are considered for performance measurement. The proposed technique worked better than Multi-Objective Ant Colony Optimization, Multi-Objective Genetic Algorithm, and Multi-Objective Particle Swarm Optimization. The authors of [2] proposed a Multi-Objective Cat Swarm Optimization algorithm; compared with the existing Multi-Objective PSO technique, the proposed approach was found to be better in terms of energy consumption, execution cost, and execution time. The researchers in [3] introduced a Cat Swarm Optimization-based technique for workflow scheduling. The results were compared with Particle Swarm Optimization (PSO), and it was found that CSO reduced the processing cost, achieved a good load balance, and gave optimum results in fewer iterations. In [5], a Hybrid Particle Swarm Optimization algorithm using the Hill-Climbing technique was proposed, and experiments showed it to be effective in terms of makespan. The authors of [6] introduced a Binary Hybrid Particle Swarm Optimization and Gravitational Search algorithm for load balancing of virtual machines (VMs). The experimental results showed that the proposed algorithm balanced the load better than the existing Binary PSO load balancing algorithm. In [7], the researchers offered a new HPSOGWO algorithm that combines Particle Swarm Optimization and Grey Wolf Optimization.
The idea behind this algorithm was to combine the exploitation ability of PSO with the exploration ability of GWO in order to strengthen the proposed algorithm. The results showed that HPSOGWO is better than the standard PSO and GWO variants with respect to solution stability, convergence speed, and quality. The authors in [8] proposed an Improved Particle Swarm Optimization (IPSO) algorithm to improve the allocation of large-length tasks. The introduced algorithm outperformed the Ant Colony, Honey Bee, and Round-Robin algorithms in terms of load balance, makespan, and degree of imbalance. In [9], the authors presented a Particle Swarm Optimization based technique for workflow scheduling and found that it achieved about three times the cost savings of the Best Resource Selection (BRS) algorithm. The presented technique also balanced the load efficiently. The authors of [10] proposed a PSO-based scheduling technique for independent tasks, which was improved using a load balancing strategy. The new method was compared with Improved PSO, Round-Robin, and an existing load balancing technique. The experimental results showed that the proposed technique is better than these three algorithms in resource utilization and makespan. The authors of [11] introduced the Modified Particle Swarm Optimization (MPSO) algorithm to reduce cost compared with the existing Particle Swarm Optimization algorithm, and the simulation results showed that the proposed technique worked better. In [12], the authors modified Particle Swarm Optimization by modifying parameters such as MIPS and bandwidth for effective load balancing. The simulation results showed that the proposed model reduced the execution time. The researchers in [13] compared a task scheduling strategy based on Ant Colony Optimization (ACO) with FCFS and Round-Robin (RR).
The simulation results showed that ACO outperforms the traditional FCFS and Round-Robin algorithms. The authors of [14] developed a hybrid optimization technique using Flower Pollination and Grey Wolf Optimization. The PEFT algorithm was used to initialize the proposed method for workflows. The simulation showed that the proposed method is more effective than Flower Pollination with Genetic Algorithm in terms of cost and reliability. In [16], the researchers designed an Improved Social Learning Optimization Algorithm by introducing the Small Vector Position method for task scheduling problems. After simulation, it was found that the proposed approach worked well compared with GA and PSO-based techniques. In [17], the researchers proposed an Adaptive Cost-based Task Scheduling technique to schedule tasks between virtual machines at minimum cost. The simulation results showed that the proposed technique performs better than cost-efficient task scheduling in terms of communication cost, execution time, CPU utilization, and execution cost. In [18], the authors implemented a Dynamic Adaptive Particle Swarm Optimization (DAPSO) algorithm to enhance the efficiency of Particle Swarm Optimization for better makespan and resource utilization. The authors also proposed an algorithm named MDAPSO. Both DAPSO and MDAPSO worked better than the original Particle Swarm Optimization. The authors of [19] proposed an Improved Particle Swarm Optimization based technique to solve workflow scheduling problems in cloud systems. The proposed method worked better than existing state-of-the-art methods. In [20], the authors proposed a cloud scheduling model named Task Scheduling System. The proposed Genetic Algorithm-Chaos Ant Colony Optimization worked better than the ACO and GA algorithms with respect to cost and convergence speed.
From the above study, it is found that the ACO, PSO, and CSO algorithms are used to optimize the scheduling of independent tasks as well as workflow tasks. The ACO algorithm performs well in local searching; the PSO and CSO algorithms are good at global searching but get stuck in local optima easily. Various researchers have integrated additional techniques and formulae into these algorithms to improve the performance of the cloud, but these techniques are dated. CSO performs well compared with ACO and PSO, and H-CSO is better than CSO.
So, a new method needs to be integrated into H-CSO so that its limitation of getting trapped in local optima can be overcome. Hence, the local searching part of the Crow Search Algorithm has been integrated into the H-CSO algorithm. The details are given in the proposed methodology section.

III. PROPOSED METHODOLOGY
From the related work, it is learned that a single algorithm cannot give sufficiently good results for workflow task scheduling, and many researchers have advised improving the existing algorithms. The H-CSO algorithm is a combination of HEFT, the Cat Swarm Optimization algorithm, and the Self-Motivated Inertia Weight method, which is useful in overcoming the velocity outrange problem of the standard CSO. As said earlier, the H-CSO algorithm gets trapped in local minima when the search space is large and the workflow task environment is complex. A few limitations of the H-CSO algorithm are described briefly below:
1) The H-CSO algorithm has a good global searching capacity only.
2) The H-CSO algorithm gets trapped in local minima because far more cats always reside in seeking mode than in tracing mode.
3) Due to a lack of balance between seeking and tracing modes, optimal results cannot be obtained using the H-CSO algorithm.
In the proposed algorithm, named HC-CSO, the local search part of the well-known Crow Search Algorithm is integrated into H-CSO to keep it from getting trapped in local optima. The working of the HC-CSO algorithm is as follows. First, the various parameters are initialized, and then the workflow tasks are pre-processed with the HEFT method, as shown in the pseudo-code of the proposed algorithm. After the pre-processing of the workflows, the tasks are assigned to the available VMs. If this solution is already optimal at this first stage, the algorithm stops and returns the optimized schedule; otherwise, the initial solution generated by the HEFT algorithm is fed to the CSA. Here, N crows are generated and a population is obtained by the local Crow Search Algorithm. Then, the population obtained from the local CSA is fed to the H-CSO algorithm. Now, N cats are generated and a velocity value VK is assigned to each cat for further processing. In the next phase, the cats are randomly distributed into the seeking and tracing modes according to the Mixing Ratio (MR), and a flag is set for each cat. If the current CatK is in seeking mode according to its flag value, the seeking mode is executed; otherwise, the tracing mode is executed. The positions and velocities of the cats in tracing mode are updated using the equations given in the pseudo-code in Fig. 2, and the best cats are stored in memory as solutions. This process continues until the termination condition is met, and finally an optimal solution is returned in the form of a better makespan and cost. The proposed algorithm combines good local and global searching capacity, so the results can approach the optimum.
The proposed algorithm also helps to avoid the premature convergence of the H-CSO method. The pseudo-code of the proposed HC-CSO algorithm is given step by step in Fig. 2.

...
    Here fl denotes the flight length (less than 1, i.e., 0.5) of CrowK at the current iteration, and the remaining term denotes the present best location of CrowK in dimension D
18. Feed the population generated by the local CSA into H-CSO // Do local and global search as given below
19. Assign the velocity VK to each Cat
20. According to the Mixing Ratio (MR) flag, distribute Cats to Seeking and Tracing Modes
21. If current CatK is in Seeking Mode Then
22. Generate S (SMP) copies of CatK and spread them in D dimensions, where each Cat has a velocity (VK, D)
23. Evaluate the fitness value of all copies and discover the best Cats (XBEST, D)
24. Replace the original CatK with the copy of the best Cats (XBEST, D)
25. Else If current CatK is in Tracing Mode Then
26. Compute and update the CatK velocity by the following equations, where the weight factor is calculated by the Self-Motivated Inertia Weight method and the two constant factors are greater than 1, both set to 2
...
31. Evaluate ...
Fig. 2. Pseudo-code of the proposed HC-CSO algorithm (excerpt).
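The control flow described in this section can be outlined in Python as below. This is a rough sketch under several assumptions rather than the authors' implementation: every callable passed to `hc_cso_outline` (`heft_seed`, `local_csa`, `cso_iteration`, `is_good_enough`) is a hypothetical placeholder for the corresponding stage, termination is assumed to be a fixed iteration count, the fitness is assumed to be minimised, and MR is assumed to be the fraction of cats placed in tracing mode.

```python
import random


def assign_modes(n_cats, mixing_ratio, rng=random):
    """Return one flag per cat: True = tracing mode, False = seeking mode.

    Assumption: mixing_ratio is the fraction of cats in tracing mode.
    """
    n_tracing = round(mixing_ratio * n_cats)
    flags = [True] * n_tracing + [False] * (n_cats - n_tracing)
    rng.shuffle(flags)  # spread the tracing cats randomly over the swarm
    return flags


def hc_cso_outline(heft_seed, local_csa, cso_iteration, is_good_enough,
                   fitness, max_iter=100):
    """Outline of the stages: HEFT seed, local CSA, then H-CSO iterations."""
    schedule = heft_seed()                # HEFT pre-processing of the workflow
    if is_good_enough(schedule):          # stop early if the seed is acceptable
        return schedule
    population = local_csa(schedule)      # local crow search around the seed
    best = min(population, key=fitness)
    for _ in range(max_iter):             # seeking/tracing iterations
        population = cso_iteration(population)
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate              # remember the best solution so far
    return best
```

Passing the stages in as functions keeps the outline independent of how schedules are encoded, which the surviving pseudo-code does not specify.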

IV. SIMULATION SETUP
The simulation environment was created in the CloudSim tool for simulating the different workflows used in this paper. The experiments were carried out on a machine with the following configuration: Processor: Intel® Core™ i3-5005U at 2.0 GHz, RAM: 4.0 GB, HDD: 1 TB, and OS: Windows 10.

A. Parameters
For simulation, a PowerDatacenter was designed with the following configuration: RAM: 25 GB, MIPS per VM: 1000 MIPS, Storage: 1 TB, Bandwidth: 50000 bps. The remaining configuration of the heterogeneous cloud environment is given in Table I. The scheduling policy was set to Time Shared. The parameters of CSO, H-CSO, and the proposed HC-CSO are also summarized in Table I.

B. Cost Plan
The cost plan (in Indian Rupees) of workflow scheduling is depicted in Table II.
TABLE II. COST PLAN

C. Cloudlets
Cloudlets are the tasks submitted for execution on virtual machines. In this paper, the scientific workflows named CyberShake, Montage, Inspiral, and Sipht have been used to test the proposed algorithm along with the others. Each workflow has 1000 tasks.

D. Performance Metrics
There are several performance metrics, such as makespan, processing cost, waiting time, response time, energy consumption, and resource utilization, for testing the performance of scheduling algorithms. In this paper, makespan and cost are used to measure the performance of the proposed algorithm; both are described below.

1) Makespan: Makespan [23] refers to the maximum time taken to finish the last task in a group. It is the most widely used metric for measuring the performance of a scheduling algorithm; a smaller makespan indicates a more efficient algorithm. The makespan is computed by equation 5.
$$Makespan = \max_{t_i \in tasks} (CT_i) \quad (5)$$
where $CT_i$ is the completion time of task $t_i$.
2) Processing cost: Processing cost is, along with makespan, another important performance metric because cloud service providers want to give efficient services at minimum cost in this competitive environment. The processing cost can be measured by equation 6.

V. SIMULATION RESULTS AND DISCUSSION
The results shown in Table III are the averages of the results obtained by the proposed HC-CSO algorithm and the other algorithms, each executed several times.
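To make the two reported metrics concrete, the following Python sketch computes them for a toy schedule. The makespan function follows equation 5 directly. The cost function is only an assumption made for illustration (busy time of each VM multiplied by a per-unit-time price, summed over VMs), since the exact form of equation 6 is not reproduced here; the task names, VM names, times, and prices are invented for the example.

```python
def makespan(completion_times):
    """Equation 5: the largest completion time over all tasks."""
    return max(completion_times.values())


def processing_cost(busy_time_per_vm, price_per_unit_time):
    """Assumed cost model: sum over VMs of busy time times unit price."""
    return sum(busy_time_per_vm[vm] * price_per_unit_time[vm]
               for vm in busy_time_per_vm)


# Hypothetical schedule: completion time of each task, in seconds.
completion = {"t1": 4.0, "t2": 9.5, "t3": 7.0}
# Hypothetical VM usage (seconds) and prices (currency units per second).
busy = {"vm0": 10.0, "vm1": 6.0}
price = {"vm0": 2.0, "vm1": 3.0}

print(makespan(completion))          # 9.5
print(processing_cost(busy, price))  # 38.0
```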
Simulation results of the proposed HC-CSO algorithm, H-CSO, and standard CSO for the scientific dataset CyberShake_1000 are displayed in Fig. 3. The CyberShake workflow was created by the Southern California Earthquake Center to analyze seismic hazards. For execution, this workflow requires comparatively low CPU power and memory. The x-axis shows the groups of 10, 20, and 30 VMs and the y-axis shows the makespan. The results show that HC-CSO outperforms the other algorithms because of the better local search obtained by adding the Crow Search Algorithm and a better balance between seeking and tracing modes.
Fig. 4 shows the experiment carried out with the Montage_1000 dataset, which needs less memory and CPU power than the other workflows. The Montage dataset consists of astronomical images collected and stored by NASA. The groups of 10, 20, and 30 VMs and the makespan are shown on the x-axis and y-axis respectively. The results show that the proposed HC-CSO algorithm performed efficiently compared with the other algorithms due to better convergence in fewer iterations.
Fig. 5 shows the results for the Inspiral_1000 dataset, which is related to the physics field and is used to analyze gravitational waves. This dataset needs a high-powered CPU and a large amount of memory for execution. The results show that the proposed algorithm performed better than the other algorithms because HC-CSO has both global and local searching capabilities and also manages the velocity outrange problem. This capability allows the proposed algorithm to manage under-loaded and overloaded machines effectively by task migration. In this graph, the number of VMs is displayed on the x-axis and the makespan on the y-axis.
In Fig. 6, the x-axis and y-axis display the groups of 10, 20, and 30 VMs and the makespan respectively. The Sipht dataset represents sRNA-encoding genes of several bacteria and was released by the HIB (Harvard International Bioinformatics) Centre. This dataset requires a large amount of memory and a high-performance CPU. For this dataset, the proposed HC-CSO algorithm again worked well compared with standard CSO and H-CSO because it chooses the most appropriate VM, rather than simply a high-power or low-power one, owing to its better local and global searching properties; HC-CSO also avoids unnecessary diversity. The proposed algorithm did not get trapped in local minima because of the local searching property of the Crow Search Algorithm.
Finally, it is concluded that the HC-CSO algorithm gives a better makespan in all scenarios in comparison with standard CSO and H-CSO. In all the scenarios, HC-CSO works better than CSO and H-CSO because of the better pre-processing of tasks by HEFT and a good balance between seeking and tracing modes due to the Crow Search Algorithm. With the Crow Search Algorithm integrated, HC-CSO searches the VMs carefully at both the local and the global level to optimize the results in the minimum number of iterations. The SMIW method prevents the Cats from going outside the search space.
For the first scenario, with the CyberShake_1000 dataset, the groups of 10, 20, and 30 VMs and the processing cost are shown on the x-axis and y-axis respectively in Fig. 7. The graph shows that the proposed HC-CSO algorithm incurs the lowest cost compared with the other algorithms. This is because the proposed algorithm converges faster and the positions of the workflow tasks on the various VMs are updated smartly.
The processing cost results for the Inspiral_1000 dataset are depicted in Fig. 9, with the VMs on the x-axis and the processing cost on the y-axis. The HC-CSO algorithm incurs less cost than the other algorithms while executing the Inspiral_1000 dataset with all sets of VMs, because the proposed algorithm strikes an effective balance between seeking and tracing modes due to the CSA integration. The VMs were picked for the execution of tasks irrespective of their MIPS, RAM, and bandwidth.
The VMs and the processing cost are shown on the x-axis and y-axis respectively in Fig. 10. Again, for the Sipht_1000 workflow, the proposed algorithm outperforms CSO and H-CSO in terms of computing cost. This is because the global and local searching properties of the proposed algorithm are balanced and the loads of overloaded VMs are migrated to other VMs smartly in minimum time.
These results indicate that the proposed HC-CSO algorithm is better than CSO and H-CSO with respect to makespan and processing cost. The reason behind this is the good combination of global and local search due to the CSA. The integration of the Self-Motivated Inertia Weight factor overcomes the velocity outrange problem of the Cats in tracing mode. The proposed algorithm chooses virtual machines that are idle, under-loaded, or overloaded for workflow task migration among different VMs, irrespective of their computing power, RAM, and bandwidth.

VI. CONCLUSION
In the current technological era, cloud computing is one of the important emerging technologies, used to store large volumes of data and to provide other computing services in various scientific and technological fields. Various heuristic and meta-heuristic techniques have been developed to manage these computing facilities. In this paper, an intelligent workflow scheduling algorithm named HC-CSO has been proposed to solve the workflow task scheduling problem. The proposed algorithm is a combination of H-CSO and the Crow Search Algorithm. The H-CSO algorithm integrates the HEFT and SMIW methods into CSO. The HEFT algorithm pre-processed the workflow tasks and initialized the proposed HC-CSO algorithm with these tasks. This process saves time and optimizes the results in fewer iterations. The proposed HC-CSO algorithm did not get trapped in local optima due to the good local searching capacity of the CSA. Out-of-range velocities push the Cats outside the search space and degrade the performance of the algorithm; the Self-Motivated Inertia Weight method overcame this problem.
The proposed HC-CSO algorithm outperformed CSO and H-CSO in terms of makespan and computing cost in all four scenarios, using the four scientific datasets CyberShake, Montage, Inspiral, and Sipht on groups of 10, 20, and 30 VMs. This is because the proposed algorithm chose the best VM among a group of VMs using its global and local searching capacity for workflow task scheduling. The proposed approach is a generalized algorithm and is expected to perform well with other types of scientific datasets rather than only a particular one.
In the future, the proposed algorithm can be tested at a wider scale to reduce makespan, cost, and other parameters in cloud systems and in other fields. The efficiency of the proposed algorithm can also be tested for independent tasks. New techniques can also be developed to enhance cloud performance further.