HPSOGWO: A Hybrid Algorithm for Scientific Workflow Scheduling in Cloud Computing

Virtualization is one of the key features of cloud computing: physical machines are divided into several virtual machines, and users' tasks are run on these virtual resources as per their requirements. When a user requests services from the cloud, the user's tasks are allotted to virtual resources depending on their needs, so an efficient scheduling mechanism is required to optimize the parameters involved. Scientific workflows deal with large amounts of data under dependency constraints and are used to simplify applications in diverse scientific domains. Scheduling a workflow in cloud computing is a well-known NP-hard problem, and deploying such data- and compute-intensive workflows on the cloud needs an efficient scheduling algorithm. In this paper, we propose a hybrid algorithm based on a multi-objective model (HPSOGWO), which combines the desirable characteristics of two well-known algorithms: particle swarm optimization (PSO) and grey wolf optimization (GWO). The results are analyzed on complex real-world scientific workflows such as Montage, CyberShake, Inspiral, and Sipht. We consider two essential parameters in the cloud environment: total execution time and total execution cost. The simulation results show that the proposed algorithm performs well compared to other state-of-the-art algorithms such as round-robin (RR), ant colony optimization (ACO), heterogeneous earliest finish time (HEFT), and particle swarm optimization (PSO).

Keywords—Cloud computing; hybrid algorithms; metaheuristic algorithms; optimization; workflow scheduling


I. INTRODUCTION
Cloud computing has been a buzzword in computer science for decades, as it offers advancements such as the hiding and abstraction of complexity, virtualized resources, and the efficient use of distributed resources. A few well-known cloud computing platforms are Amazon EC2, GoGrid, Google App Engine, and Microsoft Azure [1] [2]. The services of cloud computing can be classified into Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [3].
In a workflow, the tasks are dependent on each other. A workflow can be represented using a Directed Acyclic Graph (DAG) [4], where the nodes represent the tasks (T) and the edges (E) joining the nodes represent the dependencies between the tasks. A sample workflow containing eight tasks {T1, T2, T3, T4, T5, T6, T7, T8} is shown in Fig. 1. Task T1 is the entry task, and {T4, T6, T7, T8} are the exit tasks. Each edge of the DAG shows a dependency between tasks; for example, T2 is executed after T1, which is shown by the pair {T1, T2}. A scientific workflow is a specialized form of workflow used in various scientific domains such as astronomy, bioinformatics, and gravitational-wave physics [5]. The Pegasus project published several realistic scientific workflows, including Montage, CyberShake, Epigenomics, LIGO, and SIPHT [6] [7]. The structures of these workflows are shown in Fig. 2.
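The DAG structure described above can be sketched in code. The following is an illustrative Python representation of the sample workflow in Fig. 1; the exact edge set is an assumption consistent with the stated entry task T1, exit tasks {T4, T6, T7, T8}, and the edge {T1, T2}.

```python
# Hypothetical edge structure for the Fig. 1 workflow: each task maps
# to the list of its successors (tasks that depend on it).
workflow = {
    "T1": ["T2", "T3", "T5"],   # assumed fan-out of the entry task
    "T2": ["T4"],
    "T3": ["T6"],
    "T5": ["T7", "T8"],
    "T4": [], "T6": [], "T7": [], "T8": [],
}

def entry_tasks(dag):
    """Tasks with no incoming edges (no predecessors)."""
    targets = {t for succ in dag.values() for t in succ}
    return sorted(set(dag) - targets)

def exit_tasks(dag):
    """Tasks with no outgoing edges (no successors)."""
    return sorted(t for t, succ in dag.items() if not succ)
```

With this encoding, `entry_tasks(workflow)` yields `["T1"]` and `exit_tasks(workflow)` yields the four exit tasks, matching the description of Fig. 1.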
Workflow scheduling can be considered a mapping function where several dependent tasks are mapped to several available virtual machines [9]. Suppose m tasks are mapped to n virtual machines; then n^m combinations are possible if a brute-force algorithm is used. Workflow scheduling is therefore a complex problem whose exact solution cannot be found in polynomial time [10], so it is preferable to find a near-optimal solution with a meta-heuristic algorithm.
Many meta-heuristic optimization algorithms have been used to solve workflow problems in cloud computing. The Genetic Algorithm (GA) has been used in workflow scheduling to minimize the makespan [11]. GA is robust and generates a high-quality search, but it takes more time to find a solution. Pandey et al. [12] used Particle Swarm Optimization (PSO) to schedule workflow applications in a cloud computing environment. PSO is a fast optimization algorithm but suffers from problems such as premature convergence and trapping in local optima [13]. Grey Wolf Optimization (GWO) is a recently proposed meta-heuristic algorithm that mimics the leadership hierarchy of grey wolves [14]. Khalil and Babamir [15] offered an extended version of the Grey Wolf Optimizer for solving the workflow problem; GWO reduces the probability of being trapped in a local optimum. By combining two or more algorithms according to their strengths, one can overcome the aforementioned issues. In this paper, we propose a hybrid algorithm combining Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO), named HPSOGWO. HPSOGWO is tested on the Montage, CyberShake, Inspiral, and Sipht scientific workflows to optimize total execution cost and time. In the next section, we review some of the scheduling algorithms used in cloud computing.

II. RELATED WORK
Workflows are popular among scientists for modeling a complex scientific process as a set of small tasks [16]. These tasks can be executed on parallel and distributed computing platforms such as cloud computing. Workflow scheduling is a well-known NP-hard problem in cloud computing. Several list-based heuristics have been proposed for task scheduling to optimize the performance of cloud computing, such as first come first serve (FCFS), round-robin (RR), shortest job first (SJF), and minimum completion time (MCT). The basic idea of list-based heuristics is to assign a priority to each task and allot tasks to the available resources as per the given preferences. The Heterogeneous Earliest Finish Time (HEFT) algorithm was designed for heterogeneous multiprocessor systems. Dubey et al. [17] proposed a modified version of HEFT capable of reducing the makespan compared to the existing HEFT and Critical Path on a Processor (CPOP) algorithms.
In the Min-Min algorithm, the task with the minimum execution time is mapped to the machine with the minimum completion time [18]. A similar algorithm is Max-Min, wherein the task with the maximum execution time is assigned to the machine that gives the minimum completion time. Min-Min and Max-Min are offline scheduling algorithms that work in batch mode, which means tasks are not allocated to resources as they arrive [19]. A drawback of the Min-Min and Max-Min algorithms is that they suffer from starvation [20]. Besides, they consider only time as a measure of resource quality. The list-based heuristics concentrate only on the user's perspective and are less focused on resource-quality parameters.
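The Min-Min idea described above can be sketched as follows. This is a minimal batch-mode illustration for independent tasks (Min-Min, as discussed here, does not handle task dependencies); the data layout is an assumption for the example.

```python
def min_min(exec_time, n_vms):
    """Batch-mode Min-Min: exec_time[i][j] = runtime of task i on VM j.
    Repeatedly pick the task whose best completion time is smallest,
    assign it to that VM, and update the VM's ready time."""
    ready = [0.0] * n_vms              # when each VM becomes free
    unscheduled = set(range(len(exec_time)))
    schedule = {}
    while unscheduled:
        # For each task, the VM giving its minimum completion time.
        best = {
            i: min(range(n_vms), key=lambda j: ready[j] + exec_time[i][j])
            for i in unscheduled
        }
        # The task whose minimum completion time is globally smallest.
        task = min(unscheduled,
                   key=lambda i: ready[best[i]] + exec_time[i][best[i]])
        vm = best[task]
        ready[vm] += exec_time[task][vm]
        schedule[task] = vm
        unscheduled.remove(task)
    return schedule, max(ready)
```

Max-Min differs only in the outer selection: it would pick the task whose best completion time is largest, which tends to keep short tasks from starving long ones.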
The aforementioned conventional heuristic algorithms are simple, easy to implement, and fast; but to further improve the quality of the solution and approach optimal results for complex problems like workflow scheduling, meta-heuristic approaches can find near-optimal solutions [21]. In addition, heuristic algorithms are problem-dependent techniques, whereas meta-heuristic methods are problem-independent. Meta-heuristic algorithms have been widely used due to their simplicity and strong search power at low time and cost. Many meta-heuristic approaches have been proposed for solving the workflow problem; some standard algorithms are the Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Particle Swarm Optimization (PSO).
Dasgupta et al. [1] proposed a genetic-based algorithm for the task scheduling problem. Their experimental results show better performance in terms of makespan when compared with First Come First Serve (FCFS), Round Robin (RR), and a local search algorithm, Stochastic Hill Climbing (SHC). However, GA was reported to be time-consuming in reaching optimal solutions [22].
Tawfeek et al. [23] used the Ant Colony Optimization (ACO) approach for task scheduling to minimize the makespan and found that ACO performed better than FCFS and RR. However, ACO is a very complex algorithm and takes a long time to reach optimal results [24]. Moreover, task dependencies were not considered.
Particle swarm optimization (PSO) is one of the popular meta-heuristic algorithms. It is simple to implement and converges fast. Despite these advantages, it gets trapped in local optima for complex problems [25].
Meta-heuristic algorithms are characterized by their exploitation and exploration abilities [26]. Exploitation means that the algorithm is successful at performing local searches; exploration means that the algorithm is effective at finding initial solutions that may be near the global optimum. A good meta-heuristic algorithm balances exploration and exploitation. Particle Swarm Optimization has high exploration ability but low exploitation ability. The Grey Wolf Optimizer, proposed by Mirjalili et al. [14], strikes a good balance between exploration and exploitation.
A single meta-heuristic might not reach the optimal solution and may get stuck in a local optimum for complex problems like scientific workflow scheduling. A better approach is to combine two or more meta-heuristic algorithms based on their best characteristics, and over the last few decades hybrid algorithms have become popular. Here, we discuss only those existing algorithms that hybridize either PSO or GWO. Manasrah and Ali [8] proposed a hybridization of the Genetic Algorithm and Particle Swarm Optimization (GA-PSO); the hybrid GA-PSO algorithm reduces the total execution time compared with GA, PSO, and other algorithms. Another hybrid algorithm, reported in [27], combines PSO with the gravitational search algorithm (GSA); it performs well in terms of cost compared with some non-heuristics, PSO, and GSA. A hybridization of Grey Wolf Optimization (GWO) and the Genetic Algorithm (GA) was proposed by Bouzary and Frank [28], who found the proposed algorithm superior to GWO and GA in terms of cost. Khurana and Singh [29] introduced a hybrid of the flower pollination algorithm and GWO to reduce cost and time, which gives efficient results compared to flower pollination hybridized with a genetic algorithm.
Despite the advantages of the aforementioned hybrid algorithms, one might ask about the motivation behind the proposed algorithm. The answer lies in the no-free-lunch theorem [30], which states that no single algorithm is fit for solving all optimization problems: an algorithm might perform better on a particular optimization problem but poorly on others. There is no universal solution to optimization problems.

III. BACKGROUND
In this section, we discuss the standard PSO and GWO algorithms used in designing the proposed algorithm. The fitness function used in the proposed algorithm is also explained.

A. Fitness Function
The fitness function describes the targeted objectives to be optimized by the proposed scheduling algorithm [31]. There are two approaches to making a fitness function multi-objective: a priori and a posteriori [32]. In the a priori approach, each objective is assigned a weight, as per its importance, to form a single-valued function, also known as the fitness value; in the a posteriori approach, a set of non-dominated solutions is found. Here, we follow the a priori approach to design the fitness function, which is a weighted composition of the total execution cost (TEC) and the total execution time (TET), as represented by equation 1.

Fitness = α1 × TET + α2 × TEC (1)
where TET and TEC are the total execution time (makespan) and the total execution cost, respectively, and α1 and α2 are the weights assigned to each objective. Here, we assign equal weights of 0.5 to α1 and α2. The total execution time (makespan) and total execution cost are explained in the following sub-sections.

1) Total execution time (makespan): The total execution time (makespan) is the maximum completion time taken by the tasks in the workflow. In other words, the makespan is the time required to finish all the tasks allotted to the different virtual machines [25]. Mathematically, the makespan of a workflow with m tasks is derived using equation 2.

TET = max{CT_i}, i = 1, 2, ..., m (2)
where CT_i is the completion time of task T_i in the workflow. The completion time is the total execution time of the task; if a task has predecessors, the waiting time for those predecessor tasks is also included. The completion time CT_i is given by equation 3.

CT_i = WT_i + ET_i,j (3)
The waiting time of task T_i is the maximum completion time over all of its predecessor tasks in the workflow, as shown in equation 4.

WT_i = max{CT_p : T_p ∈ pred(T_i)} (4)
The execution time of task T_i on virtual machine VM_j is calculated using equation 5, where SZ_Task is the size of task T_i in million instructions (MI), Num(PE_j) is the number of cores assigned to virtual machine VM_j, and PE_Unit is the speed of each core in MIPS.

ET_i,j = SZ_Task / (Num(PE_j) × PE_Unit) (5)
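Equations 2 through 5 can be combined into a small sketch. The following Python code computes execution times and a predecessor-aware makespan; the data layout (dicts for task sizes and predecessors) is an assumption for illustration, and, as in equation 4, waiting time accounts only for predecessor completion, not for contention among tasks sharing a VM.

```python
def exec_time(task_mi, num_pe, pe_mips):
    """Eq. (5): execution time = task size (MI) / (cores x MIPS per core)."""
    return task_mi / (num_pe * pe_mips)

def makespan(tasks, preds, mapping, vm_specs):
    """Eqs. (2)-(4): completion time = waiting time (max predecessor
    completion) + execution time; the makespan is the max completion time.
    tasks: {name: size in MI}; preds: {name: [predecessor names]};
    mapping: {name: vm index}; vm_specs: [(num_cores, mips_per_core), ...]."""
    ct = {}
    def completion(t):
        if t not in ct:
            wt = max((completion(p) for p in preds.get(t, [])), default=0.0)
            pe, mips = vm_specs[mapping[t]]
            ct[t] = wt + exec_time(tasks[t], pe, mips)   # Eq. (3)
        return ct[t]
    return max(completion(t) for t in tasks)             # Eq. (2)
```

For example, a 1000 MI task followed by a 2000 MI task on one 1-core, 1000 MIPS VM completes in 1 s and 3 s respectively, giving a makespan of 3 s.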

2) Total execution cost: Cost is a prominent objective to optimize, as cloud computing follows a pay-as-you-go billing scheme [33]. Most cloud service providers charge per specific time interval based on the cloud services used. Cost in cloud computing involves execution cost, communication cost, and storage cost. The total execution cost of a VM depends on the cost charged for the VM per unit interval and the execution time of the tasks on that VM. Mathematically, the total execution cost (TEC) of a workflow W is shown in equation 6 [34].

TEC = Σ (over T_i ∈ W) CO_j × ⌈ET_i,j / τ⌉ (6)
where CO_j is the cost of a type-j VM instance per unit time interval in the cloud data center, τ is the time period for which resources are billed, and ET_i,j is the execution time of task T_i on a type-j VM instance.

B. Particle Swarm Algorithm
Kennedy and Eberhart proposed the Particle Swarm Optimization (PSO) technique in 1995 [35]. It is a meta-heuristic technique based on the social behavior of a swarm of birds (particles). Each particle represents a solution to the problem and searches for the optimal solution in the problem space. A particle is characterized by its position and velocity, which are updated in every iteration so that the swarm moves towards the optimal result. PSO consists of the following stages.

1) Evolve gbest and pbest of the particles: In PSO, each particle represents a solution. In each generation, the swarm produces a global best particle, denoted gbest, while each particle tracks its personal best, denoted pbest. The pbest and gbest particles are selected according to their fitness values.
2) Update Position and Velocity: Position and velocity of the particle are influenced by the personal best (pbest) and the global best particle (gbest).
Equation 7 gives the velocity of the i-th particle at iteration t+1, where C1 and C2 are acceleration coefficients, w is the inertia weight, and r1 and r2 are random numbers between 0 and 1. The initial values of the coefficients are given in Table II.

V_i(t+1) = w × V_i(t) + C1 × r1 × (pbest_i − X_i(t)) + C2 × r2 × (gbest − X_i(t)) (7)
The position X_i of the i-th particle at iteration t+1 is updated according to equation 8.

X_i(t+1) = X_i(t) + V_i(t+1) (8)
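The two update rules can be sketched per dimension as follows. The coefficient values used here are common PSO defaults, not necessarily the paper's Table II settings.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0):
    """One PSO update for a single particle following Eqs. (7)-(8).
    x, v, pbest, gbest are equal-length lists (one entry per dimension)."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = random.random(), random.random()
        vi = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)  # Eq. (7)
        new_v.append(vi)
        new_x.append(xi + vi)                                    # Eq. (8)
    return new_x, new_v
```

Because the personal-best and global-best terms both pull the particle towards promising regions, a particle sitting at the origin with pbest = gbest = 1 always moves a non-negative step towards 1 (bounded by c1 + c2 when v starts at zero).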

C. Grey Wolf Algorithm
Mirjalili et al. [14] proposed the Grey Wolf Optimization (GWO) technique, which mimics the hunting behaviour and leadership hierarchy of grey wolves. In the mathematical model of GWO there are four types of wolves: alpha (α), beta (β), delta (δ), and omega (ω). Each wolf represents a solution. The alpha wolf represents the best solution, the beta and delta wolves represent the second- and third-best solutions respectively, and all remaining solutions are omega wolves. The GWO algorithm is composed of the steps shown in Sections III-C1, III-C2, and III-C3.
1) Encircling prey: The grey wolves encircle the prey during the hunt, which can be modelled mathematically using equations 9 and 10.

D = |C · X_p(t) − X(t)| (9)
X(t+1) = X_p(t) − A · D (10)

The position of a wolf is updated using equations 9 and 10 for the current iteration t, where X_p is the position of the prey and X is the position of the wolf. A and C are coefficient vectors calculated using equations 11 and 12, respectively.

A = 2a · r1 − a (11)
C = 2 · r2 (12)
The random numbers r1 and r2 in equations 11 and 12 lie in the range 0 to 1, and the value of the variable a decreases linearly from 2 to 0, as calculated by equation 16.

2) Hunting: The alpha wolf (best solution) guides the hunting process in GWO. Equations 13, 14, and 15 are used to update the positions of the search agents.

D_α = |C1 · X_α − X|, D_β = |C2 · X_β − X|, D_δ = |C3 · X_δ − X| (13)
X1 = X_α − A1 · D_α, X2 = X_β − A2 · D_β, X3 = X_δ − A3 · D_δ (14)
X(t+1) = (X1 + X2 + X3) / 3 (15)
In equations 13 and 14, X is the position vector of the grey wolf, and X1, X2, and X3 are the position vectors suggested by the alpha, beta, and delta wolves, respectively.
3) Attacking prey: The grey wolves attack the prey until it stops moving. Mathematically, the value of a is decreased in each iteration. This controlling parameter a is defined in equation 16, where t is the current iteration and N is the maximum number of iterations.

a = 2 × (1 − t/N) (16)
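The GWO position update described in these three steps can be sketched as one iteration over the pack. This is an illustrative implementation of equations 11 through 16; the list-of-lists representation of wolf positions is an assumption for the example.

```python
import random

def gwo_step(wolves, alpha, beta, delta, t, n_iter):
    """One GWO iteration: every wolf moves to the average of the positions
    dictated by the alpha, beta, and delta wolves (Eqs. 13-15), with the
    encircling coefficients of Eqs. (11)-(12) and the linearly decreasing
    control parameter a of Eq. (16)."""
    a = 2.0 * (1.0 - t / n_iter)                     # Eq. (16)
    new_wolves = []
    for x in wolves:
        new_x = []
        for d, (xa, xb, xd) in enumerate(zip(alpha, beta, delta)):
            leaders = []
            for leader in (xa, xb, xd):
                A = 2.0 * a * random.random() - a    # Eq. (11)
                C = 2.0 * random.random()            # Eq. (12)
                D = abs(C * leader - x[d])           # Eq. (13)
                leaders.append(leader - A * D)       # Eq. (14)
            new_x.append(sum(leaders) / 3.0)         # Eq. (15)
        new_wolves.append(new_x)
    return new_wolves
```

Note that at the final iteration (t = N) the parameter a reaches 0, so A vanishes and every wolf converges exactly onto the average of the three leaders, which models the attack phase.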

IV. PROPOSED ALGORITHM
This section gives the complete description of the proposed algorithm. The proposed algorithm, named HPSOGWO, is a combination of Particle Swarm Optimization and Grey Wolf Optimization. The basic idea of HPSOGWO is to run the PSO algorithm for the first half of the total iterations, initialize the alpha wolf (α-wolf) with the best solution generated by PSO (gbest), and then run the GWO algorithm for the latter half of the iterations. The best solution generated by GWO is stored in the alpha wolf and is taken as the best mapping of tasks to VMs. The complete algorithm is shown in Algorithm 1, and the major steps are shown in Fig. 5. The initial values or ranges of the various parameters used in the proposed algorithm, along with explanations, are shown in Table II.
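The overall control flow just described can be sketched at a high level. The `pso_iterate` and `gwo_iterate` callables here are hypothetical placeholders for one full PSO or GWO update returning the new population and current best; they are not the paper's Algorithm 1 verbatim.

```python
def hpsogwo(n_iter, population, fitness, pso_iterate, gwo_iterate):
    """Hybrid scheme: PSO for the first half of the iterations, seed the
    alpha wolf with gbest, then GWO for the remaining iterations.
    Each iterate callable performs one update: (population, best) -> same."""
    best = min(population, key=fitness)              # initial gbest
    for t in range(n_iter // 2):                     # first half: PSO
        population, best = pso_iterate(population, best)
    alpha = best                                     # gbest -> alpha wolf
    for t in range(n_iter // 2, n_iter):             # second half: GWO
        population, alpha = gwo_iterate(population, alpha)
    return alpha                                     # best task-to-VM mapping
```

The point of the split is that PSO's fast early convergence supplies GWO with a strong alpha wolf, while GWO's balanced exploitation then refines it without stalling in a local optimum.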

A. Encoding the Scheduling Problem
The first step in applying the algorithm is to model the workflow scheduling problem. As discussed earlier, the scheduling problem is a mapping between the user's tasks and the virtual machines. A solution (particle or wolf) in the proposed algorithm can be represented using an array (or list): the array's index represents the task, and the value in the array represents the assigned VM. A similar encoding is used in [36]; it helps reduce the complexity of the algorithm.
An example of the encoding of a solution is shown in Fig. 3. The array index represents the eight tasks (T1 to T8), and the value at each index represents a virtual machine (instance) id. For Solution 1, task T1 is allocated to VM1, T6 is assigned to VM2, T5 and T7 are assigned to VM3, T3 and T8 are allocated to VM4, and T2 and T4 are allocated to VM5; Solution 2 is read in the same way. Note that this encoding does not capture the precedence constraints of the workflow: if, say, tasks T1, T3, and T8 are assigned to the same VM, the encoding does not indicate that T1 is executed first. In other words, it does not depict the precedence among the tasks.
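The encoding is simple enough to write down directly. The following sketch reproduces the Solution 1 mapping described above (a hypothetical reading of Fig. 3).

```python
# Array encoding of a schedule: index = task (T1..T8), value = assigned VM id.
# Solution 1 as described in the text: T1->VM1, T6->VM2, T5,T7->VM3,
# T3,T8->VM4, T2,T4->VM5.
solution1 = [1, 5, 4, 5, 3, 2, 3, 4]

def tasks_on_vm(solution, vm):
    """Return the (1-indexed) tasks a given VM runs under this encoding."""
    return [i + 1 for i, v in enumerate(solution) if v == vm]
```

For instance, `tasks_on_vm(solution1, 3)` returns `[5, 7]`, i.e. tasks T5 and T7 on VM3, but nothing in the array says which of the two runs first; the precedence information lives only in the workflow DAG.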

B. Initialize the Population
The HPSOGWO algorithm runs for a fixed number of iterations; in our case, 500. The set of solutions (particles) is known as the population. In the first iteration, the population is initialized with random solutions, which are then improved with each iteration of the algorithm. A random initialization of the population is illustrated in Fig. 4.

C. Evaluation of the Fitness Function
The algorithm begins by calculating the execution times and storing them in the execution time matrix shown in equation 17, an m × n matrix whose element ET_i,j is the execution time of task T_i on VM_j (for example, ET_1,1 is the execution time of task T1 on VM1). Each value in the matrix is calculated using equation 5.
The dependencies of the tasks in a workflow are represented using the task dependency matrix (TD-Mtx) shown in equation 18. Each element of the matrix is either 1 or 0; if the value of d_1,2 is 1, then task T2 is executed after task T1.
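Both matrices can be built in a few lines. The following sketch derives the execution-time matrix from equation 5 and the 0/1 dependency matrix from the workflow's edge list; the input layout is an assumption for illustration.

```python
def build_matrices(task_sizes, vm_specs, edges):
    """Build the execution-time matrix (Eq. 17) via Eq. (5) and the 0/1
    task-dependency matrix (Eq. 18).
    task_sizes[i]: size of task i in MI; vm_specs[j] = (cores, mips_per_core);
    edges: pairs (i, k) meaning task k is executed after task i."""
    et = [[size / (pe * mips) for pe, mips in vm_specs]
          for size in task_sizes]
    n = len(task_sizes)
    td = [[0] * n for _ in range(n)]
    for i, k in edges:
        td[i][k] = 1
    return et, td
```

For two tasks of 1000 and 500 MI on a 1-core 1000 MIPS VM and a 2-core 500 MIPS VM, both VMs yield the same execution times (1.0 s and 0.5 s), since equation 5 depends only on the product of cores and per-core speed.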
From these matrices, we evaluate the total execution time, the total execution cost, and the fitness function of each solution, as described in Section III-A.

D. Applying PSO Algorithm
The PSO algorithm starts with a random population and runs for n/2 iterations, where n is the maximum number of iterations. PSO keeps track of the personal best (pbest) position of each particle and the global best (gbest) position of the swarm in each iteration. The particles' updated positions are influenced by the gbest and pbest particles, driving the swarm towards the global best solution (gbest). The complete process is described in Section III-B.

E. Applying GWO Algorithm
The best solution (α-wolf) of the GWO algorithm is initialized with the best solution (gbest) obtained from the PSO algorithm. We then apply the GWO algorithm for the latter half of the iterations (n/2 + 1 to n), as described in Section III-C. The α-wolf leads all the other wolves towards better solutions in every iteration of the GWO algorithm. After the stopping criterion is met, the optimal solution is held by the α-wolf, and the tasks are assigned to the respective VMs as suggested by the α-wolf.

V. PERFORMANCE EVALUATION

A. Experimental Setup
The proposed algorithm (HPSOGWO) is executed under four scenarios to evaluate the total execution time and cost of the fitness function. The HPSOGWO algorithm is compared with the round-robin (RR) [37], ant colony optimization (ACO) [38], heterogeneous earliest finish time (HEFT) [39], and particle swarm optimization (PSO) [12] algorithms. The four scenarios comprise the CyberShake, Montage, Inspiral, and Sipht scientific workflows. These workflows are available with different numbers of tasks; for example, CyberShake is available with 30, 50, 100, and 1000 tasks. All experiments were carried out on a computer with an Intel(R) Core i5-5200U CPU at 2.20 GHz, 4.00 GB of RAM, and the Windows 8 Pro 64-bit operating system. To simulate and evaluate the proposed algorithm's performance, we used the WorkflowSim-1.1 toolkit [40], which is an extension of CloudSim. Table I shows the simulation parameters used during the evaluation of the algorithm, and Table II shows the initial values or ranges of the different parameters used in the proposed algorithm.

B. Simulation Results
In this section, we compare the performance of the proposed algorithm, HPSOGWO, with the RR, ACO, HEFT, and PSO algorithms. Performance is measured in terms of total execution time (TET) and total execution cost (TEC), with the number of tasks ranging from 25 to 1000, under four well-known workflows: CyberShake, Inspiral, Montage, and Sipht. The maximum number of iterations was set to 500. Each scenario was executed 10 times and the average result is reported. The simulation results are tabulated in Tables III, IV, and V.

1) Performance evaluation under CyberShake workflow: Fig. 6 shows the simulation results of the RR, ACO, HEFT, PSO, and HPSOGWO algorithms under the CyberShake workflow.

2) Performance evaluation under Inspiral workflow: As depicted in Fig. 9, for 30 tasks HPSOGWO improves the TEC by 114.22%, 27.70%, and 28.03% compared to HEFT, ACO, and RR respectively, while a deterioration of 0.08% is observed compared to PSO. For 50 tasks, the proposed algorithm's performance improves by 22.02%, 20.34%, and 16.72% compared to HEFT, ACO, and RR respectively, while a deterioration of 0.12% is observed compared to PSO. For 100 tasks, the TEC of HPSOGWO is reduced by 13.33%, 29.16%, and 3.38% compared to HEFT, ACO, and RR respectively, while its performance is 2.15% worse than PSO. For 1000 tasks, HPSOGWO outperforms all the compared algorithms, with TEC improvements of 0.62%, 14.23%, 4.30%, and 3.12% compared to PSO, HEFT, ACO, and RR, respectively.
3) Performance evaluation under Sipht workflow: Fig. 10 shows the total execution time of the different algorithms under the Sipht workflow for different numbers of tasks. For 30 tasks, the proposed algorithm performs better than HEFT and ACO but slightly worse than PSO and RR. For 60 and 100 tasks, HPSOGWO performs well compared to all the other algorithms. For 1000 tasks, its performance is better than HEFT, ACO, and RR, but a slight increase of 3.61% in TET is observed compared to PSO.
The total execution costs of the different algorithms under the Sipht workflow are compared in Fig. 11. The proposed algorithm outperforms the other algorithms in all cases, with reductions in TEC of up to 190.97%; the exception is 1000 tasks, where an increase of 3.57% in TEC is observed compared to the PSO algorithm.

4) Performance evaluation under Montage workflow: The performance in terms of total execution time and total execution cost with 25, 50, 100, and 1000 tasks under the Montage workflow is shown in Fig. 12 and Fig. 13, respectively. The proposed algorithm outperforms the ACO algorithm in all cases, with up to a 7.12% reduction in TET. For 25 and 100 tasks, reductions of 9.69% and 2.06% in total execution time are observed compared to PSO, while in the other cases the proposed algorithm does not perform as well. The TEC of HPSOGWO is lower than that of PSO, ACO, and RR, but HPSOGWO does not perform well compared to HEFT, with up to a 4.13% increase in TEC.

VI. CONCLUSION
A novel hybrid meta-heuristic algorithm based on a multi-objective model, called HPSOGWO, has been proposed in this paper. The proposed algorithm is a hybrid of the Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO) algorithms, and its objectives are to optimize the total execution cost and the total execution time. The HPSOGWO algorithm was tested on four scientific workflows, Montage, CyberShake, Inspiral, and Sipht, with different numbers of tasks. The experimental results show that the proposed algorithm reduces the total execution time and cost compared to the PSO, HEFT, ACO, and RR algorithms. In future work, other parameters such as total energy consumption, load balancing, and response time will be considered for evaluation, and other algorithms can be combined into new hybrid algorithms and evaluated under the same parameters.