Comparison of Workflow Scheduling Algorithms in Cloud Computing

Abstract: Cloud computing has gained popularity in recent times. Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on demand, like a public utility. It uses the Internet and central remote servers to maintain data and applications, allowing consumers and businesses to use applications without installation and to access their personal files from any computer with Internet access. The main aim of my work is to study the problems, issues and types of scheduling algorithms for cloud workflows, and to design new workflow scheduling algorithms for a cloud workflow management system. The proposed algorithms are implemented on a real-time cloud developed using Microsoft .NET technologies. The algorithms are compared with each other on the basis of total execution time, execution time of the algorithm, and estimated execution time. Experimental results generated via simulation show that Algorithm 2 is much better than Algorithm 1, as it reduces the makespan.


I. INTRODUCTION
Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on demand, like a public utility. It is a technology that uses the Internet and central remote servers to maintain data and applications. Cloud computing allows consumers and businesses to use applications without installation and to access their personal files from any computer with Internet access. This technology allows for much more efficient computing by centralizing storage, memory, processing and bandwidth.

A. Workflows
The WfMC (Workflow Management Coalition) defined a workflow as "the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules." WfMC published its reference model in [1], identifying the interfaces within this structure which enable products to interoperate at a variety of levels. This model defines a workflow management system and its most important system interfaces (see Fig. 1):
1) Workflow Engine. A software service that provides the run-time environment in order to create, manage and execute workflow instances.
2) Process Definition. The representation of a workflow process in a form which supports automated manipulation.
3) Workflow Interoperability. Interfaces to support interoperability between different workflow systems.
4) Invoked Applications. Interfaces to support interaction with a variety of IT applications.
5) Workflow Client Applications. Interfaces to support interaction with the user interface.
6) Administration and Monitoring. Interfaces to provide system monitoring and metric functions to facilitate the management of composite workflow application environments.
It can be seen that scheduling is a functional module of the Workflow Engine(s); thus it is a significant part of workflow management systems.
The rest of the paper is structured as follows. Related work is discussed in Section II. Section III describes our proposed work. The implementation is presented in Section IV, and Section V shows the experimental details and simulation results. Finally, Section VI describes the future scope of our research work.

II. RELATED WORK
A. Cloud Platforms
A number of researchers have presented comprehensive surveys of cloud computing, and there are many definitions of cloud computing.
www.ijacsa.thesai.org
According to R. Buyya and S. Venugopal [5], cloud computing is defined as "a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers".
Sun Microsystems [3] takes an inclusive view that there are many different types of clouds, such as public, private and hybrid clouds, and that many different applications can be built using these different clouds.
Recently, several academic and industrial organizations have started investigating and developing technologies and infrastructure for Cloud Computing.

B. Workflow Management Systems
Workflow is concerned with the automation of procedures whereby files and data are passed between participants according to a defined set of rules to achieve an overall goal. A workflow management system defines, manages and executes workflows on computing resources. Workflow scheduling is a kind of global task scheduling, as it focuses on mapping and managing the execution of interdependent tasks on shared resources that are not directly under its control. Workflow management includes five dimensions: time, cost, fidelity, reliability and security.
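Because the tasks in a workflow are interdependent, a scheduler can dispatch a task only after all of its predecessors have finished. The following Python sketch illustrates this with the standard-library graphlib module (Python 3.9+). The workflow and its task names are hypothetical, and the sketch only derives a dependency-respecting order; it does not map tasks to resources.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_data": set(),
    "validate": {"fetch_data"},
    "compute_score": {"validate"},
    "send_alert": {"compute_score", "validate"},
}

def dependency_order(dag):
    """Return one order in which every task appears after all of its predecessors.

    TopologicalSorter raises graphlib.CycleError if the workflow contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())

order = dependency_order(workflow)
print(order)  # ['fetch_data', 'validate', 'compute_score', 'send_alert']
```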
The related work on workflow management systems is summarized in tabular form (see Table III). For example, [22] initiates a discussion by contributing a concept that achieves security merits by making use of multiple distinct clouds at the same time.

III. PROPOSED WORK
This section presents a set of scheduling algorithms based on time-management techniques [23]. The aim of the algorithms is to optimize the makespan, which is defined as the maximum time taken to complete all the tasks in a given application. The proposed algorithms are implemented using a service-based cloud, and comparative results are shown.
The problem of scheduling a set of tasks onto a set of processors can be divided into two categories:
1) Job scheduling
2) Job mapping and scheduling
In the former category, independent jobs are scheduled among the processors of a distributed computing system to optimize overall system performance. In contrast, the mapping and scheduling problem requires the allocation of the multiple interacting tasks of a single parallel program so as to minimize its completion time on the parallel computer system.

A. Algorithm 1
The design of Algorithm 1 is based on the following heuristics. It is based on the POSEC method [23]. It is assumed that a job consists of tasks, which the cloud scheduler assigns to resources. It is also assumed that each computational resource can run one application at a time and must run that application to completion.
Let T be a set of n tasks and let m be the number of computational resources in a cloud. We define a schedule of T as follows: a schedule S of T onto a cloud with m resources is a finite set of tuples <v, p, t>, where v is a task in T, p is the resource to which it is assigned, and t is its starting time.
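The makespan defined at the start of this section can be written in terms of these tuples. The following is a worked sketch that assumes a symbol w(v, p) for the execution time of task v on resource p; this notation is ours and is not defined in the paper, and the example schedule is hypothetical.

```latex
\mathrm{makespan}(S) = \max_{\langle v, p, t \rangle \in S} \bigl( t + w(v, p) \bigr)

% Hypothetical example: S = \{\langle v_1, p_1, 0\rangle, \langle v_2, p_2, 0\rangle, \langle v_3, p_1, 4\rangle\}
% with w(v_1, p_1) = 4, w(v_2, p_2) = 6, w(v_3, p_1) = 3:
\mathrm{makespan}(S) = \max(0 + 4,\; 0 + 6,\; 4 + 3) = 7
```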
To generate the schedule, our technique is based on the traditional list scheduling approach in which we construct a list and schedule the nodes on the list one by one to the processor.The list is constructed by ordering the jobs according to their urgency score s.The list is static therefore the order of nodes on the list will not change during the resource allocation process.
We restrict ourselves to non-preemptive schedules where a job once started has to run to completion on the same machine.
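The static list scheduling approach described above can be sketched as follows. This minimal Python illustration rests on simplifying assumptions that are ours, not the paper's: tasks are independent, each task's execution time is known and identical on every resource, communication cost is ignored, and each task is placed on the resource that becomes free earliest. The Task type and list_schedule function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    urgency: int      # urgency score on a scale of 10 (higher = more urgent)
    duration: float   # assumed known execution time, identical on every resource

def list_schedule(tasks, m):
    """Static list scheduling of independent tasks onto m non-preemptive resources.

    Returns (schedule, makespan), where schedule is a list of (v, p, t) tuples.
    """
    # Static list: ordered once by descending urgency and never reordered.
    ready_list = sorted(tasks, key=lambda task: task.urgency, reverse=True)
    free_at = [0.0] * m  # time at which each resource next becomes free
    schedule = []
    for task in ready_list:
        # Pick the resource that becomes free earliest (lowest index on ties).
        p = min(range(m), key=lambda r: free_at[r])
        t = free_at[p]
        schedule.append((task.name, p, t))
        # Non-preemptive: the task occupies resource p until it completes.
        free_at[p] = t + task.duration
    return schedule, max(free_at, default=0.0)

tasks = [Task("a", 9, 4.0), Task("b", 3, 2.0), Task("c", 7, 3.0), Task("d", 5, 5.0)]
schedule, makespan = list_schedule(tasks, m=2)
print(schedule)  # [('a', 0, 0.0), ('c', 1, 0.0), ('d', 1, 3.0), ('b', 0, 4.0)]
print(makespan)  # 8.0
```

Ordering by urgency once, before any allocation, is what makes the list static; a dynamic variant would recompute priorities after each assignment.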
The scheduler has information about all resources, such as processing speed (in MIPS), processing cost per second, baud rate (communication rate), and resource load during peak and off-peak hours.
After gathering the details of the user jobs, the system calculates the importance score. The jobs are executed according to their urgency and importance scores.
The time-management parameters used by the algorithms are:
1) Total Execution Time: the total time consumed by the algorithm to execute all the jobs.
2) Execution Time of Algorithm: the time taken by the algorithm itself to execute.
3) Estimated Execution Time: the average of the total execution times of previous jobs.
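The second and third parameters can be computed as sketched below in Python. The helpers run_and_time and estimated_execution_time are hypothetical, the history values are invented for illustration, and the total execution time (parameter 1) could be measured in the same way by timing the execution of the whole job batch.

```python
import time
from statistics import mean

def run_and_time(algorithm, jobs):
    """Run a scheduling algorithm; return (result, execution time of the algorithm in seconds)."""
    start = time.perf_counter()
    result = algorithm(jobs)
    return result, time.perf_counter() - start

def estimated_execution_time(previous_totals):
    """Estimated execution time: the mean of the total execution times of previous jobs."""
    if not previous_totals:
        raise ValueError("no history of previous jobs")
    return mean(previous_totals)

# Hypothetical history of total execution times (seconds) for earlier job batches.
history = [12.0, 15.0, 9.0]
print(estimated_execution_time(history))  # 12.0
```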
The proposed algorithm comprises two parts, as explained below:
A. Task Ordering Procedure, which produces the schedule list.
B. Resource Allocation Procedure, which allocates resources to the jobs in the schedule list generated by the task ordering procedure.

a) Task Ordering Procedure:
Begin
Step 1: The list is initialized to be an empty list. The cloud clients calculate the urgency score according to the severity of the jobs.
Step 2: The urgency score is calculated on a scale of 10.
Step 3: According to the urgency score, the alert templates are set up.

b) Resource Allocation Procedure:
Begin
Step 1: The cloud scheduler collects the resources and their characteristics, such as processing speed, processing cost per second, and resource load during peak and off-peak hours.
Step 2: It generates an importance score on a scale of 10 according to these characteristics.
Step 3: After collecting the job parameters (the urgency and importance scores), the jobs are executed according to the following cases.

If Urgency High, Importance High: the email alert is sent immediately.
If Urgency High, Importance Low: whenever the resources are free, the email is sent on a high-priority basis.
If Urgency Low, Importance High: the email alert is sent after the job queue has been emptied.
If Urgency Low, Importance Low: whenever the resources are free and the job queue is empty, the email is sent on a low-priority basis.
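These four cases amount to a two-by-two decision matrix. The Python sketch below assumes a cut-off of 5 on the scale of 10 to separate "high" from "low" scores (the paper does not state a threshold) and ranks jobs in the order in which the cases are listed. The job names and scores are hypothetical.

```python
HIGH_THRESHOLD = 5  # assumed cut-off: a score >= 5 (on the scale of 10) counts as "high"

# (urgency is high, importance is high) -> (dispatch rank, action), in the order listed above.
ACTIONS = {
    (True, True): (0, "send immediately"),
    (True, False): (1, "send with high priority when resources are free"),
    (False, True): (2, "send after the job queue is emptied"),
    (False, False): (3, "send with low priority when resources are free and the queue is empty"),
}

def classify(urgency, importance, threshold=HIGH_THRESHOLD):
    """Map urgency and importance scores (0-10) to a (rank, action) pair."""
    return ACTIONS[(urgency >= threshold, importance >= threshold)]

def dispatch_order(jobs):
    """Sort (name, urgency, importance) jobs by quadrant rank; equal ranks keep their order."""
    return sorted(jobs, key=lambda job: classify(job[1], job[2])[0])

jobs = [("bank", 3, 8), ("hospital", 9, 9), ("insurance", 2, 1), ("bank2", 8, 2)]
for name, urgency, importance in dispatch_order(jobs):
    print(name, "->", classify(urgency, importance)[1])
```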

B. Algorithm 2
The second algorithm is based upon the Pareto Analysis [23].

Pareto Analysis
This is the idea that 80% of tasks can be completed in 20% of the disposable time, while the remaining 20% of tasks take up 80% of the time. This principle is used to sort tasks into two parts. According to this form of Pareto analysis, tasks that fall into the first category should be assigned a higher priority.
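One way to realize this two-part split is sketched below in Python, assuming each task's duration is known in advance: tasks are taken shortest first, and those that fit within 20% of the total time form the first, higher-priority category. The function name and the example durations are hypothetical, and the sketch illustrates the 80-20 idea rather than the exact procedure of Algorithm 2.

```python
def pareto_split_by_time(tasks, time_share=0.2):
    """Split (name, duration) tasks into a first and a second category.

    First category: the shortest tasks, taken in ascending order of duration,
    that fit within `time_share` of the total time. These get higher priority.
    """
    ranked = sorted(tasks, key=lambda task: task[1])
    budget = time_share * sum(duration for _, duration in ranked)
    first, used = [], 0.0
    for name, duration in ranked:
        if used + duration > budget:
            break
        first.append(name)
        used += duration
    second = [name for name, _ in ranked[len(first):]]
    return first, second

# Hypothetical: eight short tasks and two long ones (total time 40, budget 8).
tasks = [(f"t{i}", 1) for i in range(1, 9)] + [("t9", 16), ("t10", 16)]
first, second = pareto_split_by_time(tasks)
print(first)   # ['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8']
print(second)  # ['t9', 't10']
```

In this example 80% of the tasks (eight of ten) fit in 20% of the total time, matching the principle stated above.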
The 80-20 rule can also be applied to increase productivity: it is assumed that 80% of the productivity can be achieved by doing 20% of the tasks. If productivity is the aim of time management, then these tasks should be given a higher priority.
For example, look at your to-do list: if you have 10 tasks on it, then two of those tasks will yield 80% of your results. Alternatively, 80% of income is owned by 20% of people; it works both ways! The Pareto principle holds across business, academia, politics and a number of other areas. The foundation of this time-management skill is that 20% of tasks yield 80% of results. This algorithm also comprises two parts:
1. Task Ordering Procedure, which produces the schedule list.
2. Resource Allocation Procedure, which allocates resources to the jobs in the schedule list generated by the task ordering procedure.

a) Task Ordering Procedure:
Begin
Step 1: The list is initialized to be an empty list. The cloud clients send the jobs to the cloud manager according to their priority.
Step 2: The urgency score is calculated on a scale of 10.
Step 2: According to the calculated importance and urgency scores, the jobs are executed by the cloud manager.
If Urgency High, Importance High: the email alert is sent immediately.
If Urgency High, Importance Low: whenever the resources are free, the email is sent on a high-priority basis.
If Urgency Low, Importance High: the email alert is sent after the job queue has been emptied.
If Urgency Low, Importance Low: whenever the resources are free and the job queue is empty, the email is sent on a low-priority basis.

IV. IMPLEMENTATION
The management and scheduling of resources in a cloud environment is complex and therefore demands sophisticated tools for analyzing the algorithms before applying them to a real system. However, no available tools serve our needs, so we developed a service-based cloud using Microsoft .NET technologies, and the proposed algorithms are implemented on this real-time cloud.
Cloud Architecture:

Makespan comparison of Algorithm 1 and Algorithm 2:
The following graph shows the comparison of makespan between the two proposed algorithms.

VI. FUTURE SCOPE
We would like to extend these algorithms to include further parameters, such as advance reservation and preemptive jobs. In the future, more clouds can also be added to this main cloud to distribute the workload. Currently this cloud provides services such as email alerts; it could also be extended to store data online and to provide a synchronization mechanism.

Step 3: There are 4 cases to determine the urgency score of the job. According to the urgency score, the alert templates are set up.
b) Resource Allocation Procedure:
Step 1: The importance score is calculated from the previous set of jobs using the following method. Let t = (time to execute the jobs / estimated time) × 100%.
If t = 100, the cloud resources are utilized properly, and a high importance score is sent by the algorithm.
If 80 < t < 100, the cloud manager's resources are overloaded, but according to Pareto's 80:20 rule a high importance score is generated.
If t < 80, a low importance score is generated.
If t > 100, the cloud manager's resources are underutilized, and a high importance score is generated by the cloud.
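The importance rule of Step 1 can be written as a small function. In this Python sketch the helper names are hypothetical; the thresholds follow the text, and because the text leaves t = 80 unspecified, the sketch assumes it yields a low score.

```python
def utilization_percent(time_to_execute, estimated_time):
    """t = (time to execute the jobs / estimated time) * 100."""
    if estimated_time <= 0:
        raise ValueError("estimated_time must be positive")
    return time_to_execute / estimated_time * 100

def importance_level(t):
    """Importance rule as stated in the text.

    t == 100 -> high; 80 < t < 100 -> high; t > 100 -> high; t < 80 -> low.
    The text does not cover t == 80; this sketch assumes it is low.
    """
    return "high" if t > 80 else "low"

print(importance_level(utilization_percent(9, 10)))   # high (t = 90)
print(importance_level(utilization_percent(7, 10)))   # low  (t = 70)
print(importance_level(utilization_percent(12, 10)))  # high (t = 120)
```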

Features of this cloud:
a) It sends real-time email alerts to the cloud client members, such as banks, insurance companies and hospitals.
b) The algorithms are tested on a real-time cloud.
c) Google's SMTP server is used to send the mails.
d) The database is stored on the web server.
e) The cloud works online, so no special software is needed to test the results.
f) Visual Studio 2008 is used for the front end and SQL Server 2005 for the back end.

Fig. 2 Architecture of service based cloud
The cloud architecture is based upon the real-time email alert system, which sends email alerts to its cluster client members. The features of the cloud architecture are:
a) This scenario-based cloud has the real-life application of sending email alerts to bank clients, hospital clients and insurance company clouds. The cloud takes jobs from all the other clients together with their urgency scores, and the cloud manager executes the jobs according to the importance score based on the cloud resources.
b) Data flows between all the clouds using XML.
c) XML is a hardware- and software-independent technology.
d) It is well suited for cloud applications.
e) There are 4 domains used in this service-based cloud.
f) It is a three-tier architecture: the database is stored on one server and the web services execute on another.
g) The FileZilla FTP client is used to upload and download data from the server.

V. EXPERIMENTAL RESULTS
This section describes the experimental results obtained after implementing the scheduling algorithms. The algorithms are implemented in the Microsoft .NET framework using a service-based cloud. The implementation takes as input the required set of resources and a set of tasks. The algorithms are compared with each other on a set of parameters such as total execution time, execution time of the algorithm, and estimated execution time.

Fig. 4 Line Chart results of Algorithm 2

Fig. 5 Pie chart with makespan comparison
POSEC is an acronym for Prioritize by Organizing, Streamlining, Economizing and Contributing. The objective of our algorithms is efficient time management and load balancing. There are four quadrants of decision making, and a decision needs two types of priority score: the urgency score and the importance score. The urgency score is given by the cluster member of the cloud, and the importance score is given by the cloud resource manager. The urgency score is calculated on a scale of 10 on the basis of the following table.
The importance score is calculated by the resource manager, also on a scale of 10. The resource parameters checked include CPU time and threads; we use a resource monitor program to generate the importance score. High importance means that the resources are available; low importance means that the resources are not available.

TABLE V EXPERIMENTAL RESULTS OF ALGORITHM 1