A Leveled DAG Critical Task First Schedule Algorithm in Distributed Computing Systems

In a distributed computing environment, efficient task scheduling is essential to obtain high performance. A central goal in designing and developing task scheduling algorithms is to achieve a shorter makespan. Several task scheduling algorithms have been developed for homogeneous and heterogeneous distributed computing systems. In this paper, a new static task scheduling algorithm, Leveled DAG Critical Task First (LDCTF), is proposed. It improves on the performance of the Leveled DAG Prioritized Task (LDPT) algorithm in order to schedule tasks efficiently on homogeneous distributed computing systems. LDPT was previously compared to the B-level algorithm, the best-known algorithm for homogeneous distributed systems, and provided better results. LDCTF is a list-based scheduling algorithm: it sorts tasks into a list according to their priority and then schedules them one by one on the most suitable processor. LDCTF aims to improve system performance by producing a shorter schedule length than the LDPT and B-level algorithms.
Keywords: Task scheduling; Homogeneous distributed computing systems; Precedence-constrained parallel applications; Directed Acyclic Graph; Critical path


I. INTRODUCTION
Distributed systems have emerged as powerful platforms for executing parallel applications. A distributed system can be defined as a collection of computing systems that appears to its users as a single system; these systems collaborate over a network to achieve a common goal [1]. There are two types of distributed systems: homogeneous (in which processors are identical in capabilities and functionality) and heterogeneous (in which processors differ).
In a distributed computing environment, an application is usually decomposed into several independent and/or interdependent sets of cooperating tasks. Dependent tasks are represented by a Directed Acyclic Graph (DAG). A DAG can be defined as a graph G(V, E) consisting of a set of vertices (nodes) V and a set of edges E, in which each node represents a task and each edge represents communication between two tasks (that is, a dependency between them). The computation cost of a task is represented by a weight associated with its node, and the communication cost between two tasks is represented by a weight associated with the edge between them. The communication cost between two dependent tasks is considered to be zero if they are executed on the same processor. Figure 1 shows an example of a simple task graph (DAG). In the figure, t0 is called the predecessor (or parent) of t2, and t2 is called the successor (or child) of t0. The edge between t0 and t2 means that t2 can start execution only after t0 finishes its execution. Efficient scheduling of application tasks is essential to achieve high performance in parallel and distributed systems. The basic function of task scheduling is to determine the allocation of tasks to processors and their execution order so as to satisfy the precedence requirements and obtain the minimum schedule length (or makespan) [2]. Task scheduling algorithms are broadly classified into two basic classes: static and dynamic. In static scheduling, the characteristics of an application, such as the execution times of tasks and the data dependencies between tasks, are known in advance (at compile time, before the application runs).
In dynamic scheduling, some information about tasks and their relations may be undeterminable until run time [3]. Over the past few decades, researchers have focused on designing task scheduling algorithms for homogeneous and heterogeneous systems with the objective of reducing the overall execution time of the tasks. Topcuoglu et al. [2] presented the HEFT and CPOP scheduling algorithms for heterogeneous processors. Luiz et al. [4] developed the lookahead-HEFT algorithm, which looks ahead in the schedule to make scheduling decisions. Eswari and Nickolas [5] proposed the PHTS algorithm to schedule tasks efficiently on heterogeneous distributed computing systems. Rajak and Ranjit [6] presented a queue-based scheduling algorithm called TSB to schedule tasks on homogeneous parallel multiprocessor systems. Ahmed, Munir, and Nisar [7] developed a genetic algorithm called PEGA that provides lower time complexity than the standard genetic algorithm (SGA). Tang, Li, Li, and Liao [8] presented a list-scheduling algorithm called HEFD for heterogeneous computing systems. Nasri and Nafti [9] developed a new DAG scheduling algorithm for heterogeneous systems that provides better performance than some well-known existing task scheduling algorithms.
For homogeneous distributed systems, researchers have developed many heuristic task scheduling algorithms such as ISH [10], ETF [11], DLS [12], MH [13], and B-level [14], as well as heuristics that depend on the critical path such as MCP [15], FCP [16], and CNPT [17]. Among these algorithms, B-level provides the best performance in terms of schedule length, speedup, and efficiency. The LDPT (Leveled DAG Prioritized Task) algorithm [18] was compared to the B-level algorithm, the best-known algorithm for homogeneous distributed systems, and provided better results.
In this paper, the problem of scheduling precedence-constrained parallel tasks on homogeneous physical machines (PMs) is addressed. A new static scheduling algorithm called LDCTF is proposed. The goal of LDCTF is to improve on the performance of the LDPT algorithm [18] in order to provide better system performance. LDCTF is a list scheduling algorithm. It divides the DAG into levels, sorts the tasks in each level into a list according to their priority, and finally picks tasks from the list one by one and schedules each on the most suitable processor. LDCTF is compared to the LDPT and B-level algorithms and provides better results in terms of schedule length, speedup, and efficiency.
The remainder of this paper is organized as follows. Section II provides an overview of the related LDPT algorithm. The proposed algorithm is discussed in Section III. Section IV presents the performance evaluation results of the proposed algorithm. Finally, the conclusion and future work are given in Section V.

II. LDPT ALGORITHM
LDPT is a list-based scheduling algorithm. It divides the DAG into levels while taking into account the dependency conditions among the tasks in the DAG. The algorithm has two phases: (1) a task prioritization phase and (2) a processor selection phase. LDPT assigns a priority to each task and then schedules each task on one processor, taking the insertion-based policy into consideration. Figure 2 shows the pseudo code of the LDPT algorithm.

III. LEVELED DAG CRITICAL TASK FIRST (LDCTF) ALGORITHM
LDCTF is a theoretical task scheduling algorithm. The LDCTF, LDPT, and B-level algorithms were applied to the Standard Task Graph (STG) set [19] as a benchmark, and LDCTF was found to be more efficient than the LDPT and B-level algorithms.
LDCTF is a list-based scheduling algorithm. It divides the DAG into levels while taking into account the dependency conditions among the tasks in the DAG, and then applies the Min-min method [20]: the minimum completion time (MCT) of each task over all processors is calculated, and the task with the lowest MCT is selected for scheduling. The algorithm has two phases: (1) a task prioritization phase and (2) a processor selection phase.
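The Min-min selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a set of independent tasks (for example, the tasks of one level), homogeneous processors, and no communication costs, and the function name `min_min` and its arguments are hypothetical.

```python
def min_min(costs, num_procs):
    """Schedule independent tasks with the Min-min rule.

    costs: task name -> execution time (identical on every processor,
    since the processors are assumed homogeneous).
    Returns a list of (task, processor, start, finish) tuples.
    """
    ready = [0.0] * num_procs   # time at which each processor becomes free
    remaining = dict(costs)
    schedule = []
    while remaining:
        best = None
        for task, w in remaining.items():
            # Minimum completion time (MCT) of this task over all processors.
            proc = min(range(num_procs), key=lambda p: ready[p])
            mct = ready[proc] + w
            if best is None or mct < best[0]:
                best = (mct, task, proc)
        # Schedule the task with the lowest MCT, then update the processor.
        mct, task, proc = best
        schedule.append((task, proc, ready[proc], mct))
        ready[proc] = mct
        del remaining[task]
    return schedule


result = min_min({"a": 3, "b": 1, "c": 2}, num_procs=2)
print([task for task, _, _, _ in result])  # ['b', 'c', 'a']
```

With two processors, the shortest task `b` is placed first, `c` starts in parallel on the other processor, and `a` follows `b` on the processor that becomes free earliest.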

A. Task Prioritization Phase:
In this phase, the critical path [2] of the DAG is calculated (the critical path is the longest path from the entry task to the exit task in the graph). The DAG is then divided into levels, and the tasks in each level are sorted into a list based on their priority. The priority of each task is assigned as follows:
1) First, the critical task (the task located on the critical path) in each level has the highest priority.
2) Then, the expected Earliest Finish Time (EFT) is calculated for the other tasks in the same level, and the task with the lowest EFT has the highest priority among them. If two tasks have equal EFT values, the task with the lowest task number has the higher priority. The EFT of a task $t_i$ on processor $P_j$ is computed as follows:
$$EFT(t_i, P_j) = w_{i,j} + EST(t_i, P_j) \qquad (1)$$
where $w_{i,j}$ is the computation cost of task $t_i$ on processor $P_j$ and $EST(t_i, P_j)$ is its Earliest Start Time.
3) Finally, the tasks in each level are sorted into the list in ascending order of their EFT values.
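The three prioritization steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published pseudo code: task ids are integers numbered so that parents precede children, the expected EFT ignores processor contention and charges every communication cost, and the function name `prioritize` and the example weights are hypothetical.

```python
def prioritize(comp, comm):
    """Return (priority list, set of critical tasks).

    comp: task id -> computation cost.
    comm: (parent, child) -> communication cost.
    Assumes task ids are numbered in topological order.
    """
    tasks = sorted(comp)
    parents = {t: [p for (p, c) in comm if c == t] for t in tasks}

    level, eft = {}, {}
    for t in tasks:  # parents are visited before their children
        level[t] = 1 + max((level[p] for p in parents[t]), default=-1)
        est = max((eft[p] + comm[(p, t)] for p in parents[t]), default=0)
        eft[t] = est + comp[t]  # Eq. (1): EFT = w + EST

    # Under these assumptions eft[t] is the longest path length up to t,
    # so the critical path is found by backtracking from the largest EFT.
    node = max(tasks, key=lambda t: eft[t])
    critical = {node}
    while parents[node]:
        node = max(parents[node], key=lambda p: eft[p] + comm[(p, node)])
        critical.add(node)

    # Per level: critical task first, then ascending EFT, then task number.
    order = sorted(tasks, key=lambda t: (level[t], t not in critical, eft[t], t))
    return order, critical


comp = {0: 2, 1: 3, 2: 4, 3: 1, 4: 2}
comm = {(0, 1): 1, (0, 2): 1, (0, 3): 1, (1, 4): 1, (2, 4): 1, (3, 4): 1}
order, critical = prioritize(comp, comm)
print(order)  # [0, 2, 3, 1, 4]
```

In the middle level, task 2 lies on the critical path and is listed first; tasks 3 and 1 follow in ascending order of their expected EFT (4 and 6).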

B. Processor Selection Phase:
In this phase, the tasks are picked from the list one by one and each is assigned to the processor that minimizes its earliest start time, taking the insertion-based policy into consideration. The insertion policy means that if there is an idle time slot on a processor between two already scheduled tasks, and the slot is long enough to execute the task, then the task is assigned to that processor in this idle slot, provided that no precedence constraint is violated. In other words, a task can be scheduled earlier if there is a period of time between two tasks already scheduled on processor P during which P is idle. If two processors provide the same start time for the task, the task is assigned to the first of them. The Earliest Start Time of a task $t_i$ on a processor $P_j$ is defined as:
$$EST(t_i, P_j) = \max\left\{ T_{Available}(P_j),\ \max_{t_k \in pred(t_i)} \left( AFT(t_k) + c_{k,i} \right) \right\}$$
where $T_{Available}(P_j)$ is the earliest time at which processor $P_j$ is ready, $AFT(t_k)$ is the Actual Finish Time of a task $t_k$ (a parent of task $t_i$) on the processor to which it was assigned, and $c_{k,i}$ is the communication cost from task $t_k$ to task $t_i$; $c_{k,i}$ equals zero if the predecessor task $t_k$ is assigned to processor $P_j$. For the entry task, $EST(t_{entry}, P_j) = 0$. Figure 3 shows the pseudo code of the LDCTF algorithm.
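The insertion-based policy and the tie-breaking rule described above can be sketched as follows. This is a minimal illustration, not the pseudo code of Figure 3: the function names are hypothetical, each processor's schedule is assumed to be a sorted list of `(start, finish)` slots, and `ready_time` stands for the time at which the data from all parents of the task is available on that processor.

```python
def earliest_start(busy, ready_time, duration):
    """Earliest start on one processor under the insertion policy.

    busy: sorted list of (start, finish) slots already scheduled.
    ready_time: time at which all parent data is available here.
    duration: computation cost of the task.
    """
    start = ready_time
    for slot_start, slot_finish in busy:
        if start + duration <= slot_start:
            return start                   # fits in the idle gap before this slot
        start = max(start, slot_finish)    # otherwise try after this slot
    return start


def select_processor(schedules, ready_times, duration):
    """Pick the processor with the smallest EST; ties go to the lowest index."""
    ests = [earliest_start(busy, ready, duration)
            for busy, ready in zip(schedules, ready_times)]
    proc = min(range(len(ests)), key=lambda p: ests[p])
    return proc, ests[proc]


# P0 is idle between t=2 and t=6; P1 is busy until t=4.
schedules = [[(0, 2), (6, 9)], [(0, 4)]]
print(select_processor(schedules, ready_times=[1, 3], duration=3))  # (0, 2)
```

A task of length 3 fits into the idle gap on the first processor and starts at time 2, whereas a task of length 5 would not fit and would have to wait until time 9 on that processor.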

C. Case Study
Consider the DAG shown in Figure 4, and assume the system has two processors (P0, P1). The critical path of the DAG in Figure 4 is (t0, t1, t3, t6, t8). Table 1 shows the computation cost of each task. Both the LDPT and LDCTF algorithms generate a list of tasks that gives their execution order. Table 2 shows the lists generated by the LDPT and LDCTF algorithms. For the LDCTF algorithm, the critical task in each level is scheduled first, as shown in Table 2. Figures 5.a and 5.b show the Gantt charts generated by the LDPT and LDCTF algorithms respectively. Both algorithms assign the selected task to the processor that minimizes its earliest start time (EST). For example, in Figure 5.a, the EST of task t2 on P0 is 5 and the EST of t2 on P1 is 4, so t2 is scheduled on P1. In Figure 5.b, the same approach is followed, taking the insertion-based policy into consideration. Figure 5 shows that the schedule lengths (the finish time of the last scheduled task of the DAG) produced by the LDPT and LDCTF algorithms are 25 and 23 units of time respectively. With LDCTF, we observe fewer periods in which processors are idle than with LDPT. As a result, the overall running time of the application decreases and the efficiency of the system improves.

A. Simulation Environment
To evaluate the performance of the LDCTF algorithm, a simulator was built using Visual C# .NET 4.0 on a machine with an Intel(R) Core(TM) i3 CPU M 350 @ 2.27 GHz, 4.00 GB of RAM, and the 64-bit Windows 7 operating system. To test the performance of the LDPT and LDCTF algorithms, a set of randomly generated graphs was created by varying a set of parameters that determine the characteristics of the generated DAGs. These parameters are as follows:
• DAG size (n): the number of tasks in the DAG.
• Density (d): the probability that an edge exists between task ni in level j and task nx in the next level j+1, where i, x = 1, 2, …, N, N is the number of tasks, j = 1, 2, …, T, and T is the number of levels in the DAG.
Six different numbers of processors were used: 2, 4, 8, 16, 32, and 64. For each number of processors, six different DAG sizes were used: 10, 20, 40, 60, 80, and 100 nodes.
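A generator driven by the DAG size and density parameters above might look like the following Python sketch. It is not the authors' C# simulator: the function name `random_leveled_dag`, the explicit `levels` argument, the cost range 1 to 10, and the way tasks are spread over levels are all assumptions made for illustration.

```python
import random


def random_leveled_dag(n, levels, density, seed=None):
    """Generate a random leveled DAG.

    n: number of tasks; levels: number of levels (1 <= levels <= n);
    density: probability of an edge between a task in level j and a
    task in level j+1. Returns (comp, comm, level_of).
    """
    if not 1 <= levels <= n:
        raise ValueError("need 1 <= levels <= n")
    rng = random.Random(seed)
    # Every level receives at least one task; the rest are placed at random.
    level_of = sorted(list(range(levels)) +
                      [rng.randrange(levels) for _ in range(n - levels)])
    by_level = [[t for t in range(n) if level_of[t] == j] for j in range(levels)]
    comp = {t: rng.randint(1, 10) for t in range(n)}  # assumed cost range
    comm = {}
    for j in range(levels - 1):
        for u in by_level[j]:
            for v in by_level[j + 1]:
                if rng.random() < density:
                    comm[(u, v)] = rng.randint(1, 10)  # assumed cost range
    return comp, comm, level_of


comp, comm, level_of = random_leveled_dag(n=20, levels=4, density=0.5, seed=1)
print(len(comp), "tasks,", len(comm), "edges")
```

Edges are only created between consecutive levels, so the result is always acyclic. With a low density, a task in a later level may end up without any parent; a fuller generator would need to decide how to handle that case.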

C. Experimental Results
The schedule lengths generated by the LDPT and LDCTF algorithms are shown in Figures 6, 7, 8, 9, 10, and 11 for 10, 20, 40, 60, 80, and 100 tasks respectively, and the results are recorded in Table 3. According to the results, LDCTF decreases the schedule length, which reduces the running time of the application. The improvement ratio in schedule length is 2.75%. Figures 12, 13, 14, 15, 16, and 17 compare the speedup of the LDPT and LDCTF algorithms for 2, 4, 8, 16, 32, and 64 processors respectively. Table 4 shows the speedup results of the LDPT and LDCTF algorithms. From the results, the improvement ratio in speedup is 3.2%. Table 5 shows the efficiency results of the LDPT and LDCTF algorithms. Figures 18, 19, 20, 21, 22, and 23 show that LDCTF is more efficient than LDPT, with an improvement ratio of 1.9%. The schedule lengths generated by the B-level, LDPT, and LDCTF algorithms are shown in Figures 24, 25, 26, 27, 28, and 29. Figures 30, 31, 32, 33, 34, and 35 show the corresponding speedup for varying numbers of tasks. They show that the LDCTF algorithm provides better speedup than the LDPT algorithm, because with LDCTF all processors finish executing their tasks earlier than with LDPT.
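The metrics reported above can be made concrete with a short Python sketch. The paper does not spell out its formulas in this section, so the definitions below are the commonly used ones and are assumptions: speedup is the sequential execution time (the sum of all computation costs) divided by the schedule length, efficiency is speedup divided by the number of processors, and the improvement ratio is the relative reduction with respect to the baseline.

```python
def speedup(comp_costs, schedule_length):
    """Sequential time (sum of computation costs) over parallel schedule length."""
    return sum(comp_costs) / schedule_length


def efficiency(comp_costs, schedule_length, num_procs):
    """Speedup normalized by the number of processors."""
    return speedup(comp_costs, schedule_length) / num_procs


def improvement_ratio(baseline, improved):
    """Relative reduction of a lower-is-better metric, in percent."""
    return (baseline - improved) * 100 / baseline


# Schedule lengths from the case study: 25 (LDPT) and 23 (LDCTF).
print(improvement_ratio(25, 23))  # 8.0
```

For metrics where higher is better, such as speedup, the sign of the numerator would be reversed; how the reported averages of 2.75%, 3.2%, and 1.9% were aggregated over the test graphs is not stated here.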

D. Discussion of Results
First, the LDCTF algorithm is compared to the LDPT algorithm and provides better results in terms of schedule length, speedup, and efficiency. This is because LDCTF takes the critical path into account and schedules the critical task first in each level. As a result, the task with the highest computation and communication cost is scheduled first, which leads to a shorter schedule length, higher speedup, and higher efficiency.
Second, LDCTF is compared to the B-level algorithm and provides better results in terms of schedule length, speedup, and efficiency. This is because the B-level algorithm is based on paths, which increases the communication overhead when tasks are assigned to processors. LDCTF, on the other hand, is based on levels, which reduces the communication overhead when tasks are assigned to processors. Another reason is that the B-level algorithm must calculate the b-level value of each task before scheduling, so LDCTF requires less arithmetic computation than B-level, which reduces its complexity.
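To make the extra work of the B-level algorithm concrete, the following Python sketch computes b-level values. It uses the usual definition, which is an assumption here since the text does not define it: the b-level of a task is the length of the longest path from that task to an exit task, counting computation and communication costs. The function name and example weights are hypothetical.

```python
from functools import lru_cache


def b_levels(comp, comm):
    """Return task -> b-level for a DAG.

    comp: task -> computation cost; comm: (parent, child) -> communication cost.
    """
    children = {t: [] for t in comp}
    for (parent, child) in comm:
        children[parent].append(child)

    @lru_cache(maxsize=None)
    def b(task):
        # Own cost plus the most expensive continuation towards an exit task.
        return comp[task] + max(
            (comm[(task, c)] + b(c) for c in children[task]), default=0)

    return {t: b(t) for t in comp}


comp = {0: 2, 1: 3, 2: 4, 3: 1, 4: 2}
comm = {(0, 1): 1, (0, 2): 1, (0, 3): 1, (1, 4): 1, (2, 4): 1, (3, 4): 1}
print(b_levels(comp, comm))  # {0: 10, 1: 6, 2: 7, 3: 4, 4: 2}
```

Every task and every edge is visited once, so this pass costs time proportional to the size of the graph before any scheduling decision is made.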

V. CONCLUSION AND FUTURE WORK
In this paper, a new static scheduling algorithm (LDCTF) is developed for homogeneous distributed computing systems. The performance of the LDCTF algorithm is compared with that of the LDPT algorithm. LDCTF is evaluated on different DAGs and is found to give better results than the LDPT algorithm in terms of schedule length, speedup, and efficiency, with improvement ratios of 2.75%, 3.2%, and 1.9% respectively.
The performance of LDCTF is also compared with that of the B-level and LDPT algorithms and is found to give better results in terms of schedule length, speedup, and efficiency. The LDCTF, LDPT, and B-level algorithms were applied to the Standard Task Graph (STG) set as a benchmark, and LDCTF was found to be more efficient than the LDPT and B-level algorithms. Future work can proceed in the following directions:
• In this paper, the LDCTF algorithm is applied to a Directed Acyclic Graph (DAG). In the future, it can be applied to a Directed Cyclic Graph (DCG).
• LDCTF can be applied on Heterogeneous Distributed Computing Systems (HDCS).
• LDCTF can be applied in a dynamic strategy instead of static strategy.
• Finally, a duplication technique can be applied with the LDCTF algorithm to minimize the communication overhead.

Figures 24, 25, 26, 27, 28, and 29 depict the schedule length versus the number of tasks for 2, 4, 8, 16, 32, and 64 processors. They show that the schedule length obtained with the LDCTF algorithm is shorter than that obtained with the LDPT and B-level algorithms.

Figures 30, 31, 32, 33, 34, and 35 depict the speedup versus the number of processors for varying numbers of tasks (10, 20, 40, 60, 80, 100). They show that the LDCTF algorithm provides better speedup than the LDPT and B-level algorithms, because with LDCTF all processors finish executing their tasks earlier than with the LDPT and B-level algorithms.

TABLE II. TASK LISTS FOR LDPT AND LDCTF ALGORITHMS

TABLE III. SCHEDULE LENGTH RESULTING FROM LDPT AND LDCTF ALGORITHMS