A New Task Scheduling Framework for Internet of Things based on Agile VNFs On-demand Service Model and Deep Reinforcement Learning Method

Abstract—Recent innovations in the Internet of Things (IoT) have given rise to IoT applications that require quick response times and low latency. Fog computing has proven to be an effective platform for handling such applications, but deploying fog computing resources effectively remains a significant challenge because of the heterogeneity of IoT tasks and their delay sensitivity. To take advantage of idle resources in IoT devices, this paper presents an edge computing concept that offloads edge tasks to nearby IoT devices. IoT-assisted edge computing should meet two conditions: edge services should exploit the computing resources of IoT devices effectively, and edge tasks offloaded to IoT devices should not interfere with local IoT tasks. The proposed method includes two main phases: virtualization of edge nodes, and task scheduling based on deep reinforcement learning. The first phase offers a layered edge framework. In the second phase, we apply deep reinforcement learning (DRL) to schedule tasks, taking into account the diversity of tasks and the heterogeneity of available resources. According to simulation results, the proposed task scheduling method achieves higher levels of task satisfaction and success than existing methods.


I. INTRODUCTION
The recent rapid development of artificial intelligence [1,2], machine learning [3], optical networks [4,5], smart grids [6], cloud computing [7,8], 5G connectivity [9], Blockchain, and the Internet of Things (IoT) [10,11] is leading to an exponential growth in data usage across a wide range of engineering and commerce disciplines. Over the last decade, the IoT has been recognized for its remarkable potential in computer science. It creates an environment in which many intelligent objects with limited resources can interact with each other through different technologies [12]. The IoT links the physical and virtual worlds by equipping physical devices with processing, networking, detection, and identification functions. IoT objects are employed in different applications, such as vehicular networks, energy management, traffic control, medical treatment, and healthcare, aiming to gather information about the physical world [13,14]. To obtain valuable information and fulfil the assigned tasks, a massive amount of data is produced, which creates challenging problems for efficient information processing, especially in scenarios requiring real-time data handling. Given the real-time processing capability of edge computing, IoT computing tasks can be offloaded to edge devices for execution [15,16].
Moreover, since the IoT services and applications are increasing daily, a practical approach to serving the growing needs in different application domains becomes vital [17,18]. To address the mentioned problems, achieve high resource utilization, reduce communication costs, and improve the lifetime of IoT networks, task scheduling methods can have significant impacts [19,20]. These methods aim to schedule the existing tasks in a suitable sequence to accomplish tasks under problem-specific constraints, such as communication costs among IoT objects, resource utilization, and the operational lifetime of sensor nodes [21].
Nowadays, Network Function Virtualization (NFV) is recognized as a key technology in telecommunication networks. It refers to centralizing and virtualizing network functions so that they can run in data centres on standard, commercial off-the-shelf (COTS) hardware instead of distributed, proprietary hardware [22]. Using NFV, network device purchases and related maintenance costs can be reduced effectively. As a logical consequence of NFV, Virtual Network Functions (VNFs) take advantage of middleware to virtualize network functions. The deployment of VNFs on commodity hardware reduces the need for dedicated hardware devices to perform individual network functions. Our main objective is to improve edge computing operational efficiency by combining an agile VNF on-demand model with a deep reinforcement learning-based task scheduling method. The proposed method includes two main phases: virtualization of edge nodes and task scheduling based on deep reinforcement learning. In the first phase, a layered edge framework is presented. In the second phase, we employ deep reinforcement learning (DRL) to solve the task scheduling problem, considering the diversity of tasks and the heterogeneity of available resources. This article makes the following major contributions:
 We analyze the joint optimization of time scheduling and virtual machine assignment in edge computing using model-free DRL-based task scheduling. VM availability, task characteristics, and queue dynamics are considered in formulating the problem as a Markov decision process (MDP).
 In this formulation, an action is represented as a pair of a VM and a task, so the action space may be extremely large. We therefore design a new mechanism that decouples the scheduling time step from the real-time step. With this mechanism, the size of the action space remains linear in the product of the number of VMs and the queue size, and multiple tasks can be scheduled within one real-time step, as illustrated by the sketch below.
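To make this decoupling concrete, the following is a minimal sketch (our own illustrative encoding, not the paper's exact one; M, O, and decode_action are hypothetical names) of how a flat action index of size (M+1)×O can be decoded into a (VM, waiting-slot) pair, with the extra row reserved for a "do not schedule" choice:

```python
# Illustrative sketch of the decoupled action encoding described above.
# Assumption (not from the paper): action index 0..(M+1)*O-1 is decoded as
# (vm, slot), and vm == M stands for a "do not schedule" (void) choice.

M = 3   # number of VMs (example value)
O = 5   # number of waiting slots (example value)

def decode_action(a: int, num_vms: int = M, num_slots: int = O):
    """Map a flat action index onto a (vm, waiting-slot) pair."""
    assert 0 <= a < (num_vms + 1) * num_slots
    vm, slot = divmod(a, num_slots)
    if vm == num_vms:          # assumed convention: last row = void action
        return None, slot      # skip scheduling at this time step
    return vm, slot

# Example: enumerate a few actions
for a in range(0, (M + 1) * O, 7):
    print(a, decode_action(a))
```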
The rest of the paper is organized as follows. Related works are reviewed in Section II. The proposed task scheduling method is described in Section III. Section IV reports the simulation results. Finally, the conclusion is presented in Section V.

II. RELATED WORK
Al-Habob, Dobre [23] consider offloading multiple tasks concurrently to several mobile edge computing servers. Offloading latency and failure probability are minimized by scheduling interdependent subtasks for servers. Models based on conflict graphs and genetic algorithms are used to solve the scheduling problem. Experimental findings demonstrate that these algorithms approach the optimal solutions found by exhaustive search. In addition, even though parallel offloading employs orthogonal channels, sequential offloading is more likely to fail than parallel offloading. In contrast, parallel offloading provides lower latency. The latency gap between parallel and sequential schemes decreases as the dependency between sub-tasks increases.
Chen, Guo [24] examine the business processes of edge cloud task scheduling. The authors propose an algorithm to optimize resource-constrained task scheduling using the construction of profit matrices, classification, and clustering preprocessing. Tasks with similar characteristics are grouped and categorized using clustering preprocessing. On the basis of the constructed profit matrix, a resource-constrained task scheduling strategy is derived. In the second step, edge cloud components such as virtual machines, user requests, tasks, and resources are constructed using Petri nets. As a final step, the suggested mechanism is evaluated through several experiments. According to simulations, the algorithm maximizes profit while managing tasks efficiently, reliably, and with a high load balance.
A mathematical formulation of the fog node task scheduling problem is presented by Azizi, Shojafar [25] to reduce fog node energy consumption while maintaining IoT QoS requirements. The goal of their model was to reduce deadline violation time. The authors proposed two partially greedy algorithms, semi-greedy with priority awareness and multi-start semi-greedy with priority awareness, to map IoT tasks to fog nodes efficiently. The proposed approaches were evaluated in terms of system lifespan, latency, energy, and the IoT task completion rate. In comparison with existing algorithms, tests indicate that the introduced algorithms improve task deadline compliance and reduce the total time spent violating deadlines.
Kanbar and Faraj [26] developed the Region Aware Dynamic Scheduling (RADISH) model, which consists of five consecutive processes. To reduce scheduling latency, they implemented a bi-class neural network that classifies incoming tasks based on their nature, taking into account login credentials, emails, passwords, types of services, and quality-of-service parameters. In the second process, improved moth flame optimization is used to schedule the classified tasks considering workload, deadline, priority, and energy. Third, load balancing is accomplished by clustering potential fields. With the aim of balancing server load and improving efficiency, three repository systems are implemented. The final step introduces the Hopcroft-Karp algorithm, which takes the VM state into account, minimizes allocation times, and enhances quality of service.
Hybrid Flamingo Search with a Genetic Algorithm (HFSGA) is implemented by Hussain and Begh [27] to optimize task scheduling for cost minimization. HFSGA and other well-known optimization algorithms are compared on seven essential benchmark optimization functions. Furthermore, Friedman Rank Tests are conducted to ascertain the results' significance. Implementing the model produces better results regarding task completion percentages, makespan, and costs. This work shows better results than existing algorithms such as round-robin, genetic, PSO, and ACO.
In order to secure the allocation of tasks on cloud and fog nodes, Najafizadeh, Salajegheh [28] proposed a multi-objective simulated annealing algorithm. A compromise solution is found by applying the goal-programming approach. In addition, a new goal called client-driven access level and schedule is created in regard to distributing tasks among fog and cloud nodes. The proposed algorithm was found to be 50% more efficient regarding deadlines, 88% more efficient regarding control levels, and 10% more efficient regarding service delays in comparison to moth-flame optimization, tabu search, and PSO algorithms.
Task scheduling in edge computing requires the consideration of two specific problems: time scheduling and resource allocation. Time scheduling determines the task execution order, while resource allocation assigns tasks to suitable virtual machines (VMs) for execution. In the field of edge computing, a number of scheduling issues have been explored [29][30][31][32][33]. However, the majority of existing works focus on resource allocation, while time scheduling has received little attention. Tan, Han [34] proposed a general model to minimize task response times when tasks are offloaded to edge servers. Zhang, Du [35] proposed a scheduling algorithm based on Lyapunov optimization in order to minimize the communication delay and the computing delay. Chen, Thomas [36] developed a dual-scheduling framework to accommodate the unstable capacity of servers and task arrival rates in heterogeneous vehicular edge computing. In [37], a mixed-integer nonlinear programming (MINLP) algorithm was employed for offloading and scheduling computationally intensive data-parallel tasks in order to minimize the average completion time. According to [38], tasks with the lowest delay are scheduled first using the shortest-job-first (SJF) scheduling method. Alameddine, Sharafeddine [39] explored the use of device-to-device collaboration for task offloading, taking human mobility into consideration in order to optimize task assignment and power management. The issue of energy-efficient task scheduling for IoT edge computing has been addressed in [20] by a heuristic algorithm. These methods rely on idealized mathematical models, and optimization is achieved through mixed-integer nonlinear programming (MINLP) or heuristic algorithms. Although such model-oriented algorithms can produce excellent results, they are not well suited to dynamic environments in which task arrival rates and popularity are unknown in advance. It is also important to note that model-based task scheduling algorithms are largely concerned with optimizing one-step rewards instead of pursuing long-term objectives. In these algorithms, the availability of resources is assumed to be fixed during the scheduling period.

III. PROPOSED TASK SCHEDULING METHOD
In our proposed mechanism, each gateway node serves as a connection point for several IoT nodes, which connect to the gateway node based on their shortest distance. Our main aim is to propose a gateway-based edge computing service model that maximizes IoT resource utilization, accelerates the processing of users' service requests, and enhances edge computing efficiency. In the proposed model, user requests pass through the edge gateway, which determines whether to process the corresponding tasks. To decrease data processing time, the controller forwards service requests to the cloud if the edge gateway is unable to process them. The edge computing service model contains three main components: lightweight VNF configuration, the scheduler, and resource estimation. According to the proposed model, when a service request is received, it is checked whether an adequate amount of computing resources is available to fulfil the demand. One of the following two events may then occur. If resources are sufficient, the edge gateway processes the request through the scheduler and queues it in the system according to the task scheduling algorithm. Otherwise, the request is transferred directly to the cloud because the edge gateway lacks the resources.
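As a rough illustration of this admission flow, the sketch below models the edge gateway as a dispatcher that either queues a request for local scheduling or forwards it to the cloud when resources are insufficient. The class and field names, and the simple capacity test, are our own assumptions rather than the paper's implementation.

```python
# Minimal sketch of the gateway admission flow described above.
# Names (Request, EdgeGateway, available_mips, handle) are illustrative
# assumptions, not identifiers from the paper.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Request:
    task_id: int
    size_mi: float          # task size in million instructions
    expected_delay: float   # latency requirement in seconds

@dataclass
class EdgeGateway:
    available_mips: float                  # free computing capacity on the edge
    queue: deque = field(default_factory=deque)

    def handle(self, req: Request) -> str:
        # If the edge gateway can serve the request in time, queue it for the
        # scheduler; otherwise forward it directly to the cloud.
        if req.size_mi / req.expected_delay <= self.available_mips:
            self.queue.append(req)
            return "queued at edge"
        return "forwarded to cloud"

gw = EdgeGateway(available_mips=1500.0)
print(gw.handle(Request(1, size_mi=1000, expected_delay=2.0)))   # queued at edge
print(gw.handle(Request(2, size_mi=4000, expected_delay=1.0)))   # forwarded to cloud
```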
Improving edge gateway operational efficiency is the main aim of task scheduling methods. Since service requests differ from each other, a task scheduling algorithm determines how to meet the demands of each request. A scheduling method's primary objective is to reduce the time spent on high-demand tasks. Our main objective is to implement an on-demand model for agile VNFs together with a deep reinforcement learning-based task scheduling method to enhance edge computing efficiency. The proposed method includes two main phases: virtualization of edge nodes and task scheduling based on deep reinforcement learning.
In order to simplify the task scheduling process, we focus only on computational resources. To make scheduling decisions, the scheduler monitors the status information of incoming tasks and virtual machines (VMs), including the task sizes, the expected completion time, the computing speed (in million instructions per second (MIPS)), and the waiting time. Based on these observations, the scheduler determines when to schedule (e.g., the scheduling order and the start time for each task) and where to schedule (e.g., which VM is assigned to each task). Tasks waiting to be scheduled are divided into two sets: a waiting set and a backlog queue. A task in the waiting set occupies a waiting slot that can be observed fully, whereas the scheduler can only observe the number of tasks in the backlog queue. In each scheduling time step, the scheduler selects at most one task from the waiting slots for scheduling. In this study, we examine task scheduling in edge computing when only one edge server is deployed. The objective is to maximize the long-term task satisfaction of all tasks:

\[
\max \; \sum_{i} g_{i,j},
\]

where \(g_{i,j}\) is the task satisfaction of task \(i\) scheduled to VM \(j\).
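The following sketch illustrates, under our own simplifying assumptions, how such an observation vector and a per-task satisfaction value g_{i,j} could be assembled; the concrete satisfaction formula and the field names are hypothetical and only mirror the quantities named above (task size, expected delay, waiting time, VM speed, backlog length).

```python
# Sketch of the scheduler's observation and reward, under our own assumptions.
# satisfaction() below is one plausible g_{i,j}: it is 1 when the task's
# response time meets its expected delay and decays toward 0 as it is exceeded.
from dataclasses import dataclass

@dataclass
class Task:
    size_mi: float          # task size (million instructions)
    expected_delay: float   # delay requirement (s)
    waiting_time: float     # time already spent in the waiting slot (s)

@dataclass
class VM:
    speed_mips: float       # processing speed (MIPS)
    busy_until: float       # time at which the VM becomes free (s)

def satisfaction(task: Task, vm: VM, now: float = 0.0) -> float:
    """Assumed satisfaction g_{i,j} for scheduling `task` on `vm`."""
    start = max(now, vm.busy_until)
    response = (start - now) + task.waiting_time + task.size_mi / vm.speed_mips
    # 1.0 if the delay requirement is met, then a linear decay (one possible choice)
    return max(0.0, min(1.0, task.expected_delay / response))

def observation(waiting: list[Task], vms: list[VM], backlog_len: int) -> list[float]:
    """Flatten what the scheduler can see: waiting slots, VM states, backlog size."""
    obs: list[float] = []
    for t in waiting:
        obs += [t.size_mi, t.expected_delay, t.waiting_time]
    for v in vms:
        obs += [v.speed_mips, v.busy_until]
    obs.append(float(backlog_len))   # only the *count* of backlogged tasks is visible
    return obs

# Example usage
print(satisfaction(Task(2000, 6.0, 1.0), VM(1000, 0.0)))
print(len(observation([Task(2000, 6.0, 1.0)], [VM(1000, 0.0)], backlog_len=5)))
```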

A. First Phase: Virtualization of Edge Nodes
This step proposes a virtualized edge framework to support Cloud-to-Things applications at all layers. The virtualization process involves three stages: virtualizing objects, network functions, and services. The implementation of object virtualization allows physical sensors to gain IP capability without compromising their unique functionalities. In order to represent heterogeneous physical entities, we need to create a unified software Virtual Object (VO) on edge nodes. An abstract VO is capable of interacting over the Internet with various hosts.
Additionally, it can act as a close neighbour to physical objects through wireless or wired connections available at the edge. VOs provide semantic descriptions of actual objects. Although physical objects have heterogeneous functions, they generally have limited memory and components such as communication modules, sensing modules, and power supply units. These components can be expanded and installed on edge devices for real physical objects as virtual software instances.
As illustrated in Fig. 1(a), two critical streams of object virtualization are taken into account to illustrate VO-hosting solutions: hardware-level and OS-level virtualization. VOs are compatible with established operating systems. Edge players, such as providers, developers, and end users, can develop dedicated hosting platforms with advanced features, such as memory, hardware interfaces, and CPUs, in order to achieve better performance. Fig. 1(b) provides more details about the sensor virtualization framework. Both virtual sensor instructions and physical sensor data are stored in the "Sync Flag". Version numbers are used to synchronize actual and virtual objects. The "Energy Manager" indicates the battery life of physical sensors and turns them on. Mirroring physical sensors, virtual sensors are composed of V-communication, V-processing, and V-sensing components. In the left column, the "Actuator Flag" contains the instructions given to the physical sensor, and the "Sensor Flag" holds the sensor's collected data. V-processing is the equivalent of physical sensor processing. A predefined local or network storage device can be used as external memory. To manage virtual and physical nodes as well as VNFs in the proposed method, an Object Virtualization Manager (OVM) is required. As depicted in Fig. 2, the OVM manages and orchestrates physical and virtual entities. The OVM deploys and programs the corresponding VOs, monitors and coordinates their operation, discovers and registers them, and creates and terminates them. In this regard, the OVM stores the configuration files of VOs, which are editable by remote or local users. In order to perform automatic self-configuration for service deployment, the OVM downloads the configuration profiles of registered VOs. A VO is migrated to a more resourceful container if its current container is overloaded. VNFs, as software instances, contain several portions of VMs that run network functions on standard hardware. In order to decompose a VNF into reusable components, which can be designed as executable microservices and optimized, upgraded, and configured independently, network functions typically involve several functional blocks, including verification, computing, media access control, coding, and signaling. These components combine to form VNFs in micro containers.
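To make the VO abstraction more tangible, the sketch below gives a small illustrative data model of a virtual sensor and its OVM. Field and method names mirror the components described above (Sync Flag, Energy Manager, V-sensing, V-processing, Actuator/Sensor Flags, registration, migration) but are assumptions, not the paper's implementation.

```python
# Illustrative sketch of a Virtual Object (VO) and its manager (OVM).
# All identifiers are assumed names that mirror the components in the text.
from dataclasses import dataclass, field

@dataclass
class VirtualSensor:
    vo_id: str
    sync_version: int = 0        # "Sync Flag": version used to sync real/virtual state
    battery_level: float = 1.0   # tracked by the "Energy Manager"
    sensor_flag: dict = field(default_factory=dict)    # data collected from the physical sensor
    actuator_flag: dict = field(default_factory=dict)  # instructions for the physical sensor

    def v_sense(self, reading: dict) -> None:
        # V-sensing: mirror a physical reading into the VO and bump the version
        self.sensor_flag.update(reading)
        self.sync_version += 1

    def v_process(self) -> dict:
        # V-processing: stand-in for processing offloaded from the physical sensor
        return dict(self.sensor_flag)

class OVM:
    """Object Virtualization Manager: registers, monitors, and migrates VOs."""
    def __init__(self) -> None:
        self.registry: dict[str, VirtualSensor] = {}

    def register(self, vo: VirtualSensor) -> None:
        self.registry[vo.vo_id] = vo

    def migrate(self, vo_id: str, target: dict) -> None:
        # Placeholder for moving an overloaded VO to a more resourceful container
        target[vo_id] = self.registry.pop(vo_id)

# Example usage
ovm = OVM()
vo = VirtualSensor("temp-sensor-01")
ovm.register(vo)
vo.v_sense({"temperature": 21.5})
print(vo.sync_version, vo.v_process())
```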

B. Second Phase: Task Scheduling based on Deep Reinforcement Learning
Scheduling tasks in edge computing involves two main concerns: resource allocation and time scheduling. Resource allocation assigns tasks to the appropriate virtual machines, and time scheduling determines the order of task execution. To maximize the quality of experience, our task scheduling strategy considers the expected delay requirement of each task over heterogeneous virtual machine resources. The level of task satisfaction acts as the reward in the deep reinforcement learning algorithm. Network edge servers handle computationally intensive tasks owing to the difficulty of performing them on local devices. The edge servers are configured with various virtual machines that differ in computational capacity and execution time. The scheduler is responsible for monitoring the status of virtual machines and incoming tasks. The waiting time, the computing speed, the expected completion time, and the task size are significant factors in scheduling decision-making. Based on these observations, the scheduler decides the scheduling order and the start time of each task and determines suitable virtual machines. Fig. 3 shows the general layout of the proposed task-scheduling framework.
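The interaction between the scheduler and the edge environment can be sketched as the loop below. The environment API (reset/step), the toy dummy environment, and the random placeholder policy are illustrative assumptions; in the proposed method the placeholder policy is replaced by the trained DRL policy, and the reward is the task satisfaction degree of the scheduled task.

```python
# Sketch of the scheduler-environment interaction loop (illustrative only).
import random

M, O = 3, 5                       # VMs and waiting slots (example values)
NUM_ACTIONS = (M + 1) * O         # includes a "do not schedule" row

class DummySchedulingEnv:
    """Toy stand-in for the edge-scheduling environment, not the paper's simulator."""
    def reset(self):
        self.steps_left = 20
        return [0.0] * 8                      # dummy observation vector
    def step(self, action):
        self.steps_left -= 1
        vm = action // O
        reward = 0.0 if vm == M else random.random()   # void action earns no satisfaction
        return [0.0] * 8, reward, self.steps_left == 0

def choose_action(state) -> int:
    # Placeholder policy: the proposed method replaces this with the DRL policy network
    return random.randrange(NUM_ACTIONS)

def run_episode(env, max_steps: int = 1000) -> float:
    """One episode: at each scheduling step, at most one waiting task is placed on a VM."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = choose_action(state)
        state, reward, done = env.step(action)   # reward = satisfaction of the scheduled task
        total += reward
        if done:
            break
    return total

print(run_episode(DummySchedulingEnv()))
```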

IV. EXPERIMENTAL RESULTS
A numerical evaluation of the proposed task scheduling mechanism is presented in this section. All simulation results were produced with PyTorch running on Python 3. A comparison was made between the proposed mechanism and two baselines. We implemented the task scheduling policy using a four-layer DNN structure. Each hidden layer comprises 64 neurons with a rectified linear unit (ReLU) as the activation function. The output layer consists of (M+1)×O neurons, where M refers to the number of VMs and O denotes the maximum number of tasks in the waiting slots. In training, we set the discount factor to 0.99, which indicates that the current decision is strongly affected by future steps. The learning rate is set to 10^-4, and gradient descent is performed using the Adam optimizer. The paper focuses on scheduling tasks in the edge system by considering only computational resources. VM resources and task characteristics are the only environment parameters considered in calculating the residual computation delay. IoT devices generate tasks and send them to the base station for transmission. The edge server receives these tasks periodically. The expected delay ranges from 5 to 10 seconds, with transmission delays ranging from 1 to 5 seconds. Task sizes range from 500 to 4000 MI. Virtual machines have a processing capacity ranging from 1000 to 2000 MIPS. The number of waiting slots is set to O=5 and the backlog queue size to |b|=5.
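Under the hyperparameters reported above, the policy network can be sketched as follows. We interpret the four layers as an input layer, two hidden layers of 64 ReLU units, and the output layer; the input dimension, the example values of M and O, and the DQN-style update rule are our assumptions, while the layer widths, activation, output size of (M+1)×O, discount factor, optimizer, and learning rate come from the text.

```python
# Sketch of the scheduling policy network with the reported hyperparameters.
# STATE_DIM and the DQN-style target are illustrative assumptions.
import torch
import torch.nn as nn

M, O = 3, 5                 # VMs and waiting slots (example values)
STATE_DIM = 32              # assumed size of the flattened observation vector
GAMMA = 0.99                # discount factor (from the text)

policy_net = nn.Sequential(         # input layer, two 64-unit ReLU hidden layers, output layer
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, (M + 1) * O),     # one output per (VM, waiting-slot) pair plus a void row
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)

def train_step(state, action, reward, next_state, done):
    """One DQN-style update (an assumed training rule, for illustration)."""
    q = policy_net(state)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else GAMMA * policy_net(next_state).max())
    loss = nn.functional.mse_loss(q, torch.as_tensor(target, dtype=torch.float32))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors
s, s2 = torch.rand(STATE_DIM), torch.rand(STATE_DIM)
print(train_step(s, action=4, reward=0.8, next_state=s2, done=False))
```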
The effects of task popularity skewness, task arrival rate, and virtual machine count on task satisfaction and success ratio were studied. Fig. 4 and Fig. 5 illustrate the experimental outcomes. As illustrated in Fig. 4, the number of virtual machines and the task arrival rate are associated with the cumulative degree of task satisfaction. With the popularity skewness set at 0.3, the task arrival rate ranges from 3 to 7, and the number of virtual machines increases from 3 to 5. According to Fig. 5, the cumulative task satisfaction degree decreases as the task arrival rate increases. A higher arrival rate means that more tasks are waiting for scheduling in the edge system simultaneously, which lengthens their waiting time. As the number of VMs increases, the average task satisfaction degree also increases, since tasks can be scheduled across multiple virtual machines, resulting in shorter waiting times. Fig. 5 also shows the skewness of task popularity versus the task satisfaction degree. Task popularity skewness ranges from 0.1 to 0.9 with three virtual machines. A higher cumulative task satisfaction is associated with greater skewness in task popularity. Each type of task has a different popularity. As task popularity skewness increases, smaller tasks become more popular while larger tasks become less popular, thus reducing the overall waiting time. The Shortest Job First (SJF) [38] and First Come First Served (FCFS) [37] algorithms are chosen as benchmarks for evaluating the performance of the proposed task scheduling mechanism. SJF and FCFS assign the scheduled task to the virtual machine with the highest instant reward, which leads to scheduling tasks in a greedy manner. These benchmarks are therefore referred to as greedy-FCFS and greedy-SJF. A comparison was made between our proposed method and these benchmarks in terms of task success ratio and task satisfaction. Total task satisfaction is affected by the average task satisfaction degree, which makes it possible to evaluate the algorithm's overall quality. A task is considered complete when its response time is less than the expected delay. According to Eq. (1), the task success ratio is obtained by dividing the number of satisfied tasks by the total number of tasks.

\[
\text{Success ratio} = \frac{\sum_{i=1}^{N} \mathbb{1}\{r_i \le d_i\}}{N} \qquad (1)
\]

where N is the total number of tasks, r_i is the response time of task i, d_i is its expected delay, and the indicator counts a task as satisfied when its response time does not exceed its expected delay.

Fig. 6 and Fig. 7 show how performance is affected by varying task arrival rates. In all algorithms, the average task success ratio and the level of task satisfaction decrease as task arrival rates rise. Our method is significantly more efficient than the greedy-FCFS and greedy-SJF scheduling algorithms. In particular, the suggested method can increase the average task satisfaction degree by around 50% and 25% over greedy-FCFS and greedy-SJF, respectively. As FCFS schedules earlier-arriving tasks first, subsequent tasks may be delayed when the earlier-arriving tasks demand a lot of CPU power. Greedy-SJF prioritizes shorter tasks over longer ones, which penalizes long tasks. The expected delay demand is not taken into account by either the greedy-FCFS or the greedy-SJF algorithm. Fig. 8 and Fig. 9 compare our task scheduling mechanism with the baselines in terms of task popularity. A higher skewness factor results in a higher degree of task satisfaction and a higher success ratio. The proposed algorithm can significantly improve task satisfaction, as shown in Fig. 8, and the gap widens with increasing popularity factor values compared to the greedy-SJF and greedy-FCFS algorithms. The greedy-SJF algorithm suffers from performance degradation because a larger proportion of small tasks is assigned as popularity increases. The higher skewness factor allows for more accurate predictions of task lengths, leading to more efficient scheduling and better results compared to the greedy-SJF and greedy-FCFS algorithms. The improved task satisfaction and success ratio observed in Fig. 8 are a direct result of this improved scheduling efficiency.
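For reference, the task success ratio of Eq. (1), used in Figs. 6-9, can be computed with a minimal helper such as the one below (the function name and signature are illustrative).

```python
# Minimal sketch of the task success ratio in Eq. (1): the fraction of tasks
# whose response time does not exceed their expected delay.
def task_success_ratio(response_times, expected_delays) -> float:
    satisfied = sum(1 for r, d in zip(response_times, expected_delays) if r <= d)
    return satisfied / len(response_times)

# Example usage: two of three tasks meet their expected delay
print(task_success_ratio([3.0, 7.5, 4.2], [5.0, 6.0, 5.0]))   # -> 0.666...
```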

V. CONCLUSION
With the rapid growth of IoT applications, edge and IoT devices have expanded their services in recent years. Fog computing offers a latency advantage over cloud computing for delay-sensitive, IoT-enabled smart applications. Task scheduling effectively reduces application computation time and latency while improving quality of service. The task scheduling framework proposed in this paper comprises two main phases: virtualization of edge nodes and task scheduling based on deep reinforcement learning. The first phase offers a layered edge framework. In the second phase, we applied DRL to schedule tasks, taking into account the diversity of tasks and the heterogeneity of available resources. Simulations indicate that our proposed task scheduling method leads to greater levels of task satisfaction and success than existing approaches.