Queueing Model based Dynamic Scalability for Containerized Cloud

—Cloud computing has become a rapidly growing technology and has received wide acceptance in the scientific community and in large organizations such as government and industry. Due to the highly complex nature of VM virtualization, lightweight containers have gained wide popularity, and techniques to provision resources to these containers have drawn the attention of researchers. Models and algorithms that provide dynamic scalability meeting the demands of high performance and QoS while utilizing the minimum number of resources have been lacking in the literature on the containerized cloud. Dynamic scalability enables cloud services to offer timely, on-demand computing resources that can be adjusted dynamically for end users. This manuscript presents a technique that exploits a queuing model to perform dynamic scalability and scale the virtual resources of containers while reducing cost and meeting the user's Service Level Agreement (SLA). The paper aims to improve the usage of virtual resources and satisfy the SLA requirements in terms of response time, drop rate, system throughput, and the number of containers. The work has been simulated using Cloudsim and compared with existing work, and the analysis shows that the proposed work performs better.


I. INTRODUCTION
Cloud computing has evolved into a highly dynamic computing model. It has gained traction in various organizations due to its cost, availability, scalability, and security. It is an internet-based computing technology that provides high-end computation and a shared pool of resources accessible on demand [1]. It has revolutionized the internet world through its hosting services and computational ability. Its unique technology allows users to pay for only those services and resources they demand, and these resources can further be increased or decreased depending on the requirement. This potential and these high capabilities have led to amplified productivity with reduced cost and greater flexibility compared with the rest of the IT industry [2]. The prime technology behind the cloud is virtualization, which enables the cloud to instantiate various Virtual Machines (VMs) on one single physical machine (PM). Virtualization can occur at various levels, such as desktop, network, storage, and application [3]. It can affirm high performance, confidentiality, reliability, and security among VMs. One VM is isolated from the other VMs on the same PM, making it securely isolated. Despite the various benefits exhibited by virtualization, for applications demanding less isolation and maximum flexibility at runtime, VM virtualization may not be sufficient to satisfy all the QoS standards [4]. Container-based virtualization is gaining popularity these days because of the increasingly dynamic and flexible nature of workloads, which vary highly with time. It expedites the seamless movement of applications from one architecture to another, in contrast to VM virtualization. A container executes on a shared kernel with performance equivalent to VMs but without the expensive VM runtime-management overhead [5]. Containers provide a good platform for executing microservices on the cloud, and they provide good support for technologies such as fog computing and the Internet of Things (IoT) [6]. As container technology gained popularity, various large-scale IT companies providing cloud services came up with their own container-based cloud services.
The most renowned service models available in the cloud are Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), supported by various energy-efficient datacenters (DCs) that are solely responsible for managing scalability through resource management and load optimization [7]. With the PaaS service, users can deploy any application on the cloud. This model encapsulates the underlying infrastructure and enables users to deploy applications anywhere without giving a single thought to infrastructure management. Containers are one of the components and enablers of the PaaS service [8]. Thus, a user application can be deployed on a single cloud infrastructure as a unique block or deployed separately across different cloud infrastructures.
The key characteristic of the cloud to scale up has attracted a lot of users. The variation and fluctuation in workload have compelled cloud providers to scale resources (VMs or containers) dynamically as per the requirement. The cloud has eased the process of obtaining and releasing resources, but it can be challenging to decide how many resources are needed to handle a fluctuating workload. There is an urgent demand for a model that can provision and de-provision resources dynamically at bursts of demand. Despite the development of container technology and the harnessing of its potential, there is still room for improvement in the dynamic scaling of cloud resources. Insufficient scalability that cannot confront the variation in workload intensity may lead to under-provisioning (UP) or over-provisioning (OP) of resources. In the UP scenario, the performance of the cloud degrades and the SLA is violated, while in OP the allocated resources are under-consumed, resulting in a higher cost for the providers. As a result, in response to dynamic changes in the global arrival rate during runtime, adaptation mechanisms are required for polished dynamic scalability. Appropriate dynamic scalability is the demand of the time: it affirms the performance of the SLA while keeping costs low. An efficient technique for dynamic scalability is required to fulfill the requirements of both the users and the CSP. This work proposes a dynamic scaling approach for the containerized cloud. The approach enables us to capture the dynamic and scalable nature of cloud computing and analyze its efficacy. The model tries to estimate future resource demand and provision resources dynamically to mitigate SLA violations and reduce the cost incurred by the system. The work is simulated in Cloudsim and evaluated under various quantities of workload. The main contributions of the paper are as follows:
• A queuing model is proposed to estimate and capture the behavior of a containerized DC.
• The load balancer model and container model are discussed.
• A mathematical formulation is derived from the analytical model for various QoS measures.
• Simulation of the work is performed on Cloudsim.
The remainder of the paper is organized as follows: the literature associated with the work is reviewed in Section II; the proposed work is discussed in Section III; Section IV presents the results and discussion; and lastly, the work is concluded in Section V.

II. RELATED WORK
Containerization is not a new concept in computer science; it has existed since 1972 on Linux and Unix systems in different forms [9]. It aided developers by providing an efficacious programming environment with considerably reduced operational cost. Docker adopted container technology and led to the start of open containers in companies such as Google, Microsoft, and many more, and it is gaining popularity day by day due to its isolation strategy. Fig. 1 describes the container in the cloud system with its private OS, interface, and file system. Cloud containers provide a thin encapsulation over the application, so its deployment is relatively easy and fast. Initially, this role was played by lightweight VMs, which possessed an isolated OS on which the application could be deployed [10]. Containers have several benefits over VMs [11]. First, compared to virtual machines, containers use host system resources far more efficiently. Second, starting and stopping containers takes very little time. Third, container mobility prevents inter-system dependency conflicts and guarantees functioning independent of the system on which the container is hosted. Fourth, containers are exceedingly lightweight, enabling end users to operate dozens or more of them simultaneously, which is rarely practical with VMs in distributed production environments. Fifth, instead of having to go through hours-long installation and configuration hassles, end users of apps can instantly download and run sophisticated software. Additionally, unlike VMs, which strive to virtualize an entire environment, a container's primary goal is to make an application fully portable and independent [12].
The containerized cloud has emerged as one of the most challenging topics over the past few years, and a lot of work has been published in this regard. This section reviews the relevant work associated with the scalability and performance of container-based cloud models. Some studies examine scalability in general to provide better insight, and some address scalability in container clouds. Scalability has been recognized as the key feature contributing to the efficient working of cloud-based services. A deep scaling methodology was introduced in [13], where three components were included for effective resource utilization: first, it forecasted the workload; then it mapped the workload intensity to the approximated CPU utilization; and lastly, an autoscaling method was developed for maximizing the CPU utilization. A proactive elastic model was defined in [14] that resolved the scalability issues in cloud-based IoT systems. It utilizes the ant colony optimization technique along with a Markov chain to schedule resources efficiently, which enhances performance and maintains the QoS measures; it improved the response time and request throughput. A benchmarking method was proposed that defined a framework for scalability benchmarking tools for quantifying scalability. It included scalability metrics and measurement methods to specify the achievement of given service-level objectives, and it also provided a facility for configuring the scalability parameters to obtain an efficient response [15]. The author in [16] proposed a container-based autoscaling procedure that used a heuristic technique to utilize resources efficiently; it improved the execution time, throughput, response time, and the minimum number of containers. The author in [17] addressed two major scalability metrics: volume scalability and quality scalability. Volume scalability is highly influenced by the scaling of service volume, while quality scalability is affected by the service quality provisioned. These parameters quantify technical scalability and help in assessing the impact of demand on the service. Besides, they also aid in designing and performing scalability testing aimed at identifying the components that affect scalability performance. Scalability in container clouds has made the processing of cloud applications lightweight and efficient. An automatic scaling method is discussed in [18], which reduces response time and energy consumption and achieves better CPU utilization. An analytical model based on a stochastic technique for a container-based DC is discussed in [19]. It studied and analyzed the performance of the cloud system with respect to mean job delay and job rejection probability; it created a framework for container emulation and assessed it against the suggested stochastic technique. Through experimental development, the suggested model was validated using actual data, and numerical verification provides system designers with insight into DC planning. Another approach is introduced in [20], in which AWS autoscaling is implemented to facilitate estimating the future workload. It applied a future-prediction algorithm using the Prophet API and studied the CPU utilization and the creation of new EC2 instances when the workload is heavy. An autoscaler-based model is discussed in [21], which provides an architecture for container-based applications.
It includes a monitoring mechanism, a prediction model, a time-series model, and a decision mechanism. The prediction uses the time series to forecast the future workload, providing better provisioning and speedy elasticity. The author in [22] introduced a framework for auto-scaled containerized applications governed by workload demand. It offered both reactive and proactive scaling: reactive scaling was implemented using threshold rules, while proactive scaling utilized a neural network, ensuring the QoS requirements. Another container-based module was developed in [23] that provided efficient provisioning. It used an adaptive function tree for scalable container provisioning and further mitigated the provisioning cost by using a fetching mechanism exhibiting on-demand quality and I/O efficiency; it turned out to provide better scaling, response time, and provisioning. A horizontal scaling technique is discussed in [24], which configured the services in a Docker container while the workload was balanced using a load balancer. It calibrated the infrastructure based on the number of predicted users, expanded the infrastructure and processing capability in a short duration, and offered a fault-tolerant system for medium and small-scale industries. Another technique for resource utilization in cloud-based applications is discussed in [25] for container clouds, leveraging the vertical elasticity of Docker. The resource coordinator and monitoring policies are applied during the execution of tasks, and the scalability parameters are configurable in the procedure.
To the best of our knowledge, at the time of writing no research on the effectiveness and dynamic scalability of containers has been published in the literature. The existing work does not consider dynamic scalability in containers, which provides a cost-effective alternative to VM virtualization. It is of utmost importance to identify the number of containers required to cope with a highly dynamic workload so as to satisfy the SLA and QoS requirements. Dynamic scalability is attained only when there is neither over-provisioning nor under-provisioning: over-provisioning may result in higher costs as more containers are engaged, while under-provisioning leads to SLA violations. Therefore, the main distinction between our study and the studies listed above is that, in addition to forecasting workload, we also forecast the future need for computing resources. Furthermore, in contrast to most techniques that focus on only one factor (CPU utilization), our model provides cloud providers with more information about the timely scaling and descaling of the volume of containers and VMs. This not only decreases the cost incurred by users and improves the user experience but also mitigates the financial burden of the service provider and the infrastructure cost through efficient and wise usage of the resources.

III. PROPOSED WORK

A. Problem Formulation
The model consists of a DC composed of PMs, each capable of holding several VMs, which in turn are capable of holding several containers, representing the practical scenario of current cloud services. A hypervisor is responsible for allocating the VMs to a PM, while multiple containers can be allocated to a VM. The task execution requests raised by different users are sent to the load balancer (LB), which routes the traffic to the PMs for execution. These requests are placed in a buffer linked to the LB queue, from where they are sent to the containers for resource allocation and execution. Tasks from the queue are allocated to the containers according to the availability of resources. Whenever a user demands a new container with particular resource requirements, an SLA between the end user and the CSP is established, agreeing on the delivery of the requested QoS. If the agreed SLA is breached, the CSP must pay a penalty to the consumer. The flow of tasks is as follows: end users submit requests, which are sent to the LB; the LB receives the requests and distributes the tasks to the PMs according to the allocation policy in use. Each task is allocated a unique container. As the number of task requests increases, the VM scales up through the addition of containers for the execution of the requests. Most companies utilizing the platform of the well-known company Docker [26] use at least 18 containers simultaneously. Let us consider $m$ PMs, represented as $\{PM_1, PM_2, \ldots, PM_m\}$, each of which can hold up to $n$ VMs, represented as $\{VM_1, VM_2, \ldots, VM_n\}$. A VM can contain at most $l$ containers, represented as $\{C_1, C_2, \ldots, C_l\}$. So, a PM can be scaled up to $n$ VMs, which can accommodate a maximum of $n \times l$ containers.
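To make this hierarchy concrete, the following minimal sketch (ours, not part of the original model; all names are hypothetical) captures a DC of $m$ PMs, each holding up to $n$ VMs of $l$ containers each, and computes the resulting container capacity:

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    m: int  # number of PMs in the DC
    n: int  # maximum VMs per PM
    l: int  # maximum containers per VM

    def max_containers_per_pm(self) -> int:
        # A single PM scaled to n VMs accommodates at most n * l containers
        return self.n * self.l

    def max_containers(self) -> int:
        # Upper bound on simultaneously running containers in the whole DC
        return self.m * self.max_containers_per_pm()

# Example with the bounds used later in the experiments:
# 5 PMs, up to 50 VMs per PM, up to 18 containers per VM
dc = DataCenter(m=5, n=50, l=18)
print(dc.max_containers_per_pm())  # 900
print(dc.max_containers())         # 4500
```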

B. Queuing Model
The PMs in the DC have identical configurations. The requests generated by end users are sent to the queue and are served at each node in the DC on a first-come, first-served basis. When a request has been executed and served, it exits the system. This paper assumes that each request is served by exactly one container and that one container serves only one request. The DC is modeled using the open Jackson queuing network represented in Fig. 2.

C. Load Balancer Model
The LB is responsible for managing the huge load that arrives in cloud computing as an ample number of requests from end users. To serve these requests, the LB is modeled as an M/M/1 queue with an infinite-capacity task buffer, where requests are assumed to arrive one at a time [27]. The LB model forms a continuous-time Markov chain whose state $t$ denotes the number of tasks waiting in the queue for allocation. Tasks arrive according to a Poisson process with rate $\lambda$, so the inter-arrival time between two consecutive tasks is independent and exponentially distributed with rate $\lambda$. The time the LB takes to serve a task (dispatching it to a PM) is exponentially distributed with rate $\mu_b$, where $1/\mu_b$ is the mean service time. The M/M/1 queue is assumed stationary, where $\rho = \lambda/\mu_b < 1$. Let $p_t$ be the steady-state probability of state $t$. The balance equations give [28]:

$\lambda p_0 = \mu_b p_1$ (1)

$(\lambda + \mu_b) p_t = \lambda p_{t-1} + \mu_b p_{t+1}, \quad t \geq 1$ (2)

From equations (1) and (2), it can be written:

$p_t = \rho^t p_0$ (3)

According to the normalization condition,

$\sum_{t=0}^{\infty} p_t = 1$ (4)

one can deduce that

$p_0 = 1 - \rho$ (5)

and then the steady-state probability of $t$ tasks in the queue can be given as:

$p_t = (1 - \rho)\rho^t$ (6)

The number of tasks on average queued in the LB can be deduced as:

$N_{LB} = \sum_{t=0}^{\infty} t\, p_t = \frac{\rho}{1-\rho} = \frac{\lambda}{\mu_b - \lambda}$ (7)

The average response time experienced by tasks in the queue can be evaluated through Little's law [29], given as:

$T_{LB} = \frac{N_{LB}}{\lambda} = \frac{1}{\mu_b - \lambda}$ (8)
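As a quick numeric illustration of equations (1)–(8), the following minimal sketch (our Python rendering under the stated M/M/1 assumptions; the function name and example rates are ours) computes the LB utilization, mean queue length, and response time:

```python
def mm1_metrics(lam: float, mu: float):
    """Steady-state M/M/1 metrics for arrival rate lam and service rate mu.

    Returns (utilization rho, mean number of tasks N_LB, response time T_LB).
    """
    rho = lam / mu
    assert rho < 1, "M/M/1 is stationary only when lam < mu"
    n_lb = rho / (1 - rho)   # Eq. (7): mean number of tasks in the LB
    t_lb = n_lb / lam        # Eq. (8): Little's law, equals 1/(mu - lam)
    return rho, n_lb, t_lb

# Example: 9000 tasks/s arriving at an LB that serves 10000 tasks/s
rho, n_lb, t_lb = mm1_metrics(9000.0, 10000.0)
print(f"rho = {rho:.2f}, N_LB = {n_lb:.1f} tasks, T_LB = {t_lb * 1e3:.3f} ms")
```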

D. Container Model
The paper considers a DC containing several PMs, each hosting VMs designated to run one or more container instances. The Local Scheduler (LS) and the runtime components (the VMs) are the two major constituents of a PM. Fig. 3 demonstrates the placement of the LS, VMs, and containers in a PM. The containers execute on these VMs as isolated threads in a shared namespace, with a guest OS shared among the other containers in the same VM. The hypervisor performs the resource-management operations that place containers in the pool of VMs in accordance with the workload requested by users. The LS of each PM is modeled as an M/M/1/C queue [30]: when the queue holds $C$ tasks it is full, which implies that the PM has exhausted its resources and is not in a condition to accept any new task until it finishes the tasks previously allotted to it; incoming new tasks are then rejected. According to Burke [31], the departure process of the M/M/1 queue is Poisson with the same rate $\lambda$. Thus, the tasks arriving at each of the $m$ PMs follow a Poisson process with rate $\lambda_p = \lambda/m$, and each task is served with a service time exponentially distributed with mean $1/\mu_s$. As the queue is finite in size, the system is stable for all values of $\lambda_p$ and $\mu_s$. With $\rho_s = \lambda_p/\mu_s$, the equilibrium probability of $t$ tasks at the PM queue can be defined as:

$\pi_t = \frac{(1-\rho_s)\rho_s^t}{1-\rho_s^{C+1}}, \quad 0 \leq t \leq C$ (9)

The rate at which tasks are lost at the LS queue of a PM can be obtained as:

$L_p = \lambda_p \pi_C$ (10)

The throughput of a PM at the LS queue is given as:

$X_p = \lambda_p (1 - \pi_C)$ (11)

Similarly, the volume of tasks present at the PM queue is:

$N_p = \sum_{t=0}^{C} t\,\pi_t = \frac{\rho_s}{1-\rho_s} - \frac{(C+1)\rho_s^{C+1}}{1-\rho_s^{C+1}}$ (12)

The number of tasks undergoing service is:

$N_s = 1 - \pi_0$ (13)

So, the tasks waiting in the queue can be given as:

$N_q = N_p - (1 - \pi_0)$ (14)

The CPU utilization can be given as:

$U = 1 - \pi_0$ (15)

The waiting time for the tasks at the PM is:

$W_p = \frac{N_q}{X_p}$ (16)

Thus, the response time at the PM is evaluated as:

$T_p = \frac{N_p}{X_p} = W_p + \frac{1}{\mu_s}$ (17)

For the latter part of the PM, each VM is modeled as a pure loss system with $l$ servers and no queue, i.e., M/M/$l$/$l$ [32], where $l$ denotes the volume of containers available in each VM. As stated in [31], each VM's incoming tasks follow a Poisson process, so that each container receives an equal share of requests. Since there are $n$ VMs, each receives tasks at rate $\lambda_v = X_p/n$. This yields a balanced system, as every VM has an identical configuration. The service rate of each container is taken as $\mu_c$. The tasks arriving at a VM can be visualized as a birth-death process: in a state $t < l$, the rate of incoming tasks (births) is $\lambda_v$, while in state $t$ the death rate is $t\mu_c$. Let $q_t$ be the stationary probability of $t$ tasks in the VM. From the local balance equation $\lambda_v q_{t-1} = t\mu_c q_t$ and with $a = \lambda_v/\mu_c$, it can be written:

$q_t = \frac{a^t}{t!} q_0$ (18)

After the application of the normalization condition [33], $q_0$ can be generalized as:

$q_0 = \left(\sum_{j=0}^{l} \frac{a^j}{j!}\right)^{-1}$ (19)

It can be further deduced:

$q_t = \frac{a^t/t!}{\sum_{j=0}^{l} a^j/j!}$ (20)

The loss probability for tasks lost at a full VM is recognized as the Erlang-B formula:

$P_{loss} = q_l = \frac{a^l/l!}{\sum_{j=0}^{l} a^j/j!}$ (21)

As there is no queue at the VM, the volume of tasks in the VM is:

$N_v = \sum_{t=0}^{l} t\, q_t = a(1 - q_l)$ (22)

As earlier, the response time is evaluated through Little's law as:

$T_v = \frac{N_v}{\lambda_v (1 - q_l)} = \frac{1}{\mu_c}$ (23)

As already noted, when the LB dispatches a request, a job is carried out by exactly one container in one VM of one PM. Thus, the total response time of a task before completion can be summed as:

$T = T_{LB} + T_p + T_v$ (24)

With a similar analysis, the rate of tasks rejected in the DC is:

$D = \lambda\left(\pi_C + (1-\pi_C)\, q_l\right)$ (25)
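To illustrate equations (9)–(23), the sketch below (our Python rendering of the standard M/M/1/C and M/M/$l$/$l$ results this section uses; the function names are ours) computes the LS metrics and the Erlang-B loss probability of a VM:

```python
def mm1c(lam: float, mu: float, C: int):
    """M/M/1/C local-scheduler queue, Eqs. (9)-(17).

    Returns (blocking probability pi_C, throughput X_p,
             mean number N_p, response time T_p).
    """
    rho = lam / mu
    # Eq. (9): truncated geometric equilibrium distribution
    pi0 = (1 - rho) / (1 - rho ** (C + 1)) if rho != 1 else 1.0 / (C + 1)
    pi = [pi0 * rho ** t for t in range(C + 1)]
    x_p = lam * (1 - pi[C])                     # Eq. (11): throughput
    n_p = sum(t * p for t, p in enumerate(pi))  # Eq. (12): mean tasks at PM
    t_p = n_p / x_p                             # Eq. (17): Little's law
    return pi[C], x_p, n_p, t_p

def erlang_b(a: float, l: int) -> float:
    """Eq. (21): loss probability q_l of an M/M/l/l VM with offered load a."""
    inv_b = 1.0                      # 1/B(0)
    for j in range(1, l + 1):
        inv_b = 1.0 + inv_b * j / a  # stable recurrence for 1/B(j)
    return 1.0 / inv_b
```

The recurrence in erlang_b avoids the large factorials of equation (21) and remains numerically stable even for large container counts.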
IV. RESULTS AND DISCUSSION

The model proposed above is simulated through a series of experiments to analyze its effectiveness. The simulation has been performed on a personal computer with a 2.30 GHz Intel Core i3 processor and 4 GB of RAM, with Cloudsim as the simulation tool. Initially, the DC is configured with 5 PMs; each PM supports 10 VMs, varied up to 50 VMs, while each VM can accommodate a maximum of 18 containers. The task arrival rate varies from 1,000 to 10,000 tasks per second. A task in the queue requesting execution is serviced in 0.0001 seconds, and the maximum capacity of the queue is 300. The LS services a request in 0.001 seconds on average. The experiment is performed with 100 repetitions for reliable analysis.
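Under these settings, a hypothetical end-to-end reading of the model (reusing mm1_metrics, mm1c, and erlang_b from the sketches above; the per-container service rate mu_c is our assumption, since only the LB and LS service times are stated) would sweep the arrival rate and report the total response time of equation (24) and the drop rate of equation (25):

```python
m, n, l = 5, 50, 18        # PMs, VMs per PM, containers per VM
mu_b = 1 / 0.0001          # LB service rate: one task per 0.0001 s
mu_s, C = 1 / 0.001, 300   # LS service rate and queue capacity
mu_c = 10.0                # assumed container service rate (tasks/s)

for lam in range(1000, 10000, 1000):
    _, _, t_lb = mm1_metrics(float(lam), mu_b)  # Eq. (8)
    pi_c, x_p, _, t_p = mm1c(lam / m, mu_s, C)  # Eqs. (9)-(17), per PM
    a = (x_p / n) / mu_c                        # offered load per VM
    q_l = erlang_b(a, l)                        # Eq. (21)
    t_total = t_lb + t_p + 1 / mu_c             # Eq. (24)
    drop = lam * (pi_c + (1 - pi_c) * q_l)      # Eq. (25)
    print(f"lam={lam:5d}  T={t_total:.4f} s  drop={drop:8.1f} tasks/s")
```

The sweep stops at 9,000 tasks/s because the infinite-buffer M/M/1 LB model is stationary only for arrival rates below its service rate.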

A. Response Time
The response time of the system is strongly affected by the number of VMs, which is analyzed under varied task arrival rates, as Fig. 4 illustrates. Here, the capacity of each VM to hold containers is 20. From the figure, it can be observed that the response time grows roughly in proportion to the task arrival rate. For all the given scales of VMs, there is no substantial change in the response time when the arrival rate varies from 5,500 to 9,500 tasks per second. As the arrival rate increases beyond 9,500 tasks/second, the response time increases exponentially for all configurations. A response time of 0.41 seconds is observed in the 20-VM scenario when the arrival rate is 10,000 tasks/second. A proper VM configuration can thus be chosen when minimum response time is one of the QoS targets of the SLA.

B. Drop Rate
The drop rate of the system is defined as the rate at which tasks are dropped or rejected, either because of a lack of space in the LS queue or a lack of capacity in the DC. Fig. 5 depicts the drop rate against the task arrival rate for a varied number of VMs, where each VM has 18 containers. It can be observed from the figure that initially few tasks are dropped, but as the arrival rate increases the drop rate rises, and this increase varies with the configuration. In the 20-VM case, losses start once the arrival rate reaches 2,000 tasks/sec, while in the 50-VM case losses start only after 5,000 tasks/sec. Until the rate reaches 5,000 tasks/sec, the 50-VM configuration shows no loss, whereas the 20-VM configuration loses 3,015 tasks/sec at the same arrival rate. It can be deduced that with an increased number of containers there is less loss of tasks.

C. Throughput
The system throughput is analyzed against the task arrival rate for all four container configurations. Fig. 6 shows the variation of the system throughput, measured in tasks per second. It can be observed that in all four cases the system throughput is similar up to 2,000 tasks/sec; the impact of the different container configurations appears beyond that point. The system performs better with a larger number of containers. The throughput stops varying once the arrival rate reaches 2,000 tasks/sec in the first configuration, 3,000 tasks/sec in the second, 4,000 tasks/sec in the third, and 5,000 tasks/sec in the fourth; after these thresholds, the throughput remains essentially fixed. A predefined system-throughput requirement in the SLA can therefore be met by the service provider selecting the best container configuration.

D. Number of Containers
To study the effect of the number of containers on response time and drop rate, the task arrival rate has been fixed at 9,000 tasks/sec while the number of containers is increased from 8 to 18. Table I presents the response time. It is observed from the table that as the number of containers increases, the response time decreases, with a dramatic decrease after the 16th container. The response time of the system is thus highly dependent on the number of containers. The system drop rate is also strongly influenced by the number of containers: as the containers increase, the drop rate decreases. The analysis shows that to keep the drop rate below 3,010 tasks/sec, the minimum numbers of VMs and containers are 50 and 8, respectively.

E. Comparison with other Algorithms
The response time and system drop rate are compared with the existing works [11] and [16] for 50 VMs with 18 containers each. From Fig. 7, it can be observed that up to 4,000 tasks/sec there is little variation among the algorithms; as the rate increases, the proposed algorithm shows better results. At 10,000 tasks/sec, the response time of the proposed algorithm is 0.3301 sec, while that of [11] is 0.524 sec and that of [16] is 0.601 sec. From Fig. 8, it can be deduced that up to a task rate of 4,000 tasks/sec all the algorithms show the same drop rate, but as the task rate increases there is an exponential increase in the drop rate. At 10,000 tasks/sec, the drop rate of the proposed algorithm is 4,859 tasks/sec, while those of [11] and [16] are 5,338 and 5,812 tasks/sec, respectively. The proposed method is thus significantly more effective than the others.
The results obtained above demonstrate that an increasing task arrival rate affects the QoS measures depending on the containers available in the DC. It is therefore essential to scale the container instances up or down depending on the rate of incoming tasks. In addition, the number of available containers has to fulfill the SLA requirements. Since the workload in the DC is very dynamic, it is of utmost importance to dynamically provision the minimum number of containers that fully satisfies the SLA, by monitoring the usage of virtual resources and modifying the number of resources in use. Therefore, the main challenge to be addressed is the engagement of the minimal number of containers that fulfills the SLA exigencies. Allocating more containers than required leads to OP, which increases cost, while deploying fewer containers than needed results in UP, leading to more SLA violations. Dynamic scalability is thus the need of the hour to avoid both under- and over-provisioning. The proposed algorithm enables the service provider to identify the minimal number of containers required as the task rate fluctuates, helping to scale the resources up and down while maintaining the SLA. Furthermore, resources are allocated to tasks in the order of their arrival, which does not consider priority tasks; this can be handled in further study. Since the arrival rate of tasks is considered fixed, which may differ in a real-life scenario where the arrival rate can vary with the system state, potential customers may switch to another service due to long waiting queues, which may affect the efficiency of the system.
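As a hedged sketch of this scaling decision (our illustration, reusing the mm1c and erlang_b helpers from Section III; the SLA threshold and the container service rate are assumed values), one can scan container counts until the predicted drop rate of equation (25) meets the target:

```python
def min_containers(lam, m, n, mu_s, C, mu_c, max_l=18, sla_drop=100.0):
    """Smallest containers-per-VM count l meeting the drop-rate SLA, else None."""
    pi_c, x_p, _, _ = mm1c(lam / m, mu_s, C)  # per-PM LS behaviour
    a = (x_p / n) / mu_c                      # offered load per VM
    for l in range(1, max_l + 1):
        drop = lam * (pi_c + (1 - pi_c) * erlang_b(a, l))  # Eq. (25)
        if drop <= sla_drop:
            return l
    return None

# Example: at 4000 tasks/s with 5 PMs and 50 VMs/PM, and an assumed
# mu_c of 10 tasks/s per container, 5 containers per VM suffice here.
print(min_containers(lam=4000, m=5, n=50, mu_s=1000.0, C=300, mu_c=10.0))
```

In practice such a scan would run periodically against the measured arrival rate, scaling the container pool up or down between evaluations.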

V. CONCLUSION
This paper has proposed a queuing model for dynamic scalability in containerized clouds to analyze the workload and the effects of scaling on the QoS parameters. It also suggests how the number of containers should be scaled up or down to meet the requirements of a given SLA. A mathematical model is developed to identify the key performance metrics. The model predicts and approximates future resource requests to mitigate SLA violations and provide cost-effective solutions. The proposed methodology can also be used to scale the DC containers to guarantee the QoS parameters, and the model is proficient in deciding the number of containers to be provisioned or de-provisioned for a given workload situation to meet the SLA demands and QoS metrics. The proposed model was tested against existing works and turned out to perform better. In future work, the model can be implemented in a real working environment, and more SLA parameters can be included in the analysis. Further, clustering techniques can be included, and the model can employ queues classified according to the requirements of the tasks, as some tasks may require more processing units while others require more storage.