Automatic Cloud Resource Scaling Algorithm based on Long Short-Term Memory Recurrent Neural Network

Scalability is an important characteristic of cloud computing. With scalability, cost is minimized by provisioning and releasing resources according to demand. Most current Infrastructure as a Service (IaaS) providers offer threshold-based auto-scaling techniques. However, setting thresholds to values that minimize cost while meeting the Service Level Agreement is not an easy task, especially with variable and sudden workload changes. This paper proposes dynamic threshold-based auto-scaling algorithms that predict required resources using a Long Short-Term Memory Recurrent Neural Network and auto-scale virtual resources based on the predicted values. The proposed algorithms have been evaluated and compared with some existing algorithms. Experimental results show that the proposed algorithms outperform the other algorithms.


I. INTRODUCTION
One of the important features provided by cloud computing is scalability, which is the ability to scale allocated computational resources on demand [1]. Scalability allows users to run their applications elastically, use only the computational resources they need, and pay only for what they use. However, instantiating new virtual machines takes 5-15 minutes [2]. Therefore, predicting future demand may be required in order to cope with variable demand and to scale in advance. In the current literature, many diverse auto-scaling techniques have been proposed to scale computational resources according to predicted workload [3,4,5,6].
However, one of the best-known problems facing current auto-scaling techniques is the Slashdot problem, in which an auto-scaling technique may be unable to scale in time when there is a sudden influx of valid traffic. Slashdot is an unpredictable flash-crowd workload. Flash-crowd workloads reduce cloud service providers' revenue by causing Service Level Agreement violations.
Slashdot effects can be reduced by detecting Slashdot situations at an early stage and performing appropriate scaling actions. However, detecting Slashdot situations early is not an easy task. Even if Slashdot is detected, finding a suitable scaling action is very hard. Recently, several machine-learning techniques (e.g., Support Vector Machine, Neural Networks, and Linear Regression) have been used to predict cloud workload [7,8,9]. However, most currently used techniques cannot remember events when there are very long and variable time lags between them, as in Slashdot.
To improve the memorization of the standard feed forward neural network, Jeff Elman proposed the recurrent neural network (RNN), which extends the standard feed forward neural network by adding internal memory [10]. RNNs can learn when the gap between relevant events is small (less than 10-step time lags). Unfortunately, conventional RNNs are still unable to learn when the gap between relevant events grows [1]. In 1997, Hochreiter & Schmidhuber proposed a special type of RNN, called the Long Short-Term Memory network (LSTM), with the ability to recognize and learn long-term dependencies (up to 1000-step time lags between relevant events) [1]. This paper tries to answer the question: can we reduce Slashdot effects by using LSTM-RNN? To answer this question, this paper proposes two auto-scaling algorithms. The first algorithm avoids long and variable time lags between Slashdot situations by using two different LSTM-RNNs. The first LSTM-RNN deals with normal workload, while the second LSTM-RNN deals with Slashdot workload. The second algorithm investigates the applicability of using one LSTM-RNN to deal with both normal and Slashdot workloads. The performance of the proposed algorithms has been evaluated and compared with some existing algorithms using CloudSim with real traces. Experimental results show that the first auto-scaling algorithm, which uses two LSTM-RNNs, outperforms the other algorithms.
The rest of this paper is structured as follows. Section 2 gives a brief background on the Long Short-Term Memory recurrent neural network (LSTM-RNN). Section 3 overviews related work in the area of automatic cloud resource scaling. Section 4 briefly describes the proposed algorithms. Following this, Section 5 evaluates the performance of the proposed algorithms using the CloudSim simulator with real workloads and compares their performance with some existing algorithms. Finally, Section 6 concludes.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 12, 2016, www.ijacsa.thesai.org

II. LSTM-RNN
A feed forward neural network is a set of connected neurons that try to capture and represent underlying relationships in a set of data [10]. One of the major limitations of a feed forward neural network is that it does not consider order in time and remembers only a few moments of training from its recent past. Therefore, a feed forward neural network cannot recognize or learn sequential or time-varying patterns [10].
Alternatively, recurrent neural networks (RNNs) determine a new response by using feedback loops, which combine current inputs with the outputs of the previous moment. Feedback loops allow sequential information to persist and allow recurrent networks to perform tasks that cannot be performed by feed forward neural networks [1]. Figure 1 shows the simple recurrent neural network design proposed by Elman. A new layer (called the context layer) is added to the standard feed forward neural network. Context units receive inputs from, and return their results to, hidden units. Context units allow the RNN to memorize its previous state [10].
Unfortunately, a regular RNN still loses its memory very fast. In 1997, Hochreiter & Schmidhuber proposed a special type of RNN, called the Long Short-Term Memory network (LSTM), with the ability to recognize and learn long-term dependencies. Long Short-Term Memory blocks are added to the hidden layers of the RNN [11]. As shown in Fig. 2, each memory block contains a memory cell to store internal state and three different types of gates (input, output, and forget gates) to manage cell state and output using an activation function (usually sigmoid). The input gate decides what information to store in the memory cell. The output gate decides when to read information from the memory cell. The forget gate decides how long to store information in the memory cell. In 2002, Schmidhuber et al. enhanced the memory block by adding peephole connections from its internal cell to its gates. Peephole connections allow the LSTM to learn precise timing between relevant events [1].
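The roles of the three gates can be illustrated with a minimal sketch of one time step of a scalar memory block. The sketch assumes the common formulation without peephole connections, with a tanh candidate value and tanh output squashing; the weight names (wi, ui, bi, and so on) are hypothetical and do not come from the paper or from any library.

```python
import math


def sigmoid(x):
    """Logistic activation used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM memory block (assumed form, no peepholes).

    x      : current input
    h_prev : previous block output
    c_prev : previous memory cell state
    w      : dict of hypothetical scalar weights and biases
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate: what to store
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate: how long to keep
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate: when to read
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g                                   # updated cell state
    h = o * math.tanh(c)                                     # block output
    return h, c
```

With the forget gate saturated near 1 and the input gate near 0, the cell state is carried over almost unchanged, which is the mechanism that lets the block bridge long time lags.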

III. RELATED WORK
Recently, several auto-scaling techniques have been proposed. In [12], Gandhi et al. proposed an auto-scaling approach, called Dependable Compute Cloud, to scale infrastructure automatically without application-level access and without offline application profiling. The proposed approach proactively scales application deployment based on resource-level monitoring information and on performance requirements specified by users. A multi-tier cloud application is approximated using a product-form queueing-network model. A Kalman filtering technique is employed to predict required parameters without accessing the user's application. However, the proposed approach does not consider Slashdot and assumes that incoming requests have Poisson arrivals.
In [13], Moore et al. proposed a hybrid elasticity controller that coordinates between reactive and predictive scalability controllers to enhance the scalability of cloud applications. Both controllers act concurrently. Cloud application administrators configure scaling rules, which are monitored by the reactive controller. Once a condition has been met, the reactive controller submits scaling requests to a centralized decision manager. If the predictive controller is certain of what action to take, it submits scaling requests to the centralized decision manager. Otherwise, the predictive controller continues to learn. The decision manager receives, validates, and executes all triggered scaling requests. Although the performance of the proposed elasticity controller has been evaluated using two real traces (ClarkNet web server trace logs and FIFA 1998 World Cup access logs), neither of these traces contains Slashdot. Therefore, the performance of the proposed elasticity controller has not been evaluated with Slashdot.
Lin et al. [3,6] proposed an auto-scaling system that monitors incoming requests and HTTP response time to recognize the performance of cloud applications. An auto-scaling algorithm was proposed based on the recognized performance. Furthermore, Lin et al. proposed an algorithm that analyzes the workload trend to reduce the number of peaks in response time caused by workload variability. Although the authors state that the proposed scaling strategy can respond to variable and sudden workload changes in a short time, the strategy has not been evaluated using sudden workload changes and has only been evaluated using a short workload (200 minutes) with predictable seasonality.
Kanagala and Sekaran [14] proposed a threshold-based auto-scaling approach that minimizes service level agreement violations by considering virtual machine turnaround time and virtual machine stabilization time when adapting thresholds. Thresholds are dynamically specified using double exponential smoothing. To set the upper threshold, double exponential smoothing is used to predict the time at which the system will reach maximum load, and a point before this time is used as the upper threshold. To scale down, double exponential smoothing is used to predict a point before the system reaches its minimum load, which is used as the lower threshold. However, the weights that double exponential smoothing assigns to observations decrease exponentially as the observations get older. Therefore, double exponential smoothing is not able to remember Slashdot when there are long time lags.
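The forgetting behaviour of double exponential smoothing can be seen in a minimal sketch. It assumes Holt's linear-trend formulation, one common form of double exponential smoothing, and is not taken from [14]; the function name and the parameters alpha, beta, and horizon are illustrative.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    alpha : smoothing factor for the level, between 0 and 1
    beta  : smoothing factor for the trend, between 0 and 1
    Each new observation is weighted by alpha and everything older is
    multiplied by (1 - alpha), so the influence of an old observation
    shrinks exponentially with its age.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend
```

For example, on the perfectly linear load history [1, 2, 3, 4, 5] the forecast continues the line, whereas a burst that occurred many steps ago contributes almost nothing to the current level and trend.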
Mao et al. [4,5,15] proposed an auto-scaling mechanism that considers both user performance requirements and cost concerns. Performance requirements are specified by assigning a soft deadline to each job. The proposed auto-scaling mechanism allocates and deallocates virtual machines and schedules tasks on virtual machines to finish each job within its deadline at minimum cost. However, instantiating new VMs requires at least 10 minutes. Thus, the probability of violating the Service Level Agreement is increased.
Nikravesh et al. [16] proposed a proactive auto-scaling system based on a Hidden Markov Model. Their experiments showed that scaling decisions generated using the Hidden Markov Model are more accurate than scaling decisions generated using support vector machine, neural networks, and linear regression.
In [7,8,9], Bankole and Ajila applied three machine-learning techniques (Support Vector Machine, Neural Networks, and Linear Regression) to proactively scale provisioned cloud resources for multi-tier web applications. Their results show that Support Vector Machine outperforms the other techniques in predicting future resource demands. Although several auto-scaling techniques have been proposed during the last few years, most of them do not consider Slashdot.

IV. PROPOSED ALGORITHMS
As shown in Algorithm 1, the inputs are as follows. The first input is the history of total required CPU. The total required CPU at a given time is calculated as the sum of the CPU required by all requests arriving at that time.
To enhance the prediction accuracy of the proposed algorithms, a sliding window technique is used. Sliding windows have been used in many areas to improve prediction accuracy [2]. The window-length input specifies the size of the sliding window used during prediction.
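A sliding window over the CPU history can be sketched as follows. The function names and the pairing of each window with the next observation as a training target are illustrative assumptions, not details given in the paper.

```python
def sliding_window(history, length):
    """Return the most recent `length` observations of the CPU history."""
    if length <= 0:
        raise ValueError("window length must be positive")
    return list(history[-length:])


def training_pairs(history, length):
    """Pair every window of `length` values with the value that follows it.

    Assumed training layout: the predictor sees a window and learns to
    output the next observation.
    """
    return [
        (list(history[start:start + length]), history[start + length])
        for start in range(len(history) - length)
    ]
```

For a history of [1, 2, 3, 4] and a window length of 2, the pairs are ([1, 2], 3) and ([2, 3], 4), and the window used for the next prediction is [3, 4].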

A further input represents the startup delay of a new VM. Two inputs hold the history of the CPU previously predicted by the first and second LSTM-RNN, respectively. Two more inputs represent the prediction accuracy of the first and second LSTM-RNN, respectively. Prediction accuracy is calculated as the Mean Absolute Percentage Error (MAPE).
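MAPE can be computed with a few lines of code. The sketch below uses the standard definition; skipping observations equal to zero is an assumption made here to avoid division by zero, and the paper does not say how such points are handled.

```python
def mape(observed, predicted):
    """Mean Absolute Percentage Error, in percent.

    Pairs whose observed value is zero are skipped (assumption), because
    the percentage error is undefined for them.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score")
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)
```

A lower MAPE means a more accurate predictor: predictions of 110 and 180 against observations of 100 and 200 are each off by 10 percent, giving a MAPE of 10.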
The first auto-scaling algorithm uses two different LSTM-RNNs to forecast future demand. The first LSTM-RNN is trained with normal workload without Slashdot, and the second LSTM-RNN is trained with Slashdot workload only. The prediction accuracies of both networks are continuously updated using predicted and observed CPU. The required CPU ahead of time is forecast using the LSTM-RNN with the lowest MAPE. The predicted CPU is sent to the Scaling Decision Maker algorithm, which decides the appropriate scaling action. The number of VMs to scale up or down is specified according to the difference between predicted and provisioned resources at the forecast time.
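The selection between the two predictors can be sketched as follows. The class and function names are hypothetical, the error is tracked incrementally as one possible reading of "continuously updated", and the two forecasts are plain numbers standing in for the outputs of the trained LSTM-RNNs.

```python
class RunningMape:
    """Incrementally updated MAPE for one predictor (illustrative helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, observed, predicted):
        """Add one observed/predicted pair; zero observations are skipped."""
        if observed != 0:
            self.total += abs((observed - predicted) / observed)
            self.count += 1

    @property
    def value(self):
        """Current MAPE in percent, or infinity before any update."""
        if self.count == 0:
            return float("inf")
        return 100.0 * self.total / self.count


def choose_prediction(pred_normal, pred_slashdot, acc_normal, acc_slashdot):
    """Return the forecast of the predictor with the lower running MAPE."""
    if acc_normal.value < acc_slashdot.value:
        return pred_normal
    return pred_slashdot
```

After each monitoring interval, both trackers would be updated with the newly observed CPU, so the predictor that currently matches the workload regime best is the one whose forecast is passed on to the scaling decision.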
Algorithm 2 shows the steps of the second auto-scaling algorithm, which uses only one LSTM-RNN to predict required CPU under both normal and Slashdot workloads.

Algorithm 1. Auto-scaling using two LSTM-RNNs
INPUTS:
    history of total required CPU
    sliding window length
    VM startup delay
    history of predicted CPU using the first LSTM-RNN
    history of predicted CPU using the second LSTM-RNN
    prediction accuracy of the first LSTM-RNN
    prediction accuracy of the second LSTM-RNN
OUTPUTS: Scaling decision
Begin
    1: Get the sliding window of the given length from the history of total required CPU
    2: Predict the step-ahead required CPU using the first LSTM-RNN
    3: Predict the step-ahead required CPU using the second LSTM-RNN
    4: Update the prediction accuracy of both LSTM-RNNs
    5: if the MAPE of the first LSTM-RNN < the MAPE of the second LSTM-RNN
    6:     Call Scaling Decision Maker using the prediction of the first LSTM-RNN
    7: else
    8:     Call Scaling Decision Maker using the prediction of the second LSTM-RNN
    9: endif
    10: return scaling decision
End

The scaling decision maker algorithm is shown in Algorithm 3. It uses three thresholds: an upper threshold ThrU, a lower threshold ThrL, and ThrbU, which is slightly below the upper threshold ThrU. If required CPU crosses above ThrU, virtual resources are considered overutilized and have to be scaled up. If required CPU crosses above ThrbU without crossing above ThrU for a prespecified number of times, virtual resources are also considered overutilized and have to be scaled up. On the other hand, if required CPU crosses below ThrL for a prespecified number of times, virtual resources are considered underutilized and some virtual resources have to be released.
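The three-threshold rule can be sketched as a small state machine. The sketch assumes that the "prespecified number of times" counts consecutive observations and that required CPU is expressed on the same scale as the thresholds; the class name, the `patience` parameter, and the returned labels are hypothetical.

```python
class ScalingDecisionMaker:
    """Three-threshold scaling rule (illustrative, consecutive-count assumption)."""

    def __init__(self, thr_u, thr_bu, thr_l, patience):
        if not thr_l < thr_bu < thr_u:
            raise ValueError("expected ThrL < ThrbU < ThrU")
        self.thr_u = thr_u        # upper threshold ThrU
        self.thr_bu = thr_bu      # threshold slightly below ThrU (ThrbU)
        self.thr_l = thr_l        # lower threshold ThrL
        self.patience = patience  # prespecified number of times
        self.near_upper = 0
        self.below_lower = 0

    def _reset(self):
        self.near_upper = 0
        self.below_lower = 0

    def decide(self, required_cpu):
        """Return 'scale_up', 'scale_down', or 'no_action' for one observation."""
        if required_cpu > self.thr_u:
            self._reset()
            return "scale_up"  # above ThrU: scale up at once
        if required_cpu > self.thr_bu:
            self.near_upper += 1
            self.below_lower = 0
            if self.near_upper >= self.patience:
                self._reset()
                return "scale_up"  # stayed between ThrbU and ThrU long enough
            return "no_action"
        if required_cpu < self.thr_l:
            self.below_lower += 1
            self.near_upper = 0
            if self.below_lower >= self.patience:
                self._reset()
                return "scale_down"  # stayed below ThrL long enough
            return "no_action"
        self._reset()
        return "no_action"
```

With thresholds of 0.8, 0.7, and 0.3 and a patience of 2, a single reading of 0.9 triggers a scale-up immediately, while readings of 0.75 trigger it only on the second consecutive occurrence.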
Thresholds are initialized to the same values for all applications. However, because workloads vary in nature, keeping the same values for all applications increases the probability of violating service level agreements. Therefore, all thresholds are periodically and automatically adapted using the Median Absolute Deviation of the required CPU history of each application.
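The three thresholds are computed from a bounded tuning parameter and the Median Absolute Deviation (MAD), which is the median of the absolute deviations from the median of required CPU. Using the tuning parameter, we can adapt the safety of the proposed algorithm. For example, lower values of the parameter decrease the cost but increase the probability of violating service level agreements.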
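The MAD statistic itself follows directly from its definition and can be sketched as below. The sketch covers only the statistic; how it is combined with the tuning parameter to set each threshold is not shown, and the function name is illustrative.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median of `values`."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    center = median(values)
    return median(abs(v - center) for v in values)
```

Because it is built on medians, the MAD is barely affected by a single extreme observation in the CPU history, which makes it a robust measure of how much the workload normally varies.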

V. PERFORMANCE EVALUATION
The proposed algorithms have been implemented using the CloudSim simulator with a deep-learning library called Deeplearning4j [20]. The performance of the proposed algorithms has been compared with two auto-scaling approaches, proposed by Kanagala et al. [14] and Hasan et al. [17].
The following subsections describe the evaluation environment settings and discuss the simulation results.

A. Evaluation environment settings
The proposed algorithms have been evaluated using the CloudSim simulator with a real trace called NASA Log [18]. NASA Log contains two months of HTTP requests to the NASA Kennedy Space Center WWW server, which is located in Florida. This log was collected from 00:00:00 July 1, 1995 to 23:59:59 July 31, 1995 and from 00:00:00 August 1, 1995 to 23:59:59 August 31, 1995. Fig. 3 shows the number of requests generated according to NASA Log from August 1 to August 31.
A Slashdot workload from [19] has been added to NASA Log; [19] contains the number of hits on July 26, 2000, the day the AUUG/LinuxSA InstallFest story hit Slashdot (Fig. 4 shows the number of requests versus time). Fig. 5 shows NASA Log after adding Slashdot.
To implement LSTM-RNN, the Deeplearning4j library has been used. Deeplearning4j is an open-source deep-learning library in Java, developed by a San Francisco-based business intelligence and enterprise software firm [20].
Fig. 6 and Table 1 show the number of running VMs during the period from hour 221 to hour 248, which contains the second Slashdot (as shown in Fig. 5). The proposed algorithms increase the number of running VMs (between hours 221 and 230) above that of the other approaches to deal with Slashdot and rapidly decrease the number of running VMs (between hours 230 and 250) to minimize cost. Table 1 shows that the number of VMs provisioned by the proposed algorithms is higher than the number provisioned by the related approaches. These VMs are added to serve a large number of requests with short response time, as shown in Fig. 7, Table 2, Fig. 8, and Table 3.

B. Evaluation results
In [17], a fixed number of VMs is allocated or de-allocated when scaling up or down. This fixed number limits scaling speed during Slashdot. In the proposed algorithms, the number of VMs is variable and depends on the growth or decrease of the workload.
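A variable scaling step of this kind can be sketched by converting the gap between predicted and provisioned CPU into a number of VMs. The function name, the `cpu_per_vm` capacity parameter, and the rounding policy (round up when adding, release only fully surplus VMs when removing) are assumptions made for illustration.

```python
import math


def vms_to_scale(predicted_cpu, provisioned_cpu, cpu_per_vm):
    """Return the number of VMs to add (positive) or release (negative)."""
    if cpu_per_vm <= 0:
        raise ValueError("cpu_per_vm must be positive")
    gap = predicted_cpu - provisioned_cpu
    if gap > 0:
        # Round up so the predicted demand is fully covered.
        return math.ceil(gap / cpu_per_vm)
    # Release only VMs that are entirely surplus.
    return -math.floor(-gap / cpu_per_vm)
```

For example, a predicted demand of 950 CPU units against 400 provisioned units, with 100 units per VM, yields 6 new VMs in one step rather than a fixed increment repeated over several intervals.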
In [14] and [17], scaling up starts only after the workload has stayed above the upper threshold for a pre-specified duration. During this period, the Service Level Agreement (SLA) is violated and providers incur a penalty. Moreover, the duration of the SLA violation is extended by the startup delay of new VMs, which sometimes takes around 10 minutes. In the proposed algorithms, VMs are scaled up directly if the predicted workload crosses the upper threshold. Therefore, the proposed algorithms act faster to provide enough resources to serve incoming requests.
In [17], scaling down is performed if the trend is downward even if the load does not cross the lower threshold, which means that VMs are released even when this is not needed. Moreover, in [17], a VM marked for termination is terminated only after 5 minutes, even if it has already finished its work, which sometimes increases the cost if these few minutes start another billed hour. In addition, this can increase SLA violations if running requests need more time.