Auto-Scaling Approach for Cloud based Mobile Learning Applications

In the last decade, mobile learning applications have attracted a significant amount of attention. Huge investments have been made to develop educational applications that can be implemented on mobile devices. However, mobile learning applications have some limitations, such as storage space and battery life. Cloud computing provides a new way to solve some limitations of mobile learning applications. However, there are other limitations, like scalability, that must be solved before mobile cloud learning can become completely operational. There are two main problems with scalability. The first occurs when the application server’s performance declines due to an increase in the number of requests, which affects usability. The second is that a decrease in the number of requests leaves most application servers idle and therefore wastes money. These two problems can be avoided or minimized by provisioning auto-scaling techniques that permit the acquisition and release of resources dynamically to accommodate demand. In this paper, we propose an intelligent neuro-fuzzy reinforcement learning approach to solve the scalability problem in mobile cloud learning applications, and evaluate the proposed approach against some of the existing approaches via MATLAB. The large state space and long training time required to find the optimal policy are the main problems of reinforcement learning. We use fuzzy Q-learning to solve the large state space problem by grouping similar variables in the same state; there is then no need to use large look-up tables. The use of parallel learning agents reduces the training time needed to determine optimal policies. The experimental results show that the proposed approach is able to increase learning speed and reduce the training time needed to determine optimal policies.
Keywords: Auto-scaling; reinforcement learning; fuzzy Q-learning


I. INTRODUCTION
Cloud computing is a computing business paradigm where services such as servers, storage, and applications are delivered to end users through the internet. There are three categories of cloud computing [1] [2] [3] [4]: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS includes storage, servers, and networking components. Amazon EC2 [5] is a suite that is built on an IaaS service model. PaaS provides the platforms (e.g., operating systems) needed to develop and run applications, such as the Google App Engine [6]. SaaS offers access to web-based software and its functions, including services such as Salesforce.com [7]. There are three deployment methods for cloud computing [3] [8]: private, public, and hybrid. Private clouds are provisioned for use by a single organization while public clouds are provisioned for open use. A hybrid cloud is a combination of both private and public clouds.
Over the past decade, many universities, schools, and other educational institutions have moved their e-Learning applications to mobile learning applications. Mobile learning applications [9] are the most important e-Learning model, using handheld devices such as smartphones and tablets. Mobile learning applications have many limitations, however, such as storage space, battery life, and potential data loss. To solve some of these limitations, mobile cloud learning (MCL) applications have been proposed.
MCL integrates the advantages of mobile learning and cloud computing. The main advantages of MCL are solving the data storage limitation in mobile learning by storing data in the cloud rather than in the device, increasing the ease of sharing knowledge, easing accessibility as access is through a browser rather than a mobile operating system, and low costs for set-up and maintenance.
There are some limitations, like scalability, that must be solved before MCL can become completely operational. Scalability refers to resource allocation that can be acquired or released depending on demand. Cloud scalability has two dimensions:
• Horizontal cloud scalability (scaling out): adding more servers that perform the same work, and
• Vertical cloud scalability (scaling up): increasing capacity by adding more resources, such as adding processing power to a server to make it faster.
Most cloud providers use horizontal scalability because vertical scaling requires rebooting. Auto-scaling automatically scales the capacity up or down; this allows the system to maintain performance while also saving money. The auto-scaling system needs two elements: a monitor and a scaling unit. There are different performance metrics for scaling purposes, such as CPU utilization, the size of the request queue, and memory usage.
There are two approaches for automatically matching computing requirements with computing resources: schedule-based and rule-based. In schedule-based scaling, the scale adjusts by days and times, so it cannot respond to unexpected changes. Rule-based scaling depends on creating two rules that determine when to scale; such rules can be obtained with techniques such as reinforcement learning (RL).
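As a minimal sketch of rule-based scaling, the two rules can be written as an upper and a lower threshold on a monitored metric. The function name and the thresholds (80% and 30% CPU utilization) are illustrative assumptions and are not values taken from this paper.

```python
def rule_based_decision(cpu_utilization: float,
                        upper: float = 0.80,
                        lower: float = 0.30) -> int:
    """Return +1 to add a VM, -1 to remove a VM, or 0 to keep the capacity.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    if cpu_utilization > upper:   # rule 1: overloaded, scale out
        return 1
    if cpu_utilization < lower:   # rule 2: mostly idle, scale in
        return -1
    return 0


if __name__ == "__main__":
    for load in (0.95, 0.50, 0.10):
        print(load, rule_based_decision(load))
```

Fixed thresholds like these must be tuned by hand, which is the motivation for learning the rules instead.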
The premise of RL is learning through trial-and-error from the learner's performance and feedback from the environment. It captures the performance model of a target application and its policy without any a priori knowledge [10] [11] [12] [13] [14]. There are four fundamental components in RL: agent, state, action, and reward. The agent is the decision-maker that learns from experience. A state s can be defined as the tuple (w, u, p), where w is the total number of user requests observed in a given time period, u is the number of virtual machines (VMs) allocated to the application, and p is the performance in terms of the average response time to requests. The action is what the agent can do (e.g., add or remove application resources). Each action is associated with a reward. The objective is for the agent to choose actions so as to maximize the expected reward over a given period of time.
Neuro-fuzzy systems [15] [16] [17] are a field of artificial intelligence based on neural networks and fuzzy logic, in which truth values may range from 0 to 1.
The rest of the paper is organized as follows. Section 2 provides an overview of related work and we provide an explanation of our proposed approach in Section 3. Experimental results and their analysis are presented in Section 4. Finally, we conclude the paper and discuss future work in Section 5.

II. RELATED WORK
S. Chen et al. [1] proposed a model for an MCL system consisting of four layers: infrastructure, platform, business application, and service access. The infrastructure layer includes system resources (i.e., CPU, network, and storage) which are represented by a virtual resource that provides scalable and flexible services. The platform layer provides software development, application services, database services, data storage, and recovery services. The business application layer supports different application software modules, such as a learning module which could provide self-learning for students and allow teachers to review students' results. A teaching module would allow teachers to manage courses, while a communication module would provide a communication method for teachers and students, such as SMS or a blog. A system administration module would provide system management and access control. The service access layer would then work as an interface for students and teachers.
In mobile cloud computing (MCC), data processing and storage are performed outside the mobile device and inside the cloud, offering many applications. In [18], Arun and Prabu discuss some of these applications, including vehicle monitoring, mobile learning, biometry, and digital forensic analysis.
Veerabhadram and Conradie [19] proposed an architecture for MCC, consisting of three main parts: the mobile client, middleware, and cloud services. The mobile client is the means by which the user can access the system (e.g., a smartphone) and the middleware pushes service updates to mobile clients. The main goal of the architecture is to provide a proxy for mobile clients to connect to cloud services. The authors used a questionnaire to gather the views of educators and students on mobile learning. The results indicate that MCC will be an important technology for education in the near future.
Accordingly, a model for mobile cloud learning systems and their applications has been proposed in [20]. The structure of this model also has three layers: user, system, and application. The user layer authenticates users, the system layer contains system resources (CPU, network, and storage), and the application layer contains learning system processes and a test.
In [2], a new model for high performance computing using a high performance computing cluster infrastructure was proposed. Cisco's WebEx mobile cloud applications have been used to test remote learning in both fixed and mobile environments and for a variety of educational scenarios: WebEx Whiteboard as a tool for teachers in remote learning environments and Telemedicine to share and highlight medical images. The test relied on the Quality of Experience (QoE), which measures users' satisfaction. The QoE was evaluated via a questionnaire given to the participants after the completion of the remote learning course. The result implied that remote learning in a mobile environment is easier than in a fixed environment.
P. Hazarika et al. [21] classified the MCC challenges into three categories: technical, security, and miscellaneous. The goal of MCC is to have seamless user interaction reach its full potential. However, this presents some critical technical challenges like data latency, service unavailability, and heterogeneous wireless network interfaces (WCDMA, GPRS, WiMAX, WLAN). Security challenges are classified into three categories: cloud services, communication channels, and mobile applications. Network accessibility and cloud compliance are examples of miscellaneous challenges. To illustrate, using MCC without network access is useless. Likewise, compliance problems like regulation may affect the MCC user; due to the nature of the cloud, data may span different regions, with each region having different regulations for the stored data.
Chao and Yue [22] presented different access modes for mobile learning based on cloud computing. The first method is mobile learning based on SMS. In this method, the user sends a message from a mobile device through the internet to the teaching server. The teaching server analyzes and processes the data, then sends the requested data back to the user's mobile phone. The second method is mobile learning based on webpages. In this method, the user accesses the internet and visits the mobile website that contains learning resources, including text, images, sound, animation, video, and other media forms. The third method is mobile learning based on a micro-blog. This method is similar to a blog but each message is restricted to only 140 words; the user can send ideas in the form of messages to mobile phone users and a personalized website group. The final method is multimedia interactive learning based on a Wireless Application Protocol (WAP) browser. A WAP browser is a web browser for mobile devices. WAP browsing is similar to computer browser applications but improves content performance.
An algorithm for parallel learning agents was proposed in [23]. The authors aimed to accelerate the exploration procedure and reduce the training time to determine optimal policies by using parallel learning agents (swarm behaviors). They proposed a neuro-fuzzy system with an actor-critic method, a kind of RL methodology. The actor is used to select an action and the critic is used to evaluate the action chosen. The proposed algorithm focuses on two stages for each individual agent. First, it classifies the input state via a fuzzy net. Then, the actor-critic method is applied. Each agent is independent of the others and the adaptive swarm behavior is acquired only as a reward from the environment. Simulation results from this algorithm show that the swarm behavior is a quicker exploration procedure than individual learning. This algorithm does not balance exploration and exploitation because it uses a fixed value for the learning rate.
In [24], a solution was proposed to the problem of managing the balance between exploration and exploitation that was present in [23]. The authors proposed an adaptive learning rate, which uses larger learning rates for less visited states and smaller learning rates for more visited states. The authors showed how the adaptive learning rate affected a neuro-fuzzy system with SARSA learning; simulation results from this algorithm showed the effectiveness of the adaptive learning rate.
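The idea of a visit-dependent learning rate can be sketched as follows. The decay rule used here (the maximum rate divided by one plus the visit count, clipped at a minimum) is an assumption made for illustration and is not the formula of [24]; the bounds 0.001 and 0.3 are the adaptive learning rate limits reported in Section IV-B.

```python
def adaptive_learning_rate(visits: int,
                           eta_min: float = 0.001,
                           eta_max: float = 0.3) -> float:
    """Hypothetical schedule: large rate for rarely visited states,
    small rate for frequently visited states, never below eta_min."""
    return max(eta_min, eta_max / (1 + visits))


if __name__ == "__main__":
    for n in (0, 1, 10, 1000):
        print(n, adaptive_learning_rate(n))
```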
In [25], an algorithm was proposed to balance exploration and exploitation in a multi-agent environment, using the ξ-greedy method. A random action (exploration) is selected according to the ξ parameter, which is updated in each time step. Three fuzzy control parameters are used to update ξ: the weighted difference between maximum and minimum move values in the current state, the difference value of the current rate, and the previous state and exploration rate. One of the drawbacks of this method is the long time it requires for the learning process.
The authors in [26] compared two classic RL algorithms, fuzzy SARSA learning (on-policy) and fuzzy Q-learning (off-policy). SARSA compares the current state with the actual next state. Q-learning compares the current state with the best possible next states.
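The difference between the two algorithms shows up in the target value used for the update. The sketch below uses the standard (non-fuzzy) forms of the two targets; the function names and the sample numbers are illustrative assumptions.

```python
from typing import Sequence


def sarsa_target(reward: float, gamma: float,
                 q_next: Sequence[float], next_action: int) -> float:
    """On-policy: bootstrap from the action actually taken in the next state."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward: float, gamma: float,
                      q_next: Sequence[float]) -> float:
    """Off-policy: bootstrap from the best action available in the next state."""
    return reward + gamma * max(q_next)


if __name__ == "__main__":
    q_next = [0.0, 0.5, 1.0]   # hypothetical q-values of the next state
    print(sarsa_target(1.0, 0.8, q_next, next_action=1))
    print(q_learning_target(1.0, 0.8, q_next))
```

When the next action happens to be the greedy one, both targets coincide; they differ only on exploratory steps.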
In [27], an algorithm was proposed to combine a fuzzy logic controller and fuzzy Q-learning to increase performance and minimize costs. It is assumed that there is no prior knowledge of policies, and the fuzzy rules are automatically updated to learn optimal policies during runtime to improve performance. This algorithm is good for dynamic workloads because of its capabilities for self-adapting and self-learning. M. Sharafi et al. [28] combine an RL algorithm (SARSA learning) with fuzzified actions. They test their proposed method by simulation using MATLAB and show that this algorithm is efficient for a dynamic workload.
In [29], Kao-Shing Hwang and Wei-Cheng Jiang proposed shaped-Q learning for multi-agent systems. In the architecture, each agent maintains a cooperative tendency table. The action with the maximal shaped Q-value in a given state is selected. This method can make agents complete the task together more efficiently and speed up the learning process.

III. THE PROPOSED APPROACH
Our proposed method combines fuzzy Q-learning [30] with a proposed parallel agents technique in order to solve the two main problems of RL: large state space and long training time.
The main components of the architecture are fuzzy Q-learning and the proposed parallel agents technique. Fuzzy Q-learning is used to solve the large state space problem, in which a similar group of variables belongs to the same state rather than using large look-up tables. Parallel agents are used to reduce the training time needed to determine optimal policies. The distinct components of the architecture are elaborated below.
Fig. 1. Fuzzy Q-Learning Architecture [27]

A. Fuzzy Q-learning
The architecture of each individual agent consists of two parts, the fuzzy logic controller and fuzzy Q-learning, as shown in Fig. 1. The fuzzy logic controller takes the observed data and generates scaling actions through fuzzy rules (the rules are generated by fuzzy Q-learning). The inputs to the fuzzy logic controller are the workload (w) and the response time (RT). The output is the scaling action (sa), expressed as the number of virtual machines (VMs) to add or remove.
The first step for the fuzzy logic controller is partitioning the input into several fuzzy sets by membership functions µy(x), the degree of membership of an input signal x to the fuzzy set y. A membership function is a curve that defines how each input is mapped to a membership value between 0 and 1. In this paper, we use triangular and trapezoidal membership functions. The fuzzy sets of w are divided into the linguistic values Low, Medium, and High. The fuzzy sets of RT are divided into the linguistic values Bad, Okay, and Good. The output is an integer constant from the set {−2, −1, 0, +1, +2}.
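Triangular and trapezoidal membership functions can be sketched as below. The supports of the workload sets (Low [0, 20], Medium [10, 60], High [40, 100]) are those given in Section IV-B; the inner breakpoints (10, 35, 60) and the choice of which set is triangular are assumptions made for illustration.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangle with feet at a and c and peak at b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid rising on [a, b], flat on [b, c], falling on [c, d]
    (requires a <= b <= c <= d; a == b or c == d gives a shoulder)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify_workload(w: float) -> tuple:
    """Degrees of membership of w in (Low, Medium, High); breakpoints assumed."""
    low = trapezoidal(w, 0, 0, 10, 20)
    medium = triangular(w, 10, 35, 60)
    high = trapezoidal(w, 40, 60, 100, 100)
    return low, medium, high


if __name__ == "__main__":
    print(fuzzify_workload(15))
```

A workload of 15 lies in the overlap of Low and Medium, so it belongs partly to both sets, which is what allows neighboring rules to fire together.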
The next step is defining fuzzy if-then rules of the form "if X is A, then Y is B", where A and B are linguistic values defined by the fuzzy sets. For example, if the workload is high and the response time is bad, then add VMs.
The three steps that the fuzzy logic controller performs are:
1) Fuzzification of the inputs: the state space of each input variable is partitioned into various fuzzy sets through membership functions. The fuzzification process transfers a crisp value to a linguistic value by means of the membership functions.
2) Fuzzy reasoning: this step performs the operation in the rule and finds the scaling action.
3) Defuzzification of the output: the process of transferring the linguistic value to a crisp value. To calculate the output action, use equation 1:
$$a = \sum_{i=1}^{N} \mu_i(x) \times a_i \qquad (1)$$
where N is the number of rules, $\mu_i(x)$ is the degree of truth of rule i for the input signal x, and $a_i$ is the consequent function for the same rule.
The fuzzy logic controller starts working with the rules provided by users. There are limitations for the fuzzy logic controller because it uses fixed fuzzy rules. The rules are defined by the user and may not be the optimal policies. To solve this problem, fuzzy Q-learning is needed. Fuzzy Q-learning can start working with no prior knowledge base and obtains knowledge at runtime through the knowledge evolution mechanism. It learns the policies and tries to choose the action that returns a good reward. The objective of the agent is to maximize the received reward, as described in equation 2. It does not always choose the action with a high reward because a different action may lead to better rewards in the future. Therefore, there is a trade-off between exploitation and exploration; exploitation utilizes known information to maximize rewards while exploration discovers more information about the environment. Fuzzy Q-learning continuously updates the rules.
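The weighted-sum output of equation 1 (also step 3 of Algorithm 1) can be sketched in a few lines. The firing levels and rule consequents in the usage example are made-up numbers, and rounding the result to a whole number of VMs is an assumption rather than something stated in the text.

```python
from typing import Sequence


def control_action(firing_levels: Sequence[float],
                   rule_actions: Sequence[float]) -> float:
    """Equation 1: sum over the rules of (degree of truth) x (consequent)."""
    return sum(mu * a for mu, a in zip(firing_levels, rule_actions))


if __name__ == "__main__":
    mu = [0.7, 0.3]        # hypothetical degrees of truth of two fired rules
    actions = [2, 1]       # their consequents, taken from {-2, -1, 0, +1, +2}
    a = control_action(mu, actions)
    print(a, round(a))     # rounding to whole VMs is an assumption
```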
The algorithm for the fuzzy logic controller is summarized in Algorithm 1. First, Q-table values (q[i, j]) are initialized to 0 as shown in Table I. Then, an action is selected for each fired rule. The control action is calculated by the fuzzy controller, as described in equation 1. After that, the Q function is approximated from the current Q-values and the firing level of the rules. Q(s, a) denotes this Q function and it is defined in RL to determine the benefit of taking action a in state s. Then, once the action is taken, the system goes to the next state s(t + 1). The reward r(t + 1) is observed and the value for the new state is computed. Finally, the error signal and the Q-values are calculated and updated, respectively. The space complexity is O(N * J), where N is the number of states and J is the number of actions. For example, if the number of states is 9 and the number of actions is 5, then the space complexity is O(9 * 5), which equals 45 q-values, as clarified in Table I.
The reward function is defined based on SLO violation criteria. To illustrate, the action is appropriate if the response time is less than or equal to the SLO, and the reward takes the value 1. The action is not effective and the reward is 0 if the response time is greater than the SLO and less than the previous response time. In the other cases, the action is not appropriate and the reward takes a negative value.
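The three reward cases can be sketched as a small function. The size of the negative reward is not specified in the text, so the value -1 below is an assumption, as are the function and parameter names.

```python
def reward(response_time: float,
           previous_response_time: float,
           slo: float,
           penalty: float = -1.0) -> float:
    """Reward based on SLO violation; `penalty` is an assumed negative value."""
    if response_time <= slo:
        return 1.0                      # SLO met: the action was appropriate
    if response_time < previous_response_time:
        return 0.0                      # SLO violated but improving: not effective
    return penalty                      # SLO violated and not improving


if __name__ == "__main__":
    print(reward(0.5, 1.0, slo=1.0))
    print(reward(1.5, 2.0, slo=1.0))
    print(reward(2.5, 2.0, slo=1.0))
```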
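Algorithm 1 Fuzzy Q-learning algorithm
1: Initialize q-values in the look-up table to 0: $q[i, j] = 0$, $1 \le i \le N$, $1 \le j \le J$, where N is the number of states and J is the number of actions.
2: Select an action for each activated rule (ε-greedy policy): $a_i = \arg\max_k q[i, k]$ with probability $1 - \epsilon$, $a_i = \mathrm{random}\{a_k, k = 1, 2, \dots, J\}$ with probability $\epsilon$.
3: Calculate the control action by the fuzzy logic controller: $a = \sum_{i=1}^{N} \mu_i(x) \times a_i$
4: Approximate the Q function from the current q-values and the degree of truth of the rules: $Q(s(t), a) = \sum_{i=1}^{N} \mu_i(x) \times q[i, a_i]$, where $Q(s(t), a)$ is the value of the Q function for the current state s(t) in iteration t and the action a.
5: Take action a and leave the system to evolve to the next state, s(t + 1).
6: Observe the reward signal, r(t + 1), and compute the value for the new state denoted by V(s(t + 1)): $V(s(t+1)) = \sum_{i=1}^{N} \mu_i(x') \times \max_k q[i, k]$, where $x'$ is the input signal observed in state s(t + 1).
7: Calculate the error signal: $\Delta Q = r(t+1) + \gamma \times V(s(t+1)) - Q(s(t), a)$, where γ is a discount factor.
8: Update q-values: $q[i, a_i] = q[i, a_i] + \eta \times \Delta Q \times \mu_i(x)$, where η is a learning rate.
9: Repeat the process starting from step 2 for the new state until it converges.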
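One learning iteration of Algorithm 1 (steps 4 and 6 to 8) can be sketched as follows. The formulas used for the Q approximation, the state value, the error signal, and the update are assumed to follow the standard fuzzy Q-learning formulation; the function name is hypothetical, and the default values of γ and η are those reported in Section IV-B.

```python
from typing import List, Sequence


def fql_update(q: List[List[float]],
               mu: Sequence[float],
               chosen: Sequence[int],
               reward: float,
               mu_next: Sequence[float],
               gamma: float = 0.8,
               eta: float = 0.1) -> float:
    """Update the q-table in place and return the error signal.

    q       : N x J look-up table (N rules/states, J actions)
    mu      : degree of truth of each rule in the current state
    chosen  : index of the action selected for each rule (step 2)
    mu_next : degree of truth of each rule in the next state
    """
    # Step 4: Q(s(t), a) from the current q-values and firing levels.
    q_sa = sum(m * q[i][chosen[i]] for i, m in enumerate(mu))
    # Step 6: value of the new state, using the best action of each rule.
    v_next = sum(m * max(q[i]) for i, m in enumerate(mu_next))
    # Step 7: error signal (temporal difference).
    delta = reward + gamma * v_next - q_sa
    # Step 8: move each fired rule's q-value in proportion to its firing level.
    for i, m in enumerate(mu):
        q[i][chosen[i]] += eta * delta * m
    return delta


if __name__ == "__main__":
    n_states, n_actions = 9, 5
    q = [[0.0] * n_actions for _ in range(n_states)]   # step 1: 45 q-values
    mu = [1.0] + [0.0] * (n_states - 1)                # only rule 0 fires
    chosen = [0] * n_states
    print(fql_update(q, mu, chosen, reward=1.0, mu_next=mu), q[0][0])
```

Only the rules that actually fired (non-zero degree of truth) have their q-values changed, which is why a 9 x 5 table is enough to cover a continuous input space.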

B. Parallel Agent
In this section, we propose a new approach of parallel reinforcement learning (PRL) with state space partitioning. We divide the state space into multiple partitions, and PRL agents are assigned to explore each specific region, with the goal of increasing exploration and improving the learning speed. There are two types of agents in our PRL implementation: one global agent and many local agents, as shown in Fig. 2. Both are based on fuzzy Q-learning and each agent independently maintains its own fuzzy Q-learning look-up table. The value estimates of the local and global agents are initialized to 0. At each time step, the knowledge learned by all local agents is synchronized with the global agent.
Each local agent selects actions using the ε-greedy strategy, where a random action is chosen with probability ε, or the action with the best expected reward is chosen with the remaining probability 1 − ε.
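The ε-greedy rule can be sketched for a single row of the q-table. The function name is hypothetical; the default ε = 0.1 is the value reported in Section IV-B.

```python
import random
from typing import Sequence


def epsilon_greedy(q_row: Sequence[float], epsilon: float = 0.1, rng=random) -> int:
    """Return an action index: random with probability epsilon (exploration),
    otherwise the index of the largest q-value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda k: q_row[k])


if __name__ == "__main__":
    rng = random.Random(0)   # seeded generator so that runs are repeatable
    picks = [epsilon_greedy([0.1, 0.9, 0.3], 0.1, rng) for _ in range(1000)]
    print(picks.count(1) / len(picks))   # mostly the greedy action (index 1)
```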

C. Combine Fuzzy Q-learning with the Proposed Parallel Agent Technique
The architecture for combining fuzzy Q-learning and parallel agents is shown in Fig. 3. Users send requests using a mobile learning application. The state space is divided into multiple partitions and each state partition directs the incoming requests to its local agent. The agent (local or global) schedules the requests that arrive from the users. These requests are distributed evenly based on a certain load balancing method, such as least connection or round robin. Also, each agent is responsible for auto-scaling and monitoring its region. The local agent receives all the incoming requests and forwards them to one of the servers in the pool. At each time step, the knowledge learned by all local agents is synchronized with the global agent.
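The two load balancing methods named above can be sketched as follows. The server names and connection counts are made-up values, and the helper names are hypothetical.

```python
import itertools
from typing import Dict, Iterator, Sequence


def round_robin(servers: Sequence[str]) -> Iterator[str]:
    """Yield the servers in a fixed order, wrapping around forever."""
    return itertools.cycle(servers)


def least_connection(active_connections: Dict[str, int]) -> str:
    """Pick the server that currently holds the fewest open connections."""
    return min(active_connections, key=active_connections.get)


if __name__ == "__main__":
    rr = round_robin(["vm1", "vm2"])
    print([next(rr) for _ in range(3)])
    print(least_connection({"vm1": 4, "vm2": 1, "vm3": 2}))
```

Round robin ignores the current load of each server, whereas least connection adapts to it at the cost of tracking open connections.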
The procedure of combining fuzzy Q-learning and the proposed parallel agent technique is described in Algorithm 2.

IV. EVALUATION
In this section, we describe the dataset and the experimental setup of the proposed technique. We also present and discuss the experimental results of the proposed parallel agent with the state space partitioning technique.

A. Dataset
We have evaluated the performance of our proposed technique by using a dataset from ClarkNet, a full-access internet provider for the Baltimore-Washington DC metropolitan area, that contains two weeks' worth of all HTTP requests to the ClarkNet WWW server.

Algorithm 2 The proposed algorithm for combining fuzzy Q-learning and parallel agents
1: Divide the state space into multiple partitions.
2: Assign each partition to a local agent.
3: Initialize q-values of local and global agents in the look-up tables to 0.
4: Send each state to its local agent depending on the state partition.
5: All agents work in parallel and follow steps 6 through 13:
6: Select an action for each activated rule (ε-greedy policy): $a_i = \arg\max_k q[i, k]$ with probability $1 - \epsilon$, $a_i = \mathrm{random}\{a_k, k = 1, 2, \dots, J\}$ with probability $\epsilon$.
7: Calculate the control action by the fuzzy logic controller: $a = \sum_{i=1}^{N} \mu_i(x) \times a_i$.
8: Approximate the Q function from the current q-values and the degree of truth of the rules: $Q(s(t), a) = \sum_{i=1}^{N} \mu_i(x) \times q[i, a_i]$, where $Q(s(t), a)$ is the value of the Q function for the current state s(t) in iteration t and action a.
9: Take action a and leave the system to evolve to the next state, s(t + 1).
10: Observe the reward signal, r(t + 1), and compute the value for the new state denoted by V(s(t + 1)): $V(s(t+1)) = \sum_{i=1}^{N} \mu_i(x') \times \max_k q[i, k]$, where $x'$ is the input signal observed in state s(t + 1).
11: Calculate the error signal: $\Delta Q = r(t+1) + \gamma \times V(s(t+1)) - Q(s(t), a)$, where γ is a discount factor.
12: Update q-values: $q[i, a_i] = q[i, a_i] + \eta \times \Delta Q \times \mu_i(x)$, where η is the learning rate.
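The synchronization of local agents with the global agent can be sketched as below. The text does not specify how the knowledge is merged, so this sketch assumes that each local agent owns a contiguous block of rows of a 9 x 5 q-table and that the global agent simply copies those rows; the function name and the partition layout are hypothetical.

```python
from typing import List, Sequence


def synchronize(global_q: List[List[float]],
                local_qs: Sequence[List[List[float]]],
                partitions: Sequence[range]) -> None:
    """Copy, for every partition, the rows learned by its local agent
    into the global q-table (assumed merge rule)."""
    for local_q, rows in zip(local_qs, partitions):
        for i in rows:
            global_q[i] = list(local_q[i])


if __name__ == "__main__":
    n_states, n_actions = 9, 5
    new_table = lambda: [[0.0] * n_actions for _ in range(n_states)]
    global_q = new_table()
    local_qs = [new_table() for _ in range(3)]          # local agents #1, #2, #3
    partitions = [range(0, 3), range(3, 6), range(6, 9)]
    local_qs[1][4][2] = 0.7      # agent #2 learned something about state 4
    local_qs[0][4][2] = 0.9      # agent #1 does not own state 4, so this is ignored
    synchronize(global_q, local_qs, partitions)
    print(global_q[4])
```

Because each row has exactly one owner under this assumption, the local agents can learn in parallel without write conflicts on the global table.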

B. Experiment Setup
Experiments were conducted to evaluate whether the proposed parallel agents with the state space partitioning technique reduce the training time needed to determine optimal policies. The fixed learning rate in the experiments was set to a constant value η = 0.1, and the adaptive learning rate minimum and maximum were set to 0.001 and 0.3, respectively. The discount factor was set to γ = 0.8. The minimum and maximum number of VM instances were set to 1 and infinity, respectively. The trade-off between exploitation and exploration was set with an epsilon value of 0.1. Table II shows the parameters that have been used in the experiments.
In our approach, the inputs are the workload w and the response time RT, and the output is the scaling action sa in terms of an increment or decrement in the number of virtual machines (VMs). The workload represents all HTTP requests to the ClarkNet WWW server. The workload w ranges within the interval [0, 100], and the fuzzy sets of workload are Low [0, 20], Medium [10, 60], and High [40, 100]. The response time for a workload is computed from the execution time (PT) and the queuing time (QT). The execution time is computed from CPU_SPEED, the CPU speed in Hz, and CPI, the average cycles per instruction (request). The analysis of the average queuing time is complicated and depends on several factors. It can be estimated by modeling the environment as an M/M/N queuing system (M = distribution of the interarrival times (negative exponential distribution), N = number of servers (VMs)). However, for this environment, we might assume the queuing time is inversely proportional to the number of active VMs: $QT = C_{VM} / VM$, where VM is the number of active VMs (initially 1) and $C_{VM}$ is the coefficient of proportionality between the queuing time and the number of active VMs. RT ranges within the interval [0, 100], and the fuzzy sets of response time are Good, Okay, and Bad. The resulting states are listed in Table III, and the five actions are shown in Table IV. First, we divide the state space into 3 partitions, assigned to local agents #1, #2, and #3. We then initialized global agent values to 0 as shown in Table VIII.

C. Experimental Results
The initial design-time surface is not shown as it is a constant plane at point zero. Figs. 4, 6, and 8 show the temporal evolution of the control surface of the fuzzy controller for agents #1, #2, and #3, respectively; the surface evolves until the learning converges. The second surface is presented in Figs. 5, 7, and 9, where the learning has converged for agents #1, #2, and #3, respectively.
1) Global Agent: The initial design-time surface is not shown as it is a constant plane at point zero. Fig. 10 shows the temporal evolution of the control surface of the fuzzy controller; the surface evolves until the learning converges. The second surface is presented in Fig. 11, where the learning has converged.
Table IX demonstrates that parallel agents can reduce the training time needed to determine optimal policies, as compared to some of the existing approaches.

Fig. 2. The proposed parallel agent with state partition technique.

Fig. 3. The proposed parallel agent with state partition technique.

Fig. 8. Agent #3 temporal evolution of the control surface.

Algorithm 2 (continued)
13: The knowledge learned by all local agents is synchronized with the global agent.
14: Repeat the process starting from step 4 for the new state until it converges.

TABLE IV

TABLE V. INITIALIZED LOCAL AGENT #1 Q-TABLE VALUES TO 0