An Analytical Model for Availability Evaluation of Cloud Service Provisioning System

Cloud computing is a major technological trend that continues to evolve and flourish. With the advent of the cloud, assuring high availability of cloud services has become a critical issue for cloud service providers and customers. Several studies have considered the problem of cloud service availability modeling and analysis. However, the complexity of the cloud service provisioning system and the deep dependency stack of its layered architecture make it challenging to evaluate the availability of cloud services. In this paper, we propose a novel analytical model of cloud service provisioning system availability. Further, we provide a detailed methodology for evaluating cloud service availability using series/parallel configurations and operational measures. The results of a case study using simulated cloud computing infrastructure illustrate the usability of the proposed model.
Keywords: Cloud computing; availability evaluation; series and parallel configuration; infrastructure as a service


INTRODUCTION
Infrastructure as a service (IaaS) cloud providers, such as Amazon Web Services and Microsoft Azure, deliver on-demand computational resources from large pools of equipment installed in a cloud service provider's data centers. The requests submitted by the cloud customers are provisioned and released if the cloud has enough available resources. In turn, customers expect cloud services to be available whenever they need them, just like electricity or telephone connectivity. This expectation requires cloud service providers to regularly assess their infrastructure for probable failures and reduce the amount of time needed to recover from such failures.
Typically, a cloud service provider offers a service level agreement (SLA) stipulating the service provider's performance and quality in several ways. For example, an SLA may include a metric specifying the availability of the cloud service. Before committing to an SLA with cloud customers, the service provider needs to carry out an availability assessment of the infrastructure on which the cloud service is hosted [1], [2]. Most cloud providers offer approximately 99.99% availability in their SLAs. However, real data show that the actual availability of these providers is much lower [3], [4].
Hence, to reduce the overall cloud downtime and to provide a reliable estimate of service availability, cloud service providers need to assess the availability characteristics of their data centers in a responsible and dependable manner. This assessment can be done through controlled experiments, large-scale simulations, and analytical models [5], [1]. In a massive system such as cloud computing, conducting repetitive experiments or simulations is likely to be costly and time-consuming. Although analytical models can be cost- and time-effective, accurate analytical modeling must deal with a large number of system states, leading to the state space explosion problem [6].
The primary contribution of this study is a novel analytical model for evaluating the availability of cloud service provisioning systems, focusing on IaaS. The proposed model is architecture-based; it relies on the National Institute of Standards and Technology Cloud Computing Reference Architecture (NIST-CCRA), the well-known cloud computing reference architecture [7]. NIST-CCRA provides an abstraction of the cloud service provisioning system that can be used to model the logical interaction of failures within the system. Consequently, availability is evaluated at two levels: the system level and the component level. At the system level, reliability block diagrams (RBDs) are used to model the system's failures by considering series/parallel arrangements of the cloud components/subsystems. At the component level, availability is determined using probabilistic models and operational measures. Failure and repair data are modeled and analyzed using probability distributions and statistical inference. Then, operational measures are derived and used to estimate each component's availability.
A simulation approach is used to develop and verify the proposed analytical model. CloudSim [8] is used to simulate cloud infrastructure and the underlying components, while FTCloudSim [9], an extension of CloudSim, is used to simulate different failure scenarios using the fault injection technique. Also, BlockSim [10] and Weibull++ [11] are used for availability analysis and interpretation of results.
The rest of the paper is structured as follows: Section 2 presents relevant background information. Section 3 describes the proposed analytical model, and Section 4 presents conclusions and suggested future work.

A. NIST-based Cloud Service Provisioning System
According to NIST-CCRA, there are explicit processes and activities that cloud service providers need to perform to ensure reliable cloud service provisioning. Through service orchestration, a cloud service provider operates the underlying cloud service infrastructure that supports its customers. NIST defines service orchestration as "the composition of system components to support the cloud provider activities in arrangement, coordination, and management of computing resources in order to provide cloud services to cloud consumers" [11].
Service orchestration has three main components, which are arranged in layers: 1) the service layer (SL); 2) the resource abstraction and control layer (RACL); and 3) the physical layer (PL). The horizontal positioning of these layers reflects the relationships between them; upper-layer components depend on adjacent lower-layer components to provide a service. For instance, the RACL provides virtual cloud resources on top of the PL and supports the SL.
Likewise, in the SL, services can be modeled as three-layered components representing the three universally accepted types of service: 1) software as a service (SaaS); 2) platform as a service (PaaS); and 3) IaaS. A cloud service provider may define interface points in all three service models or just a subset. For instance, the platform component (i.e., PaaS) can be built upon the infrastructure component (i.e., IaaS) to support the software component (i.e., SaaS) where cloud service interfaces are exposed.
Although NIST-CCRA does not represent the system architecture of a particular cloud system, a specific cloud service provisioning system such as an IaaS provider or an IaaS broker can be modeled using NIST-CCRA [12]. Pereira et al. [13] used NIST-CCRA to design a cloud-based architecture by refining the system's logical architecture. The suggested method involves 1) the selection of the NIST architectural component for which the respective coverage in the system's logical architecture needs to be analyzed; 2) analysis of the system's components into a logical architecture, including the respective architectural elements (AEs); and 3) the refinement and development of a new logical architecture in the cloud context by mapping the system's AEs to NIST-CCRA AEs.

B. Availability Evaluation in Cloud Computing
Cloud architecture has been studied using various techniques from reliability theory, including RBDs, stochastic Petri nets (SPNs), fault trees, and Markov chains [13]-[17]. The availability of cloud computing architecture has been modeled in various ways using RBD techniques. In addition, analytical modeling has been used to estimate the availability of cloud system architectures, including virtualized simple architecture and virtualized redundant architecture [18].
By considering the virtualization in the cloud, RBDs can also be applied to full virtualization, OS virtualization, and paravirtualization (see Fig. 1). However, the dynamic nature of cloud computing requires more rigorous modeling, such as Markov modeling, and further analysis of availability in the context of system-level virtualization is needed. To this end, Dantas et al. [19] used a hierarchical heterogeneous model based on an RBD and a Markov reward model to describe non-redundant and redundant Eucalyptus architectures. From this model, closed-form equations are obtained to compute the availability of those systems according to the rules of composition of series and parallel components. With respect to virtualization, an availability model of non-virtualized and virtualized systems is presented using a hierarchical analytic model in which a fault tree is used at the upper level and homogeneous continuous-time Markov chains are used at the lower level [20].
In another study, Silva et al. adopted a hybrid modeling approach to deal with the complexity of the cloud system; RBDs are used for system-level dependability, whereas operational measures, such as mean time to failure (MTTF) or mean time between failures (MTBF), and mean time to repair (MTTR), are obtained for subsystem-level and component-level dependability.

A. System Representation and Basic Assumptions
Based on NIST-CCRA, let us consider the following two cloud implementation scenarios that can be used by a cloud service provider. In the first scenario, a cloud service provider may implement a high-level service model (i.e., SaaS) by using the interface points defined in the lower layers. For example, SaaS applications can be built on top of PaaS components, and PaaS components can be built on top of IaaS components. A real-world example of this is Google's cloud offerings: Google offers a variety of SaaS products (e.g., Gmail, Google Search, Google Maps, Google Apps) using PaaS components (Google App Engine) that run on Google's cloud IaaS (Google's cloud platform) [7], [21]. As per NIST-CCRA, the dependency relationships among SaaS, PaaS, and IaaS components are represented graphically as components stacked on top of each other (see Fig. 2 (a)). A stack is a clearly defined structure that implies a series of interconnected systems that transport data between each other to provide a certain function or service [22]. Therefore, similar to the modeling of large-scale distributed systems [23], [24] and other cloud platforms [19], [25], [26], the cloud service provisioning system is represented as a simple series system using an RBD (see Fig. 2 (b)).
In the second scenario, a cloud service provider may choose to provide an SL without the support of the lower-layer interface points. For example, a SaaS application can be implemented and hosted directly on top of cloud resources rather than using an IaaS virtual machine. A real-world example is salesforce.com, which provides both SaaS and PaaS products. The SaaS layer is built using the well-defined interface components from the PaaS. However, in this case, no IaaS layer is offered; SaaS is run directly on the resource abstraction layer with no explicit IaaS components. As per NIST-CCRA, the angling of the components indicates that each of the service components can stand alone and can be implemented directly on top of the cloud RACL and PL [7]. Hence, the cloud service provisioning system is represented as a simple series system using an RBD (see Fig. 2 (c)).

B. Cloud Service Provisioning Availability Model
In this model, the availability of the system is specified in terms of the availability of its components. Following a bottom-up approach, the availability at the component level is determined using operational measures (i.e., MTBF and MTTR).
The logical relationships between individual components are then used to estimate the system-level availability, which is expressed graphically using an RBD. Table 1 defines the notation used in the availability modeling.
Let us consider a cloud service provisioning system, denoted by CSP, that consists of a set of subsystems $\{S_1, S_2, \ldots, S_n\}$, in which the success of CSP depends on the success of every subsystem $S_i$. Given the serial configuration shown in Fig. 2, the availability of CSP, denoted by $A_{CSP}$, is written as [27]
$$A_{CSP} = \prod_{i=1}^{n} A_{S_i} \tag{1}$$
where $A_{S_i}$ is the availability of subsystem $S_i$. Recall that in NIST-CCRA, a cloud service provisioning system consists of three ordered layers: PL, RACL, and SL.
Likewise, each subsystem $S_i$ consists of a set of components $\{C_{i1}, C_{i2}, \ldots, C_{im_i}\}$, where $C_{ij}$ denotes the $j$th component of the $i$th subsystem, and $m_i$ denotes the total number of components in subsystem $S_i$. Let us assume that the success of each subsystem $S_i$ depends on the success of every individual component $C_{ij}$. Given this serial configuration, the availability of subsystem $S_i$, denoted by $A_{S_i}$, is given by
$$A_{S_i} = \prod_{j=1}^{m_i} A_{C_{ij}} \tag{2}$$
where $A_{C_{ij}}$ is the availability of component $C_{ij}$.
At the component level, probability distributions are used for modeling operational data such as time to failure (TTF) and time to repair (TTR). Failure data can be used to make statements about the probability model, either in terms of the probability distribution itself or in terms of its parameters or some other characteristics.
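The series rule in (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the layer names follow NIST-CCRA, but the component availabilities below are hypothetical values chosen for the example, not data from this study.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a series arrangement: every block must be up."""
    return prod(availabilities)


# Hypothetical component availabilities, grouped by NIST-CCRA layer.
layers = {
    "PL": [0.999, 0.998],
    "RACL": [0.997],
    "SL": [0.995, 0.999],
}

# Component level -> subsystem level, as in (2).
subsystem_availability = {
    name: series_availability(components) for name, components in layers.items()
}

# Subsystem level -> system level, as in (1).
csp_availability = series_availability(subsystem_availability.values())
```

Because every factor is at most 1, the system availability of a series arrangement can never exceed that of its weakest subsystem.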
Availability is the probability of a system/component being up (i.e., providing the service) at a specific instant of time [24]. It is often expressed using (3), with many different variants [28], [29].
$$A = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} \tag{3}$$
where Uptime refers to the time during which the system is capable of performing its task and Downtime refers to the time during which it is not. However, the classification of availability is somewhat flexible and is largely based on the types of downtime used in the computation and on the relationship with time. This study focuses on inherent availability to determine the component-level availability. Inherent availability is the steady-state availability when considering only the corrective maintenance downtime of the system. Usually, this is the type of availability that companies use to report the availability of their products (e.g., computer servers) because they see downtime other than actual repair time as out of their control and too unpredictable. Inherent availability uses operational measures from reliability theory: MTTF or MTBF, and MTTR [18].
Now, let us assume that component failure data (i.e., TTF and TTR) are collected and a preliminary analysis is performed using descriptive statistics and statistical inference. Statistical inference aims to draw meaningful conclusions from the collected data about characteristics such as failure rates, MTTF, MTTR, and related quantities. As probabilistic assumptions regarding the failure data play an important role in reliability and availability analysis [30], failure data (or at least the means of the sample data) are usually assumed to follow well-known distributions (e.g., exponential, Weibull, lognormal).
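As a small illustration of how the uptime/downtime ratio in (3) relates to the MTBF/MTTR form used for inherent availability, the sketch below computes both from the same sample. The TTF and TTR values are hypothetical and are assumed to be in hours; they are not data from this study.

```python
from statistics import mean

# Hypothetical observations for one repairable component (hours).
ttf_hours = [310.0, 295.5, 402.0, 288.5]  # times to failure
ttr_hours = [4.0, 6.5, 5.0, 4.5]          # matching times to repair

# Ratio of total uptime to total observed time, as in (3).
uptime, downtime = sum(ttf_hours), sum(ttr_hours)
availability_from_totals = uptime / (uptime + downtime)

# Same quantity expressed through the sample means (MTBF and MTTR).
mtbf, mttr = mean(ttf_hours), mean(ttr_hours)
availability_from_means = mtbf / (mtbf + mttr)
```

When every failure in the sample has a matching repair, the two expressions coincide, because dividing both totals by the number of failure/repair cycles does not change the ratio.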

Let $\mathrm{MTTF}_{ij}$ be the mean time to failure of the $j$th component of the $i$th subsystem, let $T_{ij}$ be the random variable that represents the TTF of that component, and let $f_{ij}(t)$ be the probability density function of the component's failure time. Then $\mathrm{MTTF}_{ij}$ is defined as the expected value of the random variable $T_{ij}$, such that [24], [29]
$$\mathrm{MTTF}_{ij} = E[T_{ij}] = \int_{0}^{\infty} t\, f_{ij}(t)\, dt \tag{4}$$
For a repairable component, $\mathrm{MTBF}_{ij}$ is used rather than $\mathrm{MTTF}_{ij}$ and is defined similarly.
On the other hand, MTTR measures the amount of time it takes to get a component running again after a failure [18]. Let $R_{ij}$ be the random variable that represents the TTR of the $j$th component of the $i$th subsystem, and let $g_{ij}(t)$ be the probability density function of the component's repair time. Then the component's $\mathrm{MTTR}_{ij}$ can be defined as [24]
$$\mathrm{MTTR}_{ij} = E[R_{ij}] = \int_{0}^{\infty} t\, g_{ij}(t)\, dt \tag{5}$$
Now, let $C_{ij}$ denote the $j$th component of the $i$th subsystem. The availability of component $C_{ij}$, denoted by $A_{C_{ij}}$, is given by [18], [31]
$$A_{C_{ij}} = \frac{\mathrm{MTBF}_{ij}}{\mathrm{MTBF}_{ij} + \mathrm{MTTR}_{ij}} \tag{6}$$
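The expected-value definition in (4) and the ratio in (6) can be checked numerically. The following sketch assumes a Weibull failure-time distribution with hypothetical shape and scale parameters, integrates $t f(t)$ with a simple midpoint rule, and compares the result with the known closed form for the Weibull mean, $\eta\,\Gamma(1 + 1/\beta)$. The function names are illustrative and do not come from any tool used in this study.

```python
import math


def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull density with shape beta and scale eta."""
    if t <= 0:
        return 0.0
    z = t / eta
    return (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))


def mean_time(pdf, upper, steps=200_000):
    """Approximate E[T] = integral of t * pdf(t) over [0, upper] (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t * pdf(t)
    return total * h


def inherent_availability(mtbf, mttr):
    """Steady-state availability from operational measures, as in (6)."""
    return mtbf / (mtbf + mttr)


beta, eta = 1.5, 1000.0  # hypothetical Weibull parameters (eta in hours)

# The upper limit truncates the infinite integral; the tail beyond 20 * eta is negligible here.
mttf_numeric = mean_time(lambda t: weibull_pdf(t, beta, eta), upper=20 * eta)
mttf_closed_form = eta * math.gamma(1 + 1 / beta)

availability = inherent_availability(mttf_closed_form, mttr=5.0)  # hypothetical MTTR
```

For distributions without a convenient closed-form mean, the same numerical integration can be applied to the fitted density directly.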

C. Modeling Cloud System Availability with Redundant Components
Redundancy in a cloud service provisioning system (e.g., hardware redundancy, software redundancy, and application redundancy) can also be modeled using an RBD. The simplest form of redundancy is achieved by combining two components (e.g., servers, storage units, or virtual machines) in a parallel subsystem. The subsystem fails only if both components fail.
Let us consider $m_i$ components in a cloud subsystem $S_i$ with parallel composition. The subsystem availability can be computed as follows:
$$A_{S_i} = 1 - \prod_{j=1}^{m_i} \left(1 - A_{C_{ij}}\right) \tag{7}$$
where $A_{C_{ij}}$ is the availability of the individual component $C_{ij}$ within the subsystem $S_i$. Further, the availability of a more complex configuration (e.g., a series-parallel configuration) can be obtained by combining the rules defined for series and parallel configurations [25].
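A minimal sketch of the parallel rule in (7), combined with the series rule to evaluate a simple series-parallel arrangement, is given below. The availabilities are hypothetical, and the helper names are chosen only for this illustration.

```python
from math import prod


def series(availabilities):
    """Series rule: the arrangement is up only if every block is up."""
    return prod(availabilities)


def parallel(availabilities):
    """Parallel rule, as in (7): the arrangement fails only if every block fails."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Two redundant servers (hypothetical availability 0.95 each) ...
server_pair = parallel([0.95, 0.95])

# ... placed in series with a single non-redundant block (hypothetical 0.99).
system = series([server_pair, 0.99])
```

In this example, duplicating a 0.95 component raises the availability of that block to 0.9975, after which the non-redundant 0.99 block becomes the dominant contributor to downtime.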

D. Numerical example
To demonstrate NIST-based availability modeling and analysis numerically, let us consider the RBD of the IaaS provisioning system (IPS) depicted in Fig. 3. The objective is to obtain the average availability of the system after one year of operation (i.e., 8,760 hours).
For the application, let us assume that MTBF = 329 and MTTR = 5. The application availability is then determined by substituting these values in (6), leading to
$$A_{\text{application}} = \frac{329}{329 + 5} \approx 0.9850$$
Subsequently, by substituting the subsystem availabilities in (8), the availability of the IPS is obtained.
Using the RBD model and all the failure and repair characteristics, the IPS is simulated for 30,000 hours of operation using BlockSim, and the relevant metrics are obtained. The point availability after one year of operation (i.e., at $t = 8{,}760$ hours) is estimated to be 93.6000%, whereas the mean availability after one year of operation is estimated to be 92.3134% (see Fig. 5), which corresponds to the analytical result obtained for the mean system availability. The subsystem mean availabilities are estimated to be 98.47% for the application, 79.50% for the hardware, and 69.19% for the virtualized platform; these results correspond with those obtained using the analytical method (see Fig. 6).
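The application figures quoted above can be reproduced with a short calculation. The sketch assumes that MTBF and MTTR are expressed in hours, which the text does not state explicitly, and it uses only the application values and the simulated application mean availability reported in this section.

```python
def inherent_availability(mtbf, mttr):
    """Steady-state availability from MTBF and MTTR, as in (6)."""
    return mtbf / (mtbf + mttr)


HOURS_PER_YEAR = 8760  # one year of operation, as used in the example

# Application values from the numerical example (units assumed to be hours).
a_application = inherent_availability(mtbf=329, mttr=5)

# Expected downtime implied by that availability over one year.
expected_downtime_hours = (1 - a_application) * HOURS_PER_YEAR

# Gap between the analytical value and the simulated mean availability (98.47%).
simulated_application = 0.9847
gap = abs(a_application - simulated_application)

print(f"analytical availability: {a_application:.4f}")
print(f"expected downtime per year: {expected_downtime_hours:.1f} hours")
print(f"gap to simulated value: {gap:.4f}")
```

Under these assumptions, the analytical and simulated application availabilities differ by less than 0.001.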
Modeling and analyzing the IPS availability often carries significant value in boosting the efforts to improve availability, performing a trade-off analysis in system design, or suggesting the most efficient way to operate and maintain the system.

IV. TESTING AND VALIDATION
A. Approach
Because obtaining real-life failure data is extremely difficult owing to the sensitive nature of these data, a simulation approach is used to test and validate the proposed model (Fig. 7). First, CloudSim is used to create a computerized replica of a real cloud infrastructure that is suitable for modeling probabilistic systems. Host, virtual machine (VM), and cloudlet are considered to be the infrastructure components that constitute the cloud service provisioning system. For the experimental simulation, one data center is considered with different numbers of hosts using a 16-port fat-tree data center network and a corresponding number of VMs. In addition, a Linux environment with x86 architecture is used as the operating environment and Xen as the virtual machine manager (VMM).
An investigation of the architectural model of the cloud service provisioning system in CloudSim shows that the availability of IaaS requires an available host, VM, and cloudlet. Thus, the simulated IaaS cloud system can be modeled as a simple series system. Second, based on the previous study conducted by Zhou et al. [8], multiple host failures are injected into the simulated system. Likewise, by considering some failure scenarios described by Nita et al. [31], VM and cloudlet failures are introduced, and then failure and repair data are collected for availability analysis. Next, for the purpose of component availability modeling, Weibull++ is used to model component failure and repair data (i.e., TTF and TTR). A goodness-of-fit test is used to determine the corresponding distribution and estimate its parameters.
BlockSim is then used to model the cloud infrastructure availability at the system level using an RBD. The cloud infrastructure system is modeled as a simple series diagram, which is referred to as the base model (see Fig. 8). Failure and repair distributions are fed into BlockSim, and Monte Carlo simulations are run for 300,000 time units. Moreover, three more models representing different scenarios are created for comparison with the base model, and the model showing the greatest availability is selected.
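The Monte Carlo step described above can be approximated with a small standard-library Python sketch. This is not BlockSim's algorithm: it assumes that components fail and are repaired independently, that a component is as good as new after repair, and that the series system is down whenever at least one component is down. The Weibull and lognormal parameters are hypothetical; only the constant host and cloudlet repair times and the 300,000-unit horizon are taken from this paper.

```python
import random


def simulate_component(sample_ttf, sample_ttr, horizon, rng):
    """Return the (start, end) downtime intervals of one component on [0, horizon]."""
    t, down = 0.0, []
    while t < horizon:
        t += sample_ttf(rng)                      # operate until the next failure
        if t >= horizon:
            break
        repair_end = min(t + sample_ttr(rng), horizon)
        down.append((t, repair_end))
        t = repair_end                            # back in service after repair
    return down


def series_mean_availability(components, horizon, seed=0):
    """Mean availability of a series system over [0, horizon]."""
    rng = random.Random(seed)
    intervals = sorted(
        interval
        for ttf, ttr in components
        for interval in simulate_component(ttf, ttr, horizon, rng)
    )
    downtime, cur_start, cur_end = 0.0, None, None
    for start, end in intervals:                  # merge overlapping downtime intervals
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 1.0 - downtime / horizon


# (TTF sampler, TTR sampler) per block; distribution parameters are hypothetical.
base_model = [
    (lambda r: r.weibullvariate(60000.0, 1.2), lambda r: 10800.0),                 # host
    (lambda r: r.weibullvariate(20000.0, 1.5), lambda r: r.lognormvariate(5.0, 0.5)),  # VM
    (lambda r: r.weibullvariate(15000.0, 1.1), lambda r: 300.0),                   # cloudlet
]

mean_availability = series_mean_availability(base_model, horizon=300_000.0)
```

A single run yields one sample of the mean availability; averaging over several seeds would give a more stable estimate. Note that `random.weibullvariate` takes the scale parameter first and the shape parameter second.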

B. Results and Discussion
The simulations show that failure data at the component level (host, VM, and cloudlet) were successfully fitted to a Weibull distribution. The parameters of each distribution were estimated using regression analysis. For instance, Fig. 9 shows the Weibull probability plot for host failure data. In the probability plot, the shape parameter (beta) is estimated from the slope of the fitted line. The scale parameter (eta) is the time at which a specific percentage of the population has failed. The correlation coefficient (rho) is a measure of how well the linear regression model fits the data. Furthermore, VM repair data were best fitted by the lognormal distribution, with a two-sided confidence level of 90% (see Fig. 10). In contrast, both the host and cloudlet have constant repair times of 10,800 hours and 300 hours, respectively. The failure and repair distributions, with their associated parameters, were fed into simulation models built using BlockSim in four different configurations.
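The probability-plot estimation described above can be sketched as a least-squares fit on linearized Weibull data. The sketch assumes complete (uncensored) failure times, Benard's approximation for median ranks, and regression of the plotted ordinate on log-time; these are choices made for this illustration and are not confirmed settings of the Weibull++ analysis in this study.

```python
import math


def weibull_rank_regression(failure_times):
    """Estimate Weibull beta, eta, and rho by least squares on the probability plot."""
    times = sorted(failure_times)
    n = len(times)
    xs = [math.log(t) for t in times]
    # Benard's approximation of the median rank for the i-th ordered failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # Linearization: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]

    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                          # slope of the fitted line
    eta = math.exp(mean_x - mean_y / beta)    # from the intercept -beta * ln(eta)
    rho = sxy / math.sqrt(sxx * syy)          # correlation coefficient of the fit
    return beta, eta, rho


# Hypothetical host failure times (hours), for illustration only.
beta_hat, eta_hat, rho_hat = weibull_rank_regression(
    [1200.0, 2600.0, 4100.0, 5200.0, 7400.0, 9800.0, 13500.0]
)
```

With this linearization, eta corresponds to the time by which about 63.2% of the population has failed, since $1 - e^{-1} \approx 0.632$.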
A base model was built in simple series and configured using the data provided by Weibull++.Table 2 shows the base model block configurations detailing the inputs for each block (i.e., host, VM, and cloudlet) on the availability model and the corrective task.
At the simulation end time (300,000 time units), the base model achieved an availability of 0.590767. Hence, to improve the system availability (i.e., to fulfill the customer's requirements), a sensitivity analysis was performed to study the impact of the repair rate and standby configuration (i.e., redundancy technique) on the overall system availability.
In the second model, the host repair time of the base model was improved to determine its impact on overall system availability. The results showed that at the simulation end time (300,000 time units), availability increased to 0.722.
In the third model, the base model was improved by using standby configurations. The base model was rebuilt with a standby container that included three base model systems; one system was active, and two were on standby (see Fig. 11). It is assumed that the switching reliability is 100%. The standby simulation results showed an improvement in overall system availability: at the simulation end time (300,000 time units), system availability was 0.977.
Also, a fourth model was created by applying a standby configuration to the second model, in which the host repair time is improved. The results showed a further improvement in overall system availability. At the simulation end time (300,000 time units), availability had increased to 0.996313. Fig. 12 shows the mean availability of the four models.

CONCLUSIONS AND FUTURE WORK
This paper's primary contribution is a proposed model for evaluating the availability of cloud service provisioning systems. The model relies on NIST-CCRA, the well-known cloud computing reference architecture, to define cloud subsystems/components and the logical relationships among them. Using mature modeling techniques from reliability theory that provide widely used operational measures, we were able to quantify component-level availability. Furthermore, by considering series/parallel arrangements of the cloud system components, an RBD was used to model system-level availability. The proposed model has some limitations imposed by the characteristics of RBDs. In future research, a dynamic RBD [30] will be adopted to consider the dynamic behavior of a cloud system. Other cloud scenarios, such as cloud federation, will also be modeled in future studies.
Hardware availability is determined by substituting the values of its constituent component availabilities in (9), leading to

Fig. 6. Bubble plot of IPS subsystems' mean availability with respect to mean time to first failure (MTTFF) and uptime.

TABLE II. BASE MODEL CONFIGURATIONS
Fig. 11. Base model with standbys RBD.
Fig. 12. Mean availability overlay plot for all models.