A Novel Approach for Submission of Tasks to a Data Center in a Virtualized Cloud Computing Environment

The submission of tasks to a data center plays a crucial role in delivering services such as scheduling and processing in a cloud computing environment. The energy consumption of a data center must be considered during task processing, as it leads to high operational expenditure and an adverse environmental impact. To the best of our knowledge, current research does not focus on the energy factor when submitting tasks to a cloud. In this paper, a framework is proposed to select the data center with minimum energy consumption. The service provider registers all data centers in a registry. The energy consumed by task processing on virtual machines and the energy of IT equipment such as routers and switches are calculated. The data center selection framework then selects the data center with minimum energy consumption for task processing. The experimental results indicate that the proposed approach consumes less energy than the existing algorithms for data center selection.

Keywords: Energy consumption; Virtualization; Data Center Selection Framework


I. INTRODUCTION
Cloud computing is growing rapidly, largely because of the pay-as-you-go model [1]. According to a Gartner report, cloud services expanded into a $210 billion market by the year 2016 [2]. Service providers maintain the infrastructure, so companies can concentrate on the actual development [3]. Data centers consume a lot of energy for storage and processing of data. The energy consumption of cloud infrastructure has been growing extensively, and by the year 2020 it is expected to reach 1,963.74 terawatt-hours (TWh) [3]. Data centers are growing at around 9 percent every year, and energy demand has doubled in the last five years [4]. The heavy energy consumption of data centers raises operational expenses and also pollutes the environment through excess carbon emissions.
Tasks can be submitted in such a way that the data center with minimum energy consumption is preferred over the others for processing. To the best of our knowledge, the existing algorithms do not concentrate on energy consumption. The algorithms often used in the literature are outlined below:
1) Spot electricity prices: In this procedure the electricity currently consumed by the data centers is taken into account, and tasks are submitted by predicting future variations in electricity prices. The data center with minimum electricity consumption is preferred. The drawback of this approach is that it considers non-processing elements such as lights and air conditioning units when selecting data centers.
2) Shortest Distance First: This algorithm forwards the request to the data center closest to the point of origin. The disadvantage is that it does not take into account the energy consumption of the data center, which is expensive relative to the network elements, such as routers and bridges, required to forward requests to the data centers.
3) Round Robin Technique: This algorithm chooses a data center randomly within the region where the request originates. It may introduce delay, and the energy factor is also not considered.
To overcome the shortcomings of the existing approaches, a data center selection framework (DCSF) is proposed which calculates the energy used for processing tasks. The execution time of the tasks and the energy consumed by the virtual machines are considered in the calculation. The tasks are forwarded to the data center with minimum energy consumption. The procedure is simulated and compared with the existing algorithms. The results indicate that the proposed approach consumes less energy than the others.

II. RELATED WORK
Many algorithms have been proposed for selecting a data center for task processing. A service broker routing policy is proposed in [5], [6], where the CloudAnalyst tool has been used. A probability-based approach has been suggested in [7]. The research works in [8], [9] mainly focus on cost-effective data center selection, taking into account the equipment needed to set up the data centers. A matrix-based approach including the resources needed for data centers is discussed in [10]. The work in [11] emphasizes the expenditure on energy consumption when selecting data centers. A general data center selection approach based on factors such as location and proximity is implemented in [12]. The algorithms in [13], [14] mainly concentrate on round robin and shortest job first approaches to select the data centers for processing. Virtualization technology, which improves task processing capability, is not given much importance in any of these works.
The importance of virtualization in a cloud environment has been highlighted in [15]. Effective resource utilization using virtualization is discussed in [16], [17], [18], which focus on optimal allocation of resources to virtual machines. The scheduling of virtual machines for efficient energy consumption is formulated in [19], but it does not take into account the energy currently being consumed by the virtual machines. The existing works concentrate on data center selection based on either random allocation or proximity, and energy consumption is given little importance. In the current work a framework is developed which focuses on the energy consumed by virtual machines for processing tasks.

III. PROPOSED WORK
In general, the tasks generated by the users are transferred to web servers situated at different locations, which in turn route them to the data centers for processing. The current work forwards the tasks to a data center selection framework (DCSF), which calculates the energy consumed by the different data centers and decides the one to which the tasks must be submitted for processing. The scenario is illustrated in Fig. 1.

TABLE I. DATA CENTER REGISTRY CONTENTS
A user base (UB) is a region of the globe from which a large number of users submit tasks to the service provider. Each data center is registered by the service provider in the data center registry (DCR), as illustrated in Table I. A Data Center Id (DCID) is stored as a 16-bit number in which the first 4 bits represent the service provider Id and the remaining 12 bits are allotted to the data center Id, which should be a unique number generated by the service provider. The information regarding the number of servers, routers, switches, and so on in a data center has to be kept up to date by the service provider. The total energy consumption is calculated and the corresponding value is stored in the DCR for further processing.
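The 16-bit DCID layout described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `encode_dcid` and `decode_dcid` are hypothetical, and the sketch assumes that "the first 4 bits" means the four most significant bits.

```python
from typing import Tuple

SP_BITS = 4    # service provider Id width (from the text)
DC_BITS = 12   # data center Id width (from the text)


def encode_dcid(provider_id: int, data_center_id: int) -> int:
    """Pack a provider Id and a data center Id into one 16-bit DCID.

    Assumption: the provider Id occupies the most significant 4 bits.
    """
    if not 0 <= provider_id < (1 << SP_BITS):
        raise ValueError("provider_id must fit in 4 bits")
    if not 0 <= data_center_id < (1 << DC_BITS):
        raise ValueError("data_center_id must fit in 12 bits")
    return (provider_id << DC_BITS) | data_center_id


def decode_dcid(dcid: int) -> Tuple[int, int]:
    """Split a 16-bit DCID back into (provider_id, data_center_id)."""
    return dcid >> DC_BITS, dcid & ((1 << DC_BITS) - 1)


dcid = encode_dcid(3, 5)
print(format(dcid, "016b"))   # binary form of the DCID
print(decode_dcid(dcid))      # (provider_id, data_center_id)
```

With this layout a registry can hold up to 16 providers, each with up to 4096 data centers.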

A. Data Center Selection Framework
The selection of the data center is done by the Data Center Selection Framework (DCSF), which is depicted in Fig. 2. Initially the tasks submitted to the DCSF are stored in a Wait Queue. As soon as tasks are removed from the Ready Queue for processing, further tasks are moved from the Wait Queue into the Ready Queue. A threshold value (Th) is chosen depending upon the traffic received by the DCSF.
If the number of tasks (λ) in the Ready Queue exceeds Th or the elapsed time exceeds 60 s, whichever condition is met earlier, the tasks are assigned to the Task Allocator (TA). The TA then sends a request to the Energy Calculator module (EC), where the total energy consumed by the data centers is calculated, and the data center with minimum energy consumption is given the tasks for processing. The required information, such as the data center Id and the number of servers, is obtained from the DCR. The server energy module (SEM) calculates the total energy consumed by the servers of a data center, while the energy consumed by IT equipment is calculated by the IT energy consumption (ITE) module.
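The dispatch rule above (hand the Ready Queue to the Task Allocator when the task count exceeds Th or 60 s have elapsed, whichever comes first) might look like the following sketch. The class name `ReadyQueue`, its methods, and the injectable `clock` parameter are illustrative assumptions; the source does not specify how the timer is reset, so here it restarts each time a batch is drained.

```python
import time
from collections import deque


class ReadyQueue:
    """Hypothetical Ready Queue with a count-or-timeout dispatch rule."""

    def __init__(self, threshold, max_wait_s=60.0, clock=time.monotonic):
        self.threshold = threshold      # Th in the text
        self.max_wait_s = max_wait_s    # 60 s in the text
        self.clock = clock              # injectable for testing
        self.tasks = deque()
        self.window_start = clock()     # assumed: timer starts at creation

    def add(self, task):
        self.tasks.append(task)

    def should_dispatch(self):
        # Condition 1: number of tasks (lambda) exceeds the threshold.
        too_many = len(self.tasks) > self.threshold
        # Condition 2: more than max_wait_s seconds since the last batch.
        too_old = self.clock() - self.window_start > self.max_wait_s
        return too_many or too_old

    def drain(self):
        """Return the current batch for the Task Allocator and reset."""
        batch = list(self.tasks)
        self.tasks.clear()
        self.window_start = self.clock()
        return batch
```

A caller would poll `should_dispatch()` and pass the result of `drain()` to the Task Allocator; refilling from the Wait Queue is left out of this sketch.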

Fig. 2. Data Center Selection Framework
Finally, the data center with minimum energy consumption is selected for task processing, and the TA forwards the tasks to this appropriate data center (ADC), identified by its DCID.
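The selection step can be sketched as a minimum search over the registry. The `RegistryEntry` record and its field names are hypothetical stand-ins for the DCR contents, and the energy values in the usage example are made up for illustration; the only behavior taken from the text is that the total energy is the server energy plus the IT equipment energy and that the data center with the smallest total is chosen.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    """Hypothetical DCR record for one data center."""
    dcid: int
    server_energy_kwh: float   # output of the server energy module (SEM)
    it_energy_kwh: float       # output of the IT energy module (ITE)

    @property
    def total_energy_kwh(self) -> float:
        return self.server_energy_kwh + self.it_energy_kwh


def select_data_center(registry):
    """Return the DCID of the entry with minimum total energy."""
    if not registry:
        raise ValueError("registry is empty")
    return min(registry, key=lambda e: e.total_energy_kwh).dcid


# Illustrative values only; not taken from the paper's data.
registry = [
    RegistryEntry(0x1001, 12.0, 3.0),
    RegistryEntry(0x1002, 9.5, 2.5),
    RegistryEntry(0x2001, 11.0, 4.0),
]
adc = select_data_center(registry)
print(hex(adc))
```

Ties are resolved in favor of the first entry in the registry, which is a property of `min` rather than something the source specifies.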

a) Server Energy Module
In order to process the tasks efficiently and rapidly, virtualization is employed in the servers of the data centers. It also reduces the energy consumption of the servers and increases resource utilization. Multiple virtual machines are created on the same physical host, thus increasing the processing power of a server. Hence the energy consumption of the servers is obtained as the total energy consumed by executing the tasks on the virtual machines.
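Let $S = \{S_1, S_2, S_3, \ldots, S_n\}$ be the set of servers located in a data center $DC_i$. For each server $S_k$, let $VM_k = \{VM_{1k}, VM_{2k}, \ldots, VM_{nk}\}$ be the set of virtual machines created. A decision variable $x_{ijk}$ is set to 1 if task $T_i$ is allotted to virtual machine $VM_{jk}$ for processing. Let $EC_{ijk}$ be the energy consumed by task $T_i$ running on $VM_{jk}$ and let $ET_{ijk}$ be its execution time. The energy consumption rate of the VM is denoted by $ECR_{ijk}$, and the energy consumption $EC_{ijk}$ can be calculated as follows:

$$EC_{ijk} = ECR_{ijk} \times ET_{ijk} \qquad (1)$$

where $ET_{ijk} = ft_{ijk} - st_{ijk}$ (finish time minus start time). Hence, using (1), the total energy consumed by the servers, $TE_1$, can be calculated by summing over the servers, the virtual machines of each server, and the tasks:

$$TE_1 = \sum_{k=1}^{|S|} \sum_{j=1}^{|VM_k|} \sum_{i=1}^{|T|} x_{ijk} \times ECR_{ijk} \times ET_{ijk} \qquad (2)$$

Adding the energy consumed by IT equipment, $TE_2$, given by (3), yields the total energy of the data center:

$$E_{Tot} = TE_1 + TE_2 \qquad (4)$$

The total energy consumed ($E_{Tot}$) is updated in the DCR, and the data center with the minimum $E_{Tot}$ is selected as the appropriate data center (ADC) for task submission. The entire procedure is given in the following algorithm.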
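As a worked illustration of Equation (1) and of summing it over all allotted tasks, the following sketch computes the server energy from a list of task records. The `TaskRun` record is hypothetical, and the units are an assumption: the rate is taken in kW and times in hours so that the result is in kWh. The decision variable is implicit here, since a record exists only for the virtual machine a task was actually allotted to.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One task executed on one VM (hypothetical record layout)."""
    server: int       # index k of the server
    vm: int           # index j of the VM on that server
    ecr_kw: float     # energy consumption rate of the VM (assumed kW)
    start_h: float    # start time (assumed hours)
    finish_h: float   # finish time (assumed hours)


def task_energy(run: TaskRun) -> float:
    """Equation (1): rate times execution time (finish minus start)."""
    return run.ecr_kw * (run.finish_h - run.start_h)


def server_energy(runs) -> float:
    """Sum of Equation (1) over every allotted task on every VM and server."""
    return sum(task_energy(r) for r in runs)


# Illustrative values only; not taken from the paper's data.
runs = [
    TaskRun(server=1, vm=1, ecr_kw=0.5, start_h=0.0, finish_h=2.0),
    TaskRun(server=1, vm=2, ecr_kw=0.25, start_h=1.0, finish_h=3.0),
    TaskRun(server=2, vm=1, ecr_kw=1.0, start_h=0.0, finish_h=0.5),
]
print(server_energy(runs))
```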

B. Data Center Selection Algorithm
START
For Each Service Provider SP
    Register Each Data Center $DC_i$ in Data Center Registry DCR as in Table I
End For

A. Analysis using real world trace logs
The proposed algorithm is evaluated using a real-world trace obtained from the Google cloud trace logs [20]. The log contains information on 25 million tasks submitted over a period of 29 days. The analysis considers only the tasks submitted to the cloud system in the first 5 hours of day 18. Fig. 3 illustrates the count of tasks submitted in every 60 seconds of those 5 hours. The count of tasks is over 200 thousand. The execution times of the tasks have been calculated from the start and finish times in the logs. The sample data for our analysis is shown in Table II. For readability, the data center Ids are written as DC1...DC5 instead of in binary format. The units of energy consumption are kWh. The metrics considered for the analysis of the algorithms are Total Energy Consumption (TOT) and Energy Consumption per Task (ECT). The proposed data center selection framework (DCSF) is compared with the Shortest Distance First (SDF) and Round Robin (RR) algorithms, and the results are shown in Fig. 4 and Fig. 5. The total energy consumption of the proposed approach is far lower than that of the existing algorithms. DC4 is assumed to be the closest data center.
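The two metrics can be computed from per-task energy values as in the sketch below. The source does not define ECT explicitly, so this assumes that ECT is TOT divided by the number of tasks; the function name `summarize` and the sample values are hypothetical.

```python
def summarize(task_energies_kwh):
    """Return (TOT, ECT) for one algorithm's run.

    TOT is the sum of per-task energies in kWh.
    ECT is assumed to be TOT divided by the number of tasks.
    """
    if not task_energies_kwh:
        raise ValueError("no tasks to summarize")
    tot = sum(task_energies_kwh)
    ect = tot / len(task_energies_kwh)
    return tot, ect


# Illustrative values only; not taken from Table II.
tot, ect = summarize([0.5, 0.5, 1.0, 2.0])
print(tot, ect)
```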

B. Analysis Using Node-RED
The requests are generated as shown in Fig. 3 using the Apache JMeter tool. All the threads generated by JMeter are created using computer science algorithms, such as those from graph theory and data structures, and are submitted to three data centers. The results are illustrated in Fig. 7, which shows that the total energy consumed by the proposed framework is less than that of the remaining algorithms. Although Shortest Distance First (SDF) and Round Robin (RR) consume less energy at some instants, on average the proposed DCSF framework results in lower energy consumption.

V. CONCLUSION
In this paper a framework is implemented for selecting the data center with minimum energy consumption for task processing. The total energy is calculated by summing the server energy consumption and the IT equipment energy consumption. The server energy is obtained by combining the energy consumed by the virtual machines for task processing. The results are obtained by simulation of real

Fig. 1. Tasks Submission Process

In (2), $|T|$ indicates the total number of tasks submitted to $VM_k$.

b) IT Energy Consumption Module
The number of switches (NSW) and routers (NR) is obtained from the DCR, and their energy consumption is obtained from a utility meter. The total energy consumed by IT equipment, $TE_2$, is given by equation (3):

$$TE_2 = NSW \times E_{SW} + NR \times E_{RT} \qquad (3)$$

where $E_{SW}$ is the average energy consumed by a switch and $E_{RT}$ is the average energy consumed by a router of the data center. The outputs of the two modules are combined to get the total energy ($E_{Tot}$) of the data center:

$$E_{Tot} = TE_1 + TE_2$$

Ready Queue RQ ← Null, Size(RQ) ← Threshold Value (Th), Wait Queue WQ ← Null
For Each Task $T_i$ submitted by User Base $UB_i$
    Assign WQ ← $T_i$
    If RQ is Null Then
        Assign RQ ← WQ Until no. of tasks (λ) > Size(RQ)
    End If
End For
If λ > Size(RQ) or Time > 60s
    Request Task Allocator TA to find Data Center (ADC).
    TA assigns the request to Energy Calculator (EC), which retrieves each $T_i$ from RQ and performs the following computations:
        $TE_1 = \sum_{k=1}^{|S|} \sum_{j=1}^{|VM_k|} \sum_{i=1}^{|T|} x_{ijk} \times ECR_{ijk} \times ET_{ijk}$
        $TE_2 = NSW \times E_{SW} + NR \times E_{RT}$
        $E_{Tot} = TE_1 + TE_2$
        ADC ← DCID(min($E_{Tot}$))
End If
For Each Task $T_i$, TA does
    Assign ADC ← $T_i$ for Processing
End For
END

IV. PERFORMANCE ANALYSIS

Fig. 6. Node-RED flow to read energy from a server

The data centers are implemented using three Dell PowerEdge C5220 microservers, represented as DC1, DC2, and DC3. Data center capabilities such as virtualization and storage management are provided by the Windows Server 2012 R2 Datacenter operating system, which is installed in the data centers. A Raspberry Pi is incorporated in each of the servers in the data centers to read the energy consumption.

TABLE II. DATA FOR SIMULATION