Analysis and Prediction of COVID-19 by using Recurrent LSTM Neural Network Model in Machine Learning

Abstract: The World Health Organization (WHO) has declared the coronavirus outbreak a pandemic. The virus spread all over the world within a few days. The best strategies for controlling this spread are for every citizen to maintain social distance and to take personal preventive measures. Many researchers and scientists are continuing their work toward an effective vaccine. Machine learning models indicate that the coronavirus disease grows in an exponential manner. To mitigate the consequences of this pandemic, the disease must be analyzed efficiently. In this paper, a recurrent neural network model is chosen to predict the number of active cases in a particular state. This prediction requires a database of cases. The COVID-19 database is downloaded from the KAGGLE website and analyzed by applying a recurrent LSTM neural network with univariate features to predict the number of active cases of patients suffering from coronavirus. The downloaded database is divided into training and testing sets for the chosen neural network model. The model is trained with the training dataset and tested with the testing dataset to predict the number of active cases in a particular state; here we have concentrated on the state of Andhra Pradesh.


I. INTRODUCTION
The journal Nature reported that these viruses derive from a previously unrecognized group, called coronaviruses, which was identified by electron microscope. COVID-19 stands for coronavirus disease 2019, which has created panic across the whole world. The total population of Andhra Pradesh in the year 2020 is 91,717,240; the confirmed cases number 4,45,139 and the active cases of novel coronavirus number 1,01,210 as of 2nd September 2020, according to a publicly available database [1]. Almost all countries placed their states under lockdown to stop unnecessary travel by their citizens. The lockdowns controlled the spread of the virus to some extent; without them, the spread of the disease would have been enormous. Governments announced lockdowns even though the economies of many countries dropped drastically. Anyone found to be infected is quarantined for 14 days and given treatment for recovery. Depending on the patient's condition, the disease may cause death, and many people have fallen into depression. In India, the outbreak of the virus has disrupted everyday life. In the initial stage, cases increased through local transmission, i.e., from person to person, and the spread later continued in the same way [1]. Coronavirus can be detected with a rapid test kit, with a portable device that detects the virus in the mucous membrane using a chip and a scanner, or by taking a swab sample from the patient's mouth or nose.
So far no proven vaccine or antiviral treatment is available, and many medical organizations are working hard to find a vaccine for COVID-19 [2]. It is in our hands to protect ourselves from coronavirus by using personal protective equipment, masks and sanitization and by maintaining social distance [3]. In the present COVID-19 situation, qualitative information is more prominent than quantitative information. Even a well-suited mathematical model cannot predict the whole course of the disease, but it can be studied to derive the nature of the disease. Appropriate machine learning and deep learning models are therefore well suited to predicting and studying the nature and behavior of the disease in the short term [4] [5].
Artificial neural networks (ANNs) resemble the biological learning system, in which many neurons in the brain are interconnected. ANN systems are motivated by the aim of capturing this kind of parallel computation based on distributed representations. To generate a single-valued output from real-valued inputs, an ANN is built from a densely interconnected set of simple units. Here, interconnection simply means that the processing elements of the neural network are connected to one another [6]. The arrangement of the elements and the structure of the connections are therefore important in an artificial neural network. Normally, an ANN system has three layers. The input layer is where the inputs are fed to the network, and the output layer generates outputs based on the inputs provided to the system. The third, and important, layer is the hidden layer, which acts as an interface between the input and output layers. As more hidden layers are added, the processing power and computation required increase, and the entire system becomes more complex.
www.ijacsa.thesai.org
Another class of ANN is the recurrent neural network (RNN) [7]. Its connections between nodes form a directed graph along a sequence, which gives the network dynamic behavior over a time series. These networks have feedback and form a closed loop. An RNN also uses memory to process the series of inputs provided to the network. RNNs come as single-layer recurrent networks and multilayer recurrent networks. Fig. 1 shows a single-layer network with a feedback connection, in which the output of each element is fed back to the element itself, to another element, or to both. Fig. 2 shows the multilayer network: the output of an element is directed to other elements of the same layer and to the previous layer, which forms a multilayer RNN.
Every element performs the same operation, and the output depends on the previous computations, so inputs are not needed at every step. The hidden layer captures the computations made over the series.
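The feedback loop described above can be illustrated with a minimal sketch of a recurrent layer that has a single unit. It is not taken from the paper: the function name rnn_forward, the tanh activation and the weight values are illustrative assumptions.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent layer over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # Feedback: the new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Illustrative weights only; these are not values from the paper.
states = rnn_forward([0.1, 0.2, 0.3], w_x=0.5, w_h=0.8, b=0.0)
print(states)
```

The same weights are reused at every step, which is the sense in which every element performs the same operation, and the hidden state h carries information forward from earlier inputs.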
II. RELATED WORK
Lin Jia et al. analyzed three different mathematical models, namely the Bertalanffy, Gompertz and Logistic models. They applied these three models to different regions and obtained different parameters. With these parameters, they found that the Logistic model gives the best performance among the three models [7]. Narinder Singh Punn developed mathematical models such as SVR, DNN, RNN and PR to find the RMSE and concluded that the PR model gives a lower RMSE value than the other models [8]. Sarbjit Singh et al. developed a hybrid model that decomposes the dataset into a series of components by applying a discrete wavelet function and then applies these components to an ARIMA model to predict death cases for the next month across five countries [9]. Linhao Zhong et al. proposed a mathematical model for early prediction of the number of infected cases by using the SIR model with a minimum of parameters, such as the recovery rate and the infection rate. Since the number of cases increases exponentially, quarantine measures need to be followed strictly and attention must be paid to medical services [10]. Utkucan Sahin et al. presented a model to forecast the number of confirmed cases in the UK, USA and Italy. The authors studied a nonlinear grey Bernoulli model, a grey model and a fractional nonlinear grey Bernoulli model to predict confirmed cases. In their study, the fractional nonlinear grey Bernoulli model offered the best performance on the MAPE, RMSE and R² metrics [11]. Lixiang Li et al. proposed a suitable model, compared the official data with the model-predicted data, and found the error to be very small [12]. Nalini Chintalapudi et al. presented a model using R statistics to forecast the registered and recovered cases [13]. Debanjan Parbat et al. used an SVR model to predict the total number of recovered, death and confirmed cases and reported the accuracy of the model along with the MSE and RMSE [14]. Salih Djilali et al. presented a mathematical model to predict the spread of the disease transmission [15]. Patricia Melin et al. proposed a multiple ensemble neural network model with fuzzy response for the coronavirus time series data to obtain valid and accurate predicted values [16]. Zlatan Car, Sandi Baressi Šegota et al. proposed a multilayer perceptron neural network model to find R² values for recovered, confirmed and death cases [17].

A. Dataset Description
In this analysis, the COVID-19 data was downloaded from the KAGGLE site. Several sources provide data for analysis, including: (1) Johns Hopkins (https://coronavirus.jhu.edu/); (2) KAGGLE (https://www.kaggle.com/sudalairajkumar/covid19-in-india); (3) CDC (https://www.cdc.gov/library/researchguides/2019novelcoronavirus/researcharticles.html); (4) Data Hub (https://datahub.io/core/covid-19); (5) Tableau (https://www.tableau.com/covid-19-coronavirus-dataresources), and so on. From these websites any researcher can download the datasets of interest and analyze them. The data we have taken runs from 30 January 2020 to 2 September 2020 and consists of confirmed, death, and cured/migrated cases for all of India. In this research article, however, we concentrate only on the state of Andhra Pradesh. The daily reported cases are summarized in a table in XLSX or CSV format with parameters such as confirmed, deaths, and cured/migrated cases. These datasets are used in this paper for the analysis and prediction of active cases in Andhra Pradesh.
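As a rough illustration of how such a table could be reduced to one state, the sketch below reads CSV rows and keeps only Andhra Pradesh. Several things here are assumptions rather than details from the paper: the column names (Date, State/UnionTerritory, Cured, Deaths, Confirmed), the inline sample numbers, which are made up, and the definition of active cases as confirmed minus cured minus deaths. Only the standard library is used so the sketch runs without the downloaded file.

```python
import csv
import io

# Tiny inline sample with made-up numbers; a real run would open the downloaded CSV file.
SAMPLE_CSV = """Date,State/UnionTerritory,Cured,Deaths,Confirmed
2020-09-01,Andhra Pradesh,100,10,150
2020-09-01,Kerala,50,2,70
2020-09-02,Andhra Pradesh,120,11,190
"""


def active_cases_for_state(csv_file, state):
    """Return (date, active) pairs for one state, in file order."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row["State/UnionTerritory"] != state:
            continue
        # Assumed definition: active = confirmed - cured - deaths.
        active = int(row["Confirmed"]) - int(row["Cured"]) - int(row["Deaths"])
        rows.append((row["Date"], active))
    return rows


ap_active = active_cases_for_state(io.StringIO(SAMPLE_CSV), "Andhra Pradesh")
print(ap_active)
```

With pandas, the same filter would typically be a boolean mask on the state column; the standard-library version is used here only to keep the example self-contained.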

B. Methodology
In India, the coronavirus spread easily at the earliest stage through local transmission from one person to another. Expert diagnosis at the earliest stage can control the spread of the disease. With the objective of forecasting the possibility of transmission among citizens, we developed a recurrent neural network model. This model uses the long short-term memory (LSTM) cell [18]. To develop it, machine learning and deep learning library packages such as pandas, numpy, matplotlib, seaborn, sklearn and math are imported into a Jupyter notebook to analyze confirmed and death cases and to predict the active cases of Andhra Pradesh (AP).

IV. IMPLEMENTATION OF ALGORITHM
A conventional neural network consists of a number of parallel layers made up of nodes called neurons. These neurons are interconnected and form layers through which data passes from one layer to the next. The first layer is called the input layer, the last layer is called the output layer, and the layers in between are called hidden layers [19]. Recurrent networks are a class of neural networks that deal with time series data. These networks have memory for processing the previous data passed through the network. However, the RNN suffers from short-term memory. When gradients are computed during backpropagation, they can become very small and then contribute very little learning. The RNN therefore stops learning because it receives very small gradients. On longer sequences, such an RNN does not learn much and thus has a very short memory. To overcome this short memory, the LSTM incorporates a mechanism inside the network known as gates, which control the stream of data passing through the nodes. The implementation of coronavirus forecasting is based on long short-term memory (LSTM) networks, taking one feature into account at a time [20] [21].
Fig. 8 represents an LSTM model, which is a type of RNN used especially to predict time series patterns as well as for classification problems. Since our dataset is a time series, we have chosen this model for prediction. The model takes the features of the last 8 days and forecasts the figure for the 8th day. When the target value is reached, it stops and exits, taking its own predictions into account. The LSTM model shown in Fig. 9 merges the forget gate and the input gate into one gate. It also merges the cell state and the hidden state. The number of nodes, or neurons, is chosen by trial and error to give the best results. The most common method is k-fold cross-validation.
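The windowing idea can be sketched as follows: a univariate series is cut into samples of 8 consecutive values, each paired with a target. The sketch assumes the target is the value on the day immediately after the window, which is one common reading and may differ from the exact setup used in the paper; the name make_windows and the placeholder series are hypothetical.

```python
def make_windows(series, window=8):
    """Split a series into (inputs, target) pairs: `window` past values -> the next value."""
    samples = []
    for start in range(len(series) - window):
        past = series[start:start + window]
        # Assumption: the target is the value on the day right after the window.
        target = series[start + window]
        samples.append((past, target))
    return samples


series = list(range(1, 13))  # placeholder daily counts 1..12, not real data
pairs = make_windows(series, window=8)
print(len(pairs))
print(pairs[0])
```

Each pair can then be reshaped into the (samples, time steps, features) layout that LSTM layers in common libraries expect, with one feature per time step for the univariate case.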
The number of nodes is found from the formula below: (1) where Ni is the number of input neurons and No is the number of output neurons.
The outputs of the respective gates are given below: (2)
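For reference, the widely used textbook formulation of the LSTM gates is given below. It is offered as an assumption about the general form of such equations and is not a reproduction of Eq. (2); the merged-gate variant described for Fig. 9 would differ. The symbols are assumed notation: x_t is the input at step t, h_{t-1} the previous hidden state, c_t the cell state, W and b the weights and biases, sigma the logistic sigmoid and \odot element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because the cell state c_t is updated additively, gradients can pass through many time steps without shrinking as quickly as in a plain RNN, which is the short-memory problem the gates are meant to address.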

1) Procedure of Neural Network 1:
a) Import the required library packages.
b) Read the data saved in .csv format.
c) Filter the data by choosing the required state; here Andhra Pradesh has been chosen.
d) Draw a line plot of the confirmed cases of AP.
e) Draw a line plot of the death cases.
f) Divide the data into train and test sets, and define the number of epochs, batch size, number of nodes, activation function and optimizer.
g) Train the network model by using the fit function and plot the graph of training and predicted values.
h) Compute performance metrics such as RMSE, MSE, MAE and SSE.
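The metrics named in the last step can be computed as in the sketch below, which uses their standard definitions. The function name error_metrics and the sample values are illustrative and are not results from the paper.

```python
import math


def error_metrics(actual, predicted):
    """Return SSE, MSE, RMSE and MAE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                  # sum of squared errors
    mse = sse / len(errors)                           # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "MAE": mae}


# Made-up values for illustration only.
metrics = error_metrics([10.0, 12.0, 15.0], [11.0, 12.0, 13.0])
print(metrics)
```

SSE grows with the number of test points, while MSE, RMSE and MAE are averages, so SSE values are only comparable between models evaluated on the same test set.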
The loss of the one-layer LSTM model is very high. To get a better output, we add one more layer using the nn2.add instruction.

2) Procedure of Neural Network Layer 2:
a) Define the model as sequential.
b) Add one more layer and one dense layer, and repeat the compilation of nn2 to find the performance metrics.
c) Here, too, all the metrics show high values.
d) To reduce the metric values, we have to convert the time series to a stationary one, since the input we applied is a dynamic time series and is not repeating.
3) The neural network model is trained with the appropriate dataset to predict the active cases in AP, and the performance metrics are computed.
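One common way to make a growing series closer to stationary is differencing, that is, replacing each value with its change from an earlier value. The sketch below assumes first-order differencing with a lag of one day; the paper does not spell out its exact transformation, and the function names and sample numbers are hypothetical.

```python
def difference(series, lag=1):
    """Return the differenced series: value at day t minus value at day t - lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undo_difference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(first_values)
    for d in diffs:
        # Each original value is the value `lag` steps back plus the stored change.
        restored.append(restored[-lag] + d)
    return restored


active = [100, 130, 170, 220]  # made-up active-case counts
diffs = difference(active)
print(diffs)
print(undo_difference(active[:1], diffs))
```

A model trained on the differenced values predicts day-to-day changes, so its forecasts have to be passed through the inverse step to get back to case counts.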

B. Graphical Representation of Active Cases in Andhra Pradesh
Fig. 10 represents the data and predictions of active cases in AP. Fig. 11 plots the predictions of active cases obtained from the second neural network model. Fig. 12 shows the data after differencing the active cases. All the performance metric values are tabulated in Table I. The actual and predicted values of active cases in Andhra Pradesh for the next 25 days are tabulated in Table II.

VI. CONCLUSION
The coronavirus pandemic has spread all over the world. By applying an LSTM neural network model, we predicted the growth of active cases in Andhra Pradesh. The plots of the actual and predicted series still show exponential behavior. Every citizen needs to follow preventive measures to avoid and control the spread of the virus. This analysis reports the predicted values and performance metrics such as MAE, MSE, RMSE and SSE. Among the models compared, the LSTM model with 3 layers and 10 nodes gives the minimum error values on all of these metrics. Increasing the differencing applied to the active cases brings the errors to a minimum.

VII. DATA AVAILABILITY
The data used to support the findings of this study are available from the corresponding author upon request.

VIII. FUNDING
No funds were received for this research work.

IX. CONSENT
Informed consent was obtained from all individual participants included in the study.

X. CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.

Authors' Contributions
All authors contributed equally to the study conception and design, the implementation of the research, the analysis and interpretation of results, and manuscript preparation.