A Multiple Linear Regressions Model for Crop Prediction with Adam Optimizer and Neural Network Mlraonn

Due to population growth, the demand for food is increasing day by day. Crop prediction is needed to close the gap between demand and supply. Instead of following a traditional crop selection method, selecting a crop that suits the given soil properties helps farmers obtain the expected crop yield. The objective of the proposed work is to develop one such system. The proposed system is developed using real data with various soil parameters acquired from a soil laboratory located in Chennai. The system uses 16 soil parameters, which include all the micro and macro nutrients along with pH, EC and OM values, and the recommended crop for those soil parameters. The proposed Mlraonn (Multiple Linear Regression with Adam Optimization in Neural Network) model is developed using Keras, a library mainly used for deep learning. A neural network approach is used to construct a regression model. The model is evaluated with the loss metrics RMSE, MSE and MAE. The proposed algorithm is compared with existing standard machine learning algorithms. It is found that the proposed algorithm gave a lower error on all three loss metrics than standard algorithms such as Random Forest Regression and Multiple Linear Regression.
Keywords: Multiple Linear Regression; Adam Optimization; Neural Network; Keras; Machine learning algorithm; Root Mean Square Error (RMSE); Mean Square Error (MSE); Mean Absolute Error (MAE); potential of Hydrogen (pH); Electrical Conductivity (EC); Organic Matter (OM)


I. INTRODUCTION
Linear regression is one of the most basic methods in statistics. Its mathematical representation is y = m*x + c, where y is the predicted output, x is the input variable, m is the slope and c is the bias. This idea can be extended to multiple linear regression, in which more than one input feature produces a single output feature. The mathematical representation of multiple linear regression is y = m1*x1 + m2*x2 + m3*x3 + ... + mn*xn + c. A neural network model can be created by calculating the weight and bias values at each node [23]. Each layer consists of several nodes, and layers are classified into input, hidden and output layers. At each node the inputs are multiplied by the weights of the node and summed, and the sum is passed to the activation function. The activation is a transformation function, which may be linear or non-linear, applied to this weighted sum before it is passed to the next layer or to the output layer. Several activation functions are available, such as Sigmoid, ReLU, Leaky ReLU and Tanh, and each has its own purpose [23]. A linear activation function is simpler than a non-linear one. Sigmoid is an example of a non-linear activation function. The Rectified Linear Unit (ReLU) is piecewise linear: it passes positive inputs unchanged and sets negative inputs to zero. A model with ReLU can be trained easily, and it is the activation used here for multiple linear regression. The performance of a neural network can be improved with an optimization function, one of which is gradient descent. The gradient descent algorithm is used to adjust the weights, and it gives the relation between the error and each individual weight. This optimization step finds the weight values at which the error is lowest. Minimizing the error is the overall aim of developing any model. In the feed-forward step the outputs of all the nodes are calculated from the weights and the activation function, whereas in the back-propagation step the weights of the network are adjusted based on the generated error. With an optimization algorithm the model can be trained quickly and its performance improves. Many optimization algorithms exist, for example SGD, RMSprop, Nesterov, Adagrad, Adadelta and Adam [26]. The Adam optimizer is used to update the node weights. This algorithm is a variation of the gradient descent algorithm. It uses two moment estimates: the first moment is the mean of the gradients and the second moment is their uncentered variance.
Section II of this paper discusses related work using various machine learning algorithms. Section III explains the data collection and the preprocessing carried out on the dataset. Section IV gives the pseudo code for the proposed algorithm. Section V compares the results. Section VI gives the conclusion.
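The Adam update described in the introduction can be illustrated on a small multiple linear regression problem. The following is a minimal pure-Python sketch, not the Keras implementation used in this work. The function name, the learning rate, the number of epochs and the toy data are assumptions made for illustration; the values of beta1, beta2 and eps are the commonly used Adam defaults.

```python
import math

def adam_linear_regression(X, y, lr=0.05, beta1=0.9, beta2=0.999,
                           eps=1e-8, epochs=2000):
    """Fit y ~ m1*x1 + ... + mn*xn + c by minimizing MSE with Adam."""
    n_features = len(X[0])
    params = [0.0] * (n_features + 1)  # weights m1..mn followed by the bias c
    m = [0.0] * len(params)            # first moment: running mean of gradients
    v = [0.0] * len(params)            # second moment: running mean of squared gradients
    for t in range(1, epochs + 1):
        # Gradient of the mean squared error over the whole (small) dataset.
        grads = [0.0] * len(params)
        for row, target in zip(X, y):
            pred = sum(w * x for w, x in zip(params, row)) + params[-1]
            err = pred - target
            for j, x in enumerate(row):
                grads[j] += 2.0 * err * x / len(X)
            grads[-1] += 2.0 * err / len(X)
        # Adam step: update both moments, correct their bias, then move.
        for j, g in enumerate(grads):
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g * g
            m_hat = m[j] / (1 - beta1 ** t)
            v_hat = v[j] / (1 - beta2 ** t)
            params[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[:-1], params[-1]

# Hypothetical toy data generated from y = 2*x1 - 3*x2 + 1.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [2 * a - 3 * b + 1 for a, b in X]
weights, bias = adam_linear_regression(X, y)
```

On this noise-free toy data the fitted weights and bias should approach the generating coefficients. Dividing by the square root of the second moment gives each parameter its own effective step size, which is the main difference from plain gradient descent.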

II. RELATED WORKS
As part of crop management for wheat, its biomass was estimated using machine learning algorithms such as Random Forest Regression, SVC Regression and ANN. It was found that Random Forest produced a more accurate estimate than the other two algorithms. The experiment took place in southern China [12]. Data collected from the weather department were used to predict the most profitable crop, analyzing parameters such as pH, soil and weather data. The authors used multiple linear regression, a machine learning algorithm, for prediction, and discussed detecting crop diseases to help farmers as a future enhancement [7]. By sensing soil parameters and atmospheric parameters, the authors used an ANN algorithm to predict the crop yield [21]. Rice yield in Maharashtra state was predicted using data collected from 1998 to 2002. A neural network algorithm was used for prediction and cross validation was used to validate the result, with an accuracy of 97.5% sensitivity and 96.3% specificity [17]. For crop yield with respect to climate and biophysical change, a large amount of data was collected and algorithms such as Random Forests and MLR were applied. It was concluded that Random Forest performed better than MLR [8]. Random Forest was used to calculate the accuracy with climate data [3]. Yield prediction for crops such as wheat, maize and potato was done with a large dataset divided into training and testing sets. Random Forests were compared with Multiple Linear Regression (MLR), and RF was found to perform better than MLR. The RMSE of RF was between 6 and 14%, whereas that of MLR ranged from 14 to 49% [18]. Four statistical prediction models, MLR, SGD, RFR and SVM, were used to derive soil information in south western Burkina. High spatial resolution satellite data were used along with soil sample data from a laboratory. It was found that RFR performed better than the other three algorithms. Internal validation was done through cross validation [9].
Crop yield prediction was carried out with data obtained from a soil testing lab at Jabalpur, Madhya Pradesh. Naive Bayes and KNN models were generated. Soils were classified into three categories, low, medium and high, based on the nutrient values found in the soil. The outcome of this study helped farmers choose better sowing land. Since the test was carried out with a small dataset, the authors propose a larger dataset as future work to obtain better accuracy [19]. An optimization algorithm, Gradient Descent with Momentum, was used to train a neural network pattern classification algorithm to predict the soil moisture an hour in advance for irrigation, which helped farmers to follow an irrigation pattern. The MSE and RMSE obtained by the Gradient Descent with Momentum based neural network pattern classification are 0.039622 and 0.19905 [20]. Three crops, rice, maize and wheat, were considered in another study. Machine learning algorithms such as Multiple Linear Regression, Random Forest Regression and Multivariate Adaptive Regression Splines (Earth) were used to predict the yield of the chosen crops. Multiple Linear Regression gave good predictions [6]. State-wise prediction of rainfall was carried out with the MLR algorithm [11]. Random Forest Regression was used to predict sugarcane yield [13]. An experiment to predict agricultural yield using linear, non-linear and MLR algorithms was carried out in Andhra Pradesh and Telangana [14]. Crop yield prediction was carried out with ANN and MLR, and the C-ANN and D-ANN algorithms were compared for their performance [15]. The effect of climate change on mustard yield was predicted in Haryana state using MLR [16]. Agricultural data were analyzed to find the optimal parameters for maximizing crop production using the algorithms Multiple Linear Regression, PAM, CLARA and DBSCAN [10].

A. Data Collection
Crop prediction in the proposed system uses only soil properties, namely the micro and macro nutrients and the EC, OM and pH values, as input (independent) features and the suggested crop as the output (dependent) feature. These soil properties were collected from a soil lab. The dataset consists of nearly 1600 samples. The dataset is analyzed before generating a model. A sample of the dataset is shown in Fig. 1.
B. Data Pre-Processing
1) Finding correlation among features:
The study of the data reveals that it is numerical. To find the relationship between features, a correlation map called a heat map [25] is generated for the dataset. Fig. 2 shows the correlation values and the generated heat map. The heat map is used to find the correlation between every pair of features in the dataset. Correlation values range from -1 to +1. A feature whose correlation value lies between -0.01 and +0.01 can be dropped, since a value that close to zero means there is effectively no correlation. In this dataset the features N, P, Na, Zn and B were removed for crop prediction, as shown in Fig. 3.
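The correlation-based feature selection described above can be sketched in a few lines. This is a minimal pure-Python illustration and not the heat map code used in this work. It assumes that the correlation of interest is the Pearson correlation between each feature and the encoded target; the function names and the toy values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / math.sqrt(var_a * var_b)

def weakly_correlated(features, target, threshold=0.01):
    """Names of features whose absolute correlation with the target is below the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) < threshold]

# Hypothetical toy columns, not values from the soil dataset.
features = {"pH": [1.0, 2.0, 3.0, 4.0], "B": [1.0, -1.0, -1.0, 1.0]}
target = [2.0, 4.0, 6.0, 8.0]
to_drop = weakly_correlated(features, target)
```

In this toy example the column named B has zero correlation with the target and is the only one selected for removal.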
2) Dataset scaling:
The dataset is examined to find the range of each feature. The ranges differ widely and there is no uniformity, which makes it difficult to generate a correct model. A solution is to bring all the values onto a common scale. This step is called scaling, and it is implemented with StandardScaler, a preprocessing class in sklearn. StandardScaler standardizes each feature to zero mean and unit variance, so the scaled values are centered on zero but are not strictly bounded to the range -1 to +1. The code is as follows:
    from sklearn.preprocessing import StandardScaler
    scaledX1 = StandardScaler().fit(X)
    Xsca = scaledX1.transform(X)
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 4, 2020, www.ijacsa.thesai.org

3) Label encoding:
The target feature, the suggested crop, is of string data type, so it is converted to a numeric type. The dataset has 27 different crop names, which makes this a multiclass target. Label encoding assigns an integer value to each distinct value of the feature it is applied to and returns the result as an integer array. The code is shown below:
    from sklearn import preprocessing
    le1 = preprocessing.LabelEncoder()
    YSca = le1.fit_transform(y)
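To make the mapping concrete, the behaviour of label encoding can be sketched in plain Python. This is an illustration only: the helper names and the crop names are hypothetical. It follows the convention of sklearn's LabelEncoder, in which the distinct labels are sorted and numbered from zero.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}

def encode(labels, mapping):
    """Replace every label by its integer code."""
    return [mapping[label] for label in labels]

def decode(codes, mapping):
    """Recover the original labels from their integer codes."""
    inverse = {index: label for label, index in mapping.items()}
    return [inverse[code] for code in codes]

crops = ["rice", "maize", "rice", "cotton"]  # hypothetical crop names
mapping = fit_label_encoder(crops)
codes = encode(crops, mapping)
```

The integer codes are identifiers whose order follows the alphabetical order of the crop names. Decoding the predicted codes with the same mapping returns the crop names.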

4) Handling imbalanced dataset:
The value counts of the target variable in the dataset are found to be imbalanced. For example: .
The figure shows the value counts in decreasing order. Predictions will be unreliable if such an imbalanced dataset is used for training. A technique that balances the dataset is therefore needed to obtain correct predictions. SMOTE (Synthetic Minority Over-sampling Technique) [24] is used for this purpose: to increase the sample size of the minority classes, synthetic data are generated, and SMOTE uses KNN to oversample the data [24]. The line below shows the shape of the training dataset for the independent features x and the dependent feature y. After the code is run, the sizes of x and y have changed, showing that the dataset is now balanced.
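The core idea of SMOTE, interpolating between a minority sample and one of its nearest neighbours in the same class, can be sketched as follows. This is a simplified pure-Python illustration and not the library implementation used in this work. The function name, the value of k, the seed and the toy points are assumptions.

```python
import math
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base point within the minority class.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_samples(minority, n_new=5)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region already occupied by that class. Repeating this for every under-represented crop until all classes have the same count yields a balanced training set.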

IV. PROPOSED ALGORITHM EXPLANATION
A neural network model with Multiple Linear Regression [2] [5] is generated for training on the dataset. The generated model performs regression using the Keras Regressor [22]. Keras provides many built-in libraries with which a neural network model can be built efficiently and easily. To develop a fully connected network, the Dense layer is imported from the Keras layers [22]. To avoid overfitting, the model uses Dropout. Layers can be added sequentially until the expected result is reached. The input dimension is set to the required value. The relu activation function is used here; it passes positive values unchanged and sets negative values to zero, so it is linear for positive inputs. The output layer is declared with the value 1, which is the output dimension. Since this is a regression problem, the model is evaluated with loss metrics, and the loss is specified when the model is compiled. Two metrics, MSE and MAE, were calculated for this model. The code below shows the neural network model for crop prediction using Keras.
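The computation performed by the layers described above (a fully connected layer, the relu activation, Dropout and a single output node) can be illustrated without Keras. The following pure-Python sketch is not the Keras model of this work. The function names, the layer sizes and the weights are hypothetical, and the dropout shown is the training-time "inverted" variant, which rescales the surviving values.

```python
import random

def relu(values):
    """Pass positive values unchanged and map negative values to zero."""
    return [v if v > 0 else 0.0 for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * x for w, x in zip(node_weights, inputs)) + b
            for node_weights, b in zip(weights, biases)]

def dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def forward(x, hidden_w, hidden_b, out_w, out_b, rate=0.0, rng=None):
    """One hidden relu layer followed by a single linear output node."""
    hidden = relu(dense(x, hidden_w, hidden_b))
    if rate > 0.0:  # dropout is applied during training only
        hidden = dropout(hidden, rate, rng or random.Random(0))
    return dense(hidden, [out_w], [out_b])[0]

# Hypothetical weights: two inputs, two hidden nodes, one output.
prediction = forward([1.0, 2.0],
                     hidden_w=[[1.0, 1.0], [-1.0, -1.0]], hidden_b=[0.0, 0.0],
                     out_w=[2.0, 5.0], out_b=1.0)
```

The second hidden node receives a negative weighted sum, so relu sets it to zero and only the first hidden node contributes to the output.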
The dataset is split into a training set and a validation set. 1004 samples are used for training and 495 samples for validation. The training loss and the validation loss are calculated in parallel by the model for 200 epochs. Table I shows the loss values for the training and validation sets at every 50 epochs.
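The three loss metrics used to evaluate the model follow their standard definitions, which can be written directly in Python. The function names and the sample values below are for illustration only and are not results from this work.

```python
import math

def mse(y_true, y_pred):
    """Mean Square Error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical encoded targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
scores = {"MSE": mse(y_true, y_pred),
          "RMSE": rmse(y_true, y_pred),
          "MAE": mae(y_true, y_pred)}
```

For these sample values the errors are 1, 0, -2 and 0, giving an MSE of 1.25 and an MAE of 0.75. MSE and RMSE penalize large errors more heavily than MAE because the differences are squared before averaging.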
A part of the training data is kept aside as a validation set for checking purposes. The validation set consists of data on which the model has not been trained. A graph of the training and validation loss is plotted [27] to check how the model behaves on unseen data, as shown in Fig. 4. The figure indicates that the model behaves well, since there is minimal deviation between the two loss curves. A large deviation would mean that the model does not behave well on the validation set and that further tuning is required. The comparison shows that the proposed Mlraonn model performed better, with fewer epochs or iterations, than the two standard machine learning algorithms. From the graph it is clear that the model performs uniformly on both the training data and the unseen validation data. This can also be seen from the loss values reported at every 50th epoch up to the 200th epoch. The evaluated loss metrics RMSE, MSE and MAE are very low for the proposed algorithm. From this it is concluded that a regression model built with a neural network suits this soil dataset well.

VII. LIMITATIONS AND FUTURE ENHANCEMENT
Every developed system has its own limitations, and the limitations of the proposed system can be treated as future enhancements. Here the crop is predicted using only the soil parameters. Other parameters such as weather conditions and wind speed are not included in the study. As a future enhancement, parameters other than the soil parameters can be included.
From the heat map, it is found that the correlation of pH is high, equal to 0.99. A future study could also try predicting the crop from the pH value alone, using a pH sensor. Various optimization algorithms can also be compared on the same dataset to justify the effectiveness and strength of the Adam optimizer.