New Learning Approach for Unsupervised Neural Networks Model with Application to Agriculture Field

An accurate and lower-cost hybrid machine learning algorithm, based on a combination of the Kohonen-Self Organizing Map (SOM) and the Gram-Schmidt (GSHM) algorithm, is proposed to enhance crop yield prediction and increase agricultural production. Combining GSHM with SOM extracts the most informative components of the data by removing correlation between inputs prior to training. The improved hybrid algorithm was first trained on data with a correlation problem and compared with another hybrid model based on SOM and Principal Component Analysis (PCA). It was then trained on selected soil and atmospheric parameters (pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall), and a comparative study with the standard SOM was conducted. Applied to correlated data, the improved SOM achieved better classification accuracy (8/8) and rapidity (0.015 s) than SOM combined with PCA (7/8 and 97,828 s). For crop prediction, the proposed algorithm also gave better results, with a maximum iteration number of 675, mean error ≤ 0.00022, and rapidity of 18.422 s, versus 729 iterations, mean error ≤ 0.000916, and rapidity of 23.707 s with the standard SOM. The proposed algorithm thus overcomes correlation issues and improves classification, the learning process, and rapidity, with potential application to crop yield prediction in agriculture.
Keywords: Kohonen-self organizing map; gram-schmidt algorithm; principal component analysis; agriculture field; crop yield prediction


I. INTRODUCTION
Preserving balanced agriculture is of paramount importance, as agriculture is the main source of nutrition for sustaining human life. The consumption of fruits and vegetables increases daily with population growth [1]. Recently, climate change has hampered accurate crop prediction. Environmental changes reduce agricultural production and crop yields, which depend highly on climate and rainfall conditions. This decrease also drives migration from developing countries, because agriculture is the main livelihood of most of the population [2]. For that reason, many researchers in agriculture have started to develop machine learning (ML) algorithms to improve agricultural production. For example, bias-corrected random forest (BRF), multi-layer perceptron neural network (MLP), and support vector machines (SVM) were proposed to improve the estimation of agricultural drought in South-Eastern Australia [3]. An estimation model based on stepwise linear regression (SLR) and vegetation indices was developed to predict crop yields [4]. A modified Self-Organizing Map (SOM) based on Learning Vector Quantization (LVQ) was used for weather and crop prediction [5]. Another study applied boosted tree regression and artificial neural networks to forecast upland rice yield under climate change in the Sahel, a region with vulnerable weather and very little capacity to adapt [6]. Earlier research [4,5,7] that targeted crop prediction using machine learning tools does not account for correlated data and dependence problems in the preprocessing step, which reduces the classification rate and increases cost. Pushpa Mohan et al. [5] coupled SOM and LVQ to enhance prediction accuracy, but with correlated data a decrease in accuracy and prediction rate is to be expected.
The same limitations were observed in the cited literature. An efficient new hybrid model is therefore needed to solve this issue and increase prediction accuracy. The reduction of agricultural yield is a major common problem in Africa [8], especially in Morocco. For this purpose, we create in this work an intelligent system based on a new hybrid of neural networks and the Gram-Schmidt algorithm to combat this phenomenon and to improve the classification, prediction accuracy, and rapidity of the system. This intelligent system is capable of predicting the crop yield for specific soil and atmospheric parameters (pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall), which might help to cope with ecological changes and improve agricultural production. Each vegetable or fruit can be characterized by certain soil parameters that ensure good production. Various models or architectures of artificial neural networks can be found in the literature, including the Kohonen-Self Organizing Map, the convolutional neural network (CNN), and the multilayer perceptron (MLP). Each model or architecture can give very good results in the type of task to which it is applied. The model proposed in this work is the SOM. This paradigm uses an unsupervised learning algorithm and is one of the most popular artificial neural network models; it has been applied in many industries [9,10,11,12,13]. The model learns to classify quantitative multi-parameter data without supervision, which allowed us to build a good prediction system. The goal of combining the SOM and GSHM algorithms is to improve the standard SOM.
The GSHM algorithm is one of the fundamental procedures in linear algebra. We use it to extract the most informative components, eliminate inter-data correlation from the pattern, identify each object, and obtain a new matrix containing the resulting factorizations or components. The improved SOM was trained on this new matrix. For this training we used the correlated data from [14] and the crop prediction data collected and preprocessed from [7]. The correlated data serve to test the robustness of our improved SOM in classifying data with correlation between inputs. The empirical results showed that, on the correlated data, our new hybrid model (improved SOM) performs significantly better than the standard SOM and than the hybrid model based on SOM and principal component analysis (PCA), a method that can also resolve correlation and dependence between inputs. On the crop prediction data, the improved SOM gave the best performance in terms of maximum iteration number, rapidity, and mean error versus the standard SOM. This paper is organized as follows. Section II reviews intelligent techniques and methods applied in the agriculture field. Section III presents and describes all methods and algorithms used in this work. Section IV reports the experimental results of the standard SOM and the new hybrid (improved) SOM. Section V discusses our work in terms of results and limitations. The final section concludes the results obtained and gives our perspectives for future research.

II. LITERATURE REVIEW
In this section, we briefly present the most recent works applying intelligent methods to crop and weather prediction in the agriculture field, as cited in our introduction. In the drought estimation study [3], the results showed that the proposed bias-corrected random forest gave far better performance than SVM and MLP for SPEI prediction. Pushpa Mohan and Kiran Kumari Patil [5] combined the Self-Organizing Map (SOM) with Learning Vector Quantization (LVQ) in a model named Weighted-Self Organizing Map (W-SOM). The purpose of the W-SOM is to predict the weather and crop for the Mysore region and to increase the prediction accuracy for rice production. Their experimental results showed that the W-SOM improved the accuracy of weather and crop prediction by up to 0.5% over the standard Self-Organizing Map, Ensemble Neural Network (ENN), and Kernel-Nearest Neighbors (KNN).
Zhang, L., Traore, S., Ge, J., Li, Y., Wang, S., Zhu, G. … Fipps, G. [6] employed tree regression and artificial neural networks to model upland rice yield under climate change within the Sahel. For the artificial neural networks, they used multilayer perceptron, probabilistic neural network, generalized feedforward, and linear regression. The probabilistic neural network, followed by boosted tree regression, gave the best performance among the calibrated rice yield models. Snehal S. Dahikar, Sandeep V. Rode, and Pramod Deshmukh [7] used artificial neural networks, in particular the feed-forward backpropagation network, to address climate change because of its direct effect on crop production. To build an intelligent prediction system, they used soil parameters and parameters related to the atmosphere (e.g. pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall) to predict a suitable crop.
Han, J.-C., Huang, Y., Li, Z., Zhao, C., Cheng, G., & Huang, P. [9] applied the Self-Organizing Map to groundwater level (GWL) prediction, which can help maintain a reliable water supply in various fields (e.g. agriculture and domestic use), especially in arid and semi-arid regions. The goal of that work was to use the SOM methodology to determine spatially homogeneous clusters of GWL piezometers in order to make a good prediction. The proposed modeling system can support decision-making to inform the control of groundwater resource use, particularly in arid regions.

A. Data Pre-Processing / Source
This work uses data from the literature [7], since our region, Loukkos in Morocco, does not currently have crop-specific data. These data were collected from Shri Shivaji Agriculture College, Amravati, in the Vidarbha region of India. This region is agricultural and rich in forest; its important crops include cotton, oranges, and soybeans. Each crop requires specific soil and weather parameters to give good agricultural production (e.g. pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall) (see parameters in Table I). Another data set with a correlation problem, taken from [14], is used in this work to test the classification capability of our improved model. Preprocessing is then applied in order to analyze the collected data.

B. Kohonen-Self Organizing Map
The methodology used in this work, the standard Self-Organizing Map or Kohonen network, is based on the artificial neural network. It consists of two layers: an input layer and an output layer. In the input layer, each pattern to be classified is represented by a multidimensional input vector, and each pattern has an input neuron. The output layer is also called the competition layer, where neurons compete. In SOM, the values from the input neurons are passed to all neurons in the competition layer at the same time, and these neurons try to approximate the values passed by the input neurons. The output layer can be imagined as a grid (usually one- or two-dimensional). Each node in the grid holds a "neuron", and each neuron is bound to a weight vector of the same dimension as the input data vectors, responsible for an area in the data space (also called the input space).
In a Kohonen-Self Organizing Map, weight vectors provide a discrete representation of the input space. They are positioned in such a way that they maintain the topological shape of the input space. By keeping neighborhood relations in the grid, they allow easy indexing (via coordinates in the grid). It can be seen from Fig. 1 that each input neuron has a link to all competition neurons.
The best-known learning types in artificial neural networks are unsupervised learning and supervised learning. SOM is a type of unsupervised learning, which is based on competitive and/or cooperative learning [15]. Algorithms based on unsupervised learning are represented in particular by clustering algorithms, an important data mining technique for grouping an information set into groups of similar objects [16]. In this scenario, we adopt SOM and aim to improve it in terms of classification accuracy and learning process results, in order to build a good intelligent prediction system. Before the learning procedure of SOM starts, the input data are stored as a matrix of N rows × M columns (see Fig. 2), where the rows are the objects and the columns are the components.
The inputs should have a mean not too far from zero and a variance not too far from 1. The input values should also not be too large in magnitude. Monotonic nonlinear transformations can be applied to reduce large values and minimize the number of training epochs of the neural network, which leads to a better predictor and speeds up learning. For that reason, in the first step we apply the normalization formula of [18], where x is the input object and N is the number of variables inside the vector x.
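The normalization formula of [18] is not reproduced in the text, so the following is only a minimal sketch of one common way to meet the stated goal (mean near zero, variance near 1). It assumes a column-wise z-score; the function name `standardize_columns` is hypothetical and is not taken from the paper.

```python
import math


def standardize_columns(matrix):
    """Scale every column to zero mean and unit variance (z-score).

    Assumption: this stands in for the normalization of [18], whose
    exact formula is not given in the text.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        column = [row[c] for row in matrix]
        mean = sum(column) / n_rows
        variance = sum((v - mean) ** 2 for v in column) / n_rows
        # A constant column has zero spread; divide by 1 to avoid 0/0.
        std = math.sqrt(variance) or 1.0
        for r in range(n_rows):
            result[r][c] = (matrix[r][c] - mean) / std
    return result
```

Each row is one object and each column one component, matching the N × M layout described above.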
Input vectors are connected to all nodes (neurons) in the grid or map by weight vectors, and the output of the SOM methodology is the winning node. There are numerous functions for finding this winning node (e.g. Euclidean distance, scalar product, and Manhattan distance). In this work, we used the Euclidean distance. The distance between the input vector and the weight vector of each node is calculated in (2):

$$\lVert x - w_j \rVert = \min_{i} \lVert x - w_i \rVert, \quad i = 1, 2, \ldots, n \qquad (2)$$

where j is the winner unit, n represents the number of nodes, $w_j$ is the synaptic weight vector of the winner unit j, and x is the input vector.
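The winner search can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the map is assumed to be stored as a dictionary from grid coordinates to weight vectors, and the names `euclidean` and `find_winner` are hypothetical.

```python
import math


def euclidean(x, w):
    """Euclidean distance between an input vector and a weight vector."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def find_winner(x, weights):
    """Return the grid coordinate whose weight vector is closest to x.

    `weights` maps (row, col) grid coordinates to weight vectors
    (an assumed storage layout).
    """
    return min(weights, key=lambda node: euclidean(x, weights[node]))
```

For example, with `weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}`, the input `[0.9, 0.8]` is closer to the second weight vector, so `find_winner` returns `(0, 1)`.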
Hence, to update the weights of the winning node and its neighborhood region, we apply the formula presented in (3) [19]:

$$w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i}(n)\, \bigl(x - w_j(n)\bigr) \qquad (3)$$

where index i stands for the winning node and index j stands for the other nodes in the grid, $w_j(n+1)$ and $w_j(n)$ are the synaptic weight vectors after and before the update at iterations n+1 and n, $\eta(n)$ is the learning rate, which starts from the initial learning rate $\eta_0$, and $h_{j,i}(n)$ is the neighborhood function, an exponential function centered at the winning node i.
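A single update step might look like the sketch below. The exact neighborhood function and learning-rate schedule are not recoverable from the text, so a Gaussian neighborhood over grid distance is assumed here; `neighborhood`, `update_weights`, and the `sigma` width parameter are hypothetical names introduced for illustration.

```python
import math


def neighborhood(node, winner, sigma):
    """Gaussian neighborhood centered at the winner (assumed form)."""
    d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def update_weights(weights, x, winner, learning_rate, sigma):
    """Move every weight vector toward x, scaled by its neighborhood value.

    `weights` maps (row, col) grid coordinates to weight vectors and is
    updated in place; the winner itself has neighborhood value 1.
    """
    for node, w in weights.items():
        h = neighborhood(node, winner, sigma)
        weights[node] = [
            wk + learning_rate * h * (xk - wk) for wk, xk in zip(w, x)
        ]
    return weights
```

Nodes far from the winner on the grid receive a smaller `h` and therefore move less, which is what preserves the topological ordering of the map.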

C. Improved Kohonen-Self Organizing Map based on Gram-Schmidt Algorithm
To improve the Kohonen-Self Organizing Map, we hybridize two well-known algorithms: Gram-Schmidt and the standard SOM. The development steps of the new hybrid model are described in the flowchart (Fig. 3).
First, our data pass through the GSHM processing block before being processed by the SOM, in order to eliminate correlation and dependence between inputs or objects and to extract the most informative data. This new implementation affects classification accuracy for data with a correlation problem in the pattern, and affects the iteration number, mean error, and rapidity for the crop prediction data.
The Gram-Schmidt algorithm is an orthogonal space projection algorithm for finding orthogonal and orthonormal families in a Hilbert space. It is the most widely used representative of a broad class of orthogonalization techniques and strategies; we refer to [20][21][22]. In this work, we used a modified GSHM algorithm to identify each input object by its factorizations or components.
To illustrate the GSHM algorithm, we consider the matrix (M) which includes the learning data as follows (see Fig. 4).
Each row of the original matrix (M) (Fig. 4) corresponds to a point in an n-dimensional space (Fig. 5 shows the case n = 3). In this figure, points 1, 2, and m in space correspond to the 1st, 2nd, and mth rows of the matrix. The procedure for calculating the additional components begins with the choice of the point furthest from the origin of the coordinates (Fig. 5), that is, the row (S) of the matrix whose sum of squared elements is maximum. We draw a vector from the selected point (1) to the origin and, perpendicular to it, a plane passing through the origin. The distance of an arbitrary point in the space from this plane is equal to the absolute value of d. Taking the distance between the base point (1) and the origin as the reference, it can be shown that the first component is the relative distance between each point and the drawn plane. After calculating the first component, we apply the previous formula only once to calculate the rows of the matrix that make up the projected points of the original matrix; the Gram-Schmidt orthogonalization algorithm agrees with the constructed plane.
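The formulas for this step are not legible in the text. The following is a reconstruction from the geometric description only, and not the authors' original notation: $s$ denotes the selected row (the point furthest from the origin), $x$ an arbitrary row, and $c_1$ and $x'$ are names assumed here for the first component and the projected point.

```latex
\begin{align}
  \text{plane through the origin, normal to } s:\quad
    & \sum_{j=1}^{n} s_j x_j = 0 \\
  \text{signed distance of } x \text{ from the plane}:\quad
    & d = \frac{\sum_{j=1}^{n} s_j x_j}{\sqrt{\sum_{j=1}^{n} s_j^{2}}} \\
  \text{distance relative to } \lVert s \rVert:\quad
    & c_1 = \frac{d}{\lVert s \rVert}
          = \frac{\sum_{j=1}^{n} s_j x_j}{\sum_{j=1}^{n} s_j^{2}} \\
  \text{projection of } x \text{ onto the plane}:\quad
    & x' = x - c_1\, s
\end{align}
```

Under these assumptions, $c_1$ is the usual Gram-Schmidt projection coefficient of $x$ onto $s$, and the base point itself gets $c_1 = 1$ and projects onto the origin.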
The second component to be calculated is the distance, within the plane, between the points projected onto the plane and the origin of the coordinates. The formula for calculating this component is given in (8).
Hence, to obtain the most informative components, all of the previous steps must be carried out. For more details, follow the steps presented in Fig. 6.
So, the new matrix obtained by GSHM algorithm is portrayed in Fig. 7.
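The steps described above can be sketched in code. This is an interpretation of the prose rather than the procedure of Fig. 6: it assumes the first component is the projection coefficient onto the base row and the second is the length of what remains in the plane. The name `gshm_components` is hypothetical.

```python
import math


def gshm_components(matrix):
    """Compute two Gram-Schmidt-style components for every row.

    Assumed reading of the text:
      1. pick the base row with the largest sum of squares;
      2. first component = (row . base) / (base . base);
      3. project the row onto the plane normal to the base row;
      4. second component = length of that projected row.
    """
    base = max(matrix, key=lambda row: sum(v * v for v in row))
    base_sq = sum(v * v for v in base)
    if base_sq == 0.0:
        raise ValueError("matrix contains only zero rows")

    first, second = [], []
    for row in matrix:
        c1 = sum(s * v for s, v in zip(base, row)) / base_sq
        projected = [v - c1 * s for v, s in zip(row, base)]
        c2 = math.sqrt(sum(p * p for p in projected))
        first.append(c1)
        second.append(c2)
    return first, second
```

For the toy matrix `[[2.0, 0.0], [1.0, 1.0]]`, the base row is `[2.0, 0.0]`, so the first components are `[1.0, 0.5]` and the second components are `[0.0, 1.0]`. Repeating the step on the projected rows would yield further components, which may be how the full new matrix is obtained.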

D. Principal Component Analysis
Principal component analysis (PCA) is a well-known and useful data analysis technique that is applied in many applications [23,24,25]. The ultimate goal of PCA is to reduce the original data and extract relevant information from it; these properties help resolve the correlation problem between inputs. PCA is therefore a robust tool, especially in data mining and machine learning [26], for analyzing data and making decisions based on it.
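How PCA was computed in [14] is not stated, so the following is only an illustrative sketch of one iterative way to obtain the first principal component: power iteration on the sample covariance matrix. The function name `first_principal_component` and its `iterations` parameter are assumptions made for this example.

```python
import math


def first_principal_component(matrix, iterations=200):
    """Estimate the leading principal direction by power iteration.

    Returns the unit direction and the score of every row along it.
    Needs at least two rows. The fixed start vector could, in contrived
    cases, be orthogonal to the leading direction.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[c] for row in matrix) / n_rows for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in matrix]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n_rows - 1)
         for b in range(n_cols)]
        for a in range(n_cols)
    ]

    vec = [1.0 / (c + 1) for c in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_cols))
               for a in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # no variance left in this direction
            break
        vec = [v / norm for v in nxt]

    scores = [sum(r[c] * vec[c] for c in range(n_cols)) for r in centered]
    return vec, scores
```

For perfectly correlated columns such as `[[1, 1], [2, 2], [3, 3]]`, the direction converges to equal weights on both columns, so a single score per row carries all of the variance.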

A. Experimental Results with Standard Kohonen-Self Organizing Map
1) Simulation using correlated data
In this section, we run our experimental tests using the data that have a correlation and regularity problem in the pattern, and also the crop prediction data. First, the correlated data are taken from [14], a work that developed an improved SOM based on PCA to solve the correlation problem. These data consist of 8 input objects or vectors (similar objects, objects with regularities between inputs, and normal objects). We train our standard SOM on these data. The classification results of our standard SOM and of the standard SOM in [14] are depicted in Fig. 8. The experimental results show that the standard Kohonen-Self Organizing Map cannot classify these data well: it gave just five winning nodes in the map in both cases, with our winning nodes shown in a different color. We conclude that data with a correlation problem lead to a poor predictor because of poor classification. This is one of the reasons we proposed our new hybrid (improved) SOM: to solve the classification problem in this type of data, make good predictions, and improve accuracy.

2) Simulation using crop prediction data (non-correlated inputs).
Second, we run another test using the crop prediction data. These data consist of 9 crops (cotton, sugarcane, jowar, bajra, soybeans, corn, rice, wheat, groundnut), and each crop has specific parameters needed for a good prediction (e.g. pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall). The essential parameters of the crops are presented in Table I.
To train our standard Kohonen-Self Organizing Map, we rely on the training parameters (Table I). We first give the dimension for creating the SOM map (see Fig. 9), here 8 × 8. In the next step, we enter the iteration rate (first stop condition), learning error (second stop condition), learning rate, and input data file (S_matrice) to initialize and start the learning step of SOM (see Fig. 10).
After learning, the standard Kohonen-Self Organizing Map gave nine winning nodes in the map (each input vector has a winning node). The outputs of the standard SOM, the winning nodes, are presented in Fig. 11.
Learning is complete when the two conditions, mean error and maximum iteration number, are met. The winning neurons or nodes are then colored with different colors. The coordinates of these winning nodes in the map are depicted in Fig. 12.
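A training loop with the two stop conditions might be organized as in the sketch below. Several details are assumptions, since the text does not define them: the mean error is taken to be the average distance between each input and its winner's weight vector, the loop stops as soon as either condition is reached, and the decay schedule, default values, and the name `train_som` are hypothetical. It uses `math.dist`, available from Python 3.8.

```python
import math
import random


def train_som(data, rows=8, cols=8, max_iterations=1000,
              error_threshold=1e-3, initial_rate=0.5, seed=0):
    """Train a small SOM until the iteration cap or error threshold is hit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_sigma = max(rows, cols) / 2.0
    mean_error = float("inf")
    iteration = 0

    while iteration < max_iterations and mean_error > error_threshold:
        # Learning rate and neighborhood width shrink over time (assumed).
        decay = math.exp(-iteration / max_iterations)
        rate = initial_rate * decay
        sigma = max(initial_sigma * decay, 0.5)
        total = 0.0
        for x in data:
            winner = min(weights, key=lambda n: math.dist(x, weights[n]))
            total += math.dist(x, weights[winner])
            for node, w in weights.items():
                d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[node] = [wk + rate * h * (xk - wk)
                                 for wk, xk in zip(w, x)]
        mean_error = total / len(data)
        iteration += 1

    return weights, iteration, mean_error
```

The returned iteration count and mean error correspond to the kind of learning-process figures reported in the results tables, although the values obtained depend on the assumed schedule and initialization.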
We noticed that the standard SOM classified our data correctly, which allowed us to build a good prediction system in this situation. The learning process results of the standard SOM are presented in Table II: the standard SOM finished learning with maximum iteration number = 729, mean error = 0.000916, and learning number = 0.72900.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 5, 2020, www.ijacsa.thesai.org
Our application is then ready to predict the suitable crop from various soil parameters and parameters related to the atmosphere (e.g. pH, nitrogen, phosphate, potassium, depth, temperature, and rainfall). We enter these seven parameters to predict the suitable crop. The results of the prediction are presented in Fig. 13. The predicted crop for pH=7, N=100, K=50, P=50, Depth=15, Temperature=22, Rainfall=90 is rice, and the winning node of this input is colored red in the map. Additionally, our intelligent system gave a suitable starting month and cultivation time for this predicted crop.
In this situation, the standard SOM gave the best results in classification and prediction on the crop data, but with a large maximum iteration number and error. The objective of our improved SOM is therefore to improve on the standard SOM by reducing the iteration number, the mean error, and the running time.

B. Experimental Results with Improved Kohonen-Self Organizing Map
1) Simulation using correlated data obtained by PCA and GSHM
We now use the correlated data transformed by the Gram-Schmidt algorithm and by PCA to train the improved SOM. For this test, we use the correlated data transformed by PCA from [14] in order to compare classification accuracy. The classification results of SOM-PCA and GSHM-SOM are shown in Fig. 14. Both our new hybrid GSHM-SOM and the SOM-PCA model of [14] performed well in classification: they classify the data correctly and resolve the correlation problem by eliminating the dependence between inputs. However, our improved SOM based on the GSHM algorithm gave a higher classification accuracy (8/8) than SOM-PCA (7/8). Moreover, the running time of the Gram-Schmidt algorithm is 0.015 s compared with 97,828 s for principal component analysis. PCA is an iterative algorithm and therefore takes more time to give results, whereas GSHM is a linear algorithm that gives results quickly.

2) Simulation using crop prediction data obtained by GSHM
The Gram-Schmidt algorithm is more powerful than principal component analysis in classification and rapidity, which is why we proposed it in this work. We now test the crop prediction data transformed by the GSHM algorithm and analyze the experimental results. Although these data do not contain correlation issues between inputs, our improved SOM influences the maximum iteration number, mean error, and rapidity. The data obtained by the GSHM algorithm are presented in Table III; after GSHM we obtain the most informative components of the original data for the 9 crops. We then train our improved Kohonen-Self Organizing Map on the new data obtained by the Gram-Schmidt algorithm. To start the learning procedure of SOM, we used the same conditions as in the first test (i.e. iteration rate, learning error, learning rate, and map dimension) and the input data file (M_matrice) (see Fig. 15).
As a result, the improved SOM also gave nine winning nodes in the map (each input vector has a winning node). The winning nodes of the improved SOM are depicted in Fig. 16.
The winning nodes of the improved SOM are shown in different colors. The coordinates of these winning nodes in the map are shown in Fig. 17.
In conclusion, the improved SOM correctly classified our data transformed by the Gram-Schmidt algorithm. The learning process results of the improved SOM are displayed in Table IV. We can conclude that the coupling of the Gram-Schmidt algorithm and the standard SOM has been applied successfully in terms of maximum iteration number, mean error, rapidity, and precision of classification. These capabilities allow us to build a good prediction system that runs quickly.
Our improved SOM is ready to predict the suitable crop using various soil parameters and parameters related to the atmosphere obtained by the GSHM algorithm. We enter these parameters to predict the suitable crop. The operation for predicting the crop yield is shown in Fig. 18.
The predicted crop for pH=0.841, N=34.294, K=6.944, P=2.840, Depth=6.582, Temperature=2.462, Rainfall=6.547 is rice, and the winning node of this input object is colored red. In addition, our intelligent system gave a suitable starting month and cultivation time for this predicted crop (Fig. 18).
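The prediction step can be sketched as a lookup on the trained map. This is an assumed mechanism, not a description of the authors' application: the new input is matched to its winning node, and the crop attached to that node (or to the nearest labeled node on the grid) is returned. The name `predict_crop`, the `node_labels` mapping, and the toy weights in the example are hypothetical. It uses `math.dist`, available from Python 3.8.

```python
import math


def predict_crop(x, weights, node_labels):
    """Return the crop label associated with the winning node of x.

    `weights` maps grid coordinates to weight vectors; `node_labels`
    maps the winning nodes found during training to crop names.
    """
    winner = min(weights, key=lambda n: math.dist(x, weights[n]))
    if winner in node_labels:
        return node_labels[winner]
    # Fall back to the labeled node closest to the winner on the grid.
    nearest = min(
        node_labels,
        key=lambda n: (n[0] - winner[0]) ** 2 + (n[1] - winner[1]) ** 2,
    )
    return node_labels[nearest]
```

With toy weights `{(0, 0): [0.0, 0.0], (1, 1): [10.0, 10.0]}` and labels `{(0, 0): "rice", (1, 1): "wheat"}`, the input `[0.2, 0.1]` maps to `"rice"`. When the improved SOM is used, the input would first have to be transformed by the same GSHM step as the training data.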

C. A Comparison between Standard SOM and Improved SOM
In this section, the maximum iteration number and mean error of standard SOM are compared with the maximum iteration number and mean error of improved SOM.
The comparison between the standard SOM and the improved SOM on the crop prediction data, in terms of maximum iteration number and mean error for every two columns of the original matrix and the new matrix, is displayed in Fig. 19 and Fig. 20. Fig. 19 compares the maximum iteration numbers obtained with the original matrix (standard SOM) and with the new matrix produced by the Gram-Schmidt algorithm (improved SOM) for the different numbers of columns. The results show that the iteration number increases and decreases across columns 2, 4, 6, and 7 for both the original and the new matrix, but overall the new matrix with 7 columns gave the best maximum iteration number, 675, versus 729 for the original matrix with 7 columns. In conclusion, our proposed hybrid is not designed only to reduce the iteration number per column count; it is designed to solve the correlation problem in learning, such as linear dependence between pattern components. Our new hybrid also allowed us to develop a linear algorithm, in contrast to the algorithm published in [27].
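To put the reported figures on a common scale, the relative improvement can be computed directly. The sketch below uses only the values stated in this paper for the standard and improved SOM (the mean errors are reported as upper bounds); the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline


# (standard SOM, improved SOM) as reported in this paper.
reported = {
    "max iterations": (729, 675),
    "mean error bound": (0.000916, 0.00022),
    "run time (s)": (23.707, 18.422),
}

for name, (standard, improved) in reported.items():
    print(f"{name}: {relative_reduction(standard, improved):.1%} lower")
```

This gives reductions of about 7.4% in maximum iterations, 76.0% in the mean error bound, and 22.3% in run time.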

V. DISCUSSION
Artificial neural networks offer very powerful and effective methods capable of performing complicated tasks. They also have properties not found in classical methods (e.g. learning ability, adaptation, classification, and generalization of results). These capabilities allow us to solve various issues and tasks requiring intelligence. We therefore proposed a new ANN approach, improved so that it can be applied more powerfully to numerous problems in several fields, especially in prediction systems. This work proposed a new hybrid model of the Kohonen-Self Organizing Map and the Gram-Schmidt algorithm to improve the SOM in classification accuracy, maximum iteration number, mean error, and rapidity. The study showed that the improved model gave high classification accuracy on the correlated data and that, compared with the standard SOM, it reduces the number of iterations and the mean error and speeds up our intelligent system on the crop prediction data. These strengths of the improved SOM can address many of the problems encountered in the prediction field, allowing us to build a good predictor of the suitable crop from various soil parameters and parameters related to the atmosphere. The GSHM algorithm thus appears to be a powerful tool for machine learning, especially for SOM.
The results of this study could not be directly compared with previous literature, as we use different metrics to evaluate the quality of the developed model and very few studies have focused on correlation issues. However, some of the results, such as classification and rapidity, could be indirectly compared with a previous study by our group [14], which targeted correlation problems in the input data using SOM combined with PCA. The model presented here demonstrated better results in terms of classification and computational cost. There are, however, some limitations that need to be acknowledged, including the difficulty of collecting real crop prediction data in Morocco. In addition, it is difficult to find large data sets with correlated inputs.

VI. CONCLUSION
This paper presented crop prediction from various soil parameters for the Vidarbha region, realized by our newly proposed Kohonen-Self Organizing Map methodology. In this study, the improved SOM based on the Gram-Schmidt algorithm was implemented to avoid problems related to correlation and to speed up the learning process with high accuracy. The improved SOM performed better than the standard SOM in terms of classification, maximum iteration number, mean error, and rapidity, and it was efficient in prediction. In future work, we will further improve the results obtained by the improved SOM and use data from our region, Loukkos, Morocco, which is rich in fruits and vegetables (e.g. red fruits, potatoes, tomatoes), in order to solve several issues that farmers face.