Analysis of the Artificial Neural Network Approach in the Extreme Learning Machine Method for Mining Sales Forecasting Development

Abstract: Forecasting is an important indicator for supporting management decisions. This study aimed to forecast sales for Indonesian consumer goods companies whose business warehouses handle the dynamic movement of large data, using the Artificial Neural Network method. Existing sales forecasting used a traditional method: entering data and improvising simple patterns from historical sales and remaining stock. In this study, several data variables from the business warehouse were employed for sales forecasting. The study also used a qualitative method to investigate aspects of data quality that cannot be measured quantitatively. The results showed a Mean Square Error of 0.02716 in forecasting sales. The average accuracy generated by the Extreme Learning Machine after nine data tests is 111%. The result shows an opportunity for the company to further analyze the sales profit growth potential. The predicted value generated by the Extreme Learning Machine for the last three months reaches 132%. The usefulness of this study is demonstrated by the company's improved decision-making on enlarging potential production lines.


I. INTRODUCTION
Sales forecasting is always at the forefront of decision-making and planning [1], including in business [2]. Forecasting gives an organization a plan to deal with future demand but does not guarantee the success of a strategy. However, forecasting failures lead to wrong decisions in marketing activities, because resources are then allocated on imprecise and uncertain assumptions [3]. Sales forecasting is an important prerequisite in many aspects of sales chain management. Therefore, further optimization efforts are needed to create sales forecasts that support decision-making by the organization.
This study aimed to explore the potential of scientific forecasting and prove the optimization of existing theories. The forecasting concept uses the Artificial Neural Network (ANN) method, which is modeled on the way the human brain works [4]. The human brain learns continuously through its interconnected neurons. Information received by neurons is sent from one neuron layer to another [5].
This study also explained the Extreme Learning Machine (ELM), which is widely used in batch, sequential, and incremental learning. The method is used because of its efficient convergence and learning speed, good generalization ability, and ease of implementation [6]. Furthermore, the study built a suitable model to estimate the data generation process underlying the series. It also estimated the desired number of future observations through this model [6].
Several data variables in the business warehouse were adopted for sales forecasting using the ANN ELM method. Various studies have revealed that the ELM method has an advantage in learning speed, as described above. Applying it here helps determine the effectiveness of forecasting, as well as its deficiencies and necessary improvements.

II. RELATED WORK
Prianda and Widodo (2021) used ELM to project foreign tourist visits and obtained a forecast MAPE value of 7.62% [7]. Similarly, Sharma et al. [8] employed an intelligent model to predict sales using product sentiment analysis, which made the model more productive in obtaining accurate results for each item in the product life cycle. Meanwhile, the authors of [9] applied ELM integrated with LSTM to Bitcoin price forecasting. Their study matched different machine learning algorithms to the corresponding multiscale components and constructed ensemble prediction models based on machine learning and multiscale analysis. The results showed that the ensemble models achieved a prediction accuracy of 95.12% and performed better than the benchmark models [9]. Moreover, Cholid and Aly [10] forecasted spiral and leaf spring products for four-wheeled vehicles using the Artificial Neural Network Backpropagation method, with a learning rate of 0.1, four hidden layers, and an error of 0.01. Several other studies have used various methods to predict product sales [11]-[13] and to compare forecasting techniques for financial prediction [14], supply chains [15]-[17], and manufacturing processes [18].

III. METHOD
This study aimed to predict sales using the ELM method, complemented by a qualitative approach. It also intended to investigate, discover, and explain the peculiarities of data influences that cannot be measured or described quantitatively. Fig. 1 presents the proposed method used in this study.
The proposed method in Fig. 1 consists of five phases. First, the problem was formulated as the question of how forecasting could be implemented with other fast and accurate methods. After the problem formulation, a literature study was performed by collecting sources from previous journals and articles and by gathering information from the company's forecasting results.
The ELM method then proceeds in stages. The first stage initializes the input weights and biases so that the layer activation outputs neither explode nor vanish. The second stage normalizes the data to the range 0-1 to ensure data quality during the training and testing phases. The training stage then fits the model so that it discovers and learns patterns, and the testing stage uses the testing data to evaluate the trained algorithm and to guide its optimization for better results. The last stage of ELM denormalizes the prediction results so that the output can be compared with the actual values to evaluate the model. The final step was to examine the Mean Square Error (MSE), computed in Python, to determine the error rate of the prediction results.
For training and testing of the proposed model, the company's secondary data were used. The proposed model produces prediction results quickly for both the training and testing sets. Hence, the proposed model is expected to be more suitable for sales forecasting than the other techniques described in the previous section.
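The Mean Square Error used to evaluate the predictions can be illustrated with a minimal sketch. The function name and the sample values below are illustrative assumptions and are not taken from the study's code or dataset.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative normalized values only; not taken from the study's dataset.
example_actual = [0.20, 0.50, 0.80]
example_predicted = [0.10, 0.50, 1.00]
example_mse = mean_squared_error(example_actual, example_predicted)
print(f"MSE = {example_mse:.5f}")
```

Because the error is computed on values in the 0-1 range, an MSE obtained this way is not expressed in kilograms.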

A. Data Type and Source
This study used secondary data from the dataset of a retail company. The data were collected from daily sales based on the company's existing business warehouse system.

B. Data Used
The study used a sales data set digitized through the company's SAP Business Warehouse system. The historical data set had 43 rows and 6 parameter features. Furthermore, a time series analysis was conducted on the previous month of sales data. This study also analyzed time, brand, and stock factors affecting sales levels through forecasting methods. The dataset design is given in Table I.

C. Schematic of Research Based on Method
The ELM algorithm was used by entering historical sales data, the number of features and hidden neurons, as well as the percentage of training and testing data as initial input. Fig. 2 shows the stages, outlined as follows:
a) The input weights and biases were generated randomly with values between -1 and 1, with their dimensions based on the number of hidden neurons. The next step was transposing the matrix and normalizing the data to the range 0-1 in order to calculate the hidden layer output.
b) The resulting matrix was multiplied by its transpose, and the Moore-Penrose generalized inverse was then calculated from this product. The result was multiplied by the transpose of the activated hidden layer output matrix.
c) The study calculated the output of the training process to obtain the output weight used in the testing process.
d) The testing process employed the input weights and biases obtained from the training process. The hidden layer output was then calculated using the activation function.
e) The output weight obtained in the training process was used in the testing process to calculate the output layer as the prediction result.
f) Before the denormalization process, the study calculated the error between the output layer values, not yet denormalized, and the actual data. This error value characterizes the prediction results obtained. Additionally, the error value was tested using the Mean Square Error (MSE).
g) The last step was denormalization, which converts the previously normalized values back to the original values. Fig. 2 is a flowchart of the problem-solving process with ELM.
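Steps a) to e) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the function names, the random seed, and the synthetic data are all assumptions (the activation actually used is listed in Table II), and only the weight range of -1 to 1, the 12 hidden neurons, and the 6 features come from the text.

```python
import numpy as np


def sigmoid(z):
    # Assumed activation function; the paper specifies its choice in Table II.
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, y, n_hidden, rng):
    """Steps a) to c): random input weights and biases, hidden output, output weight."""
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_features))  # input weights in [-1, 1]
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # biases in [-1, 1]
    H = sigmoid(X @ W.T + b)                                 # hidden layer output
    beta = np.linalg.pinv(H) @ y                             # Moore-Penrose pseudo-inverse
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Steps d) and e): reuse the trained W, b, and beta without recomputing them."""
    return sigmoid(X @ W.T + b) @ beta


# Synthetic, already-normalized data used purely for illustration.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(34, 6))
y_train = X_train.mean(axis=1)
X_test = rng.uniform(0.0, 1.0, size=(9, 6))
y_test = X_test.mean(axis=1)

W, b, beta = elm_train(X_train, y_train, n_hidden=12, rng=rng)
y_pred = elm_predict(X_test, W, b, beta)
test_mse = float(np.mean((y_test - y_pred) ** 2))
print(f"test MSE on synthetic data: {test_mse:.5f}")
```

Because the output weight is obtained in a single least-squares solve rather than by iterative gradient updates, training involves no learning rate or epochs, which is consistent with the learning speed attributed to ELM.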

D. Input Data
This study employed sales data detailing the monthly sales for November 2017-May 2021. The ELM method processes data by determining several criteria parameters to achieve a small error rate and optimal accuracy. Table II describes several parameters used in processing the dataset, including the input and output layers, hidden neurons, and activation function.

E. ELM Development Method Forecasting
The initial step was to find the optimal value of several input parameters tested. This step ensures that the process in the ELM method produces good predictions during training. After testing the MSE on hidden neurons, the dataset was normalized into the range of 0-1. This was followed by data training, testing, denormalization, and sales predictions in the following month.

A. Hidden Neuron Total Network Testing
The number of hidden neurons was tested to determine its effect on the accuracy value in implementing the ELM algorithm. The numbers of hidden neurons used in this test were 11, 12, 13, 14, 15, 17, 24, 26, 27, and 30.
The tests were performed ten times, with each number of hidden neurons processed in Python. The test was repeated while changing the number of neurons, as shown in Table III. Fig. 3 shows that the results change when the number of hidden neurons changes. The optimal network value was obtained in the second test, with an MSE of 0.02716 and a time of 0.00525 seconds.
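A test of this kind can be sketched as a loop over the candidate hidden-neuron counts that records the MSE and elapsed time of each run. The sketch below is an assumption-laden illustration: the sigmoid activation, the helper name `run_elm_once`, the seed, and the synthetic data are hypothetical, so its numbers will not reproduce Table III.

```python
import time

import numpy as np

# Hidden-neuron counts listed in the text.
HIDDEN_COUNTS = [11, 12, 13, 14, 15, 17, 24, 26, 27, 30]


def sigmoid(z):
    # Assumed activation function.
    return 1.0 / (1.0 + np.exp(-z))


def run_elm_once(X_train, y_train, X_test, y_test, n_hidden, rng):
    """Train and test one ELM; return (test MSE, elapsed seconds)."""
    start = time.perf_counter()
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, X_train.shape[1]))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(sigmoid(X_train @ W.T + b)) @ y_train
    y_pred = sigmoid(X_test @ W.T + b) @ beta
    elapsed = time.perf_counter() - start
    mse = float(np.mean((y_test - y_pred) ** 2))
    return mse, elapsed


# Synthetic normalized data with the same shape as the study's dataset (43 x 6).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(43, 6))
y = X.mean(axis=1)
X_train, X_test = X[:34], X[34:]
y_train, y_test = y[:34], y[34:]

results = [
    (n,) + run_elm_once(X_train, y_train, X_test, y_test, n, rng)
    for n in HIDDEN_COUNTS
]
best_n, best_mse, best_time = min(results, key=lambda r: r[1])
for n, mse, seconds in results:
    print(f"hidden={n:2d}  MSE={mse:.5f}  time={seconds:.5f}s")
```

Since the input weights are random, a single run per configuration is sensitive to the seed; repeating each configuration several times and averaging would give a steadier comparison.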

B. Preprocessing
Data processing was performed after creating the hidden neuron network. The data collected were then normalized using Min-Max normalization, that is, transformed into the range 0-1, to obtain a value for each auxiliary variable.
The input process used monthly time series, stock, and sales variables by brand. The data were divided for testing and training via Python's train_test_split function. Furthermore, the study compared the performance of the models used in forecasting sales data. The training and testing data distribution was 80% and 20%, respectively. Tables IV and V show that the output data were broken down into two parts and scaled to the range 0-1.
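Min-Max normalization maps each value v to (v - min) / (max - min). A minimal sketch is given below; the function name and the sample sales values are illustrative assumptions, not data from the study.

```python
def min_max_normalize(values):
    """Scale values to the range 0-1; also return the minimum and maximum used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


# Illustrative monthly sales in kilograms; not taken from the study's dataset.
example_sales_kg = [40000.0, 55000.0, 70000.0]
scaled, lo, hi = min_max_normalize(example_sales_kg)
print(scaled)
```

The minimum and maximum are returned alongside the scaled values because they are needed later to convert predictions back to kilograms.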
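The sizes produced by an 80%/20% split of the 43-row dataset can be checked with a short sketch. The code is a pure-Python stand-in rather than a call to train_test_split: it rounds the test share up, mirroring the rounding scikit-learn applies to a fractional test size, and it keeps the rows in chronological order, which is an assumption because the text does not state whether the rows were shuffled. The function names are hypothetical.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def chronological_split(rows, test_fraction):
    """Keep the earliest rows for training and the latest rows for testing."""
    n_train, _ = split_sizes(len(rows), test_fraction)
    return rows[:n_train], rows[n_train:]


# Month indices 1..43 stand in for the 43 monthly rows of the dataset.
rows = list(range(1, 44))
train_rows, test_rows = chronological_split(rows, test_fraction=0.2)
print(len(train_rows), len(test_rows))
```

Under this rounding, the split yields 34 training rows and 9 testing rows, which agrees with the nine data tests reported in the paper.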

C. Training
The training process was conducted using 80% of the dataset. Table VI shows the normalized data, which formed a training output table of order 12 x 6. This corresponds to 12 hidden neurons and 6 input features.

D. Testing
The testing process aimed to measure the performance of the network model built during the training process. Although the steps used were similar to the training process, all weights were taken from the training results, so the output weight β was not recalculated. The data used differed from the training data, as explained in the preprocessing stage. The accuracy level was calculated in the same way as in the training process. The output data testing is presented in Table VIII. Table IX shows that the results generated from the training process had an average prediction of 111%.

E. Denormalization
In the denormalization stage, the predicted test data in the 0-1 output range were converted to actual values in kilograms. This was done so that the target data and predicted results could be read on the original scale rather than the normalized one.
The denormalized data in Table X result from the nine test data points, each predicted and compared with its actual value to give an accuracy per row. The average accuracy reported in the final row was 101%.
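Denormalization inverts the Min-Max transformation, v = s * (max - min) + min, where s is the scaled value. The sketch below is illustrative only: the function name, the minimum and maximum, and the scaled predictions are assumptions, not values from Table X.

```python
def denormalize(scaled_values, lo, hi):
    """Map values from the 0-1 range back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled_values]


# Illustrative minimum and maximum in kilograms; not taken from the study.
lo_kg, hi_kg = 40000.0, 70000.0
scaled_predictions = [0.0, 0.5, 1.0]
predictions_kg = denormalize(scaled_predictions, lo_kg, hi_kg)
print(predictions_kg)
```

The minimum and maximum must be the same ones used during normalization; otherwise the restored values would not be comparable with the actual sales.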

F. Next Month Prediction Results
The predictions made in Python showed that the value for the 44th data point, that is, the month following the processed dataset, is 70.524 kg, as shown in Fig. 4. In Fig. 4, the red line marks the predicted sales for the 44th data point, which represent a decrease from the previous month. www.ijacsa.thesai.org

G. Existing Sales Forecast
The retail company maintains a forecast to predict sales for the next three months. This forecast is produced through traditional time series analysis. Table XI and Table XII show the existing forecast and the actual sales, respectively. Table XIII compares the predictions with the actual values from the total forecasting, showing a gap with an accuracy of 86%. Table XIII confirms that the average accuracy is only 82%. A forecast that falls below actual sales warns the company that the available stock may be insufficient to supply future sales. This means that sales would not be maximized.

H. Comparison of Existing vs ELM Development Methods
ELM exist vs development method matrix: From the implementation of the proposed sales forecasting model, a comparison matrix between the existing method and the ELM method was constructed, as given in Table XIV. Based on the forecast matrix, the average 3-month forecast value reached 68,250 kilograms with the ELM method, while the traditional method reached 42,579 kilograms. The 3-month average of actual sales, 51,681 kilograms, was used to compare the ELM and traditional methods. Based on these results, the study concludes that the ELM predictions are better suited to providing adequate stock for the next sales period. This means that sales can run optimally for the company. Meanwhile, based on the accuracy matrix, the ELM method achieves good accuracy values. This is attributed to using the stock feature as the basic calculation label and to the several tests carried out to obtain the smallest error value. These features help the model achieve better accuracy.
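The averages above can be related to the 132% and 82% figures reported in this paper if the accuracy is read as the ratio of forecast to actual sales. That reading is an assumption, since the formula is not stated in the text, although it reproduces the reported values. The function name below is hypothetical.

```python
def forecast_to_actual_pct(forecast_kg, actual_kg):
    """Forecast expressed as a percentage of actual sales (assumed definition)."""
    return 100.0 * forecast_kg / actual_kg


# 3-month averages in kilograms, as reported in the text.
actual_avg_kg = 51_681
elm_pct = forecast_to_actual_pct(68_250, actual_avg_kg)
traditional_pct = forecast_to_actual_pct(42_579, actual_avg_kg)
print(f"ELM: {elm_pct:.0f}%  traditional: {traditional_pct:.0f}%")
```

Under this definition, a value above 100% means the forecast exceeds actual sales, and a value below 100% means the forecast falls short of them.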

V. CONCLUSIONS
Based on the test results and discussion, the following conclusions were obtained: a) As measured by the Mean Square Error (MSE) in forecasting sales results, the error rate is 0.02716 with a time of 0.0525.
b) The ten neuron network tests showed that a larger number of hidden neurons does not by itself indicate a better-optimized algorithm. Among the configurations tested, which ranged up to 30 hidden neurons, 12 hidden neurons produced the smallest error value.
c) The average accuracy value generated by the ELM forecasting method when testing nine data points is 111%. This illustrates an opportunity for the company to further analyze the sales profit growth potential.
d) The predicted value produced by ELM for the last three months reaches 132%, compared with the traditional method, which achieves only 82%.

Comparison of ELM forecast value with actual: In line with the method described in the previous discussion, the ELM method achieved a very good accuracy value. This is attributed to using the stock feature as the reference label for calculations and to the several tests conducted to obtain the smallest error value and the best accuracy. These features help the model achieve optimal accuracy. Sales forecasting still requires extensive machine learning and statistics knowledge. Additional input-layer features, such as the number of workers, demographic trends, and behavioral indicators, should be included in future studies to improve the model.