Demand Forecasting Models for Food Industry by Utilizing Machine Learning Approaches

Abstract: Continued global economic instability and uncertainty are making sales difficult to predict. As a result, many sectors and decision-makers are facing new, pressing challenges. In supply chain management, the food industry is a key sector in which sales movement and demand for food products are especially difficult to predict. Accurate sales forecasting helps to minimize stored and expired items across individual stores and thus reduces the potential loss from these expired products. To help food companies adapt to rapid changes and manage their supply chains more effectively, it is necessary to utilize machine learning (ML) approaches because of ML’s ability to process and evaluate large amounts of data efficiently. This research compares two forecasting models for confectionery products from one of the largest distribution companies in Saudi Arabia in order to improve the company’s ability to predict demand for its products using machine learning algorithms. To achieve this goal, Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) algorithms were utilized. The models were evaluated based on their performance in forecasting quarterly time series. Both algorithms provided strong results when evaluated on the demand forecasting task, but overall the LSTM outperformed the SVM.


I. INTRODUCTION
Supply Chain Management (SCM) has been a key area of study and professional practice since the 1980s. In recent years, however, supply chains have come under increased scrutiny because of their critical role in business success or failure. A supply chain relies on a coordinated network of companies and sectors. Within this network, materials are obtained and processed into intermediate or final products so that the final products can be sent to users [1]. SCM has four main processes: plan, source, execute and deliver, as shown in Fig. 1. Demand forecasting is one of the main axes of SCM [1]. In a changing world, forecasting has grown in importance across many sectors, and it is particularly relevant to supply chain management: accurate forecasting allows a company to ensure that supply exists to meet demand. Demand forecasting uses a probabilistic assessment of the available data to quantify and forecast future consumer demand for a good or service [2]. A corporation can improve its supply decisions by using demand forecasting to predict possible sales volume and profitability. By estimating future sales from historical consumer trends, a business can make the most of its inventory [3]. Machine learning (ML) algorithms can forecast food sales by analysing the wealth of historical sales data and adapting to changes within it. ML models have greater predictive power than linear models with progressive parameter selection. Furthermore, the use of ML algorithms in the forecasting process gives members of the supply chain adaptive capabilities: the system is flexible enough to improve the match between supply and demand. As a result, it improves the inventory balance throughout the chain by avoiding overstocking of products that are not in high demand [4]. The main focus of this research was forecasting demand within supply chains in the food industry.
Although demand forecasting is important for the success of all supply chain processes, it has a critical role in the food industry because products are perishable. Thus, in this case demand forecasting directly contributes to resource preservation and sustainability. More specifically, this study focuses on using ML for long-shelf-life products, especially confectionery (such as chocolates). Such a model may support distribution companies in demand forecasting and in stakeholder management with manufacturers and retailers. LSTM and SVM models were built to forecast demand for each product by city and distribution channel.
The remaining sections of this paper are as follows: Section II offers a brief literature review of research concerning demand forecasting in the food industry, Section III presents the research methodology, Section IV outlines the confectionery distribution company's data set, and Section V analyses the results of the forecasting model. Finally, Section VI shares the conclusion of this research and discusses directions which future research could undertake to further the field.

II. LITERATURE REVIEW: DEMAND FORECASTING MODELS IN THE FOOD INDUSTRY
Demand forecasting plays a crucial role in supply chain management. Using machine-learning algorithms in demand forecasting helps decision-makers make effective and prescient choices. Accurate demand forecasting leads to increases in company revenue and stock value. Therefore, over the last decade, significant research has been conducted on sales demand forecasting in the food industry. The systematic review in [6] provides a general overview of forecasting models for sales demand in the food industry. It demonstrates the benefits of using ML techniques in the food industry, especially for forecasting sales across several types of outlets, including confectionery stores, grocers' shops and restaurants. ML techniques have greater predictive power than conventional approaches, which are subject to human error.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 3, 2023
The main advantages of accurate forecasting are that it allows stores to reduce expired products and minimize stock levels [5], [6]. "Shelf-life" refers to the length of time a given foodstuff remains viable for purchase as a quality product and determines how long products remain on sale [7]. The paradox of having to throw away excessive amounts of products is a significant challenge for retailers, especially when selling items with a short shelf-life, such as fruits and vegetables. Researchers have made continuous efforts to improve demand forecasting for short shelf-life products. For instance, [3] estimated the sales demand for an Austrian retailer operating in the food industry, specifically with regard to perishable products. The researchers used two different models, Seasonal Autoregressive Integrated Moving Average (SARIMA) and Long Short-Term Memory (LSTM), with historical daily sales from January 2017 to December 2019 of four perishable products sold in over 90 stores.
Both models produced useful outcomes; however, LSTM outperformed SARIMA for products with stable demand, while SARIMA outperformed LSTM for seasonal products. Furthermore, the researchers compared the results with SARIMAX after including external factors such as sales promotions and found that SARIMAX performed significantly better for products affected by external variables. Similarly, [8] used a neural network approach to construct a forecasting model that predicts annual fruit imports for the following year. Moreover, [7] used various forecasting models, namely LSTM networks, Support Vector Regression (SVR), Random Forest Regression (RFR), Gradient Boosting Regression (GBR), Extreme GBR (XGBoost/XGBR) and Autoregressive Integrated Moving Average (ARIMA), to determine the daily optimal order quantity of fresh produce and to prevent specific vegetables from going out of stock. The study was conducted on the fresh vegetables section of a retail outlet on a college campus. They selected products to represent the short, moderate and long shelf-life categories (tomatoes, onions and potatoes, respectively). The results indicated that the ML algorithms, namely LSTM and SVR, produced better results than the other demand forecasting models.
In another study [9], demand for a Portuguese company's products was forecast by comparing various statistical methods (moving average, exponential smoothing and ARIMA). Historical weekly sales of delicatessen products from 2013 to 2017 were used as the data. The authors of [9] combined the different forecasting techniques to produce consistently good results, using a simple average to combine the three forecasts. In addition, [10] proposed a Long Short-Term Memory (LSTM) model to forecast daily consumer demand using six months of Moroccan supermarket data. In that study, a multi-layer neural network was found to be the best framework for demand forecasting. Forecasting future product sales enables stores and companies to avoid food waste. To this end, [11] presented a case study of several ML models using real-time sales data from a restaurant. They applied over 20 models to the data to demonstrate the impact of creating stationary datasets on feature pre-processing and model training. The results showed that Recurrent Neural Network (RNN) models outperformed other models. The authors of [12] also studied the efficacy of ML techniques in forecasting daily customer demand for beer in a restaurant setting. Their predictions combined two different kinds of data: internal data (such as point-of-sale (POS) transaction data) and external data (such as weather conditions). Demand forecasting models are not only employed within the food industry, but can also be applied to many other sectors. In e-commerce settings, some products are interconnected and can be grouped into one subcategory; thus, they have correlated sales and demand patterns. Therefore, [13] suggested that better predictions would arise from using historical data from related products to forecast product demand.
They applied an LSTM model to historical data of interconnected products from Walmart.com in order to forecast the demand for other products within the same category, and achieved more accurate results with the LSTM model. To date, the field has seen a significant amount of research in demand forecasting. However, none of it deals with the multi-faceted nature of the supply chain for such products, including distribution channel and city. The present study aims to apply some of these insights to the confectionery distribution industry. Table I summarizes previous studies that are related to the current research. There are two important attributes to consider when making comparisons:
• Attribute 1: The study considers sales by city to forecast demand.
• Attribute 2: The study examines the geographic distribution of products across different channels, such as larger stores, mini-markets and wholesale retailers.
The presence of these criteria is indicated in Table I as:
• Y: Yes, this criterion is applied or considered.
• N: No, this criterion is not applied or considered.

III. METHODOLOGY
This section reviews the methodology applied to build the forecasting model. The research methodology is based on artificial intelligence (AI) in food supply chains, taking into account a number of factors which significantly affect sales, such as city, distribution channel and actual sales revenue. The methodology uses an ML algorithm to map input data to output data and to discover the underlying rules governing the movement of the time series so that realistic future predictions can be made. Fig. 2 outlines the methodology for the proposed model.
The proposed model comprises three major phases:

A. Data Pre-Processing
After the data were collected from the chosen confectionery company in the Kingdom of Saudi Arabia, they were processed through noise removal. Effective pre-processing of data is essential for network input: it is better to convert raw time-series data into indicators which represent basic information more clearly. Therefore, features were classified as follows:
• Product sales, of which three types are considered:
  • Actual sales revenue during regular periods, which represents the income the company or factory generates from the sale of its products.
  • Sales promotions, which refers to sales records during offer periods (such as those occasioned by holidays, back-to-school offers and special events). Such promotions play a crucial role in demand forecasting as they can skew the results.
  • Returns of products.
• City.

B. Applying ML Algorithms in Forecasting Models
Based on the pre-processing phase, the appropriate model was chosen and applied to forecast quarterly sales volume and the required quantities of the product based on various factors such as city and distribution channel. Two machine learning algorithms were used in this study to determine the optimal order quantity of different chocolate products. This section describes each algorithm in overview.

1) Support vector machine (SVM):
The Support Vector Machine (SVM) is a supervised learning method used to solve classification and regression problems. SVM is well suited to forecasting problems with high-dimensional feature spaces, where the number of features exceeds the number of samples [14]. SVM has often been used for demand forecasting in the food industry because of the need to solve regression problems. A key advantage of the SVM method over more conventional prediction techniques is that it does not require any prior information about the link between the input and the output [16].
[18] obtained more robust results using the SVM algorithm when forecasting demand for perishable foods. Furthermore, [7] produced good results with low forecast error when using the SVM algorithm to predict daily sales of vegetables. The findings from studies which have used this method indicate that SVM produces strong results [10]. Therefore, SVM was selected for making predictions in the current study.

2) Long short-term memory (LSTM):
[...] inputs must be the same size. Secondly, RNNs suffer from the vanishing gradient problem. Because of these limitations, a Long Short-Term Memory (LSTM) algorithm can be used instead. LSTM can retain information inferred from sequential data in long-term memory. This algorithm describes data properties without requiring previous knowledge of parameters or distribution of features [15].
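For reference, the way an LSTM cell retains information over long sequences can be written in a commonly used textbook form. This is a sketch only: the source does not state which LSTM variant, layer sizes or hyperparameters were used, and every symbol below ($x_t$ for the input at step $t$, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$ and $b$ for learned weights and biases) is notation introduced here rather than taken from the paper.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

Because $c_t$ is updated additively, with the forget gate $f_t$ scaling the previous cell state, gradients can flow across many time steps, which is the usual explanation for why LSTM mitigates the vanishing gradient problem of plain RNNs.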

C. Model Evaluation
To evaluate each model, the sales it predicted were compared with the actual sales data. The accuracy of the forecasting model was measured using two common performance measures:
• Mean Absolute Percentage Error (MAPE).
• Root Mean Square Error (RMSE).

IV. CONFECTIONERY DATASET
A chocolate distribution company in the Kingdom of Saudi Arabia provided a large dataset describing its customer transactions. The dataset was obtained from the company's SAP platform [17], which saves detailed transaction records. It provides more than ten attributes, including those most common to the food sales field as well as factors which might be useful in forecasting product sales, such as distribution channel, product code, date, plant, returns value, net quantity, and sales in Saudi Riyal (SAR). The target variable is the forecast sales quantity. Table II outlines the dataset dictionary.
Net value and net quantity are the units used by the company to measure the sales of its products. A return is represented by a negative net value. The data took the form of daily transactions for 200 products in 11 cities and 6 distribution channels across 3 years (January 1, 2018 to December 31, 2020). This led to a dataset with almost a quarter of a million rows. Table III describes the six distribution channels used in the dataset.

Table II. Dataset dictionary
No.  Feature               Description
[...]
4    Distribution Channel  The way products are distributed to stores (6 channels are available).
5    Net Value             Actual sales of product in Saudi Riyal (SAR).
6    Net Quantity          Actual quantities of product per transaction.
7    Returns Value         Value of purchased products returned to stores in Saudi Riyal (SAR).
8    Returns Quantity      Quantity of purchased products returned to stores.
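The step from daily transactions to the quarterly series used for forecasting can be sketched as a simple aggregation. This is a minimal illustration, not the authors' pipeline: the row layout, the sample values and the function names `quarter_of` and `quarterly_totals` are all hypothetical, and only the idea of summing net quantity per quarter, city, channel and product comes from the text.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (date, city, channel, product, net_quantity).
# The layout and the values are illustrative; they are not the company's data.
transactions = [
    (date(2018, 1, 15), "Riyadh", "Key Account", "product 1", 120),
    (date(2018, 2, 3), "Riyadh", "Key Account", "product 1", 80),
    (date(2018, 5, 9), "Riyadh", "Key Account", "product 1", 60),
    (date(2018, 5, 9), "Jeddah", "Key Account", "product 2", 45),
]


def quarter_of(day):
    """Return the calendar quarter (1 to 4) that a date falls in."""
    return (day.month - 1) // 3 + 1


def quarterly_totals(rows):
    """Sum net quantity per (year, quarter, city, channel, product)."""
    totals = defaultdict(int)
    for day, city, channel, product, quantity in rows:
        key = (day.year, quarter_of(day), city, channel, product)
        totals[key] += quantity
    return dict(totals)


totals = quarterly_totals(transactions)
for key in sorted(totals):
    print(key, totals[key])
```

Keying the totals by city, channel and product keeps each series separate, which matches the paper's choice to model these factors individually rather than pooling all sales.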

V. ANALYSING FORECASTING MODEL PERFORMANCE
Forecasting demand for a company with several different products across multiple cities and distribution channels is a challenging task. A single model covering all products, cities and channels might not be accurate, because each of these factors affects the inputs and is subject to changing circumstances. The main objective of this study was to enhance predictive ability for sales, thus minimizing food wastage and supply chain issues by forecasting supply and demand, and improving the efficiency of the whole system by minimizing errors, such as data loss, that arise with traditional methods. As such, the developed model aims to forecast product demand for the next year's quarters, while the inputs for the models comprise city, distribution channel and quarterly sales quantities of products for the previous year. Therefore, the dataset was split by time period. The training dataset consisted of daily sales transactions during 2018, and the testing dataset used 2019 data. For model testing, we chose sales data for the top five products (product 1, product 2, product 3, product 4, and product 5) across the 11 cities (Riyadh, Jeddah, Taif, Dammam, Qassim, Makkah, Eisha, Madina, Tabuk, Jizan, and Khamis).

Table III. Distribution channels
No.  Channel             Description
[...]
3    Mini Markets        Small stores such as corner shops or newsagents.
4    Convenience Stores  Small stores located in gas stations (for example NAFT, SASCO).
5    New Channel         Stores whose main product line is different, such as Toys "R" Us, Sky sales (Saudi Airline).
6    Cash Van            A car with products in the custody of the company's salesman. The customer pays via cash.
The above-mentioned cities were mapped along the x-axis, with product demand along the y-axis. With regard to the distribution channel, only one channel was used in testing: the key account channel. This channel represents the most significant markets in Saudi Arabian cities.
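The experimental setup described above (one channel, training on 2018 and testing on 2019, with four quarterly values per city and product) can be sketched as follows. The input dictionary, its sample values, the channel label "Key Account" and the function name `split_by_year` are assumptions made for illustration; the source does not show how the split was implemented.

```python
# Hypothetical quarterly totals keyed by (year, quarter, city, channel, product).
# Values are illustrative only.
quarterly = {
    (2018, 1, "Riyadh", "Key Account", "product 1"): 200,
    (2018, 2, "Riyadh", "Key Account", "product 1"): 60,
    (2018, 1, "Riyadh", "Cash Van", "product 1"): 15,
    (2019, 1, "Riyadh", "Key Account", "product 1"): 220,
    (2019, 2, "Riyadh", "Key Account", "product 1"): 75,
    (2020, 1, "Riyadh", "Key Account", "product 1"): 90,
}


def split_by_year(totals, channel, train_year, test_year):
    """Return (train, test) dicts mapping (city, product) to four quarterly totals."""
    train, test = {}, {}
    for (year, quarter, city, chan, product), quantity in totals.items():
        if chan != channel:
            continue  # keep only the channel under test
        if year == train_year:
            bucket = train
        elif year == test_year:
            bucket = test
        else:
            continue  # ignore years outside the split
        # Quarters with no recorded sales stay at zero.
        series = bucket.setdefault((city, product), [0, 0, 0, 0])
        series[quarter - 1] += quantity
    return train, test


train, test = split_by_year(quarterly, "Key Account", 2018, 2019)
print(train)
print(test)
```

Splitting by calendar year rather than at random keeps the test data strictly later than the training data, which avoids leaking future sales into the model.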
A model validation method was implemented in order to compare the performance of the models. After all of the forecasting models had been identified and developed, the performance measures for validating and comparing them were computed in Python using the following equations:
• Root mean square error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} E_i^{2}}$, where $\sum$ denotes summation, $E_i = \text{actual value} - \text{forecast value}$ for observation $i$, and $n$ is the sample size.
• Mean absolute percentage error, $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|E_i|}{A_i} \times 100$ [7], where $A_i$ is the actual demand value for observation $i$.
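The two error measures can be sketched in plain Python as below. This is an illustrative implementation using the standard definitions of RMSE and MAPE, not the authors' code; the function names and the sample numbers are hypothetical. Raising an error when an actual value is zero is a design choice made here, since MAPE is undefined in that case (for example, a quarter with no sales).

```python
import math


def rmse(actual, forecast):
    """Root mean square error of paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Assumption made here: refuse zero actuals instead of skipping them.
        raise ValueError("MAPE is undefined when an actual value is zero")
    # abs(a) keeps each term non-negative; it equals A when demand is positive.
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical quarterly demand for one product in one city.
actual = [100, 200, 400, 500]
forecast = [110, 190, 400, 450]
print(round(mape(actual, forecast), 2))
print(round(rmse(actual, forecast), 2))
```

RMSE is expressed in the same units as the demand and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare cities with very different sales volumes.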
Several features potentially play a significant role in forecasting product sales. Separating the sales by channel and city revealed that each of them has unique purchasing trends. This justifies separating out these factors and showing the results of the models for each product by city and by channel. The results for multiple cities across one channel using the LSTM model are presented in Section V-A, while Section V-B describes the results when the SVM model was applied. Section V-C discusses the results for both models.

A. LSTM
As explained, time is an important factor in prediction models. Each year's data was split into quarters to observe the accuracy of the forecasting models' output in relation to actual sales. Each algorithm was executed separately to predict sales of the top five products in the given year. Fig. 3 shows the actual values (coloured solid lines) and the predicted values (dotted lines) of chocolate product demand for these cities. The actual and predicted values are almost identical, indicating the accuracy of the model's output. As shown in Fig. 3, the LSTM model indicates that Product 2 saw the highest demand in the capital city, Riyadh (RUH), during quarters 1, 2 and 4; during quarter 3, however, Product 2 saw the highest demand in Dammam. Notably, no units of Product 4 were purchased in Tabuk during quarter 2.

B. SVM
The second model used for sales demand forecasting in this study was SVM. Fig. 4 shows the SVM sales forecasts for products per quarter. Product 2 had the most demand in Dammam during quarter 4. During quarters 2 and 3, Product 2 saw the most demand in the capital city, Riyadh (RUH). The graph indicates that the SVM model is less accurate, because its predicted values are higher than the actual values during quarter 3.

C. Discussion
The key research question for this study concerned which algorithm could better forecast demand, LSTM or SVM. A Support Vector Machine (SVM) is a supervised learning algorithm suited to small amounts of data, and in this study it was less accurate than the LSTM. Long Short-Term Memory (LSTM) is a deep learning algorithm that performs effectively on large amounts of data. The overall results show that the LSTM model is more accurate than the SVM model, which can be attributed to the LSTM's ability to retain information from earlier in the sequence. Overall, LSTM performed better than SVM across all scenarios. However, both algorithms are useful in forecasting demand and, when used together, provide a more comprehensive picture.
Performance statistics such as MAPE and RMSE enable the forecasting models to be evaluated. For this study, the algorithm with the lowest RMSE and MAPE is the most effective. As explained above, city and distribution channel are important variables and significant factors in the sales forecasting models. Therefore, Fig. 5 shows a comparison between the LSTM and SVM models by city. Fig. 5 indicates that the LSTM forecasts are better than the SVM model's, because the MAPE and RMSE values are lower for the LSTM model than for the SVM model. Additionally, Table IV shows all of the MAPE and RMSE values. Riyadh and Jeddah have the lowest MAPE and RMSE values compared with other cities (blue line). This is because Riyadh is the capital and largest city of the Kingdom of Saudi Arabia, and Jeddah is also among the largest cities. Because of their size, demand for products is higher than in other cities; the number of sales transactions in these two cities creates a significant amount of data, and both cities had daily sales in the dataset. Since the forecasting models were trained to look for trends in large numbers of sales, this might have helped them to reduce the forecast error in these cities. The values shown in Fig. 5 were obtained by taking the values of all four quarters for each city. In order to evaluate the performance of this study's LSTM and SVM models, a comparison against previous studies was undertaken. Table V lists those previous studies (in blue) which used similar factors to this study, such as store location. Other studies used general sales with product category as the main factor for the LSTM and SVM models. This study obtained lower MAPE values than previous studies using LSTM (in green). However, this study's results using the SVM model (in yellow) are comparable with [7]. This study selected the most commonly used algorithms for predictions [10].
Looking at the sequence of charts depicting the results, the predicted and actual values agree closely. Accurately predicting demand will help businesses to make better decisions and consequently save food, generate increased revenue, and solve food supply issues. This study shows that implementing LSTM and SVM models for real-life food items and the retail market helps to reduce forecast error, improve daily retail inventory and increase product sales. This will help small businesses to reduce the risk of particular items going out of stock and to optimize their sales.

VI. CONCLUSION AND FUTURE WORK
In an era where information and data are increasingly available, ML is an important tool from which industries can greatly benefit to future-proof their supply chains. The ability to accurately forecast demand helps distribution companies to manage their supply chains more effectively. This study presents two models for a distribution company in the food industry, using the LSTM and SVM algorithms to forecast demand for products across a variety of factors. In particular, the demand forecasting models were applied at the level of individual factors such as city and distribution channel. The evaluation of the experiment showed that the LSTM model outperformed SVM. In general, the findings demonstrate that the LSTM model reduces forecasting errors by up to 77% compared to the SVM model. This study has generated key insights concerning the sales of chocolate products within different cities of Saudi Arabia. Sales promotions are one of the most common phenomena in the retail industry. Special events such as marketing campaigns or holiday promotions are examples of valuable retail data that are often not incorporated into single-variable statistical forecasting models. Currently, this study only takes standard sales patterns into account when forecasting demand, so future work needs to examine promotion sales as an independent factor. Furthermore, the dataset used here included the company's sales records for 2020, a year in which the COVID-19 pandemic had a significant negative impact on almost all industries. Future work will extend the sales analysis depicted here to understand how the pandemic affected standard sales behaviour.