Supply Chain Network Model using Multi-Agent Reinforcement Learning for COVID-19

COVID-19 vaccination management in Japan revealed many problems. The number of vaccines available was clearly smaller than the number of people who wanted to be vaccinated. Initially, vaccination was managed through reservations by age group using vaccination coupons. After the second round of vaccinations, only vaccination dates were coordinated by appointment, and vaccination sites were set up in Shibuya Ward where the vaccine could be received freely. Under a shortage of vaccine supply, the inability to make appointments arose from a failure to estimate demand properly. In addition, vaccines expired because of inadequate inventory management and were discarded. This can be regarded as a supply chain problem in which supply could not be matched to demand. This paper examines whether, in the real world, shortages and stock discards can be avoided by replacing centralized management with a decentralized management system that allows simple on-site inventory control. Based on a multi-agent model, a model was created that redistributes inventory to clients by predicting future shortages from demand fluctuations and past inventory levels. The model was constructed for the Kanto region. The validation results showed that the number of discards was reduced by about 70% and out-of-stocks by about 12% as a result of learning decentralized management and out-of-stock forecasting.


I. INTRODUCTION
Vaccination against COVID-19 is managed through a vaccination ticket and vaccination reservation system, and priority is given based on the risk of serious illness and the need to secure the medical care system, because the amount of vaccine that can be secured is limited and its supply is expected to be sequential. The Ministry of Health, Labour and Welfare also prioritizes vaccinations based on the risk of severe cases of the disease and the availability of healthcare [1]. In July 2021, the amount of vaccine supplied by the national government to local governments became significantly insufficient. This caused some local governments to temporarily suspend vaccination appointments, and medical institutions whose vaccine supply was reduced were forced to coordinate with applicants who had made reservations to postpone their vaccinations [2]. At the same time, about 2.2 million doses of vaccine were discarded due to expiration [3]. In other words, some areas disposed of vaccines because of inadequate inventory management while other areas experienced shortages. The essence of these problems is that inventory management and demand forecasting were not properly carried out. Centralized control by the government or the relevant ministries alone is not sufficient to manage the situation.
When the issue of vaccine shortage and disposal became widely recognized in the press, the measures discussed focused on storage methods based on pharmaceutical knowledge of vaccines and on proposals regarding the vaccination system, and the Ministry of Health, Labour and Welfare provided an explanation of how to secure the vaccination system [4]. However, very few studies have addressed the supply chain management issues involved, namely the increased demand for vaccines caused by the growing number of COVID-19 cases and the inventory management needed to meet this demand.
This paper creates two models of inventory management and shipping planning: a centralized management model and a decentralized management model. The centralized management model ships vaccines based on demand from each municipality. The decentralized management model is a model in which each municipality uses reinforcement learning to manage inventory and forecast demand [5]. In this model, a vaccination site with sufficient inventory provides inventory to a vaccination site with insufficient inventory. The model is designed to verify how effectively vaccines can be used when each municipality takes the initiative in inventory management and shipping.

II. PRIOR RESEARCH
Supply chain research has pursued various objectives, such as avoiding shortages, reducing excess inventory, and reducing costs, using various approaches, including design and model proposals based on engineering knowledge [6], mathematical optimization of risk management methods [7], and realistic simulation models focusing on lead time [8]. These approaches have achieved some of their goals.
In recent years, machine learning, and AI more broadly, has been widely used as a problem-solving method. For example, as a proactive approach to supply chain problem solving, research has assigned various planning tasks to machine learning in the design (long-term strategy), planning (medium-term and short-term strategy), and execution (operational level) stages [9]. For passive factors, research has sought to make strategic decisions based on machine learning forecasts of environmental changes such as demand fluctuations [10]. This paper examines the effectiveness of vaccine inventory exchanges from a supply chain management perspective by applying reinforcement learning to changes in vaccine demand at vaccination sites.

A. Building the SCM model in MAS
Supply Chain Management (SCM) involves many players, generally ranging from producers to retailers. Each player collects information to form its sales strategy by managing service, inventory, and cost. Although these players are under the control of an upper headquarters office, they make their own decisions within their area of responsibility [11] [12].
Multi-Agent Simulation (MAS) deals with the coordination of behavior in a set of autonomous intelligent agents [13]. Such a simulation can lead to a global optimum for the collection as a whole while each player decides how to act on its own.
The characteristics of the vaccine supply chain are close to those of a Multi-Agent Simulation. A supply chain model built as a multi-agent system makes it possible to analyze what kind of supply chain management is appropriate.
The agent-based model of supply chain management in this study is built with artisoc3.0. This software is based on Java and is specialized for multi-agent simulation [14].

B. The Way to Apply Reinforcement Learning
In this study, the concept of Q-learning was applied to COVID-19 vaccine inventory management in each agent. Several studies have used Q-learning to manage inventory with expiration dates [23] [24] [25]. The agents in a competitive supply chain make their decisions individually in a distributed environment, independently of one another. At the same time, they must coordinate their actions [26] [27]. Supply chain management therefore needs to balance decisions between centralized management and decentralized management.
The state in Q-learning is defined with the aim of avoiding stock-outs. In supply chain management, there is a value called the safety stock quantity, which is the minimum amount of inventory that should be maintained to avoid running out of stock. The target state is one in which the inventory quantity always exceeds the safety stock quantity.
The action in Q-learning is the selection of a supplier and the order amount. Each agent has two ways to select a supplier. One is to order from an upper supplier, which ships regularly and in limited quantities. The other is to request surrounding vaccination sites to provide vaccine inventory if they have a surplus. Acquiring vaccine inventory from the upper supplier takes time, but the agent is certain to receive an amount of vaccine inventory.
For orders between agents of the same type, it is not certain that the other agent has vaccine inventory available, but if it has a surplus, the requesting agent can obtain inventory immediately.
The reward in Q-learning is based on the amount of inventory in excess of the safety stock quantity. If the agent collects too much inventory, some of it might expire and be discarded. Excess inventory would also unnecessarily increase transportation requirements through shipping to other agents. For these reasons, it is important to maintain an appropriate amount of inventory.
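The state, action, and reward described above can be sketched as a tabular Q-learning update. This is only an illustration: the paper does not give the learning rate, discount factor, exploration rate, state discretization, or exact reward formula, so `ALPHA`, `GAMMA`, `EPSILON`, the two-valued state, and the penalty on the distance from the safety stock are all assumptions made here.

```python
import random
from collections import defaultdict

# Assumed hyperparameters; none of these values appear in the paper.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
# The two supplier choices described in the text.
ACTIONS = ("order_upstream", "request_neighbor")

def state_of(stock, safety_stock):
    """Assumed discretization: is inventory above the safety stock quantity?"""
    return "above_safety" if stock > safety_stock else "at_or_below_safety"

def reward_of(stock, safety_stock):
    """Assumed reward shaping: penalize both a shortfall and a large excess."""
    return -abs(stock - safety_stock)

def choose_action(q, state, rng=random):
    """Epsilon-greedy selection between the two supplier actions."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state):
    """Standard update: Q <- Q + alpha * (r + gamma * max_a' Q(s', a') - Q)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return q[(state, action)]

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

A richer state (for example, days until expiry or recent demand) would fit the same update rule; the two-valued state is used only to keep the sketch short.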

C. Basic Structure of the Agent Model
There are three types of supply chain agents to be constructed in this study: government agents, local government agents, and vaccination site agents. Each of these agents behaves autonomously and has the ability to collect information, process information, make decisions, and act on its own.
The government agent can ensure the stock of vaccines on a regular basis. Based on the amount of vaccines demanded by the local government agents, the government agent ships vaccines to the local government agents. In this case, priority is given to the areas with large demand.
The local government agent ships vaccine stocks based on the quantity requested by the vaccination site agents. In this case, priority is given to the locations with the largest demand.
The vaccination site agents consume vaccine inventory by administering vaccinations. The number of vaccinations (demand) over the past 100 days is recorded, and the amount of vaccine requested from the local government agents is calculated based on fluctuations in consumption.
The deadline for vaccine consumption is assumed to be 40 days after the government agent secures the vaccine.
The simulation advances one step per day and is run over a five-year period.
The relationship between the agents is shown in Fig. 1.
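The daily step and the 40-day consumption deadline can be illustrated with a small lot-based inventory. This is a minimal sketch, not the artisoc implementation: the class name `LotInventory`, the oldest-first consumption order, and the choice to treat a lot as expired once 40 days have elapsed are assumptions made here.

```python
from collections import deque

SHELF_LIFE_DAYS = 40  # from the text: deadline is 40 days after the vaccine is secured

class LotInventory:
    """Hypothetical per-site inventory that tracks lots by the day they were secured."""

    def __init__(self):
        self.lots = deque()   # each lot is [secured_day, doses]; assumed received oldest first
        self.discarded = 0    # doses thrown away after expiring
        self.shortage = 0     # demand that could not be served

    def receive(self, secured_day, doses):
        if doses > 0:
            self.lots.append([secured_day, doses])

    def step(self, today, demand):
        """Advance one simulated day: discard expired lots, then serve demand."""
        # Assumption: a lot is unusable once 40 days have elapsed since it was secured.
        while self.lots and today - self.lots[0][0] >= SHELF_LIFE_DAYS:
            self.discarded += self.lots.popleft()[1]
        remaining = demand
        while remaining > 0 and self.lots:
            lot = self.lots[0]            # assumed: oldest lot is used first
            used = min(lot[1], remaining)
            lot[1] -= used
            remaining -= used
            if lot[1] == 0:
                self.lots.popleft()
        self.shortage += remaining

    @property
    def stock(self):
        return sum(doses for _, doses in self.lots)
```

Calling `step` once per simulated day accumulates the two quantities the study compares, discarded doses and unmet demand.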

1) Demand and supply:
Each vaccination site vaccinates a random number of 0 to 50 persons per day. Random variation is used because, where vaccinations are given by appointment through the reservation system, the number of people wishing to be vaccinated sometimes reaches the daily limit on consecutive days. In addition, some vaccination sites that did not implement the reservation system had far more applicants than initially expected, resulting in shortages [28]. Conversely, there are days when the number of applicants for vaccination does not reach the allowable daily dose. Beyond such fluctuations in demand, there is also a steady-state shortage of vaccine throughout the country. To account for these fluctuations in demand and shortage conditions, the total vaccine supply in the simulation was set to 100,000 doses per 30 days, while the total demand was set to exceed this number. Since the total number of vaccination sites is 203, the average number of vaccine demand over the 30-day period is 100. The average number of vaccines in demand over a 30-day period is 203 sites × 30 days × 25 doses = 152,250 doses.
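The 152,250-dose figure can be reproduced with a short calculation. The sketch below assumes that daily demand is drawn uniformly from the integers 0 to 50, so the mean is 25; the paper states only that demand is random within that range. The constant and function names are illustrative.

```python
import random

NUM_SITES = 203               # vaccination sites in the model
DAYS = 30                     # length of one supply period
MAX_DAILY = 50                # upper bound of daily demand per site
SUPPLY_PER_30_DAYS = 100_000  # doses secured every 30 days

# Mean of a uniform integer draw on 0..50 (assumed distribution).
mean_daily = MAX_DAILY / 2
expected_demand = NUM_SITES * DAYS * mean_daily      # 203 * 30 * 25
expected_shortfall = expected_demand - SUPPLY_PER_30_DAYS

def sample_total_demand(rng):
    """Draw one 30-day demand total across all sites."""
    return sum(rng.randint(0, MAX_DAILY) for _ in range(NUM_SITES * DAYS))

sampled = sample_total_demand(random.Random(0))
print(expected_demand, expected_shortfall)  # 152250.0 52250.0
```

Under this assumption, expected demand exceeds supply by roughly half of the supply, which is the steady-state shortage the simulation is meant to reproduce.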
2) Vaccine supply method: Local governments distribute vaccines based on demand at vaccination sites. The number of vaccines distributed to each site is given by the following formula. The lead time is 1 day.

$$\text{amount of shipping to a vaccination site} = \text{stock in local government} \times \frac{\text{demand at that vaccination site}}{\text{sum of demand in all vaccination sites}} \qquad (1)$$

The amount of vaccine shipped by the government to local governments is based on the number of vaccination sites under each local government. It is given by the following formula.

$$\text{amount of shipping to each local government} = \text{government stock} \times \frac{\text{number of vaccination sites in local government}}{\text{number of all vaccination sites}} \qquad (2)$$

The government is assumed to obtain 100,000 doses of vaccine every 30 days.
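Equations (1) and (2) share the same proportional form, so one helper can illustrate both. In the sketch below, the function name `allocate_proportionally`, the prefecture and site labels, and the site counts and demands are made up for illustration. The result is left as a fractional number of doses because the paper does not say how amounts are rounded.

```python
def allocate_proportionally(stock, weights):
    """Split `stock` across recipients in proportion to `weights`.

    For Eq. (1) the weights are site demands; for Eq. (2) they are the
    number of vaccination sites under each local government.
    """
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: stock * w / total for name, w in weights.items()}

# Eq. (2): government -> local governments (illustrative site counts).
sites_per_prefecture = {"A": 60, "B": 40}
to_prefectures = allocate_proportionally(100_000, sites_per_prefecture)

# Eq. (1): one local government -> its sites (illustrative demands).
site_demand = {"s1": 30, "s2": 10}
to_sites = allocate_proportionally(to_prefectures["B"], site_demand)

print(to_prefectures)  # {'A': 60000.0, 'B': 40000.0}
print(to_sites)        # {'s1': 30000.0, 's2': 10000.0}
```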

1) Differences from the centralized management model:
This model uses the same demand specification and vaccine supply method as the centralized model. The difference is that each vaccination site can also receive vaccine from other sites. When a vaccination site estimates that it may run short of vaccine, it inquires at other vaccination sites, starting with the nearest, whether they have sufficient stock. If another vaccination site determines that it can afford to share its vaccine stock, it ships vaccine to the requesting site. In this case, the vaccination sites themselves calculate and set their own order timing and order quantity based on the results of reinforcement learning on fluctuations in vaccination demand and stock expiration dates.
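The nearest-first inquiry can be sketched as follows. The paper does not specify how distance is measured, how a site decides that it can afford to share, or whether one request may be filled by several donors. The sketch therefore assumes straight-line distance, a fixed `reserve` that each site keeps for itself, and partial fulfilment across donors. `Site` and `request_from_neighbors` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    x: float
    y: float
    stock: int
    reserve: int  # assumed: doses the site keeps for its own expected demand

    def surplus(self):
        """Doses this site is willing to share (never negative)."""
        return max(0, self.stock - self.reserve)

def request_from_neighbors(requester, others, needed):
    """Ask other sites nearest-first; each ships what it can spare until the need is met."""
    transfers = []
    by_distance = sorted(
        others, key=lambda s: math.hypot(s.x - requester.x, s.y - requester.y)
    )
    for donor in by_distance:
        if needed <= 0:
            break
        shipped = min(donor.surplus(), needed)
        if shipped > 0:
            donor.stock -= shipped
            requester.stock += shipped
            needed -= shipped
            transfers.append((donor.name, shipped))
    return transfers

a = Site("a", 0, 0, stock=5, reserve=20)
b = Site("b", 1, 0, stock=30, reserve=20)
c = Site("c", 5, 0, stock=100, reserve=20)
print(request_from_neighbors(a, [c, b], needed=25))  # [('b', 10), ('c', 15)]
```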
2) Reinforcement learning model: The demand for vaccines and the vaccine expiration dates over the past 100 days are recorded. These data are used in a regression analysis to estimate future inventory status and prevent vaccine shortages. The objective variable is the amount ordered from other vaccination sites, and the explanatory variables are the site's own demand and the number of days remaining before the expiration date of the vaccine it holds. The order quantity and the condition for placing an order are as follows.

$$\text{amount of order} = \text{average demand} \times (\text{number of days left until the vaccine use deadline} + \text{lead time}) \qquad (3)$$

$$\text{condition for order:} \quad \text{stock} < \text{number of days left until the vaccine use deadline} \times \text{average demand} \qquad (4)$$
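Equations (3) and (4) translate directly into two small functions. The sketch below uses a plain mean over the recorded demand history in place of the regression estimate, which is a simplification made here. The function names and the example numbers are illustrative.

```python
def average_demand(history):
    """Mean daily demand over the recorded window (the text records the past 100 days)."""
    return sum(history) / len(history) if history else 0.0

def should_order(stock, days_left, avg_demand):
    """Condition (4): order when stock cannot cover demand until the use deadline."""
    return stock < days_left * avg_demand

def order_amount(days_left, lead_time, avg_demand):
    """Equation (3): average demand times (days left + lead time)."""
    return avg_demand * (days_left + lead_time)

history = [25] * 100            # illustrative: constant demand of 25 doses per day
avg = average_demand(history)   # 25.0
if should_order(stock=200, days_left=10, avg_demand=avg):    # 200 < 250
    print(order_amount(days_left=10, lead_time=1, avg_demand=avg))  # 275.0
```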

IV. APPLICATION OF REAL DATA TO THE SIMULATION MODEL
This simulation model is based on cities in the Kanto region. The government agent is a single agent modeled on the Ministry of Health, Labour and Welfare, which manages the importation of vaccines. The local government agents are seven agents modeled on Tokyo, Kanagawa, Saitama, Chiba, Gunma, Ibaraki, and Tochigi prefectures, which distribute vaccines to each vaccination site. A total of 203 vaccination site agents, modeled on designated cities, cities, and special wards, administer vaccines. The basic relationship between the agents is shown in Fig. 2.

A. Results
The results of the modeling and simulation described above are given in Table I. Each simulation number corresponds to one five-year simulation, and "average in Simulation No." is the average over the experiments. Discarded number is the number of vaccine doses discarded. Vaccine shortage is the total vaccine demand that could not be met because no vaccine was in stock when vaccination was requested.

B. Considerations
Regarding the number of vaccine discards, the decentralized management model reduced the number of discards to about 70% of the centralized management model. Shortages were also reduced to about 12% of the centralized management model. These results indicate that more effective vaccination is possible when vaccines are exchanged among vaccination sites.
As for the cause of the discards, it was observed that when periods of extremely low demand occur consecutively, even vaccination sites that had stock shortages become fully stocked, resulting in a considerable amount of discarded vaccine. It was also observed that when demand increased after such a period of reduced demand, while inventory was small, there was an overall shortage of vaccine to meet the demand.

VI. CONCLUSION
In this paper, vaccine inventory management and shipping planning were simulated using a centralized management model and a decentralized management model to clarify how vaccine demand can be managed and how shortages can be avoided. Compared to the centralized management model, which caused problems in reality, the decentralized management model verified in the simulation reduced discards of expired vaccines by approximately 70% and vaccine shortages by approximately 12%. It is estimated that the ability to exchange vaccines among nearby vaccination sites, where vaccines are consumed, will greatly reduce the number of discards and the possibility of non-vaccination due to vaccine shortages.

VII. FUTURE RESEARCH ISSUES
The next challenge is to develop supply chain management that takes multiple product elements into account and can streamline supply chains involving more variables.
In Japan today, prices for food and many other goods are rising. It is also said that security of supply is needed for many items, such as rare metals, semiconductors, oil, and wheat. However, these problems are treated as vertical issues, such as price increases for raw materials or processing costs. Many factors must be considered across the board to solve the essential problems that affect the final product.