Research on the Academic Early Warning Model of Distance Education based on Student Behavior Data in the Context of COVID-19

Abstract: The COVID-19 epidemic has had a great impact on the entire society, and the spread of the novel coronavirus has brought considerable inconvenience to the education industry. To ensure the sustainability of education, distance education plays a significant role. During distance education, it is necessary to examine the learning situation of students. This study proposes an academic early warning model based on long short-term memory (LSTM), which first extracts and classifies students' behavior data and then uses an optimized LSTM to establish an academic early warning model. The precision of the optimized LSTM algorithm is 0.929, the recall is 0.917 and the F value is 0.923, and it converges better than the basic LSTM algorithm. In the actual case analysis, the accuracy of the academic early warning system is 92.5%. The LSTM neural network shows high performance after parameter optimization, and the academic early warning model based on LSTM also achieves high accuracy in the actual case analysis, which supports the feasibility of the established academic early warning model.


I. INTRODUCTION
Due to the impact of COVID-19, teachers' teaching methods and students' learning models in many regions have shifted to distance education [1]. The online learning mode of distance education relies on computers and related technologies. This new teaching mode requires students and teachers to change traditional teaching methods and learning concepts, and to adapt to online education through new teaching and learning practices [2]. Distance education ensures the continuity of education and provides an effective way for students to learn professional knowledge [3]. Although distance education provides students with a new way of learning, online education has obvious problems that weaken its effect. First of all, in online education, students lack the learning atmosphere of offline teaching and teachers find it difficult to manage students' learning behavior, so students generally show a lack of learning initiative. Some research shows that keeping the classroom active during online learning is conducive to improving students' academic performance [4].
In addition, a survey of students in distance education found that the learning level and self-confidence of students in the online education environment are lower than those in the offline environment [5]. This makes it very challenging for students to complete their homework successfully. To solve these problems and improve the teaching effect of online teaching, this research uses artificial intelligence technology to establish a learning management system that evaluates students' academic achievements and homework [6]. A reasonable evaluation method can improve students' confidence in learning, and an effective evaluation method strengthens teachers' management of students. The research establishes an academic early warning model for online learning platforms using neural network technology, and confirms the validity of the model through experiments. The model can give early warning to students with poor learning conditions with high accuracy. The research is of positive significance to the improvement of academic performance and other comprehensive abilities, and it can help students successfully complete their studies.
II. LITERATURE REVIEW
Early warning systems have been effectively used in medical, ecological and other fields. Ginestra et al. established a sepsis early warning system using machine learning to help nurses and patients predict sepsis symptoms and severe septic shock. Experiments showed that the system can effectively raise alerts for sepsis symptoms, helping nurses and patients to detect them in advance [7]. The academic early warning system plays an important role in the analysis of students' learning situation. Through it, problems in students' learning can be flagged early, and intervention policies and measures can be proposed based on the obtained information and data to help students successfully complete their studies [8]. In the existing literature, researchers have investigated students' learning motivation and used a learning analytics dashboard to analyze students' learning motivation and academic performance. By taking students' academic performance as a reference, it is possible to analyze that performance intuitively through an early academic warning system. However, such early systems have a poor degree of automation, cumbersome operation and low accuracy, so they have not been popularized in practical applications [9].
*Corresponding Author. www.ijacsa.thesai.org
The application of machine learning technology in early warning systems can effectively improve the accuracy of early warning, and realize automatic monitoring and rapid response. Cho optimized and improved the early warning system on the basis of deep learning. The optimized early warning system had a higher accuracy rate and recall rate, and can be used to accurately judge patients' condition [10]. Lin et al. established a bridge scouring early warning system composed of the Internet of Things and artificial intelligence. The researchers encapsulated the water level sensor in a stainless-steel ball. The early warning system uses neural networks for deep learning and real-time recording according to changes in the water level [11]. In the application of students' academic early warning systems, Rashid et al. used a hybrid system including a recurrent neural network and the gray wolf optimizer to identify students' weaknesses, predict students' academic performance, and put forward corresponding suggestions. This improved academic early warning system can help teachers improve the teaching model and improve the learning experience of students at the same time. Although this model performs well, like most such systems it mainly serves offline teaching, and its application effect in online teaching is not good [12]. In general, academic early warning systems have been applied to a certain extent, but most of them serve offline teaching, and their accuracy and timeliness need to be improved.
LSTM is a neural network model from the field of machine learning. It can effectively store and analyze time series information with strong dependence on time. Among various forecasting models, LSTM-based hybrid models can be used for forecasting foreign exchange fluctuations. Researchers have used LSTM to optimize the forecasting model, and the experimental results show that the method combined with LSTM has high accuracy in foreign exchange volatility forecasting [13]. Combined with a convolutional neural network, LSTM retains its own advantages together with time-frequency information, and can be used for the prediction of remaining lifespan and to help solve health management problems [14]. In the early warning of industrial systems, Dorgo et al. used LSTM to predict rare operations in chemical technology when processing the information obtained by data mining, and at the same time analyzed time series information to predict the possible consequences of rare operations [15]. Because of its strong analytical ability, LSTM is also used in the field of education. Online education offers many courses with heterogeneous information, and students facing this volume of course data find it difficult to locate suitable courses. Researchers proposed a learning framework based on LSTM, and introduced an attention mechanism on top of LSTM to predict and recommend courses according to students' behavior data [16].
The LSTM neural network retains time-dependent time series information, which can improve the model when establishing early warning models. It is widely used in the establishment of early warning systems in medical, geological, economic and other fields. Students' performance and relevant data in the online learning process are clearly time series in nature, so the LSTM network is a suitable analysis tool. However, the literature review shows that LSTM has rarely been applied to academic early warning models for distance education. Therefore, this study proposes to use an LSTM neural network to analyze students' behavior data, and establishes an academic early warning model for students' learning evaluation in online education.

A. Academic Early Warning Model based on LSTM Neural Network
In a conventional recurrent neural network, information enters as the input signal $X_t$ and leaves as the output $h_t$, and the input and output are continuously circulated to preserve the information. However, the recurrent neural network cannot process long time series well. When the distance between the input and the output is large, the output quality of the network is reduced. LSTM improves the capacity of the recurrent neural network in time series processing, and stores long-term series information by increasing the complexity of the recurrent network structure [17]. The standard LSTM consists of three parts: input gate, output gate and forget gate, as shown in Fig. 1. Although LSTM alleviates the vanishing gradient problem of the traditional recurrent neural network, gradients still decay when it processes longer time series [18]. LSTM also has a large amount of computation and a deeper network, so it takes more time in computation and training [19]. Building on the strengths of the LSTM algorithm, this paper presents an academic early warning model of distance education based on student behavior data, and optimizes the method for the problems existing in LSTM. First, the relevant distance education data of students is collected and cleaned, and the cleaned data is organized into the format required for input. Second, high-order mapping of the data is performed, so that the experiment is less sensitive to how tidy the data is. Next, feature extraction is performed according to the data characteristics of students, and classification and training are performed. Then, the input excitation of the LSTM network is optimized, and the outputs of the optimized networks are cascaded and fused to improve the accuracy of the output. Fig. 2 shows the schematic diagram.
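As an illustration of the standard LSTM unit described above, the following NumPy sketch computes one time step with the input, forget and output gates. The function names and the stacked parameter layout (`W`, `U`, `b`) are assumptions made for this example; the sketch shows the textbook cell only, not the optimized network proposed in this paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    Assumed layout: W has shape (4H, D), U has shape (4H, H), b has shape (4H,),
    stacked in the order input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * np.tanh(c_t)         # new hidden state (output h_t)
    return h_t, c_t

# Usage: run a short random sequence through the cell.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the cell state is updated additively through the forget gate, information can be carried across many steps, which is the property the text relies on for long time series.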
To reasonably distribute the attention of administrators to students' studies and facilitate students' self-awareness of their own studies, the academic warnings are divided into green, yellow, orange and red warnings according to the actual situation. Different early warning methods are used for students at different levels in the early warning system, and factors that actually affect learning, such as family circumstances, are fully considered in the early warning process.

B. Research on Feature Extraction and Classification Methods
The deepening of datafication has reformed students' learning methods, and distance education has been widely used in the context of COVID-19. It is necessary to mine data such as students' achievements in distance education. Based on the distance education network database, the scores of each subject of the students are collected in the cloud. The score table of each student is generated using Excel, and each student is abstracted into a class containing variables such as score information and corresponding credits. See formulas (1) and (2) for the average score and the weighted average score of the students' original grades. The average score in formula (1) is

$$M = \frac{a_1 + a_2 + a_3 + \cdots + a_n}{n} \tag{1}$$

Where $a_i$ represents the score of the corresponding subject, the number of subjects is $n$, and the weight value of the subject score is $m$. The variance of the student's grade can be obtained through the above formula, as shown in formula (3).
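The three grade statistics mentioned above can be sketched as follows. The weighted average is assumed here to be normalized by the sum of the weights (credits), and the variance is assumed to be the population variance around the plain average; these are common definitions and may differ in detail from formulas (2) and (3). The function name `grade_summary` is hypothetical.

```python
import numpy as np

def grade_summary(scores, weights):
    """Return (average, weighted average, variance) of one student's scores.

    Assumptions: weights are the subject weights m (e.g. credits), the weighted
    average divides by the sum of weights, and the variance divides by n.
    """
    a = np.asarray(scores, dtype=float)
    m = np.asarray(weights, dtype=float)
    mean = a.sum() / a.size
    weighted_mean = (a * m).sum() / m.sum()
    variance = ((a - mean) ** 2).sum() / a.size
    return mean, weighted_mean, variance

# Usage: three subjects with credits 2, 3 and 1.
mean, weighted_mean, variance = grade_summary([80, 90, 70], [2, 3, 1])
```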
The calculated average scores and weighted average scores of the corresponding subjects of the students are added to the abstract class. The grades and credits corresponding to each subject are saved as float data, and the average scores and weighted average scores are saved as double data. In distance education, in addition to students' subject scores, students' online course learning information and other information are also collected for academic evaluation. The corresponding variables in the formed class are expanded to generate a 9 × N source data matrix. However, information loss easily occurs during data collection. In this experiment, the RBF kernel function is used to process the data. In this method, low-dimensional data is mapped to high-dimensional data and the vector operations are carried out in the high-dimensional space, so as to solve the problem of accuracy decline caused by missing data. Data mapping sidesteps the problem of linear inseparability while preserving the integrity of the data. In the source data matrix, the information vector of each row of students is mapped to the high-dimensional space using formula (4) to generate the corresponding vector label. In formula (5), the kernel parameter (gamma) of the RBF kernel function is denoted $\gamma$. Through function mapping, the vacant entries in the source data matrix can be filled to obtain a full data matrix. In the evaluation of academic tests, the normal distribution can be used to grade outstanding and lagging performance, so as to determine the corresponding number of people, score lines, and winning rates.
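For reference, the widely used form of the RBF kernel is $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. The sketch below evaluates it for every pair of student vectors; it assumes one student per row and does not reproduce how the paper uses the mapping to fill missing entries. The function name `rbf_kernel_matrix` is hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Pairwise RBF kernel exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Usage: two toy student vectors at squared distance 25.
K = rbf_kernel_matrix([[0.0, 0.0], [3.0, 4.0]], gamma=0.1)
```

A larger gamma makes the kernel more local: vectors must be closer to be considered similar.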
When the students' grades conform to the normal distribution, the RBF function in formula (6) is used to map the grade information to the high-dimensional space for feature extraction of the multi-dimensional normal distribution. Students who meet the multi-dimensional normal distribution are assigned to the standard class, and students who do not meet it are classified as singular.
Where the mean value is denoted $\mu$ and $k$ is a constant.
In the classification based on multi-dimensional normal distribution, data feedback is used to adjust the parameters of multi-dimensional normal distribution, as shown in Fig. 3.
The selection of the threshold in the normal distribution has an important impact on the accuracy of network training results. The optimal value of the threshold parameter is selected by iterating the parameter with gradient descent. After repeated runs of the gradient descent method, the global optimal solution of the average threshold can be obtained, which is then normalized by formula (7).

$$X = \frac{Y - Min}{Max - Min} \tag{7}$$

Where $Y$ represents the matrix that needs to be normalized, $X$ represents the matrix obtained after normalization, and $Max$ and $Min$ represent the maximum and minimum values of all elements in $Y$, respectively.
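A minimal sketch of min-max normalization consistent with the description of $Y$, $X$, $Max$ and $Min$ is given below. Taking the extremes over all elements of the input matrix, and returning zeros for a constant matrix, are assumptions of this example.

```python
import numpy as np

def min_max_normalize(Y):
    """Scale all elements of Y linearly into [0, 1]."""
    Y = np.asarray(Y, dtype=float)
    lo, hi = Y.min(), Y.max()
    if hi == lo:
        # Constant matrix: the range is zero, so return zeros by convention.
        return np.zeros_like(Y)
    return (Y - lo) / (hi - lo)

# Usage: scores between 50 and 100 are mapped to [0, 1].
X = min_max_normalize([[50.0, 75.0], [100.0, 60.0]])
```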

C. Research on Early Warning Algorithm based on Improved LSTM
LSTM performs well in processing long time series information. In distance education, the information is recorded as a long time series, which shows a strong dependence on time. A multilayer perceptron (MLP) network is easy to model, and its neurons can be constructed according to the actual situation [20]. In this experiment, the MLP network is used to train on students' grades. The MLP-based LSTM network uses time to divide the length of information in different dimensions, which reduces the complexity and polymorphism of the information while retaining data features. The other data is back-propagated using the function to increase the speed at which the network operates. At the same time, the LSTM neural network is used to train on the other data, and a weighting function is introduced to address the problems of vanishing and exploding gradients. Then, cascade fusion is applied to the training of students' behavioral data to further improve the accuracy of prediction results and solve the problem of academic early warning.
By using the multi-dimensional normal distribution to extract features from the student data matrix, two matrices, one for the singular class and one for the standard class, can be obtained. Data with high time-series dependence is more sensitive to time, and the LSTM with an adaptive excitation function is used to train it. The schematic diagram of the neural network unit is shown in Fig. 4. The sum function is introduced into the adaptive excitation function, and the expressions of relu and tanh are shown in formula (8).

$$\mathrm{relu}(x) = \max(0, x), \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{8}$$

The formulas of the current-round information carrier, the current-round state quantity output and the current-round output can then be obtained, as shown in formulas (9) to (11).
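One possible reading of an adaptive excitation function built from relu and tanh is a weighted average of the two, sketched below. The exact form of the paper's function and how its weights are chosen are not recoverable from the text, so the normalization by the sum of the weights and the names `adaptive_activation`, `w_relu` and `w_tanh` are assumptions for illustration only.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, max(0, x) element-wise."""
    return np.maximum(x, 0.0)

def adaptive_activation(x, w_relu, w_tanh):
    """Weighted average of relu(x) and tanh(x) (assumed form)."""
    total = w_relu + w_tanh
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = np.asarray(x, dtype=float)
    return (w_relu * relu(x) + w_tanh * np.tanh(x)) / total

# Usage: equal weights give the plain average of the two activations.
x = np.array([-1.0, 0.0, 2.0])
y = adaptive_activation(x, w_relu=1.0, w_tanh=1.0)
```

With `w_tanh = 0` the function reduces to relu and with `w_relu = 0` to tanh, so the weights interpolate between an unbounded and a saturating activation.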
The adaptive excitation function is the weighted average of the weighted component functions. The adaptive excitation function in formula (12) is used to excite the incoming data in each gate, the data with high time series dependence is trained, and the optimal data training results are obtained.
RProp is a resilient backpropagation algorithm, and the MLP network with RProp can be used to train data with low time-series dependence, such as the scores of various subjects. First, an initial value is assigned to the weight change (step size), and its acceleration and deceleration factors are determined. Then, depending on whether the sign of the error gradient changes or stays the same between iterations, the training is decelerated or accelerated respectively to ensure stable convergence of the system. Finally, backpropagation is performed by combining the sign of the error gradient with the changing step size. When the sign of the error gradient is negative, the weight needs to be appropriately increased to approach the minimum value, as shown in formula (13).
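The RProp steps described above can be sketched as follows. This is a generic variant that skips the update after a sign change (often called iRprop-), applied to an arbitrary gradient function rather than to the paper's MLP; the default factors 1.2 and 0.5 and the step-size bounds are commonly used values, not values reported in this paper.

```python
import numpy as np

def rprop_minimize(grad_fn, w0, steps=200, delta0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_min=1e-6, delta_max=50.0):
    """Minimize a function with sign-based resilient backpropagation."""
    w = np.asarray(w0, dtype=float).copy()
    delta = np.full_like(w, delta0)      # per-weight step size
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        grad = np.asarray(grad_fn(w), dtype=float).copy()
        sign_product = grad * prev_grad
        # Same sign as last time: accelerate. Sign flipped: decelerate.
        delta = np.where(sign_product > 0,
                         np.minimum(delta * eta_plus, delta_max), delta)
        delta = np.where(sign_product < 0,
                         np.maximum(delta * eta_minus, delta_min), delta)
        # After a sign flip, skip this update by zeroing the gradient.
        grad = np.where(sign_product < 0, 0.0, grad)
        # Only the sign of the gradient is used: a negative gradient increases w.
        w -= np.sign(grad) * delta
        prev_grad = grad
    return w

# Usage: minimize sum((w - target)^2), whose gradient is 2 (w - target).
target = np.array([3.0, -2.0])
w_opt = rprop_minimize(lambda w: 2.0 * (w - target), [0.0, 0.0])
```

Because only the sign of the gradient is used, the update is insensitive to the gradient's magnitude, which is what keeps the step size stable when gradients are very small or very large.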
The neural network system performs classification operations, and then combines the output results and uses formula (15) to fuse the results, so as to present different academic warnings for students.
Where $p$ is the weight coefficient, $a$ represents the LSTM prediction result, $b$ represents the MLP prediction result, and $c$ represents the graduation probability.
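A simple way to fuse the two predictions with a weight coefficient $p$ is the convex combination $c = p\,a + (1 - p)\,b$. The sketch below uses that form and then maps the fused graduation probability to the four warning colors. Both the convex-combination form and the thresholds 0.8, 0.6 and 0.4 are assumptions for illustration; the paper does not state its thresholds in the surviving text.

```python
import math

def fuse(a, b, p):
    """Fuse LSTM prediction a and MLP prediction b with weight p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * a + (1.0 - p) * b

def warning_level(c, thresholds=(0.8, 0.6, 0.4)):
    """Map a graduation probability c to a warning color (hypothetical thresholds)."""
    green, yellow, orange = thresholds
    if c >= green:
        return "green"
    if c >= yellow:
        return "yellow"
    if c >= orange:
        return "orange"
    return "red"

# Usage: the LSTM branch predicts 0.9, the MLP branch 0.5, weighted 3:1.
c = fuse(0.9, 0.5, p=0.75)
level = warning_level(0.7)
```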

IV. SIMULATION ANALYSIS OF ACADEMIC EARLY WARNING MODEL BASED ON LSTM
The construction and training of the academic early warning system model of distance education is the key to the development of the system, and the accuracy and running time of the system are affected by the model construction and training results. Determining the number of layers and the number of nodes in the neural network is the main step that affects the construction of the system. In the experiment, the root mean square error (RMSE) and the mean absolute error (MAE) are used to verify the number of layers in the neural network. The LSTM-based academic early warning model is repeatedly trained on the training set, and the parameters of the model are continuously adjusted in the process. Fig. 5 shows the training trend of RMSE and MAE values when different models predict time-dependent time series information. The abscissa represents the number of iterations of model training, and LSTM-1 to LSTM-5 denote neural networks with 1 to 5 layers, respectively.
In Fig. 5(a), the number of iterations for LSTM-1 and LSTM-2 to start to converge is about 200. The minimum RMSE value of LSTM-1 with 1 neural network layer is 31.84, and the minimum RMSE of LSTM-2 with two neural network layers is 31.84. When the number of layers of the neural network increases to three, the RMSE value decreases more significantly. When the number of iterations is 225, LSTM-3 obtains the lowest RMSE value of 20.17, and the RMSE value remains at this level in subsequent iterations of training. When the number of layers of the neural network increases to four and five layers, the RMSE value has no obvious downward trend. In Fig. 5(b), the change trends of MAE value and RMSE value are consistent. The number of iterations when LSTM-1 and LSTM-2 begin to converge is about 200, the minimum MAE value of LSTM-1 is 22.91, and the minimum MAE value of LSTM-2 is 18.96. When the number of layers of the neural network increases to three, the downward trend of the MAE value is more significant. When the number of iterations is 225, LSTM-3 obtains the lowest MAE value of 15.17, and the MAE value remains at this level in subsequent iterations of training. When the number of layers of the neural network increases to four and five layers, the MAE value has no obvious downward trend. It can be seen from Fig. 5 that in the training data set, the lowest error values RMSE and MAE are obtained when the number of layers of the LSTM neural network is set to 3 layers. When the number of layers is one or five, underfitting or overfitting may be encountered.
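For reference, the two error measures used in this comparison can be computed as in the sketch below. The toy values are illustrative only and are unrelated to the figures reported above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    t = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((t - p) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    t = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(t - p)))

# Usage: errors of 1, 0 and -3 on three predictions.
r = rmse([3, 5, 7], [2, 5, 10])
m = mae([3, 5, 7], [2, 5, 10])
```

RMSE is never smaller than MAE on the same data and penalizes large individual errors more heavily, which is why the two curves can have different minima while following the same trend.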
When the LSTM neural network is embedded in the academic early warning model, the embedding size is also one of the important factors affecting the performance of the early warning model. In the experiment, the number of layers of the neural network is set to three, and each neural network layer has different embedding sizes. Fig. 6 shows the training variations of the RMSE and MAE values of the LSTM neural network layers under different embedding sizes. In Fig. 6, LSTM-100 represents a three-layer neural network with an embedding size of 500-250-100, LSTM-150 is 500-300-150, LSTM-200 is 500-350-200, LSTM-250 is 500-400-250, and LSTM-300 is 500-400-300. The number of iterations at which the LSTM-100 and LSTM-150 models start to converge in Fig. 6(a) ...
The basic LSTM algorithm and the optimized LSTM algorithm proposed in this experiment are comparatively analyzed in terms of precision (P), recall (R) and the F value. Table I shows the experimental results of the HTM algorithm, the basic LSTM algorithm and the optimization algorithm proposed in this experiment in time series state perception. In Table I, H represents the HTM algorithm, L represents the basic LSTM algorithm, and LSTM represents the prediction model based on the basic LSTM algorithm; C represents the system parameter CPU, M represents the system parameter MEM, and N represents the system parameter NET. The sensitivity of the algorithm to abnormal situations can be judged by the precision, the detection ability of the algorithm for abnormal situations can be judged by the recall, and the F value is used to evaluate the overall performance of the algorithm. The results in Table I show that the precision of the optimized LSTM algorithm is 0.929, the recall is 0.917 and the F value is 0.923, which are significantly higher than those of the basic LSTM algorithm.
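Assuming the F value is the usual harmonic mean of precision and recall (the F1 score), the reported figures can be checked with the short sketch below. The function names and the toy counts are hypothetical; the baseline values 0.886 and 0.812 are the ones quoted for the basic LSTM algorithm in the conclusion.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def f_value(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    return 2.0 * precision * recall / (precision + recall)

# Usage: check the reported F values from the reported precision and recall.
f_optimized = round(f_value(0.929, 0.917), 3)
f_basic = round(f_value(0.886, 0.812), 3)
p_toy, r_toy = precision_recall(tp=8, fp=2, fn=2)
```

Under this assumption the computed values agree with the reported F values of 0.923 and 0.847.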
The comparative experimental results show that the precision of the optimized algorithm is higher, so the probability of false positives is significantly reduced; its recall is also outstanding, so the probability of false negatives is significantly reduced; and because both precision and recall are higher, the overall performance of the optimized algorithm is higher. Based on the experimental parameters set above, the convergence rates of the basic algorithm and the optimized algorithm are compared.
Whether the model satisfies the convergence condition is mainly judged in three ways: by comparing the loss with a preset value, by checking the change in the weight values, and by comparing against the maximum number of iterations. The purpose of training the model is to reduce the loss value, so a threshold must be set in advance to determine whether the training is over. The preset threshold affects the convergence of the neural network model. When the loss value is less than the preset value, training ends. When there is no significant change in the weights between two adjacent iterations, the model can be considered to have converged, with no need to adjust the weights further. The maximum number of iterations is commonly preset in practice; the iteration budget is selected according to the actual situation, and the model is trained toward the optimal solution. Fig. 7 shows the comparison results of the convergence rate of the basic method and the optimized method in this experiment. It can be clearly seen that the convergence of the optimized method is significantly better than that of the basic method.
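The three stopping conditions described above (loss below a preset value, negligible weight change between adjacent iterations, and a maximum number of iterations) can be sketched as a generic training loop. The tolerances, the function names and the quadratic toy problem are assumptions for illustration and are not the settings used in the experiment.

```python
import numpy as np

def train_until_converged(step_fn, w0, loss_tol=1e-4, weight_tol=1e-8, max_iters=1000):
    """Run step_fn until one of the three stopping conditions is met.

    step_fn(w) must return (new_weights, loss). Returns the weights, the number
    of iterations used, and which condition stopped the training.
    """
    w = np.asarray(w0, dtype=float)
    for iteration in range(1, max_iters + 1):
        w_new, loss = step_fn(w)
        if loss < loss_tol:                           # loss below preset value
            return w_new, iteration, "loss"
        if np.max(np.abs(w_new - w)) < weight_tol:    # weights no longer change
            return w_new, iteration, "weights"
        w = w_new
    return w, max_iters, "max_iters"                  # iteration budget exhausted

def quadratic_step(w, lr=0.1, target=2.0):
    """One gradient descent step on the toy loss sum((w - target)^2)."""
    w_new = w - lr * 2.0 * (w - target)
    return w_new, float(np.sum((w_new - target) ** 2))

# Usage: the toy problem stops on the loss criterion well before the budget.
w_final, n_iters, reason = train_until_converged(quadratic_step, [0.0])
```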
Eighty college students in distance education were randomly selected for the comparative evaluation of the academic early warning system, and the results obtained by the academic early warning prediction model based on the optimized LSTM algorithm were compared with the actual situation. In Fig. 8, the abscissa represents the number of students, 0 on the ordinate represents not graduated, and 1 represents graduated. The figure shows that among the 80 college students, the predictions for six are inconsistent with the actual graduation outcome, so 74 of 80 predictions are correct and the accuracy rate of the academic warning system is 92.5%. In addition, the academic early warning system is used to predict students' enthusiasm for active learning in distance education, and the predicted value is compared with the real situation. Fig. 9 shows the results. During distance education, the number of submissions predicted by the academic early warning system follows the actual trend, which indicates that its predictions of learning enthusiasm are consistent with the actual situation. The academic early warning system in distance education designed in this experiment has high prediction accuracy and can be applied to students' academic early warning.
V. DISCUSSION
At present, the proportion of online education in the whole education industry is gradually increasing. In the online education environment, students' learning behavior is difficult to monitor. According to literature [4] and [5], students in online education generally lack initiative and learning self-confidence, and their overall performance is lower than in offline teaching. An early warning system detects signs of a negative situation and warns relevant personnel at its early stage or before it begins. The quality of current academic early warning systems is uneven.
The early warning system in literature [9] has been applied in some colleges and universities, but its operation has a low degree of automation, and its accuracy also has a large room for improvement. The early warning system in [12] and the proposed early warning model both use neural network models, but the performance of the latter is slightly higher than that of the former. In addition, the early warning model in literature [9] and [12] pays more attention to the academic early warning of offline teaching. When the teaching scenario is online, its performance will be further limited. The early-warning model proposed in this study is specially designed for online teaching, which has great application potential in the current social context. From the results of the performance test, the model proposed in this study also has a high degree of early warning accuracy and automation, which can be applied to online teaching scenarios.

VI. CONCLUSION
In the simulation analysis experiment of the academic early warning model based on LSTM, the number of layers of the neural network is set to three and the embedding size is set to 200 through parameter optimization and comprehensive consideration. The precision of the optimized LSTM algorithm is 0.929, the recall is 0.917 and the F value is 0.923, which are significantly higher than those of the basic LSTM algorithm, which has a precision of 0.886, a recall of 0.812 and an F value of 0.847. Comparing the results obtained by the academic early warning prediction model based on the optimized LSTM algorithm with the actual situation, the accuracy of the academic early warning system is 92.5%. In predicting students' enthusiasm for active learning, the number of submissions predicted by the academic early warning system is consistent with the actual trend. The LSTM-based academic early warning model proposed in this experiment performs well after parameter optimization, and also has high accuracy in actual case analysis, so it can be applied to academic early warning in distance education. This research discusses the application of the LSTM network in online teaching evaluation, and broadens the application field of the LSTM network. At the same time, the research results bring a practical model to the field of online education, which can improve the teaching effect through accurate and timely warning, and help students learn and teachers teach. However, the running time of the model was not verified, and it was not compared with other existing models in terms of running time. Therefore, the proposed model may not be the optimal solution in terms of time, and there is still room for further improvement. In follow-up work, the relevant experimental content needs to be further improved.