Patent Text Classification based on Deep Learning and Vocabulary Network

Abstract: Patent documents are a special kind of long text, and traditional deep learning methods extract their features poorly, so classification accuracy is lower than for ordinary text. To address this, this paper constructs a text feature extraction method based on a lexical network that captures the inner relationship between words and classifications. First, the relationship between words and classifications is obtained from the linear and probability dimensions, and the lexical network is constructed. Second, the lexical network features are fused with the features extracted by the deep learning model. Finally, the fused features are trained in the original model to obtain the final classification result. The method is a classification enhancement method: it can classify patent text on its own or improve the accuracy of various neural networks in patent text classification. Experimental results show that BERT combined with the lexical network method reaches an accuracy of 82.73%, and that combining the lexical network method with CNN and LSTM increases accuracy by 2.19% and 2.25%, respectively. In addition, the lexical network feature extraction method accelerates convergence during training and improves the classification ability of the model on Chinese patent texts.


I. INTRODUCTION
As the main carrier of scientific and technological information, patents contain about 90% of global technical information. In 2021, the number of valid Chinese invention patents was 3.597 million. Faced with a large number of patent applications, patent examiners spend a great deal of time on patent text review. Patent classification is an important part of patent review, and it reduces the time and cost of subsequent retrieval and management of patent documents. At present, the main patent classification used in China is the International Patent Classification (IPC) [1]. The IPC has five levels and is a multi-level, multi-label classification system. China currently classifies patents mainly by hand: after receiving patent application documents, patent examiners rely on rich professional knowledge to classify them accurately, so that the patent texts are easy to review and retrieve later. Given the massive volume of patent texts, intelligent and effective automatic classification of Chinese patent texts has become a focus of industry and academia.
Automatic text classification is one of the important fields in natural language processing (NLP). In recent years, there has been much research on text classification using machine learning and deep learning. Early text classification models generally used the support vector machine [2] [3], the Bayesian classifier [4] [5], the k-Nearest Neighbor algorithm [6] [7] and other machine learning methods [8]. However, these machine learning methods rely on manually constructed features and ignore the context between words and sentences, which makes it difficult to improve their accuracy. After Hinton et al. [9] first proposed the concept of deep learning in 2006, scholars showed that deep learning performs well in text classification. Since then, deep learning has gradually replaced traditional machine learning as the mainstream research direction in text classification. Building on previous studies and on an analysis of a large amount of patent data, this paper proposes a deep learning feature extraction method based on a vocabulary network, which improves the accuracy of deep learning on patent text classification by fusing the extracted features with those of the deep learning model. In practice, this method combined with deep learning yields a measurable improvement in Chinese patent text classification. The contributions of this paper are as follows.

1) In this study, a method that combines deep learning with vocabulary network feature extraction is proposed theoretically.
2) It is proved in practice that this method combined with other deep learning models can speed up the model convergence and improve the accuracy of model classification.
The rest of this paper is organized as follows. Section II introduces the related work and our contributions. Section III introduces our research framework. Section IV selects text data to verify the method and describes the relevant model parameters. Finally, the conclusion and further work are given.

*Corresponding Author www.ijacsa.thesai.org

II. RELATED WORK
Automatic patent text classification uses computers to classify and label text according to a specific label system, which facilitates patent document management and text search. This paper mainly addresses automatic patent text classification through deep learning. Therefore, this section introduces the current state of research on deep learning methods for text classification.
Deep learning text classification is mainly represented by neural network models, attention mechanisms and pre-training models.

1) Text classification methods based on neural network models represent the text in low dimensions through word embedding and classify it using an encoder. Representative models include the convolutional neural network (CNN) [10], the recurrent neural network (RNN) [11] and the graph convolutional network (GCN) [12]. Among them, CNN can capture local relations [13], RNN can learn long-distance dependencies in text [14], and GCN can effectively integrate the features of references.
2) Compared with the neural network model, the attention mechanism can focus on the part of the information related to the classification task while ignoring the irrelevant part. In 2014, Bahdanau et al. [15] used the attention mechanism for machine translation for the first time and made great progress on that task. In text classification tasks, the attention mechanism is often combined with a neural network model. She et al. [16] proposed a text classification algorithm based on a hybrid CNN-LSTM model, which uses the feature vector output by the CNN as the input of the LSTM and a Softmax classifier for classification, and which effectively improves the accuracy of text classification. Yang et al. [17] combined convolutional neural networks with long short-term memory to capture local features that enrich the feature information, and used weight values to adjust the enhancement intensity.
3) The pre-training model is represented by the BERT model, and the improvement of classification results mainly depends on the fine-tuning of the model [18]. BERT has strong sentence and word representation ability and performs excellently in word-level NLP tasks [19]. Lu et al. [20] proposed the VGCN-BERT model, which combines the capabilities of BERT with a vocabulary graph convolutional network (VGCN). The two kinds of information interact through the different layers of BERT and jointly build the final classification representation. Zheng et al. [21] proposed a BERT-CNN model for text classification. Adding a CNN to the task-specific layer of the BERT model makes it possible to obtain information about important fragments in the text.

Some of the above methods have been successfully applied to patent text classification and have achieved some results. Chinese patent texts are a special kind of long text, and their classification accuracy is difficult to improve. Many scholars have studied this problem. For example, Wen et al. [22] proposed a multi-level patent text classification model called ALBERT-BiGRU, which combines ALBERT with bidirectional gated recurrent units (BiGRU). The dynamic word vectors pre-trained by ALBERT replace the traditional static word vectors, and the BiGRU neural network is used for training, which preserves the semantic association between long-distance words in the patent text as far as possible. Roudsari et al. [23] studied the effect of applying the DistilBERT pre-training model and fine-tuned it for multi-label patent classification.

To sum up, deep learning methods for patent text classification proposed by scholars in China and abroad mainly focus on fusing deep learning models, fine-tuning pre-training models or introducing attention mechanisms to extract relevant features. Although these methods can improve the classification effect, they are all cumulative applications of existing methods, which not only reduce classification efficiency but also make it difficult to improve the patent classification effect further.
Previous studies have focused on improving the models while ignoring the specific structure of the data itself, which makes it difficult to improve the accuracy of patent text classification. This paper therefore considers a generic feature extraction method for patent text. It takes the intrinsic connection between words and classifications as the basis for feature extraction, and introduces the extracted features into the deep learning model, where they are fused with the features extracted by the model to improve the classification effect.

III. PROPOSED WORK
Analysis of previous text classification studies shows that LSTM has advantages over CNN in text classification because it models context [24], which has been confirmed in many long text classification studies. However, Chinese patent text classification is an exception: the accuracy of CNN is equal to or slightly higher than that of LSTM. Analysis of patent data and experimental reproduction shows that patent documents are a special format of long text that is insensitive to context but sensitive to words that are divided according to social production practices such as industries and sectors. Therefore, in the feature extraction stage of a deep learning network, the features that traditional extraction methods obtain from patent texts have a low density of classification-relevant information, and the presence of multiple synonymous words among the feature words increases the difficulty of classification. As a result, the fully connected layer learns features inefficiently and incompletely, which ultimately leads to low accuracy in Chinese patent text classification. To address the insufficient ability of neural networks to extract features from Chinese patent text, this paper proposes a feature extraction method based on a vocabulary network that strengthens this ability and thereby improves the accuracy of Chinese patent text classification.

A. Deep Learning Model based on Vocabulary Network Feature Extraction
This paper proposes a deep learning feature extraction method based on a vocabulary network, referred to as lexical network feature extraction (LNFE). The method combines the features it extracts with the features extracted by the original model, expanding the feature set while retaining the features of the original model, to improve classification accuracy. Fig. 1 is a schematic diagram of the LNFE method fused with a deep learning model, which includes four parts: text vector representation, feature extraction, feature fusion and text classification prediction. In the figure, Pi is the patent data, Wi is the patent text information after word segmentation, and Fi is the feature information extracted from the text. In the training or prediction process, the patent text is first segmented and each part is processed separately. During feature extraction, the data pass through the original feature extraction layer and through LNFE separately; the two sets of extracted features are then spliced and fused, and the fully connected layer is trained on the fused features to classify subsequent data.
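The splice-and-fuse step described above can be illustrated with a minimal sketch. It assumes that fusion is plain concatenation followed by one fully connected layer and Softmax; the function names, toy feature values and weights below are hypothetical and are not taken from the paper's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(model_features, lnfe_features, weights, biases):
    """Concatenate both feature vectors, then apply one fully connected layer.

    Assumption: fusion is simple concatenation ("splicing"); the paper does not
    spell out the exact operation.
    """
    fused = list(model_features) + list(lnfe_features)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return fused, softmax(logits)

# Toy example: 3 features from the original extractor, 2 from LNFE, 2 classes.
model_features = [0.2, -0.1, 0.7]
lnfe_features = [0.9, 0.05]
weights = [
    [0.1, 0.0, 0.3, 1.0, 0.0],    # hypothetical weights for class 0
    [0.0, 0.2, -0.3, -1.0, 0.5],  # hypothetical weights for class 1
]
biases = [0.0, 0.0]
fused, probs = fuse_and_classify(model_features, lnfe_features, weights, biases)
```

Because the original features are kept unchanged and the LNFE features are only appended, the fully connected layer can fall back on the original features if the added ones turn out to be uninformative.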

B. Vocabulary Network Generation
The lexical network is a key step in feature extraction, which includes synonym clustering of words in the patent, the internal relationship between words and classification and other information. In the subsequent feature extraction process, the characteristic words of patent text can be accurately extracted using the lexical network as a reference. Fig. 2 is a schematic diagram of the generation of a vocabulary network.
First, the word vector (word embedding) of each word in all texts is obtained by unsupervised learning with Word2vec, and the similarity between word vectors is calculated with the cosine formula in Formula (1) to cluster and merge synonyms, which avoids the slow model convergence that synonyms cause in the feature extraction stage.

$$\mathrm{Sim}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{\lVert \vec{v} \rVert \, \lVert \vec{w} \rVert} = \frac{\sum_{i=1}^{n} v_i w_i}{\sqrt{\sum_{i=1}^{n} v_i^2} \, \sqrt{\sum_{i=1}^{n} w_i^2}} \quad (1)$$

where $\vec{v}$ and $\vec{w}$ are word vectors trained by Word2vec, $\mathrm{Sim}$ is the similarity between $\vec{v}$ and $\vec{w}$, and $v_i$ and $w_i$ are the values of $\vec{v}$ and $\vec{w}$ in the $i$-th dimension.
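As an illustration of the synonym-merging step, the following sketch computes cosine similarity between word vectors and greedily maps each word to the first earlier word it resembles. The greedy strategy, the 0.9 threshold and the toy three-dimensional vectors are assumptions made for this example; the paper does not state which clustering procedure or threshold it uses.

```python
import math

def cosine_similarity(v, w):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_v * norm_w)

def merge_synonyms(vectors, threshold=0.9):
    """Map every word to a representative word (greedy, order-dependent).

    Assumption: a word joins the first representative whose similarity is at
    least `threshold`; otherwise it becomes a new representative.
    """
    representatives = []
    mapping = {}
    for word, vec in vectors.items():
        for rep in representatives:
            if cosine_similarity(vec, vectors[rep]) >= threshold:
                mapping[word] = rep
                break
        else:
            representatives.append(word)
            mapping[word] = word
    return mapping

# Toy vectors standing in for Word2vec embeddings (hypothetical values).
vectors = {
    "engine": [0.9, 0.1, 0.0],
    "motor": [0.88, 0.12, 0.01],
    "polymer": [0.0, 0.2, 0.95],
}
mapping = merge_synonyms(vectors)
```

In this toy vocabulary "motor" is merged into "engine", while "polymer" stays separate. A production pipeline would more likely use a vectorized nearest-neighbor search, since the pairwise loop is quadratic in vocabulary size.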
Each group of merged word vectors has corresponding classification labels. According to Formula (2) and Formula (3), the prior probabilities of words and classifications are obtained from the two dimensions of linearity and probability using the label information and the word vector information, and each word vector is given a classification feature weight by weighting.

$$\rho = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} \quad (2)$$

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \quad (3)$$

In Formula (2), $\rho$ is the correlation between the word vector and the classification, $x_i$ is the frequency of the word in patent $i$, $\bar{x}$ is the average frequency with which the word appears in this category, $y_i$ is the classification of patent $i$ containing the word, and $\bar{y}$ is the average of the categories over the patents containing the word. In Formula (3), $P(B \mid A)$ is the probability of being classified as $B$ when the word $A$ appears, $P(A \mid B)$ is the probability of $A$ occurring in classification $B$, $P(B)$ is the probability of classification $B$ occurring, and $P(A)$ is the probability of word $A$ occurring.
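To make the two dimensions concrete, the sketch below scores one word against one category using a rank (Spearman) correlation and a Bayes posterior, then mixes the two. Several details are assumptions for illustration only: the category is encoded as a 0/1 indicator, the two scores are combined with a hypothetical mixing factor `alpha`, and all function names are invented here. The text says only that the weights are obtained "by weighting".

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bayes_posterior(word_present, in_class):
    """P(class | word) = P(word | class) * P(class) / P(word)."""
    n = len(word_present)
    p_class = sum(in_class) / n
    p_word = sum(word_present) / n
    if p_class == 0 or p_word == 0:
        return 0.0
    both = sum(1 for w, c in zip(word_present, in_class) if w and c)
    p_word_given_class = both / sum(in_class)
    return p_word_given_class * p_class / p_word

def class_weight(freqs, in_class, alpha=0.5):
    """Hypothetical weighting: alpha * |rho| + (1 - alpha) * posterior."""
    rho = spearman(freqs, [1.0 if c else 0.0 for c in in_class])
    post = bayes_posterior([f > 0 for f in freqs], in_class)
    return alpha * abs(rho) + (1 - alpha) * post

# Frequency of one word in six patents, and whether each patent is in the category.
freqs = [3, 2, 0, 0, 1, 0]
in_class = [True, True, False, False, True, False]
rho = spearman(freqs, [1.0 if c else 0.0 for c in in_class])
posterior = bayes_posterior([f > 0 for f in freqs], in_class)
weight = class_weight(freqs, in_class)
```

In this toy data the word appears only in patents of the category, so the posterior is 1 and the rank correlation is high; a word spread evenly over all categories would score near zero on both.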

C. Classification Model Design
Based on the characteristics of existing models, this paper combines the CNN, LSTM and BERT models with LNFE to form new deep learning models, and uses LNFE alone and the classical CNN, LSTM and BERT models as baselines to judge the classification effect of the new models.

• LNFE-LSTM model: Long short-term memory (LSTM) is a special RNN, mainly used to solve the vanishing gradient and exploding gradient problems that arise when training on long sequences. LSTM has three gate structures: the forget gate, the input gate and the output gate. The forget gate decides which information to discard and which important information to keep. The input gate determines which information is added, and the output gate determines which information is output. The supplementary calculation formula of the LNFE-LSTM model at the input gate is as follows:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (6)$$

In this paper, we propose an LNFE-LSTM model, which accurately extracts patent features through LNFE and combines them with LSTM text classification to handle long text classification while retaining contextual semantic features.
• LNFE-BERT model: The BERT model fully considers bidirectional semantic features while retaining as much of the meaning of the text as possible according to contextual word order. This paper accurately extracts patent features through LNFE and combines them with BERT (Bidirectional Encoder Representations from Transformers) [25], which uses a bidirectional Transformer encoder, is trained through the masked language model and next sentence prediction tasks, and has strong sentence representation ability. The supplementary calculation formula of the LNFE-BERT model is as follows:

$$L(\theta, \theta_1, \theta_2) = L_1(\theta, \theta_1) + L_2(\theta, \theta_2) \quad (10)$$
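For reference, the forget, input and output gates described for the LNFE-LSTM model are usually written as below. This is the standard textbook LSTM formulation, offered as a sketch and not as the paper's own numbered equations; the symbols $W_f$, $W_i$, $W_C$, $W_o$, the biases $b$, the sigmoid $\sigma$ and the element-wise product $\odot$ are the conventional ones and are assumed here. The candidate-state line has the same form as formula (6).

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{hidden state}
\end{aligned}
```

The additive cell-state update is what lets gradients flow over long sequences, which is the property the text refers to when it says LSTM addresses the vanishing and exploding gradient problems.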

D. Evaluation Indicators
In order to evaluate the classification effect of the LNFE method combined with deep learning, this paper uses Precision (P), Recall (R) and F1 as evaluation indicators, and judges the feature extraction ability of the LNFE method by comparing these indicators across the LNFE, CNN, LNFE-CNN, LSTM, LNFE-LSTM, BERT and LNFE-BERT models. Formulas (13), (14) and (15) give P, R and F1, respectively, where TP, FP and FN are the numbers of true positives, false positives and false negatives:

$$P = \frac{TP}{TP + FP} \quad (13)$$

$$R = \frac{TP}{TP + FN} \quad (14)$$

$$F1 = \frac{2 \times P \times R}{P + R} \quad (15)$$

Generally, a higher P value means that the predictions of the model are more accurate, and a higher R value means that the model recovers more of the relevant samples; in both cases the classification effect is better. Because P or R alone can give an incomplete picture when the data are unevenly distributed, F1, the harmonic mean of P and R, evaluates the model more comprehensively.
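A small sketch of how these indicators can be computed per category from true and predicted labels is given below. The macro average across categories is an assumption, since the text does not say how scores are aggregated over the IPC categories, and the labels "A", "B" and "C" are toy values.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one category, treated one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Unweighted mean of per-category scores (assumed aggregation)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[k] for s in scores) / len(labels) for k in range(3))

# Toy labels standing in for IPC sections.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores_a = precision_recall_f1(y_true, y_pred, "A")
scores_b = precision_recall_f1(y_true, y_pred, "B")
macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
```

For category "B" the precision is 2/3 and the recall is 1, so F1 is 0.8, which lies closer to the lower of the two values; this is the behavior that makes the harmonic mean a stricter summary than an arithmetic average.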

IV. EXPERIMENT
A. Experimental Platform and Parameter Selection
The experimental platform used in this paper has a dual-core Intel Core i7 processor, 8 GB of LPDDR3 memory (2133 MHz) and an Nvidia 2080Ti GPU. The operating environment is Python 3.9, the network framework is PyTorch, and the trained models are CNN, LSTM, BERT, LNFE, LNFE-CNN, LNFE-LSTM and LNFE-BERT. Among them, CNN, LSTM and BERT are common comparison networks in current patent classification research, and LNFE is a neural network that uses only the LNFE method for feature extraction. LNFE-CNN, LNFE-LSTM and LNFE-BERT are neural networks constructed by combining the features extracted by LNFE with the original features. Analysis shows that, on average, the first 40 words extracted by the LNFE method correlate strongly with the classification, so the LNFE method keeps 40 words for training. The parameters in Table I are used to train the neural networks.

B. Data Source and Preprocessing
In this experiment, the "section" (the top level of the main IPC classification number) is selected as the classification basis. The classification data come from the incoPat patent database. The obtained patent information includes the title, abstract, IPC main classification number, independent claims and other information. A uniform distribution of data across categories helps the model learn, so patent data are randomly selected from the eight IPC sections, and 125,064 patents are retained after data denoising. In order to evaluate the effectiveness of the model, a cross-validation method is adopted: 80% of the collected data are used as the training set, 10% as the validation set and 10% as the test set. The data distribution of the patent data set is shown in Fig. 4.
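The 80% / 10% / 10% partition can be sketched as follows. The shuffling seed and the function name are hypothetical, and the sketch performs a plain random split without stratifying by category, which the text does not specify either way.

```python
import random

def split_dataset(records, train=0.8, valid=0.1, seed=42):
    """Shuffle the records and cut them into train / validation / test parts.

    Assumption: a single random split with a fixed seed; the remainder after
    the train and validation cuts becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    train_set = shuffled[:n_train]
    valid_set = shuffled[n_train:n_train + n_valid]
    test_set = shuffled[n_train + n_valid:]
    return train_set, valid_set, test_set

# Integer ids stand in for the 125,064 patent records.
train_set, valid_set, test_set = split_dataset(range(125064))
```

With 125,064 records this yields 100,051 training, 12,506 validation and 12,507 test records, and no record appears in more than one part.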

C. Patent Text and Word Vector
The title and abstract of a patent summarize the patent document, and the independent claims fully represent the technical solution and the necessary technical features. This information can be used as the feature source for patent classification. To classify patents using deep learning, the text must first be converted into word vectors, so Word2vec is used to convert the Chinese patent text into word vectors, which are then fed into the neural network for learning. This paper combines several neural networks, including Word2vec, CNN, LSTM and BERT. The vocabulary length of the training text is fixed during training. Table II shows the statistics of the vocabulary length of patent texts. The vocabulary length of most patent texts is between 200 and 400, with an average of 241. In this paper, texts are cut at 240 words, and shorter texts are padded with zero vectors.
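The fixed-length step (cut at 240 words, pad with zero vectors) can be sketched as below. The function name and the two-dimensional toy vectors are hypothetical; real Word2vec vectors would have a much larger dimension.

```python
def pad_or_truncate(vectors, max_len=240, dim=None):
    """Return exactly max_len word vectors: cut long texts, zero-pad short ones."""
    if dim is None:
        # Infer the vector dimension from the first word vector, if any.
        dim = len(vectors[0]) if vectors else 0
    clipped = [list(v) for v in vectors[:max_len]]
    padding = [[0.0] * dim for _ in range(max_len - len(clipped))]
    return clipped + padding

# A short text with 3 words and a long text with 300 words (toy 2-d vectors).
short_text = [[1.0, 2.0]] * 3
long_text = [[1.0, 2.0]] * 300
short_out = pad_or_truncate(short_text)
long_out = pad_or_truncate(long_text)
```

Both outputs contain 240 vectors: the short text keeps its three vectors followed by zero vectors, and the long text loses everything after position 240.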

D. Vocabulary Network Generation
In the process of vocabulary network generation, the 125,064 patent documents generated 272,141 word vectors, of which 241,524 remained after removing stop words and merging synonyms. By calculating the Spearman correlation and the Bayes probability weighting between words and labels, 30,080 words with strong correlation characteristics were obtained and classified; their word network is shown in Fig. 5. Each point in the vocabulary network represents a word. The position of the point is the mapping of the high-dimensional word vector into a low-dimensional space. The shorter the distance between two points, the closer the meanings of the words. The colors of the dots represent the different patent classification sections related to the word. The number of related words shows that a large number of words have a strong relationship with some categories. The distribution of words shows that word meaning is also related to classification, which confirms the earlier statement that Chinese patent texts are sensitive to words divided according to social production practices such as industries and sectors.

E. Model Training
A deep learning model used for classification is usually trained at the fully connected layer after feature extraction, and the data are finally classified through Softmax; repeated training improves the classification accuracy. To verify whether the LNFE method can extract features, it is used as the feature extraction layer of a neural network and the classification results are tested. First, after stop words are removed from the Chinese patent text and the text is vectorized, the word segmentation results are compared for semantic similarity with the words in the vocabulary network, and words in the patent text are replaced with highly similar words from the vocabulary network so that multiple synonymous words are merged. Then, according to the weighted results of words and classifications, the word vectors of the 40 words with the highest weights are selected as features and sent to the network layer for training. The training process is shown in Fig. 6. After several training iterations, the accuracy of the model improves rapidly, which confirms that the LNFE method does extract classification-relevant features.
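The selection step described above (replace synonyms, then keep the highest-weighted words) can be sketched as follows. The synonym replacement is simplified to a dictionary lookup, duplicate words are kept only once, and all names and toy values are hypothetical; the paper performs the replacement through a similarity analysis against the vocabulary network.

```python
def select_lnfe_features(tokens, synonym_map, weights, vectors, top_k=40):
    """Pick the top_k highest-weighted vocabulary-network words in a text.

    Assumptions: synonym_map maps a word to its merged representative,
    weights holds the classification feature weight of each network word,
    and words outside the network are ignored.
    """
    canonical = [synonym_map.get(token, token) for token in tokens]
    seen = set()
    candidates = []
    for word in canonical:
        if word in weights and word in vectors and word not in seen:
            seen.add(word)
            candidates.append(word)
    ranked = sorted(candidates, key=lambda w: weights[w], reverse=True)[:top_k]
    return ranked, [vectors[w] for w in ranked]

# Toy segmented text and a toy vocabulary network (hypothetical values).
tokens = ["motor", "housing", "the", "engine", "polymer"]
synonym_map = {"motor": "engine"}
weights = {"engine": 0.9, "polymer": 0.7, "housing": 0.2}
vectors = {"engine": [0.9, 0.1], "polymer": [0.0, 0.95], "housing": [0.4, 0.4]}
words, features = select_lnfe_features(tokens, synonym_map, weights, vectors, top_k=2)
```

With `top_k=2` the sketch returns "engine" and "polymer": "motor" is folded into "engine", the word "the" is not in the network, and "housing" has the lowest weight. The experiments in the text use the 40 highest-weighted words.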

F. Accuracy Verification
Having confirmed that the LNFE method can extract features, this section verifies its feature extraction effect by comparing it with other networks. Table III shows the comparison results between the LNFE method and other networks. During validation, the traditional neural networks improve after being combined with the LNFE method, whereas BERT gains only a limited improvement because its own feature extraction ability is already excellent. Although the Word2vec+LNFE method scores slightly higher than other networks, it scores lower than the traditional neural networks combined with the LNFE method. This is because the LNFE method extracts features only in the probability and linear dimensions, while traditional networks consider more dimensions. Because the LNFE method can be embedded into neural networks, it has practical value.

Fig. 6. LNFE training process

G. Data Analysis
Fig. 7 shows the classification results of CNN and BERT when used alone and when combined with the LNFE method.
Because the text features found by the LNFE method are more explicit, LNFE-CNN converges faster than CNN. LNFE-CNN may also learn more text features than CNN, so its accuracy is slightly higher. Because the LNFE method supplies the prior probabilities of words in the patent text, LNFE-BERT can find the complex mapping between features and classifications earlier in training than BERT; that is, LNFE-BERT converges earlier than BERT. However, BERT itself focuses heavily on text feature extraction, so after extensive training the accuracy of BERT and LNFE-BERT is almost the same. Therefore, the LNFE method can improve the learning ability and classification accuracy of traditional deep learning models.

V. CONCLUSION
A comparison of the text classification literature shows that the classification accuracy of Chinese patent texts is lower than that of other texts. Repeated experiments, analysis and verification lead to the conclusion that patent documents are insensitive to context because of their special format, and that traditional deep learning models have insufficient ability to extract features from Chinese patent texts. These two factors lead to slow convergence and low accuracy. To solve this problem, this paper proposes the LNFE feature extraction method, which uses Spearman correlation and Bayes probability to obtain the prior probabilities of vocabulary and classification from the linear and probability dimensions respectively, constructs the network relationship between vocabulary and classification in a weighted way, and fuses the features extracted from the vocabulary network with the features extracted by the deep learning model for classification.
The method differs from the traditional approach of improving accuracy by altering the neural network: it extracts patent text features in depth through the prior probability of the text, and can therefore be combined with various neural networks to enhance their performance and accuracy in patent text classification. The experimental results demonstrate that the method speeds up model convergence to a certain extent and improves classification accuracy. However, this paper constructs only a section-level lexical network to validate the method for Chinese patent text classification. In future work, we will apply the method to more fine-grained patent classification, and to other text classification tasks to verify whether it generalizes.
In future research, we will enrich the construction methods of patent text features by exploring how to capture more subtle latent features of patent text, so as to improve the classification performance of the model. We will also further explore the performance of the model on other tasks in the field, and consider how to handle multi-class problems, to provide better help for researchers.