Medical Named Entity Recognition Based on Lexical Enhancement and Global Pointer

Abstract—Named entity recognition (NER) in biomedical texts, also called medical named entity recognition (MNER), aims to identify and categorize medical terms in electronic medical records. Deep neural networks have recently proven highly effective for MNER. However, Chinese MNER faces two problems: character-based models cannot make use of lexical information, and the texts contain nested entities. To address these problems, we propose a model that handles both nested and non-nested entities. The model uses a simple lexical enhancement method to merge lexical information into each character's vector representation and then uses the Global Pointer approach for entity recognition. Furthermore, we retrain a pre-trained model on a Chinese medical corpus to incorporate medical knowledge, achieving F1 scores of 68.13% on the nested dataset CMeEE and 95.56%, 85.89%, and 92.08% on the non-nested datasets CCKS2017, CCKS2019, and CCKS2020, respectively. These results demonstrate the efficacy of the proposed model.


I. INTRODUCTION
Electronic medical records are digital repositories of a patient's comprehensive medical and health information that can be used for medical and healthcare services. They are vital stores of medical knowledge and therapeutic experience, containing detailed information about patients' diagnoses and treatment histories. Analyzing them in depth makes it possible to extract and consolidate valuable medical knowledge, turn implicit knowledge into explicit knowledge, and build a medical knowledge system with a clear hierarchy, well-defined concepts, rich connotations, and significant practical value. However, because electronic medical records consist largely of unstructured data, they are difficult to exploit directly. MNER is a vital step in medical information extraction and is essential for biomedical text mining.
MNER has received considerable attention in recent years and has been the subject of numerous evaluation competitions. The China Conference on Knowledge Graph and Semantic Computing (CCKS) held NER evaluation challenges on medical texts from 2017 to 2021. Meanwhile, AliCloud and the Chinese Information Processing Society of China (CIPSC) released the Chinese Medical Information Extraction (CMeIE) subtask of the Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark.
Previous studies on Chinese MNER have mainly adapted methods developed for English MNER to improve model performance [1]. Although these methods produce good results, they rely on word-level annotation models. Unlike English, however, Chinese text is not naturally tokenized: the words in a sentence are not separated by spaces. Word segmentation therefore becomes an additional step in Chinese MNER, and segmentation errors can propagate into inaccurate entity boundary detection and class prediction [2]. To avoid such errors, character-based Chinese MNER techniques have been proposed. Because character-based approaches cannot fully exploit lexical information, Zhang et al. [3] introduced the Lattice-LSTM model, which integrates lexical information into a character-based recognition model. However, the complex structure of Lattice-LSTM makes it difficult to combine with other models.
In addition, identifying nested entities remains an unsolved challenge in Chinese MNER. Many earlier models are sequence models, but entities in electronic medical records are not always self-contained, and nested structures may exist between them. As shown in Fig. 1, in the example sentence the entity meaning "lung" is nested within the entity meaning "lung lesions". Because nested structures are complex and vary irregularly in granularity and in the number of nesting levels, it is difficult to capture nested entity information quickly and accurately for semantic comprehension, which is critical to improving Chinese MNER. To address these challenges, we propose a Chinese MNER model based on lexical enhancement and Global Pointer. The model first adds lexical information to each character in the electronic medical record through a lexical enhancement method, and then uses Global Pointer to score candidate spans by their start and end characters in order to identify medical entities. The primary contributions of our work are as follows:
1) We incorporate word lexicons into the character representations through a lexical enhancement approach. The approach matches each character against a dictionary built from a corpus to obtain word sets, which are then compressed and combined with the character representation. The experimental results show that lexical enhancement improves performance.
2) Global Pointer is used as the entity recognition module. Extensive experiments are carried out on the nested dataset CMeEE and the non-nested datasets CCKS2017, CCKS2019, and CCKS2020. The results show that Global Pointer handles nested and non-nested entity recognition in a unified way and performs better than several recent models.
3) We experiment with a pre-trained model retrained on a Chinese medical corpus, which gives better results than the basic pre-trained model.
The final experimental results show that the proposed model applies to both nested and non-nested Chinese MNER tasks and achieves the best results among the compared models on all four datasets.
The remaining sections are organized as follows: Section II presents the related work on MNER. Section III introduces our proposed model. Section IV presents experiments and results analysis on our model. Section V concludes this paper.

II. RELATED WORK
The purpose of NER is to discover entities in a text and classify them into predefined categories such as person, organization, and location. NER is a critical component of information extraction that allows structured information to be extracted from unstructured text. Effective NER models are required for a variety of downstream tasks, including entity linking, relation extraction, and event extraction. MNER focuses on recognizing and categorizing clinical terms such as symptoms, medicines, and therapies in medical data. The MNER task is often treated as a sequence labeling problem, whose objective is to assign a category to each word or character in the text.
Neural network models are now the dominant approach to English NER. The most typical is the Bi-LSTM-CRF model [4], which uses a Bi-directional Long Short-Term Memory (Bi-LSTM) network for feature extraction and a Conditional Random Field (CRF) for decoding. Unlike English text, Chinese text has no explicit word boundaries, so Chinese NER methods fall broadly into two groups: word-based methods and character-based methods. Word-based methods first perform word segmentation and then recognize entities [5]; however, errors from inaccurate segmentation can propagate and lead to incorrect NER. Character-based methods operate on each character individually and therefore avoid this error propagation, but they cannot use lexical information. Consequently, researchers are working to make better use of word information in character-based models [6]. Most current studies show that character-based methods often outperform word-based methods in Chinese NER because of the error propagation in word segmentation. We therefore build a character-based model for Chinese MNER that incorporates lexical information to address the limitations of traditional character-based methods.
Chinese MNER is primarily studied with deep learning-based approaches. In the Bi-LSTM-CRF model, a conditional random field on top of the Bi-LSTM predicts the label sequence. Gridach et al. [7] were the pioneers in using a Bi-LSTM-CRF model for NER in the biomedical domain. Dang et al. [8] fine-tuned word vectors with linguistic information on top of the Bi-LSTM-CRF model. Liu et al. [9] combined a multi-channel convolutional neural network with the Bi-LSTM-CRF model and used lexical and morphological features of words for entity recognition.
Besides the Bi-LSTM-CRF model, several studies apply other deep learning models to MNER. For instance, Qiu et al. [10] used a residual dilated convolution model for fast and efficient NER in the medical field. Zhang et al. [11] proposed a hybrid model of a Dilated Convolutional Neural Network (DCNN) and Bi-LSTM for hierarchical encoding, using the DCNN to capture global information with fast computation. Du et al. [12] proposed a multi-task learning approach with multiple strategies based on machine reading comprehension (MRC). In general, NER in the medical field can be approached either as a sequence labeling task or as a span boundary detection task.
However, the methods above are based on sequence labeling and cannot be used directly to recognize nested named entities, because when entities are nested, the same character may need two or more different labels at the same time.
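The conflict can be made concrete with a small Python sketch, a hypothetical illustration rather than code from this work. It lists the BIO tags each character position would need when a one-character entity is nested at the start of a four-character entity; the type labels `dis` and `bod` and the offsets are assumptions chosen for illustration. Any position that needs more than one tag cannot be labeled by a single-label sequence tagger.

```python
def tags_needed_per_position(spans, n):
    """Collect every BIO tag that each character position would need.

    spans: (start, end, entity_type) tuples with inclusive character indices.
    """
    needed = [set() for _ in range(n)]
    for start, end, etype in spans:
        needed[start].add(f"B-{etype}")
        for pos in range(start + 1, end + 1):
            needed[pos].add(f"I-{etype}")
    return needed


# Hypothetical 6-character sentence: a 1-character "bod" entity nested at the
# start of a 4-character "dis" entity (types and offsets are illustrative).
spans = [(0, 3, "dis"), (0, 0, "bod")]
tags = tags_needed_per_position(spans, 6)
conflicts = [i for i, needed in enumerate(tags) if len(needed) > 1]
print(conflicts)  # [0]: position 0 needs both B-dis and B-bod
```

A span-based model sidesteps the conflict by predicting the set of `(start, end, type)` tuples directly, so both spans can coexist.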
Solving the nested entity problem also improves the accuracy of entity extraction. Earlier work often combined rule-based and machine-learning-based approaches to handle nested named entities: the inner, non-nested named entities were first identified with a Hidden Markov Model (HMM), and the remaining named entities were then identified with rule-based post-processing. Alex et al. [13] proposed several CRF-based models for nested NER on the GENIA dataset. These methods apply CRFs to entity types in a specific order, so that each CRF can use the output of the previous CRFs; this cascading approach gave the best nested NER results in their experiments. In 2009, Finkel and Manning [14] approached nested NER from a parsing perspective, building a constituency tree in which every named entity corresponds to a node. Rule-based and machine-learning-based methods achieve high accuracy and allow domain-specific rules for extracting nested named entities. However, they have difficulty recognizing nested entities of the same type, have high time complexity, and are hard to scale to large datasets with long sentences.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 3, 2023 594 | Page www.ijacsa.thesai.org
Recently, nested NER has become a hot topic in information extraction, and span-based methods have become increasingly popular due to their high performance. For example, Xu et al. [15] used a local detection approach in which each possible entity span is classified independently. Sohrab and Miwa [16] introduced a simple deep neural model that enumerates all possible spans and classifies them using LSTM. Wang et al. [17] proposed a transition-based method that builds nested entities incrementally through a series of specially designed actions. Tan et al. [18] extended the span-based approach with a boundary detection task that predicts entity boundaries in addition to classifying spans. Quoc et al. [19] proposed a two-stage entity recognition method to address the limitations of span-based models. Our method is also span-based but, unlike these studies, it scores each character as the beginning or end of a span instead of enumerating and encoding each span separately, which makes it highly efficient.

III. PROPOSED MODEL
Traditional lexical enhancement methods are complicated and cannot easily be transferred to different neural network architectures. To address this problem, we use a simpler approach that quickly merges lexical information into each character's vector representation. For entity recognition, we use a span-based approach. Fig. 2 depicts the general architecture of the model.

A. Lexical Enhancement
The Chinese character-based NER model processes the input sentence as a character sequence $s = \{c_1, c_2, \dots, c_n\} \in V_c$, where $V_c$ denotes the character vocabulary. Each character $c_i$ is represented by a dense vector:

$$x_i^c = e^c(c_i)$$

where $e^c$ denotes the character embedding lookup table.
Character-based NER techniques have the disadvantage of not fully utilizing the information contained in words. To address this, we use SoftLexicon, a lightweight dictionary-matching approach, for entity recognition in Chinese electronic medical records. SoftLexicon incorporates lexical information into character representations, which lets a character-based NER model use word information while keeping the lexical enhancement process simple [20]. The lexical enhancement features are constructed in three steps.

First, to preserve the lexical information of each character, all matching words of a character $c_i$ are divided into four sets, "BMES", constructed as follows:

$$B(c_i) = \{w_{i,k} \mid w_{i,k} \in L,\ i < k \le n\}$$
$$M(c_i) = \{w_{j,k} \mid w_{j,k} \in L,\ 1 \le j < i < k \le n\}$$
$$E(c_i) = \{w_{j,i} \mid w_{j,i} \in L,\ 1 \le j < i\}$$
$$S(c_i) = \{c_i \mid c_i \in L\}$$

where $w_{j,k}$ denotes the subsequence $c_j \dots c_k$ and $L$ is the lexicon we built. We constructed $L$ from the already labelled texts of the training and development sets and counted the occurrences of each word for use in the second step. The set $B(c_i)$ contains the words of length greater than 1 that begin with $c_i$; $M(c_i)$ contains the words of length greater than 1 that have $c_i$ in the middle; $E(c_i)$ contains the words of length greater than 1 that end with $c_i$; and $S(c_i)$ contains the single character $c_i$ itself. If a word set of the current character is empty, it is filled with the special word "NONE". Fig. 3 shows an example of the SoftLexicon vocabulary extension. In this way, word embeddings can be introduced without loss of information, because the matching result can be restored exactly from the four word sets.

In the second step, each word in the four word sets is mapped to a pre-trained word vector, and the words in the four sets are weighted and normalized with a static, statistics-based scheme: the weight of each word is its frequency in the statistical data. This frequency reflects, to some extent, the importance of the word, and it is counted when the dictionary is built.
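The first step (matching each character against the lexicon and sorting the matches into B, M, E, and S sets) can be sketched in Python as follows. This is a minimal illustration of the set definitions above, not the authors' implementation; the function name `soft_lexicon_sets` and the toy lexicon, whose made-up counts stand in for the frequencies $z(w)$ gathered from the labelled training and development texts, are hypothetical.

```python
from collections import Counter


def soft_lexicon_sets(sentence, lexicon):
    """Split the lexicon words matching each character into B/M/E/S sets.

    Words of length > 1 go to B (character at the start), M (character
    inside) or E (character at the end); a single-character word goes to S.
    Empty sets receive the placeholder word "NONE".
    """
    n = len(sentence)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            word = sentence[j:k + 1]
            if word not in lexicon:
                continue
            if j == k:
                sets[j]["S"].add(word)
            else:
                sets[j]["B"].add(word)
                sets[k]["E"].add(word)
                for i in range(j + 1, k):
                    sets[i]["M"].add(word)
    for char_sets in sets:
        for words in char_sets.values():
            if not words:
                words.add("NONE")
    return sets


# Hypothetical lexicon with occurrence counts z(w) ("lung lesions", "lungs",
# "lesion", "lung"), standing in for the one built from labelled texts.
lexicon = Counter({"肺部病变": 3, "肺部": 5, "病变": 4, "肺": 2})
result = soft_lexicon_sets("肺部病变", lexicon)
print(result[2])  # {'B': {'病变'}, 'M': {'肺部病变'}, 'E': {'NONE'}, 'S': {'NONE'}}
```

The double loop enumerates every substring, which is fine for a sketch; a real implementation would typically use a trie to match lexicon words efficiently.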
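The weighted method is as follows:

$$v^s(T) = \frac{4}{Z} \sum_{w \in T} z(w)\, e^w(w), \qquad Z = \sum_{w \in B \cup M \cup E \cup S} z(w)$$

where $T$ is one of the "BMES" word sets, $z(w)$ is the frequency of word $w$ in the dictionary statistics, $Z$ is the sum of the frequencies of all words in the four word sets, and $e^w$ is the lexical embedding lookup table.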
Finally, the representations of the four word sets are concatenated into a single feature vector, which is then concatenated with the character representation to obtain the final input vector:

$$e^s(B, M, E, S) = [v^s(B); v^s(M); v^s(E); v^s(S)]$$
$$x^c \leftarrow [x^c; e^s(B, M, E, S)]$$

where $v^s(\cdot)$ is the weighted word-set representation computed in the previous step.
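The weighting and concatenation steps can be sketched for a single character as shown below. This is a plain-Python illustration under stated assumptions: the 2-dimensional embeddings, the counts, and the helper names are hypothetical, and giving the placeholder word "NONE" a frequency of 1 with a zero embedding is our assumption, since the text does not specify how "NONE" is weighted.

```python
def weighted_set_vector(words, freq, word_emb, z_total, dim):
    """v(T) = 4 / Z * sum over w in T of z(w) * e(w), for one B/M/E/S set."""
    vec = [0.0] * dim
    for w in words:
        weight = freq.get(w, 1)  # assumption: "NONE" (or unseen words) counts as 1
        emb = word_emb.get(w, [0.0] * dim)
        for d in range(dim):
            vec[d] += weight * emb[d]
    return [4.0 * x / z_total for x in vec]


def enhance_character(char_vec, char_sets, freq, word_emb, dim):
    """Return [x^c; v(B); v(M); v(E); v(S)] for one character."""
    all_words = set().union(*char_sets.values())
    z_total = sum(freq.get(w, 1) for w in all_words)  # Z over B ∪ M ∪ E ∪ S
    feature = []
    for key in ("B", "M", "E", "S"):
        feature.extend(weighted_set_vector(char_sets[key], freq, word_emb, z_total, dim))
    return list(char_vec) + feature


# Hypothetical word sets, counts, and 2-dimensional embeddings for one character.
char_sets = {"B": {"病变"}, "M": {"肺部病变"}, "E": {"NONE"}, "S": {"NONE"}}
freq = {"病变": 4, "肺部病变": 3}
word_emb = {"病变": [1.0, 0.0], "肺部病变": [0.0, 1.0], "NONE": [0.0, 0.0]}
x = enhance_character([0.5, -0.5], char_sets, freq, word_emb, dim=2)
print(x)  # [0.5, -0.5, 2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0]
```

Here $Z = 4 + 3 + 1 = 8$, so the B set contributes $4 \cdot 4 / 8 = 2$ and the M set $4 \cdot 3 / 8 = 1.5$; the enhanced vector has the character dimension plus four times the word dimension.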

B. Global Pointer
In our work, we use the Global Pointer [21] as an entity recognition module, which is a span-based entity recognition method. Span-based methods identify named entities by classifying sub-sequences of sentences. This method outperforms sequence annotation-based methods in preventing error propagation and is able to easily detect nested named entities as they belong to different sub-sequences.
The Global Pointer concept is similar to a simplified multi-head attention mechanism, with as many heads as there are entity types. Consider recognizing entities of the $\alpha$-th type in a sequence of length $n$. The sequence has a total of $n(n+1)/2$ candidate spans, which contain all possible entities, and the task is to select the actual entities from these $n(n+1)/2$ candidates. Global Pointer scores each character as a potential beginning or end of an entity. Based on this idea, the lexically enhanced vector $h_i$ of each token is transformed into $q_{i,\alpha}$ and $k_{i,\alpha}$:

$$q_{i,\alpha} = W_{q,\alpha} h_i + b_{q,\alpha}, \qquad k_{i,\alpha} = W_{k,\alpha} h_i + b_{k,\alpha}$$

where $W_{q,\alpha}$ and $W_{k,\alpha}$ are weight matrices and $b_{q,\alpha}$ and $b_{k,\alpha}$ are bias vectors. The vectors $q_{i,\alpha}$ and $k_{i,\alpha}$ are the representations of token $i$ used to identify entities of type $\alpha$. Specifically, for a span $[i, j]$ of type $\alpha$, the start and end positions are represented by $q_{i,\alpha}$ and $k_{j,\alpha}$. The score of the span being an entity of type $\alpha$ is then calculated as:

$$s_\alpha(i, j) = q_{i,\alpha}^{\top} k_{j,\alpha}$$

In the inference step, the spans satisfying $s_\alpha(i, j) > 0$ are output as entities of type $\alpha$.
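For a single entity type $\alpha$, the projection, span scoring, and inference rule above can be sketched in plain Python. The parameters are hand-picked toy values (assumptions, not trained weights), rotary position encoding is omitted, and a full model would hold one $(W_q, b_q, W_k, b_k)$ set per entity type. Note that the two decoded spans are nested, which the scoring scheme handles without any special treatment.

```python
def linear(weight, bias, h):
    """Affine map W h + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]


def global_pointer_scores(hidden, wq, bq, wk, bk):
    """Score every span [i, j] with i <= j for one entity type: s(i, j) = q_i . k_j.

    Rotary position encoding is deliberately left out of this sketch.
    """
    q = [linear(wq, bq, h) for h in hidden]
    k = [linear(wk, bk, h) for h in hidden]
    n = len(hidden)
    scores = {}
    for i in range(n):
        for j in range(i, n):  # only the n(n+1)/2 spans with start <= end
            scores[(i, j)] = sum(a * b for a, b in zip(q[i], k[j]))
    return scores


def decode(scores):
    """Inference rule: every span with a positive score is an entity."""
    return sorted(span for span, s in scores.items() if s > 0)


# Toy, hand-picked parameters for a 3-token sequence and one entity type.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wq, bq = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
wk, bk = [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]
scores = global_pointer_scores(hidden, wq, bq, wk, bk)
print(len(scores), decode(scores))  # 6 [(0, 0), (0, 2)]
```

In a PyTorch implementation the same scores would usually be produced for all types at once as a batched matrix product, with the lower triangle ($j < i$) masked out.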
Using Global Pointer alone is insufficient to accurately identify different types of entities, because the dot product does not take the length and span of entities into account. For example, in the sentence meaning "cytopenia is closely related to the degree of lung lesions", the span meaning "cytopenia and lung" may be mistaken for a single entity: the character meaning "fine" is the start of the entity "cytopenia", while the character meaning "lung" is the end of the entity "lung". Without position information, the model treats this combination of start and end positions as one entity. To overcome this issue, it is critical to incorporate relative position information, which makes Global Pointer more sensitive to the length and span of entities. Global Pointer uses rotary position embedding (RoPE) to encode relative position information. This encoding is based on a transformation matrix $R_i$ with the property $R_i^{\top} R_j = R_{j-i}$, which is applied to $q_{i,\alpha}$ and $k_{j,\alpha}$ respectively:

$$s_\alpha(i, j) = (R_i q_{i,\alpha})^{\top} (R_j k_{j,\alpha}) = q_{i,\alpha}^{\top} R_{j-i}\, k_{j,\alpha}$$

The score between $q_{i,\alpha}$ and $k_{j,\alpha}$ thus depends explicitly on the relative position $j - i$, resulting in a more accurate representation of the entity.
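The relative-position property can be checked numerically with a small rotary encoding sketch. It assumes the common RoPE layout in which adjacent dimension pairs (2t, 2t+1) are rotated by pos * theta_t with theta_t = 10000^(-2t/d); the exact pairing and base used in Global Pointer implementations may differ, and the vectors are arbitrary toy values.

```python
import math


def rotate(vec, pos, base=10000.0):
    """Apply R_pos: rotate each dimension pair (2t, 2t+1) by pos * theta_t."""
    d = len(vec)
    out = list(vec)
    for t in range(d // 2):
        theta = pos * base ** (-2.0 * t / d)
        x, y = vec[2 * t], vec[2 * t + 1]
        out[2 * t] = x * math.cos(theta) - y * math.sin(theta)
        out[2 * t + 1] = x * math.sin(theta) + y * math.cos(theta)
    return out


def rope_score(q, k, i, j):
    """(R_i q)^T (R_j k), which equals q^T R_{j-i} k."""
    return sum(a * b for a, b in zip(rotate(q, i), rotate(k, j)))


q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -1.0, 0.7, 0.4]
# The same offset j - i = 3 at different absolute positions gives the same score.
print(math.isclose(rope_score(q, k, 2, 5), rope_score(q, k, 10, 13), abs_tol=1e-9))  # True
```

In the 2-dimensional case with $q = k = (1, 0)$ the score reduces to $\cos(j - i)$, which makes the dependence on the offset alone easy to see.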

C. Loss Function
Entity recognition is treated as a multi-label classification problem: after the scores of all spans are obtained for each entity type, we perform multi-label classification over the $n(n+1)/2$ candidate spans of that type, using a multi-label classification loss designed for class imbalance. Because the number of entities in each input sentence is usually far smaller than $n(n+1)/2$, treating each span as an independent binary classification causes severe class imbalance. The method therefore casts the problem as a pairwise comparison between target and non-target scores, using a multi-label generalization of softmax cross-entropy that balances the two groups automatically and avoids the label imbalance problem:

$$\mathcal{L} = \log\Bigl(1 + \sum_{(i,j) \in P_\alpha} e^{-s_\alpha(i,j)}\Bigr) + \log\Bigl(1 + \sum_{(i,j) \in Q_\alpha} e^{s_\alpha(i,j)}\Bigr)$$

where $i$ and $j$ are the start and end indices of a span, $P_\alpha$ is the set of spans with entity type $\alpha$, and $Q_\alpha$ is the set of spans that either are not entities or have an entity type other than $\alpha$. The function $s_\alpha(i, j)$ gives the score of span $[i, j]$ being an entity of type $\alpha$.
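The loss for one entity type can be sketched as follows, with each log term computed through a numerically stable log-sum-exp. The dictionary-of-scores interface and the helper names are assumptions for illustration; in practice the same computation would be vectorized over a score matrix.

```python
import math


def log1p_sum_exp(values):
    """Numerically stable log(1 + sum(exp(v) for v in values))."""
    m = max([0.0] + list(values))
    return m + math.log(math.exp(-m) + sum(math.exp(v - m) for v in values))


def global_pointer_loss(scores, positive_spans):
    """Multi-label loss for one entity type alpha.

    scores: {(i, j): s_alpha(i, j)} over all candidate spans;
    positive_spans: the set P_alpha; every other span belongs to Q_alpha.
    """
    pos = [s for span, s in scores.items() if span in positive_spans]
    neg = [s for span, s in scores.items() if span not in positive_spans]
    return log1p_sum_exp([-s for s in pos]) + log1p_sum_exp(neg)


# Untrained model (all scores 0): the loss equals 2 * log 2.
print(round(global_pointer_loss({(0, 0): 0.0, (0, 1): 0.0}, {(0, 0)}), 4))  # 1.3863
```

The loss approaches zero as positive spans score well above 0 and negative spans well below 0, which matches the inference threshold $s_\alpha(i, j) > 0$.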

IV. EXPERIMENTS

A. Experimental Settings and Evaluation Metrics
The experimental model is implemented with the PyTorch deep learning framework. We used pre-trained models including BERT-Base-Chinese [22] and RoBERTa-large [23]. To incorporate more medical information, we also used Bertcner [24], which was further trained on a medical corpus starting from BERT-Base-Chinese. We count true positives ($TP$), false positives ($FP$), and false negatives ($FN$) over all entity types and evaluate the model with precision ($P$), recall ($R$), and micro-F1 ($F1$):

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$
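A micro-averaged computation of these metrics can be sketched as below. It assumes strict matching (a predicted entity counts as a true positive only if its sentence, boundaries, and type all match a gold entity), which the text does not state explicitly; the entity tuples are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro precision, recall and F1 pooled over all entity types.

    gold, pred: sets of (sentence_id, start, end, entity_type) tuples; a
    prediction is a true positive only on an exact match (assumption).
    """
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


gold = {(0, 0, 3, "dis"), (0, 0, 0, "bod"), (1, 2, 4, "sym")}
pred = {(0, 0, 3, "dis"), (1, 2, 4, "sym"), (1, 5, 6, "dru")}
print(micro_prf(gold, pred))  # TP=2, FP=1, FN=1, so P = R = F1 = 2/3
```

Representing entities as tuples in sets makes nested gold entities (such as the two spans starting at position 0 above) count separately, which is what nested NER evaluation requires.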

B. Introduction of Nested Dataset
The Tianchi Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark, jointly provided by Peking University, Zhengzhou University, Pengcheng Laboratory, and Harbin Institute of Technology (Shenzhen), published the CMeEE dataset: 938 files with 47,194 annotated sentences covering nine major medical entity categories, such as diseases (dis) and clinical symptoms (sym). We conducted experiments on this dataset to validate the effectiveness of our model. The data sources include clinical trials, electronic medical records, medical books, and search logs from real-world search engines. Because biomedical data may contain private information such as patients' names, ages, and genders, all collected data were anonymized and reviewed by ethics committees to protect privacy.
Unlike traditional NER datasets, CMeEE contains nested entities, a common phenomenon in medical texts that makes processing more complex for the model. Nested entity instances account for about 11% of the training set. The dataset contains 15,000 sentences in the training set, 5,000 in the validation set, and 3,000 in the test set.

C. Baselines of Nested Dataset
The comparison models are the benchmark models officially released by CBLUE, all of which are built on pre-trained models:
1) BERT-Base-Chinese: the base model, with 12 layers, a hidden size of 768, 12 attention heads, and 110 million parameters.
2) RoBERTa-large: RoBERTa removes the next sentence prediction objective and dynamically changes the masking pattern applied to the training data.
3) Bertcner: a medical pre-trained model obtained by further training BERT on 1.05 GB of clinical text crawled from various medical-domain corpora on the web.

4) ZEN [25]: a BERT-based Chinese text encoder enhanced with n-gram representations, which takes different character combinations into account during training.
5) Mac-BERT-Base/large [26]: an improved BERT that uses a new masked language model (MLM) as correction pre-training task, which reduces the discrepancy between pre-training and fine-tuning.
6) PCL-MedBERT: A pre-trained medical language model was proposed by the Intelligent Medicine Research Group in the PengCheng Lab, which excelled in medical question matching and NER.
In addition to the official baselines, the following models are used for comparison:
7) TPLinker: Wang et al. [27] developed a single-stage joint extraction approach for entities and relations that can find overlapping relations, including relations that share one or both entities, and is free from exposure bias. Its entity span recognition component was chosen as a comparison method in this paper.
8) Multi-head: Li et al. [28] developed a span-based training strategy to deal with missing entity annotations. The key idea is negative sampling, which prevents NER models from being trained on unlabeled entities as negatives.
9) Biaffine: Yu et al. [29] introduced an NER approach inspired by dependency parsing, which uses a biaffine model to score pairs of start and end tokens and thus gives the model a global view of the input sequence.

D. Results and Analysis of Nested Dataset
As shown in Table II, the F1 scores obtained with the pre-trained models alone (BERT-Base-Chinese, RoBERTa-large, Bertcner, ZEN, Mac-BERT-Base/large, and PCL-MedBERT) range from 60.7% to 62.8%, at least 4.2% below the human score of 67%. An important reason for this gap is that the dataset contains many nested entities, which these basic sequence-labeling models cannot recognize. Our model therefore uses Global Pointer, a span-based model, as the entity recognition module. It scores whether each character in the input text can be the beginning or end of an entity, so it can recognize small entities nested within larger ones without a limit on the nesting level. We conducted three sets of experiments to validate this approach: GP+BERT-Base-Chinese improved the score by 4.66% over BERT-Base-Chinese alone, GP+Bertcner improved it by 4.58% over Bertcner alone, and GP+RoBERTa-large improved it by 5.99% over RoBERTa-large alone. In addition, GP+BERT-Base-Chinese outperformed two other span-based models, Multi-head and Biaffine, by 1.93% and 3.91%, respectively. These results demonstrate the effectiveness of Global Pointer and show that recognizing nested entities can significantly improve model performance, making the model more suitable for MNER.
To verify that a character-based model performs better with added lexical information, we conducted experiments using SoftLexicon as the vocabulary enhancement method. Compared with using Global Pointer alone, performance generally improved, by approximately 0.1% to 0.67%.
This shows that, under the same experimental conditions, adding lexical information to the character embeddings brings a measurable gain in entity recognition.
We also conducted comparative experiments using Bertcner, which was trained on medical corpora based on the BERT-base model. When using pre-trained models directly for experiments, Bertcner outperformed BERT-Base-Chinese by 0.7%. With the addition of Global Pointer, GP+Bertcner outperformed GP+BERT-Base-Chinese by 0.62%. When SoftLexicon was added, GP+Bertcner+SoftLexicon still resulted in a 0.28% improvement over GP+BERT-Base-Chinese+ SoftLexicon. These results demonstrate that using BERT models trained on medical corpora can further improve the recognition of medical entities.

E. Introduction of Non-nested Datasets
Three non-nested datasets from the CCKS competitions were used in the experiments. The CCKS2017 dataset has 300 electronic clinical records and 29,866 labeled entities, categorized into five entity types: treatments, signs and symptoms, diseases and diagnoses, examinations and tests, and body parts. The CCKS2019 dataset has 23,401 labeled entities, annotated with six entity types: diseases and diagnoses, examinations, tests, procedures, medications, and anatomical sites. The CCKS2020 dataset has 32,120 labeled entities and uses the same entity types as CCKS2019.

F. Baselines of Non-nested Datasets
The comparative models are presented below:
1) RoBERTa-Bi-LSTM-CRF [30]: The model contains three layers (a character embedding layer, a Bi-LSTM layer, and a CRF layer) and relies on character-based representations learned from a supervised corpus. The Bi-LSTM-CRF model improves medical named entity recognition by capturing contextual information with a bidirectional LSTM and modeling dependencies between tags with a CRF.
2) RoBERTa-Bi-GRU-CRF [31]: This neural network model integrates Bi-GRU and CRF for sequence labeling. The gated recurrent unit (GRU) is an improved RNN that mitigates the vanishing gradient problem and captures long-term dependencies. By feeding the Bi-GRU output into a CRF, the Bi-GRU-CRF network uses both bidirectional context information and label constraints for sequence tagging.
3) Bertcner [24]: a medical pre-trained model obtained by further training BERT on 1.05 GB of clinical text crawled from various medical-domain corpora on the web.
4) Ra-RC [32]: The model uses RoBERTa as an encoder to capture contextual information and adds radical-level features on top of it to enhance the understanding of Chinese.
5) AR-CCNER [33]: The model uses radical-level features to augment character semantic information and a self-attention mechanism to capture dependencies between characters.
6) ACNN [34]: This method learns global context information effectively with an attention mechanism and multilayer CNNs, capturing both short-term and long-term contextual information.
7) BE-Bi-CRF-JN [35]: This method links the original text of the NER task with knowledge from a medical encyclopedia and lets them interact to enhance entity recognition.
8) RGT-CRF [36]: This model uses two sets of features, character-based and word-based, to make full use of the characteristics of the Chinese language. It also uses a rule generator to construct rules automatically, improving the model's generalization ability.

G. Results and Analysis of Non-Nested Datasets
As shown in Tables III, IV, and V, we conducted experiments on the CCKS2017, CCKS2019, and CCKS2020 datasets to evaluate our model on non-nested data. The results indicate that using Global Pointer as the entity recognition module brings a significant improvement over using the pre-trained models alone: compared with BERT-Base-Chinese, RoBERTa-large, and Bertcner, adding Global Pointer yields improvements of 2.37% to 5.35% on the three datasets. Compared with the mainstream Bi-LSTM-CRF approach, our GP+RoBERTa-large model improves F1 over RoBERTa-Bi-LSTM-CRF by 1.61%, 4.93%, and 3.74% on CCKS2017, CCKS2019, and CCKS2020, respectively.
We also compared our model with other methods: the Ra-RC model, which uses bidirectional long short-term memory networks to learn radical features of Chinese characters; the AR-CCNER model, which uses convolutional neural networks to extract radical features and a self-attention mechanism to capture dependencies between characters; and the ACNN model, which uses a multilayer CNN structure to capture short-term and long-term contextual relationships. Our model still outperformed these models in terms of F1.
To verify the effect of vocabulary information, we added SoftLexicon separately. Compared with the GP model without vocabulary information, this yields an improvement of about 1%. We also compared our model with models that combine a lattice method with Chinese character information, such as RGT-CRF: performance is comparable on CCKS2017, and F1 improves by 0.72% on CCKS2019 and by 0.88% on CCKS2020.
Finally, we incorporated medical information into the MNER model by using Bertcner, which was trained on medical corpus data, to obtain semantic features. Compared with the basic pre-trained model, GP+Bertcner+SoftLexicon improves F1 by 0.1% to 0.5% over GP+BERT-Base-Chinese+SoftLexicon. Compared with BE-Bi-CRF-JN, it achieves improvements of about 1% and 8% on the CCKS2018 and CCKS2020 datasets, respectively.

H. Limitation Analysis
Although our model achieved the best results among the compared methods, many entities are still not recognized. To support future work on MNER, this section analyzes the causes of recognition errors, which fall broadly into two cases:
1) Ambiguity: Some entities have different meanings or belong to different categories in different contexts, which makes entity recognition difficult and inaccurate. Solving this problem requires using contextual information to determine the true meaning or category of an entity.
2) Inadequate medical knowledge: Because the medical field contains a large number of terms and complex concepts, identifying and classifying these entities can be difficult for non-professionals. For example, in order to correctly classify entities such as diseases, drugs, symptoms, and treatment plans, it is necessary to have a thorough understanding of these concepts. Furthermore, medical knowledge evolves quickly, with new research findings and treatment methods constantly emerging. In this case, relying solely on pre-trained models may not meet real-time demands; collaboration with domain experts can ensure timely model updates, allowing for the resolution of new challenges.

V. CONCLUSION
In this paper, we propose a model that combines Global Pointer with a lexical enhancement method and demonstrate its effectiveness for Chinese MNER on nested and non-nested datasets. By using lexical enhancement to incorporate word lexicons into the character representations, our model performs Chinese NER at the character level and avoids word segmentation errors. By using Global Pointer, which takes a global view of the start and end positions of entities, our model recognizes both nested and non-nested entities. Experiments on the CMeEE, CCKS2017, CCKS2019, and CCKS2020 datasets show that the proposed model performs strongly on all four datasets. Our results establish a new benchmark for Chinese MNER and open avenues for further research and exploration. In future work, we will investigate ways to leverage unlabeled data and extend our work to more datasets.