BERT-Based Named Entity Recognition for Automated Hadith Narrator Identification

Hadith serves as the second source of Islamic law for Muslims worldwide, including in Indonesia, which has the world's largest Muslim population at 228.68 million people. However, not all Hadith texts have been verified and approved for use, and the existence of fabricated Hadiths makes it difficult to distinguish authentic Hadiths from falsified ones. In Hadith science, the authenticity of a Hadith is determined by examining its Sanad and Matn. The Sanad is essential because it records the chain of Narrators who transmitted the Hadith. This paper presents a Natural Language Processing (NLP) technique that uses Named Entity Recognition (NER) to identify the Narrators of a Hadith as part of its Sanad, as a step toward authenticating the Hadith. The NER technique adds an extra feed-forward classifier on top of the last layer of a pre-trained BERT model. In testing with Cahya/bert-base-indonesian-1.5G, the proposed solution achieved an overall F1-score of 99.63 percent. In the final evaluation of Hadith Narrator identification on other Hadith passages, it achieved an F1-score of 98.27 percent.

Keywords: Hadith narrator; hadith authentication; natural language processing; named entity recognition; NLP; NER; BERT; BERT fine-tune


I. INTRODUCTION
Islam is one of the world's largest religions, with over 2 billion Muslims in 2018, approximately 29.04% of the global population. Indonesia has the largest Muslim community, with more than 233.38 million Muslims in 2018. Muslims refer to the Islamic rules in the Holy Qur'an and the Hadith as guidance for life. The Holy Qur'an is the absolute revelation from Allah (God of Muslims), while the Hadith is a compilation of the sayings of the Prophet Mohammed. Nevertheless, not all Hadith texts are authenticated and authorized for use. Fabricated Hadiths make it difficult to distinguish genuine Hadiths from non-genuine ones. Their existence undermines the authority of the Hadiths and significantly affects the lives of Muslims, particularly in belief, law, morals, and observance. The worst effect of fabricated Hadiths is the confusion they bring to Muslims, which can consequently corrupt their faith [1]. Therefore, it is vital to verify the authenticity and originality of the Hadiths one accesses.
Hadith verification relies on two principal parameters that indicate the condition of a particular Hadith: (1) the context (the meaning of the Hadith itself) and (2) the narrators (the people who recite the Hadith). Recognizing the narrators' names therefore plays a crucial role in authenticating a particular Hadith. For example, consider the following snippet of an Indonesian Hadith text:

Telah menceritakan kepada kami Al Humaidi Abdullah bin Az Zubair dia berkata, Telah menceritakan kepada kami Sufyan yang berkata, bahwa Telah menceritakan kepada kami Yahya bin Sa'id Al Anshari berkata, telah mengabarkan kepada kami Muhammad bin Ibrahim At Taimi, bahwa dia pernah mendengar Alqamah bin Waqash Al Laitsi berkata; saya pernah mendengar Umar bin Al Khaththab diatas mimbar berkata; saya mendengar Rasulullah shallallahu'alaihi wasallam bersabda: "Semua perbuatan tergantung niatnya, dan (balasan) bagi tiap-tiap orang (tergantung) apa yang diniatkan; Barangsiapa niat hijrahnya karena dunia yang ingin digapainya atau karena seorang perempuan yang ingin dinikahinya, maka hijrahnya adalah kepada apa dia diniatkan"

The meaning is:

Has told us Al Humaidi Abdullah bin Az Zubair, who said: has told us Sufyan, who said: has told us Yahya bin Sa'id Al Anshari, who said: has informed us Muhammad bin Ibrahim At Taimi, that he heard Alqamah bin Waqash Al Laitsi say: I once heard Umar bin Al Khaththab on the pulpit say: I heard the Prophet sallallaahu'alaihi wasallam say: "All actions depend on the intention, and (retribution) for each person (depends on) what is intended; whoever intends to emigrate because of the world he wants to achieve or because of a woman he wants to marry, then his hijrah is for what he intended."

The example Hadith text above has five narrators (highlighted in gray). The status of the above Hadith, authentic or not, can be assessed by identifying the five narrators and assessing their worthiness.
NER is an NLP task that recognizes and classifies named entities in a given text. "Named entities" are instances of predefined semantic categories such as people, locations, and organizations. Although NER is in principle applicable to various domains and languages, applying it to identify and authenticate Hadith Narrators remains challenging.
This study proposes a semi-supervised BERT (Bidirectional Encoder Representations from Transformers) model with an extra feed-forward neural network to perform NER for Hadith Narrators, particularly in Indonesian Hadith texts. Once all the Hadith Narrators have been identified using the proposed NER model, it becomes possible to proceed to Hadith authentication. The remainder of the paper is organized as follows. Section II reviews prior work on NER and Hadith Narrator identification. Section III presents the definitions of NER and BERT and the evaluation measures used. Section IV describes the proposed model. Section V discusses the results. Finally, Section VI presents the conclusions and future research directions.

II. RELATED WORK
The implementation of NER is domain- and language-dependent. An NER model built for one domain performs poorly when applied to other domains [2] [3]. The scope of the study reported in this paper is limited to identifying the Hadith Narrator using the NER technique, specifically for Hadith in the Indonesian language.
The study [4] proposes a new Part of Speech (POS) tag and rule-based narrator name extraction for Malay Hadith text. The result was the creation of the POS tag involving 256 words developed from Hadith text, and the rules were created based on five Narrator chains. Similarly, in [2], the author presents a unique rule-based technique for automatically identifying person-name entities in the Malay Hadith text-domain. The model was developed by manually recognizing the names and mannerisms of 150 Malay Hadith books and then developing rules based on them.
The study [5] created an NER model for Hadith texts written in English. The proposed model uses the Support Vector Machine (SVM), the Maximum Entropy (ME) classifier, and Naive Bayes (NB), together with classifier combination methods. The results indicate that the classifier combination technique achieves the best performance, with precision, recall, and F-measure values of 96.9 percent, 93.6 percent, and 95.3 percent, respectively. Another author [6] built an NER-based knowledge extraction framework that employs finite-state transducers (KEFST) to extract the Hadith Narrators from Urdu-translated Hadith text. KEFST consists of five steps: content extraction, tokenization, part-of-speech tagging, multi-word detection, and NER. This study achieved a precision, recall, and F-measure of 68%, 75%, and 72%, respectively.
The study [7] constructed NER models for Arabic Hadith texts using three machine learning algorithms: Naive Bayes, K-nearest Neighbor, and Decision Tree. During the training phase, the NER model achieved a precision of 90% and a recall of 82%. Evaluating the created model on various corpora demonstrates that it can achieve an accuracy of 80% and a recall of 73%. The author [8] constructed NER models for Arabic Hadith texts using three distinct approaches: rule-based, statistical, and hybrid (rule-based combined with statistical). The statistical methods used are the Log-likelihood Ratio (LLR), Point-wise Mutual Information (PMI), S-cost, R-cost, and U-cost. LLR outperformed PMI, S-cost, R-cost, and U-cost, reaching an F-measure of 76%. The rule-based approach reached an F-measure of 80%. The experimental results indicate that the proposed hybrid of rule-based and statistical methods achieved an F-measure of 82 percent, an improvement over either individual approach.
The study [9] used two RNN-based models to recognize and categorize named entities in Classical Arabic Hadith text by fine-tuning the pre-trained BERT language model. Additionally, this study investigates alternative designs for the BERT-BGRU/BLSTM-CRF models. The BERT-BGRU-CRF model outperformed the other models with an F-measure of 94.76 percent on the CANERCorpus. Another author [10] developed a novel NER model for Arabic Hadith text extracted from the Sahih Bukhari Urdu translation book. The proposed model extracts entities from Hadith text using Finite State Transducers (FST) and subsequently tags them using Conditional Random Fields (CRF). The model had a precision of 96.44 percent, a recall of 88.77 percent, and an F-measure of 92.41 percent. Similarly, in [11], the author proposes a new approach for extracting Arabic person names from Arabic Hadith text. This study built an NER model using N-gram phrase extraction and a simple rules model, and the result showed a precision of around 84%.
The work [12] offers a novel NER model for Indonesian Hadith texts using Support Vector Machines (SVM). The results suggest that the NER model attained its highest F1-score of 0.9 using 140 Hadiths containing 1564 entities for training and 60 Hadiths containing 677 entities for testing. Another study [15] built an NER model with a Naive Bayes classifier for Indonesian Hadith from nine narrators. The results of experiments involving 258 person names extracted from 13870 tokens of data from 100 Indonesian hadith texts show that combining all features can achieve an F1-score of 82.63%. The author [13] built an NER model for indexing names in Indonesian Hadith text using the Hidden Markov Model (HMM). The performance obtained with the HMM method was 86%. With cross-validation over the parameters, the performance increased by 2%, which indicates that the approach is reasonably suitable for the 38.102 hadith data.
Although various studies have successfully proved the application of NER on Hadith, there is a dearth of studies that optimize BERT to execute NER in order to detect the Narrator in Indonesian Hadith text automatically.

A. NER
NER is an NLP task that identifies text fragments related to a specific named entity and categorizes them according to predefined categories such as person, location, or organization [14]. Four key lines of progress can be seen in the evolution of NER techniques [15]:
1) Rule-based techniques do not require annotated data and rely on manually written rules.
2) Unsupervised learning methods rely on unsupervised algorithms in the absence of manually labeled training examples.
3) Feature-based supervised learning techniques depend on supervised learning algorithms with carefully engineered features.
4) Deep-learning techniques automatically discover the representations required for classification or detection.

605 | P a g e www.ijacsa.thesai.org

B. BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained Transformer representation that can be fine-tuned with just one additional output layer. BERT has achieved state-of-the-art results on several tasks, including question answering and sentence classification, without substantial task-specific architecture modifications [16] [17]. Fig. 2 illustrates the BERT architecture.
Either a feature-based or a fine-tuning approach can be used to apply pre-trained language representations to downstream tasks [16]. Fine-tuning is straightforward because the Transformer's self-attention mechanism allows BERT to model many downstream tasks, whether they involve a single text or a text pair, by swapping in the appropriate inputs and outputs. For each task, the task-specific inputs and outputs are plugged into BERT, and all parameters are fine-tuned end-to-end.

IV. PRESENTED MODEL
The methodology used in this investigation is summarized in Fig. 3.

A. Hadith Dataset Tagging
The first step is to prepare the Hadith dataset so that it complies with the input requirements of the model. The Hadith dataset is annotated with NER tags in IOB format. This study uses texts from the Bukhari Hadith Book, and a total of one hundred and two Hadith texts are annotated entirely in IOB format. The IOB tags used are as follows:

B-Narrator: The word is either a single-word Hadith Narrator entity or the first word of a multi-word Hadith Narrator entity.
I-Narrator: The word is part of a multi-word Hadith Narrator entity but is not the first word of that entity.
O: The word is not part of a Hadith Narrator entity.
The meaning is as follows: From Abu Abdurrahman Abdullah bin Mas'ud, he said that the honest and trustworthy Messenger of Allah said to us, "Indeed, your creation was collected in the mother's womb for forty days in the form of sperm."
That snippet of Indonesian Hadith text is tagged in IOB format as indicated in Table I. There is one Hadith Narrator entity in the Hadith text above, Abu Abdurrahman Abdullah bin Mas'ud, which consists of several words. As the first word of the Hadith Narrator entity, Abu is assigned the B-Narrator tag. The I-Narrator tag is assigned to Abdurrahman, Abdullah, bin, and Mas'ud, which are part of the Hadith Narrator entity but not its first word, and the O tag is assigned to the rest of the Hadith text.
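The tagging scheme above can be sketched in a few lines of Python. This is only an illustration: the token list is a hypothetical word-level fragment built from the narrator name in the example, and the function name `iob_tags` and the span representation (token indices with an exclusive end) are assumptions made for the sketch, not part of the study's tooling.

```python
def iob_tags(tokens, narrator_spans):
    """Assign B-Narrator / I-Narrator / O tags to word-level tokens.

    narrator_spans holds (start, end) token indices, end exclusive,
    one pair per narrator entity.
    """
    tags = ["O"] * len(tokens)          # default: not part of a narrator
    for start, end in narrator_spans:
        tags[start] = "B-Narrator"      # first word of the entity
        for i in range(start + 1, end):
            tags[i] = "I-Narrator"      # remaining words of the entity
    return tags


# Hypothetical fragment; the narrator name occupies tokens 1 to 5.
tokens = ["dari", "Abu", "Abdurrahman", "Abdullah", "bin", "Mas'ud", "berkata"]
tags = iob_tags(tokens, [(1, 6)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A single-word narrator would receive only the B-Narrator tag, since the inner loop is empty when the span has length one.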

B. Tokenization and Input Formatting
The second step is tokenization and input formatting, which are done in several steps below.

1) Tokenize the hadith text:
The tokenization starts at the word level. The Hadith dataset used is already "tokenized", meaning it has been divided into lists of words. However, further text processing is needed before these sentences can be passed to BERT. The tokenizer breaks words that are not in the BERT vocabulary (out-of-vocabulary, OOV) down into subwords. For unknown tokens, the [UNK] token is inserted. Fig. 4 Step B and Table II show the Hadith text tokenized at the word level. Two words need to be broken into subword-level tokens, "mas'ud" and "terpercaya", and the final result requires one OOV token.
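To make the subword step concrete, the following is a minimal sketch of greedy longest-match-first WordPiece splitting, the scheme used by BERT tokenizers. The vocabulary `TOY_VOCAB` is invented for illustration, so the resulting splits are not the actual output of the pre-trained tokenizers used in this study. The sketch also omits the punctuation splitting that a full BERT tokenizer performs before WordPiece is applied.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword tokens.

    Continuation pieces carry the '##' prefix. If any part of the word
    cannot be matched, the whole word becomes the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece    # mark as a continuation piece
            if piece in vocab:
                match = piece
                break
            end -= 1                    # try a shorter candidate
        if match is None:
            return [unk]                # no piece fits: whole word is OOV
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption, not a real BERT vocabulary).
TOY_VOCAB = {"dari", "abu", "ter", "##percaya", "mas", "##ud"}
words = ["dari", "abu", "terpercaya", "mas'ud"]
subwords = [wordpiece(w, TOY_VOCAB) for w in words]
print(subwords)
```

With this toy vocabulary, "terpercaya" is split into two pieces, while "mas'ud" falls back to [UNK] because no piece covers the apostrophe.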
2) Prepend the special tokens: Tokenization in BERT entails placing the special [CLS] token at the beginning of the Hadith text and appending the [SEP] token at the end to mark the text's beginning and end. This treatment is depicted in Fig. 4 Step C.
3) Map tokens to their IDs and truncate the sentences: The word tokens must be mapped to their BERT vocabulary IDs, and all sentences must have the same number of tokens so that the GPU can operate on a batch. This is done in three steps: (1) defining the max sentence length, (2) adding the special [PAD] token to sentences with fewer tokens than the max length, and (3) truncating sentences that are longer than the max length.
The max length in this study follows the max length column in Table III, except for "bert-base-uncased," for which the max length is set to 512. A maximum length must be imposed because the limit derives from the Transformer architecture's positional embeddings. Fig. 4 Step D depicts this treatment.
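The special-token, ID-mapping, and padding/truncation steps can be sketched together as follows. The vocabulary, the ID values, and the function name `encode` are assumptions made for illustration. Truncating before appending [SEP], so that the closing token is always kept, is one common convention and is not confirmed by the text above.

```python
def encode(tokens, vocab, max_len):
    """Add [CLS]/[SEP], map tokens to IDs, then truncate or pad to max_len."""
    # Reserve two positions for the special tokens before truncating.
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    # Tokens missing from the vocabulary map to the [UNK] ID.
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    # Right-pad short sequences with the [PAD] ID.
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return ids


# Toy vocabulary with made-up IDs (an assumption).
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "dari": 4, "abu": 5, "abdullah": 6}

short_ids = encode(["dari", "abu"], TOY_VOCAB, max_len=6)
long_ids = encode(["dari", "abu", "abdullah", "bin", "abu", "dari"],
                  TOY_VOCAB, max_len=6)
print(short_ids)
print(long_ids)
```

Both outputs have exactly `max_len` entries, which is what allows sentences of different lengths to be stacked into one batch.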

4) Create attention masks:
The final tokenization and formatting step provides the model with an "attention mask" for each sample, which identifies the [PAD] tokens and instructs BERT to ignore them. This procedure is depicted in Fig. 4 Step E.
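A minimal sketch of building the attention mask from padded input IDs is shown below. It assumes that the [PAD] token has ID 0, which should be checked against the vocabulary of the tokenizer actually used; the example ID sequences are made up.

```python
def attention_mask(input_ids, pad_id=0):
    """Return 1 for real tokens and 0 for [PAD] positions."""
    return [0 if token_id == pad_id else 1 for token_id in input_ids]


# Two already padded/truncated sequences with made-up IDs.
batch = [
    [2, 4, 5, 3, 0, 0],   # two trailing [PAD] tokens
    [2, 4, 5, 6, 1, 3],   # full-length sequence, no padding
]
masks = [attention_mask(row) for row in batch]
print(masks)
```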

C. Classification and Model Training
Thirdly, the classification model must be trained. The proposed model architecture is BERT with a single linear layer on top that classifies each token into its entity class. The proposed model architecture is depicted in Fig. 5.
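The role of the linear layer can be illustrated with a small pure-Python sketch: each token's hidden vector is projected to one score per tag, the scores are turned into probabilities with a softmax, and the most probable tag is selected. The 4-dimensional hidden vectors, the hand-set weights, and the function names are assumptions for illustration only. In the actual model the hidden vectors come from BERT's last layer and the weights are learned during fine-tuning.

```python
import math

LABELS = ["O", "B-Narrator", "I-Narrator"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_token(hidden, weights, bias):
    """Linear layer over one hidden vector, then softmax and argmax."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hand-set weights: one row per label (hypothetical, not learned).
weights = [[1.0, 0.0, 0.0, 0.0],    # O
           [0.0, 1.0, 0.0, 0.0],    # B-Narrator
           [0.0, 0.0, 1.0, 0.0]]    # I-Narrator
bias = [0.0, 0.0, 0.0]

# Stand-ins for the encoder output of three tokens.
hidden_states = [[2.0, 0.1, 0.0, 0.3],
                 [0.2, 3.0, 0.1, 0.0],
                 [0.0, 0.5, 2.5, 0.1]]
predicted = [classify_token(h, weights, bias)[0] for h in hidden_states]
print(predicted)
```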

Depending on the pre-trained model used, the BERT model employs N Transformer layers. The final additional layer classifies each token in two steps: it first predicts the probability of each token class and then selects the class of the token. This study examines how four different pre-trained BERT models can be extended into NER models. The NER model is the target model, which was trained using the following parameters:

D. Performance Measure
The fourth stage is to evaluate the performance on the test set. The performance of the NER model is quantified using the F1-score, which is the harmonic mean of precision and recall. With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, precision, recall, and F1-score are defined as follows:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)

The F1-score results for the NER models built on top of four separate pre-trained BERT models are shown in Table IV. Except for bert-base-uncased, which is the general BERT model, the pre-trained BERT models used support the Indonesian language.
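As a worked example of these measures, the sketch below computes precision, recall, and F1-score from a gold and a predicted tag sequence. It assumes token-level scoring micro-averaged over the two Narrator tags; the text above does not state whether the study scores tokens or whole entities, so this choice and the function name `prf1` are assumptions.

```python
def prf1(gold, pred, positive=("B-Narrator", "I-Narrator")):
    """Token-level precision, recall and F1 over the positive tags."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if p in positive and p == g)
    fp = sum(1 for g, p in pairs if p in positive and p != g)
    fn = sum(1 for g, p in pairs if g in positive and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = ["O", "B-Narrator", "I-Narrator", "I-Narrator", "O"]
pred = ["O", "B-Narrator", "I-Narrator", "O", "O"]
precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one Narrator token is missed, so TP = 2, FP = 0, and FN = 1, which gives a precision of 1, a recall of 2/3, and an F1-score of 0.8.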

E. Evaluation
The fifth stage involves the evaluation of the NER model. The NER models whose test results are shown in Table IV were assessed again using forty additional Hadith texts. The evaluation results are summarized in Table V. On average, the evaluation results are 0.28 percent lower than the training results. The ranking, however, is preserved: Cahya/bert-base-indonesian-1.5G demonstrated the best performance, followed by indobenchmark/indobert-base-p1, indobenchmark/indobert-base-p2, and bert-base-uncased.

Table VI summarizes the NER evaluation results for three Hadith passages. Some tagging errors occurred in the output for the evaluation texts hadith1, hadith2, and hadith3. For instance, in hadith1, "Abu" should be labeled as I-Narrator rather than B-Narrator. In hadith2, the word "Ummu" should be labeled as I-Narrator but was tagged as "O", and so forth.

This study aims to identify the Hadith Narrator using the NER approach, specifically for Hadith in Indonesian. The results of this study can be compared to those of previous studies, as indicated in Table VII. Most studies on NER measure performance using the F1-score. The distribution of NER tags cannot be predicted in advance and may be imbalanced, so the F1-score is needed to capture the harmonic mean of precision and recall. A high F1-score for the proposed NER model indicates high values for both precision and recall. The proposed NER model achieved an F1-score of 99.27%.

VI. CONCLUSION AND FUTURE WORK
In this research, BERT with an extra feed-forward classifier is designed and implemented to perform NER for Hadith Narrator identification. Cahya/bert-base-indonesian-1.5G achieved the highest F1-score of 99.63 percent during the training phase. In the final evaluation of Hadith Narrator identification on other Hadith passages, it achieved an F1-score of 98.27 percent. These results suggest that the proposed NER model performs well when used to identify Hadith Narrators in Indonesian Hadith texts.
Several directions for future work follow from this experiment. The first is to increase the number of Hadith texts in the experimental dataset, which is required because Hadiths contain a range of Sanad and Matn forms. The current study applies three IOB tags, namely B-Narrator, I-Narrator, and O. These three tags sufficed because the only requirement was to determine which words belong to a Narrator and which do not. As the Hadith dataset grows, additional study is warranted, and other tags should be considered, such as those for Rasulullah, Taraf, and others.