Enhancing the Takhrij Al-Hadith based on Contextual Similarity using BERT Embeddings

Muslims are required to conduct Takhrij to validate the truth of a Hadith text, especially when it is obtained from online media. Traditionally, the Takhrij process is conducted by experts and applies to Arabic Hadith text. This study introduces a contextual similarity model based on BERT embeddings to handle Takhrij on Indonesian Hadith text. It examines the effectiveness of BERT fine-tuning on six pre-trained models to produce embedding models. The results show that BERT fine-tuning improves the embedding models' average accuracy by 47.67%, with a mean of 0.956845. The highest accuracy, 1.00, was achieved by the BERT embedding built on the indobenchmark/indobert-large-p2 pre-trained model. In addition, the manual evaluation achieved 91.67% accuracy.

Keywords—Hadith text; Takhrij; natural language processing; text similarity; word embedding; BERT fine-tuning


I. INTRODUCTION
With the growth of information on the Internet, most people, including Muslims, nowadays use online media as a primary source of information or knowledge. Most Muslims have adopted online media as a direct reference for exploring religious content, including verses of the Qur'an or Hadith. The problem is that not all information or knowledge on the Internet is verified for correctness and authenticity [1].
The primary sources of Islamic law are the Qur'an and Hadith [2] [3]. The Qur'an is God's trustworthy and unchanged Holy Book, which has been used as the main Islamic reference for more than 14 centuries since its revelation [4]. Hadith is an Islamic rule derived from the accumulation of the Prophet Muhammad PBUH's expressions, behaviors, judgments, and character [5] [6]. Unlike the Holy Qur'an, not all Hadith distributed among Muslims are trustworthy [7]. Thus, Muslims need to authenticate the correctness of a Hadith, especially when it comes from online media.
The approach to validating the correctness of a Hadith is referred to as criticism of Hadith [8]. Three Hadith studies support the criticism of Hadith: (1) the study of Musthalah Hadith; (2) the study of Takhrij and Dirasah Sanad; and (3) the study of Thurud Fahmil Hadith. To recognize a Hadith's authenticity status, a Muslim must perform Takhrij. Takhrij refers to the examination of the existence of the Hadith in the Hadith Books (its initial sources), such as the Kutub al-Sittah (the Six Canonical Books of Hadith), the Muwatta of Imam Malik, and others. Some prior research describes and employs Takhrij al-Hadith to define the status of a Hadith, and according to [9], Takhrij is performed to meet several objectives. The literature describes seven methods of conducting Takhrij:
1) The Matn first-word method.
2) The word-indexing method.
3) The Companion name index method.
4) The Hadith theme method.
5) The search method based on Hadith status.
6) The search method based on multiple Matn or Sanad conditions.
7) Digital searching via computer CDs or the Internet.
The Takhrij methods numbered 1-6 above are classical methods. In tune with the growth of digitalization, method number 7 has arisen [12]. Table I summarizes the methods, and from it, it can be concluded that Takhrij is a process for obtaining the original Hadith text through numerous techniques. Table I also shows that traditional Takhrij requires human expertise. Fig. 1 illustrates the Takhrij process.
From the experts' point of view, Takhrij must be based on Arabic Hadith text to avoid distortion introduced by translation. As such, performing Takhrij in other languages, such as Indonesian, presents additional challenges.
For illustration, Table II shows two examples of Matn Hadith (highlighted in grey) that differ in wording but share an identical context. Analyzing more deeply, several different words carry the same meaning (bold in Table II), as shown in Table III. This presents a challenge for the Takhrij process when confirming Hadith authenticity: the translations are textually diverse though contextually identical.
This paper employs a semi-supervised BERT (Bidirectional Encoder Representations from Transformers) word embedding with a feed-forward neural network classifier to produce a Hadith text representation and determine its contextual similarity level. This work focuses on Hadith in the Indonesian language. The rest of the paper is structured as follows: Section II presents previous work related to contextual similarity. Section III covers the theoretical definitions of text similarity, BERT, and the evaluation parameters used. Section IV then describes the proposed model. The results of the experiments are discussed in Section V. Finally, Section VI presents the conclusion and directions for future research.

II. RELATED WORK
A review of existing work shows several studies on word embedding techniques. The authors in [13] established a model that employs word2vec word embeddings, i.e., CBOW and Skip-gram, combined with an SVM, for classifying the sentiment of social media tweets according to context. The study shows that the 100-dimension Skip-gram achieved the best classification performance, with precision, recall, and F-score of 64.4%, 58%, and 61.1%, respectively.
Several other studies have applied BERT word embeddings. The study in [14] utilized BERT sentence embeddings to build automatic essay scoring; the resulting model reaches an F1-score of 82.9%. Similarly, the authors in [15] proposed a neural network with a pre-trained language model, M-BERT, acting as an embedding layer to detect clickbait headlines. Evaluated with 5-fold cross-validation, it achieves an accuracy of 91.4%, an F1-score of 91.4%, a precision of 91.6%, and a ROC-AUC of 92%. Another study [16] employs Latent Dirichlet Allocation and BERT embeddings for topic modeling of graduate students' articles collected from the Internet; the proposed model reached an average 92.6% success rate in classifying documents to the appropriate subject.
Although several studies, including [17], have successfully demonstrated the use of BERT word embeddings, there is a lack of studies that apply contextual similarity to conduct Takhrij al-Hadith on Indonesian Hadith text.

III. THEORETICAL BACKGROUND

A. Text Similarities
Text similarity is an extensively applied method for measuring the relatedness between two texts [19]. A mechanism for representing text is required to measure text similarity in natural language, since machines cannot understand the notions of words directly. The usual technique for representing text is term vectors, in which terms or phrases are converted into vectors of real numbers [20]. Fig. 2 illustrates the variety of text representation forms. The work in [21] presents the weakness of traditional word embeddings: they carry no contextual representation of 'comparable' words, and they raise a 'sparse' matrix problem on an extensive vocabulary. In contrast, Word2Vec in Skip-gram or CBOW form can capture a semantic (contextual) representation, although the 'sparse' matrix remains an obstacle for an extensive vocabulary.
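The weakness of purely term-based vectors can be sketched with a small cosine-similarity example. This is an illustrative sketch only; the tiny vocabulary and the two sentences are invented and are not from the paper's dataset:

```python
import math
from collections import Counter

def term_vector(text, vocab):
    """Represent text as a sparse bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean high similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["deeds", "actions", "judged", "intentions"]
v1 = term_vector("deeds are judged by intentions", vocab)
v2 = term_vector("actions are judged by intentions", vocab)
# "deeds" and "actions" occupy unrelated dimensions in a term vector,
# so two contextually identical sentences score well below 1.0.
print(cosine_similarity(v1, v2))
```

A contextual embedding would map "deeds" and "actions" to nearby dense vectors, which is precisely the property the Takhrij task needs.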

B. BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer model that can be fine-tuned with a single supplementary output layer. Fine-tuned BERT achieves new state-of-the-art results on a broad range of NLP tasks, including question answering, sentence classification, and sentence-pair regression, without significant task-specific architecture modification [22] [23]. Fig. 3 shows the BERT architecture. A feature-based or a fine-tuning approach can apply the pre-trained language representations to downstream tasks [22]. Fine-tuning is straightforward because the transformer self-attention mechanism allows BERT to model many downstream tasks on a single text or a text pair by swapping in the appropriate inputs and outputs. Each task simply plugs its task-specific inputs and outputs into BERT and fine-tunes all parameters end-to-end.

C. Evaluation Parameters
The evaluation of performance plays a crucial role in the development of classification models [24]. BERT fine-tuning employs the Categorical Cross-Entropy loss, also called Softmax loss, as the loss function that measures the distance between the current output of the algorithm and the expected output. It is a Softmax activation followed by a Cross-Entropy loss, as shown in Fig. 4 [25]. The accuracy metric measures the ratio of correct predictions to the total number of input specimens, defined as [26]:

Accuracy = (Number of correct predictions) / (Total number of predictions)
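The two quantities above can be illustrated with a minimal pure-Python sketch (the logits and labels are made up; this is not the actual training code):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(logits, true_index):
    """Softmax activation followed by cross-entropy loss for a one-hot target."""
    probs = softmax(logits)
    return -math.log(probs[true_index])

def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# A confident, correct prediction yields a small loss.
loss = categorical_cross_entropy([2.0, 0.5], true_index=0)
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
print(loss, acc)
```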

IV. PROPOSED MODEL

A. Methodology
The methodological approach of this study is summarized in Fig. 5. The first step is gathering and preparing Hadith text as a dataset for training and testing the model. This study focuses on Hadith texts related to the Forty Hadith of al-Imam an-Nawawi. Ten unprocessed Hadith texts were gathered from [27] as the Hadith texts to be tracked (Takhrij), and fifty raw Hadith texts were gathered from https://carihadis.com as Takhrij references. The dataset is split into 70% training and 30% testing data. Table IV shows specimens of the raw Hadith text.
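The 70/30 split described above can be sketched as follows. The shuffling procedure and the fixed seed are assumptions for reproducibility, not necessarily the authors' exact splitting method:

```python
import random

def split_dataset(rows, train_ratio=0.7, seed=42):
    """Shuffle reproducibly, then split into train/test partitions."""
    rows = rows[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

# 10 tracked texts + 50 reference texts = 60 rows in total.
data = [f"hadith_{i}" for i in range(60)]
train, test = split_dataset(data)
print(len(train), len(test))  # 42 18
```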
The second step is data pre-processing, which consists of three sub-steps: (a) text standardization, (b) eliminating undesirable items from the Hadith text, and (c) data labeling. Fig. 6 shows the data pre-processing flow in detail. Table V shows a specimen of Hadith text resulting from pre-processing sub-steps (a) and (b).
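Sub-steps (a) and (b) can be sketched roughly as follows. The concrete normalization rules, the unwanted-item patterns, and the sample sentence are all assumptions made for illustration, not the authors' implementation:

```python
import re

def standardize(text):
    """Sub-step (a): lowercase the text and collapse whitespace."""
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()

def remove_unwanted(text):
    """Sub-step (b): drop items assumed undesirable for matching,
    e.g. bracketed narrator notes, digits, and punctuation."""
    text = re.sub(r"\[.*?\]", " ", text)  # bracketed annotations
    text = re.sub(r"[0-9]", " ", text)    # digits (e.g. Hadith numbers)
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation
    return re.sub(r"\s+", " ", text).strip()

raw = 'Telah menceritakan kepada kami [Abu Bakar] no. 52: "Amalan itu..."'
clean = remove_unwanted(standardize(raw))
print(clean)
```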

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 11, 2021, www.ijacsa.thesai.org

In sub-step (c), a label (equal or distinct) is attached to each data row. The third step is to build the BERT embedding for the contextual similarity model. The model is built by training on top of several semi-supervised BERT models using BERT fine-tuning. A precise fine-tuning approach is needed to fit BERT to the contextual similarity (Takhrij) NLP task.
This study investigates the fine-tuning of five different BERT pre-trained models. The target model is a single-label classification model and is trained with several parameters.
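The single-label classification objective (equal vs. distinct pairs) placed on top of the embeddings can be illustrated with a toy pure-Python sketch. The two-dimensional "embeddings", the absolute-difference pair features, and every hyperparameter here are invented for illustration and are not the authors' BERT fine-tuning code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pair_classifier(pairs, labels, epochs=300, lr=0.5):
    """Toy single-label classifier: predicts 'equal' (1) or 'distinct' (0)
    from the element-wise absolute difference of two embedding vectors."""
    dim = len(pairs[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for (u, v), y in zip(pairs, labels):
            feats = [abs(a - c) for a, c in zip(u, v)]
            p = sigmoid(sum(wi * f for wi, f in zip(w, feats)) + b)
            g = p - y  # gradient of cross-entropy w.r.t. the logit
            w = [wi - lr * g * f for wi, f in zip(w, feats)]
            b -= lr * g
    return w, b

def predict(w, b, u, v):
    feats = [abs(a - c) for a, c in zip(u, v)]
    return int(sigmoid(sum(wi * f for wi, f in zip(w, feats)) + b) > 0.5)

# Toy 2-d "embeddings": close vectors are labeled equal (1), far ones distinct (0).
pairs = [([0.9, 0.1], [0.85, 0.12]), ([0.9, 0.1], [0.1, 0.95]),
         ([0.2, 0.8], [0.25, 0.75]), ([0.2, 0.8], [0.95, 0.05])]
labels = [1, 0, 1, 0]
w, b = train_pair_classifier(pairs, labels)
print(predict(w, b, [0.8, 0.2], [0.78, 0.22]))  # a similar pair
```

In the actual study, the classifier sits on top of the fine-tuned BERT representation rather than hand-crafted two-dimensional vectors.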

V. RESULT
The BERT embedding model fine-tuning process is reported in Fig. 8, which shows the loss and accuracy values at each training and testing iteration.

VI. CONCLUSION AND FUTURE WORK
In this paper, a semi-supervised BERT word embedding with a feed-forward neural network classifier is proposed and implemented to produce a Hadith text representation and determine its contextual similarity level. This work focuses mainly on Hadith text in Indonesian. BERT fine-tuning raised the average accuracy by 47.67%, with a mean accuracy of 0.956845 in the training process. The pre-trained model indobenchmark/indobert-large-p2 achieved the highest training accuracy of 1.00. The final manual evaluation achieved 91.67% accuracy on Hadith contextual similarity identification, indicating that the proposed model performs well when used to conduct Takhrij (searching) on Indonesian Hadith text. As future development of this experiment, there are several directions to be studied. The first is to extend the number of Hadith texts in the experimental dataset, since Hadith texts are known to have different Sanad and Matn structures. Another direction is the automatic recognition of the parts of a Hadith text: identifying its Sanad or Matn and then classifying the Hadith text based on its structure.