International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 5, 2026.
Abstract: Large Language Models (LLMs) have reshaped how machines read and compare text, yet most similarity learning pipelines built on top of them still behave like black boxes: a single cosine score is returned without any indication of why two sentences were deemed close. This study proposes a different route. We pair a fine-tuned Sentence-BERT (SBERT) encoder with MCESTA, a fuzzy multi-criteria aggregation layer that combines a semantic (cosine), a geometric (Manhattan) and a lexical (Jaccard) similarity through a small set of human-readable linguistic rules. The output is a single similarity score in [0, 1] that remains traceable to the rules that produced it: because the rule base contains only twelve Mamdani rules, the chain of fired antecedents can be inspected directly after each inference, which is what we mean by traceability in this study. We evaluate the framework on the Quora Question Pairs corpus (the public release contains about 404,000 question pairs in English with a 63/37 non-paraphrase to paraphrase split) against five strong baselines, including SimCSE and AnglE. Our model reaches Accuracy = 0.90 and AUC = 0.94, outperforming every baseline. A controlled ablation shows that the gains come from the fuzzy aggregation step itself, not from the choice of encoder, while a robustness study reveals that the soft membership functions absorb noise and threshold variations more gracefully than a plain cosine baseline. The fuzzy aggregation step runs in O(|R|) per pair, where |R| = 12 is the size of the rule base, so its computational overhead on top of the encoder forward pass is negligible. Adaptive fuzzy rules, multilingual similarity, and domain-specific deployments are positioned as future extensions rather than as results of the present study.
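The aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the membership functions or the twelve rules, so everything concrete below is an assumption made for illustration: the triangular low/medium/high memberships, the five example rules, the 1/(1 + d) mapping from Manhattan distance to a similarity, and all function names. Plain Python lists stand in for SBERT embeddings. The sketch follows the usual Mamdani recipe (min for AND, clipping of the output set, max aggregation, centroid defuzzification) and returns the fired rules alongside the score.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, clipped to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return min(1.0, max(0.0, dot / (norm_u * norm_v)))


def manhattan_similarity(u, v):
    """Assumed mapping of the L1 distance d to a similarity: 1 / (1 + d)."""
    d = sum(abs(a - b) for a, b in zip(u, v))
    return 1.0 / (1.0 + d)


def jaccard_similarity(text_a, text_b):
    """Jaccard overlap of lowercased whitespace tokens."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Assumed triangular membership functions on [0, 1].
def low(x):
    return max(0.0, 1.0 - 2.0 * x)


def medium(x):
    return max(0.0, 1.0 - 2.0 * abs(x - 0.5))


def high(x):
    return max(0.0, 2.0 * x - 1.0)


def fuzzy_similarity(cos, man, jac, steps=101):
    """Mamdani-style aggregation; returns (score, fired_rules)."""
    cos, man, jac = (min(1.0, max(0.0, x)) for x in (cos, man, jac))
    # Illustrative rule base (NOT the paper's twelve rules).
    # Each entry: (label, firing strength, output membership function).
    rules = [
        ("R1: cos high -> high", high(cos), high),
        ("R2: cos medium AND jac high -> high",
         min(medium(cos), high(jac)), high),
        ("R3: cos medium -> medium", medium(cos), medium),
        ("R4: cos medium AND man low AND jac low -> low",
         min(medium(cos), low(man), low(jac)), low),
        ("R5: cos low -> low", low(cos), low),
    ]
    # Clip each output set at its rule strength, aggregate with max,
    # then defuzzify with a discrete centroid over [0, 1].
    grid = [i / (steps - 1) for i in range(steps)]
    aggregated = [max(min(s, mf(y)) for _, s, mf in rules) for y in grid]
    score = sum(y * m for y, m in zip(grid, aggregated)) / sum(aggregated)
    fired = {label: s for label, s, _ in rules if s > 0.0}
    return score, fired


if __name__ == "__main__":
    q1, q2 = "how do I learn python", "how can I learn python"
    e1, e2 = [0.9, 0.1, 0.3], [0.8, 0.2, 0.3]  # toy stand-ins for embeddings
    score, fired = fuzzy_similarity(
        cosine_similarity(e1, e2),
        manhattan_similarity(e1, e2),
        jaccard_similarity(q1, q2),
    )
    print(round(score, 3), fired)
```

The `fired` dictionary is the part that corresponds to the traceability argument: after each inference it lists which rules contributed and with what strength. The loop over `rules` also makes the cost linear in the number of rules, in line with the O(|R|) claim, although the centroid step adds a constant factor that depends on the assumed grid resolution.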
Mohamedou Cheikh Tourad, Abdelmounaim Abdali, Mohamed Dhleima, Naoual Mouhni, Sana Chakri, Ibtissam Amalou and Saadbouh Cheikh El Mehdy. “Leveraging MCESTA and Large Language Models for Next-Generation Similarity Learning”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.5 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170578
@article{Tourad2026,
title = {Leveraging MCESTA and Large Language Models for Next-Generation Similarity Learning},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170578},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170578},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {5},
author = {Mohamedou Cheikh Tourad and Abdelmounaim Abdali and Mohamed Dhleima and Naoual Mouhni and Sana Chakri and Ibtissam Amalou and Saadbouh Cheikh El Mehdy}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.