International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9, Issue 5, 2018.
Abstract: Tasks such as clustering and classification assume the existence of a similarity measure to assess the similarity (or dissimilarity) of a pair of observations or clusters. The key difference between most clustering methods lies in their similarity measures. This article proposes a new similarity measure function called PWO, "Probability of the Weights between Overlapped items", which can be used in clustering categorical datasets; proves that PWO is a metric; presents a framework implementation to detect the best similarity value for different datasets; and improves the F-tree clustering algorithm with a semi-supervised method to refine the results. The experimental evaluation on real categorical datasets (Mushrooms, KrVskp, Congressional Voting, Soybean-Large, Soybean-Small, Hepatitis, Zoo, Lenses, and Adult-Stretch) shows that PWO is more effective in measuring the similarity between categorical data than state-of-the-art algorithms; that clustering based on PWO with a pre-defined number of clusters yields a good separation of classes with high purity, averaging 80% coverage of the real classes; and that the overlap estimator perfectly estimates the value of the overlap threshold using a small sample of around 5% of the dataset.
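For readers unfamiliar with the two notions the abstract relies on, the following is a minimal sketch, not the authors' PWO measure, whose definition is not given on this page. It assumes the plain overlap measure (the fraction of attributes on which two categorical records agree) as a stand-in similarity, and the standard purity score for comparing clusters with real classes. The function names `overlap_similarity` and `purity` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def overlap_similarity(a, b):
    """Plain overlap measure: fraction of attributes with equal values.

    Assumption: this is the textbook overlap measure for categorical
    records, used here only as an illustration. It is NOT the PWO measure.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def purity(cluster_labels, true_labels):
    """Standard purity: each cluster votes for its majority class, and the
    score is the share of records that belong to their cluster's majority."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


if __name__ == "__main__":
    # Two toy categorical records that agree on two of three attributes.
    print(overlap_similarity(("red", "round", "small"), ("red", "flat", "small")))
    # Five records in two clusters, compared with their real classes.
    print(purity([0, 0, 0, 1, 1], ["e", "e", "p", "p", "p"]))
```

A purity of 1.0 means every cluster contains records of a single real class, so a reported average of 80% indicates that roughly four in five records fall in the majority class of their cluster.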
Mahmoud A. Mahdi, Samir E. Abdelrahman and Reem Bahgat, "A High-Performing Similarity Measure for Categorical Dataset with SF-Tree Clustering Algorithm," International Journal of Advanced Computer Science and Applications (IJACSA), 9(5), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090565
@article{Mahdi2018,
title = {A High-Performing Similarity Measure for Categorical Dataset with SF-Tree Clustering Algorithm},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.090565},
url = {http://dx.doi.org/10.14569/IJACSA.2018.090565},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {5},
author = {Mahmoud A. Mahdi and Samir E. Abdelrahman and Reem Bahgat}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.