A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets

S Anjali Devi; S Siva Kumar

doi:10.14569/IJACSA.2020.0110748



International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11, Issue 7, 2020.


Abstract: As document collections grow day by day, finding the essential document clusters for a classification problem is a major challenge because of high inter- and intra-document variation. Moreover, most conventional classification models, such as SVM, neural network, and Bayesian models, have a high true negative rate and error rate in document classification. To improve the computational efficacy of traditional document classification models, a hybrid feature extraction-based document clustering approach and a classification approach are developed for large document sets. In the proposed work, a hybrid GloVe feature selection model is proposed to improve the contextual similarity of keywords in a large document corpus. A hybrid document clustering similarity index is then optimized to find the essential key document clusters based on the contextual keywords. Finally, a hybrid document classification model is used to classify the clustered documents on a large corpus. Experiments were conducted on different datasets; the results show that the proposed document clustering-based classification model has a higher true positive rate and accuracy, and a lower error rate, than the conventional models.

Keywords: Classification; document feature extraction; document similarity
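
The abstract compares models by true positive rate, true negative rate, accuracy, and error rate. As a reading aid, the sketch below shows how these quantities are conventionally derived from the counts of a binary confusion matrix. It is not taken from the paper: the `ConfusionCounts` class, the function names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive documents predicted positive
    fp: int  # negative documents predicted positive
    tn: int  # negative documents predicted negative
    fn: int  # positive documents predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def true_positive_rate(c: ConfusionCounts) -> float:
    """Share of actual positives that were predicted positive (recall)."""
    return _ratio(c.tp, c.tp + c.fn)


def true_negative_rate(c: ConfusionCounts) -> float:
    """Share of actual negatives that were predicted negative (specificity)."""
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all documents that were classified correctly."""
    return _ratio(c.tp + c.tn, c.total)


def error_rate(c: ConfusionCounts) -> float:
    """Share of all documents that were misclassified (1 - accuracy)."""
    return _ratio(c.fp + c.fn, c.total)


# Made-up counts for illustration only; these are not results from the paper.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)

if __name__ == "__main__":
    print(f"TPR:        {true_positive_rate(example):.3f}")
    print(f"TNR:        {true_negative_rate(example):.3f}")
    print(f"Accuracy:   {accuracy(example):.3f}")
    print(f"Error rate: {error_rate(example):.3f}")
```

For multi-class document classification, these rates are typically computed per class in a one-vs-rest fashion and then averaged; which averaging scheme the paper uses is not stated in the abstract.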

S Anjali Devi and S Siva Kumar, “A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets,” International Journal of Advanced Computer Science and Applications (IJACSA), 11(7), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110748

@article{Devi2020,
title = {A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2020.0110748},
url = {http://dx.doi.org/10.14569/IJACSA.2020.0110748},
year = {2020},
publisher = {The Science and Information Organization},
volume = {11},
number = {7},
author = {S Anjali Devi and S Siva Kumar}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.
