International Journal of Advanced Computer Science and Applications (IJACSA), Volume 10 Issue 7, 2019.
Abstract: Vectorization is essential for processing textual data in natural language processing applications. It enables machines to interpret textual content by converting it into meaningful numerical representations. The proposed work aims to identify unifiable news articles for multi-document summarization. A framework is introduced for identifying news articles related to top trending topics/hashtags and for multi-document summarization of unifiable news articles on those topics, in order to capture the diversity of opinion on them. Text clustering is applied to the corpus of news articles related to each trending topic to obtain smaller unifiable groups. The effectiveness of various text vectorization methods, namely bag-of-words representations with tf-idf scores, word embeddings, and document embeddings, is investigated for clustering news articles using k-means. The paper presents a comparative analysis, in terms of purity, of the different vectorization methods on documents from the DUC 2004 benchmark dataset.
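The pipeline named in the abstract (tf-idf vectorization, k-means clustering, purity scoring) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the random centroid initialization, and the function names `tfidf_matrix`, `kmeans`, and `purity` are all assumptions made for illustration, and the word-embedding and document-embedding variants are not covered.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return one L2-normalised tf-idf row per document (dense lists).

    Assumptions: whitespace tokenisation and a smoothed idf,
    log((1 + n) / (1 + df)) + 1; the paper does not specify either.
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenised for term in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    rows = []
    for tokens in tokenised:
        row = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            row[index[term]] = count * idf
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance.

    Initial centroids are k randomly sampled points (an assumption).
    Returns a list of cluster labels, one per point.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    """Fraction of items that belong to the majority class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, truth):
        clusters.setdefault(lab, []).append(cls)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(truth)
```

Given a list of article texts `docs` and their reference topic labels `truth`, a call such as `purity(kmeans(tfidf_matrix(docs), k), truth)` yields a score in (0, 1], where 1 means every cluster contains articles from a single reference class.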
Anita Kumari Singh and Mogalla Shashi, “Vectorization of Text Documents for Identifying Unifiable News Articles,” International Journal of Advanced Computer Science and Applications (IJACSA), 10(7), 2019. http://dx.doi.org/10.14569/IJACSA.2019.0100742
@article{Singh2019,
title = {Vectorization of Text Documents for Identifying Unifiable News Articles},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2019.0100742},
url = {http://dx.doi.org/10.14569/IJACSA.2019.0100742},
year = {2019},
publisher = {The Science and Information Organization},
volume = {10},
number = {7},
author = {Anita Kumari Singh and Mogalla Shashi}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.