Future of Information and Communication Conference (FICC) 2025
28-29 April 2025
Publication Links
IJACSA
Special Issues
Future of Information and Communication Conference (FICC)
Computing Conference
Intelligent Systems Conference (IntelliSys)
Future Technologies Conference (FTC)
International Journal of Advanced Computer Science and Applications(IJACSA), Volume 10 Issue 12, 2019.
Abstract: The earliest modification of Latent Dirichlet Allocation (LDA) in terms of words or document attributes is by relaxing its exchangeability assumption via the Bag-of-word (BoW) matrix. Several authors have proposed many modifications of the original LDA by focusing on model that assumes the current topic depends on the words from previous topic. Most of the earlier work ignored the document length distribution since it is assumed that it will fizzle out at the modelling stage. Thus, in this paper, the Poisson document length distribution of LDA model is replaced with Generalized Poisson (GP) distribution which has the strength of capturing complex structures. The main strengths of GP are in capturing overdispersed (variance larger than mean) and under dispersed (variance smaller than mean) count data. The Poisson distribution used by LDA strongly relies on the assumption that the mean and variance of document lengths are equal. This assumption is often unrealistic with most real-life text data where the variance of document length may be greater than or less than their mean. Approximate estimate of the GPLDA model parameters was achieved using Newton-Raphson approximation technique of log-likelihood. Performance and comparative analysis of GPLDA with LDA using accuracy and F1 showed improved results.
Ibrahim Bakari Bala and Mohd Zainuri Saringat, “GPLDA: A Generalized Poisson Latent Dirichlet Topic Model” International Journal of Advanced Computer Science and Applications(IJACSA), 10(12), 2019. http://dx.doi.org/10.14569/IJACSA.2019.0101253
@article{Bala2019,
title = {GPLDA: A Generalized Poisson Latent Dirichlet Topic Model},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2019.0101253},
url = {http://dx.doi.org/10.14569/IJACSA.2019.0101253},
year = {2019},
publisher = {The Science and Information Organization},
volume = {10},
number = {12},
author = {Ibrahim Bakari Bala and Mohd Zainuri Saringat}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.