International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14, Issue 11, 2023.
Abstract: Generating facial images from textual descriptions is one of the most challenging tasks within the broader field of Text-to-Image (TTI) synthesis. It has applications in scientific research, cartoon and animation development, online marketing, and game development. Text-to-Face (TTF) synthesis has been studied extensively for English, but existing work for Bangla is limited and far from comprehensive. Because TTF remains largely unexplored for Bangla, this study sets out to explore its possibilities at the intersection of Bangla Natural Language Processing and Computer Vision. This paper proposes Mukh-Oboyob, a novel system that generates highly detailed facial images from Bangla textual descriptions. The system has two essential components: BanglaBERT, a transformer-based pre-trained language model that encodes Bangla sentences into vector representations, and Stable Diffusion, which generates facial images conditioned on these text embeddings. To develop and train the system, the work uses CelebA Bangla, a modified version of the CelebA dataset consisting of face images, Bangla facial attributes, and Bangla text descriptions. A comprehensive qualitative and quantitative analysis shows that the system performs well and produces detailed images, achieving an FID score of 34.6828 and an LPIPS score of 0.4541.
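For context on the first reported metric: the Fréchet Inception Distance (FID) fits a Gaussian to image features from the real set and another to features from the generated set, then measures the distance between them as ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}); lower is better. FID is conventionally computed on Inception-v3 pool features. The sketch below is a minimal NumPy-only illustration of that formula, assuming the features have already been extracted. The helper names `fid_from_features` and `_psd_sqrt` are hypothetical, and this is not the authors' evaluation code (the abstract does not state their feature extractor, sample count, or preprocessing).

```python
import numpy as np


def _psd_sqrt(matrix: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def fid_from_features(real: np.ndarray, fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays.

    Hypothetical helper: in a real FID evaluation the rows would be
    Inception-v3 pool features of real and generated images.
    """
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_f = np.cov(fake, rowvar=False)

    # Tr((cov_r cov_f)^{1/2}) equals the sum of the square roots of the eigenvalues
    # of cov_r^{1/2} cov_f cov_r^{1/2}, a symmetric PSD matrix (numerically stable).
    sqrt_r = _psd_sqrt(cov_r)
    middle = sqrt_r @ cov_f @ sqrt_r
    middle = (middle + middle.T) / 2.0  # enforce exact symmetry before eigvalsh
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_covmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    print(fid_from_features(real, real))        # close to 0: identical feature sets
    print(fid_from_features(real, real + 1.0))  # close to 8: mean shifted by 1 in each of 8 dims
```

Shifting every feature by a constant leaves the covariances unchanged, so only the mean term contributes; this makes the second call a quick sanity check of the formula.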
Aloke Kumar Saha, Noor Mairukh Khan Arnob, Nakiba Nuren Rahman, Maria Haque, Shah Murtaza Rashid Al Masud and Rashik Rahman, “Mukh-Oboyob: Stable Diffusion and BanglaBERT enhanced Bangla Text-to-Face Synthesis,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(11), 2023. http://dx.doi.org/10.14569/IJACSA.2023.01411142
@article{Saha2023,
title = {Mukh-Oboyob: Stable Diffusion and BanglaBERT enhanced Bangla Text-to-Face Synthesis},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.01411142},
url = {http://dx.doi.org/10.14569/IJACSA.2023.01411142},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {11},
author = {Aloke Kumar Saha and Noor Mairukh Khan Arnob and Nakiba Nuren Rahman and Maria Haque and Shah Murtaza Rashid Al Masud and Rashik Rahman}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.