The Science and Information (SAI) Organization
  • Home
  • About Us
  • Journals
  • Conferences
  • Contact Us

Publication Links

  • IJACSA
  • Author Guidelines
  • Publication Policies
  • Digital Archiving Policy
  • Promote your Publication
  • Metadata Harvesting (OAI2)

IJACSA

  • About the Journal
  • Call for Papers
  • Editorial Board
  • Author Guidelines
  • Submit your Paper
  • Current Issue
  • Archives
  • Indexing
  • Fees/ APC
  • Reviewers
  • Apply as a Reviewer

IJARAI

  • About the Journal
  • Archives
  • Indexing & Archiving

Special Issues

  • Home
  • Archives
  • Proposals
  • Guest Editors
  • SUSAI-EE 2025
  • ICONS-BA 2025
  • IoT-BLOCK 2025

Future of Information and Communication Conference (FICC)

  • Home
  • Call for Papers
  • Submit your Paper/Poster
  • Register
  • Venue
  • Contact

Computing Conference

  • Home
  • Call for Papers
  • Submit your Paper/Poster
  • Register
  • Venue
  • Contact

Intelligent Systems Conference (IntelliSys)

  • Home
  • Call for Papers
  • Submit your Paper/Poster
  • Register
  • Venue
  • Contact

Future Technologies Conference (FTC)

  • Home
  • Call for Papers
  • Submit your Paper/Poster
  • Register
  • Venue
  • Contact
  • Home
  • Call for Papers
  • Editorial Board
  • Guidelines
  • Submit
  • Current Issue
  • Archives
  • Indexing
  • Fees
  • Reviewers
  • Subscribe

DOI: 10.14569/IJACSA.2020.0110907
PDF

Factored Phrase-based Statistical Machine Pre-training with Extended Transformers

Author 1: Vivien L. Beyala
Author 2: Marcellin J. Nkenlifack
Author 3: Perrin Li Litet

International Journal of Advanced Computer Science and Applications(IJACSA), Volume 11 Issue 9, 2020.

  • Abstract and Keywords
  • How to Cite this Article
  • {} BibTeX Source

Abstract: This paper presents the development of a cascaded hybrid multi- lingual automatic translation system, by allowing a tight coupling between the two underlying research approach in machine translation, namely, the neuronal (deterministic approach) and statistical (probabilistic approach), while fully taking advantage of each method in order to improve translation performance. This architecture addresses two major problems frequently occurring when dealing with morphologically richer languages in MT, that is, the significant number unknown tokens generated due to the presence of out of vocabulary (OOV) words, and size of the output vocabulary. Additionally, we incorporated factors (additional word-level linguistic information) in order to alleviate data sparseness problem or potentially reduce language ambiguity, the factors we considered are lemmatization and Part-of-Speech tags (taking into consideration its various compounds). We combined a fully-factored transformer and a factored PB-SMT, where, the training data is pre-translated using the trained fully-factored transformer, and afterwards employed to build an PB-SMT system, parallelly using the pre-translated development set to tune parameters. Finally, in order to produce the desired results, we operated the FPB-SMT system to re-decode the pre-translated test set in a post-processing step. Experiments performed on translations from Japanese to English and English to Japanese reveals that our proposed cascaded hybrid framework outperforms the strong HMT state-of-the-art by over 8.61% BLEU and 7.25% BLEU, respectively, for validation set, and over 8.70% BLEU and 7.70% BLEU, respectively, for test set.

Keywords: Machine translation; transformer; statistical machine; morphologically rich; hybrid

Vivien L. Beyala, Marcellin J. Nkenlifack and Perrin Li Litet, “Factored Phrase-based Statistical Machine Pre-training with Extended Transformers” International Journal of Advanced Computer Science and Applications(IJACSA), 11(9), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110907

@article{Beyala2020,
title = {Factored Phrase-based Statistical Machine Pre-training with Extended Transformers},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2020.0110907},
url = {http://dx.doi.org/10.14569/IJACSA.2020.0110907},
year = {2020},
publisher = {The Science and Information Organization},
volume = {11},
number = {9},
author = {Vivien L. Beyala and Marcellin J. Nkenlifack and Perrin Li Litet}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.

IJACSA

Upcoming Conferences

IntelliSys 2025

28-29 August 2025

  • Amsterdam, The Netherlands

Future Technologies Conference 2025

6-7 November 2025

  • Munich, Germany

Healthcare Conference 2026

21-22 May 2026

  • Amsterdam, The Netherlands

Computing Conference 2026

9-10 July 2026

  • London, United Kingdom

IntelliSys 2026

3-4 September 2026

  • Amsterdam, The Netherlands

Computer Vision Conference 2026

15-16 October 2026

  • Berlin, Germany
The Science and Information (SAI) Organization
BACK TO TOP

Computer Science Journal

  • About the Journal
  • Call for Papers
  • Submit Paper
  • Indexing

Our Conferences

  • Computing Conference
  • Intelligent Systems Conference
  • Future Technologies Conference
  • Communication Conference

Help & Support

  • Contact Us
  • About Us
  • Terms and Conditions
  • Privacy Policy

© The Science and Information (SAI) Organization Limited. All rights reserved. Registered in England and Wales. Company Number 8933205. thesai.org