The Science and Information (SAI) Organization

DOI: 10.14569/IJACSA.2022.0130411

Extended Max-Occurrence with Normalized Non-Occurrence as MONO Term Weighting Modification to Improve Text Classification

Author 1: Cristopher C. Abalorio
Author 2: Ariel M. Sison
Author 3: Ruji P. Medina
Author 4: Gleen A. Dalaorao

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 13 Issue 4, 2022.

Abstract: The growing volume of data brought by advances in the internet and related technologies has increased the demand for automatic text classification. Supervised term weighting schemes improve classification performance by assigning appropriate term weights, which yields better feature vector representations of documents. MONO, a state-of-the-art term weighting scheme with variants TF-MONO and SRTF-MONO, improves text classification by taking non-occurrence values into account. However, the MONO strategy weights terms poorly when a term's interclass distinguishing power has non-uniform values, because it neglects the occurrences of classes with short-distance document frequency. This study proposes extended max-occurrence with normalized non-occurrence (EMONO), with variants TF-EMONO and SRTF-EMONO, in which the EMO value is determined as an interclass extension of MO to address this weighting behavior. The classification performance of the proposed schemes is compared with that of the MONO variants on the Reuters-21578 dataset using the KNN classifier. Chi-square-max feature selection was used to run experiments at different feature sizes, evaluated with micro-F1 and macro-F1. The results show that the proposed EMONO outperforms the MONO variants at all feature sizes when the EMO parameter, which sets the number of classes in the MO extension, has a value of 2. Of the proposed variants, SRTF-EMONO performed better, with micro-F1 scores of 94.85% and 95.19% at the smallest and largest feature sizes, respectively. The study also highlights the significance of interclass document frequency values, in addition to non-occurrence values, for improving text classification with term weighting schemes.
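
The abstract names chi-square-max as the feature selection step. As a rough illustration only, the sketch below scores each term by the maximum of its per-class chi-square statistics, which is the usual reading of "chi-square-max"; the paper's exact formulation is not given on this page, and the function names and toy documents are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def chi_square_max(docs, labels):
    """Score every term by its largest chi-square value over all classes."""
    classes = sorted(set(labels))
    class_sizes = Counter(labels)
    n_docs = len(docs)
    class_df = Counter()  # (term, class) -> number of documents
    term_df = Counter()   # term -> number of documents
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            class_df[(term, label)] += 1
            term_df[term] += 1
    scores = {}
    for term, total in term_df.items():
        best = 0.0
        for c in classes:
            n11 = class_df[(term, c)]
            n10 = total - n11
            n01 = class_sizes[c] - n11
            n00 = n_docs - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    return scores


def select_top_terms(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, hypothetical corpus: two classes, two documents each.
docs = [["wheat", "price"], ["wheat", "crop"], ["oil", "price"], ["oil", "barrel"]]
labels = ["grain", "grain", "crude", "crude"]
scores = chi_square_max(docs, labels)
top_terms = select_top_terms(scores, 2)
```

Varying `k` in `select_top_terms` corresponds to running the experiment at different feature sizes.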

Keywords: Extended MO; normalized NO; text classification; term weighting scheme
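
The reported results use micro-F1 and macro-F1. The minimal sketch below shows how the two averages differ, assuming a single-label setting (one class per document); the labels are made-up examples, not the paper's data. Micro-F1 pools true positives, false positives, and false negatives over all classes, while macro-F1 averages the per-class F1 scores so that rare classes count as much as frequent ones.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Hypothetical labels for illustration only.
y_true = ["earn", "earn", "earn", "acq", "acq", "grain"]
y_pred = ["earn", "earn", "acq", "acq", "acq", "earn"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case the missed `grain` document lowers macro-F1 much more than micro-F1, because `grain` contributes a per-class F1 of zero to the macro average.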

Cristopher C. Abalorio, Ariel M. Sison, Ruji P. Medina and Gleen A. Dalaorao, “Extended Max-Occurrence with Normalized Non-Occurrence as MONO Term Weighting Modification to Improve Text Classification,” International Journal of Advanced Computer Science and Applications (IJACSA), 13(4), 2022. http://dx.doi.org/10.14569/IJACSA.2022.0130411

@article{Abalorio2022,
title = {Extended Max-Occurrence with Normalized Non-Occurrence as MONO Term Weighting Modification to Improve Text Classification},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2022.0130411},
url = {http://dx.doi.org/10.14569/IJACSA.2022.0130411},
year = {2022},
publisher = {The Science and Information Organization},
volume = {13},
number = {4},
author = {Cristopher C. Abalorio and Ariel M. Sison and Ruji P. Medina and Gleen A. Dalaorao}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
