The Science and Information (SAI) Organization

Article Details

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

Hybrid Spelling Correction and Query Expansion for Relevance Document Searching

Author 1: Dewi Soyusiawaty
Author 2: Denny Hilmawan Rahmatullah Wolley

Digital Object Identifier (DOI): 10.14569/IJACSA.2021.0120838

Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 12, Issue 8, 2021.

Abstract: A digital library is a type of information retrieval (IR) system. Existing IR methods often struggle with keyword searching: some search engines cannot return results for partially matching keywords or for keywords that contain typographical errors. A system is therefore needed that returns results relevant to the keywords the user provides. We propose a model that addresses this problem by combining spelling correction and query expansion. The process starts with indexing document titles: the titles of all incoming documents are preprocessed, and every term in the collection is weighted using Term Frequency-Inverse Document Frequency (TF-IDF). During a search, the Levenshtein Distance algorithm corrects keywords that appear to contain typos. Before the relevance between the keywords and the documents is calculated using Cosine Similarity, the keywords are expanded using Query Expansion to increase the number of documents retrieved. The Cosine Similarity scores are then added to the Query Expansion weights to produce the final ranking. The results show an improvement over an IR system without spell checking and query expansion. The model was implemented as a web-based application and tested 50 times on a data set of 2,045 items. The system was able to correct typo-indicated keywords and retrieve documents with an average recall of 95.91%, an average precision of 63.82%, and an average Non-Interpolated Average Precision (NIAP) of 86.29%.
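
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the lowercase-and-split preprocessing, the `max_distance` threshold, the smoothed IDF formula, and the hypothetical `synonyms` dictionary used for expansion are all assumptions. In particular, expansion terms are simply merged into the query vector here, whereas the paper adds a separate Query Expansion weight to the Cosine Similarity score.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct(term: str, vocabulary: set, max_distance: int = 2) -> str:
    """Replace a term with the nearest indexed term if it is close enough."""
    if term in vocabulary or not vocabulary:
        return term
    best = min(vocabulary, key=lambda v: (levenshtein(term, v), v))
    return best if levenshtein(term, best) <= max_distance else term


def tfidf_vectors(docs: list):
    """Build one sparse TF-IDF vector per tokenized title."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Assumed IDF variant: log(N / df) + 1, so common terms keep some weight.
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    return vectors, idf


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: str, titles: list, synonyms: dict) -> list:
    """Correct the query, expand it, and rank titles by cosine similarity."""
    docs = [title.lower().split() for title in titles]
    vectors, idf = tfidf_vectors(docs)
    vocabulary = set(idf)
    terms = [correct(t, vocabulary) for t in query.lower().split()]
    # Expansion: add related terms that actually occur in the index.
    expanded = terms + [s for t in terms for s in synonyms.get(t, [])
                        if s in vocabulary]
    qvec = {t: tf * idf[t] for t, tf in Counter(expanded).items() if t in idf}
    scored = [(cosine(qvec, vec), title) for vec, title in zip(vectors, titles)]
    scored.sort(key=lambda pair: -pair[0])
    return [(title, score) for score, title in scored if score > 0]


if __name__ == "__main__":
    sample_titles = [
        "Information Retrieval Systems",
        "Digital Library Design",
        "Retrieval of Lost Data",
    ]
    # The misspelled keyword "informaton" is corrected before ranking.
    for title, score in search("informaton retrieval", sample_titles, {}):
        print(f"{score:.3f}  {title}")
```

Correcting against the indexed vocabulary rather than a general dictionary keeps the sketch self-contained; a production system would likely need a faster candidate lookup than a linear scan over every term.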

Keywords: Cosine similarity; information retrieval; Levenshtein distance; TF-IDF; typographical error; query expansion
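
The evaluation measures named in the abstract (recall, precision, and NIAP) can be illustrated with a small worked example. The sketch below assumes the common definition of non-interpolated average precision, in which the precision at the rank of each retrieved relevant document is summed and divided by the total number of relevant documents; the function names and the sample document identifiers are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved: list, relevant: set) -> tuple:
    """Precision and recall for an unranked result set."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def niap(ranked: list, relevant: set) -> float:
    """Non-interpolated average precision over a ranked result list."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    ranked = ["d1", "d4", "d2", "d5"]   # system output, best match first
    relevant = {"d1", "d2", "d3"}       # ground truth for the query
    p, r = precision_recall(ranked, relevant)
    print(f"precision={p:.4f} recall={r:.4f} niap={niap(ranked, relevant):.4f}")
```

In this example two of the four retrieved documents are relevant and two of the three relevant documents are retrieved, so precision is 2/4 and recall is 2/3. The relevant documents appear at ranks 1 and 3, giving NIAP = (1/1 + 2/3) / 3 = 5/9.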

Dewi Soyusiawaty and Denny Hilmawan Rahmatullah Wolley, “Hybrid Spelling Correction and Query Expansion for Relevance Document Searching,” International Journal of Advanced Computer Science and Applications (IJACSA), 12(8), 2021. http://dx.doi.org/10.14569/IJACSA.2021.0120838

@article{Soyusiawaty2021,
title = {Hybrid Spelling Correction and Query Expansion for Relevance Document Searching},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2021.0120838},
url = {http://dx.doi.org/10.14569/IJACSA.2021.0120838},
year = {2021},
publisher = {The Science and Information Organization},
volume = {12},
number = {8},
author = {Dewi Soyusiawaty and Denny Hilmawan Rahmatullah Wolley}
}


© The Science and Information (SAI) Organization Limited. Registered in England and Wales. Company Number 8933205. All rights reserved. thesai.org