A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Taysir Hassan A. Soliman; Ahmed Sharaf Eldin; Marwa M. Ghareeb; Mohammed E. Marie

doi:10.14569/IJACSA.2014.051214

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 5, Issue 12, 2014.

Abstract: Protein fold recognition plays an important role in computational protein analysis because it can help determine the function of proteins whose structures are unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, a Mix & Test algorithm based on grammatical inference is developed. The Mix & Test algorithm minimizes I/O cost by requiring only one database scan, discovers subsequence combinations directly from sequences held in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed up its execution. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. Performance was evaluated on 36 SCOP protein folds, giving an accuracy rate of 75.84% on the training data and 59.7% on the testing data.

Keywords: Data mining; grammatical inference; sequential mining; protein fold recognition
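
The abstract does not describe how Mix & Test works internally, so the sketch below only illustrates the general two-phase idea it outlines: mining gapped subsequences from unaligned, variable-length sequences held in memory in a single pass, then assigning an unknown sequence to a fold. It is not the authors' Mix & Test algorithm or their testing function. The function names, the pattern length, the `MAX_GAP` limit, the `MIN_SUPPORT` threshold and the overlap-based fold score are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical parameters, chosen only for illustration.
PATTERN_LEN = 3    # residues per pattern
MAX_GAP = 1        # max residues skipped between consecutive pattern elements
MIN_SUPPORT = 0.5  # fraction of a fold's sequences that must contain a pattern


def gapped_subsequences(seq: str, length: int = PATTERN_LEN,
                        max_gap: int = MAX_GAP) -> set[str]:
    """Distinct subsequences of `seq` with `length` residues, where at most
    `max_gap` residues are skipped between consecutive chosen positions."""
    found: set[str] = set()

    def extend(pos: int, pattern: str) -> None:
        if len(pattern) == length:
            found.add(pattern)
            return
        # The next chosen position skips between 0 and max_gap residues.
        for nxt in range(pos + 1, min(pos + 2 + max_gap, len(seq))):
            extend(nxt, pattern + seq[nxt])

    for start in range(len(seq)):
        extend(start, seq[start])
    return found


def mine_patterns(sequences: list[str], min_support: float = MIN_SUPPORT) -> set[str]:
    """One pass over unaligned, in-memory sequences of any length: count how
    many sequences contain each gapped pattern and keep the frequent ones."""
    counts: Counter[str] = Counter()
    for seq in sequences:
        # A set per sequence, so each sequence adds at most 1 to a pattern's support.
        counts.update(gapped_subsequences(seq))
    threshold = min_support * len(sequences)
    return {p for p, c in counts.items() if c >= threshold}


def train(folds: dict[str, list[str]]) -> dict[str, set[str]]:
    """Training phase (sketch): one frequent-pattern set per known fold."""
    return {fold: mine_patterns(seqs) for fold, seqs in folds.items()}


def predict(model: dict[str, set[str]], seq: str) -> str:
    """Hypothetical testing function: choose the fold whose frequent patterns
    are best covered by the unknown sequence (fraction of patterns present)."""
    present = gapped_subsequences(seq)

    def score(fold: str) -> float:
        patterns = model[fold]
        return len(patterns & present) / len(patterns) if patterns else 0.0

    return max(model, key=score)


# Toy usage with made-up sequences (not SCOP data).
model = train({"fold_a": ["ACDEFG", "ACDGFH"], "fold_b": ["KLMNPQ", "KLMQPR"]})
print(predict(model, "ACDWY"))  # fold_a
```

Because each sequence contributes a set of patterns, support counts sequences rather than occurrences, and the `max_gap` bound keeps the number of candidate patterns per start position at most (MAX_GAP + 1)^(PATTERN_LEN - 1).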

Taysir Hassan A. Soliman, Ahmed Sharaf Eldin, Marwa M. Ghareeb and Mohammed E. Marie, “A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition,” International Journal of Advanced Computer Science and Applications (IJACSA), 5(12), 2014. http://dx.doi.org/10.14569/IJACSA.2014.051214

@article{Soliman2014,
title = {A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2014.051214},
url = {http://dx.doi.org/10.14569/IJACSA.2014.051214},
year = {2014},
publisher = {The Science and Information Organization},
volume = {5},
number = {12},
author = {Taysir Hassan A. Soliman and Ahmed Sharaf Eldin and Marwa M. Ghareeb and Mohammed E. Marie}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.