Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.
Digital Object Identifier (DOI) : 10.14569/IJACSA.2017.081038
Article Published in International Journal of Advanced Computer Science and Applications(IJACSA), Volume 8 Issue 10, 2017.
Abstract: Text corpus is important for assessment of language features and variation analysis. Machine learning techniques identify the language terms, features, text structures and sentiment from linguistic corpus. Sindhi language is one of the oldest languages of the world having proper script and complete grammar. Sindhi is remained less resourced language computationally even in this digital era. Viewing this problem of Sindhi language, Sindhi NLP toolkit is developed to solve the Sindhi NLP and computational linguistics problems. Therefore, this research work may be an addition to NLP. This research study has developed an own Sindhi sentimentally structured and analyzed corpus on the basis of accumulated results of Sindhi sentiment analysis tool. Corpus is normalized and analyzed for language features and variation analysis using DTM and TF-IDF techniques. DTM and TF-IDF analysis is performed using n-gram model. The supervised machine learning model is formulated using SVMs and K-NN techniques to perform analysis on Sindhi sentiment analysis corpus dataset. Precision, recall and f-score show better performance of machine learning technique than other techniques. Cross validation techniques is used with 10 folds to validate and evaluate data set randomly for supervised machine learning analysis. Research study opens doors for linguists, data analysts and decision makers to work more for sentiment summarization and visual tracking.
Mazhar Ali and Asim Imdad Wagan, “Sentiment Summerization and Analysis of Sindhi Text” International Journal of Advanced Computer Science and Applications(IJACSA), 8(10), 2017. http://dx.doi.org/10.14569/IJACSA.2017.081038