Special Issue on Artificial Intelligence - thesai.org

Copyright Statement: This is an open access publication licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.

Paper 1: Parts of Speech Tagging for Afaan Oromo

Abstract: The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language. After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopted a Hidden Markov Model (HMM) approach and implemented unigram and bigram models with the Viterbi algorithm. The unigram model is used to understand word ambiguity in the language, while the bigram model is used to analyze words in context. A manually annotated sample corpus of 159 sentences (1621 words in total) is used for training and testing. The corpus was collected from different public Afaan Oromo newspapers and bulletins to keep the sample balanced. A database of lexical probabilities and transition probabilities is built from the annotated corpus; the tagger learns from these two sets of probabilities to tag the sequence of words in a sentence. The performance of the prototype Afaan Oromo tagger is tested using tenfold cross-validation. The unigram and bigram models achieve 87.58% and 91.97% accuracy, respectively.

Author 1: Getachew Mamo Wegari

Author 2: Million Meshesha

Keywords: Natural Language Processing; parts of speech tagging; Hidden Markov Model; N-Gram; Afaan Oromo.
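
To illustrate the bigram HMM tagging described in the Paper 1 abstract, the following is a minimal Viterbi sketch. It is not the authors' implementation: the tags (`NN`, `VB`), the placeholder tokens (`w1`, `w2`, `w3`), all probability values, and the smoothing floor are hypothetical and chosen only to make the example runnable. The lexical probabilities correspond to `emit_p` and the transition probabilities to `trans_p`.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most probable tag sequence for a non-empty word list under a bigram HMM."""
    def logp(p):
        # Unseen events get a small floor probability (an assumption, not from the paper).
        return math.log(p if p > 0 else floor)

    # Each column maps tag -> (best log-probability so far, previous tag on that path).
    columns = [{t: (logp(start_p.get(t, 0)) + logp(emit_p[t].get(words[0], 0)), None)
                for t in tags}]
    for word in words[1:]:
        previous = columns[-1]
        column = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: previous[p][0] + logp(trans_p[p].get(t, 0)))
            score = (previous[best_prev][0]
                     + logp(trans_p[best_prev].get(t, 0))
                     + logp(emit_p[t].get(word, 0)))
            column[t] = (score, best_prev)
        columns.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model: two tags, three placeholder tokens.
TAGS = ["NN", "VB"]
START_P = {"NN": 0.8, "VB": 0.2}
TRANS_P = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.6, "VB": 0.4}}
EMIT_P = {"NN": {"w1": 0.6, "w2": 0.1, "w3": 0.3},
          "VB": {"w1": 0.1, "w2": 0.8, "w3": 0.1}}

print(viterbi(["w1", "w2", "w3"], TAGS, START_P, TRANS_P, EMIT_P))
```

A unigram tagger, by contrast, would pick for each word the tag with the highest lexical probability alone, ignoring `trans_p`.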

Paper 2: Speaker Identification using Row Mean of Haar and Kekre’s Transform on Spectrograms of Different Frame Sizes

Abstract: In this paper, we propose Speaker Identification using two transforms, namely Haar Transform and Kekre’s Transform. The speech signal spoken by a particular speaker is converted into a spectrogram by using 25% and 50% overlap between consecutive sample vectors. The two transforms are applied on the spectrogram. The row mean of the transformed matrix forms the feature vector, which is used in the training as well as matching phases. The results of both the transform techniques have been compared. Haar transform gives fairly good results with a maximum accuracy of 69% for both 25% as well as 50% overlap. Kekre’s Transform shows much better performance, with a maximum accuracy of 85.7% for 25% overlap and 88.5% accuracy for 50% overlap.

Author 1: H B Kekre

Author 2: Vaishali Kulkarni

Keywords: Speaker Identification; Spectrogram; Haar Transform; Kekre’s Transform; Row Mean; Euclidean distance
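
The row-mean feature described in the Paper 2 abstract can be sketched as follows for the Haar case. This is an illustration under stated assumptions, not the authors' code: it assumes a square spectrogram whose size is a power of two, uses the unnormalised Haar matrix, applies the transform in the two-dimensional form H·S·Hᵀ, and matches by Euclidean distance (taken from the keywords). The function names and the enrolled speaker data are hypothetical.

```python
import math

def haar_matrix(n):
    """Unnormalised n x n Haar matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        size = len(h)
        top = [[v for v in row for _ in (0, 1)] for row in h]            # H kron [1, 1]
        bottom = [[0] * (2 * i) + [1, -1] + [0] * (2 * (size - i - 1))   # I kron [1, -1]
                  for i in range(size)]
        h = top + bottom
    return h

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_mean_features(spectrogram):
    """Transform the spectrogram (H * S * H^T) and return the mean of each row."""
    h = haar_matrix(len(spectrogram))
    h_t = [list(col) for col in zip(*h)]
    transformed = matmul(matmul(h, spectrogram), h_t)
    return [sum(row) / len(row) for row in transformed]

def closest_speaker(features, enrolled):
    """Return the enrolled speaker whose feature vector is nearest in Euclidean distance."""
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))

# Hypothetical 2 x 2 "spectrogram" and enrolled feature vectors.
features = row_mean_features([[1, 1], [1, 1]])
print(features, closest_speaker(features, {"s1": [2.1, 0.1], "s2": [5.0, 1.0]}))
```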

Paper 3: Forecasting the Tehran Stock Market by Artificial Neural Network

Abstract: One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data so as to give individuals or institutions useful information about market behavior for investment decisions. The enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem domain using different methodologies, and the potentially significant benefits of solving these problems have motivated extensive research for years. In this paper, a computational data mining methodology was used to predict seven major stock market indexes. Two learning algorithms, Linear Regression and a standard feed-forward backpropagation (FFB) Neural Network, were tested and compared. The models were trained on four years of historical data, from March 2007 to February 2011, to predict the major stock price indexes in Iran (Tehran Stock Exchange). The performance of these prediction models was evaluated using two widely used statistical metrics. The FFB neural network resulted in better prediction accuracy. In addition, conventional wisdom holds that a longer training period with more training data helps to build a more accurate prediction model. However, as the stock market in Iran has fluctuated strongly in the past two years, this paper shows that data collected from a closer and shorter period can help reduce the prediction error in such a highly speculative, fast-changing environment.

Author 1: Reza Aghababaeyan

Author 2: Tamanna Siddiqui

Author 3: Najeeb Ahmad Khan

Keywords: Data mining; Stock Exchange; Artificial Neural Network; Matlab.

Paper 4: A Comparison Study between Data Mining Tools over some Classification Methods

Abstract: Nowadays, a huge amount of data and information is available to everyone. Data can be stored in many different kinds of databases and information repositories, besides being available on the Internet or in printed form. With such an amount of data, there is a need for powerful techniques that interpret data beyond the limits of human comprehension and support better decision making. To identify the best tools for the classification task that supports decision making, this paper conducts a comparative study of a number of freely available data mining and knowledge discovery tools and software packages. The results show that the performance of the tools on the classification task is affected by the kind of dataset used and by the way the classification algorithms are implemented within the toolkits. In terms of applicability, the WEKA toolkit achieved the highest applicability, followed by Orange, Tanagra, and KNIME, respectively. Finally, the WEKA toolkit achieved the highest improvement in classification performance when moving from the percentage split test mode to the cross-validation test mode, followed by Orange, KNIME and finally Tanagra.

Author 1: Abdullah H Wahbeh

Author 2: Qasem A. Al-Radaideh

Author 3: Mohammed N. Al-Kabi

Author 4: Emad M. Al-Shawakfa

Keywords: data mining tools; data classification; WEKA; Orange; Tanagra; KNIME.

Paper 5: SOM Based Visualization Technique For Detection Of Cancerous Masses In Mammogram

Abstract: Breast cancer is the most common form of cancer in women. An intelligent computer-aided diagnosis system can help radiologists detect and diagnose microcalcification patterns earlier and faster than typical screening programs. In this paper, we present a system that combines Gabor filter based enhancement with texture based segmentation and feature extraction, and a SOM (Self-Organizing Map), a form of Artificial Neural Network (ANN), to analyze the extracted texture features. The SOM determines which texture features can separate benign, malignant and normal cases. Watershed segmentation is used to separate the cancerous region from the non-cancerous region. We investigated a number of feature extraction techniques and found a combination of ten features, including Correlation, Cluster Prominence, Energy, Entropy, Homogeneity, Difference Variance, Difference Entropy, Information Measure, and Normalized, to work best. These features give the distribution of tonality information and were found to be the best combination for distinguishing a benign microcalcification pattern from malignant and normal ones. The system was developed on a Windows platform. It is an easy-to-use intelligent system that lets the user diagnose, detect, enlarge, zoom, and measure distances of areas in digital mammograms. Further, using a linear filtering technique, the texture features are convolved as a mask with the segmented image. The tumor is detected using this method, and watershed segmentation yields a fair segmentation. The accuracy and positive predictive value of each algorithm were used as the evaluation indicators for the unsupervised neural network combined with the texture based approach. The data comprise 121 records of breast cancer patients from the MIAS database. The texture based unsupervised learning achieved an accuracy of 0.9534 (sensitivity 0.98716 and specificity 0.9582), determined through ROC analysis. The results show that the Gabor based unsupervised learning described in the present study produces accurate results in the classification of breast cancer data, and that the classification rule identified is more acceptable and comprehensible.

Author 1: S. Pitchuman Angayarkanni, M.C.A., M.Phil.

Author 2: V. Saravanan

Keywords: Image Enhancement; Gabor Filter; Texture Features; SOM; ROC.

Paper 6: Improvement of Secret Image Invisibility in Circulation Image with Dyadic Wavelet Based Data Hiding with Run-Length Coded Secret Images of Which Location of Codes are Determined with Random Number

Abstract: This paper attempts to improve the invisibility of secret images in circulation images using dyadic wavelet based data hiding with run-length coded secret images, where the locations of the codes are determined by random numbers. Experiments confirm that the secret images are almost invisible in the circulation images. The robustness of the proposed data hiding method against data compression of the circulation images is also discussed. Data hiding performance, in terms of the invisibility of the secret images embedded in the circulation images, is evaluated with the Root Mean Square difference between the original secret image and the one extracted from the circulation images. The conventional Multi-Resolution Analysis (MRA) based data hiding is also attempted with a variety of parameters (the level of MRA and the location of the frequency component that is replaced with the secret image) and compared to the proposed method. The proposed data hiding method is found to be superior to the conventional method, which is also not robust against circulation image processing.

Author 1: Kohei Arai

Author 2: Yuji Yamada

Keywords: Dyadic wavelet; Lifting wavelet; Data hiding; Data compression.

Paper 7: Unsupervised Method of Object Retrieval Using Similar Region Merging and Flood Fill

Abstract: In this work, we present a novel interactive framework for object retrieval using unsupervised similar region merging and a flood fill method, which models the spatial and appearance relations among image pixels. Efficient and effective image segmentation is usually very hard for natural and complex images. This paper presents a new technique for similar region merging and object retrieval. The users only need to roughly indicate the object, after which the desired object boundary is obtained during the merging of similar regions. A novel similarity based region merging mechanism is proposed to guide the merging process with the help of the mean shift technique. A region R is merged with an adjacent region Q if Q has the highest similarity with R among all of Q’s adjacent regions. The proposed method automatically merges the regions that are initially segmented through the mean shift technique, and then effectively extracts the object contour by merging all similar regions. Extensive experiments on 22 object classes (524 images in total) show promising results.

Author 1: Kanak Saxena

Author 2: Sanjeev Jain

Author 3: Uday Pratap Singh

Keywords: Image segmentation; similar regions; region merging; mean shift; flood fill.
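
The merging rule stated in the Paper 7 abstract (merge R with an adjacent region Q when R is the most similar of all of Q's neighbours) can be sketched as below. The abstract does not say how similarity is measured, so the similarity function here, the negative absolute difference of mean grey levels, is purely an assumption, as are the region names, the adjacency graph and the grey-level values.

```python
def best_neighbour(region, adjacency, similarity):
    """Return the neighbour of `region` that is most similar to it."""
    return max(adjacency[region], key=lambda other: similarity(region, other))

def should_merge(r, q, adjacency, similarity):
    """True when Q is adjacent to R and R is Q's most similar neighbour."""
    return q in adjacency[r] and best_neighbour(q, adjacency, similarity) == r

# Hypothetical regions described only by their mean grey level.
MEAN_GREY = {"A": 10, "B": 12, "C": 200, "D": 190}
ADJACENCY = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

def grey_similarity(a, b):
    # Assumed measure: closer mean grey levels give higher similarity.
    return -abs(MEAN_GREY[a] - MEAN_GREY[b])

print(should_merge("A", "B", ADJACENCY, grey_similarity))  # A is B's closest neighbour
print(should_merge("A", "C", ADJACENCY, grey_similarity))  # C is closer to D than to A
```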

Paper 8: Tifinagh Character Recognition Using Geodesic Distances, Decision Trees & Neural Networks

Abstract: Tifinagh characters cannot be recognized reliably with conventional methods based on invariance, because some characters differ from each other only by size or rotation. New methods are therefore needed to remedy this shortcoming. In this paper we propose a direct method based on the calculation of so-called Geodesic Descriptors, which have shown significant reliability with respect to changes of scale, the presence of noise and geometric distortions. For classification, we have opted for a method based on the hybridization of decision trees and neural networks.

Author 1: O Bencharef

Author 2: M. Fakir

Author 3: B. Minaoui

Author 4: B. Bouikhalene

Keywords: Tifinagh character recognition; Neural networks; Decision trees; Riemannian geometry; Geodesic distances.

Paper 9: Motion Blobs as a Feature for Detection on Smoke

Abstract: Smoke is described as a disturbance of visual perception through the atmosphere. The major problem is quantifying the detected smoke, which is made up of small particles of carbonaceous matter in the air resulting mainly from the burning of organic material. The present work focuses on detecting smoke, irrespective of whether it is accidental, arson or deliberately created, and on raising an alarm through an electrical device that senses the presence of visible or invisible particles; in simple terms, a smoke detector that signals a fire alarm system or sounds a local audible alarm from the detector itself.

Author 1: Khalid Nazim S. A

Author 2: M.B. Sanjay Pande

Keywords: Motion blobs; Blob Extraction; Feature Extraction.

Paper 10: Extraction of Line Features from Multifidus Muscle of CT Scanned Images with Morphologic Filter Together with Wavelet Multi Resolution Analysis

Abstract: A method is proposed for extracting line features from the multifidus muscle in Computer Tomography (CT) scanned images using a morphologic filter together with wavelet based Multi Resolution Analysis (MRA). The contour of the multifidus muscle can be extracted from a hip CT image. The area of the multifidus muscle is then estimated and used as an index of belly fat, because there is a high correlation between belly fat and the multifidus muscle. For calculating the area of the multifidus muscle from the CT image, MRA with Daubechies base functions and a level of three is found to be appropriate. The wavelet transformation is applied to the original hip CT image three times, the LLL component (3D low frequency components) is filled with “0”, and the inverse wavelet transformation is then applied for reconstruction. The proposed method is validated with four patients.

Author 1: Kohei Arai

Author 2: Yuichiro Eguchi

Author 3: Yoichiro Kitajima

Keywords: multifidus muscle; Computer Tomography; wavelet; Multi Resolution Analysis; morphological filter.

Paper 11: Robust Face Detection Using Circular Multi Block Local Binary Pattern and Integral Haar Features

Abstract: In real-world applications, it is very challenging to implement a detector that combines high speed with high accuracy; for a face detector there is always a trade-off between the two. In the current work we have implemented a robust face detector that uses a new concept called integral Haar histograms with CMBLBP or CSMBLBP (circular multi block local binary operator). Our detector runs in real-world applications, and its performance is far better than that of any present detector. It works with good speed and sufficient accuracy under challenges such as varying face size, illumination, angle, facial expression, rotation and scaling, which are the main issues of concern in face detection. We use Matlab and the Image Processing Toolbox to implement the technique.

Author 1: P K Suri

Author 2: Amit Verma

Keywords: CMBLBP; MBLBP; LBP; Gentle Boosting; Face Detection.

Paper 12: A new vehicle detection method

Abstract: This paper presents a new method for detecting vehicles in images acquired by cameras embedded in a moving vehicle. Given a sequence of images, the proposed algorithms should detect all cars in real time. With respect to driving direction, cars fall into two types: those driving in the same direction as the intelligent vehicle (IV) and those driving in the opposite direction. Because these two types have distinct features, we propose to carry out the method in two main steps. The first step detects all obstacles in the images using the so-called association approach combined with a corner detector. The second step validates each vehicle using an AdaBoost classifier. The new method has been applied to different image data, and the experimental results validate its efficacy.

Author 1: Zebbara Khalid

Author 2: Abdenbi Mazoul

Author 3: Mohamed El Ansari

Keywords: intelligent vehicle; vehicle detection; Association; Optical Flow; AdaBoost; Haar filter.

Paper 13: Multimodal Biometric Person Authentication using Speech, Signature and Handwriting Features

Abstract: The objective of this work is to develop a multimodal biometric system using speech, signature and handwriting information. Unimodal biometric person authentication systems are initially developed for each of these biometric features. Methods are then explored for integrating them to obtain a multimodal system. Apart from implementing state-of-the-art systems, the major part of the work consists of new explorations at each level with the objective of improving performance and robustness. Recent research indicates that multimodal person authentication systems are more effective and more challenging. This work demonstrates that the fusion of multiple biometrics helps to minimize the system error rates. As a result, the identification performance is 100%, and for verification the False Acceptance Rate (FAR) is 0% and the False Rejection Rate (FRR) is 0%.

Author 1: Eshwarappa M N

Author 2: Mrityunjaya V. Latte

Keywords: Biometrics; Speaker recognition; Signature recognition; Handwriting recognition; Multimodal system.

Paper 14: A Fuzzy Decision Tree to Estimate Development Effort for Web Applications

Abstract: Web effort estimation is the process of predicting the effort and cost, in terms of money, schedule and staff, of a software project. Many estimation models have been proposed over the last three decades, and estimation is considered essential for budgeting, risk analysis, project planning and control, and project improvement investment analysis. In this paper, we investigate the use of a Fuzzy ID3 decision tree for software cost estimation. It is designed by integrating the principles of the ID3 decision tree with fuzzy set-theoretic concepts, enabling the model to handle uncertain and imprecise data when describing software projects, which can greatly improve the accuracy of the estimates obtained. MMRE and Pred are used as measures of prediction accuracy in this study. A series of experiments is reported using the Tukutuku software projects dataset. The results are compared with those produced by three crisp versions of decision trees: ID3, C4.5 and CART.

Author 1: Ali Idri

Author 2: Sanaa Elyassami

Keywords: Fuzzy Logic; Effort Estimation; Decision Tree; Fuzzy ID3; Software project.
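
The two accuracy measures named in the Paper 14 abstract have standard definitions: MMRE is the mean of the magnitudes of relative error, |actual - estimated| / actual, and Pred(l) is the fraction of projects whose relative error is at most l. A minimal sketch follows; the effort values are hypothetical and do not come from the Tukutuku dataset, and the default level of 0.25 is a common convention rather than something stated in the abstract.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE is less than or equal to `level`."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for err in errors if err <= level) / len(errors)

# Hypothetical actual and estimated efforts for three projects.
ACTUAL = [100, 200, 400]
ESTIMATED = [110, 150, 400]
print(mmre(ACTUAL, ESTIMATED), pred(ACTUAL, ESTIMATED, 0.25))
```

A lower MMRE and a higher Pred indicate more accurate estimates.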

Paper 15: An Extended Performance Comparison of Colour to Grey and Back using the Haar, Walsh, and Kekre Wavelet Transforms

Abstract: The storage of colour information in a greyscale image is not a new idea. Various techniques have been proposed using different colour spaces including the standard RGB colour space, the YUV colour space, and the YCbCr colour space. This paper extends the results described in [1] and [2]. While [1] describes the storage of colour information in a greyscale image using Haar wavelets, and [2] adds a comparison with Kekre’s wavelets, this paper adds a third transform – the Walsh transform and presents a detailed comparison of the performance of all three transforms across the LUV, YCbCr, YCgCb, YIQ, and YUV colour spaces. The main aim remains the same as that in [1] and [2], which is the storage of colour information in a greyscale image known as the “matted” greyscale image.

Author 1: H B Kekre

Author 2: Sudeep D. Thepade

Author 3: Adib Parkar

Keywords: Colouring; Colour to Grey; Matted Greyscale; Grey to Colour; LUV Colour Space; YCbCr Colour Space; YCgCb Colour Space; YIQ Colour Space; YUV Colour Space; Haar Wavelets; Kekre’s Wavelets; Walsh Transform.

Paper 16: A Prototype Student Advising Expert System Supported with an Object-Oriented Database

Abstract: Using intelligent computer systems technology to support the academic advising process offers many advantages over the traditional student advising. The objective of this research is to develop a prototype student advising expert system that assists the students of Information Systems (IS) major in selecting their courses for each semester towards the academic degree. The system can also be used by academic advisors in their academic planning for students. The expert system is capable of advising students using prescriptive advising model and developmental advising model. The system is supported with an object-oriented database and provides a friendly graphical user interface. Academic advising cases tested using the system showed high matching (93%) between the automated advising provided by the expert system and the advising performed by human advisors. This proves that the developed prototype expert system is successful and promising.

Author 1: M Ayman Ahmar

Keywords: academic advising; expert system; object-oriented database.

Paper 17: Face Recognition Using Bacteria Foraging Optimization-Based Selected Features

Abstract: Feature selection (FS) is a global optimization problem in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data, and results in acceptable recognition accuracy. This paper presents a novel feature selection algorithm based on Bacteria Foraging Optimization (BFO). The algorithm is applied to coefficients extracted by discrete cosine transforms (DCT). Evolution is driven by a fitness function defined in terms of maximizing the class separation (scatter index). Performance is evaluated using the ORL face database.

Author 1: Rasleen Jakhar

Author 2: Navdeep Kaur

Author 3: Ramandeep Singh

Keywords: Face Recognition; Bacteria Foraging Optimization; DCT; Feature Selection.

Paper 18: Instant Human Face Attributes Recognition System

Abstract: The objective of this work is to provide a simple yet efficient tool for recognizing human attributes such as gender, age and ethnicity from facial images in real time, bearing in mind that “Real-Time frame rate is a vital factor for practical deployment of computer vision system”. This paper presents progress towards a face detection and human attribute classification system. We have developed an algorithm for classifying gender, age and race from frontal facial images. As the basis of the classifier, the proposed algorithm uses a training set of neuron receptors that process visual information. A study of several variants of these classifiers shows the principal possibility of sex determination, assessment of a person's age on a scale (adult - children) and recognition of race using the neuron-like receptors.

Author 1: N Bellustin

Author 2: Y. Kalafati

Author 3: Kovalchuck

Author 4: A. Telnykh

Author 5: O. Shemagina

Author 6: V. Yakhno

Author 7: Abhishek Vaish

Author 8: Pinki Sharma

Keywords: Gender recognition; Age recognition; Ethnicity recognition; MCT; AdaBoost; attributes classifier.

Paper 19: Mining Volunteered Geographic Information datasets with heterogeneous spatial reference

Abstract: When information created online by users has a spatial reference, it is known as Volunteered Geographic Information (VGI). The increased availability of spatiotemporal data collected from satellite imagery and other remote sensors provides opportunities for enhanced analysis of spatiotemporal patterns. This area can be defined as efficiently discovering interesting patterns in large data sets. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow (approximately) the same routes over regular time intervals. However, existing methods cannot be applied directly to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining VGI datasets with our already established bottom-up algorithm for spatiotemporal data.

Author 1: Sadiq Hussain

Author 2: G.C. Hazarika

Keywords: data mining; periodic patterns; spatiotemporal data; Volunteered Geographic Information.

Paper 20: Method for Extracting Product Information from TV Commercial

Abstract: Television (TV) commercials contain important product information that is displayed for only a few seconds. People who need that information do not have sufficient time to note it down, or even to read it. This research focuses on automatically detecting text and extracting important information from a TV commercial, both to provide information in real time and for video indexing. We propose a method for extracting product information from TV commercials using a knowledge based system with a rule based pattern matching method. Implementation and experiments on 50 commercial screenshot images achieved high accuracy in text extraction and information recognition.

Author 1: Kohei Arai

Author 2: Herman Tolle

Keywords: text detection; information extraction; rule based classifying; pattern matching.

Paper 21: Efficient Cancer Classification using Fast Adaptive Neuro-Fuzzy Inference System (FANFIS) based on Statistical Techniques

Abstract: The number of cancer cases detected is increasing throughout the world. This calls for new techniques that can detect the occurrence of cancer, which will help improve diagnosis and reduce the number of cancer patients. This paper aims at finding the smallest set of genes that can ensure highly accurate classification of cancer from microarray data using supervised machine learning algorithms. The significance of finding the minimum subset is threefold: a) the computational burden and noise arising from irrelevant genes are much reduced; b) the cost of cancer testing is reduced significantly, as it simplifies the gene expression tests to include only a very small number of genes rather than thousands of genes; c) it calls for more investigation into the probable biological relationship between this small number of genes and cancer development and treatment. The proposed method involves two steps. In the first step, some important genes are chosen with the help of an Analysis of Variance (ANOVA) ranking scheme. In the second step, the classification capability is tested for all simple combinations of those important genes using a better classifier. The proposed method uses a Fast Adaptive Neuro-Fuzzy Inference System (FANFIS) as the classification model, which uses a modified Levenberg-Marquardt algorithm in the learning phase. The experimental results suggest that the proposed method achieves better accuracy and takes less time for classification than conventional techniques.

Author 1: K Anandakumar

Author 2: M. Punithavalli

Keywords: Gene Expressions; Cancer Classification; Neural Networks; Neuro-Fuzzy Inference System; Analysis of Variance; Modified Levenberg-Marquardt Algorithm.

Paper 22: Clustering Student Data to Characterize Performance Patterns

Abstract: Over the years, the academic records of thousands of students have accumulated in educational institutions, and most of these data are available in digital format. Mining these huge volumes of data may yield deeper insight and can throw some light on planning pedagogical approaches and strategies in the future. We propose to formulate this problem as a data mining task and use k-means clustering and fuzzy c-means clustering algorithms to uncover hidden patterns.

Author 1: Bindiya M Varghese

Author 2: Jose Tomy J

Author 3: Unnikrishnan A

Author 4: Poulose Jacob K

Keywords: Data mining; k-means Clustering; Fuzzy C-means; Student performance analysis.
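
As an illustration of the k-means step mentioned in the Paper 22 abstract, the sketch below runs the standard Lloyd iteration on a few two-dimensional points. It is not the authors' code: the student marks are hypothetical, and initialising the centroids with the first k points is an assumption made to keep the example deterministic. Fuzzy c-means, the second algorithm named in the abstract, is not shown.

```python
def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic initialisation
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical (theory mark, practical mark) pairs for five students.
MARKS = [(35, 40), (90, 85), (38, 45), (88, 92), (42, 39)]
labels, centroids = kmeans(MARKS, k=2)
print(labels, centroids)
```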

Paper 23: Comparative Analysis of Various Approaches Used in Frequent Pattern Mining

Abstract: Frequent pattern mining has become an important data mining task and has been a focused theme in data mining research. Frequent patterns are patterns that appear in a data set frequently. Frequent pattern mining searches for recurring relationships in a given data set. Various techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents a review of different frequent pattern mining techniques, including Apriori based algorithms, partition based algorithms, DFS and hybrid algorithms, pattern based algorithms, SQL based algorithms and incremental Apriori based algorithms. A brief description of each technique is provided. Finally, the different frequent pattern mining techniques are compared on various parameters of importance. Experimental results show that the FP-Tree based approach achieves better performance.

Author 1: Deepak Garg

Author 2: Hemant Sharma

Keywords: Data mining; Frequent patterns; Frequent pattern mining; association rules; support; confidence; Dynamic item set counting
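
Of the families of techniques listed in the Paper 23 abstract, the Apriori approach is the simplest to sketch: itemsets of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned. The following is a minimal, unoptimised illustration, not code from the paper; the transactions and the minimum support count are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        known = set(current)
        # Join step: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in known for s in combinations(c, size - 1))
                   and support(c) >= min_support]
    return frequent

# Hypothetical market-basket transactions.
BASKETS = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in apriori(BASKETS, min_support=2).items():
    print(sorted(itemset), count)
```

FP-Tree based methods, which the abstract reports as performing better, avoid this repeated candidate generation by compressing the transactions into a prefix tree.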