Feature based Algorithmic Analysis on American Sign Language Dataset

Physical disability is a fact of human life that cannot be ignored. A person who cannot hear from birth is called deaf. Deaf persons use a special language, called sign language, to represent and share their knowledge. American Sign Language (ASL) is one of the most popular sign languages used in the learning process of deaf persons. The American Sign Language data set contains a set of digital images of hands in different shapes, or hand gestures. In this paper, we present a feature-based algorithmic analysis to prepare a significant model for recognizing the hand gestures of American Sign Language. Such a model lets a machine learn the gestures efficiently. For effective machine learning, we generate a list of useful features from the digital images of hand gestures. We use MATLAB 2018a for feature extraction, and Weka 3-9-3 and Rapid Miner 9.1.0 for training and testing. Both tools are used to build effective data models. Rapid Miner performs best, with 99.9% accuracy in auto model.
Keywords: Hand gesture recognition; pre-processing; Weka; Rapid Miner; HOG; LBP; auto model


I. INTRODUCTION
Sign language is a great aid and convenience in human life [1]. It is used especially by deaf persons, and by other people to add weight to a conversation. Visual representation by the hands delivers a meaningful message to others [2]. Sign language takes three forms: facial expressions, hand gestures and body postures [1], [2]. In daily life, we mostly use body postures and facial expressions to deliver meaningful information to others. The goal of communication is achieved when the sender's message, together with its emotions, is fully interpreted by the receiver. Hand gestures and facial expressions play an important role in the learning process of deaf persons. Sign language recognition depends heavily on hand gesture recognition, and hand gestures play a vital role in understanding sign language [1]. Gestures can be captured from a live camera as moving hand gestures or as still images [3]. In our research, we consider only still images of hand gestures.
Persons who cannot hear from birth are called deaf persons. Because they cannot hear any voice, teaching them verbally is not an effective way of communication. They need a special language for learning, called a sign language, which they use to understand the messages conveyed by others. American Sign Language has 24 different hand postures, and each posture represents a unique ASL letter [21]. Fig. 1 shows the American Sign Language alphabet. The field of sign language is vast, and hand gestures have always been difficult to learn. A machine cannot recognize gestures unless it is properly trained.
The American Sign Language data set shown above is taken from the well-known website Kaggle [21]. The data set does not contain the letters "J" and "Z", because these two signs involve motion and cannot be represented by a single still image.
According to a rough estimate [3], between 500,000 and 2,000,000 deaf people use sign language to communicate with one another. This figure may differ from other published research, but most would agree that sign language ranks third among the most wanted and most used languages in the world [3].
We can build a model to recognize hand gestures using different techniques. In the past, developers used a finger technique, in which the user wears a finger mouse to capture the fingers [4]; skin colour detection with a suitable algorithm [3]; a glove technique based on neural nets [5]; and feature extraction using the Scale-Invariant Feature Transform (SIFT) algorithm [6]. All of these techniques are difficult to implement. In [3], for example, a skin detection algorithm is used to detect the skin colour.
Skin detection needs a specially created environment with sufficient lighting, and several constraints must be satisfied. First, the background colour must differ from the skin colour. Second, the algorithms fail to perform well under varying backgrounds and coloured clothes. With too little light, or with a different background or clothing colour, the skin detection algorithm does not detect skin properly. Fig. 2 shows the details. Like skin detection algorithms, the neural net and SIFT techniques are also difficult to implement, and a neural net algorithm takes a long time to process digital images.
In this paper, we present a simple but efficient technique for ASL recognition. We provide a comprehensive analysis of different feature extraction techniques and different algorithms. We use the tools Weka and Rapid Miner and achieve 99% accuracy on test data. The methods used, the experimental results and the assumptions are described in Sections II, III, IV and V.

II. LITERATURE REVIEW
The focus of our research is sign language recognition using hand gestures. It is important to understand gestures so that the true semantics of a communication can be grasped. According to a rough estimate [3], between 500,000 and 2,000,000 deaf people use sign language to communicate with one another. This figure may differ from other researchers' findings, but most would agree that sign language ranks third among the most wanted and most used languages in the world [3]. There are different sign languages, such as American Sign Language (ASL), Indian Sign Language (ISL), Arabic Sign Language (ArSL), Tamil Sign Language (TSL), Korean Sign Language (KSL), Japanese Sign Language (JSL) and many more [1]. In our research, we focus on American Sign Language (ASL).
Gesture recognition was first used in 1993 [3]. Later, the Dynamic Time Warping (DTW) technique was used to recognize dynamic gestures [4]. Hidden Markov Models (HMM) were also used to recognize the shapes of sign language [5], [6]. HMM was used efficiently, and the accuracy of sign language recognition reached 94%. It was later found that accuracy dropped to 47.6% when the system was used by a person other than those whose images were used for training. When both persons' images were used for training, the accuracy increased [7]. A major limitation of HMM was its context dependency. HMM was also used with 3D data to classify 53 ASL and attained an accuracy of 89.91% [8].
Image acquisition and pre-processing are the backbone of gesture recognition. In the past, images were acquired using the Leap Motion Controller (LMC), Kinect and vision-based approaches [19]. The LMC can acquire signals at 200 frames per second [18] and has been widely used for hand gesture recognition tasks [20].
In the past, researchers used many methods for recognizing hand gestures: a finger technique, in which the user wears a finger mouse to capture the fingers [9]; skin colour detection with a suitable algorithm [10], [11]; a glove technique based on neural nets [12], [13]; and feature extraction using the SIFT algorithm [14], [15]. In [15], the Viola-Jones method was used for detecting skin, and skin colour was used to detect the hand. After hand detection, features were extracted using SIFT, and a Support Vector Machine (SVM) was used for classification [15].
Skin detection techniques are sensitive: the user must be in a specially prepared environment with light of a specific intensity. There are various other constraints: the background colour must differ from the skin colour, the light should be constant, and the background and clothes should be simple. Various algorithms fail at skin detection if these conditions are not fulfilled [9]. Fig. 2 shows the details. Like skin detection algorithms, the neural net and SIFT techniques are also inefficient in both accuracy and time. A neural net algorithm takes relatively more time than other techniques to process digital images [4], so neural nets are not suitable for real-time skin detection [16]. The K-Nearest Neighbour algorithm was used with PCA and achieved 96% accuracy [17].
analysis of different tools and techniques with respect to time and accuracy. Sign language recognition can be divided into four major steps [16].

A. Image Acquisition and Pre-Processing
The data set for the proposed work is taken from the well-known source Kaggle. We also looked at other data sources, but could not find enough hand gesture data as digital images or in Comma Separated Value (CSV) format. At Kaggle, we found two data sets for hand gestures: one is a set of colour images and the other is a CSV file, as shown in Fig. 5. The colour image set has 9 folders, and each folder has 241 colour images with an Excel file. The Excel file contains each image name and the bounding coordinates x1, y1, x2 and y2 of the hand in the image. After pre-processing in MATLAB 2018a (cropping the images to the x1, y1, x2 and y2 points given in the Excel file), we found that the images were not good, as they still contained some unnecessary content. Some images were also not cropped properly: with the coordinates given in the Excel file, some hand gestures were cut off and no longer expressed the accurate meaning of the sign. Fig. 3 shows the cut area of the hand gestures.
We then examined the data set in CSV form, in the files named "sign_mnist_train" and "sign_mnist_test". The training data set contains 27,455 digital image records and the test data set contains 7,172 digital image records. Each file holds the pixel values of grey-scale digital images in 785 columns: the first 784 columns hold the pixel values of an image of dimension 28x28, and the last column holds the class of the image. Fig. 4 shows these CSV files. The first and most important task is to convert each record from its pixel values back into its original graphical form. For this purpose, we use a set of instructions in MATLAB to convert the pixel values of each record into an image. After analysing the "sign_mnist_train" file, we have the following labels or classes for the image data set. In the CSV file (Fig. 5), each record has a label in numeric format, which means that each number represents a sign language letter.
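The numeric labels can be mapped back to letters with a small lookup. The sketch below is illustrative only and is written in Python rather than MATLAB. It assumes that the labels run from 0 to 25 in alphabetical order (0 for "A", 25 for "Z") and that the values for "J" and "Z" never occur; the function names are hypothetical.

```python
import string

# Assumption: label 0 is "A" and label 25 is "Z", in alphabetical order.
# "J" and "Z" are not part of the data set, so their labels are rejected.
EXCLUDED_LETTERS = {"J", "Z"}


def label_to_letter(label):
    """Translate a numeric class label into its ASL letter."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    letter = string.ascii_uppercase[label]
    if letter in EXCLUDED_LETTERS:
        raise ValueError(f"label {label} ({letter}) is not in the data set")
    return letter


def class_letters():
    """Return the letters that can occur as classes."""
    return [c for c in string.ascii_uppercase if c not in EXCLUDED_LETTERS]
```

Under these assumptions, `class_letters()` yields the 24 classes mentioned in the Introduction.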
The following algorithm takes each row of the CSV or Excel file, from the first record to the end of the file, and reshapes the row vector of 1x784 values into a 28x28 matrix. The 28x28 matrix is stored in an array, converted into an image file and saved to the given location.

1) Read the CSV file and convert it into Excel file format.
2) Reshape each 1x784 row into a 28x28 matrix by reading each record in the Excel file.
3) Generate a digital image from each 28x28 matrix.
4) Store the image file at the given location.
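The steps above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used in this work. It assumes that the pixels of a record are stored row by row, that the file has a header line, and that the label sits in the last column as described above (the `label_last` flag covers files that put it first). The function names and the choice of plain-text PGM as the image format are assumptions made for the example.

```python
import csv
import io

SIDE = 28  # each image is SIDE x SIDE pixels


def row_to_image(row, label_last=True):
    """Split one CSV record into (label, 28x28 nested list of pixel values)."""
    values = [int(v) for v in row]
    if len(values) != SIDE * SIDE + 1:
        raise ValueError(f"expected {SIDE * SIDE + 1} columns, got {len(values)}")
    if label_last:
        pixels, label = values[:-1], values[-1]
    else:
        label, pixels = values[0], values[1:]
    # Assumption: pixels are stored row by row (row-major order).
    image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, image


def to_pgm(image):
    """Serialise a grey-scale image as plain-text PGM (format P2)."""
    header = f"P2\n{SIDE} {SIDE}\n255\n"
    body = "\n".join(" ".join(str(p) for p in row) for row in image)
    return header + body + "\n"


def convert(csv_text, has_header=True, label_last=True):
    """Convert every record of the CSV text into a (label, image) pair."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the column names
    return [row_to_image(row, label_last) for row in reader if row]
```

Writing the string returned by `to_pgm` to a file gives an image that common viewers can open. Note that MATLAB's `reshape` fills a matrix column by column, so an equivalent MATLAB version would need a transpose to obtain the same orientation.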
After executing this algorithm, we obtained 27,455 training images and 7,172 test images for the hand gesture data set. Fig. 6 shows the data set generated as digital images after pre-processing.

B. Feature Extraction
We use simple feature extraction techniques in this paper to keep the method as simple as possible. MATLAB 2018a is used for feature extraction. HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern) are the important feature extraction techniques we use in MATLAB (Table I). For larger images, we set a larger cell size. For a fixed image size, increasing the cell size reduces the number of HOG features, because fewer cells fit in the image. In our feature extraction, we use cell sizes [2,2], [4,4] and finally [8,8].
3) Statistical Feature Measurements: In addition to the two techniques above, HOG and LBP, we use some statistical measures for better feature extraction, as shown in Table II: mean, standard deviation, variance and skewness.
The algorithm reads the directory that contains the training images, takes each file in turn, and extracts the HOG, LBP and statistical feature measurements. These features are stored in a CSV file at the specified location. The following algorithm is used for feature extraction in MATLAB 2018a.
1) Read the stored images one by one from the specified directory.
2) Generate the HOG, LBP and other statistical feature measurements.
3) Set labels against each feature.
4) Store the features in a CSV file in the specified directory.
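The statistical part of this procedure can be illustrated with a short sketch. It is written in Python rather than MATLAB and covers only the four statistical measures, not HOG or LBP. It assumes population (divide by N) formulas for variance, standard deviation and skewness; MATLAB's `var` and `std` divide by N-1 by default, so the values may differ slightly from those used in this work. The function names and the CSV column order are hypothetical.

```python
import csv
import io
import math


def statistical_features(image):
    """Return (mean, std, variance, skewness) of all pixel values."""
    pixels = [float(p) for row in image for p in row]
    n = len(pixels)
    if n == 0:
        raise ValueError("image has no pixels")
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)
    if std == 0.0:
        skewness = 0.0  # flat image: skewness is undefined, reported as 0 here
    else:
        third_moment = sum((p - mean) ** 3 for p in pixels) / n
        skewness = third_moment / std ** 3
    return mean, std, variance, skewness


def feature_rows_to_csv(labelled_images):
    """Build CSV text with one row per image: four features, then the label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["mean", "std", "variance", "skewness", "label"])
    for image, label in labelled_images:
        writer.writerow([*statistical_features(image), label])
    return buffer.getvalue()
```

In a full pipeline, the HOG and LBP values would be appended to the same row before the label, so that each image yields one feature vector.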

IV. EXPERIMENTS AND RESULTS
Weka 3-9-3 and Rapid Miner 9.1.0 are used for the experiments, to train and test the data models, as shown in Fig. 8.

A. Weka
Weka has a large collection of algorithms for creating effective machine learning models. It provides important facilities for data preparation and classification, as well as regression algorithms and clustering algorithms for unsupervised learning. Decision trees and the random forest algorithm are also included in Weka for supervised learning. We use the Naive Bayes, Lazy IBk and Random Forest algorithms.
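Weka's IBk is a k-nearest-neighbour classifier. The idea can be illustrated with the following minimal Python sketch, which is not Weka's implementation. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the default `k=1` are choices made for the example, and the parameter settings used in our experiments are not restated here.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=1):
    """Predict the label of `query` from its k nearest training vectors."""
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort training rows by Euclidean distance to the query vector.
    ranked = sorted(
        (math.dist(vector, query), index)
        for index, vector in enumerate(train_features)
    )
    # Majority vote among the k closest rows; on a tie, the label that
    # appears first among the nearest neighbours wins.
    votes = Counter(train_labels[index] for _, index in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with training vectors `[[0, 0], [0, 1], [5, 5], [6, 5]]` labelled `["A", "A", "B", "B"]`, the query `[0.2, 0.1]` is assigned to class "A".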

B. Rapid Miner
Rapid Miner is a recent software tool for machine learning, data mining, deep learning and text mining. Rapid Miner was introduced in 2006; it has a good GUI and provides many options for building a machine learning model [22]. The algorithms KNN, Neural Net, Generalized Linear Model, Deep Learning, Naïve Bayes, Random Forest and Decision Tree are used in Rapid Miner 9.1.0 to build an effective machine learning model.

C. Experiment 1
Table III shows the details of Experiment 1. Different cell size settings give different results, as shown in Fig. 9. Lazy.IBk and Random Forest give accuracies of 96.75% and 94.95%, respectively, with cell size [8,8] in Weka.

D. Experiment 2
In our second experiment, we use some statistical measurements to extract effective features. Table IV shows the details of Experiment 2. Lazy IBk gives its highest accuracy, 96.87%, on the combination of HOG, LBP, arithmetic mean and variance. Random Forest gives its highest accuracy on HOG, LBP and arithmetic mean.

E. Experiment 3
In the third experiment, we use only HOG features with LBP and increase the HOG cell size from [2,2] or [4,4] to [8,8], with NumBins 12. Lazy.IBk achieves a much better accuracy, up to 97.37%, whereas Random Forest does not improve much. At NumBins 15, Lazy.IBk achieves a better accuracy of up to 97.70%, whereas Random Forest achieves 95.70%. Table V shows the details of the experiment.
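To show how the cell size and NumBins settings change the length of the HOG feature vector, the following Python sketch applies the length formula documented for MATLAB's `extractHOGFeatures`. It assumes the default block size [2 2] and block overlap [1 1]; the settings actually used for these parameters are not stated in this paper, so the numbers are illustrative. The function name `hog_feature_length` is hypothetical.

```python
import math


def hog_feature_length(image_size, cell_size, num_bins=9,
                       block_size=(2, 2), block_overlap=(1, 1)):
    """Length of a HOG vector: blocks per image x cells per block x bins."""
    length = num_bins
    for size, cell, block, overlap in zip(
            image_size, cell_size, block_size, block_overlap):
        # Number of block positions along this image dimension.
        blocks = math.floor((size / cell - block) / (block - overlap) + 1)
        if blocks < 1:
            raise ValueError("cell size too large for this image")
        length *= blocks * block
    return length


# 28x28 images with cell size [8 8]: the length grows with NumBins.
for bins in (12, 15, 25):
    print(bins, hog_feature_length((28, 28), (8, 8), num_bins=bins))
```

Under these assumptions, a 28x28 image with cell size [8 8] gives 192, 240 and 400 features for NumBins 12, 15 and 25, while cell size [2 2] with 9 bins gives 6084 features, so smaller cells produce much longer vectors.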

F. Experiment 4
In the final experiment in Weka, we use NumBins 25 with cell size [8,8], and Lazy.IBk achieves 98.24% accuracy. Random Forest does not improve much on its previous accuracy of 95.70%. Table VI shows the details of Experiment 4, and the graphs represent the accuracy on the data set, derived from the confusion matrix generated by Lazy.IBk (see Fig. 9 below).
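Accuracy figures of this kind are derived from a confusion matrix. The following Python sketch shows one way to compute the overall accuracy and the per-class accuracy (recall) from lists of actual and predicted labels. It is illustrative only and does not reproduce Weka's output; the function names are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count predictions as matrix[actual_label][predicted_label]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = {}
    for a, p in zip(actual, predicted):
        row = matrix.setdefault(a, {})
        row[p] = row.get(p, 0) + 1
    return matrix


def accuracy(matrix):
    """Share of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(row.get(label, 0) for label, row in matrix.items())
    return correct / total


def per_class_recall(matrix):
    """For each actual class, the share of its samples predicted correctly."""
    return {
        label: row.get(label, 0) / sum(row.values())
        for label, row in matrix.items()
    }
```

For instance, with actual labels A, A, B, B, C and predictions A, B, B, B, C, four of the five samples are correct, and class A has a recall of 0.5.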

G. Experiment 5
After many experiments in Weka, we use a different tool, Rapid Miner 9.1.0. Rapid Miner is a sophisticated tool for data mining. The algorithms K-NN, Neural Net, Generalized Linear Model, Deep Learning, Naive Bayes, Random Forest and Decision Tree are used to train a model and test it on our data set. Table VII shows the test results. Unfortunately, we could not achieve better results than with Weka. The highest result, 98.03%, is obtained with K-NN, as shown in the graph.

H. Experiment 6
After many experiments in Rapid Miner, we decided to use the auto model facility in Rapid Miner to build a model from the training data. The resulting models achieve impressive results. Using HOG features with cell size [8 8] and NumBins 25, the "Generalized Linear Model" in auto model reaches 100% accuracy. "Deep Learning" achieves 99.9%, whereas "Naïve Bayes" achieves 89.5%. Fig. 12 shows the details of the results achieved on the given test data set.

V. CONCLUSION AND RESULT ANALYSIS
In the above experiments, a number of well-known algorithms are tested on the training and test data provided by Kaggle. The results clearly show that Rapid Miner with auto model gives 100% accuracy, whereas a model built with Lazy.IBk in Weka 3-9-3 gives 98.24% accuracy. Naive Bayes and Decision Tree did not achieve good results in Weka, and we did not include their results in this paper. In Rapid Miner, we use the following algorithms: KNN (K-Nearest Neighbour), Neural Net, Generalized Linear Model, Deep Learning, Naïve Bayes, Random Forest and Decision Tree. Auto model is also used in Rapid Miner with the following algorithms: "Naïve Bayes", "Generalized Linear Model" and "Deep Learning". In Rapid Miner 9.1.0 without auto model, the highest accuracy, 98.03%, is achieved by the K-NN (K-Nearest Neighbour) algorithm.
In Rapid Miner with auto model, "Generalized Linear Model" produces 100% accuracy, "Deep Learning" produces 99.9% and "Naive Bayes" achieves 89.5%.
Rapid Miner performs extraordinarily well on the test data: it achieves 100% accuracy, compared with 98.24% for the Weka tool.