Analysis and Selection of Features for Gesture Recognition Based on a Micro Wearable Device

More and more researchers are concerned with designing health supporting systems for the elderly that are lightweight, unobtrusive, and of low computational complexity. In this paper, we introduce a micro wearable device based on a tri-axis accelerometer, which can detect acceleration changes of the human body at the position where the device is placed. Considering the flexibility of the human finger, we place the device on a finger to detect finger gestures. Twelve kinds of one-stroke finger gestures are defined according to the sensing characteristics of the accelerometer. Features are a paramount factor in recognition tasks because they directly determine recognition accuracy; in this paper, gesture features are described in both the time domain and the frequency domain. The feature generation method and the selection process are analyzed in detail to obtain the optimal feature subset from the candidate feature set. Experimental results indicate that, considering both recognition accuracy and the dimension of the feature set, the selected subset of 12 features achieves a satisfactory classification accuracy of 90.08%.

Keywords—Internet of Things; Wearable Computing; Gesture Recognition; Feature analysis and selection; Accelerometer.


I. INTRODUCTION
The Internet of Things (IoT) has become a hot topic in computer science. It envisions that all objects in the environment, such as people, home appliances, buildings, and service equipment, can be sensed, identified, and even controlled via the Internet. IoT will promote the development of many application systems, such as health supporting systems for the elderly. Many countries are facing the serious social issue of population ageing. One common trend is that more and more elderly people live alone and are less able to benefit from the care and support that might be available in a large household. According to an investigation by the World Health Organization, the proportion of people in Japan living in three-generation households fell from 46% in 1985 to 20.5% in 2006. Health care, both physical and mental, has become an important problem in today's society.
Health supporting systems have been studied widely in recent years [1] [2]. The two main kinds of support focus on speech-based communication and activity-based recognition. The former provides a direct and effective way to learn a user's intention and has been used in hospitals and households [3] [4]. However, voice signals are sensitive to environmental sound, such as a TV that is on, so it is sometimes hard to pick out the useful speech signal made by the user. Moreover, in certain circumstances the user may be too weak to speak and call for help. Activity recognition provides an active and unobtrusive way to care for the elderly. For example, if an elderly person falls down, a fall recognition system can automatically send a message asking for help.
Among human activities, finger gestures are the most flexible. In daily life, most work is performed by hand. Gesture recognition is therefore significant for learning user behavior, controlling devices, and understanding user intention. In this paper, we design a wearable device with an accelerometer to detect finger gestures. Based on the characteristics of the accelerometer, a variety of finger gestures are defined in 3D space.
In particular, gesture features are studied in both the time domain and the frequency domain, given the paramount importance of feature generation in recognition tasks. Each kind of candidate feature and their combinations are analyzed with a stepwise regression algorithm to form a feature vector that gives accurate gesture classification while keeping computational complexity under control.
The paper is organized as follows. Section Ⅱ introduces related work on activity recognition and feature analysis. Section Ⅲ outlines the prototype gesture detection system and presents the gesture definitions and data collection. Section Ⅳ describes the feature analysis method in detail, including the feature generation process and the feature selection algorithm based on stepwise regression. Section Ⅴ presents the experiments and evaluation of feature generation and selection. The conclusion is given in the last section.

II. RELATED WORKS
Two types of systems are mainly used for activity recognition: fixed device-based recognition systems and wearable device-based detection systems. Fixed device-based techniques, using devices such as cameras and computer vision systems, have been applied widely in various fields [5] [6]. This kind of system places no burden on users. However, some people do not like being monitored, and some private spaces, such as bathrooms, are unsuitable for cameras. Moreover, several issues have to be considered, including whether environmental factors such as ambient light and the blind corners of a camera are suitable for monitoring, image processing speed and delay, information loss when a 3-D object is projected onto a 2-D image, and confusion between multiple users against the same background [5].
www.ijacsa.thesai.org
With the development of micro-electronic technology, micro wearable devices are penetrating into our daily life. They can be attached to the human body to obtain user information directly, typically using RFID tags and sensors. RFID technology is attractive for many applications since it can detect a user's situation from simple ID and location information [7], but it has difficulty detecting motion. Wearable sensors have shown their capability for activity recognition, and some studies place sensors at different positions on the body to detect daily human activities. In [8], five accelerometers placed on the hip, wrist, arm, ankle, and thigh are used to classify 20 daily activities such as walking, sitting and relaxing, brushing teeth, and bicycling, with recognition accuracy ranging from 41% to 97% for different activities. In [9], a large realistic dataset collected from many different sensors (accelerometers, physiological sensors, environmental sensors, etc.) is used to recognize 7 activities, such as lying, rowing, and walking, with an accuracy of over 80%. However, to our knowledge, few studies focus on the recognition of tiny activities such as finger gestures.
The finger is one of the most flexible parts of the body, and most work in daily life is accomplished with it. Detecting finger gestures therefore not only helps to determine the user's current behavior, but also reflects the user's intention and can be used to carry out operations. The data glove, an interactive device worn on the hand to sense gestures, has been applied in virtual reality environments [12]. However, such a gestural interface requires the user to wear a cumbersome device connected to external equipment, which is inflexible and inconvenient for daily operation.
Moreover, most of the above studies do not explain the process of feature generation and selection, although features are of paramount importance in any recognition task. Depending on the sensing device, some studies obtain features directly from the time-varying signal [10] [11] or through frequency analysis [8] [9], while others prefer wavelet analysis to obtain both spectral and temporal information [13] [14]. However, they do not explain why the chosen features are necessary, whether they can be substituted by others, or what happens if one of them is added or removed.
In this paper, we use a wearable device from our previous research named Magic Ring, which detects 12 kinds of predefined one-stroke finger gestures with a 3-axis accelerometer [15]. In that work, a variety of gesture features were extracted for classification evaluation, but the process of feature selection was not addressed. In this paper, we focus on the feature analysis method, including feature generation, feature selection, and feature evaluation, taking the wearable sensor and its receiver as a prototype of the final target device.

A. Prototype Structure
The system is a ring-shaped sensing device based on a tri-axis accelerometer, the MMA7361L from Freescale Semiconductor, Inc.
Of its two selectable sensitivity ranges, ±1.5g and ±6g, the ±1.5g range is adopted to detect all predefined gestures. Apart from the sensing unit, a data processing unit performs A/D conversion and simple digital signal processing, and a transmitting unit handles acceleration data transmission and communication. The system can be worn on a finger without much disturbance to daily activities, as shown in Fig. 1.

B. Gestures Definition
In this paper, the purpose of gesture recognition is to learn the user's simple intentions and further to apply the gestures in daily life, for example to control home appliances or call for help. Therefore, the gestures should be easy to perform, to reduce physical load, and easy to understand and learn, to reduce mental load. Combining the way appliances are controlled with the characteristics of the 3-axis accelerometer, 12 kinds of dynamic one-stroke gestures are designed. A one-stroke gesture is a dynamic gesture performed with no more than one degree of freedom in one direction. For example, "pushing" a button, "turning" a knob, and "pointing to" a picture can be regarded as one-stroke gestures.
The tri-axis accelerometer detects acceleration changes along three directions in space, as shown in Fig. 2.

C. Data Collection
The device is attached to the middle phalanx of the forefinger, since the forefinger is the most flexible of all fingers. Gesture data are sampled at 50 Hz, and the digital signal is stored on a PC for data analysis, feature extraction, and gesture classification. Twenty university students (16 males and 4 females, average age 25.8±7.8) volunteered for the data collection experiment under the supervision of a researcher. They were asked to perform the 12 predefined gestures with the forefinger in a natural and relaxed way. Each gesture started from a horizontal, static state and ended in a static state. Each gesture was repeated 5 times per person, giving 100 repetitions per gesture over the 20 people.
IV. FEATURES ANALYSIS
A finger gesture can be expressed quantitatively as a digital signal based on the sensing information. The features of this signal indicate the type of gesture and are useful for recognizing it. A signal can be characterized by various features, so feature analysis is vital for identifying it. Roughly speaking, the more features are used, the higher the accuracy that may be achieved, but the higher the complexity of recognition. However, not every feature can distinguish a gesture from others. For example, Fig. 4 shows the acceleration change in the X axis for two different kinds of finger gestures: signal energy or mean can distinguish them, but the peak value fails to tell one from the other, since the two signals have the same maximum value.
Moreover, the number of features does not have a direct relationship with identification performance. Although each feature and each combination contributes differently to recognition accuracy, more features do not necessarily give better accuracy. Furthermore, a high feature dimensionality increases the computing cost of some recognition algorithms. Therefore, the features of the gestures to be recognized have to be analyzed and selected to obtain an optimal feature vector (or set) that balances an acceptable recognition rate against computational complexity.
The feature analysis process is as follows: first, extract the signals of the target gestures; second, generate candidate features from these signals; and finally, select proper features from the candidates to form the feature set for the recognition task. Taking an accelerometer as an example, this paper presents the feature analysis and selection procedure for finger gestures.

A. Extraction of Gesture signal
Extracting the gesture signal means obtaining the section related to the target gesture from a continuous signal, namely detecting the start and end points of the gesture. Since the gesture signal changes from a static state to a dynamic activity and then back to a static state, the short time energy (STE) of the signal is used to distinguish between these states.
When the STE in a sliding window rises above a threshold, we consider that a gesture has started, and it continues until the STE falls below the threshold. We recorded the duration of each gesture for each subject; the results show that durations range roughly from 200 to 800 ms. Since the sampling rate is 50 Hz, a window of 10 samples (200 ms) with 50% overlap between consecutive windows was chosen as a compromise.
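The segmentation rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the STE definition used here (the per-window variance summed over the three axes, which stays near zero in any static posture regardless of how gravity projects onto the axes) and the threshold value are assumptions, since the paper does not specify the level. `short_time_energy` and `segment_gesture` are hypothetical helper names.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) accelerometer reading


def short_time_energy(samples: Sequence[Sample], win: int = 10, hop: int = 5) -> List[float]:
    """STE of each sliding window (window k starts at sample k * hop).

    Assumption: each axis is mean-removed inside the window before squaring,
    so a static finger gives ~0 energy whatever its orientation.
    """
    energies = []
    for start in range(0, len(samples) - win + 1, hop):
        window = samples[start:start + win]
        energy = 0.0
        for axis in range(3):
            mean = sum(s[axis] for s in window) / win
            energy += sum((s[axis] - mean) ** 2 for s in window) / win
        energies.append(energy)
    return energies


def segment_gesture(samples: Sequence[Sample], threshold: float,
                    win: int = 10, hop: int = 5) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of the first gesture, or None."""
    ste = short_time_energy(samples, win, hop)
    # The gesture starts at the first window whose STE exceeds the threshold ...
    first = next((k for k, e in enumerate(ste) if e > threshold), None)
    if first is None:
        return None
    # ... and lasts until the STE drops back to the threshold or below.
    stop = next((k for k in range(first + 1, len(ste)) if ste[k] <= threshold), len(ste))
    last = stop - 1  # index of the last active window
    return first * hop, last * hop + win
```

With `win=10` and `hop=5` at 50 Hz, each window spans 200 ms and consecutive windows overlap by 50%, matching the setting described above. Because the energy is measured around each window's own mean, a gesture that ends in a different posture from the one it started in (for example, a rotated finger) still returns to near-zero STE once the finger is still.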

B. Feature Generation
Basically, features can be divided into two types: time-domain features and frequency-domain features. Time-domain features show how the signal characteristics vary with time; typical features are shown in Fig. 5.
Frequency-domain features capture the periodic nature of a sensing signal and are used to distinguish repetitive activities such as walking and running. A typical frequency-domain feature is shown in Fig. 6.
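As an illustration of a frequency-domain feature (one common choice, not necessarily the feature shown in Fig. 6), the sketch below computes the dominant frequency of one axis with a plain discrete Fourier transform. Removing the mean first discards the static gravity component. The function name `dominant_frequency` is hypothetical, and the 50 Hz default simply mirrors the sampling rate used in this paper.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(signal: Sequence[float], fs: float = 50.0) -> float:
    """Frequency (Hz) of the strongest non-DC DFT bin of a 1-D signal."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC (gravity) component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):  # bins up to the Nyquist frequency
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n
```

For example, a 2 Hz sine sampled for one second at 50 Hz yields 2.0. The bin spacing is fs/n, so for a gesture lasting 200 to 800 ms (10 to 40 samples) it is only 5 to 1.25 Hz.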
Apart from the features mentioned above, others, such as Euclidean distance and similarity, can be extracted depending on the classification objects. All of these features can be generated by corresponding mathematical or statistical methods. However, not all features are necessary for a classification system. One direct reason is whether a feature brings good classification significance: even if two features each have good classification capability, there may be little gain in combining them into a feature vector when they are highly correlated [16]. Another reason is computational complexity: the number of features directly determines the dimensionality of the classifier parameters, so a feature vector that is as small as possible is desirable in both training and classification.
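The correlation argument can be checked directly: computing the Pearson correlation between two candidate features across training samples flags pairs that carry largely the same information. This is a minimal sketch; the 0.95 cut-off and the function names `pearson` and `redundant_pairs` are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def redundant_pairs(features: Dict[str, Sequence[float]],
                    limit: float = 0.95) -> List[Tuple[str, str]]:
    """Feature pairs whose absolute correlation across samples reaches `limit`."""
    return [(p, q) for p, q in combinations(sorted(features), 2)
            if abs(pearson(features[p], features[q])) >= limit]
```

For example, with made-up values sdX = [0.1, 0.4, 0.2, 0.8], energyX equal to twice those values, and meanY = [0.5, -0.1, 0.3, 0.0], only the pair ('energyX', 'sdX') is flagged, so keeping both would add dimensionality without adding information.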

C. Feature Selection
Feature selection is crucial: it lets us use as few features as possible while capturing as much classification information as possible, and thus obtain the best recognition performance. Here, we try to find an optimal feature vector that balances an acceptable recognition rate against computational complexity. In practical applications, a satisfactory feature vector is usually sought instead of a strictly optimal one.
In this paper, we adopt stepwise regression as the feature selection algorithm. It is a greedy algorithm built around a regression model in which candidate features are evaluated automatically. Forward selection and backward elimination are the two main approaches. Forward selection starts with no features in the model and adds candidate features one at a time until a "satisfactory significance" is reached. Backward elimination works the opposite way: it starts with all candidate features in the model and removes those that are not significant. Here, we use forward selection.
Let $F = \{f_1, f_2, \ldots, f_n\}$ be the candidate feature set used for classifier design, with its elements sorted in descending order of significance level. Let $S_{f_i}$ denote the significance level of feature $f_i$; then for the feature set $F$, $S_{f_1} \ge S_{f_2} \ge \cdots \ge S_{f_n}$. We select a classifier as the model and then add features one by one, starting from $f_1$, testing the classification accuracy until a satisfactory result is reached.
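The procedure just described (rank the candidates by individual significance, then add them in that order until the accuracy is satisfactory) can be sketched as below. Here the significance of a feature is assumed to be the accuracy it achieves on its own, and "satisfactory" is assumed to mean reaching a target accuracy; `ranked_forward_addition` is a hypothetical name, and `evaluate` stands for whatever classifier-based accuracy estimate is used as the model.

```python
from typing import Callable, List, Sequence, Tuple


def ranked_forward_addition(candidates: Sequence[str],
                            evaluate: Callable[[List[str]], float],
                            target: float) -> Tuple[List[str], float]:
    """Add features in descending order of individual significance until
    evaluate(subset) reaches `target`; returns (subset, accuracy)."""
    # S_{f_1} >= S_{f_2} >= ...: rank by the accuracy of each feature alone.
    ranked = sorted(candidates, key=lambda f: evaluate([f]), reverse=True)
    subset: List[str] = []
    accuracy = 0.0
    for f in ranked:
        subset.append(f)
        accuracy = evaluate(subset)
        if accuracy >= target:
            break  # satisfactory result reached
    return subset, accuracy
```

If the target is never reached, all candidates end up in the subset, which signals that either the target is too strict or the candidate set needs new features.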

V. EXPERIMENT AND EVALUATION
A. Features Generation
Twenty subjects performed all 12 kinds of finger gestures under the supervision of a researcher. Before each one-stroke gesture, the finger must be in a horizontal, static state; after finishing a gesture, the finger must stay still in its ending state. In other words, the finger is dynamic only while a gesture is being performed, so a threshold-based approach can identify whether a gesture is occurring. Fig. 7 and Fig. 8 show the signals of two finger gestures, "Finger Left Shift" and "Finger Up", each composed of three channels that indicate the acceleration change along the three axes. From the extracted gesture signals, various features can be generated in both the time domain and the frequency domain, as described in the previous section. For example, the mean and standard deviation (sd) of each axis in Fig. 7 and Fig. 8 can be computed as

$$\mathrm{mean} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \mathrm{sd} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},$$

where $x_i$ denotes the $i$-th sample and $n$ denotes the number of samples in the window. Other features can also be obtained by mathematical or statistical methods.
Although any signal feature can be regarded as a candidate, some obviously insignificant features are neglected in order to reduce the computing load. In our case, the finger gestures are one-stroke gestures, meaning that each gesture is aperiodic and instantaneous. Therefore, frequency-domain features are neglected.
The process of feature generation can be expressed as follows:
(1) Observe the signals of one gesture from different subjects and try to describe it with some typical features. For example, in Fig. 7, we may consider mean, energy, etc.
(2) Observe the signals of various kinds of gestures to find features capable of distinguishing one gesture from the others, such as the sd of the Z axis in the two figures above.
(3) Abandon insignificant features, such as the "amplitude" of each axis in our case: even when a gesture is performed by the same person, the amplitude differs greatly from one repetition to the next because of differences in performance speed.
(4) Collect the features to form a candidate feature set for feature selection. In our case, the following time-domain features are calculated for each axis: mean, standard deviation, energy, entropy, correlation between any two axes, difference between peak and valley, and positions of the peak and valley on the time axis, giving 8 kinds of features in the candidate feature set. Each feature in the set consists of three elements for the X, Y, and Z axes, such as $f_1$ = {meanX, meanY, meanZ}.
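Step (4) can be sketched as a function that turns one extracted three-axis segment into the 8 kinds of candidate features (24 values: 7 per-axis features for X, Y, and Z plus 3 axis-pair correlations). The paper does not define energy, entropy, or peak/valley position precisely, so the definitions below (mean squared value, 10-bin histogram entropy in bits, index normalised to [0, 1]) are assumptions, and key names other than meanX, sdX, posPeakX, and posValX are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def _sd(v: Sequence[float]) -> float:
    m = _mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def _entropy(v: Sequence[float], bins: int = 10) -> float:
    """Shannon entropy (bits) of a histogram of the samples (assumed definition)."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for x in v:
        counts[min(int((x - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(v)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def _corr(a: Sequence[float], b: Sequence[float]) -> float:
    sa, sb = _sd(a), _sd(b)
    if sa == 0 or sb == 0:
        return 0.0  # correlation undefined for a constant axis
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) * sa * sb)


def candidate_features(x: Sequence[float], y: Sequence[float],
                       z: Sequence[float]) -> Dict[str, float]:
    """8 kinds of time-domain features for one extracted gesture segment."""
    axes: Dict[str, List[float]] = {"X": list(x), "Y": list(y), "Z": list(z)}
    feats: Dict[str, float] = {}
    for name, v in axes.items():
        n = len(v)
        if n < 2:
            raise ValueError("each axis needs at least two samples")
        feats["mean" + name] = _mean(v)
        feats["sd" + name] = _sd(v)
        feats["energy" + name] = sum(s * s for s in v) / n  # mean squared value (assumed)
        feats["entropy" + name] = _entropy(v)
        feats["peakValDiff" + name] = max(v) - min(v)
        # Positions normalised to [0, 1] so segments of different length compare.
        feats["posPeak" + name] = v.index(max(v)) / (n - 1)
        feats["posVal" + name] = v.index(min(v)) / (n - 1)
    for a, b in (("X", "Y"), ("Y", "Z"), ("X", "Z")):
        feats["corr" + a + b] = _corr(axes[a], axes[b])
    return feats
```

Normalising the peak and valley positions by segment length is a design choice made here because gesture durations vary between 200 and 800 ms; with raw indices, the same gesture performed at different speeds would produce different position values.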

B. Features Selection
Feature selection aims to find a satisfactory feature subset from the candidate feature set that achieves good classification accuracy while keeping computational complexity under control. It is crucial because it directly determines the classification result. The forward selection algorithm of stepwise regression is adopted to test each feature and their combinations one by one. The algorithm requires a model to evaluate the features. To obtain an objective evaluation, we select three basic machine learning classifiers, the C4.5 decision tree (C4.5), Nearest Neighbor (NN), and Naïve Bayes (NB), as test models to calculate the classification accuracy for the 12 kinds of one-stroke finger gestures described above, and the average accuracy of the three classifiers (Avg) is used as the final evaluation result.
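As a sketch of one evaluation model, the code below estimates the accuracy of a 1-nearest-neighbour classifier on a feature subset by leave-one-out cross-validation, and averages any list of such evaluators in the spirit of the "Avg" criterion. The paper does not state its validation protocol or distance metric, so leave-one-out and unscaled Euclidean distance are assumptions; C4.5 and Naïve Bayes evaluators would plug into the hypothetical `average_accuracy` helper with the same signature but are not shown.

```python
import math
from typing import Callable, Dict, Sequence

Row = Dict[str, float]  # one gesture sample: feature name -> value
Evaluator = Callable[[Sequence[Row], Sequence[str], Sequence[str]], float]


def loo_1nn_accuracy(rows: Sequence[Row], labels: Sequence[str],
                     feats: Sequence[str]) -> float:
    """Leave-one-out accuracy of 1-NN (squared Euclidean distance) on `feats`."""
    correct = 0
    for i, r in enumerate(rows):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(rows):
            if j == i:
                continue  # the held-out sample cannot be its own neighbour
            d = sum((r[f] - q[f]) ** 2 for f in feats)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(labels[best_j] == labels[i])
    return correct / len(rows)


def average_accuracy(models: Sequence[Evaluator], rows: Sequence[Row],
                     labels: Sequence[str], feats: Sequence[str]) -> float:
    """Mean accuracy over several evaluation models."""
    return sum(m(rows, labels, feats) for m in models) / len(models)
```

Because the distance is unscaled, features with large numeric ranges dominate; in practice one would normalise each feature before applying such an evaluator.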
First, each feature in the candidate feature set is tested with the three models, and the evaluation results are ranked in descending order, as shown in Table 1. Second, the best feature in Table 1, $f_1$, is combined with each of the other features, and the combinations are re-evaluated with the models; the results are shown in Table 2. Third, the selection process is repeated: the best combination, $\{f_1, f_3\}$, is taken as the basic feature combination and combined with each of the remaining features to re-evaluate the significance of the combined features. Tables 3 to 7 show the evaluation results for the different combinations; the best combination in each table serves as the basis for the next one.
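The table-by-table procedure above amounts to a greedy forward selection over feature groups, where each $f_i$ contributes its X, Y, and Z elements together. A minimal sketch follows; `evaluate` stands for the averaged classifier accuracy, and stopping when the best addition no longer improves accuracy by more than `min_gain` is one assumed reading of "satisfactory". The name `greedy_forward_selection` is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def greedy_forward_selection(groups: Dict[str, Sequence[str]],
                             evaluate: Callable[[List[str]], float],
                             min_gain: float = 0.0) -> Tuple[List[str], List[float]]:
    """Return the chosen group names in selection order and the accuracy after each step."""
    chosen: List[str] = []
    history: List[float] = []
    remaining = list(groups)
    while remaining:
        # Re-evaluate the current best combination joined with each remaining group.
        scores = {}
        for g in remaining:
            columns = [c for h in chosen + [g] for c in groups[h]]
            scores[g] = evaluate(columns)
        best = max(remaining, key=lambda g: scores[g])
        if history and scores[best] - history[-1] <= min_gain:
            break  # no worthwhile improvement: keep the current combination
        chosen.append(best)
        history.append(scores[best])
        remaining.remove(best)
    return chosen, history
```

Unlike the ranked procedure, this re-evaluates every remaining group against the current combination at each step, so a feature that is strong on its own but redundant with one already chosen is passed over. The `history` list plays the role of the per-step accuracies plotted in Fig. 9.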
The best average accuracy obtained by this procedure is 85.22%. Fig. 9 shows the evaluation results for each combination.
As Fig. 9 shows, the classification accuracy increases with the number of features. However, beyond a certain combination, the accuracy shows no obvious change. This indicates that more features do not necessarily give better classification accuracy. In some circumstances, a large number of features even reduces accuracy, which is known as the peaking phenomenon.
Fig. 9 shows that the best classification result is about 85%. To improve classification, other features are considered for addition to the candidate set. By observing the acceleration signals of each kind of finger gesture, we find that within a gesture signal, the acceleration changes along the three axes differ greatly. For example, for the gesture "Finger Left Shift" in Fig. 7, the acceleration along the Y axis changes more intensely than along the X and Z axes, while for "Finger Up" in Fig. 8, the X and Z axes show obvious fluctuation. Taking sd as the fluctuation level, comparisons of sd between pairs of axes, namely sdX>sdY, sdY>sdZ, and sdX>sdZ, are added to the candidate features; we call these relative features. In addition, to identify gestures with opposite directions, the order in which the peak and valley occur in each axis signal is adopted, represented as posPeakX>posValX, posPeakY>posValY, and posPeakZ>posValZ. With the relative features, the candidate feature set is evaluated again by stepwise regression. The results show that the recognition accuracy reaches 90.08% with the feature subset {meanX, meanY, meanZ, sdX, sdY, sdZ, sdX>sdY, sdY>sdZ, sdX>sdZ, posPeakX>posValX, posPeakY>posValY, posPeakZ>posValZ}. These relative features not only improve recognition accuracy but also reduce computing complexity, because they take only Boolean values. Besides, they can increase the robustness of the system, since a relative relationship makes the classifier less dependent on the initial state. Table 8 shows the final recognition matrix obtained with the Nearest Neighbor classifier.
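The relative features and the final 12-element subset can be sketched as follows. The input is assumed to be a dictionary of already computed per-axis features keyed by the paper's names (meanX, sdX, posPeakX, posValX, and so on); encoding the Boolean comparisons as 0.0/1.0 so they can sit in one numeric vector is an assumption, and `relative_features` and `final_vector` are hypothetical helper names.

```python
from typing import Dict, List

# Feature subset reported in the text, in the order listed there.
FINAL_SUBSET = ["meanX", "meanY", "meanZ", "sdX", "sdY", "sdZ",
                "sdX>sdY", "sdY>sdZ", "sdX>sdZ",
                "posPeakX>posValX", "posPeakY>posValY", "posPeakZ>posValZ"]


def relative_features(f: Dict[str, float]) -> Dict[str, bool]:
    """Boolean comparisons between axes (sd) and within an axis (peak vs. valley order)."""
    rel = {
        "sdX>sdY": f["sdX"] > f["sdY"],
        "sdY>sdZ": f["sdY"] > f["sdZ"],
        "sdX>sdZ": f["sdX"] > f["sdZ"],
    }
    for a in "XYZ":
        # True when the peak occurs after the valley on this axis.
        rel[f"posPeak{a}>posVal{a}"] = f[f"posPeak{a}"] > f[f"posVal{a}"]
    return rel


def final_vector(f: Dict[str, float]) -> List[float]:
    """12-dimensional feature vector; Booleans become 1.0 / 0.0."""
    merged = {**f, **relative_features(f)}
    return [float(merged[name]) for name in FINAL_SUBSET]
```

Since each comparison depends only on the ordering of two values from the same gesture, it is unaffected by a common offset or scale applied to both, which is one way to read the robustness argument above.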

VI. CONCLUSION
Features are of paramount importance in recognition tasks. In this paper, we focus on feature analysis and selection, including how to generate the candidate feature set from the sensing information, how to evaluate each feature and their combinations, and how to select the optimal feature subset.
The feature generation process requires observing the activity signals, initially selecting some features, and neglecting insignificant ones. Based on the candidate feature set, the forward selection algorithm of stepwise regression is adopted to evaluate each feature and their combinations, and the final combination with good classification significance is selected as the feature subset for gesture recognition. The experimental results indicate that this feature analysis and selection process is feasible for most activity recognition research. In the future, we plan to use our method to evaluate other kinds of sensing devices and activity data.

ACKNOWLEDGMENT
We would like to thank the subjects who helped us with the experiments in this research.

Figure 1. Sensing system on the finger

Figure 2. Sensing direction of a tri-axis accelerometer

Considering the tri-axis characteristic of the accelerometer and the requirements of finger gestures, 12 kinds of one-stroke finger gestures are defined. They can be divided into 3 pairs of gestures along the X, Y, and Z axes and 3 pairs of gestures in the XY, YZ, and XZ planes. The gestures are named Crook and Unbend (X axis), Finger L-Shift and Finger R-Shift (Y axis), Finger Up and Finger Down (Z axis), Wrist L-Shift and Wrist R-Shift (XY plane), L-Rotate and R-Rotate (YZ plane), and Wrist Up and Wrist Down (XZ plane). The modes of motion for the six pairs of gestures are shown in Fig. 3.

Figure 3. Modes of motion for the six pairs of gestures

Figure 4. Two signals with the same maximum value

Figure 9. Recognition accuracy under different feature combinations

TABLE 1.
THE RESULTS OF SINGLE FEATURE EVALUATION

TABLE 2.
THE EVALUATION RESULTS OF COMBINING TWO FEATURES

TABLE 8.
RECOGNITION MATRIX FOR 12 KINDS OF FINGER GESTURES BASED ON THE SELECTED FEATURE SUBSET