Fine-grained Accelerometer-based Smartphone Carrying States Recognition during Walking

Because our daily lives depend on smartphones, the state of the device has an impact on the quality of the services offered through it. In this article, we focus on the carrying states of the device while the user is walking, in which 17 states, e.g., in the front-left trouser pocket, calling with the phone in the right hand, and in a backpack, are subject to recognition based on supervised learning with accelerometer-derived features. A large-scale data collection from 70 persons at three walking speeds allows reliable evaluation of suitable features and classifier models, the feature selection method, the robustness of localization against unknown persons, and the effect of walking speed in training a classifier. Person-independent evaluation shows that the average F-measures of 17-class classification and merged 9-class classification were 0.823 and 0.913, respectively.

Keywords: Smartphone; on-body localization; accelerometer; machine learning; feature selection; wearable computing


I. INTRODUCTION
Our daily lives depend heavily on smartphones, which provide not only phone calling functionality but also ubiquitous access to the Internet and software replacements for special-purpose objects, e.g., cameras and pedometers. Various sensors are embedded in the device, which allows a system to extract the context of a user and/or a device, such as the activity being engaged in [19], [23], [26], the location of a person or device [16], [24], the identity of a pedestrian [28], and environmental conditions around a user [8], [10], [15], [27]. Such context contributes to providing appropriate information and services to a user.
According to a phone carrying survey, 17% of people determine where to store a mobile phone based on contextual restrictions, e.g., no pocket in a T-shirt, a phone too large for a pants pocket, or comfort during an ongoing activity [4]. These factors vary throughout the day, and thus users change the storing position during a day. This suggests that the on-body device position, as a context, has great potential for improving the usability of a smartphone and the quality of sensor-dependent services, facilitating human-human communication, reducing unnecessary energy consumption, etc. [7]. Note that the position is not an exact 3D coordinate, but the name of the body part, clothing, or item used to carry the device during walking, such as "inside a chest pocket", "inside a bag", and "calling (attaching to the ear)".
In this article, we propose a machine learning-based classifier and classification features to identify 17 storing positions of a smartphone on the body from a segment of data, i.e., a window, obtained while a person is walking. The contributions of this article are summarized as follows:
• Classification features suitable for classifying 17 classes are specified, in which we show that subset-based feature evaluation is superior to a collection of individual "good" features.
• We show that the raw acceleration signal yields better classification performance than the linear and vertical components of the acceleration signal.
• A large-scale user-independent classification performance evaluation is presented, in which 70 persons provided acceleration signals of a smartphone carried during walking.
• The effect of heterogeneity of walking speed in training a classifier is evaluated, in which a training dataset with various speeds can build a more robust classifier than training with single-speed, i.e., normal-speed, data only.
The remaining part of this article is organized as follows. Section II examines related work on on-body position sensing. Section III describes the dataset used in this study. The localization method is presented in Section IV, in which the notion of series is introduced and the classification features are presented. Section V presents experiments from various aspects, including suitable features and classification models, the feature selection method, the robustness of localization against unknown persons, and the effect of walking speed in training a classifier. Finally, Section VI concludes the article.

II. RELATED WORK
On-body position sensing is attracting attention from researchers in the machine learning and ubiquitous computing communities [7], [25], [29]. One research direction concerns the type of device that is actually realized or intended to be utilized in the future: wearable devices [18], [21], [29] or a smartphone [1], [5]-[7], [12], [14], [22], [25], [30]. The type of device relates to the selection of target positions. In the wearable device approach, the target positions range from the head to the ankle, including fine-grained discrimination such as upper arm vs. forearm and shin vs. thigh [29]. A device is usually attached firmly using a belt or a special mounting fixture. This indicates that the orientation of the device is unlikely to change irregularly and frequently within a specific activity, although small displacements might occur during activities [17]. By contrast, a smartphone is usually stored in containers such as jacket, chest and trousers pockets and a wide variety of bags, as well as held in a user's hand, hung from the neck or placed on a table, as surveyed in [4], [30]. In this case, the degree of freedom of irregular movement increases in a large container, e.g., a jacket pocket or a handbag.
Another aspect is the modality of sensing, in which an accelerometer is dominant due to its low-power operation and its availability in most commercial smartphones and wearable devices. Incel [14] presents an extensive study on acceleration-based phone localization, in which recognition features are proposed that represent the movement, rotation and orientation of devices during diverse activities of a person such as walking, sitting and biking. Fujinami proposed 63 classifier-independent features for 9 on-body phone positions, including bags, during walking, which were selected as being more predictive of classes and less correlated with each other [7]. Shi et al. [25], Alanezi et al. [1], and Incel [14] utilized a gyroscope in combination with an accelerometer. They reported that the combined approach slightly improved the accuracy [1], [14]; however, considering the power-hungry nature of a gyroscope [32], the improvement would not be a sufficient reason for utilizing a gyroscope.
Regarding the evaluation method, n-fold cross validation is often utilized [1], [12], [18], [22], [25], [29], which uses (n-1)/n of the dataset for training a classifier and 1/n for testing it. It tends to result in good recognition performance because the training dataset may, in theory, contain (n-1)/n of the data from each person, and hence the classifier "knows" about the subjects in advance. By contrast, Leave-One-Subject-Out (LOSO) cross validation is carried out by testing a dataset from a particular person with a classifier that is trained without any data from that person. LOSO-CV is therefore regarded as a fairer and more practical test method, and has recently been attracting attention [1], [7], [14], [30]. To validate the generalization of a classification model, the number of subjects is important, i.e., a small number of subjects fails to capture the characteristics of the population. Incel [14] carried out a performance evaluation using LOSO-CV on an integrated dataset from 35 persons in total; however, the number of persons varies between positions (35 persons for trouser pocket, 25 for backpack, 15 for hand and 10 for messenger bag, jacket, belt and wrist), and the average number is 15.6. Fujinami utilized LOSO-CV on a dataset from 20 persons, in which data from 9 positions were equally collected [7]. By contrast, we equally collect data from 17 positions on the body of 70 persons, including three states in hands, i.e., swinging, talking on the phone, and watching the screen, for both the left and right hands, as well as carried items, i.e., bags, which is a unique aspect of our work. In existing work, the type of bag is not clearly defined [30] or is limited to a messenger bag [14], [30]. We consider that the scale of the experiment in this article, i.e., 17 positions of 70 persons, is the largest in the literature.

A. Target Positions
We targeted 11 popular positions as shown in Fig. 1, among which both the left and right sides of the three types of "hand", the trousers front/back pockets, and the jacket pockets were collected. The three types of "hand" correspond to calling, watching the screen in the portrait orientation, and swinging during walking. In total, 17 classes are defined and analyzed.

B. Sensor Modality
The three-axis accelerometer employed in this study is a primary sensor embedded in almost all of today's smartphones. The signals can be used to characterize the movement patterns generated while a person is walking. Although the combination of an accelerometer and a gyroscope slightly improved the classification accuracy [1], [14], we did not employ a gyroscope because it is a power-hungry sensor [32]. Typical waveforms of the target classes are presented in Fig. 2.

C. Data Collection
For data collection, 70 subjects (53 male and 17 female undergraduate or graduate students in their twenties) were recruited with remuneration worth the equivalent of 2,000 yen. The subjects carried Huawei Ascend P7 smartphones running Android 4.4 in 2 to 5 positions simultaneously, and were asked to walk for about 5 min per position on our university campus, including straight paths and corners, at three walking speeds, i.e., slow, normal and fast. The subjects could choose the speeds themselves; however, the order of the trials was kept constant: fast, normal and then slow. Because the subjects may get tired as the experiment proceeds, we considered it preferable to start with the fast speed. We collected raw acceleration signals from the Android API at the rate of SENSOR_DELAY_FASTEST. The data sampled by the Android sensor system are added to an internal queue of our data collection system and polled at 50 Hz. Note that the data from phones carried in trousers pockets covered four orientations: downward and upward, each with the display surface facing towards and away from the body.

A. Overview
The localization is carried out to recognize a class of a position from the 17 candidate positions based on the similarity of patterns of acceleration signals. As in the work of Fujinami [7] and Mannini et al. [21], the method primarily recognizes the storing position of a smartphone while a person is walking. In this article, we assume that a segment in which a person is walking has already been identified.

B. Signal Series and Axis
The term "series" indicates the type of basic time series data, which includes the raw acceleration signal, the linear acceleration component, and the vertical acceleration component. As described in Section III-C, the raw acceleration signal is what is obtained directly from the accelerometer. In this section, we present the other two series. Here, the notation $a_{s,a,i}$ represents the $i$-th sample in a window of the $a$-axis in the $s$-signal series. The coordinates used in the definition of the three series are illustrated in Fig. 3.
1) Linear Component: Linear acceleration is obtained by removing gravity components from the measured signals. Sophisticated linear acceleration estimation methods have been proposed that combine a gyroscope and a magnetometer [13]; however, we utilize only an accelerometer for the same reason as for choosing an accelerometer as the modality of storing position recognition. We adopted the method proposed by Cho et al. [3], in which the gravity components are approximated by the mean of the raw acceleration signals in a window (1), and the linear components are obtained by subtracting the gravity components from the raw acceleration signals (2). Here, $\bar{\mathbf{a}}_{raw}$ is the vector of the mean raw acceleration signals of the x, y and z axes in a window. Also, $\mathbf{a}_{linear}$ and $\mathbf{a}_{raw}$ indicate a vector of a sample of the linear acceleration signal and of the raw acceleration signal in a window, respectively.
2) Vertical Component: The vertical component is obtained by decomposing the linear component based on the component of gravity in each axis (3) [13].
Here, cosθ is obtained based on the definition of inner product (•) as represented by (4).
Then, (3) is expressed with the gravity and linear components by substituting (4), as represented by (5).
In addition to the three axes, i.e., x, y and z, we introduce the magnitude of the three-axis signals (m) as the fourth dimension for series s, as shown in (6).
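The three series described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, the gravity estimate is the per-axis window mean as described for (1), the linear component is the raw sample minus that mean as described for (2), and the vertical component is assumed to be the projection of the linear sample onto the gravity direction, i.e., $((\mathbf{a}_{linear} \cdot \mathbf{g}) / (\mathbf{g} \cdot \mathbf{g}))\,\mathbf{g}$, which is one form consistent with the inner-product description of (3)-(5).

```python
import math


def mean_vector(window):
    """Per-axis mean of a window of (x, y, z) samples; used as the gravity estimate (1)."""
    n = len(window)
    return tuple(sum(s[k] for s in window) / n for k in range(3))


def linear_series(window):
    """Subtract the gravity estimate from each raw sample, as described for (2)."""
    g = mean_vector(window)
    return [tuple(s[k] - g[k] for k in range(3)) for s in window]


def vertical_series(window):
    """Project each linear sample onto the gravity direction (assumed form of (5))."""
    g = mean_vector(window)
    g_sq = sum(c * c for c in g)
    if g_sq == 0.0:
        raise ValueError("gravity estimate is zero; cannot project")
    out = []
    for lin in linear_series(window):
        scale = sum(l * c for l, c in zip(lin, g)) / g_sq
        out.append(tuple(scale * c for c in g))
    return out


def with_magnitude(series):
    """Append m = sqrt(x^2 + y^2 + z^2) as the fourth dimension, as described for (6)."""
    return [s + (math.sqrt(sum(c * c for c in s)),) for s in series]
```

For example, `with_magnitude(linear_series(window))` would give the four-dimensional linear series (x, y, z, m) for one window of raw samples.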

C. Recognition Features
Recognition features play a very important role in determining the performance of a recognition system. In this section, we describe the definition of the candidate features. The localization is carried out to recognize a class of a position from the 17 candidate positions based on the similarity of patterns of acceleration signals. The recognition process is carried out window by window, in which a window consists of a certain number of sampled acceleration signals. A feature vector is obtained per window, in which features are calculated from the three series of acceleration signals.
We take the approach of listing candidate features from the literature [1], [7], [14], [30] and from observation of waveforms (Fig. 2), and then selecting relevant and non-redundant features using a machine learning technique. We systematically calculate the candidate features from a window of four-dimensional vectors of raw acceleration signals by combining feature types and axes. In total, 326 features are obtained: 72 types × 4 axes for individual axes, 5 types (one for the time domain and four for the frequency domain) × 6 (= $_4C_2$) pairs for correlation-based features, and 2 types × 4 (= $_4C_3$) triples for features obtained from combinations of three axes. Tables I, II, and III show the features calculated from the four axes individually, the features regarding the correlation of two axes, i.e., correlation coefficients, and the features representing the relationship among three axes, respectively. The feature selection is described in Section V-B.
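The count of 326 candidate features follows directly from the combinations stated above. The short check below reproduces the arithmetic; the function name is hypothetical, and the type counts (72, 5 and 2) are taken from the text.

```python
from math import comb

AXES = ["x", "y", "z", "m"]  # three axes plus the magnitude


def count_candidate_features(n_axes=len(AXES)):
    """Return (single-axis, pairwise, triple, total) candidate feature counts."""
    single_axis = 72 * n_axes        # 72 feature types per individual axis
    pairwise = 5 * comb(n_axes, 2)   # 5 correlation types per pair of axes
    triple = 2 * comb(n_axes, 3)     # 2 feature types per triple of axes
    return single_axis, pairwise, triple, single_axis + pairwise + triple
```

With four axes this gives 288 + 30 + 8 = 326.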
Regarding the subscripts L, M and H, the frequency spectrum is divided into three "frequency ranges", which correspond to 0.001-5.000 Hz, 5.001-10.000 Hz and 10.001-25.000 Hz, respectively. In addition, the subscript all indicates the entire frequency range of 0.20-25.00 Hz. Note that the feature maxSdev_F is obtained in a way similar to a "sliding window average": a subwindow with a 2.9 Hz range is created in the frequency spectrum to calculate the standard deviation (sdev); the subwindow is slid by 0.1 Hz throughout the frequency spectrum; and the maximum sdev is found. maxSdevF_F is the central frequency of the subwindow that gives maxSdev_F. The size and sliding width (0.1 Hz) of the subwindow were determined heuristically. The feature sumPower_F (a.k.a. "FFT energy" in [9]) is calculated as the sum of the squared values of the frequency components [2]. The FFT entropy (entropy_F) is calculated as the normalized information entropy of the FFT component values of the acceleration signals, which represents the distribution of frequency components in the frequency domain [2].
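The frequency-domain features described above can be illustrated with a small sketch. It rests on several assumptions that are not stated in the text: a naive DFT stands in for the FFT, the "component values" are taken to be magnitudes, the entropy uses $p_i = f_i^2 / \sum_j f_j^2$ and is normalized by $\log_2 N$, and the standard deviation is the population standard deviation. All function names are hypothetical.

```python
import cmath
import math
import statistics


def magnitude_spectrum(samples, rate_hz):
    """Naive DFT magnitudes for bins 1..N/2 (the DC bin is skipped)."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * rate_hz / n)
        mags.append(abs(acc))
    return freqs, mags


def sum_power(mags):
    """Sum of squared frequency components (sumPower_F)."""
    return sum(m * m for m in mags)


def fft_entropy(mags):
    """Entropy of p_i = f_i^2 / sum(f^2), divided by log2(N) (assumed normalization)."""
    total = sum_power(mags)
    if total == 0.0 or len(mags) < 2:
        return 0.0
    probs = [m * m / total for m in mags]
    h = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return h / math.log2(len(mags))


def max_sdev(freqs, mags, width_hz=2.9, step_hz=0.1):
    """Slide a subwindow over the spectrum; return (max sdev, its central frequency)."""
    best, best_center = 0.0, None
    n_steps = int((freqs[-1] - freqs[0] - width_hz) / step_hz + 1e-9) + 1
    for i in range(max(n_steps, 0)):
        lo = freqs[0] + i * step_hz
        inside = [m for f, m in zip(freqs, mags) if lo <= f <= lo + width_hz]
        if len(inside) < 2:
            continue  # sdev needs at least two components
        sdev = statistics.pstdev(inside)
        if best_center is None or sdev > best:
            best, best_center = sdev, lo + width_hz / 2.0
    return best, best_center
```

Note that a 256-sample window at 50 Hz has a frequency resolution of about 0.195 Hz, so a 0.1 Hz slide does not always change the set of components inside the subwindow; the sketch simply selects components by frequency.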

V. EXPERIMENT
In this section, we describe experiments from various aspects.

A. Condition
The window size is set to 256 samples, i.e., 5.12 seconds, with the sliding of every 128 samples (overlapping 50 %).
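The windowing condition can be sketched as follows, assuming the samples are held in a list sampled at 50 Hz; the function name is hypothetical. A window of 256 samples corresponds to 256 / 50 = 5.12 seconds, and a step of 128 samples gives the 50% overlap.

```python
def sliding_windows(samples, size=256, step=128):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]
```

A trailing segment shorter than one window is dropped in this sketch; how the original system treats such remainders is not stated.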
Throughout the experiment, we utilized the machine learning toolkit Weka 3.7.13 [20] running on an Apple Mac Pro (3.5 GHz 6-Core Intel Xeon E5, 32 GB RAM, OS X El Capitan). Table IV summarizes the average number of recognition instances, i.e., feature vectors, and the standard deviation per person.

B. Feature Selection

1) Methodology: Feature selection consists of three phases: feature subset evaluation, feature subset search, and series selection. For feature subset evaluation, we utilized correlation-based feature selection (CFS) [11], which is called CfsSubsetEval in Weka. CFS has a heuristic evaluation function, the merit, which can identify a subset of features that are highly correlated with the classes, i.e., more predictive of classes, but uncorrelated with each other, i.e., more concise. As described in Section IV-C, a large number of features were listed, which may contain redundant features. We therefore consider the capability of CFS suitable for this problem. Formula (7) defines the heuristic merit $M_S$ of a feature subset $S$ that contains $k$ features, in which $\overline{r_{cf}}$ is the mean feature-class correlation and $\overline{r_{ff}}$ is the mean feature-feature inter-correlation. For more detail, please refer to [11]. $M_S$ acts as a ranking on feature subsets in the search space of all possible feature subsets. Note that CFS is a classifier-independent method of feature selection.
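As a hedged illustration of the merit function, the sketch below uses the commonly cited form of the CFS merit, $M_S = k\,\overline{r_{cf}} / \sqrt{k + k(k-1)\,\overline{r_{ff}}}$, which we assume corresponds to (7). The function name and parameter names are hypothetical; this is not Weka's CfsSubsetEval code.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Heuristic merit of a subset of k features (assumed form of formula (7))."""
    if k < 1:
        raise ValueError("a subset must contain at least one feature")
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator
```

The form shows why adding a feature that is strongly correlated with those already selected contributes little: it inflates the denominator while barely raising the numerator.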
To find the subset of features based on the CFS evaluation, we initially attempted to apply the forward greedy stepwise search (GreedyStepwise in Weka) to the entire feature set. This method searches for the best feature subset by beginning with no features and greedily adding features one by one. However, the computation ended with an out-of-memory error. We therefore took another approach, which first finds, with lightweight computation, a subset with a much smaller number of features than the entire set of 326 features, and then applies the greedy stepwise search to that subset. For the lightweight search of the space of feature subsets, we utilized BestFirst in Weka, a greedy hill-climbing method augmented with a backtracking facility. Setting the number of consecutive non-improving nodes allowed controlling the level of backtracking. In this experiment, the number was set to five. The dataset of the selected feature subset for each "series" is evaluated by 10-fold cross-validation (10-fold CV) to specify the best feature subset for later analysis. The RandomForest classifier with 10 trees is utilized as the base classifier for the cross-validation. The classification result is evaluated by F-measure, which is the harmonic mean of recall and precision. The F-measure for class $i$ is defined by (8) and is averaged over the 17 classes. The recall and precision for class $i$ are given by (9) and (10), where $N_{correct_i}$, $N_{tested_i}$, and $N_{judged_i}$ represent the number of cases correctly classified into class $i$, the number of test cases in class $i$, and the number of cases classified into class $i$, respectively, and $i$ corresponds to one of the 17 classes.
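The evaluation metric can be written out as a short sketch that follows the verbal definitions of (8)-(10): recall is $N_{correct_i} / N_{tested_i}$, precision is $N_{correct_i} / N_{judged_i}$, and the F-measure is their harmonic mean, averaged over classes. The function names are hypothetical, and returning 0 for a class with no correct cases is an assumption made to avoid division by zero.

```python
def f_measure(n_correct, n_tested, n_judged):
    """F-measure of one class from the three counts used in (8)-(10)."""
    if n_correct == 0:
        return 0.0  # assumed convention when recall and precision are both zero
    recall = n_correct / n_tested
    precision = n_correct / n_judged
    return 2 * recall * precision / (recall + precision)


def average_f_measure(true_labels, predicted_labels):
    """Unweighted mean of per-class F-measures over the classes in true_labels."""
    scores = []
    for c in sorted(set(true_labels)):
        n_correct = sum(1 for t, p in zip(true_labels, predicted_labels)
                        if t == c and p == c)
        n_tested = true_labels.count(c)
        n_judged = predicted_labels.count(c)
        scores.append(f_measure(n_correct, n_tested, n_judged))
    return sum(scores) / len(scores)
```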
2) Result: The BestFirst search filtered out about 70 features for each series as initial "meaningful" features. We then applied the GreedyStepwise search to these features to understand the best combination for a particular number of features. Fig. 4 shows the relationship between the number of features and the merit score $M_S$. Here, the number of features increases in the order in which they were added to the feature subset. As shown in the figure, the increase of $M_S$ becomes saturated at a particular number of features. This indicates that the redundancy of features increased and/or the predictiveness of an added feature decreased after a particular number of features. The merit scores of the "raw" series are larger than those of the other two series for almost any number of features. This suggests that the "raw" series contains more predictive and less redundant features than the other two series and may perform best.

The BestFirst search filtered out approximately 100 features from each series as "meaningful" features. We then applied a GreedyStepwise search to these features to understand the ranking of the features. Figure 4.1 shows the relationship between the number of features and the merit score $M_S$. In this experiment, the number of features was increased in the order in which they were added to the feature subset. As can be seen, the increase in $M_S$ became saturated. This result indicates that the redundancy of features increased and/or the predictiveness of each added feature decreased after a particular number of features were added. We selected 40, 45 and 40 features for the "raw", "linear" and "vertical" data series, respectively, near the saturation points. These are summarized in Tables 4.1, 4.2, and 4.3, respectively. Note that, as shown in Figure 4.1, the merit scores of the "raw" series were larger than those of the other two series for almost any number of features. This result suggests that the features in the "raw" series were more predictive and less redundant than those in the other two. This characteristic is confirmed in Section 4.2.
Table 4.4 summarizes the median ranks (upper row) and ratios of selected features (lower row) in the "raw" data series. Regarding "single axis features vs. correlation-based features", features obtained from single axes seem to rank better than correlation-based ones, and the number of selected single-axis features was smaller than that of correlation-based ones. The time-domain features of the signals contributed to the classification. In terms of the axis (or "axes" for multi-axis features), the y axis gave the smallest ranking number and was the most contributing axis, followed by the x axis.
We utilized the 40, 45 and 40 features selected near the saturation points for the "raw", "linear" and "vertical" data series, respectively, in series selection. Table V summarizes the average F-measures of the three series. As shown in the table, the "raw" series performed the best of the three series using the selected feature subsets. Table VI summarizes the selected features for the "raw" data series.

C. Classifier Selection
To find the best classifier, we compare popular classifiers by taking into account not only correctness of classification, but also the computational load for running on a smartphone.

1) Methodology:
We carried out 10-fold CVs with Naïve Bayes, Multi-Layer Perceptron (MLP), J48 Tree and RandomForest classifiers using the 40 features from the raw dataset. We also measured the elapsed time to complete one fold of evaluation (test), which contains approximately 44,000 instances. Note that different numbers of trees in RandomForest were tested, i.e., 10, 50 and 100. Support Vector Machines (SVM) were not tested because they are parameter-sensitive.
2) Result and Analysis: Fig. 5 shows the average F-measure of the classifiers and the elapsed time for testing the dataset per fold. As shown in the figure, the three types of RandomForest performed better than the others. Paired t-tests showed significant differences between RandomForest with 10 trees and Naïve Bayes, MLP, and J48 with p<.05 (t(9)=224.80, t(9)=94.83, and t(9)=45.99, respectively). Also, the F-measure improved from 0.980 to 0.987 as the number of trees increased. To determine the number of trees, it is important to consider the trade-off between classification performance and processing workload. As shown in Fig. 5, the number of trees in RandomForest influences the processing speed because of the nature of the algorithm. Although paired t-tests showed that RandomForest with 10 trees was significantly lower in F-measure than RandomForest with 50 and 100 trees with p<.05 (t(9)=-44.03 and t(9)=-47.19, respectively), we chose 10 trees for the following experiments by taking into account the processing speed on a smartphone. Hereinafter, RandomForest with 10 trees is utilized as the classifier in this article.
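For readers unfamiliar with the statistic reported as t(9), the following sketch computes a paired t statistic from per-fold scores of two classifiers. It assumes the pairing is by fold, which gives 10 - 1 = 9 degrees of freedom for a 10-fold CV; the function name is hypothetical, and the p-value lookup is omitted because the Python standard library has no t distribution.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two paired samples with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1
```

A negative t, as in the comparison of 10 trees against 50 or 100 trees, simply means that the first classifier scored lower on average than the second.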

D. Feature Subset Evaluation vs. Individual Feature Collection
In Section V-B, 40 features were selected using CFS, which allows us to find a subset of features that are more predictive of classes yet less correlated with each other. In this section, we evaluate the subset evaluation approach by comparing it with a collection of individual "good" features.

1) Methodology:
The contribution of each feature is evaluated based on information gain (IG). IG is commonly used in feature selection, where the gain of information provided by a particular feature is calculated by subtracting the conditional entropy given that feature from the entropy under random guess [31]. Thus, a more informative feature has a higher IG.
After specifying the same number of features as those obtained by CFS method, i.e., 40 features, 10 fold CVs are performed against these two feature subsets, and F-measures are compared.
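The definition of information gain can be sketched as the class entropy minus the conditional entropy given the feature. The example below assumes the feature values have already been discretized into a small number of bins; how continuous features were discretized in the experiment is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for a discretized feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

Because each feature is scored on its own, two highly correlated features can both receive a high IG, which is the redundancy that the subset-based CFS evaluation is designed to penalize.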

2) Result and Analysis: Table VII summarizes the top-40 informative features based on IG. The features derived from the x-axis show their effectiveness by accounting for 7 of the top-10 features. The table also shows the order in which each feature was added to the subset of 40 features obtained by CFS, which shows that an individually informative feature, i.e., one with high IG, is not always added to the CFS-based feature subset at an early stage, i.e., with a low CFS value, or is not added at all. This is natural because CFS is designed to take into account the redundancy among features and find the best combination of features, while IG represents the informativeness of individual features.
Regarding the classification performance, the F-measures obtained from classifiers trained with IG-based features and CFS-based features are 0.967 and 0.980, respectively, and the CFS-based feature subset contributes significantly more to classification than the IG-based one (t(9)=83.06, p<.05). Therefore, provided that the same number of features is utilized, we consider that the feature subset evaluation approach was more effective in building a better classifier than collecting individual features with good evaluation results.

E. Recognition against Unknown Person
As described in Section II, LOSO-CV is regarded as a fairer and more practical test method under conditions in which individual differences exist. In this section, we apply LOSO-CV to 70 subjects, which we consider the largest case in on-body smartphone localization.

1) Methodology:
The dataset from one subject is treated as a test set, while the datasets from the remaining 69 subjects are utilized for training a classifier. The train-and-test process is iterated 70 times.
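The LOSO-CV procedure can be sketched as a generator of train/test splits. The instance layout, a tuple of (subject id, feature vector, label), and the function name are assumptions for illustration.

```python
def leave_one_subject_out(instances):
    """Yield (held_out_subject, train, test) for every subject in turn.

    Each instance is assumed to be a tuple (subject_id, feature_vector, label).
    """
    subjects = sorted({inst[0] for inst in instances})
    for held_out in subjects:
        train = [inst for inst in instances if inst[0] != held_out]
        test = [inst for inst in instances if inst[0] == held_out]
        yield held_out, train, test
```

With 70 subjects the generator yields 70 splits, and no window from the test subject ever appears in the corresponding training set.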
2) Result and Analysis: Table VIII shows the details of the classification result in the form of a confusion matrix. The average F-measure in the classification of 17 classes for the 70 subjects is 0.823. Although the value decreased by 0.157 from that of the 10-fold CV, we consider the performance rather good given that there are 17 classes. In particular, considering that no data from the test person are included in the training data, it is surprising that the left and right sides of "hand call" and "hand swing" were separated with very high F-measures (>0.93); clear differences in the x-axis are observed for these classes, as shown in Fig. 2. "Neck" also has a high F-measure (0.932). A smartphone hanging from the neck is hit by the user's body as he/she walks forward, which causes a strong impact on the z-axis (Fig. 2(a)). However, as shown in Table VIII, the left and right sides of "hand front", "trousers back", and "trousers front" are often confused with each other. As shown in Fig. 2, smaller differences are observed between the left and right sides of these classes than in the successful cases. Confusion among the "bags" is also slightly observed. We therefore merged the left and right sides into one class, e.g., "hand call left" and "hand call right" → "hand call", for "hand call", "hand front" and "hand swing". Also, the three types of bags are merged into one "bag" class. The result of the merging is summarized in Table IX, in which the mean F-measure is 0.913 (an increase of 0.090 from the original 17 classes). Furthermore, the three subclasses of hand and the two subclasses of trousers pockets, i.e., front and back, are merged into the single classes "hand" and "trousers pocket", respectively, resulting in six-class classification, which is shown in Table X. As shown in these tables, merging multiple classes into a single class increases the performance metrics. Application designers should consider the required resolution, i.e., the level of detail of position recognition, for their target applications.

F. Effect of Various Walking Speed in Classifier Training
As described in Section III-C, we collected data at three walking speeds chosen by the subjects. The above experiments were carried out with a dataset that contains all walking speeds. Training a classifier with a single speed, i.e., "normal", makes data collection easy for the participants; however, it may sacrifice robustness against different speeds. The data collection process can be simplified if no difference exists in robustness between classifiers trained with heterogeneous speeds and with a single speed. In this section, we explore the effect of walking speed in classifier training.

1) Methodology:
The experiment follows the LOSO-CV principle, with a difference in walking speed between the training and test datasets. More specifically, two classifiers for 17-class classification are trained using 1) a dataset that contains all speeds and 2) a dataset with only the "normal" speed, in which training a classifier with the "normal" speed is the traditional approach. Here, the dataset obtained from a test subject is excluded from the training dataset. Meanwhile, the test dataset is one of the "slow", "normal", or "fast" speeds. For example, the combination of the "normal" speed for training with the "fast" speed for testing represents a case where a person is walking faster than what the classifier knows.
In training classifiers with three walking speeds, we reduced the size of the dataset to 1/3 so that it becomes similar in size to the "normal"-speed dataset, which avoids a bias due to the number of training instances. In practice, three sets of 1/3-sampled data are applied, and the F-measures are averaged. Regarding the classification features for training with the "normal" speed, we selected dedicated features in the same way as for all speeds (Section V-B), because we consider that the suitable set of features can differ between the two cases due to the variation of walking speed in the "all speed" case.
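One way to obtain three 1/3-sized training sets is sketched below. The text does not state how the sampling was done, so the random shuffle, the disjoint split and the fixed seed are all assumptions for illustration; the function name is hypothetical.

```python
import random


def three_subsamples(instances, seed=0):
    """Shuffle the instances and split them into three disjoint, near-equal parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return [shuffled[i::3] for i in range(3)]
```

Each part would then be used to train one classifier, and the three resulting F-measures would be averaged as described above.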
2) Result and Analysis: Table XI summarizes the average F-measures for different combinations of walking speed in the training and test datasets. Paired t-tests regarding the heterogeneity of the training datasets showed that using three walking speeds yielded better classification than using a single speed, i.e., "normal" (p<.05), for all walking speeds in the test datasets (t(69)=-2.34, t(69)=2.64, and t(69)=5.30 for "slow", "normal", and "fast", respectively). The result shows that heterogeneity of walking speed is important for building a robust classifier.

VI. CONCLUSION
In this article, we proposed a machine learning-based classifier and classification features to identify 17 states of a smartphone while the user is walking. A large-scale data collection from 70 persons was carried out at three different walking speeds to evaluate the effect of heterogeneity of walking speed in training a classifier. The following results were obtained:
• In feature calculation, we introduced three series of acceleration signals, i.e., the raw, linear, and vertical components, of which the raw acceleration series showed the highest classification performance.
• 40 features in the raw series were selected from 326 candidate features based on correlation-based feature subset evaluation. The comparison with a subset built by collecting individually informative features based on information gain showed that the subset evaluation method was superior to the collection-based method with the same number of features.
• Person-independent evaluation (LOSO-CV) showed that the average F-measure of 17-class classification was 0.823, while 9-class classification obtained by merging the left and right sides into one class showed an average F-measure of 0.913.
• Comparison of the heterogeneity of walking speeds in the training dataset showed that a classifier built from various walking speeds was more robust than a classifier built from a single walking speed (normal speed).
We consider that the F-measure of 0.823 for 17-class classification still has room for improvement by using a suitable classifier to address the "classifier compatibility" issue, as suggested in [7].
In addition, the classification in the experiment was carried out per window, which means that the decisions for successive windows may differ due to misclassification. For practical recognition, we will investigate temporal smoothing techniques to smooth such "discontinuity" in recognition results. We have already developed a mechanism to identify a segment of walking, into which these future investigations will be integrated.

Figure 4.1: The relationship between number of features and subset merit score.

Fig. 4. Relationship between the size of feature subset and merit score of the subset (partially).

TABLE I. CLASSIFICATION FEATURES (x, y AND z AXES AND THE MAGNITUDE (m) OF THE THREE AXES)
Name: Description
…: Absolute value of mean_T, i.e., |mean_T|
absMax_T: Absolute value of max_T, i.e., |max_T|
absMin_T: Absolute value of min_T, i.e., |min_T|
meanAbsD_T: Averaged absolute value of successive value's difference, …
…: 1st quartile (1/4 smallest value) of time-series data
3rdQ_T: 3rd quartile (3/4 smallest value) of time-series data
IQR_T: Inter-quartile range of time-series data, i.e., 3rdQ_T − 1stQ_T
1stQ_F,{all|L|M|H}: 1st quartile (1/4 smallest) of frequency spectrum
3rdQ_F,{all|L|M|H}: 3rd quartile (3/4 smallest) of frequency spectrum
IQR_F,{all|L|M|H}: Inter-quartile range of frequency spectrum, i.e., 3rdQ_F − 1stQ_F
1stQF_F,{all|L|M|H}: Frequency that gives 1stQ_F
3rdQF_F,{all|L|M|H}: Frequency that gives 3rdQ_F
var_F,{all|L|M|H}: Variance in the low-frequency range
maxSdev_F,{all|L|M|H}: Maximum standard deviation in subwindows in frequency spectrum
maxSdevF_F,{all|L|M|H}: Central frequency of subwindow that gives maxSdev_F
…: … |²,  where Cep_i is the i-th element of cepstrum coefficient
energy_F,{all|L|M|H}: Sum of energy spectrum, i.e., …
…: … p_i × log_2 p_i, where p_i = f_i² / …

… FROM THREE AXES (i, j, k ∈ {x, y, z, m}, i ≠ j ≠ k)
Name: Description
max3axes_T: Max of the max of 3 out of 4 axes, i.e., max(max_i,T, max_j,T, max_k,T)
min3axes_T: Min of the min of 3 out of 4 axes, i.e., min(min_i,T, min_j,T, min_k,T)

TABLE VI. SELECTED FEATURES FOR "RAW" DATA SERIES. "#" REPRESENTS THE ORDER OF ADDING TO THE FEATURE SUBSET, WHILE M_S INDICATES THE MERIT SCORE OF THE SUBSET

TABLE VII. TOP-40 INFORMATIVE FEATURES BASED ON INFORMATION GAIN FEATURE EVALUATION. THE COLUMN "CFS" INDICATES THE ORDER OF ADDING TO THE FEATURE SUBSET AS SHOWN IN TABLE VI, IN WHICH "-" REPRESENTS THAT THE FEATURE IS NOT INCLUDED IN THE SUBSET OF 40 FEATURES OBTAINED BY CFS