Improving Accelerometer-Based Activity Recognition by Using Ensemble of Classifiers

In line with the increasing use of sensors and health applications, considerable effort is devoted to processing collected data, such as accelerometer data, to extract valuable information. This study proposes an activity recognition model that detects activities by applying ensemble-of-classifiers techniques to the Wireless Sensor Data Mining (WISDM) dataset. The model recognizes six activities: walking, jogging, upstairs, downstairs, sitting, and standing. Several experiments are conducted to determine the best classifier combination for activity recognition. Performance improves when the classifiers are combined compared with when they are used individually. An ensemble model is built using AdaBoost in combination with the C4.5 decision tree algorithm. The model achieves an accuracy of 94.04%.

Keywords: Activity Recognition; Sensors; Smart phones; accelerometer data; Data mining; Ensemble


I. INTRODUCTION
Health applications that use the built-in sensors of smartphones or wearable devices are considered systems that simplify healthcare services such as monitoring. They offer an efficient and innovative way to deliver healthcare to patients and to improve healthcare outcomes and quality of life. The use of such technology has increased sharply and, as a consequence, so has the volume of generated data. In health informatics, these data have received considerable attention in research areas such as diagnosis, decision making, and prediction. Sensed data need to be processed, analysed, and mined to derive valuable knowledge. To address this need, classification techniques offer most of the capabilities needed to identify physical activities from accelerometer data [1,5,14]. For patients, activity recognition serves purposes such as monitoring of chronic diseases, as well as fitness and wellness [8].
Despite the amount of research on activity recognition, more accurate detection remains a challenge. A recent advance is the combination of multiple classification techniques, known as an ensemble of classifiers. To find the best combination, the best result is selected from several experiments using different evaluation criteria. Thus, the goal of this paper is to improve overall performance and the ability to deal with more complex activities by applying an ensemble-of-classifiers technique, which improves the accuracy of recognizing various activities compared with individual classification algorithms [1]. An investigation by Weiss and Lockhart showed that personal models perform better than impersonal and hybrid models. Furthermore, the best-performing algorithm was MLP for the personal model and Random Forests (RF) for the impersonal model [4]. Lockhart and Weiss reviewed 34 AR papers and observed many issues related to the datasets. Some issues concern the number of subjects. The reviewed works also lack information about the type of model developed, which is important in evaluating performance [7].
The purpose of this study is to build an activity recognition model that detects activities using an ensemble-of-classifiers technique. In this study, AdaBoost, a meta classifier, is used in combination with C4.5, a decision tree algorithm, for activity recognition.
The rest of the study is organized as follows: Section 2 presents related work on activity recognition models. Section 3 describes the model development process. Section 4 presents the results and Section 5 discusses them. Finally, Section 6 concludes the study.

II. RELATED WORK
In line with the increasing use of sensors and health applications, there is a tendency to collect sensor data in order to extract valuable knowledge. So far there are few applications of activity recognition (AR); Lockhart et al. identified some AR applications such as health monitoring, self-managing systems, and fitness tracking [8].
Several studies have applied data mining techniques to classify accelerometer sensor data and predict human physical activities. A summary of some of the reviewed articles is shown in Table 1. Kwapisz et al. used the accelerometers in smartphones to design a system for recognizing various activities. They applied three algorithms, namely the C4.5 decision tree, Logistic Regression, and Multi-Layer Perceptron (MLP), to data collected from 29 users using 43 features. They reached an accuracy of 90% using the MLP algorithm [6]. Catal et al. built on the study of Kwapisz et al. [6] and proposed a model that uses an ensemble technique combining three classification algorithms, namely the C4.5 decision tree, Multi-Layer Perceptron (MLP), and Logistic Regression. They used the voting technique and collected data from 36 users. The results showed that the proposed model performs better than the individual classification algorithms.
The model built by Bayat et al., using six activities, achieved 91.15% accuracy. A combination of three classification algorithms was applied for each phone position, either in-hand or in-pocket. Based on several experiments performed in that study, the best reported combinations were MP, LogitBoost, and SVM for the in-hand position (91.15%) and MP, Random Forest, and SimpleLogistic for the in-pocket position (90.34%) [1]. Wang et al. achieved 94.8% accuracy with a proposed algorithm based on the Hidden Markov Model (HMM) [5]. Kwon et al. used unsupervised learning algorithms. In that study, knowing the number of activities led to proper use of the Gaussian method. Additionally, selecting K with the Calinski-Harabasz index achieved 90% accuracy [16]. Ayu et al. focused on the performance of the activity recognition model and the effect of the phone position. They used machine learning algorithms and reached the highest performance for the hand-palm position with the IBk algorithm; for the shirt-pocket position, Rotation Forest was the best algorithm [11]. Gao et al. investigated the AR problem using multiple sensors. The reported accuracy was >=96.4% for ANN, decision tree, and KNN, which is better than the performance obtained with the Naïve Bayes and SVM algorithms. Although the decision tree approach achieved only the second-highest accuracy, it was considered the best because its training and test times were lower [9]. Hong et al. suggested using three accelerometers in addition to RFID technology to build a model. The model with two accelerometers was able to classify the activities using a decision tree with 95% accuracy. They drew attention to using smartphones to develop models similar to the suggested one without extra devices [17].
Recent studies have motivated the use of meta algorithms such as AdaBoost, bagging, and voting, which can combine one or more classifiers. Dalton and ÓLaighin compared basic and meta algorithms to find the better algorithm in terms of performance, reliability, and appropriate sensor position. The study aimed to recognize physical activities in order to develop a remote monitoring system. The accuracy of the three best basic algorithms was 89%, 86%, and 83% for C4.5 Graft, SVM, and BayesNET, respectively. In contrast, the accuracy of the three best meta algorithms was 95%, 92%, and 91% for AdaBoostM1 with C4.5 Graft, Multiboost with AdaBoostM1 combined with C4.5, and AdaBoostM1 with SVM, respectively. The main finding of the study is the power of meta algorithms, specifically AdaBoost, which reached higher performance than the basic algorithms [3]. Gupta and Kumar applied various algorithms to predict activities using data collected from a smartphone. The model was built using AdaBoost, C4.5, Random Forest, and Support Vector Machines (SVM). The activities were classified with an accuracy above 90% using the four selected algorithms. The AdaBoost and C4.5 algorithms achieved accuracies of 98.83% and 96.75%, respectively [13]. Wu and Song [15] used Random Forest and AdaBoost to develop a model to classify activities on smartphones. They compared the results of both models and found that the AdaBoost model performs better than the Random Forest model. The error rates were 1.10% for AdaBoost and 1.65% for Random Forest, and the AdaBoost model also required less time.
Much research has focused on healthcare monitoring using data generated by numerous monitoring devices. Advancements in activity recognition have demonstrated potential applications in healthcare, such as monitoring. Using such systems and devices can improve quality of life for patients with different conditions. Massé et al. used information from stroke patients, generated by a sensor system including accelerometers and gyroscopes, to develop an activity monitoring system. As part of the system, classifier algorithms were used to recognize daily activities (standing, walking, sitting, lying), and barometric pressure was used to differentiate body elevation. To improve the performance of the system, they experimented with several classification algorithms and obtained 82.5%, 81.6%, 87.1%, and 85.6% for CCR, Naïve Bayes, Random Forest, and K-Nearest-Neighbors, respectively [12]. Similarly, diabetes patients need to monitor their activities for a better lifestyle. Luštrek et al. proposed using smartphone sensor data to recognize activity for diabetes patients. Nine algorithms were used in Weka, and the classification accuracy was 88% [10].

III. METHODOLOGY
This study proposes an activity recognition model based on ensemble-of-classifiers techniques that aims to detect human activities. The Wireless Sensor Data Mining (WISDM) dataset, which is publicly available at http://www.cis.fordham.edu/wisdm/dataset.php, is used in this study. The data are obtained by transforming time-series accelerometer sensor data recorded from smartphones during experiments with 36 people. The dataset includes 46 features and a class label. It contains 5418 instances of six activities: walking, jogging, upstairs, downstairs, sitting, and standing. WEKA software was used to build the model with the AdaBoost ensemble approach. According to previous studies, AdaBoost has been used effectively to enhance activity recognition performance in combination with other classification algorithms. Several experiments were conducted using AdaBoost in combination with the C4.5 (decision tree), MLP (artificial neural network), and Logistic algorithms. These three classifiers were chosen because of the high performance they achieved in previous studies. A 10-fold cross-validation (CV) approach was used in all experiments. The confusion matrix presents the result of each experiment, and performance is compared using the following parameters: true positive (TP), false positive (FP), precision, recall, area under the ROC curve (AUC), and F-measure. The parameters employed to evaluate the model are as follows:
• True positive (TP): activities that are predicted correctly.
• False positive (FP): activities that are predicted incorrectly.
• Precision: how often the prediction is correct.
• Recall: the number of correctly predicted activities divided by the number of activities that should have been predicted.
• Area under ROC Curve (AUC): a larger AUC indicates more correct predictions and fewer incorrect predictions for the activities.
• F-measure: measures the accuracy of the test as a weighted harmonic average of precision and recall.
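The parameters listed above can be derived from a multi-class confusion matrix in a one-vs-rest fashion. The following is a minimal sketch, assuming rows are true classes and columns are predicted classes, and assuming the balanced (F1) form of the F-measure. The function names and the 3-class matrix are hypothetical and are not taken from the tables of this study. AUC is omitted because it requires the classifier's scores, not only the confusion matrix.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class metrics from a confusion matrix.

    cm[i][j] is the number of instances whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in cm)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]                          # class k predicted as k
        fn = sum(cm[k]) - tp                   # class k predicted as another class
        fp = sum(row[k] for row in cm) - tp    # other classes predicted as k
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0      # also the TP rate
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics[label] = {
            "tp_rate": recall,
            "fp_rate": fp_rate,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified instances over all instances."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical 3-class confusion matrix (illustration only).
toy_labels = ["walking", "upstairs", "downstairs"]
toy_cm = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
toy_metrics = per_class_metrics(toy_cm, toy_labels)
for name in toy_labels:
    print(name, {key: round(value, 3) for key, value in toy_metrics[name].items()})
print("accuracy", round(accuracy(toy_cm), 3))
```

In this reading, a lower FP rate and higher values of the other parameters indicate better classification of the corresponding activity.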
Furthermore, the experiments were repeated using different numbers of iterations. NumIterations is the AdaBoost parameter that determines the number of models used in the decision step. The ensemble AdaBoost-C4.5 model was rebuilt repeatedly with the number of iterations varied from 10 to 100. The aim of this additional step is to enhance the performance of the selected combination of classifiers. The following section presents the results of these steps.
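To illustrate how the number of iterations controls the number of models that vote in the decision step, the following is a minimal AdaBoost.M1-style sketch in plain Python. It is not the WEKA implementation used in this study: the weak learner is a one-split decision stump standing in for C4.5 (which grows full trees), the data are a hypothetical one-feature toy set, and all function names are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Sample = Sequence[float]
Model = Callable[[Sample], Hashable]


def weighted_majority(labels: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Label with the largest total instance weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def fit_stump(X: List[Sample], y: List[Hashable], w: List[float]) -> Model:
    """Weighted one-split tree; a simplified stand-in for C4.5."""
    best = None  # (weighted error, feature, threshold, left label, right label)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [i for i in range(len(X)) if X[i][f] <= t]
            right = [i for i in range(len(X)) if X[i][f] > t]
            if not left or not right:
                continue
            ll = weighted_majority([y[i] for i in left], [w[i] for i in left])
            rl = weighted_majority([y[i] for i in right], [w[i] for i in right])
            err = (sum(w[i] for i in left if y[i] != ll)
                   + sum(w[i] for i in right if y[i] != rl))
            if best is None or err < best[0]:
                best = (err, f, t, ll, rl)
    if best is None:  # no usable split: always predict the weighted majority
        label = weighted_majority(y, w)
        return lambda x: label
    _, f, t, ll, rl = best
    return lambda x: ll if x[f] <= t else rl


def adaboost_m1(X: List[Sample], y: List[Hashable],
                num_iterations: int = 10) -> List[Tuple[float, Model]]:
    """Build up to num_iterations weighted weak models."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Model]] = []
    for _ in range(num_iterations):
        model = fit_stump(X, y, w)
        wrong = [model(x) != label for x, label in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err >= 0.5:  # weak model too poor for AdaBoost.M1
            if not ensemble:
                ensemble.append((1.0, model))  # keep at least one model
            break
        ensemble.append((math.log((1.0 - err) / max(err, 1e-10)), model))
        if err < 1e-10:
            break  # perfect weak model: nothing left to reweight
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified instances, then renormalize.
        w = [wi * (1.0 if bad else beta) for wi, bad in zip(w, wrong)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Model]], x: Sample) -> Hashable:
    """Weighted vote over all models in the ensemble."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[model(x)] += alpha
    return max(votes, key=votes.get)


# Hypothetical toy data: one feature, two classes, not separable by one stump.
X_toy = [[float(i)] for i in range(10)]
y_toy = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def training_accuracy(num_iterations: int) -> float:
    ensemble = adaboost_m1(X_toy, y_toy, num_iterations)
    hits = sum(predict(ensemble, x) == label for x, label in zip(X_toy, y_toy))
    return hits / len(X_toy)


for iterations in (1, 3, 10):
    print(iterations, training_accuracy(iterations))
```

On this toy set a single stump misclassifies three of the ten instances, while three boosted stumps classify all ten correctly. This is training accuracy on illustrative data only; the experiments in this study use 10-fold cross-validation on the WISDM data.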

IV. RESULTS
The results of the experiments confirm that AdaBoost can be used effectively to recognize activities, in addition to the power of the C4.5 algorithm. Based on the highest results in related work, AdaBoost was selected and combined with each of three algorithms: C4.5, Logistic, and Multi-Layer Perceptron (MLP). The performance achieved was over 90% in most cases, but the best performance was achieved by combining AdaBoost with C4.5. It started from 94.034% using the default setting (ten iterations). Fig. 1 shows the overall performance of the proposed models reached during the experiments.
The performance of each classifier is calculated individually and presented to demonstrate the effectiveness of ensemble classifiers. The overall performance is 89.46%, 84.94%, and 92.65% for C4.5, Logistic, and Multi-Layer Perceptron (MLP), respectively. The confusion matrix for each algorithm alone is shown in Tables 2 to 5. Table 5 presents the confusion matrix of the proposed AdaBoost-C4.5 model with the default setting of 10 iterations. The new model achieved 94.04%, which is the highest compared with standalone classifiers or other classifier combinations. In terms of AdaBoost parameters, different values were set for the number of iterations, which achieved our goal of improving performance. The experiments repeated using different numbers of iterations indicate a significant improvement in performance, as shown in Figure 2.
Table 6 also presents the confusion matrix of the proposed AdaBoost-C4.5 model with 80 iterations for comparison. Clearly, the improvement is reflected in all parameters; for example, the false positive rate decreased to 0.9%, which indicates a reduction in the number of instances that were classified incorrectly.

V. DISCUSSION
In this study, performance improves when classifiers are combined compared with when they are used individually. C4.5 was the most effective classifier to combine with AdaBoost; although Multi-Layer Perceptron (MLP) achieved better accuracy alone, it was not as effective in combination with AdaBoost. Also, Multi-Layer Perceptron (MLP) and C4.5 alone are slightly better than the AdaBoost model for the standing activity. Moreover, the C4.5 algorithm classified 97.56% of instances correctly, compared with 94.04% for the AdaBoost model.
A comparison was performed between the vote model proposed by Catal et al. and the model proposed in this study. The proposed AdaBoost-C4.5 ensemble model achieved higher overall performance (94.04%) than the vote model (93.47%), in addition to the shorter computation time of the AdaBoost model. As mentioned above, rebuilding the model with different numbers of iterations improved performance. In fact, AdaBoost builds one model per iteration. As the number of models increases, the area under the ROC curve (AUC) also increases, although the prediction confidence slightly decreases. The possibility of recovering false negatives increases, and new samples are classified more accurately. The results showed improvement across various parameters, as summarized in Table 7. Increasing values of the parameters, except the FP rate, indicate better classification. According to the confusion matrix of the AdaBoost model, there is an improvement in the performance of the downstairs activity, reflected in the true positive value (81.1%) and the F-measure (98.8%). Furthermore, the results for the walking and jogging activities were high due to the large number of instances of both activities compared with the others. On the other hand, the lowest results were observed for the upstairs and downstairs activities due to the difficulty of differentiating between them. However, a performance improvement was observed for the downstairs activity using the AdaBoost-C4.5 ensemble.

A. Conclusion
Mining data collected from sensors provides valuable results in the activity recognition area. Improved performance is a requirement, especially in the health field, where such results are used to develop various health systems related to patients' lifestyles. The spread of smartphones has made the desired data available in huge volumes. This increases opportunities in the data mining research area.
In this study, an AdaBoost-C4.5 ensemble model is proposed that uses public data to recognize physical activities. The results show a significant improvement in performance when meta classifiers are used instead of individual basic classifiers. The proposed model has an accuracy starting from 94.034%.

B. Future work
The improved results motivate further studies in this field. Other combinations (meta and basic) and different machine learning methods can be used. The proposed models can be applied to different datasets to recognize more, and more complex, activities.

Fig. 2. The performance of the model using different numbers of iterations

TABLE I. THE SUMMARY OF SOME ARTICLES REVIEWED

TABLE VI. COMPARISON OF MODELS AMONG VARIOUS PARAMETERS