Physical Activity Identification using Supervised Machine Learning and based on Pulse Rate

Physical activity is one of the key components of active ageing for the elderly. Pulse rate is a convenient physiological parameter for identifying an elderly person's physical activity, since it increases with activity and decreases with rest. However, analysis and classification of pulse rate is often difficult due to personal variation during activity. This paper proposes a Case-Based Reasoning (CBR) approach to identify the physical activity of the elderly based on pulse rate. The proposed CBR approach is compared with two popular classification techniques, i.e. Support Vector Machine (SVM) and Neural Network (NN). The comparison is conducted through an empirical study in which three experiments with 192 pulse rate measurements are used. The experimental results show that the proposed CBR approach outperforms the other two methods. Finally, the CBR approach identifies the physical activity of the elderly with 84% accuracy based on pulse rate.
Keywords: Physical activity; Elderly; Pulse rate; Case-based Reasoning (CBR); Support Vector Machine (SVM); Neural Network (NN)


I. INTRODUCTION
Physical activity or moderate exercise is one of the key components for the elderly to stay active and live longer. Research has shown that exercise brings a greater benefit for physical capacity. That is, elderly people who take moderate exercise enjoy better health than those who do not exercise but are "physically active" (in motion) throughout the day [15]. Similarly, physical exercise helps to control or manage chronic diseases such as heart disease, stroke and diabetes [16], [17], and benefits mental health [18]. Today, the most common way of measuring physical activity is accelerometer technology, i.e. motion sensors mounted on the users' wrists, waist and ankles [19], [20], [21]. Accelerometers can measure physical activity while persons perform regular household work, but composite activities (i.e. exercise) such as running or playing tennis are still a challenge, and the exact placement of the sensor is also an issue given its potential sensitivity [19]. Similarly, concurrent and overlapping activities, such as brushing teeth while walking, are not easy to measure from accelerometer signals [22]. In addition, accelerometer measurements give no indication of an individual's biomedical signals such as pulse rate or heart rate. Pulse rate is affected by physical activity, i.e. pulse rate increases with activity and decreases with rest [23] [24] [40] [41]. However, analysis and classification of pulse rate for a specific person is often difficult due to large individual variation during exercise. For example, pulse rate can fluctuate between 65 and 90 for one person and between 90 and 110 for another. Likewise, pulse rate can rise and fall sharply for one person and change steadily for another. So, it is not easy to classify and identify physical activity using only a simple threshold or general rules. Thus, the application domain requires a machine learning algorithm that can identify an elderly person's physical activity in a personalized way based on pulse rate data.
Today, supervised machine learning is a hot topic in Artificial Intelligence (AI) and is commonly applied to classify physiological sensor signal data. The main goal of a supervised machine learning algorithm is to build a model from a set of training samples paired with their corresponding labels. This model is then used to assign class labels to a set of test samples whose class labels are unknown [11] [12]. However, identifying the appropriate algorithm for a particular classification problem is one of the main challenges. An appropriate algorithm can be selected through an analysis of the application domain, but this usually leaves open questions that only an empirical experiment can answer. Recent research also shows that many researchers conduct empirical experiments on different supervised machine learning algorithms before they select and propose the algorithm that best fits their particular application domain [13][14][39]. This paper presents an application of supervised machine learning algorithms to classify the physical activity of the elderly. To handle the complex data processing, feature extraction and selection are first performed on the raw data. The raw data is labelled according to the controlled measurement procedure. The extracted features together with their class labels are then fed to three supervised machine learning algorithms to classify physical activity. The algorithms are: 1) Case-based Reasoning (CBR), 2) Support Vector Machine (SVM) and 3) Neural Network (NN). The classification accuracy has been evaluated on three experimental data sets drawn from 192 controlled pulse rate measurements. According to the experimental results, the CBR approach has been selected to identify the physical activity of the elderly based on pulse rate, as it outperforms the popular SVM and NN techniques. Here, the obtained sensitivity, specificity and overall accuracy for the CBR approach were 87%, 85% and 86% respectively. Finally, 12 measurements with unknown class are evaluated by applying the proposed CBR approach to identify the physical activity of the elderly based on pulse rate. According to the experimental work, the CBR system was able to identify physical activity with 84% accuracy.
www.ijacsa.thesai.org
The rest of the paper is organized as follows: Section 2 presents the materials, i.e. the data collection procedure and the feature extraction from the sensor signal. Section 3 describes the approach and methods, namely the three supervised machine learning algorithms. The experimental work, including the comparison between the three algorithms, is presented in Section 4. Section 5 discusses the evaluation result of the identification of physical activity using the CBR approach. Finally, Section 6 ends with the summary and concluding remarks.

II. MATERIALS
To classify physical activity using supervised machine learning algorithms, the pulse rate of elderly subjects has been used as a physiological measurement. Several features have been extracted from each pulse rate sensor signal before classification. Each signal has been labelled according to the data collection procedure.

A. Data Collection
Pulse rate measurement data were collected using a wearable sensor called WristOx2. The WristOx2 is simple, can easily be worn on the body and offers continuous data through Bluetooth communication. In total, 192 pulse rate measurements were collected from 24 elderly subjects using a 'three phases' procedure adapted from a previous study [25].
Each subject is asked to place the sensor (i.e. the WristOx2 sensor) on a finger and follow the phases. The phases are: 1) deep breath: inhaling through the nose and exhaling through the mouth, 2) physical activity: walking briskly or running, and 3) relax: sitting down and trying to relax. Thus the 192 pulse rate measurements are controlled and labelled with the names of the phases. Finally, each class (i.e. deep breath, physical activity and relax) contains 64 pulse rate measurements. An example of pulse rate measurements collected from two subjects is presented in Fig 1. As can be seen from Fig 1, the pulse rate changes between resting and exercising. It can also be observed that the changes are very personal: subject 1 (blue colour) shows steady changes and subject 2 (red colour) shows sharp changes during exercise. Moreover, the pulse rate of subject 1 lies between 70 and 90 bpm, whereas for subject 2 it lies between 65 and 126 bpm.

B. Features Extraction
Feature extraction from the sensor signal and feature selection have been conducted in close collaboration with a domain expert. Here, the time, frequency and time-frequency domains have been considered to extract features. In the time domain, statistical features, namely the maximum value, arithmetic mean and standard deviation of the data, were considered. To calculate the frequency domain features, the power spectral density was first calculated from the squared amplitude of the Discrete Fourier Transform of the data, computed using the Fast Fourier Transform algorithm and scaled to the sampling frequency, which is 1 Hz in this case. The data were zero-padded so that the number of samples became a power of two for the Fast Fourier Transform algorithm. From the power spectral density, the Low frequency power, the High frequency power and the Low frequency to High frequency power ratio were calculated [26]. Frequencies between 0.04 Hz and 0.15 Hz were considered Low frequency and frequencies between 0.15 Hz and 0.4 Hz were considered High frequency [26]. The power in the High and Low frequency regions was calculated by numerical integration of the power spectral density over the corresponding frequency range. The units of power spectral density and power for the pulse rate were BPM (beats per minute) Hz⁻¹ and BPM² respectively. Similarly, for the time-frequency domain features, a discrete wavelet transform (DWT) is performed since it keeps both time and frequency information. Statistical features, namely the maximum, arithmetic mean and standard deviation, were calculated from the approximation coefficients of the level 1 wavelet decomposition [27]. The 'Daubechies 2' function was used as the mother wavelet. The continuous wavelet transform linked to the mother wavelet $\psi(t)$ can be defined by equation 1,
where $y(t)$ is any square integrable function and a, b are the scaling and translation parameters respectively. Evaluating the continuous wavelet at dyadic intervals, the signal can be expressed by equation 2.
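For reference, the textbook forms of the continuous wavelet transform and of its dyadic expansion can be written as follows. The symbols $W_\psi$, $\psi_{j,k}$ and $d_{j,k}$ and the normalization are assumptions taken from the standard wavelet literature and may differ from the notation used in equations 1 and 2.

```latex
% Continuous wavelet transform of y(t) with mother wavelet \psi(t),
% scaling parameter a and translation parameter b (standard form)
W_\psi(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty}
               y(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt

% Dyadic sampling a = 2^{j}, b = k\,2^{j} gives the wavelet series
\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(2^{-j} t - k\right), \qquad
y(t) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_{j,k}\, \psi_{j,k}(t)
```

Substituting $a = 2^{j}$ and $b = k\,2^{j}$ into $\frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right)$ gives $\psi_{j,k}(t)$ directly, which is why the two forms are consistent.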
Symmetric padding was used to make the number of data samples a power of two for the discrete wavelet transform [27]. Thus, 9 features are calculated from the pulse rate sensor signal across the three domains, and each measurement is labelled according to the condition of the data collection procedure, i.e. as deep breath, physical activity or relax.
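The time and frequency domain features described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the population standard deviation, the periodogram normalization and the use of a plain DFT instead of an FFT are all assumptions. The wavelet features are omitted.

```python
import cmath
import math
import statistics

FS = 1.0  # sampling frequency in Hz, as stated in the text


def time_features(x):
    """Maximum, arithmetic mean and (population) standard deviation."""
    return max(x), statistics.fmean(x), statistics.pstdev(x)


def power_spectral_density(x, fs=FS):
    """Periodogram on non-negative frequencies after zero padding to 2^m."""
    n = 1
    while n < len(x):
        n *= 2
    padded = list(x) + [0.0] * (n - len(x))
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # Plain DFT for clarity; an FFT yields the same coefficients faster.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(padded))
        freqs.append(k * fs / n)
        psd.append(abs(coeff) ** 2 / (fs * n))  # assumed normalization
    return freqs, psd


def band_power(freqs, psd, low, high):
    """Trapezoidal integration of the PSD over the band [low, high) Hz."""
    pts = [(f, p) for f, p in zip(freqs, psd) if low <= f < high]
    return sum((f1 - f0) * (p0 + p1) / 2
               for (f0, p0), (f1, p1) in zip(pts, pts[1:]))


def frequency_features(x):
    """Low frequency power, High frequency power and their ratio."""
    freqs, psd = power_spectral_density(x)
    lf = band_power(freqs, psd, 0.04, 0.15)
    hf = band_power(freqs, psd, 0.15, 0.40)
    return lf, hf, (lf / hf if hf else float("inf"))
```

With a 1 Hz signal the highest resolvable frequency is 0.5 Hz, so both bands from [26] fit inside the available spectrum.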

III. APPROACH AND METHODS
To identify the physical activity of the elderly, three supervised machine learning algorithms have been used to classify their pulse rate measurements. The approach of the classification scheme is presented in Fig 2. Here, pulse rate measurements come from a WristOx2 sensor and are applied as the input of the classification scheme.
Fig. 2. General approach of classification scheme of pulse rate.
Each measurement is then pre-processed and its artifacts are handled. Here, erroneous data values caused by a loose connection are identified and replaced by the previous samples. The clean measurements are then used to extract a set of features in the time, frequency and time-frequency domains discussed earlier. The extracted features are sent to the three machine learning algorithms, each of which classifies them separately. Finally, the classification output is used for the experimental work.
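The artifact handling step can be illustrated as follows. The text does not say how erroneous values are detected, so the plausibility limits `low` and `high` and the function name are hypothetical; only the replacement by the previous sample comes from the description above.

```python
def replace_artifacts(samples, low=30, high=220):
    """Replace implausible pulse values with the previous valid sample.

    The limits `low` and `high` (in bpm) are hypothetical detection
    thresholds, not values taken from the paper.
    """
    cleaned = []
    for value in samples:
        if low <= value <= high:
            cleaned.append(value)
        elif cleaned:
            cleaned.append(cleaned[-1])  # repeat the previous sample
        # A leading artifact has no previous sample and is dropped.
    return cleaned
```

For example, `replace_artifacts([72, 0, 74, 511, 511, 76])` keeps the signal length and fills the three faulty readings with the last good value before each of them.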

A. Case-based Reasoning (CBR)
Learning from past experience and solving new problems by adapting similar, previously solved cases is a cognitive model of how humans often solve a large group of problems. A requirement is that the similarity of a case also indicates how easily its solution can be adapted to the current situation and reused. A case-based reasoning (CBR) [28], [29] approach can work in a way close to human reasoning, e.g. it solves a new problem by applying previous experience, which is common for doctors, clinicians and engineers. CBR has been applied successfully where the domain theory is unclear or incomplete. It is getting increasing attention from the medical domain since it is a reasoning process that is also medically accepted [30] [44]. For example, a clinician may start practice with some initial experience (solved cases), then utilize this past experience to solve a new problem and simultaneously extend his or her case base. Aamodt and Plaza introduced a life cycle of CBR [28] with four main steps, as shown in Fig. 3. Retrieve, Reuse, Revise and Retain are the key tasks in implementing this kind of cognitive model. In the retrieval step, for any new problem the system tries to retrieve the most similar case(s) by matching against previous cases in a case base. If it finds a suitable case that is close to the current problem, the solution is reused (after some adaptation and revision if necessary).
Similarity of a feature value between two cases (i.e. a target case and one case from the library) was measured using the normalized Manhattan distance between the feature values of the two cases. An example of a case base is shown in Table I, where a new case is matched with Cases 1 and 2 from the case base by using the function illustrated in equation 3. Similarity between two cases is then measured using the weighted average over all the features that are considered. The function for calculating the similarity between two cases T and S with n features is presented in equation 4, where $w_i$ is the weight of feature i defined by a domain expert. Note that the weight vector $w_i$ also accounts for the weights of the three domains (i.e. time, frequency and time-frequency features).
Non-numeric features such as gender are converted to numeric values by substituting a number for the contextual value (1 for male, 0 for female). The Manhattan distance function is used to calculate the similarity of a feature between two cases. The function is illustrated in equation 4, where $T_i$ and $S_i$ are the i-th feature values of the target and source case respectively.
Here, Max(i) and Min(i) represent the maximum and minimum values of feature i obtained from the whole case library. The "max" and "min" functions then compare the new case feature value $T_i$ with the Maximum and Minimum values obtained from the case library.
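The similarity computation and the retrieval of the most similar case can be sketched as below. Because the similarity equations are not legible in this text, the exact normalization used here (the range spanned by the library values and the new case value) is an assumption inferred from the description, and all function names are illustrative.

```python
def local_similarity(t, s, lib_min, lib_max):
    """Normalized Manhattan similarity of one feature, in [0, 1]."""
    span = max(t, lib_max) - min(t, lib_min)
    if span == 0:
        return 1.0  # every value is identical
    return 1.0 - abs(t - s) / span


def global_similarity(target, source, weights, mins, maxs):
    """Weighted average of the local similarities over all features."""
    total = sum(w * local_similarity(t, s, lo, hi)
                for t, s, w, lo, hi in zip(target, source, weights, mins, maxs))
    return total / sum(weights)


def retrieve(target, case_library, weights):
    """Return (similarity, label) of the most similar stored case."""
    features = [f for f, _ in case_library]
    mins = [min(col) for col in zip(*features)]
    maxs = [max(col) for col in zip(*features)]
    return max((global_similarity(target, f, weights, mins, maxs), label)
               for f, label in case_library)


# Toy library with two features per case (values are made up).
library = [([70.0, 2.0], "relax"), ([110.0, 9.0], "activity")]
similarity, label = retrieve([105.0, 8.0], library, [1.0, 1.0])
print(label, round(similarity, 3))
```

Identical feature values give a local similarity of 1 and values at opposite ends of the range give 0, matching the behaviour of the local similarity function described in the text.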
For linearly separable data, the SVM works based on the distance between the hyperplane and the two data classes. To handle non-linearly separable data, a kernel function [5] can be applied to map the data onto a feature space for classification. Some commonly used kernel functions are linear, polynomial, RBF and sigmoid [6]. When there are more than two classes (three in our study), one common approach is to use one-versus-all classifiers (also known as "one-versus-rest"), where the target class is determined by choosing the class whose classifier gives the largest output among the k classifiers $y_1, \ldots, y_k$.
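Two of the ideas above, the RBF kernel and the one-versus-rest decision rule, are small enough to show directly. This is an illustrative sketch with made-up scores; it does not train an SVM, and the function names and the `gamma` parameter name are assumptions.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def one_vs_rest_predict(decision_values):
    """Pick the class whose binary classifier reports the largest score.

    `decision_values` maps each class label to the output of the
    classifier trained to separate that class from all the others.
    """
    return max(decision_values, key=decision_values.get)


# Hypothetical outputs of three binary classifiers for one measurement.
scores = {"deep breath": -0.4, "activity": 1.2, "relax": 0.3}
print(one_vs_rest_predict(scores))
```

With three classes, three binary classifiers are trained, and each new measurement is scored by all of them before the largest score decides the class.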

C. Neural Network (NN)
Neural Network (NN), another supervised machine learning technique inspired by biological neural networks, is widely used to model complex relationships between inputs and outputs [38]. An NN is a network system with many simple processors, where the processing elements are referred to as units, nodes or neurons. These neurons are interconnected, and each receives, processes and transmits numeric data via the connections [7] [8]. An NN creates a model by training its network on a set of example samples that contain the inputs and their corresponding known target outputs. During training, it learns by comparing the network output with the target output and adjusting the weights (the connections between neurons) to move the network outputs closer to the targets. Here, to identify physical activity, this paper applies a feed-forward neural network trained by the back-propagation algorithm. The NN model is illustrated in Fig 5.
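The forward pass of such a network can be sketched as follows. The layer sizes (9 inputs, 25 hidden neurons, 3 outputs) and the tan-sigmoid and linear transfer functions follow the description of the NN model elsewhere in this paper; the random initial weights, the function names and the omission of back-propagation training are simplifications for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Feed-forward pass: tanh hidden layer, linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


def predict(x, hidden, output,
            labels=("deep breath", "physical activity", "relax")):
    """Class label of the output neuron with the largest value."""
    scores = forward(x, hidden, output)
    return labels[scores.index(max(scores))]


rng = random.Random(0)
hidden = init_layer(9, 25, rng)
output = init_layer(25, 3, rng)
print(predict([0.1] * 9, hidden, output))  # untrained: arbitrary class
```

Training would repeatedly compare these outputs with the known targets and adjust the weights, which is the part handled by back-propagation.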

IV. EXPERIMENTAL WORK
The main objective of this experimental work is to find the best classification algorithm among the three supervised machine learning methods, i.e. CBR, SVM and NN. The experimental work also presents a comparison of these three methods and finally selects the best method to identify physical activity. For this experimental work, 192 cases in three classes (i.e. Deep breath, Activity, Relax) have been used, where each case consists of nine features. These 192 cases are divided randomly into two sets: the training data set contains 162 cases and the test set contains 30 cases. The split is made three times, so the experimental work has been conducted with three experimental data sets, named LibraryA and TestA, LibraryB and TestB, and LibraryC and TestC. The cases are selected randomly; however, the test sets contain different cases, which means that a case belonging to one test set is not considered again for another test set.
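One way to produce three such splits is sketched below. The function name and the seed are assumptions, and cases are represented only by their index; whether the original split was balanced per class is not stated here, so this sketch does not stratify.

```python
import random


def make_splits(n_cases=192, n_splits=3, test_size=30, seed=0):
    """Build disjoint random test sets and their complementary libraries."""
    rng = random.Random(seed)
    order = list(range(n_cases))
    rng.shuffle(order)
    splits = []
    for i in range(n_splits):
        # Consecutive slices of one shuffled order never overlap.
        test = sorted(order[i * test_size:(i + 1) * test_size])
        held_out = set(test)
        library = [c for c in range(n_cases) if c not in held_out]
        splits.append((library, test))
    return splits


for name, (library, test) in zip("ABC", make_splits()):
    print(f"Library{name}: {len(library)} cases, Test{name}: {len(test)} cases")
```

Each library holds the 162 cases that are not in its own test set, so a case held out for TestA can still appear in LibraryB and LibraryC.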

A. Using Case-based Reasoning (CBR)
The classification accuracy of the CBR retrieval classification scheme has been evaluated by developing a prototypical system, where the main goal of the experiment is to see how accurately the CBR approach can classify using the features extracted from the signals. The experimental work has been conducted in two phases. In the first phase, the training data sets, i.e. LibraryA, LibraryB and LibraryC, are used to train the CBR classification scheme, where the weights of the features are adjusted manually to achieve maximum accuracy. Here, a "leave-one-out" retrieval technique is used, i.e. one case is taken from the case library (i.e. 162 cases) as a query case and the system then retrieves the most similar cases. Among the retrieved cases, the top similar case is considered; if the top case's class matches the query case's class, the classification is counted as correct. In the second phase, the same procedure is used but the testing data sets supply the query cases. This means TestA is evaluated against LibraryA, TestB against LibraryB and TestC against LibraryC. As can be seen from Table II, the accuracy achieved by the CBR approach for LibraryA, LibraryB and LibraryC is 85%, 90% and 86% respectively. Similarly, on the testing data sets, the accuracy achieved for TestA, TestB and TestC is 80%, 87% and 90% respectively. Moreover, the average accuracy is 87% on the training data sets and 86% on the testing data sets. The confusion matrices for the testing data sets, i.e. TestA, TestB and TestC, are presented in Table III, Table IV and Table V.
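The leave-one-out evaluation described above can be sketched as follows. The similarity function is passed in as a parameter; `toy_similarity` is a placeholder defined only for this example and is not the weighted similarity used in the paper.

```python
def leave_one_out_accuracy(library, similarity):
    """Fraction of cases whose most similar other case shares their class."""
    correct = 0
    for i, (query, label) in enumerate(library):
        others = library[:i] + library[i + 1:]  # the query leaves the library
        _, top_label = max(others,
                           key=lambda case: similarity(query, case[0]))
        correct += top_label == label
    return correct / len(library)


def toy_similarity(a, b):
    """Placeholder similarity: inverse of the Manhattan distance."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


# Made-up single-feature cases for illustration only.
cases = [([70], "relax"), ([72], "relax"),
         ([110], "activity"), ([115], "activity")]
print(leave_one_out_accuracy(cases, toy_similarity))
```

In the second phase the same top-case rule applies, except that the query comes from a test set and the whole library is searched.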

B. Using Support Vector Machine (SVM)
Support Vector Machine (SVM) is applied to the same training (LibraryA, LibraryB and LibraryC) and testing (TestA, TestB and TestC) data sets. The classification accuracy of the SVM classification scheme has been evaluated, where the main goal of the experiment is to see how accurately the SVM approach can classify using the features extracted from the signals. Here, training has been performed on the training data sets using the LibSVM tool developed by [9] in MATLAB. The RBF kernel function is applied as it gives better accuracy [10]. The best parameters of the SVM model were chosen by 7-fold cross validation. Once the model is ready, the test data sets, i.e. TestA, TestB and TestC, are evaluated, and the accuracy values for both training and testing data sets are presented in Table VI. Here, the percentage of correct classifications is reported as the accuracy value. As can be seen from Table VI, the accuracy achieved by the SVM approach for LibraryA, LibraryB and LibraryC is 89%, 73% and 74% respectively. Similarly, on the testing data sets, the accuracy achieved for TestA, TestB and TestC is 63%, 53% and 70% respectively. Moreover, the average accuracy is 79% on the training data sets and 62% on the testing data sets. The confusion matrices for the testing data sets, i.e. TestA, TestB and TestC, are presented in Table VII, Table VIII and Table IX.

C. Using Neural Network (NN)
Similarly, Neural Network (NN) is applied using MATLAB to the training sets LibraryA, LibraryB and LibraryC. Here, a feed-forward network is used with the default tan-sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The number of hidden neurons was fixed at 25; however, several thousand iterations were needed to achieve good accuracy. The minimum acceptable accuracy on the training data sets was set to 80%, and to reach this value LibraryA and LibraryB required 12800 iterations and LibraryC required 51200 iterations. Thus three NN models are created from the training data sets, and the test data sets are classified and evaluated. The percentage of correct classifications, in terms of accuracy, is presented in Table X. As can be seen from Table X, the accuracy achieved by the NN approach for LibraryA, LibraryB and LibraryC is 80%, 83% and 84% respectively. Similarly, on the testing data sets, the accuracy achieved for TestA, TestB and TestC is 57%, 60% and 60% respectively. Moreover, the average accuracy is 82% on the training data sets and 59% on the testing data sets. For NN, the confusion matrices for the testing data sets, i.e. TestA, TestB and TestC, are presented in Table XI, Table XII and Table XIII.

D. Comparison of CBR vs SVM vs NN
One of the contributions of this paper is to select the most suitable supervised machine learning algorithm for identifying the physical activity of the elderly based on their pulse rate sensor signal. This section therefore presents a comparison of CBR, SVM and NN, in which only the test data sets are considered. As can be seen from Table XIV, using CBR the sensitivity, specificity and overall accuracy are 87%, 85% and 86% respectively; using SVM they are 53%, 67% and 62% respectively; and using NN they are 73%, 52% and 59% respectively. The comparison of CBR, SVM and NN in terms of sensitivity, specificity and overall accuracy is presented in Fig 10. Based on the experimental work, the CBR approach is selected to identify the physical activity of the elderly based on pulse rate.
In order to evaluate physical activity classification by the CBR system, 12 measurements have been collected from 12 subjects. Each measurement is ten minutes long and the subjects are asked to walk at least once, i.e. for one out of the ten minutes. Thus, each of the 12 measurements contains between 0 and 6 periods of physical activity in ten minutes of data. Each case is divided into 10 windows of 1 minute each (since each case is 10 minutes long). The main objective of this work is to see whether the CBR approach can identify activity. Here, as a case library, the CBR system uses the 192 measurement cases, categorized into 3 classes (i.e. Deep breath, Activity and Relax), discussed earlier. For retrieval in the CBR approach, the top similar case is considered when calculating the classification accuracy, and the results are presented in Table XV. As can be seen from Table XV, around 84% (i.e. 26 out of 31 physical activities) are correctly classified and around 16% (i.e. 5 out of 31 physical activities) are misclassified. However, for test_case_4, the CBR approach classified one window as activity although the case contained no activity.
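The windowing and the reported percentages can be checked with a short sketch. It assumes the 1 Hz sampling frequency given in the feature extraction section, so a ten-minute measurement has 600 samples; the function names are illustrative.

```python
def split_windows(samples, fs=1, window_seconds=60):
    """Cut a measurement into consecutive, non-overlapping windows."""
    size = fs * window_seconds
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percentage."""
    return round(100 * part / whole)


measurement = list(range(600))     # stand-in for 10 minutes at 1 Hz
windows = split_windows(measurement)
print(len(windows), len(windows[0]))
print(percent(26, 31), percent(5, 31))
```

Each one-minute window is then treated as a query case and classified by retrieving the most similar case from the 192-case library.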

VI. SUMMARY AND CONCLUSION
This paper presents an application of supervised machine learning to identify the physical activity of the elderly based on pulse rate. The pulse rate is used as a physiological parameter since it is affected by activity, i.e. pulse rate increases during exercise and decreases during rest. Moreover, the pulse rate sensor is very simple and is easier to wear on the body than other physiological sensors, for example ECG. The contribution of the paper is twofold: 1) selection of a supervised machine learning algorithm that fits this domain well and 2) identification of the physical activity of the elderly using the selected machine learning algorithm and based on pulse rate. To select a supervised machine learning algorithm, this work studied the implementation of three popular classification techniques, i.e. case-based reasoning, support vector machine and neural network. The study was conducted through an empirical evaluation in which three experimental data sets have been used. Each experimental data set (a library together with its test set) contains 192 pulse rate signals, and 9 features are extracted from each signal. The feature extraction considers the time, frequency and time-frequency domains. The signals are labelled in 3 classes (i.e. Deep breath, Activity and Relax) according to the controlled data collection procedure. Based on the experimental work, a comparison among the three implemented machine learning algorithms, i.e. CBR, SVM and NN, has been presented. The comparison between these techniques shows that the CBR model yields better results, i.e. the sensitivity, specificity and overall accuracy were all 85% or higher. After selecting the machine learning algorithm, the CBR approach is applied to 12 unknown pulse rate measurements. According to the evaluation, the CBR approach was able to identify physical activity with 84% accuracy, that is, 26 out of 31 activities are correctly classified. Thus, the case-based retrieval classification scheme shows that identification of the physical activities of the elderly is possible. However, a comparison of the accelerometer signal with the pulse rate is needed, and this is now under study. In future, we would like to evaluate the proposed approach on larger samples and to calculate energy consumption based on pulse rate. Nevertheless, considering the evaluation result, it might be worthwhile and more reliable to use pulse rate measurements alongside the accelerometer signal in order to classify the physical activity of the elderly.

Fig. 1. Example of pulse rate changes between activity and relax state.

Fig. 3. CBR cycle. Adapted from [28].

The function, which uses the Maximum and Minimum values obtained from the case library, returns 1 if the values are the same and 0 if the values are dissimilar. This is known as a local similarity function.

B. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised machine learning technique that analyses data, identifies patterns and is commonly used for classification and regression analysis [1][2][3][4]. Traditionally, the SVM algorithm is designed for binary (two-class) classification. Given a training data set in which each sample is labelled as belonging to one of two classes, the SVM training algorithm builds a model that classifies a new example into one class or the other.

Fig. 4. Training and testing steps of activity classification using SVM.

Fig 4 presents the training and testing steps. The pulse rate data are pre-processed and features are extracted for both the training and the testing data. The training data are used to build a classifier model through SVM training. Finally, the testing data are used to measure the performance of the model.

Fig. 5. NN model to identify physical activity based on the elderly's pulse rate.
As can be seen in Fig 5, there are 9 features considered as input, 25 hidden neurons, and 3 output neurons with 3 final outputs (i.e. the deep breath, physical activity and relax classes).

Fig 6 illustrates the performance of the CBR approach in classifying the elderly subjects' pulse rate data into three classes, i.e. Deep breath, Activity and Relax. Here, the accuracy values show that the CBR approach achieves a classification accuracy between 80 and 90 percent for all the test data sets.

Fig. 6. Performance of CBR classification based on three test data sets.

Fig 7 presents the performance of the SVM approach in classifying the elderly subjects' pulse rate sensor signals. The accuracy values show that for most classes (i.e. Activity and Relax) SVM achieves a classification accuracy between 50% and 60%; however, it achieves 90% accuracy for the Deep breath class.

Fig. 7. Performance of SVM classification based on three test data sets.

Fig. 8. Performance of NN classification based on three test data sets.
As with CBR and SVM, Fig 8 presents the performance of the NN approach in classifying the elderly subjects' pulse rate signals. Here, the accuracy values for Deep breath and Activity lie between 60% and 80%; however, the cases in the Relax class have been classified very poorly. According to Fig 8, the accuracy for the Relax class was between 30% and 40%.

Fig. 9. Classification performance based on CBR, SVM and NN.
The three test data sets (i.e. TestA, TestB and TestC) contain 90 cases in total, of which each class (i.e. Deep breath, Activity and Relax) accounts for 30 cases. The percentage of correct classifications has been calculated for all three classes with the CBR, SVM and NN algorithms. The results, in terms of accuracy, are presented in Fig 9. As can be seen from Fig 9, CBR performs well (i.e. > 83%) in correct classification compared with the other two machine learning algorithms (i.e. SVM and NN).

Fig. 10. Classification performance based on CBR, SVM and NN.
According to Fig 10, activity classification using CBR is superior to SVM and NN in terms of sensitivity, specificity and overall accuracy, each of which is 85% or higher. It can be observed that SVM and NN achieve sensitivity, specificity and overall accuracy between 52% and 73%.

TABLE I. AN EXAMPLE OF CASE REPRESENTATION

TABLE II. PERCENTAGE OF CORRECT CLASSIFICATION USING CBR

TABLE III. CONFUSION MATRIX BASED ON TESTA USING CBR

TABLE VI
, Table VIII and Table IX.

TABLE VII. CONFUSION MATRIX BASED ON TESTA USING SVM

TABLE X
, Table XII and Table XIII.

TABLE XI. CONFUSION MATRIX BASED ON TESTA USING NN

TABLE XII. CONFUSION MATRIX BASED ON TESTB USING NN

TABLE XIII. CONFUSION MATRIX BASED ON TESTC USING NN

TABLE XIV. STATISTICAL ANALYSIS OF THE CLASSIFICATIONS

TABLE XV. IDENTIFICATION OF PHYSICAL ACTIVITY ON 12 PULSE RATE