Survey on Human Activity Recognition based on Acceleration Data

Human activity recognition (HAR) is an important area of machine learning research because it has many applications in areas such as sports training, security, entertainment, ambient-assisted living, and health monitoring and management. Studies of human activity recognition show that researchers are mostly interested in daily human activities. This paper therefore presents the general architecture of a HAR system, along with a description of its main components, and surveys the state of the art in accelerometer-based human activity recognition. According to this survey, most recent studies used deep learning for HAR, but they focused on CNN even though other deep learning architectures achieved satisfactory accuracy. The paper presents a two-level taxonomy based on the machine learning approach (traditional or deep learning) and the processing mode (online or offline). Forty-eight studies are compared in terms of recognition accuracy, classifier, activity types, and devices used. Finally, the paper discusses the challenges and issues of online versus offline processing, and of deep learning versus traditional machine learning, for human activity recognition based on accelerometer sensors.

Keywords: Human activity recognition; accelerometer; online system; offline system; traditional machine learning; deep learning


I. INTRODUCTION
Human activity recognition (HAR) underlies many applications, such as those dealing with personal biometric signatures, advanced computing, health and fitness monitoring, and elder care [1]. The input of a HAR model is the raw sensor readings, and the output is a prediction of the user's motion activities [2].

A. Sensor Approaches
There are two sensor-based approaches to recognizing human activities: external sensors and wearable sensors. In the past, sensors were installed at predetermined points of interest, so activity detection was essentially based on the users' interaction with the sensors. One example of an external-sensor application is the intelligent home [3][4][5][6][7], which can identify complicated activities such as eating, taking a shower, and washing dishes, because it relies on data collected from various sensors placed on specific objects. Activities are inferred from people's interaction with those objects (e.g., stove, faucet, washing machine). However, there is no useful response if the user is outside the sensor area or if the user's activities do not involve those objects. Moreover, installing and maintaining the sensors is costly.
Extensive research [8][9][10][11] has also focused on recognizing activities and gestures from video sequences. This approach is most appropriate for security and interactive applications. For example, Microsoft developed the Kinect motion-sensing device, which lets users interact with a game through gestures, without any controller device. However, video-based HAR has several issues [2]:
• Privacy: no one wants to be constantly monitored and recorded by cameras.
• Pervasiveness: it is difficult to attach video recording devices to target individuals in order to capture images of their entire body during daily living activities.
• Cost: video processing techniques are comparatively expensive and time-consuming.
The above-mentioned limitations motivate the use of wearable sensors in HAR. The measured attributes mostly relate to environmental variables (such as temperature and humidity), the user's movement (measured, for example, with GPS or accelerometers), or physiological signals (such as heart rate or electrocardiogram). These data are indexed over the time dimension.
Accelerometer sensors measure acceleration events on a mobile phone, Wii Remote, or wearable device. The raw data stream from the accelerometer is the acceleration along each axis, in units of g-force, represented as a set of 3D acceleration vectors. A timestamp can also be returned together with the three axis readings. Most existing accelerometers provide an interface for configuring the sampling frequency, so users can choose the sampling rate that best matches their needs. Although the first works on HAR date back to the late 1990s [12], [13], there are still many reasons to develop new techniques that improve accuracy under more realistic conditions.
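As a minimal illustration of the raw data format described above, the following Python sketch represents one timestamped tri-axial sample in units of g and derives its magnitude. The names AccelSample, magnitude_g, and to_ms2 are hypothetical and chosen for illustration only; they do not correspond to any particular device interface.

```python
import math
from typing import NamedTuple

STANDARD_GRAVITY = 9.80665  # m/s^2 per 1 g (conventional standard value)


class AccelSample(NamedTuple):
    """One accelerometer reading (hypothetical layout, for illustration)."""
    t: float  # timestamp in seconds
    x: float  # acceleration along the x axis, in g
    y: float  # acceleration along the y axis, in g
    z: float  # acceleration along the z axis, in g


def magnitude_g(sample: AccelSample) -> float:
    """Euclidean norm of the 3D acceleration vector, in g."""
    return math.sqrt(sample.x ** 2 + sample.y ** 2 + sample.z ** 2)


def to_ms2(value_g: float) -> float:
    """Convert a value expressed in g to m/s^2."""
    return value_g * STANDARD_GRAVITY


# A device lying flat and at rest reads roughly 1 g on one axis.
resting = AccelSample(t=0.0, x=0.0, y=0.0, z=1.0)
resting_magnitude = magnitude_g(resting)
resting_ms2 = to_ms2(resting_magnitude)
print(resting_magnitude, resting_ms2)
```

The magnitude is often used because it does not depend on how the device is oriented, although it discards the per-axis direction information.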

B. Challenges Facing HAR System Designers
The design of any HAR system depends on the activities to be recognized. The types and complexity of the activities can affect recognition quality. Some of the challenges facing researchers are (1) how to select the attributes to be measured, (2) how to construct a system with portable, unobtrusive, and inexpensive data acquisition, (3) how to extract features and design inference methods, (4) how to collect data in a real environment, (5) how to recognize the activities of new users without retraining the system, and (6) how to implement the system on mobile devices while meeting energy and processing limitations [14].
Oscar et al. [2] divided activities into seven groups: Ambulation, Transportation, Phone usage, Daily activities, Exercise/Fitness, Military, and Upper body. In our survey, however, eight groups of activities can be distinguished by reorganizing the categorization in [2]: phone-usage activities were merged into the Daily activities category; the Upper body and Military categories were removed because they were not used in the surveyed studies; and Household activities, Kitchen activities, Self-care activities, and Transitional activities were added. These eight categories and the individual activities belonging to each are summarized in Table I. The abbreviations and acronyms are defined in Table V.

C. Offline Versus Online HAR Systems
Human activity can be recognized using offline or online techniques. Whenever online processing is not necessary for the application, offline processing can be used. For example, if the goal is to track a person's daily routine, as in [15], the data are collected by the sensors during the day and uploaded to a server at the end of the day. The data can then be processed offline for classification purposes.
However, in some applications, such as a fitness coach where the user follows a given program consisting of a set of activities with a defined sequence and duration, it is necessary to identify what the user is currently doing [16]; such applications therefore require online techniques.
Another application is recruitment for participatory sensing [17]. For instance, an application may aim to collect information from users while they walk in a specific location in the city; online recognition of activities then becomes important. Some studies that perform offline recognition use machine learning tools such as WEKA [18][19][20]. Nowadays, some cloud systems are being used for online recognition [21] [22].

D. Machine Learning Techniques
The success of the HAR process depends on choosing a machine learning technique suited to the problem. There are two approaches: the first relies on traditional machine learning, such as KNN, Naïve Bayes, Bayes Net, IBK, J48, Random Forest, SVM, and DTW; the second relies on deep learning, such as convolutional neural networks, recurrent neural networks (including vanilla RNNs), and Gated Recurrent Unit RNNs.
This paper surveys the state of the art in traditional machine learning and deep learning for recognizing human activity. Section II presents the general components of a HAR system. Section III explores the difference between online and offline systems. Section IV compares traditional and deep learning techniques. Section V discusses the main issues in recognizing activities and the most important solutions to each of them. Finally, a general conclusion is presented in Section VI.

As shown in Fig. 1, the HAR process consists of four main phases: data acquisition, pre-processing, feature extraction, and classification.

Data acquisition: This is the first phase of activity recognition, in which data are collected using sensors.
Pre-processing: This is the second phase, applied after the data are collected. It plays important roles such as removing noise from the raw data and applying a windowing or segmentation scheme to the collected data. Feeding raw sensor data directly into the classifier may not be suitable; the raw data therefore need transformations such as splitting the continuous stream into windows of a fixed duration. For energy efficiency, it is important to use a low sampling frequency, because the sensor then operates for less time. However, whether a low sampling frequency suffices to recognize activities is still an open question. According to Kwapisz et al. [23], the sampling rate should be no less than 20 Hz for detecting daily activities. Some samples may be lost at a low sampling frequency, and activities are also hard to recognize when the sensing device has low resolution. There is thus a trade-off between energy consumption and recognition rate. Liang et al. [24] proposed an energy-efficient method based on the tri-axial accelerometer embedded in a smartphone to recognize the user's activities. They aimed to reduce reliance on time-consuming frequency-domain features in order to lower computational complexity, and adjusted the sliding-window size to improve recognition accuracy.
Feature extraction: The segmented data form a series of patterns, each containing the three components of 3D acceleration. This phase converts the signal into the most significant features, which characterize each activity. It is better to extract features over a temporal window than to classify every single raw data point. Using features rather than raw data reduces both the effects of noise and the computational load of the classification algorithms. Standard features are divided into time-domain and frequency-domain features. Janidarmian et al. [25] stated that time-domain, frequency-domain, and heuristic features are the most effective in the context of activity recognition. Table II displays the details of those features.

Classification: This is the final phase of the human activity recognition process. The trained classifiers are used to classify the various activities. Classification can be done either offline or online. A machine learning tool running on a powerful machine may be used for offline processing, whereas the mobile phone itself or a cloud server may be used for online processing.
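The pre-processing and feature extraction phases described above can be sketched in Python with NumPy as follows. This is only an illustrative sketch under stated assumptions: the function names sliding_windows and extract_features are hypothetical, the 2-second window with 50% overlap is an arbitrary choice, and the three features (per-axis mean, per-axis standard deviation, dominant frequency of the magnitude) are common examples rather than the specific feature set of any surveyed study.

```python
import numpy as np


def sliding_windows(signal, window_size, overlap=0.5):
    """Split an (n_samples, 3) array into fixed-length, overlapping windows.

    Assumes the signal holds at least one full window.
    """
    step = max(1, int(window_size * (1.0 - overlap)))
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])


def extract_features(window, fs):
    """Return 7 features for one window sampled at fs Hz.

    Time domain: mean and standard deviation of each axis.
    Frequency domain: dominant frequency of the acceleration magnitude.
    """
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    magnitude = np.linalg.norm(window, axis=1)
    # Remove the constant (gravity) component before taking the spectrum.
    spectrum = np.abs(np.fft.rfft(magnitude - magnitude.mean()))
    freqs = np.fft.rfftfreq(len(magnitude), d=1.0 / fs)
    dominant = freqs[np.argmax(spectrum)]
    return np.concatenate([mean, std, [dominant]])


# Synthetic 10 s recording at 20 Hz: gravity on z plus a 2 Hz oscillation.
fs = 20
t = np.arange(10 * fs) / fs
signal = np.column_stack([
    np.zeros_like(t),
    np.zeros_like(t),
    1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t),
])

windows = sliding_windows(signal, window_size=2 * fs, overlap=0.5)
features = np.array([extract_features(w, fs) for w in windows])
print(windows.shape, features.shape)
```

Each row of features is the fixed-length vector that would be passed to a classifier in place of the raw samples of that window.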

III. ONLINE VS. OFFLINE HAR SYSTEMS
First, training data are used to train the classifiers [25]. The human activity classifier can be trained online or offline, and the classification itself can likewise be done online or offline. Offline (non-real-time) classification is a sufficient solution when the user has no urgent need for immediate feedback. Online (real-time) classification, on the other hand, gives the user real-time feedback. Liang et al. [24] proposed a framework for activity recognition that uses offline training and online classification.

A. Online vs. Offline Training Phase
Online training means that the classifiers are trained in real time on the hosting device, such as a mobile phone, a cloud server, or a Raspberry Pi. In offline training, on the other hand, a desktop machine is usually used to train the classifiers beforehand: the raw activity data collected by the sensors are stored and later used to train the classification model, as shown in Fig. 2. In online training, the raw data are not stored for later use; instead, they are processed immediately for training to save time.
According to our reading, as shown by the relevant references in Fig. 2, most researchers prefer offline training. Only 8 of the 45 studies used online training in real time. One reason for preferring the offline method is that the training process is computationally expensive.
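The difference between the two training modes can be sketched with a deliberately simple model. The Python example below uses a nearest-centroid classifier, which is our own illustrative choice and not the method of any surveyed study; the class name CentroidClassifier and the toy feature vectors are hypothetical. Offline training consumes a stored dataset in one pass, while online training updates the model one sample at a time, so the sample need not be kept afterwards.

```python
import numpy as np


class CentroidClassifier:
    """Nearest-centroid classifier trainable in batch or incrementally."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of samples seen so far

    def partial_fit(self, x, label):
        """Online training: fold one labelled sample into the model."""
        x = np.asarray(x, dtype=float)
        self.sums[label] = self.sums.get(label, np.zeros_like(x)) + x
        self.counts[label] = self.counts.get(label, 0) + 1

    def fit(self, X, y):
        """Offline training: iterate over a stored dataset."""
        for x, label in zip(X, y):
            self.partial_fit(x, label)
        return self

    def predict(self, x):
        """Return the label whose centroid is closest to x."""
        x = np.asarray(x, dtype=float)
        return min(
            self.counts,
            key=lambda c: np.linalg.norm(self.sums[c] / self.counts[c] - x),
        )


# Toy feature vectors (hypothetical values, for illustration only).
stored_X = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 0.9]]
stored_y = ["sitting", "sitting", "walking", "walking"]

# Offline: the whole dataset was stored and is used afterwards.
offline_model = CentroidClassifier().fit(stored_X, stored_y)

# Online: each sample is used as it arrives (the loop stands in for a stream).
online_model = CentroidClassifier()
for x, label in zip(stored_X, stored_y):
    online_model.partial_fit(x, label)

offline_label = offline_model.predict([0.05, 0.0])
online_label = online_model.predict([1.0, 1.0])
print(offline_label, online_label)
```

For this simple model both modes reach the same centroids; for more complex classifiers the incremental update is typically where the computational cost on the hosting device arises.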

B. Online vs. Offline Classification
In the final classification stage, an activity can be assigned a label either online or offline, according to the training data. Most semi-supervised classification has been implemented and evaluated offline [2], whereas supervised classification has been studied more widely, both online and offline. All the online and offline classification studies, which are described in detail in the following two subsections, are summarized in Fig. 3. Fig. 3 shows the sensor positions on the body together with the sensor type, device type, classifier type, and reference. The circles in Fig. 3 convey two pieces of information: the type of device used and its internal sensors. For example, point 6 (the palm of the right hand) has three shapes, which means that the studies placing a device at this position used three kinds of devices to classify human activities:
a) A red circle containing (A): a smartphone was used to sense accelerometer data in [26] and [27]. Reference [26] used NB and KNN to classify human activities, whereas [27] used a LogitBoost classifier.
b) A red circle containing (A and G): a smartphone was used to collect accelerometer and gyroscope data for classifying human activities.
c) A green circle containing (A): a WAS device was used to collect accelerometer data.
The lines in Fig. 3 also convey two pieces of information: the point number and the location. For example, point 20 has a red line, which means that it lies on the back of the body, whereas a black line means that the point is on the front of the body. Point 18 has a yellow line because the studies held the devices in a side pocket. The back pockets (points 10 and 11) were used in the same studies; therefore, their lines are merged.

1) Online classification: Online HAR systems are needed in healthcare for continuously monitoring patients with physical or mental pathologies, for the sake of their protection, safety, and recovery. Many studies are based on giving real-time feedback to assist people; some of them are summarized in Table III. In this survey, the 20 studies in Table III are compared in terms of the activities classified, the classifier used and the accuracy it achieved, the window length, overlap, and feedback latency, the type of training, the dataset used, and the type of devices used to collect the data.
According to our reading, 14 of the 20 studies focused on recognizing daily physical activities such as walking, running, sitting, and standing. Only one of the 20 reviewed studies [28] relies on online training. Mobile phones have become more capable in terms of available resources such as battery, memory, and CPU.
In online activity recognition, the mobile phone is used locally for data collection, pre-processing, and classification; accordingly, 12 of the 20 reviewed studies used a smartphone.

2) Offline classification: When online (real-time) activity recognition is not critical for the application, offline recognition is the best choice. In offline classification, the user does not wait for real-time feedback from the recognition system but receives the results after offline analysis and classification. For example, if the goal is to monitor an elderly person's activities of daily living [29], the data are collected by sensors during the day and uploaded to a server at the end of the day for analysis, after which feedback is sent to a care manager or relatives. Many researchers use an offline HAR system as a test bed for examining new techniques proposed for designing an efficient HAR system, such as techniques for data collection, pre-processing, feature extraction, or classification.
For example, in [30], offline classification was used to compare seven classifiers in order to find the best one. In [31], a new classifier was developed to achieve adequate accuracy in detecting children's behaviors. Some researchers used offline classification to study the effectiveness of using an accelerometer alone or in combination with other sensors in the data collection phase. The studies included in our survey mainly used a triaxial acceleration signal, while some of them, such as [29], [32][33][34], used additional signals to improve recognition accuracy.
The acceleration signal recorded from the subject depends on the location of the sensor and the activity being performed. In general, the magnitude of the acceleration signal increases from the head to the ankle. Vertical accelerations produced during level walking range from −2.9 m/s² to 7.8 m/s² at the lower back and from 16.7 m/s² to 32.4 m/s² at the tibia [51]. What is the appropriate position for a single tri-axial accelerometer to detect a given kind of activity? This is an open question that some researchers have tried to answer [52] [53]. Cleland et al. [52] collected data from six locations on the body (lower back, chest, left wrist, left hip, left thigh, and left foot) and used an SVM to determine the best position for placing accelerometers to detect activities.
The training dataset affects classification efficiency; therefore, some researchers used offline classification to focus on developing a methodology for selecting appropriate candidates for the training dataset, as in [54]. Davila et al. proposed a new method to classify human activities (e.g., sitting, walking, lying, and standing) using a data-driven architecture based on an iterative learning framework. Their solution optimizes model performance by selecting the most appropriate training dataset for non-linear multi-class classification with an SVM classifier, while also reducing the computational load. They achieved 76.1% accuracy when using WAV-F in the pre-processing phase, and improved the accuracy to 81.9% by combining a band-pass Finite Impulse Response (FIR) filter with WAV-F [36]. Table IV lists the activities, the pre-processing and feature extraction techniques, the classifier and its accuracy, and the devices used for activity recognition in some of the offline studies.

IV. TRADITIONAL AND DEEP LEARNING TECHNIQUES
In recent years, intelligent devices such as smartphones, smartwatches, and wearable sensors have advanced very quickly. These devices can now run AI-enabled applications that predict human activity from the raw accelerometer signal. The primary goal of HAR is to build machine learning models that predict human activity with high accuracy. Many traditional learning techniques, such as Decision Tree, Random Forest, AdaBoost, Support Vector Machine, K-nearest neighbor, and Naïve Bayes, have achieved good accuracy. However, deep learning is now widely used in HAR. After reviewing many studies, we found three different strategies for building a machine learning model, as shown in Fig. 4, including:
• Deep learning model flow: collecting the data, extracting the features, applying a deep learning technique, and finally producing the activity label, as in [32] [34] [49].
• Deep feature extraction and model building flow: collecting the data, applying a deep learning technique to extract features automatically, and finally using a softmax layer to predict the activity label, as in [35] [37-39] [45].
Arifoglu et al. [86] compared three variants of recurrent neural networks, a type of deep learning algorithm, with five traditional techniques (SVMs, Naïve Bayes, HMMs, Hidden Semi-Markov Models, and Conditional Random Fields (CRFs)) on a public dataset collected by van et al. [87]. The data capture daily-life activities such as sleeping, cooking, and leaving home, recorded by sensors placed in homes over a period of less than a month. The results indicate that deep learning is competitive with the traditional methods.
An investigation of 48 studies found that 56.25% focused on traditional algorithms for classification (some of them adapting a traditional algorithm to achieve higher accuracy), 33.33% used deep learning, and 10.4% used algorithms (deep learning and/or traditional) to evaluate a proposed system without focusing on the algorithm itself, as in [33] [26]. Maekawa et al. [33] compared two traditional algorithms, AdaBoost and DT, to examine classifier efficiency when data are collected from heterogeneous sensors. Hayashi et al. as well as Hammerla et al. [41] proposed a new method, called the empirical cumulative distribution function (ECDF), for representing sensor data in order to improve the efficiency of the feature extraction phase. Ayu et al. [26] proposed a system for real-time activity recognition and compared two traditional classifiers (NB and KNN) to explore the influence of training data size on recognition accuracy. Nhac et al. [28] compared four traditional classifiers (RF, NB, KNN, SVM) to evaluate their proposed Mobile Online Activity Recognition System (MOARS), which automatically recognizes several activities of smartphone users. Kwapisz et al. [23] focused on developing a public dataset collected with smartphones for human activity recognition; DT, MLP, and Logistic Regression were compared to test the quality of the collected dataset. Fig. 5 shows the average accuracy of all the traditional algorithms listed in Table III and Table IV, as well as how frequently each algorithm was applied. The frequency of an algorithm is the number of times it was used.
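The three shares reported at the start of this paragraph can be checked with a short calculation. The per-group counts below (27, 16, and 5) are not stated in the text; they are inferred here as the only whole numbers out of 48 that reproduce the reported percentages, and should be read as an assumption.

```python
# Total number of surveyed studies, as stated in the text.
total_studies = 48

# Assumed counts, inferred from the reported percentages (not given in the text).
counts = {
    "traditional": 27,
    "deep learning": 16,
    "algorithm not the focus": 5,
}

# Share of each group as a percentage, rounded to two decimals.
shares = {group: round(100 * n / total_studies, 2) for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]}/{total_studies} = {share}%")
```

Under this assumption the three groups account for all 48 studies, and the last share rounds to the 10.4% quoted above.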
As illustrated in Fig. 5, KNN is the most frequently used traditional classifier in these studies. KNN may be used either to classify activities in the researchers' proposed system or simply as a baseline for comparison with the main classifier. Although the frequency of KNN (the number of papers that used the KNN classifier) reached 90.1%, its average accuracy is 75.48%. QDA and SR achieved high average accuracies, 95% and 96.1% respectively, even though few papers used them. Some researchers developed new classifiers by combining different traditional classification algorithms, as in [27] [60] [75]; these are called "Hybrid T". Neural networks are a powerful, biologically inspired programming paradigm that enables a computer to learn from observational data, and deep learning is a powerful set of techniques for learning in neural networks. Neural networks (NN) achieved an acceptable average accuracy of 93.23%, as shown in Fig. 5.
Deep learning has various architectures, such as DBN [37] [65] [69], RNN [43], and CNN [38][39][40] [45] [49] [50] [70]. Fig. 6 displays the average accuracy of each deep learning architecture and its frequency in this study. The average accuracies of the deep learning architectures are mostly close to one another; however, CNN is the most frequently used.
According to this study, the overall average accuracy of traditional machine learning algorithms is 83.3%, which is lower than the average accuracy of deep learning algorithms, which reaches 94.9%, even though more studies used traditional machine learning algorithms than deep learning.
Traditional machine learning algorithms are typically linear, in that they can be represented by a single node that linearly transforms input to output. Deep learning, previously referred to as artificial neural networks, uses multiple nodes organized in layers to model how the human brain works. The more nodes and layers a neural network has, the more sophisticated its learning capabilities can become. Although the term "neural networks" is still in use, today's deep learning networks represent how information flows across nodes, much as information in the human brain flows across neurons. In recent years, research has tended to use deep learning rather than traditional machine learning, as illustrated in Fig. 7.
Across the studies investigated in this paper, deep learning appears from 2014 through 2018. Since 2014, deep learning has been used for activity recognition more than traditional algorithms.

V. ISSUES AND CHALLENGES
Various sensors are used to collect raw data for activity recognition. They fall into three categories: video sensors [8][9][10][11], environment-based sensors [3][4][5][6][7][8], and wearable sensors. A camera is a video sensor installed in a specific place. RGB cameras have received less attention in HAR research, probably because of their traditional use in scene capture and because human movement occurs in 3D space [88]. In addition, identifying a human in an image imposes more constraints because the process requires heavy computation [89]. The quality of a real-time HAR system is therefore affected [90].
Wearable sensor systems avoid the occlusion and restriction problems that arise in HAR systems using RGB cameras. However, the main drawback of wearable sensors is recognition accuracy, because a HAR system based on wearable sensors requires the subject to wear several sensors attached to different body parts, which is a hassle and uncomfortable for users. In addition, Vo et al. [91] noted that a HAR system based on wearable sensors may be ineffective because the subject can forget to wear the dedicated sensor.
The location of the wearable sensor or smartphone is a sensitive factor because it affects recognition accuracy. The raw readings of an accelerometer embedded in a smartphone or wearable sensor depend on the position and orientation of the sensor on the subject's body. For example, the readings during walking differ completely depending on whether the user holds the phone in a hand or carries it in a pocket. Many studies have therefore addressed this issue in an attempt to find the optimal solution [52].
Online HAR systems require continuous sensing and continuous updating of the classification model, both of which consume energy. Updating the data online may also require significant computing resources (e.g., mobile phone memory). In general, different activities call for different sampling frequencies. There is a trade-off between the sampling rate (which affects the quality of feature extraction) and the efficiency of recognition. Ustev et al. [92] attempted to reduce both energy and computing resource costs by selecting the optimal sampling frequency and classification features, which allowed them to drop the computation of time-consuming frequency-domain features. The main concern of an online classification system is processing time, since the end user expects an instant result. Offline systems are more concerned with processing power: an offline classification system depends on the processing power of the available setup, which is very weak in the case of mobile devices, whereas in online systems classification runs on high-performance hardware that enables quick classification and an instant result for the end user.
In traditional machine learning, features have to be extracted from the raw sensor data by a domain expert to reduce the complexity of the data and make patterns more visible to the learning algorithm. Deep learning tries to learn high-level features from the data incrementally, which is its major advantage: there is no need for domain expertise or hand-crafted feature extraction. Regarding the problem-solving approach, traditional machine learning techniques break the problem into parts, solve each part, and combine the results at the final stage, whereas deep learning aims to solve the problem end to end. For example, in a multiple-object detection problem, a deep learning technique such as YOLOv2 [93] takes the image as input and outputs the location and name of each object. With a traditional machine learning algorithm such as SVM, by contrast, a bounding-box detection algorithm is first required to identify all candidate objects, which are then passed to the learning algorithm to recognize the relevant ones.
Deep learning requires high-end machines, in contrast to traditional machine learning algorithms. A GPU is an essential part of running deep learning, because its algorithms take a long time to train owing to the large number of parameters. For example, the popular Deep Residual Network takes about two weeks to train completely from scratch [94], whereas training traditional machine learning algorithms takes from a few seconds to a few hours. In the testing phase, the situation is reversed. A deep learning algorithm takes much less time to run at test time, whereas for KNN (a traditional machine learning algorithm) the testing time grows as the size of the data increases. This does not apply to all traditional machine learning algorithms, however, as some of them have short testing times.
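The remark about KNN test time can be made concrete with a brute-force implementation, sketched below in Python with NumPy. This is a minimal illustration, not the implementation used in any surveyed study: the function name knn_predict and the toy data are hypothetical, and practical libraries may use index structures that reduce the cost. The point shown is that every query computes one distance per stored training sample, so the work per prediction grows with the size of the training set.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Brute-force k-nearest-neighbour vote for a single query vector x.

    Returns the predicted label and the number of distances computed.
    """
    # One distance per stored training sample: cost grows with len(X_train).
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[int(i)] for i in nearest)
    return votes.most_common(1)[0][0], len(dists)


# Toy feature vectors (hypothetical values, for illustration only).
X_small = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_small = ["sitting"] * 3 + ["walking"] * 3
query = np.array([0.2, 0.2])

label_small, n_small = knn_predict(X_small, y_small, query)

# The same data repeated 100 times: same answer, 100 times more distances.
X_large = np.tile(X_small, (100, 1))
y_large = y_small * 100
label_large, n_large = knn_predict(X_large, y_large, query)

print(label_small, n_small)
print(label_large, n_large)
```

A trained parametric model, by contrast, performs a fixed amount of computation per prediction regardless of how many samples it was trained on.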

VI. CONCLUSION
This paper surveys the state of the art in human activity recognition based on measured acceleration components. We described the general structure of activity recognition systems, covering online and offline processing as well as traditional and deep learning algorithms. The surveyed studies differ in the number of activities recognized and in the classification methods used for recognition. Forty-eight studies are qualitatively compared with regard to activities, devices used, learning models, datasets, and recognition accuracy. Finally, we discussed the challenges and issues raised by these studies. The survey has shown that deep learning has recently been used more than traditional machine learning, and that CNN is the most widely used deep learning architecture, even though RNN [43] and AE [35] achieved satisfactory accuracy, higher than 96%.

Fig. 3. Locations and Specifications of the Sensor Devices, Internal Sensors, and Classifiers Used in the HAR Studies.

Fig. 4. General Structure of Traditional and Deep Learning Models.

Fig. 7. Traditional and Deep Learning Algorithms Used per Year.

TABLE I
displays the details of those features.

TABLE II. LIST OF FEATURES

TABLE III. ON-LINE CLASSIFICATION OF HAR
www.ijacsa.thesai.org

TABLE IV. OFF-LINE CLASSIFICATION OF HAR