Virtual Rehabilitation Using Sequential Learning Algorithms

Rehabilitation systems are becoming increasingly important because they allow patients to receive motor skills recovery treatment at home, reducing the time, space and cost constraints of treatment in a medical facility. Early rehabilitation systems served as movement guides, later as movement mirrors, and in recent years research has sought to generate feedback messages for the patient based on an evaluation of his or her movements. The algorithms currently most used for exercise evaluation are Dynamic Time Warping (DTW), Hidden Markov Models (HMM) and Support Vector Machines (SVM). However, the larger the set of exercises to be evaluated, the less accurate recognition becomes, producing confusion between exercises with similar posture descriptors. This paper compares two classifiers, HMM and Hidden Conditional Random Fields (HCRF), combined with two types of posture descriptors: point-based and angle-based. The point-based representation proves superior to the angle-based representation, although the latter remains acceptable. HCRF and HMM yield similar results.
Keywords—Kinect Skeletal; Sequential Learning Algorithms; Virtual Rehabilitation; Virtual Reality Therapy


I. INTRODUCTION
Rehabilitation systems are becoming more important because patients can access motor skills recovery treatment from home. The treatment has several components, one of the main ones being the repetition of movements, which requires medical personnel to indicate whether each movement was performed correctly and to count the repetitions. This normally means going to a medical center, requesting an appointment and relying on the assistance of a therapist.
Virtual rehabilitation systems are intended to meet the needs of the mechanical part of motor skills recovery treatment [14]. Initially, these systems focused only on guiding movement: they showed an avatar performing the movement to be carried out. Later, with the introduction of cheaper motion sensors such as the Kinect (Kv1) in 2010, systems were built that served as a mirror, i.e., they showed users their own movements on screen so that patients could see and self-correct them, or simply kept a record that the therapist could evaluate later. In recent years, research has sought to build intelligent assistants, such as movement counters or movement evaluators, that also send text or voice feedback to inform patients of the quality of their movements. One way to evaluate movements is to apply a sequential learning algorithm to posture descriptors obtained from a depth sensor such as the Kv1 [1], [2], [3].
The algorithms most commonly used for motion evaluation are DTW, HMM and SVM [1], [2], [3]. However, the greater the number of exercises to be evaluated, the less accurate their recognition becomes, which motivates the search for ways to improve the performance of the algorithm used. This paper compares two classifiers, HMM and HCRF, combined with two types of posture descriptors: point-based and angle-based.
The point-based posture descriptor proves superior to the angle-based posture descriptor, although the latter is still acceptable, while the HCRF and HMM algorithms yield similar results. The rest of the paper is organized as follows. Section 2 gives a brief review of the state of the art in virtual rehabilitation. Section 3 describes the methodology. Section 4 presents the experimental settings and results. Section 5 presents the conclusions and future work.

II. STATE-OF-THE-ART
In recent years there has been an increase in the number of studies on virtual rehabilitation with feedback [13]; however, few studies test new algorithms for exercise evaluation, as the following review shows.
Uttarwar et al. [4] propose a rehabilitation system for shoulder injuries. They use HMMs for recognition and histograms to calculate accuracy. For training, 10 sequences of each of 4 exercises were used, performed by 3 healthy subjects. They report 100% accuracy when the patient performs a single exercise for 30 seconds; when multiple exercises are performed, classification accuracy is lower. Their training and test sets are very limited compared with other works [5], [6], [7], [2] and [8].
M. Capecci et al. [2] propose using the Hidden Semi-Markov Model (HSMM) to model the temporal evolution of movements and compare it against DTW, testing 5 different exercises with 33 people. As statistical models, HSMMs model the distribution of features across multiple demonstrations. They report that the proposed algorithm outperforms DTW in terms of correlation with a clinical evaluation, demonstrating the applicability of the approach. The number of subjects is larger than in other works [5], [6], [7] and [8]; however, the number of exercises is smaller.
Anton et al. [9] propose a system for monitoring rehabilitation exercises that tracks patient progress without the need for a specialist. The system can be trained to detect an exercise and compare the movement with its correct form, providing feedback to the patient and tracking progress. Of 100 skeleton sequences describing simple activities such as hand or leg movements, 97% were correctly recognized. However, evaluation errors can occur at any time, propagated from errors of the MATCH algorithm. In addition, errors in Kinect's skeletal tracking of the detected joints may occur. Another disadvantage of the system is that it considers only two-dimensional angles between skeleton segments, projecting the segments onto the XY plane without taking perspective into account [9]. An evaluation with Kinect v2 might yield better results. Descriptors such as those in [6], [7] can represent perspective.
Vemulapalli et al. [7] propose a skeletal representation that explicitly models the 3D geometric relationships between body parts using rotations and translations in 3D space. With this representation, human actions can be modeled as curves in a Lie group. Classification is then performed using a combination of DTW, a Fourier temporal pyramid representation and a linear SVM. Experimental results on three action datasets show that the proposed representation outperforms many existing skeletal representations. The descriptors used are lightweight and achieve accuracies above 90%, but they still need to be tested with other algorithms, such as those in [10].
Batabyal et al. [5] demonstrate that a set of active 3D skeleton joint coordinates can be used effectively for action recognition. Kinect-based 3D joint coordinates suffer from low-frequency noise; however, a covariance-based feature successfully eliminates the unwanted effect of this noise. In addition, mapping the features to a lower-dimensional manifold can improve classification results. The 3D descriptor they use proves meaningful on a large dataset.
Wang et al. [10] present a discriminative hidden-state approach to gesture recognition. The proposed model combines two main advantages of current approaches to gesture recognition: the ability of CRFs to use long-range dependencies and the ability of HMMs to model latent structure. Their results show that HCRFs outperform both CRFs and HMMs on certain gesture recognition tasks. For arm gestures, the multi-class HCRF model outperforms HMM and CRF even without long-range dependencies, demonstrating the advantages of discriminative joint learning. HCRFs and CRFs have been used for gesture recognition but still need to be tested on exercises.
Liu et al. [11] present an assistive physical rehabilitation system based on Kinect skeletal detection. They build a set of standardized three-dimensional Cartesian coordinates of correct postures in OpenNI and use a support vector machine (SVM) classifier to evaluate the correctness of a posture. The system can thus judge whether users' postures are correct. Considering only 15 joints, and leaving aside the less significant ones, appears to give favorable results in gesture recognition.
The previous analysis shows that algorithms such as HCRF still need to be tested on movements, particularly exercises, and that HMMs need to be tested with a wider set of movements. In addition, several descriptors could meaningfully represent the user's posture and should be tested with algorithms such as HMM and HCRF.
In this paper we apply the HMM described by Uttarwar et al. [4] and the HCRF described by Wang et al. [10], using the point-based posture descriptor of [5] and the angle-based posture descriptor of Anton et al. [9], computed in three dimensions (XYZ) as in Vemulapalli et al. [7]. The goal is to assess whether the angle-based descriptor is as meaningful as the point-based descriptor with two different algorithms, HMM and HCRF.

III. METHODOLOGY
We conducted the research in four stages, as shown in Figure 1. In the first stage, a dataset was collected using the Kv1 and divided into two subsets: a training set and a test set. In the second stage, classifiers were trained on the training set in four configurations: HCRF with the point-based descriptor, HCRF with the angle-based descriptor, HMM with the point-based descriptor and HMM with the angle-based descriptor. In the third stage, the test set was evaluated in all four configurations and the predictions were compared with the expected results. In the final stage, a confusion matrix was computed for each configuration and ROC curves were plotted for the four test cases.

A. Data acquisition
The skeletal tracking functionality of the Kv1 tracks the human skeleton using an algorithm that identifies the body parts of people within the sensor's field of view. This algorithm provides the reference points corresponding to the parts of the human body, and these points are the input to the different gesture recognition algorithms applied.
In other words, each movement sequence corresponds to a sequence of postures. Each posture is represented by twenty points corresponding to joints of the human body, and each point has three coordinates, so each posture is described by sixty values and a sequence of T postures by 60T values.
These reference points can be represented in two ways: with point-based descriptors or with angle-based descriptors.
1) Pose descriptor based on points: Each skeleton is represented by twenty joints, each with a predefined enumeration and identifier, listed in Table I and shown in Figure 2. Each joint is represented in 3D space by its coordinates X, Y and Z, as in (1) [5], [7]:

$$P_i = (x_i,\; y_i,\; z_i), \qquad i = 1, \dots, 20 \qquad (1)$$

Source: Kinect
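As an illustration, the point-based descriptor of (1) can be assembled as follows. This is a minimal sketch, assuming each Kinect frame is already available as a 20 × 3 array of joint coordinates in the Table I order; the function names are hypothetical and NumPy is used only for convenience.

```python
import numpy as np

N_JOINTS = 20  # joints tracked by the Kinect v1 skeleton (Table I)


def point_descriptor(frame_joints):
    """Flatten one posture of 20 (x, y, z) joints into a 60-value vector."""
    joints = np.asarray(frame_joints, dtype=float)
    if joints.shape != (N_JOINTS, 3):
        raise ValueError(f"expected shape ({N_JOINTS}, 3), got {joints.shape}")
    # Resulting order: x1, y1, z1, x2, y2, z2, ..., x20, y20, z20
    return joints.reshape(-1)


def sequence_descriptor(frames):
    """Stack T postures into a (T, 60) observation sequence, i.e. 60T values."""
    return np.stack([point_descriptor(f) for f in frames])


# Usage with placeholder data: 45 frames of random joint positions
frames = np.random.default_rng(0).random((45, N_JOINTS, 3))
seq = sequence_descriptor(frames)
print(seq.shape)  # (45, 60)
```

Keeping each sequence as a (T, 60) array rather than one long vector preserves the frame-by-frame structure that sequential models such as HMM and HCRF consume.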
2) Pose descriptor based on angles: Eighteen angles formed by the reference points of the human body were identified (Figure 3). Four of them are excluded: the angles formed by the hands and feet with the limbs. Since the exercises observed involve movements of the limbs, the minimal movements of the hands and feet are of little relevance; finally, fourteen relevant angles are selected (Table II). Excluded angles are marked with an asterisk in Table II. An angle of this type of descriptor is formed by three adjacent points of the posture, and its vertex is the joint that connects the other two. To calculate each of the fourteen selected angles, take the three points A, B and C forming the angle, with B the central joint (Figure 4), convert them into the two vectors (2) and (3), and apply the formula for the angle between two vectors in three dimensions (4), as in [2]:

$$\vec{BA} = A - B = (x_A - x_B,\; y_A - y_B,\; z_A - z_B) \qquad (2)$$

$$\vec{BC} = C - B = (x_C - x_B,\; y_C - y_B,\; z_C - z_B) \qquad (3)$$

$$\beta = \arccos\left(\frac{\vec{BA} \cdot \vec{BC}}{\lVert \vec{BA} \rVert \, \lVert \vec{BC} \rVert}\right) \qquad (4)$$
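Equations (2) to (4) translate directly into code. The sketch below is an assumption-laden illustration, not the authors' implementation: angles are returned in degrees for readability, and because Table II is not reproduced here, the joint triples are hypothetical placeholders that assume the Kinect for Windows SDK 1.x joint order (ShoulderLeft = 4, ElbowLeft = 5, WristLeft = 6, and so on, 0-based).

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle beta (degrees) at joint B formed with adjacent joints A and C, eqs. (2)-(4)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    ba = a - b  # eq. (2): vector from B to A
    bc = c - b  # eq. (3): vector from B to C
    cos_beta = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # eq. (4); clipping guards against rounding slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_beta, -1.0, 1.0))))


# HYPOTHETICAL (A, B, C) triples as 0-based joint indices, assuming the
# Kinect SDK 1.x joint order; the 14 triples actually used are listed in Table II.
EXAMPLE_TRIPLES = [
    (4, 5, 6),     # ShoulderLeft-ElbowLeft-WristLeft (left elbow)
    (8, 9, 10),    # ShoulderRight-ElbowRight-WristRight (right elbow)
    (12, 13, 14),  # HipLeft-KneeLeft-AnkleLeft (left knee)
]


def angle_descriptor(joints, triples=EXAMPLE_TRIPLES):
    """Convert one 20 x 3 posture into a vector with one angle per triple."""
    joints = np.asarray(joints, dtype=float)
    return np.array([joint_angle(joints[i], joints[j], joints[k]) for i, j, k in triples])


# Usage: a right angle at the origin
print(round(joint_angle([1, 0, 0], [0, 0, 0], [0, 1, 0]), 6))  # 90.0
```

With the 14 triples of Table II, each posture shrinks from 60 values to 14, which is consistent with the shorter query times reported for the angle representation.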
The exercise sequences were captured using the point-based descriptor. Ten exercises to be performed at home after a stroke were selected for this purpose from a Patient and Caregiver Guide. Thirty-four repetitions of each exercise were recorded: 24 were assigned to the training set and 10 to the test set, giving 240 training sequences and 100 test sequences in total. The dataset is described in Table III.
Upper-limb exercises: self-assisted passive exercises and mobilizations performed by the patient (Figures 5 and 6).
Trunk exercises: mobilizations performed by the patient (Figures 7 and 8).
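The per-exercise split (24 training and 10 test repetitions for each of the 10 exercises) can be expressed as below. This is a sketch under one loud assumption: the paper does not say which repetitions went to which subset, so here the first 24 recordings of each exercise are used for training. The function name and placeholder recording names are hypothetical.

```python
def split_per_exercise(recordings, n_train=24, n_test=10):
    """Split {exercise_label: [sequence, ...]} into labeled train/test lists."""
    train, test = [], []
    for label in sorted(recordings):
        seqs = recordings[label]
        if len(seqs) < n_train + n_test:
            raise ValueError(
                f"exercise {label}: need {n_train + n_test} recordings, got {len(seqs)}"
            )
        # ASSUMPTION: first n_train recordings train, the next n_test are held out
        train.extend((s, label) for s in seqs[:n_train])
        test.extend((s, label) for s in seqs[n_train:n_train + n_test])
    return train, test


# Usage with placeholder recordings: 10 exercises x 34 repetitions
recordings = {ex: [f"ex{ex}_rep{r}" for r in range(34)] for ex in range(1, 11)}
train, test = split_per_exercise(recordings)
print(len(train), len(test))  # 240 100
```

Splitting within each exercise keeps both subsets balanced (24 and 10 sequences per class), which is what makes per-class counts in the confusion matrices directly comparable.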

B. Training
A set of 240 sequences of 10 exercise types was used for training with two algorithms, HCRF and HMM, and two different descriptors, point-based and angle-based (Table IV).
1) HMM: The parameters are learned automatically with the Baum-Welch learning algorithm. The model uses a Forward topology with 10 states and a 60-dimensional observation space, and the classifier assumes a Gaussian distribution. Training continues until the tolerance falls below 0.0025 or the maximum of 300 iterations is reached. In addition, a regularization value is added to the diagonal of the covariance matrices to keep them positive and prevent them from degenerating.
After the parameters are defined, the HMM is trained with the training sequences (Table III).
2) HCRF: A Hidden Resilient Gradient Learning algorithm is used, with a Forward topology of 10 states and a 60-dimensional observation space; the classifier assumes a Gaussian distribution, and training continues until the tolerance falls below 0.0025 or the maximum of 300 iterations is reached.
After the parameters are defined, the HCRF is trained with the training sequences (Table III).
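The following sketch illustrates how an HMM classifier of this kind reaches a decision once one model per exercise has been trained: each sequence is scored with the forward algorithm under every exercise model, and the label with the highest log-likelihood wins. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Baum-Welch re-estimation loop (which would stop at a tolerance of 0.0025 or 300 iterations) is omitted, emissions are diagonal Gaussians rather than full-covariance ones, the regularization value `reg=1e-5` and the uniform left-to-right initialization are assumptions, and all function names are hypothetical.

```python
import numpy as np


def forward_topology(n_states=10):
    """Left-to-right transition matrix: a state may stay or move to any later state.
    Uniform probabilities are an ASSUMED initialization; Baum-Welch would refine them."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i:] = 1.0 / (n_states - i)
    return A


def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid inf - inf when a slice is all -inf
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def log_gaussian_diag(X, means, variances, reg=1e-5):
    """(T, N) log-densities of each observation under each state's diagonal Gaussian.
    `reg` is added to the variances so they stay strictly positive."""
    var = variances + reg
    diff = X[:, None, :] - means[None, :, :]  # (T, N, D)
    log_norm = np.sum(np.log(2.0 * np.pi * var), axis=1)  # (N,)
    return -0.5 * (log_norm[None, :] + np.sum(diff**2 / var[None, :, :], axis=2))


def log_likelihood(X, pi, A, means, variances):
    """log p(X | model) computed with the forward algorithm in log space."""
    with np.errstate(divide="ignore"):  # log(0) = -inf for forbidden transitions
        log_pi, log_A = np.log(pi), np.log(A)
    log_B = log_gaussian_diag(X, means, variances)
    alpha = log_pi + log_B[0]
    for t in range(1, len(X)):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = _logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return float(_logsumexp(alpha, axis=0))


def classify(X, models):
    """Pick the exercise whose HMM gives the sequence the highest log-likelihood."""
    scores = {label: log_likelihood(X, *params) for label, params in models.items()}
    return max(scores, key=scores.get)


# Usage with two toy 2-state models over 2-dimensional observations
D = 2
pi = np.array([1.0, 0.0])
A = forward_topology(2)
models = {
    "exercise_1": (pi, A, np.zeros((2, D)), np.ones((2, D))),
    "exercise_2": (pi, A, np.full((2, D), 5.0), np.ones((2, D))),
}
X = np.random.default_rng(1).normal(0.0, 1.0, size=(20, D))
print(classify(X, models))  # exercise_1
```

Working in log space matters here: with 60-dimensional Gaussians, raw per-frame likelihoods underflow quickly, whereas summed log-likelihoods stay well within floating-point range.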

C. Recognition
For recognition with point-based descriptors, the sequences and their corresponding labels are passed directly to the classifier. For angle-based recognition, the points are first converted into angles and then passed to the algorithm.
Each sequence has an associated label identifying the exercise performed. The 100 test sequences are submitted to the classifier, and the predicted labels are compared with the true labels. The time required for these queries is also measured.
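The evaluation step can be sketched as a loop that queries a trained classifier on every labeled test sequence, compares predictions with the true labels and records the elapsed time. This is an illustrative sketch: `evaluate` is a hypothetical helper, any callable mapping a sequence to a label can be plugged in, and since the paper does not state whether its reported times are totals or per-query averages, this version measures the total time for all queries.

```python
import time


def evaluate(classify, test_set):
    """Query a classifier on labeled sequences; return accuracy, elapsed ms, predictions."""
    start = time.perf_counter()
    predictions = [classify(seq) for seq, _ in test_set]
    elapsed_ms = (time.perf_counter() - start) * 1000.0  # total time for all queries
    correct = sum(pred == label for pred, (_, label) in zip(predictions, test_set))
    return correct / len(test_set), elapsed_ms, predictions


# Usage with a toy classifier that echoes the first element of each sequence
toy_test_set = [([1], 1), ([2], 2), ([3], 3), ([4], 5)]
acc, ms, preds = evaluate(lambda seq: seq[0], toy_test_set)
print(acc)  # 0.75
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.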

IV. RESULTS
HCRF with the point representation achieves 100% accuracy, as does HMM with the point representation, while the same algorithms with the angle representation achieve 95% and 96%, respectively. With HCRF and the angle representation, confusion occurs among classes 1, 4 and 10; with HMM and the angle representation, the confusion involves classes 4 and 5 (Table V). The query time with the angle representation is 1489.99 ms with HMM and 2094.39 ms with HCRF, while with the point representation it is 3781.51 ms and 4371.9 ms, respectively. Thus the angle representation yields shorter response times than the point representation with both algorithms (Table V).
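The per-class counts reported for the confusion matrix of Figure 10 (96 of 100 sequences correct, which appears to correspond to HMM with the angle representation) can be reproduced with a few lines of NumPy. The sketch rebuilds label lists from those counts only; which particular sequences were misclassified is not known, so the positions altered below are arbitrary, and `confusion_matrix` is a hypothetical helper rather than a library call.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class (classes numbered 1..n_classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t - 1, p - 1] += 1
    return cm


# Ten test sequences per class, initially all classified correctly ...
y_true = [c for c in range(1, 11) for _ in range(10)]
y_pred = list(y_true)
# ... then apply the errors described for Figure 10:
# 3 of the 10 class-9 sequences predicted as class 4, 1 class-10 sequence as class 5
class9_start = (9 - 1) * 10
for i in range(3):
    y_pred[class9_start + i] = 4
y_pred[(10 - 1) * 10] = 5

cm = confusion_matrix(y_true, y_pred, n_classes=10)
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.96
```

Accuracy is the trace of the matrix divided by its total, and the off-diagonal cells (row 9, column 4 and row 10, column 5) show exactly which exercises are confused.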
Of the 100 test sequences, 100 are correctly classified (Figure 9). The ROC curves show that the point-based descriptor has a larger area under the curve with both algorithms (Figure 13), so it is the recommended descriptor for the dataset presented.
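A ROC curve for one exercise class can be built in a one-vs-rest fashion: each test sequence receives a score for that class (for an HMM, for example, its log-likelihood under that class's model), and the decision threshold is swept over those scores. The paper does not state how its multi-class ROC curves were built, so this is a sketch under that one-vs-rest assumption; the helper names are hypothetical and the scores are made-up placeholders, not results from this study.

```python
def roc_points(scores, is_positive):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold from high to low."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_positive) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, is_positive) if s >= thr and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical one-vs-rest scores for one exercise class (not data from this study)
scores = [0.9, 0.4, 0.6, 0.2]
is_positive = [True, True, False, False]
pts = roc_points(scores, is_positive)
print(auc_trapezoid(pts))  # 0.75
```

An area of 1.0 means every sequence of the class outscores every other sequence, which is consistent with the 100% accuracy obtained with the point-based descriptor.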

V. CONCLUSION AND FUTURE WORK
A total of 340 posture sequences from 10 rehabilitation exercises were collected, 240 for training and 100 for testing. These sequences were evaluated with two algorithms, HCRF and HMM, using two types of descriptors: one based on posture points and one based on posture angles. The point-based descriptor was found to be superior to the angle-based descriptor, although the latter is still acceptable. HCRF and HMM give similar results, but tests with a larger set of sequences are needed.
With both HCRF and HMM, the point-based descriptor proves more accurate, although the results with the angle-based descriptor remain acceptable. The angle representation could be improved by further analysis of the relevance of each angle to particular movements. HCRF and HMM show similar results, with HMM being the less time- and space-consuming; further experiments with more sequences and exercise groups are still needed to determine which is most appropriate.

Fig. 1. Stage 1: Collect the dataset using the Kinect v1 (training subset and test subset). Stage 2: Train on the training subset with four algorithm configurations: HCRF with points, HCRF with angles, HMM with points and HMM with angles. Stage 3: Evaluate the test subset with all four configurations. Stage 4: Analyze the results using the confusion matrix and the ROC curve.

Fig. 2. Representation of the posture using characteristic points.The skeleton is made up of 20 joints, with a predefined enumeration and identification.

Fig. 3. Representation of the posture by characteristic angles. Eighteen angles are identified on the skeleton; after excluding the angles formed by the hands and feet, 14 angles are finally selected.

Fig. 4. Formation of a characteristic angle from three characteristic points of the posture. A, B and C are adjacent characteristic points, and β is the characteristic angle formed by A, B and C. Source: Kinect.

Fig. 5. Exercise: elbow flexion [12].
Of the 100 test sequences, 96 are correctly classified and 4 are incorrectly classified (Figure 10). Of the ten elements of class 9, 7 are correctly classified as class 9 and 3 are incorrectly classified as class 4 (Figure 10). Of the ten elements of class 10, 9 are correctly classified as class 10 and 1 is incorrectly classified as class 5 (Figure 10). Of the 100 test sequences, 100 are correctly classified (Figure 11). Of the 100 test sequences, 95 are correctly classified and 5 are misclassified (Figure 12). Of the ten elements of class 4, 8 are correctly classified as class 4 and 2 are incorrectly classified as classes 1 and 10.

TABLE II. CHARACTERISTIC ANGLES. EXCLUDED ANGLES ARE MARKED WITH AN ASTERISK.

TABLE III. TRAINING AND TEST DATA

TABLE V. RESULTS