Application of Kinect Technology and Artificial Neural Networks in the Control of Rehabilitation Therapies in People with Knee Injuries

In the field of physiotherapy, the recognition of human body poses is attracting growing research interest, with the aim of helping patients recover faster during rehabilitation. Nowadays, devices such as the Microsoft Kinect, which let us interact with the user by recognizing body poses and gestures, are readily available. The objective of this work is to capture the data of a person's body joints as a set of angles using the Kinect device. Artificial neural networks trained with the Back-Propagation algorithm were then used for machine learning, and their precision was determined. The results show that the neural network achieved 99.70% accuracy in classifying the patients' postures, so it can be used as an alternative in the rehabilitation therapies of patients with knee injuries.
Keywords: Machine learning; artificial neural network; kinect; physiotherapy; rehabilitation


I. INTRODUCTION
The most common injuries in childhood are bruises and fractures, and children may also present pain originating from different anatomical structures, as in osteochondritis of the knee (Osgood-Schlatter disease) [1]. Knee injuries are common in children and in people who practice sports or overuse their knees, and most studies report traumatic knee injuries [2].
According to the National Institute of Statistics and Informatics (INEI), 5.2% of Peru's population (1,619,000 people) have some type of disability: "88.6% of the population with a disability did not receive treatment or therapy for rehabilitation, and only 11.4% did receive some treatment or therapy. Among those who received treatment or rehabilitation therapy, we can mention physical rehabilitation therapies (46.1%), psychological treatment (18.9%), psychiatric treatment (11.3%), speech therapy (11.0%), emotional support (3.8%), occupational therapy (3.6%), and other types (5.4%)."
The rehabilitation process is commonly carried out in a rehabilitation center, where two main problems arise: a) in the rehabilitation environment the patient is motivated by the therapist's guidance, whereas at home the patient is often completely unmotivated; and b) some patients perform exercises incorrectly, compensating for limited movement with other movement patterns. Knee injuries, the most common disabling injuries in athletes and physically active individuals, can be treated with knee braces and bandaging techniques, which are widely used to reduce and/or prevent the severity and incidence of knee injuries, and with kinesiotaping, which is used to improve muscle strength and jump performance [3]. Even so, a main problem is that patients can get bored with repetitive rehabilitation activities or fail to perform the exercises properly, which can discourage them from continuing their rehabilitation.
On the other hand, several studies have applied the Kinect to different forms of rehabilitation [4], such as proposals to improve specific movements (flexion, abduction, and extension of the limbs) through digital applications [5], and combinations with artificial neural networks (ANN) as new approaches to the medical evaluation of injured patients [6].
In this context, we propose a prototype system for monitoring and training the physiotherapy of patients with knee injuries. The system is based on artificial neural networks whose input data correspond to the detected joints related to the knee; these joints are captured with the Microsoft Kinect v1.0, which provides accurate measurements of these data.
The rest of this document is organized as follows: Section II reviews related work on movement recognition with Kinect; Section III summarizes the methods, data set, and neural network model; Section IV presents experimental results on the recognition rate of human movements and the effectiveness of patient monitoring, together with a discussion relative to other works. Finally, Section V concludes.

II. RELATED WORK
The recognition of human movements has been studied for several years [7], and the techniques that allow machines to learn to recognize them have kept evolving. This is where Machine Learning comes in: its techniques have helped improve the performance and precision with which machines recognize specific human activities [8] [9].
In the study by Da Gama et al. [10], interactive systems that help patients in various therapies are reviewed. Among these technologically advanced systems is Microsoft's Kinect, which has helped lead the way in showing how user-interaction technology can facilitate and complement many clinical applications.
Another work, by Morando et al. [11], uses several technologies such as Kinect, Leap Motion, and Band 2. It presents a complete system, based on serious games, that helps the patient perform rehabilitation exercises. Indicators are defined that provide information about the patient's performance, followed by an evaluation that determines whether the patient performs an adequate or a wrong movement. In [12] it is confirmed that the Kinect can measure patient movement during training and exercise, providing good-quality medical images with sufficient precision for clinical practice, while being less expensive than most medical sensing devices; this made it a viable and suitable choice for this work.
Other studies, such as [13], propose techniques to improve the precision of gesture recognition and movement analysis. Based on the Kinect data, they extract three essential characteristics of physiotherapy exercises: body posture, movement trajectory, and range of movement. Using these data with techniques such as the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW), accuracy improvements of up to 56% for HMM and 32% for DTW were obtained.
There are several other works in this area, such as [14], which is one of the first to use two techniques, Support Vector Machines (SVM) [15] and Random Forests (RF), to accurately classify kinematic activities captured by Kinect. This study compares the classification performance of the two techniques and concludes that RF exceeds SVM in precision, but SVM requires less training time than RF.
Neural networks have been used in [16], which presents a method for estimating the poses of multiple people in real time from video using convolutional neural networks. In another work by Morando et al. [17], a continuation of [11], Machine Learning techniques are used to recognize human movements and evaluate patient performance automatically; the same idea has appeared in several recent studies, but using newer techniques.
The reliability and validity of a new Kinect-based software program are examined in the work of Ressman, Rasmussen-Barr and Grooten [18]. The objective of that study was to establish the test-retest reliability and the construct validity of Qinematic™ for evaluating leg activity. The results of the construct validity study indicate that Qinematic™, at 6 degrees of medial displacement, can identify subjects with a knee-standing position.
As described, the reviewed works address a variety of knee injury problems. From these studies, we observe that a Kinect device offers an alternative way to support knee injury rehabilitation, and that the data it captures can be handled adequately with machine learning techniques such as artificial neural networks.

III. MATERIALS AND METHODS
The present investigation is exploratory and descriptive. It analyzes how therapists in hospitals perceive the recovery of patients with knee injuries, as well as the characteristics of the patients' body postures. In this way, the resources that support patients' rehabilitation are identified.

A. Materials
The following tools were used for software development with Kinect and the C# programming language:
• Kinect v1.0 sensor.

B. Description of the Ontology of the Proposal
Our proposal is a physiotherapeutic rehabilitation monitoring system that uses the Kinect sensor to capture the user's joints (those of the lower limbs) in real time. An artificial intelligence component recognizes and classifies the movements the user makes during the physical rehabilitation session. To understand the problem domain, an ontology was designed. Fig. 1 shows all the elements involved in the system's functionality and how they are related, in terms of classes, class properties, associations or dependencies, and inheritance.
As the hierarchy shows, several elements are involved; the main classes are the following:
1) User: general information about the user, plus the observed anatomical parts of the body and joints (classes Body, BodyPart, Part, Limb, Arm, Leg, Bone, and Joint).
2) Sensor: raw data provided by an acquisition device, such as the Kinect or another sensor.
3) Gestures and poses: classes Gesture, GestureSegment, and Pose.
4) NeuralNetwork: provides the machine learning model, as well as the algorithm used to train it and to recognize or predict a posture.
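To make the hierarchy more concrete, the following is a minimal Python sketch of a subset of these classes (the User, body-part, and gesture branches; Sensor, NeuralNetwork, Part, and Bone are left out). Only the class names come from the ontology; the fields, the inheritance of Leg and Arm from Limb and BodyPart, and the helper method find_joint are assumptions made for illustration, not the authors' C# design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Joint:
    name: str
    position: Tuple[float, float, float]  # (x, y, z) reported by the sensor


@dataclass
class BodyPart:
    name: str
    joints: List[Joint] = field(default_factory=list)


@dataclass
class Limb(BodyPart):  # assumed: a limb is a specialised body part
    pass


@dataclass
class Leg(Limb):
    pass


@dataclass
class Arm(Limb):
    pass


@dataclass
class Body:
    parts: List[BodyPart] = field(default_factory=list)

    def find_joint(self, name: str) -> Optional[Joint]:
        """Hypothetical helper: return the first joint with this name, or None."""
        for part in self.parts:
            for joint in part.joints:
                if joint.name == name:
                    return joint
        return None


@dataclass
class User:
    user_id: str
    body: Body


@dataclass
class Pose:
    label: str
    angles: List[float]  # feature vector used as network input


@dataclass
class GestureSegment:
    poses: List[Pose] = field(default_factory=list)


@dataclass
class Gesture:
    name: str
    segments: List[GestureSegment] = field(default_factory=list)
```

For example, `User("patient-01", Body([Leg("RightLeg", [Joint("KneeRight", (0.1, -0.4, 2.0))])]))` builds a user whose right leg carries a knee joint, which `find_joint("KneeRight")` then retrieves.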

C. Description of the Physiotherapeutic Monitoring System
Our proposal follows a three-layer architecture. As seen in Fig. 2, on the left side the physiotherapist interacts with the system through a web application that can be accessed from a PC or smartphone, which provides maintainability and scalability. On the other side, the patient interacts with a Windows application built with Microsoft .NET Windows Presentation Foundation (WPF), chosen for secure development. This application is connected to a Microsoft Kinect v1.0 sensor. Communication between the application server and the web client uses HTTPS, which is recommended because the application handles sensitive personal data such as patients' health information. Table I shows the data dictionary used in the system.
Both the physiotherapist and the patient must log in to use the system. For the patient this is especially important, because the account also stores (in a database) information about the patient's progress with the rehabilitation exercises, which the physiotherapist reviews. The WPF application implements the Back-Propagation algorithm used to recognize the patient's movements and to detect whether the patient performs the proposed exercise correctly or incorrectly.

D. Artificial Neural Networks
The method used to classify human postures is an artificial neural network with dense layers. The ANN inputs are feature vectors associated with the human poses captured by Kinect.
Likewise, the ANN output is expressed by the values 0 and 1, which correspond to an incorrect and a correct posture, respectively. The output is computed with the activation function f of Eq. (1), and the ANN was trained with the Back-Propagation algorithm.
$$f(x) = \frac{1}{1 + e^{-\alpha x}} \qquad (1)$$
where α is a parameter of the model.
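Assuming the activation is the sigmoid f(x) = 1/(1 + e^(−αx)), which Section IV states was used, the function and the derivative that Back-Propagation needs can be written as a short Python sketch. The function names are hypothetical. The derivative is expressed through the output y = f(x) as α·y·(1 − y), which follows from differentiating the sigmoid, so the backward pass can reuse values already computed in the forward pass.

```python
import math


def sigmoid(x: float, alpha: float = 1.0) -> float:
    """f(x) = 1 / (1 + exp(-alpha * x)), computed without overflow for large |x|."""
    z = alpha * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def sigmoid_derivative(y: float, alpha: float = 1.0) -> float:
    """f'(x) written in terms of the output y = f(x): alpha * y * (1 - y).

    Back-propagation uses this form because y is already known from the forward pass.
    """
    return alpha * y * (1.0 - y)
```

Larger values of α make the transition around x = 0 steeper; with α = 1.0, as configured in this work, f is the standard logistic function.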
For the development of the neural network, we decided to use AForge.NET [19], which offers a simple API for building neural networks and several training algorithms, such as BackPropagation, DeltaRule, and ElasticNetwork, among others. AForge.NET, a set of libraries useful for neural networks, was used to develop the motion detection module; C# was therefore used for development, since it integrates easily with AForge.NET [20]. The main components are the following:
• MainWindows: represents the WPF application that instantiates components of the Kinect SDK. It displays a frame that draws the skeleton as interconnected green segments. The posture recording processes are also executed here, managed by concurrent flags and threads. Kinect v1 provides 30 FPS, and since each frame is a SkeletonFrame (containing the required Skeleton object), the application can obtain 30 Skeleton objects per second.
• PostureRecognition: instantiates a NeuronNetwork object and provides the main train and predict methods, to be used directly by a user.
• NeuronNetwork: uses the ActivationNetwork class, which is the neural network itself, and creates an instance of BackPropagationLearning (both AForge.NET classes), which executes the neural network training.
• NeuralNetworkPatternBase: defines how data are processed before entering the neural network. Our approach is based on the use of angles, as in [21]: we take the stored Skeleton objects, access the Joints property of each object, and use these data to calculate the angles that serve as inputs to the neural network.
511 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020
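The recording behaviour described for MainWindows (concurrent flags and threads, about 30 skeletons per second) can be sketched as follows. This is a hypothetical Python illustration, not the authors' WPF code: the class PostureRecorder, its methods, the function expected_frames, and the representation of a skeleton as a list of (x, y, z) tuples are assumptions. A single lock guards both the recording flag and the frame buffer, so a frame arriving from the sensor thread is either stored before stop() copies the buffer or discarded.

```python
import threading
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Skeleton = Sequence[Point]  # one (x, y, z) triple per tracked joint

KINECT_V1_FPS = 30  # frame rate stated in the text


class PostureRecorder:
    """Hypothetical recorder: the sensor callback stores skeletons only while
    the recording flag is set; one lock guards both the flag and the buffer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._recording = False
        self._frames: List[Skeleton] = []

    def start(self) -> None:
        with self._lock:
            self._frames.clear()
            self._recording = True

    def stop(self) -> List[Skeleton]:
        with self._lock:
            self._recording = False
            return list(self._frames)  # copy, so callers cannot mutate the buffer

    def on_skeleton_frame(self, skeleton: Skeleton) -> None:
        """Called from the sensor thread, about 30 times per second on Kinect v1."""
        with self._lock:
            if self._recording:
                self._frames.append(skeleton)


def expected_frames(seconds: float, fps: int = KINECT_V1_FPS) -> int:
    """Number of skeleton frames captured in a session of the given length."""
    return int(seconds * fps)
```

For sessions lasting 7 to 15 seconds, as in the data-collection procedure of this work, expected_frames gives 210 to 450 skeletons per session.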

E. Data Collection and Exercises
The data set used to train and evaluate the model was obtained from postures performed by patients under the supervision of a specialist. The data were captured with a Microsoft Kinect device, with each session lasting between 7 and 15 seconds: the correct posture is recorded first, and then the patient performs the postures so that the system can determine whether they are performed correctly.
Fig. 4 shows the points recognized by the Kinect device, which correspond to 20 joints of the human skeleton. For each posture, the Kinect skeletal tracking system provides a set of twenty joints expressed as (x, y, z) coordinates. For every triple of joints, these coordinates are treated as vectors from which an angle is computed, and these angles serve as the inputs from which the supervised neural network predicts the posture. We focus mainly on the knee joint, but some arm movements that the patient uses for balance are also considered in order to describe the specific physiotherapy postures [22] [23] [24]. The coordinates x_i, y_i, z_i (0 ≤ i ≤ 19) depend on the size and position of the patient in the scene. The approach used in this work is to compute, by geometry, the angles formed by triples of joints; these joint angles give a better characterization of the postures and are the inputs to the network. The training data are saved in a file in .td format inside a folder called Training-3, and after training, the trained network is saved to a file for later use.
Table II shows the four postures considered for knee therapy. These are the postures that the child must perform during recovery, and the program tells the child whether each posture was performed correctly. Posture 1 is Standing, in which the knees must be fully straight; posture 2 is Frog, in which the knees bend at an angle of more than 90 degrees; posture 3 is Lateral Frog, which shows from a side view whether the angles formed by the knees are uniform; and posture 4 is Right Knee Raised, in which the child raises the bent right knee while lifting the arms for balance. The patient then stands in front of the Kinect and performs the poses, and the system displays the name of a pose only if it was performed correctly, as shown in Fig. 5.

F. Pre-Processing of Skeleton Data
To improve and optimize the training process, the data must be normalized: we aim for a representation of the human body that is independent of the sensor's position, the distance between the body and the sensor, and the patient's body structure.
Since the skeleton is represented in three-dimensional space, the first feature extracted is the position. Each joint has three coordinates (x, y, z), and a skeleton consists of 20 joints; however, we use only 14 joints, so the position data have a dimension of 14 × 3 = 42. Fig. 5 shows this: each joint (A, B, C, ..., T) relevant to capturing the different postures is labeled, except for joints L, P, CT, and R.
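The normalization goal stated above can be illustrated with a minimal NumPy sketch. It assumes that translation is removed by centering on a reference joint and that body size and sensor distance are removed by dividing by the length of a reference segment (for instance, hip center to shoulder center). The function name normalize_skeleton and the choice of reference joints are hypothetical; the text does not specify how the normalization was done. The function returns a flattened vector with 3 values per joint, i.e. 42 values for 14 joints.

```python
import numpy as np


def normalize_skeleton(joints: np.ndarray, center_idx: int, top_idx: int) -> np.ndarray:
    """Translate so joint `center_idx` is at the origin, scale so the distance from it
    to joint `top_idx` is 1, and flatten to a vector of length 3 * n_joints."""
    joints = np.asarray(joints, dtype=float)
    if joints.ndim != 2 or joints.shape[1] != 3:
        raise ValueError("expected an (n_joints, 3) array of x, y, z coordinates")
    centered = joints - joints[center_idx]        # removes sensor/patient position
    scale = np.linalg.norm(centered[top_idx])     # reference segment length
    if scale == 0.0:
        raise ValueError("reference joints coincide; cannot normalize")
    return (centered / scale).ravel()             # removes distance and body size
```

This sketch does not remove rotation of the body relative to the sensor; the joint angles used as network inputs are invariant to rotation, translation, and uniform scaling anyway.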
During pre-processing, ten angles are obtained from the 14 joints to form the feature vector. Each angle is formed by two adjacent vectors that share a joint; for example, two of these angles are obtained from the joint triples GHS and EFQ, respectively. The feature vector therefore consists of 10 angles, as shown in Eq. 2:
$$V = (\theta_1, \theta_2, \ldots, \theta_{10}) \qquad (2)$$
Fig. 6 shows the joints captured by the Kinect.
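The angle between two adjacent vectors that share a joint follows from the dot product: for a triple A-B-C with u = A − B and v = C − B, θ = arccos(u·v / (|u| |v|)), which lies between 0 and 180 degrees. A minimal NumPy sketch follows. The helper names joint_angle and feature_vector are hypothetical, and treating the middle letter of a triple such as GHS as the vertex is an assumption. The example below uses only the two triples named in the text, because the remaining eight are not listed.

```python
import numpy as np
from typing import Dict, Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]


def joint_angle(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle in degrees at joint b between segments b->a and b->c (0 to 180)."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("degenerate triple: two joints coincide")
    cos_theta = np.dot(u, v) / (nu * nv)
    # Clip guards against rounding pushing |cos| slightly above 1.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def feature_vector(joints: Dict[str, Sequence[float]], triples: Iterable[Triple]) -> np.ndarray:
    """One angle per (first, vertex, last) joint triple, in the given order."""
    return np.array([joint_angle(joints[a], joints[b], joints[c]) for a, b, c in triples])
```

With ten triples, feature_vector yields the ten-angle vector V of Eq. 2; for instance, `feature_vector(pose, [("G", "H", "S"), ("E", "F", "Q")])` returns the two angles formed by the triples GHS and EFQ.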

G. Model Training
For training, a person performs the postures defined for rehabilitation [9], and 90 samples are recorded for each pose. In total, 1,890 samples were collected (21 poses × 90 samples), distributed among 11 different postures, of which the four shown in Table II were used. Fig. 7 shows what the data of a sample look like before processing.
These samples are divided into two random sets: m_train for training and m_test for calculating the precision of the model.
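A random split into m_train and m_test can be sketched in a few lines of Python. The function name train_test_split, the 20% test fraction, and the fixed seed are assumptions, since the text does not state the proportions; with the 1,890 collected samples, this choice would yield 1,512 training and 378 test samples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(samples: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and return (m_train, m_test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed makes the precision reported on m_test reproducible; a stratified split (shuffling each posture separately) would additionally keep the class proportions equal in both sets.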
The method used to classify human postures is a dense, layered artificial neural network. The ANN was chosen after comparing the precision of different posture-recognition techniques; Table III shows this comparison. For the BLSTM (bidirectional long short-term memory neural network) [25], in which the 3D skeleton of each user in the study group is detected, the precision achieved is 70.72%. RF obtained an average overall precision of 85.17%, surpassing SVM, which obtained 83.05%; SVM has an advantage in training time, while RF is more suitable for classifying activities in real time. In contrast, an ANN used in a platform for recognizing human poses and gestures to help stroke patients achieved an accuracy of 92.00% [8]. These results indicate that ANNs offer excellent precision when dealing with body postures.
The ANN inputs are feature vectors associated with the human poses captured by Kinect. The ANN configuration is adapted to the number of input values and the number of output classes:
• the neural network is trained with the Back-Propagation algorithm;
• the α parameter of the neural network is set to an initial value of 1.0;
• the network receives inputs consisting of 10 real numbers (angles from 0 to 180 degrees);
• the network has one hidden layer of 20 neurons;
• the output layer consists of 11 neurons, corresponding to the total number of postures defined to train the model; and
• since the inputs are angles between 0 and 180 degrees, up to 100,000 training iterations were needed to achieve reasonable precision rates.
Fig. 8 shows the neural network diagram. In the testing stage, some postures performed by a patient are recorded; each sample is processed to extract the ten angles that form the feature vector, which is then fed into the neural network model to obtain the predicted class and its associated precision.
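The configuration above (10 inputs, one hidden layer of 20 neurons, 11 outputs, sigmoid activation with α = 1.0, Back-Propagation training, and prediction as the winning output unit together with its activation) can be sketched in NumPy. This is a re-implementation for illustration only, not the AForge.NET ActivationNetwork/BackPropagationLearning API the authors used. The class name PostureNet, the learning rate of 0.5, the uniform weight initialization, plain online gradient descent without momentum, and the scaling of angles to [0, 1] by encode are all assumptions.

```python
import numpy as np


def encode(angles_deg) -> np.ndarray:
    """Assumed preprocessing (not stated in the text): map angles in [0, 180] degrees to [0, 1]."""
    return np.asarray(angles_deg, dtype=float) / 180.0


class PostureNet:
    """Fully connected 10-20-11 network with sigmoid units trained by back-propagation."""

    def __init__(self, n_in=10, n_hidden=20, n_out=11, alpha=1.0, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha, self.lr = alpha, lr  # lr (learning rate) is an assumed value
        self.W1 = rng.uniform(-1.0, 1.0, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-1.0, 1.0, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def _f(self, z):
        # Sigmoid activation: 1 / (1 + exp(-alpha * z))
        return 1.0 / (1.0 + np.exp(-self.alpha * z))

    def forward(self, x):
        h = self._f(x @ self.W1 + self.b1)  # hidden layer (20 units)
        y = self._f(h @ self.W2 + self.b2)  # output layer (one unit per posture)
        return h, y

    def train_sample(self, x, target):
        """One online back-propagation step on the squared error 0.5 * ||target - y||^2."""
        h, y = self.forward(x)
        err = target - y
        delta_out = err * self.alpha * y * (1.0 - y)
        delta_hid = (delta_out @ self.W2.T) * self.alpha * h * (1.0 - h)  # uses W2 before update
        self.W2 += self.lr * np.outer(h, delta_out)
        self.b2 += self.lr * delta_out
        self.W1 += self.lr * np.outer(x, delta_hid)
        self.b1 += self.lr * delta_hid
        return 0.5 * float(err @ err)

    def fit(self, X, labels, epochs=1000, seed=0):
        """Train on encoded feature vectors X with integer posture labels (one-hot targets)."""
        rng = np.random.default_rng(seed)
        targets = np.eye(self.b2.size)[labels]
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                self.train_sample(X[i], targets[i])

    def predict(self, x):
        """Return (predicted posture index, output activation of that unit)."""
        _, y = self.forward(x)
        k = int(np.argmax(y))
        return k, float(y[k])
```

Scaling the angles keeps the sigmoid units out of saturation at the start of training, which may reduce the number of iterations needed compared with feeding raw 0 to 180 degree values. This is a design choice of the sketch, not a claim about the reported experiments.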

IV. RESULTS AND DISCUSSION
The tests were carried out with hundreds of samples distributed among the four postures considered. The results are displayed on the console in the form: precision and classified posture. Fig. 8 shows the results obtained from the tests carried out with the stored objects.
The first seven samples were extracted from each test posture, and the prediction results are shown in Table IV. For reasons of space, each posture has been identified by a letter:
• D - Frog (from the front)
Fig. 9 shows the sample of tests performed on saved objects. Table IV shows that the samples reach a very high precision (the highest average being 99.70% and the lowest 98.22%). Note, however, that the postures that are hardest to execute deviate slightly from the training data and are therefore less likely to reach the best accuracy.
In the neural network, we used a sigmoid activation function, which gave good results for a 2-layer model. The initial value α = 1.0 was not decisive, which is consistent with the results of Choubik and Mahmoudi [9], who used networks with different numbers of layers (from 1 to 4) and found that α made no difference, although with a different activation function. Choubik and Mahmoudi also evaluate other learning techniques, such as Support Vector Machines (SVM) and k-nearest neighbors (KNN): SVM reached accuracies of up to 100% using a linear kernel function, and KNN also reached 100%, although SVM maintained higher precision rates as the amount of data increased. Le and Nguyen [21] likewise used SVM and obtained a precision of 100%, testing different combinations of angles formed by the body joints.

V. CONCLUSIONS
In this work, we proposed another way to convert the data provided by Kinect into a network input that represents as much information as possible about the body joints, in order to recognize postures. The inputs are ten angles, each formed by a different group of three body joints.
The developed ontology made it possible to identify, in broad terms, the most important classes or entities of the application and their relationships with other parts of the system. The neural network trained with the Back-Propagation algorithm produced good predictions of the postures based on the joint angles, and the more data are available, the higher the recognition precision.
Although a high classification precision (99.70%) was obtained, there is still room for improvement. Since techniques such as SVM usually perform well on high-dimensional data, we may extend this work in the future using SVM as the learning method.