3D Skeleton model derived from Kinect Depth Sensor Camera and its application to walking style quality

Many feature extraction methods have been proposed for gait recognition. They fall into two groups: model-based and model-free approaches. Model-based approaches obtain a set of static or dynamic skeleton parameters by modeling or tracking body components such as limbs, legs, arms, and thighs. Model-free approaches focus on the shapes of silhouettes or the entire movement of the body. Model-free approaches are insensitive to the quality of silhouettes, and their advantage is a lower computational cost than model-based approaches; however, they are usually not robust to changes in viewpoint and scale. Imaging technology has also developed quickly in recent decades. Motion capture (mocap) devices integrated with motion sensors are expensive and are typically owned only by large animation studios. Fortunately, the Kinect camera, equipped with a depth sensor, is now available at a very low price compared with any mocap device. Its accuracy is lower than that of expensive systems, but some preprocessing can remove jitter and noise from the 3D skeleton points. Our proposed method is a model-based feature extraction approach, which we call the 3D skeleton model. Using a 3D skeleton model for gait feature extraction is a new approach, since previous model-based methods use 2D skeleton models. Its advantage is that it yields 3D coordinates for each skeleton joint rather than only 2D points. We use Kinect to acquire the depth data and iPi Soft mocap software to extract the 3D skeleton model from the Kinect video. Experimental results show 86.36% correctly classified instances using SVM.


I. INTRODUCTION
In recent years, there has been increased attention on effectively identifying individuals to prevent terrorist attacks. Many biometric technologies have emerged for identifying and verifying individuals by analyzing the face, fingerprint, palm print, iris, gait, or a combination of these traits [1][2][3].
Human gait has recently become a well-known biometric for classification and recognition, and many researchers have studied it as the basis for new recognition systems [4][5][6][7][8][9][10][11]. Gait classification and recognition offer some advantages over other recognition systems. A gait classification system does not require the observed subject's attention or cooperation, and it can capture gait at a distance without requiring physical contact with the subject.
There is a significant difference between gait and other biometrics: gait recognition uses video data rather than the still images that other biometric systems commonly use. Video data provides temporal as well as spatial information, whereas a single image provides only spatial information.
There are two feature extraction approaches for gait classification: model-based and model-free [12]. Model-based approaches obtain a set of static or dynamic skeleton parameters by modeling or tracking body components such as limbs, legs, arms, and thighs. Gait signatures derived from these model parameters are employed for identification and recognition of an individual. Model-based approaches are generally view-invariant and scale-independent. These advantages are significant for practical applications, because reference sequences and test sequences are unlikely to be taken from the same viewpoint. Model-free approaches focus on the shapes of silhouettes or the entire movement of the body. Model-free approaches are insensitive to the quality of silhouettes, and their advantage is a lower computational cost than model-based approaches. However, they are usually not robust to changes in viewpoint and scale.
Gait therapists have difficulty quantifying the improvement in gait quality achieved by their therapy. Gait quality can be measured with laboratory devices, but in practice this is not efficient. We propose a method that measures the quality of disabled gait and classifies it simply by capturing the subject walking in front of a camera.
Imaging technology has developed quickly in recent decades. Motion capture (mocap) devices integrated with motion sensors are expensive and are typically owned only by large animation studios. Fortunately, the Kinect camera, equipped with a depth sensor, is now available at a very low price compared with any mocap device. Its accuracy is lower than that of expensive systems, but some preprocessing can remove jitter and noise from the 3D skeleton points. Our proposed method is a model-based feature extraction approach, which we call the 3D skeleton model. Using a 3D skeleton model for gait feature extraction is a new approach, since previous model-based methods use 2D skeleton models. Its advantage is that it yields 3D coordinates for each skeleton joint rather than only 2D points. We use Kinect to acquire the depth data and iPi Soft mocap software to extract the 3D skeleton model from the Kinect video. The 3D skeleton model is exported to a BVH file, a standard animation format, and imported into our programming tool, Matlab. We use Matlab to extract the features and apply a classifier. Because no such dataset existed before, we created our own disabled-gait dataset in a 3D environment.

www.ijarai.thesai.org

II. PROPOSED METHOD
The classification of disabled gait quality in this paper consists of three parts: preprocessing, feature extraction, and classification. Figure 1 shows an overview of the proposed disabled-gait quality classification.

A. Preprocessing
First, the depth data and RGB video data are recorded using Kinect and iPi Recorder. The following recommendations apply to the recording:
1) Use a room space of 9 by 5 feet for the best capture.
2) The subject should wear casual, slim clothing and avoid shiny fabrics.
3) Ensure that the whole body, including arms and legs, is visible during the recording. The recording starts with the subject in a T-pose.
Second, the depth video data is processed in the iPi Soft motion capture application, which creates the 3D skeleton model from the recorded depth video using a motion tracking method. The first step is to keep only the walking scene and remove unimportant parts of the video; we call the retained part the Region of Interest (ROI) video. Figure 3 shows an example of the video recording.
Third, the 3D skeleton model is created with the motion tracking method, jitter and noise are removed, and the skeleton model is exported to a BVH file in iPi Soft.
Fourth, the BVH file is read, the features are extracted, and the features are classified.
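iPi Soft performs the jitter and noise removal itself, and its filter is not described here. As a generic illustration of what such smoothing does, the sketch below applies a centered moving average to one joint's trajectory; the function name `smooth_trajectory` and the default window of 5 frames are assumptions.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def smooth_trajectory(points: Sequence[Point3], window: int = 5) -> List[Point3]:
    """Centered moving average of one joint's per-frame 3D positions.

    The window shrinks near the start and end, so the output keeps the input length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(points)
    smoothed: List[Point3] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        count = hi - lo
        smoothed.append(tuple(sum(p[k] for p in points[lo:hi]) / count for k in range(3)))
    return smoothed


trajectory = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print([p[0] for p in smooth_trajectory(trajectory, window=3)])  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

A single-frame spike is spread over its neighbors and reduced in height. A wider window suppresses more jitter but also blunts genuine fast motion, so the window length is a trade-off.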

B. Dataset
Unfortunately, to our knowledge no Kinect depth-video gait dataset exists yet. All existing gait datasets, such as the USF, SOTON, and CASIA gait datasets, were recorded with ordinary cameras. Figure 2 shows an example from the CASIA gait dataset. To obtain the 3D coordinates from the skeleton model, we first need the tree structure and the channel data, both of which are provided by the BVH file. Figure 6 shows the resulting skeleton model in 3D coordinates. Once we have the coordinates, we can calculate the knee angle from three coordinates of the skeleton: Hip, Left Thigh, and Left Shin.
Thus the angle of  is represented as follows: C. Feature Extraction Using the knee angle feature we created, we can extract the features required for human gait classifications.The angles of knee angle can be calculated using simple trigonometry formula as shown in Figure 7 and equation [1] - [4].Illustration to calculate angle between two lines using points coordinates Figure 8 shows examples of the extracted features.In the Figure 8 there are seven peoples' knee angles of features.Their human gaits are evaluated with five grades from A to E. Gait cycles are different each other.Therefore, normalization is required with the longest human gait.Figure 9 shows the normalized human gait derived knee angles of features.

D. Classification
SVMs (Support Vector Machines) are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each input, which of two possible classes it belongs to, making it a non-probabilistic binary linear classifier.

III. EXPERIMENTAL RESULT
Using a pruned C4.5 tree, we obtained 77.27% correctly classified instances. Table I shows the confusion matrix of the C4.5 classifier, and Table II shows its detailed accuracy by class. The decision tree obtained from C4.5 is as follows:

IV. CONCLUSION
The proposed method uses a Kinect depth sensor camera and iPi Soft motion capture software to generate a 3D skeleton model. iPi Soft is a special-purpose application for creating skeletons so that users can apply captured motion to computer-generated characters. The knee angle feature is then extracted from the generated skeleton and used to measure disabled gait quality. The purpose of this research is to analyze whether Kinect and iPi Soft can be used to extract features from a 3D skeleton in gait-related research. Experiments were conducted using the knee angle feature and a dataset of 18 videos per class. Experimental results show 86.36% correctly classified instances using SVM.

Fig. 2. Example of CASIA gait dataset

To conduct the experiment, we prepared a dataset, the Kinect Gait Dataset, to measure the gait quality of disabled persons. The proposed research analyzes the capability of Kinect and the accuracy of the 3D skeleton model for gait classification. We use simulated patients, that is, healthy subjects imitating disabled gait. The simulated condition is neuropathic gait, and gait quality is classified into five classes: class A is normal gait and class E is the worst neuropathic gait. The dataset provides 18 videos per class, for a total of 90. Only the left knee angle feature is provided, because only the left leg is simulated as impaired.

Figure 3 shows the T-pose position before the video recording starts. The top-right image shows the RGB video sequence. A color gradient represents depth in the video data: blue means the object is close to the camera and red means it is far from the camera.
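A simplified version of the depth coloring described above can be sketched as a linear blend from blue (near) to red (far). The actual palette used by the recording software is not specified here, and the function name `depth_to_rgb` and the 800 mm to 4000 mm range are assumptions for illustration.

```python
from typing import Tuple


def depth_to_rgb(depth_mm: float, near_mm: float = 800.0, far_mm: float = 4000.0) -> Tuple[int, int, int]:
    """Map a depth value to an (R, G, B) color: blue when near, red when far."""
    t = (depth_mm - near_mm) / (far_mm - near_mm)
    t = max(0.0, min(1.0, t))  # clamp depths outside the assumed range
    return (round(255 * t), 0, round(255 * (1.0 - t)))
```

With these defaults, a point at 2400 mm sits halfway through the range and maps to `(128, 0, 128)`, a purple between the two extremes.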

Fig. 3. T-pose position before the recording begins

Figure 4 shows the 3D skeleton motion-tracking sequence. The first task is to specify the subject's physical parameters, such as gender and height. iPi Soft detects the ground plane automatically and provides the 3D skeleton in the T-pose position. Our next task is to align the T-pose skeleton with the subject's T-pose in the first frame of the video. At this point we should also determine the Region of Interest of the video to be processed: instead of the whole video sequence, we can take only its most important part. Once we put the skeleton in the same position as the subject,

Fig. 7. Illustration of calculating the angle between two lines using point coordinates

Fig. 9. Figure 8 after normalization

Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall on.
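The margin-maximizing idea above can be sketched as a soft-margin linear SVM trained by full-batch subgradient descent on the regularized hinge loss (λ/2)·||w||² + (1/n)·Σ max(0, 1 - y_i(w · x_i + b)). This is only an illustration: the solver and parameters used in the experiments are not stated here, and the names `train_linear_svm`, `predict`, `lam`, `eta` and `epochs` are assumptions. It covers the binary case; grading into five classes (A to E) would need a multi-class scheme such as one-vs-rest or one-vs-one on top.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    eta: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).

    Labels must be +1 or -1. Uses full-batch subgradient steps of fixed size eta.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 term (bias unregularized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Which side of the separating hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Only points on or inside the margin contribute to each update, which is the sense in which the solution depends on the "support vectors". A kernel SVM would replace the dot product with a kernel function; that is not shown here.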

TABLE II. Detailed accuracy by class using C4.5 classifier

Using SVM, we obtained 86.36% correctly classified instances.
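The percentage of correctly classified instances and typical per-class measures (TP rate or recall, FP rate, precision, F-measure) can all be derived from a confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes; the function names and the small matrix used in the example are hypothetical and are not the paper's results.

```python
from typing import Dict, List, Sequence


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of correctly classified instances (diagonal / total)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_metrics(cm: List[List[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """TP rate (recall), FP rate, precision and F-measure for each class."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        actual = sum(cm[i])                    # instances truly in this class
        predicted = sum(row[i] for row in cm)  # instances assigned to this class
        fp = predicted - tp
        negatives = total - actual
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {
            "tp_rate": recall,
            "fp_rate": fp / negatives if negatives else 0.0,
            "precision": precision,
            "f_measure": f1,
        }
    return metrics
```

For a hypothetical three-class matrix `[[4, 1, 0], [0, 3, 1], [0, 0, 5]]`, 12 of 14 instances lie on the diagonal, so `accuracy` returns 12/14.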

TABLE IV. Detailed accuracy by class using SVM classifier