Recognizing Human Actions by Local Space Time and LS-TSVM over CUDA

Local space-time features can be used to describe events in a way that adapts to the velocity of moving patterns, the size of the object, and the frequency in the captured video. This paper proposes a new implementation approach for Human Action Recognition (HAR) using the Compute Unified Device Architecture (CUDA). First, local space-time features are extracted from a customized dataset of videos, using the Histogram of Optical Flow (HOF) descriptor and the Harris detector algorithm. The extracted video features are then classified with the Least Square Twin SVM (LS-TSVM), an extended version of the SVM classifier that is four times faster and has better precision than classical SVM; it is a binary classifier that uses two non-parallel hyperplanes. The paper evaluates LS-TSVM performance on the customized data, and the experimental results show significant improvements.

Keywords: Motion detection; human action recognition; LSTSVM; GPU Programming; Compute Unified Device Architecture (CUDA)


I. INTRODUCTION
The objective of action recognition from video is to recognize an action and its goal by analyzing a series of frames and the relationships between them that define the class of the action. Pose estimation is an element of computer vision that relates 3D objects to their 2D images in the video feed, detecting corners and edges by using free-form contours [1]. It usually combines multiple methods consecutively to offset the limitations of each. Pose estimation and action recognition are both vital elements of vision-based human motion understanding. They are used in many applications, such as intelligent surveillance systems, learning human moves in games, and human interaction with computer systems [2], [3].
Another important use of action recognition and pose estimation could be storing video as abstract data, as the human brain does. The Harris detector algorithm is a very efficient approach to corner detection in image processing. Its motivation is the matching problem between frames in motion: for a patch in the first image, find the best-matching patch in the second. The Histogram of Optical Flow (HOF) is used to detect edges in the images; it also captures gradient structure that is characteristic of local shape and is relatively invariant to local geometric and photometric transformations, and coarse spatial sampling with fine orientation sampling works best [4], [5]. Support Vector Machines (SVM) are used to perform classification in a nonlinear manner. They can also be applied to function estimation, and training is a convex optimization problem with a primal-dual interpretation and a unique solution [6]. LS-TSVM, in contrast, can be used for both linear and nonlinear classification as well as function estimation. It is trained by solving linear systems, is related to regularization networks and Gaussian processes, remains valid in primal-dual optimization formulations and in high-dimensional input spaces [7], [8], and is connected to kernel versions of Fisher Discriminant Analysis (FDA), sparse approximation, and robust regression [9].
Finally, LS-TSVM uses two non-parallel hyperplanes such that each hyperplane is close to one of the two classes and as far as possible from the other. TWSVM is about four times faster than a standard SVM because it solves two smaller-sized Quadratic Programming Problems (QPPs). SVM and TWSVM were initially developed for binary classification problems, yet multi-class problems are commonly encountered in real-world situations. That is why extending classical SVM and TWSVM to multi-class classification is still ongoing research. Nevertheless, SVM has two serious problems in multi-class classification: one is how fast a machine can learn a model, and the other is how to handle a potential imbalance of samples between classes. For two different classes, the proposed LS-TSVM method addresses the imbalance problem by using a different variable for each class. Moreover, because it solves systems of linear equations, model learning becomes faster. On the basis of this analysis, the paper aims to extend HAR from SVM to LS-TSVM. The implementation runs on a Linux system with a GPU, and programming was done in CUDA to increase algorithm performance.

II. REPRESENTATION
To detect corners, the paper uses the Harris detection method, which uses a gradient formulation to measure the response E(u, v) of an image patch to any shift (u, v) [5]:

$$E(u,v) = \sum_{x,y} w(x,y)\,\bigl[I(x+u,\,y+v) - I(x,y)\bigr]^{2}$$

For nearly constant patches, E(u, v) is close to 0; for distinctive patches it is larger, so corner detection looks for patches where E(u, v) is large. In this work, the bilinear approximation for small shifts [u, v] is used:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

Here M is a 2×2 matrix calculated from the image derivatives as a weighted sum over the window:

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^{2} & I_x I_y \\ I_x I_y & I_y^{2} \end{bmatrix}$$

where w(x, y) is the windowing function (in the simplest case, w = 1) and I_x, I_y are the components of the image gradient. The corner response is then measured by:

$$R = \det(M) - k\,\bigl(\operatorname{trace}(M)\bigr)^{2}$$

Here k is an empirically determined constant, k = 0.04-0.06. R depends only on the eigenvalues of M: R is large and positive for a corner, negative with large magnitude for an edge, and |R| is small for a flat region [3], [5], as shown in Fig. 1.
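The corner response R described above can be illustrated on a single small patch. The following is a minimal sketch, not the paper's CUDA implementation: it assumes a uniform window (w = 1), central-difference gradients, and k = 0.04, and the function name `harris_response` and the toy patches are hypothetical.

```python
def harris_response(patch, k=0.04):
    """Return R = det(M) - k * trace(M)**2 for one grayscale patch (window w = 1)."""
    rows, cols = len(patch), len(patch[0])
    sxx = syy = sxy = 0.0
    # Accumulate the entries of M over the interior pixels of the patch.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0  # horizontal gradient
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Hypothetical 5x5 test patches: flat region, vertical edge, and corner.
flat = [[1.0] * 5 for _ in range(5)]
edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
corner = [[1.0 if (x >= 2 and y >= 2) else 0.0 for x in range(5)] for y in range(5)]

for name, patch in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, harris_response(patch))
```

On these patches the response is zero for the flat region, negative for the edge, and positive for the corner, which matches the sign behaviour stated for R.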
Joint angles are parsed from the captured video feeds, and the theoretical methods of action recognition are applied to them. Applying such approaches to video feeds involves many challenges, including detecting and extracting joints accurately and precisely, and tracking the joints under limited visibility and variations in size, scale, pose, etc. In the field of object recognition, research and successful feature histogram results suggest that using optical flow over a motion sequence is very efficient. However, the size of the descriptor, i.e. the number of pixels covering a person, varies over time. Using optical flow also raises issues with background noise, computational cost, changes of scale, and the direction of motion. To avoid these problems, the distribution of optical flow is used. When an object moves against a fixed background, it creates a very specific optical flow profile. For example, a waving-hand sequence produces characteristic optical flow patterns, but the profile differs at different scales of the same motion, such as zoomed in and zoomed out: when zoomed out, the magnitude of the optical flow (OF) vectors is smaller, and vice versa. Likewise, if the waving person faces the opposite direction, the observed OF is the mirror image, about the vertical axis, of the original. Therefore, based on optical flow, this work computes a feature that depicts the activity profile at every instant of time and is not affected by changes of scale or direction of movement [10], [11].
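One way to make a flow-based descriptor insensitive to scale and to left/right direction, as the paragraph above requires, is to bin each flow vector by its angle from the horizontal axis regardless of horizontal sign, weight it by magnitude, and normalise the histogram to sum to one. The sketch below assumes this construction; the text does not spell out the exact binning, and the name `flow_histogram` and the choice of 8 bins are hypothetical.

```python
import math


def flow_histogram(flow, bins=8):
    """Magnitude-weighted, normalised histogram of optical flow angles.

    `flow` is an iterable of (u, v) flow vectors. The angle is measured
    against |u|, so a vector and its mirror image about the vertical axis
    fall into the same bin.
    """
    hist = [0.0] * bins
    for u, v in flow:
        mag = math.hypot(u, v)
        if mag == 0.0:
            continue  # static pixels carry no motion information
        theta = math.atan2(v, abs(u))  # folded angle in [-pi/2, pi/2]
        idx = min(int((theta + math.pi / 2) / math.pi * bins), bins - 1)
        hist[idx] += mag
    total = sum(hist)
    # Normalising by the total magnitude removes the dependence on scale.
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical flow field, its zoomed-out version, and its mirror image.
flow = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
zoomed_out = [(0.5 * u, 0.5 * v) for u, v in flow]
mirrored = [(-u, v) for u, v in flow]
print(flow_histogram(flow))
print(flow_histogram(zoomed_out))
print(flow_histogram(mirrored))
```

All three calls produce the same histogram up to floating-point rounding, which is the invariance the text asks of the feature.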
This work uses local space-time features [12], which capture primitive events belonging to moving two-dimensional image structures at moments of non-constant motion.
The position of each feature is allocated using the local maxima of $H = \det(\mu) - k\,\operatorname{trace}^{3}(\mu)$ over (x, y, t), where µ denotes the spatio-temporal second-moment matrix of the video.
The spatial and temporal scale parameters (σ, τ) of the associated Gaussian kernel define the spatiotemporal neighborhood of each feature. By automatically selecting the scale parameters (σ, τ), the feature size can be adapted to match the spatiotemporal extent of the underlying image structure [14], [15]. The shape of the feature can also be adapted to the velocity of the local pattern, which makes the feature more stable under different amounts of camera motion [16]. To obtain invariance to scale and to the velocity of camera motion, the paper uses both of these methods.
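The interest-point measure H introduced earlier in this section is the space-time counterpart of the Harris response, with a 3×3 matrix µ and a cubed trace. As a small sketch, assuming µ has already been computed at one (x, y, t) location, H can be evaluated as follows; the names `det3` and `spacetime_response` and the default k = 0.005 are illustrative and are not taken from the text.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def spacetime_response(mu, k=0.005):
    """Return H = det(mu) - k * trace(mu)**3 for a 3x3 second-moment matrix."""
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3


# Variation along x, y and t gives a positive response.
print(spacetime_response([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
# No variation along t (a static spatial corner) gives a negative response.
print(spacetime_response([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))
```

Feature positions would then be the local maxima of this value over a grid of (x, y, t) locations.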

III. CLASSIFICATION LEAST SQUARE TWIN SUPPORT VECTOR MACHINE
LS-TSVM is four times faster and has better precision than classical SVM in binary classification. LS-TSVM builds on classical SVM and Twin SVM, using two non-parallel hyperplanes to avoid the limitations of each.
A system of linear equations can be used to solve the binary classification problem on a training set of l patterns $(x_i, y_i)$, i = 1, …, l, where $x_i \in \mathbb{R}^{n}$ is the i-th data sample in n-dimensional real space and $y_i \in \{+1, -1\}$ is its class label.
SVM classifies patterns with a decision function based on a hyperplane that separates the patterns of the two classes, as illustrated in Fig. 2. The separating hyperplane $w^{T}x + b = 0$ lies between two bounding planes, where $w \in \mathbb{R}^{n}$ is the normal vector in n-dimensional real space and $b \in \mathbb{R}$ is a bias term. To find w and b, SVM solves a QPP in which C > 0 is the penalty parameter and $\xi_i$, i = 1, …, l, are the slack variables. A slack variable measures how much the corresponding data sample is misclassified, and the QPP is solved using its dual form. The size of the SVM dual formulation grows with the number of patterns in the dataset; the complexity for l training patterns is $O(l^{3})$ [6].
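The equations referred to in this paragraph are not reproduced in the text. In the standard soft-margin formulation, which is assumed here rather than taken from the source, the decision function, the two bounding planes, and the primal QPP are usually written as follows.

```latex
\begin{align}
f(x) &= \operatorname{sgn}\bigl(w^{T}x + b\bigr), \\
w^{T}x + b &= +1 \quad\text{and}\quad w^{T}x + b = -1, \\
\min_{w,\,b,\,\xi}\;& \tfrac{1}{2}\,w^{T}w + C\sum_{i=1}^{l}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^{T}x_i + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,l.
\end{align}
```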
To classify the patterns of two classes, Twin SVM assigns a pattern to the class whose hyperplane lies closer to it. By optimizing a pair of QPPs, TWSVM obtains two non-parallel hyperplanes for the classification task. In these QPPs, the patterns of the positive class are collected in one matrix (commonly denoted A) and those of the negative class in another (commonly denoted B), and c1, c2 > 0 are the penalty parameters for misclassification of the data samples.
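For reference, the pair of QPPs and the decision rule are commonly written as below in the Twin SVM literature. This is a sketch in assumed notation, not the paper's own typesetting: A and B hold the positive and negative patterns row-wise, e_1 and e_2 are vectors of ones of matching length, and ξ and η are slack vectors.

```latex
\begin{align}
\min_{w_1,\,b_1,\,\xi}\;& \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^{2} + c_1\, e_2^{T}\xi
\quad\text{s.t.}\quad -(B w_1 + e_2 b_1) + \xi \ge e_2,\;\; \xi \ge 0, \\
\min_{w_2,\,b_2,\,\eta}\;& \tfrac{1}{2}\,\lVert B w_2 + e_2 b_2 \rVert^{2} + c_2\, e_1^{T}\eta
\quad\text{s.t.}\quad (A w_2 + e_1 b_2) + \eta \ge e_1,\;\; \eta \ge 0, \\
\operatorname{class}(x) &= \arg\min_{k \in \{1,2\}} \frac{\lvert x^{T} w_k + b_k \rvert}{\lVert w_k \rVert}.
\end{align}
```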
Twin SVM defines two non-parallel hyperplanes in n-dimensional space, $x^{T}w_1 + b_1 = 0$ and $x^{T}w_2 + b_2 = 0$. To keep each QPP small, Twin SVM uses the patterns of only one class in its constraints. The complexity of Twin SVM is therefore $O(2 \times (l/2)^{3})$, provided that the number of patterns in each class is about l/2. Since $l^{3} / (2 \times (l/2)^{3}) = 4$, Twin SVM is about four times faster than standard SVM; Fig. 3 shows the geometry of the two hyperplanes. This work further improves algorithm performance through CUDA programming on a high-performance GPU, on a machine with an 8-core CPU and 12 GB of RAM.
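In the least-squares variant, each inequality-constrained QPP is replaced by an equality-constrained problem with a squared slack term, so each hyperplane follows from one linear system. The sketch below is a dependency-free illustration of the linear case under the formulation commonly given in the LS-TSVM literature; it is not the paper's CUDA implementation. The names `solve`, `gram`, `fit_lstsvm`, and `predict`, the defaults c1 = c2 = 1, and the toy data are all hypothetical.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def gram(X, Y):
    """Return X^T Y for two row-major matrices with the same number of rows."""
    return [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def fit_lstsvm(A, B, c1=1.0, c2=1.0):
    """Fit two hyperplanes; A holds class +1 rows, B holds class -1 rows.

    With E = [A e] and F = [B e], the assumed closed forms are
      [w1; b1] = -(F^T F + E^T E / c1)^-1 F^T e
      [w2; b2] =  (E^T E + F^T F / c2)^-1 E^T e
    """
    E = [list(row) + [1.0] for row in A]
    F = [list(row) + [1.0] for row in B]
    EtE, FtF = gram(E, E), gram(F, F)
    n = len(E[0])
    Ete = [sum(row[i] for row in E) for i in range(n)]  # E^T e
    Fte = [sum(row[i] for row in F) for i in range(n)]  # F^T e
    M1 = [[FtF[i][j] + EtE[i][j] / c1 for j in range(n)] for i in range(n)]
    M2 = [[EtE[i][j] + FtF[i][j] / c2 for j in range(n)] for i in range(n)]
    plane1 = [-value for value in solve(M1, Fte)]
    plane2 = solve(M2, Ete)
    return plane1, plane2


def predict(x, plane1, plane2):
    """Return +1 or -1 according to the nearer hyperplane."""
    def distance(plane):
        w, b = plane[:-1], plane[-1]
        norm = sum(wi * wi for wi in w) ** 0.5
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return 1 if distance(plane1) <= distance(plane2) else -1


# Hypothetical toy data: two well-separated clusters in the plane.
A = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
B = [[4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0]]
p1, p2 = fit_lstsvm(A, B)
print(predict([0.5, 0.5], p1, p2), predict([4.5, 4.5], p1, p2))
```

Each fit costs two linear solves of size n + 1, where n is the feature dimension, instead of two QPPs. The Gram products are plain dense arithmetic, which is presumably the part that benefits most from a GPU.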
The Harris detection method detects corners as the intersection of two edges, where the point of intersection marks a change of direction between the two edges; high variation in the image gradient plays a vital role in detecting it. The HOF method is used to detect edges in each frame of the video [9]; it also captures gradient structure that is characteristic of local shape and is relatively invariant to local geometric and photometric transformations, and coarse spatial sampling with fine orientation sampling works best. Finally, LS-TSVM is used to classify the actions. It supports both linear and nonlinear classification as well as function estimation, and it improves performance and efficiency by providing two hyperplanes and classifying each pattern by its least distance to a hyperplane. In the field of human action recognition and pose estimation, the paper has demonstrated LS-TSVM by analyzing motion patterns over CUDA. The paper implements a novel GPU-based method for action recognition that combines two motion descriptors, local features and histogram local features, with LS-TSVM, and this proves more efficient and effective than the other approaches. For evaluation, it uses a customized video dataset in the human action recognition system.

Fig. 3. Geometric representation of binary twin support vector machine.

IV. EXPERIMENTS
First, the video feed is read and each frame is converted to grayscale. The grayscale images are then thresholded so that the image data can be analyzed with Harris detection and HOF, which account for the changing effects in the video. After that, two methods are applied for motion detection and recognition: LS-TSVM working together with the motion descriptor called local features (LF), and LS-TSVM with the optical histogram local feature. The paper then compares both methods against different classification approaches for performance evaluation.

V. METHOD