An End-to-End Model of ArVi-MoCoGAN and C3D with Attention Unit for Arbitrary-view Dynamic Gesture Recognition

Huong-Giang Doan; Hong-Quan Luong; Thi Thanh Thuy Pham

doi:10.14569/IJACSA.2024.01503122

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 3, 2024.


Abstract: Human gesture recognition is an attractive research area in computer vision, with applications such as human-machine interaction and virtual reality. Recent deep learning techniques have been applied effectively to gesture recognition, but they require large and diverse training data. However, the available gesture datasets mostly contain static gestures and/or gestures captured from a few fixed viewpoints. Some contain dynamic gestures, but these lack diversity in poses and viewpoints. In this paper, we propose a novel end-to-end framework for recognizing dynamic gestures from unknown viewpoints. It has two main components: (1) an efficient GAN-based architecture, named ArVi-MoCoGAN; and (2) a gesture recognition component that contains C3D backbones and an attention unit. From a real dynamic gesture captured at an arbitrary viewpoint, ArVi-MoCoGAN generates videos at multiple fixed viewpoints. It also returns, for the real arbitrary-view gesture, the probability that it corresponds to each of the fixed viewpoints. The recognition component then processes these outputs, using the multi-view synthetic gestures to improve arbitrary-view recognition performance. The proposed system is extensively analyzed and evaluated on four standard dynamic gesture datasets. The experimental results show that our method outperforms current solutions by 1% to 13.58% for arbitrary-view gesture recognition and by 1.2% to 7.8% for single-view gesture recognition.

Keywords: Dynamic gesture recognition; attention unit; generative adversarial network
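The abstract states that the recognition component uses the multi-view synthetic videos and the viewpoint probabilities produced by ArVi-MoCoGAN, but it does not describe how the attention unit combines them. The sketch below is a hypothetical illustration of one common pattern, attention-weighted fusion of per-view features, in which softmax attention weights are additionally scaled by the viewpoint probabilities. The function names (`fuse_views`, `softmax`), the inputs (`view_features`, `view_probs`, `attention_scores`) and the multiply-and-renormalize rule are all assumptions made for illustration, not the authors' method; in a real system the feature vectors would come from the C3D backbones and the scores from a learned layer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(
    view_features: Sequence[Sequence[float]],
    view_probs: Sequence[float],
    attention_scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical attention-weighted fusion of K per-view feature vectors.

    view_features: one feature vector per fixed viewpoint (e.g. a clip embedding).
    view_probs: viewpoint probabilities for the real input video (assumed to sum to 1).
    attention_scores: unnormalized relevance scores, one per view (assumed learned).
    Returns the fused feature vector and the final per-view weights.
    """
    k = len(view_features)
    if k == 0 or len(view_probs) != k or len(attention_scores) != k:
        raise ValueError("need one feature vector, probability and score per view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all feature vectors must have the same length")

    # Assumed rule (not from the paper): scale softmax attention by the
    # viewpoint probability of each view, then renormalize to sum to 1.
    attn = softmax(attention_scores)
    raw = [a * p for a, p in zip(attn, view_probs)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("viewpoint probabilities must not all be zero")
    weights = [r / total for r in raw]

    # Weighted sum of the per-view feature vectors.
    fused = [0.0] * dim
    for w, feat in zip(weights, view_features):
        for i, x in enumerate(feat):
            fused[i] += w * x
    return fused, weights
```

For example, with three views, equal attention scores and viewpoint probabilities `[0.5, 0.25, 0.25]`, the final weights reduce to the viewpoint probabilities themselves, so the view judged most likely dominates the fused representation. Multiplying rather than adding the two weightings means a view with zero viewpoint probability is ignored regardless of its attention score.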

Huong-Giang Doan, Hong-Quan Luong and Thi Thanh Thuy Pham, “An End-to-End Model of ArVi-MoCoGAN and C3D with Attention Unit for Arbitrary-view Dynamic Gesture Recognition,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(3), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01503122

@article{Doan2024,
title = {An End-to-End Model of ArVi-MoCoGAN and C3D with Attention Unit for Arbitrary-view Dynamic Gesture Recognition},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.01503122},
url = {http://dx.doi.org/10.14569/IJACSA.2024.01503122},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {3},
author = {Huong-Giang Doan and Hong-Quan Luong and Thi Thanh Thuy Pham}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
