International Journal of Advanced Computer Science and Applications (IJACSA), Volume 13 Issue 7, 2022.
Abstract: Sound event detection enables machines to detect when a particular sound event has occurred, in addition to classifying the type of event. Successful detection of various sound events is paramount in building secure surveillance systems and other smart home appliances. However, noisy events and environments degrade the performance of many sound event detection models, rendering them ineffective in real-world scenarios. Hence the need arises for sound event detection algorithms that are robust in noisy environments and have low inference times. You Only Hear Once (YOHO) is a purely convolutional architecture that uses a regression-based approach to sound event detection instead of the more common frame-wise classification-based approach. The YOHO architecture proved robust in noisy environments, outperforming the convolutional recurrent neural networks popular in sound event detection systems. Additionally, different ways to enhance the performance of the YOHO architecture are explored, experimenting with different computer vision architectures, dynamic convolutional layers, pretrained audio neural networks and data augmentation methods to improve the performance of the models on noisy data. Amongst the several modifications to the YOHO architecture, Frequency Dynamic Convolution layers improved the model's internal data representations by enforcing frequency-dependent convolution operations, which improved YOHO's performance on noisy audio in outdoor and vehicular environments. Similarly, the FilterAugment data augmentation method and the Convolutional Block Attention Module improved YOHO's performance on the VOICe dataset of noisy audio by augmenting the data and by using attention to improve the internal model representations of the input audio, respectively.
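To illustrate the regression-based approach the abstract contrasts with frame-wise classification, the sketch below decodes per-bin regression outputs into timed events. It is a minimal illustration, not the authors' code: the bin duration (0.4 s), the threshold, and the output layout of one `(presence, relative start, relative end)` triple per time bin for a single class are all assumptions chosen for the example.

```python
import numpy as np

def decode_yoho_bins(preds, bin_duration=0.4, threshold=0.5):
    """Decode YOHO-style regression outputs into (onset, offset) events.

    preds: array of shape (num_bins, 3) for one class, columns =
           [presence score, relative start in [0, 1], relative end in [0, 1]].
    Consecutive active bins whose spans touch are merged into one event,
    so a long sound occupying several bins becomes a single detection.
    """
    events = []
    for i, (p, rel_start, rel_end) in enumerate(preds):
        if p < threshold:
            continue  # bin predicted inactive for this class
        onset = (i + rel_start) * bin_duration
        offset = (i + rel_end) * bin_duration
        if events and onset <= events[-1][1] + 1e-6:
            # bin continues the previous event: extend its offset
            events[-1][1] = max(events[-1][1], offset)
        else:
            events.append([onset, offset])
    return [(round(s, 3), round(e, 3)) for s, e in events]

# Hypothetical network outputs for four consecutive bins of one class.
preds = np.array([
    [0.9, 0.25, 1.0],   # event starts 25% into bin 0, runs to its end
    [0.8, 0.0,  0.5],   # continues through the first half of bin 1
    [0.1, 0.0,  0.0],   # inactive bin
    [0.7, 0.1,  0.9],   # a separate, second event inside bin 3
])
print(decode_yoho_bins(preds))  # → [(0.1, 0.6), (1.24, 1.56)]
```

Because the network regresses boundaries directly instead of labeling every frame, the output needs only this light merging step rather than the median filtering and smoothing that frame-wise classifiers typically require, which is one source of the low inference times the abstract mentions.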
Soham Dinesh Tiwari and Karanth Shyam Subraya, “Exploring Regression-based Approach for Sound Event Detection in Noisy Environments,” International Journal of Advanced Computer Science and Applications (IJACSA), 13(7), 2022. http://dx.doi.org/10.14569/IJACSA.2022.01307102
@article{Tiwari2022,
title = {Exploring Regression-based Approach for Sound Event Detection in Noisy Environments},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2022.01307102},
url = {http://dx.doi.org/10.14569/IJACSA.2022.01307102},
year = {2022},
publisher = {The Science and Information Organization},
volume = {13},
number = {7},
author = {Soham Dinesh Tiwari and Karanth Shyam Subraya}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.