Dual Stage Learning Based Dynamic Time-Frequency Mask Generation For Audio Event Classification

Donghyeon Kim; Jaihyun Park; David K. Han; Hanseok Ko; Int Speech Commun Assoc

doi:10.21437/Interspeech.2020-2152

Conference proceeding

INTERSPEECH 2020, v 2020-, pp 836-840

01 Jan 2020

Subjects: Audiology & Speech-Language Pathology; Computer Science; Computer Science, Artificial Intelligence; Computer Science, Software Engineering; Life Sciences & Biomedicine; Science & Technology; Technology

Abstract

Audio-based event recognition becomes quite challenging in real-world noisy environments. To alleviate the noise issue, time-frequency mask-based feature enhancement methods have been proposed. While these methods, which use fixed filter settings, have been shown to be effective in familiar noise backgrounds, they become brittle when exposed to unexpected noise. To address the unknown noise problem, we develop an approach based on dynamic filter generation learning. In particular, we propose dual-stage dynamic filter generator networks that can be trained to generate a time-frequency mask created specifically for each input audio signal. Two alternative approaches to training the mask generator network are developed for feature enhancement in high-noise environments. Our proposed method shows improved performance and robustness in both clean and unseen noise environments.
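To make the idea of an input-specific time-frequency mask concrete, the following is a minimal sketch and not the authors' method. The abstract describes trained generator networks; here a hand-written heuristic stands in for the learned generator, and the names `generate_mask`, `apply_mask`, `sharpness`, and `EPS` are assumptions introduced only for illustration. The sketch shows the two generic steps: derive a mask from the input spectrogram itself, then multiply it element-wise with that spectrogram.

```python
import math

EPS = 1e-8  # assumed small constant that avoids log(0) on silent bins


def generate_mask(spec, sharpness=1.0):
    """Return a mask in (0, 1) with the same shape as `spec`.

    `spec` is a list of non-empty frequency rows, each a list of
    non-negative magnitudes over time frames. This is a hypothetical
    stand-in for a learned generator: every bin is compared with the
    mean level of its own frequency row, so the mask depends on the input.
    """
    mask = []
    for row in spec:
        floor = sum(row) / len(row)  # crude per-frequency background estimate
        mask.append([
            1.0 / (1.0 + math.exp(
                -sharpness * (math.log(v + EPS) - math.log(floor + EPS))))
            for v in row
        ])
    return mask


def apply_mask(spec, mask):
    """Element-wise product of the spectrogram and its mask."""
    return [[m * v for m, v in zip(mrow, srow)]
            for mrow, srow in zip(mask, spec)]


# Toy spectrogram: 2 frequency bins x 4 time frames.
spec = [[1.0, 1.0, 9.0, 1.0],
        [2.0, 2.0, 2.0, 2.0]]
mask = generate_mask(spec)
enhanced = apply_mask(spec, mask)
```

In this toy input, the bin that stands out from its row receives a mask value above 0.5 and the surrounding bins receive values below 0.5, while the flat second row is scaled uniformly. A fixed filter would apply the same weights to every recording; the point of the sketch is only that the weights here are recomputed from each input.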

Metrics

23 Record Views

6 citations in Web of Science

8 citations in Scopus

Details

Title: Dual Stage Learning Based Dynamic Time-Frequency Mask Generation For Audio Event Classification
Creators: Donghyeon Kim - University of Seoul; Jaihyun Park - University of Seoul; David K. Han - DEVCOM Army Research Laboratory; Hanseok Ko - University of Seoul; Int Speech Commun Assoc
Publication Details: INTERSPEECH 2020, v 2020-, pp 836-840
Series: Interspeech
Publisher: Isca-Int Speech Communication Assoc
Number of pages: 5
Grant note: US Army Research Laboratory; United States Department of Defense; US Army Research Laboratory (ARL) 2017000210001 / Korea Environmental Industry & Technology Institute (KEITI) through the Public Technology Program - Korean Ministry of Environment (MOE)
Resource Type: Conference proceeding
Language: English
Academic Unit: Electrical and Computer Engineering
Web of Science ID: WOS:000833594100173
Scopus ID: 2-s2.0-85098177852
Other Identifier: 991021930829704721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration; International collaboration
Web of Science research areas: Audiology & Speech-language Pathology; Computer Science, Artificial Intelligence; Computer Science, Software Engineering
