Keyword Spotting (KWS) is an essential component in a smart device for alerting the system when a user prompts it with a command. As these devices are typically constrained by computational and energy resources, the KWS model should be designed with a small footprint. In our previous work, we developed lightweight dynamic filters which extract a robust feature map within a noisy environment. The learning variables of the dynamic filter are jointly optimized with KWS weights by using Cross-Entropy (CE) loss. CE loss alone, however, is not sufficient for high performance when the SNR is low. In order to train the network for more robust performance in noisy environments, we introduce the LOw Variant Orthogonal (LOVO) loss. The LOVO loss is composed of a triplet loss applied on the output of the dynamic filter, a spectral norm-based orthogonal loss, and an inner class distance loss applied in the KWS model. These losses are particularly useful in encouraging the network to extract discriminatory features in unseen noise environments.
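The three terms that the abstract lists for the LOVO loss can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The squared-Euclidean triplet form, the choice of the spectral norm of W W^T - I as the orthogonality penalty, the class-mean definition of the inner class distance, and the weights `alpha`, `beta`, and `gamma` are all assumptions made for illustration. The abstract does not give the exact formulas or how the terms are weighted against the CE loss.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form): pull positive closer than negative."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def spectral_norm_symmetric(M, iters=50, seed=0):
    """Estimate the spectral norm of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(M)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        sigma = math.sqrt(sum(x * x for x in w))
        if sigma == 0.0:  # M maps v to zero; treat the norm estimate as 0
            return 0.0
        v = [x / sigma for x in w]
    return sigma


def orthogonal_loss(W):
    """Spectral norm of W W^T - I, where the rows of W are weight vectors (assumed form)."""
    n = len(W)
    gram_minus_i = [
        [sum(a * b for a, b in zip(W[i], W[j])) - (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return spectral_norm_symmetric(gram_minus_i)


def intra_class_distance(features, labels):
    """Mean squared distance of each feature to its own class mean (assumed form)."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    total = 0.0
    for group in groups.values():
        dim = len(group[0])
        center = [sum(f[d] for f in group) / len(group) for d in range(dim)]
        total += sum(sq_dist(f, center) for f in group)
    return total / len(features)


def lovo_loss(triplet, orthogonal, intra, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return alpha * triplet + beta * orthogonal + gamma * intra
```

In training, a combined value like this would presumably be added to the CE loss, with the triplet term computed on the dynamic filter output and the other two terms on the KWS model, as the abstract describes.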
Discriminatory and orthogonal feature learning for noise robust keyword spotting
Creators
Donghyeon Kim - Korea University
Kyungdeuk Ko - Korea University
David K. Han - Drexel University
Hanseok Ko - Korea University
Dong Ho Kim - Psychology
Publication Details
IEEE Signal Processing Letters, v 29, pp 1-5
Publisher
IEEE
Grant note
2021002280004 / Korea Ministry of Environment (MOE)
Korea Environment Industry Technology Institute (KEITI) through Exotic Invasive Species Management Program
Resource Type
Journal article
Language
English
Academic Unit
Psychological and Brain Sciences (Psychology); Electrical and Computer Engineering
Web of Science ID
WOS:000853834100005
Scopus ID
2-s2.0-85137881036
Other Identifier
991019173704604721