Underwater acoustic target recognition on ShipsEar dataset: a convolution-free, purely attention-based approach with audio spectrogram transformer
Thesis   Open access

Tam Phi
Master of Science (M.S.), Drexel University
Jun 2023
DOI: https://doi.org/10.17918/00001751
PDF: Phi_Tam_2023 (3.30 MB)

Abstract

Keywords: ShipsEar; SpecAugment; Electric transformers; Underwater acoustics; Signal processing
The task of classifying underwater audio sources has various marine-oriented applications, including maritime and environmental monitoring, detection of marine life, and underwater surveillance. However, Underwater Acoustic Target Recognition (UATR) remains challenging due to several factors. These include the high cost of acquiring acoustic data, the lack of publicly accessible labeled datasets, interference induced by both underwater and over-the-air noise sources, and other environmental factors such as sound speed variations, weather, and bottom boundary conditions. In this thesis, we propose a convolution-free, purely attention-based transformer architecture for classifying underwater acoustic sources. The architecture, adapted from the Audio Spectrogram Transformer (AST), which has so far been applied in in-air acoustic settings, is capable of capturing long-range global context for enhanced underwater acoustic signal classification. Additionally, we further improve performance by leveraging transfer learning from a Vision Transformer (ViT) pretrained on the ImageNet dataset. The AST model is trained on the ShipsEar dataset, which consists of 90 recordings of ship-emitted underwater sounds covering 11 types of vessels and one background noise category. These sound recordings, obtained under real conditions in shallow waters, encapsulate both natural and anthropogenic sounds. Currently, the state-of-the-art Convolutional Long Short-Term Memory (ConvLSTM) model applied to this task achieves an accuracy of 0.9878 and an F1 score of 0.9878. Our proposed AST model surpasses the ConvLSTM performance, reaching an accuracy of 0.9890 and an F1 score of 0.9897.
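
As a rough illustration of what "convolution-free, purely attention-based" can mean in practice, the sketch below cuts a toy spectrogram into overlapping patches and lets every patch attend to every other patch, which is how global context is captured. It is a minimal sketch under stated assumptions, not the thesis implementation: the 16x16 patch size, the stride of 10, the 128-bin toy input, and the helper names `split_into_patches` and `self_attention` are all assumptions made for illustration, and the learned projections, positional embeddings, and classification head of a real AST are omitted.

```python
import math

def split_into_patches(spec, patch=16, stride=10):
    """Cut a 2-D spectrogram (rows = frequency bins, columns = time frames)
    into flattened, overlapping patch vectors. Patch size and stride are
    assumed values, not taken from the thesis abstract."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_freq - patch + 1, stride):
        for t0 in range(0, n_time - patch + 1, stride):
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Identity query/key/value projections are used for brevity (an assumption);
    a trained model would learn separate projection matrices."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Each token scores itself against all tokens, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Toy input standing in for a log-mel spectrogram: 128 bins x 36 frames.
spec = [[math.sin(0.05 * f * t) for t in range(36)] for f in range(128)]
patches = split_into_patches(spec)
attended = self_attention(patches)
print(len(patches), len(patches[0]))
```

With these assumed settings, the 128 x 36 input yields 12 patch positions along frequency and 3 along time. Every output token is a weighted mix of all input patches, so no convolutional receptive field limits how far apart two related spectrogram regions can be.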

