Underwater acoustic target recognition on ShipsEar dataset: a convolution-free, purely attention-based approach with audio spectrogram transformer

Tam Phi

doi:10.17918/00001751

Thesis

Open access

Master of Science (M.S.), Drexel University

Jun 2023

DOI: https://doi.org/10.17918/00001751

Files and links (1)

Phi_Tam_2023 (PDF, 3.30 MB), Open Access (License Unspecified)

Abstract

Keywords: ShipsEar; SpecAugment; Electric transformers; Underwater acoustics; Signal processing

Classifying underwater audio sources has various marine applications, including maritime and environmental monitoring, detection of marine life, and underwater surveillance. However, Underwater Acoustic Target Recognition (UATR) remains challenging due to several factors: the high cost of acquiring acoustic data, the lack of publicly accessible labeled datasets, interference from both underwater and over-the-air noise sources, and environmental factors such as sound speed variations, weather, and bottom boundary conditions. In this thesis, we propose a convolution-free, purely attention-based transformer architecture for classifying underwater acoustic sources. The architecture, adopted from the Audio Spectrogram Transformer (AST), which has so far been applied in in-air acoustic settings, is capable of capturing long-range global context for enhanced underwater acoustic signal classification. We further improve performance by leveraging transfer learning from a Vision Transformer (ViT) pretrained on the ImageNet dataset. The AST model is trained on the ShipsEar dataset, which consists of 90 recordings of ship-emitted underwater sounds covering 11 types of vessels and one background noise category. These recordings, obtained under real conditions in shallow waters, contain both natural and anthropogenic sounds. Currently, the state-of-the-art Convolutional Long Short-term Memory (ConvLSTM) model applied to this task achieves an accuracy of 0.9878 and an F1 score of 0.9878. Our proposed AST model surpasses the ConvLSTM performance, reaching an accuracy of 0.9890 and an F1 score of 0.9897.
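To illustrate what "convolution-free, purely attention-based" can mean in practice, the following is a minimal NumPy sketch of the general idea behind a spectrogram transformer: the spectrogram is cut into overlapping patches, each patch is linearly projected to a token, and self-attention lets every token attend to every other token, which is how long-range context is captured. This is not the thesis implementation. The patch size (16), stride (10), spectrogram shape (128 x 100), embedding width (32), and random weights are illustrative assumptions, and the sketch omits positional embeddings, the classification token, multiple heads and layers, and the ViT pretrained weights.

```python
import numpy as np

def split_into_patches(spec, patch=16, stride=10):
    """Cut a (freq, time) spectrogram into flattened, overlapping square patches."""
    n_freq, n_time = spec.shape
    patches = [
        spec[f:f + patch, t:t + patch].reshape(-1)
        for f in range(0, n_freq - patch + 1, stride)
        for t in range(0, n_time - patch + 1, stride)
    ]
    return np.stack(patches)  # shape: (num_patches, patch * patch)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)

# Placeholder input (assumption): random values standing in for a
# log-mel spectrogram with 128 frequency bins and 100 time frames.
spec = rng.standard_normal((128, 100))

tokens = split_into_patches(spec)  # one row per patch

# Hypothetical, randomly initialised weights; a real model would learn these.
d_model = 32
w_embed = rng.standard_normal((tokens.shape[1], d_model)) * 0.02
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

x = tokens @ w_embed  # linear patch embedding, no convolution involved
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over all patches, so a patch near the start of the recording can draw directly on a patch near the end in a single layer, without the stacked local receptive fields a convolutional model would need.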

Details

Title: Underwater acoustic target recognition on ShipsEar dataset
Creators: Tam Phi
Contributors: David Han (Advisor)
Awarding Institution: Drexel University
Degree Awarded: Master of Science (M.S.)
Publisher: Drexel University; Philadelphia, Pennsylvania
Number of pages: ix, 27 pages
Resource Type: Thesis
Language: English
Academic Unit: College of Engineering (1970-2026); Electrical (and Computer) Engineering (1970-2026); Drexel University
Other Identifier: 991021212415404721
