Musical structure analysis with object detection networks
Thesis   Open access

Christopher Uzokwe
Master of Science (M.S.), Drexel University
13 Dec 2021
DOI: https://doi.org/10.17918/00000676

Abstract

Musical composition; Electrical engineering; Neural networks (Computer science); Musical form; Pattern perception; Computer Vision; Machine Learning
Musical structure analysis is the segmentation of musical audio, with the goal of partitioning the music into its separate sections and labelling those sections (chorus, verse, intro, etc.). It is part of the larger family of music information retrieval (Music IR) tasks, whose goals range from artist, genre, or instrumentation identification to audio-to-lyrics alignment and more. To date, state-of-the-art performance in automatic musical structure analysis has been achieved using well-established acoustic features with boundary detection and clustering algorithms. In other Music IR tasks, there have been significant advances in performance using deep learning, specifically Convolutional Neural Networks, which were pioneered in image understanding problems. Many have examined images of self-similarity matrices, computed from a song's acoustic features (e.g., STFT, mel-spectrogram, and mel-frequency cepstral coefficients), which can provide a compelling visual representation of musical sections but remain difficult to segment automatically. In this thesis, a state-of-the-art object detection network is trained on self-similarity images to bound individual segments of a song. This is motivated by the similarity between bounding objects in object detection and bounding objects on the self-similarity matrix. The performance of a pre-built version of the network is assessed, as well as that of a highly modified version built from scratch. Characteristics of the network, and how they transfer from image recognition to musical structure analysis, are examined. Parameters and training conditions are optimized, and methods of improvement are discussed.
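
As a rough illustration of the self-similarity matrices mentioned in the abstract, the sketch below builds one from frame-wise features with NumPy. It is not taken from the thesis: the function name self_similarity_matrix is hypothetical, cosine similarity is an assumed choice of measure, and the "features" are synthetic vectors standing in for real acoustic features such as a mel-spectrogram.

```python
import numpy as np


def self_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Cosine self-similarity of a (n_frames, n_features) array.

    Hypothetical helper for illustration; the thesis may use a different
    similarity measure or feature pipeline.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero-norm frames
    return unit @ unit.T


# Synthetic stand-in for acoustic features: a toy song with A-B-A structure.
rng = np.random.default_rng(0)
section_a = rng.normal(size=8)
section_b = rng.normal(size=8)
frames = np.vstack([
    np.tile(section_a, (4, 1)),  # frames 0-3: section A
    np.tile(section_b, (3, 1)),  # frames 4-6: section B
    np.tile(section_a, (4, 1)),  # frames 7-10: section A repeated
])
frames = frames + 0.01 * rng.normal(size=frames.shape)  # small per-frame noise

ssm = self_similarity_matrix(frames)
```

In a matrix like this, frames belonging to the same section are highly similar to one another, so each section shows up as a bright square block along the diagonal, and a repeated section produces a matching off-diagonal block. Those blocks are the kind of region a bounding-box detector could plausibly be asked to localize.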

