Musical structure analysis is the segmentation of musical audio, with the goal of partitioning a piece of music into its separate sections and labelling those sections (chorus, verse, intro, etc.). It belongs to the larger family of music information retrieval (MIR) tasks, whose goals range from artist, genre, or instrumentation identification to audio-to-lyrics alignment and more. To date, state-of-the-art performance in automatic musical structure analysis has been achieved using well-established acoustic features combined with boundary detection and clustering algorithms. In other MIR tasks, there have been significant advances in performance using deep learning, specifically Convolutional Neural Networks (CNNs), which were pioneered in image understanding problems. Many researchers have examined images of self-similarity matrices computed from a song's acoustic features (e.g., the STFT, mel-spectrogram, and mel-frequency cepstral coefficients). These images provide a compelling visual representation of musical sections but remain difficult to segment automatically. In this thesis, a state-of-the-art object detection network is trained on self-similarity images to bound individual segments of a song. This approach is motivated by the similarity between bounding objects in object detection and bounding segments on the self-similarity matrix. The performance of a pre-built version of the network is assessed, as is that of a highly modified version built from scratch. Characteristics of the network are examined, along with how they transfer from image recognition to musical structure analysis. Parameters and training conditions are optimized, and methods of improvement are discussed.
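As a rough illustration of the representation the abstract describes, the sketch below builds a cosine self-similarity matrix from a precomputed feature array. It then turns segment boundaries into the square, on-diagonal boxes that an object detector could be trained to predict. This is a minimal sketch, not the thesis's own pipeline, and it rests on three assumptions the abstract does not specify. First, the similarity measure is cosine similarity. Second, the features arrive as a NumPy array of shape (n_features, n_frames). Third, the `boxes_from_boundaries` helper is hypothetical and exists only for illustration.

```python
import numpy as np

def self_similarity(features: np.ndarray) -> np.ndarray:
    """Cosine self-similarity matrix of a (n_features, n_frames) array.

    Entry (i, j) is the cosine similarity between frame i and frame j,
    so the result is a symmetric (n_frames, n_frames) matrix.
    (Cosine similarity is an assumed choice of measure.)
    """
    # Scale each frame (column) to unit length; the floor avoids
    # dividing by zero on silent, all-zero frames.
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit.T @ unit

def boxes_from_boundaries(boundaries: list[int]) -> list[tuple[int, int, int, int]]:
    """Hypothetical helper: map boundary frame indices to boxes.

    A segment spanning frames [start, end) shows up as a square block of
    high similarity on the diagonal, so its box in frame-edge coordinates
    (x_min, y_min, x_max, y_max) is (start, start, end, end).
    """
    return [(s, s, e, e) for s, e in zip(boundaries[:-1], boundaries[1:])]

# Toy example: 4 frames of "section A" followed by 4 frames of "section B".
a = np.array([[1.0], [0.0]])
b = np.array([[0.0], [1.0]])
features = np.hstack([a] * 4 + [b] * 4)   # shape (2, 8)
ssm = self_similarity(features)
boxes = boxes_from_boundaries([0, 4, 8])
print(ssm.shape)   # (8, 8)
print(boxes)       # [(0, 0, 4, 4), (4, 4, 8, 8)]
```

In practice, the feature array might come from an audio library. librosa, for instance, provides `librosa.feature.mfcc`. Real self-similarity images are far noisier than this toy case, with repeated sections also appearing as off-diagonal blocks. That noise is part of why segmenting them automatically is difficult.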
Details
Title
Musical Structure Analysis with Object Detection Networks
Creators
Christopher Uzokwe
Contributors
Youngmoo Kim (Advisor)
John MacLaren Walsh (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Master of Science (M.S.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
viii, 48 pages
Resource Type
Thesis
Language
English
Academic Unit
College of Engineering (1970-2026); Electrical (and Computer) Engineering [Historical]; Drexel University