Thesis
A machine learning and pitch-invariant approach to automatic music transcription for the saxophone
Master of Science (M.S.), Drexel University
May 2020
DOI: https://doi.org/10.17918/00000324
Abstract
This thesis addresses the automatic transcription of the saxophone in polyphonic music. The final approach uses a pitch-invariant feature we call "bands with combs". The feature is derived by locating the peak frequency of the enhanced autocorrelation spectrogram (as computed by Audacity) and then using that frequency to extract harmonic information from the regular spectrogram. A neural network and a recurrent neural network (RNN) with long short-term memory (LSTM) units are then trained on this feature and used for prediction. With this apparently novel approach to pitch-invariant training and to disregarding irrelevant audio, a note identification accuracy of 94% was achieved on real-world saxophone jazz examples with complex note articulations. This thesis also investigates alternative solutions such as high-confidence relearning, which attempts to become familiar with the sounds of the song being predicted on by incorporating high-confidence samples from that song into the training set. When iterated until convergence, this approach was particularly successful at removing note labels from non-note regions while still yielding an overall net gain in note labeling.
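The abstract only outlines how the "bands with combs" feature is built. The Python/NumPy sketch below is one plausible reading of that description, not the thesis's implementation. It estimates a frame's fundamental frequency with an enhanced autocorrelation modeled on the steps Audacity uses (cube-root-compressed power spectrum, transform back to the lag domain, clip at zero, subtract a copy stretched in time by a factor of 2, clip again), then samples the ordinary magnitude spectrum in narrow bands around the harmonics k·f0. The function names, the 50 to 1500 Hz search range, the eight harmonics, the ±20 Hz band width, the zero-padding and the sum-to-one normalization are all assumptions made for illustration.

```python
import numpy as np


def enhanced_autocorrelation(frame):
    """Enhanced autocorrelation of one audio frame (after Tolonen & Karjalainen, 2000)."""
    n = len(frame)
    windowed = frame * np.hanning(n)
    # Power spectrum, zero-padded to 2n so the autocorrelation is not circular
    # (an added choice, not something Audacity does).
    power = np.abs(np.fft.rfft(windowed, n=2 * n)) ** 2
    # Cube-root compression of the power, then back to the lag domain.
    ac = np.fft.irfft(power ** (1.0 / 3.0))[:n]
    ac = np.maximum(ac, 0.0)
    # Subtract a copy stretched in time by 2 (linear interpolation) to prune
    # the peak at twice the true period (an octave error), then clip again.
    stretched = np.interp(np.arange(n) / 2.0, np.arange(n), ac)
    return np.maximum(ac - stretched, 0.0)


def estimate_f0(frame, rate, fmin=50.0, fmax=1500.0):
    """Frequency of the strongest enhanced-autocorrelation peak in [fmin, fmax],
    or None when there is no positive peak (e.g. a silent frame)."""
    eac = enhanced_autocorrelation(frame)
    lo = int(np.ceil(rate / fmax))
    hi = min(int(np.floor(rate / fmin)), len(eac) - 1)
    region = eac[lo:hi + 1]
    if region.max() <= 0.0:
        return None
    return rate / (lo + int(np.argmax(region)))


def comb_band_features(frame, rate, f0, n_harmonics=8, half_width_hz=20.0):
    """Hypothetical 'bands with combs' vector: the peak magnitude of the regular
    spectrum inside a narrow band around each harmonic k * f0, normalized to sum to 1."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    feats = np.zeros(n_harmonics)
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * f0) <= half_width_hz
        if band.any():  # bands above the Nyquist frequency stay 0
            feats[k - 1] = mag[band].max()
    total = feats.sum()
    return feats / total if total > 0 else feats
```

Because each entry is indexed by harmonic number rather than by absolute frequency, the same timbre played at different pitches yields similar vectors, which is the property a pitch-invariant feature needs. A per-frame vector like this would then be the input sequence for the neural network or LSTM.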
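High-confidence relearning, as the abstract describes it, resembles self-training (pseudo-labeling): predict on the song, adopt the frames predicted with high confidence as extra training data, retrain, and repeat until no new frames qualify. The sketch below shows that loop under stated assumptions. It uses a tiny nearest-centroid classifier only so that the example is self-contained (the thesis trained neural networks), and the class name `NearestCentroid`, the function name `high_confidence_relearning`, the 0.9 threshold and the round limit are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in classifier (hypothetical; the thesis used neural networks).
    Confidence is a softmax over negative squared distances to class centroids."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        logits = -d - (-d).max(axis=1, keepdims=True)  # stabilize the softmax
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)


def high_confidence_relearning(model, X_train, y_train, X_song, threshold=0.9, max_rounds=20):
    """Repeatedly move song frames whose top predicted probability is >= threshold
    into the training set (with their predicted labels) and retrain, until no new
    frames qualify. Returns the final model and its labels for every song frame."""
    X, y = X_train.copy(), y_train.copy()
    remaining = np.ones(len(X_song), dtype=bool)  # song frames not yet adopted
    for _ in range(max_rounds):
        model.fit(X, y)
        if not remaining.any():
            break
        proba = model.predict_proba(X_song[remaining])
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # converged: nothing new to add
        idx = np.flatnonzero(remaining)[confident]
        X = np.concatenate([X, X_song[idx]])
        y = np.concatenate([y, model.classes_[proba[confident].argmax(axis=1)]])
        remaining[idx] = False
    else:
        model.fit(X, y)  # round limit hit right after adding frames: refit once more
    return model, model.classes_[model.predict_proba(X_song).argmax(axis=1)]
```

Any model exposing `fit`, `predict_proba` and a `classes_` attribute could be swapped in. Adopted frames are never relabeled in this sketch; the abstract does not say whether earlier pseudo-labels were re-scored in later rounds.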
Details
- Title
- A machine learning and pitch-invariant approach to automatic music transcription for the saxophone
- Creators
- Michael Bok
- Contributors
- Brian Stuart (Advisor)
- Awarding Institution
- Drexel University
- Degree Awarded
- Master of Science (M.S.)
- Publisher
- Drexel University; Philadelphia, Pennsylvania
- Number of pages
- viii, 54 pages
- Resource Type
- Thesis
- Language
- English
- Academic Unit
- Computer Science (Computing) (2013-2026); College of Computing and Informatics (2013-2026); Drexel University
- Other Identifier
- 991014915049204721