A machine learning and pitch-invariant approach to automatic music transcription for the saxophone
Thesis   Open access

Michael Bok
Master of Science (M.S.), Drexel University
May 2020
DOI: https://doi.org/10.17918/00000324

Abstract

Keywords: Music transcription, Saxophone, Machine Learning
This paper addresses the automatic transcription of the saxophone in polyphonic music. The most successful solution uses a pitch-invariant feature we call "bands with combs". The feature is derived by locating the peak frequency of the enhanced autocorrelation spectrogram (as implemented in Audacity) and then using that peak to extract the harmonic information from the regular spectrogram. A neural network and an RNN with LSTM are then trained on this feature and used for prediction. With this seemingly novel approach to pitch-invariant training and to disregarding irrelevant audio, we observed a 94% note identification accuracy on real-world saxophone jazz examples with complex note articulations. This paper also looks into alternative solutions such as high-confidence relearning, which attempts to familiarize the model with the sounds of the song being predicted on by incorporating high-confidence samples from that song into the training set. When run until convergence, this approach was particularly successful at removing note labels from non-note areas while still maintaining an overall positive gain in note labeling.
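
The idea of a pitch-invariant harmonic feature can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes plain autocorrelation as a stand-in for Audacity's enhanced autocorrelation, a single analysis frame rather than a full spectrogram, and a toy synthetic tone. The function names (synth_tone, autocorr_peak_f0, comb_feature) and the parameters (n_harmonics, half_band, fmin, fmax) are hypothetical and chosen only for illustration.

```python
import numpy as np

def synth_tone(f0, sr=16000, n=1600, n_harmonics=8):
    """Toy harmonic tone: partial k has amplitude 1/k (illustrative assumption)."""
    t = np.arange(n) / sr
    return sum(np.sin(2 * np.pi * k * f0 * t) / k
               for k in range(1, n_harmonics + 1))

def autocorr_peak_f0(frame, sr, fmin=100.0, fmax=1000.0):
    """Estimate the fundamental from the autocorrelation peak.

    Plain autocorrelation is used here as a simplification; Audacity's
    enhanced autocorrelation adds further processing not reproduced here.
    """
    frame = frame - frame.mean()
    n = len(frame)
    # Autocorrelation via the power spectrum, zero-padded to avoid wrap-around.
    power = np.abs(np.fft.rfft(frame, 2 * n)) ** 2
    ac = np.fft.irfft(power)[:n]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), n - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / lag

def comb_feature(frame, sr, f0, n_harmonics=8, half_band=1):
    """Read the magnitude spectrum in a narrow band around each multiple of f0.

    The output is indexed by harmonic number rather than by frequency, so
    the same timbre at a different pitch yields a similar vector.
    """
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bin_width = sr / len(frame)
    feats = np.zeros(n_harmonics)
    for k in range(1, n_harmonics + 1):
        target = k * f0
        if target >= sr / 2:  # stop at the Nyquist frequency
            break
        idx = int(round(target / bin_width))
        lo = max(idx - half_band, 0)
        hi = min(idx + half_band + 1, len(mag))
        feats[k - 1] = mag[lo:hi].max()
    total = feats.sum()
    return feats / total if total > 0 else feats

# Same toy timbre at two pitches an octave apart.
sr = 16000
low, high = synth_tone(200.0, sr), synth_tone(400.0, sr)
f0_low, f0_high = autocorr_peak_f0(low, sr), autocorr_peak_f0(high, sr)
feat_low = comb_feature(low, sr, f0_low)
feat_high = comb_feature(high, sr, f0_high)
```

In this sketch the two tones peak in different spectrogram bins, yet feat_low and feat_high come out nearly identical because each is expressed relative to its own estimated fundamental. Vectors of this kind, one per frame, are the sort of input a classifier could then be trained on. The actual "bands with combs" feature may be constructed differently.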
