Audio Fingerprinting with Robustness to Pitch Scaling and Time Stretching
Abstract
Current audio fingerprinting systems
are becoming increasingly robust against noise and
filter distortions; however, songs that have been
pitch scaled and time stretched are still likely to
pass undetected. This research focuses on
expanding an existing landmark-based
fingerprinting method to identify songs that have
been pitch scaled and time stretched to escape
current systems while still sounding natural to the
human ear. Two feature extraction methods have
been explored with the purpose of resolving each
task individually. The constant Q spectrogram was
used for feature extraction, instead of a
conventional spectrogram, to identify songs that
have been pitch scaled. Mel-frequency Cepstral
Coefficients were used as features to identify songs
that have been time stretched. The goal is to verify whether low-level
spectral-based features alone are capable of
handling such transformations in a song instead of
needing to use mid-level or high-level musical
features as is the case with other Song ID methods.
Key Terms - Audio Fingerprinting, Feature
Extraction, Music Information Retrieval, Music
Similarity.
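
To illustrate why a constant Q representation may suit the pitch-scaling task, the following minimal sketch compares how far a spectral peak moves, in bins, when its frequency is multiplied by a fixed ratio. It is not the implementation used in this research: the function names, the 55 Hz minimum frequency, the 12 bins per octave, the 8000 Hz sample rate, and the 1024-point FFT size are all assumptions chosen for illustration. On a log-spaced (constant Q) axis every peak shifts by the same number of bins, whereas on a linear FFT axis the shift grows with frequency.

```python
import numpy as np

def constant_q_bin(freq_hz, f_min=55.0, bins_per_octave=12):
    """Fractional bin index on a log-spaced (constant Q) frequency axis.
    f_min and bins_per_octave are illustrative assumptions."""
    return bins_per_octave * np.log2(np.asarray(freq_hz, dtype=float) / f_min)

def linear_bin(freq_hz, sample_rate=8000, n_fft=1024):
    """Fractional bin index on a linear FFT frequency axis.
    sample_rate and n_fft are illustrative assumptions."""
    return np.asarray(freq_hz, dtype=float) * n_fft / sample_rate

def bin_shifts(freqs_hz, ratio):
    """Return (constant Q shift, linear shift) in bins for each peak
    when every frequency is multiplied by `ratio` (pitch scaling)."""
    freqs = np.asarray(freqs_hz, dtype=float)
    cq = constant_q_bin(freqs * ratio) - constant_q_bin(freqs)
    lin = linear_bin(freqs * ratio) - linear_bin(freqs)
    return cq, lin

# Hypothetical spectral peaks, pitch scaled up by one semitone.
peaks = [110.0, 440.0, 1760.0]
cq_shift, lin_shift = bin_shifts(peaks, ratio=2 ** (1 / 12))
print("constant Q shift (bins):", cq_shift)
print("linear FFT shift (bins):", lin_shift)
```

Under these assumptions, a landmark formed from a pair of peaks keeps the same bin difference on the constant Q axis after pitch scaling, since both peaks move by the same offset.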