Jamify - Model Details


The goal is to build a recommendation engine specifically for musicians: given an input band, it recommends songs along with the musical notation to play them. The idea behind this project is that learning to play songs is often motivated by listening to artists you really enjoy; however, when looking for the music notation for these artists, it can either not exist or require a skill level beyond my own.

The Data

To solve this problem I combine two datasets. For the recommendation engine, the user-song listening histories come from a subset of the Million Song Dataset, which in full is over 500 GB; I use just 2.4 GB, which equates to about 17 million entries. The corresponding metadata comes from The Echo Nest, which has since been acquired by Spotify.

The data on the music notation comes from scraping over 1 million webpages from ultimate-guitar.com, which includes music notation for the guitar, ukulele, bass guitar, drums, and violin. Most look like the following:



Out of all the music notation scraped the distribution was as follows:

Roughly 36% of the music notation is labelled as novice, intermediate, or advanced, so my goal is to label the remainder using a classification model I will build.


I've had experience with building recommendation systems before, as can be seen in R here, and using Spark here. The reason for using Python is that I will eventually be incorporating the results into a PostgreSQL database and then using Flask to serve the results on this website, so doing everything in one language would be nice. I would also like the experience and challenge of building recommendation engines in Python to compare with other languages.

I will use logistic regression to make the classifications because it will be extremely insightful to discover what makes songs classified as novice easy to play, and what makes songs classified as advanced hard to play. These results could be used for teaching applications in further studies.

The Recommendation Engine

From the user-song-plays dataset I will create a sparse matrix using SciPy's built-in functions. From this I will use latent semantic analysis to quickly perform cosine similarity via singular value decomposition (SVD). The dataset contains over 200,000 unique users and 300,000 unique songs, so a truncated SVD produces two factor matrices, each with one highly reduced dimension. For example, if we reduce to rank 50, we end up with two matrices of roughly 200,000 × 50 and 50 × 300,000, whose matrix product approximates the original matrix. The point is that any cosine similarity we want to compute can now be done on one of these much smaller matrices, and since matrix multiplication is worse than O(n²) even with the Coppersmith–Winograd algorithm (roughly O(n^2.376), which is pretty much as good as it gets), any reduction in size is highly beneficial.
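The idea above can be sketched in a few lines. This is a minimal toy example, not the production pipeline: the matrix here is small and random, whereas the real one is a 200,000 × 300,000 sparse play-count matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy user x song play-count matrix (rows = users, cols = songs).
rng = np.random.default_rng(0)
plays = csr_matrix(rng.integers(0, 3, size=(200, 300)).astype(float))

# Truncated SVD to rank k: plays is approximately U @ diag(s) @ Vt.
k = 20
U, s, Vt = svds(plays, k=k)

# Song factors live in the columns of Vt, so song-song cosine
# similarity is now a k-dimensional dot product instead of a
# full 200-dimensional one.
song_factors = Vt.T                      # shape (n_songs, k)
norms = np.linalg.norm(song_factors, axis=1, keepdims=True)
norms[norms == 0] = 1.0                  # guard against empty songs
unit = song_factors / norms
sim = unit @ unit.T                      # song-song cosine similarity
print(sim.shape)                         # (300, 300)
```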

Jeremy Kun has a great explanation of the subject here that I will not try to outdo, for those who want a more in-depth understanding. Implementation in Python was aided by Ben Frederickson's great blog post.

Along with SVD I also wanted to try implicit matrix factorization (MF), described in this paper by Chris Johnson; credit to Ethan Rosenthal for pointing me to the implementation.

I use song-song cosine similarity to determine which songs are closest to the songs in each user's listening history, rank them by similarity, and recommend the top k songs.
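As a sketch of that ranking step, one simple way to aggregate a user's history is to sum each candidate song's similarity to every song the user has played, mask out the history itself, and take the top k (the aggregation rule here is illustrative):

```python
import numpy as np

def recommend_top_k(sim, history, k=5):
    """Rank songs by summed cosine similarity to the user's history.

    sim: (n_songs, n_songs) song-song similarity matrix.
    history: indices of songs in the user's listening history.
    """
    scores = sim[history].sum(axis=0)      # aggregate similarity
    scores[history] = -np.inf              # never re-recommend history
    return np.argsort(scores)[::-1][:k]    # top-k song indices

# Tiny worked example: songs 0 and 1 are near-duplicates, so a user
# who has only played song 0 gets song 1 ranked first.
sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(recommend_top_k(sim, [0], k=2))
```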

To confirm that I have not just built a random song generator, I validate the model with leave-one-out evaluation: for each user I hold out one song from their history and try to predict it. I use the mean average precision (MAP) over the top k = 100 ranked songs (MAP@100) for 10,000 users. For a frame of reference: if MAP@100 equals 1, the algorithm always recommends the held-out song at the top of the list; if MAP@100 is zero, the held-out song never appears anywhere in the top 100; and if MAP@100 is 0.01, the held-out song is, on average, ranked around 100 on the list suggested for each user.
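With exactly one held-out relevant song per user, each user's average precision reduces to 1/rank of that song (or 0 if it is outside the top k), so MAP@k can be computed as in this small sketch:

```python
def map_at_k(ranked_lists, held_out, k=100):
    """MAP@k for leave-one-out evaluation: one relevant song per user,
    so each user's average precision is 1/rank of the held-out song,
    or 0 if it does not appear in the top k."""
    total = 0.0
    for recs, target in zip(ranked_lists, held_out):
        topk = recs[:k]
        if target in topk:
            total += 1.0 / (topk.index(target) + 1)
    return total / len(ranked_lists)

# Two users: target ranked 1st for one, absent for the other.
print(map_at_k([[7, 3, 9], [4, 5, 6]], [7, 99], k=3))  # 0.5
```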

The results compare SVD at three different ranks, implicit matrix factorization, and a random song picker, the last of which verifies that my algorithms were not simply picking songs at random. The MAP@100 results are shown below:

The MAP@100 for the random song picker was 1.7424e-6 for 10,000 users, so my algorithms bested the random song picker by four orders of magnitude. It is also worth noting that every doubling of the SVD rank, e.g. 25 → 50, increases the running time by more than a factor of 2 due to all the matrix multiplications, so some trade-off is required.

The Most General Song

As an interesting thought experiment I decided to fit the huge matrix of songs and users by decomposing with SVD at rank 1. This essentially gives the most general song, the one that maximizes cosine similarity with all other songs.

The result is the same song recommendation for every user, which was Marvin Sease's Show Me What You Got.
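The rank-1 experiment can be sketched as follows: every song collapses onto a single latent direction, and the song with the largest loading on that direction is the "most general" one. The data here is random stand-in data, so the winner is meaningless; on the real play matrix it was the Marvin Sease track above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Random stand-in for the user x song play matrix.
rng = np.random.default_rng(1)
plays = csr_matrix(rng.integers(0, 4, size=(100, 50)).astype(float))

# Rank-1 SVD: one latent direction for all songs. The song with the
# largest (absolute) loading is recommended to every single user.
U, s, Vt = svds(plays, k=1)
most_general = int(np.argmax(np.abs(Vt[0])))
print(most_general)
```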

User-User versus Item-Item Similarity

Next I wanted to compare item-item similarity with user-user similarity. User-user similarity works by finding similar users and recommending songs from their listening histories.

The results I achieved are shown below.

Again, for reference, the MAP@100 for the random song generator is 1.7424e-6. The songs recommended from item-item similarity were in general better than those from user-user similarity, with a MAP@100 between 3 and 4 times higher, which is a relief since user-user similarity takes about 10 times longer to compute.

The model I settled on is SVD with rank 50, because my ultimate goal is to precompute the song similarities and add them to a database, so that the website is fast for users. Precomputing all the similarities would take a lot of time (around 1.5 to 2 days) with SVD at rank 100 or implicit MF, so until I can find a free weekend I will use SVD at rank 50, since it can be done overnight.

Rating Difficulty Levels of Songs

Guitar & Ukulele Chords

Rating the difficulty consisted of coming up with features that I determined to be difficult for each instrument, and constructing those features from the messy and inconsistent data scraped from ultimate-guitar.com. For the music notation consisting of chords (the tablature types labelled chords and ukulele chords), the chords themselves became natural and intuitive features, and because over 400 chords each appear in more than 100 unique songs, they make good features.

As some exploratory data analysis I looked at the count distribution of the chords observed in the dataset, where the different colors represent the various chord families and their size reflects their count frequency. We see that the major and minor chords are largest since they appear most often, while the more obscure chords appear less often.

We can also see how guitar chords progress within songs. Below is a plot of which chords follow each other within songs.

From this plot we can identify popular chord progressions, including the most popular chord progression in western music history, D-A-Bm-G, which appears in a range of songs from Journey's Don't Stop Believin' to MGMT's Kids to A-Ha's Take on Me.

Guitar, Bass, & Drum Tabs

Tablature notation was a little different due to the inconsistent nature of user-submitted tabs: though there is a general standard format for composing tablature, each user varied their submission with their own unique style. This made feature extraction challenging.

The features for the instruments I used were:

| Guitar & Bass | Drums |
| --- | --- |
| Normalized count of notes on each fret | Percent bass drum |
| Count of notes in quick succession | Percent cymbal |
| Number of hammer-ons | Percent high-hat |
| Number of pull-offs | Percent snare |
| Number of hammer-ons immediately followed by a pull-off | Percent tom |
| Number of slides from one note to another | Percent floor |
| Number of times the note is sustained | |
| Mean fret of the song | |
| Standard deviation of the frets | |
| Percent of notes played | |
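A few of the fret-based features in the table above are simple statistics once a tab has been parsed into a sequence of fret numbers. This sketch computes some of them; the hard part, parsing the inconsistent tab text into fret numbers, is assumed to have happened already:

```python
import numpy as np

def tab_features(frets):
    """A few of the guitar/bass features listed above, computed from a
    parsed sequence of fret numbers. Parsing the raw tab text into
    this sequence is the genuinely messy step and is assumed done."""
    frets = np.asarray(frets, dtype=float)
    return {
        "mean_fret": float(frets.mean()),
        "std_fret": float(frets.std()),
        # Normalized count of notes on each fret (frets 0..max).
        "fret_hist": np.bincount(frets.astype(int)) / len(frets),
    }

feats = tab_features([0, 2, 2, 0, 3, 5, 7, 5])
print(feats["mean_fret"])  # 3.0
```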

I trained the model using 10-fold cross-validated logistic regression with an l2 regularization penalty. I compared l1 and l2 regularization and l2 out-performed l1 (accuracies are shown later). I wanted to use logistic regression for the interpretability of the model. The model accuracies for the various instruments were:

| Instrument | Accuracy |
| --- | --- |
| Guitar Chords | 76.9% |
| Ukulele Chords | 77.7% |
| Guitar Tabs | 52.5% |
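The cross-validated l2-penalized logistic regression described above can be sketched as follows; the feature matrix and labels here are random placeholders for the real chord-count features and novice/intermediate/advanced labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the chord-count feature matrix and the
# novice/intermediate/advanced labels (encoded 0/1/2).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(300, 40)).astype(float)
y = rng.integers(0, 3, size=300)

# L2-penalized logistic regression, scored with 10-fold CV.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean(), scores.std())
```

With random labels the accuracy hovers around chance (about a third); on the real chord features it reached the figures in the table above.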

The ROC curves obtained from the model are shown below for ukulele chords, and guitar chords which performed well:

The low resolution for the advanced curves is due to the lack of advanced classified observations in the training and test set.

Other Models

The training data was fitted using a variety of other models, and the resulting accuracies on the validation and test sets are shown below. The models were fitted on 20,000 guitar tabs.

| Model | 5-fold CV Accuracy | Test Accuracy |
| --- | --- | --- |
| Logistic Regression - L1 penalty | 43.063 ± 10.614% | 47.618% |
| Logistic Regression - L2 penalty | 48.025 ± 5.789% | 49.512% |
| Nearest Neighbours | 49.788 ± 1.449% | 50.800% |
| Linear SVM | 49.940 ± 2.034% | 49.266% |
| RBF SVM | 57.712 ± 2.420% | 57.052% |
| Decision Trees | 57.871 ± 0.806% | 56.733% |
| Random Forest | 58.573 ± 1.457% | 58.807% |
| Adaboost | 59.331 ± 1.326% | 60.179% |
| Naive Bayes | 50.850 ± 2.180% | 51.659% |
| Quadratic Discriminant | 47.618 ± 9.749% | 50.479% |
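A model comparison like the table above boils down to looping several classifiers through the same cross-validation. A minimal sketch with a few of the models, again on placeholder data rather than the real tab features:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder features/labels standing in for the 20,000 guitar tabs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 3, size=200)

# Same 5-fold CV applied to each candidate model.
models = {
    "logreg_l2": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
print(results)
```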

We can also plot the learning curves and the ROC curves of the model used, logistic regression with l2 penalty, and adaboost.

Logistic Regression


We can see that the accuracy levels off once the sample size reaches around 13,000. The models also have a hard time predicting the intermediate class of music notation. However, this can be overcome by predicting the novice and advanced categories well; whatever remains is then determined to be intermediate.

Interpretation of Coefficients

As a sanity check we can look at the coefficients for ukulele chords in the novice classification. The features for the model are all the ukulele chords that appear in more than 100 of the over 140,000 total ukulele songs. I hope to understand which chords make easy songs easy to play; these should have high positive coefficients. Chords that make a song "not easy" to play will have a high negative coefficient. This is slightly different from making the song hard to play, which would correspond to a high positive coefficient in the advanced category.

Coefficients for novice classification

C predicted easy

Abm7 not predicted easy

We can see that the model performs in line with intuition. Chords that appear easy to play, e.g. the C chord, make the song easy to play.
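Reading coefficients off a fitted model is straightforward: pair each chord feature with its weight and sort. A sketch on made-up data (the chord names and labels here are illustrative, not the fitted Jamify model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative chord features and a binary novice / not-novice label.
chords = ["C", "G", "Am", "F", "Abm7", "F#m7b5"]
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(120, len(chords))).astype(float)
y = rng.integers(0, 2, size=120)          # 1 = novice, 0 = not novice

# One coefficient per chord: positive pushes a song toward "novice",
# negative pushes it away.
clf = LogisticRegression(max_iter=1000).fit(X, y)
ranked = sorted(zip(chords, clf.coef_[0]),
                key=lambda pair: pair[1], reverse=True)
for chord, coef in ranked:
    print(f"{chord:8s} {coef:+.3f}")
```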


To conclude, Jamify is a song recommender that uses SVD of rank 50 to recommend songs based on song-song similarity with the bands input by the user. This algorithm was chosen as a trade-off between speed and performance in the recommendation engine. Of the algorithms explored, if I had the time to put into the database I would use implicit matrix factorization, as it performed best on the test set with the largest MAP@100 score.

Logistic regression is used to calculate the difficulty of the unlabelled music notation, and was chosen for its interpretability. Those interested in understanding how this model applies to other songs can easily see which chords or techniques make songs easy or difficult to play. If I were concerned only with accuracy I would use the Adaboost classifier, due to its performance being the highest of all classifiers used. However, it should be noted that the existing difficulty labels may not be ground truth, because each is just a single user's determination of the song's difficulty level. Because of this, aiming for a perfect fit may not be the best use of time.

Overall this project was extremely fun to implement. If you have any questions or comments, you can forward them through my available channels below. And don't forget to ROCK ON