Embed music albums into a latent space leveraging their reviews by Pitchfork.

hardyqr/pitchfork-album-embedding

Pitchfork Album Embeddings

sample

[demo] (built upon TensorFlow's Standalone Embedding Projector)

Data

[download] [Reddit discussion]

This dataset contains 20,873 Pitchfork review articles published between Jan. 5, 1999 and Jan. 11, 2019, with fields such as genre, date, score, and review content. Each review article corresponds to one album.

Method

Step 1: create a weighted graph of albums

We first create a weighted graph in which each node represents an album. We compute a TF-IDF vector for every review and compare each pair of reviews, which gives a pairwise similarity matrix over all reviews (and therefore over all albums). Note that we do not use state-of-the-art text representation models such as BERT, because such representations carry little information about the specific entities mentioned, which can be crucial for clustering related albums. Based on the intuition that an album's review is likely to mention similar artists and albums, we use TF-IDF to capture the keywords that matter for describing the similarity of two albums. Finally, we set a threshold and drop every edge whose weight falls below it, which yields a sparser but more informative weighted graph.
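The graph construction can be sketched in plain Python as follows. This is a minimal illustration, not the code used in this repository: the whitespace tokenizer, the smoothed IDF formula, the choice of cosine similarity, the threshold value of 0.2, and the three toy reviews are all assumptions made for the example. On the full dataset a library vectorizer (for instance scikit-learn's `TfidfVectorizer`) would be the more practical choice.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: the number of reviews each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (assumption): rarer terms get larger weights.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def build_graph(docs, threshold):
    """Return {(i, j): weight} for every pair whose similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


# Toy reviews (made up for illustration).
reviews = [
    "shoegaze guitars recalling my bloody valentine and slowdive",
    "dream pop indebted to slowdive and my bloody valentine",
    "boom bap beats with dense rhymes over jazz samples",
]
edges = build_graph(reviews, threshold=0.2)
print(edges)
```

Only the first two reviews share terms (the artist names), so they are the only pair connected by an edge; the third review stays isolated. This mirrors the intuition above that shared entity mentions drive the similarity.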

Step 2: node2vec on the weighted graph

We apply weighted node2vec on the obtained graph using the default hyperparameters.
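To make the walk step concrete, here is a minimal sketch of the weighted, second-order biased random walk at the core of node2vec. It is an illustration under stated assumptions rather than the implementation used here: the function names, the toy edge weights, and the walk length are hypothetical, and `p = q = 1` is used to match the defaults of the reference node2vec implementation.

```python
import random


def to_adjacency(edges):
    """Turn {(i, j): weight} into an undirected adjacency dict node -> {neighbor: weight}."""
    graph = {}
    for (i, j), w in edges.items():
        graph.setdefault(i, {})[j] = w
        graph.setdefault(j, {})[i] = w
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Sample one walk of up to `length` nodes, biased by edge weights and by p and q."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break  # dead end: isolated node
        if len(walk) == 1:
            # First step: no previous node, so use the edge weights alone.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)  # return to the previous node
                elif n in graph[prev]:
                    weights.append(w)      # stay close to the previous node
                else:
                    weights.append(w / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy weighted graph (made-up similarity values).
edges = {(0, 1): 0.45, (1, 2): 0.30, (2, 3): 0.25}
graph = to_adjacency(edges)
rng = random.Random(0)
walks = [node2vec_walk(graph, node, length=5, rng=rng) for node in graph for _ in range(2)]
print(walks)
```

In node2vec, the sampled walks are then treated as sentences and passed to a skip-gram model (for example gensim's `Word2Vec`) to obtain one embedding vector per node, which here means one vector per album.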
