Building a Neo4j Recommendation System with a Cypher Query

May 11: Melodiya, One Month Graph Challenge

Vlad Batushkov
6 min read · May 11, 2019

Welcome word

In this series of short posts I build one simple graph a day. The domain model of each graph is related in some way to that day in history: a historical event, a celebration or a person. I am doing this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge

Domain model

55 years ago, on May 11, 1964, “Melodiya” (Melody), the main gramophone record company of the USSR, was founded. Melodiya records were a window into the world of music for millions of Soviet citizens and a way for performers to become famous. To be honest, Melodiya held a monopoly on sound recording in the USSR, which is certainly not a good thing. I remember that in my childhood my family kept a cardboard box full of vinyl records, and I guess 99% of them were made by Melodiya.

Today, inspired by the music theme, I plan to build a graph of music genres and bands. I will simulate some user preferences and write a recommendation query for myself, based on the preferences of other people whose music interests are similar to mine.

Today’s schema is as simple as possible. People like music bands, and bands play music of some genre. That is it.
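As a minimal sketch, the schema can be written down as one tiny sample in Cypher. The LIKES and OF relationship types and the name property on Person and Band come from the queries later in this post; the name property on Genre and the value "rock" are assumptions made for illustration.

```cypher
// Illustrative sample only: one person, one band, one genre.
// Assumption: Genre nodes are identified by a `name` property.
MERGE (g:Genre {name: "rock"})
MERGE (b:Band {name: "Queen"})
MERGE (p:Person {name: "vlad batushkov"})
// A band plays music OF some genre.
MERGE (b)-[:OF]->(g)
// A person LIKES a band.
MERGE (p)-[:LIKES]->(b)
```

MERGE is used instead of CREATE so that running the sample twice does not duplicate nodes or relationships.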

Graph

The last thing I want to do is spend time filling the graph with data. Thank you very much, last.fm, for such a well-structured website that is easy to parse. They also have an API, but I don’t think using it would be faster than APOC’s load html procedure (apoc.load.html).

Now I have the initial picture: the 19 most valuable music genres. Great choice, great results. Next I need to add bands to each of them. After a little research on last.fm, I found an easy way to do that too: parse the first page of bands playing in a particular genre.
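A rough sketch of how this parsing step might look with apoc.load.html. The URL pattern, the CSS selector and the assumption that Genre.name matches the tag in the URL are all placeholders for illustration, not verified against last.fm’s real markup, so expect to adjust them.

```cypher
// Sketch only. Assumptions: Genre.name can be used as the tag in the URL,
// and "h3.artist-name a" is a hypothetical selector for band links.
MATCH (g:Genre)
WITH g, "https://www.last.fm/tag/" + g.name + "/artists" AS url
// apoc.load.html returns a map with one list of elements per selector key.
CALL apoc.load.html(url, {bands: "h3.artist-name a"}) YIELD value
UNWIND value.bands AS band
// Each element exposes its text; MERGE keeps bands unique across genres.
MERGE (b:Band {name: band.text})
MERGE (b)-[:OF]->(g)
```

Because the same band can appear on the pages of several genres, MERGE on the band name is what produces the shared nodes between genres.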

As you can see, it is already fun. Some bands play in several genres, which is why we see a net of nodes connected to each other here.

The first idea that comes to mind is to find the bands that play in the largest number of genres. Let’s call them «the most mix-genre bands».
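One possible way to write such a query, assuming Band nodes are connected to Genre nodes by OF relationships as in the rest of this post (the LIMIT of 10 is an arbitrary choice):

```cypher
// Count how many distinct genres each band is connected to.
MATCH (b:Band)-[:OF]->(g:Genre)
WITH b, count(DISTINCT g) AS genres
RETURN b.name AS band, genres
ORDER BY genres DESC
LIMIT 10
```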

Radiohead, Coldplay and Muse are our mixed-genre leaders.

It could also be interesting to find the genres whose bands are, at the same time, connected to other genres. Let’s call them «the most mix-band genres».
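A sketch of one query shape that fits this description, assuming a single OF relationship per band and genre pair (the LIMIT of 10 is arbitrary):

```cypher
// For each genre, count the bands that also belong to at least one other genre.
// Cypher never reuses a relationship within one pattern, so `other` is a
// different genre than `g` as long as there is one OF per band/genre pair.
MATCH (g:Genre)<-[:OF]-(b:Band)-[:OF]->(other:Genre)
WITH g, count(DISTINCT b) AS mixed_bands
RETURN g.name AS genre, mixed_bands
ORDER BY mixed_bands DESC
LIMIT 10
```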

It is very important to use DISTINCT in the count function. Otherwise the query counts the same band N times, once for each of the N other genres connected to that band. Alternative, Rock and Hip-hop are our most mixed-band genres these days.

OK. Now the hardest part. I need to add some people and their preferences. I went back to the website and found a special tab on each band’s page with a list of Listeners. I think it is a gold mine. Those lists should give me a good amount of data about the bands’ fans. I am also almost sure that they contain people who listen to many bands. Otherwise, my hopes will be crushed.

My advice is to control and limit resources: it is better to parse the listeners of each band only for some specific genres. You can also run the query without any condition, but then it will run forever and then some. I picked only the 6 genres that I prefer: Indie, Rock, British, Alternative, Metal, Electronic.

MATCH (p:Person)-[:LIKES]->(b:Band)
WITH p, count(b) as bands_likes
RETURN p.name as name, bands_likes
ORDER BY bands_likes DESC

As you can see, even in such a small portion of data there is an active audience. Now it is time to create my “avatar” with connections to the bands that I like (here are all that I can remember at the moment):

// Lower-cased names of the bands I like
WITH ["system of a down", "linkin park", "franz ferdinand", "oasis", "the killers", "arctic monkeys", "daft punk", "chemical brothers", "underworld", "kasabian", "queen", "red hot chili peppers", "the strokes", "foals", "the black keys", "rammstein", "imagine dragons", "coldplay"] AS favorites
MATCH (b:Band)
// Case-insensitive match of band names against the list
WHERE toLower(b.name) IN favorites
// Create my Person node once, then link it to every matched band
MERGE (p:Person { name: "vlad batushkov" })
MERGE (p)-[:LIKES]->(b)

This is not the full picture, but this list is enough to recognize my music preferences.

MATCH (p:Person { name: "vlad batushkov" })-[:LIKES]->(b:Band)-[:OF]->(g:Genre) RETURN p, b, g

Finally, I want to answer the main question: which bands can be recommended to me? OK, how do I solve this task? One way is to find all the people who listen to the same bands as I do, and then, among these people, find the bands (excluding the ones I already like) with the largest number of followers.
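A minimal sketch of that approach in Cypher. It assumes the Person, Band and LIKES names used earlier in this post and one LIKES relationship per person and band pair; the LIMIT of 10 is arbitrary.

```cypher
// Walk from me to the bands I like, to other people who like them,
// and on to the bands those people like.
MATCH (me:Person {name: "vlad batushkov"})-[:LIKES]->(:Band)<-[:LIKES]-(other:Person)-[:LIKES]->(rec:Band)
// Exclude the bands that I already like.
WHERE NOT (me)-[:LIKES]->(rec)
// DISTINCT: a person who shares several bands with me still counts once.
RETURN rec.name AS band, count(DISTINCT other) AS followers
ORDER BY followers DESC
LIMIT 10
```

Every like-minded person gets the same weight here, no matter how many bands we share. That keeps the query simple, at the cost of treating a fan of one common band the same as someone with nearly identical taste.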

I think the results are correct. I agree with them, because I actually like the music of Muse, The Prodigy and The Kooks too, but forgot to include them in my list. As for the rest of the recommended bands... Hmm… Now I want to listen to them.

Summary

This is a good example of data analysis. If you look into the domain models of other services, you will find that their scopes are bounded. They manage just a few types of main entities, while everything else is additional and only supports the main ones. This means we can extract such sub-domains and apply basic algorithms to them to find one thing or another, such as recommendations.

What else can I do here? Most probably, apply one of the algorithms from the algo library. If I manage to apply it, the recommendation system will shine in all its beauty. That is a good goal for me for the near future.
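Which algorithm to apply is still open. As one candidate, here is a sketch of Jaccard similarity between my likes and everyone else’s, written in plain Cypher rather than as a call to the algo library. It assumes the same Person, Band and LIKES names as above; the LIMIT of 10 is arbitrary.

```cypher
// Collect the set of bands I like.
MATCH (me:Person {name: "vlad batushkov"})-[:LIKES]->(b:Band)
WITH me, collect(DISTINCT b) AS mine
// Collect the set of bands liked by every other person.
MATCH (other:Person)-[:LIKES]->(b2:Band)
WHERE other <> me
WITH other, mine, collect(DISTINCT b2) AS theirs
WITH other,
     size([x IN mine WHERE x IN theirs]) AS shared,
     size(mine) AS mine_count,
     size(theirs) AS their_count
WHERE shared > 0
// Jaccard = |intersection| / |union|
RETURN other.name AS person,
       toFloat(shared) / (mine_count + their_count - shared) AS jaccard
ORDER BY jaccard DESC
LIMIT 10
```

The most similar people could then be used to weight recommendations, instead of counting every follower equally.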

Anyway, music preferences are a good area to play with. I hope you liked this topic and today’s post. Write down your ideas and suggestions in the comments. Your feedback is much appreciated.

Similar topics

Friend’s Recommendation with Advanced Cypher Querying and APOC

Analyse Magazine Domain Model with Advanced Neo4j Cypher Querying and APOC

Analyse Neo4j Graph of Books domain model with Jaccard Similarity Algorithm
