Friend’s Recommendation with Advanced Cypher Querying and APOC

May 29: Everest, One Month Graph Challenge

Vlad Batushkov

Welcome word

In this series of short posts I build one simple graph every day. The domain model of each graph is somehow related to that day in history: a historical event, a celebration or a person. I do this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge

Domain model

On May 29, 1953, the highest peak of the Himalayas and of the world, Chomolungma (Sagarmatha, Everest), 8848 meters above sea level, was conquered for the first time by New Zealander Edmund Hillary and Sherpa Tenzing Norgay.

For me, mountains are about skiing, not climbing. That is why I want to analyze not the highest mountains, but the ski resorts and countries worth visiting if you are interested in this kind of activity.

But the results will not be real recommendations from real skiers or snowboarders. I don’t have enough time or data for that, so I will simply generate them.

So my plan is to generate a group of friends who ski or snowboard, let them visit a bunch of ski resorts and then share recommendations with each other. As on any other day of the challenge, I will build a small graph and work with Cypher and APOC.

Graph

The list of countries and ski resorts comes from SKI.RU:

WITH [{ name: "Russia", id: "1" },{ name: "Finland", id: "2" },{ name: "Austria", id: "3" },{ name: "France", id: "4" },{ name: "Switzerland", id: "5" },{ name: "Italy", id: "6" },{ name: "Andorra", id: "7" },{ name: "Bulgaria", id: "8" },{ name: "Sweden", id: "9" },{ name: "Germany", id: "10" },{ name: "Norway", id: "11" },{ name: "Slovakia", id: "12" },{ name: "Czechia", id: "16" }] as cs
UNWIND cs as c
WITH "https://www.ski.ru/az/resorts/country/" + c.id as url, c
CALL apoc.load.html(url, { name: "table.curorts_table:eq(0) tbody tr td:eq(1) a" }) YIELD value
UNWIND value.name as item
WITH apoc.text.regexGroups(item.text, "\\((.+)\\)")[0][1] as name, c
WHERE name IS NOT NULL AND apoc.text.indexOf(name, '�') = -1
MERGE (cc:Country { name: c.name })
MERGE (sr:SkiResort { name: name })
MERGE (sr)-[:AT]->(cc)

As usual, parsing is not perfect, so I filtered out some ugly data.
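
A quick sanity check can show how many resorts were actually loaded per country. The query below is only a sketch: it assumes the Country and SkiResort labels and the AT relationship created by the import above, and it is not part of the original workflow.

```cypher
// Count loaded ski resorts per country
MATCH (sr:SkiResort)-[:AT]->(c:Country)
RETURN c.name AS country, count(sr) AS resorts
ORDER BY resorts DESC
```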

I have 13 countries and 405 ski resorts. Now let’s build a small society of 100 people and 150 friend relationships.

// Remove only previously generated people, so the loaded countries and resorts survive
MATCH (p:Person) DETACH DELETE p;
CALL apoc.generate.er(100, 150, 'Person', 'FRIEND');

The APOC procedure is smart enough to give each Person a name.
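
To confirm what the generator produced, one could count the nodes and relationships. This is a minimal sketch that assumes the Person label and FRIEND type passed to the procedure above.

```cypher
// Count generated people and the distinct friendships between them
MATCH (p:Person)
OPTIONAL MATCH (p)-[r:FRIEND]-()
RETURN count(DISTINCT p) AS people, count(DISTINCT r) AS friendships
```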

Let’s add relationships to show who visited which ski resorts. I solved a similar task in my Volkswagen post of May 26. By the way, I wonder why they are called ski resorts and not snowboard resorts.

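// For every person, collect the internal ids of all ski resorts
MATCH (p:Person)
MATCH (sr:SkiResort)
WITH collect(id(sr)) as resorts, range(1, 5) as nums, p
// Choose how many resorts to pick for this person: a random number from 1 to 5
WITH apoc.coll.randomItem(nums) as num, resorts, p
// Pick that many random resorts; the third argument (true) allows repeated picks
WITH apoc.coll.randomItems(resorts, num, true) as pickedResorts, p
MATCH (s:SkiResort)
WHERE apoc.coll.indexOf(pickedResorts, id(s)) > -1
// MERGE creates each VISIT relationship at most once
MERGE (p)-[:VISIT]->(s)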

Time for the last thing: setting up ski or snowboard preferences. It is very important that a person can like skiing OR snowboarding only. It is impossible to like both. I don’t really know why, but this is the reality we live in. Ask one of your friends who is addicted to snowboarding; they are very serious about it.

MERGE (s1:Sport { name: "Ski" })
MERGE (s2:Sport { name: "Snowboard" })
WITH s1, s2
MATCH (p:Person)
WITH [s1, s2] as s, p
WITH apoc.coll.randomItem(s) as ss, p
MERGE (p)-[:LIKES]->(ss)
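
Since the sport is picked at random and attached with MERGE, running this script a second time could give some people both sports. A small check like the following, which assumes the LIKES relationship and Sport label from the script above, should return no rows if everyone likes exactly one sport.

```cypher
// List people who like zero or more than one sport
MATCH (p:Person)
OPTIONAL MATCH (p)-[:LIKES]->(s:Sport)
WITH p, count(s) AS sports
WHERE sports <> 1
RETURN p.name AS name, sports
```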

First of all, let’s look at some basic metrics for the friends and resorts we got.

MATCH (sr:SkiResort)<-[:VISIT]-(p:Person)
WITH p, count(sr) as visitedResorts
MATCH (p)-[:FRIEND]-(f:Person)
WITH p, count(f) as friends, visitedResorts
RETURN p.name as name, friends, visitedResorts
ORDER BY friends DESC, visitedResorts DESC
LIMIT 15

Fine. What about the ski resorts? Let’s find the best ones to visit for skiing or snowboarding, based on the existing number of visits.

MATCH (sr:SkiResort)<-[:VISIT]-(p:Person)-[:LIKES]->(s:Sport)
WITH sr.name as resort, s.name as sport, count(p) as visitors
RETURN resort, sport, visitors
ORDER BY visitors DESC
LIMIT 10

Not many visits. This is bad for us. How can we recommend a place where only one person has spent a vacation once? Too many resorts were loaded and too few people were created in the graph.
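
The distribution of visitors per resort makes the problem easier to see. A sketch, assuming the VISIT relationships created earlier:

```cypher
// For each visitor count, show how many resorts have exactly that many visitors
MATCH (sr:SkiResort)
OPTIONAL MATCH (sr)<-[:VISIT]-(p:Person)
WITH sr, count(p) AS visitors
RETURN visitors, count(sr) AS resorts
ORDER BY visitors
```

A large number of resorts in the rows for 0 and 1 visitors would suggest that the graph is still too sparse for meaningful recommendations.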

But it is easy to fix. Let’s simply let our people visit more places! Run the “visitors” script again as many times as you want.

MATCH (p:Person)
MATCH (sr:SkiResort)
WITH collect(id(sr)) as resorts, range(1, 5) as nums, p
WITH apoc.coll.randomItem(nums) as num, resorts, p
WITH apoc.coll.randomItems(resorts, num, true) as pickedResorts, p
MATCH (s:SkiResort)
WHERE apoc.coll.indexOf(pickedResorts, id(s)) > -1
MERGE (p)-[:VISIT]->(s)

After two additional runs.

Cool! Now let’s declare our recommendation scenario.

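A person can ask friends to recommend a ski resort they have visited. Not only direct friends can help with advice, but also friends of friends. The only critical condition is that a snowboarder can only help a snowboarder, and a skier a skier. C’est la vie. The query below also considers only people who have visited fewer than 7 resorts and skips resorts the person has already visited.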

// Start from people who have visited fewer than 7 resorts
MATCH (p:Person)-[:VISIT]->(sr:SkiResort)
WITH p, count(sr) as resorts
WHERE resorts < 7
WITH p
// Resorts visited by direct friends who like the same sport and not yet visited by p
MATCH (s:Sport)<-[:LIKES]-(p)-[:FRIEND]-(f:Person)-[:VISIT]-(sr:SkiResort)
WHERE (f)-[:LIKES]->(s) AND NOT (p)-[:VISIT]->(sr)
WITH p, collect(sr) as resorts1
// The same for friends of friends
MATCH (s:Sport)<-[:LIKES]-(p)-[:FRIEND]-(:Person)-[:FRIEND]-(f:Person)-[:VISIT]-(sr:SkiResort)
WHERE (f)-[:LIKES]->(s) AND NOT (p)-[:VISIT]->(sr)
WITH p, collect(sr) as resorts2, resorts1
// Combine both lists, keeping repeated resorts
WITH p, apoc.coll.unionAll(resorts1, resorts2) as resorts
// Keep resorts that occur more than once, together with their counts
WITH p, apoc.coll.duplicatesWithCount(resorts) as bests
// Take the first entry of the list sorted by count
WITH p, apoc.coll.sortMaps(bests, 'count')[0] as best
WITH p.name as name, coalesce(best.item.name, "NONE") as resortToVisit, coalesce(best.count, 0) as recommendedByPersons
// Look up the country of the recommended resort
OPTIONAL MATCH (sr:SkiResort)-[:AT]->(c:Country)
WHERE sr.name = resortToVisit
WITH name, resortToVisit, recommendedByPersons, coalesce(c.name, "NONE") as country
RETURN name, resortToVisit, country, recommendedByPersons
ORDER BY recommendedByPersons DESC
LIMIT 15

It is interesting to understand why the skier Linwood Considine doesn’t have any suggestions from her friends or friends of friends. It seems that she has already visited all the places they could recommend, and all her other friends are snowboarders. Sad story for her.
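
One more possible reason for a NONE result lies in the query itself: apoc.coll.duplicatesWithCount returns only items that occur more than once in a list, so resorts suggested by a single friend are dropped. The sketch below compares it with apoc.coll.frequencies, which keeps every item together with its count. The resort names are sample values chosen for illustration, and swapping the function into the recommendation query is a suggestion, not something the original query does.

```cypher
// Sample list: 'Zermatt' is suggested twice, the others once
WITH ['Zermatt', 'Levi', 'Zermatt', 'Sochi'] AS resorts
RETURN apoc.coll.duplicatesWithCount(resorts) AS repeatedOnly,
       apoc.coll.frequencies(resorts) AS allWithCounts
```

Here repeatedOnly should contain only Zermatt with a count of 2, while allWithCounts should list all three names. Using frequencies in the recommendation query would let a resort suggested by a single friend still appear as a result.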

Summary

For me this topic turned out to be fun. In the beginning I could not imagine that there would be so many things to analyse, build and enjoy.

There are still many things you can try here if you have time. Any ideas for an algorithm, for example? There must be something to try. I also hope that all my queries are correct. If not, please tell me about it in the comments.

By the way, what about you? Are you a skier or a snowboarder?

Similar topics

Analyse Magazine Domain Model with Advanced Neo4j Cypher Querying and APOC

Analyse Neo4j Graph of Books domain model with Jaccard Similarity Algorithm

Building a Neo4j Recommendation System with Cypher query
