Friend’s Recommendation with Advanced Cypher Querying and APOC

May 29: Everest, One Month Graph Challenge

Welcome word

In this series of short posts I build one simple graph every day. The domain model of each graph is somehow related to the history of the day: a historical event, a celebration or a person. I am doing this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge

Domain model

On May 29, 1953, Jomolungma (Sagarmatha, Everest), the highest peak of the Himalayas and of the world at 8848 meters above sea level, was climbed for the first time by New Zealander Edmund Hillary and Sherpa Tenzing Norgay.

I am a person for whom mountains mean skiing, not climbing. That is why I want to analyze not the highest mountains, but ski resorts and countries worth visiting if you are interested in this kind of activity.

But the results will not be real recommendations from real skiers or snowboarders. I don’t have enough time and data for that, so I will simply generate them.

So my plan is to generate a group of friends who are skiers and snowboarders, let them visit a bunch of ski resorts, and then have them share recommendations with each other. As on any other day of the challenge, I will build a small graph and work with Cypher and APOC.

Graph

The list of countries and ski resorts comes from SKI.RU:

WITH [{ name: "Russia", id: "1" },{ name: "Finland", id: "2" },{ name: "Austria", id: "3" },{ name: "France", id: "4" },{ name: "Switzerland", id: "5" },{ name: "Italy", id: "6" },{ name: "Andorra", id: "7" },{ name: "Bulgaria", id: "8" },{ name: "Sweden", id: "9" },{ name: "Germany", id: "10" },{ name: "Norway", id: "11" },{ name: "Slovakia", id: "12" },{ name: "Czechia", id: "16" }] as cs
UNWIND cs as c
WITH "https://www.ski.ru/az/resorts/country/" + c.id as url, c
CALL apoc.load.html(url, { name: "table.curorts_table:eq(0) tbody tr td:eq(1) a" }) YIELD value
UNWIND value.name as item
WITH apoc.text.regexGroups(item.text, "\\((.+)\\)")[0][1] as name, c
WHERE name IS NOT NULL AND apoc.text.indexOf(name, '�') = -1
MERGE (cc:Country { name: c.name })
MERGE (sr:SkiResort { name: name })
MERGE (sr)-[:AT]->(cc)

As usual, parsing is not perfect, so I filtered out some ugly data: entries where no name could be extracted and names containing the '�' replacement character.
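To make the extraction and filtering step easier to follow, here is a minimal sketch that runs the same two expressions on a few sample strings. The sample texts are hypothetical and made up for illustration; the real link texts on the site may look different.

```cypher
// Hypothetical link texts, made up for illustration
WITH ["Local name (Rosa Khutor)", "No parentheses here", "Local name (Val d�Isere)"] AS samples
UNWIND samples AS text
// [0][1] is the first capture group of the first match, or null without a match
WITH text, apoc.text.regexGroups(text, "\\((.+)\\)")[0][1] AS name
RETURN text, name,
       name IS NOT NULL AND apoc.text.indexOf(name, '�') = -1 AS kept
```

Only the first sample is expected to come back with kept = true: the second has no parentheses to extract from, and the third contains the replacement character.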

I have 13 countries and 405 ski resorts. Now let’s build a small society of 100 persons with 150 friend relationships.

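// Remove only previously generated persons; deleting every node
// would also wipe the countries and ski resorts loaded above
MATCH (p:Person) DETACH DELETE p;
CALL apoc.generate.er(100, 150, 'Person', 'FRIEND');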

The APOC procedure is smart enough to give a name to each Person.
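A quick sanity check after generation could look like the sketch below. It uses only the label and relationship type passed to the generator above, and matches FRIEND in one direction so that every relationship is counted once.

```cypher
// Count generated persons and friendships
MATCH (p:Person)
WITH count(p) AS persons
MATCH (:Person)-[r:FRIEND]->(:Person)
RETURN persons, count(r) AS friendships
```

If the generator did what was asked, the two numbers should correspond to the arguments of the call above.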

Let’s add relationships to record who visited which ski resorts. A similar task was solved in my Volkswagen post of May 26. By the way, I wonder why they are called ski resorts and not snowboard resorts.

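// For every person, collect the ids of all ski resorts
MATCH (p:Person)
MATCH (sr:SkiResort)
WITH collect(id(sr)) as resorts, range(1, 5) as nums, p
// Pick how many resorts this person visits: 1 to 5
WITH apoc.coll.randomItem(nums) as num, resorts, p
// Pick that many resort ids at random (the same id may be picked twice)
WITH apoc.coll.randomItems(resorts, num, true) as pickedResorts, p
MATCH (s:SkiResort)
WHERE apoc.coll.indexOf(pickedResorts, id(s)) > -1
// MERGE keeps a single VISIT relationship per person and resort
MERGE (p)-[:VISIT]->(s)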
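The same idea can be sketched a bit more directly by collecting the resort nodes once and unwinding the picked ones, which avoids comparing every resort against the picked list. This is an alternative under the same assumptions, not the script used for the results in this post; the use of rand() to choose 1 to 5 resorts is my substitution for the nums list.

```cypher
// Collect all resort nodes once
MATCH (sr:SkiResort)
WITH collect(sr) AS resorts
MATCH (p:Person)
// toInteger(rand() * 5) + 1 gives a whole number from 1 to 5 per person
WITH p, apoc.coll.randomItems(resorts, toInteger(rand() * 5) + 1) AS picked
UNWIND picked AS s
MERGE (p)-[:VISIT]->(s)
```

Here randomItems is called without the repick flag, so each person picks distinct resorts within one run.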

Time for the last step: setting up ski or snowboard preferences. It is very important that a person can like skiing OR snowboarding, but not both. I don’t really know why, but this is the reality we live in. Ask one of your friends who is addicted to snowboarding; they are very serious about it.

MERGE (s1:Sport { name: "Ski" })
MERGE (s2:Sport { name: "Snowboard" })
WITH s1, s2
MATCH (p:Person)
WITH [s1, s2] as s, p
WITH apoc.coll.randomItem(s) as ss, p
MERGE (p)-[:LIKES]->(ss)
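Because the pick is random and MERGE only adds relationships, running the script above a second time can give some persons a second LIKES relationship. A small check like the following sketch shows how many persons like zero, one or two sports.

```cypher
// Group persons by the number of sports they like
MATCH (p:Person)
OPTIONAL MATCH (p)-[:LIKES]->(s:Sport)
WITH p, count(s) AS sports
RETURN sports, count(p) AS persons
ORDER BY sports
```

For the "ski OR snowboard" rule to hold, the result should contain a single row with sports = 1.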

First of all, let’s see some basic metrics about the friends and resorts we got.

MATCH (sr:SkiResort)<-[:VISIT]-(p:Person)
WITH p, count(sr) as visitedResorts
MATCH (p)-[:FRIEND]-(f:Person)
WITH p, count(f) as friends, visitedResorts
RETURN p.name as name, friends, visitedResorts
ORDER BY friends DESC, visitedResorts DESC
LIMIT 15

Fine. What about the ski resorts? Let’s find the best ones to visit for skiing or snowboarding, based on the existing number of visits.

MATCH (sr:SkiResort)<-[:VISIT]-(p:Person)-[:LIKES]->(s:Sport)
WITH sr.name as resort, s.name as sport, count(p) as visitors
RETURN resort, sport, visitors
ORDER BY visitors DESC
LIMIT 10

Not many visits. This is bad for us. How can we recommend a place where only one person spent a vacation once? Too many resorts were loaded and too few people were created in the graph.
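The numbers explain why: each of the 100 persons picks 1 to 5 resorts, 3 on average, so one run creates at most about 300 visits spread over 405 resorts. A sketch of a query that shows this distribution, using only labels already in the graph:

```cypher
// How many resorts have 0, 1, 2, ... visitors
MATCH (sr:SkiResort)
OPTIONAL MATCH (sr)<-[:VISIT]-(p:Person)
WITH sr, count(p) AS visitors
RETURN visitors, count(sr) AS resorts
ORDER BY visitors
```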

But it is easy to fix. Let’s simply let our persons visit more places! Run the “visitors” script again as many times as you want.

MATCH (p:Person)
MATCH (sr:SkiResort)
WITH collect(id(sr)) as resorts, range(1, 5) as nums, p
WITH apoc.coll.randomItem(nums) as num, resorts, p
WITH apoc.coll.randomItems(resorts, num, true) as pickedResorts, p
MATCH (s:SkiResort)
WHERE apoc.coll.indexOf(pickedResorts, id(s)) > -1
MERGE (p)-[:VISIT]->(s)

After 2 additional runs.

Cool! Now let’s define our recommendation scenario.

A person can ask friends to recommend a ski resort they have visited. Not only direct friends can help with advice, but also friends of friends. The only critical condition is that a snowboarder can help only a snowboarder, and a skier only a skier. C’est la vie.

// Only persons who have visited fewer than 7 resorts ask for advice
MATCH (p:Person)-[:VISIT]->(sr:SkiResort)
WITH p, count(sr) as resorts
WHERE resorts < 7
WITH p
// Resorts visited by direct friends with the same sport, not yet visited by p
MATCH (s:Sport)<-[:LIKES]-(p)-[:FRIEND]-(f:Person)-[:VISIT]-(sr:SkiResort)
WHERE (f)-[:LIKES]->(s) AND NOT (p)-[:VISIT]->(sr)
WITH p, collect(sr) as resorts1
// The same for friends of friends
MATCH (s:Sport)<-[:LIKES]-(p)-[:FRIEND]-(:Person)-[:FRIEND]-(f:Person)-[:VISIT]-(sr:SkiResort)
WHERE (f)-[:LIKES]->(s) AND NOT (p)-[:VISIT]->(sr)
WITH p, collect(sr) as resorts2, resorts1
// Combine both lists; duplicatesWithCount keeps only resorts that occur more than once
WITH p, apoc.coll.unionAll(resorts1, resorts2) as resorts
WITH p, apoc.coll.duplicatesWithCount(resorts) as bests
// Take the first entry after sorting by count
WITH p, apoc.coll.sortMaps(bests, 'count')[0] as best
WITH p.name as name, coalesce(best.item.name, "NONE") as resortToVisit, coalesce(best.count, 0) as recommendedByPersons
// Look up the country of the recommended resort
OPTIONAL MATCH (sr:SkiResort)-[:AT]->(c:Country)
WHERE sr.name = resortToVisit
WITH name, resortToVisit, recommendedByPersons, coalesce(c.name, "NONE") as country
RETURN name, resortToVisit, country, recommendedByPersons
ORDER BY recommendedByPersons DESC
LIMIT 15

It is interesting to understand why the skier Linwood Considine doesn’t have any suggestions from her friends or friends of friends. It seems that she has already visited all the places they could recommend, and all her other friends are snowboarders. A sad story for her.
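One more possible explanation is worth checking: apoc.coll.duplicatesWithCount returns only items that occur more than once, so "NONE" can also mean that no resort was suggested at least twice. The sketch below is a variant of the query, not the one used for the results above. It counts every suggested resort, counts each recommending person once, and uses OPTIONAL MATCH so that persons without any suggestion still appear. The names visited, visitedCount, votes and best are my own.

```cypher
// Persons who have visited fewer than 7 resorts, as in the query above
MATCH (p:Person)-[:VISIT]->(visited:SkiResort)
WITH p, count(visited) AS visitedCount
WHERE visitedCount < 7
MATCH (p)-[:LIKES]->(s:Sport)
// Friends and friends of friends with the same sport, and resorts new to p
OPTIONAL MATCH (p)-[:FRIEND*1..2]-(f:Person)-[:VISIT]->(sr:SkiResort)
WHERE f <> p AND (f)-[:LIKES]->(s) AND NOT (p)-[:VISIT]->(sr)
// Count each recommending person once per resort
WITH p, sr, count(DISTINCT f) AS votes
ORDER BY votes DESC
// The first collected entry per person is the most recommended resort
WITH p, collect({ resort: sr.name, votes: votes })[0] AS best
OPTIONAL MATCH (r:SkiResort { name: best.resort })-[:AT]->(c:Country)
RETURN p.name AS name,
       coalesce(best.resort, "NONE") AS resortToVisit,
       coalesce(c.name, "NONE") AS country,
       best.votes AS recommendedByPersons
ORDER BY recommendedByPersons DESC
LIMIT 15
```

With this variant, a person shows "NONE" only when no friend or friend of a friend with the same sport has visited a resort that is new to that person.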

Summary

For me this topic turned out to be fun. At the beginning I could not imagine that there were so many things here to analyze, build and enjoy.

There are still many things you can apply here if you have time. Any ideas for an algorithm, for example? There must be something to try. I also hope that all my queries are correct. If not, please tell me about it in the comments.

By the way, what about you? Are you a skier or a snowboarder?

Similar topics

Analyse Magazine Domain Model with Advanced Neo4j Cypher Querying and APOC

Analyse Neo4j Graph of Books domain model with Jaccard Similarity Algorithm

Building a Neo4j Recommendation System with Cypher query

Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.
