Working with Point type and Distance in Neo4j Graph

May 24: Eurovision, One Month Graph Challenge

Vlad Batushkov
5 min readMay 24, 2019

Welcome word

In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge

Domain model

Based on the experience of the festival of the Italian pop songs in San Remo, whose popularity had already gone beyond Italy, the Swiss television director M. Bezanson suggested annually holding a European pop song contest. The aim of the competition was to determine the best song in Europe.

On May 24, 1956, the first Eurovision song contest was held in the Swiss city of Lugano.

Today I want to build a small graph, that represents the contest domain model. And my main goal is to define a winner. But not like this happened today. We live in wonderful time of crystal truth and justice. Everybody really care to define winning combination of singer, song and performance. Judges doing secret voting, people call and votes online. Then this numbers multiples, divide, again multipy on Pi or whatever and we got exclusive trustful results.

What about my approach? Singer? Song? Performance? C’mon. I will fully rely on geopolitics.

Graph

List of 26 finalists of Eurovision 2019. I put csv (with ; delimeter) file here: https://vbatushkov.bitbucket.io/eurovision.csv

Headers: num;country;singer;song;lang;place;points

LOAD CSV WITH HEADERS FROM 'https://vbatushkov.bitbucket.io/eurovision.csv' AS line FIELDTERMINATOR ';'
MERGE (c:Country { name: line.country })
MERGE (s:Singer { name: line.singer })
MERGE (sg:Song { name: line.song })
MERGE (sg)-[:SING_BY]->(s)-[:FROM]->(c)

I like this csv load function, it is so easy to use. Wonderful. Much better that parsing of html. Always with html structure troubles.

Here some of the beautiful molecules we got after upload.

I don’t really need to work with singers and songs. Main attention on countries now.

My today’s session concept like this: each country have to vote. Based on geopolitical map country gonna vote to closest neighbours. As closer your capital as a neighbour city, so much points you will got. Fair enough. Song contest? Never heard.

List of capitals for 202 countries. I put csv (with ; delimeter) file here: https://vbatushkov.bitbucket.io/capitals.csv

Headers: country;capital;latitude;longitude

LOAD CSV WITH HEADERS FROM 'https://vbatushkov.bitbucket.io/capitals.csv' AS line FIELDTERMINATOR ';'
MATCH (c:Country { name: line.country })
MERGE (cap:Capital { name: line.capital, location: point({ latitude: toFloat(line.latitude), longitude: toFloat(line.longitude) }) })
MERGE (c)-[:WITH]->(cap)
RETURN count(cap)

Now let’s define neighbourhood relationships between cities. Each relation would have a value of vote points. Remind you, Eurovision vote system is qute “special” thing, if not so say better word. My geopolitical vote system will be old good set of 12, 10, 8–1 points to their 10 favourite songs. Pardon! Closest neighbours. All capitals connect to all.

MATCH (c1:Capital)
MATCH (c2:Capital)
WHERE id(c1) > id(c2)
WITH distance(c1.location, c2.location) as distance, c1, c2
MERGE (c1)-[:DISTANCE { value: distance }]-(c2)

Define closest city to each capital and set scores.

MATCH (c1:Country)-[:WITH]->(cc1:Capital)-[d:DISTANCE]-(cc2:Capital)<-[:WITH]-(c2:Country)
WITH c1, apoc.coll.sortMaps(collect({ id: id(c2), value: d.value }), '^value')[0..10] as top10
WITH c1, top10, [{v:12},{v:10},{v:8},{v:7},{v:6},{v:5},{v:4},{v:3},{v:2},{v:1}] as scores
WITH c1, apoc.coll.zip(top10,scores) as top10scores
WITH c1, [x IN top10scores | { id: x[0].id, score: x[1].v, dist: x[0].value }] as scores
UNWIND scores as s
MATCH (cc:Country)
WHERE id(cc) = s.id
MERGE (c1)-[:SCORE { value: s.score }]->(cc)

I decide to show up links around my Belorussian singer ZENA. Hope, she is not Zena, who Warrior Princess.

Btw, strange to see here Iceland!? Must be some mistake in spatials values or calculations. Need to investigate it and maybe fix it later.

Anyway, now I can find a winner. So easy.

MATCH (so:Song)-[:SING_BY]->(si:Singer)-[:FROM]->(c:Country)<-[s:SCORE]-(:Country)
RETURN c.name as country, si.name as singer, so.name as song, sum(s.value) as score
ORDER BY score DESC
LIMIT 10

My winner is Czech Republic’s singer or maybe band, I dunno, “Lake Malawi” with song “Friend of a Friend”. It is fun, seems like song’s name somehow related to graphs too.

Sad, that my results would be same every year. Hm, I can improve it with some randomization. Maybe, you want to try? I left all analysis for your, guys. I alraedy wrote some amount of Cypher code today, so my mission is done.

Resume

I think it would be good to refactor existed relationships. Instead of connections from all capitals to each other, just connect nearest cities, and then apply for example, Single Source Shortest Path algorithm from each city to nearest 10. It can be more clever solution. Maybe not. Need to try.

Geography is interesting domain area to play with. So, maybe I will come back to this topic and finish my reasearches in better way.

Thank you for reading. Clap-clap-clap, if your country ever won Eurovision pop contest.

Resources

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Vlad Batushkov
Vlad Batushkov

Written by Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Ninja. Articles brewed on modern tech, hops and indie rock’n’roll.

No responses yet

Write a response