
Working with Point type and Distance in Neo4j Graph
May 24: Eurovision, One Month Graph Challenge
Welcome word
In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge
Domain model
Based on the experience of the festival of the Italian pop songs in San Remo, whose popularity had already gone beyond Italy, the Swiss television director M. Bezanson suggested annually holding a European pop song contest. The aim of the competition was to determine the best song in Europe.
On May 24, 1956, the first Eurovision song contest was held in the Swiss city of Lugano.
Today I want to build a small graph, that represents the contest domain model. And my main goal is to define a winner. But not like this happened today. We live in wonderful time of crystal truth and justice. Everybody really care to define winning combination of singer, song and performance. Judges doing secret voting, people call and votes online. Then this numbers multiples, divide, again multipy on Pi or whatever and we got exclusive trustful results.
What about my approach? Singer? Song? Performance? C’mon. I will fully rely on geopolitics.
Graph
List of 26 finalists of Eurovision 2019. I put csv (with ; delimeter) file here: https://vbatushkov.bitbucket.io/eurovision.csv
Headers: num;country;singer;song;lang;place;points
LOAD CSV WITH HEADERS FROM 'https://vbatushkov.bitbucket.io/eurovision.csv' AS line FIELDTERMINATOR ';'
MERGE (c:Country { name: line.country })
MERGE (s:Singer { name: line.singer })
MERGE (sg:Song { name: line.song })
MERGE (sg)-[:SING_BY]->(s)-[:FROM]->(c)
I like this csv load function, it is so easy to use. Wonderful. Much better that parsing of html. Always with html structure troubles.
Here some of the beautiful molecules we got after upload.

I don’t really need to work with singers and songs. Main attention on countries now.
My today’s session concept like this: each country have to vote. Based on geopolitical map country gonna vote to closest neighbours. As closer your capital as a neighbour city, so much points you will got. Fair enough. Song contest? Never heard.
List of capitals for 202 countries. I put csv (with ; delimeter) file here: https://vbatushkov.bitbucket.io/capitals.csv
Headers: country;capital;latitude;longitude
LOAD CSV WITH HEADERS FROM 'https://vbatushkov.bitbucket.io/capitals.csv' AS line FIELDTERMINATOR ';'
MATCH (c:Country { name: line.country })
MERGE (cap:Capital { name: line.capital, location: point({ latitude: toFloat(line.latitude), longitude: toFloat(line.longitude) }) })
MERGE (c)-[:WITH]->(cap)
RETURN count(cap)

Now let’s define neighbourhood relationships between cities. Each relation would have a value of vote points. Remind you, Eurovision vote system is qute “special” thing, if not so say better word. My geopolitical vote system will be old good set of 12, 10, 8–1 points to their 10 favourite songs. Pardon! Closest neighbours. All capitals connect to all.
MATCH (c1:Capital)
MATCH (c2:Capital)
WHERE id(c1) > id(c2)
WITH distance(c1.location, c2.location) as distance, c1, c2
MERGE (c1)-[:DISTANCE { value: distance }]-(c2)

Define closest city to each capital and set scores.
MATCH (c1:Country)-[:WITH]->(cc1:Capital)-[d:DISTANCE]-(cc2:Capital)<-[:WITH]-(c2:Country)
WITH c1, apoc.coll.sortMaps(collect({ id: id(c2), value: d.value }), '^value')[0..10] as top10
WITH c1, top10, [{v:12},{v:10},{v:8},{v:7},{v:6},{v:5},{v:4},{v:3},{v:2},{v:1}] as scores
WITH c1, apoc.coll.zip(top10,scores) as top10scores
WITH c1, [x IN top10scores | { id: x[0].id, score: x[1].v, dist: x[0].value }] as scores
UNWIND scores as s
MATCH (cc:Country)
WHERE id(cc) = s.id
MERGE (c1)-[:SCORE { value: s.score }]->(cc)

I decide to show up links around my Belorussian singer ZENA. Hope, she is not Zena, who Warrior Princess.
Btw, strange to see here Iceland!? Must be some mistake in spatials values or calculations. Need to investigate it and maybe fix it later.
Anyway, now I can find a winner. So easy.
MATCH (so:Song)-[:SING_BY]->(si:Singer)-[:FROM]->(c:Country)<-[s:SCORE]-(:Country)
RETURN c.name as country, si.name as singer, so.name as song, sum(s.value) as score
ORDER BY score DESC
LIMIT 10

My winner is Czech Republic’s singer or maybe band, I dunno, “Lake Malawi” with song “Friend of a Friend”. It is fun, seems like song’s name somehow related to graphs too.
Sad, that my results would be same every year. Hm, I can improve it with some randomization. Maybe, you want to try? I left all analysis for your, guys. I alraedy wrote some amount of Cypher code today, so my mission is done.
Resume
I think it would be good to refactor existed relationships. Instead of connections from all capitals to each other, just connect nearest cities, and then apply for example, Single Source Shortest Path algorithm from each city to nearest 10. It can be more clever solution. Maybe not. Need to try.
Geography is interesting domain area to play with. So, maybe I will come back to this topic and finish my reasearches in better way.
Thank you for reading. Clap-clap-clap, if your country ever won Eurovision pop contest.