Build a Neo4j Graph of Moscow Metro Map with Spatial Values and Shortest Path Algorithm

May 15: Moscow Metro, One Month Graph Challenge

Vlad Batushkov

Welcome word

In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge

Domain model

Today I want to discuss transport topic again. All because 84 years ago on this date was opened a first line of the Moscow Metro. First line ran from Sokolniki station to Park Kultury station with a branch to Smolenskaya, its total length was about 11 km. That line includes 13 stations and 17 lobbies. Today’s Moscow Metropolitan long for 383 km of 261 stations. Impressive.

My today’s Moscow Metro graph will includes 12 lines and their 223 stations. I plan to build a query that helps me to understand the approximate duration of trip, that actually very helpful things to organize your time spent in public transport.

Graph

APOC helps me again to grab data from Wikipedia and to create a graph, this time full of stations and relationships between them.

WITH "https://en.wikipedia.org/wiki/List_of_Moscow_Metro_stations" as url
CALL apoc.load.html(url, { name: "div.mw-parser-output table.sortable tbody tr td:eq(1) a", coord: "div.mw-parser-output table.sortable tbody tr td:eq(7)", line: "div.mw-parser-output table.sortable tbody tr th span:eq(1) span" }) YIELD value
WITH collect(value) as data
WITH { names: data[2].name, coords: data[0].coord, lines: data[1].line } as root
WITH [ x IN range(0, 222) | { id: x, name: root.names[x].text, line: apoc.text.regexGroups(root.lines[x].attributes.title, "\\d+\\S*")[0][0], latitude: toFloat(apoc.text.regexGroups(root.coords[x].text, "(\\d{2}.\\d{4,10})(; )(\\d{2}.\\d{4,10})")[0][1]), longitude: toFloat(apoc.text.regexGroups(root.coords[x].text, "(\\d{2}.\\d{4,10})(; )(\\d{2}.\\d{4,10})")[0][3]) }] as result
UNWIND result as item
WITH item
CALL apoc.create.node(["Station", "Line" + item.line ], { id: item.id, name: item.name, latitude: item.latitude, longitude: item.longitude }) YIELD node
RETURN node

APOC helped me to set Line number as a label dynamically. Also good to mention, that id property I will use in future to link stations in proper order, properties latitude and longitude to calculate distance between stations. Actually, I will do it right now. Let’s connect all stations from same line into chain of actual metro line:

MATCH (s:Station)
WITH apoc.node.labels(s)[1] as line, s
WITH collect(s.id) as ss, line
UNWIND apoc.coll.pairsMin(ss) as value
MATCH (s1:Station { id: value[0] })
MATCH (s2:Station { id: value[1] })
MERGE (s1)-[:TO]->(s2)-[:TO]->(s1)

Not a metro at all. Lines must be connected to each other at some transfer stations, so this kind of relationships I will add into graph manually. I will do it only for stations where Ring line crossed with others:

MATCH (s1:Line5 { name: "Kiyevskaya" })
MATCH (s2:Line5 { name: "Krasnopresnenskaya" })
MATCH (s3:Line5 { name: "Belorusskaya" })
MATCH (s4:Line5 { name: "Novoslobodskaya" })
MATCH (s5:Line5 { name: "Prospekt Mira" })
MATCH (s6:Line5 { name: "Komsomolskaya" })
MATCH (s7:Line5 { name: "Kurskaya" })
MATCH (s8:Line5 { name: "Taganskaya" })
MATCH (s9:Line5 { name: "Paveletskaya" })
MATCH (s10:Line5 { name: "Dobryninskaya" })
MATCH (s11:Line5 { name: "Oktyabrskaya" })
MATCH (s12:Line5 { name: "Park Kultury" })
MATCH (ss1:Line3 { name: "Kiyevskaya" })
MATCH (ss2:Line4 { name: "Kiyevskaya" })
MATCH (ss3:Line7 { name: "Barrikadnaya" })
MATCH (ss4:Line2 { name: "Belorusskaya" })
MATCH (ss5:Line9 { name: "Mendeleyevskaya" })
MATCH (ss6:Line6 { name: "Prospekt Mira" })
MATCH (ss7:Line1 { name: "Komsomolskaya" })
MATCH (ss8:Line3 { name: "Kurskaya" })
MATCH (ss9:Line10 { name: "Chkalovskaya" })
MATCH (ss10:Line7 { name: "Taganskaya" })
MATCH (ss11:Line8 { name: "Marksistskaya" })
MATCH (ss12:Line2 { name: "Paveletskaya" })
MATCH (ss13:Line9 { name: "Serpukhovskaya" })
MATCH (ss14:Line6 { name: "Oktyabrskaya" })
MATCH (ss15:Line1 { name: "Park Kultury" })
MERGE (s1)-[:TO]->(s12)-[:TO]->(s1)
MERGE (s1)-[:TO]->(ss1)-[:TO]->(s1)
MERGE (ss2)-[:TO]->(ss1)-[:TO]->(ss2)
MERGE (s1)-[:TO]->(ss2)-[:TO]->(s1)
MERGE (s2)-[:TO]->(ss3)-[:TO]->(s2)
MERGE (s3)-[:TO]->(ss4)-[:TO]->(s3)
MERGE (s4)-[:TO]->(ss5)-[:TO]->(s4)
MERGE (s5)-[:TO]->(ss6)-[:TO]->(s5)
MERGE (s6)-[:TO]->(ss7)-[:TO]->(s6)
MERGE (s7)-[:TO]->(ss8)-[:TO]->(s7)
MERGE (ss9)-[:TO]->(ss8)-[:TO]->(ss9)
MERGE (s7)-[:TO]->(ss9)-[:TO]->(s7)
MERGE (s8)-[:TO]->(ss10)-[:TO]->(s8)
MERGE (ss11)-[:TO]->(ss10)-[:TO]->(ss11)
MERGE (s8)-[:TO]->(ss11)-[:TO]->(s8)
MERGE (s9)-[:TO]->(ss12)-[:TO]->(s9)
MERGE (s10)-[:TO]->(ss13)-[:TO]->(s10)
MERGE (s11)-[:TO]->(ss14)-[:TO]->(s11)
MERGE (s12)-[:TO]->(ss15)-[:TO]->(s12)

Here connections between Lines and Ring Line. For sure, for correct result, I need to connect Lines between each other, as in the real map. But it is too much for today. Anyway, graph already start looks like a Moscow Metro map.

Let’s processing last requirement for query. To be able to understand how much it itakes from station 1 to station 2, I need to have a distance between stations. Using latitude and longitude properties this is easy task. But. If you tells me that from station 1 to station 2 takes 65 km. I am, as I city traveller, can’t do much with this information. Distance also possible to interpretire as time spended in ride. This is a good thing to do. Knowing that average speed of train is 35km/h, I want set time property in minutes for all relationships like [:TO { time: 4 }] takes 4 minutes of ride between stations.

MATCH (s1:Station)-[d:TO]->(s2:Station)
WITH toFloat(distance(point({ latitude: s1.latitude, longitude: s1.longitude }), point({ latitude: s2.latitude, longitude: s2.longitude }))/1000) as distanceKm, s1, s2, d
WITH s1.name as from, s2.name as to, toFloat(distanceKm / 35 * 60) as time, d
WITH from, to, round(coalesce(time, 0.0)) as t, d
SET d.time = t
RETURN from, to, d

Ok, “cost” of trip in the right place and we can use it for short path calculations. The last step is a query itself. And for example, my input parameters are Dinamo Stadium as station FROM and Nakhim as station TO.

MATCH (from:Station { name: "Dinamo" }), (to:Station { name: "Nakhimovsky Prospekt" })
CALL algo.shortestPath.stream(from, to, "time")
YIELD nodeId, cost
RETURN algo.asNode(nodeId).name AS name, cost as totalTime

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Vlad Batushkov
Vlad Batushkov

Written by Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Ninja. Articles brewed on modern tech, hops and indie rock’n’roll.

No responses yet

Write a response