
Build a Neo4j Graph of Moscow Metro Map with Spatial Values and Shortest Path Algorithm
May 15: Moscow Metro, One Month Graph Challenge
Welcome word
In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge
Domain model
Today I want to discuss transport topic again. All because 84 years ago on this date was opened a first line of the Moscow Metro. First line ran from Sokolniki station to Park Kultury station with a branch to Smolenskaya, its total length was about 11 km. That line includes 13 stations and 17 lobbies. Today’s Moscow Metropolitan long for 383 km of 261 stations. Impressive.
My today’s Moscow Metro graph will includes 12 lines and their 223 stations. I plan to build a query that helps me to understand the approximate duration of trip, that actually very helpful things to organize your time spent in public transport.
Graph
APOC helps me again to grab data from Wikipedia and to create a graph, this time full of stations and relationships between them.
WITH "https://en.wikipedia.org/wiki/List_of_Moscow_Metro_stations" as url
CALL apoc.load.html(url, { name: "div.mw-parser-output table.sortable tbody tr td:eq(1) a", coord: "div.mw-parser-output table.sortable tbody tr td:eq(7)", line: "div.mw-parser-output table.sortable tbody tr th span:eq(1) span" }) YIELD value
WITH collect(value) as data
WITH { names: data[2].name, coords: data[0].coord, lines: data[1].line } as root
WITH [ x IN range(0, 222) | { id: x, name: root.names[x].text, line: apoc.text.regexGroups(root.lines[x].attributes.title, "\\d+\\S*")[0][0], latitude: toFloat(apoc.text.regexGroups(root.coords[x].text, "(\\d{2}.\\d{4,10})(; )(\\d{2}.\\d{4,10})")[0][1]), longitude: toFloat(apoc.text.regexGroups(root.coords[x].text, "(\\d{2}.\\d{4,10})(; )(\\d{2}.\\d{4,10})")[0][3]) }] as result
UNWIND result as item
WITH item
CALL apoc.create.node(["Station", "Line" + item.line ], { id: item.id, name: item.name, latitude: item.latitude, longitude: item.longitude }) YIELD node
RETURN node
APOC helped me to set Line number as a label dynamically. Also good to mention, that id property I will use in future to link stations in proper order, properties latitude and longitude to calculate distance between stations. Actually, I will do it right now. Let’s connect all stations from same line into chain of actual metro line:
MATCH (s:Station)
WITH apoc.node.labels(s)[1] as line, s
WITH collect(s.id) as ss, line
UNWIND apoc.coll.pairsMin(ss) as value
MATCH (s1:Station { id: value[0] })
MATCH (s2:Station { id: value[1] })
MERGE (s1)-[:TO]->(s2)-[:TO]->(s1)

Not a metro at all. Lines must be connected to each other at some transfer stations, so this kind of relationships I will add into graph manually. I will do it only for stations where Ring line crossed with others:
MATCH (s1:Line5 { name: "Kiyevskaya" })
MATCH (s2:Line5 { name: "Krasnopresnenskaya" })
MATCH (s3:Line5 { name: "Belorusskaya" })
MATCH (s4:Line5 { name: "Novoslobodskaya" })
MATCH (s5:Line5 { name: "Prospekt Mira" })
MATCH (s6:Line5 { name: "Komsomolskaya" })
MATCH (s7:Line5 { name: "Kurskaya" })
MATCH (s8:Line5 { name: "Taganskaya" })
MATCH (s9:Line5 { name: "Paveletskaya" })
MATCH (s10:Line5 { name: "Dobryninskaya" })
MATCH (s11:Line5 { name: "Oktyabrskaya" })
MATCH (s12:Line5 { name: "Park Kultury" })
MATCH (ss1:Line3 { name: "Kiyevskaya" })
MATCH (ss2:Line4 { name: "Kiyevskaya" })
MATCH (ss3:Line7 { name: "Barrikadnaya" })
MATCH (ss4:Line2 { name: "Belorusskaya" })
MATCH (ss5:Line9 { name: "Mendeleyevskaya" })
MATCH (ss6:Line6 { name: "Prospekt Mira" })
MATCH (ss7:Line1 { name: "Komsomolskaya" })
MATCH (ss8:Line3 { name: "Kurskaya" })
MATCH (ss9:Line10 { name: "Chkalovskaya" })
MATCH (ss10:Line7 { name: "Taganskaya" })
MATCH (ss11:Line8 { name: "Marksistskaya" })
MATCH (ss12:Line2 { name: "Paveletskaya" })
MATCH (ss13:Line9 { name: "Serpukhovskaya" })
MATCH (ss14:Line6 { name: "Oktyabrskaya" })
MATCH (ss15:Line1 { name: "Park Kultury" })
MERGE (s1)-[:TO]->(s12)-[:TO]->(s1)
MERGE (s1)-[:TO]->(ss1)-[:TO]->(s1)
MERGE (ss2)-[:TO]->(ss1)-[:TO]->(ss2)
MERGE (s1)-[:TO]->(ss2)-[:TO]->(s1)
MERGE (s2)-[:TO]->(ss3)-[:TO]->(s2)
MERGE (s3)-[:TO]->(ss4)-[:TO]->(s3)
MERGE (s4)-[:TO]->(ss5)-[:TO]->(s4)
MERGE (s5)-[:TO]->(ss6)-[:TO]->(s5)
MERGE (s6)-[:TO]->(ss7)-[:TO]->(s6)
MERGE (s7)-[:TO]->(ss8)-[:TO]->(s7)
MERGE (ss9)-[:TO]->(ss8)-[:TO]->(ss9)
MERGE (s7)-[:TO]->(ss9)-[:TO]->(s7)
MERGE (s8)-[:TO]->(ss10)-[:TO]->(s8)
MERGE (ss11)-[:TO]->(ss10)-[:TO]->(ss11)
MERGE (s8)-[:TO]->(ss11)-[:TO]->(s8)
MERGE (s9)-[:TO]->(ss12)-[:TO]->(s9)
MERGE (s10)-[:TO]->(ss13)-[:TO]->(s10)
MERGE (s11)-[:TO]->(ss14)-[:TO]->(s11)
MERGE (s12)-[:TO]->(ss15)-[:TO]->(s12)

Here connections between Lines and Ring Line. For sure, for correct result, I need to connect Lines between each other, as in the real map. But it is too much for today. Anyway, graph already start looks like a Moscow Metro map.

Let’s processing last requirement for query. To be able to understand how much it itakes from station 1 to station 2, I need to have a distance between stations. Using latitude and longitude properties this is easy task. But. If you tells me that from station 1 to station 2 takes 65 km. I am, as I city traveller, can’t do much with this information. Distance also possible to interpretire as time spended in ride. This is a good thing to do. Knowing that average speed of train is 35km/h, I want set time property in minutes for all relationships like [:TO { time: 4 }] takes 4 minutes of ride between stations.
MATCH (s1:Station)-[d:TO]->(s2:Station)
WITH toFloat(distance(point({ latitude: s1.latitude, longitude: s1.longitude }), point({ latitude: s2.latitude, longitude: s2.longitude }))/1000) as distanceKm, s1, s2, d
WITH s1.name as from, s2.name as to, toFloat(distanceKm / 35 * 60) as time, d
WITH from, to, round(coalesce(time, 0.0)) as t, d
SET d.time = t
RETURN from, to, d
Ok, “cost” of trip in the right place and we can use it for short path calculations. The last step is a query itself. And for example, my input parameters are Dinamo Stadium as station FROM and Nakhim as station TO.
MATCH (from:Station { name: "Dinamo" }), (to:Station { name: "Nakhimovsky Prospekt" })
CALL algo.shortestPath.stream(from, to, "time")
YIELD nodeId, cost
RETURN algo.asNode(nodeId).name AS name, cost as totalTime

Around 30 minutes for this ride. Hmm, not sure, but sounds realistic. I also can redner a route of trip with nodes. Cool.

Resume
I did a good graph today, I think. If connect every station of Lines as in the reality, it would be even much more fun. But in general things around Spatial values works pretty well in Neo4j and it is easy to use. Algorithms library again did a job. See you at the next topic.
Similar topics
Neo4j graph Data Modeling of Star Wars Universe with APOC load json
Simple example of N-ary relationship to understand basics of Neo4j Graph Data Modeling
Initial Neo4j Graph Data Modeling of Train Ticket Booking System
Unusual use of Neo4j Common Neighbors Algorithm on people community graph
Neo4j Shortest Path and Dijkstra Algorithms
Apply Neo4j Similarity Algorithm to analyse Chess “openings”