
Use the APOC Generate Function to Build a Neo4j Graph of a Small Galaxy
May 25: Towel Day, One Month Graph Challenge
Welcome word
In this series of short posts I build one simple graph every day. The domain model of each graph is somehow related to the day’s history: a historical event, a celebration, or a person. I do this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge
Domain model
Towel Day is a tribute to Douglas Adams. On this day all Douglas Adams fans are encouraged to carry a towel with them. Let the towel be visible (make sure it catches the eye) and use it as a topic for conversation, so that even those who have never read “The Hitchhiker’s Guide to the Galaxy” go and find a copy. The towel can be wrapped around the head, used as a weapon, soaked with nutrients, anything!
What I want to try today must be fun. My plan is to generate a small graph of a Galaxy with a small number of stars and a random number of highways between them. Then, starting from the center of this small Galaxy, I will ride to the farthest star at its edge. I hope that someone driving the same way can help us with transportation.
Graph
Let’s generate a small Galaxy and name it Towel Galaxy, for example. Is it enough to put 1 million stars into it? For comparison, the Milky Way Galaxy contains between 100 and 400 billion stars. OK, our Towel Galaxy is super-duper small.
MATCH (n) DETACH DELETE n;
CALL apoc.generate.ba(1000000, 1, 'Star', 'HIGHWAY');
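For intuition about what a Barabási–Albert generator with one edge per new node produces, here is a minimal Python sketch of preferential attachment. It is not APOC's implementation: the function name `generate_ba_tree`, the integer star ids, and the choice to point each highway from the older star to the newer one are assumptions made only for illustration.

```python
import random

def generate_ba_tree(num_stars, seed=None):
    """Return directed highways (older_star, newer_star) built by preferential attachment."""
    if num_stars < 2:
        raise ValueError("need at least two stars")
    rng = random.Random(seed)
    highways = [(0, 1)]
    # Each star appears in this list once per highway it touches, so a uniform
    # pick from the list selects a star with probability proportional to its degree.
    endpoints = [0, 1]
    for new_star in range(2, num_stars):
        target = rng.choice(endpoints)
        highways.append((target, new_star))  # assumed direction: older -> newer
        endpoints.extend((target, new_star))
    return highways

highways = generate_ba_tree(1000, seed=42)
has_outgoing = {src for src, _ in highways}
leaf_count = sum(1 for star in range(1000) if star not in has_outgoing)
print(len(highways), "highways,", leaf_count, "stars without an outgoing highway")
```

With one highway per new star the result is a tree: every star except the first has exactly one incoming highway. In this model about two thirds of the stars are expected to end up as leaves, which is in line with the number of “edge” stars counted later in this post.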

This small picture includes about 1000 stars. But even this small amount looks beautiful. Now I need to decide which star is the center of Towel Galaxy and label it as a Core.
Before you run the next query, please make sure that your machine is ready to handle a Galaxy of 1 million stars.
CALL algo.betweenness.stream('Star', 'HIGHWAY') YIELD nodeId, centrality
MATCH (s:Star) WHERE id(s) = nodeId
RETURN s.uuid AS starId, centrality
ORDER BY centrality DESC
LIMIT 1
This query is a bad idea. It tries to process tons of data. I don’t even have the results of this Betweenness Centrality run: the computation took an infinity and a little bit more, so I simply cancelled it.
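To see where the time goes, here is a small Python sketch of exact betweenness centrality in the style of Brandes' algorithm for unweighted directed graphs. This post does not say how `algo.betweenness.stream` is implemented, so treat this as an illustration of the computation rather than the library's code; the function `betweenness` and the toy graph are made up for the example.

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness for a directed, unweighted graph.

    adj maps every node to a list of its successors (every node must be a key).
    """
    score = {v: 0.0 for v in adj}
    for source in adj:  # one full breadth-first search per source node
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}   # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

toy = {"root": ["hub"], "hub": ["a", "b"], "a": [], "b": []}
scores = betweenness(toy)
print(max(scores, key=scores.get))  # hub
```

The outer loop runs one traversal per star, so the work grows roughly as stars × highways. For a million stars and about a million highways that is on the order of 10^12 steps, which may explain why the run above never finished.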
But I have an idea for another approach.
Let’s find all Edge stars first: all stars with only 1 neighbour.
MATCH (:Star)-[:HIGHWAY]->(s:Star)
WHERE NOT (s)-[:HIGHWAY]->(:Star)
SET s:Edge
RETURN count(s) as numberOfEdges
Our Galaxy has 667076 “edge” stars, but not all of them are our destination points. Let’s calculate the longest paths that exist in the Galaxy. Because the query follows the direction of the arrows, all roads start from the center.
MATCH p = ((s:Star)-[*]->(e:Edge))
RETURN length(p) as depth
ORDER BY depth DESC
LIMIT 10

Now I know that the longest trip from the center of the Galaxy to the Edge is 20 steps long.
Time to find the Core of Towel Galaxy. I already know the max depth, and I know that an Edge is an endpoint, so all I need is the one star that is exactly 20 steps away from an Edge. Let’s mark the central star as a Core and the star at the far edge of the Galaxy as an End.
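If every highway really points away from the center, as assumed above, the same depth can be found without enumerating every path. The Python sketch below is one way to do it: the center is the only star with no incoming highway, and a single breadth-first pass gives the depth of every other star. The names `find_root`, `depths_from`, and the toy galaxy are hypothetical and exist only for this example.

```python
from collections import deque

def find_root(highways):
    """Return the single star that no highway points to."""
    targets = {dst for dsts in highways.values() for dst in dsts}
    roots = [star for star in highways if star not in targets]
    if len(roots) != 1:
        raise ValueError("expected exactly one star without incoming highways")
    return roots[0]

def depths_from(root, highways):
    """Breadth-first search: number of steps from root to every reachable star."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        star = queue.popleft()
        for nxt in highways.get(star, []):
            if nxt not in depth:
                depth[nxt] = depth[star] + 1
                queue.append(nxt)
    return depth

# Toy galaxy: core -> a -> c -> d, and core -> b
toy = {"core": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}
root = find_root(toy)
depth = depths_from(root, toy)
farthest = max(depth, key=depth.get)
print(root, farthest, depth[farthest])  # core d 3
```

Each star and each highway is visited once, so the cost grows linearly with the size of the Galaxy.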
MATCH (s:Star)-[*20]->(e:Edge)
SET s:Core, e:End
RETURN s
Now I know the Core and the End. Let’s collect the longest trips from the Core to the Edges. As you remember, one of them has a depth of 20 steps, and some others have a depth of 19. Let’s print out the possible trips from the Core to the Edges of Towel Galaxy.
MATCH (e:Edge)
MATCH trip = ((c:Core)-[*19..20]->(e))
RETURN trip
LIMIT 100

Now we are ready for hitchhiking! Have a nice trip!
Summary
I would like your suggestions on the proper use of Centrality algorithms in cases like this. How should they be applied? Any examples are welcome.
By the way, can you now answer the Ultimate Question of Life, the Universe, and Everything?