Neo4j Betweenness Centrality Algorithm detailed and fun example

May 19: Pioneriya, One Month Graph Challenge

Vlad Batushkov

--

Welcome word

In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge

Domain model

Pioneriya (term, based on meaning to be pioneer, first, best) — the collective name of children’s communist organizations that exist in different countries. I am not a fan this shit, but at May 19, 1922 (97 years ago) first pioneer organization has been created in the USSR. And you know what? My country still one of the biggest (maybe even the biggest one) follower of this movement nowadays, in 2019, damn! This is why from many other events, I picked this one.

Today I want to build small graph of society. And then I will try to find the easiest way to brainwash all of this people. But, to make it fast, I need to detect right persons to effect to them. I hope some of graph algorithms can help me with it.

I will start with generation of a random people with “friend” relationships. Then we find most “powerful” persons and turn them to follow Pioneer Organization (Because we can expect the highest impact to others). In first iteration we turn this first people to be pioneers. Then every next iteration all friends of this person also become pioneers. And so on, until everybody become pioneers. Something like Zombie Apocalypse, but on Communism rails.

Graph

1000 happy persons live happy lives in small Graphville town. Many of them even make a friendship with each other.

MATCH (n) DETACH DELETE n;
CALL apoc.generate.er(1000, 1500, 'HappyPerson', 'FRIEND');

I want to build normal society, not a post-punk anti utopia, so I will give normal name to every person. Must be better, than use UUID for people. But who know, what happens in future.

WITH "https://en.wikipedia.org/wiki/Category:Russian_masculine_given_names" as url
CALL apoc.load.html(url, { data: "div.mw-category-generated ul li a" }) YIELD value
WITH [x IN value.data|apoc.text.replace(x.text, "( \\(given name\\))|( \\(Slavic name\\))|( \\(name\\))|( \\(disambiguation\\))", "")] as boyNames
WITH "https://en.wikipedia.org/wiki/Category:Russian_feminine_given_names" as url, boyNames
CALL apoc.load.html(url, { data: "div.mw-category-generated ul li a" }) YIELD value
WITH [x IN value.data|apoc.text.replace(x.text, "( \\(given name\\))|( \\(Slavic name\\))|( \\(name\\))|( \\(disambiguation\\))", "")] as girlNames, boyNames
WITH apoc.coll.union(boyNames, girlNames) as allNames
MATCH (p:HappyPerson)
SET p = {}, p.name = apoc.coll.randomItem(allNames)

Now I want to be sure, that graph is Connected. I don’t allow outcasts lives in Graphville. This rule I can check this with usage of Connected Components algorithm.

CALL algo.unionFind.stream('HappyPerson', 'FRIEND') YIELD nodeId, setId
MATCH (p:HappyPerson)
WHERE id(p) = nodeId
SET p.setId = setId
…and more

942 connected persons and all the rest outside of main socail group. Because I want to have whole town full of pioneers, I can just get rid of outcasts. This is why in one sunny Sunday morning all outcasts suddenly disappeared. Something bad happend in this little town…

MATCH (p:HappyPerson)
WITH { setId: p.setId, size: count(p.setId) } as item
WITH collect(item) as stats
WITH apoc.coll.sortMaps(stats, 'size')[0] as targetGroup
MATCH (outcast:HappyPerson)
WHERE outcast.setId <> targetGroup.setId
DETACH DELETE outcast

Now I want to run my pioneerpidemia desease script with 10 random persons selected as first pioneers here. And let’s see how much iterations I need to fill whole town by pioneers.

MATCH (p:HappyPerson)
WITH collect(id(p)) as list
WITH apoc.coll.randomItems(list, 10, false) as pioneers
MATCH (p:HappyPerson)
WHERE apoc.coll.indexOf(pioneers, id(p)) > -1
SET p:Pioneer
Random: initial

Now I will repeat same operation, until all town fall in love with pioneer organization.

MATCH (:Pioneer)-[:FRIEND]-(p:HappyPerson)
SET p:Pioneer

With this query you can check current amount of pioneers vs normal persons.

MATCH (p:HappyPerson)
WITH collect(p) as ppl
WITH [x IN ppl WHERE x:Pioneer|x] as pioneers, [x IN ppl WHERE NOT x:Pioneer|x] as normals, ppl
WITH toFloat(size(pioneers)) as pioneerSize, toFloat(size(normals)) as normalSize, toFloat(size(ppl)) as allSize
WITH apoc.math.round(pioneerSize / allSize, 3) * 100 as pioneerPercentage, pioneerSize, normalSize, allSize
RETURN allSize, normalSize, pioneerSize, toString(pioneerPercentage) + " vs " + toString(100 - pioneerPercentage) as pioneerVSnormal
Random: iteration 1

For example after first iteration I got 46 pioneers, 5% of town. (10 initial + 36 added). Ok, let’s execute step by step until the end.

Random: iteration 2
Random: iteration 3
Random: iteration 4

I am already scared. 61% of pioneers now. Next iteration could be the last one.

Random: iteration 5
Random: iteration 6

97% after 6 steps. Can Graphville still survive? Maybe, if Bruce Willis somewhere around. Ok, let’s finish with it.

Random: iteration 7
Random: iteration 8

Welcome to bright future of Communism, comrade. Graphville full of pioneers now. And it was necessary to do a 8 steps to did it with Random pick of first pioneers. Now I want to try one more time, but let’s apply Betweenness Centrality algorithm and find top 10 first influencers in Graphville society. Clean up Pioneer label and start again.

CALL algo.betweenness.stream('HappyPerson', 'FRIEND', { direction: 'both' }) YIELD nodeId, centrality
MATCH (p:HappyPerson) WHERE id(p) = nodeId
WITH collect({ personId: id(p), name: p.name, centrality: centrality }) as persons
WITH apoc.coll.sortMaps(persons, 'centrality')[..10] as top10
WITH [x IN top10|x.personId] as ids
MATCH (p:HappyPerson)
WHERE apoc.coll.indexOf(ids, id(p)) > -1
SET p:Pioneer
RETURN p

I hope this time it would be faster. I will skip all the pictures and just represent the table-based results.

Betweenness: iteration 1
Betweenness: iteration 2
Betweenness: iteration 3
Betweenness: iteration 4
Betweenness: iteration 5
Betweenness: iteration 6
Betweenness: iteration 7

Seven steps now. This time, you can see the huge difference. At iteration 2 Betweenness got 30%, at iteration 5 Betweenness got 97.6%. Actually, all the time it was 1 step ahead. This is exactly, what I wanted to saw.

I don’t really want to start again and instead of 10 random start with only 1 initial pioneer. But I guess then Betweenness gives me even bigger advantage. But I think, enough for this story. Let’s leave Graphville and let citizens to read Capital.

Resume

Today I learned great lesson about usage of very useful algorithms.

Once, when I become a dictator, I will recall this example and ask my generalissimus to install algo library to Neo4j Community Edition.

Similar topics

Neo4j PageRank Algorithm and Path Pattern Matching on Flights Domain Model

Unusual use of Neo4j Common Neighbors Algorithm on people community graph

Neo4j kShortestPaths Algorithm apply to City Map

Neo4j Cosine Similarity Algorithm example

Analyse Neo4j Graph of Books domain model with Jaccard Similarity Algorithm

Resources

--

--

Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Ninja. Articles brewed on modern tech, hops and indie rock’n’roll.