Image by tookapic from Pixabay

Unusual use of Neo4j Common Neighbors Algorithm on people community graph

May 22: Sherlock, One Month Graph Challenge

Welcome word

In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge

Domain model

Today is 160 years anniversary of birthday of British writer and author of one of the most famous detective stories in the world. I talking about Sir Arthur Ignatius Conan Doyle (22 May 1859–7 July 1930) and his amazing detective Sherlock Holmes.

We know numerous adaptations, books, movies, series based on original stories based on Sherlock Holmes “world”. Today my small graph challenge dedicated to this great writer and his well-known character.

I want to build up small graph of persons, their hobbies and goods. Then based results of Common Neighbors algorithm, I will try to predict, who is the possible murderer. Somehow my approach will be related to Sherlock’s deduction method — a way of reasoning from general provisions to particular conclusions.

I think, I must mention, that it would be first crime story happened in Graphville. Many days after that horrible day, when slow cow was stopped by fast truck.

Graph

First of all I need small list of persons, 10 is more than enough. I already found some names closely related to Sherlock topic. Also I want to define some personal stuff and shared hobbies, that they might have.

UNWIND [ "Benedict", "Martin", "Rupert", "Una", "Mark", "Louise", "Andrew", "Amanda", "David", "Steven" ] as name
MERGE (:Person { name: name });
UNWIND [ "Jewelry", "Gloves", "Hat", "Glasses", "Cigar", "Lighter" ] as name
MERGE (:Stuff { name: name });
UNWIND [ "Swimming", "Garden", "Books", "Tennis", "Beer", "Stamps", "Yacht", "Chess", "Cars", "Cricket", "Blogs" ] as name
MERGE (:Hobby { name: name });

Perfect. Now let’s mix all this things in random way.

MATCH (p:Person)
WITH collect(p) as persons
MATCH (h:Hobby)
WITH collect(h) as hobbies, persons
MATCH (s:Stuff)
WITH collect(s) as goods, hobbies, persons, range(1, 20) as randoms
UNWIND randoms as random
WITH apoc.coll.randomItems(persons, 2, false) as friends, apoc.coll.randomItem(hobbies) as hobby, apoc.coll.randomItem(goods) as stuff
WITH friends[0] as p1, friends[1] as p2, hobby, stuff
MERGE (p1)-[:FRIENDS]->(p2)
MERGE (p1)-[:INTERESTED_IN]->(hobby)
MERGE (p2)-[:OWNER_OF]->(stuff)

What we have, basically: group of friends. Some of them share common interests. They have some belongings.

Before, I start ask for Cypher help, I want to write down a “murderer” definition. A murderer is a person who:

  1. Not a friend of killed person. And event not a possible friend of person. (Lowest score of possible friendship).
  2. Has motive based on a conflict of interests with killed person. They must share same interests and possible interests (Highest score of possible hobbies).
  3. Evidence based on fact of owing of thing, left on crime place. Or possible borrowed of this item from your direct friend.

Great. Now let’s play in game, who killed Benedict?

MATCH (b:Person { name: "Benedict" })
MATCH (p:Person)
WHERE NOT (b)-[:FRIENDS]-(p) AND id(b) <> id(p)
WITH algo.linkprediction.commonNeighbors(b, p, { relationshipQuery: "FRIENDS" }) AS score, p
RETURN p.name as name, score
ORDER BY score ASC
LIMIT 10

Rupert, Mark, Louise, Amanda and Steven are friends of Benedict.Worst friendship score has Una (0). At the tame time, Andrew and David has equal chances to have a friendship with Benedict (2 common friends). And very possible friend of Benedict is Martin (4 common friends). So most probably murderer is one of this 4 persons. Maybe I can let all the friends and Martin go home, but this 3 person stay here for longer.

Small visualization, why commonNeighbors decided this:

Now let’s check common and possible interests. Seems, like first of all I need to add “HOBBY_MATE” relashinship between everyone in this graph. Where hobbies property of relationships contains the list of hobbies names:

MATCH (p1:Person)-[:INTERESTED_IN]->(h:Hobby)<-[:INTERESTED_IN]-(p2:Person)
WHERE id(p1) > id(p2)
WITH p1, p2, collect(h.name) as hobbies
MERGE (p1)-[hh:HOBBY_MATE]->(p2)
ON CREATE SET hh.hobbies = hobbies
ON MATCH SET hh.hobbies = hobbies

What is possible Benedict’s hobby? I will assume, that this is hobby of his friends. So he most likely start following this interest too.

Because friends are really friendly, I will excluded them from suspicious. Only interested in conflicts of interests and possible conflicts of possible interests with possible friends of Benedict. Now I ready to count the person’s score by conflict of interests (common and possible hobbies).

MATCH (b:Person { name: "Benedict" })-[:FRIENDS]-(f:Person)
MATCH (p:Person)-[m:HOBBY_MATE]-(f)
WHERE NOT (b)-[:FRIENDS]-(p) AND id(b) <> id(p) AND id(p) > 0
WITH p.name as name, collect(m.hobbies) as hobbyArrayOfArrays
WITH name, apoc.coll.flatten(hobbyArrayOfArrays) as plainListOfHobby
WITH name, apoc.coll.toSet(plainListOfHobby) as uniqueHobbyLeftOnly, plainListOfHobby
RETURN name, plainListOfHobby, size(uniqueHobbyLeftOnly) as hobbyCount
ORDER BY hobbyCount DESC
LIMIT 10

Now results tells me, that from possible interests between Benedict and all other people, David and Andrew have biggest possibilities of conflicts. As you can remember, Una has worst friendship score (0), but now, without even shared interests, she out of my suspect.

Maybe I am wrong with me query, please, comment me about it. I am open for your advices. I also can say, that really tried to use algorithm again, but failed with this scenario. Maybe you can help me on that?

Anyway, what we have: Andrew and David — possible, but not higest score friends. And also got same number of possible conflict of interests with Benedict.

All depends on last clue: evidence. Some item, witch person own, or borrow of his/her direct friend only. Reminder: Benedict friends out of scope, but not their possibly borrowed stuff.

I don’t know what item would be found at place of crime. So, I will pre-calculate all possible results, that can help us to know the answer one step ahead police.

MATCH (b:Person { name: "Benedict" })
MATCH (p:Person)
WHERE NOT (b)-[:FRIENDS]-(p) AND id(p) <> id(b)
OPTIONAL MATCH (p)-[:OWNER_OF]-(s1:Stuff)
OPTIONAL MATCH (p)-[:FRIENDS]-(:Person)-[:OWNER_OF]-(s2:Stuff)
WITH p.name as name, collect([s1.name, s2.name]) as stuff
RETURN name, apoc.coll.sort(apoc.coll.toSet(apoc.coll.flatten(stuff))) as stuff
ORDER BY size(stuff) DESC
LIMIT 10

Oh, so much common things… That a pity. Maybe your results with different random generation of graph would be better? I just don’t have time to do that. So, let’s conclude.

Andrew and David — by all parameters they are main suspicious persons. This mean, I can not 100% predict the murderer.

To be true, I can not predict on 100% anything at all. Yes, Cypher not make a Sherlock from me, but at least, we both tried.

Resume

Today I learn a lot about OPTIONAL MATCH and APOC collection functions. Also I would like to have ability to use custom projections in Link Prediction algorithms. This kind of contracts already supports in many others algorithms.

I realize, that “the story” behind topic sometimes add a lot of unnesessary complexity. But this is me — person who generate this complexity. So, I must blame myself. Or follow my crazy ideas. Anyway, I think this make my daily blogging more fun. Possibly. Possibly with score 0. What do you think?

Similar topics

Neo4j PageRank Algorithm and Path Pattern Matching on Flights Domain Model

Neo4j Betweenness Centrality Algorithm detailed and fun example

Neo4j kShortestPaths Algorithm apply to City Map

Neo4j Cosine Similarity Algorithm example

Analyse Neo4j Graph of Books domain model with Jaccard Similarity Algorithm

Resources

--

--

--

Engineering Manager @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

What Performance Metrics for Classification Problems Tell Us?

Linear Algebra for Machine Learning

The 5 Rules Of Analyzing Big Data

5 Rules Of Analyzing Big Data

Four Ways to Add Analytics Value to Your Business

Data in the Restaurant Industry, Part 2

Maximizing Grocery Profit Margins with a Weekly Ad Circular

Make a Basic Geo Map with Google Data Studio

Prospects for Big Data Analysis in Marketing

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Vlad Batushkov

Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.

More from Medium

Phenix and Videon Partner To Deliver Real-Time Streaming at Scale

SSD: Single Shot Multibox Detector

10 Open Source Multimodal Emotion Recognition Datasets

Parallelize Graph Computations Using Ligra Framework’s EdgeMap Interface