Car Owner domain Neo4j Graph Data Modeling and Pattern Matching Analyses
May 26: Volkswagen, One Month Graph Challenge
Welcome word
In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge
Domain model
May 26, 1938 (81 year ago), the day of foundation for the first Volkswagen factory.
The history of the “Volkswagen” concern began in 1934. When A. Hitler, who just headed the Germany, put a demand to develop a small, reliable and cheap car for the German people. Name of Volkswagen mean “volk” — the people, “wagen” — the car. So then, first vehicle of this new brand become super successful and init a start of brand’s great story. It was well-known “bug”.
Today I want to create random, but pretty fun domain model of automobile owners and find the most interesting guys from them. Final result schema of this model would be like this:

Graph
I found nice list of vehicles at https://github.com/n8barr/automotive-model-year-data/blob/master/data.sql and copy it into my repo at https://vbatushkov.bitbucket.io/auto.csv. Let’s load this data as a graph of brands and models.
LOAD CSV WITH HEADERS FROM "https://vbatushkov.bitbucket.io/auto.csv" AS line
MERGE (b:Brand { name: line.Brand })
MERGE (m:Model { name: line.Model, year: toInteger(line.Year) })
MERGE (m)-[:OF]->(b)

As you can see, it is my lovely “dandelion” schema. Next step is much more fun. I want to generate automobile owners. Some small amount of people, and let them own some cars (model’s instances).
When I start talking about cars, I memory Fury Road. So I will add characters from my favorite Mad Max franchise into this graph.
WITH "https://www.yahoo.com/entertainment/mad-max-character-names-118952450057.html" as url
CALL apoc.load.html(url, { data: "article p" }) YIELD value
UNWIND value.data as item
WITH apoc.text.regexGroups(item.text, "(\\d+. )(.+)") as name
WHERE name[0][2] <> ""
MERGE (:AutoOwner { name: name[0][2] })
Ok, time to share cars across owners.
MATCH (p:AutoOwner)
MATCH (m:Model)
WITH collect(id(m)) as models, range(2, 10) as numbersOfCars, p
WITH apoc.coll.randomItem(numbersOfCars) as numberOfCars, models, p
WITH apoc.coll.randomItems(models, numberOfCars, true) as pickedModels, p
MATCH (m:Model)-[:OF]->(b:Brand)
WHERE apoc.coll.indexOf(pickedModels, id(m)) > -1
CREATE (c:Car { title: b.name + " " + m.name + ", " + toString(m.year) })-[:IS]->(m)
CREATE (c)-[:OWNED_BY]->(p)
Car node is just an aggregation root, that represent a fact: person own this model of car. This is why I decided to put aggregated information inside it. Now I want to know what cars belongs to Max, Furiosa and Immortan Joe.
MATCH (p:AutoOwner)<-[:OWNED_BY]-(c:Car)-[:IS]->(m:Model)
WHERE p.name = "Max Rockatansky*~+^" OR p.name = "Imperator Furiosa^" OR p.name = "Immortan Joe^"
RETURN p, c, m
Sorry, for that special chars (*~+^) it comes from web-site, where I grab owners. And I am fine with this fact. Trying to kill perfectionist inside myself.

Mad Max got Mitsubishi Mighty Max — if it is not a destiny, then I don’t know what is it.
Ok, let’s analyse data a bit closer. What is the most popular car models in graph? Car model, that belongs to several people.
MATCH (p1:AutoOwner)<-[:OWNED_BY]-(c1:Car)-[:IS]->(m:Model)<-[:IS]-(c2:Car)-[:OWNED_BY]->(p2:AutoOwner)
WHERE id(p1) > id(p2)
RETURN p1, p2, c1, c2, m

Let’s also count who is owner of only “Retro” cars. There are many definitions of classes like Classic, Antique, Vintage and other . I will do my own range. All cars before 1960 would be a Classic models, in between 1960 and 1980 is Retro, and from 1980 simply Modern. I really don’t pretend on truth here, just I want to have some ranges. Let’s link classes with models. I not found better approach, than run 3 queries. Maybe you can suggest me, how to write it in just one, but fast query?
MERGE (cl:Class { name: "Classic" })
WITH cl
MATCH (m:Model)
WHERE m.year < 1960
MERGE (m)-[:CLASSIFIED_AS]->(cl);
MERGE (re:Class { name: "Retro" })
WITH re
MATCH (m:Model)
WHERE m.year >= 1960 AND m.year < 1980
MERGE (m)-[:CLASSIFIED_AS]->(re);
MERGE (mo:Class { name: "Modern" })
WITH mo
MATCH (m:Model)
WHERE m.year >= 1980
MERGE (m)-[:CLASSIFIED_AS]->(mo);
Let’s see who is our “collectionists”? What kind of car model they have?
MATCH (p:AutoOwner)<-[:OWNED_BY]-(c:Car)-[:IS]-(:Model)-[:CLASSIFIED_AS]-(cl:Class)
WITH p.name as owner, collect(DISTINCT cl.name) as classes, count(c) as carsCount
RETURN owner, carsCount, classes as followerOfClass
ORDER BY size(followerOfClass) DESC, carsCount DESC

Sad, but no one follow pure Retro or Classic models. Everyone prefer to have a modern cars. And only few persons own mixed models. Maybe this is only way to survive on deadly lands of Wasteland.
Resume
I had fun with building a car domain model and writing simple queries for it. What about you? Maybe we can apply some algorithm here? Please, share your ideas in comments.
What a lovely day.