Analyse Magazine Domain Model with Advanced Neo4j Cypher Querying and APOC

May 30: Gazette, One Month Graph Challenge

May 30, 2019

Welcome word

In this series of small posts I create one simple graph every day. The domain model of each graph is related to the day’s history: a historical event, a celebration or a person. I am doing this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge

Domain model

On May 30, 1631 (388 years ago), a newspaper called “La Gazette” was published in France, and the word “gazette” soon entered all European languages. In Russian we use exactly this word: “Газета” (Gazeta).

So, as you can guess, my last OMGChallenge topic and Neo4j graph is dedicated to newspapers. I will build a small graph of audience statistics for newspapers, magazines and other periodicals. Based on this data, I want to figure out where to place a particular advertisement.

Graph

I found a list of 100 United States magazines on Wikipedia and will use it as a base for randomly generating the rest of the data. I left it here: https://vbatushkov.bitbucket.io/magazines.csv

LOAD CSV WITH HEADERS FROM 'https://vbatushkov.bitbucket.io/magazines.csv' AS line
MERGE (m:Magazine { name: line.name, circulation: toInteger(line.circulation)})
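LOAD CSV combined with MERGE reads each row and creates a Magazine node only if one with exactly the same name and circulation does not already exist. As a rough illustration of that behaviour outside Neo4j, here is a minimal Python sketch. The sample rows and the load_magazines helper are hypothetical, and the sketch assumes the circulation column holds plain digits (Cypher’s toInteger() returns null for anything else, while Python’s int() raises).

```python
import csv
import io

# Hypothetical sample with the same columns as magazines.csv (name, circulation).
SAMPLE_CSV = """name,circulation
Magazine A,1000
Magazine B,2500
Magazine A,1000
"""


def load_magazines(text):
    """Parse CSV text into a list of magazine dicts, mimicking LOAD CSV + MERGE.

    MERGE matches on both name and circulation, so an identical
    repeated row does not produce a second entry.
    """
    seen = set()
    magazines = []
    for line in csv.DictReader(io.StringIO(text)):
        # int() plays the role of toInteger(line.circulation) in Cypher
        key = (line["name"], int(line["circulation"]))
        if key not in seen:
            seen.add(key)
            magazines.append({"name": key[0], "circulation": key[1]})
    return magazines


print(load_magazines(SAMPLE_CSV))
```

Keying on both properties mirrors the original MERGE pattern; if two rows shared a name but had different circulations, both this sketch and the Cypher would keep two separate entries.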

I want some statistical information about these magazines, but I don’t have a real source, so I will generate it myself with a purely random approach. So don’t blame me if a well-known magazine like Playboy suddenly becomes a cheap magazine about ice hockey read mostly by women.

MERGE (m:Gender { name: "Man" })
MERGE (w:Gender { name: "Woman" })
MERGE (a1:Age { name: "Baby", from: 1, to: 5 })
MERGE (a2:Age { name: "Kid", from: 6, to: 12 })
MERGE (a3:Age { name: "Teenage", from: 13, to: 21 })
MERGE (a4:Age { name: "Young", from: 22, to: 35 })
MERGE (a5:Age { name: "Adult", from: 36, to: 65 })
MERGE (a6:Age { name: "Senior", from: 66, to: 100 })
MERGE (t1:Topic { name: "Sport" })
MERGE (t2:Topic { name: "Automobile" })
MERGE (t3:Topic { name: "Fashion" })
MERGE (t4:Topic { name: "Science" })
MERGE (t5:Topic { name: "Games" })
MERGE (t6:Topic { name: "Technologies" })
MERGE (t7:Topic { name: "Health" })
MERGE (t8:Topic { name: "Music" })
MERGE (t9:Topic { name: "Erotics" })
MERGE (t10:Topic { name: "Nature" })
MERGE (t11:Topic { name: "Lifestyle" })
MERGE (t12:Topic { name: "Cooking" })
MERGE (t13:Topic { name: "Travel" })
MERGE (t14:Topic { name: "Finance" })
MERGE (t15:Topic { name: "Politics" })

I decided to have just three segments: Gender, Age and Topic. Now let’s mix them all together.
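The from and to properties on the Age nodes describe inclusive, back-to-back ranges. As a small sanity check, here is a plain-Python sketch that maps an age to its segment and verifies the ranges leave no gaps or overlaps. The age_segment and check_contiguous helpers are hypothetical and not part of the original graph; the ranges are copied from the MERGE statements above.

```python
# Age segments copied from the MERGE statements: (name, from, to), both ends inclusive.
AGE_SEGMENTS = [
    ("Baby", 1, 5),
    ("Kid", 6, 12),
    ("Teenage", 13, 21),
    ("Young", 22, 35),
    ("Adult", 36, 65),
    ("Senior", 66, 100),
]


def age_segment(age):
    """Return the name of the segment whose [from, to] range contains age, or None."""
    for name, low, high in AGE_SEGMENTS:
        if low <= age <= high:
            return name
    return None


def check_contiguous(segments):
    """True if every segment starts exactly one year after the previous one ends."""
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(segments, segments[1:]))


print(age_segment(30), check_contiguous(AGE_SEGMENTS))
```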

For each magazine I will randomly set a price per copy (price) and a cost for placing an advertisement (advCost). Each magazine also gets either one or three specific topics.

MATCH (m:Magazine)
WITH m, range(2,20) as prices, range(5,10) as costs
SET m.price = apoc.coll.randomItem(prices), m.advCost = apoc.coll.randomItem(costs)
WITH m
MATCH (t:Topic)
WITH m, collect(t) as topics
// draw the topic count after aggregation, so it is picked once per magazine
// (as a grouping key next to collect() it would be drawn per topic row)
WITH m, topics, apoc.coll.randomItem([1,3]) as topicsNum
WITH m, apoc.coll.randomItems(topics, topicsNum, false) as pickedTopics
FOREACH (pt IN pickedTopics |
 MERGE (m)-[:ABOUT]->(pt)
)
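To make the random assignment easier to reason about, here is a minimal Python sketch of the same step. It is an illustration only: the assign_properties helper is hypothetical, and Python’s random module stands in for apoc.coll.randomItem and apoc.coll.randomItems. Note that Cypher’s range() includes its upper bound, while Python’s does not.

```python
import random

TOPICS = ["Sport", "Automobile", "Fashion", "Science", "Games",
          "Technologies", "Health", "Music", "Erotics", "Nature",
          "Lifestyle", "Cooking", "Travel", "Finance", "Politics"]


def assign_properties(magazine_names, rng):
    """Give each magazine a random price, advCost and one or three distinct topics."""
    result = {}
    for name in magazine_names:
        # Cypher range(2,20) and range(5,10) are inclusive, hence 21 and 11 here
        price = rng.choice(range(2, 21))
        adv_cost = rng.choice(range(5, 11))
        # the topic count is drawn once per magazine
        topics_num = rng.choice([1, 3])
        # sample() returns distinct items, like randomItems(..., false)
        topics = rng.sample(TOPICS, topics_num)
        result[name] = {"price": price, "advCost": adv_cost, "topics": topics}
    return result


print(assign_properties(["Magazine A", "Magazine B"], random.Random(42)))
```

Passing a seeded random.Random makes runs reproducible, which is handy when comparing the output with what ended up in the graph.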

Now let’s add an Audience for each magazine. A magazine can have one or two Age segments, each for a specific Gender or for both genders. For example, one magazine can have an audience that is 80% Teenage Woman and 20% Senior Man. Another one can have 100% Kids of both genders.

MATCH (a:Age)
WITH collect(a) as ages
MATCH (m:Magazine)
WITH m, ages, apoc.coll.randomItem([1,2]) as ageNum, apoc.coll.randomItem([10,20,30,40,50,60,70,80,90]) as agePercentage
WITH m, apoc.coll.randomItems(ages, ageNum, false) as pickedAges, CASE WHEN ageNum = 1 THEN [100] ELSE [agePercentage, 100 - agePercentage] END AS agePercentages
// size() is the list-length function; since Neo4j 4.0, length() only accepts paths
WITH m, [x IN range(0, size(pickedAges) - 1) | { age: pickedAges[x].name, percent: agePercentages[x], gender: apoc.coll.randomItem(["Man","Woman","All"]) }] as data
UNWIND data as item
MATCH (g:Gender)
// "All" matches both Gender nodes, otherwise only the named one
WHERE item.gender = "All" OR g.name = item.gender
WITH item, g, m
MATCH (age:Age { name: item.age })
MERGE (a:Audience { name: item.age + " " + item.gender })
// separate MERGEs: one merged pattern would duplicate the Age relationship
// for audiences linked to both genders
MERGE (a)-[:OF]->(g)
MERGE (a)-[:OF]->(age)
MERGE (m)-[:READ_BY { value: item.percent }]->(a)
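The audience generation is the trickiest step, so here is a minimal Python sketch of the same logic for one magazine: one or two distinct Age segments, percentages that always add up to 100, and an “All” gender that links the audience to both Gender nodes. The generate_audience helper and its tuple output are hypothetical stand-ins for the Audience nodes and READ_BY relationships.

```python
import random

AGES = ["Baby", "Kid", "Teenage", "Young", "Adult", "Senior"]


def generate_audience(rng):
    """Return a list of (audience_name, genders, percent) tuples for one magazine."""
    age_num = rng.choice([1, 2])
    split = rng.choice(range(10, 100, 10))  # 10, 20, ..., 90
    picked = rng.sample(AGES, age_num)      # distinct Age segments
    percents = [100] if age_num == 1 else [split, 100 - split]
    audience = []
    for age, percent in zip(picked, percents):
        gender = rng.choice(["Man", "Woman", "All"])
        # "All" means the Audience node is linked to both Gender nodes
        genders = ["Man", "Woman"] if gender == "All" else [gender]
        audience.append((f"{age} {gender}", genders, percent))
    return audience


print(generate_audience(random.Random(7)))
```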

All Audience segments, split by Gender and Age. Looks cool.

As I expected, the generated preferences produced some fun facts. For example, The Family Handyman turned out to be about Erotics, and the Baby Man (1–5 years old) audience likes to read, especially Cosmopolitan. Really fun.

Now let’s try to find the best place for my advertisement. But what do I want to advertise? Maybe my blog? Why not?! The One Month Graph Challenge is a good idea to popularize. I want more people to be inspired by software development and to take part in things like this challenge. So the criteria for my advertisement are:

Age: some Teenage, and definitely Young and Adult.
Gender: Man and Woman together.
Topics: Technologies only; no other topic matches better.
Circulation? As high as possible. Let everybody know.
Price? The cheapest! So everybody can buy it.
Advertising budget? I can pay, but not much. It depends on the other parameters. Less is better, of course.

MATCH (t:Topic { name: "Technologies"})<-[:ABOUT]-(m:Magazine)-[r:READ_BY]->(a:Audience)-[:OF]-(age:Age)
WHERE age.name IN ["Teenage", "Young", "Adult"] AND (:Gender { name: "Man" })<-[:OF]-(a)-[:OF]->(:Gender { name: "Woman" })
RETURN DISTINCT m.name as magazine, m.price as costToBuy, m.advCost as budgetToAdvertise, m.circulation as impact, a.name as audience, r.value as percentage
// after RETURN DISTINCT, sort by the projected aliases
ORDER BY impact DESC, costToBuy ASC, budgetToAdvertise ASC
LIMIT 10
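The query combines three filters (topic, age and both genders) with a three-level sort. Here is a minimal Python sketch of the same selection over in-memory rows, in case it helps to see the logic without the graph. Each row stands for one Magazine-READ_BY-Audience path; the field names, the SAMPLE_ROWS data and the best_places helper are hypothetical.

```python
TARGET_AGES = {"Teenage", "Young", "Adult"}

# Hypothetical rows: one per (Magazine)-[:READ_BY]->(Audience) path.
SAMPLE_ROWS = [
    {"magazine": "X", "topics": ["Technologies"], "age": "Young",
     "genders": ["Man", "Woman"], "circulation": 100, "price": 5, "advCost": 7},
    {"magazine": "Y", "topics": ["Technologies", "Sport"], "age": "Adult",
     "genders": ["Man", "Woman"], "circulation": 100, "price": 3, "advCost": 9},
    {"magazine": "Z", "topics": ["Technologies"], "age": "Young",
     "genders": ["Man"], "circulation": 500, "price": 2, "advCost": 5},
    {"magazine": "W", "topics": ["Music"], "age": "Young",
     "genders": ["Man", "Woman"], "circulation": 900, "price": 2, "advCost": 5},
]


def best_places(rows, limit=10):
    """Filter like the WHERE clause, then sort like the ORDER BY clause."""
    matches = [
        r for r in rows
        if "Technologies" in r["topics"]
        and r["age"] in TARGET_AGES
        and {"Man", "Woman"}.issubset(r["genders"])
    ]
    # circulation DESC, then price ASC, then advCost ASC
    matches.sort(key=lambda r: (-r["circulation"], r["price"], r["advCost"]))
    return matches[:limit]


print([r["magazine"] for r in best_places(SAMPLE_ROWS)])
```

Negating circulation in the sort key gives the descending order while keeping price and advCost ascending in a single sort.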

“The New Yorker” might be a good option, but only 50% of its audience matches my criteria, so I would prefer to make a deal with “Money” magazine.
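Comparing candidates this way means adding up, per magazine, the percentage of readers that fall into matching audiences. In Cypher that corresponds to aggregating sum(r.value) per magazine; here is a minimal Python sketch of the same aggregation over rows shaped like the query output. The matched_share helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def matched_share(rows):
    """Sum the READ_BY percentages of matching audiences for each magazine."""
    share = defaultdict(int)
    for r in rows:
        share[r["magazine"]] += r["percentage"]
    return dict(share)


# Hypothetical query output: one row per matching audience.
rows = [
    {"magazine": "A", "percentage": 30},
    {"magazine": "A", "percentage": 20},
    {"magazine": "B", "percentage": 100},
]
print(matched_share(rows))
```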

Resume

OMG, it is done!

This last topic was fun and interesting to explore. Building a proper structure for the relationships between nodes was the most time-consuming part, but also the most fun. If you have any ideas, comments or ways to promote this challenge on social media, just share them. I would really appreciate it.

Time to wrap things up. Tomorrow’s post, on the last day of the month, will be a kind of postscript: it closes the Challenge and publishes the results of all the graphs.

Cheers!