Photo by Dmitry Nucky Thompson on Unsplash

Non-trivial APOC Load HTML example to build 007 Neo4j Graph

May 28: James Bond, One Month Graph Challenge

Welcome word

In this series of small posts I do one simple graph daily. Domain model of graph somehow related to day’s history, some historical event, celebration or person. I do this challenge to learn Neo4j Data Modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired and next month would be yours One Month Graph Challenge. #OMGChallenge

Domain model

Today is a birthday of Ian Lancaster Fleming. English writer, author of novels about James Bond. The world well-known character of agent 007 already has been represented everywhere: films, television, radio, comics and video games. Now it is time for Neo4j.

I plan to build a small graph of Bondiana movies. You can say, hey, another one topic about scores, ranks, best and worst? Definitely maybe.

After simple analysis of 24 movies about agent 007 (excluding upcoming movie number 25) I want to figure out why we love James Bond so much.


Data first. Need data? Start with Wikipedia.

WITH "" as url
CALL apoc.load.html(url, { title: "table.wikitable tbody tr td:eq(0) a" }) YIELD value
UNWIND value.title[63..87] as item
MERGE (m:Movie { title: item.text })

Now, knowing films, I can append this nodes with more rich data from IMDB.

WITH "" as url
CALL apoc.load.html(url, { title: "div.lister-item-content h3.lister-item-header a:eq(1)", score: "div.lister-item-content div.ipl-rating-star.small span:eq(1)", bond: "div.lister-item-content p a:eq(2)", year: "div.lister-item-content span.lister-item-year", votes: "div.lister-item-content span[name=nv]:eq(1)" }) YIELD value
WITH collect(value) as data
WITH { title: data[3].title, score: data[0].score, bond: data[4].bond, year: data[1].year, votes: data[2].votes } as root
WITH [ x IN range(0, length(root.title) - 1) | { title: root.title[x].text, score: toFloat(root.score[x].text), bond:[x].text, year: apoc.text.regexGroups(root.year[x].text, "\\d{4}")[0][0], votes: toInteger(apoc.text.replace(root.votes[x].text, ",", "")) }] as result
UNWIND result as item
WITH item
WHERE <> "David Niven" AND <> "John Huston" AND item.title <> "Bond 25" // exclude some movies
MATCH (m:Movie { title: item.title })
SET m.year = toInteger(item.year), m.score = item.score, m.votes = item.votes
MERGE (a:Actor { name: })
MERGE (m)-[:BOND]->(a)

Here our 30 nodes of Bonds and movies, where they appeared. Let’s find score for each James Bond as avg score of all scores of his movies.

MATCH (m:Movie)-[:BOND]->(a:Actor)
WITH a, avg(m.score) as avg, sum(m.votes) as votes
WITH a, apoc.math.round(avg, 1) as avg, votes
SET a.score = avg
RETURN as bond, a.score as score, votes

Looks like, first and last Bond are the best. But this is not all the research.

Ok. What about the movies. For example, best movie for each Bond?

MATCH (m:Movie)-[:BOND]->(a:Actor)
WITH a, max(m.score) as max
MATCH (m:Movie { score: max })-[:BOND]->(a:Actor)
RETURN as bond, m.title as movie, max as score

Casino Royale. Damn. Seems like, I didn’t watch this movie. And it is a big blunder.

I don’t like the thing, that in this graph so little relationships. I want more. And need to find anything possible to link together. For example, let’s link all movies with NEXT relationship into one chain by year it’s released. As a diff property I will setup the difference between next movie release.

MATCH (m:Movie)
WITH collect(m) as m
WITH apoc.coll.sortMaps(m, '^year') as movies
UNWIND apoc.coll.pairsMin(movies) as value
MATCH (m1:Movie { title: value[0].title })
MATCH (m2:Movie { title: value[1].title })
MERGE (m1)-[:NEXT { diff: value[1].year - value[0].year }]->(m2)

Now I can represent the whole chain of Bond and their Movies.

MATCH (a1:Actor)<-[b1:BOND]-(m1:Movie)-[n:NEXT]-(m2:Movie)-[b2:BOND]->(a2:Actor)

And also find the longest period without James Bond new series.

MATCH (m1:Movie)-[n:NEXT]->(m2:Movie)
RETURN m1.year as from, m2.year as to, max(n.diff) as numberOfYearsWithoutBond
ORDER BY numberOfYearsWithoutBond DESC


That’s it. Today I built a super small graph and even invent few relationships to have at least something, to be able to spend my learning time on it.

Hope you find this stuff easy to try experimenting by yourself. Cheers.




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Vlad Batushkov

Vlad Batushkov


Engineering Manager @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.