Non-trivial APOC Load HTML example to build 007 Neo4j Graph

May 28: James Bond, One Month Graph Challenge

Welcome word

Domain model

Graph

WITH "https://en.wikipedia.org/wiki/James_Bond" as url
CALL apoc.load.html(url, { title: "table.wikitable tbody tr td:eq(0) a" }) YIELD value
UNWIND value.title[63..87] as item
MERGE (m:Movie { title: item.text })
WITH "https://www.imdb.com/list/ls006405458/" as url
CALL apoc.load.html(url, { title: "div.lister-item-content h3.lister-item-header a:eq(1)", score: "div.lister-item-content div.ipl-rating-star.small span:eq(1)", bond: "div.lister-item-content p a:eq(2)", year: "div.lister-item-content span.lister-item-year", votes: "div.lister-item-content span[name=nv]:eq(1)" }) YIELD value
WITH collect(value) as data
WITH { title: data[3].title, score: data[0].score, bond: data[4].bond, year: data[1].year, votes: data[2].votes } as root
WITH [ x IN range(0, length(root.title) - 1) | { title: root.title[x].text, score: toFloat(root.score[x].text), bond: root.bond[x].text, year: apoc.text.regexGroups(root.year[x].text, "\\d{4}")[0][0], votes: toInteger(apoc.text.replace(root.votes[x].text, ",", "")) }] as result
UNWIND result as item
WITH item
WHERE item.bond <> "David Niven" AND item.bond <> "John Huston" AND item.title <> "Bond 25" // exclude some movies
MATCH (m:Movie { title: item.title })
SET m.year = toInteger(item.year), m.score = item.score, m.votes = item.votes
MERGE (a:Actor { name: item.bond })
MERGE (m)-[:BOND]->(a)
MATCH (m:Movie)-[:BOND]->(a:Actor)
WITH a, avg(m.score) as avg, sum(m.votes) as votes
WITH a, apoc.math.round(avg, 1) as avg, votes
SET a.score = avg
RETURN a.name as bond, a.score as score, votes
ORDER BY score DESC
MATCH (m:Movie)-[:BOND]->(a:Actor)
WITH a, max(m.score) as max
MATCH (m:Movie { score: max })-[:BOND]->(a:Actor)
RETURN a.name as bond, m.title as movie, max as score
ORDER BY score DESC
MATCH (m:Movie)
WITH collect(m) as m
WITH apoc.coll.sortMaps(m, '^year') as movies
UNWIND apoc.coll.pairsMin(movies) as value
MATCH (m1:Movie { title: value[0].title })
MATCH (m2:Movie { title: value[1].title })
MERGE (m1)-[:NEXT { diff: value[1].year - value[0].year }]->(m2)
MATCH (a1:Actor)<-[b1:BOND]-(m1:Movie)-[n:NEXT]-(m2:Movie)-[b2:BOND]->(a2:Actor)
RETURN *
MATCH (m1:Movie)-[n:NEXT]->(m2:Movie)
RETURN m1.year as from, m2.year as to, max(n.diff) as numberOfYearsWithoutBond
ORDER BY numberOfYearsWithoutBond DESC
LIMIT 1

Resume

Resources

Engineering Technical Lead @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.