
Neo4j Graph Data Modeling of the Star Wars Universe with APOC Load JSON
May 4: Star Wars, One Month Graph Challenge
Welcome word
In this series of short posts I build one simple graph every day. Each day's domain model is related to that day in history: a historical event, a celebration or a person. I am doing this challenge to learn Neo4j data modeling and Cypher. Every day. For one month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge
Domain model
May the 4th be with you. May the Force be with you. This epic phrase connects millions of people around the world. Everybody likes Star Wars. Seriously. So today I will try to build a small graph of Star Wars. I hope the public Star Wars API will help me with today's challenge.
I ran into a parsing issue when using https://swapi.co/api/ directly, but this unexpected problem did not stop me. Thanks to Florent Georges, who imported all the data into a public GitHub repo: https://github.com/fgeorges/star-wars-dataset. Man, you saved my day! Finally, I published this JSON file as a standalone resource here: https://vbatushkov.bitbucket.io/swapi.json.
The structure of the JSON file looks like this:
{
  "root": {
    "people": "http://swapi.co/api/people/",
    "planets": "http://swapi.co/api/planets/",
    "films": "http://swapi.co/api/films/",
    "species": "http://swapi.co/api/species/",
    "vehicles": "http://swapi.co/api/vehicles/",
    "starships": "http://swapi.co/api/starships/"
  },
  "people": [{
    "url": "http://swapi.co/api/people/1/",
    "name": "Luke Skywalker",
    "homeworld": "",
    "films": [],
    "species": [],
    "vehicles": [],
    "starships": []
  }],
  "planets": [{
    "url": "http://swapi.co/api/planets/3/",
    "name": "Alderaan",
    "residents": [],
    "films": []
  }],
  "films": [{
    "url": "http://swapi.co/api/films/1/",
    "title": "A New Hope",
    "characters": [],
    "planets": [],
    "starships": [],
    "vehicles": [],
    "species": []
  }],
  "species": [{
    "url": "http://swapi.co/api/species/5/",
    "name": "Hutt",
    "homeworld": "",
    "people": [],
    "films": []
  }],
  "vehicles": [{
    "url": "http://swapi.co/api/vehicles/4/",
    "name": "Sand Crawler",
    "pilots": [],
    "films": []
  }],
  "starships": [{
    "url": "http://swapi.co/api/starships/5/",
    "name": "Sentinel-class landing craft",
    "pilots": [],
    "films": []
  }]
}
I kept only the important properties of each entity: url is used as a unique id, name (title for films) is used as the display field, and the arrays hold the links to other entities. Let's look at the node labels this data gives us and all the relationships between them.
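To make the role of each property concrete, here is a minimal Python sketch (not the author's Cypher) that indexes entities by url and picks the display field. The two-entity excerpt and the helper names index_by_url and display_name are hypothetical.

```python
import json

# Tiny excerpt shaped like swapi.json (only two entities, for illustration).
RAW = """
{
  "people": [{"url": "http://swapi.co/api/people/1/", "name": "Luke Skywalker"}],
  "films": [{"url": "http://swapi.co/api/films/1/", "title": "A New Hope"}]
}
"""


def index_by_url(data, collections):
    """Map each entity's url (used as its unique id) to the entity itself."""
    index = {}
    for collection in collections:
        for entity in data.get(collection, []):
            index[entity["url"]] = entity
    return index


def display_name(entity):
    """Films carry a 'title'; every other entity type carries a 'name'."""
    return entity.get("name") or entity.get("title")


data = json.loads(RAW)
index = index_by_url(data, ["people", "films"])
print(display_name(index["http://swapi.co/api/films/1/"]))  # A New Hope
```

Because the url is both the lookup key and the value stored in every link array, resolving a link is a single dictionary lookup.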
Labels: Film, Character, Planet, Species, Vehicle, Starship.
Relationships: films link to all the other types, so potentially every node has a relationship to a film: ()-[:APPEARED_IN]->(:Film). A planet is the homeworld of species and characters, so the relationships are (:Species)-[:HOMEWORLD]->(:Planet) and (:Character)-[:HOMEWORLD]->(:Planet). A character belongs to a species, which leads to (:Character)-[:OF]->(:Species). A character can also pilot different kinds of transport, so (:Character)-[:PILOT]->(:Starship) and (:Character)-[:PILOT]->(:Vehicle). Here is the schema I expect to build:

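The same relationships can also be written down as a small rule table: which label links to which, through which JSON field. This Python sketch is only an illustration; SCHEMA and pattern are hypothetical names, and the field names come from the JSON structure above.

```python
# Hypothetical rule table built from the relationships described above:
# (source label, JSON field holding the link, relationship type, target label).
# "*" means "any label whose entities have a films array".
SCHEMA = [
    ("*", "films", "APPEARED_IN", "Film"),
    ("Character", "homeworld", "HOMEWORLD", "Planet"),
    ("Species", "homeworld", "HOMEWORLD", "Planet"),
    ("Character", "species", "OF", "Species"),
    ("Character", "starships", "PILOT", "Starship"),
    ("Character", "vehicles", "PILOT", "Vehicle"),
]


def pattern(source, rel, target):
    """Render one rule as a Cypher-style pattern; () is an unlabeled node."""
    left = "()" if source == "*" else f"(:{source})"
    return f"{left}-[:{rel}]->(:{target})"


for source, field, rel, target in SCHEMA:
    print(f"{pattern(source, rel, target):<40} via '{field}'")
```

Keeping the schema as data makes it easy to check that every relationship type has a JSON field behind it.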
Graph
As a small warm-up, list all characters involved in the Star Wars saga:
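Given the title, this step presumably reads the file through APOC's apoc.load.json and walks the people array. As a language-neutral sketch of the same idea, here is a Python version; the local filename swapi.json and the helper character_names are assumptions, and an inline stand-in replaces the real file so the snippet runs on its own.

```python
import json


def character_names(data):
    """Return the names from the top-level 'people' array, sorted."""
    return sorted(person["name"] for person in data.get("people", []))


# Inline stand-in for swapi.json. With a local copy of the file (assumed name),
# you could instead do: data = json.load(open("swapi.json", encoding="utf-8"))
data = json.loads(
    '{"people": [{"name": "Luke Skywalker"}, {"name": "Dexter Jettster"}]}'
)
for name in character_names(data):
    print(name)
```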
To keep things simple, I first create only the nodes, without relationships. The main nodes of the saga are films, characters and planets:
Then the rest of the world: species, vehicles and starships:
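Both node-creation steps follow one recipe: one node per entity, keyed by url, with one label per JSON collection. A minimal Python sketch of that recipe, with in-memory dicts standing in for Neo4j; LABELS and create_nodes are hypothetical names. Keying by url mirrors what a MERGE on the url property would do in Cypher: running the step again does not create duplicates.

```python
# Hypothetical mapping from JSON collection to node label (labels from the post).
LABELS = {
    "films": "Film", "people": "Character", "planets": "Planet",
    "species": "Species", "vehicles": "Vehicle", "starships": "Starship",
}


def create_nodes(data):
    """Build one node per entity, keyed by url, like MERGE on a unique url."""
    nodes = {}
    for collection, label in LABELS.items():
        for entity in data.get(collection, []):
            nodes[entity["url"]] = {
                "label": label,
                "name": entity.get("name") or entity.get("title"),
            }
    return nodes


data = {
    "films": [{"url": "http://swapi.co/api/films/1/", "title": "A New Hope"}],
    "planets": [{"url": "http://swapi.co/api/planets/3/", "name": "Alderaan"}],
    "vehicles": [{"url": "http://swapi.co/api/vehicles/4/", "name": "Sand Crawler"}],
}
nodes = create_nodes(data)
for url, node in nodes.items():
    print(node["label"], node["name"])
```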

Cool! But nodes without relationships are still not a graph. Time to connect all the nodes based on the links they have.
Connect characters, planets and species with the films they appeared in:
Connect vehicles and starships with the films they appeared in:
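Both APPEARED_IN steps reduce to the same loop: for each entity, follow the urls in its films array and emit an edge when that film exists as a node. A Python sketch, assuming the entity side of the link is used (films also list their characters, planets and so on, so the film side would work too); appeared_in is a hypothetical helper, and edges are kept in a set so repeated links collapse, much like MERGE.

```python
def appeared_in(data, collections):
    """Return (entity url, 'APPEARED_IN', film url) edges from each entity's films array."""
    film_urls = {film["url"] for film in data.get("films", [])}
    edges = set()
    for collection in collections:
        for entity in data.get(collection, []):
            for film_url in entity.get("films", []):
                if film_url in film_urls:  # skip links to films that have no node
                    edges.add((entity["url"], "APPEARED_IN", film_url))
    return edges


FILM = "http://swapi.co/api/films/1/"
data = {
    "films": [{"url": FILM, "title": "A New Hope"}],
    "people": [{"url": "http://swapi.co/api/people/1/", "films": [FILM]}],
    "starships": [{"url": "http://swapi.co/api/starships/5/", "films": []}],
}
print(sorted(appeared_in(data, ["people", "planets", "species", "vehicles", "starships"])))
```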
Some nodes are still not connected. Why? The JSON file contains more than what appears in the movies. The planet Ojom is a good example: it does not appear in any film, but it is the homeworld of Dexter Jettster. This relationship will be added a bit later.
Planet
{
  "name": "Ojom",
  "residents": ["http://swapi.co/api/people/71/"],
  "films": [],
  "url": "http://swapi.co/api/planets/55/"
}
Character
{
  "name": "Dexter Jettster",
  "homeworld": "http://swapi.co/api/planets/55/",
  "films": ["http://swapi.co/api/films/5/"],
  "species": ["http://swapi.co/api/species/31/"],
  "vehicles": [],
  "starships": [],
  "url": "http://swapi.co/api/people/71/"
}
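One way to spot such stragglers is to look for nodes that no edge touches. A small Python sketch using only the Ojom and Dexter Jettster urls shown above; unconnected is a hypothetical helper, and at this point only the APPEARED_IN edge exists.

```python
def unconnected(nodes, edges):
    """Return urls of nodes that take part in no edge at all."""
    touched = {url for src, _, dst in edges for url in (src, dst)}
    return sorted(url for url in nodes if url not in touched)


OJOM = "http://swapi.co/api/planets/55/"
DEXTER = "http://swapi.co/api/people/71/"
FILM5 = "http://swapi.co/api/films/5/"

nodes = [OJOM, DEXTER, FILM5]
edges = {(DEXTER, "APPEARED_IN", FILM5)}  # only the film links exist so far
print(unconnected(nodes, edges))  # ['http://swapi.co/api/planets/55/']
```

Once the HOMEWORLD edge from Dexter Jettster to Ojom is added, the list becomes empty.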
First, build all character-related relationships:
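All three character relationships come from the character's own fields: homeworld gives HOMEWORLD, species gives OF, and vehicles plus starships give PILOT. A Python sketch of that derivation; character_edges is a hypothetical helper, and the sample is Dexter Jettster's record from above.

```python
def character_edges(people):
    """HOMEWORLD, OF and PILOT edges derived from each character's own fields."""
    edges = set()
    for person in people:
        src = person["url"]
        if person.get("homeworld"):  # skip empty or missing homeworld values
            edges.add((src, "HOMEWORLD", person["homeworld"]))
        for species_url in person.get("species", []):
            edges.add((src, "OF", species_url))
        # Vehicles and starships both become PILOT targets; the target node's
        # label (Vehicle or Starship) is what tells them apart.
        for transport_url in person.get("vehicles", []) + person.get("starships", []):
            edges.add((src, "PILOT", transport_url))
    return edges


# Dexter Jettster, as shown in the JSON above.
dexter = {
    "url": "http://swapi.co/api/people/71/",
    "homeworld": "http://swapi.co/api/planets/55/",
    "species": ["http://swapi.co/api/species/31/"],
    "vehicles": [],
    "starships": [],
}
for edge in sorted(character_edges([dexter])):
    print(edge)
```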
What is left? Right: the HOMEWORLD relationship from species to planet:
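The species step is the same idea with one extra guard: a species' homeworld may be empty (as in the simplified Hutt entry above), so only links to existing planet nodes become edges. A Python sketch; species_homeworlds is a hypothetical helper, and the second species record is made up purely to show a matching link.

```python
def species_homeworlds(species_list, planet_urls):
    """(:Species)-[:HOMEWORLD]->(:Planet) edges, kept only when the planet node exists."""
    return {
        (s["url"], "HOMEWORLD", s["homeworld"])
        for s in species_list
        if s.get("homeworld") in planet_urls
    }


planet_urls = {"http://swapi.co/api/planets/55/"}
species_list = [
    # Hutt, as in the simplified JSON above: empty homeworld, so no edge.
    {"url": "http://swapi.co/api/species/5/", "homeworld": ""},
    # Made-up record, only to show a matching link.
    {"url": "http://example.org/species/demo/", "homeworld": "http://swapi.co/api/planets/55/"},
]
print(species_homeworlds(species_list, planet_urls))
```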

Thinking a bit more about the data, I have some ideas: add a Transport label to all vehicle and starship nodes, and a Hero label only to the main characters of the saga. Share your ideas in the comments. Maybe we can improve this and make something really fun. But for now, I will leave it unchanged.
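Of these two ideas, the Transport label is the mechanical one, since it depends only on existing labels (a Hero label would first need a definition of "main character", which the data does not carry). In Cypher it could be a single MATCH on Vehicle or Starship nodes followed by SET n:Transport. The Python sketch below models nodes with label sets; add_transport_label is a hypothetical helper.

```python
def add_transport_label(nodes):
    """Give every Vehicle and Starship node an extra Transport label."""
    for node in nodes.values():
        if node["labels"] & {"Vehicle", "Starship"}:
            node["labels"].add("Transport")
    return nodes


nodes = {
    "http://swapi.co/api/vehicles/4/": {"name": "Sand Crawler", "labels": {"Vehicle"}},
    "http://swapi.co/api/starships/5/": {"name": "Sentinel-class landing craft", "labels": {"Starship"}},
    "http://swapi.co/api/planets/3/": {"name": "Alderaan", "labels": {"Planet"}},
}
add_transport_label(nodes)
for node in nodes.values():
    print(node["name"], sorted(node["labels"]))
```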

Summary
Wow… That was quite a lot of work, and it took much more time than I expected. I definitely need a better way to link all the data, maybe a single script instead of many similar separate scripts. Anyway, it was real fun to discover the Star Wars world.
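One possible direction for that single script is to drive all linking from one rule table instead of writing one script per relationship. A Python sketch of the idea, not a Neo4j import; RULES and link_all are hypothetical names, and the sample uses the Ojom and Dexter Jettster urls from above.

```python
# Hypothetical rule table: (JSON collection, field holding the link, relationship type).
RULES = [
    ("people", "films", "APPEARED_IN"),
    ("planets", "films", "APPEARED_IN"),
    ("species", "films", "APPEARED_IN"),
    ("vehicles", "films", "APPEARED_IN"),
    ("starships", "films", "APPEARED_IN"),
    ("people", "homeworld", "HOMEWORLD"),
    ("species", "homeworld", "HOMEWORLD"),
    ("people", "species", "OF"),
    ("people", "vehicles", "PILOT"),
    ("people", "starships", "PILOT"),
]


def link_all(data, rules=RULES):
    """Derive every relationship in one pass over the rule table."""
    # Every url that exists as a node; the "root" object is a dict, not a list, so it is skipped.
    known = {e["url"] for v in data.values() if isinstance(v, list) for e in v}
    edges = set()
    for collection, field, rel in rules:
        for entity in data.get(collection, []):
            value = entity.get(field)
            # homeworld is a single url, the other fields are arrays of urls.
            targets = value if isinstance(value, list) else [value]
            for target in targets:
                if target in known:  # drops "", None and links to missing nodes
                    edges.add((entity["url"], rel, target))
    return edges


OJOM = "http://swapi.co/api/planets/55/"
DEXTER = "http://swapi.co/api/people/71/"
FILM5 = "http://swapi.co/api/films/5/"
SPECIES31 = "http://swapi.co/api/species/31/"
data = {
    "films": [{"url": FILM5}],
    "planets": [{"url": OJOM, "films": []}],
    "species": [{"url": SPECIES31, "films": []}],
    "people": [{"url": DEXTER, "homeworld": OJOM, "films": [FILM5],
                "species": [SPECIES31], "vehicles": [], "starships": []}],
}
for src, rel, dst in sorted(link_all(data)):
    print(src, rel, dst)
```

Adding a new relationship then means adding one row to RULES rather than writing another script.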
I plan to come back to this schema with some simple queries across the data. It will be interesting to get some information out of this beautiful graph. But later.
Similar topics
Simple example of N-ary relationship to understand basics of Neo4j Graph Data Modeling
Initial Neo4j Graph Data Modeling of Train Ticket Booking System
Unusual use of Neo4j Common Neighbors Algorithm on people community graph
Neo4j Shortest Path and Dijkstra Algorithms
Apply Neo4j Similarity Algorithm to analyse Chess “openings”
Build a Neo4j Graph of Moscow Metro Map with Spatial Values and Shortest Path Algorithm