Intro to Graph Databases Episode #5 – Cypher, the Graph Query Language

Hey everyone, and welcome back to the Intro to Graph Databases series. This is episode five, and my
name is Ryan Boyd, from the Neo4j developer relations team. Today I’ll be your guide to an
introduction to Cypher, the graph query language. To understand the value of Cypher as a graph query
language, it’s important to understand why we created it, and that has to do with the history of
the Neo4j developer surface. When Neo4j first started, around the year 2000, we had an embeddable
Java API. This Java API allowed you to imperatively traverse a graph, create new relationships and
nodes, and so on, but it was only accessible from within Java.
Around the 1.x series of Neo4j, in 2010, we wanted to extend these capabilities from Java to other
clients: clients running outside the database server, across the network. For that, we created a
REST-based API, but this REST-based API was still pretty low level. As Neo4j’s adoption grew and it
became popular with different communities, we had new requirements. We wanted a declarative query
language that’s readable and expressive: just as many developers use SQL to interact with tables, we
wanted a similar declarative language for graphs. We wanted it to handle all the CRUD operations
(create, read, update, and delete), not just querying. We wanted to base it on patterns, the core
patterns that you’re looking for in a graph. We wanted to make it powerful enough that people would
actually switch from the more imperative APIs to this declarative language. And we wanted it to be
open, so that other graph technologies could adopt it.

To do this, we invented the next developer surface for Neo4j: Cypher over HTTP, using the Cypher
query language that we’re going to review today. Cypher over HTTP also let you access Neo4j remotely,
from outside the Java environment, but through this declarative language. Then, with the 3.0, 3.1,
and 3.2 series of Neo4j, we launched the Bolt protocol, a binary protocol that makes interacting with
Neo4j much more type-safe, together with a set of official language drivers (such as Java, .NET,
Python, and JavaScript). We also added user-defined procedures and functions, so you can still write
logic against the Java API if you want to, but call it from within Cypher. All of these procedures
and functions can be called from within Cypher.
That was very important to us. From an openness perspective, we released Cypher under the
openCypher project. openCypher aims to deliver a full and open specification of Cypher, the
industry’s most widely adopted graph database query language. You can visit the openCypher website to
find out more about the project and about other databases, such as SAP HANA, that have adopted
Cypher.

Before I review Cypher, I want to give you a quick recap of what property graphs are. A property
graph has nodes, relationships, and properties on both the nodes and the relationships.
In the example here, we have Ann, who loves Dan, and Dan, who loves Ann back. Ann lives with Dan, and
Ann drives a car that is owned by Dan. Dan also drives that same car. You can see that this graph is
easy to read even without a full understanding of graph databases, Cypher, or property graphs. That
is an important characteristic of Neo4j and graph databases: the whiteboard model is the physical
model. We try to reduce the number of translations between the business owner, the developer, and
the underlying system that executes queries and stores the data.
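
Here is a minimal sketch of the whiteboard model expressed as data. The Cypher string is my own reconstruction of the Ann-and-Dan example, and the relationship types LOVES, LIVES_WITH, DRIVES, and OWNS are assumed names, not copied from the slide. Running that string requires a Neo4j server, so the rest of the block is a toy in-memory stand-in. Its Node and Rel classes are hypothetical and are not a Neo4j API. The model stores the same labels, properties, and directed relationships and reads them back as sentences.

```python
from dataclasses import dataclass, field

# Reconstructed Cypher for the whiteboard graph (relationship types are assumed).
CREATE_WHITEBOARD = """
CREATE (ann:Person {name: 'Ann'})-[:LOVES]->(dan:Person {name: 'Dan'}),
       (dan)-[:LOVES]->(ann),
       (ann)-[:LIVES_WITH]->(dan),
       (ann)-[:DRIVES]->(car:Car),
       (dan)-[:OWNS]->(car),
       (dan)-[:DRIVES]->(car)
"""


@dataclass(eq=False)  # eq=False: nodes compare by identity, like distinct graph nodes
class Node:
    labels: set[str]                           # labels group nodes (the nouns)
    props: dict = field(default_factory=dict)  # node properties (the adjectives)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str                                  # relationship type (the verb)
    end: Node
    props: dict = field(default_factory=dict)  # relationship properties (the adverbs)


ann = Node({"Person"}, {"name": "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
car = Node({"Car"})
rels = [
    Rel(ann, "LOVES", dan),
    Rel(dan, "LOVES", ann),
    Rel(ann, "LIVES_WITH", dan),
    Rel(ann, "DRIVES", car),
    Rel(dan, "OWNS", car),
    Rel(dan, "DRIVES", car),
]


def describe(node: Node) -> str:
    """Use the name property if there is one, otherwise the (first) label."""
    return node.props.get("name", sorted(node.labels)[0])


# Reading every directed relationship back yields the whiteboard sentences.
triples = [(describe(r.start), r.type, describe(r.end)) for r in rels]
for subject, verb, obj in triples:
    print(subject, verb, obj)
```

The loop prints one sentence per relationship, from "Ann LOVES Dan" through "Dan DRIVES Car". The direction of each Rel is what makes "Ann drives the car" different from "the car drives Ann".
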
In the underlying data store, then, the nodes and relationships are roughly the nouns and the verbs,
the properties on nodes are roughly the adjectives, and the properties on relationships are roughly
the adverbs. That’s the recap of the property graph model.

Cypher as a query language is based on patterns. It’s about creating patterns of nodes and
relationships in a graph, and then finding those patterns when you run your queries. Here is an
example of one pattern that you might specify. It’s a fairly complex pattern, but it’s very easy to
read, understand, and even write in Cypher. We want to find out who drives a car that is owned by
someone they love. In this case, we just write it out: match a person who drives a car that is owned
by another person, where the original person loves that other person. It’s a really simple query to
read and write, even though the question is
a little bit more complex.

Now, patterns in Cypher use ASCII art. For those of you who weren’t around back in the day, ASCII art
means using the characters on the keyboard to draw graphics. In ASCII art, nodes are surrounded by
parentheses. You can use an empty pair of parentheses if you never need to refer to that node again
later in the query, or you can put an alias inside the parentheses, such as p here, so that you can
refer to that node later in the query. Nodes can also have labels, or tags, which let you group nodes
together by roles and types. In this case, a person is also a mammal, so this person has a second
label, Mammal, and we still refer to that person by the alias p.

Nodes can also have properties. For instance, you might want to set the name of a person; here, we’re
setting the name of a person to the string value Veronica. Property values can be of a wide variety
of types, including many of the basic Java types and arrays of those basic types.

Now, ASCII art for relationships. Relationships are drawn with hyphens, and any details about a
relationship go inside square brackets, as you can see here. Let’s say you were trying to talk about
the HIRED relationship, as in “Joe hired John.” You can either specify the relationship type, HIRED,
with an alias h so you can refer to that relationship later in the query, or just say that there is
a relationship, without specifying a type or an alias. The direction of the relationship is
specified with the less-than and greater-than symbols. So here, person one (say, Kate) hired person
two (say, John), or vice versa. In the vice-versa case, Kate was hired by John, that is, John hired
Kate; the pattern just shows the opposite direction by using the less-than symbol instead of the
greater-than symbol.

Relationships can also have properties, which are specified using a JSON-like syntax, as here. In
this case, we’re specifying that the person was hired as a full-time employee.

Now, I’ve mentioned aliases a few times, and I want to re-emphasize this. The h on HIRED here, p1,
and p2 all represent references. You’re defining references, or aliases, so that you can access them
later in the query. For example, if these were in a MATCH statement, we might want to return the
person as the response to the query.
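
To see aliases at work, here is a sketch of the earlier question: who drives a car owned by someone they love? The Cypher text is an assumed reconstruction. The relationship types and the aliases p, c, and o are my choices. Notice that p and o are bound once and then reused, both in the second pattern and in RETURN. The Python part is a hypothetical toy evaluator over an in-memory graph in which loop variables play the role of the aliases. It is not how Neo4j actually executes the query.

```python
from dataclasses import dataclass, field

# Reconstructed Cypher: p, c and o are aliases. p and o are bound in the first
# pattern and reused in the second pattern and in RETURN.
DRIVES_LOVERS_CAR = """
MATCH (p:Person)-[:DRIVES]->(c:Car)<-[:OWNS]-(o:Person),
      (p)-[:LOVES]->(o)
RETURN p.name AS driver, o.name AS owner
"""


@dataclass(eq=False)
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str
    end: Node


ann = Node({"Person"}, {"name": "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
car = Node({"Car"})
rels = [
    Rel(ann, "LOVES", dan), Rel(dan, "LOVES", ann), Rel(ann, "LIVES_WITH", dan),
    Rel(ann, "DRIVES", car), Rel(dan, "OWNS", car), Rel(dan, "DRIVES", car),
]


def has_rel(start: Node, rel_type: str, end: Node) -> bool:
    """True if some relationship of rel_type points from start to end."""
    return any(r.start is start and r.type == rel_type and r.end is end for r in rels)


def drivers_of_lovers_cars() -> list[tuple[str, str]]:
    """Toy evaluation of the MATCH above; loop variables stand in for the aliases."""
    rows = []
    for drives in [r for r in rels if r.type == "DRIVES"]:
        p, c = drives.start, drives.end                          # (p)-[:DRIVES]->(c)
        if "Person" not in p.labels or "Car" not in c.labels:
            continue
        for owns in [r for r in rels if r.type == "OWNS" and r.end is c]:
            o = owns.start                                       # (c)<-[:OWNS]-(o)
            if "Person" in o.labels and has_rel(p, "LOVES", o):  # (p)-[:LOVES]->(o)
                rows.append((p.props["name"], o.props["name"]))  # RETURN p.name, o.name
    return rows


print(drivers_of_lovers_cars())  # [('Ann', 'Dan')]
```

Dan also drives the car, but he is not returned, because the pattern would require Dan to love the car's owner, who is Dan himself.
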
Or, in a MATCH statement, we might want to take the person we found and add more properties to it,
delete it, or something along those lines. These aliases simply make it easier to access the matched
nodes or relationships later in the query.

Now let’s look at a basic CREATE statement as well as a basic query statement. These are very simple
examples; later on, we’ll get more complex and cover more complex query operations as well.
To create data in the graph, you specify it in much the same way as you would if you were querying
for that data. Say we want to create two nodes with the Person label, each with a name property, and
a LOVES relationship between them. We simply use the CREATE statement in Cypher and say: CREATE a
Person with, in the JSON-like syntax, the name Ann, who LOVES another Person with, in the JSON-like
syntax, the name Dan. That’s all there really is to it. This is all it takes to create these two
nodes, the relationship between them, and the properties on the nodes.
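
Here is that statement written out. It is an assumed reconstruction of the spoken query, not a copy of the slide. With the official neo4j Python driver, you could pass the string to session.run(). Since that requires a live database, the ToyGraph class below mimics what the statement does instead. ToyGraph is hypothetical and is not a Neo4j API. The sketch also shows a property of CREATE that matters later: CREATE never checks for existing data, so running the statement twice produces two Anns.

```python
from dataclasses import dataclass, field

# Reconstructed Cypher for the statement read out above.
CREATE_ANN_LOVES_DAN = "CREATE (:Person {name: 'Ann'})-[:LOVES]->(:Person {name: 'Dan'})"


@dataclass(eq=False)
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str
    end: Node


class ToyGraph:
    """Hypothetical in-memory store; not a Neo4j API."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.rels: list[Rel] = []

    def create_node(self, label: str, **props) -> Node:
        node = Node({label}, dict(props))
        self.nodes.append(node)
        return node

    def create_rel(self, start: Node, rel_type: str, end: Node) -> Rel:
        rel = Rel(start, rel_type, end)
        self.rels.append(rel)
        return rel


def run_create(graph: ToyGraph) -> None:
    """Mimics CREATE_ANN_LOVES_DAN: CREATE never looks for existing data first."""
    ann = graph.create_node("Person", name="Ann")
    dan = graph.create_node("Person", name="Dan")
    graph.create_rel(ann, "LOVES", dan)


g = ToyGraph()
run_create(g)
print(len(g.nodes), len(g.rels))  # 2 1
run_create(g)                     # running the same CREATE again duplicates everything
print(len(g.nodes), len(g.rels))  # 4 2
```

This duplication on the second run is the problem that the uniqueness constraints and MERGE, covered later in the video, are meant to solve.
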
Now let’s say we wanted to run a pretty basic query and say, “Okay, we know Ann loves someone.
Who does Ann love? Or whom does Ann love?” And that query is actually quite simple here.
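
Here is a sketch of this query in two equivalent forms. The first uses an inline JSON-like property map, and the second uses a WHERE clause; these are the same two styles that come up again for Ann’s car. Both Cypher strings are assumed reconstructions, and the alias op stands in for the OP mentioned in the narration. The Python functions form a hypothetical toy evaluator. They are included only to show that filtering inside the pattern and filtering afterwards select the same rows.

```python
from dataclasses import dataclass, field

# Two reconstructed ways to ask "whom does Ann love?". Both should return Dan.
INLINE_FORM = "MATCH (:Person {name: 'Ann'})-[:LOVES]->(op:Person) RETURN op"
WHERE_FORM = "MATCH (p:Person)-[:LOVES]->(op:Person) WHERE p.name = 'Ann' RETURN op"


@dataclass(eq=False)
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str
    end: Node


ann = Node({"Person"}, {"name": "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
rels = [Rel(ann, "LOVES", dan)]


def person_loves_person() -> list[tuple[Node, Node]]:
    """All (p, op) pairs matching (p:Person)-[:LOVES]->(op:Person)."""
    return [(r.start, r.end) for r in rels
            if r.type == "LOVES"
            and "Person" in r.start.labels and "Person" in r.end.labels]


def loves_inline(required: dict) -> list[Node]:
    """Inline map: the start node must carry every key/value pair in `required`."""
    return [op for p, op in person_loves_person()
            if all(p.props.get(k) == v for k, v in required.items())]


def loves_where(predicate) -> list[Node]:
    """WHERE clause: match the bare pattern, then keep rows where predicate(p) holds."""
    return [op for p, op in person_loves_person() if predicate(p)]


by_map = loves_inline({"name": "Ann"})
by_where = loves_where(lambda p: p.props.get("name") == "Ann")
print([op.props["name"] for op in by_map])    # ['Dan']
print([op.props["name"] for op in by_where])  # ['Dan']
```
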
We just say, “Find me a person named Ann who loves another person.” You can see how we’re using the
alias here, OP, to return that other person as the result of the query. In this case, we’re returning
the node, and of course we get Dan: Ann loves Dan. This is a really simple example, but it does show
you the power of Cypher.

What happens if we want to add more properties
to our graph? Well, first let’s find Ann’s car: we find a person whose name is Ann who drives a car,
and return the car that Ann drives. Notice that I use single quotes when specifying Ann; you can use
either single quotes or double quotes for strings. We can also write this query another way. The
first version uses the JSON-like syntax to specify the name of the person we’re searching for. The
second version says that the pattern we’re searching for is simply a person who drives a car, and
then restricts the traversal with a WHERE clause specifying that the name is Ann. Both queries should
have the same performance, but different people find one or the other more readable, so just
understand that the two options mean the same thing. In both cases, we’re returning the car that Ann
drives.

Now, let’s say we wanted to add greater description
to that car: we want to record the brand and the model of the car. We can easily do that with the SET
clause, which lets us set additional properties on the node that we found. It’s the same MATCH
statement: we find a person who drives a car, where the person’s name is Ann, and we return that
car. But before we return it, we set the two additional properties, brand and model, so the returned
car has those properties set.

One important aspect of working with graphs is the integrity of the graph and of its data, and Neo4j
really focuses on being a transactional database: an OLTP database with ACID compliance. We’re all
about making sure that the data you write and the data you read back are predictable. One aspect of
that is ensuring
uniqueness in the graph. We don’t want you to have a bunch of different Anns in your graph. Say your
graph looked like this: how would you tell one Ann from another? That would be very difficult. If we
assume that the name Ann is unique among the population in our graph, we can enforce that there can
only be one Ann. We do that with constraints. Here, we say: create a constraint on nodes with the
Person label, asserting that the person’s name is unique. It’s a very simple constraint, but a
powerful way to prevent us from adding multiple Anns. So let’s say we did try to add another Ann by
creating another Person node with the name Ann. We would get an error that looks something like
this: “Constraint validation failed,” indicating that a node with the same label and the same
property value already exists in our graph, which violates the constraint we set. This is all fine
and dandy, but we don’t actually want to hit this error. We want to be able to ensure uniqueness
at the time we create our nodes and relationships. So let’s say we want Ann to have a pet dog named
Sam. If we did something like this (create a person named Ann who has a pet dog named Sam), we’d get
the same constraint validation error we talked about before, because Ann already exists in our
database. So instead of using two CREATE statements here, we can use a MERGE statement and say MERGE
a person named Ann. What this does is look in the graph for a person named Ann and operate on that
person node; if it does not exist, MERGE creates it. That way, when you execute the next step and
create the relationships, the dog is attached to either the existing person or the newly created
one.
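
Here is a sketch of the uniqueness constraint combined with a node-level MERGE. The video uses the CREATE CONSTRAINT ON ... ASSERT form from the Neo4j 3.x era. As far as I know, Neo4j 4.4 introduced the FOR ... REQUIRE form and 5.x accepts only that newer form, so both are shown as strings. The ToyGraph class and ConstraintViolation exception are hypothetical stand-ins and are not Neo4j APIs. The toy's error text is invented for the sketch and is not Neo4j’s real message.

```python
from dataclasses import dataclass, field

# Reconstructed Cypher. ON ... ASSERT is the older syntax heard in the video;
# FOR ... REQUIRE is the newer form (Neo4j 4.4+, the only one in 5.x).
CONSTRAINT_OLD = "CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE"
CONSTRAINT_NEW = "CREATE CONSTRAINT FOR (p:Person) REQUIRE p.name IS UNIQUE"
MERGE_ANN = "MERGE (ann:Person {name: 'Ann'}) RETURN ann"


class ConstraintViolation(Exception):
    """Toy stand-in for the constraint validation error (not Neo4j's message)."""


@dataclass(eq=False)
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


class ToyGraph:
    """Hypothetical in-memory store with (label, property) uniqueness constraints."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.unique: set[tuple[str, str]] = set()

    def add_unique_constraint(self, label: str, key: str) -> None:
        self.unique.add((label, key))

    def find(self, label: str, **props) -> list[Node]:
        return [n for n in self.nodes
                if label in n.labels
                and all(n.props.get(k) == v for k, v in props.items())]

    def create(self, label: str, **props) -> Node:
        # CREATE: reject the write if it would duplicate a constrained value.
        for c_label, key in self.unique:
            if c_label == label and key in props and self.find(label, **{key: props[key]}):
                raise ConstraintViolation(f"{label}.{key} = {props[key]!r} already exists")
        node = Node({label}, dict(props))
        self.nodes.append(node)
        return node

    def merge(self, label: str, **props) -> Node:
        # MERGE: reuse a node matching the label and properties, otherwise create one.
        existing = self.find(label, **props)
        return existing[0] if existing else self.create(label, **props)


g = ToyGraph()
g.add_unique_constraint("Person", "name")
first = g.create("Person", name="Ann")
try:
    g.create("Person", name="Ann")     # a second CREATE of Ann is rejected
except ConstraintViolation as err:
    print("rejected:", err)           # rejected: Person.name = 'Ann' already exists
again = g.merge("Person", name="Ann")  # MERGE finds the existing Ann instead
print(again is first, len(g.nodes))   # True 1
```
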
In this case, what we’re saying is, “Find an existing person in the graph whose name is Ann. If I
don’t find one, then when I create a new person with the name Ann, set the Twitter property to Ann’s
Twitter handle.” You’ll also notice that the last statement has changed the CREATE statement for the
pet into a MERGE statement.
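
Here is a sketch of the full statement. It assumes the relationship type HAS_PET, the label Dog, and a placeholder handle '@ann'; none of these are taken from the slide. The ToyGraph class is hypothetical. It models the two MERGE behaviors described here. First, ON CREATE SET runs only when the node is newly created. Second, a pattern MERGE creates whatever part of the whole pattern is missing. As a result, running the statement repeatedly leaves exactly one Ann, one Sam, and one relationship.

```python
from dataclasses import dataclass, field

# Reconstructed Cypher; HAS_PET, Dog and '@ann' are hypothetical placeholders.
MERGE_ANN_AND_SAM = """
MERGE (ann:Person {name: 'Ann'})
  ON CREATE SET ann.twitter = '@ann'
MERGE (ann)-[:HAS_PET]->(:Dog {name: 'Sam'})
"""


@dataclass(eq=False)
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str
    end: Node


class ToyGraph:
    """Hypothetical in-memory store; not a Neo4j API."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.rels: list[Rel] = []

    def merge_node(self, label: str, match: dict, on_create: dict) -> Node:
        for n in self.nodes:
            if label in n.labels and all(n.props.get(k) == v for k, v in match.items()):
                return n                        # found: ON CREATE SET is skipped
        node = Node({label}, {**match, **on_create})
        self.nodes.append(node)
        return node

    def merge_path(self, start: Node, rel_type: str, label: str, match: dict) -> Rel:
        # The whole pattern must already exist; a matching Dog elsewhere is not enough.
        for r in self.rels:
            if (r.start is start and r.type == rel_type and label in r.end.labels
                    and all(r.end.props.get(k) == v for k, v in match.items())):
                return r
        end = Node({label}, dict(match))        # create the missing parts of the pattern
        self.nodes.append(end)
        rel = Rel(start, rel_type, end)
        self.rels.append(rel)
        return rel


def run_statement(g: ToyGraph) -> Node:
    """Mimics MERGE_ANN_AND_SAM against the toy graph."""
    ann = g.merge_node("Person", {"name": "Ann"}, on_create={"twitter": "@ann"})
    g.merge_path(ann, "HAS_PET", "Dog", {"name": "Sam"})
    return ann


g = ToyGraph()
for _ in range(3):                              # safe to re-run: nothing is duplicated
    ann = run_statement(g)
print(len(g.nodes), len(g.rels), ann.props)     # 2 1 {'name': 'Ann', 'twitter': '@ann'}
```

Matching the whole pattern has one consequence. If a Dog named Sam already existed but was not attached to Ann, the second MERGE would create another Sam instead of reusing the existing one. This is why MERGE is often applied to each node first and then to the relationship between them.
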
What this does is create a “has pet” relationship from Ann to a dog named Sam if, and only if, that
exact pattern does not already exist in the graph. So this will never result in Ann having two pets
that are both dogs named Sam.

All right. We have created our graph: it contains Ann and Sam, and it records that Ann has the pet
Sam. That was a very basic introduction to Cypher, covering some of the create operations, a little
bit of querying, and a little bit of modifying. And in the next video in the series... okay, I
probably shouldn’t say that again, considering the delay between the last couple of videos. But we
will teach you, either through the next video or through the tutorials in the Neo4j Sandbox, how to
run many more complicated queries on your graph. You can also learn through our Neo4j online training
or through classroom training. So thank you very much, and I hope you have a fantastic day. Feel free
to reach out if you have any questions.
