How to persist a graph data structure in a relational database?

I've considered creating a Vertices table and an Edges table but would building graphs in memory and traversing sub-graphs require a large number of lookups? I'd like to avoid excessive database reads. Is there any other way of persisting a graph?

Side note: I've heard of Neo4j but my question is really how to conceptually represent a graph in a standard database. I am open to some NoSQL solutions like mongodb though.


Solution 1:

The answer is unfortunately: Your consideration is completely right in every point. You have to store Nodes (Vertices) in one table, and Edges referencing a FromNode and a ToNode to convert a graph data structure to a relational data structure. And you are also right, that this ends up in a large number of lookups, because you are not able to partition it into subgraphs, that might be queried at once. You have to traverse from Node to Edge to Node to Edge to Node...and so on (Recursively, while SQL is working with Sets).

The point is...

Relational, Graph oriented, Object oriented, Document based are different types of data structures that meet different requirements. Thats what its all about and why so many different NoSQL Databases (most of them are simple document stores) came up, because it simply makes no sense to organize big data in a relational way.

Alternative 1 - Graph oriented database

But there are also graph oriented NoSQL databases, which make the graph data model a first class citizen like OrientDB which I am playing around with a little bit at the moment. The nice thing about it is, that although it persists data as a graph, it still can be used in a relational or even object oriented or document oriented way also (i.e. by querying with plain old SQL). Nevertheless Traversing the graph is the optimal way to get data out of it for sure.

Alternative 2 - working with graphs in memory

When it comes to fast routing, routing frameworks like Graphhopper build up the complete Graph (Billions of Nodes) inside memory. Because Graphhopper uses a MemoryMapped Implementation of its GraphStore, that even works on Android Devices with only some MB of Memory need. The complete graph is read from database into memor at startup, and routing is then done there, so you have no need to lookup the database.

Solution 2:

I faced this same issue and decided to finally go with the following structure, which requires 2 database queries, then the rest of the work is in memory:

Store nodes in a table and reference the graph with each node record:

Table Nodes

id  | title | graph_id
---------------------
105 | node1 | 2
106 | node2 | 2

Also store edges in another table and again reference the graph these edges belong to with each edge:

Table Edges

id | from_node_id | to_node_id | graph_id
-----------------------------------------
1  | 105          | 106        | 2
2  | 106          | 105        | 2

Get all the nodes with one query, then get all the edges with another.

Now build your preferred way to store the graph (e.g., adjacency list) and proceed with your application flow.

Solution 3:

Adding to the previous answers the fact that MS SQL Server adds support for Graph Architecture starting with 2017.

It follows the described pattern of having Nodes and Edges tables (which should be created with special "AS NODE" and "AS EDGE" keywords). Node and edge tables structure

It also has new MATCH keyword introduced "to support pattern matching and traversal through the graph" like this (friend is a name of edge table in the below example):

SELECT Person2.name AS FriendName
FROM Person Person1, friend, Person Person2
WHERE MATCH(Person1-(friend)->Person2)
AND Person1.name = 'Alice';

There is also a really good set of articles on SQL Server Graph Databases on redgate Hub.