2D-Historization in a graph database

Note: This text assumes that you know a little bit about graph databases and Neo4j in particular. If you don’t know Neo4j, please have a look at the Neo4j documentation first.

The introduction to 2D-historization dealt with the theory behind representing and reading state changes of your data. In that model every state change carries two times: the recorded time, when the change was entered, and the actual time, when the change should take effect in the context of whatever the application is representing.

My point of orientation when implementing 2D-historized data was Ian Robinson’s post on time-based versioned graphs.

That post sets out the following principles for representing data in the graph:

Here is some Cypher to create an example:

CREATE (p:Person { id: 112245 })
CREATE (pd:PersonData { name: 'Joe Bloggs' })-[:EXPANDS { recorded: 10, actual: 10}]->(p)
CREATE (pd2:PersonData { name: 'Joe Gonzalez' })-[:EXPANDS { recorded: 20, actual: 30}]->(p)

Think of the actual and recorded values as, for example, days since inception…
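If your application works with calendar dates, a small helper can translate them into such day counts before they are written to the relationship. The following is only a sketch: the inception date and the function names are assumptions made up for illustration, not something the example data prescribes.

```python
from datetime import date

# Hypothetical inception date, chosen only for this illustration.
INCEPTION = date(2015, 1, 1)


def days_since_inception(day: date) -> int:
    """Whole days between the inception date and `day`."""
    return (day - INCEPTION).days


def to_date(offset: int) -> date:
    """Inverse of days_since_inception: turn a day count back into a date."""
    return date.fromordinal(INCEPTION.toordinal() + offset)


print(days_since_inception(date(2015, 1, 11)))
print(to_date(30))
```

Storing plain integers keeps the comparisons in the queries simple; the conversion back to dates only happens when presenting results.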

The query to get the right history of PersonData changes is pretty straightforward:

MATCH (pd:PersonData)-[r]-(p) 
WHERE p.id = 112245
AND r.recorded <= 30
RETURN r.recorded, r.actual, pd.name
ORDER BY r.recorded DESC

(This assumes that EXPANDS is the only relationship attached to these nodes; otherwise, restrict the pattern to [r:EXPANDS].)

However, we’ve previously seen that some entries may be cancelled out, for example when you record a state that lies in the future and later record a new state whose actual time is earlier than that of the previously recorded one.

We can consider the following example:

CREATE (p:Person { id: 665544 })
CREATE (pd:PersonData { name: 'A. Brannigan' })-[:EXPANDS { recorded: 10, actual: 10}]->(p)
CREATE (pd2:PersonData { name: 'A. Durington' })-[:EXPANDS { recorded: 20, actual: 40}]->(p)
CREATE (pd3:PersonData { name: 'A. Lovegood' })-[:EXPANDS { recorded: 30, actual: 30}]->(p)

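The logic of how such state changes should be considered is explained in the previous blog post. To recap: a state change recorded later cancels out any previously recorded change whose actual time is the same or later.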

Can we express this in Cypher? Well, it’s not pretty, but it’s possible*):

MATCH (pd:PersonData)-[r]-(p)
WHERE p.id = 665544 AND r.recorded <= 30
WITH { data: pd, recorded: r.recorded, actual: r.actual } AS data
WITH data ORDER BY data.recorded DESC
WITH reduce(relevant = [], d IN collect(data) |
  CASE
    WHEN last(relevant) IS NULL OR d.actual < last(relevant).actual THEN relevant + d
    ELSE relevant
  END) AS data
UNWIND data AS final
RETURN final.actual, final.data
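For readers who find the reduce hard to follow, here is a minimal plain-Python sketch of the same filtering, run against the three example changes of person 665544. It only illustrates the logic and is not part of the Neo4j setup; the names Change, EXAMPLE and effective_history are made up for this sketch.

```python
from typing import NamedTuple


class Change(NamedTuple):
    recorded: int  # day on which the change was entered
    actual: int    # day from which the change applies
    name: str


# Mirrors the three EXPANDS relationships created for person 665544.
EXAMPLE = [
    Change(recorded=10, actual=10, name="A. Brannigan"),
    Change(recorded=20, actual=40, name="A. Durington"),
    Change(recorded=30, actual=30, name="A. Lovegood"),
]


def effective_history(changes, as_of):
    """Return the changes that still count when looking from `as_of`."""
    # Same as: WHERE r.recorded <= as_of ... ORDER BY recorded DESC
    known = sorted(
        (c for c in changes if c.recorded <= as_of),
        key=lambda c: c.recorded,
        reverse=True,
    )
    relevant = []
    for change in known:
        # Keep a change only if it applies earlier than the last one kept;
        # otherwise a later recording has cancelled it out.
        if not relevant or change.actual < relevant[-1].actual:
            relevant.append(change)
    return relevant


for as_of in (20, 30):
    names = [c.name for c in effective_history(EXAMPLE, as_of)]
    print(as_of, names)
```

Passing 20 or 30 as as_of corresponds to changing the r.recorded bound in the query.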

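Depending on the recorded time from which you look at the state, you now get two different histories. Looking from recorded time 20 (r.recorded <= 20), the history is A. Durington from actual time 40, preceded by A. Brannigan from actual time 10,

or

looking from recorded time 30 (r.recorded <= 30), it is A. Lovegood from actual time 30, preceded by A. Brannigan from actual time 10. A. Durington has been cancelled out.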

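What the query does is to collect every state change recorded up to the chosen point in time, sort the changes from most recently recorded to oldest, and then walk through them with reduce, keeping a change only if its actual time is earlier than that of the last change kept. The first change is always kept, and UNWIND turns the resulting list back into rows.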

As is often the case with graphs, there are quite a few ways you can go about storing data with meaningful relationships. Other representations could be

As usual with graphs, think in advance about the kind of questions you will want to ask :)

*) Disclaimer: I am no Cypher expert. The query expresses the logic outlined in the first blog post with some additional noise; some compaction may still be possible.