How can I express additional information (time, probability) about a relation in RDF?
I know that I can represent any relation as an RDF triple, as in:
Barack Obama -> president of -> USA
(I am aware that this is not RDF, I am just illustrating)
But how do I add additional information about this relation, for example the time dimension? He is in his second presidential term, and each term lasts only for a limited period of time. And what about the time before and after his presidency?
There are several options to do this. I'll illustrate some of the more popular ones.
Named Graphs / Quads
In RDF, named graphs are subsets of an RDF dataset that are assigned a specific identifier (the "graph name"). In most RDF databases, this is implemented by adding a fourth element to the RDF triple, turning it from a triple into a "quad" (this fourth element is sometimes also called the "context" of the triple).
You can use this mechanism to express information about a certain collection of statements. For example (using pseudo N-Quads syntax for RDF):
:i1 a :TimePeriod .
:i1 :begin "2009-01-20T00:00:00Z"^^xsd:dateTime .
:i1 :end "2017-01-20T00:00:00Z"^^xsd:dateTime .
:barackObama :presidentOf :USA :i1 .
Notice the fourth element in the last statement: it links the statement "Barack Obama is president of the USA" to the named graph identified by :i1.
The named graphs approach is particularly useful in situations where you have data to express about several statements at once. It is of course also possible to use it for data about individual statements (as the above example illustrates), though it may quickly become cumbersome if used in that fashion (every distinct time period will need its own named graph).
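To show how such data could be read back, here is a minimal SPARQL sketch that asks who was president of the USA on a given date. It assumes that the : prefix is bound to a hypothetical namespace (http://example.org/), that the triples describing :i1 sit in the default graph, and that the date in the filter is just an arbitrary example:

```sparql
# Assumption: ':' is bound to a made-up example namespace.
PREFIX : <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
WHERE {
  # The statement of interest lives inside some named graph ?g ...
  GRAPH ?g { ?person :presidentOf :USA . }

  # ... and the time period is described by triples about ?g itself
  # (assumed here to be stored in the default graph).
  ?g :begin ?begin ;
     :end ?end .

  # Keep only graphs whose period covers the (arbitrary) date of interest.
  FILTER (?begin <= "2013-06-01T00:00:00Z"^^xsd:dateTime &&
          ?end > "2013-06-01T00:00:00Z"^^xsd:dateTime)
}
```

How the default graph relates to the named graphs differs between RDF databases (some treat it as the union of all named graphs), so the second pattern may need its own GRAPH clause in your setup.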
Representing the relation as an object
An alternative approach is to model the relation itself as an object. The relation between "Barack Obama" and "USA" is not just that one is the president of the other, but that one is president of the other between certain dates. To express this in RDF (as Joshua Taylor also illustrated in his comment):
:barackObama :hasRole :president_44 .
:president_44 a :Presidency ;
    :of :USA ;
    :begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
    :end "2017-01-20T00:00:00Z"^^xsd:dateTime .
The relation itself has now become an object (an instance of the "Presidency" class, with identifier :president_44).
Compared to using named graphs, this approach is much more tailored to asserting data about individual statements. A possible downside is that it becomes a bit more complex to query the relation in SPARQL.
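To give an idea of the extra query complexity, here is a rough SPARQL sketch for the same "who was president on a given date" question against this model. The : namespace is again a hypothetical one and the date is arbitrary; the property names are the ones from the example above:

```sparql
# Assumption: ':' is bound to a made-up example namespace.
PREFIX : <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
WHERE {
  # Instead of a direct ?person :presidentOf :USA triple, the query
  # has to go through the intermediate role object.
  ?person :hasRole ?role .
  ?role a :Presidency ;
        :of :USA ;
        :begin ?begin ;
        :end ?end .

  # Keep only roles whose period covers the (arbitrary) date of interest.
  FILTER (?begin <= "2013-06-01T00:00:00Z"^^xsd:dateTime &&
          ?end > "2013-06-01T00:00:00Z"^^xsd:dateTime)
}
```

The extra hop through ?role is the price for being able to attach begin and end dates directly to the relation.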
RDF Reification
I'm not sure this approach still counts as "popular", but RDF reification is the historically W3C-sanctioned approach to asserting "statements about statements". In this approach we turn the statement itself into an object:
:obamaPresidency a rdf:Statement ;
    rdf:subject :barackObama ;
    rdf:predicate :presidentOf ;
    rdf:object :USA ;
    :trueBetween [
        :begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
        :end "2017-01-20T00:00:00Z"^^xsd:dateTime
    ] .
There are several good reasons not to use RDF reification in this case, however:
- It's conceptually a bit strange. The knowledge that we want to express is about the temporal aspect of the relation, but using RDF reification we are saying something about the statement.
- What we have expressed in the above example is: "the statement about Barack Obama being president of the USA is valid between ... and ...". Note that we have not expressed that Barack Obama actually is the president of the USA! You could of course still assert that separately (by just adding the original triple as well as the reified one), but this creates a further duplication/maintenance problem.
- It is a pain to use in SPARQL queries.
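To make the last point a bit more concrete, this is roughly what the "who was president on a given date" query could look like against the reified data. As before, the : namespace is a hypothetical one and the date is arbitrary; :trueBetween, :begin and :end are the property names from the example above:

```sparql
# Assumption: ':' is bound to a made-up example namespace.
PREFIX : <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
WHERE {
  # Every part of the original triple has to be matched separately
  # via the reification vocabulary.
  ?stmt a rdf:Statement ;
        rdf:subject ?person ;
        rdf:predicate :presidentOf ;
        rdf:object :USA ;
        :trueBetween ?period .

  ?period :begin ?begin ;
          :end ?end .

  # Keep only statements whose period covers the (arbitrary) date of interest.
  FILTER (?begin <= "2013-06-01T00:00:00Z"^^xsd:dateTime &&
          ?end > "2013-06-01T00:00:00Z"^^xsd:dateTime)
}
```

Note that this only finds the reified statement. If you also want to be sure the plain triple is actually asserted, you would have to add a ?person :presidentOf :USA pattern on top of this.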
As Joshua also indicated in his comment, the W3C Note "Defining N-ary Relations on the Semantic Web" is useful to look at, as it goes into more depth about these (and other) approaches.