RDF star Articles - Enterprise Knowledge

The Resource Description Framework (RDF)

EK Team — Mon, 24 Feb 2025 19:33:53 +0000

Simply defined, a knowledge graph is a network of entities, their attributes, and how they’re related to one another. While these networks can be captured and stored in a variety of formats, most implementations leverage a graph based tool or database. However, within the world of graph databases, there are a variety of syntaxes or flavors that can be used to represent knowledge graphs. One of the most popular and ubiquitous is the Resource Description Framework (RDF), which provides a means to capture meaning, or semantics, in a way that is interpretable by both humans and machines.

What is RDF?

The Resource Description Framework (RDF) is a semantic web standard used to describe and model information for web resources or knowledge management systems. RDF consists of “triples,” or statements, with a subject, predicate, and object that resemble an English sentence. For example, take the English sentence: “Bess Schrader is employed by Enterprise Knowledge.” This sentence has:

A subject: Bess Schrader
A predicate: is employed by
An object: Enterprise Knowledge

Bess Schrader and Enterprise Knowledge are two entities that are linked by the relationship “employed by.” An RDF triple representing this information would look like this:

What is the goal of using RDF?

RDF is a semantic web standard, and thus has the goal of representing meaning in a way that is interpretable by both humans and machines. As humans, we process information through a combination of our experience and logical deduction. For example, I know that “Washington, D.C.” and “Washington, District of Columbia” refer to the same concept based on my experience in the world – at some point, I learned that “D.C.” was the abbreviation for “District of Columbia.” On the other hand, if I were to encounter a breathing, living object that has no legs and moves across the ground in a slithering motion, I’d probably infer that it was a snake, even if I’d never seen this particular object before. This determination would be based on the properties I associate with snakes (animal, no legs, slithers).

Unlike humans, machines have no experience on which to draw conclusions, so everything needs to be explicitly defined in order for a machine to process information this way. For example, if I want a machine to infer the type of an object based on properties (e.g. “that slithering object is a snake”), I need to define what a snake is and what properties it has. If I want a machine to reconcile that “Washington, D.C.” and “Washington, District of Columbia” are the same thing, I need to define an entity that uses both of those labels.

RDF allows us to create robust semantic resources, like ontologies, taxonomies, and knowledge graphs, where the meaning behind concepts is well defined in a machine readable way. These resources can then be leveraged for any use case that requires context and meaning to connect and unify data across disparate formats and systems, such as semantic layers and auto-classification.

How does RDF work?

Let’s go back to our single triple representing the fact that “Bess Schrader works at Enterprise Knowledge.”

We can continue building out information about the entities in our (very small) knowledge graph by giving all of our subjects and objects types (which indicate the general category/class that an entity belongs to) and labels (which capture the language used to refer to the entity).

These types and labels are helping us define the semantics, or meaning, of each entity. By explicitly stating that “Bess Schrader” is a person and “Enterprise Knowledge” is an organization, we’re creating the building blocks for a machine to start to make inferences about these entities based on their types.

Similarly, we can create a more explicit definition of our relationship and attributes, allowing machines to better understand what the “employed by” relationship means. While the above diagram represents our predicate (or relationship) as a straight line between two entities, in RDF, our predicate is itself an entity and can have its own properties (such as type, label, and description). This is often referred to as making properties “first class citizens.”

Uniform Resource Identifiers (URIs)

But how do we actually make this machine readable? Diagrams in a blog are great in helping humans understand concepts, but machines need this information in a machine readable format. To make our graph machine readable, we’ll need to leverage unique identifiers.

One of the key elements of any knowledge graph (RDF or otherwise) is the principle of “things, not strings.” As humans, we often use ambiguous labels (e.g. “D.C”) when referring to a concept, trusting that our audience will be able to use context to determine our meaning. However, machines often don’t have sufficient context to disambiguate strings – imagine “D.C.” has been applied as a tag to an unstructured text document. Does “D.C.” refer to the capital city of the US, the comic book publisher, “direct current,” or something else entirely? Knowledge graphs seek to reduce this ambiguity by using entities or concepts that have unique identifiers and one or more labels, instead of relying on labels themselves as unique identifiers.

RDF is no exception to this principle – all RDF entities are defined using a Uniform Resource Identifier (URI), which can be used to connect all of the labels, attributes, and relationships for a given entity.

Using URIs, our RDF knowledge graph would look like this:

These URIs make our triples machine readable by creating unambiguous identifiers for all of our subjects, predicates, and objects. URIs also enable interoperability and the ability to share information across multiple systems – because these URIs are globally unique, any two systems that reference the same URI should be referring to the same entity.

What are the advantages to using RDF?

The RDF Specification has been maintained by the World Wide Web Consortium (W3C) for over two decades, meaning it is a stable, well documented framework for representing data. This makes it easy for applications and organizations to develop RDF data in an interoperable way. If you create RDF data in one tool and share it with someone else using a different RDF tool, they will still be able to easily use your data. This interoperability allows you to build on what’s already been done — you can combine your enterprise knowledge graph with established, open RDF datasets like Wikidata, jump-starting your analytic capabilities. This also makes data sharing and migration between internal RDF systems simple, enabling you to unify data and reducing your dependency on a single tool or vendor.

The ability to treat properties as “first-class citizens” with their own properties allows you to store your data model along with your data, explaining what properties mean and how they should be used. This reduces ambiguity and confusion for both data creators, developers, and data consumers. However, this ability to treat properties as entities also allows organizations to standardize and connect existing data. RDF data models can store multiple labels for the same property, enabling them to act as a “Rosetta Stone” that translates metadata fields and values across systems. Connecting these disparate metadata values is crucial to being able to effectively retrieve, understand, and use enterprise data.

Many implementations of RDF also support inference and reasoning, allowing you to explore previously uncaptured relationships in your data, based on logic developed in your ontology. This reasoning capability can be an incredibly powerful tool, helping you gain insights from your business logic. For example, inference and reasoning can capture information about employee expertise – a relationship that’s notoriously difficult to explicitly store. While many organizations attempt to have employees self-select their skills or areas of expertise, the completion rate of these self-selections is typically low, and even those that do complete the selection often don’t keep them up to date. Reasoning in RDF can leverage business logic to automatically infer expertise based on your organization’s data. For example, if a person has authored multiple documents that discuss a given topic, an RDF knowledge graph may infer that this person has knowledge of or expertise in that topic.

What are the disadvantages to using RDF?

To fully leverage the benefits of RDF, entities must be explicitly defined (see best practices below), which can require burdensome overhead. The volume and structure of these assertions, combined with the length and format of Uniform Resource Identifiers (URIs), can make getting started with RDF challenging for information professionals and developers used to working with more straightforward (albeit more ambiguous) data models. While recent advancements in generative AI have great potential to make the learning curve to RDF less onerous via human-in-the-loop RDF creation processes, learning to create and work with RDF still poses a challenge to many organizations.

Additionally, the “triple” format (subject – predicate – object) used by RDF only allows you to connect two entities at a time, unlike labeled property graphs. For example, I can assert that “Bess Schrader -> employed by -> Enterprise Knowledge,” but it’s not very straightforward in RDF to then add additional information about that relationship, such as what role I perform at Enterprise Knowledge, my start and end dates of employment, etc. While a proposed modification to RDF called RDF* (RDF-star) has been developed to address this, it has not been officially adopted by the W3C, and implementation of RDF* in RDF compliant tools has occurred only on an ad hoc basis.

What are some best practices when using RDF to create a knowledge graph?

RDF, and knowledge graphs in general, are well known for their flexibility – there are very few restrictions on how data must be structured or what properties must be used for their implementation. However, there are some best practices when using RDF that will enable you to maximize your knowledge graph’s utility, particularly for reasoning applications.

All concepts should be entities with a URI

The guiding principle is “things, not strings”. If you’re describing something with a label that might have its own attributes, it should be an entity, not a literal string.

All entities should have a label

Using URIs is important, but a URI without at least one label is difficult to interpret for both humans and machines.

All entities should have a type

Again, remember that our goal is to allow machines to process information similarly to humans. To do this, all entities should have one or more types explicitly asserted (e.g. “Washington, D.C” might have the type “City”).

All entities should have a description

While using URIs and labels goes a long way in limiting ambiguity (see our “D.C.” example above), adding descriptions or definitions for each entity can be even more helpful. A well written description for an entity will leave little to no question around what this entity represents.

Following these best practices will help with reuse, governance, and reasoning.

Want to learn more about RDF, or need help getting started? Contact us today.

The post The Resource Description Framework (RDF) appeared first on Enterprise Knowledge.

RDF*: What is it and Why do I Need it?

EK Team — Fri, 24 Jul 2020 16:24:24 +0000

RDF* (pronounced RDF star) is an extension to the Resource Description Framework (RDF) that enables RDF graphs to more intuitively represent complex interactions and attributes through the implementation of embedded triples. This allows graphs to capture relationships between more than two entities, add metadata to existing relationships, and add provenance information to all triples, reducing the burden of maintenance.

But let’s back up…before we talk about RDF*, let’s cover the basics — what is RDF, and how is RDF* different from RDF?

What is RDF?

For example, take the English sentence: “Bess Schrader is employed by Enterprise Knowledge.” This sentence has:

A subject: Bess Schrader
A predicate: is employed by
An object: Enterprise Knowledge

Bess Schrader and Enterprise Knowledge are two entities that are linked by the relationship is employed by. An RDF triple representing this information would look like this:

(There are many ways, or serializations, to represent RDF. In this blog, I’ll be using the Turtle syntax because it’s easy to read, but this information could also be shown in RDF/XML, JSON for Linking Data, and other formats.)

The World Wide Web Consortium (W3C) maintains the RDF Specification, making it easy for applications and organizations to develop RDF data in an interoperable way. This means if you create RDF data in one tool and share it with someone else using a different RDF tool, they will still be able to easily use your data. This interoperability allows you to build on what’s already been done — you can combine your enterprise knowledge graph with established, open RDF datasets like Wikidata, jump starting your analytic capabilities. This also makes data sharing and migration between internal RDF systems simple, enabling you to unify data and reducing your dependency on a single tool or vendor.

For more information on RDF and how it can be used, check out Why a Taxonomist Should Know SPARQL.

What are the limitations of RDF (Why is RDF* necessary)?

Standard RDF has many strengths:

Like most graph models, it more intuitively captures the way we think about the world as humans (as networks, not as tables), making it easier to design, capture, and query data.
As a standard supported by the W3C, it allows us to create interoperable data and systems, all using the same standard to represent and encode data.

However, it has one key weakness: because RDF is based on triples, standard RDF can only connect two objects at a time. For many use cases, this limitation isn’t a problem. Consider my example from above, where I want to represent the relationship between me and my employer:

Simple! However, what if I want to capture the role or position that I hold at this organization? I could add a triple denoting my position:

Great! But what if I decide to add in my (fictional) employment history?

Now it’s unclear whether I was a consultant at Enterprise Knowledge or at Hogwarts.

There are a variety of ways to address this problem in RDF. One of the most popular is reification or n-ary relations, in which you create an intermediary node that allows you to group more than two entities together. For example:

Using this technique allows you to clear up confusion and model the complexity of the world. However, adding these intermediary nodes takes away some of the simplicity of graph data — the idea of an “employment event” isn’t exactly intuitive.

There are many other methods that have been developed to handle this kind of complexity in RDF, including singleton properties and named graphs/quads. Additionally, an entirely different type of non-RDF graph model, labeled property graphs, allows users to attach properties directly to relationships. However, labeled property graphs don’t allow for interoperability at the same scale as RDF — it’s much harder to share and combine different data sets, and moving data from tool to tool isn’t as simple.

None of these solutions retain both of the strengths of RDF: the interoperable standards and the intuitive data model. This crucial limitation of RDF has limited its effectiveness in certain applications, particularly those involving temporal or transactional data.

What is RDF*?

RDF* (pronounced RDF-star) is an extension to RDF that proposes a solution to the weaknesses of RDF mentioned above. As an extension, RDF* supplements RDF but doesn’t replace it.

The main idea behind RDF* is to treat a triple as a single entity. By “nesting” or “embedding” triples, an entire triple can become the subject of a second triple. This allows you to add metadata to triples, assigning attributes to a triple, or creating relationships not just between two entities in your knowledge graph, but between triples and entities, or triples and triples. Take our example from above. In standard RDF, if I want to express past employers and positions, I need to use reification:

In RDF*, I can use nested triples to simply denote the same information:

This eliminates the need for intermediary entities and makes the model easier to understand and implement.

Just as standard RDF can be queried via the SPARQL query language, RDF* can be queried using SPARQL*, allowing users to query both standard and nested triples.

Currently, RDF* is under consideration by the W3C and has not yet been officially accepted as a standard. However, the specification has been formally defined in Foundations of an Alternative Approach to Reification in RDF, and many enterprise tools supporting RDF have added support for RDF* (including BlazeGraph, AnzoGraph, Stardog, and GraphDB ). Hopefully this standard will be formally adopted by the W3C, allowing it to retain and build on the original strengths of RDF: its intuitive model/simplicity and interoperability.

What are the benefits of RDF*?

As you can see above, RDF* can be used to represent relationships that involve more than one entity (e.g. person, role, and organization) in a more intuitive manner than standard RDF. However, RDF* has additional use cases, including:

Adding metadata to a relationship (For example: start dates and end dates for jobs, marriages, events, etc.)

Adding provenance information for triples: I have a triple that indicates Bess Schrader works for Enterprise Knowledge. When did I add this triple to my graph? What was the source of this information? Who added the information to the graph?

Conclusion

On its own, RDF provides an excellent way to create, combine, and share semantic information. Extending this framework with RDF* gives knowledge engineers more flexibility to model complex interactions between multiple entities, attach attributes to relationships, and store metadata about triples, helping us more accurately model the real world while improving our ability to understand and verify where data origins.

Looking for more information on RDF* and how you can leverage it to solve your data challenges? Contact Enterprise Knowledge.

The post RDF*: What is it and Why do I Need it? appeared first on Enterprise Knowledge.