Why a Taxonomist Should Know SPARQL

April 1, 2020

EK Team

As the Knowledge and Information Management field adopts semantic technologies like ontologies and enterprise knowledge graphs, taxonomists and taxonomy managers need to know the W3C semantic web standards, including RDF, SKOS, and SPARQL. Data is becoming more interconnected and complex, and we need to move beyond traditional hierarchical taxonomy relationships in order to truly model our knowledge domains. Taxonomies built on these standards can also be extended into ontologies, which increases the value of a taxonomist’s work by supporting AI initiatives and features. As my colleagues have defined in previous blogs, ontologies are semantic data models that define the types of things that exist in our domain and the properties that can be used to describe them; knowledge graphs are the instantiation of our ontology models with real, live business data. EK recommends designing both using the W3C standards discussed in this blog, for interoperability. Familiarity with RDF, SKOS, and SPARQL is critical because more and more taxonomies are being built and implemented on the underlying structure of RDF and SKOS, and the top taxonomy management tools in this space are built to support these semantic standards.

Knowing and being able to leverage these semantic standards will not only increase a taxonomist’s or taxonomy manager’s ability to maintain and enhance taxonomy designs, but will also ensure that taxonomies are built to last as a source of truth for their domain and can serve as the building blocks of an ontology.

What a Taxonomist Needs to Know About RDF, SKOS, and SPARQL

The W3C, or World Wide Web Consortium, is an international standards organization that develops open standards to ensure the growth and longevity of the World Wide Web. Among these are the standards and recommendations for RDF, SKOS, and SPARQL. RDF stands for Resource Description Framework and is used to describe and model information for web resources or knowledge management systems. RDF consists of “triples,” statements that resemble a sentence. If we think back to elementary school English classes and sentence diagramming, each triple, like a simple sentence, contains a subject, a predicate, and an object.

SKOS stands for Simple Knowledge Organization System. It is built on RDF and is another W3C recommendation, one that describes how taxonomies should be structured and represented.
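To make the subject-predicate-object idea concrete, here is a minimal sketch of one concept described with SKOS. It is written as a SPARQL `INSERT DATA` request, whose body uses the same triple syntax RDF data is commonly written in. The `ex:` namespace, the concepts `ex:Espresso` and `ex:Coffee`, and the scheme `ex:Beverages` are made up for illustration; the `skos:` terms come from the SKOS vocabulary itself.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/taxonomy/>   # hypothetical namespace

INSERT DATA {
  # subject     predicate        object
  ex:Espresso   a                skos:Concept .     # "Espresso is a concept"
  ex:Espresso   skos:prefLabel   "Espresso"@en .    # "Espresso has the English preferred label 'Espresso'"
  ex:Espresso   skos:broader     ex:Coffee .        # "Espresso has the broader concept Coffee"
  ex:Espresso   skos:inScheme    ex:Beverages .     # "Espresso belongs to the Beverages scheme"
}
```

Each line inside the braces is one triple, and each reads like a short sentence about the concept.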

SPARQL, pronounced “sparkle,” is a recursive acronym for “SPARQL Protocol and RDF Query Language,” a set of specifications from the W3C. SPARQL allows us to query one or more triples and return varied results based on the type of information we are looking for in our taxonomy or graph database. All that is needed to leverage SPARQL is (1) data that is represented in RDF format and (2) a SPARQL endpoint, either inside an enterprise taxonomy/ontology management tool or publicly available, like Wikidata’s.
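As a first, deliberately simple sketch, the query below lists concepts and their English preferred labels. It assumes the taxonomy is modeled with SKOS and that labels carry language tags; nothing in it is specific to a particular tool or endpoint.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# List up to 25 concepts with their English preferred labels.
SELECT ?concept ?label
WHERE {
  ?concept a skos:Concept ;          # anything typed as a SKOS concept
           skos:prefLabel ?label .   # together with its preferred label
  FILTER (langMatches(lang(?label), "en"))
}
ORDER BY ?label
LIMIT 25
```

The names starting with `?` are variables: the endpoint returns one row for every combination of values that makes both triple patterns true.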

What is the Value of RDF and SPARQL?

When metadata about the concepts in a taxonomy (last modified date, created by, approval status, etc.) is stored using RDF, taxonomists can use SPARQL to interact with and ask questions about the taxonomy design in many different ways: to update the taxonomy, pull concrete values from the data, or even track changes for governance. A query could pull all concepts in draft status, or all concepts edited by a specific person in the last 30 days. We can also use SPARQL to explore our data, querying for relationships we did not know about in order to discover new connections. Client questions have prompted us to write SPARQL queries that do basic reporting on a taxonomy structure or return a subset of the project data for updating another system. For example, suppose the enterprise taxonomy is used for the intranet and Content Management System (CMS), and only a portion of it is needed for the Digital Asset Management (DAM) system. We can use a SPARQL query to pull only the concepts that live under a certain tree or broader concept and then import them into the DAM.
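A governance report such as “concepts edited by a specific person since a given date” might be sketched as follows. How editing metadata is stored varies from tool to tool, so the properties here are assumptions: the query supposes the editor is recorded with `dcterms:contributor` and the last change with `dcterms:modified` as an `xsd:dateTime` value. The editor `ex:jsmith` and the cutoff date are hypothetical, and the date is hardcoded because date arithmetic is not supported uniformly across SPARQL engines.

```sparql
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:      <http://example.org/taxonomy/>   # hypothetical namespace

# Concepts changed by one editor on or after a cutoff date, newest first.
SELECT ?concept ?label ?modified
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label ;
           dcterms:contributor ex:jsmith ;   # assumed "edited by" property and editor
           dcterms:modified ?modified .      # assumed "last modified" property
  FILTER (langMatches(lang(?label), "en"))
  FILTER (?modified >= "2020-03-02T00:00:00Z"^^xsd:dateTime)   # hardcoded cutoff
}
ORDER BY DESC(?modified)
```

In a real tool, the two `dcterms:` properties would be swapped for whatever predicates that tool uses to record editors and timestamps.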

The primary value of RDF is in the triples that allow us to make statements and connect different concepts beyond broader and narrower relationships, building a flexible and interoperable taxonomy. Specifically, RDF adds value in three main ways, all of which are related to the idea of use and reuse of information.

URIs (Uniform Resource Identifiers): Unique IDs that identify a resource without being tied to the resource’s location or use, so that the resource can be reused.
Linked Open Data: Openly available data (triples) or models (taxonomies/ontologies) that can be sourced and used to enhance a custom taxonomy, or to remove the need to design a taxonomy that already exists (e.g. DBpedia).
Interoperability: The idea that all vocabularies or models built using semantic standards can be integrated and used with each other, and with other systems or applications.

Even though our business or enterprise taxonomies may be highly specific, internal vocabularies, we can still leverage RDF and SKOS to ensure interoperability behind our firewalls. Specifically, the taxonomy can be used and reused in multiple systems so that all of those systems, and all of their users, are speaking the same language. This is also key for the development and implementation of knowledge graphs, which leverage RDF and SPARQL to pull information from disparate systems together for greater usability.

What Kind of Information Can I Query?

When writing a SPARQL query you are typically saying “I want X information from Y data that meets Z conditions.” The conditions are written as triple patterns, which are similar to RDF triples but may include variables to add flexibility in how they match against the data. For example, suppose all terms in our taxonomy should have preferred labels in both English and French, and we need a list of the terms that still need French translations so that we can send them to the appropriate SMEs to translate. We can scope a SPARQL query to the relevant concept scheme or top concept. Following the pattern above, the query asks: “I want all concepts that are narrower terms of Concept A and that do not have a French prefLabel.”
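One way to write the missing-French-label query is sketched below. It is an illustration rather than the exact query used on any project: `ex:ConceptA` and the `ex:` namespace are placeholders, and the query assumes the hierarchy is expressed with `skos:broader` and that labels carry language tags.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/taxonomy/>   # hypothetical namespace

# Concepts anywhere under Concept A that have no French preferred label.
SELECT ?concept ?englishLabel
WHERE {
  ?concept skos:broader+ ex:ConceptA ;        # one or more levels below Concept A
           skos:prefLabel ?englishLabel .
  FILTER (langMatches(lang(?englishLabel), "en"))
  FILTER NOT EXISTS {                         # keep only concepts lacking a French label
    ?concept skos:prefLabel ?frenchLabel .
    FILTER (langMatches(lang(?frenchLabel), "fr"))
  }
}
ORDER BY ?englishLabel
```

The `+` after `skos:broader` is a property path that follows the broader relationship through any number of levels, so the query returns grandchildren and deeper descendants as well as direct children. Using `langMatches` also treats regional tags such as `fr-CA` as French.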

Some SPARQL queries might be as simple as identifying how many concepts are under a parent concept in our taxonomy. Many taxonomy management tools will provide statistics on the total number of concepts within the taxonomy but may not provide those statistics at the remaining lower levels of the taxonomy hierarchy.
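A count at a lower level of the hierarchy might look like the sketch below, which reports how many concepts sit beneath each direct child of a parent concept. As before, `ex:ConceptA` is a placeholder and the hierarchy is assumed to be expressed with `skos:broader`.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/taxonomy/>   # hypothetical namespace

# For each direct child of Concept A, count the concepts beneath it.
SELECT ?child (COUNT(DISTINCT ?descendant) AS ?descendantCount)
WHERE {
  ?child skos:broader ex:ConceptA .                  # direct children only
  OPTIONAL { ?descendant skos:broader+ ?child . }    # everything below each child, if anything
}
GROUP BY ?child
ORDER BY DESC(?descendantCount)
```

The `OPTIONAL` block keeps children that have no narrower concepts in the results, with a count of zero, instead of dropping them.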

We have also used SPARQL to support the approval workflow and update process from the taxonomy management system to a second, custom application for tagging data. In this case, we needed a query that would return all the draft concepts and all their related triples (the information that makes up the concept) so the second application could be updated with the new concepts, leaving existing concepts as they were.
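A query of that kind could be sketched with `CONSTRUCT`, which returns triples rather than a table of results. The way approval status is stored depends on the taxonomy management tool, so the `ex:status` property and the `ex:Draft` value below are hypothetical stand-ins, not the predicates used in the project described above.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/taxonomy/>   # hypothetical namespace

# Return every triple whose subject is a concept in draft status.
CONSTRUCT {
  ?concept ?property ?value .
}
WHERE {
  ?concept a skos:Concept ;
           ex:status ex:Draft ;     # assumed status property and value
           ?property ?value .       # every statement made about the concept
}
```

Because the result is itself RDF, the receiving application can load it directly, and concepts that are not in draft status are left out of the export.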

Conclusion

SKOS, RDF, and SPARQL work together to ensure interoperability and usability of your organization’s data and information by standardizing the way taxonomists design and manage taxonomies and by streamlining the path toward ontologies and knowledge graphs. With the appropriate designs and implementations, applying what I’ve described in this blog can translate to enterprise AI readiness and, overall, better visibility and usage of your organization’s data and information.

Whether you are just beginning the process of designing a taxonomy, or are focused on implementation, semantic standards should be a primary consideration to ensure longevity, usability, and interoperability with many systems and tools. We are here to help you utilize these standards and implement them efficiently. Contact us.
