As EK works with our clients to integrate knowledge graphs into their technical ecosystems, client stakeholders often ask, “Why should we leverage knowledge graphs?” and more specifically, “Do we need a graph database?” Our consultants then collaborate with stakeholders to weigh the pros and cons of using a knowledge graph and graph database to solve their findability and discoverability use cases. In this blog, two of our senior technical consultants, Bess Schrader and James Midkiff, answer common questions about knowledge graphs and graph databases, focusing on how to best fit them into your organization’s ecosystem without overengineering the solution.
Why should I leverage knowledge graphs?
Knowledge graphs model information in the same way human beings think about information, making it easier to organize, store, and logically find data. This reduces the silos between technical users and business users, reduces the ambiguity about what data and information mean, and makes knowledge more sustainable and accessible. Knowledge graphs are the implementation of an ontology, a critical design component for understanding your organization.
Many graph databases also support inference, allowing you to explore previously uncaptured relationships in your data, based on logic developed in your ontology. This reasoning capability can be an incredibly powerful tool, helping you gain insights from your business logic.
Knowledge graphs are a concept, a way of thinking, and they aren’t always necessarily tied to a graph database. Even if you are against adopting a graph database, you should design an ontology for your organization’s data to visualize and align with how you and your colleagues think. Modeling your organization gives you the complete view and vision for how to best leverage your organization’s content. This vision is your knowledge graph, an innovative way for your organization to tackle the latest data problems. However, this ontology doesn’t have to be implemented in a graph database. The technical implementation should be built using technologies that efficiently support the use cases and are easy to maintain.
Does my use case require a graph database?
Any organization that wants to map its internal data to external data would benefit from a graph. If your use case includes publishing your data and connecting to other data sets, a knowledge graph and graph database (particularly one that uses the Resource Description Framework, or RDF) are the way to go to ensure the data is flexible and interoperable. Even if you do not intend to connect and/or publish data, storing robust definitions alongside data in a graph is one of the best ways to ensure that the meaning behind fields is not lost. With the addition of RDF*, the expressivity of a graph to describe organizational data is unmatched by other data formats.
When your ontology and instance data are all in the same place (a graph database), technical and non-technical users alike can always determine what a given field is supposed to mean. This ensures that your data is sustainable and maintainable. For example, many organizations use acronyms or abbreviations when setting up relational databases or nested data structures like JSON or XML. However, the definition and usage notes for these fields are often not included alongside the data itself. This leads to situations where data consumers and developers may find, for example, a field called “pqf” in a SQL table or JSON file created several years ago by a former employee. If no one at the organization knows what “pqf” means or what downstream systems might be using this field, this data becomes an unusable maintenance burden.
However, using well-formed ontologies and RDF knowledge graphs, this property “pqf” would be a “first-class citizen” with its own properties, including a label (“Prequalified for Funding”) and definition (“This field indicates whether a customer has been prequalified for a financial product. A value of ‘true’ indicates that the customer has been preapproved”), explaining what the property means and how it should be used. This reduces ambiguity and confusion for both developers and data consumers.
A majority of knowledge graph use cases involve information discovery and search. Graph databases are flexible, allowing you to easily adapt the model as new data and use cases are considered. Additionally, graphs make it painless to aggregate data from separate data sources and combine the data to create a single view of an employee, topic, or other important entity at your organization. Below are some questions to ask when faced with this question.
- Does the use case require data model flexibility and are the use cases going to adapt or change over time?
- Do you need to combine data from multiple sources into a single view?
- Do you need to be able to search for multiple types of information simultaneously?
If you answer yes to any of the above, graph databases are a good solution. Some use cases do not require cross-entity examination (i.e. asking questions across relationships) or are not easily calculated in a graph. In these cases, you should not invest in learning and implementing a graph database. As an alternative, you can create a dynamic model inside of a NoSQL database and provide search functionality via a search engine. You can also do network-based and machine learning calculations in your programming language of choice after a small data transformation. As stated previously, implementations should be largely driven by the use cases they are supporting and will support in the future.
I’m nervous about migrating to a new data format. Why should my team learn about and invest in graph database technologies?
In addition to the advantages described above, one major benefit of using RDF-compliant graph databases is that they’re based on standards that have been maintained by the W3C for over two decades. These standards, including RDF and SPARQL, were developed over 20 years ago to promote long-term growth for the web. In other words, RDF is not a trendy new format that may disappear in five years, and you can be confident when investing in learning about this technology. The use of standards provides freedom from proprietary vendor tools, enabling you to effortlessly create, move, integrate, and share your data between different standards-compliant software. Using semantic web standards also enables you to seamlessly connect your content and data to a taxonomy (whether internal or external), as most taxonomies are created and stored in an RDF format.
Similarly, SPARQL, the RDF query language, is based on pattern matching and can be easier to learn for non-technical users than more complex programming languages. SPARQL also allows for federated queries, which enable a user to query across multiple knowledge graphs stored in different graph databases, as long as the databases are RDF-compliant. Using federated queries, you could query your own data (e.g. a dataset of articles discussing the stock market and finances) in combination knowledge graphs like with Wikidata, a free and openly accessible RDF knowledge graph used by Wikipedia. This would allow you to take any article that mentions a stock ticker symbol, follow that symbol to the Wikidata entry, and pull back the size and industry of the organization to which the ticker refers. You could then leverage this information to filter your articles by industry or company size, without needing to gather that information yourself. In other words, federated queries allow you to query beyond the bounds of your own organization’s knowledge.
Many organizations do not need to externally share the knowledge graph data they create. The data may support externally-facing use cases, like chatbots, search, and knowledge panels, and this is usually more than sufficient to meet an organization’s knowledge graph needs. Taxonomies can be transformed and imported into any relational or NoSQL database in a similar manner that we use to translate all other data formats into RDF when building a graph. While graph databases can make this connection more seamless, they are by no means the only way to implement a taxonomy. Relational and NoSQL databases are more commonly used, making it easier to find the necessary skill sets to implement and maintain them. With so many developers used to query languages like SQL, the pattern-based nature of SPARQL can be difficult for developers to learn and adopt.
To be clear, graph databases are an investment. They’re a shift in how we approach and integrate data, which can lead to some adoption costs. However, they can also bring advantages to an organization in addition to what Bess mentioned above.
- Comprehensive, Connected Data – Graphs provide descriptive data models and the ability to query and combine multiple graphs together painlessly, without requiring the join tables, intermediary schemas, or rules often required by relational databases.
- Extendable Foundation – Knowledge graphs and graph databases enable the reuse of existing information as well as the flexibility to add more types of data, properties, and relationships to implement new use cases with minimal effort.
- Lower Costs – The upfront investment (licensing fees, the cost of migrating data, and the cost of hiring or growing the appropriate skill sets) can balance out in the long term given the flexibility to adapt the data model with evolving data and use cases.
Graph technologies are important to consider when building for the future and scale of data at your organization.
Like any major data architecture component, graph databases have their fair share of both pros and cons, and the choice to use them will ultimately come down to what fits the needs of each organization. If you’re looking for help in determining whether a graph database is a good fit for your organization, contact us.