The Top 5 Reasons for a Semantic Layer

Implementing a Semantic Layer has become a strategic priority for many of our most advanced data clients. A Semantic Layer connects all organizational knowledge assets, including content items (files, videos, media, etc.), via a well-defined and standardized semantic framework. If you are unfamiliar with Semantic Layers, read Lulit Tesfaye’s blog What is a Semantic Layer, which provides a great explanation of what a Semantic Layer is and how it can be implemented. There are many good reasons for organizations to implement a Semantic Layer. My top five are below.

Improved Findability and Confidence in Data

Data continues to grow at an alarming rate. Leaders want their organizations to be data-driven, but their direct reports need to be able to find the data they require and have confidence in it. A Semantic Layer helps with both of these issues. It uses a graph database and the metadata from your data catalog to offer a best-in-class search that returns data in the context of the business need. For example, if you are looking for all the data sets containing information about the average purchase price of a product, a graph-based search would return a result explaining what the purchase price is and then show all of the data sets that contain purchase transactions with price information. Many of our retail clients have multiple data feeds from different purchasing systems. Showing all of this information together helps ensure that none of the feeds is missed.
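
To make this concrete, here is a minimal sketch in Python using rdflib. The namespace, concept, and data set names are hypothetical, not a client implementation; it simply shows how tagging data sets with a shared business concept lets a graph search return the definition and every related data set together.

```python
# Minimal sketch (hypothetical names): link data sets to a business
# concept so a graph-based search returns them in business context.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/semantic-layer/")
g = Graph()

# A business concept with a plain-language definition.
g.add((EX.PurchasePrice, RDF.type, EX.BusinessConcept))
g.add((EX.PurchasePrice, RDFS.label, Literal("Purchase Price")))
g.add((EX.PurchasePrice, RDFS.comment,
       Literal("The price paid in a purchase transaction for a product.")))

# Data sets from different purchasing systems, tagged with the concept.
for ds in ("PosTransactions", "EcommerceOrders", "WholesaleInvoices"):
    g.add((EX[ds], RDF.type, EX.DataSet))
    g.add((EX[ds], EX.describesConcept, EX.PurchasePrice))

# A concept search returns the definition plus every tagged data set.
results = g.query("""
    PREFIX ex: <http://example.org/semantic-layer/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?definition ?dataset WHERE {
        ?concept rdfs:label "Purchase Price" ;
                 rdfs:comment ?definition .
        ?dataset ex:describesConcept ?concept .
    }""")
for row in results:
    print(row.definition, "->", row.dataset)
```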

The information returned in this type of graph-based search is not limited to data sets. We have one client who uses the graph to capture the relationships between a dashboard, the dashboard objects, and the data tables that populate each component. Their graph-based search returns not only data sets but also the dashboards and dashboard objects that display results. Their IT staff use this to develop new dashboards against the correct data sets, and their data scientists use it to prioritize the data sets that power the dashboards they already rely on.
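
The same idea can be sketched as a lineage query. The dashboard, object, and relationship names below are hypothetical and are not the client's actual model; the sketch only illustrates how a property path can walk from a dashboard through its objects to the data sets that feed it.

```python
# Minimal sketch (hypothetical names): trace a dashboard to the data
# sets that ultimately feed it.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/semantic-layer/")
g = Graph()

# Dashboard -> dashboard object -> backing data set.
g.add((EX.SalesOverview, RDF.type, EX.Dashboard))
g.add((EX.SalesOverview, EX.hasObject, EX.RevenueByRegionChart))
g.add((EX.RevenueByRegionChart, EX.populatedBy, EX.PosTransactions))
g.add((EX.SalesOverview, EX.hasObject, EX.ReturnsTrendChart))
g.add((EX.ReturnsTrendChart, EX.populatedBy, EX.ReturnsFeed))

# "Which data sets power this dashboard?" -- a property path walks the
# dashboard -> object -> data set chain in one query.
results = g.query("""
    PREFIX ex: <http://example.org/semantic-layer/>
    SELECT ?dataset WHERE {
        ex:SalesOverview ex:hasObject/ex:populatedBy ?dataset .
    }""")
print([str(row.dataset) for row in results])
```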

Google has been using graph search for years. Now, this same technology is available in our data environments. 

Enabling AI for Data

AI and ChatGPT are all over the news these days, and AI is a budget priority for every company executive I speak with. One of the most exciting use cases for Generative AI is the databot. Organizations that implement databots give their business users easy access to the metrics they need to do their jobs. Rather than trying to build dashboards that anticipate users’ needs, databots allow business users to ask questions of any level of complexity and get answers without knowing or understanding anything about the data behind the result. Software companies in the Semantic Layer space are already showing demos of how business users can ask complicated natural language questions of their data and get answers back.

Databots require integration with a Generative AI tool (an LLM). This integration will not work without a Semantic Layer. The Semantic Layer, specifically the metadata, taxonomy, and graph framework, provides the context so that LLMs can properly answer these data-specific questions with organizational context. The importance of the Semantic Layer has been demonstrated in multiple studies. In one study, Juan Sequeda, Dean Allemang, and Bryan Jacob of data.world produced a benchmark showing how knowledge graphs affect the accuracy of question answering against SQL databases. You can see the results of this study here. Their benchmark evaluated how LLMs answered both high-complexity and low-complexity questions against both high-complexity and low-complexity schemas. The results are below.

  • Low Complexity/Low Schema: knowledge graph accuracy was 71.1%, while SQL accuracy was 25.5%
  • High Complexity/Low Schema: knowledge graph accuracy was 66.9%, while SQL accuracy was 37.4%
  • Low Complexity/High Schema: knowledge graph accuracy was 35.7%, while SQL accuracy was 0%
  • High Complexity/High Schema: knowledge graph accuracy was 38.7%, while SQL accuracy was 0%

As these results show, organizations implementing a Semantic Layer are better equipped to integrate with an LLM. One of the most striking findings is that question-answering accuracy depends far more on the availability of a knowledge graph than on the schema: even on the high-complexity schemas, where SQL-only accuracy dropped to 0%, the knowledge graph approach still answered more than a third of the questions correctly. If your organization is looking to integrate LLMs into your data environment, a Semantic Layer is critical.
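
One common integration pattern, sketched below under the same hypothetical graph used in the earlier examples, is to pull business-term definitions and data set mappings from the knowledge graph and include them as context in the prompt sent to the LLM. The send_to_llm call is a placeholder, not a specific vendor API, and build_databot_prompt is an illustrative helper, not a product feature.

```python
# Simplified sketch: assemble organizational context from the knowledge
# graph and pass it to an LLM alongside the user's question.
from rdflib import Graph

def build_databot_prompt(graph: Graph, question: str) -> str:
    # Pull every business concept, its definition, and the data sets
    # tagged with it, so the model answers with organizational context.
    rows = graph.query("""
        PREFIX ex: <http://example.org/semantic-layer/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label ?definition ?dataset WHERE {
            ?concept a ex:BusinessConcept ;
                     rdfs:label ?label ;
                     rdfs:comment ?definition .
            ?dataset ex:describesConcept ?concept .
        }""")
    context_lines = [
        f"- {row.label}: {row.definition} (available in {row.dataset})"
        for row in rows
    ]
    return (
        "Answer the question using only the business terms and data sets below.\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )

# prompt = build_databot_prompt(g, "What was the average purchase price last month?")
# answer = send_to_llm(prompt)  # placeholder for whichever LLM client you use
```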

Reporting Across Data Domains

The Semantic Layer uses a semantic framework (metadata, taxonomies, ontologies, and knowledge graphs) to map data and related data tools to the entities that business users care about. This approach creates a flexible and more reliable way to manage data across different domains, and it gives business users greater access to the information they need in a format that makes sense.

Reporting on metrics that cross data domains or systems continues to be challenging for large enterprises. Historically, these organizations have addressed this through complex ETL processes and rigid dashboards that attempt to align and aggregate the information for business users. This approach has several problems, including:

  • Slow or problematic ETL processes that erode trust in the information,
  • Over-reliance on a data expert to understand how the data comes together,
  • Problems with changing data over time, and 
  • Lack of flexibility to answer new questions.

Implementing a Semantic Layer addresses each of these issues. Taxonomies provide a consistent way to categorize data across domains. The taxonomies are implemented as metadata in the data catalogs so business users and data owners can quickly find and align information across their current sources. The knowledge graph portion of the Semantic Layer maps data sets and data elements to business objects. These maps can be used to pull information back dynamically without the need for ETL processes. When an ETL process is required for performance reasons, how the data is related is defined in the graph rather than in the heads of your data developers, and ETL routines can be developed against the knowledge graph rather than hard-coded. As the data changes, the map can be updated so that the processes that use the data reflect the changes immediately.
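
A minimal sketch of this kind of mapping is below. The source systems, column names, and relationship names are hypothetical; the point is that a reporting job can ask the graph which physical columns represent a business object at run time, instead of hard-coding that knowledge in an ETL job.

```python
# Minimal sketch (hypothetical systems and field names): map physical
# columns from different source systems to one business object.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/semantic-layer/")
g = Graph()

g.add((EX.SaleAmount, RDF.type, EX.BusinessObject))
g.add((EX.SaleAmount, RDFS.label, Literal("Sale Amount")))

# Each source system names the same business object differently.
mappings = [
    (EX.pos_txn_total,       EX.PointOfSaleSystem),
    (EX.ecom_order_amount,   EX.EcommercePlatform),
    (EX.wholesale_inv_value, EX.WholesaleSystem),
]
for column, system in mappings:
    g.add((column, EX.representsConcept, EX.SaleAmount))
    g.add((column, EX.sourceSystem, system))

# A reporting job asks the graph which columns to read, per system.
results = g.query("""
    PREFIX ex: <http://example.org/semantic-layer/>
    SELECT ?column ?system WHERE {
        ?column ex:representsConcept ex:SaleAmount ;
                ex:sourceSystem ?system .
    }""")
for row in results:
    print(f"read {row.column} from {row.system}")
```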

We developed a Semantic Layer for a retail client. Once it was in place, they could report on sales transactions from six different point-of-sale systems (each with a different format), something that previously required time-consuming and complicated ETL processes. They were also able to expand their reporting to show links between third-party sales, store sales, and supply chain issues in a single dashboard. This was impossible before the Semantic Layer was in place because they were overly reliant on a small set of developers and on dashboards that addressed only one domain at a time. Instead of constantly building and maintaining complex ETL routines that move data around, our client maps and defines the relationships in the graph and updates the graph or the metadata when changes occur. Business users are seeing more information than they ever have, and they have greater trust in what they are seeing.

Improved Data Governance

Data governance is critical to providing business users with data they can rely on for proper decision-making. The velocity and variety of today’s data environments make controlling and managing that data seem almost impossible. The tools of the Semantic Layer are built to address the scale and complexity organizations face. Data catalogs use metadata and built-in workflows to allow organizations to manage similar data sets in similar ways. They also provide data lineage information so that users know how data is used and what has been done to the data files over time. Metadata-driven data catalogs give organizations a way to align similar data sets and a framework so that they can be managed collectively rather than individually.

In addition to data catalogs, ontologies and knowledge graphs can aid in enterprise data governance. Ontologies identify data elements that represent the same thing from a business standpoint, even if they come from different source locations or have different field names. Tying similar data elements together in a machine-readable way allows the system to enforce a consistent set of rules automatically. For example, at a large financial institution we worked with, a knowledge graph linked all of the fields that represented the open date for an account. The customer was a bank with investment accounts, bank accounts, and credit card accounts. Because the ontology linked these fields as account open dates, we could implement constraints ensuring that the fields were always filled out, used a standard date format, and contained a date within a reasonable timeframe. The ability to automate constraints across many related fields allows data administrators to scale their processes even as the data they are collecting continues to grow.
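
Below is a minimal sketch of that pattern in plain Python. The field names, the date format, and the "reasonable range" are hypothetical assumptions, not the bank's actual rules; the point is that once the ontology identifies which fields mean "account open date," one shared validation routine can run against all of them.

```python
# Minimal sketch (hypothetical field names and rules): one validation
# routine applied to every field the ontology links as an account open date.
from datetime import date, datetime

# Fields the ontology identifies as "account open date", per system.
ACCOUNT_OPEN_DATE_FIELDS = {
    "investment_accounts": "acct_open_dt",
    "bank_accounts": "open_date",
    "credit_card_accounts": "card_opened_on",
}

def validate_open_date(value: str) -> list:
    """Apply the shared constraints: present, standard format, reasonable range."""
    if not value:
        return ["value is missing"]
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return [f"'{value}' is not in YYYY-MM-DD format"]
    if not (date(1900, 1, 1) <= parsed <= date.today()):
        return [f"'{value}' is outside the reasonable range"]
    return []

# Example: the same rule runs against a record from any linked system.
record = {"acct_open_dt": "2023-13-40"}
field = ACCOUNT_OPEN_DATE_FIELDS["investment_accounts"]
print(validate_open_date(record.get(field, "")))
```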

Stronger Security

The continual growth of data has made controlling access to data sets (a.k.a. entitlements) more challenging than ever. Sensitive data, like HR data, must be limited to those with a need to know. Licensed data may have contractual limits on the number of users and may not even reside in your organization’s data lake. Often, data is combined from multiple sources; what are the security rules for those new data combinations? The number of permutations and rules governing who can see what across an organization’s data landscape is daunting.

The Semantic Layer improves the way data entitlements are managed using metadata. The metadata can define the source of the data (for licensed data) as well as the type of data, so that sensitive data can be more easily found and flagged. Data administrators can use a data catalog to find licensed data and ensure the proper access rules are in place. They can also find data about a sensitive topic, like salaries, and ensure that the proper security measures are applied. Data lineage, a common feature in catalogs, can also help identify when a newly combined data set needs to be secured and who should see it. Catalogs have gone a long way toward solving these security problems, but on their own they are not sufficient for the growing security challenges.

Knowledge graphs augment the information about data stored in data catalogs to provide greater insight into, and inference of, data entitlements. Graphs map relationships across data, and those relationships can be used to identify related data sets that need similar security rules. Because the graph’s relationships are machine-readable, implementation of many of these security rules can be automated. Graphs can also show how and where data sets are used, exposing potential security mismatches. For example, a graph can identify situations where data sets have different security requirements than the dashboards that display them. These situations can be automatically flagged for data administrators, who can proactively align the security between the data and the dashboard.
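
A minimal sketch of that mismatch check is below. The classification labels and relationship names are hypothetical; the sketch simply shows how the graph's dashboard-to-data-set relationships can be queried to flag dashboards whose access level does not match the data behind them.

```python
# Minimal sketch (hypothetical classifications): flag dashboards whose
# classification differs from the data sets they display.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/semantic-layer/")
g = Graph()

g.add((EX.SalaryBenchmarks, EX.classification, Literal("restricted")))
g.add((EX.CompensationDashboard, EX.classification, Literal("internal")))
g.add((EX.CompensationDashboard, EX.displays, EX.SalaryBenchmarks))

# Find every dashboard/data set pair whose classifications disagree.
mismatches = g.query("""
    PREFIX ex: <http://example.org/semantic-layer/>
    SELECT ?dashboard ?dashLevel ?dataset ?dataLevel WHERE {
        ?dashboard ex:displays ?dataset ;
                   ex:classification ?dashLevel .
        ?dataset ex:classification ?dataLevel .
        FILTER (?dashLevel != ?dataLevel)
    }""")
for row in mismatches:
    print(f"review {row.dashboard}: dashboard is '{row.dashLevel}' "
          f"but {row.dataset} is '{row.dataLevel}'")
```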

In Conclusion

Semantic Layers are a natural evolution of the recognition that metadata is a first-class citizen in the battle to get the right data to the right people at the right time. The combination of formal metadata and graphs gives data administrators and data users new ways to find, manage, and work with data.

Joe Hilger is Enterprise Knowledge's COO. He has over 20 years of experience leading and implementing cutting-edge, enterprise-scale IT projects. He has worked with an array of commercial and public sector clients in a wide range of industries including financial services, healthcare, publishing, hotel and lodging, telecommunications, professional services, the federal government, non-profit, and higher education. Joe uses Agile development techniques to help his customers bridge the gap between business needs and technical implementation. He has a long track record of leading high-performance professional teams to deliver enterprise-level solutions that provide real value. His development teams have a strong record of client satisfaction, innovation, and leadership. Joe is an expert in implementing enterprise-scale content, search, and data analytics solutions. He consults on these areas with organizations across the country and has spoken on a wide range of topics including enterprise search, enterprise content management, big data analytics, Agile development, and content governance.