Over the last decade, many organizations have gone through expensive migrations, whether moving data into a data lake, a data warehouse, or a modern data stack, or moving it to the cloud. Yet many of the business problems these transformation initiatives set out to solve still persist, including:
- Related data is fragmented, and information is not accessible when it is needed, resulting in siloed decisions that lack holistic context;
- Business meaning and knowledge are lost, despite expensive migrations;
- Data teams struggle to collaborate effectively with business stakeholders, domain and content owners, and data consumers;
- Complex infrastructure and proprietary platforms make it hard to enable consistent or meaningful connections, resulting in vendor lock-in as well as compliance, security, and regulatory violations; and
- The pace and dynamism of data undermine trust in evolving data and its integrity, stifling automation and progress toward innovation and enterprise AI.
So, what is a semantic layer, and how does it address these challenges? I first discussed the semantic layer in 2020, in a white paper I published titled “What is a Semantic Architecture and How do I Build One?” In 2021, Gartner dubbed it “a data fabric/data mesh architecture and key to modernizing enterprise data management.” As the field continues to evolve and technical capabilities advance with developments in data and AI solutions, I have updated the definition to reflect the latest developments in this fast-maturing industry.
What is a Semantic Layer?
A semantic layer is a standardized framework that organizes and abstracts organizational data (structured, semi-structured, and unstructured) and serves as a connector for data and knowledge. Broader in scope than a data fabric, which focuses mainly on structured data, a semantic layer connects all organizational knowledge assets, including content items, files, videos, and media, via a well-defined, standardized semantic framework. It allows organizations to represent organizational knowledge and domain meaning to systems and applications, defining the relationships between content and data. Specifically, a semantic layer:
- Makes data available for both humans and machines to understand;
- Captures and connects content and data based on business or domain meaning and value;
- Aggregates and unifies unstructured and structured data to connect data of all formats; and
- Enables data federation and virtualization.
A semantic layer is the culmination of a noticeable shift in business focus, driven by the realization that business insights come not from having data physically co-located in one place (like a data lake) but from understanding the meaning of data within an organization’s context and how it is related.
What are the Components of a Semantic Layer?
A semantic layer is not a single platform or application, but rather the actualization of a semantic approach to solving business problems: managing data in a manner that is optimized for capturing business meaning and context, and designing it around the end-user experience. A scalable semantic layer includes one or more of the following components to build a viable solution framework for today’s enterprise.
1. Metadata
The most effective way to make datasets easier to organize, understand, and manage is to enrich them with descriptive data about the data, i.e., metadata. Metadata plays a key role in a semantic layer by providing essential information and context about the underlying data. This includes establishing a shared approach for describing data sources, standardized data, relationships between data elements, security and access controls, versioning, lineage, data quality and governance measures, and other relevant details that drive efficient labeling and categorization.
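To make the idea of a shared metadata approach more concrete, here is a minimal Python sketch. The schema (`DatasetMetadata`, its field names, and the set of allowed access levels) is hypothetical and not taken from any catalog product; it only illustrates how a common record can be validated and how lineage links can be followed upstream.

```python
from dataclasses import dataclass, field

# Hypothetical shared metadata schema; field names are illustrative only.
ALLOWED_ACCESS_LEVELS = {"public", "internal", "restricted"}

@dataclass
class DatasetMetadata:
    name: str
    source_system: str
    owner: str
    version: str
    access_level: str  # expected to be one of ALLOWED_ACCESS_LEVELS
    lineage: list[str] = field(default_factory=list)  # names of upstream datasets

def validate(md: DatasetMetadata) -> list[str]:
    """Return the problems found; an empty list means the record conforms."""
    problems = [f"missing {attr}" for attr in ("name", "source_system", "owner", "version")
                if not getattr(md, attr).strip()]
    if md.access_level not in ALLOWED_ACCESS_LEVELS:
        problems.append(f"unknown access_level {md.access_level!r}")
    return problems

def upstream(catalog: dict[str, DatasetMetadata], name: str) -> set[str]:
    """Follow lineage links transitively to collect every upstream dataset."""
    seen: set[str] = set()
    stack = list(catalog[name].lineage)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            if parent in catalog:  # upstream datasets may live outside this catalog
                stack.extend(catalog[parent].lineage)
    return seen

catalog = {
    "crm_raw": DatasetMetadata("crm_raw", "CRM", "sales-ops", "1.0", "restricted"),
    "customers": DatasetMetadata("customers", "Warehouse", "data-eng", "2.3", "internal",
                                 lineage=["crm_raw"]),
    "churn_report": DatasetMetadata("churn_report", "BI", "analytics", "1.1", "internal",
                                    lineage=["customers"]),
}
print(sorted(upstream(catalog, "churn_report")))  # ['crm_raw', 'customers']
print(validate(DatasetMetadata("x", "", "team", "1", "secret")))
# ['missing source_system', "unknown access_level 'secret'"]
```

In a real semantic layer, this record would typically live in a data catalog or metadata service rather than in application code; the point is that every dataset is described with the same fields, so lineage and access questions can be answered uniformly.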
2. Taxonomy & Information Architecture
Business taxonomies allow us to describe, align, and represent organizational vocabulary in a structured format (through hierarchies), complementing metadata by providing an additional layer of organization. Taxonomy plays a crucial role in a semantic layer by ensuring consistent naming conventions and classification standards, reducing ambiguity and promoting a shared understanding of business concepts. The primary use case for many of our clients is to design taxonomies to be cross-functional so they can be applied across different departments and business units and ultimately facilitate data discovery and exploration of shared data through faceting. As such, taxonomies and information architecture promote a standardized approach to information and data management/governance practices and provide structured business context for the semantic layer to keep up with evolving business environments, processes, and terminologies.
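Faceting over a cross-functional taxonomy can be sketched as follows: selecting a broader concept should also match content tagged with any narrower concept. The taxonomy, the content items, and the helper names (`narrower_closure`, `facet_filter`) are hypothetical, and a production taxonomy would usually be managed in a dedicated tool (often as SKOS) rather than a Python dictionary.

```python
# Hypothetical taxonomy: each concept maps to its narrower (child) concepts.
TAXONOMY = {
    "Products": ["Loans", "Deposits"],
    "Loans": ["Mortgages", "Auto Loans"],
    "Deposits": ["Savings", "Checking"],
    "Mortgages": [], "Auto Loans": [], "Savings": [], "Checking": [],
}

def narrower_closure(concept: str) -> set[str]:
    """Return the concept plus every concept beneath it in the hierarchy."""
    result: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(TAXONOMY.get(current, []))
    return result

# Content items from different departments, tagged with the same shared taxonomy.
ITEMS = {
    "rate-sheet.pdf": {"Mortgages"},
    "branch-faq.docx": {"Checking", "Savings"},
    "q3-auto-report.xlsx": {"Auto Loans"},
}

def facet_filter(items: dict[str, set[str]], selected: str) -> list[str]:
    """Items tagged with the selected concept or any narrower concept."""
    wanted = narrower_closure(selected)
    return sorted(name for name, tags in items.items() if tags & wanted)

print(facet_filter(ITEMS, "Loans"))     # ['q3-auto-report.xlsx', 'rate-sheet.pdf']
print(facet_filter(ITEMS, "Deposits"))  # ['branch-faq.docx']
```

Because every department tags against the same hierarchy, one facet selection works across all of their content, which is the cross-functional benefit described above.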
3. Business Glossary
One of my favorite quotes from Socrates is, “The beginning of wisdom is the definition of terms.” And yes, business alignment is all about semantics! The quote highlights the importance of clearly defining terms, which is exactly the purpose of a business glossary: establishing shared meanings within a business context. In the context of data and knowledge management, a business glossary ideally forms part of the ontology and/or taxonomy. It aligns business and technical understanding and is one of the most common components of a semantic layer, facilitating effective communication across the organization and its systems.
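A business glossary entry can be pictured as a preferred term with a definition, its synonyms, and the technical fields that implement it. The minimal sketch below, with a hypothetical `GlossaryTerm` structure and made-up field paths, shows how any synonym resolves to one shared definition and to the same technical mappings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GlossaryTerm:
    preferred_label: str
    definition: str
    synonyms: tuple[str, ...] = ()
    # Physical fields ("system.table.column") that implement this business term.
    mapped_fields: tuple[str, ...] = ()

# Hypothetical glossary content for illustration.
GLOSSARY = [
    GlossaryTerm(
        preferred_label="Customer",
        definition="A party that has purchased at least one product or service.",
        synonyms=("Client", "Account Holder"),
        mapped_fields=("crm.accounts.account_id", "billing.customers.cust_no"),
    ),
]

# Index the preferred label and every synonym, case-insensitively.
INDEX = {
    label.lower(): term
    for term in GLOSSARY
    for label in (term.preferred_label, *term.synonyms)
}

def lookup(label: str) -> Optional[GlossaryTerm]:
    """Resolve any known label to its single, shared glossary entry."""
    return INDEX.get(label.strip().lower())

term = lookup("client")
print(term.preferred_label)  # Customer
print(term.mapped_fields)    # ('crm.accounts.account_id', 'billing.customers.cust_no')
```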
4. Ontology
As a flexible data model/schema structure, an ontology shifts the focus of traditional tabular data solutions from the data itself to the relationships between data elements and their meaning. The role of an ontology in a semantic layer is to provide a formal representation of the knowledge and relationships within a specific domain or subject area. This includes defining the entities, attributes, and relationships that reflect business concepts. As such, an ontology goes beyond taxonomy and metadata by capturing not only the hierarchical structure of data but also the semantics and meaning of the relationships between different data concepts. Much as a blueprint defines the structure, relationships, and purpose of each room in a building, an ontology provides a logical schema that defines the structure, relationships, and meaning of data in a system, enabling a clear and organized understanding of data that is typically related but siloed.
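A minimal sketch of how an ontology's class hierarchy and relationship definitions can constrain data follows. The classes and relationships (`Person`, `Deal`, `workedOn`, and so on) are hypothetical. Note that the sketch treats each relationship's domain and range as validation rules, which is closer to SHACL-style validation; in OWL, `rdfs:domain` and `rdfs:range` are used to infer types rather than to reject statements.

```python
from typing import Optional

# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"Employee": "Person", "Banker": "Person", "Deal": "Asset", "Investment": "Asset"}

# Hypothetical relationships: name -> (domain class, range class).
RELATIONSHIPS = {
    "workedOn": ("Person", "Deal"),
    "resultedIn": ("Deal", "Investment"),
}

def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls is the ancestor class or inherits from it through SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def conforms(subject_type: str, relationship: str, object_type: str) -> bool:
    """Check a typed (subject, relationship, object) statement against the ontology."""
    if relationship not in RELATIONSHIPS:
        return False  # unknown relationships are rejected in this sketch
    domain, range_ = RELATIONSHIPS[relationship]
    return is_a(subject_type, domain) and is_a(object_type, range_)

print(conforms("Employee", "workedOn", "Deal"))    # True: an Employee is a Person
print(conforms("Investment", "workedOn", "Deal"))  # False: an Investment is not a Person
```

The inheritance check is what lets the schema stay small: `workedOn` is defined once for `Person`, and every subclass (`Employee`, `Banker`) picks it up.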
5. Knowledge Graph
For specific use cases, a knowledge graph is created when business concepts and defined relationships from ontological schemas are applied to data/content. A knowledge graph plays a significant role in a semantic layer by representing information as interconnected entities and relationships, providing a structured and graph-based approach to knowledge representation.
A knowledge graph allows organizations to connect heterogeneous data sources by linking entities and relationships across datasets, to store business rules and logic alongside the data, and to transform raw data into meaningful information. A knowledge graph also aligns well with the principles of Linked Data, where entities are connected through links, creating a web of interconnected data. Use cases that call for a graph include traversing relationships and applying calculations, aggregations, and other manipulations to raw data in line with business requirements.
For example, one of our clients, a leading global private equity firm, created a knowledge portal based on a semantic layer that gives the firm access to all of the key information about its most important business assets, such as deals, investments, bankers, partners, and employees. From a single application, business leaders can see information about these assets pulled from over 20 different sources (connected by a knowledge graph). A director at the firm can look up an investment to see how it is performing, view the employee who worked on the original deal, and then see all the other deals that employee worked on, all from a single location. Information is organized not by the system it originates from, but by the business asset the director is viewing. The firm’s leaders now have better access to information and a more natural way to see how their business is performing.
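The director's navigation path in the private equity example (investment, originating deal, employee, other deals) can be sketched as a traversal over subject-predicate-object triples. All identifiers, predicates, and source-system names below are hypothetical, and a real implementation would typically use a graph database and a query language such as SPARQL or Cypher rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object, source_system) tuples.
TRIPLES = [
    ("deal:atlas", "resultedIn", "inv:atlas-holdings", "deal_system"),
    ("inv:atlas-holdings", "status", "active", "portfolio_mgmt"),
    ("emp:jlee", "workedOn", "deal:atlas", "hr_staffing"),
    ("emp:jlee", "workedOn", "deal:borealis", "hr_staffing"),
    ("emp:jlee", "workedOn", "deal:cobalt", "crm"),
]

# Index edges in both directions so the graph can be walked either way.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for s, p, o, _src in TRIPLES:
    out_edges[s].append((p, o))
    in_edges[o].append((p, s))

def objects(node: str, predicate: str) -> list[str]:
    return [o for p, o in out_edges.get(node, []) if p == predicate]

def subjects(node: str, predicate: str) -> list[str]:
    return [s for p, s in in_edges.get(node, []) if p == predicate]

def sources_for(node: str) -> list[str]:
    """Source systems that contributed at least one fact about this node."""
    return sorted({src for s, _p, o, src in TRIPLES if node in (s, o)})

def other_deals_by_team(investment: str) -> dict[str, list[str]]:
    """Investment -> originating deal -> employees on it -> their other deals."""
    result: dict[str, list[str]] = {}
    for deal in subjects(investment, "resultedIn"):
        for emp in subjects(deal, "workedOn"):
            result[emp] = sorted(d for d in objects(emp, "workedOn") if d != deal)
    return result

print(objects("inv:atlas-holdings", "status"))    # ['active']
print(other_deals_by_team("inv:atlas-holdings"))  # {'emp:jlee': ['deal:borealis', 'deal:cobalt']}
print(sources_for("emp:jlee"))                    # ['crm', 'hr_staffing']
```

The traversal is expressed entirely in business terms (deals, employees, investments); which system each fact came from is kept only as provenance, mirroring how the portal organizes information by business asset rather than by source.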
What are the Applications and Use Cases for a Semantic Layer?
The primary role of a semantic layer is to simplify the interaction between users and disparate data sources. Much as an index aggregates and streamlines the search for relevant content in a book, a semantic layer abstracts the underlying complexity of enterprise data using consistent, standardized, well-defined metadata, without the need to move or migrate physical data from its source. It addresses traditional data management challenges by providing a standardized representation of data elements, making it simpler for users across the organization to access and understand the organization’s data regardless of type, size, location, or department.
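The "index" idea can be sketched as a mapping from business attributes to the physical fields that hold them, with each source read at query time rather than copied into a central store. The two sources, their field names, and the `MAPPING` structure are hypothetical; the sketch also assumes both systems already share a customer identifier, which in practice often requires entity resolution first.

```python
from typing import Callable, Iterable

# Two hypothetical sources with different field names; their data stays where it is.
def crm_rows() -> Iterable[dict]:
    yield {"AcctID": "A1", "AcctName": "Acme Corp", "Region": "EMEA"}
    yield {"AcctID": "A2", "AcctName": "Globex", "Region": "AMER"}

def erp_rows() -> Iterable[dict]:
    yield {"customer_no": "A1", "open_invoices": 3}
    yield {"customer_no": "A2", "open_invoices": 0}

# Semantic mapping: business attribute -> (source reader, key field, value field).
MAPPING: dict[str, tuple[Callable[[], Iterable[dict]], str, str]] = {
    "customer_name": (crm_rows, "AcctID", "AcctName"),
    "region": (crm_rows, "AcctID", "Region"),
    "open_invoices": (erp_rows, "customer_no", "open_invoices"),
}

def query(attributes: list[str]) -> dict[str, dict[str, object]]:
    """Answer a request phrased in business terms by reading each source at query time."""
    result: dict[str, dict[str, object]] = {}
    for attr in attributes:
        reader, key_field, value_field = MAPPING[attr]
        for row in reader():  # read in place; nothing is persisted centrally
            result.setdefault(row[key_field], {})[attr] = row[value_field]
    return result

print(query(["customer_name", "open_invoices"]))
# {'A1': {'customer_name': 'Acme Corp', 'open_invoices': 3}, 'A2': {'customer_name': 'Globex', 'open_invoices': 0}}
```

Consumers only ever name `customer_name` or `open_invoices`; if a source renames a column, only the mapping changes, which is the kind of abstraction data virtualization platforms provide at enterprise scale.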
Solutions and Sample Architecture
The specific tools and solutions used to architect a semantic layer depend on the organization’s requirements, data governance maturity, and technologies in use. Although the market is continuously evolving and many emerging tools purport to provide a semantic layer, the following solutions, which provide the capabilities to manage semantics and context, make up the building blocks of a scalable semantic architecture. We find that most of the organizations we work with already have these solutions in-house and only need the right architecture and data model to build a usable semantic layer.
- Metadata Service: A semantic layer requires a repository that allows for the standardization and federation of shared vs. specialized metadata. This includes tools for organizing, applying, and managing metadata, business glossaries, and data dictionaries: specifically, enterprise data catalogs (e.g., data.world, Informatica), master data management (MDM) systems, and content or data stores with solutions that ensure the consistency of key metadata across multiple repositories.
- Taxonomy/Ontology Management: Data modeling tools that define data structures and relationships including the design, management, and application of taxonomies, ontologies, and business glossaries. This includes tools that manage and scale data models based on semantic web frameworks (such as OWL, RDF, and SKOS) and hierarchical structures (e.g., Progress/Semaphore, PoolParty, Synaptica), ontology editors, structured data modeling and governance through SHACL (e.g., TopBraid EDG), and some Content Management Systems (CMS) with taxonomy/ontology capabilities (e.g., SharePoint Term Store, Drupal or WordPress with appropriate plugins).
- Graph Data Storage: Although not a requirement for every semantic solution use case, a graph database is a core tool for building a semantic layer that represents and manages complex relationships between data entities. It gives organizations the ability to store data with semantics, context, and relationships, and it employs a flexible schema for use cases that require understanding and analyzing the relationships between data. Depending on the use case, the most commonly used graph databases include Labeled Property Graph (LPG) databases, which model data as nodes, edges, and properties (primarily effective for graph analytics use cases, e.g., Neo4j); RDF (Resource Description Framework) databases, also known as triple stores, which model data as subject-predicate-object triples (primarily effective for interoperability, as they follow the W3C standards for representing linked data, e.g., GraphDB, Stardog); and in-memory or distributed databases that provide graph capabilities as a service (e.g., Microsoft Azure Cosmos DB Graph API, Amazon Neptune). One of the most common use cases for a graph database is record linkage and deduplication, which relies on entity resolution: identifying and linking different representations of the same real-world entity that enter the graph from multiple sources. While this used to be handled with general-purpose languages and frameworks such as Python, Apache Flink, and Apache Spark, tools are now emerging to handle this use case specifically and at scale (e.g., Senzing).
- Expressive Query Language: A query language or interface allows users and applications to interact with the semantic layer and is essential for retrieving interconnected data within it, without having to write complex queries against each underlying source. As with any solution architecture, the choice of query language depends on the underlying data model, the type of semantic layer (standards-based vs. platform-specific), and the specific requirements of the applications or systems interacting with the semantic layer. The most common query languages our clients upskill on for interacting with a semantic layer include SPARQL (SPARQL Protocol and RDF Query Language, the W3C standard for querying RDF), Cypher and Gremlin (used for property graphs), and GraphQL (a query language and runtime for APIs).
- Abstracted Integrations & Data Flow: As an abstraction framework, a semantic layer relies on data integration and transformation tools to connect, unify, and transform data from various sources into a structured and semantically rich format. These include ETL (Extract, Transform, Load) tools (such as Airflow, Informatica PowerCenter, and Talend), data virtualization and integration platforms (such as Denodo and Cisco Data Virtualization), and API management tools (such as MuleSoft). These integration pipelines typically already exist within most enterprise architectures and do not require additional investment.
- Security Layer: A security layer is essential for maintaining the confidentiality, integrity, and availability of data within the semantic layer. Security measures implemented within a semantic layer should follow organizational protocols for entitlement and provisioning management, controlling access to different data elements based on user roles and permissions. This ensures that users only see and interact with data relevant to their roles.
- End User Applications: As the core purpose of a semantic layer is to connect end users with knowledge and data, a successful layer should be able to power a variety of end-user applications that let users interact with it. Across our 35+ semantic layer engagements to date, the most common applications we continue to integrate include search, conversational chatbots and natural language processing (NLP) interfaces, business intelligence (BI) and analytics platforms, visualization dashboards, and recommendation engines.
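The record linkage and deduplication use case described under Graph Data Storage can be sketched with exact matching on normalized keys plus a union-find structure that merges every record sharing a key into one entity. The records, key rules, and function names are hypothetical, and the rules are deliberately naive (two different people named Jordan Lee would be merged); dedicated entity resolution tools rely on much richer fuzzy comparison and scoring.

```python
import re

# Hypothetical person records arriving from three source systems.
RECORDS = [
    {"id": "crm:17", "name": "Jordan Lee", "email": "J.Lee@example.com"},
    {"id": "hr:0042", "name": "Lee, Jordan", "email": "j.lee@example.com"},
    {"id": "deals:9", "name": "Jordan Lee", "email": ""},
    {"id": "crm:18", "name": "Sam Patel", "email": "sam.patel@example.com"},
]

def normalize_name(name: str) -> str:
    """Turn 'Lee, Jordan' and 'Jordan Lee' into the same form: 'jordan lee'."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return re.sub(r"\s+", " ", name).strip().lower()

def match_keys(rec: dict) -> list[str]:
    """Records that share any of these keys are treated as the same entity."""
    keys = [f"name:{normalize_name(rec['name'])}"]
    if rec["email"]:
        keys.append(f"email:{rec['email'].lower()}")
    return keys

def resolve(records: list[dict]) -> list[set[str]]:
    parent = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen: dict[str, str] = {}
    for rec in records:
        for key in match_keys(rec):
            if key in first_seen:
                parent[find(rec["id"])] = find(first_seen[key])  # union the two clusters
            else:
                first_seen[key] = rec["id"]

    clusters: dict[str, set[str]] = {}
    for rid in parent:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted(clusters.values(), key=sorted)

print([sorted(c) for c in resolve(RECORDS)])
# [['crm:17', 'deals:9', 'hr:0042'], ['crm:18']]
```

Each resolved cluster would then become a single node in the graph, with the original record identifiers kept as links back to their source systems.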
The evolution and maturity of the semantic layer is a testament to its importance to knowledge and data management. As organizations take on more complex use cases and adopt AI initiatives, the idea of working within one monolithic platform is becoming a thing of the past. Enterprises are looking for ways to abstract their data in a system- and application-agnostic way so they can work with today’s systems and anticipate tomorrow’s solutions.
As a result, semantic layers are gaining adoption, allowing organizations to create shared standards and interoperability. A semantic layer also enriches data representation by modeling complex relationships and providing a powerful framework for understanding and exploring interconnected knowledge. It enhances the capabilities of knowledge and content management teams as well as business intelligence and analytics teams, supporting advanced data analysis, discovery, modeling, and decision-making on connected data.
When embarking on a semantic layer initiative, failing to understand or plan for any of the core components and solutions discussed here is what often stalls projects and creates challenges or points of failure for many organizations. If you are looking to get started and learn more about how other organizations are approaching scale, read our case studies, or contact us if you have specific questions.