Blog

Data Management Trends in 2022: Data Fabric v. Data Mesh v. DataOps? What is Right for Your Organization?

When we start most of our engagements with our clients, there are common and important challenges we’re typically presented with that help summarize the core needs that persist within the data management space today: 

  1. We don’t know what data we have and/or who owns it;
  2. We have not been successful at reaping the benefits of our data, even after we migrate it to a central location (for example, into a data lake or warehouse);
  3. We don’t have a quick way of knowing how many of our projects use or need similar data; 
  4. We are not sure of the number of data-centric or Artificial Intelligence (AI) initiatives that are taking place within our organization; and
  5. We don’t have the right solutions or skill sets to support enterprise AI or advanced data engineering efforts. 

These are fundamental challenges that have been confronting organizations for some time. Organizations need new approaches and solutions to avoid failure and achieve the results and benefits that they have been looking for. 

The good news is that the data and AI industry has made tremendous leaps in the past decades – especially supported by higher computing power, semantic solutions, and cloud technologies. As a result, the focus is now shifting to achieving economies of scale and alignment across the multiple data initiatives and needs that are quickly spinning up within pockets of the enterprise. The data trends we are seeing today exhibit this direction and center around approaches that enable reusability of data that is created for data-centric efforts, repeatability or delivery processes, and building the capacity to scale beyond a proof of concept or a business unit.

The data strategy trends we are seeing today are about aligning and achieving economies of scale across multiple data initiatives and needs that are brewing within pockets of an enterprise. This entails finding approaches that are repeatable, enable reusability of data that’s created for data-centric efforts, and build the capacity to scale beyond a proof of concept or a business unit.

Based on these experiences and what I see on our projects, there are three concepts that are trending in the data and enterprise AI community that leading organizations should be aware of.

Data Fabric 

A data fabric is a logical data architecture that serves as a data connection and knowledge layer. A data fabric enables data federation and virtualization of semantic labels or rules (e.g. taxonomies/business glossaries or ontologies) to capture and connect data based on business or domain meaning and value. A knowledge graph is a key component of a data fabric that serves as the data orchestration or discovery layer. The ultimate objective of this design concept is to aggregate and unify unstructured and structured data to connect data of all formats that is available for both humans and machines to understand.

A stacked diagram with Sources on the bottom, the Data Fabric Layer in the middle, and the Presentation Layer on top. The Sources include items such as a content management system, data lake / data warehouse, subscriptions, and external sources. The Data Fabric layer consists of an Integration/Processing Layer and a Storage and Metadata Layer. The Integration/Processing Layer contains APIs and ETL Pipelines; and the Storage and Metadata Layer contains items such as a metadata service, data catalog, taxonomy/ontology management tool, content storage, and a knowledge graph. The Presentation layer is made up of front-end application components such as search, research and analytics, recommendations / chatbots, and admin/governance. Conceptual Diagram for Data Fabric / Knowledge Graph layer

We have been working on multiple efforts where organizations are employing these data abstraction and delivery models through metadata and taxonomy unification, labeling or cataloging (data catalogs), data integration or orchestration (ETL/ELT pipelines), and data virtualization solutions to support multiple types of data users and applications. Great examples of the application and use cases of a data fabric for our clients include constructing Customer-360 and Enterprise-360 degree views, targeted content recommenders and chatbot applications that are powered by enterprise knowledge graphs.

A list of 4 scenarios that describe when data fabric is the best solution: 1) Your organization has highly interrelated data but is having challenges with unifying data from different locations, business units, and formats, 2) Lack of business context is hindering the results from your data efforts, 3) Your data changes frequently and you need a flexible morel and/or schema to keep it up to date, and 4) The results from your enterprise AI efforts and applications are a black-box or not explainable  

Data Mesh

Data Mesh is a distributed approach to enterprise data management focused on building business-focused data products. A working Data Mesh model ultimately enables decentralized data operations and a self-service infrastructure. This is achieved by employing tools or solutions that are based on a microservices architecture as well as open source linked data sources and software applications that decentralize data ownership and management. According to Zhamak Dehghani, the expert who coined the term and framework, key principles of Data Mesh include: 

  • Treating data as a product – The disruptive nature of a data mesh architecture empowers individual business units to own and manage their data and provide it to other departments and even customers as a product itself. 
  • Domain-driven ownership of data – Data are owned by those who create it, understand it and know how to manage it. This provides leadership and business users the ability to find, explore, create, label, merge, or add new data sources based on their use cases and needs in a globally governed environment. 
  • Self-serve data platforms – Business domain teams have self-serve platforms that abstract the complexity of data creation to intuitively create and consume data products.
  • Federated computational governance – The mesh is created based on interoperability standards and serves as an ecosystem where users can get value from aggregation and correlation of independent data products.

A large rectangle box contains Data Mesh elements: A data-as-a-product Layer and several domains. The domains feed into the data-as-a-product layer via ETL pipelines. The data-as-a-product layer contains several products labels “Product A,” “Product B,” “Product C,” etc. Above the Data Mesh rectangle, there is a Federated Data Access & Agile Data Governance layer containing interoperability standards and global governance. This Federated Data Access & Agile Data Governance layer feeds into the data-as-a-product-layer within the Data Mesh. Conceptual Diagram for a Data Mesh 

Data Mesh is not a single technology, system, or platform. Rather, it is a framework that facilitates the realization of the above-mentioned principles of data democratization. An example of this is a project where we are supporting one of the largest global digital marketing firms that has been collecting consumer data from hundreds of primary sources to build consumer marketing databases. We have been working with them to develop a distributed architecture that will ultimately allow them to deduplicate and standardize their consumer data of over 2 billion records into a clean and validated data product that can be used for reporting and marketing functions, customer services departments as well as a data-product to its retail affiliates.

3 scenarios in which Data Mesh is the best solution: 1) Your data engineers and consumers are struggling to collaborate effectively, 2) Your enterprise has challenges unifying and governing diverse data from multiple sources, 3) Your organization has core legacy data platforms / teams that are slowing down innovation efforts

DataOps

DataOps is a blanket term that synchronizes the people, process, data, technology and end-to-end value stream across the data development and processing lifecycle. It is the adoption of Agile frameworks along with software engineering development and operations (DevOps) practices for data analytics and machine learning programs to reduce data processing time and improve data quality at scale. Gartner calls this “XOps” (data, machine learning, model, and platform) to achieve efficiencies; and to ensure reliability, reusability and repeatability of data while reducing the duplication of technology and processes. In addition to embracing some of the aforementioned data trends and frameworks, successful DataOps organizations apply the following emerging data practices: 

  • Data Findability and Discoverability – Helping the organization discover and understand what data are available is the first step toward data democratization and access to the business value of data. Data discovery is one of the ultimate drivers for DataOps. DataOps facilitates this through access and governance control processes and data visualization solutions (like Tableau or PowerBI)  for business users to retrieve, analyze and curate the data they need. 
  • Agile Delivery – DataOps creates cross-functional, self-organizing teams that collaborate to build and deliver end-to-end data solutions including domain experts, product owners, data engineers and software engineering teams.
  • Governance and Scalability – Key forces behind DataOps are speed and quality with reliability in mind. Automation of data curation, governance, and labeling or cataloging, supported by shared data guidelines encourages teams across the organization to establish processes that produce and deliver data of high quality, in a standardized format that is easy to access and scale.

For one of our clients in the healthcare industry, our partnership with their DevOps team meant aligning and embedding DataOps practices in order to support consistent value delivery from their machine learning algorithms in the form of targeted content recommendations. 

In this scenario, our data solution combines a data fabric that is built on a knowledge graph with a DataOps approach that delivers personalized learning content as a microservice to their multiple applications and ultimately, to over 2 million of their respective customers. While their DevOps team is focused on optimizing and streaming feature builds and software delivery, the DataOps process focuses on automating the aggregation and enrichment of structured and unstructured data towards delivering value.

3 scenarios in which DataOps is the best solution: 1) There are multiple enterprise-wide data efforts that could benefit from each other, 2) Your data engineers and consumers are struggling to collaborate effectively, 3) There is a need to track data analytics efforts and products at scale to identify value of a given dataset towards enterprise level ROI.

Conclusion

These emerging practices are not mutually exclusive. Rather, each seeks to challenge the status quo in order to democratize, operationalize, and scale multiple data processing and development efforts and make data available for the whole organization. Additionally, with a common goal to move the business and domain owners closer to the data, these trends continue to demonstrate the shrinking gap between knowledge and data management within the enterprise. 

It is also worth noting that there is no single software application, stand-alone system or platform that provides such an environment or transformation. Adopting the relevant solution(s) for your organization requires a blend of architecture, technology, and alignment across people, process, culture, and the base technologies. 

How have you been navigating these challenges? Are you looking for help in realizing business value from these data trends and practices or are seeking guidance on your own organization’s transformation? Reach out to us and learn more.

Lulit Tesfaye Lulit Tesfaye is a Practice Lead for Data and Information Management strategy, design, and implementation. Lulit is driven by innovation in efficiency, user-centered design, and empowering organizations to smartly leverage their data and information assets along with the best today’s technologies have to offer. Lulit focuses on employing advanced capabilities such as Artificial Intelligence (AI) to ultimately deliver value. More from Lulit Tesfaye »