Top Ways to Get Your Content and Data Ready for AI

As artificial intelligence has quickly moved from science fiction to pervasive internet reality, and now to standard corporate solutions, we consistently get the question, “How do I ensure my organization’s content and data are ready for AI?” Pointing your organization’s new AI solutions at the “right” content and data is critical to AI success and adoption, and failing to do so can quickly derail your AI initiatives.

Though the world is enthralled with the myriad public AI solutions, many organizations struggle to make the leap to reliable AI internally. A recent MIT report, “The GenAI Divide,” reveals a concerning truth: despite significant investment, 95% of organizations are not seeing any benefits from their AI initiatives.

One of the core impediments to succeeding with AI within your own organization is poor-quality content and data. Without a proper foundation of high-quality content and data, any AI solution will be rife with ‘hallucinations’ and errors. This exposes organizations to unacceptable risks, as AI tools may deliver incorrect or outdated information, leading to dangerous and costly outcomes. It is also why tools that perform well in demos often fail to make the jump to production. Even the most advanced AI won’t deliver acceptable results if an organization has not prepared its content and data.

This blog outlines the top seven ways to ensure your content and data are AI-ready. With the right preparation and investment, your organization can successfully implement the latest AI technologies and deliver trustworthy, complete results.

1) Understand What You Mean by “Content” and/or “Data” (Knowledge Asset Definition)

While it seems obvious, the first step to ensuring your content and data are AI-ready is to clearly define what “content” and “data” mean within your organization. Many organizations use these terms interchangeably, while others use one as a parent term of the other. This inevitably leads to a great deal of confusion.

Leveraging the traditional definitions, we define content as unstructured information (ranging from files and documents to blocks of intranet text), and data as structured information (namely the rows and columns in databases and in other applications like Customer Relationship Management systems, People Management systems, and Product Information Management systems). If you are not applying your AI to both content and data, you are wasting its potential; end users need complete and comprehensive information. In fact, we encourage organizations to think even more broadly, going beyond content and data to consider all the organizational assets that AI can leverage.

We’ve coined the term knowledge assets to express this. Knowledge assets comprise all the information and expertise an organization can use to create value. This includes not only content and data, but also the expertise of employees, business processes, facilities, equipment, and products. This way of thinking quickly breaks down artificial silos within organizations, getting you to consider your assets collectively rather than by type. For the rest of this article, we’ll use the term knowledge assets in lieu of content and data to reinforce this point. Put simply, each of the steps below should be approached from an enterprise perspective of knowledge assets: rather than developing content governance and data governance separately, you should define a comprehensive approach to knowledge asset governance. This approach will not only help you achieve AI readiness, but will also help your organization remove silos and redundancies, maximizing enterprise efficiency and alignment.

2) Ensure Quality (Asset Cleanup)

We’ve found that most organizations are maintaining far more information than they should: approximately 60-80% of what they hold is old, outdated, duplicate, or near-duplicate, and in many cases they may not even be aware of what they still have. At the high end of that range, four out of five knowledge assets fall into one of those categories.

There are many costs to this over-retention before even considering AI, including the administrative burden of maintaining this excess (including the cost and environmental impact of unnecessary server storage), and the usability and findability cost to end users who must wade through obsolete knowledge assets.

The AI cost is even higher, for several reasons. First, AI typically “white labels” the knowledge assets it finds. A human who came across an old, outdated policy might recognize the old corporate branding or notice a date from several years ago, but when AI draws on the information within that knowledge asset and resurfaces it, the answer looks new and those contextual clues are lost.

Next, we have to consider the old adage of “garbage in, garbage out.” Incorrect knowledge assets fed to an AI tool will produce incorrect results, which surface as hallucinations. While prompt engineering can help mitigate these conflicts, and potentially even errors, the only surefire way to avoid this issue is to ensure the accuracy of the original knowledge assets, or at least the vast majority of them.

Many AI models also struggle with near-duplicate knowledge assets, unable to discern which version is trusted. Consider your organization’s version control issues, working documents, data modeled with different assumptions, and iterations of large deliverables and reports that are all currently stored. Knowledge assets may go through countless iterations, and most of the time, all of these versions are saved. When ingested by AI, multiple versions create potential confusion and conflict, especially when the versions didn’t simply build on each other but were edited to improve findings or recommendations. Each of these is an opportunity for AI to fail your organization.
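One common way to surface near-duplicates before ingestion is to compare documents by their overlapping word sequences (“shingles”) and flag pairs whose Jaccard similarity exceeds a threshold. The sketch below is a minimal, standard-library illustration of that general technique, not a method prescribed by this article; the document names, sample text, shingle size, and 0.6 threshold are all hypothetical and would need tuning against real content.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Return the set of lowercase word k-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    # Texts shorter than k words yield a single, shorter shingle so they can still be compared.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict, threshold: float = 0.6, k: int = 3) -> list:
    """Return (name_a, name_b, score) for every document pair at or above the threshold."""
    sigs = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sigs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical sample: two versions of a policy that differ by a single detail.
docs = {
    "travel_policy_v1": "Employees must book all travel through the corporate portal at least 14 days in advance.",
    "travel_policy_v2": "Employees must book all travel through the corporate portal at least 21 days in advance.",
    "expense_policy": "Submit expense reports within 30 days of the end of each trip.",
}

for a, b, score in near_duplicates(docs):
    print(f"{a} ~ {b}: {score:.3f}")  # travel_policy_v1 ~ travel_policy_v2: 0.625
```

A flagged pair is only a candidate: the score says the texts overlap heavily, not which version is authoritative, so someone still has to decide which one to keep.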

Finally, this is also the point at which to consider restructuring your assets for improved readability, by both humans and machines. For humans, this could include formatting to lower cognitive load and improve consistency. For both humans and AI, it could mean adding text and tags that describe images and other non-text elements. For AI specifically, the proximity and order of information in longer, more complex assets can reduce precision, so restructuring documents to be more linear, chronological, or topically aligned can help. This is not necessary for every type of asset, but it remains an important consideration, especially for longer, text-based assets.

3) Fill Gaps (Tacit Knowledge Capture)

The next step to ensure AI readiness is to identify your gaps. At this point, you should be looking at your AI use cases and considering the questions you want AI to answer. In many cases, your current repositories of knowledge assets will not have all of the information necessary to answer those questions completely, especially in a structured, machine-readable format. This presents a risk in itself, especially if the AI solution is unaware that it lacks the complete range of knowledge assets necessary and portrays incomplete or limited answers as definitive.

Filling gaps in knowledge assets is extremely difficult. The first step is to identify what is missing. To invoke another old adage, organizations have long worried that they “don’t know what they don’t know,” meaning they lack the organizational maturity to identify gaps in their own knowledge. This becomes a major challenge when proactively seeking to arm an AI solution with all the knowledge assets it needs to deliver complete and accurate answers. The good news is that the process of getting knowledge assets AI-ready helps to identify gaps. The next two sections cover semantic design and tagging; these steps, among others, can reveal where knowledge assets appear to be missing. In addition, given the iterative nature of designing and deploying AI solutions, the inability of AI to answer a question can trigger gap filling, as we cover later.
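Once assets are tagged against a taxonomy, one simple way to spot candidate gaps is a coverage count: the concepts your AI use cases depend on that few or no assets are tagged with. This is a hypothetical sketch; the concept IDs, asset IDs, and `min_assets` cutoff are illustrative assumptions.

```python
from collections import Counter


def coverage_gaps(required_concepts, asset_tags, min_assets=1):
    """Return required concepts tagged on fewer than min_assets assets, sorted by ID.

    required_concepts: concept IDs your AI use cases need to answer questions about.
    asset_tags: mapping of asset ID -> set of concept IDs applied to that asset.
    """
    counts = Counter(tag for tags in asset_tags.values() for tag in tags)
    return sorted(c for c in required_concepts if counts[c] < min_assets)


# Hypothetical tagging results for a small HR knowledge base.
required = ["pto", "parental_leave", "travel", "expenses"]
asset_tags = {
    "doc-1": {"pto"},
    "doc-2": {"travel", "expenses"},
    "doc-3": {"travel"},
}

print(coverage_gaps(required, asset_tags))                # ['parental_leave']
print(coverage_gaps(required, asset_tags, min_assets=2))  # ['expenses', 'parental_leave', 'pto']
```

A zero count is a signal rather than proof: the knowledge may exist but be untagged or sitting in a “hidden” location, which is exactly what the follow-up investigation should establish.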

Of course, once you’ve identified the gaps, the real challenge begins: the organization must generate new knowledge assets (or locate “hidden” ones) to fill them. There are many techniques for this, ranging from tacit knowledge capture to content inventories, all of which collectively can help an organization move from AI to Knowledge Intelligence (KI).

4) Add Structure and Context (Semantic Components)

Once the knowledge assets have been cleansed and gaps have been filled, the next step in the process is to structure them so that they can be related to each other correctly, with the appropriate context and meaning. This requires the use of semantic components, specifically, taxonomies and ontologies. Taxonomies deliver meaning and structure, helping AI to understand queries from users, relate knowledge assets based on the relationships between the words and phrases used within them, and leverage context to properly interpret synonyms and other “close” terms. Taxonomies can also house glossaries that further define words and phrases that AI can leverage in the generation of results.
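To make the role of synonyms concrete, here is a minimal sketch of a taxonomy in which each concept has a preferred label, alternative labels, and an optional broader concept, loosely mirroring the SKOS `prefLabel`, `altLabel`, and `broader` properties. The concepts and labels are hypothetical. The code maps the words in a user’s query to concepts and expands a concept to its narrower terms, which is one way a retrieval layer could match a question about “vacation” to assets tagged “Paid Time Off.”

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    pref_label: str
    alt_labels: set = field(default_factory=set)
    broader: Optional[str] = None  # ID of the parent concept, if any


# Hypothetical HR taxonomy fragment.
TAXONOMY = {
    "hr": Concept("Human Resources", {"HR", "People Management"}),
    "pto": Concept("Paid Time Off", {"PTO", "vacation", "annual leave"}, broader="hr"),
    "benefits": Concept("Employee Benefits", {"perks"}, broader="hr"),
}

# Case-insensitive index from every label (preferred or alternative) to its concept ID.
LABEL_INDEX = {
    label.lower(): cid
    for cid, concept in TAXONOMY.items()
    for label in {concept.pref_label, *concept.alt_labels}
}


def concepts_in(query: str) -> set:
    """Return IDs of concepts whose labels appear as whole words or phrases in the query."""
    text = query.lower()
    return {
        cid
        for label, cid in LABEL_INDEX.items()
        if re.search(rf"\b{re.escape(label)}\b", text)
    }


def expand(concept_id: str) -> set:
    """Return the concept plus all of its narrower descendants."""
    result, frontier = {concept_id}, [concept_id]
    while frontier:
        current = frontier.pop()
        for cid, concept in TAXONOMY.items():
            if concept.broader == current and cid not in result:
                result.add(cid)
                frontier.append(cid)
    return result


print(concepts_in("How much vacation do I get?"))  # {'pto'}
print(sorted(expand("hr")))                        # ['benefits', 'hr', 'pto']
```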

Though often confused or conflated with taxonomies, ontologies deliver a much more advanced type of knowledge organization that is both distinct from and complementary to taxonomies. Ontologies focus on defining relationships between knowledge assets and the systems that house them, enabling AI to make inferences. For instance:

<Person> works at <Company>

<Zach Wahl> works at <Enterprise Knowledge>

<Company> is expert in <Topic>

<Enterprise Knowledge> is expert in <AI Readiness>

From this, a simple inference based on structured logic can be made, which is that the person who works at the company is an expert in the topic: Zach Wahl is an expert in AI Readiness. More detailed ontologies can quickly fuel more complex inferences, allowing an organization’s AI solutions to connect disparate knowledge assets within an organization. In this way, ontologies enable AI solutions to traverse knowledge assets, more accurately make “assumptions,” and deliver more complete and cohesive answers. 
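The inference in this example can be sketched as a tiny forward-chaining rule engine over subject-predicate-object triples. This is an illustration only: the predicate names and the single rule are hypothetical, and in practice such rules would more typically live in a knowledge graph platform (for example, as SPARQL `CONSTRUCT` queries or reasoner rules) than in application code.

```python
facts = {
    ("Zach Wahl", "works_at", "Enterprise Knowledge"),
    ("Enterprise Knowledge", "expert_in", "AI Readiness"),
}


def infer(facts: set) -> set:
    """Apply one rule repeatedly until no new triples appear (a fixpoint):

    IF ?person works_at ?company AND ?company expert_in ?topic
    THEN ?person expert_in ?topic
    """
    known = set(facts)
    while True:
        derived = {
            (person, "expert_in", topic)
            for (person, p1, company) in known if p1 == "works_at"
            for (org, p2, topic) in known if p2 == "expert_in" and org == company
        }
        if derived <= known:  # nothing new was derived, so stop
            return known
        known |= derived


for triple in sorted(infer(facts) - facts):
    print(triple)  # ('Zach Wahl', 'expert_in', 'AI Readiness')
```

Looping until nothing new is derived is what lets rules chain: if another rule added new `works_at` facts, the expertise rule would pick them up on the next pass.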

Collectively, you can think of these semantic components as an organizational map of what the organization does, who does it, and how. Semantic components show AI how to get where you want it to go without getting lost or taking wrong turns.

5) Semantic Model Application (Tagging)

Of course, it is not sufficient simply to design the semantic components; you must complete the process by applying them to your knowledge assets. If the semantic components are the map, applying them as metadata is the GPS that lets you use that map easily and intuitively. This step is commonly a stumbling block for organizations, and is again why we are discussing knowledge assets rather than discrete areas like content and data. To best achieve AI readiness, all of your knowledge assets, regardless of their state (structured, unstructured, semi-structured, etc.), must have consistent metadata applied to them.

When applied properly, this consistent metadata becomes an additional layer of meaning and context for AI to leverage in pursuit of complete and correct answers. With the latest updates to leading taxonomy and ontology management systems, automatically applying metadata, or storing relationships between knowledge assets in metadata graphs, has vastly improved, though it still requires a human in the loop to ensure accuracy. Even so, what used to be a major hurdle in metadata application initiatives is now much simpler.
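A minimal sketch of rule-based auto-tagging with a human in the loop follows. It flattens both an unstructured document and a structured record into text so the same taxonomy applies to each, auto-accepts tags matched on a concept’s preferred label, and routes matches on alternative labels to a reviewer. The taxonomy, asset IDs, and the accept-versus-review rule are hypothetical; taxonomy and ontology management systems use their own, typically more sophisticated, classification methods.

```python
import re

# Hypothetical taxonomy fragment: concept ID -> (preferred label, alternative labels)
TAXONOMY = {
    "pto": ("Paid Time Off", {"PTO", "vacation"}),
    "travel": ("Business Travel", {"trip", "travel booking"}),
}


def to_text(asset) -> str:
    """Flatten an asset to text: documents are strings, records are dicts of fields."""
    if isinstance(asset, dict):
        return " ".join(str(value) for value in asset.values())
    return str(asset)


def mentions(text: str, label: str) -> bool:
    """True if the label appears in the text as a whole word or phrase (case-insensitive)."""
    return re.search(rf"\b{re.escape(label.lower())}\b", text.lower()) is not None


def tag_asset(asset) -> dict:
    """Return {'auto': concepts to apply directly, 'review': concepts needing a human check}."""
    text = to_text(asset)
    auto, review = set(), set()
    for cid, (pref, alts) in TAXONOMY.items():
        if mentions(text, pref):
            auto.add(cid)
        elif any(mentions(text, alt) for alt in alts):
            review.add(cid)
    return {"auto": auto, "review": review}


assets = {
    "doc-17": "Our Paid Time Off policy explains how vacation accrues.",          # unstructured
    "crm-row-42": {"account": "Acme", "note": "Asked about trip reimbursement"},  # structured
}

for asset_id, asset in assets.items():
    print(asset_id, tag_asset(asset))
```

Routing the lower-confidence matches to a reviewer, rather than discarding them, keeps the human in the loop focused on the cases most likely to be wrong.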

6) Address Access and Security (Unified Entitlements)

What happens when you finally deliver what your organization has been seeking, giving it the ability to serve end users the complete, connected knowledge assets they need? If you’ve skipped this step, the answer is calamity. One of the key value propositions of AI is that it can uncover hidden gems in knowledge assets, make connections humans typically can’t, and combine disparate sources to build new knowledge assets and new answers. This is incredibly exciting, but it also presents a massive organizational risk.

At present, many organizations have an incomplete or outright poor model for entitlements, that is, for ensuring the right people see the right assets and the wrong people do not. We consistently discover highly sensitive knowledge assets in various forms on organizational systems that should be secured but are not. Some of this takes the form of a discrete document or a row of data in an application, which is surprisingly common but relatively easy to address. Even more of it is only visible when you take an enterprise view of an organization.

For instance, Database A might contain anonymized health information about employees for insurance reporting purposes, keyed to unique employee identifiers. File B includes a table mapping those same identifiers to employee demographics. Application C houses the actual employee names and titles for the organizational chart, but also stores each employee’s unique identifier as a hidden field. The vast majority of humans would never find this connection, but AI is designed to do so and will unabashedly generate a massive lawsuit for your organization if you’re not careful.
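To show why this kind of exposure is easy for software to find, the sketch below joins three toy sources shaped like Database A, File B, and Application C on their shared identifier. All field names and records are invented for illustration. No single source pairs names with health data, yet a few lines of joining code do.

```python
# Toy stand-ins for the three systems described above (all values are invented).
database_a = [  # "anonymized" insurance reporting data, keyed by an opaque ID
    {"emp_uid": "u-001", "claim_category": "C-104"},
    {"emp_uid": "u-002", "claim_category": "C-220"},
    {"emp_uid": "u-003", "claim_category": "C-104"},
]
file_b = [  # the same IDs mapped to demographics
    {"emp_uid": "u-001", "age_band": "40-49", "office": "Denver"},
    {"emp_uid": "u-002", "age_band": "30-39", "office": "Austin"},
]
application_c = [  # org chart with names, plus the ID in a hidden field
    {"name": "Employee One", "title": "Analyst", "_emp_uid": "u-001"},
    {"name": "Employee Two", "title": "Director", "_emp_uid": "u-002"},
]


def link_on_uid(claims, demographics, org_chart):
    """Join the three sources on the shared ID, re-identifying the 'anonymized' rows."""
    demo_by_uid = {row["emp_uid"]: row for row in demographics}
    person_by_uid = {row["_emp_uid"]: row for row in org_chart}
    linked = []
    for row in claims:
        uid = row["emp_uid"]
        if uid in demo_by_uid and uid in person_by_uid:
            linked.append({
                "name": person_by_uid[uid]["name"],
                "title": person_by_uid[uid]["title"],
                "office": demo_by_uid[uid]["office"],
                "claim_category": row["claim_category"],
            })
    return linked


for record in link_on_uid(database_a, file_b, application_c):
    print(record)
```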

If you have security and entitlement issues with your existing systems (and trust me, you do), AI will inadvertently discover them, connect the dots, and surface knowledge assets and connections between them that could be truly calamitous for your organization. Any AI readiness effort must confront this challenge, before your AI solutions shine a light on your existing security and entitlements issues.
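One common mitigation is to check every retrieved asset against a single, unified entitlement map before anything reaches the AI model, and to deny by default any asset the map does not know about. The sketch below assumes hypothetical asset IDs and group names; a real implementation would draw entitlements from your identity and access management systems rather than a hard-coded dictionary.

```python
# Hypothetical unified entitlement map: asset ID (across all systems) -> groups allowed to see it.
ENTITLEMENTS = {
    "insurance-db/claims": {"benefits-admin"},
    "fileshare/demographics.xlsx": {"benefits-admin", "hr-analytics"},
    "orgchart-app/directory": {"all-staff"},
}


def filter_for_user(user_groups: set, retrieved_ids: list) -> list:
    """Keep only assets the user is entitled to, preserving retrieval order.

    Assets missing from the entitlement map are dropped (deny by default).
    """
    allowed = []
    for asset_id in retrieved_ids:
        groups = ENTITLEMENTS.get(asset_id)
        if groups and groups & user_groups:
            allowed.append(asset_id)
    return allowed


retrieved = [
    "insurance-db/claims",
    "fileshare/demographics.xlsx",
    "orgchart-app/directory",
    "legacy-share/unknown.docx",  # not in the map, so never shown
]
print(filter_for_user({"all-staff"}, retrieved))                    # ['orgchart-app/directory']
print(filter_for_user({"all-staff", "benefits-admin"}, retrieved))
```

Because the filter runs before the model sees anything, a general employee’s question can never draw on the claims data that the linkage scenario described above depends on.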

7) Maintain Quality While Iteratively Improving (Governance)

Steps one through six describe how to get your knowledge assets ready for AI, but the final step gets your organization ready for AI. Having made a massive investment both in getting your knowledge assets into the right state for AI and in the AI solution itself, the final step is to ensure the ongoing quality of both. Mature organizations will invest in a core team to ensure knowledge assets go from AI-ready to AI-mature, including:

  • Maintaining and enforcing the core tenets that keep knowledge assets up to date and ensure AI solutions look only at trusted assets;
  • Reacting to hallucinations and unanswerable questions to fill gaps in knowledge assets;
  • Tuning the semantic components to stay current with organizational changes.
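Two of these activities lend themselves to simple automation. The sketch below flags assets whose last review is older than a review interval, and keeps a backlog of questions the AI could not answer, ranked by how often they are asked. The 365-day interval, asset IDs, and questions are hypothetical, and how a question is judged “unanswered” (for example, no sufficiently relevant assets retrieved) depends on your AI solution.

```python
from collections import Counter
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # hypothetical review policy


def stale_assets(last_reviewed: dict, today: date) -> list:
    """Return IDs of assets not reviewed within REVIEW_INTERVAL, sorted."""
    return sorted(aid for aid, reviewed in last_reviewed.items()
                  if today - reviewed > REVIEW_INTERVAL)


class GapBacklog:
    """Counts unanswered questions so the most frequent gaps can be filled first."""

    def __init__(self):
        self._misses = Counter()

    def record_unanswered(self, question: str) -> None:
        # Light normalization (case, whitespace) so trivially different phrasings count together.
        self._misses[" ".join(question.lower().split())] += 1

    def top(self, n: int = 3) -> list:
        return self._misses.most_common(n)


print(stale_assets({"policy-a": date(2023, 1, 1), "policy-b": date(2024, 6, 1)},
                   today=date(2024, 12, 31)))  # ['policy-a']

backlog = GapBacklog()
backlog.record_unanswered("What is the parental leave policy?")
backlog.record_unanswered("what is the  parental leave policy?")
backlog.record_unanswered("Who approves travel exceptions?")
print(backlog.top(1))  # [('what is the parental leave policy?', 2)]
```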

The most mature organizations, those wishing to become AI-Powered organizations, will look first to their knowledge assets as the key building block for success. Those organizations will seek ROCK (Relevant, Organizationally Contextualized, Complete, and Knowledge-Centric) knowledge assets as the foundation for delivering Enterprise AI that can be truly transformative for the organization.

If you’re seeking help to ensure your knowledge assets are AI-ready, contact us at info@enterprise-knowledge.com.

Sara Mae O'Brien-Scott is the Practice Lead for Semantic Design and Modeling at EK. She specializes in Knowledge and Semantic Engineering, and possesses extensive experience in metadata, taxonomy, ontology design, and knowledge graph implementations. Her expertise enables organizations to better leverage their information and knowledge assets, enhancing decision-making and improving efficiency.
Zach Wahl is an expert in knowledge and information management strategy, content strategy, and taxonomy design. He is passionate about forming and supporting high-functioning teams and facilitating results-focused outcomes with his clients.