In my previous blog, I wrote about building a taxonomy for two foundational use cases, findability and navigation, that we commonly design for at EK. This blog focuses on two advanced, yet still common, taxonomy use cases: auto-tagging and ontology/graph applications. Both have more complex requirements and more factors to consider during the design process, as I describe below.
Auto-Tagging
Developing a taxonomy for auto-tagging brings us to the more advanced applications of taxonomy. Auto-tagging is the process in which taxonomy terms are automatically applied to content as tags through text recognition, inheritance, or other automated means. This process is important because, if implemented and iterated upon correctly, it saves SMEs and content taggers the time they would otherwise spend manually applying tags. For example, let’s say that you would like to automatically apply your enterprise topical taxonomy to each piece of content in your Knowledge Base using an auto-tagging tool, such as a taxonomy management system. When you design a taxonomy for auto-tagging, your main consideration should be that the end user of the taxonomy is a machine rather than a human, which leads to very different design requirements.
Current auto-tagging tools and capabilities are often limited by the text found within the content item being tagged. Whereas manual tagging performs best at determining the “aboutness” of content because of the subject matter expertise of human taggers, auto-tagging tools are not able to read between the lines or identify true “aboutness” like a human can. Auto-tagging parses through the content it is given and uses text recognition and context to determine the subject of the text and then applies tags based on terms from the taxonomy. Since determining the “aboutness” of content is often the biggest challenge of an auto-tagging tool, it is essential that the taxonomy is designed to best support the machine in finding key topics. Thus, a taxonomist should design a topical taxonomy for this use case.
The topical taxonomy should represent what the content is about in a way that reflects the subjects directly used in the text. The taxonomy should match the granularity of the content and get into the details of what the content presents. With a taxonomy management system, a more granular child concept can trigger the broader tag of its parent concept, which is one reason a detailed taxonomy matters. If the auto-tagging tool determined that the content was about a “van,” for example, the parent concept, “automobile,” could also be correctly applied.
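To make the parent-child behavior concrete, here is a minimal sketch in Python. It is not how any particular taxonomy management system works; the mini-taxonomy, the `auto_tag` function, and the simple whole-word matching are all assumptions made for illustration.

```python
import re

# Hypothetical mini-taxonomy: each concept has a parent (or None) and synonyms.
TAXONOMY = {
    "automobile": {"parent": None, "synonyms": ["motor vehicle"]},
    "van": {"parent": "automobile", "synonyms": ["minivan"]},
    "sedan": {"parent": "automobile", "synonyms": ["saloon"]},
}


def auto_tag(text, taxonomy=TAXONOMY):
    """Return the concepts mentioned in the text, plus all of their ancestors."""
    lowered = text.lower()
    tags = set()
    for concept, info in taxonomy.items():
        for label in [concept] + info["synonyms"]:
            # Whole-word match that also accepts a simple plural "s".
            pattern = r"\b" + re.escape(label) + r"s?\b"
            if re.search(pattern, lowered):
                tags.add(concept)
                break
    # Inherit broader tags by walking up each matched concept's parent chain.
    for concept in list(tags):
        parent = taxonomy[concept]["parent"]
        while parent is not None:
            tags.add(parent)
            parent = taxonomy[parent]["parent"]
    return tags


print(sorted(auto_tag("The delivery van was parked outside.")))
# ['automobile', 'van']
```

Note that the text never mentions “automobile”; the broader tag is applied only because the child concept “van” was recognized.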
A significant consideration when designing a taxonomy for auto-tagging is that, in order for the auto-tagging tool to best succeed, the taxonomy terms need to be explicitly mentioned in the text. Due to the granularity and consistency needed, bottom-up taxonomy design (as referenced in this blog), or analyzing the content itself when developing the taxonomy, is the most important part of the design process. The topics of your content items should help form the basis of your taxonomy. Analyze the content, determining the topic of each piece and consulting SMEs to accurately represent the language. Additionally, conduct a corpus analysis. A corpus analysis is an examination of the words that are both most commonly used and of most significance in a set of content items, and can be conducted by most taxonomy management tools on the market. You should seek to include many of the terms surfaced by corpus analysis in your taxonomy through terms and synonyms, as they reflect what is in the content itself.
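If you want a feel for what a corpus analysis surfaces before turning to a tool, the idea can be roughly approximated in a few lines of Python. This sketch scores each term by how often it appears, weighted by a smoothed inverse document frequency. That weighting is one common stand-in for “significance,” not necessarily what your taxonomy management tool uses, and the stopword list and function names are assumptions.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def corpus_analysis(documents, top_n=5):
    """Rank terms by corpus frequency weighted by a smoothed IDF."""
    term_counts = Counter()  # total occurrences across the corpus
    doc_counts = Counter()   # number of documents containing the term
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        doc_counts.update(set(tokens))
    n_docs = len(documents)
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + doc_counts[term])) + 1)
        for term, count in term_counts.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [
    "The van needs new tires.",
    "A van and a sedan share tires.",
    "Sedan maintenance guide.",
]
for term, score in corpus_analysis(docs, top_n=3):
    print(term, round(score, 2))
```

The terms at the top of the ranking are candidates for taxonomy terms or synonyms, to be reviewed with your SMEs rather than added automatically.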
And, I’ll bring it up again for this use case, too: don’t underestimate the power of alternative labels, or synonyms! Synonyms are your friends when designing a taxonomy for auto-tagging. The more relevant and accurate synonyms you apply to taxonomy terms, the more likely the auto-tagging tool is to correctly parse the text and recognize what the content is about. Of course, you should be careful to use synonyms correctly. In particular, don’t assign the same label to multiple terms. Even if “vehicle” can oftentimes be synonymous with “car,” you should not include it as a synonym for “car” if “vehicle” is contained in the taxonomy as a broader term. This would be repetitive and ineffective, because the tool may not be able to tell which of the two concepts a mention of “vehicle” refers to.
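A quick check for this kind of synonym clash can be scripted. The sketch below assumes a simplified taxonomy export that maps each preferred label to its list of synonyms; the data shape and the `find_synonym_conflicts` name are hypothetical.

```python
from collections import defaultdict


def find_synonym_conflicts(taxonomy):
    """Return synonyms that clash with a preferred label or with each other."""
    preferred = {label.lower() for label in taxonomy}
    owners = defaultdict(set)  # synonym -> concepts that list it
    for concept, synonyms in taxonomy.items():
        for synonym in synonyms:
            owners[synonym.lower()].add(concept)

    conflicts = {}
    for synonym, concepts in owners.items():
        if synonym in preferred:
            conflicts[synonym] = "also a preferred label in the taxonomy"
        elif len(concepts) > 1:
            conflicts[synonym] = "assigned to: " + ", ".join(sorted(concepts))
    return conflicts


taxonomy = {
    "vehicle": [],
    "car": ["vehicle", "auto"],
    "truck": ["lorry", "auto"],
}
for synonym, reason in sorted(find_synonym_conflicts(taxonomy).items()):
    print(synonym, "->", reason)
# auto -> assigned to: car, truck
# vehicle -> also a preferred label in the taxonomy
```

Running a check like this whenever synonyms are added keeps each label pointing to a single concept.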
Ontology / Graph Applications
The next advanced use case for a taxonomy is an ontology or graph application, such as a recommendation engine that suggests training courses to education professionals based on their profiles, or a chatbot that allows employees of a consulting firm to request documents using natural language queries. For these use cases, modeling both information and relationships effectively is of the highest importance.
When developing your taxonomy for an ontology/graph use case, you should begin the analysis and design stages by thinking about the main questions that will be asked of the ontology/graph application. If the application is going to be a customer-facing chatbot, you should think through the questions that a customer will frequently ask, and seek to frame your taxonomy around the topics of these questions. In this way, a topical taxonomy is essential for an ontology/graph use case, as the topical taxonomy encompasses what the content is about, and the ontology will tie the topics together through the relationships that connect them to each other, and to other key business concepts, like “customer” or “product.”
Similar to the auto-tagging use case, you should conduct various types of analyses to best source terms for your taxonomy. Two of these include content and corpus analyses, which will allow your taxonomy to reflect the content and data stored in the knowledge graph, thus ensuring that the graph application will better understand the content. If your graph application relates to search, you should also conduct a keyword analysis to determine the most popular terms users search for. These popularly used terms should also be included in your topical taxonomy. In this way, your taxonomy will be equipped to support users’ search habits.
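As a rough illustration of a keyword analysis, the sketch below counts search queries and lists the popular ones that the taxonomy does not yet cover. It assumes you can export raw query strings from your search logs; the function name, the `min_count` threshold, and the sample data are made up for the example.

```python
from collections import Counter


def uncovered_search_terms(queries, taxonomy_labels, min_count=2):
    """List popular search queries that match no taxonomy label or synonym."""
    normalized = [q.strip().lower() for q in queries if q.strip()]
    counts = Counter(normalized)
    known = {label.lower() for label in taxonomy_labels}
    return [
        (term, n)
        for term, n in counts.most_common()
        if n >= min_count and term not in known
    ]


queries = ["Van", "van ", "pickup truck", "pickup truck", "pickup truck", "sedan"]
labels = {"van", "sedan", "automobile"}
print(uncovered_search_terms(queries, labels))
# [('pickup truck', 3)]
```

Here “pickup truck” is searched often but is missing from the taxonomy, which makes it a candidate term or synonym to review.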
Additionally, consider leveraging linked open data to bring relevant industry vocabulary and terms into your topical taxonomy. For your ontology or graph application to be most useful, your topical taxonomy should model not only your company’s information but also similar information relevant to your industry, so that your graph application can search and extend beyond your individual content items.
Keeping ontology design in mind as you design your taxonomy is another opportunity to advance your taxonomy, as you can leverage the higher level concepts in your taxonomy for ontology classes, and the more specific taxonomy terms and synonyms as attributes or relationships. For example, the higher level taxonomy concept of “automobile” may become an ontology class, and “vehicle type” may become an attribute to incorporate the types of vehicles described in the child concepts of the “automobile” concept. One of my colleagues writes further about moving from a taxonomy to an ontology in the blog, “From Taxonomy to Ontology.”
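The “automobile” example can be sketched in code to show the shape of this mapping. This is a simplified illustration, not an ontology modeling standard: the `OntologyClass` structure, the `taxonomy_to_classes` function, and the fallback attribute name are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    name: str
    # Attribute name -> allowed values drawn from the taxonomy's child concepts.
    attributes: dict = field(default_factory=dict)


def taxonomy_to_classes(taxonomy, attribute_names):
    """Turn each top-level concept into a class whose attribute lists its children."""
    classes = []
    for parent, children in taxonomy.items():
        # Fall back to "<concept> type" when no attribute name is supplied.
        attribute = attribute_names.get(parent, parent + " type")
        classes.append(OntologyClass(parent, {attribute: sorted(children)}))
    return classes


taxonomy = {"automobile": ["van", "sedan"]}
print(taxonomy_to_classes(taxonomy, {"automobile": "vehicle type"}))
```

In a real project, the classes, attributes, and relationships would be refined with SMEs and modeled in your ontology tool; the sketch only shows how the taxonomy’s hierarchy can seed that work.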
As is evident from the various design considerations for the above two advanced use cases for a taxonomy, and the two foundational use cases described in my last blog, determining your use case before starting to design a taxonomy is essential for the success of the project. As taxonomists, we must also consider and plan for the possibility that a taxonomy may have multiple use cases that may even overlap and contradict each other from a design perspective, ultimately affecting the overall complexity of the taxonomy design.
Do you need help determining the use cases for a successful taxonomy design? Let us help. Contact us to get in touch.