For a long time, keyword search was the predominant method to provide search to an enterprise application. In fact, it is still a tried-and-true means to help your users find what they are looking for within your content. However, semantic search has recently gained wider acceptance as a plausible alternative to keyword search. In this Expert Analysis blog, two of our senior consultants, Fernando Aguilar and Chris Marino, explain these different methods and provide guidance on when to choose one over the other.
What’s the difference between a keyword search system and a semantic search system?
Keyword Search (Chris Marino)
The heart of a keyword search system is a data structure called an “inverted index.” You can think of it as a two-column table. Each row in the table corresponds to a term found in your corpus of documents. One column contains the term, and the other column contains a list of all your documents (by ID) where that particular term appears. The process of filling up this table with the content in your documents is called “indexing.”
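To make the two-column table concrete, here is a minimal sketch of an inverted index in Python. The sample documents and the function name `build_inverted_index` are made up for illustration, and the tokenizer simply lowercases and splits on whitespace; production engines do considerably more analysis (stemming, stop words, punctuation handling).

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


# Hypothetical three-document corpus, keyed by document ID.
docs = {
    1: "vacation policy for employees",
    2: "travel policy and expenses",
    3: "employees holiday calendar",
}

index = build_inverted_index(docs)
print(index["policy"])     # [1, 2]
print(index["employees"])  # [1, 3]
```

Each key plays the role of the "term" column, and each value is the list of document IDs in the second column.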
When a user performs a search in a keyword system, the search engine takes the words from their query and looks for an exact match in the inverted index. Then, it returns the list of matching documents. However, instead of returning them in random order, it applies a ranking (or scoring) algorithm to ensure that the more relevant documents appear first. This ranking algorithm is normally based on two factors: “term frequency” (the number of times the terms appear in the document) and the rarity of each term across your entire corpus of documents (often called “inverse document frequency”). For example, if you search for “vacation policy” in your company’s documents, “vacation” most likely appears in fewer documents than “policy,” so a match on “vacation” should contribute more to a document’s score.
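The two ranking factors can be sketched with a basic TF-IDF score. This is a simplified illustration under stated assumptions, not the formula any particular engine uses (Lucene-based engines, for example, default to the related BM25 function). The corpus and the function name `tf_idf_scores` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Rank documents by the sum of term frequency * inverse document frequency."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents in the corpus that contain this term.
            doc_freq = sum(1 for other in tokenized.values() if term in other)
            if doc_freq == 0:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(n_docs / doc_freq)
            score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical corpus: "policy" is common, "vacation" is rare.
docs = {
    1: "vacation policy explains vacation accrual",
    2: "expense policy for travel",
    3: "security policy for laptops",
    4: "office holiday calendar",
}

ranked = tf_idf_scores("vacation policy", docs)
print(ranked[0][0])  # 1
```

Document 1 ranks first because it contains the rarer term “vacation” (twice), while documents 2 and 3 only match the common term “policy” and document 4 matches nothing.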
Semantic Search (Fernando Aguilar)
Semantic search, also known as vector search, is a type of search method that goes beyond traditional keyword-based search and attempts to understand the intent and meaning behind the user’s query. It uses natural language processing (NLP) and machine learning algorithms to analyze the context and relationships between words and concepts in a query, and to identify the most relevant results based on their semantic meaning. This approach is often used in applications such as chatbots, virtual assistants, and enterprise search to provide more accurate and personalized results to users.
In contrast to keyword search, which relies on matching specific keywords or phrases in documents or databases, semantic search is able to understand the underlying meaning of the query and identify related concepts, synonyms, and even ambiguous terms. This enables it to provide more comprehensive and relevant results, especially in cases where the user’s intent may not be well-defined or where multiple meanings are possible.
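As a rough sketch of the mechanics: vector search represents the query and each document as a list of numbers (an embedding) and ranks documents by how close their vectors are to the query vector, commonly with cosine similarity. The three-dimensional vectors below are hand-written assumptions purely for illustration; in a real system they would come from a language model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (not produced by any real model).
doc_vectors = {
    "Vacation policy": [0.9, 0.1, 0.0],
    "Expense reimbursement policy": [0.1, 0.9, 0.1],
    "Office security guidelines": [0.1, 0.2, 0.9],
}

# Stands in for the embedding of the query "How much time off do I get?"
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    doc_vectors,
    key=lambda title: cosine_similarity(query_vector, doc_vectors[title]),
    reverse=True,
)
print(ranked[0])  # Vacation policy
```

Note that the query shares no words with the title “Vacation policy”; the match comes entirely from the vectors being close, which is the behavior keyword search cannot reproduce.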
What are the Pros and Cons of using Keyword Search vs Semantic Search?
Keyword Search (Chris Marino)
Keyword search is a workhorse application that has been around for decades. This fact makes it a natural choice for many search solutions. It tends to be easier to implement because it’s a more familiar application. It’s been battle-tested, and there is a wealth of developers out there who know how to integrate it. As with many legacy systems, there are many thought pieces, ample documentation, pre-built components, and sample applications available via a Google search (or just ask ChatGPT).
Another benefit of keyword search is its interpretability – the ability for a user to understand why a certain result matched the query. You can easily see the terms you have searched for in your results. Although there is an algorithm performing the scoring and ranking, a search developer can quickly discern why a certain result appeared before another and make tweaks to influence the algorithm. Conversely, the logic behind semantic search results is more of a “black box.” It’s not always readily apparent why a particular result was returned. This has a significant impact on overall user experience; when users understand why they’re getting a search result, they trust the system and feel more positively towards it.
The biggest drawback of keyword search is that it lacks the ability to determine the proper context of your searches. Instead of seeing your search terms as concepts or things, it sees them simply as strings of characters. Take for instance the following query:
“What do eagles eat?”
Keyword search processes and searches for each term individually. It has no concept that you are asking a question or that “what” and “do” are unimportant. Further, there are many different concepts known as “Eagles”: the bird of prey, the ’70s rock group, the Philadelphia football team, and the Boston College sports teams. While a person can surmise that you’re interested in the bird, keyword search is simply looking for any mention of the letter string “e-a-g-l-e-s.”
Semantic Search (Fernando Aguilar)
Semantic search has gained popularity in recent years due to its ability to understand the intent and meaning behind the user’s query, resulting in more relevant and personalized results. However, not all use cases benefit from it. Understanding the advantages, limitations, and the trade-offs between semantic and keyword search can help you choose the best approach for your organization’s specific needs.
Pros:
- Semantic search makes search results more comprehensive and inclusive by identifying and matching term synonyms and variations.
- Vector search provides more relevant results by considering a query’s context, allowing it to differentiate between “Paris,” the location, and “Paris,” a person’s name. It also captures how the terms in a query relate to one another, for example whether a given term is acting as a verb, adjective, adverb, or noun.
- It enables the user to express their intent more accurately by allowing them to make queries using natural language phrases, synonyms, or variations of terms and misspellings, leading to a more user-friendly search experience.
Cons:
- Calculating similarity metrics to retrieve search results is computationally intensive. Optimization algorithms, typically approximate ones, are generally needed to speed up the process, and the faster search times they deliver come at the cost of some accuracy.
- The search results can be less relevant if users are accustomed to searching using one- or two-term queries instead of using search phrases. Therefore, it is essential to analyze current search patterns before implementing vector search.
- Pre-trained language models need to be fine-tuned to learn and understand the relationships between words in the context of your business domain. Fine-tuning a language model will improve the accuracy of the search results, but training is usually time-consuming and resource intensive.
How do the use cases for each type of search differ?
Keyword Search (Chris Marino)
In general, any search use case is a good case for keyword search. It has been around for many years and, when configured correctly, can provide solid results at a reasonable cost. However, a few use cases are particularly well-suited for keyword search, notably academic and legal search, which is often performed by librarians. It’s been my experience that these types of searchers have very exact, complex queries. Characteristics of these queries might include:
- Exact phrase matching
- Multi-field searches (“show me documents with X in Field 1, Y in Field 2, Z in Field 3 …”)
- Heavy boolean searches (“show me this OR these AND those but NOT that”)
In these instances, the user needs to ensure and validate that each result matches their exact query. They are not looking for suggestions. Precision (“show me exactly what I asked for”) is more important than recall (“show me things I may be interested in but didn’t specifically request”).
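Boolean queries like these map naturally onto an inverted index, because each term’s document list can be treated as a set: OR is a union, AND is an intersection, and NOT is a difference. The sketch below assumes a tiny made-up legal corpus and hypothetical helper names (`build_index`, `postings`).

```python
def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical corpus of legal document summaries.
docs = {
    1: "patent infringement ruling appeal",
    2: "patent licensing agreement",
    3: "trademark infringement ruling",
    4: "copyright infringement appeal",
}
index = build_index(docs)


def postings(term):
    """Return the set of document IDs for a term (empty if the term is unseen)."""
    return index.get(term, set())


# (patent OR trademark) AND infringement NOT appeal
result = ((postings("patent") | postings("trademark")) & postings("infringement")) - postings("appeal")
print(sorted(result))  # [3]
```

Every returned document provably satisfies the query, which is exactly the kind of validation these searchers expect.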
Semantic Search (Fernando Aguilar)
The primary differentiator between use cases is how search users phrase their queries. Semantic search works best for users who submit search phrases that carry context, word relationships, and term variations, rather than a couple of exact terms. Hence, beyond the traditional search box, chatbots, virtual assistants, and customer service applications are great examples, because their users tend to ask questions conversationally.
What are the cool features found in keyword search vs semantic search?
Keyword Search (Chris Marino)
There are a number of features that keyword search provides to improve a searcher’s overall experience. Some of the main ones include facets, phrase-searching, and snippets.
Facets
Facets are filters that let you refine your search results to only view items that are of particular interest to you based on a common characteristic. Think of the left-hand side of an Amazon search results page. They are based on the metadata associated with your documents, so the richer your metadata, the better options you can provide to your users. In an enterprise setting, common facets are geography-based ones (State, Country), enterprise-based ones (Department, Business Unit), and time-based ones (Published Date, Modified Date), whose values can even be relative, like “Today”, “Last 7 days”, or “This Year”.
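Under the hood, a facet boils down to counting metadata values across the current result set and then filtering on the value the user picks. Here is a minimal sketch; the result records, field names, and the helpers `facet_counts` and `apply_facet` are all hypothetical.

```python
from collections import Counter

# Hypothetical search results with metadata fields.
results = [
    {"title": "Vacation policy", "department": "HR", "country": "US"},
    {"title": "Travel policy", "department": "Finance", "country": "US"},
    {"title": "Parental leave policy", "department": "HR", "country": "Canada"},
]


def facet_counts(results, field):
    """Count how many results carry each value of a metadata field."""
    return Counter(doc[field] for doc in results if field in doc)


def apply_facet(results, field, value):
    """Keep only the results whose metadata field equals the selected value."""
    return [doc for doc in results if doc.get(field) == value]


department_facet = facet_counts(results, "department")
print(department_facet["HR"], department_facet["Finance"])  # 2 1

hr_only = apply_facet(results, "department", "HR")
print([doc["title"] for doc in hr_only])  # ['Vacation policy', 'Parental leave policy']
```

The counts are what a results page would display next to each facet value, and `apply_facet` mimics what happens when the user clicks one.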
Phrase searching
Phrase searching allows you to find exact phrases in your documents, normally by including the phrase within quotation marks (“”). A search for “tuition reimbursement” will only return documents that match this exact phrase, and not documents that only mention “tuition” or “reimbursement” independent from one another.
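One common way to support phrase queries is to store word positions in the index, so the engine can check that the terms appear next to each other and in order. The following is a simplified sketch of that idea with made-up documents and hypothetical function names; it is not how any specific engine implements it.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map term -> document ID -> list of word positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_search(phrase, index):
    """Return IDs of documents that contain the terms consecutively, in order."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    matches = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Each following term must sit exactly one position further along.
            if all(
                start + offset in index[term].get(doc_id, [])
                for offset, term in enumerate(terms)
            ):
                matches.append(doc_id)
                break
    return sorted(matches)


# Hypothetical documents: all three mention both words, only one has the phrase.
docs = {
    1: "tuition reimbursement is available to all staff",
    2: "reimbursement of travel costs and tuition fees",
    3: "submit the tuition form before requesting reimbursement",
}
positional_index = build_positional_index(docs)
print(phrase_search("tuition reimbursement", positional_index))  # [1]
```

Documents 2 and 3 contain both words but not side by side, so only document 1 is returned.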
Snippets
Snippets are small sections from your document which include your search terms and are displayed on the search results page. They show the search terms in the context of the overall document, e.g., the main sentence that contains the terms. This helps by providing a visual cue to help the searcher understand why this particular document appears. Normally, the search results page displays the title of the document, but often your search term does not appear in the title. By displaying the snippet, with the search term highlighted, the user feels validated that the search has returned relevant information. We refer to this as “information scent.”
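A snippet generator can be sketched in a few lines: find the first occurrence of a query term, cut a window of text around it, and mark the matching terms. The function name `make_snippet`, the `width` parameter, and the use of `**` as the highlight marker are assumptions made for this example; real engines offer more sophisticated highlighters.

```python
import re


def make_snippet(text, query, width=40):
    """Return a window of text around the first query-term match, with matches marked."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text[: 2 * width]
    # Match any query term as a whole word, ignoring case.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No term found: fall back to the start of the document.
        return text[: 2 * width]
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    window = text[start:end]
    highlighted = pattern.sub(lambda m: "**" + m.group(0) + "**", window)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + highlighted + suffix


# Hypothetical document text.
text = "Employees may apply for tuition reimbursement after one year of service."
snippet = make_snippet(text, "tuition", width=20)
print(snippet)
```

The highlighted term inside the surrounding sentence is what gives the searcher the visual cue described above.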
Semantic Search (Fernando Aguilar)
Currently, semantic search is one of the most promising techniques for improving search and organizing information. While semantic methods have already proven effective in a variety of fields, such as computer vision and natural language processing, there are several cool features that make semantic search an exciting area to watch for enterprise search. Some examples include:
- Concurrent Multilingual Capabilities: Vector search can leverage multilingual language models to retrieve content regardless of the language of the content or the query itself.
- Text-to-Multimodal Search: Natural language queries can retrieve un-tagged video, image, or audio content, depending on the model used to create the vectors.
- Content Similarity Search: Semantic search can also take content as query input, so applications can retrieve similar content to the one the user is currently viewing.
Conclusion
If perfecting the relevancy of your search results isn’t directly tied to your organization’s revenue or mission achievement, keyword search provides an efficient, proven, and effective method for implementing search in your application. On the other hand, semantic search will be a better solution when your users describe what they are looking for in natural language, when the content to be retrieved is not all text-based, or when an API (not a person) is consuming your search.
Check out some of our other thought leadership pieces on search:
5 Steps to Enhance Search with a Knowledge Graph
Dashboards – The Changing Face of Search
And if you are embarking on your own search project and need proven expertise to help guide you to success, contact us!