Expert Analysis: Keyword Search vs Semantic Search – Part Two

In Part 1 of this series from two of our senior consultants, Fernando Islas and Chris Marino, the focus was on the differences between Keyword Search and Semantic Search. In Part 2 of this Expert Analysis blog, Islas and Marino are back to focus on the different tools available for each method and to discuss what it takes to move them from the design phase to a full-blown production system. They also offer some advice on how to realize the benefits of both methods in the same solution.

Note: We recognize that, in the industry, the term "semantic search" is often applied to any search method that can infer the user's intent or context from a query beyond its keywords. In this blog, we will be specifically discussing vector search.

What are the different tools available for Keyword Search and Vector Search?

Keyword (Chris Marino)

There are many tools available to choose from when implementing keyword search, both open-source (or “free and open”) and proprietary. These distinct options offer the classic choice in software development – “build vs. buy.”  

Do you prefer the flexibility of building your search solution from the ground up while investing in developer time and resources? Or, do you prefer to buy a solution which you can start quickly and leverage many built-in features, though at the cost of a substantial subscription? 

The main benefit of choosing the “build” option is that you have absolute flexibility in how you design and build your search system. You control all aspects of the solution – ingesting content, structuring your search index, developing your Search UI – and can tailor this to your exact specifications. The downside of this approach is that you have to account for additional time and resources – you are literally building your search solution from the ground up.

There are many excellent options on the build side, including:

  • Elasticsearch – the most widely known solution, which comes in many different flavors tailored to your needs
  • Solr – an established open-source tool, though not quite as popular
  • OpenSearch – an open-source fork of Elasticsearch, originally launched by Amazon

On the buy side, there are a vast number of tools that can be purchased on a subscription basis to provide excellent search capability for your enterprise. These full-featured applications come equipped with the functionality you expect from an enterprise search solution:

  • Connectors to integrate with your internal systems for indexing content 
  • Standard UI for displaying search results 
  • Pre-built suite of search features including faceting, auto-completion, and suggestions

The drawbacks to these solutions are that the subscription prices are quite steep and you lack the flexibility to design these systems exactly as your organization needs. 

Some of the most popular tools include:

  • Squirro
  • Elastic Cloud (subscription-based Elasticsearch environment)
  • Sinequa
  • Mindbreeze
  • Glean
  • Algolia
  • Coveo
  • Lucidworks

A logical third category is Microsoft, which offers its own proprietary set of search tools. If your focus is solely on search within the Microsoft ecosystem, it makes sense to leverage those offerings. However, integrating external content into the Microsoft search experience can be very challenging.

Vector (Fernando Islas)

Unlike keyword search, which relies on sparse vectors (derived from the corpus vocabulary), vector search relies on dense vectors: encodings produced by a language model to capture the semantic meaning and context of the text. Both are still vectors, however, so existing keyword search engines have been adding vector search to their feature sets, and it is common to find tools and vendors that support both search types.
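To make the sparse/dense distinction concrete, here is a minimal Python sketch under toy assumptions: the sparse vector is a simple term-count map over the words in the text, and the dense vectors are made-up 4-dimensional placeholders standing in for real language-model embeddings (they are not actual model output). Both are scored with cosine similarity, which is part of why engines built for one kind of vector can add support for the other.

```python
import math
from collections import Counter
from typing import Dict, List


def sparse_vector(text: str) -> Dict[str, int]:
    """Keyword-style sparse vector: one dimension per vocabulary term, stored only if non-zero."""
    return dict(Counter(text.lower().split()))


def cosine_sparse(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity over term dictionaries; only shared terms contribute to the dot product."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(a: List[float], b: List[float]) -> float:
    """Cosine similarity over fixed-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Sparse: "car" and "automobile" are different dimensions, so these texts share no terms.
print(cosine_sparse(sparse_vector("car repair"), sparse_vector("automobile maintenance")))  # 0.0

# Dense: placeholder values (NOT real embeddings), chosen so that the two
# semantically similar phrases point in nearly the same direction.
car_repair = [0.81, 0.12, 0.55, 0.05]
auto_maintenance = [0.78, 0.15, 0.60, 0.02]
print(round(cosine_dense(car_repair, auto_maintenance), 3))
```

The keyword representation scores the pair at zero because no term overlaps, while a dense representation from a suitable model can still place the two phrases close together.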

Among the most popular tools in the vendor landscape are Elasticsearch, Vectorsearch, Weaviate, Pinecone, Milvus, and Faiss. Microsoft and AWS also have their own vector search offerings: Microsoft provides vector search within Azure Cognitive Search and vector search on embeddings in Azure Cosmos DB for MongoDB vCore, while AWS provides Amazon OpenSearch Service with vector search collections.

When selecting a vector search solution, factors such as pricing, security, user-friendliness, content type (e.g., text, images, media), scalability requirements, and ease of integration into existing infrastructure should be carefully evaluated. Additionally, assess the solution's efficiency, relevance ranking capabilities, and the availability of support and maintenance to ensure it aligns with your specific use case and long-term goals.

What are the steps from development to deployment for Keyword Search and Vector Search?

Keyword (Chris Marino)

Developing and deploying a search solution is straightforward and fits well within an iterative process. One of the benefits of this approach is that you follow a set of repeatable steps per source system. As you proceed with additional systems, you become more familiar with the process and more adept at executing it.

The steps include:

  • Analyzing the content in your source system to know what type of information it holds (structured, semi-structured, unstructured) and accounting for any security considerations like permissions and access controls.
  • Setting up your connectors, which control your indexing routines to access the data from the source system, aggregate it, and index it into your search engine.
  • Configuring your search engine, including mapping your fields correctly, storing the content in the most efficient manner for querying, and accounting for your security requirements. 
  • Developing your Search UI, which revolves around your search results pages: the look and feel of the results, the incorporation of action-oriented results, faceting, and other rich search features.
  • Testing for performance and relevancy to ensure that users are getting the results they expect from queries and that the system responds in a timely, intuitive manner.
  • Iterating by incorporating user feedback and making modifications in any of the previous steps to improve the overall experience.
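The analysis, connector, and configuration steps above ultimately produce an index that maps terms to documents and carries enough security metadata to trim results per user. Below is a minimal in-memory Python sketch of that idea, assuming a hypothetical `Document` record with `title`, `body`, and `allowed_groups` fields and simple "all terms must match" semantics. A production engine such as Elasticsearch, Solr, or OpenSearch adds text analyzers, relevance scoring, and far more efficient storage.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Very simple analyzer: lowercase and split on non-word characters."""
    return TOKEN.findall(text.lower())


@dataclass
class Document:
    doc_id: str
    title: str
    body: str
    allowed_groups: Set[str]  # hypothetical ACL field captured during content analysis


@dataclass
class KeywordIndex:
    postings: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    docs: Dict[str, Document] = field(default_factory=dict)

    def index(self, doc: Document) -> None:
        """Connector output lands here: store the record and add its terms to the inverted index."""
        self.docs[doc.doc_id] = doc
        for term in tokenize(doc.title + " " + doc.body):
            self.postings[term].add(doc.doc_id)

    def search(self, query: str, user_groups: Set[str]) -> List[str]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a document must contain every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Security trimming: keep only documents the user's groups may see.
        visible = [d for d in matches if self.docs[d].allowed_groups & user_groups]
        return sorted(visible)


idx = KeywordIndex()
idx.index(Document("doc-1", "Parental leave policy", "Eligibility and request steps.", {"all-staff"}))
idx.index(Document("doc-2", "Salary bands", "Compensation ranges by level.", {"hr"}))
print(idx.search("leave policy", user_groups={"all-staff"}))  # ['doc-1']
print(idx.search("salary", user_groups={"all-staff"}))  # []
```

Applying the permission check at query time keeps restricted documents out of the results even when they match the query, which is the behavior the security considerations in the first and third steps are meant to guarantee.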

As a general rule, we estimate that it takes 2 to 4 weeks per source system, depending on its complexity, which is driven by factors such as permissions and access controls, data volume, and how disparate the data is.

Vector (Fernando Islas)

Every search initiative should start with investigating the organization's current search landscape, including analytics on users' queries, the content expected to be retrieved, that content's relevance, and users' interactions with the search platform. This analysis will surface the content types, users, use cases, and teams or divisions that would benefit most from introducing vector search. With this in mind, a typical pipeline for implementing vector search would include the following:

  • Data Preparation and Processing: Begin by locating and identifying all the content that needs to be indexed in the search engine and ensure it’s properly cleaned, structured, and formatted for vectorization and indexing.
  • Large Language Model (LLM) Selection: Choose an LLM that aligns with your data type and use case, considering factors such as pre-trained embeddings, architecture, and scalability.
  • LLM Fine-Tuning (Optional but Recommended): Fine-tune the selected LLM if necessary, using domain-specific data to improve its performance in capturing semantic similarities within your content.
  • Content Vectorization and Indexing: Implement the vectorization process that converts your content into dense vectors. Then, create an index structure to store and retrieve these vectors efficiently, selecting suitable indexing and optimization algorithms.
  • Scalability and User Interaction: Ensure the deployed vector search system can scale to accommodate growing data volumes and user loads. Focus on user interaction, providing documentation, training, and mechanisms for user feedback to continually optimize and enhance the search experience.
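The vectorization and indexing step above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: `toy_embed` is a hypothetical stand-in that hashes character trigrams into a fixed-size vector, so it only captures surface overlap and not meaning; in a real pipeline you would replace it with calls to the selected (and possibly fine-tuned) embedding model. The index performs exact, brute-force cosine search; production systems typically switch to approximate nearest-neighbor algorithms (such as HNSW) once the corpus grows.

```python
import math
import zlib
from typing import Callable, List, Tuple

Vector = List[float]


def toy_embed(text: str, dims: int = 64) -> Vector:
    """HYPOTHETICAL placeholder for an embedding model: hashes character trigrams
    into a fixed-size vector. It reflects surface overlap only, NOT semantics."""
    vec = [0.0] * dims
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(padded[i:i + 3].encode()) % dims] += 1.0
    return vec


def normalize(v: Vector) -> Vector:
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


class BruteForceVectorIndex:
    """Stores unit-length vectors and answers queries by exact cosine similarity."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.ids: List[str] = []
        self.vectors: List[Vector] = []

    def add(self, doc_id: str, text: str) -> None:
        # Vectorization: text -> dense vector, normalized before storage.
        self.ids.append(doc_id)
        self.vectors.append(normalize(self.embed(text)))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        q = normalize(self.embed(query))
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, v)))
            for doc_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = BruteForceVectorIndex(toy_embed)
index.add("kb-1", "How to reset your VPN password")
index.add("kb-2", "Quarterly expense report template")
print(index.search("reset VPN password", k=1))
```

Because the embedding function is injected, swapping the placeholder for a real model does not change the indexing or search logic.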

Is there a way to leverage both Keyword and Vector Search in a solution?

Vector (Fernando Islas)

A hybrid search approach combines traditional keyword search with advanced vector search techniques to provide users with more accurate and relevant results. The user submits a single query that retrieves keyword-based and vector-based results independently. The results from the keyword and vector searches are then merged into a unified ranking, offering users a comprehensive view of content relevance based on both the presence of keywords and semantic similarity. While this hybrid approach provides superior search precision and context awareness, it requires ongoing tuning to optimize the balance between keyword and vector-based relevance, which may vary not only from use case to use case but often from query to query.
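The merging step is not tied to a single method. One widely used option is Reciprocal Rank Fusion (RRF), which scores each document by summing 1 / (k + rank) across the ranked lists it appears in, so it needs only rank positions rather than comparable raw scores. The minimal Python sketch below assumes RRF as the fusion technique and adds a hypothetical `keyword_weight` parameter to stand in for the keyword/vector balance that needs tuning; `k = 60` is the constant commonly used with RRF.

```python
from typing import Dict, List


def weighted_rrf(
    keyword_ranking: List[str],
    vector_ranking: List[str],
    keyword_weight: float = 0.5,
    k: int = 60,
) -> List[str]:
    """Fuse two ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    keyword_weight is a hypothetical tuning knob: 1.0 uses only keyword ranks,
    0.0 uses only vector ranks.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    scores: Dict[str, float] = {}
    for weight, ranking in ((keyword_weight, keyword_ranking), (1.0 - keyword_weight, vector_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of either list receive the largest contributions.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["a", "b", "c"]  # e.g. from a keyword (sparse) query
vector_hits = ["c", "a", "d"]   # e.g. from a vector (dense) query
print(weighted_rrf(keyword_hits, vector_hits))  # ['a', 'c', 'b', 'd']
```

Documents found by both methods ("a" and "c") rise to the top, and shifting `keyword_weight` is one way to express the per-use-case or per-query balance described above.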

Keyword (Chris Marino)

As Fernando explains, the concept of hybrid search has become very popular with the introduction of vector-based, semantic search. A search application benefits from the best of both worlds, melding the "tried and tested" nature of keyword search with the more modern semantic search experience. One key to a successful implementation is keeping in mind where each method excels.

Keyword search excels for:

  • Boolean searches
  • Exact phrase matching
  • Search lookup by metadata (specific attributes like ZIP Code, State, Business Unit, etc.)

Semantic Search excels for:

  • Searches requiring context
  • Searches that need to return relevant results even when the exact search terms do not appear in the content
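One hypothetical way to act on this guidance is a lightweight query router that sends queries containing Boolean operators, quoted phrases, or field filters to keyword search and sends everything else to a hybrid pipeline. The regular expressions and the `field:value` filter syntax below are illustrative assumptions, not a feature of any particular search engine.

```python
import re

# Illustrative patterns for queries where keyword search excels.
BOOLEAN_OPERATORS = re.compile(r"\b(AND|OR|NOT)\b")  # Boolean searches
QUOTED_PHRASE = re.compile(r'"[^"]+"')               # exact phrase matching
FIELD_FILTER = re.compile(r"\b\w+:\S+")              # metadata lookup, e.g. zip:20001 (hypothetical syntax)


def choose_strategy(query: str) -> str:
    """Return "keyword" for precise, structured queries and "hybrid" for natural-language ones."""
    if BOOLEAN_OPERATORS.search(query) or QUOTED_PHRASE.search(query) or FIELD_FILTER.search(query):
        return "keyword"
    # Context-dependent or loosely worded queries benefit from the semantic component.
    return "hybrid"


print(choose_strategy('"travel reimbursement form"'))    # keyword
print(choose_strategy("state:VA"))                       # keyword
print(choose_strategy("how do I request parental leave"))  # hybrid
```

A heuristic like this is only a starting point; query analytics gathered during the iteration steps would show whether the routing rules match how users actually search.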

Conclusion

In this Expert Analysis, we covered the differences between Keyword and Semantic Search from a tooling and production deployment perspective. Choosing the right technologies and tools for your search project can be challenging and requires a thoughtful, reasoned approach.

If you are embarking on a search project and need proven expertise to help guide you to success, contact us!

EK Team: A services firm that integrates Knowledge Management, Information Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs.