How to Optimize Search Relevance by Tweaking Your Search Approach

In my previous blog post, we looked at how important it is to shape the data in your search index to your needs. In this post, we will discuss the querying side of search and examine the part of search that matters most to users: the relevance of search results.

Field-Centric vs. Term-Centric Search

There are two popular approaches used in both Elasticsearch and Solr for retrieving documents based on the user’s query terms. These approaches are called field-centric search and term-centric search, and each approach tries to solve common issues in a search system.

Field-centric search focuses on bringing back results in which the search terms match in the most fields. For example, if you have a recipe search application and expect your users to search by recipe name, you could configure search to favor results where the recipe name appears in as many of the title, description, recipe, and ingredients fields as possible.

Term-centric search focuses on bringing back the closest matches to the search terms, regardless of which field each term is found in. Using the same example of a recipe search application, a term-centric search could look for matches to the search terms in the title, description, ingredients, and recipe fields and return results ranked by how precisely the query terms, or parts of them, are matched. Let us look at each of these approaches in more detail.

Field-centric Approach

A field-centric approach to search takes the entire query a user enters and runs it against each specified field in a document, prioritizing documents that contain the term(s) in the most fields and pulling them to the top of the results. Field-centric search is an easy place to start cultivating more relevant results. It is especially powerful for queries with terms that are rare in a field, because those terms receive a higher weight for their rarity within the search index. The downside is that field-centric approaches can effectively ignore some of the terms users are searching for: a document that matches one term in many fields can outrank a document that matches every term. In the example below, the terms “eggplant” and “parmesan” are passed to several fields in a most_fields type of search. The most_fields search type is field-centric and prioritizes documents that have terms in most of the specified fields.

Elasticsearch query

/recipes/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "eggplant parmesan",
      "type": "most_fields",
      "fields": ["title", "description", "ingredients", "recipe"]
    }
  }
}

Explained query

((recipe:eggplant recipe:parmesan) | (description:eggplant description:parmesan) | (ingredients:eggplant ingredients:parmesan) | (title:eggplant title:parmesan))

Results

Score   Title      Description   Ingredients   Recipe
1.45    eggplant   eggplant      eggplant      eggplant
0.87    eggplant   parmesan
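To make the field-centric idea concrete, here is a minimal Python sketch. It is a toy model, not Elasticsearch's actual scoring: it only counts matches, whereas real scores also weigh term rarity and field length. The document names and the helper functions are hypothetical and simply mirror the two result rows above.

```python
# Toy model only: counts matches instead of using Elasticsearch's real relevance scoring.
FIELDS = ["title", "description", "ingredients", "recipe"]

# Hypothetical documents mirroring the two result rows above.
DOCS = {
    "all_eggplant": {"title": "eggplant", "description": "eggplant",
                     "ingredients": "eggplant", "recipe": "eggplant"},
    "eggplant_parmesan": {"title": "eggplant", "description": "parmesan"},
}


def field_matches(doc, field, terms):
    """Count how many distinct query terms appear in one field."""
    field_tokens = set(doc.get(field, "").lower().split())
    return len(field_tokens & set(terms))


def most_fields_score(doc, query, fields=FIELDS):
    """Field-centric: run the whole query against each field, then add up the per-field scores."""
    terms = query.lower().split()
    return sum(field_matches(doc, field, terms) for field in fields)


scores = {name: most_fields_score(doc, "eggplant parmesan") for name, doc in DOCS.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'all_eggplant': 4, 'eggplant_parmesan': 2}
print(ranked)  # ['all_eggplant', 'eggplant_parmesan']
```

Even in this simplified form, the document that repeats “eggplant” in every field outranks the one that actually contains both query terms, which is the downside described above.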

Term-centric Approach

Term-centric search takes each term, searches for it across fields, and ranks documents that match the most query terms highest, drawing documents that have terms spread across various fields closer to the top. This is powerful because each term has some influence on the score of each result, no matter which field it was found in. Documents that do not have all of the search terms are pushed behind documents that do. The term-centric approach, however, can lose the nuance of field signals, because it treats each field the same. This issue is obvious when a term-centric search includes a bigram field, where a pair of consecutive terms is analyzed as one token. In a bigrammed field, “eggplant parmesan” is indexed as one term instead of the separate terms “eggplant” and “parmesan”. If all the other fields in a term-centric search tokenize terms on whitespace, the search would not be able to blend the bigrammed field with them because it does not share a common query analyzer. In the example below, we can see a term-centric search using the cross_fields type. Documents that match more of the query terms rank higher than documents that match a single term in multiple fields.

Elasticsearch query

/recipes/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "eggplant parmesan",
      "type": "cross_fields",
      "fields": ["title", "description", "ingredients", "recipe"]
    }
  }
}

Explained query

((recipe:eggplant recipe:parmesan) | (description:eggplant description:parmesan) | (ingredients:eggplant ingredients:parmesan) | (title:eggplant title:parmesan))

Results

Score   Title      Description   Ingredients   Recipe
1.38    eggplant   parmesan
0.69    eggplant   eggplant      eggplant      eggplant
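The same toy treatment can illustrate the term-centric idea. The Python sketch below is a simplification under stated assumptions: it treats all fields as one combined field and gives each query term one point if it appears anywhere, ignoring the rarity weighting and analyzer grouping that Elasticsearch applies. The document names and helper functions are hypothetical.

```python
# Toy model only: counts matches instead of using Elasticsearch's real relevance scoring.
FIELDS = ["title", "description", "ingredients", "recipe"]

# Hypothetical documents mirroring the two result rows above.
DOCS = {
    "all_eggplant": {"title": "eggplant", "description": "eggplant",
                     "ingredients": "eggplant", "recipe": "eggplant"},
    "eggplant_parmesan": {"title": "eggplant", "description": "parmesan"},
}


def term_matches(doc, term, fields):
    """Return 1 if the term appears in any of the fields, else 0."""
    return int(any(term in doc.get(field, "").lower().split() for field in fields))


def cross_fields_score(doc, query, fields=FIELDS):
    """Term-centric: look for each term in any field, then add up the per-term scores."""
    terms = query.lower().split()
    return sum(term_matches(doc, term, fields) for term in terms)


scores = {name: cross_fields_score(doc, "eggplant parmesan") for name, doc in DOCS.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'all_eggplant': 1, 'eggplant_parmesan': 2}
print(ranked)  # ['eggplant_parmesan', 'all_eggplant']
```

Here the ranking flips: repeating “eggplant” in four fields earns no extra credit, so the document that covers both terms comes first.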

A Blended Approach

Both approaches can fail to express the signals that capture the general ways users expect search to work. This frustrates users, because what they view as simple, generic search terms do not surface the complex documents they are looking for. To optimize your search, tailor fields to align with your users’ intent while also treating all search terms fairly and prioritizing documents that contain most of the terms in a query. You can accomplish this through a combination of term-centric and field-centric search. This approach uses term-centric search as a base for searching across all documents, treating all queried terms as significant, then layers field-centric search on top, which is choosier and pushes stronger matches to the top. Below is an example of a blended term-centric and field-centric search. The first multi_match is a field-centric search and the second is a term-centric search.

Elasticsearch query

/recipes/_validate/query?explain
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "eggplant parmesan",
            "fields": ["title", "description"],
            "type": "most_fields"
          }
        },
        {
          "multi_match": {
            "query": "eggplant parmesan",
            "fields": ["title", "description", "recipe"],
            "type": "cross_fields"
          }
        }
      ]
    }
  }
}

Explained query

((description:eggplant description:parmesan) | (title:eggplant title:parmesan))~1.0 (blended(terms:[recipe:eggplant, description:eggplant, ingredients:eggplant, title:eggplant]) blended(terms:[recipe:parmesan, description:parmesan, ingredients:parmesan, title:parmesan]))

Results

Score   Title      Description   Ingredients   Recipe
2.26    eggplant   parmesan
1.56    eggplant   eggplant      eggplant      eggplant
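A rough way to see why the blend behaves this way is to add the two toy scores together. The Python sketch below is an illustration only, assuming simple match counting rather than Elasticsearch's real scoring. It reuses the field lists from the query above, and the document names and helper functions are hypothetical.

```python
# Toy model only: counts matches instead of using Elasticsearch's real relevance scoring.
DOCS = {
    "all_eggplant": {"title": "eggplant", "description": "eggplant",
                     "ingredients": "eggplant", "recipe": "eggplant"},
    "eggplant_parmesan": {"title": "eggplant", "description": "parmesan"},
}


def tokens(doc, field):
    """Whitespace tokens of one field, lowercased."""
    return set(doc.get(field, "").lower().split())


def most_fields_score(doc, terms, fields):
    """Field-centric part: sum of per-field match counts."""
    return sum(len(tokens(doc, field) & set(terms)) for field in fields)


def cross_fields_score(doc, terms, fields):
    """Term-centric part: number of query terms found anywhere in the fields."""
    combined = set().union(*(tokens(doc, field) for field in fields))
    return len(combined & set(terms))


def blended_score(doc, query):
    """Sum both parts, like two clauses in a bool should."""
    terms = query.lower().split()
    return (most_fields_score(doc, terms, ["title", "description"])
            + cross_fields_score(doc, terms, ["title", "description", "recipe"]))


scores = {name: blended_score(doc, "eggplant parmesan") for name, doc in DOCS.items()}
print(scores)  # {'all_eggplant': 3, 'eggplant_parmesan': 4}
```

The document with both terms still ranks first, but the single-term document keeps a meaningful score instead of dropping out of the results.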

This blended approach satisfies both generic, broad searches and specific, precise searches. It achieves the important objective of providing precise results while still giving the user at least some content. The great thing about this blended approach is that you can still tune how much influence the term-centric and field-centric searches each have on the overall results. This is done through various forms of boosting, which we will talk about in my next and final post in the series.
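As a rough sketch of what that tuning could look like, the Python below assembles the blended query as a dictionary and attaches a boost value to each multi_match clause. The function name blended_query and the boost values are illustrative assumptions, not recommended settings; the right weights depend on your own data and users.

```python
import json


def blended_query(text, field_centric_fields, term_centric_fields,
                  field_boost=1.0, term_boost=1.0):
    """Build a bool/should body that combines a most_fields and a cross_fields clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": field_centric_fields,
                            "type": "most_fields",
                            "boost": field_boost,  # weight of the field-centric clause
                        }
                    },
                    {
                        "multi_match": {
                            "query": text,
                            "fields": term_centric_fields,
                            "type": "cross_fields",
                            "boost": term_boost,  # weight of the term-centric clause
                        }
                    },
                ]
            }
        }
    }


# Hypothetical weighting: let the field-centric clause count twice as much.
body = blended_query(
    "eggplant parmesan",
    ["title", "description"],
    ["title", "description", "recipe"],
    field_boost=2.0,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of the same /recipes/_validate/query?explain call used above to see how the boosts change the explained query.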

Conclusion

We hope you take away some understanding of the different methods and approaches you can use to control search. As you begin to develop your search strategy, think about it in terms of your priorities. The most basic priority is returning some results, because that is better than returning zero results. As your search becomes more specialized towards your needs, it should provide more precise results. A blend of field-centric and term-centric search provides a best-of-both-worlds approach. If you would like to discuss how Enterprise Knowledge can help you articulate your search priorities and develop an approach aligned with your goals, contact us.

EK Team: A services firm that integrates Knowledge Management, Information Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs.