
How to Optimize Search Relevance: Boosting and Filtering

What if, when you search your company’s intranet for “service desk,” none of the first 10 results point to the page where you can submit an IT service ticket? Or what if you search for “reports” but only want results related to the marketing department? In both scenarios, you can use boosts and filters to prioritize your search results based on what your users are searching for, as these two techniques let you move a subset of documents up or down the relevancy ranking of a result set.

In my previous blog, we examined the ways in which the search ranking function can be adjusted through a field-centric versus term-centric approach, as well as how one can blend these approaches to return relevant and precise results to users. In this final blog of my search relevance optimization series, we are going to examine how to home in on your users’ intent by boosting and filtering documents, improving the results your search application returns.

First, let’s discuss how the priorities for what is boosted and filtered are set, and who sets them. Large organizations with many knowledge domains often have competing priorities for what information is considered highly relevant. A centralized KM strategy can help manage these priorities so that search relevancy follows business objectives in measurable ways. The costs of not having a centralized strategy can include lost user productivity, increased technical debt from constantly implementing one-off changes, and lost revenue when customers cannot find a specific product. Having search product owners who manage these business requirements and priorities can bring an organization’s needs together into a uniform strategy.


Boosting

Boosting your search gives you the power to mathematically prioritize specific documents as more relevant than others by evaluating how query terms are matched in fields. For example, if you had a recipe search engine, you might boost the title field so that documents matching query terms in the title are shown higher than, say, documents that only match terms in the comments. Boosting can require some manual tweaking: too large a boost can miss the mark of your user’s search, while too small a boost washes out the effect of having one at all.

At its core, boosting takes a document’s base score and either adds a boosting factor to it or multiplies it by that factor, based on how query terms match a field within the document. Additive boosting adds the factor to the base score; multiplicative boosting multiplies the base score by it. Boosting can happen at either index or query time, but for the purposes of this blog we will focus on query time, since query-time boosting lets you tweak the boost factor per query without the overhead of reindexing every time you change the value.
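As a hedged sketch of how the two flavors can be expressed at query time, Elasticsearch’s function_score query uses boost_mode to control whether a function’s weight is added to or multiplied into the base score. The field names, values, and query text below are hypothetical examples, written as a Python dict as you might build it for a client library:

```python
# Sketch: query-time additive vs. multiplicative boosting via Elasticsearch's
# function_score query. The "title"/"category" fields and their values are
# hypothetical examples, not from the listings in this article.

def boost_query(boost_mode):
    """Build a function_score query.

    boost_mode="sum" adds the weight to the base score (additive boosting);
    boost_mode="multiply" multiplies the base score by it (multiplicative).
    """
    weight = 3 if boost_mode == "sum" else 1.1
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": "chocolate cake"}},
                "functions": [
                    # Documents matching this filter get `weight` combined
                    # with their base score according to boost_mode.
                    {"filter": {"term": {"category": "dessert"}},
                     "weight": weight}
                ],
                "boost_mode": boost_mode,
            }
        }
    }

additive = boost_query("sum")             # final score = base score + 3
multiplicative = boost_query("multiply")  # final score = base score * 1.1
```

Because boost_mode lives in the query body, you can experiment with either approach per query without reindexing.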

Knowing when to choose between additive and multiplicative boosting can be tricky. Suppose we ran a search where the maximum score was 0.6642324, so every document returned scored below 1. We could promote a subset by adding a constant such as 3. Mathematically, though, that boosts those documents by roughly 450%: a document going from 0.6642324 to 3.6642324 gains 3 / 0.6642324 ≈ 4.5 times its original score. The problem is that the boost’s effect is relative to this particular query. If another query had a max score of 9.12245, the same additive boost of 3 would only be about a 33% increase. A multiplicative boost keeps the effect proportional across both queries: a factor of 1.1 yields the same 10% relative increase for the query with a max score of 0.6642324 and the query with a max score of 9.12245.

Additive boost (constant of 3):

Base Score    Additive Boost    Final Score
0.6642324     3                 3.6642324
9.12245       3                 12.12245

Multiplicative boost (factor of 1.1):

Base Score    Multiplicative Boost    Final Score
0.6642324     1.1                     0.73065564
9.12245       1.1                     10.034695
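The arithmetic above can be sketched in a few lines of Python, showing how an additive boost’s relative effect shrinks as base scores grow while a multiplicative boost stays proportional:

```python
# Additive vs. multiplicative boosting: how each changes a document's score
# relative to its base, using the two example base scores from the table.

def additive_boost(base_score, constant):
    return base_score + constant

def multiplicative_boost(base_score, factor):
    return base_score * factor

def relative_increase(before, after):
    """Percent increase of the boosted score over the base score."""
    return (after - before) / before * 100

low, high = 0.6642324, 9.12245

# Additive boost of 3: a huge lift for a low-scoring query,
# but a small one for a high-scoring query.
print(relative_increase(low, additive_boost(low, 3)))    # ≈ 451.6%
print(relative_increase(high, additive_boost(high, 3)))  # ≈ 32.9%

# Multiplicative boost of 1.1: a consistent ≈ 10% lift either way.
print(relative_increase(low, multiplicative_boost(low, 1.1)))
print(relative_increase(high, multiplicative_boost(high, 1.1)))
```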

The listing below is an example Elasticsearch query that boosts the title field by 25% (a per-field multiplier of 1.25; note that a boost below 1, such as 0.25, would instead demote the field).

{
  "query": {
    "multi_match": {
      "query":  "search relevance for beginners",
      "type":   "cross_fields",
      "fields": [ "title^1.25", "description", "full-text", "review" ]
    }
  }
}

Filtering 

Filtering restricts your search results to a subset of matching documents. This increases precision by removing documents that don’t pertain to a subquery. It is important to note, however, that adding filters is not necessarily easier or less complex than boosting, as you can have multiple filters applied to your base query.

Filtering is often implemented through faceted search. When you see the facet categories on the left side of Amazon.com (department, avg. customer review, etc.), those are actually filters on the back-end. Facets also tell the user more about what kinds of documents they can search on and how they are categorized. For example, searching “robe” on Amazon will bring back the categories of “Women’s Sleepwear” and “Men’s Novelty Sleep & Loungewear.” To see what this looks like in the form of an Elasticsearch query, reference the box below, in which we filter our main search for “reports” down to documents whose department value is marketing.

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "reports" }}
      ],
      "filter": [
        { "term": { "department": "marketing" }}
      ]
    }
  }
}
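The two techniques also compose. As a minimal sketch (built as a Python dict, with field names reused from the examples in this article), a bool query can filter to the marketing department while boosting title matches by 25% via a ^1.25 multiplier:

```python
# Sketch: combining a boost and a filter in one Elasticsearch bool query.
# Field names ("title", "description", "full-text", "department") follow the
# examples used earlier in this article.

query = {
    "query": {
        "bool": {
            "must": [
                {"multi_match": {
                    "query": "reports",
                    "type": "cross_fields",
                    # A per-field boost of 1.25 raises title matches 25%
                    # above the other fields.
                    "fields": ["title^1.25", "description", "full-text"],
                }}
            ],
            # Filter clauses select documents without affecting the
            # relevance score of the must clause.
            "filter": [
                {"term": {"department": "marketing"}}
            ],
        }
    }
}
```

Filter clauses run in filter context, so they only include or exclude documents; all scoring still comes from the boosted multi_match.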

Conclusion

A good search application is built by understanding what your users are looking for and ensuring the data reflects these intentions. In this article we learned how applying weights and filters to fields can reshape the priority of how documents are ranked. We explored how an additive boost’s effect is relative to each individual query, while a multiplicative boost stays proportional across queries. We also learned how to increase precision by filtering for a subset of documents. These relevancy techniques can help craft a result set that aligns with your users’ intent.

A search governance strategy is at the heart of every good search application. It helps ensure stale and outdated content is not boosted inappropriately and helps clarify the organization’s search priorities. As organizations mature in their search capability, they develop ways to boost and filter content through machine learning algorithms. We’ll explore these algorithms in a future series.

In this series on search relevance we explored how to shape fields to provide signals to your search application on what type of data you’re looking for. If you would like to discuss how Enterprise Knowledge can help you articulate your search priorities and develop an approach aligned with your goals, contact us.

 

Stephon Harris is a full-stack software developer with a passion for building modern web applications to provide engaging user experiences while also crafting maintainable solutions for clients. He blends diligence for both end-users and developers by seeing through the perspectives of different roles and supporting them with an organization's vision.