Elasticsearch: Use function_score and script_score to customize the score of search results

This article introduces the basics of function_score and script_score, and walks through some useful and effective use cases of function scoring.

 

Introduction

The concept of scoring is at the core of any search engine, Elasticsearch included. Scoring can be roughly defined as finding the data that meets a set of criteria and returning it in order of relevance. Relevance is usually computed with an algorithm similar to TF-IDF, which tries to find the documents whose text is most similar to the submitted query. TF-IDF and related algorithms such as BM25 work well, but sometimes other algorithms or scoring heuristics are needed to solve the relevance problem. This is where the script_score and function_score queries of Elasticsearch become very useful. This article introduces how to use them.

One domain where text similarity is not the most important factor is geographic search. If you are looking for a good coffee shop near a given point, ranking coffee shops by their textual similarity to the query is not very useful; ranking them by their distance from that point is.

Another example is a video sharing site, where search results may need to take the relative popularity of a video into account. If a pop star uploads a video with a given title and it receives millions of views, that video should probably rank higher than an unpopular video with similar textual relevance.

When Elasticsearch is used for full-text search, results are sorted by default in descending order of the _score field, which is computed with BM25. To order results by other fields, ascending or descending, we can pass the fields and directions we want in sort. When a simple combination of ascending and descending fields cannot meet our needs, we have to customize the ranking itself. Elasticsearch provides the function_score DSL for this: it customizes the score so that results can be sorted by the custom _score.

In practice, keep in mind that script_score and function_score are resource-intensive, so compute custom scores only for an already filtered set of documents.

Let's use an example to illustrate how to customize the score with script_score and function_score.

 

Prepare data

Let's first download our test data:

git clone https://github.com/liu-xiao-guo/best_games_json_data

Then we import this data into our Elasticsearch through Kibana:

During the import process, we select the Time field as year and specify the corresponding date format:

We specify our index name as best_games:

A sample document looks like this:

"_source": {
  "global_sales": 82.53,
  "year": 2006,
  "image_url": "https://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Wii_Sports_Europe.jpg/220px-Wii_Sports_Europe.jpg",
  "platform": "Wii",
  "@timestamp": "2006-01-01T00:00:00.000+08:00",
  "user_score": 8,
  "critic_score": 76,
  "name": "Wii Sports",
  "genre": "Sports",
  "publisher": "Nintendo",
  "developer": "Nintendo",
  "id": "wii-sports-wii-2006"
}

From the above we can see two very important fields in this document: critic_score and user_score. critic_score is the rating given by critics, on a 100-point scale, and user_score is the rating given by users, on a 10-point scale.

 

Normal query

First, let's look at what happens when we do not customize the score at all.

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "match": { "name": "Final Fantasy" }
  }
}

To keep the output short, the query above returns only the name, critic_score and user_score fields. It searches for games whose name field matches "Final Fantasy", and the result is:

"hits": [
  {
    "_index": "best_games", "_type": "_doc", "_id": "2qccJ28BCSSrjaXdSOnC", "_score": 8.138414,
    "_source": { "user_score": 9, "critic_score": 92, "name": "Final Fantasy VII" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6KccJ28BCSSrjaXdSOnC", "_score": 8.138414,
    "_source": { "user_score": 8, "critic_score": 92, "name": "Final Fantasy X" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6qccJ28BCSSrjaXdSOnC", "_score": 8.138414,
    "_source": { "user_score": 8, "critic_score": 90, "name": "Final Fantasy VIII" }
  },
  ...

From these results we can see that Final Fantasy VII is listed first, although Final Fantasy X and Final Fantasy VIII have exactly the same score of 8.138414. Text relevance alone cannot tell these titles apart.

 

script_score query

Suppose we operate the game site; we may then want a ranking of our own. For example, all of these results match Final Fantasy, but besides the text match we also want user_score and critic_score to count (using only one of them would also work). We want to compute the score like this:

final score = _score * (user_score * 10 + critic_score) / 2 / 100

That is, we multiply user_score by 10 to put it on a 100-point scale, add critic_score, then divide by 2 and by 100 to obtain a weighting coefficient. Multiplying this coefficient by the score obtained in the previous step gives the final score. After this transformation, the score reflects not only full-text relevance but also how users and critics rate the game.

So how do we use this?

Following Elastic's official documentation for script_score, we run the following search:

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "script_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "script": {
        "source": "_score * (doc['user_score'].value*10+doc['critic_score'].value)/2/100"
      }
    }
  }
}

In the above query, we can see that we have used the new formula:

"script": {
  "source": "_score * (doc['user_score'].value*10+doc['critic_score'].value)/2/100"
}

The result of the query is:

"hits": [
  {
    "_index": "best_games", "_type": "_doc", "_id": "2qccJ28BCSSrjaXdSOnC", "_score": 7.405957,
    "_source": { "user_score": 9, "critic_score": 92, "name": "Final Fantasy VII" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "K6ccJ28BCSSrjaXdSOrC", "_score": 7.0804205,
    "_source": { "user_score": 8, "critic_score": 94, "name": "Final Fantasy IX" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6KccJ28BCSSrjaXdSOnC", "_score": 6.9990363,
    "_source": { "user_score": 8, "critic_score": 92, "name": "Final Fantasy X" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6qccJ28BCSSrjaXdSOnC", "_score": 6.917652,
    "_source": { "user_score": 8, "critic_score": 90, "name": "Final Fantasy VIII" }
  },
  ...

We can see that the final _score values are completely different. Final Fantasy VII is still in first place, but second place has changed from Final Fantasy X to Final Fantasy IX.
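The arithmetic behind these scores can be checked by hand. The following is a minimal Python sketch, not part of Elasticsearch: the helper name custom_score is made up for illustration, and it assumes each of the four hits has the base relevance score 8.138414 seen in the normal query.

```python
# Base relevance score of these hits in the plain match query (assumed equal for all four).
BASE_SCORE = 8.138414


def custom_score(score, user_score, critic_score, multiplier=10):
    """Mirror of the script: _score * (user_score*multiplier + critic_score) / 2 / 100."""
    return score * (user_score * multiplier + critic_score) / 2 / 100


# (name, user_score, critic_score) taken from the search results above.
games = [
    ("Final Fantasy VII", 9, 92),
    ("Final Fantasy IX", 8, 94),
    ("Final Fantasy X", 8, 92),
    ("Final Fantasy VIII", 8, 90),
]

rescored = {name: custom_score(BASE_SCORE, u, c) for name, u, c in games}
ranking = sorted(rescored, key=rescored.get, reverse=True)

for name in ranking:
    print(f"{name}: {rescored[name]:.4f}")
```

For Final Fantasy VII the coefficient is (9*10 + 92)/2/100 = 0.91, and for Final Fantasy IX it is 0.87, which is why IX overtakes X (0.86) once the ratings are taken into account.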

For script operations, there are some predefined functions that we can call, and they can help us speed up our calculations.

We can refer to Elastic's official documentation to help us understand more deeply.

 

Function score query

function_score allows you to modify the score of the documents retrieved by a query. This is useful, for example, when a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

To use function_score, the user must define a query and one or more functions that calculate a new score for each document returned by the query.

function_score can be used with only one function, such as:

GET /_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": "5",
      "random_score": {},
      "boost_mode": "multiply"
    }
  }
}

Here the score of every document is multiplied by 5 and by a random_score, which returns a value between 0 and 1. Since match_all gives every document a score of 1, the final score is a value between 0 and 5:

"hits": [
  {
    "_index": "chicago_employees", "_type": "_doc", "_id": "Hrz0_W4BDM8YqwyDD06A", "_score": 4.9999876,
    "_source": { "Name": "ADKINS, WILLIAM J", "Job Titles": "SUPERVISING FIRE COMMUNICATIONS OPERATOR", "Department": "OEMC", "Full or Part-Time": "F", "Salary or Hourly": "Salary", "Annual Salary": 121472.04 }
  },
  {
    "_index": "kibana_sample_data_logs", "_type": "_doc", "_id": "eXNIHm8BjrINWI3xYF0J", "_score": 4.9999495,
    "_source": { "agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24", "bytes": 6630, "clientip": "77.5.51.49", "extension": "", "geo": { "srcdest": "CN:ID", ...

Although this score does not have much practical meaning, it allows us to see different documents every time we enter a web page, rather than a fixed result obtained strictly according to a fixed match.

We can also use function_score together with script_score:

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "script_score": {
        "script": "_score * (doc['user_score'].value*10+doc['critic_score'].value)/2/100"
      }
    }
  }
}

Then the displayed result is:

"hits": [
  {
    "_index": "best_games", "_type": "_doc", "_id": "2qccJ28BCSSrjaXdSOnC", "_score": 60.272747,
    "_source": { "user_score": 9, "critic_score": 92, "name": "Final Fantasy VII" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "K6ccJ28BCSSrjaXdSOrC", "_score": 57.623398,
    "_source": { "user_score": 8, "critic_score": 94, "name": "Final Fantasy IX" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6KccJ28BCSSrjaXdSOnC", "_score": 56.96106,
    "_source": { "user_score": 8, "critic_score": 92, "name": "Final Fantasy X" }
  },
  {
    "_index": "best_games", "_type": "_doc", "_id": "6qccJ28BCSSrjaXdSOnC", "_score": 56.29872,
    "_source": { "user_score": 8, "critic_score": 90, "name": "Final Fantasy VIII" }
  },
  ...

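The observant reader may have noticed that the scores differ from the previous script_score results, while the order of the results is the same. The reason is that function_score combines the result of the function with the query score, and the default boost_mode is multiply: for Final Fantasy VII the script returns 7.405957, which is then multiplied by the query score 8.138414 to give 60.272747.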

In the script above, the value 10 is hard-coded. If we later want to change it to 20 or some other value and look at the results again, the script source changes and has to be recompiled, which is inefficient. A better way is to write it as follows:

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "script_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "script": {
        "params": { "multiplier": 10 },
        "source": "_score * (doc['user_score'].value*params.multiplier+doc['critic_score'].value)/2/100"
      }
    }
  }
}

Compiled scripts are cached to speed up execution. If a script depends on values that may change, it is best to keep the same script source and pass those values in as parameters.
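On the client side, this advice amounts to keeping the script source constant and varying only params. The following Python sketch builds the request body shown above for different multipliers; the helper name build_query is hypothetical, and actually sending the body to Elasticsearch is left out.

```python
import json

# The script source never changes, so the compiled script can be reused from the cache.
SCRIPT_SOURCE = (
    "_score * (doc['user_score'].value*params.multiplier"
    "+doc['critic_score'].value)/2/100"
)


def build_query(multiplier):
    """Return a script_score request body; only params differs between calls."""
    return {
        "_source": ["name", "critic_score", "user_score"],
        "query": {
            "script_score": {
                "query": {"match": {"name": "Final Fantasy"}},
                "script": {
                    "params": {"multiplier": multiplier},
                    "source": SCRIPT_SOURCE,
                },
            }
        },
    }


body_10 = build_query(10)
body_20 = build_query(20)
print(json.dumps(body_20, indent=2))
```

Both bodies carry an identical source string, so switching the multiplier from 10 to 20 does not introduce a new script to compile.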

 

boost_mode

boost_mode is used to define how the newly calculated score is combined with the query score.

multiply: the query score and the function score are multiplied (default)
replace: only the function score is used; the query score is ignored
sum: the query score and the function score are added
avg: the average of the query score and the function score
max: the maximum of the query score and the function score
min: the minimum of the query score and the function score
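The effect of each mode can be illustrated with a few lines of Python. This is only a sketch of the combination step (the function name combine is made up, and boost and max_boost are ignored); it uses the query score 8.138414 and the script result 7.405957 obtained for Final Fantasy VII earlier.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the function score according to boost_mode."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


query_score = 8.138414     # BM25 score of Final Fantasy VII for the match query
function_score = 7.405957  # value returned by the script for the same document

for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine(query_score, function_score, mode))
```

With multiply, the result is about 60.27, the score reported by the function_score query above; with replace, it would be 7.405957, the same as the standalone script_score query.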

field_value_factor

The field_value_factor function allows you to use fields in the document to influence the score. Similar to using the script_score function, but it avoids the overhead of scripting. If used in a multi-valued field, only the first value of the field is used in the calculation.

For example, suppose you have a document indexed with a numeric likes field and want to influence the score of the document through this field, then an example of doing so is as follows:

GET /_search
{
  "query": {
    "function_score": {
      "field_value_factor": {
        "field": "likes",
        "factor": 1.2,
        "modifier": "sqrt",
        "missing": 1
      }
    }
  }
}

The function_score above computes the score with field_value_factor as follows:

sqrt(1.2 * doc['likes'].value)

The field_value_factor function has many options:

field: the field to be extracted from the document.
factor: an optional factor by which the field value is multiplied. The default is 1.
modifier: the modifier applied to the field value, one of none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt or reciprocal. The default is none.
missing: the value used if the document does not have this field. The modifier and factor are applied to it just as if it had been read from the document.
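To make these options concrete, here is a small Python sketch of the calculation, modifier(factor * value). It is an illustration rather than Elasticsearch code: the function name simply reuses the option name, and the modifiers are assumed to follow their documented meaning (log is the common logarithm, ln the natural logarithm, and the 1p/2p variants add 1 or 2 to the value first).

```python
import math

# Modifiers as described in the option list above.
MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": math.log,
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value), falling back to `missing` when value is None."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' value was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


# The likes example: a document without the field falls back to missing=1.
print(field_value_factor(None, factor=1.2, modifier="sqrt", missing=1))

# Final Fantasy VII: user_score 9, factor 1.2, combined with the query score by multiply.
print(8.138414 * field_value_factor(9, factor=1.2))
```

With the default boost_mode of multiply, a user_score of 9 and a factor of 1.2 turn the query score 8.138414 into roughly 87.89.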

For our example, we can also use the following method to recalculate the score:

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "field_value_factor": {
        "field": "user_score",
        "factor": 1.2,
        "modifier": "none",
        "missing": 1
      }
    }
  }
}

Here the value of user_score is multiplied by the factor 1.2, and the result is multiplied by the query score. The result is:

"hits" : [
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "2qccJ28BCSSrjaXdSOnC", "_score" : 87.89488,
    "_source" : { "user_score" : 9, "critic_score" : 92, "name" : "Final Fantasy VII" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "6KccJ28BCSSrjaXdSOnC", "_score" : 78.128784,
    "_source" : { "user_score" : 8, "critic_score" : 92, "name" : "Final Fantasy X" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "6qccJ28BCSSrjaXdSOnC", "_score" : 78.128784,
    "_source" : { "user_score" : 8, "critic_score" : 90, "name" : "Final Fantasy VIII" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "K6ccJ28BCSSrjaXdSOrC", "_score" : 78.128784,
    "_source" : { "user_score" : 8, "critic_score" : 94, "name" : "Final Fantasy IX" }
  },
  ...

 

functions

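In the examples above, every document is scored by the same function. Sometimes we want different documents to be weighted differently, and this is what functions is for. Several functions can be combined, and each function can optionally be applied only to documents that match a given filter: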

GET /_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "test": "bar" } },
          "random_score": {},
          "weight": 23
        },
        {
          "filter": { "match": { "test": "cat" } },
          "weight": 42
        }
      ],
      "max_boost": 42,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 42
    }
  }
}

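Here boost is 5, so the score of every document is multiplied by 5, and each entry in functions applies only to the documents that match its filter. For our example, we can write: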

GET best_games/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "boost": "1",
      "functions": [
        {
          "filter": { "match": { "name": " XIII" } },
          "weight": 10000000
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

Here, documents whose name contains XIII are given a weight of 10000000. The result is:

"hits" : [
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "KqccJ28BCSSrjaXdSOrC", "_score" : 8.1384144E7,
    "_source" : {
      "global_sales" : 5.33, "year" : 2009,
      "image_url" : "https://www.wired.com/images_blogs/gamelife/2009/09/ffxiii-01.jpg",
      "platform" : "PS3", "@timestamp" : "2009-01-01T00:00:00.000+08:00",
      "user_score" : 7, "critic_score" : 83, "name" : "Final Fantasy XIII",
      "genre" : "Role-Playing", "publisher" : "Square Enix", "developer" : "Square Enix",
      "id" : "final-fantasy-xiii-ps3-2009"
    }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "OKccJ28BCSSrjaXdSOvC", "_score" : 7.2601472E7,
    "_source" : {
      "global_sales" : 2.63, "year" : 2011,
      "image_url" : "https://i.ytimg.com/vi/tSJH_vhaYUk/maxresdefault.jpg",
      "platform" : "PS3", "@timestamp" : "2011-01-01T00:00:00.000+08:00",
      "user_score" : 6, "critic_score" : 79, "name" : "Final Fantasy XIII-2",
      "genre" : "Role-Playing", "publisher" : "Square Enix", "developer" : "Square Enix",
      "id" : "final-fantasy-xiii-2-ps3-2011"
    }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "2qccJ28BCSSrjaXdSOnC", "_score" : 8.138414,
    "_source" : {
      "global_sales" : 9.72, "year" : 1997,
      "image_url" : "https://r.hswstatic.com/w_907/gif/finalfantasyvii-MAIN.jpg",
      "platform" : "PS", "@timestamp" : "1997-01-01T00:00:00.000+08:00",
      "user_score" : 9, "critic_score" : 92, "name" : "Final Fantasy VII",
      "genre" : "Role-Playing", "publisher" : "Sony Computer Entertainment", "developer" : "SquareSoft",
      "id" : "final-fantasy-vii-ps-1997"
    }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "6KccJ28BCSSrjaXdSOnC", "_score" : 8.138414,
    "_source" : {
      "global_sales" : 8.05, "year" : 2001,
      "image_url" : "https://www.mobygames.com/images/promo/l/192477-final-fantasy-x-screenshot.jpg",
      "platform" : "PS2", "@timestamp" : "2001-01-01T00:00:00.000+08:00",
      "user_score" : 8, "critic_score" : 92, "name" : "Final Fantasy X",
      "genre" : "Role-Playing", "publisher" : "Sony Computer Entertainment", "developer" : "SquareSoft",
      "id" : "final-fantasy-x-ps2-2001"
    }
  },
  ...

Final Fantasy XIII now ranks first, ahead of all the other Final Fantasy titles.
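The behaviour of filtered weight functions can be sketched in Python as follows. This is a simplified model, not the Elasticsearch implementation: the names tokens, apply_functions and filter_term are made up, the filter is approximated by a lowercase word match, and boost_mode is fixed to multiply. The query scores for the two XIII titles are inferred from the results above by dividing by the weight. A document that matches no function keeps a factor of 1, which is why Final Fantasy VII keeps its original score.

```python
import math
import re


def tokens(text):
    """Rough stand-in for the analyzer: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def apply_functions(name, query_score, functions, score_mode="multiply"):
    """Combine the weights of all matching functions, then multiply by the query score."""
    name_tokens = tokens(name)
    weights = [f["weight"] for f in functions if f["filter_term"].lower() in name_tokens]
    if not weights:
        factor = 1.0  # no function matched: the score is left unchanged
    elif score_mode == "multiply":
        factor = math.prod(weights)
    elif score_mode == "sum":
        factor = sum(weights)
    elif score_mode == "max":
        factor = max(weights)
    elif score_mode == "min":
        factor = min(weights)
    else:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    return query_score * factor  # boost_mode: multiply


functions = [{"filter_term": "XIII", "weight": 10_000_000}]
hits = [
    ("Final Fantasy XIII", 8.1384144),
    ("Final Fantasy XIII-2", 7.2601472),
    ("Final Fantasy VII", 8.138414),
]

rescored = {name: apply_functions(name, score, functions) for name, score in hits}
for name, score in rescored.items():
    print(name, score)
```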

 

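Decay functions in Elasticsearch

Function scoring can be used not only to modify the default Elasticsearch scoring algorithm but also to replace it entirely. A good example is a "trending" search, which shows the items within a topic that are rapidly gaining popularity.

Such a score cannot be based on simple metrics such as likes or views; it has to adapt continually to the current time. Compared with a video that receives 10000 views in 24 hours, a video that receives 1000 views in 1 hour is generally considered "hotter". Elasticsearch ships with several decay functions that make this kind of problem easy to solve.

Take gauss as an example. The shape of its curve is controlled by origin, scale, offset and decay. origin is the point from which distance is measured. To make the curve fall off more slowly, set scale to a larger value, for example 24 hours. offset flattens the start of the curve: with an offset of 1h, documents within 1 hour of origin are not penalized at all and keep a decay factor of 1. decay is the factor a document receives at a distance of scale (plus offset) from origin; the default is 0.5.

Let's apply this to the best_games index: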

GET best_games/_search
{
  "_source": [ "name", "critic_score", "user_score" ],
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "Final Fantasy" }
      },
      "functions": [
        {
          "gauss": {
            "@timestamp": {
              "origin": "2016-01-01T00:00:00",
              "scale": "365d",
              "offset": "0h",
              "decay": 0.1
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

Here origin is 2016-01-01, scale is 365 days and decay is 0.1: a game dated 365 days away from the origin has its score multiplied by 0.1, while a game dated exactly at the origin keeps a factor of 1. The result is:

"hits" : [
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "OKccJ28BCSSrjaXdSOvC", "_score" : 6.6742494E-25,
    "_source" : { "user_score" : 6, "critic_score" : 79, "name" : "Final Fantasy XIII-2" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "2qccJ28BCSSrjaXdSOnC", "_score" : 0.0,
    "_source" : { "user_score" : 9, "critic_score" : 92, "name" : "Final Fantasy VII" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "6KccJ28BCSSrjaXdSOnC", "_score" : 0.0,
    "_source" : { "user_score" : 8, "critic_score" : 92, "name" : "Final Fantasy X" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "6qccJ28BCSSrjaXdSOnC", "_score" : 0.0,
    "_source" : { "user_score" : 8, "critic_score" : 90, "name" : "Final Fantasy VIII" }
  },
  {
    "_index" : "best_games", "_type" : "_doc", "_id" : "FqccJ28BCSSrjaXdSOrC", "_score" : 0.0,
    "_source" : { "user_score" : 7, "critic_score" : 92, "name" : "Final Fantasy XII" }
  },
  ...

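Final Fantasy XIII-2, released in 2011 and therefore the closest to the origin date, now ranks first. The older titles are so far from the origin that their scores decay to 0.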
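The tiny score of Final Fantasy XIII-2 can be reproduced with the Gaussian decay formula from the Elasticsearch documentation, exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)). The Python sketch below makes two assumptions: the origin, given without a time zone, is interpreted as UTC, and the query score of Final Fantasy XIII-2 is 7.2601472, as inferred from the functions example (7.2601472E7 divided by the weight 10000000). The function name gauss_decay is made up for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay factor for a value `distance` away from the origin."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


# Assumption: "2016-01-01T00:00:00" without a zone is read as UTC.
origin = datetime(2016, 1, 1, tzinfo=timezone.utc)
# @timestamp of Final Fantasy XIII-2: 2011-01-01T00:00:00.000+08:00
doc_time = datetime(2011, 1, 1, tzinfo=timezone(timedelta(hours=8)))

days = (origin - doc_time).total_seconds() / 86400
factor = gauss_decay(days, scale=365, offset=0, decay=0.1)

query_score = 7.2601472  # inferred from the functions example, see above
final_score = query_score * factor  # boost_mode: multiply

print(f"distance: {days:.2f} days, factor: {factor:.3e}, score: {final_score:.3e}")
```

Being about five years, or five times the scale, away from the origin already shrinks the factor to roughly 0.1 raised to the power 25. With these settings only games within a year or two of 2016 would keep a useful score, so a larger scale or a decay closer to 1 would give a gentler curve.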

 
