Elasticsearch sort by children
Two entities: collection and product. The collection is the parent of product.
I need to search by product's terms and show collections with 4 products each.
Collections and products can be partially matched, but the best match first. If the match is not full, some terms have priority.
Example: Searching "color:red" and "material:stone" need to show red stones first, any other red next (this is about collections match and product match).
So, all of this solved by request below:
{
"query": {
"has_child": {
"type": "products",
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"match_all": {}
},
"boost": 1
}
},
{
"constant_score": {
"filter": {
"terms": { "_name": "colors", "colors": [5] }
},
"boost": 1.2
}
},
{
"constant_score": {
"filter": {
"terms": { "_name": "materials", "productTypes": [6] }
},
"boost": 1
}
}
]
}
},
"score_mode": "max",
"inner_hits": {
"size": 4,
"sort": [
"_score"
]
}
}
},
"sort": [
"_score"
]
}
Ok, now the trouble.
Need to sort by price. As ASC, as DESC. Price is the property of the product.
Need to sort by the price of matched products, so can't move price to the collection. Need to sort by price as a collection as products. Collections sorted by minimal (or maximal) price of matched products.
Need to sort by price only 100% matched products (well, partially matched can be sorted too, but after). I mean, sort must be like ORDER BY _score, price
Example, that I want to get, sort by price asc, [nn] means partially matched product:
Collection1
100 - 200 - 800 - [99]
Collection2
300 - 500 - [10] - [20]
Collection3
400 - 450 - 500 - [100]
I found that sort by the child is not supported. And the suggestion to recalculate score. But I'm using the score for sort by match. My try was
{
"query": {
"has_child": {
"type": "products",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
... same query as above ...
]
}
},
"functions": [
{
"script_score": {
"script": "ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100)",
"lang": "expression"
}
}
]
}
},
"score_mode": "max",
"inner_hits": {
"size": 4,
"sort": [
"_score",
{
"price": {
"order": "desc"
}
}
]
}
}
},
"sort": [
"_score"
]
}
But I'm really confused with results to score that I can see in answer. Asking for help :) Or, maybe, drop this and create a nested index?
UPD: Found that was wrong with score. By default, elastic combine score and result of script_score. So score was ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100) * _score
- that can broke idea, but easy to fix with boost_mode
parameter of function_score
. Result query:
{
"query": {
"has_child": {
"type": "products",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
... same query as above ...
]
}
},
"functions": [
{
"script_score": {
"script": "ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value)",
"lang": "expression"
}
}
],
"boost_mode": "replace"
}
},
"score_mode": "max",
"inner_hits": {
"size": 4,
"sort": [
"_score",
{
"price": {
"order": "desc"
}
}
]
}
}
},
"sort": [
"_score"
]
}
boost_mode == 'replace
means "use function result as score". Also, used log10 to be sure how many digits in _score.
For sort by price DESC need to change formula to ceil((log10(_score)+10) * 100) * 100000 + (doc['price'].value)
UPD2
Formula ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value)
returns 100099952
for price 48
and for price 50
(boost == 1, queryNorm == 1) because single precision limitation.
New formula ceil((log10(_score)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value) * 1000))
- reduced number of digits for score and switched from price to lg of price and reduced number of digits too. Feedback welcome.
Solution 1:
Thanks for sharing, updated latest formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000))
added +1 to score because in some cases it returns errors like this:
"function score query returned an invalid score: -Infinity for doc: 4580"
Update: Got another error:
"function score query returned an invalid score: NaN for doc: 1739"
changed formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000))
added +1 to doc value to fix this
Update 2: Got another error:
"function score query returned an invalid score: NaN for doc: 1739"
changed formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value > 0 ? doc['price'].value : 1) * 1000))
replaced +1 with expression
Update 3: Got another error:
doesn't have an error message anymore it's hard to find now, but it was similar to the previous :(
changed formula to ceil(_score+1) + ceil((doc['price'].value > 0 ? doc['price'].value : 1) * 100)
simplified formula, so I can understand and it still working today :)