E-commerce customers often need to sort products by price, size, and other SKU-level attributes. Our job is to make this process as easy and pleasant for them as we can, because the more products they find, the more they buy. How do we help them find what they need?
For general searching and faceting in Solr/Lucene, we use Block Join Query. When users don’t know exactly what product they need, or want to see our available products ranked by price or size, we run into the challenge of mapping the hierarchical structure of a typical e-commerce catalog onto an inverted index.
Based on Grid Dynamics’ many years of Solr/Lucene experience, we suggest doing this with either the index-time approach or the query-time approach.
With the index-time approach, we collect the values of sortable attributes from child documents and move them up (propagate them) to the parent document during indexing. There is no built-in functionality in Solr that does this, so we need to write a bit of custom code to do it.
If you’re using Data Import Handler for data population purposes, SOLR-9479 can help.
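The propagation step described above can be sketched in a few lines. This is only an illustration that runs on plain dictionaries before they are sent to Solr, not Solr's own mechanism. The helper name `propagate_to_parent` and the target field `sku_prices` are hypothetical, and the sample documents assume the `doc_type`, `COLOR` and `price` fields used elsewhere in this post.

```python
from typing import Any, Dict, List


def propagate_to_parent(parent: Dict[str, Any],
                        source_field: str,
                        target_field: str) -> Dict[str, Any]:
    """Return a copy of the parent with the children's values collected
    into a multivalued field (hypothetical helper, not a Solr API)."""
    children: List[Dict[str, Any]] = parent.get("_childDocuments_", [])
    # Gather the attribute from every child that actually has it.
    values = [child[source_field] for child in children if source_field in child]
    enriched = dict(parent)  # shallow copy, the input document is left untouched
    if values:
        enriched[target_field] = sorted(set(values))
    return enriched


# Sample product with two SKUs (made-up data for illustration).
product = {
    "id": "p1",
    "doc_type": "parent",
    "_childDocuments_": [
        {"id": "p1-s1", "doc_type": "child", "COLOR": "green", "price": 30.0},
        {"id": "p1-s2", "doc_type": "child", "COLOR": "red", "price": 10.0},
    ],
}

enriched_product = propagate_to_parent(product, "price", "sku_prices")
```

After this step the parent carries every SKU price in `sku_prices`, which is what makes sorting by the minimum or maximum value possible without touching the children at query time.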
Now, at query time, we just need to sort parent documents by the minimum (or maximum) value of their multivalued attributes.
One disadvantage of this approach is that we have to write our own code (at least with Solr 6 and lower). Another disadvantage is a possible false-positive cross match. For example, if a customer searches for something “green,” which is a value of the SKU-level COLOR attribute, and wants the results sorted by price, they will probably expect the products to be sorted by the prices of the “green” SKUs only. Unfortunately, with the index-time approach the results are sorted by the prices of all SKUs.
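The cross-match problem is easier to see on a tiny example. The sketch below is a pure-Python simulation with made-up data, not a Solr query: it compares the ordering produced by the minimum price over all SKUs (what propagated values give us) with the ordering produced by the minimum price over “green” SKUs only (what the customer expects). All names in it are illustrative.

```python
# Two products, each with a "green" SKU and one SKU of another color.
catalog = [
    {"id": "p1", "skus": [{"COLOR": "green", "price": 30.0},
                          {"COLOR": "red", "price": 10.0}]},
    {"id": "p2", "skus": [{"COLOR": "green", "price": 20.0},
                          {"COLOR": "blue", "price": 25.0}]},
]


def has_color(product, color):
    """True if at least one SKU of the product has the given color."""
    return any(sku["COLOR"] == color for sku in product["skus"])


def min_price(product, color=None):
    """Minimum SKU price, optionally restricted to SKUs of one color."""
    return min(sku["price"] for sku in product["skus"]
               if color is None or sku["COLOR"] == color)


matches = [p for p in catalog if has_color(p, "green")]

# Index-time behavior: the sort key ignores which SKUs matched.
by_all_skus = [p["id"] for p in sorted(matches, key=min_price)]

# Expected behavior: only the matching "green" SKUs contribute.
by_green_skus = [p["id"] for p in sorted(matches, key=lambda p: min_price(p, "green"))]
```

Here `by_all_skus` puts `p1` first because of its cheap red SKU, while `by_green_skus` puts `p2` first, since its green SKU is the cheaper of the two.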
Fortunately, the query-time approach addresses both disadvantages of the index-time approach: we design a Solr sorting clause that achieves the same result without “propagating,” i.e. moving attributes up from child to parent. We construct that clause from a couple of Solr’s features.
Solr has a function query with the syntax {!func}$field_name, which is parsed into a FunctionQuery. The score of this query is the value of the $field_name field. Solr can also sort documents by the score of a function query. The following clause sorts SKU documents by price in descending order.
...sort={!func}price desc...
ToParentBlockJoinQuery supports several score calculation modes. For example, the score of a parent can be calculated as the minimum (or maximum) of all its children’s scores. So, with the clause below we can sort parent documents by their children’s prices in descending order.
...sort={!parent which=doc_type:parent score=max v={!func}price} desc…
However, running this clause as written throws an exception:
"msg": "Child query must not match same docs with parent filter.."
This exposes a limitation: a child query inside ToParentBlockJoinQuery must match child documents and only child documents, even if it is used only for sorting. To get around this, put a “children only” filter into the child query:
...sort={!parent which=doc_type:parent score=max v='+doc_type:child +{!func}price'} desc…
The child filter can be extended in the same manner to take only prices from “green” SKUs into account, or any other appropriate SKU-level filter.
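If the sort clause is assembled in application code, a small helper keeps the quoting in one place. The sketch below is a minimal illustration in Python. The function `parent_sort_clause` and its parameters are hypothetical, and the `COLOR:green` filter is an assumed way of expressing the “green” SKU restriction mentioned above, so adjust it to your own schema.

```python
from urllib.parse import urlencode


def parent_sort_clause(price_field="price",
                       child_filters=("doc_type:child",),
                       parent_filter="doc_type:parent",
                       score_mode="max",
                       order="desc"):
    """Build a block-join sort clause (hypothetical helper, not a Solr API)."""
    # Every child filter becomes a required (+) clause of the child query.
    required = " ".join(f"+{clause}" for clause in child_filters)
    child_query = f"{required} +{{!func}}{price_field}"
    return (f"{{!parent which={parent_filter} score={score_mode} "
            f"v='{child_query}'}} {order}")


# Same clause as in the text: children only, highest price first.
basic_sort = parent_sort_clause()

# Assumed extension: only "green" SKUs contribute their price.
green_sort = parent_sort_clause(child_filters=("doc_type:child", "COLOR:green"))

# The clause contains braces, quotes and spaces, so it must be URL-encoded
# before it is sent as the sort parameter of an HTTP request.
query_string = urlencode({"sort": green_sort})
```

With the defaults, `basic_sort` reproduces the clause shown above, and passing extra entries in `child_filters` narrows the set of SKUs whose prices are considered.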
And that’s how to use ToParentBlockJoin for efficient hierarchical sorting.
Working with an inverted index isn’t hard, as you have just learned, and e-commerce customers will be glad you put in the effort to set it up. We say this over and over, because it’s always true: anything we can do that makes it easier for customers to buy from us is a good thing.
Stay tuned for future posts about interesting, often little-known or under-utilized Solr attributes. And, as always, if you have any questions, leave them as a comment below.
Andrey Kudryavstev