In the previous article we discussed the Endeca rules model and explained how to re-implement this model using Elasticsearch. We needed to implement inverted search to trigger our rules, and we leveraged the powerful percolator feature in Elasticsearch, which greatly simplified our implementation. In this blog post, we will discuss how to approach the implementation of Endeca rules if you are running Solr.
Unfortunately, Solr does not currently have percolator-like functionality. We believe it will be available soon because Lucene 8.2 support is already merged. Meanwhile, we can employ an alternative approach and implement inverted search purely with Solr queries. We will use the same example as in the previous article for illustration.
Firstly, let’s recall the particular trigger types that we will have to implement:
Match phrase: the search phrase contains search terms sequentially in a strict order but may also contain other words before or after.
Example: The rule is configured with search terms = “how to”. The search phrase “how to make an order” will trigger this rule. At the same time, the search phrase “how can I get to the store” will not trigger the rule.
Match all: the search phrase contains all search terms in any order with optional additional words in any position.
Example: The rule is configured with search terms = “oven best pizza”. The search phrase “what is the best oven for cooking pizza” will trigger this rule.
Match exact: the rule will be triggered if and only if the search phrase is exactly equal to the search terms. No additional words are allowed.
Example: The rule is configured with search terms “order status”. Only the search phrase “order status” will trigger this rule, not any other.
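To make the three definitions concrete, here is a minimal reference sketch in plain Python. It is not part of the Solr implementation; it only restates the trigger semantics described above. The function names are ours, and tokenization is assumed to be simple lowercasing and whitespace splitting, which is cruder than a real Solr analyzer.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplification of real text analysis)."""
    return text.lower().split()


def match_phrase(search_terms, search_phrase):
    """True if the search terms occur in the phrase contiguously and in order."""
    terms, phrase = tokenize(search_terms), tokenize(search_phrase)
    width = len(terms)
    return any(phrase[i:i + width] == terms
               for i in range(len(phrase) - width + 1))


def match_all(search_terms, search_phrase):
    """True if every search term occurs somewhere in the phrase, in any order."""
    return set(tokenize(search_terms)) <= set(tokenize(search_phrase))


def match_exact(search_terms, search_phrase):
    """True only if the phrase consists of exactly the search terms."""
    return tokenize(search_terms) == tokenize(search_phrase)
```

These functions check one rule against one phrase. The rest of the post deals with the inverse problem: finding all stored rules that a given phrase triggers without iterating over them one by one.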
We will use those triggers as an example, and we will use the default Solr configuration for simplicity. So, let's roll up our sleeves and get some inverted search up and running!
First, after launching Solr we need to create a new core/collection named rules. We can do it from the core admin page or from the terminal by executing the bin/solr create -c rules command.
We are going to use the same logical rule structure as in the previous post. The rule will be modeled as a parent document, with triggers represented as child documents. So how will the example from the previous post look in the case of Solr?
In this post, we will extend our simple rule engine functionality with two essential features: rule collapsing and sorting. Rule collapsing refers to the situation when multiple rules of the same type fire, and we have to select the one with the highest priority (represented as the lowest priority number).
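The collapsing behaviour itself is simple enough to sketch in a few lines of Python. This is only an illustration of the intended outcome, not how Solr does it internally. The field names priority_i and actionType_s follow the rule structure used in this post, while the function name and the tie-breaking behaviour (first rule seen wins) are our assumptions.

```python
def collapse_rules(fired_rules):
    """Keep one rule per action type: the one with the lowest priority number."""
    best = {}
    for rule in fired_rules:
        action_type = rule["actionType_s"]
        current = best.get(action_type)
        # A strictly lower number replaces the current winner; ties keep the first seen.
        if current is None or rule["priority_i"] < current["priority_i"]:
            best[action_type] = rule
    # Return the survivors ordered by priority, most important first.
    return sorted(best.values(), key=lambda rule: rule["priority_i"])
```

In Solr we will not post-process results like this; the same effect is achieved inside the query with a collapse filter.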
Let’s start with the basic rule structure. Note that we use the *_i, *_s and *_t suffixes to map to the integer, string and text field types respectively.
{
  "id": "1",
  "priority_i": 1, // 1
  "action_t": "<some serialized action>", // 2
  "actionType_s": "<REDIRECT/FACET/BOOST/BURY....>",
  "scope_s": "rule",
  "_childDocuments_": [
    {
      "id": "1",
      "keyword_s": "<phrase to be triggered on>",
      "keyword_t": "<phrase to be triggered on>", // 3
      "keyword_words_count_i": <Integer value. Count of words in the keyword field>, // 3
      "matchmode_s": "<MATCHEXACT/MATCHPHRASE/MATCHALL>",
      "scope_s": "trigger"
    },
    { ..... }
  ]
}
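Since keyword_s, keyword_t and keyword_words_count_i all derive from the same phrase, it is convenient to generate them in one place. Below is a minimal Python sketch of such a helper. The function names are hypothetical, and the word count assumes that splitting on whitespace produces the same number of tokens as the analyzer behind the *_t field, which should be verified for your schema (stop words or synonyms would break this assumption).

```python
import json


def make_trigger(trigger_id, keyword, matchmode):
    """Build a child (trigger) document; all keyword fields derive from one phrase."""
    return {
        "id": trigger_id,
        "keyword_s": keyword,                           # un-tokenized copy
        "keyword_t": keyword,                           # analyzed copy
        "keyword_words_count_i": len(keyword.split()),  # assumed to equal the token count
        "matchmode_s": matchmode,
        "scope_s": "trigger",
    }


def make_rule(rule_id, priority, action, action_type, triggers):
    """Build a parent (rule) document with its triggers nested as child documents."""
    return {
        "id": rule_id,
        "priority_i": priority,
        "action_t": action,
        "actionType_s": action_type,
        "scope_s": "rule",
        "_childDocuments_": list(triggers),
    }


if __name__ == "__main__":
    rule = make_rule("1", 2, "http://retailername.com/FAQ", "REDIRECT",
                     [make_trigger("tr1", "how to", "MATCHPHRASE")])
    print(json.dumps(rule, indent=2))
```

Deriving the count in code rather than typing it by hand avoids a mismatch that would silently stop a matchAll trigger from ever firing.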
Now, let's convert our sample rules into the Solr input structure:
{
  "id": "1",
  "priority_i": 2,
  "action_t": "http://retailername.com/FAQ",
  "actionType_s": "REDIRECT",
  "scope_s": "rule",
  "_childDocuments_": [
    {
      "id": "tr1",
      "keyword_s": "how to",
      "matchmode_s": "MATCHPHRASE",
      "scope_s": "trigger",
      "keyword_t": "how to",
      "keyword_words_count_i": 2
    }
  ]
},
{
  "id": "2",
  "priority_i": 1,
  "action_t": "http://retailername.com/orders",
  "actionType_s": "REDIRECT",
  "scope_s": "rule",
  "_childDocuments_": [
    {
      "id": "tr2",
      "keyword_s": "order status",
      "matchmode_s": "MATCHEXACT",
      "scope_s": "trigger",
      "keyword_t": "order status",
      "keyword_words_count_i": 2
    }
  ]
},
{
  "id": "3",
  "priority_i": 1,
  "action_t": "http://retailername.com/top10ovens",
  "actionType_s": "REDIRECT",
  "scope_s": "rule",
  "_childDocuments_": [
    {
      "id": "tr3",
      "keyword_s": "oven best pizza",
      "matchmode_s": "MATCHALL",
      "scope_s": "trigger",
      "keyword_t": "oven best pizza",
      "keyword_words_count_i": 3
    }
  ]
}
After the indexing, we will have our core filled with our sample rules.
Since we have 3 different match modes, in order to build our inverted search query, we need to create a disjunction boolean query. We will show you the final result and then walk you through every part of the query.
Let’s use the keyword “how to cook” as an example. Below is the complete request that matches rules against the “how to cook” user keyword.
http://localhost:8983/solr/rules/select?exactQuery=keyword_s:"how to cook"&fq={!collapse field=actionType_s sort='priority_i asc'}&matchAllQuery={!frange l=0 u=0 incl=true incu=true v='sub(sum(max(0, query({!lucene v="keyword_t:how^=1"})),max(0, query({!lucene v="keyword_t:to^=1"})),max(0, query({!lucene v="keyword_t:cook^=1"}))),field(keyword_words_count_i))'}&phraseQuery=keyword_s:"how to" OR keyword_s:"to cook"&q={!parent which=scope_s:rule v=$triggerQuery}&triggerQuery=+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT)) OR ({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE)) OR ({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL))) AND filter(scope_s:trigger)
As you can see, this request correctly returns rule no. 1, associated with the matchPhrase trigger configured on “how to”.
So, let's analyze all the parts of this complex query:
http://localhost:8983/solr/rules/select?
This is a request to the regular select RequestHandler.
q={!parent which=scope_s:rule v=$triggerQuery}&
The block join parent query parser (ToParentBlockJoinQuery under the hood) is needed to match a rule (parent document) by its matched triggers (child documents).
triggerQuery=+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT)) OR ({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE)) OR ({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL))) AND filter(scope_s:trigger)&
This is the main query for matching triggers. As you can see, this query is a disjunction query with 3 clauses for the 3 different match modes. The specific queries for each type are extracted into the separate nested params exactQuery, phraseQuery and matchAllQuery.
exactQuery=keyword_s:"how to cook"&
MatchExact query. It is very straightforward: we just need to check that the keyword field content is exactly the same as the user's query. As we are only looking for an exact match, the un-tokenized string field is used.
phraseQuery=keyword_s:"how to" OR keyword_s:"to cook"&
MatchPhrase query. Here we need to cut all possible word n-grams from the user search phrase. As we have a very short example keyword, we have only two n-grams: “how to” and “to cook”. Using this approach, we match only those triggers whose keyword is equal to some sub-phrase of the user keyword.
matchAllQuery={!frange l=0 u=0 incl=true incu=true v='sub(sum(max(0,query({!lucene v="keyword_t:how^=1"})),max(0, query({!lucene v="keyword_t:to^=1"})),max(0, query({!lucene v="keyword_t:cook^=1"}))),field(keyword_words_count_i))'}&
The MatchAll query is the trickiest one, and it is the core of the inverted search problem. We will discuss it separately to properly explain all the details.
fq={!collapse field=actionType_s sort='priority_i asc'}
A collapse filter query, needed to fetch only one rule of each action type: the one with the lowest priority number.
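The n-gram generation for phraseQuery happens in application code before the request is sent, so it is worth sketching. The Python below is a minimal illustration with hypothetical function names. Its defaults (n-grams of at least two words, and no longer than one word less than the phrase, except for two-word phrases) are inferred from the example requests in this post rather than from a stated rule, so adjust min_n and max_n if your triggers can be single words or should also match the full phrase.

```python
def word_ngrams(phrase, min_n=2, max_n=None):
    """Return contiguous word n-grams of the phrase, shortest first."""
    words = phrase.split()
    if max_n is None:
        # Inferred default: stop one word short of the full phrase, but never below min_n.
        max_n = max(min_n, len(words) - 1)
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


def phrase_query(phrase, field="keyword_s"):
    """Join the n-grams into a disjunction of exact matches on the string field."""
    return " OR ".join(f'{field}:"{gram}"' for gram in word_ngrams(phrase))
```

Note that this sketch does not escape quotes or other special characters in the user input, and that a one-word phrase yields no n-grams under the defaults, so a caller would have to handle both cases.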
Formally speaking, a matchAll query means that we have to find those rules where the tokens configured in the trigger are a subset of the tokens in the user query. We don't know in advance which tokens will match, but we know that the number of matched tokens must be exactly the same as the total number of tokens in the trigger.
We conveniently store the number of tokens in the keyword in the field keyword_words_count_i.
We will use the Solr function query framework to perform this precise matching. Function queries were designed for match scoring, but with some simple tricks we can use them for precise filtering as well:
{!frange l=0 u=0 incl=true incu=true v=' // 5
  sub( // 4
    sum( // 2
      max(0, query({!lucene v="keyword_t:how^=1"})), // 1
      max(0, query({!lucene v="keyword_t:to^=1"})),
      max(0, query({!lucene v="keyword_t:cook^=1"}))
    ),
    field(keyword_words_count_i) // 3
  )
'}
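We will unwind this query from the inside out, so follow the numbers in the listing:
1. Each query({!lucene v="keyword_t:<token>^=1"}) checks one token of the user phrase against the trigger's keyword_t field. The ^=1 constant-score boost makes it yield 1 when the token is present, and the query function falls back to 0 when it is not.
2. sum adds these values up, giving the number of user tokens found in the trigger.
3. field(keyword_words_count_i) reads the number of words stored for the trigger.
4. sub subtracts the stored word count from the number of matched tokens.
5. frange with l=0 and u=0 (both bounds inclusive) keeps only the triggers where this difference is exactly zero, that is, where every trigger word occurs in the user phrase.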
That's it. Now we are able to perform inverted search and match our matchAll triggers.
Let's consider some more examples:
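The arithmetic performed by the function query can be mimicked in plain Python, which makes it easy to check by hand which triggers pass the range filter. This is only a simulation under simplifying assumptions (lowercased, whitespace-split tokens; hypothetical function names), not something Solr executes.

```python
def term_score(trigger_tokens, token):
    """Mimic query({!lucene v="keyword_t:token^=1"}): 1 on a match, 0 otherwise."""
    return 1.0 if token in trigger_tokens else 0.0


def match_all_difference(trigger_keyword, user_phrase):
    """Mimic sub(sum(...), field(keyword_words_count_i)) for a single trigger."""
    trigger_tokens = trigger_keyword.lower().split()
    matched = sum(max(0.0, term_score(trigger_tokens, token))
                  for token in user_phrase.lower().split())
    return matched - len(trigger_tokens)


def match_all_fires(trigger_keyword, user_phrase):
    """Mimic {!frange l=0 u=0}: the trigger passes only if the difference is zero."""
    return match_all_difference(trigger_keyword, user_phrase) == 0
```

For the trigger “oven best pizza” and the phrase “best oven for pizza”, three user tokens match and the stored count is three, so the difference is zero and the trigger fires. The simulation also exposes an edge case of the formula: a phrase that repeats a word, such as “pizza pizza best oven”, gives a difference of one and therefore does not fire, so deduplicating the user tokens before building the query may be worth considering.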
The request for “order status” keyword, which correctly matches rule no. 2 associated with matchExact trigger configured on phrase “order status” goes as follows:
http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22order%20status%22&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:order^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:status^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22order%20status%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20+filter(scope_s:trigger)
The request for “best oven for pizza” keyword, which correctly matches rule no. 3 associated with matchAll trigger configured on words set “oven best pizza” goes as follows:
http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22best%20oven%20for%20pizza%22&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:best^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:oven^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:for^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:pizza^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22best%20oven%22%20OR%20keyword_s:%22oven%20for%22%20OR%20keyword_s:%22for%20pizza%22%20OR%20keyword_s:%22best%20oven%20for%22%20OR%20keyword_s:%22oven%20for%20pizza%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20AND%20filter(scope_s:trigger)&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}
We can also consider the keyword “how to oven best pizza”, which matches both the “how to” matchPhrase trigger and the “oven best pizza” matchAll trigger. Because of the collapsing filter query (fq), we get only rule no. 3, which has the highest priority (the lowest priority number).
http://localhost:8983/solr/rules/select?exactQuery=keyword_s:%22how%20to%20oven%20best%20pizza%22&matchAllQuery={!frange%20l=0%20u=0%20incl=true%20incu=true%20v=%27sub(sum(max(0,%20query({!lucene%20v=%22keyword_t:how^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:to^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:oven^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:best^=1%22})),max(0,%20query({!lucene%20v=%22keyword_t:pizza^=1%22}))),field(keyword_words_count_i))%27}&phraseQuery=keyword_s:%22how%20to%22%20OR%20keyword_s:%22to%20oven%22%20OR%20keyword_s:%22oven%20best%22%20OR%20keyword_s:%22best%20pizza%22%20OR%20keyword_s:%22how%20to%20oven%22%20OR%20keyword_s:%22to%20oven%20best%22%20OR%20keyword_s:%22oven%20best%20pizza%22%20OR%20keyword_s:%22how%20to%20oven%20best%22%20OR%20keyword_s:%22to%20oven%20best%20pizza%22&q={!parent%20which=scope_s:rule%20v=$triggerQuery}&triggerQuery=+(({!lucene%20v=$exactQuery}%20AND%20filter(matchmode_s:MATCHEXACT))%20OR%20({!lucene%20v=$phraseQuery}%20AND%20filter(matchmode_s:MATCHPHRASE))%20OR%20({!lucene%20v=$matchAllQuery}%20AND%20filter(matchmode_s:MATCHALL)))%20AND%20filter(scope_s:trigger)&fq={!collapse%20field=actionType_s%20sort=%27priority_i%20asc%27}
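Requests like the ones above are tedious to write by hand, so in practice they would be assembled by code. Here is a minimal Python sketch that builds the same set of parameters for an arbitrary user phrase and lets the standard library handle URL encoding. The function names are hypothetical, the n-gram bounds are inferred from the examples in this post, and the sketch neither escapes special characters in the user input nor handles one-word phrases.

```python
from urllib.parse import urlencode

# Fixed part of the request: the disjunction over the three match modes.
TRIGGER_QUERY = (
    "+(({!lucene v=$exactQuery} AND filter(matchmode_s:MATCHEXACT)) OR "
    "({!lucene v=$phraseQuery} AND filter(matchmode_s:MATCHPHRASE)) OR "
    "({!lucene v=$matchAllQuery} AND filter(matchmode_s:MATCHALL))) "
    "AND filter(scope_s:trigger)"
)


def build_params(phrase):
    """Build the Solr request parameters for one user phrase."""
    words = phrase.split()
    longest = max(2, len(words) - 1)  # n-gram bounds inferred from the examples
    grams = [" ".join(words[i:i + n])
             for n in range(2, longest + 1)
             for i in range(len(words) - n + 1)]
    phrase_query = " OR ".join(f'keyword_s:"{gram}"' for gram in grams)
    # One constant-score sub-query per user token, summed by the function query.
    term_scores = ",".join(
        f'max(0, query({{!lucene v="keyword_t:{word}^=1"}}))' for word in words)
    match_all = ("{!frange l=0 u=0 incl=true incu=true "
                 f"v='sub(sum({term_scores}),field(keyword_words_count_i))'}}")
    return {
        "q": "{!parent which=scope_s:rule v=$triggerQuery}",
        "triggerQuery": TRIGGER_QUERY,
        "exactQuery": f'keyword_s:"{phrase}"',
        "phraseQuery": phrase_query,
        "matchAllQuery": match_all,
        "fq": "{!collapse field=actionType_s sort='priority_i asc'}",
    }


def build_url(phrase, base="http://localhost:8983/solr/rules/select"):
    """Return the full request URL with every parameter percent-encoded."""
    return base + "?" + urlencode(build_params(phrase))


if __name__ == "__main__":
    print(build_url("how to cook"))
```

Using urlencode also encodes the leading + of triggerQuery as %2B, so it reaches Solr as a literal plus sign rather than being decoded as a space.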
In this blog post, we discussed the trickiest part of Endeca rule migration: the matchAll trigger implementation. A full-fledged implementation should also include other aspects, such as:
Happy searching!