Elasticsearch powers our product search. Internal users may search by item code, but most users search for your product using the words they associate with it. When we connect your ERP to your commercebuild site, we pull in the item code and the product name, and that’s typically not enough data to capture how your users will search.
You’ll need to add more information to a product to make it more presentable, give the user reasons to buy it, improve your SEO, and make it easier to find in your search. In this article we’ll look at how Elasticsearch uses the content in commercebuild to generate search results.
Expectations of Elasticsearch
Elasticsearch has hundreds of engineers, and it’s far beyond the scope of this article to precisely explain every nuance of search. It’s also configurable: our implementation of search might be slightly different from another company’s, and we’ve selected what we think is best for our clients on this platform. What I hope to provide is the information that will be most helpful for understanding our search and for using the content of your products to get the search results you expect.
How our Elasticsearch implementation works: The short version
Scope of the search
Every product page is treated as a unique document. When you search for a word or phrase, Elasticsearch scores each document’s relevance against your query. The items with the highest scores show up first. Remember, we don’t necessarily look at all the fields of a product. In your settings you can turn fields on or off to help control your results.
Frequency
If you’re searching for “chair”, an item with the word “chair” twice will score higher than an item with the word “chair” once.
Before you start piling in repeated words to try to boost a result, note that it’s your users who will have to suffer through a paragraph where every sentence starts with “This chair…” Your best course of action is to describe all your products evenly and consistently. You’re not trying to “game” search; you’re trying to serve your users useful content.
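To make the frequency idea concrete, here is a minimal sketch in Python. It assumes the classic Lucene-style formula, where the frequency score is the square root of the number of times the term appears; our actual configuration may differ, and the function name and sample descriptions are made up for illustration.

```python
import math


def term_frequency_score(text, term):
    """Assumed classic formula: square root of how often the term appears."""
    words = text.lower().split()
    count = words.count(term.lower())
    return math.sqrt(count)


# Hypothetical product descriptions, for illustration only.
once = term_frequency_score("A sturdy office chair", "chair")
twice = term_frequency_score("This chair is a sturdy office chair", "chair")
print(once, twice)  # the second description scores higher, but not twice as high
```

Under this assumption, doubling the repetitions does not double the score, which is one more reason that stuffing a description with the same word is a poor trade.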
Frequency, pt. 2
Elasticsearch also uses a measure called inverse document frequency. If you have 1000 products, and “chair” is used in almost all the product descriptions, then the value of “chair” as a search term is lower because it’s too frequent. A word that appears everywhere can’t be a relevant word for search, because the goal of search is to narrow and sort the results.
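As a rough illustration of inverse document frequency, the sketch below assumes the classic Lucene-style formula, 1 + ln(total documents / (documents containing the term + 1)). The product counts are invented, and the formula our search actually uses may be different.

```python
import math


def inverse_document_frequency(total_docs, docs_with_term):
    """Assumed classic formula: rarer terms get a larger weight."""
    return 1.0 + math.log(total_docs / (docs_with_term + 1))


# Hypothetical catalog of 1000 products.
common = inverse_document_frequency(1000, 950)  # "chair" is in almost every description
rare = inverse_document_frequency(1000, 20)     # "adjustable" is in only a few
print(common, rare)  # the rare term carries far more weight
```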
Field length normalization
This measures how much information is stored in a field. If there are two products, both containing the word “chair” once, the one where “chair” appears alongside the fewest other words will get the higher score from that field. This measure rewards brevity and clarity.
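Here is a small sketch of field length normalization, assuming the classic Lucene-style formula of 1 divided by the square root of the number of words in the field. The two sample fields are hypothetical, and real analysis of a field is more involved than splitting on spaces.

```python
import math


def field_length_norm(text):
    """Assumed classic formula: shorter fields get a larger multiplier."""
    num_terms = len(text.split())
    if num_terms == 0:
        return 0.0
    return 1.0 / math.sqrt(num_terms)


# Both fields contain "chair" exactly once.
short_field = field_length_norm("Red adjustable chair")
long_field = field_length_norm(
    "A chair that looks great in any room of your home or office"
)
print(short_field, long_field)  # the shorter field gets the larger value
```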
Coordination
Searching for the three words “red adjustable chair” actually searches for each word separately. There is no way to search for “red” and “adjustable” and “chair”; Elasticsearch is always searching for “red” or “adjustable” or “chair”. The coordination parameter helps rank a product with “red adjustable” higher than a product with just “chair”: a product page that contains more of the search terms gets a better score.
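The coordination idea can be sketched as the fraction of search terms a product matches. This assumes the classic Lucene-style definition (matched terms divided by total terms in the query); the function name and product texts are hypothetical.

```python
def coordination_factor(query, product_text):
    """Assumed definition: share of the query's terms found in the product."""
    query_terms = set(query.lower().split())
    product_terms = set(product_text.lower().split())
    if not query_terms:
        return 0.0
    matched = query_terms & product_terms
    return len(matched) / len(query_terms)


query = "red adjustable chair"
all_three = coordination_factor(query, "Red adjustable chair with armrests")
two_terms = coordination_factor(query, "Red adjustable stool")
one_term = coordination_factor(query, "Blue chair")
print(all_three, two_terms, one_term)  # more matched terms, larger factor
```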
Boost
Boost is unique in that it lets users increase the value of one field, such as secondary description, above others. It’s the only item here that isn’t directly connected to the content itself.
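As a simple sketch of what a boost does, the example below multiplies each field’s score by that field’s boost and adds the results. The field names, scores, and boost values are all hypothetical, and adding field scores together is an assumption; how fields are actually combined depends on how the search is configured.

```python
def boosted_total(field_scores, boosts):
    """Multiply each field's score by its boost (default 1.0) and add them up."""
    return sum(
        score * boosts.get(field, 1.0) for field, score in field_scores.items()
    )


# Hypothetical per-field scores for one product and one search.
field_scores = {"name": 0.5, "secondary_description": 0.5}

no_boost = boosted_total(field_scores, {})
with_boost = boosted_total(field_scores, {"secondary_description": 3.0})
print(no_boost, with_boost)  # boosting one field raises the product's total
```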
Scoring
The above parameters all combine to calculate one value for each product you carry, every time a search is initiated. A multi-word search isn’t just using coordination: it’s using frequency and field length normalization, it’s ignoring fields that are off, and it’s taking into account your boost values. There is no single factor that guarantees a result; it’s a holistic measure.
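To show how these factors might combine, here is a simplified sketch loosely modeled on the classic Lucene scoring formula (frequency × inverse document frequency squared × field length norm × boost, summed over terms and fields, then multiplied by coordination). The products, field names, and boost values are invented, and this is not our exact implementation.

```python
import math

# Hypothetical catalog, for illustration only.
PRODUCTS = {
    "CH-100": {"name": "Red adjustable chair",
               "description": "A red chair with adjustable height"},
    "CH-200": {"name": "Office chair",
               "description": "Comfortable chair for long work days"},
    "TB-300": {"name": "Red table",
               "description": "A small red side table"},
}

# Fields that are switched on, with an assumed boost for each.
# A field left out of this mapping is treated as switched off.
ENABLED_FIELDS = {"name": 2.0, "description": 1.0}


def tokens(text):
    return text.lower().split()


def idf(term, field):
    docs_with_term = sum(1 for p in PRODUCTS.values() if term in tokens(p[field]))
    return 1.0 + math.log(len(PRODUCTS) / (docs_with_term + 1))


def field_score(term, field, text):
    words = tokens(text)
    freq = words.count(term)
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                    # frequency
    norm = 1.0 / math.sqrt(len(words))      # field length normalization
    return tf * idf(term, field) ** 2 * norm


def score(query, product):
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    total, matched = 0.0, 0
    for term in query_terms:
        term_total = sum(
            boost * field_score(term, field, product[field])
            for field, boost in ENABLED_FIELDS.items()
        )
        if term_total > 0:
            matched += 1
        total += term_total
    coordination = matched / len(query_terms)
    return coordination * total


query = "red adjustable chair"
ranking = sorted(PRODUCTS, key=lambda code: score(query, PRODUCTS[code]), reverse=True)
print(ranking)  # the product matching all three terms ranks first
```

Removing a field from `ENABLED_FIELDS` or changing its boost will change the scores, which mirrors the point that no single factor decides the result.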
Conclusion
When you use concise, well-written, accurate content for your products, you should see accurate results in your search. The goal of this article isn’t to induce you to game the results; it’s to help you understand the results you have and the factors that contribute to the products you see on the page.
Notes
In the example of searching for “red adjustable chair”, I’ve written the search terms in quotes for clarity in this document. It’s also common in a web search engine to put quotes around a phrase to get results that match the whole string. If you typed a string that included quote marks in Elasticsearch, the actual search would be for “red or adjustable or chair”, where the first term is a quote mark followed by the word red, as opposed to just the three letters of the word red, and the last term is the word chair followed by a quote mark. In that case, where a user encloses the search string in quotes, it’s likely that only products with the word adjustable would show up.
For more about parameters and boosting, see this article on refining search results.
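The behavior described in this note can be mimicked with a tiny sketch that splits the search string on spaces and keeps the quote marks attached to the words. This is only an illustration of the note, using plain whitespace splitting; it is not Elasticsearch’s actual text analysis, and the function names and product text are hypothetical.

```python
def split_query(query):
    """Split on whitespace only, so quote marks stay attached to words."""
    return query.lower().split()


def matching_terms(query, product_text):
    product_words = set(product_text.lower().split())
    return [term for term in split_query(query) if term in product_words]


product = "Red adjustable chair"
print(matching_terms('red adjustable chair', product))    # all three terms match
print(matching_terms('"red adjustable chair"', product))  # only the middle term matches
```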