What using query expansion to improve the generalization of strong cross-encoder rankers means, and why SEOs should care.

Welcome to a new Hamsterdam Research post.
This week, we’ll take a look at a paper from Google Research titled “Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?”
The paper’s authors include Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, and Michael Bendersky.
We’ll start with the paper’s abstract and then dig into more of its details.
But first, why should SEOs and marketers care about this research paper?
The techniques discussed in the paper, namely query expansion and the use of LLMs, can lead to more effective search results. By understanding how search engines are becoming more sophisticated in understanding queries, we as SEOs can refine our strategies and content.
In short, we’re all familiar with the advice to write for readers. This research helps reveal what that means in terms of a search engine’s accuracy for retrieving relevant information.
Let’s delve into the abstract.
“Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. A recent study shows that current expansion techniques benefit weaker models but harm stronger rankers. In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to different cross-encoder rankers and verify the deteriorated zero-shot effectiveness. We identify two vital steps in the experiment: high-quality keyword generation and minimally-disruptive query modification. We show that it is possible to improve the generalization of a strong neural ranker, by generating keywords through a reasoning chain and aggregating the ranking results of each expanded query via self-consistency, reciprocal rank weighting, and fusion. Experiments on BEIR and TREC Deep Learning 2019/2020 show that the nDCG@10 scores of both MonoT5 and RankT5 following these steps are improved, which points out a direction for applying query expansion to strong cross-encoder rankers.”
First off, what’s the difference between first-stage retrievers and second-stage, cross-encoder rankers?
Well, both are components of modern search systems, each playing a distinct role in the retrieval and ranking of information.
First-stage retrievers do the initial retrieval of documents from the index, typically employing techniques like BM25, a bag-of-words retrieval model that scores documents based on the presence and frequency of query terms. In short, these retrievers focus on quickly retrieving a subset of potentially relevant documents, prioritizing recall over precision.
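To make the bag-of-words idea concrete, here is a minimal sketch of BM25 scoring in Python. It is a simplified illustration, not what any production search engine runs: the whitespace tokenizer, the `k1` and `b` defaults, the function name `bm25_scores`, and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from the document adds nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency, dampened by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (made up for illustration).
docs = [
    "query expansion adds terms to a search query",
    "hamsters enjoy running on wheels",
    "search engines rank documents for a query",
]
scores = bm25_scores("query expansion", docs)
# Keep the two best-scoring documents as candidates for the second stage.
top_k = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:2]
```

Note that the second document scores zero because it shares no terms with the query. That is the recall-first, keyword-matching behavior the second stage is meant to refine.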
Second-stage, cross-encoder rankers re-rank the initially retrieved documents using more sophisticated ML models, often based on DNNs, to better assess the relevance of each document to the query. These rankers focus on fine-grained relevance assessments, considering the semantic relationships between a query and document and prioritizing precision over recall.
It’s in this second stage of ranking, the re-ranking stage, that the researchers are exploring using query expansion.
What is query expansion?
It’s a technique used to improve the accuracy of search results by adding terms (expansions) to the original search query to clarify the user’s intent and allow the search engine to retrieve more relevant documents.
How does query expansion apply to the second stage of ranking?
Well, cross-encoder rankers thoroughly examine a small set of documents from the first stage, prioritizing precision. This makes them sensitive to the specific wording and format of the query, which makes query expansion challenging.
The authors’ research looks at ways to make query expansion work for cross-encoder rankers. They found that generating high-quality keywords through a “reasoning chain” and combining the results of each expanded query improved the performance of second-stage rankers. Or, as they put it, “We identify two vital steps in the experiment: high-quality keyword generation and minimally-disruptive query modification.”
Let’s dig a bit more into high-quality keyword generation.
“The first step of query expansion is to generate keywords {w_1, w_2, …, w_n} semantically similar to the query,” they explain.
Two sources of signals are generally used: corpus-based signals through Pseudo-Relevance Feedback (PRF) and “more recent approaches leveraging signals from LLMs by prompting.”
PRF involves analyzing a set of documents initially retrieved for a given query to extract additional keywords. This method assumes the top-ranked documents from that initial retrieval are relevant.
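As a rough illustration of pseudo-relevance feedback, the sketch below picks the most frequent new terms from the top-ranked documents. The stopword list, the function name `prf_keywords`, its parameters, and the sample documents are all hypothetical. Real PRF methods use more careful term weighting than raw counts.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "the", "to", "of", "for", "and", "in", "on", "is"}

def prf_keywords(query, ranked_docs, top_docs=2, n_keywords=3):
    """Pick frequent terms from the top-ranked documents as expansion keywords."""
    query_terms = set(query.lower().split())
    counts = Counter()
    # Only the top-ranked documents are trusted as "pseudo-relevant".
    for doc in ranked_docs[:top_docs]:
        for term in doc.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    return [term for term, _ in counts.most_common(n_keywords)]

# Documents in the order a first-stage retriever might return them (made up).
ranked_docs = [
    "query expansion adds related terms to a search query",
    "expansion terms clarify search intent",
    "hamsters enjoy running on wheels",
]
keywords = prf_keywords("query expansion", ranked_docs)
```

Because only the top two documents are read, nothing from the off-topic third document can become a keyword. The flip side is that if the top results are themselves off-topic, the extracted keywords will be too.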
The LLM approach generates keywords semantically similar to the original query by prompting an LLM to produce additional terms.
What’s interesting here is that using LLMs for keyword research is often seen as a questionable tactic in SEO because, unlike third-party SEO tools, it isn’t based on user data. However, this paper shows Google researchers exploring LLMs for query expansion, which suggests Google could be using the approach in its search engine. Perhaps it’s time to rethink the value of that approach.
In addition to keyword generation, the paper also mentions minimally disruptive query modification.
This simply refers to a technique used to incorporate additional keywords into a search query without significantly altering the original query’s structure or format. Recall that second-stage rankers prioritize precision and are sensitive to a query’s wording, so large changes to the query can hurt them.
The authors propose two methods of doing this. The first is direct concatenation, where keywords are added to the original query. However, this can become disruptive: as the number of keywords grows, they can change the original query’s structure.
Another approach involves individual concatenation and fusion, whereby each keyword is individually concatenated with the query and then the ranking results are combined or fused. This approach involves less disruption to the original query while still leveraging the additional information.
The authors found the latter approach was most effective, combining the individually concatenated keywords using a technique called reciprocal rank weighting, which assigns weights to each expanded query based on the rank of the most relevant document retrieved for that query.
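The difference between direct concatenation and individual concatenation with fusion can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `toy_rerank` is a hypothetical term-overlap stand-in for a real cross-encoder such as MonoT5 or RankT5, the fusion step is a weighted form of standard reciprocal rank fusion with the common constant `k=60`, and the keyword weights are invented. It does not reproduce how the paper computes its weights.

```python
def toy_rerank(query, docs):
    """Hypothetical stand-in for a cross-encoder: rank docs by term overlap."""
    q_terms = set(query.lower().split())
    overlap = [len(q_terms & set(doc.lower().split())) for doc in docs]
    # Return document indices, best first (ties keep their original order).
    return sorted(range(len(docs)), key=lambda i: -overlap[i])

def fuse_expansions(query, keywords, weights, docs, rerank=toy_rerank, k=60):
    """Rank docs once per 'query + keyword', then fuse with weighted reciprocal ranks."""
    fused = [0.0] * len(docs)
    for keyword, weight in zip(keywords, weights):
        # Individual concatenation: the query changes by one keyword at a time.
        ranking = rerank(f"{query} {keyword}", docs)
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (k + position)
    return sorted(range(len(docs)), key=lambda i: -fused[i])

docs = [
    "query expansion adds terms to a search query",
    "hamsters enjoy running on wheels",
    "search engines rank documents for a query",
]
query = "query expansion"
keywords = ["search", "terms"]
weights = [1.0, 0.5]  # illustrative weights, not taken from the paper

# Direct concatenation: every keyword is appended to one long query.
direct_ranking = toy_rerank(f"{query} {' '.join(keywords)}", docs)
# Individual concatenation and fusion: one short query per keyword.
fused_ranking = fuse_expansions(query, keywords, weights, docs)
```

With a toy scorer and three documents, both approaches return the same order. The point of the sketch is the shape of the computation: in the fused version, the ranker only ever sees the original query plus a single keyword, so no one expansion can reshape the query much.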
Minimally disruptive query modification techniques are pivotal to improving the performance of cross-encoder rankers by allowing them to leverage additional keywords without compromising their sensitivity to the original query’s structure.
The authors conclude, “Our solution is to leverage an LLM to generate high-quality, concise keywords through a reasoning chain” — a method of LLM prompting that asks the model to reason about the query rather than just produce keywords — “and individually evaluate the ranking scores of each expansion before aggregating them together.”
This allowed for improvement over directly using more popular query expansion methods.
What can we take away from this as SEOs?
This paper helps us understand that ranking results is a multi-step process that prioritizes recall first and precision second. Furthermore, Google could be using LLMs for query expansion in the second stage of ranking.
This helps us look past keyword-matching of pages to queries and understand that more advanced ML methods could be used for ranking, as well as the potential of LLMs for keyword generation.
Thanks for reading. Happy marketing!