Google AI Overviews: What We Can Learn from the Links in Liz Reid’s Latest Blog Post in The Keyword

Last updated: October 5, 2024

I don’t know about The Rock, but I’m certainly curious about what Google is cooking, particularly when it comes to AI Overviews and the Gemini era of Search.

One cupboard I always check is the links in Google’s documentation, articles, and statements.

I’m curious to follow them to see what zesty context they might add to a topic.

So when Liz Reid, Google’s head of Search, published a new blog post last week about AI Overviews, the first thing I did was check the links. There were nine total.

Where do they go?

What mysteries might they unravel?

Are they affiliate links?

Kidding about that last part.

Exploring those nine links for more context about AI Overviews (AIO) is the goal of this post.

Now, the links don’t refer to new or secret pages or anything like that, so I wouldn’t say the information is groundbreaking.

However, it does speak to featured snippets, data voids, content policies, correcting inaccuracies, and other topics that have enriched my contextual understanding of Search.

Many of the documents I’d seen before, but like with anything, the context counts. 😉

“But how can we trust what Google says?!”

Fair point, but can’t we say that about everything and everyone?

My approach is always the same.

First I consider a source’s background, expertise, and motivations, as well as the context of their words, and then I form my own opinion, usually much later on and through experience.

I’ve done pretty well at SEO for my clients over the years by listening to what Googlers say, along with other sources I trust.

That’s why I believe Liz Reid’s latest blog post — which is to say, the post Google published on its official blog, The Keyword, with her byline — has value, and its links add helpful context.

I hope you’ll feel the same after taking this journey with me!

Here are the links we’ll explore:

“oddities”
“errors”
“other Search features”
“take it seriously”
“featured snippets”
“see that yourself on Google Trends”
“information gap”
“republished”
“content policies”

But first, here’s a little backstory for context, as well as a quote from my favorite band to help set the tone:

“Think for yourself.”
– TOOL

The context

Shortly after their rollout at Google I/O on May 14th, AI Overviews were a hot topic of discussion, but not always for desirable reasons.

Many of us SEOs who’d been using AIO’s precursor (SGE with Search Labs) since last year understood its answers were a work in progress.

Sure enough, in the days after AIO launched, users were spotting — and in some cases deliberately triggering — strange answers and sharing them on social media. (You can visit the Goog Enough X (Twitter) account to see examples.)

The story went viral, quickly reaching large news publications:

Screenshot from 5/26.

Even recently, I saw several stories on this topic in my Discover feed.

Many of the bad AIO examples I saw were from sarcastic UGC or blog content taken out of context.

On that point, if you haven’t read Professor Emily Bender’s recent newsletter about AI Overviews, she talks about us losing “the ability to situate the information in its context,” which is helpful context itself for this discussion:

We've all been laughing at the obvious fails from Google's AI Overviews feature, but there's a serious lesson in there too about how it disrupts the relational nature of information. More in the latest Mystery AI Hype Theater 3000 newsletter: https://t.co/32V63sHMhC
— @emilymbender@dair-community.social on Mastodon (@emilymbender) May 28, 2024

Some examples of bad AI Overviews were fabrications, but there were also objectively bad answers.

In the introduction to last week’s Hamsterdam SEO news recap, I included a quote from Sundar Pichai, Google’s CEO, which was from an interview he’d done with The Verge, where he was presented with an arguably plagiaristic AI Overviews example:

“The thing with Search — we handle billions of queries. You can absolutely find a query and hand it to me and say, ‘Could we have done better on that query?’ Yes, for sure. But in many cases, part of what is making people respond positively to AI Overviews is that the summary we are providing clearly adds value and helps them look at things they may not have otherwise thought about.” [Highlights added to all quotes.]
– Sundar Pichai to The Verge

I get the point about “billions of queries.”

As a searcher, I’d also agree AI Overviews can add value.

Sometimes their answers satisfy my search intent directly; other times they lead me to obscure documents, videos, or social posts I likely wouldn’t have found (at least as easily) in normal search results.

Often, AI Overviews don’t appear in my searches. (In fact, it seems they’re showing considerably less often these days.)

When AIOs show but they’re not helpful, I scroll past them. No harm, no foul.

That’s as a user, anyway.

When it comes to SEO strategies, I’ve heard it suggested AIO could replace web search, but I really don’t think so.

To me, the future of Search feels like improved personalization, such as with Google’s AI-organized result pages.

I often go back in my mind to the “slice and dice” and “explore” user journey analogies from Google’s Searchology 2009 presentation.

There are many types of search journeys, and AI Overviews are just one of many ways to get information — and earn visibility and qualified clicks for clients. 😉

Having said all that, whether or not AI Overviews are helpful in the aggregate hasn’t been the focus lately.

The critical press around odd AIO answers has gotten loudly amplified.

So Google responded.

Liz Reid, whose byline was on the original SGE post in The Keyword last year, is listed as the author in a recent post called “AI Overviews: About last week”:

If you’d like analysis of that post, Barry Schwartz wrote a summary in Search Engine Land (SEL), which is how I learned about it:

Google explains how it is improving its AI Overviews https://t.co/ElCJcY0URE
— Barry Schwartz (@rustybrick) May 31, 2024

The first thing I noticed about Liz Reid’s post is that it has no featured image. I respect the tone that sets given the topic, whether it was intentional or not:

The second detail I noticed is the title, “About last week,” speaks about the AI Overviews mishaps in the past tense.

It places them in a moment in time, like a bad affair that’s ended, rather than an ongoing concern. I found that clever, even if the final sentence mentions Google will “keep improving.”

But the third thing I noticed about the post, which will be the point of this article, was that it had several links.

I was reading it on my phone the first time and unable to dig into the links, so I had Siri set a reminder for me to revisit it.

Which brings us to our current journey.

Here are the links we’ll investigate again, with themes added, in case you want to jump ahead or bookmark anything:

“oddities” – featured snippets
“errors” – featured snippets
“other Search features” – knowledge panels
“take it seriously” – user feedback
“featured snippets” – featured snippets
“see that yourself on Google Trends” – eating rocks
“information gap” – data voids
“republished” – satire
“content policies” – Google Search

As you begin this journey, remember our quote from TOOL above, and keep everything in context. 😉

Enjoy the journey, my friends!

Link 1: “oddities”

The link on “oddities” in the introduction was the first link in the blog post.

Here’s the context it’s from:

“We know that people trust Google Search to provide accurate information, and they’ve never been shy about pointing out oddities or errors when they come across them — in our rankings or in other Search features.”
– Liz Reid, The Keyword (2024)

The link goes to a Search Engine Roundtable (SER) article called “Google Works To Fix How Many Legs Horses & Snakes Have” from April 30th, 2019:

Whoa, Nelly.

The context was odd responses in featured snippets, like miscounts of limbs on a horse and snake:

Source

The SER article quotes Danny Sullivan, Google’s Search Liaison, explaining why these weird results happen (the original tweet is deleted):

Here are a couple of things to keep in mind from what Danny said:

“It’s a search that’s not often done because people largely already know how many legs a horse has.”
“… there’s not a lot of authoritative content on the topic because who’s writing about something well known like this …” (Source.)

Now consider these quotes from Liz Reid’s 2024 article about AI Overviews:

“‘How many rocks should I eat?’ Prior to these screenshots going viral, practically no one asked Google that question.”
“There isn’t much web content that seriously contemplates that question, either.” (Source.)

We’ll dig into the topic of “data voids” later, but you can see how the same causes of weird featured snippets in 2019 may have contributed to strange AI Overviews in 2024.

How do these issues get fixed, or more accurately, “improved”?

Here’s Danny’s response from 2019 (from a deleted tweet):

“… when we encounter things like this, we look to understand why what got selected gave a false positive & work to improve so an entire range of queries gets better.” (Source.)

And here’s what Liz wrote in 2024:

“… we don’t simply ‘fix’ queries one by one, but we work on updates that can help broad sets of queries, including new ones that we haven’t seen yet.” (Source.)

It’s not about fixing specific bad examples but rather the larger systems that allow them. That’ll be important context for later when we talk about supervised learning and user feedback.

But what about those weird featured snippets from 2019 …

Did they get improved?

Here’s a recent featured snippet for the horse query:

The Wikipedia page below the answer is still the same URL as referenced in the 2019 SER screenshot.

We can also see its snippet bolds an answer of “five legs.”

The rest of this story is pretty wild.

It’s not an actual Wikipedia page about horses, but rather a humorous anecdote related to the Verifiability policy page.

The page’s disclaimers even say “it has not been thoroughly vetted by the community” and “Such material is not meant to be taken seriously”:

Google is maybe seeing the query as both navigational (for the page) and informational.

It’s probably also biased toward Wikipedia as a source, trusting its accuracy for a query that doesn’t have a lot of good content about it.

If you scroll through that SERP, the results are mostly lower quality.

There’s a Wikipedia page about “Limbs of the horse” in third position (between Quora and Reddit) that’s probably a better fit, but maybe it’s the secondary intent?

But why is the featured snippet not an excerpt from the page, I wonder?

It turns out, it is.

Looking at it closely, we can see a line demarcating the answer from the Wikipedia page.

It almost doesn’t seem like they’re affiliated at first, but they are.

I also suspected Google was pulling a consolidated answer from multiple authoritative sources. (We’ll read about this technology from MUM later.)

Bing also has this functionality, which it introduced in 2017 using deep learning (AI) called “Intelligent answers”:

Source

So where do the “four legs” answer and the original 2019 answer of six legs come from?

Well, there’s a section of that Wikipedia page that will give you a headache if you read it, but they’re both from there:

My guess is the answer is also tied to the knowledge graph for confirmation.

This is likely why we see “Horse / Legs” like breadcrumbs above the answer:

I think Google’s version of an intelligent answer (or a text excerpt reinforced with knowledge graph information) is true for similar types of queries today.

If we revisit the snake query, for example, we can see it gets the same treatment with “Snake / Limbs” as a breadcrumb:

There’s also a more prominent featured snippet that highlights the correct answer, unlike the horse example which bolded “five legs.”

We also see a “People also search for” (PASF) box and then another line.

The featured snippet source is no longer Quora, like it was back in 2019, but that original Quora page is still in the SERP, ranking 3rd.

The answer that originally caused the issue is still there, too, except we can see it clearly says the correct answer. Then there’s an answer below it that suggests snakes have two legs:

This harkens back to the point Professor Bender made in her newsletter regarding situating information “in its context” and considering the source.

When we look at Quora directly, it adds a wholly different context than seeing Google summarize the answer.

The more we distill information into simplified forms and remove the context and source (like a quote that seems to come from Quora or Reddit rather than a user of those platforms), the more nuance gets lost.

This topic goes deeper still, though.

In one of Danny’s tweets in the 2019 SER article, he linked to a blog post in The Keyword called “A reintroduction to Google’s featured snippets” from January 30th, 2018:

As I mentioned in my recent Hamsterdam Research article about USER-LLM, 2018 was a big year for Google’s AI advancements related to natural language processing (NLP).

Coming on the heels of the 2017 introduction of the transformers architecture, this was when BERT was introduced and Google submitted multiple patents about vectors.

So I’m not surprised featured snippets got “reintroduced” with ongoing improvements in 2018, if those timelines are related.

Let’s now explore the article for more context.

Danny first describes what a featured snippet is. Basically, the snippet comes before the link, or gets featured:

“With featured snippets, we reverse the usual format. We’re featuring the snippet, hence the ‘featured snippet’ name. We also generate featured snippets in a different way from our regular snippets, so that they’re easier to read. … It’s especially helpful for those on mobile or searching by voice.”
– Danny Sullivan, The Keyword (2018)

There’s also a carousel of examples, with one I think is relevant to our horse and snake examples earlier, showing the answer separately above the source:

In the context of featured snippets being helpful for mobile and voice search, Danny also explains how they’re not necessarily definitive:

“… featured snippets aren’t meant as a sole source of information. They’re part of an overall set of results we provide, giving people information from a wide range of sources.”

The post also refers to examples of poor-quality featured snippets, which feels similar to what we saw with AI Overviews recently. It also gives a similar explanation related to “fringe queries” and authoritativeness of results:

“Last year, we took deserved criticism for featured snippets that said things like ‘women are evil’ or that former U.S. President Barack Obama was planning a coup. We failed in these cases because we didn’t weigh the authoritativeness of results strongly enough for such rare and fringe queries.”

This also reminds me of debates over the Discussions and forums feature appearing for YMYL topics.

The post also provides more context on how improvements are made to featured snippets, including involving search quality raters:

“To improve, we launched an effort that included updates to our Search Quality Rater Guidelines to provide more detailed examples of low-quality webpages for raters to appropriately flag, which can include misleading information, unexpected offensive results, hoaxes and unsupported conspiracy theories. This work has helped our systems better identify when results are prone to low-quality content. If detected, we may opt not to show a featured snippet.”

In reference to a humorous featured snippet that said ancient Romans would use sundials for telling time at night, Danny explained:

“While the example above might give you a chuckle, we take issues like this seriously, as we do with any problems reported to us or that we spot internally. We study them and use those learnings to make improvements for featured snippets overall. In this case, it led to us providing a better response: water clocks.”

He then went on to explain about near matches and confidence. This section is pretty fascinating, so I’ll include it in full:

“Another improvement we’re considering is to better communicate when we give you a featured snippet that’s not exactly what you searched for but close enough that it helps you get to the information you seek.

For example, the original ‘sundial’ featured snippet above was actually a response for ‘How did Romans tell time.’ We displayed this near-match then because we didn’t have enough confidence to show a featured snippet specifically about how Romans told time at night. We knew sundials were used by Romans to tell time generally, because so many pages discussed this. How they told time at night was less discussed, so we had less data to make a firm connection.

Showing a near-match may seem odd at first glance, but we know in such cases that people often explore the source of a featured snippet and discover what they’re looking for. In this case, the page that the featured snippet originally came from did explain that Romans used water clocks to tell time at night. We just didn’t then have enough confidence then to display that information as a featured snippet.”
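To picture the near-match idea Danny describes, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `choose_snippet` function, the candidate format, the confidence scores, and the 0.7 threshold are all hypothetical, and nothing here reflects how Google's systems are actually built.

```python
# Hypothetical illustration of falling back to a near-match snippet
# when the exact-match candidate is below a confidence threshold.
# The scores and the threshold are invented for this example.

def choose_snippet(candidates, threshold=0.7):
    """Prefer a confident exact match, then a confident near match, else nothing."""
    for match_type in ("exact", "near"):
        eligible = [
            c for c in candidates
            if c["match"] == match_type and c["confidence"] >= threshold
        ]
        if eligible:
            # Among eligible candidates of this type, take the most confident.
            return max(eligible, key=lambda c: c["confidence"])
    return None  # no snippet is shown at all


# "How did Romans tell time at night": the exact answer has little support,
# while the near match ("How did Romans tell time") has a lot.
CANDIDATES = [
    {"match": "exact", "answer": "water clocks", "confidence": 0.4},
    {"match": "near", "answer": "sundials", "confidence": 0.9},
]

chosen = choose_snippet(CANDIDATES)
print(chosen["answer"] if chosen else "no featured snippet")
```

With these made-up numbers the sketch picks the sundial answer, which mirrors the original featured snippet in Danny's example. Raising the threshold high enough returns no snippet at all, which lines up with the earlier quote about opting not to show one.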

There’s also a section about showing multiple featured snippets when there isn’t a single answer or there are several perspectives to consider.

I don’t recall seeing the featured snippet examples shown below in the wild, at least recently. They do kind of look similar to topic bubbles (filter pills) and “things to know” today, though:

The post lastly discusses cases when featured snippets contradict each other.

This is an interesting section, too, especially the mention of “perspectives,” a word we hear often from Google (like with the retired Perspectives filter or in reference to hidden gems):

“For instance, people who search for ‘are reptiles good pets’ should get the same featured snippet as ‘are reptiles bad pets’ since they are seeking the same information: how do reptiles rate as pets? However, the featured snippets we serve contradict each other.

This happens because sometimes our systems favor content that’s strongly aligned with what was asked. A page arguing that reptiles are good pets seems the best match for people who search about them being good. Similarly, a page arguing that reptiles are bad pets seems the best match for people who search about them being bad. We’re exploring solutions to this challenge, including showing multiple responses.

‘There are often legitimate diverse perspectives offered by publishers, and we want to provide users visibility and access into those perspectives from multiple sources,’ Matthew Gray, the software engineer who leads the featured snippets team, told me.”

For key takeaways, I think we can extrapolate some causes of odd featured snippets:

Uncommon questions
Lack of authoritative sources

And solutions:

Adjusting the systems across broad sets or ranges of queries
Quality raters feedback

It would seem the same applies for AI Overviews, perhaps with more user feedback, as well.

Let’s continue our journey to see if we learn more.

Link 2: “errors”

This link came right after oddities.

Here’s the context again:

“We know that people trust Google Search to provide accurate information, and they’ve never been shy about pointing out oddities or errors when they come across them — in our rankings or in other Search features.”
– Liz Reid, The Keyword (2024)

Oh wait …

It’s the same blog post that we just looked at!

“A reintroduction to Google’s featured snippets,” by Danny.

Funny coincidence.

But since we covered it pretty in depth, I guess we’re on to link number 3.

Link 3: “other Search features”

This link’s context refers to the oddities or errors being “in our rankings or in other Search features.”

The link goes to a post called “A reintroduction to our Knowledge Graph and knowledge panels” by Danny Sullivan from May 20th, 2020:

What first came to mind upon seeing the mention of the knowledge graph is how the official Google Search Help documentation on featured snippets (which I was looking at by chance earlier) says they can be presented alongside KG information:

“You might find featured snippets on their own within overall search results, within the ‘People also ask’ section, or along with Knowledge Graph information.”
– Google Search Help, How Google’s featured snippets work

The blog post itself seems to focus on knowledge panels, but it also addresses inaccuracies as one of its main themes:

“Sometimes Google Search will show special boxes with information about people, places and things. We call these knowledge panels. … In this post, we’ll share more about how knowledge panels are automatically generated, how data for the Knowledge Graph is gathered and how we monitor and react to reports of incorrect information.”
– Danny Sullivan, The Keyword (2020)

Danny explains how the systems that gather facts for the knowledge graph are automatic, yet users can submit feedback for inaccuracies in knowledge panels:

“Inaccuracies in the Knowledge Graph can occasionally happen. Just as we have automatic systems that gather facts for the Knowledge Graph, we also have automatic systems designed to prevent inaccuracies from appearing. However, as with anything, the systems aren’t perfect. That’s why we also accept reports from anyone about issues.

Selecting the ‘Feedback’ link at the bottom of a knowledge panel or the three dots at the top of one on mobile brings up options to provide feedback to us.”

What I find interesting is how user feedback isn’t applied directly to make changes in knowledge panels.

Here’s the full explanation Danny gave:

“We analyze feedback like this to understand how any actual inaccuracies got past our systems, so that we can make improvements generally across the Knowledge Graph overall. We also remove inaccurate facts that come to our attention for violating our policies, especially prioritizing issues relating to public interest topics such as civic, medical, scientific, and historical issues or where there’s a risk of serious and immediate harm.”

I’m assuming knowledge panel content gets dynamically populated.

Thus, feedback wouldn’t change the results, but rather be used like labeled data for supervised training and system-wide improvements.

This is similar to the points Danny and Liz made earlier about improving “an entire range” or “broad sets” of queries rather than individual ones for featured snippets or AI Overviews, respectively.

Or such is my understanding.

The point about manually removing facts for policy violations is notable, as well.

Liz mentioned something similar about AIO in her post:

“In addition to these improvements, we’ve been vigilant in monitoring feedback and external reports, and taking action on the small number of AI Overviews that violate content policies.”
– Liz Reid, The Keyword (2024)

We’ll explore “content policies” a little later.

For now, it sounds like the takeaway is that knowledge panels, featured snippets, or AI Overview inaccuracies are addressed systematically through supervised machine learning by training on user feedback or other labeled examples, except when policy violations necessitate manual interventions.

I’m sure there are a lot of automatic improvements that get pushed live regularly, as well.

Here’s another quote from Liz’s recent post:

“From looking at examples from the past couple of weeks, we were able to determine patterns where we didn’t get it right, and we made more than a dozen technical improvements to our systems.”

More broadly, the topic of user feedback relates to our next link.

Link 4: “take it seriously”

In reference to oddities and errors in Search and other Search features, Liz mentioned that, when it comes to user feedback, they “take it seriously.”

Here’s the context:

“We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and take it seriously.”
– Liz Reid, The Keyword (2024)

The link goes to a post called “Our latest quality improvements for Search” by Ben Gomes from April 25th, 2017:

The context of the post seems to be people trying to “game” Google’s search results:

“In addition to trying to organize information, our algorithms have always had to grapple with individuals or systems seeking to ‘game’ our systems in order to appear higher in search results—using low-quality ‘content farms,’ hidden text and other deceptive practices. We’ve tackled these problems, and others over the years, by making regular updates to our algorithms and introducing other features that prevent people from gaming the system.”
– Ben Gomes, The Keyword (2017)

I suspect the theme of “gaming” Search refers to people deliberately aiming to trigger bad AI Overviews.

On that point, the next part of the post addresses “fake news”:

“Today, in a world where tens of thousands of pages are coming online every minute of every day, there are new ways that people try to game the system. The most high profile of these issues is the phenomenon of ‘fake news,’ where content on the web has contributed to the spread of blatantly misleading, low quality, offensive or downright false information. While this problem is different from issues in the past, our goal remains the same—to provide people with access to relevant information from the most reliable sources available.”

It also seems like user feedback is a big focus again, though, with Google using the feedback as a mechanism to improve its results’ quality:

“… we’re taking the next step toward continuing to surface more high-quality content from the web. This includes improvements in Search ranking, easier ways for people to provide direct feedback, and greater transparency around how Search works.”

Let’s talk about ranking results first, though.

Gomes mentions the scale of the issue to start, which seems to be a common theme in how Google responds to issues, putting them into perspective of what percentage of results are good vs. bad:

“Our algorithms help identify reliable sources from the hundreds of billions of pages in our index. However, it’s become very apparent that a small set of queries in our daily traffic (around 0.25 percent), have been returning offensive or clearly misleading content, which is not what people are looking for. To help prevent the spread of such content for this subset of queries, we’ve improved our evaluation methods and made algorithmic updates to surface more authoritative content.”

The word choice of “very apparent” likely comes from the amplified or viral effect that a small percentage of bad results can get.

His post also mentions quality raters as part of the solution. (I wonder if we’ll get updated search quality evaluator guidelines with more AI Overview examples soon?)

It’s also worth pointing out how Gomes says SQ ratings are used:

“These ratings don’t determine individual page rankings, but are used to help us gather data on the quality of our results and identify areas where we need to improve.”

I think some have taken statements like this from Google to mean that quality raters’ test results aren’t used in rankings.

I always took it to mean the quality tests weren’t used directly in rankings, similar to how a user’s feedback on a featured snippet, knowledge panel, or AI Overview wouldn’t directly impact its visibility.

I see clicks, Chrome data, and other user interaction data the same way.

They “do not directly impact ranking,” as Google says in its How Search Works guide.

However, such data contributes feedback for supervised learning that informs how ranking systems work.

And my guess is some of those systems can be updated in near-real time, like freshness systems for viral content, or non-bot click tests. 😉

I also suspect the role of that interaction data is decreasing, being replaced by LLM-based predictions, as we learned in Google’s post-trial debrief.

Gomes mentions how signals were adjusted with regard to ranking changes:

“We’ve adjusted our signals to help surface more authoritative pages and demote low-quality content, so that issues similar to the Holocaust denial results that we saw back in December are less likely to appear.”

I don’t suspect those adjustments are manual, like someone turning knobs. I suspect it’s all about good data in, good results out.

On that point, Gomes next talks about direct feedback tools.

I think it’s worth including this entire section:

“When you visit Google, we aim to speed up your experience with features like Autocomplete, which helps predict the searches you might be typing to quickly get to the info you need, and Featured Snippets, which shows a highlight of the information relevant to what you’re looking for at the top of your search results. The content that appears in these features is generated algorithmically and is a reflection of what people are searching for and what’s available on the web. This can sometimes lead to results that are unexpected, inaccurate or offensive. Starting today, we’re making it much easier for people to directly flag content that appears in both Autocomplete predictions and Featured Snippets. These new feedback mechanisms include clearly labeled categories so you can inform us directly if you find sensitive or unhelpful content. We plan to use this feedback to help improve our algorithms.”

His reference to “clearly labeled categories” is key, as it could directly influence labeled training data that Google’s systems could use for supervised learning.

This makes me think back to a 2013 interview Jeff Dean (Google DeepMind) gave about how Google used neural networks then, where one challenge was a preponderance of unlabeled data (unsupervised learning). User feedback would be one solution for that.

In 2017, when Gomes was writing this post, the issue was getting feedback to improve autocomplete and featured snippets.

However, we can see how Google solicits feedback for AI Overviews today, which could provide labeled data for supervised learning and improvements.

Here’s an AI Overview for “what is supervised learning” (which itself is worth reading):

If we wish to give feedback, we have thumbs up or down buttons in the lower corner:

Or we can click the three dots in the upper corner to enable direct feedback:

We can then elect to give general feedback or click one of the highlighted sections above to give feedback on that section.

In either case, we’re prompted with this form:

This could lead to both labeled data and unstructured text, which an LLM could process.
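Here is a rough sketch in Python of what I mean by feedback becoming labeled data. Everything in it is an assumption for illustration: the `Feedback` record, the category names, and the sample entries are hypothetical and are not taken from Google's form or systems.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Feedback:
    """One hypothetical feedback submission on an AI Overview."""
    query: str
    thumbs_up: bool
    category: str = ""  # a clearly labeled category chosen in the form (invented names)
    comment: str = ""   # free text, which an LLM could process later


def to_labeled_examples(feedback_items):
    """Turn thumbs up/down into (query, label) pairs: 1 = helpful, 0 = not helpful."""
    return [(f.query, 1 if f.thumbs_up else 0) for f in feedback_items]


def top_problem_categories(feedback_items):
    """Count categories on negative feedback to find patterns across many queries."""
    counts = Counter(
        f.category for f in feedback_items if not f.thumbs_up and f.category
    )
    return counts.most_common()


# Sample submissions, invented for this example.
SAMPLE = [
    Feedback("how many legs does a horse have", False, "inaccurate", "It says five legs."),
    Feedback("how many rocks should i eat", False, "inaccurate"),
    Feedback("what is supervised learning", True),
    Feedback("are reptiles good pets", False, "not relevant"),
]

print(to_labeled_examples(SAMPLE))
print(top_problem_categories(SAMPLE))
```

The second function is the part I find most relevant to this discussion. It doesn't fix any single query. It surfaces which kind of problem shows up most often, which fits the idea of improving broad sets of queries rather than patching answers one by one.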

The last section of Gomes’ 2017 post is about transparency.

He specifically mentions the “How Search Works site” that we just mentioned above, which “includes a description of how Google ranking systems sort through hundreds of billions of pages to return your results, as well as an overview of our user testing process.”

That site has also changed a bit in recent years, as we’ll see in the next section.

Link 5: “featured snippets”

This fifth link is from the next section of Liz Reid’s post, about how AI Overviews work.

If you’re familiar with RAG models already, you can skip the next few paragraphs.

However, I know not everyone is familiar with the differences between a search engine, an LLM, a grounded LLM, and a RAG model.

In short, search engines index documents and retrieve them based on your query. Those featured snippets we see are pulling text from actual content.

Large language models (LLMs), on the other hand, train on lots of examples of information and then learn patterns from that to predict answers based on the probabilities of the next word. These responses aren’t pulled from actual content.

Since LLMs aren’t retrieving information like a search engine, but are instead generating text by predicting the most likely sequence of words based on their training data and the given prompt, they can create made-up answers (hallucinations).
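To show the "predict the next word" idea on a tiny scale, here is a toy sketch in Python. It is a simple word-pair (bigram) counter, not a real LLM, and the three training sentences are invented for this example. Real models are vastly larger and work differently under the hood, so treat this only as an analogy.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Turn raw follow counts into probabilities for the next word."""
    counts = follows[word.lower()]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(follows, start, max_words=5):
    """Greedily append the most likely next word until no continuation is known."""
    words = [start.lower()]
    for _ in range(max_words):
        probs = next_word_probabilities(follows, words[-1])
        if not probs:
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Hypothetical training text, invented for this example.
CORPUS = [
    "a horse has four legs",
    "a horse has four hooves",
    "a snake has no legs",
]
FOLLOWS = train_bigrams(CORPUS)

print(generate(FOLLOWS, "snake"))  # prints "snake has four legs"
```

The output reads fluently and is wrong. In this tiny training set, "four" follows "has" more often than "no" does, so the toy model picks the likelier word without checking any source. That is the basic shape of a made-up answer.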

One way to reduce hallucinations is by grounding LLMs in additional knowledge bases.

One grounding technique is called retrieval-augmented generation (RAG), where a system combines an LLM with a search engine (often using vector embeddings) and passes the retrieved content to the LLM to inform its answer.

AI Overviews are a RAG model that uses an LLM to generate an answer, which is supported by relevant search results.

That is how Liz explains AI Overviews in her post:

“AI Overviews work very differently than chatbots and other LLM products that people may have tried out. They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional ‘search’ tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to only show information that is backed up by top web results.

This means that AI Overviews generally don’t ‘hallucinate’ or make things up in the ways that other LLM products might. When AI Overviews get it wrong, it’s usually for other reasons: misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available. (These are challenges that occur with other Search features too.)”
– Liz Reid, The Keyword (2024)
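For anyone who wants to see the retrieve-then-generate flow in miniature, here’s a rough sketch. To be clear, this is not how AI Overviews are built. The documents are invented, simple word counts stand in for real vector embeddings, and the generation step is left as a prompt that you would hand to an LLM of your choice.

```python
import math
from collections import Counter

# Invented mini "index" for illustration only.
DOCS = {
    "doc-a": "Featured snippets show an excerpt from a web page above the results.",
    "doc-b": "Supervised learning trains a model on labeled examples.",
    "doc-c": "A 302 status code signals a temporary redirect.",
}


def embed(text):
    """Stand-in for an embedding: a bag of lowercase word counts."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query, docs, k=1):
    """Build a prompt that restricts the LLM to the retrieved sources."""
    ids = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {docs[i]}" for i in ids)
    prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
    return prompt, ids


prompt, ids = build_grounded_prompt("what is supervised learning", DOCS)
print(ids)
print(prompt)
```

The key design point is that the sources are chosen before any text is generated, which is also why the final answer can carry links back to them.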

Her next paragraph includes our fifth link with the anchor text “featured snippets,” in reference to how AI Overviews have similar accuracy rates.

Here’s the full context:

“This approach is highly effective. Overall, our tests show that our accuracy rate for AI Overviews is on par with another popular feature in Search — featured snippets — which also uses AI systems to identify and show key info with links to web content.“

AI systems contribute to featured snippets just as they do to AI Overviews. Which systems is anyone’s guess. I’m assuming BERT is used in a lot of stuff, for example.

The link itself goes to the Google Search Help document that we mentioned earlier, “How Google’s featured snippets work“:

Since we’ve talked about how featured snippets can appear in regular search results, alongside knowledge graph information, or people also ask, let’s see what other interesting excerpts the help doc has.

It mentions when featured snippets are triggered:

“We display featured snippets when our systems determine this format will help people more easily discover what they’re seeking, both from the description about the page and when they click on the link to read the page itself. They’re especially helpful for those on mobile or searching by voice.”
– Google Search Help, How Google’s featured snippets work

Then it mentions how featured snippets are chosen:

“Featured snippets come from web search listings. Google’s automated systems determine whether a page would make a good featured snippet to highlight for a specific search request. Your feedback helps us improve our search algorithms and the quality of your search results.“

The reference to “web search listings” is interesting. Featured snippets can be video results, so I’m thinking this refers to the main search results, in general.

The mention of user feedback helping improve search algorithms is also notable.

We might naturally assume “feedback” comes from users actively providing it, like a labeled response in a form.

However, feedback could be normal user interaction data, like clicks and hovers, too.

Again, it appears feedback contributes to system-wide improvements (algorithmic training) for featured snippets, but manual removals are still employed for policy violations, just as with knowledge panels.

Featured snippets can be removed when they don’t follow content policies. User feedback can help here, as well:

“Our automated systems are designed not to show featured snippets that don’t follow our policies. However, since the scale of search is so large, we also rely on reports from our users. Your reports help us improve our search algorithms to avoid issues in the future. We manually remove any reported featured snippets if we find that they don’t follow our policies.”

The link on “search algorithms” above is quite interesting.

It goes to a How Search Works page called “algorithms,” which I’d never heard of.

It turns out it’s been 302 redirected to the “ranking results” page:
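If you’d like to check a redirect like this yourself, here’s a small sketch using Python’s standard http.client, which doesn’t follow redirects automatically, so the raw status code stays visible. The describe_redirect and check_url helpers are my own names, and you’d pass in whatever URL you want to inspect.

```python
import http.client
from urllib.parse import urlsplit

# Common HTTP redirect status codes and how they are usually described.
REDIRECT_KINDS = {
    301: "permanent",
    302: "temporary",
    303: "see other",
    307: "temporary (method preserved)",
    308: "permanent (method preserved)",
}


def describe_redirect(status, location):
    """Summarize a response status and its Location header."""
    kind = REDIRECT_KINDS.get(status)
    if kind is None:
        return f"{status}: not a redirect"
    return f"{status} ({kind}) -> {location}"


def check_url(url, timeout=10):
    """Send one HEAD request without following redirects (needs network)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return describe_redirect(response.status, response.getheader("Location"))
    finally:
        conn.close()


print(describe_redirect(302, "/new-page"))
```

A 302 is the temporary kind, as opposed to a 301, which signals a permanent move.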

Looking at both pages, they’re clearly the same.

One difference is that the introduction of the old algorithms page has an extra paragraph about quality raters that links to the SQR guidelines, which the new ranking results page doesn’t have.

Old introduction (2022 screenshot):

New introduction:

However, the part about quality raters seems to have been moved to a different page called “rigorous testing.”

So the “algorithms” page basically got divided into two pages for ranking results and testing.

There’s also a page for “detecting spam” that wasn’t there before.

Meanwhile, the old site had a page for “useful responses,” which talked about different types of results, like featured snippets, the knowledge graph, and direct answers.

Why does it matter how the How Search Works page has evolved?

Well, one takeaway could be that Google now frames “ranking results” and “testing” (live experiments, search quality tests, etc.) in different contexts, and that’s kind of a recent change.

The rest of Google’s featured snippet help page discusses content policies.

I was planning to skip it, but it might be a critical part of this discussion, actually, especially if we’re talking about information quality in AI Overviews.

Here’s the full policies section for featured snippets:

“To help ensure featured snippets are a helpful experience for everyone, we have systems in place to prevent showing those that are in violation of Google Search’s overall policies or these policies for Search features:

Dangerous content

Deceptive practices

Harassing content

Hateful content

Manipulated media

Medical content

Sexually explicit content

Terrorist content

Violence and gore

Vulgar language and profanity

Learn more about these Content policies for Google Search.

Featured snippets also has this additional feature-specific policy that is applicable:

Contradicting consensus on public interest topics: Featured snippets about public interest content — including many civic, medical, scientific and historical issues — should not contradict well-established or expert consensus support. We may remove information presented as fact that lacks supporting evidence if it accuses individuals or groups of serious malevolent acts.

Tip: These policies only apply to what can appear as a featured snippet. They do not apply to web search listings nor cause those to be removed.”

As it relates back to our topic of AI Overviews, Liz Reid mentioned in her post how Google “found a content policy violation on less than one in every 7 million unique queries on which AI Overviews appeared.”

We’ll get into content policies a little later, but I suspect the bigger issue is data voids.

Link 6: “see that yourself on Google Trends”

See for yourself how obscure this search is.

That’s kind of the gist of this link.

It’s interesting to contemplate whether Google would have shipped AI Overviews in its current form if competing products like Microsoft Copilot, OpenAI’s ChatGPT, or Perplexity didn’t exist.

On the one hand, history does seem to suggest some issues could have been anticipated.

For instance, recall what Danny wrote in 2018 about featured snippets (link 2 above):

“Last year, we took deserved criticism for featured snippets that said things like ‘women are evil’ or that former U.S. President Barack Obama was planning a coup. We failed in these cases because we didn’t weigh the authoritativeness of results strongly enough for such rare and fringe queries.“
– Danny Sullivan, The Keyword (2018)

Then again, perhaps this is new territory from an engineering perspective.

I remember when the New Bing (now Copilot) launched in early 2023, there were viral stories of similar oddities, like when it told a reporter to leave their spouse.

It seems like we’re living in a different era today, which means fewer guardrails and more calculated risks with generative AI products.

Here’s some more context from Liz’s recent post about testing on AIO:

“In addition to designing AI Overviews to optimize for accuracy, we tested the feature extensively before launch. This included robust red-teaming efforts, evaluations with samples of typical user queries and tests on a proportion of search traffic to see how it performed. But there’s nothing quite like having millions of people using the feature with many novel searches. We’ve also seen nonsensical new searches, seemingly aimed at producing erroneous results.

Separately, there have been a large number of faked screenshots shared widely. … Those AI Overviews never appeared. …

But some odd, inaccurate or unhelpful AI Overviews certainly did show up. And while these were generally for queries that people don’t commonly do, it highlighted some specific areas that we needed to improve.”
– Liz Reid, The Keyword (2024)

There is a point to be made about uncommon queries, at least for the “rock” example.

Here’s the context for the next link:

“One area we identified was our ability to interpret nonsensical queries and satirical content. Let’s take a look at an example: ‘How many rocks should I eat?’ Prior to these screenshots going viral, practically no one asked Google that question. You can see that yourself on Google Trends.”

It goes to the Google Trends page for the now infamous query “how many rocks should I eat” in the United States over the past 12 months:

This is pretty smart, in my opinion. Let the data do the talking.

It doesn’t excuse the answer, but it does put its obscurity in context.

The top states are interesting, as well. Maine is pretty high for some reason, and the others are coastal, too.

It also seems like Florida is pretty normal, but we know that can’t be true.

Just kidding.

But in all seriousness, please don’t eat rocks.

Unless it’s as a funny corporate gift …

Along with edible glue …

Ok, ok.

Now on to our next link, which is about a seriously interesting topic that I’ve been waiting for: data voids.

Link 7: “information gap”

This is in reference to the query above about eating rocks, or our earlier example of how many legs a horse has.

For such queries, there aren’t necessarily great results.

Here’s the context:

“There isn’t much web content that seriously contemplates that question, either. This is what is often called a ‘data void’ or ‘information gap,’ where there’s a limited amount of high quality content about a topic.”
– Liz Reid, The Keyword (2024)

The link on “information gap” goes to a blog post called “New ways we’re helping you find high-quality information” by Pandu Nayak from August 11th, 2022:

For you Google blog post fans out there, you’ll recognize this as the MUM post (Multitask Unified Model) for featured snippets.

Pandu first sets out the context for the post itself:

“We have deeply invested in both information quality and information literacy on Google Search and News, and today we have a few new developments about this important work.”
– Pandu Nayak, The Keyword (2022)

You might recall from earlier, when we looked at the help documentation for featured snippets, that they have their own “feature-specific policy” around contradicting consensus.

This relates to the innovations provided by MUM, a multimodal AI model based on the T5 framework with multi-headed attention (see here):

“By using our latest AI model, Multitask Unified Model (MUM), our systems can now understand the notion of consensus, which is when multiple high-quality sources on the web all agree on the same fact.”

Remember in our first example for horses, when we suggested Google may be using something similar to Bing’s Intelligent Answers?

Pandu explains they are:

“Our systems can check snippet callouts (the word or words called out above the featured snippet in a larger font) against other high-quality sources on the web, to see if there’s a general consensus for that callout, even if sources use different words or concepts to describe the same thing. We’ve found that this consensus-based technique has meaningfully improved the quality and helpfulness of featured snippet callouts.”
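As a thought experiment, the consensus idea in that quote could be sketched like this. It’s a deliberately naive version: the SYNONYMS table, the has_consensus function, and the 60% threshold are all assumptions of mine, and MUM presumably handles “different words or concepts” in a far more sophisticated way than a lookup table.

```python
# Tiny hand-made table so "4" and "four" count as the same answer.
SYNONYMS = {"4": "four", "0": "zero", "none": "zero", "no": "zero"}


def normalize(answer):
    """Lowercase, strip punctuation, and map known synonyms."""
    words = [w.strip(".,!?").lower() for w in answer.split()]
    return " ".join(SYNONYMS.get(w, w) for w in words if w)


def has_consensus(callout, source_answers, min_share=0.6):
    """True if enough sources agree with the candidate callout."""
    if not source_answers:
        return False
    target = normalize(callout)
    agreeing = sum(1 for a in source_answers if normalize(a) == target)
    return agreeing / len(source_answers) >= min_share


# Hypothetical answers extracted from four sources for a horse-legs query.
sources = ["4", "four", "Four.", "six"]

print(has_consensus("Four", sources))
print(has_consensus("six", sources))
```

Three of the four sources agree on “four” once the wording is normalized, so that callout passes, while the outlier doesn’t.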

The example from the article is the same type of featured snippet that we saw for the horse and snake limbs earlier.

More to the point of this discussion, Pandu references how “AI models” are helping prevent featured snippets from generating on nonsensical queries, or what he calls “false premises”:

“AI models are also helping our systems understand when a featured snippet might not be the most helpful way to present information. This is particularly helpful for questions where there is no answer: for example, a recent search for ‘when did snoopy assassinate Abraham Lincoln’ provided a snippet highlighting an accurate date and information about Lincoln’s assassination, but this clearly isn’t the most helpful way to display this result.

We’ve trained our systems to get better at detecting these sorts of false premises, which are not very common, but are cases where it’s not helpful to show a featured snippet. We’ve reduced the triggering of featured snippets in these cases by 40% with this update.“

I suspect the same will be true for AI Overviews.

On the topic of information literacy, Pandu mentions features like “Fact Check Explorer, Reverse image search, and About this result.”

The reason it was linked in Liz Reid’s post, though, is likely the mention of “content advisories for information gaps.”

You’ve probably seen this if you’ve ever searched for a topic where Google said it couldn’t find many “great results” for the search.

This was originally designed for breaking news, but was extended to all results with data voids:

“Sometimes interest in a breaking news topic travels faster than facts, or there isn’t enough reliable information online about a given subject. Information literacy experts often refer to these situations as data voids. To address these, we show content advisories in situations when a topic is rapidly evolving, indicating that it might be best to check back later when more sources are available.

Now we’re expanding content advisories to searches where our systems don’t have high confidence in the overall quality of the results available for the search. This doesn’t mean that no helpful information is available, or that a particular result is low-quality. These notices provide context about the whole set of results on the page, and you can always see the results for your query, even when the advisory is present.”

The rest of the post is about educating people about misinformation.

A noble goal. 😉

Two more links to go!

The next link is from the same section of Liz’s post, referencing how one source can crawl out of the void.

Link 8: “republished”

This is kind of a funny one, and I’ll explain why.

First, here’s the context:

“However, in this case, there is satirical content on this topic … that also happened to be republished on a geological software provider’s website. So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.”
– Liz Reid, The Keyword (2024)

The original story was in The Onion.

So, obviously satire.

But that Onion article had a photo of a geologist, who was associated with the company of the “geological software provider’s website.”

So naturally, they took part of the article and put it on their website’s blog.

Well, that took “eating rocks” from the domain of a satire website to the domain of a real scientific site with no other satire on it.

They’ve since added a section to their page explaining the situation, in good fun:

I’m guessing they’ve gotten a ton of traffic from this.

Just look at the page’s referring domains shoot up (reminds me of the Google Trends screenshot earlier):

I’ve worked on sites that have gotten viral traffic before. It doesn’t convert well.

Hopefully, there were at least some geologists in the mix. 😉

Speaking of which, here’s why the situation is extra funny to me …

My father is a geologist, and I grew up around people in that profession.

Here’s me at Golden Sunlight mine in rural Montana in the mid-1990s:

Many geologists I’ve known had a rich sense of humor:

“Geologists make the bedrock.“
– A sign I saw on an office desk as a kid that I’ll never forget, and a saying I later heard from a geology professor at NAU.

Anyway, that’s definitely an interesting edge case of an AI Overview mishap.

As for how improvements were meted out, Liz’s post explains (as we saw with featured snippets and knowledge panels earlier) that such content is improved algorithmically through feedback, unless there’s a content policy violation:

“We worked quickly to address these issues, either through improvements to our algorithms or through established processes to remove responses that don’t comply with our policies.”
– Liz Reid, The Keyword (2024)

We’ll explore content policies in our final link below.

But first, wouldn’t it have been kind of Darwinian to leave the rock-eating example live and see what happens?

Sorry Google, don’t put that in an AI Overview. It’s sarcasm. 😉

However, there’s a real point to be made here about context, and whether distilling information from its original context is “helpful” or not for a user.

With featured snippets, the source is more easily identifiable than in a typical AI Overview, which can put more distance between information and its source, let alone its original context.

Search also isn’t an AI chatbot or assistant. It’s a venue where users have come to anticipate vetted information.

The context of seeing LLM-based information in Search is quite different from Gemini or ChatGPT, and the same E-E-A-T criteria and quality should apply uniformly.

According to Liz’s post, it does seem like the steps Google has taken will address some of these issues:

“Here’s a sample of what we’ve done so far:

We built better detection mechanisms for nonsensical queries that shouldn’t show an AI Overview, and limited the inclusion of satire and humor content.

We updated our systems to limit the use of user-generated content in responses that could offer misleading advice.

We added triggering restrictions for queries where AI Overviews were not proving to be as helpful.

For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections.“

That covers the algorithmic updates. Some of them also harken back to adjustments we saw explained earlier for featured snippets.

Let’s now look at our final link about manual interventions.

Link 9: “content policies”

This is an important topic to understand for SEOs.

By and large, you won’t have to worry about these, but Google does have policies in place that can prevent content from appearing in Search.

In the context of Liz’s post, it speaks to the manual removal of AI Overviews for violating these policies (as opposed to the “more than a dozen technical improvements” at a system level).

Here’s the full context:

“In addition to these improvements, we’ve been vigilant in monitoring feedback and external reports, and taking action on the small number of AI Overviews that violate content policies.”
– Liz Reid, The Keyword (2024)

The link on “content policies” takes us to a Google Search Help document called “Content policies for Google Search”:

What’s interesting is this page has many drop downs, but when you visit it from the AIO blog post, one of them is already open for you.

It’s the one on “Search features policies”:

I’m a little relieved, because I really didn’t want to go through the whole document. It’s long!

As you can see in the screenshot above, though, this section covers policies that “apply to many of our search features.”

Here’s an important detail to note:

“Even though these features and the content within them is automatically generated as with web results, how they’re presented might be interpreted as having greater quality or credibility than web results.“
– Google Search Help, Content policies for Google Search

If a poor result is called out at the top of the SERP, above all other results, and even highlighted in large or bold text within a featured snippet, knowledge panel, or AI Overview, it’s going to imply that it’s more credible and trustworthy for users.

This is the double-edged sword of AI Overviews.

Often they can provide helpful callouts, but sometimes they bestow credibility on a source beyond what it merits.

One thing is for sure though, AI Overviews are here to stay, so buckle up for the long haul:

“At the scale of the web, with billions of queries coming in every day, there are bound to be some oddities and errors. We’ve learned a lot over the past 25 years about how to build and maintain a high-quality search experience, including how to learn from these errors to make Search better for everyone. We’ll keep improving when and how we show AI Overviews and strengthening our protections, including for edge cases, and we’re very grateful for the ongoing feedback.“
– Liz Reid, The Keyword (2024)

Now if we could only get some Google Search Console data for them, we’d be in business! 🙂

Outro

I hope you’ve enjoyed this journey of going through the links in Liz Reid’s latest blog post about AI Overviews.

My general takeaways are that, just as Google plays cat-and-mouse with spammers and AI-generated content, so too does it contend with the engineering challenges from sarcasm, nuance, and data voids.

One thing is for sure, though: their goal is to surface the most helpful content for a user’s satisfaction, and the methods to achieve that are always improving.

Regardless of “ranking factors” or anything else that might get you thinking about gaming Google’s ranking systems, keep last year’s HCU impacts in mind.

When Danny Sullivan said the way to stay ahead of the algorithms is to chase what people like instead, I believed him.

It’s advice I know my clients are happy for, and so is this “small personal site” you’re reading. 😉

Ultimately, it’s not about clicks, impressions, or backlinks — it’s about brand, conversions, and business goals.

Thanks for going on this journey with me!

Check out some related posts below, if you’d like.

Until next time, enjoy the vibes:

Thanks for reading. Happy optimizing! 🙂

SEO Strategist and Consultant

Ethan Lazuk

What Are Google’s Reliable Information Systems & How Might They Be Used in Discussions and Forums

Let’s take a look at Google’s Discussions and forums and how Reliable information systems may play a role in its visibility for YMYL queries.

April 4, 2024 | October 5, 2024

You’ve Got Questions About Google Search’s Ranking Systems? That’s Understandable

After creating an article about Google Search ranking volatility, I was left with some questions about ranking systems. This outlines my pursuit of answers.

December 13, 2023 | October 5, 2024

Excerpts from Google’s Anti-Trust Trial Debrief: As an SEO, Here’s What Stood Out to Me from USA v. Google LLC (Document #833)

In February, Google filed a debrief known as “Document #833” after its anti-trust trial (USA v. Google). As an SEO, here are 55 screenshots of…

March 1, 2024 | October 5, 2024