
“Book Me A Trip To Washington, DC”: Revisiting Google Research’s Ambitions with Neural Networks in 2013 a Decade Later, After Google I/O 2024 (Hamsterdam History)

By Ethan Lazuk

Featured image: Hamsterdam History Part 8 – Google Research ambitions in 2013, with a Forrest Gump DC background.

Welcome to a new week of Hamsterdam History, where we look at vintage SEO articles to pay homage to their contributors, get historical context, and learn how things have changed since.

This week, we’re going back to 2013 to talk about a trip to Washington, DC.

I’ve visited and also lived in DC a few times and love the city’s culture. But that’s not the trip we’re talking about.

We’ll be looking at a Search Engine Land article by Matt McGee published on August 13th, 2013, called Google Scientist: We Want To Be Able To Respond To A Query Like “Book Me A Trip To Washington, DC”:

Google Scientist: We Want To Be Able To Respond To A Query Like “Book Me A Trip To Washington, DC” circa 2013, on the Wayback Machine.

We’ll also have quotes from the interview it references, Google scientist Jeff Dean on how neural networks are improving everything Google does, published in the Puget Sound Business Journal on August 12th, 2013:

Google scientist Jeff Dean on how neural networks are improving everything Google does circa 2013, on the Wayback Machine.

Meanwhile, Google I/O 2024 happened this week, where many of the goals mentioned in the interview and SEL article from 2013 appeared in some form.

Comparing those past goals to our new present reality will be the focus of this history lesson.

Let’s start with the article itself.

SEL “Book Me A Trip To Washington, DC” article (circa 2013)

The article discusses several ambitions Google Research scientists had in 2013.

The opening paragraph mentions the first ambition of trip planning:

“Will the day come that Google can successfully respond when you tell it something like, ‘Book me a trip to Washington, DC’ — walking you through all the queries and answers needed to complete such a complex request?”

– SEL (2013)

The next ambition was a voice-only search experience:

“Will the day come when we’re using devices that only offer voice-based search?”

And after that came the ambition of live visual search:

“Will the day come when visual search happens continuously via the camera on Google Glass?”

As Matt described at the time:

“Those are some of the search-based challenges that Google is working on, and they’re discussed in a recent interview that offers a peek at what Google thinks search will be like in the next decade. Or maybe ‘plans for search to be like’ would be a more accurate phrase.”

The context for the article was a recent interview Google Research Fellow Jeff Dean gave with the Puget Sound Business Journal.

You may be familiar with Jeff.

He’s the fellow (pun intended) who is currently Chief Scientist at Google DeepMind (the merger of DeepMind with the Google Brain team from Google Research, which happened in April of last year) and tweets out announcements and cool stuff like this:

As that video demonstrates, AI has evolved considerably over just the last few years.

Let’s look back even further at some of the main themes from Jeff’s 2013 interview (as highlighted in SEL), and compare them to what we saw at Google I/O this year.

Booking a trip

I don’t think we’re there quite yet, at least by using a Google Search box.

However, let’s review some of the “agentive” capabilities of Google’s multimodal AI model (Gemini).

If you have a trip planned already, for example, Gemini can help build you a personalized itinerary:

Here’s the full video showing what steps this might involve:

And here’s the final screenshot, where the AI agent found the flight and hotel information in Gmail:

Screenshot from Gemini travel planning demo.

But what about completing actions, like actually booking a trip?

Well, there was also this I/O demo of an AI agent returning shoes, which included scheduling a pickup from UPS:

Gemini AI Agent scheduling a UPS pickup.

If it can do that, I’m sure it will be able to book a flight soon enough.

After all, we saw at Google Cloud Next in April how an AI customer agent (for an ecommerce website) was able to make a purchase from a voice command:

Customer Agent from Google Cloud Next demo.

Perhaps a similar customer agent working for an airline could do the same for a ticket, or maybe Google will one day have an agent that operates within Gemini-powered search results directly:

Speaking of voice commands …

Voice search

Voice search itself already existed when Jeff gave the interview; he was speaking about a voice-only experience.

Most LLM-powered conversational devices today, like the Rabbit R1, still have a keyboard available. The Humane AI Pin perhaps gets closest to this reality. We’ll have to see where that all leads.

That said, let’s look at some of the history of voice search at Google and how it’s progressed.

Google rolled out voice search in 2008 to its smartphone app and in June 2011 to desktop.

Google Speech Recognition article.

Interestingly, the article also says this innovation was partly driven by “Google’s ability to quickly process large data sets via MapReduce.”

You might recall MapReduce was mentioned in the recent Hamsterdam Research article on TeraHAC, a graph clustering algorithm for large datasets.

To recap, MapReduce is a programming model that enables parallelism: processing vast amounts of data across many machines (computers) at once.

MapReduce diagram (source).

It has two phases.

The map phase divides input data into chunks for independent processing on individual machines, using a mapper function.

Then the reduce phase aggregates the results, using a reducer function.
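
To make that concrete, here’s a minimal word-count sketch of the two phases in Python. (This is purely a toy illustration of the programming model, not Google’s implementation, and the sample text chunks are made up.)

```python
from collections import defaultdict
from multiprocessing import Pool

# Map phase: each chunk of text is processed independently.
def mapper(chunk):
    counts = defaultdict(int)
    for word in chunk.split():
        counts[word.lower()] += 1
    return counts

# Reduce phase: partial counts from all mappers are aggregated.
def reducer(partial_counts):
    totals = defaultdict(int)
    for counts in partial_counts:
        for word, n in counts.items():
            totals[word] += n
    return dict(totals)

if __name__ == "__main__":
    chunks = [
        "book me a trip to washington dc",
        "a trip to dc is a trip worth booking",
    ]
    with Pool(2) as pool:          # each chunk is mapped on its own worker
        partial = pool.map(mapper, chunks)
    print(reducer(partial))
```

Each chunk gets mapped on its own worker, and the reducer only ever sees the partial counts, which is what lets the pattern scale out across many machines.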

It’s an interesting insight into how Google might process large datasets, and it clearly helped enable voice search capabilities. (Dean even mentioned parallelism in the interview, as we’ll see later.)

The original algorithms that converted voice input into text were built from models of speech patterns, and each language had its own statistical model trained on anonymized word databases collected from Google products.

The recognition system comprised three separate models (an acoustic model, a pronunciation model, and a language model), all trained separately and then composed into one “gigantic search graph.”

The acoustic model identified phonemes (units of sound) in audio samples. It took a waveform, chunked it into small time segments, performed a frequency analysis, and output a “probability distribution over all of the triphone-states for that particular input.” Matching each waveform frequency vector against a probability distribution then identified which phonemes were most likely present in the sample.

The pronunciation model took the phonemic probability distributions from the acoustic model and checked them “against a massive lexicon defining valid sequences of phonemes for the words of a specific language.”

Then the language model calculated “the frequencies of all word sequences that can be formed” out of the other two models.

A final search algorithm picked the valid word sequence that had the highest frequency of occurrence in the language.

Here’s an example:

“User says, ‘My dog ran away’. Audio/Pronunciation models identified various valid possibilities: My or Mai, Dog or Dock, Ran or Ram, Away or A Whey. Language model looks at combos and figures that it has seen “my dog ran away” much more frequently than ‘Mai dock ram a whey’ or ‘my dock ran away,’ so it constrains it to that combination.”

– Google Careers article on speech recognition
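
Here’s a toy Python sketch of that final disambiguation step. (All of the candidate words, acoustic scores, and sequence frequencies below are made up for illustration; the real system worked over phonemes and a far larger lexicon and corpus.)

```python
from itertools import product

# Toy candidate words from the acoustic/pronunciation models, with scores.
candidates = [
    {"my": 0.6, "mai": 0.4},
    {"dog": 0.55, "dock": 0.45},
    {"ran": 0.7, "ram": 0.3},
    {"away": 0.65, "a whey": 0.35},
]

# Toy "language model": relative frequency of full word sequences in a corpus.
sequence_freq = {
    "my dog ran away": 0.9,
    "my dock ran away": 0.05,
    "mai dock ram a whey": 0.01,
}

best_seq, best_score = None, 0.0
for words in product(*[c.items() for c in candidates]):
    seq = " ".join(w for w, _ in words)
    acoustic_score = 1.0
    for _, p in words:
        acoustic_score *= p
    score = acoustic_score * sequence_freq.get(seq, 1e-6)
    if score > best_score:
        best_seq, best_score = seq, score

print(best_seq)   # -> "my dog ran away"
```

The language model constrains the acoustically plausible combinations to the sequence actually seen most often in the language, exactly as the quoted example describes.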

The models were built on a corpus of data that included “all searches typed into YouTube or Google Maps, or simply entered into Google.com.”

So we can see why techniques for working on large datasets apply.

And that was 2008-era technology.

Things have moved further since.

For example, in the 2013 SEL article, Matt mentions the rollout of Google’s conversational search feature:

“Google launched conversational search on its Chrome browser earlier this year, and the product is smart enough to follow a spoken sequence of searches like ‘how old is Barack Obama’ followed by ‘how tall is he.’ It recognizes that the ‘he’ in the second spoken query refers to Obama.”

And earlier this year, as Marie Haynes pointed out, Google made a change to its Assistant so that the microphone icon would now trigger search results.

In terms of AI chatbots, ChatGPT introduced voice chat on September 25th, 2023. Those capabilities were bolstered this week by the introduction of its new natively multimodal GPT-4o model:

Gemini was natively multimodal already, but it didn’t have the same voice capabilities.

But as we learned at I/O, it soon will:

Voice assistants have already been able to read featured snippets aloud for 5+ years.

Google also tested the ability to play audio of featured snippets in a SERP.

Will we see chat capabilities where agents read AI Overviews, or even use dialogue for follow-ups in the SERP?

Visual search

In addition to the voice search microphone icon, the camera icon for Google Lens is present in search bars and enables searching with images.

Circle to Search was another step in that direction, and it is itself evolving:

However, more innovations in visual search will soon be happening with video inputs:

Could that one day happen in real-time?

Beyond search, we also saw more visual capabilities presented with Project Astra:

In the 2013 interview, Jeff referenced Google Glass.

This was a technology announced in 2012 and presented to testers in 2013, so it makes sense he would mention it.

Google Glass was considered a flop at the time, although more recently its enterprise applications have been promoted:

Google Glass enterprise article.

However, we’ve since seen more stylish innovations on this theme, like Ray-Ban Meta smart glasses.

The Project Astra video also had this teaser:

Are we there yet?

It’s just over a decade since “Book Me A Trip To Washington, DC” was a dream query.

It may not be directly solvable with Google Search today, but as this Google Search Labs demo below shows, we’re inching closer:

AI Overview of a trip itinerary for Washington, DC.

Meanwhile, Google I/O showed how visual and voice inputs have advanced, both within search and for AI agents.

This is all made possible by AI.

But even back in 2013, AI was already a central part of the picture.

The AI connection

Matt writes in the 2013 SEL article how Jeff Dean had been with Google since 1999 and worked “in the Systems Infrastructure Group, where they do things like apply machine learning to search (and pretty much all of Google’s other products).”

“It’s pretty high-level stuff; you won’t find anything about keyword research or SEO or even the basics like ’10 blue links’ on a search results page. But you will find, for example, a peek at how Google is using machine learning to build out the Knowledge Graph.

‘We have the start of being able to do a kind of mixing of supervised and unsupervised learning, and if we can get that working well, that will be pretty important. In almost all cases you don’t have as much labeled data as you’d really like. And being able to take advantage of the unlabeled data would probably improve our performance by an order of magnitude on the metrics we care about. You’re always going to have 100x, 1000x as much unlabeled data as labeled data, so being able to use that is going to be really important.’”

– SEL (2013)

The mention of unlabeled training data is interesting.

I mentioned TeraHAC earlier. That’s a recent graph clustering algorithm built on more than a decade of work in the field of clustering algorithms, which are used for unlabeled datasets.

More broadly, in a recent Hamsterdam Research opinion piece, Doing the Global Minimum: Thinking About SEO More in the Context of Neural Network Architectures, we went more in-depth on different types of machine learning, specifically neural networks.

What I hoped to do in that piece was highlight how the term “machine learning” is akin to describing SEO as “optimizing websites.”

There’s a lot to unpack.

In the Puget Sound Business Journal interview, for example, the author mentions:

“As a Google Research Fellow, Dean has been working on ways to use machine learning and deep neural networks to solve some of the toughest problems Google has, such as natural language processing, speech recognition, and computer vision.”

– Puget Sound Business Journal (2013)

Deep neural networks (DNNs) are a subset of machine learning that includes models like recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs).

These are the types of models that would likely have been used for natural language processing (NLP), speech recognition, or computer vision tasks in 2013.

What I also find interesting, though, is the “conversational search problem” Jeff describes in the interview:

“Dean goes on to say that his team is working on ‘big problems’ like being able to use voice and predictive search to answer queries like ‘Please book me a trip to Washington, DC.’

‘That’s a very high-level set of instructions. And if you’re a human, you’d ask me a bunch of follow-up questions, ‘What hotel do you want to stay at?’ ‘Do you mind a layover?’ – that sort of thing. I don’t think we have a good idea of how to break it down into a set of follow-up questions to make a manageable process for a computer to solve that problem. The search team often talks about this as the ‘conversational search problem.’”

– Search Engine Land (2013)

This was four years before the transformer architecture was introduced by Google Research (where Jeff worked) in 2017, which soon replaced RNNs for many NLP tasks and led to the development of modern LLMs.

Of course, language models were used long before transformers.

Remember our earlier example of speech recognition technology, where a language model was used?

In the interview, Jeff mentions more of that history:

“We in our group are trying to do several things. One is we are trying to build very robust systems that can scale and be trained on very large sets of data. We’re also looking to apply parallelism in various ways to train those models more quickly, you know, train up a lot of models at once and pick the best one. …

We’ve been applying neural networks to several different problems. One of our earliest collaborations was with the speech group. Eventually speech recognition grew up into two main applications. First it was going from the raw waveform in a short-time frame, you know, what the sound actually looks like, and from that you try to predict what the small piece of a word is being uttered at that second.

And then there’s a model that comes in after that that tries to stitch all of those temporal representations of sounds together into words, so if you said ‘Buh Ah Da’ the model would construct that into the word Bad. That’s called a language model where you are stitching these pieces together to get a full word, and also stitching those words together to get phrases, sentences, and so on.”

– Jeff Dean, Puget Sound Business Journal (2013)

Another theme I found interesting was that of AI-driven systems vs. hand-engineered (more rule-based) algorithms, a topic Alan Kent touched on (mentioned in a past history lesson) and one I also wrote about in relation to the old helpful content system in another blog post, if it’s of interest:

“… neural nets (networks of functions that behave like neurons in the human brain) have been around for a long time, since the late ’60s, but they’re coming back into vogue for several reasons. One is that a lot of machine learning systems require you to hand engineer a bunch of models they think are predictive. And that works for some small to moderate problems, but for low-level perceptual problems it’s often not clear, even for a real domain expert, what features you should look at that would be very predictive of the correct end result. So neural nets, especially deep ones, is that they build features that describe the data well automatically, without humans having to get involved. So that is one big advantage.”

Jeff also speaks in the interview about an early form of what we described as “transfer learning” for DNNs in the neural network opinion piece, although here it’s within the same network:

“Until four or five years ago, it was impossible to get more than like a three-layer network to train well because, since each computer neuron is a non-linear function, as you get deeper and deeper its output gets more and more irregular. It’s a very difficult optimization process the deeper the network is. But people have now figured out ways around that. You can pre-train on the first layer, do your optimization there, get it into a good state, and then add a layer.”
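
As a minimal sketch of that greedy, layer-by-layer pre-training idea, here’s what it might look like in modern PyTorch. (The layer sizes, random toy data, and autoencoder-style reconstruction objective are my own assumptions for illustration, not what Google’s teams actually used.)

```python
import torch
import torch.nn as nn

def pretrain_layer(trained_stack, new_layer, data, epochs=50, lr=0.01):
    """Train `new_layer` as a tiny autoencoder on the outputs of the
    already-trained (and now frozen) layers beneath it."""
    with torch.no_grad():
        hidden = data
        for layer in trained_stack:          # frozen, previously pre-trained layers
            hidden = torch.tanh(layer(hidden))
    decoder = nn.Linear(new_layer.out_features, new_layer.in_features)
    opt = torch.optim.SGD(
        list(new_layer.parameters()) + list(decoder.parameters()), lr=lr
    )
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(torch.tanh(new_layer(hidden)))
        loss = nn.functional.mse_loss(recon, hidden)  # reconstruct the layer below
        loss.backward()
        opt.step()

# Build a four-layer encoder one layer at a time.
torch.manual_seed(0)
data = torch.randn(256, 32)                  # toy unlabeled data
sizes = [32, 16, 8, 4]
stack = []
for n_in, n_out in zip(sizes, sizes[1:]):
    layer = nn.Linear(n_in, n_out)
    pretrain_layer(stack, layer, data)
    stack.append(layer)                      # kept fixed while the next layer trains
```

Each new layer is trained to reconstruct the frozen representation beneath it before the next layer gets stacked on top, which mirrors the “get it into a good state, and then add a layer” process Jeff describes.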

Another interesting topic he raises is vector embeddings (a basis for semantic search):

“… it turns out you can represent a lot of textual problems as neural net problems. So for example, we can build a dimensional matrix representing different words and grouping them by how similar they are. So for example the word ‘iPhone’ is going to be much closer to ‘smartphone’ than other words. And we can use that to start to understand what you’re searching for on Google no matter what you type. Like, if you type in smartphone, you’re probably still expecting to see iPhones in the results.”
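
Here’s a toy Python sketch of that intuition, using made-up 4-dimensional vectors. (Real embeddings have hundreds of dimensions and are learned from data, not written by hand.)

```python
import numpy as np

# Toy "embeddings" for a few words.
vectors = {
    "iphone":     np.array([0.90, 0.80, 0.10, 0.00]),
    "smartphone": np.array([0.85, 0.75, 0.20, 0.05]),
    "banana":     np.array([0.00, 0.10, 0.90, 0.80]),
}

def cosine_similarity(a, b):
    """Similarity of two vectors by the angle between them (1 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = vectors["smartphone"]
for word, vec in vectors.items():
    print(word, round(cosine_similarity(query, vec), 3))
```

Because “iPhone” and “smartphone” point in nearly the same direction, a query for one can surface results about the other, which is the semantic-search intuition behind vector embeddings.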

Later in the interview, he also elaborated on the then-current limitations of search at Google, while highlighting where things were headed, which provides some interesting hints at future topics (like BERT, neural matching, or even AI Overviews):

“I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

Once you understand text, it changes the game. Because today what we do for search, for example, we’re not really understanding at a deep human level the text we see on webpages, we’re looking for the word you searched for, we’re looking for related words and we score them in some way. But if we really understand the text we see, and the text you entered for a query, that would fundamentally be pretty important. It might be possible to build user interfaces that read the things people read and do the things people do. You could ask hard questions like, ‘What are some of the lesser known causes of the Civil War?’ Or queries where you have to join together data from lots of different sources. Like ‘What’s the Google engineering office with the highest average temperature?’ There’s no webpage that has that data on it. But if you know a page that has all the Google offices on it, and you know how to find historical temperature data, you can answer that question. But making the leap to being able to manipulate that data to answer the question depends fundamentally on actually understanding what the data is.”
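
That last office-temperature example is, at its core, a data-joining problem. Here’s the trivial Python version with hypothetical placeholder numbers (I have no idea what the real averages are); the hard part Jeff is pointing at isn’t the join itself, but understanding the question well enough to know these are the two sources to combine.

```python
# Hypothetical data pulled from two unrelated sources:
offices = ["Mountain View", "Zurich", "Singapore"]          # e.g., a page listing offices
avg_temp_c = {                                              # e.g., historical climate data
    "Mountain View": 15.3,
    "Zurich": 9.4,
    "Singapore": 27.1,
}

# Join the two sources to answer the question.
warmest = max(offices, key=lambda city: avg_temp_c[city])
print(f"Warmest office (on these made-up numbers): {warmest}")
```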

He also speaks to other types of machine learning besides neural networks:

“It’s not just neural networks; machine learning in general is used underneath a lot of our products in ways that are probably not obvious to consumers, but which make a lot of features on the site run. Things like our ad network, which has a lot of machine learning built into it. Or Gmail’s spam and virus recognition. That’s a machine learning problem because you’re having to predict which messages are spam when they’re messages you’ve never seen before. Or on Google+, we use machine learning to try to predict which folks you’d like to interact with or which people you should add to your circles.”

The Gmail spam filtering case, for instance, could align with the Naive Bayes model example from the neural network opinion article.
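
As a rough sketch of what that kind of classifier can look like, here’s a Naive Bayes model over word counts using scikit-learn, trained on a tiny made-up dataset. (Gmail’s real spam filtering is far more sophisticated and almost certainly combines multiple models.)

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-made training set; real spam filters learn from millions of messages.
emails = [
    "win a free prize now", "cheap pills limited offer",
    "meeting agenda for tuesday", "trip itinerary to washington dc",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn messages into word counts, then apply Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free trip offer, click now"]))  # likely "spam"
print(model.predict(["agenda for our dc trip"]))      # likely "ham"
```

The model predicts a label for messages it has never seen before by weighing how strongly each word it has seen is associated with spam versus ham.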

I would suspect more of these use cases employ several models, including neural networks, especially today.

In that sense, we can see how Google’s aspirations and capabilities track with available AI technology.

Matt’s 2013 SEL article also mentioned Jeff’s team using machine learning to build out the knowledge graph, which is interesting for a few reasons.

In recent years, Google’s knowledge graph has been expanding at a fast rate:

Liz Reid also mentioned in a recent interview the central role the knowledge graph plays in combination with AI models to enable many of the Gemini-powered search features:

Of course, knowledge graphs are amenable to and can be expanded by graph neural networks (GNNs), among other AI-driven methods.

For example, we saw in this week’s Hamsterdam weekly SEO news recap (Part 57) that DoorDash is using LLMs to extract product attributes and expand its internal knowledge graph.
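
DoorDash’s actual pipeline isn’t public beyond what that recap covered, but conceptually, extracted attributes just become new nodes and edges in the graph. Here’s a toy Python sketch using networkx, where the “LLM output” is hard-coded stand-in data:

```python
import networkx as nx

# A tiny toy knowledge graph of (subject, predicate, object) triples.
kg = nx.MultiDiGraph()
kg.add_edge("Pad Thai", "Thai cuisine", predicate="is_a")

# Pretend output of an attribute-extraction step (in DoorDash's case, an LLM;
# here just hard-coded stand-in data) run over a raw product description.
extracted_attributes = [
    ("Pad Thai", "contains", "peanuts"),
    ("Pad Thai", "dietary_tag", "gluten-free option"),
]

# Expand the graph with the newly extracted triples.
for subject, predicate, obj in extracted_attributes:
    kg.add_edge(subject, obj, predicate=predicate)

for u, v, data in kg.edges(data=True):
    print(u, data["predicate"], v)
```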

Revisiting the aspirations Google Research had in 2013 for using machine learning and deep neural networks to solve particular problems gives us a sense of how that work has informed the evolution into Google DeepMind today, where many of the same goals are closer than ever to being solved, if not accomplished already.

Come back next week!

Thank you for checking out this week’s Hamsterdam History lesson!

Stop by next week for another lesson, or feel free to explore past articles below.

Until next time, enjoy the vibes:

Thanks for reading. Happy optimizing! 🙂



