Stumbling Upon Google Engineer Ni Lao’s Work, & Exploring What It Can Teach Us about ML, IR & NLP for SEO Insights (a Hamsterdam Research Post)

Last updated:

June 20, 2024

Welcome to another rendition of Hamsterdam Research!

If you’re new here, this is where we usually look at new AI research papers to understand them better and hypothesize how they might influence the future of search and SEO.

But this week is a bit different.

Long story short, I’ve been prepping for a holistic SEO guide, which involves gathering sources for an outline. (I talked a bit more about it in another post.)

As I was gathering sources today, though, I realized something that gave me pause, and sent me down a rabbit hole, leading to Ni Lao’s work. 😉

Since entity-based optimization is a bigger focus of contemporary and semantically focused SEO strategies, many folks are familiar with Google’s knowledge graph.

It helps Google Search show knowledge panels, for example, like so:

That was before a haircut …

Beyond the knowledge graph is Google’s knowledge vault:

“Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft’s Satori, and Google’s Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories.”

That excerpt is from the abstract in “Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion,” a Google Research paper published in 2014.

That paper has several contributors, including Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang.

One of those names stands out: Ni Lao.

But why are we interested in Ni Lao’s work here, in particular?

Well, it kind of happened by accident. 🙂

I was gathering sources to inform that holistic SEO guide I mentioned, and that knowledge vault paper was set to be one of them.

So, I clicked the “Download” button:

What happened next was unexpected.

The page redirected:

To this location:

For a brief second, I thought, “Was Google hacked?!”

(I’ve been hacked before, and sneaky redirects were a part of it.)

Once the initial shock wore off, though, I noticed the page title says, “Ni Lao’s Homepage,” and it’s on a GitHub subdomain.

Then after I read the first couple of sentences, I realized it was the right place:

“I work on machine learning, information retrieval, and natural language processing. Previously I have studied a wide range of topics such as robotic soccer, computer system diagnosis, product search, and question answering. Now I am interested in learning to control machines, and learning to create machines.

I graduated from Language Technologies Institute, School of Computer Science at Carnegie Mellon University. My thesis was advised by professor William W. Cohen. I worked at Google and Apple on language understanding and question answering. I was the chief scientist at SayMosaic. Now I work at Google on large pretrained models.“

What you don’t see in those excerpts above from Ni’s GitHub page are all of its links.

In total, the page has 129 outbound hyperlinks, per Screaming Frog, including to various publications, Google patents, and other documents:

That’s when it hit me.

I bet there’s a lot of cool stuff we could learn in those documents!

So, for this article, the plan is to explore a choice selection of the research that Ni has contributed to and linked from his GitHub page — the page, let me remind you, that Google Research’s knowledge vault paper links to … 😉 — to learn about machine learning (ML), information retrieval (IR), and natural language processing (NLP) for SEO.

I’ll grab 30 or so documents based on titles that look appealing, going from the top of the page down, then use Gemini 1.5 Pro in Google AI Studio for summarization and question answering, putting the documents into a Chat prompt.

It’s basically the same process I used for summarizing SOTR podcast transcripts. (I talk more about how this method influences the model or compares functionally to RAG in that article, as well.)

As a disclaimer, we’re not looking for Google Search algorithm or ranking secrets …

Our goal is simply to expand our personal knowledge vaults, i.e., our minds, by learning more about the work of an accomplished Google engineer (and bringing visibility to it). 🙂

On that point, another AI content disclaimer:

Since much of the information below comes from an LLM summarizing a source (meaning it’s AI-generated content), please fact check any takeaways by referencing an authoritative human source, preferably Ni’s own work.

*I’ll share links to the documents we’re referencing at the end of this article.

I’ll also put all information created by Gemini in this font, so you can tell the difference.

Without further ado, here’s what Ni Lao’s work as a Google engineer can teach us about ML, IR, and NLP for SEO:

Well, one last thing …

Who is Ni Lao?

He’s an engineer who works at Google and a prolific contributor to research (with over 80 citations on Google Scholar). His KGMID is /g/11c73sq_st. 😉

Basically, he’s someone whose work is worth exploring!

Let’s do just that …

Here is the prompt I put into Gemini, along with the documents:

I’m creating content for SEOs, digital marketers, and website owners with advanced knowledge of machine learning, information retrieval, and natural language processing. I’ve uploaded 30 documents that cover these topics. Based on those documents, can you give me a summary of key points to know or learn more about. These should be done in the style of FAQs, with a question as a heading followed by an answer as a paragraph, and section headings for each topic, related to ML, IR, NLP basics and future implications.

Note: the initial answers were good, but I realized it would be beneficial to know where to look for more information, so I asked a follow-up:

For each section, please add a reference to the documents it came from or that would be relevant for related learning.

If you’re curious, the prompts and the 30 uploaded documents equaled 320,751 tokens (of 2 million available).

Here were Gemini’s takeaways from a selection of Ni Lao’s work:

Note that I’ve adjusted the formatting (bolding, italics, headings, etc.) as well as put links to the relevant documents mentioned in each section.

Just another reminder, the below is AI-generated content. Please verify takeaways with an authoritative human source. 😉

What are the core concepts of machine learning?

Machine learning (ML) focuses on enabling computers to learn patterns from data and make predictions or decisions without explicit programming.

It involves using algorithms to analyze and interpret data, identifying patterns and relationships, and applying this knowledge to new or unseen data.

Key concepts include supervised learning, where models are trained on labeled data to predict outcomes for new instances, and unsupervised learning, where models find hidden patterns in unlabeled data.

Relevant Documents:

What are some popular machine learning algorithms?

Numerous machine learning algorithms exist, each with strengths and weaknesses.

Some popular algorithms include:

Supervised Learning:

Decision Trees: Simple, interpretable models that make predictions based on a series of decision rules.
Support Vector Machines (SVMs): Powerful classifiers that aim to find the optimal hyperplane separating different data classes.
Naive Bayes: Probabilistic classifiers based on Bayes’ theorem, assuming feature independence.
Logistic Regression: A linear model for binary classification, predicting the probability of belonging to a class.

Unsupervised Learning:

k-means Clustering: Partitions data into k clusters based on similarity.
Principal Component Analysis (PCA): Reduces data dimensionality while retaining maximum variance.
Association Rule Mining: Discovers frequent patterns and relationships between items in a dataset.

Relevant Documents:

How does deep learning fit into machine learning?

Deep learning is a subset of machine learning that utilizes artificial neural networks with multiple layers (deep neural networks) to learn complex patterns and representations from data.

These networks consist of interconnected nodes (neurons) organized in layers, allowing them to extract hierarchical features and model intricate relationships.

Deep learning has achieved remarkable success in areas like image recognition, natural language processing, and speech recognition.

Relevant Documents:

What is information retrieval?

Information retrieval (IR) focuses on retrieving relevant information from a collection of documents or data in response to a user’s query.

It involves indexing, searching, ranking, and presenting results that satisfy the user’s information need.

Traditional IR systems rely on techniques like keyword matching, term weighting (e.g., TF-IDF), and relevance ranking to retrieve and present relevant documents.

Relevant Documents:

What are the common challenges of IR?

IR faces several challenges, including:

Vocabulary Mismatch: Queries and documents may use different terms to express the same concept, leading to low recall.
Ambiguity: Query terms can have multiple meanings, leading to the retrieval of irrelevant documents.
Relevance Assessment: Determining the actual relevance of retrieved documents to a user’s query is subjective and challenging.
Scalability: Handling large document collections and efficiently processing queries at scale is crucial for practical IR systems.

Relevant Documents:

How can semantic search improve IR?

Semantic search goes beyond keyword matching by incorporating the meaning and context of query terms and documents.

It utilizes techniques like natural language processing, knowledge graphs, and machine learning to understand the user’s intent and retrieve semantically relevant results.

Semantic search aims to improve IR by addressing vocabulary mismatch, ambiguity, and relevance assessment.

Relevant Documents:

What is natural language processing?

Natural language processing (NLP) deals with the interaction between computers and human language.

It involves developing algorithms and techniques for computers to understand, interpret, generate, and manipulate human language.

Key NLP tasks include:

Text Processing: Tokenization, stemming, lemmatization, and stop word removal.
Syntactic Analysis: Part-of-speech tagging, parsing, and dependency analysis.
Semantic Analysis: Word sense disambiguation, named entity recognition, and relation extraction.
Text Generation: Machine translation, text summarization, and dialogue systems.

Relevant Documents:

How can NLP improve information retrieval?

NLP plays a crucial role in enhancing IR by:

Query Understanding: NLP techniques can analyze user queries to identify key concepts, disambiguate terms, and infer the user’s intent.
Document Processing: NLP can extract relevant information from documents, identify named entities, and determine semantic relationships.
Relevance Ranking: NLP features can be incorporated into ranking models to improve the accuracy of relevance assessment.
Query Expansion: NLP can be used to expand queries with synonyms, related terms, or concepts to improve recall.

Relevant Documents:

What are the key advances in NLP driven by deep learning?

Deep learning has revolutionized NLP, leading to significant advancements in:

Word Embeddings: Representing words as dense vectors capturing semantic relationships.
Sequence-to-Sequence Models: Mapping input sequences to output sequences, enabling tasks like machine translation and text summarization.
Attention Mechanisms: Allowing models to focus on relevant parts of input sequences, improving performance in tasks like question answering.

Relevant Documents:

How can these technologies be combined for knowledge acquisition and reasoning?

Combining machine learning, information retrieval, and natural language processing holds great potential for:

Knowledge Base Construction: Extracting structured knowledge from unstructured text, enriching existing knowledge bases, and enabling automated knowledge acquisition.
Question Answering: Developing systems that can understand natural language questions and provide accurate answers by reasoning over large knowledge bases.
Intelligent Systems: Building intelligent agents and assistants that can interact with users in natural language, understand complex instructions, and perform tasks based on reasoning over acquired knowledge.

Relevant Documents:

What are some ethical considerations with these technologies?

As these technologies advance, it’s essential to address ethical considerations:

Bias and Fairness: Ensuring that ML models and NLP systems are fair, unbiased, and do not perpetuate societal biases.
Privacy and Data Security: Protecting user privacy and ensuring the responsible use and secure storage of data.
Transparency and Explainability: Making AI systems transparent and interpretable, enabling users to understand how decisions are made.

Relevant Documents:

Resources on AI Ethics, e.g., publications from The Alan Turing Institute, Partnership on AI (Note: Gemini added this.)

What are the key directions for future research?

Key areas for future research include:

Commonsense Reasoning: Enabling AI systems to understand and reason with commonsense knowledge.
Explainable AI: Developing methods to make AI systems more transparent and interpretable.
Human-AI Collaboration: Exploring new ways for humans and AI systems to collaborate effectively.
AI for Social Good: Leveraging AI to address societal challenges and promote social good.

Relevant Documents:

Publications from conferences like NeurIPS, ICML, ACL, as well as journals like “Nature Machine Intelligence” (Note: Gemini added this.)

It’s me again.

I think there’s a lot of helpful information in those answers above, but it’s all quite high level.

In terms of key takeaways, I think there’s general educational value here, overall, but the practical value is in identifying which of Ni Lao’s work to explore further based on topics of interest.

This supports the conclusion I had in the review of SOTR podcast transcripts, which is that while the AI-generated summaries are helpful, they’re not a match for reading the firsthand work and perspectives of the author.

For example, compare the style and breadth of detail in Gemini’s response above to this excerpt from a paper co-authored by Ni called “Efficient inference and learning in a large knowledge base,” which Gemini referenced in the “What are the common challenges of IR?” section:

“Personalized PageRank (Page et al. 1998) is a crucial algorithm for inference in large graphs. In this subsection, we introduce an approximate personalized PageRank method called PageRank–Nibble (Andersen et al. 2006, 2008). The outline of this algorithm is shown in Table 2. This method has been used for the problem of local partitioning: in local partitioning, the goal is to find a small, low-conductance2 component Gˆ of a large graph G that contains a given node v. In the next section, we will show that this algorithm can also be used to implement a probabilistic Selective Linear Definite (SLD) resolution with bounded computation.“

The differences are stark.

It speaks to why in Hamsterdam Research, we typically go straight to the research paper itself, but use Gemini Advanced to help explore topics further.

Rather than uploading multiple documents, we can take one document, head to Gemini Advanced, and ask for an explanation.

For example, here’s Gemini’s summary of the PageRank-Nibble paper:

For context, I uploaded the paper and gave Gemini Advanced this prompt:

From this paper, please explain the concept of personalized pagerank, pagerank-nibble, and how it’s relevant to SEO professionals interested in ML, IR, or NLP. please elaborate on specific terminology mentioned by the researchers.

Personalized PageRank (PPR) is an algorithm derived from the original PageRank algorithm, which was initially designed to rank web pages in search engine results.

In PPR, a “personalized” vector is introduced to bias the random walk towards a particular area of interest within the graph. This personalization allows for a more focused exploration of the graph structure, making it particularly relevant to tasks like recommendation systems or link prediction in social networks.

PageRank-Nibble is an approximation method for PPR designed to be more efficient in large graphs. It operates by incrementally exploring the graph, starting from a seed node and expanding outwards.

The algorithm maintains two vectors: ‘p’ (an approximation of the personalized PageRank vector) and ‘r’ (a vector of residual errors in ‘p’). It iteratively selects nodes with high residual errors and redistributes these errors to their neighbors, gradually refining the approximation. This process continues until a desired level of accuracy is reached or computational resources are exhausted.

The relevance of these concepts to SEO professionals interested in ML, IR, or NLP lies in their applicability to various tasks within these domains.

For instance, in information retrieval, PPR can be used to personalize search results based on a user’s query history or preferences.

In natural language processing, PPR can be employed for tasks like word sense disambiguation or entity linking, where the graph represents relationships between words or entities.

The researchers mention several specific terms related to PPR and PageRank-Nibble:

Random-walk-with-restart: This is another name for the PPR algorithm, emphasizing the process of simulating a random walk on the graph that occasionally restarts from the seed node.
Local partitioning: PageRank-Nibble is used for this task, aiming to find a small, well-connected subgraph around a given node. This is relevant to community detection in social networks or identifying relevant documents in a large corpus.
Conductance: This metric measures how well-connected a subgraph is to the rest of the graph. Low conductance indicates a strong internal connection, which is desirable in local partitioning.
Residual errors: These represent the difference between the true PPR values and the current approximation in PageRank-Nibble. The algorithm focuses on reducing these errors to improve the approximation.

By understanding these concepts and their applications, SEO professionals can leverage graph-based algorithms like PPR and PageRank-Nibble to enhance their strategies in various domains, such as improving search result personalization, identifying relevant content, or understanding relationships between entities in a knowledge graph.

Me again.

I think this illustrates the value of having a workflow with two steps:

Use Gemini 1.5 Pro in Google AI Studio to summarize a large corpus of documents and find which are relevant for further exploration.
Use Gemini Advanced to look at individual documents in detail.

But if we could add a third step …

It would be to build up a knowledge of ML, IR, and NLP to understand the author’s (in this case, Ni Lao’s) contributed work in its original format.

To that end, here’s more advanced vocabulary based on the 30 referenced documents, created by Gemini:

I gave Gemini this follow-up prompt:

I’m going through the answer now and I see several docs referenced, but I don’t think all of the ones I uploaded were included. I also think the explanations are a little basic. Can you please now give me a list of key vocabulary from across all of the documents, with a focus on more advanced ML, IR, and NLP terms.

Here’s a list of key vocabulary extracted from the provided documents, focusing on advanced terms in Machine Learning, Information Retrieval, and Natural Language Processing:

Machine Learning (ML)

ℓ1-norm Regularization: A regularization technique that encourages sparsity in the model parameters, effectively performing feature selection.
Orthant-Wise Quasi-Newton (OWL-QN): A gradient-based optimization method specifically designed for ℓ1-regularized problems.
Grafting-Light: An efficient, incremental feature selection algorithm for Markov Random Fields (MRFs) that alternates between one-step gradient descent and feature selection.
Contrastive Feature Induction (CFI): A fast feature evaluation algorithm for CRFs based on mean-field contrastive divergence, focusing on variables with high signals or errors.
Mean-Field Contrastive Divergence (CDMF): An approximate inference method for training probabilistic models, particularly effective for MRFs and CRFs.
Gradient Decomposition Theorem (GDT): A theorem that shows the gradient of CRF features can be represented as a function of signals and errors.
Multi-Instance Learning: A learning paradigm where training data consists of bags of instances, and the goal is to learn a model that can classify bags based on the instances they contain.
Distant Supervision: A method for automatically generating training data for relation extraction by aligning text with an existing knowledge base.
Reinforcement Learning (RL): A learning paradigm where agents learn by interacting with an environment, receiving rewards for desired behavior.
Policy Gradient Methods: A family of RL algorithms that directly optimize the policy by adjusting its parameters based on the expected reward.
REINFORCE: A popular policy gradient algorithm used for training models in situations where backpropagation is not directly applicable.
Memory Augmented Policy Optimization (MAPO): An RL algorithm that uses a memory buffer to store high-return experiences and a clipping mechanism to combat spurious rewards.
Spurious Programs: Programs that accidentally produce the correct answer but are not semantically correct, leading to poor generalization.
Stacked Learning: A method for improving model performance by training a second-level model (e.g., a reranker) on the output of a first-level model.
Leave-One-Out (LOO) Cross-Validation: A robust cross-validation technique where the model is trained on all data points except one, and the held-out point is used for evaluation.
Multi-Task Learning: A learning paradigm where a single model is trained to perform multiple related tasks simultaneously, often sharing parameters or representations.
Knowledge Distillation: A technique for transferring knowledge from a larger, more complex teacher model to a smaller, more efficient student model.

Information Retrieval (IR)

Personalized PageRank (PPR): A variant of PageRank that calculates the probability of reaching a node from a specified starting node, useful for measuring proximity in graphs.
PageRank–Nibble: An approximate PPR algorithm used for local partitioning of graphs, finding small, low-conductance components containing a given node.
Local Grounding: A technique for constructing a partial proof graph for a query in a probabilistic logic, allowing efficient inference independent of database size.
Probabilistic Selective Linear Definite (SLD) Resolution: A probabilistic extension of Prolog’s resolution procedure for reasoning over logic programs.
Stochastic Logic Programs (SLPs): A formalism for probabilistic logic programming where clauses are associated with probabilities, defining a distribution over possible proofs and answers.
Local Closed World Assumption (LCWA): An assumption used for labeling data in KB completion, where if a KB contains some facts about a subject-predicate pair, it is assumed to be locally complete for that pair.
Path Ranking Algorithm (PRA): A relational learning algorithm that learns weighted rules (represented as graph path patterns) for link prediction in knowledge bases.
Dense Passage Retrieval: Retrieving relevant passages from a large corpus using dense vector representations, often outperforming traditional keyword-based methods.

Natural Language Processing (NLP)

Lexico-Syntactic Patterns (LSPs): Patterns that combine lexical (word) and syntactic information, useful for tasks like relation extraction and query expansion.
Bootstrapping Learning: An iterative learning process that starts with a small set of seed examples and expands it by using learned patterns to extract new examples.
Pseudo-Relevance Feedback (PRF): A query expansion technique that assumes the top-ranked documents from an initial retrieval are relevant and uses terms from these documents to expand the query.
Concept Expansion: Expanding queries with semantically related terms or concepts to improve recall.
Word Embedding: Representing words as dense vectors capturing semantic relationships, commonly used in deep learning-based NLP models.
Sequence-to-Sequence Models (seq2seq): Neural networks that map input sequences to output sequences, crucial for tasks like machine translation and text summarization.
Copy Mechanism: A technique used in seq2seq models that allows copying words or phrases directly from the input sequence to the output sequence.
Attention Mechanism: A technique used in seq2seq models that allows the model to focus on relevant parts of the input sequence when generating each output token.
Key-Variable Memory: A memory mechanism used in neural symbolic machines that allows the model to represent and refer to program variables, enabling compositionality.
Code Assistance: A technique used in neural symbolic machines where the interpreter provides hints and constraints to the programmer, reducing the search space.
Neural Computer Interface: The interface between a neural network and a symbolic computer, facilitating the execution of programs and the exchange of information.

These are just some of the advanced terms found in the documents. As you dive deeper into these topics, you’ll undoubtedly encounter even more specialized vocabulary. Be sure to use the references listed in the previous response to guide your learning and explore these concepts in greater depth.

That gives us a pretty good breadth of information to get started building or expand our personal knowledge vaults. 😉

As a final disclaimer: the above is AI-generated. Please fact check with reputable human sources.

Lastly, here is the full list of resources from Ni Lao’s GitHub page that we’ve referenced in the prompts and responses above:

To recap our takeaways:

It can be helpful to upload several documents at once to an AI model in order to get general takeaways and find a direction for future research.

With that direction known, it can then be helpful to upload one document at a time, referencing specific information in greater detail.

But ultimately, the goal should be to understand the original source material without (or with limited) assistance from AI models.

In short, the best way to gain knowledge is through the author’s original voice.

We also learned that Ni Lao is a prolific research contributor. Look for his name in future papers, patents, and beyond.

But I’d be remiss not to ask Gemini for a few SEO takeaways, too. 😉

The final prompt:

Based on our conversation, can you provide 5 to 10 quick takeaways for SEO professionals to use in their strategies or inform what they focus on learning.

Here are 5-10 quick takeaways for SEO professionals based on our conversation about Machine Learning, Information Retrieval, and Natural Language Processing:

1. Semantic Search is Here to Stay: Google increasingly understands the meaning behind search queries, not just keywords. Optimize for topics and concepts, not just individual keywords.

2. Content Quality Reigns Supreme: NLP helps search engines discern high-quality content. Focus on creating comprehensive, well-structured, informative, and engaging content.

3. Structured Data Enhances Visibility: Use schema markup to provide structured data about your website and content. This helps search engines understand your content better and can lead to rich snippets in search results.

4. Entity Recognition Matters: Search engines recognize entities (people, places, things) in content. Incorporate relevant entities and provide context around them.

5. Link Building Remains Relevant, But Quality Over Quantity: Search engines like Google use algorithms like PageRank that consider link structure. Focus on earning high-quality, relevant backlinks.

6. Stay Updated on NLP Advancements: Deep learning is driving rapid progress in NLP. Keep abreast of new techniques and research to understand how search is evolving.

7. Experiment with AI-Powered SEO Tools: Numerous tools leverage ML and NLP for tasks like keyword research, content optimization, and link analysis. Explore and test these tools to potentially gain an edge.

8. Consider Future Implications of AI: AI will continue to transform search. Think about how advancements in knowledge acquisition and reasoning might impact SEO strategies in the long term.

9. Don’t Neglect the Human Element: While technology is important, remember that SEO ultimately serves users. Focus on providing a great user experience and building a website that people want to visit and share.

I definitely agree with that last item. 😉

Lastly, while I’ve reviewed all the content in this article, please remember to fact check any of the AI-generated content with authoritative human sources.

Before concluding, I put the content of this post back into Gemini for feedback, and it suggested:

The post rightly emphasizes the importance of human expertise in interpreting and applying AI-generated insights. This could be further emphasized by showcasing how human SEOs can work in tandem with AI tools to achieve better results.

To that end, here’s an article summarizing how I’ve used generative AI for SEO work for clients and on my own site, along with some opinions on responsible use. 😉

Generative AI for SEO: “Real Life” Examples of How I’ve Used ChatGPT, Claude & Gemini for Client Work

Outro

Thanks for reading this week’s Hamsterdam Research lesson.

It was a bit of a different approach, but I certainly learned some helpful information and hope you did, as well!

I always try to mention the researchers behind the papers we look at in Hamsterdam Research, but this might open new doors of looking at multiple papers from the same researchers. Stay tuned!

Feel free to comment with your thoughts (or contact me to share them) or check out related posts below.

Until next time, enjoy the vibes:

Thanks for reading. Happy optimizing!

SEO Strategist and Consultant

Ethan Lazuk

Everything You Need to Know (Or A Lot of It, Probably) from Google Search Central’s SOTR Podcasts (as Told Through Gemini Prompted with 75 Transcript PDFs)

I put 75 transcripts from Google Search Central Search Off the Record (SOTR) podcast episodes into Gemini 1.5 Pro and asked questions. Here’s how it…

June 15, 2024October 5, 2024

Doing the Global Minimum: Thinking About SEO More in the Context of Neural Network Architectures (A Hamsterdam Research Opinion Piece)

In this opinion piece, we broaden the SEO lexicon to include more deep neural network terminology, expanding our worldview with the global minimum.

May 15, 2024October 5, 2024

“Book Me A Trip To Washington, DC”: Revisiting Google Research’s Ambitions with Neural Networks in 2013 a Decade Later, After Google I/O 2024 (Hamsterdam History)

In this Hamsterdam History article, we’ll take a look at Google Research’s AI-driven goals in 2013 to see if they’ve come true after Google I/O…

May 16, 2024October 5, 2024

Stumbling Upon Google Engineer Ni Lao’s Work, & Exploring What It Can Teach Us about ML, IR & NLP for SEO Insights (a Hamsterdam Research Post)

As I was gathering sources today, though, I realized something that gave me pause, and sent me down a rabbit hole, leading to Ni Lao’s work. 😉

But why are we interested in Ni Lao’s work here, in particular?

I bet there’s a lot of cool stuff we could learn in those documents!

Without further ado, here’s what Ni Lao’s work as a Google engineer can teach us about ML, IR, and NLP for SEO:

Who is Ni Lao?

Here is the prompt I put into Gemini, along with the documents:

Here were Gemini’s takeaways from a selection of Ni Lao’s work:

What are the core concepts of machine learning?

What are some popular machine learning algorithms?

Supervised Learning:

Unsupervised Learning:

How does deep learning fit into machine learning?

What is information retrieval?

What are the common challenges of IR?

How can semantic search improve IR?

What is natural language processing?

Key NLP tasks include:

How can NLP improve information retrieval?

What are the key advances in NLP driven by deep learning?

How can these technologies be combined for knowledge acquisition and reasoning?

What are some ethical considerations with these technologies?

What are the key directions for future research?

Rather than uploading multiple documents, we can take one document, head to Gemini Advanced, and ask for an explanation.

For example, here’s Gemini’s summary of the PageRank-Nibble paper:

I think this illustrates the value of having a workflow with two steps:

But if we could add a third step …

To that end, here’s more advanced vocabulary based on the 30 referenced documents, created by Gemini:

Machine Learning (ML)

Information Retrieval (IR)

Natural Language Processing (NLP)

Lastly, here is the full list of resources from Ni Lao’s GitHub page that we’ve referenced in the prompts and responses above:

To recap our takeaways:

But I’d be remiss not to ask Gemini for a few SEO takeaways, too. 😉

Before concluding, I put the content of this post back into Gemini for feedback, and it suggested:

Outro

Related posts

Everything You Need to Know (Or A Lot of It, Probably) from Google Search Central’s SOTR Podcasts (as Told Through Gemini Prompted with 75 Transcript PDFs)

Doing the Global Minimum: Thinking About SEO More in the Context of Neural Network Architectures (A Hamsterdam Research Opinion Piece)

“Book Me A Trip To Washington, DC”: Revisiting Google Research’s Ambitions with Neural Networks in 2013 a Decade Later, After Google I/O 2024 (Hamsterdam History)

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from Ethan Lazuk