Google’s Ranking Secrets Leaked: Quality, Clicks, Popularity, and LLM Support

As part of the antitrust lawsuit filed by the U.S. Department of Justice against Google, documents have been released revealing important clues about how the search giant ranks websites.

The antitrust lawsuit filed by the U.S. Department of Justice against Google has revealed fascinating insights into how the search giant ranks websites. Court documents, including testimony from a Google engineer, have unveiled algorithmic details that SEO professionals have been curious about for years. These documents provide valuable information about Google’s quality scoring, click behaviours, popularity signals, and AI integration.

Google’s Data Flow and Indexing Process

Google’s search results generation process begins with data collection. According to court documents, Google draws on two main data sources:

  1. Structured Data: Information from third-party feeds, organised in a specific format.
  2. Unstructured Data: Raw content collected through web crawling (Googlebot).

This data is collected and processed through a system called “Multiverse,” where it undergoes cleaning and normalisation to prepare it for the main index. Structured information is forwarded to systems like the Knowledge Graph to provide rich results and semantic context.
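Multiverse’s internals are not public, so the Python sketch below is purely illustrative: the `Document` type, the `normalise` pass, and the Knowledge Graph side channel are hypothetical stand-ins for the two ingestion paths the court documents describe.

```python
# Illustrative sketch only: Multiverse's internals are not public.
# Everything here is a hypothetical stand-in for the two ingestion
# paths described in the court documents (structured feeds vs. crawled
# pages) plus a cleaning/normalisation pass before indexing.

from dataclasses import dataclass

@dataclass
class Document:
    url: str
    content: str
    structured: bool  # True for third-party feed data, False for crawled pages

def normalise(doc: Document) -> Document:
    """Toy cleaning pass: trim whitespace and lower-case the URL."""
    return Document(doc.url.strip().lower(), doc.content.strip(), doc.structured)

def ingest(docs: list[Document]) -> tuple[dict[str, Document], list[str]]:
    """Build a toy 'main index'; additionally route structured items
    towards a Knowledge-Graph-style side channel."""
    index: dict[str, Document] = {}
    knowledge_graph_candidates: list[str] = []
    for doc in map(normalise, docs):
        index[doc.url] = doc
        if doc.structured:
            knowledge_graph_candidates.append(doc.url)
    return index, knowledge_graph_candidates
```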

When a user performs a search, components like the Query Understanding Service (QUS) and Superroot come into play. These systems analyse user queries and match them with appropriate content. In the final stage, Google Web Server (GWS) delivers the results and applies personalisation. Throughout this process, the Logging Stack monitors user interactions and records them to optimise the system.
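The real QUS, Superroot, GWS, and Logging Stack are internal Google systems; the sketch below only mirrors the roles the court documents assign to them, and every function name is an invention for illustration.

```python
# Hypothetical sketch of the serving path: each function name below is
# invented and only mirrors the role the court documents assign to the
# corresponding internal system.

def understand_query(raw_query: str) -> list[str]:
    """QUS stand-in: reduce a raw query to matchable terms."""
    return raw_query.lower().split()

def match_documents(terms: list[str], index: dict[str, str]) -> list[str]:
    """Superroot stand-in: return URLs whose content contains every term."""
    return [url for url, content in index.items()
            if all(t in content.lower() for t in terms)]

def serve_results(raw_query: str, index: dict[str, str],
                  user_id: str, log: list[dict]) -> list[str]:
    """GWS stand-in: match results and record the interaction
    (the Logging Stack role), which would later feed optimisation."""
    results = match_documents(understand_query(raw_query), index)
    log.append({"user": user_id, "query": raw_query, "results": results})
    return results

toy_index = {"example.com/a": "ranking signals explained",
             "example.com/b": "cooking recipes"}
interaction_log: list[dict] = []
print(serve_results("ranking signals", toy_index, "user-1", interaction_log))
# ['example.com/a']
```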

Google’s ABC Signals

One of the most interesting revelations in the court documents is Google’s core ranking factors, called “ABC Signals.” These signals include:

  • A — Anchors: Pages linking to the target page, i.e., backlinks.
  • B — Body: The presence of search query terms in the document content.
  • C — Clicks: The time users spend on a page before returning to search results.

These ABC signals combine to form a page’s “topicality” score. According to the Google engineer’s testimony, developing the ranking (especially topicality) involves solving complex mathematical problems, and a dedicated team works on them continuously.
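The documents say the three signals combine into topicality but do not reveal the formula, so the weighted blend below is an assumption chosen purely to illustrate the idea; the weights are arbitrary.

```python
# Illustrative only: A (anchors), B (body), and C (clicks) are said to
# combine into topicality, but the real formula is not public. The
# linear combination and weights below are assumptions.

def topicality(anchors: float, body: float, clicks: float,
               w_a: float = 0.3, w_b: float = 0.4, w_c: float = 0.3) -> float:
    """Toy topicality score: a weighted blend of the three ABC signals,
    each expected to be pre-normalised to the 0..1 range."""
    return w_a * anchors + w_b * body + w_c * clicks

# Example: strong on-page term match, moderate backlinks, weak click signal.
score = topicality(anchors=0.5, body=0.9, clicks=0.2)
```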

Hand-Crafted Signals and Machine Learning

Another noteworthy point in the documents is Google’s continued use of “hand-crafted signals.” This indicates that the algorithm is not entirely automatic or AI-based but consists of scalable algorithms fine-tuned by search engineers.

The Google engineer contrasts this with Microsoft’s fully automated approach in Bing: “The reason the vast majority of signals are hand-crafted is so that when something breaks, Google knows what to fix. Google wants their signals to be completely transparent so they can troubleshoot and improve.”

This approach offers an important insight for SEO professionals: Google’s algorithm consists of understandable and predictable components. This means SEO strategies can still be effective.
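To make the debuggability point concrete, here is a minimal sketch of why named, hand-crafted components are easy to troubleshoot: every contribution to the final score stays inspectable. The signal names and weights are invented for illustration.

```python
# Sketch of why hand-crafted signals are debuggable: each component is a
# named, inspectable number, so when a result looks wrong an engineer can
# see which signal misfired. Signal names and weights here are invented.

def score_with_breakdown(signals: dict[str, float],
                         weights: dict[str, float]) -> tuple[float, dict]:
    """Return the final score plus each signal's named contribution."""
    contributions = {name: weights[name] * value
                     for name, value in signals.items()}
    return sum(contributions.values()), contributions

total, parts = score_with_breakdown(
    {"anchors": 0.5, "body": 0.9, "clicks": 0.2},
    {"anchors": 0.3, "body": 0.4, "clicks": 0.3},
)
# 'parts' shows exactly where the score came from, e.g. {'anchors': 0.15, ...}
```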

The Relationship Between Page Quality and Relevance

Another important point revealed in the court documents is the relationship between page quality and query relevance. According to the Google engineer, page quality is generally a static value independent of queries. In other words, if a page is evaluated as high-quality and reliable, it is accepted as such for all relevant queries.

However, in some cases the quality signal may incorporate information from the query in addition to the static value. For example, if a site is high-quality but general, a query seeking very narrow or technical information can factor in, steering results towards a high-quality site that is more technical.

The Google engineer emphasises that the page quality measure called “Q*” (Q-star), which captures a site’s trustworthiness, is “incredibly important.” The quality score remains highly significant even today, and page quality is what people complain about the most.
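Q* itself is not public; the sketch below only models the behaviour the testimony describes, namely a mostly static, query-independent value that can occasionally be blended with query-derived information. The blending scheme and parameter are assumptions.

```python
# Hypothetical sketch of the behaviour described in the testimony:
# quality is mostly static and query-independent, but a query-derived
# adjustment can sometimes be blended in. Q* internals are not public.

def quality_score(static_q: float, query_adjustment: float | None = None,
                  blend: float = 0.2) -> float:
    """Return the static quality value, optionally nudged by query info."""
    if query_adjustment is None:
        return static_q  # the common case: quality is query-independent
    return (1 - blend) * static_q + blend * query_adjustment

general_case = quality_score(0.8)                            # static only
technical_query = quality_score(0.8, query_adjustment=0.95)  # query-aware
```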

AI and Quality Issues

The Google engineer notes that AI is making quality issues worse: “People still complain about quality today, and AI is making it even worse.”

This statement serves as an important warning for SEO professionals. As AI content production becomes more widespread, maintaining quality standards and meeting Google’s quality assessment criteria becomes even more important.

LLM-Based Ranking Signals

The documents also provide hints about how Google uses ranking signals based on LLMs (large language models). The Google engineer mentions a system called “eDeepRank,” an LLM-based system that uses language models like BERT.

The engineer says, “eDeepRank tries to take LLM-based signals and break them down into components to make them more transparent.” This decomposition is done so that search engineers can understand why the LLM ranked a result the way it did.
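How eDeepRank actually performs that decomposition is not disclosed; the sketch below shows one simple way an opaque score could be attributed to named sub-signals, and both the strategy and the component names are assumptions.

```python
# Illustrative sketch of "breaking an LLM-based signal into components."
# The strategy below (attributing a score to named sub-signals in
# proportion to assumed weights) is an assumption, not Google's method.

def decompose_llm_score(total: float,
                        component_weights: dict[str, float]) -> dict[str, float]:
    """Split one opaque LLM relevance score into named, inspectable parts
    proportional to assumed component weights (which should sum to 1)."""
    return {name: total * w for name, w in component_weights.items()}

parts = decompose_llm_score(0.87, {"semantic_match": 0.5,
                                   "entity_overlap": 0.3,
                                   "intent_fit": 0.2})
# An engineer can now ask which named part drove the ranking decision.
```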

In addition, signals called “RankEmbed,” derived from Google’s main LLM model, are used in the ranking process.

PageRank and Distance Ranking Algorithms

PageRank, Google’s original ranking innovation, has been updated repeatedly since its debut. According to court documents, PageRank is still an important signal that feeds into the quality score.

The Google engineer defines PageRank as: “PageRank is a single signal about distance from a known good source and is used as an input to the Quality score.”

This shows that link-distance algorithms are still important. These algorithms start with a set of authoritative “seed” sites for a specific topic and calculate each other site’s link distance from them: sites farther from the seed set are treated as less reliable, while sites closer to it tend to be more authoritative and trustworthy.
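A minimal sketch of this kind of calculation is a breadth-first search from the seed set, where a smaller hop distance suggests higher reliability. The toy graph, the seed choice, and the idea that distance maps directly to trust are all illustrative assumptions.

```python
# Minimal link-distance sketch: breadth-first search outward from a set
# of trusted seed sites. Smaller hop distance = closer to the seeds,
# which the documents suggest correlates with reliability.

from collections import deque

def link_distances(graph: dict[str, list[str]], seeds: set[str]) -> dict[str, int]:
    """Return each reachable site's minimum link distance from any seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for neighbour in graph.get(site, []):
            if neighbour not in dist:  # first visit = shortest distance
                dist[neighbour] = dist[site] + 1
                queue.append(neighbour)
    return dist

web = {"seed.example": ["a.example"], "a.example": ["b.example"]}
print(link_distances(web, {"seed.example"}))
# {'seed.example': 0, 'a.example': 1, 'b.example': 2} -- farther = less trusted
```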

Chrome-Based Mysterious Popularity Signal

The court documents mention a popularity signal that is redacted but uses Chrome data. This suggests that Google may use data collected from the Chrome browser as a ranking factor.

This information provides a new perspective for SEO professionals. The claim that the Chrome API leak relates to real ranking factors seems plausible, but many SEO experts believe those APIs are developer-focused tools used to display performance metrics like Core Web Vitals in the Chrome DevTools interface.

Navboost: System Measuring Click Behaviours

The documents also mention a system called “Navboost.” This system measures how often users click on a document for a particular query and uses data from the last 13 months.

According to Dr. Eric Lehman’s testimony, “Navboost is not a machine learning system. It’s just a big table. For this document — sorry, for this search query, this document got two clicks. For this query, this document got three clicks, and so on. And it’s aggregated, there’s some extra data. But you can just think of it as a giant table.”
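Lehman’s “giant table” description maps naturally onto a counter keyed by (query, document) pairs. The sketch below shows only that data shape; the 13-month window and the “extra data” he mentions are simplified away, and all example queries and URLs are invented.

```python
# Sketch of the "giant table" shape Lehman describes: click counts keyed
# by (query, document). The 13-month window and extra aggregated fields
# are omitted; this shows the data shape only.

from collections import Counter

navboost: Counter[tuple[str, str]] = Counter()

def record_click(query: str, document: str) -> None:
    """One row update: this document got one more click for this query."""
    navboost[(query, document)] += 1

record_click("best running shoes", "example.com/shoes")
record_click("best running shoes", "example.com/shoes")
record_click("best running shoes", "example.com/review")

print(navboost[("best running shoes", "example.com/shoes")])  # 2
```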

This information confirms the role user behaviours play in Google’s rankings and underscores the importance of SEO professionals focusing on click-through rate optimisation.
