SEO Lessons From the Yandex Leak

Yandex is not Google, but SEO experts can learn a lot about how a modern search engine is built by studying this code.

Mike King, founder of the digital marketing agency iPullRank, has published a comprehensive blog post about the leaked Yandex code. I made a summary for you.

It Is Not Google’s Code, So Why Do We Care?

Some believe that examining this codebase is a distraction and that nothing in it should influence their SEO decisions. Of course, Yandex is not Google. However, both are state-of-the-art web search engines that remain at the cutting edge of technology.

Software engineers from both companies attend the same conferences (SIGIR, ECIR, etc.) and share findings and innovations in Information Retrieval, Natural Language Processing/Understanding, and Machine Learning. Yandex had a presence in Palo Alto, and Google previously had one in Moscow.

A quick LinkedIn search reveals several hundred engineers who have worked at both companies, though we don’t know how many are working on Search at either company.

In a more direct overlap, Yandex also uses open-source technologies critical to innovations in Search, such as Google’s TensorFlow, BERT, MapReduce, and, to a much lesser extent, Protocol Buffers.

So, while Yandex is definitely not Google, it’s not just some random research project we’re talking about here. There is a lot we can learn about how a modern search engine is built by examining this codebase.

The Leaked Code Has 17,854 Ranking Factors

A deep look at the codebase reveals that Yandex has a large number of ranking factor files for different subsets of its query processing and ranking systems.

When we scan them, we see that there are 17,854 ranking factors in total. These ranking factors include various measures of:

  • Clicks
  • Dwell Time
  • Data obtained using Metrika, Yandex’s equivalent of Google Analytics.

Yandex’s Top Priority Negative Ranking Factors

In summary, these factors suggest that to get the best score, you should:

  • Avoid ads
  • Update old content instead of creating new pages
  • Make sure that most of the backlinks to your site have branded anchor text

Yandex’s Top Priority Positive Ranking Factors

For your rankings to be positively affected, you must:

  • Play keyword games with your domain name
  • Make sure your domain is a .com
  • Encourage people to search for your target keywords in the Yandex Bar
  • Keep getting clicks

There Are Many Unexpected Initial Ranking Factors

The more interesting of the initially weighted ranking factors are the unexpected ones. Below is a list of seventeen factors that stand out.

FI_PAGE_RANK: +0.1828678331 — PageRank is Yandex’s 17th highest weighted factor. They had previously completely removed backlinks from their ranking system, so it’s not surprising that it’s this low on the list.

FI_SPAM_KARMA: +0.00842682963 — Spam karma gets its name from “antispammers” and is the probability that the host is spam, based on WHOIS information.

FI_SUBQUERY_THEME_MATCH_A: +0.1786465163 — How closely the query and document match thematically. It is the 19th highest weighted factor.

FI_REG_HOST_RANK: +0.1567124399 — Yandex has a host (or domain) ranking factor.

FI_URL_LINK_PERCENT: +0.08940421124 — The ratio of links with URL (rather than text) to the total number of links.

FI_PAGE_RANK_UKR: +0.08712279101 — Has a specific Ukraine PageRank

FI_IS_NOT_RU: +0.08128946612 — It is a positive thing that the domain name is not .RU. The Russian search engine doesn’t trust Russian sites 🙂

FI_YABAR_HOST_AVG_TIME2: +0.07417219313 — This is the average dwell time reported by YandexBar.

FI_LERF_LR_LOG_RELEV: +0.06059448504 — This is link relevance based on the quality of each link.

FI_NUM_SLASHES: …9417 — The number of slashes in the URL.

FI_ADV_PRONOUNS_PORTION: -0.001250755075 — The proportion of pronouns on the page.

FI_TEXT_HEAD_SYN: -0.01291908335 — Presence of [query] words in the title, taking into account synonyms.

FI_PERCENT_FREQ_WORDS: -0.02021022114 — The ratio of the number of words, which are the 200 most frequently used words of the language, to the total number of words in the text.

FI_YANDEX_ADV: -0.09426121965 — In keeping with its dislike of ads, Yandex penalizes pages that contain Yandex ads.

FI_AURA_DOC_LOG_SHARED: -0.09768630485 — The logarithm of the number of non-unique text fields in the document.

FI_AURA_DOC_LOG_AUTHOR: -0.09727752961 — The logarithm of the number of text fields for which this document owner is recognized as the author.

FI_CLASSIF_IS_SHOP: -0.1339319854 — Apparently, Yandex will pay less attention to you if your page is a store.
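To compare the listed weights at a glance, they can be loaded into a small table and sorted. The Python sketch below copies the weights from the list above; the helper name `split_by_sign` is hypothetical, and the sketch only orders the published numbers. It makes no claim about how Yandex combines them into a final score.

```python
# Weights copied from the factor list above. Only factors with a complete
# weight in the list are included.
FACTOR_WEIGHTS = {
    "FI_PAGE_RANK": 0.1828678331,
    "FI_SPAM_KARMA": 0.00842682963,
    "FI_SUBQUERY_THEME_MATCH_A": 0.1786465163,
    "FI_REG_HOST_RANK": 0.1567124399,
    "FI_URL_LINK_PERCENT": 0.08940421124,
    "FI_PAGE_RANK_UKR": 0.08712279101,
    "FI_IS_NOT_RU": 0.08128946612,
    "FI_YABAR_HOST_AVG_TIME2": 0.07417219313,
    "FI_LERF_LR_LOG_RELEV": 0.06059448504,
    "FI_ADV_PRONOUNS_PORTION": -0.001250755075,
    "FI_TEXT_HEAD_SYN": -0.01291908335,
    "FI_PERCENT_FREQ_WORDS": -0.02021022114,
    "FI_YANDEX_ADV": -0.09426121965,
    "FI_AURA_DOC_LOG_SHARED": -0.09768630485,
    "FI_AURA_DOC_LOG_AUTHOR": -0.09727752961,
    "FI_CLASSIF_IS_SHOP": -0.1339319854,
}


def split_by_sign(weights):
    """Hypothetical helper: return (positive, negative) lists of
    (name, weight) pairs, each ordered from strongest to weakest."""
    positive = sorted(
        ((name, w) for name, w in weights.items() if w > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    negative = sorted(
        ((name, w) for name, w in weights.items() if w < 0),
        key=lambda pair: pair[1],
    )
    return positive, negative


positive, negative = split_by_sign(FACTOR_WEIGHTS)

print("Positive factors, strongest first:")
for name, weight in positive:
    print(f"  {name}: {weight:+.4f}")

print("Negative factors, strongest first:")
for name, weight in negative:
    print(f"  {name}: {weight:+.4f}")
```

Sorted this way, FI_PAGE_RANK leads the positive group and FI_CLASSIF_IS_SHOP leads the negative group among the factors shown here.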

When we examine these strange ranking factors and the factors available in the Yandex codebase, we see that many things could be ranking factors.

Mike King suspects that the “200 signals” that Google reports are 200 signal classes, and each signal combines many other components. According to King, just as Google Analytics has dimensions associated with many metrics, Google Search probably has classes of ranking signals consisting of many attributes.

Chris Long has also pointed out that Yandex prioritizes content close to the homepage.
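One rough way to check how close your own pages sit to the homepage is to count the slashes in each URL path, which is what the factor name FI_NUM_SLASHES suggests. The Python sketch below is an assumption about how such a count could be made: the function name `url_depth` and the example.com URLs are hypothetical, and the leak summary does not say how Yandex itself counts slashes.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Hypothetical helper: count the slashes in the URL path,
    ignoring a trailing slash, as a rough distance from the homepage."""
    path = urlparse(url).path
    return path.rstrip("/").count("/")


# Hypothetical example URLs, ordered from shallow to deep.
for url in [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/blog/2023/yandex-leak",
]:
    print(url_depth(url), url)
```

Running this over a crawl export would let you flag important pages that sit many levels below the homepage.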

Yandex Scrapes Google, Bing, YouTube, and TikTok!

The codebase also reveals that Yandex has many parsers for other websites and their related services, alongside parsers for its own services.

What Can We Add to What We Know About Google from the Yandex Leak?

Naturally, this is still the question on everyone’s mind. While there are certainly many similarities between Yandex and Google, the truth is that only a Google Software Engineer working on Search can definitively answer this question.

Still, this is the wrong question.

Instead, this code should help us expand our thinking about modern search. Much of the collective understanding of search comes from what the SEO community learned through testing in the early 2000s and from the mouths of search engineers when search was much less opaque. Unfortunately, this understanding hasn’t kept up with the fast pace of innovation.

The insights from the Yandex leak’s many features and ranking factors should yield more hypotheses that need to be tested and considered for ranking in Google. They should also offer more that can be parsed and measured by SEO crawling, link analysis, and ranking tools.


