
The secrets of Google’s algorithm leaked. Will SEO change from now on?


The world of SEO and digital marketing has been revolutionized by the alleged leak of more than 2,500 internal documents from Google’s Search API, which detail how the search engine operates and the parameters taken into account to “position” or rank a website in search results.

To date, the parameters Google uses in its algorithms to generate rankings and determine which content will rank above the rest had never been directly declared or detailed by Google.

Everything that was known came from the public guidelines, recommendations, and best practices provided by the search engine, together with the data of SEO professionals who, based on the results of their projects and experiments, share conclusions about which factors appear relevant and how much weight they carry.

But these documents, leaked and exposed by Rand Fishkin on the SparkToro blog on May 27, give us information that in many cases contradicts what Google has told us through its guidelines.

The leak provides a never-before-seen picture of Google’s internal systems, challenging several public claims the company has made for years about which parameters or data are or are not used to generate search rankings.

This information could redefine SEO strategies by showing in full which factors really influence search rankings, and it levels the playing field in some ways by putting detailed, written rules of the game on the table. For those of us who have been in the world of SEO for many years, though, these documents largely validate the strategies we were already carrying out: researching and understanding the user, identifying what they need, and creating the best possible content to satisfy those needs, along with technical analyses and optimizations to support it. There doesn’t seem to be much new information, but there are confirmations.

And although we still have to analyze and understand all the information in the leaked documents in detail, we can expect that in the coming days and weeks many reports will come out that clarify and summarize everything that has been exposed. Tables will surely soon emerge classifying all the parameters or attributes (more than 14,000) and their weight within the algorithm, based on the leaked Google documentation.

For now, here are the first conclusions from the information being extracted from the documents:

Use of User Click Data (Clickstream Data)

According to the documents, the first inconsistency with Google’s public statements arises here: although Google sources have always stated that user click information is not used for ranking, Google does appear to use click data to improve the accuracy of its search results.

This data comes from user behaviors, such as clicks on search results and subsequent browsing.

The leaked Google documentation contains references to modules that use click data for ranking, with attributes such as “goodClicks,” “badClicks,” and “lastLongestClicks,” among others, which are used to decide which results appear in the SERP.

This data is linked to systems such as NavBoost and Glue, which improve the accuracy of the results.

Google also apparently filters out unwanted clicks, measures click duration, and uses Chrome view data to calculate metrics and determine the most important URLs.
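To make this more concrete, here is a minimal Python sketch of how attributes like these could feed a re-ranking adjustment. Only the attribute names come from the leak; the formula, weights, and limits below are our own guesses, purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class ClickSignals:
    # Attribute names taken from the leaked documentation; their real
    # definitions, scales, and formulas are not public and are guessed here.
    good_clicks: int          # clicks followed by apparent satisfaction
    bad_clicks: int           # clicks followed by a quick return to the SERP
    last_longest_clicks: int  # clicks that ended the search session
    impressions: int

def navboost_adjustment(signals: ClickSignals) -> float:
    """Toy re-ranking multiplier in the spirit of NavBoost.

    Entirely hypothetical: rewards results whose clicks tend to be
    "good" or session-ending, and penalizes pogo-sticking.
    """
    if signals.impressions == 0:
        return 1.0
    good_rate = (signals.good_clicks + signals.last_longest_clicks) / signals.impressions
    bad_rate = signals.bad_clicks / signals.impressions
    # Clamp so click data nudges the base score instead of dominating it.
    return max(0.8, min(1.2, 1.0 + good_rate - bad_rate))

print(navboost_adjustment(ClickSignals(120, 30, 40, 1000)))  # -> 1.13
```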

In the SEO world, this makes a lot of sense and was considered a fairly clear way to gauge users’ interest in content and assign quality metrics to it. It is even something we do at SEOCOM Agency in our projects, so we can measure with concrete data which content is working well and is of interest to users.
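As an example of the kind of measurement we mean (this reflects our own approach, not Google’s internal pipeline), here is a small Python sketch that ranks URLs by CTR and a dwell-time proxy, using invented rows of the sort you could assemble from Search Console and analytics exports.

```python
# Hypothetical export rows: (url, impressions, clicks, avg_seconds_on_page).
rows = [
    ("/guide-seo", 5000, 400, 95.0),
    ("/blog/news", 8000, 240, 12.0),
]

def content_scorecard(rows):
    """Rank URLs by CTR and flag those whose visits look like 'long clicks'."""
    scored = []
    for url, impressions, clicks, avg_seconds in rows:
        ctr = clicks / impressions if impressions else 0.0
        # "Long click" proxy: visitors stayed long enough to actually read.
        long_click = avg_seconds >= 30.0
        scored.append((url, round(ctr, 3), long_click))
    return sorted(scored, key=lambda r: r[1], reverse=True)

for url, ctr, long_click in content_scorecard(rows):
    print(url, ctr, "long-click" if long_click else "short-click")
```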

Existence of a Domain or Site Authority score (siteAuthority)

This is another point that Google representatives have discussed and clarified on several occasions, stating that they did not use a ranking or score to rate the authority of domains.

However, the leaked documentation refers to siteAuthority, and although it is still unclear how it is calculated and how it affects the rankings, we can be confident that a numerical weighting exists to determine domain authority.
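To visualize what a numerical site-level score could mean in practice, here is a deliberately speculative Python sketch: the leak confirms that a siteAuthority attribute exists, but the linear blend and the weight below are invented for illustration only.

```python
def blended_score(page_score: float, site_authority: float,
                  authority_weight: float = 0.2) -> float:
    """Purely speculative: mix a page-level relevance score with a
    site-level authority score, both normalized to [0, 1].

    The leak does not say how (or how much) siteAuthority feeds into
    ranking; this linear blend and the 0.2 weight are illustrative.
    """
    return (1 - authority_weight) * page_score + authority_weight * site_authority

print(blended_score(page_score=0.70, site_authority=0.90))  # -> 0.74
```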

Chrome Browser Data Usage 

Apparently, information collected by Google Chrome is used, although this has always been denied.

It appears that page quality scores include a site-level measurement of views from Chrome, and sitelink generation could also be related to an attribute extracted from Chrome.

Existence of secondary algorithms to promote or lower rankings 

Reference is made to several secondary algorithms or functions whose names suggest they boost or demote results in the rankings, such as Quality Boost, Nav Boost, Nav Demotion, Real Time Boost, SERP Demotion, and Web Image Boost.
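Conceptually, these read like a pipeline of re-ranking passes applied on top of the base scores. Here is a toy Python sketch of such a pipeline; the function names echo those in the leak, but the logic inside them is entirely made up.

```python
# Hypothetical "twiddler" pipeline: each pass takes (url, score) pairs and
# returns adjusted scores. The real trigger conditions are unknown.
def quality_boost(results):
    return [(url, score * 1.10 if "guide" in url else score) for url, score in results]

def serp_demotion(results):
    return [(url, score * 0.85 if "spam" in url else score) for url, score in results]

PIPELINE = [quality_boost, serp_demotion]

def rerank(results):
    """Apply each adjustment in order, then sort by the final score."""
    for twiddler in PIPELINE:
        results = twiddler(results)
    return sorted(results, key=lambda r: r[1], reverse=True)

initial = [("/spam-page", 0.9), ("/guide-links", 0.8)]
print(rerank(initial))  # the boosted guide overtakes the demoted page
```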

Content authors and E-E-A-T

Google stores, as a parameter, information about the authors associated with a piece of content.

Additionally, it attempts to determine if an entity on the page is also the author of the page.

This shows us that, in some way, weight is being given to the authority and experience of content authors.
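As a very rough illustration of what matching an author to a page entity could involve, here is a naive Python sketch; Google’s actual entity resolution is certainly far more sophisticated, and the leak does not describe how it works.

```python
def author_is_page_entity(byline_author: str, page_entities: set) -> bool:
    """Naive check, for illustration only: does the byline author appear
    among the entities mentioned on the page?
    """
    normalized = byline_author.strip().lower()
    return any(normalized == e.strip().lower() for e in page_entities)

entities = {"Rand Fishkin", "Google", "SparkToro"}
print(author_is_page_entity("rand fishkin", entities))  # -> True
```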

Links are still an important parameter 

There is a lot of information showing the work done to understand the link graph and how content is linked. The weighting may have changed over time, with links given less importance or relevance (we still have to look into this), but we know they remain a valuable contribution to the rankings.
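As a reminder of how a link graph turns into a score, here is a compact Python implementation of classic PageRank, the algorithm Google published back in 1998. The leak does not reveal how links are weighted today, so treat this as background rather than current practice.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank over a {page: [outlinks]} graph."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            if outs:
                share = rank[page] / len(outs)  # split rank among outlinks
                for target in outs:
                    new_rank[target] += damping * share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(graph).items(), key=lambda x: -x[1]):
    print(page, round(score, 3))
```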

Content freshness 

Google places great importance on the freshness of content and uses various techniques to associate dates with pages, maintaining temporal relevance. In addition, reference is made to functions such as freshnessTwiddler, a system that modifies rankings based on the freshness of the content.
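To illustrate what a freshness adjustment might look like, here is a hypothetical Python sketch in which a score multiplier decays with document age. The leak names freshnessTwiddler but not its formula, so the exponential curve and half-life below are just one plausible choice.

```python
import math
from datetime import date

def freshness_multiplier(published: date, today: date,
                         half_life_days: float = 180.0) -> float:
    """Hypothetical freshness adjustment: the multiplier halves every
    `half_life_days`, so older pages are gradually demoted.
    """
    age = max((today - published).days, 0)
    return math.exp(-math.log(2) * age / half_life_days)

print(round(freshness_multiplier(date(2024, 1, 1), date(2024, 6, 29)), 2))  # -> 0.5
```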

Other mentions of parameters  

Reference is also made to many parameters covering things we already knew about, such as page titles, link anchor text, link quality, text length, and so on. The weight they currently carry will have to be confirmed.

And that is all we can tell you for now, but we will be paying close attention to the analysis (we already have team members reviewing the documentation in detail) so we can write more about it. What we do know is that this will cause controversy, because it validates what many SEOs in the community had been saying for years despite Google’s denials.