During the State of Search keynote yesterday in Dallas, Texas, Gary Illyes shared a pretty incredible stat – Google knows about 120 trillion URLs, and of those URLs, 60% of them are duplicate.
Now keep in mind Illyes did not say these are all indexed and ranking – merely that 60% of the URLs Google knows about are duplicates. But this is still an incredible number – more than half of them are duplicates!
We regularly see Google kick duplicates out of the search results, either completely in cases of hardcore spam, or by hiding them until someone clicks the link at the bottom of the search results to reveal duplicates. Other duplicates get removed via a DMCA request filed with Google to take stolen content out of the search results.
Google doesn’t tend to release crawl stats very often, so it is nice to hear that Google knows about 120 trillion URLs, and that 60% of those known URLs are duplicates.
Google knows about 120 trillion URLs, and 60% of them are duplicates o.O @methode #StateofSearch
— Jennifer Slegg (@jenstar) November 16, 2015
Jennifer Slegg
Reuben Yau says
I wonder if they also classify unintentional duplicates, like URLs with utm params, as part of that 60%.
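To illustrate the kind of unintentional duplicate mentioned in this comment, here is a minimal sketch of how URLs that differ only in tracking parameters can collapse to a single address. It is an illustration only, not a description of how Google handles duplicates; the function name `strip_utm_params` and the `example.com` URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_utm_params(url: str) -> str:
    """Return the URL with any query parameter starting with 'utm_' removed.

    Hypothetical helper for illustration; real canonicalization involves
    many more signals than tracking parameters.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the utm_* tracking ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Three distinct URLs (assumed examples) that all point at the same page.
urls = [
    "https://example.com/post?utm_source=twitter&utm_medium=social",
    "https://example.com/post?utm_source=newsletter",
    "https://example.com/post",
]

# After stripping tracking parameters, the set holds one unique URL.
unique = {strip_utm_params(u) for u in urls}
```

Here three known URLs reduce to one unique page, so two of the three would count as duplicates under this simple rule.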