Google Blacklists Millions of Pirate URLs Before They’re Indexed
Google maintains a rapidly growing list of copyright-infringing URLs that it hasn’t indexed yet. This blacklist ensures that these links are never added to the search engine. Thanks to a new update in the transparency report, we now know how many non-indexed links each takedown notice includes, and the number is surprisingly high in some cases.
In recent years, Google has had to cope with a continuous increase in takedown requests which target pirate sites in search results.
The total number of ‘removed’ URLs just reached 3.5 billion and millions more are added every day.
While that’s nothing new, Google just started sharing some additional insight into the nature of these requests.
As it turns out, millions, if not hundreds of millions, of the links copyright holders target have never appeared in Google’s search index.
Earlier this year Google copyright counsel Caleb Donaldson revealed that the company had started to block non-indexed links ‘prophylactically.’ In other words, Google blocks URLs before they appear in the search results, as some sort of piracy vaccine.
“Google has critically expanded notice and takedown in another important way: We accept notices for URLs that are not even in our index in the first place. That way, we can collect information even about pages and domains we have not yet crawled,” Donaldson noted.
“We process these URLs as we do the others. Once one of these not-in-index URLs is approved for takedown, we prophylactically block it from appearing in our Search results,” he added.
Until recently, Google provided no easy way to see how many links in a request were not indexed, but that has now changed.
Over the past week or so the search engine added a new signal to its DMCA transparency report listing how many of the submitted URLs in a notice are not indexed yet. In some cases, this is the vast majority.
Take the Mexican branch of the anti-piracy group APDIF, for example. This organization is one of the most active DMCA reporters and asked Google to remove over a million URLs last week alone.
As can be seen below, the majority of the links appear to be non-indexed. We browsed through dozens of recent listings from APDIF, and in most cases over 90% of the submitted URLs are not in Google’s search results.
Google now reporting non-indexed takedown requests
These URLs are obviously not removed since they weren’t listed. According to the company’s earlier statement, they are put on a separate blocklist instead, which prevents them from being added in the future.
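To make the mechanism concrete, here is a minimal sketch of how such a preemptive blocklist could behave. It is a toy model and not Google’s actual implementation: the `SearchIndex` class, its `crawl` and `process_notice` methods, and the example URLs are all hypothetical. It also assumes that removed URLs are kept on the same blocklist as non-indexed ones, which Google’s statement does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Toy model of a search index with a preemptive blocklist (hypothetical)."""

    indexed: set[str] = field(default_factory=set)
    blocklist: set[str] = field(default_factory=set)

    def crawl(self, url: str) -> bool:
        """Add a crawled URL to the index unless it is on the blocklist."""
        if url in self.blocklist:
            return False
        self.indexed.add(url)
        return True

    def process_notice(self, urls: list[str]) -> dict[str, int]:
        """Apply an approved takedown notice and count each outcome."""
        removed = 0
        not_indexed = 0
        for url in set(urls):  # ignore duplicate URLs within one notice
            if url in self.indexed:
                # The URL was listed, so it is actually removed.
                self.indexed.discard(url)
                removed += 1
            else:
                # The URL was never listed, so there is nothing to remove.
                not_indexed += 1
            # Assumption: both kinds are blocked from being indexed later.
            self.blocklist.add(url)
        return {"removed": removed, "not_indexed": not_indexed}
```

In this model, a notice with one indexed and two non-indexed URLs returns one removal and two non-indexed entries, and a later `crawl` call for any of the three returns `False`. Dividing `not_indexed` by the total number of URLs gives the kind of per-notice share that the transparency report now exposes.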
APDIF is not the only reporter that does this though. Rivendell, the most active sender of all, also has a high rate of non-indexed links, often well over 50%.
Not all reporting agencies have rates as high as APDIF’s. However, it is clear that millions of non-indexed pirate URLs are added to the preemptive blocklist every month.
Technically, the DMCA takedown process is meant for links and content which actually exist on a service, but it appears that Google doesn’t mind going a step further.
TorrentFreak reached out to the search giant several days ago, hoping to find out what percentage of the overall requests are not in Google’s search results, but at the time of writing, we have yet to hear back.