Originality.ai has done a study to determine how websites have responded since OpenAI shared details, earlier this month, on how to block its GPTBot. To identify which sites blocked GPTBot within the first two weeks, they analyzed the top global 1000 websites. Interesting insights include (directly quoted):
- In the first 14 days since GPTBot documentation was launched almost 10% of the top 1000 websites in the world have chosen to block GPTBot.
- OpenAI launched GPTBot on August 7th and shortly after the first “Top 100” website to block GPTBot was Reuters.com.
- Within the first 2 weeks of launching GPTBot these are the biggest websites in the world that had blocked GPTBot from accessing its site: Amazon.com, Quaroa.com, NYTimes.com, Shutterstock.com, Wikihow.com, and CNN.com.
Provided is a list of websites that are blocking GPTBot & CCBot, as of August 24, 2023. Some of the sites are blocking GPTBot and not Common Crawls’ CCBot, others are blocking CCBot and not GPTBot, with some blocking both. Author of the study and Founder/CEO of Originality.ai, Jonathan Gillham, notes that “GPTBot does not remove the knowledge that LLM’s have gained by training on existing web content they have already accessed.”
Of interest to business researchers, in addition to Reuters, NYTimes and CNN, these are news sites that are blocking ChatGPT:
- businessinsider.com
- bloomberg.com
- pcmag.com
- theverge.com
- prnewswire.com
- mashable.com
- timesofindia.com
- fool.com
- chicagotribune.com
Image by Niek Verlaan from Pixabay