OpenAI launches web crawler GPTBot, with instructions on how to block it

News 2024-09-23 18:21:27

OpenAI has launched a web crawler to improve artificial intelligence models like GPT-4.

Called GPTBot, the system combs through the Internet to train and enhance AI's capabilities. Using GPTBot has the potential to improve existing AI models when it comes to aspects like accuracy and safety, according to a blog post by OpenAI.

"Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies," reads the post.

Website operators can, however, restrict GPTBot's access to their sites, either partially or entirely. OpenAI said operators can do so by blocking the crawler's IP addresses or by adding rules to the site's robots.txt file.


Previously, OpenAI has landed in hot water for how it collects data and for things like copyright infringement and privacy breaches. This past June, the AI platform was sued for "stealing" personal data to train ChatGPT.


OpenAI only recently implemented opt-out functions, such as the ability to disable chat history, which give users more control over what personal data can be accessed.

GPT-3.5 and GPT-4, the models behind ChatGPT, were trained on online data and text dating up to September 2021. There is currently no way to remove content from that dataset.


How to prevent GPTBot from using your website's content

According to OpenAI, you can disallow GPTBot by adding it to your site's robots.txt, a plain text file that tells web crawlers which parts of a website they can and cannot access.

The code for disallowing GPTBot from your site. Credit: Screenshot / OpenAI.
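
Since the screenshot is not reproduced here, the following is a minimal sketch of what a site-wide opt-out looks like and how it could be checked locally. The two robots.txt directives are the ones OpenAI documents for GPTBot; the helper name `is_allowed`, the `example.com` URL and the `SomeOtherBot` agent are illustrative assumptions. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules that opt the whole site out of GPTBot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    `is_allowed` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/any/page"))        # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page"))  # True
```

The rules only name GPTBot, so other crawlers are unaffected. Note that robots.txt is a request that well-behaved crawlers honor, not an access control.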

You can also customize what parts a web crawler can use, allowing certain pages and disallowing others.

The code for disallowing or allowing GPTBot from your site's pages. Credit: Screenshot / OpenAI.
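
As a rough illustration of partial access, the sketch below allows GPTBot into one directory and blocks it from another, then checks a few paths with Python's standard `urllib.robotparser`. The directory names `/directory-1/` and `/directory-2/`, the `example.com` URLs and the helper `gptbot_can_fetch` are placeholders chosen for this example, not values taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules that give GPTBot partial access.
# The directory names are placeholders.
PARTIAL_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if GPTBot may fetch `url` under the given rules.

    `gptbot_can_fetch` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder domain
    print(gptbot_can_fetch(PARTIAL_ROBOTS_TXT, base + "/directory-1/post.html"))  # True
    print(gptbot_can_fetch(PARTIAL_ROBOTS_TXT, base + "/directory-2/post.html"))  # False
    print(gptbot_can_fetch(PARTIAL_ROBOTS_TXT, base + "/elsewhere.html"))         # True
```

Paths that match no rule remain allowed by default, so anything you want kept out needs its own `Disallow` line.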