Artificial intelligence powerhouse OpenAI has quietly shut down its AI detection software, citing a low accuracy rate.
The AI classifier developed by OpenAI launched on January 31 and was meant to help users, such as teachers and professors, distinguish human-written text from AI-generated text.
However, according to the original blog post that announced the tool, the AI classifier was discontinued on July 20:
“As of July 20, 2023, the AI classifier is no longer available due to its low accuracy rate.”
The link to the tool no longer works, and the note gave only a brief explanation for the shutdown. The company did say, however, that it was looking for new, more effective ways to identify AI-generated content.
“We are working to integrate feedback and are currently researching more effective provenance techniques for text, and are committed to developing and deploying mechanisms that allow users to understand whether audio or visual content is AI-generated,” the note reads.
From the outset, OpenAI made it clear that the detection tool was error-prone and could not be considered “fully reliable”.
The company said limitations of its AI detection tool included being “very inaccurate” when checking text under 1,000 characters and that it could “confidently” label human-written text as AI-generated.
The classifier is the latest of OpenAI’s products to come under scrutiny.
On July 18, researchers from Stanford and UC Berkeley published a study that found that OpenAI’s flagship ChatGPT got significantly worse with age.
We evaluated #ChatGPT over time and found substantial differences in its answers to the *same questions* between the June release of GPT4 and GPT3.5 and the March releases. Newer versions have made some tasks worse. with Lingjiao Chen @matei_zaharia https://t.co/TGeN4T18Fd https://t.co/36mjnejERy pic.twitter.com/FEiqrUVbg6
—James Zou (@james_y_zou) July 19, 2023
The researchers found that between March and June, GPT-4’s ability to accurately identify prime numbers had dropped from 97.6% to just 2.4%. Additionally, both GPT-3.5 and GPT-4 showed a significant drop in their ability to generate new lines of code.