OpenAI’s ChatGPT, launched in November 2022, has been put to a wide range of writing tasks. But despite its human-like fluency, the chatbot’s factual errors can have serious consequences in academic writing. To address this problem, a team of researchers at the University of Kansas has developed a tool that distinguishes AI-generated from human-written academic text with better than 99 percent accuracy. The tool was described on 7 June in the journal Cell Reports Physical Science.
Heather Desaire, the paper’s lead author and a chemistry professor at the University of Kansas, says she built the identification tool because text generators like ChatGPT, however impressive their output, cannot reliably produce accurate information. That is a particular concern in science, she notes, where knowledge accumulates communally and inaccurate claims can be hard to distinguish from factual ones.
Chatbots like ChatGPT are trained on vast samples of real text, which makes their writing convincingly humanlike. Even so, existing machine-learning detectors can pick up telltale signs of AI involvement, such as less emotional language. But those detectors, including the popular deep-learning model RoBERTa, are poorly suited to academic writing, where emotional language is scarce to begin with: previous studies found that RoBERTa identified AI-generated academic abstracts with only about 80 percent accuracy.
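Running such a detector takes only a few lines of code. Here is a minimal sketch using the Hugging Face transformers library; the checkpoint name and its “Real”/“Fake” output labels describe one publicly released RoBERTa detector and are assumptions, not the exact model evaluated in the study.

```python
# Query an off-the-shelf RoBERTa-based detector (assumed checkpoint,
# not the exact model from the study).
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",  # assumed checkpoint
)

paragraph = (
    "The reported synthesis proceeds in two steps and achieves a 62 percent "
    "yield, a notable improvement over earlier routes."
)
result = detector(paragraph)[0]
print(f"{result['label']}: {result['score']:.3f}")  # e.g. "Real: 0.974"
```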
To close this gap, Desaire and her team built a machine-learning tool that needs only minimal training data. They collected 64 Perspectives articles from the journal Science, in which scientists comment on new research, and used them to prompt ChatGPT into generating 128 samples, 1,276 paragraphs of text in all, on which the tool was trained.
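The paper’s exact feature set and classifier are not reproduced here, but the general recipe, turning each paragraph into a handful of numeric features and fitting a small model, can be sketched as follows. The four features, the logistic-regression stand-in, and the two placeholder training paragraphs are all illustrative assumptions.

```python
# General recipe: hand-crafted surface features per paragraph, fed to a
# small classifier. Features and model are illustrative stand-ins.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def paragraph_features(text: str) -> list[float]:
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    lengths = [len(s.split()) for s in sentences]
    return [
        float(np.mean(lengths)),                   # average sentence length
        float(np.std(lengths)),                    # variation in sentence length
        float(text.count(";") + text.count(":")),  # punctuation variety
        float(len(re.findall(r"\d", text))),       # digits (data, citations)
    ]

human_paragraphs = [  # placeholder for paragraphs from the 64 Science articles
    "Smith and colleagues report a striking 40 percent gain; their data, "
    "shown in figure 2, raise an obvious question. Why does this work at all?",
]
ai_paragraphs = [  # placeholder for paragraphs from the 128 ChatGPT samples
    "Researchers have developed a new method that improves performance. "
    "The method is efficient and reliable. It has many potential applications.",
]

X = [paragraph_features(p) for p in human_paragraphs + ai_paragraphs]
y = [0] * len(human_paragraphs) + [1] * len(ai_paragraphs)  # 1 = AI-generated
clf = LogisticRegression(max_iter=1000).fit(X, y)
```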
After optimizing the model, the researchers evaluated it on two test sets, each consisting of 30 human-written articles and 60 ChatGPT-generated ones. The new model was 100 percent accurate when judging full articles and 97 to 99 percent accurate when judging only the first paragraph of each article. RoBERTa, by comparison, managed only 85 and 88 percent accuracy on the same test sets.
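Why would full articles be easier to judge than single paragraphs? One plausible reason is that per-paragraph predictions can be pooled. The sketch below, which reuses clf and paragraph_features from the training sketch above, illustrates one assumed pooling scheme, a simple majority vote; the paper’s actual aggregation may differ.

```python
# Score whole articles by majority vote over paragraph-level predictions,
# then compute test-set accuracy. The voting scheme is an assumption;
# `clf` and `paragraph_features` come from the training sketch above.

def classify_article(paragraphs: list[str]) -> int:
    """Return 1 (AI-generated) or 0 (human-written) by majority vote."""
    votes = clf.predict([paragraph_features(p) for p in paragraphs])
    return int(votes.mean() >= 0.5)

def test_accuracy(articles: list[list[str]], labels: list[int]) -> float:
    """Fraction of articles whose pooled prediction matches its label."""
    hits = sum(classify_article(a) == y for a, y in zip(articles, labels))
    return hits / len(labels)
```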
The team’s analysis surfaced several traits that set AI writing apart, including sentence length and complexity. Human writers were also more likely to mention colleagues by name, whereas ChatGPT favored general terms like “researchers” or “others.” Overall, the AI-generated papers read as less engaging and more monotonous than human-written ones.
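Two of those cues are easy to make concrete. The heuristics below are rough illustrations, not the paper’s actual measures:

```python
# Rough heuristics for two reported cues: human authors tend to name
# colleagues, while ChatGPT leans on generic terms such as "researchers"
# or "others". These regexes are assumptions, not the paper's measures.
import re

GENERIC_TERMS = re.compile(r"\b(researchers|scientists|others|experts)\b", re.I)
NAMED_MENTION = re.compile(r"\b[A-Z][a-z]+ (?:et al\.|and colleagues)")

def name_vs_generic(text: str) -> tuple[int, int]:
    """Count named-colleague mentions versus generic references."""
    return len(NAMED_MENTION.findall(text)), len(GENERIC_TERMS.findall(text))

print(name_vs_generic("Desaire and colleagues built the tool; researchers agree."))
# -> (1, 1)
```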
The researchers see their work as proof that readily available tools can identify AI-generated text without extensive machine-learning expertise. They caution, however, that the results may apply only to a narrow slice of the academic writing ChatGPT can produce. AI writing may be harder to detect, for instance, when ChatGPT is explicitly asked to imitate the style of a particular human sample.
Desaire believes AI like ChatGPT can be used ethically, but she emphasizes that identification tools must evolve alongside the technology to keep its deployment responsible. She suggests AI could be used safely and effectively much like spell-checking, by letting it make a final revision for clarity on a nearly complete draft, provided the result is rigorously fact-checked to catch any inaccuracies the model introduces.
The whytry.ai article you just read is a brief synopsis; the original article can be found here: Read the Full Article…