(Toronto, November 17, 2025) A groundbreaking study published in JMIR Mental Health has identified a significant threat to research integrity posed by large language models (LLMs) such as GPT-4o. The study shows that these AI tools frequently fabricate bibliographic citations, raising serious concerns about their reliability in academic settings and underscoring the urgent need for stringent human verification and institutional safeguards, particularly in specialized areas of mental health research.
According to the study, a staggering 19.9% of citations generated by GPT-4o in simulated literature reviews were entirely fabricated and could not be traced to any real publication. Among the citations that did correspond to real publications, 45.4% contained bibliographic errors, such as incorrect or invalid Digital Object Identifiers (DOIs).
Impact on Academic Integrity
This research arrives at a critical juncture as academic journals increasingly encounter AI-generated references that are either fabricated or erroneous. Such issues are not mere formatting errors; they disrupt the chain of verifiability, mislead readers, and fundamentally compromise the trustworthiness of scientific results, making meticulous scrutiny and verification essential to upholding academic rigor.
Reliability Influenced by Topic Familiarity
The study, led by Dr. Jake Linardon from Deakin University, assessed GPT-4o’s citation reliability across mental health topics with varying degrees of public awareness and scientific maturity. These included major depressive disorder (high familiarity), binge eating disorder (moderate familiarity), and body dysmorphic disorder (low familiarity). The research also compared general versus specialized review prompts, such as those focusing on digital interventions.
Fabrication rates were notably higher for less familiar topics, with binge eating disorder at 28% and body dysmorphic disorder at 29%, compared to just 6% for major depressive disorder.
Furthermore, specialized topics, particularly those involving digital interventions, exhibited higher fabrication rates compared to general overviews, highlighting a specific vulnerability in niche research areas.
Urgent Call for Oversight
The study’s conclusions issue a stark warning to the academic community: Citation fabrication and errors are prevalent in GPT-4o outputs. The reliability of LLM-generated citations is not static but varies depending on the topic and the specificity of the prompt used.
Key Recommendations
- Rigorous Verification is Mandatory: Researchers and students must verify every LLM-generated reference against the original source to confirm its accuracy and authenticity.
- Journal and Institutional Role: Journal editors and publishers should implement robust safeguards, for example, detection software that flags citations that cannot be matched to indexed records as possible hallucinations (a minimal citation-checking sketch follows this list).
- Policy and Training: Academic institutions need to develop clear policies and training programs to equip users with the skills to critically assess LLM outputs and design strategic prompts, especially for less visible or highly specialized research topics.
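To make the verification and detection recommendations concrete, the following is a minimal sketch, not tooling from the study itself, of how a reviewer or editor might check an LLM-supplied DOI against the public Crossref REST API and flag citations that cannot be matched. The example DOI, the claimed title, and the simple title-matching rule are assumptions for illustration only; real workflows would also query other registries (such as DataCite) and compare authors, journal, and year.

```python
# Minimal illustration of automated citation checking, assuming Python 3.10+
# and the "requests" library. The Crossref REST API endpoint is real
# (https://api.crossref.org/works/{doi}); the example DOI, title, and the
# crude title-matching rule below are assumptions for illustration only.
import requests


def lookup_doi(doi: str, timeout: float = 10.0) -> dict | None:
    """Return the Crossref record for a DOI, or None if it is not registered."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=timeout)
    if response.status_code == 200:
        return response.json()["message"]
    return None  # 404 (or other error): Crossref has no record of this DOI


def flag_citation(doi: str, claimed_title: str) -> str:
    """Compare an LLM-supplied citation against the registered Crossref record."""
    record = lookup_doi(doi)
    if record is None:
        # Note: DOIs registered with other agencies (e.g., DataCite) also
        # return None here and would need a separate lookup before flagging.
        return "FLAG: DOI not found in Crossref (possible fabrication)"
    registered_title = (record.get("title") or [""])[0].lower()
    claimed = claimed_title.lower()
    if claimed not in registered_title and registered_title not in claimed:
        return "FLAG: DOI resolves, but the registered title differs from the cited one"
    return "OK: DOI and title match a registered record"


if __name__ == "__main__":
    # Hypothetical reference taken from an LLM-generated bibliography.
    print(flag_citation("10.2196/00000", "Digital interventions for binge eating disorder"))
```

Such a check can only catch bibliographic mismatches; a DOI and title that match a registered record still do not guarantee that the cited work supports the claim attributed to it, so human verification remains essential.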
Looking Forward
The findings from this study highlight a pivotal moment for the integration of AI in academic research. As AI tools become more prevalent, the academic community must adapt by implementing rigorous verification processes and developing robust policies to safeguard research integrity. This will require collaboration among researchers, publishers, and institutions to ensure that AI serves as a reliable tool rather than a source of misinformation.
For further details, the original study by Linardon et al. can be accessed in JMIR Mental Health, providing a comprehensive analysis of the influence of topic familiarity and prompt specificity on citation fabrication in mental health research using LLMs.
About JMIR Publications: JMIR Publications is a leading open access publisher specializing in digital health research. Committed to advancing open science, JMIR partners with researchers to amplify their work and impact. For more information, visit jmirpublications.com.