AI Tools Generating Over 4,000 Fake Citations in Biomedical Research Papers, New Audit Reveals

Surge in Fabricated References Threatens Scientific Integrity

🚨 The Alarming Discovery of Fabricated Citations in Biomedical Literature

In a groundbreaking investigation that has sent shockwaves through the scientific community, researchers at Columbia University School of Nursing uncovered a disturbing trend: artificial intelligence tools are infiltrating biomedical research by generating thousands of fake citations. The issue came to light through an exhaustive AI-assisted audit of 2.5 million peer-reviewed papers published between early 2023 and February 2026 in PubMed Central's open access subset. The audit found 4,046 citations to nonexistent studies scattered across 2,810 papers, a rate that has risen more than 12-fold in just a few years.

The study, led by Maxim Topaz, PhD, an associate professor at Columbia's School of Nursing and Data Science Institute, highlights how these hallucinations—plausible-sounding but entirely fabricated references—are slipping past peer review and into the literature. Fake citations mimic real ones, complete with authors, journals, and even DOIs that don't resolve to actual publications. This isn't just a technical glitch; it's a direct threat to the foundation of evidence-based medicine, where every reference serves as a pillar of credibility.

Biomedical research, encompassing fields like oncology, cardiology, and infectious diseases, relies heavily on systematic reviews and meta-analyses that aggregate prior studies. When fake citations appear here, they can propagate errors, misleading future researchers and clinicians alike. Topaz emphasized the real-world stakes: "A medical professional or clinical guideline developer has no way of knowing that the evidence they are relying on does not exist." One egregious example involved a paper where 18 out of 30 references were bogus, some already cited by others.

Understanding AI Hallucinations: How Fake Citations Are Born

Generative artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and its successors, excels at producing human-like text. However, these tools "hallucinate" when generating references. Asked to support a claim, an LLM might invent a study by blending patterns from its training data—real authors from one paper, a journal from another, and a fabricated title or year. The result? A reference that looks legitimate at a glance but vanishes upon verification.

This phenomenon isn't new to AI users. Early adopters reported LLMs citing nonexistent sources as far back as 2023, but the problem exploded mid-2024 as AI writing assistants became ubiquitous in academia. Researchers, under pressure to publish amid shrinking grant funding and tenure demands, increasingly turn to these tools for drafting manuscripts, polishing language, or even generating literature reviews. Step one: Input a topic. Step two: AI suggests citations. Step three: Copy-paste without checking—because who has time to verify dozens of DOIs?

The process is insidious. Unlike blatant plagiarism, these fakes pass initial scans. They evade simple plagiarism detectors and even some peer reviewers, who rarely deep-dive into reference lists. In higher education, where graduate students and early-career faculty often lead paper writing, this creates a perfect storm of unchecked AI output.

The Audit Methodology: AI Fighting Fire with Fire

To quantify the crisis, Topaz and colleagues—Nir Roguin, Pallavi Gupta, Zhihong Zhang, and Laura-Maria Peltonen—built an automated verification pipeline. They scanned 97.1 million references with resolvable identifiers (DOIs or PubMed IDs) from the 2.5 million papers. LLMs flagged mismatches between cited titles and actual linked publications, cross-checked against PubMed, Crossref, OpenAlex, and Google Scholar.
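The core check described above, comparing the title a paper cites against the title actually registered under the cited identifier, can be illustrated with a small sketch. This is not the audit's pipeline, which used LLMs and several databases. It is a minimal string-similarity stand-in. The function names, the 0.6 threshold, and the example titles are all assumptions made for illustration, and the registered title is passed in directly instead of being fetched from PubMed or Crossref.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(cited: str, registered: str) -> float:
    """Return a similarity ratio between 0 and 1 for two normalized titles."""
    return SequenceMatcher(
        None, normalize_title(cited), normalize_title(registered)
    ).ratio()


def flag_mismatch(cited: str, registered: str, threshold: float = 0.6) -> bool:
    """Flag a reference whose cited title differs strongly from the registered one.

    The 0.6 threshold is an arbitrary illustration, not a value from the audit.
    """
    return title_similarity(cited, registered) < threshold


# Hypothetical example data: `registered_*` stands in for the title that a
# DOI or PubMed ID lookup would return for the cited identifier.
cited = "Early Mobilisation After Bowel Surgery: A Randomised Trial"
registered_same = "Early mobilisation after bowel surgery: a randomised trial."
registered_other = "Zinc for colds"

print(flag_mismatch(cited, registered_same))   # False: titles agree
print(flag_mismatch(cited, registered_other))  # True: identifier points elsewhere
```

A mismatch flagged this way is only a candidate. Parsing errors and title variants produce false alarms, which is why the audit layered filters and manual review on top.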

Filters minimized false positives: pattern detection culled parsing errors, and manual review of 500 samples by three experts confirmed 70% accuracy in flagging true fabrications. The conservative approach likely underestimates the problem, as 23% of references lacked verifiable IDs and paywalled papers were excluded.

This innovative use of AI to detect AI-generated errors marks a turning point. As Kathryn Weber-Boer of Digital Science noted, it's a "solid first contribution," though human verification remains crucial due to databases like Google Scholar occasionally indexing ghosts.

Diagram of the AI-assisted audit pipeline for detecting fake citations in biomedical papers

Shocking Statistics and Trends: A 12-Fold Explosion

The numbers paint a grim picture. In 2023, fake citations hovered at 4 per 10,000 papers. By early 2026, that surged to 57 per 10,000—one in every 277 PubMed-indexed papers. Review articles fared worst, with a 57% higher rate, as they compile vast reference lists ripe for AI assistance.

91% of affected papers had 1-2 fakes, likely inadvertent.
246 papers had 3+ fakes, raising misconduct flags.
98.4% saw no publisher action—no corrections, no retractions.
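The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this article. The paper count is the rounded "2.5 million", so every derived value is approximate, and the variable names are illustrative.

```python
# Figures quoted in the article. The paper count is rounded, so all
# derived values below are approximations.
PAPERS_AUDITED = 2_500_000
REFERENCES_CHECKED = 97_100_000
FAKE_CITATIONS = 4_046
AFFECTED_PAPERS = 2_810


def per_10k(count: float, total: float) -> float:
    """Express count/total as a rate per 10,000."""
    return count / total * 10_000


# Whole-period averages across the audited sample.
fakes_per_10k_papers = per_10k(FAKE_CITATIONS, PAPERS_AUDITED)
affected_per_10k = per_10k(AFFECTED_PAPERS, PAPERS_AUDITED)
fakes_per_affected_paper = FAKE_CITATIONS / AFFECTED_PAPERS
refs_per_paper = REFERENCES_CHECKED / PAPERS_AUDITED

# Growth between the two quoted rates (4 and 57 per 10,000 papers).
fold_change = 57 / 4

print(f"Fake citations per 10,000 papers (whole period): {fakes_per_10k_papers:.1f}")
print(f"Affected papers per 10,000 (whole period): {affected_per_10k:.2f}")
print(f"Fake citations per affected paper: {fakes_per_affected_paper:.2f}")
print(f"Checked references per paper: {refs_per_paper:.2f}")
print(f"Fold change, 4 -> 57 per 10,000: {fold_change:.2f}")
```

The whole-period average of roughly 16 fake citations per 10,000 papers sits between the 2023 and early-2026 rates, and the ratio of those two rates is consistent with the "more than 12-fold" description.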

Less selective open-access journals bore the brunt, with one unnamed publisher showing a rate 14 times higher than top-tier ones. This aligns with paper-mill suspicions but points squarely to AI slop. Nature's analysis estimates that 1.6% of 2025 publications harbor at least one ghost reference. Details are in the original Lancet correspondence.

Real-World Case Studies: From Manuscripts to Clinical Risk

Consider a Springer Nature paper on bowel surgery: 12 of 14 references nonexistent. Or a World Bank obesity report with 14 fakes. Even Retraction Watch's cofounder appeared as a hallucinated author. In one Columbia-reviewed paper, 60% of citations were invalid, now embedded in systematic reviews guiding treatments for conditions like diabetes or cancer.

These aren't isolated. A CBS News report detailed Topaz's own ordeal: an AI polishing tool inserted a fake reference into his manuscript, and it passed reviewers before an editor caught it. Propagation amplifies the harm: fakes gain citations of their own, which launders their illegitimacy. In higher education, this erodes trust in university outputs, as faculty publications fuel rankings and grants.

Stakeholder Perspectives: Voices from Publishers, Researchers, and Journals

Howard Bauchner and Frederick Rivara, in Lancet commentary, demand author accountability: "Renewed efforts are needed to enhance research integrity." Publishers like PLOS view isolated fakes as non-misconduct absent intent, opting for institutional probes. Taylor & Francis rejects integrity-compromising submissions.

Researchers like David Resnik argue retractions hinge on impact—if a fake bolsters key claims, retract; if peripheral, correct. Mohammad Hosseini warns of subtler issues: biased or incomplete AI citations, harder to spot. In academia, deans and provosts face calls for AI literacy training, balancing productivity gains against risks.

Current Responses: Retractions Lagging, Tools Emerging

Over 98% of flagged papers remain untouched, per Retraction Watch; only about 2% have been corrected or retracted, despite calls for retroactive sweeps. Emerging tools include Citadel by Topaz, an interactive dashboard that ranks publishers by fake-citation rate.

Journals integrate LLM detectors (e.g., OpenAI's classifier), but fakes dodge them. Some mandate AI disclosure; others auto-verify references via APIs. Higher ed institutions like Columbia now emphasize verification protocols in grad seminars.

Response Strategy	Examples	Effectiveness
Automated Verification	DOI/PMID cross-checks at submission	High, catches 70-90%
AI Disclosure Mandates	Nature, Lancet policies	Moderate, relies on honesty
Manual Peer Review Boost	Reference audits in reviews	Low currently, time-intensive
Integrity Databases	New 'fake citation' category proposed	Emerging
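A submission-time cross-check starts by finding which references carry an identifier at all, since only those can be verified automatically. The sketch below is a hypothetical triage step, not any publisher's actual system. The regular expressions cover common DOI and "PMID:" shapes only, the reference strings are invented placeholders, and the lookup against Crossref or PubMed that would follow is deliberately left out.

```python
import re

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)
# PubMed IDs written as "PMID: 12345678" (an assumed reference style).
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def extract_identifiers(reference: str) -> dict:
    """Pull a DOI and/or PubMed ID out of a free-text reference string."""
    doi = DOI_PATTERN.search(reference)
    pmid = PMID_PATTERN.search(reference)
    return {
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        "doi": doi.group(0).rstrip(".,;)") if doi else None,
        "pmid": pmid.group(1) if pmid else None,
    }


def triage(references: list) -> tuple:
    """Split references into machine-checkable and manual-review lists."""
    checkable, manual = [], []
    for ref in references:
        ids = extract_identifiers(ref)
        target = checkable if ids["doi"] or ids["pmid"] else manual
        target.append((ref, ids))
    return checkable, manual


# Invented placeholder references, for illustration only.
references = [
    "Doe J. Placeholder study title. J Example Med. 2024. doi:10.1234/example.2024.001.",
    "Roe A. Another placeholder title. Example Journal. 2023. PMID: 12345678",
    "Poe E. A reference with no identifier. Example Press; 2023.",
]

checkable, manual = triage(references)
print(len(checkable), "checkable;", len(manual), "need manual review")
```

References that land in the manual list correspond to the blind spot noted in the audit, where references without verifiable IDs could not be checked.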

Solutions and Best Practices for Researchers and Institutions

To combat this, adopt a multi-pronged approach:

Always verify citations manually—use PubMed, Crossref, Google Scholar.
Employ tools like Scite.ai or Citation Gecko for context checks.
Train on AI limits: Treat outputs as drafts, not finals.
Universities: Integrate integrity modules in PhD programs; audit departmental outputs.
Publishers: Mandate metadata for reference accuracy; reject unchecked AI use.

Infographic of best practices to avoid AI-generated fake citations in research

Future Outlook: Restoring Trust in an AI-Augmented Era

By 2026, AI aids 40% of manuscripts, according to surveys, but unchecked growth risks a citation crisis. Optimism lies in hybrid solutions in which AI auditors scale human oversight. Indexing services like PubMed could flag suspicious references, and experiments with blockchain for immutable citation records are underway.

In higher education, this underscores a pivot from volume to verified quality. As pressures mount and publish-or-perish meets AI temptation, proactive policies will safeguard biomedicine's gold standard. Researchers must adapt, ensuring innovation doesn't fabricate its own downfall.

Frequently Asked Questions

❓What are fake citations in biomedical research?

Fake citations, or hallucinated references, are fabricated entries generated by AI tools that appear real but link to no existing studies. They include plausible authors, titles, and journals but fail verification in databases like PubMed.

📊How many fake citations did the Columbia audit find?

The audit identified 4,046 fake citations across 2,810 papers out of 2.5 million reviewed, analyzing 97.1 million references. Rates jumped from 4 to 57 per 10,000 papers.

🤖Why are AI tools causing fake citations?

Large language models hallucinate by inventing references from training data patterns when asked to cite sources, creating plausible but nonexistent studies without real-time database access.

📈What is the trend in fake citations since 2023?

A 12-fold increase: rates were stable at low levels in 2023, then exploded from mid-2024 as AI adoption spread. By 2026, one in 277 PubMed papers has fakes.

⚕️How do fake citations impact patient care?

They infiltrate systematic reviews and guidelines, leading clinicians to base treatments on nonexistent evidence, potentially harming outcomes in fields like oncology or cardiology.