Beware the Mirage: How Physicians Can Prevent Erroneous Citations When Using Generative AI

By Campion Quinn, MD
The Rise of AI in Medical Writing
Generative AI is becoming a powerful assistant in medical writing. From summarizing clinical notes to helping craft review articles and patient instructions, large language models (LLMs) like ChatGPT, Claude, and Med-PaLM are increasingly popular among physicians. In a 2023 survey of clinicians by Stanford Medicine, nearly 28% of respondents reported using AI tools in some capacity for drafting documentation, research, or presentations[^1].
However, with this convenience comes a significant challenge: AI-generated hallucinations. One of the most dangerous forms involves fake citations—references that look authentic but do not actually exist. These phantom sources often include plausible-sounding article titles, real journal names, and fabricated DOIs. When not caught, they can compromise scientific integrity, mislead readers, and propagate misinformation in clinical and academic settings.
---
A Case of Misplaced Trust
Consider the case of a hospitalist preparing a grand rounds presentation on new treatments for atrial fibrillation. They ask ChatGPT to “list five recent randomized controlled trials on atrial fibrillation management and cite them in AMA format.” The model quickly produces five polished references, complete with journal names, issue numbers, and PMIDs.
On closer inspection, however, two of the articles are completely fictional. The titles are not listed in PubMed, the author combinations are incorrect, and the DOIs are invalid. Had these gone unnoticed, the physician could have presented data that had never been published—an outcome that risks professional embarrassment or worse.
---
How AI Hallucinates Citations
At its core, an LLM is a probabilistic text generator. It does not search PubMed or clinical databases in real time. Instead, it predicts the next most likely word or phrase based on patterns learned during training. When prompted for a citation, it assembles text that statistically resembles real academic references.
This phenomenon—termed “hallucination”—occurs for several reasons:
1. Lack of Real-Time Verification: Unless connected to a search-enabled tool (e.g., Bing, Scite, or PubMed plugins), the model cannot distinguish between actual and fabricated content[^2].
2. Overgeneralization: LLMs generalize from their training data. Having "seen" thousands of reference formats, they learn to reproduce those formats convincingly, with no regard for whether the underlying facts are real[^3].
3. Ambiguous Prompts: Vague queries like “give me studies about diabetes” invite the model to fill gaps with statistically probable—but not verified—content.
---
The Scale of the Problem
This issue is not confined to medicine. A March 2024 report by the Columbia Journalism Review tested eight AI-powered search tools, including You.com, Perplexity, and Arc. All engines demonstrated a tendency to misattribute, fabricate, or mangle citations. In several instances, articles were credited to the wrong outlets or didn’t exist at all[^4].
These findings echo a 2023 study in JAMA that found LLMs like ChatGPT fabricated citations in over 50% of cases in which users requested references on medical topics[^5].
---
Clinical and Ethical Risks
Physicians bear a heightened responsibility when leveraging AI in writing. The risks of citation errors extend beyond academic inconvenience:
- Patient Harm: Decisions based on fictitious sources can lead to inappropriate management.
- Misinformation Spread: Published errors may be cited by others, compounding inaccuracies.
- Legal Consequences: Use of unverified citations in medico-legal documents could be challenged in court.
- Professional Repercussions: Journals and institutions may retract or penalize work that contains fabricated references.
---
Recommendations to Prevent AI Citation Errors
Physicians can protect themselves by integrating simple but effective safeguards:
1. Use Verified AI Tools with Retrieval Capability
Opt for AI platforms that link to real-time, indexed sources. Tools like Scite.ai, Consensus, and Perplexity's academic mode provide citations tied to existing literature. Even so, links and metadata must be independently verified[^6].
2. Manually Validate All Citations
Before including any AI-generated reference (a short verification sketch follows this list):
- Confirm the article exists in PubMed, Google Scholar, or journal databases.
- Cross-check authors, publication dates, and DOI accuracy.
- Review the abstract or full text to ensure relevance.
3. Prompt for Titles or Ideas, Not Full Citations
Avoid asking for references outright. Instead, prompt AI with:
- “List recent trials related to X topic.”
- “Summarize findings of major studies in the last 5 years.”
Once articles are identified independently, you can prompt: “Format this in AMA style.”
4. Set Custom Instructions or Prompts to Minimize Hallucinations
Configure your AI interface with specific guardrails. Examples include:
> “Only cite sources that you can verify exist. Do not fabricate references. If unsure, say so.”
This will not eliminate hallucinations, but it reduces their frequency and makes the model more likely to signal uncertainty rather than invent sources.
5. Include a Human-in-the-Loop Review Process
Before submitting articles, CME material, or public presentations, include a citation review step. This is especially vital for institutional publications, patient education, or grant writing.
6. Stay Updated on Journal Guidelines
Leading journals, including NEJM, JAMA, and BMJ, have issued statements restricting undisclosed AI use in manuscript preparation and requiring that any use be reported[^7]. Know the policies of your target journal or institution.
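For physicians (or their research assistants) comfortable with a small script, parts of the manual check in step 2 can be automated. The sketch below is a minimal example in Python, assuming the third-party requests package. It queries two public indexes, the Crossref REST API and NCBI's PubMed E-utilities, to check whether an AI-generated DOI resolves to a real record and whether the cited title appears in PubMed. The function names and placeholder inputs are illustrative, not a finished tool.

```python
"""
Minimal citation spot-check sketch (illustrative only).

Assumes the third-party `requests` package is installed. Uses two public
endpoints: the Crossref REST API (DOI lookup) and NCBI's PubMed E-utilities
(title search). A "hit" only means the record exists; relevance still has
to be confirmed by reading the abstract or full text.
"""
import requests


def doi_resolves(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200


def pubmed_ids_for_title(title: str) -> list[str]:
    """Return PubMed IDs whose article title matches the given title string."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"].get("idlist", [])


if __name__ == "__main__":
    # Paste the title and DOI exactly as the AI produced them (placeholders here).
    candidate_title = "PASTE THE AI-GENERATED ARTICLE TITLE HERE"
    candidate_doi = "10.0000/placeholder-doi"

    print("DOI resolves in Crossref:", doi_resolves(candidate_doi))
    pmids = pubmed_ids_for_title(candidate_title)
    print("Matching PubMed IDs:", pmids if pmids else "none found; treat as suspect")
```

A script like this catches only the crudest fabrications. Subtly wrong author lists, swapped journals, or misstated findings still require human review, which is why the human-in-the-loop step above remains essential.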
---
Physicians as Stewards of Information
AI can accelerate writing, reduce cognitive load, and increase access to knowledge. But as with any tool, its power must be coupled with scrutiny. Physicians must not outsource their judgment to a model that lacks understanding of truth, consequence, or patient outcomes.
As AI grows in clinical influence, the stakes grow with it. A citation is not just an academic footnote—it is a signpost pointing to verified evidence. When AI muddies those signposts, it is up to us to clean the path.
---
References
1. American Medical Association. Augmented Intelligence in Medicine: Physician Use and Perceptions of AI Tools in 2024. Published December 6, 2024. Accessed March 22, 2025. https://www.ama-assn.org/practice-management/digital/augmented-intelligence-medicine
2. OpenAI. Best practices for using GPT models. Updated August 2023. https://platform.openai.com/docs/guides/gpt-best-practices
3. Bubeck S, Chandrasekaran V, Eldan R, et al. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv. March 2023. https://arxiv.org/abs/2303.12712
4. Tow Center for Digital Journalism. We compared eight AI search engines. They’re all bad at citing news. Columbia Journalism Review. March 2024. https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
5. Korngiebel DM, Mooney SD. Considering the possibilities and pitfalls of generative AI in medical literature. JAMA. 2023;329(12):975–976. doi:10.1001/jama.2023.1951
6. Scite.ai. How our Smart Citations verify claims with real articles. Updated February 2024. https://scite.ai
7. Flanagin A, Bibbins-Domingo K, Berkwits M. Nonhuman "authors" and implications for the integrity of scientific publication and medical knowledge. JAMA. 2023;329(6):507–508. doi:10.1001/jama.2023.1344