Medical diagnosis and decision: Human versus Artificial intelligence (AI) | ||||
DISEASE | INTERVENTION | COMPARISON | RESULTS | |
Nature. 2024 Sep 25. doi: 10.1038/s41586-024-07930-y. Epub ahead of print | Descriptive | |||
IN medical informatics, artificial intelligence |
The Use of
recent more powerful large language models As Methodology procedure |
Is worse Than
older, less powerful large language models |
To provide reliable answers: while stability across different natural phrasings of the same question is improved, more powerful language models do not avoid answering questions, even very difficult ones, and, paradoxically, do not secure areas of low difficulty | |
JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969 | Randomized Controlled Trial, Diagnostic | |||
IN medical informatics, clinical decision support systems, artificial intelligence |
The Use of
artificial intelligence (AI), chatbots, large language models, ChatGPT-4, as diagnostic help for clinicians As Diagnostic Tool |
Is equal Than
conventional medical information sources, as diagnostic help for clinicians |
To modify diagnostic performance on clinical vignettes. However, ChatGPT-4 alone was better than physicians at finding the right diagnosis (71% AI VS 63% physicians) | |
Eur Arch Otorhinolaryngol. 2024 Apr;281(4):2145-2151. doi: 10.1007/s00405-023-08423-w | Controlled Trial (non-randomized) | |||
IN medical informatics, clinical decision support systems, artificial intelligence |
The Use of
artificial intelligence (AI), chatbots, large language models, ChatGPT-3.5 As Methodology procedure |
Is worse Than
human written medical structured text, UpToDate® |
To obtain answers to clinical questions with accuracy (mean 0.25 on a scale of 0-2 with ChatGPT) and usefulness (mean 1.0 ChatGPT VS 2.63 UpToDate on a scale of 1-3). ChatGPT-3.5's knowledge was limited to 2021 | |
NPJ Digit Med. 2025 Mar 22;8(1):175. doi: 10.1038/s41746-025-01543-z | Systematic Review | |||
IN medical informatics, clinical decision support systems, artificial intelligence, generative, diagnosis |
The Use of
generative artificial intelligence models As Diagnostic Tool |
Is equal Than
non-expert human physicians, but often worse than expert physicians |
To modify diagnostic accuracy on vignette medical cases: 52% overall | |
JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295 | Randomized Controlled Trial | |||
IN medical informatics, clinical decision support systems, artificial intelligence, machine learning, diagnosis |
The Use of
non-biased artificial intelligence model in support of clinical diagnosis of 3 conditions: pneumonia, heart failure, and chronic obstructive pulmonary disease As Diagnostic Tool |
Is better Than
clinical diagnosis alone and, even more so, than biased artificial intelligence models |
To improve diagnostic accuracy on clinical vignettes: 73% clinical alone VS 76% with AI predictions support VS 77.4% with AI support + explanations VS 62-64% with biased AI models support | |
JAMA Intern Med. 2016 Dec 1;176(12):1860-1861. doi: 10.1001/jamainternmed.2016.6001 | Clinical Trial (non-controlled, non-randomized) | |||
IN medical informatics, clinical decision support systems, diagnosis |
The Use of
computer symptom checkers, artificial intelligence As Diagnostic Tool |
Is worse Than
physicians, human (trained) intelligence |
To find the correct diagnosis - on clinical vignettes - in the top 3 diagnoses listed (84% physicians vs 51% computer) | |
CMAJ. 2019 Dec 2;191(48):E1332-E1335. doi: 10.1503/cmaj.190506 | Review (Narrative) | |||
IN medical informatics, clinical decision support systems, diagnosis, medical thinking, diagnostic reasoning, cognition |
The Use of
artificial intelligence, machine learning As Diagnostic Tool |
Is worse Than
human intelligence |
To accurately reach a general diagnostic decision. Currently only effective for highly targeted tasks | |
BMJ. 2021 Oct 20;375:n2281. doi: 10.1136/bmj.n2281 | Systematic Review | |||
IN medical informatics, clinical decision support systems, machine learning, diagnosis, prognostic |
The Use of
current machine learning based diagnostic / prediction models As Diagnostic Tool |
Is bad Than
no comparison here |
To make accurate diagnoses / predictions: most studies on these models show poor methodological quality and are at high risk of bias | |
NEJM AI 2025 July 15;2(8) DOI: 10.1056/AIcs2401155 | Descriptive, Cross-Sectional Study | |||
IN medical informatics, keeping up-to-date medical knowledge systems, artificial intelligence |
The Use of
large language models (LLMs), GPT-4o, Gemini 1.5 Pro, Llama 3.1, even fine-tuned As Methodology procedure |
Is bad Than
no comparison here |
To integrate relevant information from new FDA drug approvals, patient records, and updated medical guidelines |