Medical diagnosis and decision: Human versus Artificial intelligence (AI)
DISEASE	INTERVENTION	COMPARISON	RESULTS
Nature. 2024 Sep 25. doi: 10.1038/s41586-024-07930-y. Epub ahead of print		Descriptive
IN medical informatics, artificial intelligence	The Use of recent, more powerful large language models As Methodology procedure	Is worse Than older, less powerful large language models	To provide reliable answers: while stability across different natural phrasings of the same question is improved, more powerful language models do not avoid answering questions, even very difficult ones, and, paradoxically, do not secure areas of low difficulty
JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969		Randomized Controlled Trial, Diagnostic
IN medical informatics, clinical decision support systems, artificial intelligence	The Use of artificial intelligence (AI), chatbots, large language models, ChatGPT-4, as a diagnostic aid for clinicians As Diagnostic Tool	Is equal Than conventional medical information sources, as a diagnostic aid for clinicians	To modify diagnostic performance on clinical vignettes. However, ChatGPT-4 alone was better than physicians at finding the right diagnosis (71% AI vs 63% physicians)
Eur Arch Otorhinolaryngol. 2024 Apr;281(4):2145-2151. doi: 10.1007/s00405-023-08423-w		Controlled Trial (non-randomized)
IN medical informatics, clinical decision support systems, artificial intelligence	The Use of artificial intelligence (AI), chatbots, large language models, ChatGPT-3.5 As Methodology procedure	Is worse Than human-written structured medical text, UpToDate®	To obtain answers to clinical questions with accuracy (mean 0.25 on a scale of 0-2 with ChatGPT) and usefulness (mean 1.0 ChatGPT vs 2.63 UpToDate on a scale of 1-3). ChatGPT-3.5 was limited to information up to 2021
NPJ Digit Med. 2025 Mar 22;8(1):175. doi: 10.1038/s41746-025-01543-z		Systematic Review
IN medical informatics, clinical decision support systems, artificial intelligence, generative, diagnosis	The Use of generative artificial intelligence models As Diagnostic Tool	Is equal Than non-expert human physicians, but often worse than expert physicians	To modify diagnostic accuracy on medical case vignettes: 52% overall
JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295		Randomized Controlled Trial
IN medical informatics, clinical decision support systems, artificial intelligence, machine learning, diagnosis	The Use of a non-biased artificial intelligence model in support of clinical diagnosis of 3 conditions: pneumonia, heart failure, and chronic obstructive pulmonary disease As Diagnostic Tool	Is better Than clinical diagnosis alone and, even more, than biased artificial intelligence models	To improve diagnostic accuracy on clinical vignettes: 73% clinical alone vs 76% with AI prediction support vs 77.4% with AI support + explanations vs 62-64% with biased AI model support
JAMA Intern Med. 2016 Dec 1;176(12):1860-1861. doi: 10.1001/jamainternmed.2016.6001		Clinical Trial (non-controlled, non-randomized)
IN medical informatics, clinical decision support systems, diagnosis	The Use of computer symptom checkers, artificial intelligence As Diagnostic Tool	Is worse Than physicians, human (trained) intelligence	To find the correct diagnosis - on clinical vignettes - in the top 3 diagnoses listed (84% physicians vs 51% computer)
CMAJ. 2019 Dec 2;191(48):E1332-E1335. doi: 10.1503/cmaj.190506		Review (Narrative)
IN medical informatics, clinical decision support systems, diagnosis, medical thinking, diagnostic reasoning, cognition	The Use of artificial intelligence, machine learning As Diagnostic Tool	Is worse Than human intelligence	To accurately reach a general diagnostic decision. Currently only effective for highly targeted tasks
BMJ. 2021 Oct 20;375:n2281. doi: 10.1136/bmj.n2281		Systematic Review
IN medical informatics, clinical decision support systems, machine learning, diagnosis, prognostic	The Use of current machine learning based diagnostic / prediction models As Diagnostic Tool	Is bad Than no comparison here	To make accurate diagnoses / predictions: most studies on these models show poor methodological quality and are at high risk of bias
NEJM AI. 2025 Jul 15;2(8). doi: 10.1056/AIcs2401155		Descriptive, Cross-Sectional Study
IN medical informatics, keeping up-to-date medical knowledge systems, artificial intelligence	The Use of large language models (LLM), GPT-4o, Gemini 1.5 Pro, Llama 3.1, even fine-tuned As Methodology procedure	Is bad Than no comparison here	To integrate relevant information from new FDA drug approvals, patient records, and updated medical guidelines