Microsoft AI claims to beat doctors in complex medical diagnosis

Microsoft says its GPT-4-based medical AI outperformed doctors on complex case studies involving rare diseases, raising prospects for its use as a clinical assistant.

Wednesday, July 02, 2025

Microsoft has claimed that its artificial intelligence (AI) model demonstrates higher diagnostic accuracy than doctors when assessing complex medical cases. The announcement comes amid growing interest in the role of generative AI in clinical decision-making and diagnostics.

The findings are based on a new internal evaluation published by Microsoft researchers in collaboration with OpenAI. The evaluation suggests that AI models powered by GPT-4 show potential to match or even surpass human diagnostic capabilities, particularly in scenarios involving rare or difficult-to-identify conditions.

How Microsoft tested its medical AI model

The research, published on the preprint server arXiv, evaluated the AI’s performance against that of medical professionals using case vignettes derived from the New England Journal of Medicine. These vignettes featured complex patient histories and diagnostic challenges, requiring synthesis of symptoms across multiple systems.

According to the study, Microsoft's model scored higher than the average performance of previously published diagnostic models, and in many instances, matched or exceeded the accuracy of individual human clinicians. The GPT-4-based AI tool was evaluated using multiple-choice questions across 28 case studies, focusing on diagnostic precision, reasoning, and clinical relevance.

AI in medicine: Assistance, not replacement

Despite the promising results, Microsoft researchers emphasised that the AI is not intended to replace healthcare professionals. Instead, the technology is positioned as a supportive tool to enhance clinical decision-making, reduce diagnostic errors, and improve patient outcomes.

Peter Lee, Corporate Vice President of Research and Incubations at Microsoft, said the AI's diagnostic capability could be most valuable in resource-constrained settings or as a second opinion in complicated cases. He also acknowledged the limitations of AI, particularly in dealing with real-world medical ambiguity, patient-specific nuances, and the ethical implications of automated care.
