Experts from the Gaidar Institute: “General-purpose chatbots are wrong 80% of the time when making diagnoses”

Olga Magomedova and Maria Girich, Researchers at the Gaidar Institute’s International Best Practices Analysis Department, provided a detailed commentary for “Nezavisimaya Gazeta”, in which they explained the reasons for AI’s low effectiveness in medical diagnostics and discussed existing approaches to regulating this field in Russia and abroad.

As Olga Magomedova explained, the reason for the high probability of errors on the part of popular chatbots lies in the very nature of these technologies:

“AI chatbots are not a medical diagnostic tool; they are a user service built on a language model. The model has no diagnostic purpose; its goal is to respond to a query based on the data provided. When a person enters a set of symptoms, the language model searches publicly available resources describing those symptoms for the most common disease descriptions. So it is natural that 80% of AI responses will be wrong about the diagnosis: the model lacks the set of data that a qualified doctor would request when working with the same patient. The conclusion: a user formulating a query on their own is more likely to receive an incorrect answer, because they do not know what data must be provided for a more accurate diagnosis.”

At the same time, Olga Magomedova drew attention to the fundamental difference between general-purpose chatbots and specialized medical AI systems. The latter, she said, can be truly useful, but only if an important condition is met:

“If we’re talking about genuine AI-based medical technologies for diagnosis, their effectiveness depends directly on the databases on which they are trained. This explains the narrow specialization of diagnostic technologies: a model trained to recognize images of tumors will not be able to help evaluate bone scans. Such specialization increases the accuracy of results and can genuinely help doctors with diagnosis.”

The expert’s final conclusion is unequivocal:

“You should not make decisions about self-treatment based on outputs from non-specialized language models.”

Maria Girich also provided a detailed overview of the regulatory aspects of using AI in medicine:

“In Russia, medical devices with AI are classified as high-risk (Class 3) (Order of the Russian Ministry of Health No. 4n dated June 6, 2012) if they pose a high individual risk and/or a high risk to public health. Roszdravnadzor conducts post-registration monitoring of the safety and clinical efficacy of medical devices with AI. For example, the registration of Botkin.AI (medical image analysis and decision-making) was revoked.”

She also spoke about the experimental legal regime currently in effect in Russia:

“In addition, Russia is implementing an experimental legal regime for medical activities involving technologies for collecting and processing information on citizens’ health status and diagnoses (Decree of the Government of the Russian Federation No. 2276 dated December 9, 2022). Under it, ‘personal medical assistants’ are being tested: they collect patient data in the information systems of medical organizations to monitor health indicators such as blood pressure, heart rate, and blood glucose levels in patients with, for example, hypertension or diabetes.”

Speaking about international approaches, Maria Girich noted that the EU also classifies medical devices with AI as high-risk, imposing strict requirements on them:

“In this regard, specific requirements are imposed, for example, the presence of a risk management system (analysis of known risks and identification of reasonably foreseeable risks, as well as mandatory labeling and certification of devices), ensuring an error-free and complete dataset analyzed by AI, and ensuring human oversight of decisions generated by AI.”

For the market’s further development, according to the expert, the Ministry of Health needs to take a number of measures:

“To develop the market for AI-based medical devices, the Ministry of Health must also define requirements for AI software as high-risk, including: mandatory human monitoring of decisions made by AI; requirements for data collection and storage, so that it is possible to analyze the data on which the AI generated a decision; implementation of a risk management system with the ability to forecast, respond, and automatically log events; and an obligation for AI system developers to provide technical documentation for risk assessment.”

Maria Girich paid particular attention to the inadmissibility of completely replacing doctors with artificial intelligence, citing examples from China and the United States:

“It is also important to discuss the risks of replacing doctors’ decisions regarding patients with decisions generated by AI. For example, in China, a medical institution cannot use AI to impersonate a doctor or to replace a doctor who is authorized to personally provide diagnostic and treatment services. Prescriptions for medications must also be written by the attending physician; the use of AI or other automated methods for generating prescriptions is prohibited. In the U.S., several bills have likewise been introduced: one in Georgia would prohibit healthcare and health insurance decisions based solely on AI results, requiring that such decisions be substantively reviewed by a human. Another example is an Illinois bill banning healthcare facilities from adopting policies that replace the independent assessments of licensed healthcare professionals with AI recommendations or decisions.”

Thursday, 16.04.2026