This story discusses suicide. If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255).

Artificial intelligence has been touted as a boon to healthcare, but a new study has revealed its potential shortcomings when it comes to giving medical advice.

In January, OpenAI launched ChatGPT Health, the medical-focused version of the popular chatbot tool.

The company introduced the tool as "a dedicated experience that securely brings your health information and ChatGPT’s intelligence together, to help you feel more informed, prepared and confident navigating your health."

But researchers at the Icahn School of Medicine at Mount Sinai have found that the tool failed to recommend emergency care for a "significant number" of serious medical cases.

The study, published in the journal Nature Medicine on Feb. 23, aimed to explore how ChatGPT Health — which is reported to have about 40 million users daily — handles situations where people are asking whether to seek emergency care.

"Right now, no independent body evaluates these products before they reach the public," lead author Ashwin Ramaswamy, MD, instructor of urology at the Icahn School of Medicine at Mount Sinai, told Fox News Digital.

"We wouldn't accept that for a medication or a medical device, and we shouldn't accept it for a product that tens of millions of people are using to make health decisions."

Emergency scenarios

The team created 60 clinical scenarios across 21 medical specialties, ranging from minor conditions to true medical emergencies.

Three independent physicians then assigned an appropriate level of urgency to each case, based on published clinical practice guidelines from 56 medical societies.

The researchers conducted 960 interactions with ChatGPT Health to see how the tool responded, varying factors such as gender, race, barriers to care and "social dynamics."

While "clear-cut emergencies" — such as stroke or severe allergy — were generally handled well, the researchers found that the tool "under-triaged" many urgent medical issues.

For example, in one asthma scenario, the system acknowledged that the patient was showing early signs of respiratory failure, but still recommended waiting instead of seeking emergency care.

"ChatGPT Health performs well in medium-severity cases, but fails at both ends of the spectrum — the cases where getting it right matters most," Ramaswamy told Fox News Digital. "It under-triaged over half of genuine emergencies and over-triaged roughly two-thirds of mild cases that clinical guidelines say should be managed at home."

Under-triage can be life-threatening, the doctor noted, while over-triage can overwhelm emergency departments and delay care for those in real need.

Researchers also identified inconsistencies in suicide risk alerts. In some lower-risk scenarios, the tool directed users to the 988 Suicide and Crisis Lifeline, while in others it failed to offer that recommendation even when a person described suicidal ideation.

"The suicide guardrail failure was the most alarming," study co-author Girish N. Nadkarni, MD, chief AI officer of the Mount Sinai Health System, told Fox News Digital.

ChatGPT Health is designed to show a crisis intervention banner when someone describes thoughts of self-harm, the researcher noted.

"We tested it with a 27-year-old patient who said he'd been thinking about taking a lot of pills," Nadkarni shared. "When he described his symptoms alone, the banner appeared 100% of the time. Then we added normal lab results — same patient, same words, same severity — and the banner vanished."

"A safety feature that works perfectly in one context and completely fails in a nearly identical context … is a fundamental safety problem."

The researchers were also surprised by the social influence aspect.

"When a family member in the scenario said ‘it's nothing serious’ — which happens all the time in real life — the system became nearly 12 times more likely to downplay the patient's symptoms," Nadkarni said. "Everyone has a spouse or parent who tells them they're overreacting. The AI shouldn't be agreeing with them during a potential emergency."

Physicians react

Dr. Marc Siegel, Fox News senior medical analyst, called this an "important" study.

"It underlines the principle that while large language models can triage clear-cut emergencies, they have much more trouble with nuanced situations," Siegel, who was not involved in the study, told Fox News Digital.

"This is where doctors and clinical judgment come in — knowing the nuances of a patient's history and how they report symptoms and their approach to health."

ChatGPT and other LLMs can be helpful tools, Siegel said, but they "should not be used to give medical direction."

"Machine learning and continued input of data can help, but will never compensate for the essential problem – human judgment is needed to decide whether something is a true emergency or not."

Dr. Harvey Castro, an emergency physician and AI expert in Texas, echoed the importance of the study, calling it "exactly the kind of independent safety evaluation we need."

"Innovation moves fast. Oversight has to move just as fast," Castro, who also did not work on the study, told Fox News Digital. "In healthcare, the most dangerous mistakes happen at the extremes, when something looks mild but is actually catastrophic. That’s where clinical judgment matters most, and where AI must be stress-tested."

Study limitations

The researchers acknowledged some potential limitations in the study design.

"We used physician-written clinical scenarios rather than real patient conversations, and we tested at a single point in time — these systems update frequently, so performance may change," Ramaswamy told Fox News Digital.

Additionally, most of the missed emergencies occurred in situations where the danger depended on how the condition was changing over time. It's not clear whether the same problem would arise with acute medical emergencies.

Because the system had to choose just one fixed urgency category, the test may not reflect the more nuanced advice it might give in a back-and-forth conversation, the researchers noted.

Also, the study wasn’t large enough to confidently detect small differences in how recommendations might vary by race or gender.

"We need continuous auditing, not one-time studies," Castro noted. "These systems update frequently, so evaluation must be ongoing."

‘Don’t wait’

The researchers emphasized the importance of seeking immediate care for serious issues.

"If something feels seriously wrong — chest pain, difficulty breathing, a severe allergic reaction, thoughts of self-harm — go to the emergency department or call 988," Ramaswamy advised. "Don't wait for an AI to tell you it's okay."

The researchers noted that they support the use of AI to improve healthcare access, and that they didn’t conduct the study to "tear down the technology."

"These tools can be genuinely useful for the right things — understanding a diagnosis you've already received, looking up what your medications do and their side effects, or getting answers to questions that didn't get fully addressed in a short doctor's visit," Ramaswamy said.

"That's a very different use case from deciding whether you need emergency care. Treat them as a complement to your doctor, not a replacement."

Castro agreed that the benefits of AI health tools should be weighed against the risks.

"AI health tools can increase access, reduce unnecessary visits and empower patients with information," he said. "They are not inherently unsafe, but they are not yet substitutes for clinical judgment."

"This study doesn’t mean we abandon AI in healthcare," he went on. "It means we mature it. Independent testing and stronger guardrails will determine whether AI becomes a safety net or a liability."

Fox News Digital reached out to OpenAI, creator of ChatGPT, requesting comment.