Investigation Examines Capacity of AI to Sustain Racial and Gender Biases Within Clinical Decision Making

Assessing the Potential of Gpt-4 to Perpetuate Racial and Gender Biases in Healthcare: A Model Evaluation Study

Author: Brigham and Women’s Hospital – Contact: massgeneralbrigham.org
Published: 2023/12/18
Peer Reviewed: Yeah – Post type: Observational study
In this page: SummaryMain articleAbout the Author

Synopsis: The researchers analyzed the performance of GPT-4 in clinical decision support scenarios: clinical vignette generation, diagnostic reasoning, clinical plan generation, and subjective patient evaluations. When assessing patient perception, GPT-4 produced significantly different responses by gender or race/ethnicity in 23% of cases. When asked to generate clinical vignettes for medical education, GPT-4 failed to model the demographic diversity of medical conditions, exaggerating known demographic differences in prevalence in 89% of diseases.

advertisement

Main summary

“Evaluating the potential of Gpt-4 to perpetuate racial and gender bias in healthcare: a model evaluation study” – Digital health The Lancet.

Large language models (LLMs) such as ChatGPT and GPT-4 have the potential to help in clinical practice by automating administrative tasks, writing clinical notes, communicating with patients, and even supporting clinical decision making. However, preliminary studies suggest that the models may encode and perpetuate social biases that could negatively affect historically marginalized groups. A new study by researchers at Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, assessed GPT-4’s tendency to encode and exhibit racial and gender biases in four clinical decision support functions. Their results are published in Digital health The Lancet.

“While most of the focus is on using LLMs for administrative or documentation tasks, there is also excitement about the potential of using LLMs to support clinical decision making,” said corresponding author Emily Alsentzer, PhD, postdoctoral researcher in the General Internal Affairs Division. Medicine at Brigham and Women’s Hospital. “We wanted to systematically evaluate whether GPT-4 encodes racial and gender biases that affect its ability to support clinical decision making.”

Evidence

Alsentzer and his colleagues tested four applications of GPT-4 using the Azure OpenAI platform. First, they asked GPT-4 to generate patient vignettes that could be used in medical education. Next, they tested GPT-4’s ability to correctly develop a differential diagnosis and treatment plan for 19 different patient cases from a NEJM Healer, a medical education tool that presents challenging clinical cases to physicians in training. Finally, they assessed how GPT-4 makes inferences about a patient’s clinical presentation using eight case vignettes that were originally generated to measure implicit bias. For each application, the authors evaluated whether the GPT-4 results were biased by race or gender.

For the medical education task, the researchers created ten prompts that required GPT-4 to generate a patient presentation for a given diagnosis. They ran each message 100 times and found that GPT-4 exaggerated known differences in disease prevalence by demographic group.

“A striking example is when GPT-4 is asked to generate a vignette for a patient with sarcoidosis: GPT-4 describes a black woman 81% of the time,” explains Alsentzer. “While sarcoidosis is more prevalent in black patients and in women, it does not represent 81% of all patients.”

Then, when GPT-4 was asked to develop a list of 10 possible diagnoses for NEJM Healer cases, changing the patient’s gender or race/ethnicity significantly affected its ability to prioritize the correct primary diagnosis in 37% of the cases. cases.

“In some cases, GPT-4 decision-making reflects racial and gender biases known in the literature,” Alsentzer said. “In the case of pulmonary embolism, the model classified panic/anxiety attack as a more likely diagnosis for women than men. It also classified sexually transmitted diseases, such as acute HIV and syphilis, as more likely for racial minority patients compared to white patients.”

When asked to assess subjective patient traits such as honesty, understanding, and pain tolerance, GPT-4 produced significantly different responses by race, ethnicity, and gender on 23% of the questions. For example, GPT-4 was significantly more likely to rate black male patients as abusers of the opioid Percocet than Asian, black, Hispanic, and white female patients when responses should have been identical for all simulated patient cases.

Limitations of the current study include testing GPT-4 responses using a limited number of simulated prompts and analyzing model performance using only a few traditional demographic identity categories. Future work should investigate bias using clinical notes from the electronic medical record.

“While LLM-based tools are currently being implemented with a clinician in the loop to verify model results, it is very difficult for clinicians to detect systemic biases when looking at individual patient cases,” Alsentzer said. “It is critical that we perform bias assessments for each intended use of LLMs, just as we do with other machine learning models in the medical domain. Our work can help start a conversation about the potential of GPT-4 to propagate biases in applications to support clinical decision making.

Literary paternity:

Other BWH authors include Jorge A Rodríguez, David W Bates and Raja-Elie E Abdulnour. Additional authors include Travis Zack, Eric Lehman, Mirac Suzgun, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, and Atul J Butte.

Disclosures:

Alsentzer reports personal fees from Canopy Innovations, Fourier Health, and Xyla; and grants from Microsoft Research. Abdulnour is an employee of the Massachusetts Medical Society, which owns NEJM Healer (NEJM Healer cases were used in the study). Additional author disclosures can be found in the article.

Money:

NCI T32 Hematology/Oncology Training Grant; Open Philanthropy and National Science Foundation (IIS-2128145); and a philanthropic donation from Priscilla Chan and Mark Zuckerberg.

Article cited:

Zack, T; Lehman, E et al. “Evaluating the potential of GPT-4 to perpetuate racial and gender bias in healthcare: a model evaluation study” Digital health The Lancet.

Attribution/Source(s):

This peer-reviewed article related to our AI and Disabilities section was selected for publication by Disabled World editors because of its likely interest to readers in our disabilities community. Although content may have been edited for style, clarity, or length, the article “Research examines AI’s ability to maintain racial and gender biases in clinical decision-making” was originally written by Brigham and Women’s Hospital and published by Disabled-World.com on 12/18/2023. If you require further information or clarification, you may contact Brigham and Women’s Hospital at massgeneralbrigham.org. Disabled World makes no warranties or representations in connection therewith.

advertisement

Discover related topics

Share this information with:
𝕏.com Facebook Reddit

Page information, citations and disclaimer

Disabled World is an independent disability community founded in 2004 to provide disability news and information to disabled people, older people, their families and/or carers. Visit our homepage for informative reviews, exclusive stories, and how-tos. You can connect with us on social media like X.com and ours Facebook page.

Permanent link: Research examines AI’s ability to maintain racial and gender biases in clinical decision making

Cite this page (APA): Brigham and Women’s Hospital. (2023, December 18). The research examines AI’s ability to maintain racial and gender biases in clinical decision-making. Disabled world. Retrieved December 18, 2023 from www.disabled-world.com/assistivedevices/ai/clinical-decisions.php

Disabled World provides general information only. The materials presented are never intended to be a substitute for qualified professional medical care. Any third party offers or advertisements do not constitute an endorsement.

We will be happy to hear your thoughts

Leave a reply

Equipment4cpr
Logo
Register New Account
Compare items
  • Total (0)
Compare
0
Shopping cart