Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department
Publication: Journal contribution › Editorial › Research › peer-reviewed
Background
A high number of patients seek health information online, and large language models (LLMs) may produce a rising amount of it.
Aim
This study evaluates the quality of health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information.
Material and method
A variety of doctors from a tertiary otorhinolaryngology department used a Likert scale to assess the chatbot’s responses in terms of accuracy, relevance, and depth. The responses were also evaluated by ChatGPT.
Results
The composite mean of the three categories was 3.41, with the highest performance noted in the relevance category (mean = 3.71) when evaluated by the respondents. The accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. All the categories were rated as 5 when evaluated by ChatGPT.
Conclusion and significance
Despite its potential for providing relevant and accurate medical information, the chatbot's responses lacked depth and may perpetuate biases owing to its training on publicly available text. In conclusion, while LLMs show promise in healthcare, further refinement is necessary to enhance response depth and mitigate potential biases.
Original language | English |
---|---|
Journal | Acta Oto-Laryngologica |
Volume | 143 |
Issue number | 9 |
Pages (from-to) | 779-782 |
Number of pages | 4 |
ISSN | 0001-6489 |
DOI | |
Status | Published - 2023 |
Bibliographic note
Publisher Copyright:
© 2023 Acta Oto-Laryngologica AB (Ltd).