Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department

Publication: Contribution to journal › Editorial › Research › peer-reviewed

Background
Many patients seek health information online, and large language models (LLMs) may generate an increasing share of it.

Aim
This study evaluates the quality of health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information.

Material and method
A variety of doctors from a tertiary otorhinolaryngology department rated the chatbot’s responses on a Likert scale for accuracy, relevance, and depth. The responses were also evaluated by ChatGPT itself.

Results
As rated by the respondents, the composite mean across the three categories was 3.41, with the highest performance in the relevance category (mean = 3.71); the accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. All three categories were rated 5 when ChatGPT evaluated the responses itself.
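The reported composite figure is consistent with an unweighted average of the three category means. A minimal sketch of that arithmetic, assuming equal weighting across categories (the abstract does not state the weighting), is shown below; the variable names are illustrative, not from the study.

    # Minimal sketch (not from the paper): checking that the reported composite
    # mean matches the unweighted average of the three category means.
    category_means = {"accuracy": 3.51, "relevance": 3.71, "depth": 3.00}
    composite = sum(category_means.values()) / len(category_means)
    print(round(composite, 2))  # prints 3.41, matching the reported composite mean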

Conclusion and significance
Despite its potential to provide relevant and accurate medical information, the chatbot’s responses lacked depth and may perpetuate biases owing to its training on publicly available text. In conclusion, while LLMs show promise in healthcare, further refinement is needed to deepen responses and mitigate potential biases.
Original language: English
Journal: Acta Oto-Laryngologica
Volume: 143
Issue number: 9
Pages (from-to): 779-782
Number of pages: 4
ISSN: 0001-6489
DOI:
Status: Published - 2023

Bibliographic note

Publisher Copyright:
© 2023 Acta Oto-Laryngologica AB (Ltd).
