Commercially Available Chest Radiograph AI Tools for Detecting Airspace Disease, Pneumothorax, and Pleural Effusion

Research output: Contribution to journal, Journal article, Research, Peer-review

Background: Commercially available artificial intelligence (AI) tools can assist radiologists in interpreting chest radiographs, but their real-life diagnostic accuracy remains unclear.

Purpose: To evaluate the diagnostic accuracy of four commercially available AI tools for detection of airspace disease, pneumothorax, and pleural effusion on chest radiographs.

Materials and Methods: This retrospective study included consecutive adult patients who underwent chest radiography at one of four Danish hospitals in January 2020. For the reference standard, two thoracic radiologists (or three, in cases of disagreement) with access to all previous and future imaging independently labeled the chest radiographs. Area under the receiver operating characteristic curve, sensitivity, and specificity were calculated. Sensitivity and specificity were additionally stratified according to the severity of findings, the number of findings on chest radiographs, and the radiographic projection. The χ² and McNemar tests were used for comparisons.

Results: The data set comprised 2040 patients (median age, 72 years [IQR, 58–81 years]; 1033 female), of whom 669 (32.8%) had target findings. The AI tools demonstrated areas under the receiver operating characteristic curve ranging from 0.83 to 0.88 for airspace disease, 0.89 to 0.97 for pneumothorax, and 0.94 to 0.97 for pleural effusion. Sensitivities ranged from 72% to 91% for airspace disease, 63% to 90% for pneumothorax, and 62% to 95% for pleural effusion. Negative predictive values ranged from 92% to 100% across all target findings. For airspace disease, pneumothorax, and pleural effusion, specificity was high for chest radiographs with normal or single findings (range, 85%–96%, 99%–100%, and 95%–100%, respectively) and markedly lower for chest radiographs with four or more findings (range, 27%–69%, 96%–99%, and 65%–92%, respectively) (P < .001). AI sensitivity was lower for vague airspace disease (range, 33%–61%) and for small pneumothorax or pleural effusion (range, 9%–94%) than for larger findings (range, 81%–100%; P value range, > .99 to < .001).

Conclusion: Current-generation AI tools showed moderate to high sensitivity for detecting airspace disease, pneumothorax, and pleural effusion on chest radiographs. However, they produced more false-positive findings than radiology reports, and their performance decreased for smaller target findings and when multiple findings were present.
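The metrics named in the abstract follow standard definitions, and a small sketch can make them concrete. The code below is a hypothetical illustration, not the authors' analysis code: it computes sensitivity, specificity, and negative predictive value from binary labels, the area under the ROC curve in its rank-based (Mann-Whitney) form, and an exact McNemar test, which compares two paired binary readings (for example, two readers on the same radiographs) using only the discordant counts b and c. The abstract does not say whether an exact or asymptotic McNemar variant was used, so the exact binomial form here is an assumption. All function names, the toy labels, the 0.5 threshold, and the 2% prevalence are hypothetical.

```python
from math import comb
from typing import Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 = finding present."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # share of reference-positive cases flagged


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)  # share of reference-negative cases left unflagged


def npv(tn: int, fn: int) -> float:
    return tn / (tn + fn)  # share of negative calls that are truly negative


def auc_rank(truth: Sequence[int], score: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(truth, score) if t == 1]
    neg = [s for t, s in zip(truth, score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant-pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair falls in b or c with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def npv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    """NPV implied by Bayes' rule for a given sensitivity, specificity, prevalence."""
    true_neg = spec * (1.0 - prev)
    false_neg = (1.0 - sens) * prev
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = finding present on the reference standard.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    score = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
    # Hypothetical operating threshold turning continuous scores into calls.
    pred = [1 if s >= 0.5 else 0 for s in score]
    tp, fp, tn, fn = confusion_counts(truth, pred)
    print(f"sens={sensitivity(tp, fn):.2f} spec={specificity(tn, fp):.2f} NPV={npv(tn, fn):.2f}")
    print(f"AUC={auc_rank(truth, score):.3f}")
    print(f"McNemar exact p (b=1, c=6)={mcnemar_exact(1, 6):.3f}")
    print(f"NPV at 2% prevalence={npv_from_prevalence(0.63, 0.99, 0.02):.3f}")
```

With the toy labels, the threshold yields 2 true positives, 1 false positive, 4 true negatives, and 1 false negative, so sensitivity is 2/3 and specificity is 4/5. The last line illustrates how negative predictive value can stay high when sensitivity is modest: at a low (here hypothetical) prevalence, the few missed cases are outnumbered by many true negatives.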

Original language: English
Article number: e231236
Journal: Radiology
Volume: 308
Issue number: 3
Number of pages: 13
ISSN: 0033-8419
Publication status: Published (2023)

Bibliographical note

Funding Information:
This study was supported by a research grant from the Danish government (Project SmartChest, jr. nr 2020–6718). L.L.P., F.C.M., M.W.B., M.B., and M.B.A. were supported by funding from an AI Signature grant (SmartChest) from the Danish government, which included the PhD salaries connected to the study and meeting and/or travel support. F.C.M. was supported by grants from the Agency for Digitalization (Digitaliseringsstyrelsen), Innovation Fund Denmark, and the Capital Region of Denmark.

F.C.M. Institutional research grants from Siemens Healthineers and Innovation Fund Denmark; lecture payment from Siemens Healthineers. M.W.B. No relevant relationships. L.C.L. No relevant relationships. F.R. No relevant relationships. O.W.N. Lecture payments from Roche, Orion, Pharmacosmos, and Novartis; stock options in Bavarian Nordic and Merck; currently employed by Novo Nordisk. M.B. No relevant relationships. M.B.A. Lecture payments from Philips Healthcare, Siemens Healthineers, Boehringer Ingelheim, and Roche.

Publisher Copyright:
© RSNA, 2023.

ID: 396804210