Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild

Publication: Contribution to journal › Conference article › Research › Peer-reviewed

Standard

Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild. / Brandl, Stephanie; Hershcovich, Daniel; Søgaard, Anders.

In: Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16, 2022, p. 1368-1372.

Harvard

Brandl, S, Hershcovich, D & Søgaard, A 2022, 'Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild', Proceedings of the International AAAI Conference on Web and Social Media, vol. 16, pp. 1368-1372. https://doi.org/10.1609/icwsm.v16i1.19389

APA

Brandl, S., Hershcovich, D., & Søgaard, A. (2022). Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild. Proceedings of the International AAAI Conference on Web and Social Media, 16, 1368-1372. https://doi.org/10.1609/icwsm.v16i1.19389

Vancouver

Brandl S, Hershcovich D, Søgaard A. Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild. Proceedings of the International AAAI Conference on Web and Social Media. 2022;16:1368-1372. https://doi.org/10.1609/icwsm.v16i1.19389

Author

Brandl, Stephanie ; Hershcovich, Daniel ; Søgaard, Anders. / Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild. In: Proceedings of the International AAAI Conference on Web and Social Media. 2022 ; Vol. 16. pp. 1368-1372.

BibTeX

@inproceedings{a6896f184045438581c9906b8b1ecfee,
  title = "Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild",
  abstract = "We argue that we need to evaluate model interpretability methods 'in the wild', i.e., in situations where professionals make critical decisions, and models can potentially assist them. We present an in-the-wild evaluation of token attribution based on Deep Taylor Decomposition, with professional journalists performing reliability assessments. We find that using this method in conjunction with RoBERTa-Large, fine-tuned on the Gossip Corpus, led to faster and better human decision-making, as well as a more critical attitude toward news sources among the journalists. We present a comparison of human and model rationales, as well as a qualitative analysis of the journalists' experiences with machine-in-the-loop decision making.",
  author = "Stephanie Brandl and Daniel Hershcovich and Anders S{\o}gaard",
  year = "2022",
  doi = "10.1609/icwsm.v16i1.19389",
  language = "English",
  volume = "16",
  pages = "1368--1372",
  booktitle = "Proceedings of the International AAAI Conference on Web and Social Media",
  issn = "2162-3449",
  note = "16th International AAAI Conference on Web and Social Media; Conference date: 06-06-2022 through 09-06-2022",
}

RIS

TY  - GEN
T1  - Evaluating Deep Taylor Decomposition for Reliability Assessment in the Wild
AU  - Brandl, Stephanie
AU  - Hershcovich, Daniel
AU  - Søgaard, Anders
PY  - 2022
Y1  - 2022
N2  - We argue that we need to evaluate model interpretability methods 'in the wild', i.e., in situations where professionals make critical decisions, and models can potentially assist them. We present an in-the-wild evaluation of token attribution based on Deep Taylor Decomposition, with professional journalists performing reliability assessments. We find that using this method in conjunction with RoBERTa-Large, fine-tuned on the Gossip Corpus, led to faster and better human decision-making, as well as a more critical attitude toward news sources among the journalists. We present a comparison of human and model rationales, as well as a qualitative analysis of the journalists' experiences with machine-in-the-loop decision making.
AB  - We argue that we need to evaluate model interpretability methods 'in the wild', i.e., in situations where professionals make critical decisions, and models can potentially assist them. We present an in-the-wild evaluation of token attribution based on Deep Taylor Decomposition, with professional journalists performing reliability assessments. We find that using this method in conjunction with RoBERTa-Large, fine-tuned on the Gossip Corpus, led to faster and better human decision-making, as well as a more critical attitude toward news sources among the journalists. We present a comparison of human and model rationales, as well as a qualitative analysis of the journalists' experiences with machine-in-the-loop decision making.
U2  - 10.1609/icwsm.v16i1.19389
DO  - 10.1609/icwsm.v16i1.19389
M3  - Conference article
VL  - 16
SP  - 1368
EP  - 1372
JO  - Proceedings of the International AAAI Conference on Web and Social Media
JF  - Proceedings of the International AAAI Conference on Web and Social Media
SN  - 2162-3449
T2  - 16th International AAAI Conference on Web and Social Media
Y2  - 6 June 2022 through 9 June 2022
ER  - 

ID: 339852192