The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks
Research output: Contribution to conference › Paper › Research
Standard
The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks. / Gonzalez, Ana Valeria; Søgaard, Anders.
2020. Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE.
RIS
TY - CONF
T1 - The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks
AU - Gonzalez, Ana Valeria
AU - Søgaard, Anders
PY - 2020
Y1 - 2020
AB - The Turing Test evaluates a computer program's ability to mimic human behaviour. The Reverse Turing Test, conversely, evaluates a human's ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. The Reverse Turing Test improves on previous experimental protocols for human evaluation of interpretability methods by a) including a training phase, and b) masking the task; combined, these enable us to evaluate models independently of their quality, in a way that is unbiased by the participants' previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental to human forward prediction in some NLP tasks.
M3 - Paper
T2 - NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies
Y2 - 11 December 2020
ER -