Revisiting Transformer-based Models for Long Document Classification
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Standard
Revisiting Transformer-based Models for Long Document Classification. / Dai, Xiang; Chalkidis, Ilias; Darkner, Sune; Elliott, Desmond.
Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022. pp. 7212–7230.
RIS
TY - GEN
T1 - Revisiting Transformer-based Models for Long Document Classification
AU - Dai, Xiang
AU - Chalkidis, Ilias
AU - Darkner, Sune
AU - Elliott, Desmond
PY - 2022
Y1 - 2022
AB - The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers to encode much longer text, namely sparse attention and hierarchical encoding methods. We examine several aspects of sparse attention (e.g., size of local attention window, use of global attention) and hierarchical (e.g., document splitting strategy) transformers on four document classification datasets covering different domains. We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
M3 - Article in proceedings
SP - 7212
EP - 7230
BT - Findings of the Association for Computational Linguistics: EMNLP 2022
PB - Association for Computational Linguistics
ER -
ID: 339145904
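
The hierarchical encoding approach compared in the abstract splits a long document into segments, encodes each segment with a standard pre-trained encoder, and then aggregates the segment representations before classifying. Below is a minimal PyTorch sketch of that idea; the roberta-base checkpoint, the two-layer segment-level Transformer, and the mean-pooling step are illustrative assumptions, not the paper's exact configuration.

import torch.nn as nn
from transformers import AutoModel

class HierarchicalClassifier(nn.Module):
    """Encode each segment with a pre-trained encoder, then let the
    segment [CLS] vectors attend to each other in a small Transformer."""
    def __init__(self, encoder_name="roberta-base", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # assumed checkpoint
        hidden = self.encoder.config.hidden_size
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.segment_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids, attention_mask: (batch, num_segments, seq_len);
        # the document is assumed to be pre-split into fixed-length segments.
        b, s, l = input_ids.shape
        out = self.encoder(input_ids.view(b * s, l),
                           attention_mask=attention_mask.view(b * s, l))
        cls = out.last_hidden_state[:, 0, :].view(b, s, -1)  # one vector per segment
        doc = self.segment_encoder(cls).mean(dim=1)          # pool segments into a document vector
        return self.classifier(doc)                          # (batch, num_labels) logits

For the sparse-attention alternative the abstract mentions, a model such as Longformer instead reads the whole document in a single pass, combining a sliding local attention window with global attention on selected tokens (e.g., the classification token).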