On Position Embeddings in BERT

Publication: Contribution to conference › Paper › Research

Documents

Full text
Final published version, 2.12 MB, PDF document

Benyou Wang
Lifeng Shan
Lioma, Christina
Xin Jiang
Hao Yang
Qun Liu
Simonsen, Jakob Grue

Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g. BERT) to model word order. These are empirically driven and perform well, but no formal framework exists to systematically study them. To address this, we present three properties of PEs that capture word distance in vector space: translation invariance, monotonicity, and symmetry. These properties formally capture the behaviour of PEs and allow us to reinterpret sinusoidal PEs in a principled way. Moreover, we propose a new probing test (called "identical word probing") and mathematical indicators to quantitatively detect the general attention patterns with respect to the above properties. An empirical evaluation of seven PEs (and their combinations) for classification (GLUE) and span prediction (SQuAD) shows that: (1) both classification and span prediction benefit from translation invariance and local monotonicity, while symmetry slightly decreases performance; (2) the fully learnable absolute PE performs better in classification, while relative PEs perform better in span prediction. We contribute the first formal and quantitative analysis of desiderata for PEs, and a principled discussion about their correlation to the performance of typical downstream tasks.
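As an illustration of the three properties named in the abstract, the following minimal sketch checks them numerically for sinusoidal PEs. It rests on assumptions that are not taken from the paper: the dot product is used as the similarity between two position vectors, the properties are read informally (similarity depends only on the offset, is equal for offsets +d and -d, and decreases over small offsets), and all function names, the dimension of 64 and the base of 10000 are hypothetical choices for this example only.

```python
import math


def sinusoidal_pe(pos, dim=64, base=10000.0):
    """Return a sinusoidal position vector for one position (illustrative)."""
    vec = []
    for k in range(dim // 2):
        freq = base ** (-2.0 * k / dim)  # one frequency per sin/cos pair
        vec.append(math.sin(pos * freq))
        vec.append(math.cos(pos * freq))
    return vec


def similarity(x, y, dim=64):
    """Dot product of two position vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(sinusoidal_pe(x, dim), sinusoidal_pe(y, dim)))


def is_translation_invariant(offset=3, positions=range(0, 50), tol=1e-9):
    """Similarity of (x, x + offset) should not depend on x."""
    ref = similarity(0, offset)
    return all(abs(similarity(x, x + offset) - ref) < tol for x in positions)


def is_symmetric(max_offset=10, x=20, tol=1e-9):
    """Similarity should be the same for offsets +d and -d."""
    return all(
        abs(similarity(x, x + d) - similarity(x, x - d)) < tol
        for d in range(1, max_offset + 1)
    )


def is_locally_monotonic(max_offset=3, x=20):
    """Similarity should strictly decrease over the first few offsets."""
    sims = [similarity(x, x + d) for d in range(max_offset + 1)]
    return all(a > b for a, b in zip(sims, sims[1:]))


if __name__ == "__main__":
    print("translation invariant:", is_translation_invariant())
    print("symmetric:", is_symmetric())
    print("locally monotonic:", is_locally_monotonic())
```

Under these assumptions the dot product reduces to a sum of cosines of the offset scaled by each frequency, which is why all three checks pass with the default arguments. The sketch says nothing about learned PEs or about the attention-level indicators proposed in the paper.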

Original language	English
Publication date	2021
Status	Published - 2021
Event	9th International Conference on Learning Representations - ICLR 2021 - Virtual Duration: 3 May 2021 → 7 May 2021

Conference

Conference	9th International Conference on Learning Representations - ICLR 2021
City	Virtual
Period	3 May 2021 → 7 May 2021


ID: 300919719