Efficient Structured Prediction with Transformer Encoders

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Efficient Structured Prediction with Transformer Encoders. / Basirat, Ali.

In: The Northern European Journal of Language Technology (NEJLT), Vol. 10, No. 1, 14.03.2024.

Harvard

Basirat, A 2024, 'Efficient Structured Prediction with Transformer Encoders', The Northern European Journal of Language Technology (NEJLT), vol. 10, no. 1. https://doi.org/10.3384/nejlt.2000-1533.2024.4932

APA

Basirat, A. (2024). Efficient Structured Prediction with Transformer Encoders. The Northern European Journal of Language Technology (NEJLT), 10(1). https://doi.org/10.3384/nejlt.2000-1533.2024.4932

Vancouver

Basirat A. Efficient Structured Prediction with Transformer Encoders. The Northern European Journal of Language Technology (NEJLT). 2024 Mar 14;10(1). https://doi.org/10.3384/nejlt.2000-1533.2024.4932

Author

Basirat, Ali. / Efficient Structured Prediction with Transformer Encoders. In: The Northern European Journal of Language Technology (NEJLT). 2024 ; Vol. 10, No. 1.

Bibtex

@article{3cd17b9fade0413ab3d4287a1c888d8a,
title = "Efficient Structured Prediction with Transformer Encoders",
abstract = "Finetuning is a useful method for adapting Transformer-based text encoders to new tasks but can be computationally expensive for structured prediction tasks that require tuning at the token level. Furthermore, finetuning is inherently inefficient in updating all base model parameters, which prevents parameter sharing across tasks. To address these issues, we propose a method for efficient task adaptation of frozen Transformer encoders based on the local contribution of their intermediate layers to token representations. Our adapter uses a novel attention mechanism to aggregate intermediate layers and tailor the resulting representations to a target task. Experiments on several structured prediction tasks demonstrate that our method outperforms previous approaches, retaining over 99% of the finetuning performance at a fraction of the training cost. Our proposed method offers an efficient solution for adapting frozen Transformer encoders to new tasks, improving performance and enabling parameter sharing across different tasks.",
keywords = "Faculty of Humanities, large language models, structured prediction, Relation extraction, Faculty of Science, deep learning, finetuning",
author = "Ali Basirat",
year = "2024",
month = mar,
day = "14",
doi = "10.3384/nejlt.2000-1533.2024.4932",
language = "English",
volume = "10",
journal = "The Northern European Journal of Language Technology (NEJLT)",
issn = "2000-1533",
publisher = "Link{\"o}ping University Electronic Press",
number = "1",

}

RIS

TY - JOUR

T1 - Efficient Structured Prediction with Transformer Encoders

AU - Basirat, Ali

PY - 2024/3/14

Y1 - 2024/3/14

N2 - Finetuning is a useful method for adapting Transformer-based text encoders to new tasks but can be computationally expensive for structured prediction tasks that require tuning at the token level. Furthermore, finetuning is inherently inefficient in updating all base model parameters, which prevents parameter sharing across tasks. To address these issues, we propose a method for efficient task adaptation of frozen Transformer encoders based on the local contribution of their intermediate layers to token representations. Our adapter uses a novel attention mechanism to aggregate intermediate layers and tailor the resulting representations to a target task. Experiments on several structured prediction tasks demonstrate that our method outperforms previous approaches, retaining over 99% of the finetuning performance at a fraction of the training cost. Our proposed method offers an efficient solution for adapting frozen Transformer encoders to new tasks, improving performance and enabling parameter sharing across different tasks.

AB - Finetuning is a useful method for adapting Transformer-based text encoders to new tasks but can be computationally expensive for structured prediction tasks that require tuning at the token level. Furthermore, finetuning is inherently inefficient in updating all base model parameters, which prevents parameter sharing across tasks. To address these issues, we propose a method for efficient task adaptation of frozen Transformer encoders based on the local contribution of their intermediate layers to token representations. Our adapter uses a novel attention mechanism to aggregate intermediate layers and tailor the resulting representations to a target task. Experiments on several structured prediction tasks demonstrate that our method outperforms previous approaches, retaining over 99% of the finetuning performance at a fraction of the training cost. Our proposed method offers an efficient solution for adapting frozen Transformer encoders to new tasks, improving performance and enabling parameter sharing across different tasks.

KW - Faculty of Humanities

KW - large language models

KW - structured prediction

KW - Relation extraction

KW - Faculty of Science

KW - deep learning

KW - finetuning

U2 - https://doi.org/10.3384/nejlt.2000-1533.2024.4932

DO - 10.3384/nejlt.2000-1533.2024.4932

M3 - Journal article

VL - 10

JO - The Northern European Journal of Language Technology (NEJLT)

JF - The Northern European Journal of Language Technology (NEJLT)

SN - 2000-1533

IS - 1

ER -
