Visual Prompt Tuning
Research output: Working paper › Preprint › Research
Standard
Visual Prompt Tuning. / Jia, Menglin; Tang, Luming; Chen, Bor-Chun; Cardie, Claire; Belongie, Serge; Hariharan, Bharath.
arXiv.org, 2022.
RIS
TY - UNPB
T1 - Visual Prompt Tuning
AU - Jia, Menglin
AU - Tang, Luming
AU - Chen, Bor-Chun
AU - Cardie, Claire
AU - Belongie, Serge
AU - Hariharan, Bharath
PY - 2022/3/23
Y1 - 2022/3/23
N2 - The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
AB - The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
M3 - Preprint
BT - Visual Prompt Tuning
PB - arXiv.org
ER -
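
The abstract describes the core mechanism: a small set of learnable prompt tokens added in the input space of a frozen Transformer backbone, with only those prompts (and a task head) trained. Below is a minimal PyTorch-style sketch of that idea in its shallow form. It assumes a ViT-like backbone exposing `patch_embed` and `encoder`; the class, method, and argument names are illustrative, not the authors' released implementation.

```python
# Minimal sketch of shallow visual prompt tuning (assumed backbone API).
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, backbone, embed_dim=768, num_prompts=50, num_classes=10):
        super().__init__()
        self.backbone = backbone  # pre-trained ViT; kept frozen throughout
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Learnable prompt tokens inserted in the input space: together with
        # the head, these are the small trainable fraction of parameters.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.uniform_(self.prompts, -0.5, 0.5)  # illustrative init
        self.head = nn.Linear(embed_dim, num_classes)  # new per-task head

    def forward(self, x):
        # Assumed API: patch_embed maps images to a (B, N, D) token sequence,
        # with the [CLS] token at position 0; encoder is the frozen stack.
        tokens = self.backbone.patch_embed(x)                 # (B, N, D)
        prompts = self.prompts.expand(tokens.size(0), -1, -1)  # (B, P, D)
        tokens = torch.cat([prompts, tokens], dim=1)          # prepend prompts
        feats = self.backbone.encoder(tokens)                 # frozen weights
        return self.head(feats[:, prompts.size(1)])           # classify on [CLS]
```

Training then optimizes only `model.prompts` and `model.head`, e.g. by passing `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer, which is what keeps per-task storage down to the prompt and head parameters.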