Visual Prompt Tuning
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost. Code is available at github.com/kmnp/vpt.
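The core mechanism described in the abstract, prepending a small set of learnable prompt vectors to the frozen patch-embedding sequence, can be sketched as follows. This is a minimal shape-level illustration in NumPy, not the authors' implementation; the function name, prompt count, and dimensions are illustrative assumptions (ViT-Base uses 768-dim embeddings and 196 patches for 224×224 inputs).

```python
import numpy as np

def prepend_prompts(cls_token, prompts, patch_embeddings):
    # VPT-style input sequence: [CLS; prompts; patches].
    # Only `prompts` (and a task head, not shown) would be trained;
    # the backbone producing cls_token and patch_embeddings stays frozen.
    return np.concatenate([cls_token, prompts, patch_embeddings], axis=0)

rng = np.random.default_rng(0)
d = 768                                    # ViT-Base embedding dimension
cls_tok = rng.standard_normal((1, d))      # frozen [CLS] token
patches = rng.standard_normal((196, d))    # frozen patch embeddings (14x14 grid)
prompts = rng.standard_normal((5, d))      # the only new trainable parameters

seq = prepend_prompts(cls_tok, prompts, patches)
print(seq.shape)  # (202, 768)
```

With 5 prompts of dimension 768 against an ~86M-parameter ViT-Base backbone, the added parameters are a tiny fraction of the model, consistent with the "less than 1%" figure above.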
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2022: 17th European Conference, Proceedings |
Editors | Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner |
Number of pages | 19 |
Publisher | Springer |
Publication date | 2022 |
Pages | 709-727 |
ISBN (Print) | 978-3-031-19826-7 |
ISBN (Electronic) | 978-3-031-19827-4 |
DOIs | |
Publication status | Published - 2022 |
Event | 17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel Duration: 23 Oct 2022 → 27 Oct 2022 |
Conference
Conference | 17th European Conference on Computer Vision, ECCV 2022 |
---|---|
Country | Israel |
City | Tel Aviv |
Period | 23/10/2022 → 27/10/2022 |
Series | Lecture Notes in Computer Science |
---|---|
Volume | 13693 LNCS |
ISSN | 0302-9743 |
Bibliographical note
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.