MaPLe: Multi-modal Prompt Learning, by Muhammad Uzair Khattak and 4 other authors

Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates. Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks. Learning prompts in a single branch of CLIP (language or vision) is sub-optimal, since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task. We therefore propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches.
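The core idea of prompt learning described above can be sketched in a few lines: a small set of learnable vectors is prepended to the token embeddings, and in a multi-modal setting this is done in both the language and the vision branch. The snippet below is a minimal illustrative sketch, not the paper's implementation; the dimensions, the number of prompt tokens, and the helper `prepend_prompts` are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical dimensions (assumptions, not taken from the paper)
d_text, d_vis = 512, 768   # embedding widths of the two branches
n_ctx = 4                  # number of learnable prompt tokens per branch

rng = np.random.default_rng(0)

# Learnable prompt vectors for the language branch (trained, not hand-written
# templates) ...
text_prompts = rng.normal(scale=0.02, size=(n_ctx, d_text))
# ... and, in multi-modal prompt learning, for the vision branch as well
vis_prompts = rng.normal(scale=0.02, size=(n_ctx, d_vis))

def prepend_prompts(token_embeds, prompts):
    """Prepend learnable prompt tokens to a sequence of token embeddings."""
    return np.concatenate([prompts, token_embeds], axis=0)

# Example: a 3-token class-name embedding sequence in the language branch
text_tokens = rng.normal(size=(3, d_text))
prompted = prepend_prompts(text_tokens, text_prompts)
print(prompted.shape)  # (7, 512): 4 prompt tokens + 3 text tokens
```

During fine-tuning, only the prompt vectors (and, in MaPLe, their cross-branch coupling) would be updated while the pre-trained CLIP weights stay frozen; that detail is omitted here for brevity.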