[Paper Notes] Pre-train, Prompt, and Predict

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Prompt Template Engineering

Prompt shape

cloze prompts (e.g., "I love this movie, it is a [Z] movie"): for tasks that are solved using masked LMs
prefix prompts (e.g., "I love this movie. What’s the sentiment of the review? [Z]"): for generation tasks

For tasks with multiple inputs, such as text pair classification, the prompt template must contain slots for two or more inputs: [X1], [X2], and so on.
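
To make the prompt shapes concrete, here is a minimal sketch of template filling in Python. The helper name `fill_template` and the pair template string are assumptions made for illustration; only the cloze and prefix examples come from the notes above.

```python
def fill_template(template: str, **slots: str) -> str:
    """Replace each [NAME] slot in the template with the supplied text."""
    prompt = template
    for name, text in slots.items():
        prompt = prompt.replace(f"[{name}]", text)
    return prompt


# Cloze shape: the answer slot [Z] sits inside the text.
CLOZE = "[X], it is a [Z] movie"
# Prefix shape: the answer slot [Z] comes after the whole input.
PREFIX = "[X] What's the sentiment of the review? [Z]"
# Hypothetical two-input template for text pair classification.
PAIR = "[X1] ? [Z] , [X2]"

cloze_prompt = fill_template(CLOZE, X="I love this movie")
prefix_prompt = fill_template(PREFIX, X="I love this movie.")
pair_prompt = fill_template(PAIR, X1="A man is sleeping", X2="A person is resting")
```

The [Z] slot is left unfilled on purpose: it is the position the LM is asked to predict.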

Manual Template Engineering

Automated Template Learning

Discrete Prompts(hard prompts)

Prompt Mining: find the bridge (middle words) between input x and output y, giving a template of the form "[X] middle words [Z]"
Prompt Paraphrasing: paraphrase an existing prompt
Gradient-based Search
Prompt Generation: treat prompt creation as a text generation task
Prompt Scoring
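
As a rough illustration of the prompt mining idea, the sketch below counts the strings that appear between a known input x and output y in a small corpus and turns the most frequent ones into templates. The function name `mine_middle_words` and the toy corpus are assumptions; the mining-based methods covered by the survey work on large corpora and may also use dependency paths.

```python
from collections import Counter


def mine_middle_words(corpus, pairs):
    """Count candidate templates "[X] <middle words> [Z]" found in the corpus."""
    counts = Counter()
    for sentence in corpus:
        for x, y in pairs:
            start = sentence.find(x)
            end = sentence.find(y)
            # Keep only sentences where x appears entirely before y.
            if start != -1 and end != -1 and start + len(x) <= end:
                middle = sentence[start + len(x):end].strip()
                if middle:
                    counts[f"[X] {middle} [Z]"] += 1
    return counts


# Toy data, for illustration only.
corpus = [
    "Barack Obama was born in Hawaii.",
    "Marie Curie was born in Warsaw.",
    "Ada Lovelace lived in London.",
]
pairs = [
    ("Barack Obama", "Hawaii"),
    ("Marie Curie", "Warsaw"),
    ("Ada Lovelace", "London"),
]

template_counts = mine_middle_words(corpus, pairs)
best_template, best_count = template_counts.most_common(1)[0]
```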

Continuous Prompts (soft prompts) (I did not fully understand this part)

Prefix Tuning: keep the language model (LM) parameters frozen and prepend a sequence of task-specific vectors to the input
Tuning Initialized with Discrete Prompts
Hard-Soft Prompt Hybrid Tuning

Prompt Answer Engineering

Answer Shape

Tokens
Span
Sentence

Answer Space Design Methods

Manual Design

Unconstrained Spaces: the answer z is used directly as the output y (answer z = output y)
Constrained Spaces: calculate the probability of an output among multiple choices
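
For a constrained answer space, one simple reading is to renormalize the LM's scores over the allowed choices only. The sketch below assumes we already have a raw score (logit) per candidate token at the [Z] position; the function name and the numbers are hypothetical.

```python
import math


def constrained_answer_probs(scores: dict, answers: list) -> dict:
    """Softmax over the allowed answers only, ignoring the rest of the vocabulary."""
    top = max(scores[a] for a in answers)  # subtract the max for numerical stability
    exps = {a: math.exp(scores[a] - top) for a in answers}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Hypothetical LM scores at the [Z] position; "the" scores highest overall
# but is not a permitted answer.
lm_scores = {"great": 2.0, "terrible": 0.0, "the": 3.0}
probs = constrained_answer_probs(lm_scores, ["great", "terrible"])
prediction = max(probs, key=probs.get)
```

Restricting the softmax to the answer set keeps a high-scoring but irrelevant token from being chosen.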

Discrete Answer Search

Answer Paraphrasing
Prune-then-Search (e.g., top k)
Label Decomposition: decompose each relation label into its constituent words and use them as an answer.
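
Label decomposition can be sketched as plain string processing. The label `per:city_of_death`, the abbreviation table, and the stopword list below are assumptions chosen for illustration, not a specification of any particular method.

```python
import re

# Hypothetical helpers: expand abbreviated prefixes and drop function words.
ABBREVIATIONS = {"per": "person", "org": "organization"}
STOPWORDS = {"of", "in", "the"}


def decompose_label(label: str) -> list:
    """Split a relation label into its constituent content words."""
    parts = [p for p in re.split(r"[:_/]", label) if p]
    words = [ABBREVIATIONS.get(p, p) for p in parts]
    return [w for w in words if w not in STOPWORDS]


answer_words = decompose_label("per:city_of_death")
```

Each resulting word can then serve as an answer token for the relation.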

Continuous Answer Search(?)

Multi-Prompt

Prompt Ensembling

Uniform averaging: select the K prompts that achieve the highest accuracy on the training set, then average the log probabilities obtained from those top K prompts to compute the probability of a single token at the [Z] position
Weighted averaging
Majority voting
Knowledge distillation: after ensembling multiple models, their knowledge can be distilled into a single model.
Prompt ensembling for text generation: generate the output based on the ensembled probability of the next word in the answer sequence
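
The first three ensembling schemes above can be sketched in a few lines. This assumes each prompt has already produced a log probability for every candidate answer at [Z]; the function names, the weights, and the numbers are made up for illustration.

```python
import math
from collections import Counter


def uniform_average(log_probs_per_prompt):
    """Average the log probabilities of each answer across all prompts."""
    k = len(log_probs_per_prompt)
    answers = log_probs_per_prompt[0].keys()
    return {a: sum(lp[a] for lp in log_probs_per_prompt) / k for a in answers}


def weighted_average(log_probs_per_prompt, weights):
    """Like uniform averaging, but each prompt contributes with its own weight."""
    total = sum(weights)
    answers = log_probs_per_prompt[0].keys()
    return {
        a: sum(w * lp[a] for w, lp in zip(weights, log_probs_per_prompt)) / total
        for a in answers
    }


def majority_vote(log_probs_per_prompt):
    """Each prompt votes for its top answer; the most common vote wins."""
    votes = Counter(max(lp, key=lp.get) for lp in log_probs_per_prompt)
    return votes.most_common(1)[0][0]


# Hypothetical outputs of three prompts for a sentiment task.
outputs = [
    {"great": math.log(0.7), "terrible": math.log(0.3)},
    {"great": math.log(0.6), "terrible": math.log(0.4)},
    {"great": math.log(0.2), "terrible": math.log(0.8)},
]

uniform = uniform_average(outputs)
weighted = weighted_average(outputs, weights=[0.5, 0.4, 0.1])
voted = majority_vote(outputs)
```

On this toy input the schemes do not all agree, because one confident prompt can outweigh two mildly confident ones when log probabilities are averaged. The choice of scheme is therefore a real design decision.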

Prompt Augmentation

Sample Selection: how do we choose the most effective examples?
Sample Ordering: How do we properly order the chosen examples?
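
Prompt augmentation prepends a few answered examples to the actual query. The sketch below builds such a prompt and enumerates the possible orderings of the chosen examples; the template, the example facts, and the function name are assumptions for illustration.

```python
from itertools import permutations


def build_augmented_prompt(demos, query, template="[X]'s capital is [Z]."):
    """Prefix the query prompt with answered demonstrations."""
    lines = [template.replace("[X]", x).replace("[Z]", z) for x, z in demos]
    # The query keeps its [Z] slot open for the LM to fill.
    lines.append(template.replace("[X]", query))
    return " ".join(lines)


demos = [("Great Britain", "London"), ("Japan", "Tokyo")]
prompt = build_augmented_prompt(demos, "China")

# Sample ordering: every permutation of the demonstrations is a candidate prompt.
candidate_prompts = [
    build_augmented_prompt(list(order), "China") for order in permutations(demos)
]
```

Sample selection decides which pairs go into `demos`; sample ordering decides which entry of `candidate_prompts` to use.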

Prompt Composition

Prompt Decomposition

Training Strategies for Prompting Methods

Promptless Fine-tuning

BERT, RoBERTa

parameters of the pre-trained LM will be updated via gradients induced from downstream training samples

Disadvantages: LMs may overfit or not learn stably on smaller datasets.

Tuning-free Prompting

LAMA, GPT-3

directly generates the answers without changing the parameters of the pre-trained LMs based only on a prompt

Disadvantages: heavy engineering on prompt

Fixed-LM Prompt Tuning

Prefix-Tuning and Prompt-Tuning

The LM parameters stay unchanged; a prompt is added that introduces extra parameters, and only those prompt parameters are trained.

Disadvantages: Not applicable in zero-shot scenarios. While effective in few-shot scenarios, representation power is limited in large-data settings. Prompt engineering through choice of hyperparameters or seed prompts is necessary. Prompts are usually not human-interpretable or manipulable.

Fixed-prompt LM Tuning

PET-TC, PET-Gen, and LM-BFF

The LM parameters are trained while the prompt is kept fixed, the reverse of the previous method.

Disadvantages: Template or answer engineering is still required, although perhaps not as much as without prompting. LMs fine-tuned on one downstream task may not be effective on another one.

Prompt+LM Tuning

PADA and P-Tuning

All parameters, both the prompt's and the LM's, are fine-tuned.

Disadvantages: Requires training and storing all parameters of the models. May overfit to small datasets.
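
The five strategies differ only in which parameters receive gradient updates. The small table below restates that as data; the class and field names are assumptions, and the values follow the descriptions in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    lm_params_tuned: bool      # are the pre-trained LM weights updated?
    uses_prompt: bool          # is a prompt part of the input at all?
    prompt_params_tuned: bool  # does the prompt carry trainable parameters?


STRATEGIES = [
    Strategy("Promptless Fine-tuning", True, False, False),
    Strategy("Tuning-free Prompting", False, True, False),
    Strategy("Fixed-LM Prompt Tuning", False, True, True),
    Strategy("Fixed-prompt LM Tuning", True, True, False),
    Strategy("Prompt+LM Tuning", True, True, True),
]


def trainable_parts(strategy: Strategy) -> list:
    """List which components are trained under the given strategy."""
    parts = []
    if strategy.lm_params_tuned:
        parts.append("LM")
    if strategy.prompt_params_tuned:
        parts.append("prompt")
    return parts


summary = {s.name: trainable_parts(s) for s in STRATEGIES}
```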

Meta-Application

Domain Adaptation

adapting a model from one domain (e.g., news text) to another (e.g., social media text)

Debiasing

perform self-diagnosis and self-debiasing based on biased or debiased instructions

Dataset Construction

Challenges

Selection of Pre-trained LMs

Prompt Design

information extraction and text analysis tasks

Prompting with Structured Information

Entanglement of Template and Answer

How to simultaneously search or learn for the best combination of template and answer

Prompt Answer Engineering

Many-class Classification Tasks

When there are too many classes, how to select an appropriate answer space

Long-answer Classification Tasks

how to best decode multiple tokens using LMs

Multiple Answers for Generation Tasks

How to better guide the learning process with multiple references