Advanced Prompting Techniques
By this point, it should be clear that improving prompts helps you get better results on different tasks. That is the core idea behind prompt engineering.
While those examples were fun, let's cover a few concepts more formally before moving on to more advanced ones.
Topics:
- Zero-Shot Prompting
- Few-Shot Prompting
- Chain-of-Thought Prompting
- Zero-Shot CoT
- Self-Consistency
- Generated Knowledge Prompting
- Automatic Prompt Engineer
Zero-Shot Prompting
Today's large language models (LLMs), trained on large amounts of data and tuned to follow instructions, can perform tasks zero-shot. We tried a few zero-shot examples in the previous section. Here is one of the examples we used:
Prompt:
Classify the text into neutral, negative, or positive.
Text: I think the vacation is okay.
Sentiment:
Output:
Neutral
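Note that in the prompt above we did not give the model any examples; that is the zero-shot capability at work. When zero-shot doesn't work, it is recommended to provide demonstrations or examples in the prompt. Next we discuss the technique known as few-shot prompting.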
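A zero-shot prompt like the one above is just an instruction plus the input, with no demonstrations. The minimal Python sketch below assembles it. The function name `build_zero_shot_prompt` and its parameters are illustrative assumptions rather than part of any library, and sending the prompt to a model is deliberately left out.

```python
def build_zero_shot_prompt(instruction: str, text: str, answer_label: str = "Sentiment") -> str:
    """Assemble a zero-shot prompt: an instruction and the input, no examples."""
    return f"{instruction}\nText: {text}\n{answer_label}:"


prompt = build_zero_shot_prompt(
    "Classify the text into neutral, negative, or positive.",
    "I think the vacation is okay.",
)
print(prompt)
```

The prompt ends with the open label `Sentiment:` so that the model's continuation is the classification itself.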
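Few-Shot Prompting
While large language models already demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks in the zero-shot setting. To improve on this, few-shot prompting is used as a technique to enable in-context learning: we provide demonstrations in the prompt to steer the model toward better performance. The demonstrations serve as conditioning for the subsequent example, for which we want the model to generate a response.
Let's demonstrate few-shot prompting with an example presented by Brown et al. (2020). In the example, the task is to correctly use a new word in a sentence.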
Prompt:
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output:
When we won the game, we all started to farduddle in celebration.
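We can observe that the model has learned how to perform the task from just one example (i.e., 1-shot). For more difficult tasks, we can experiment with increasing the number of demonstrations (e.g., 3-shot, 5-shot, 10-shot).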
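Moving from 1-shot to k-shot only changes how many demonstrations are placed before the query. The sketch below shows one way to assemble such a prompt; `build_few_shot_prompt` is a hypothetical helper written for this illustration, and the newline-separated layout is an assumption about formatting, not a requirement of the technique.

```python
from typing import List, Tuple


def build_few_shot_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Concatenate k (input, output) demonstrations, then the unanswered query."""
    blocks = [f"{demo_input}\n{demo_output}" for demo_input, demo_output in demos]
    blocks.append(query)  # the model is expected to continue from here
    return "\n".join(blocks)


demos = [
    (
        'A "whatpu" is a small, furry animal native to Tanzania. '
        "An example of a sentence that uses the word whatpu is:",
        "We were traveling in Africa and we saw these very cute whatpus.",
    ),
]
query = (
    'To do a "farduddle" means to jump up and down really fast. '
    "An example of a sentence that uses the word farduddle is:"
)
prompt = build_few_shot_prompt(demos, query)
print(prompt)
```

Appending more tuples to `demos` yields a 3-shot or 5-shot prompt without touching the rest of the code.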
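Following the findings of Min et al. (2022), here is a further tip about demonstrations/exemplars in few-shot prompting: the label space and the format of the demonstrations both matter, even when the labels are not correct for individual inputs.
Let's try a few examples. We start with an example that uses random labels (meaning the labels Negative and Positive are assigned to the inputs at random):
Prompt:
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
Output: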
Negative
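We still get the correct answer, even though the labels have been randomized. Note that we also kept the format, which helps too. In fact, further experimentation suggests that the newer GPT models we are experimenting with are becoming more robust, even to random formats. Example:
Prompt:
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
Output: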
Negative
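There is no consistency in the format above, yet the model still predicted the correct label. A more thorough analysis is needed to confirm whether this holds for different and more complex tasks, including different variations of prompts.
Standard few-shot prompting works well for many tasks, but it is still not a perfect technique, especially for more complex reasoning tasks. Let's demonstrate why. Recall the earlier example in which we gave the following task:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
If we try this again, the model outputs the following: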
Yes, the odd numbers in this group add up to 107, which is an even number.
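This is not the correct response (the odd numbers in the group sum to 41, which is odd). It not only highlights the limitations of these systems but also shows the need for more advanced prompt engineering.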
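The arithmetic behind this task is easy to verify by hand or in a few lines of Python. The snippet below is only a check of the expected answer, not something the model runs; the variable names are chosen for this illustration.

```python
numbers = [15, 32, 5, 13, 82, 7, 1]

# Keep only the odd numbers, then test whether their sum is even.
odd_numbers = [n for n in numbers if n % 2 == 1]
odd_sum = sum(odd_numbers)
claim_is_true = odd_sum % 2 == 0  # "the odd numbers add up to an even number"

print(odd_numbers, odd_sum, claim_is_true)  # [15, 5, 13, 7, 1] 41 False
```

The statement is therefore false, so both the "107" in the model's output and its conclusion are wrong.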
Let's try adding some examples to see whether few-shot prompting improves the result.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
The answer is True.
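That didn't work. It seems that few-shot prompting is not enough to get reliable responses for this type of reasoning problem. The examples above give only basic information about the task. On closer inspection, the task we introduced involves a few more reasoning steps. In other words, it might help to break the problem down into steps and demonstrate them to the model. More recently, chain-of-thought (CoT) prompting has been popularized to address more complex arithmetic, commonsense, and symbolic reasoning tasks.
Overall, providing examples is useful for solving some tasks. When zero-shot and few-shot prompting are both insufficient, it may mean that what the model has learned is not enough to do well on the task. From here, it is recommended to consider fine-tuning your model or experimenting with more advanced prompting techniques. Next we discuss one of the popular prompting techniques, chain-of-thought prompting.
Chain-of-Thought Prompting
Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output: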
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
Wow! We can see a perfect result when we provide the reasoning steps. In fact, we can solve this task with even fewer examples; a single example seems to be enough:
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
Keep in mind that the authors claim this is an emergent ability that arises in sufficiently large language models.
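What distinguishes a CoT exemplar from a plain few-shot exemplar is that the answer spells out the intermediate step before the verdict. As a rough sketch, the exemplars for the odd-numbers task can be generated programmatically; `cot_answer` and `build_cot_prompt` are hypothetical helpers written for this illustration, and the wording simply mirrors the exemplars shown in this section.

```python
from typing import List

QUESTION = "The odd numbers in this group add up to an even number: {numbers}."


def cot_answer(numbers: List[int]) -> str:
    """Write the rationale first, then the final answer, as in the exemplars."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    listed = ", ".join(str(n) for n in odds)
    return f"Adding all the odd numbers ({listed}) gives {total}. The answer is {total % 2 == 0}."


def build_cot_prompt(exemplars: List[List[int]], query: List[int]) -> str:
    """Few-shot CoT prompt: worked exemplars followed by the open question."""
    lines = []
    for numbers in exemplars:
        lines.append(QUESTION.format(numbers=", ".join(map(str, numbers))))
        lines.append(f"A: {cot_answer(numbers)}")
    lines.append(QUESTION.format(numbers=", ".join(map(str, query))))
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_cot_prompt([[4, 8, 9, 15, 12, 2, 1]], [15, 32, 5, 13, 82, 7, 1])
print(prompt)
```

Generating the rationales from code keeps the demonstrations arithmetically correct, which matters because the model tends to imitate whatever reasoning pattern the exemplars show.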
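Zero-Shot CoT
One more recent idea is zero-shot CoT (Kojima et al., 2022), which essentially involves adding "Let's think step by step" to the original prompt. Let's try a simple problem and see how the model performs:
Prompt: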
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Output:
11 apples
The answer is incorrect! Now let's try with the special prompt.
Prompt:
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
Output:
First, you started with 10 apples. You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left. Then you bought 5 more apples, so now you had 11 apples. Finally, you ate 1 apple, so you would remain with 10 apples.
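It is impressive that this simple prompt is effective at this task. It is particularly useful when you don't have many examples to use in the prompt.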
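Zero-shot CoT needs no exemplars at all: the trigger sentence is simply appended to the question. The sketch below shows that step and also replays the arithmetic from the model's answer as a sanity check. `add_zero_shot_cot` is a hypothetical helper, and the blank line before the trigger is a formatting assumption.

```python
TRIGGER = "Let's think step by step."


def add_zero_shot_cot(question: str, trigger: str = TRIGGER) -> str:
    """Append the reasoning trigger to an otherwise unchanged question."""
    return f"{question.rstrip()}\n\n{trigger}"


question = (
    "I went to the market and bought 10 apples. "
    "I gave 2 apples to the neighbor and 2 to the repairman. "
    "I then went and bought 5 more apples and ate 1. "
    "How many apples did I remain with?"
)
prompt = add_zero_shot_cot(question)
print(prompt)

# Replaying the reasoning steps from the model's answer.
apples = 10      # bought at the market
apples -= 2 + 2  # 2 to the neighbor and 2 to the repairman, leaving 6
apples += 5      # bought 5 more, giving 11
apples -= 1      # ate 1, leaving 10
print(apples)  # 10
```

The intermediate value 11 is exactly the incorrect answer the model gave without the trigger, which suggests it stopped one step short.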
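Self-Consistency
Perhaps one of the more advanced techniques in prompt engineering is self-consistency. Proposed by Wang et al. (2022), self-consistency aims "to replace the naive greedy decoding used in chain-of-thought prompting". The idea is to sample multiple, diverse reasoning paths through few-shot CoT and use the generations to select the most consistent answer. This helps boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.
Let's try the following example for arithmetic reasoning:
Prompt: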
When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
Output:
35
The output is wrong! How can we improve this with self-consistency? Let's try it out. We will use the few-shot exemplars from Wang et al. (2022), Table 17:
Prompt:
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent 5 * $3 = $15. She has $23 - $15 = $8 left. The answer is 8.
Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A:
Output 1:
When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.
Output 2:
When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.
Output 3:
When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.
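Computing the final answer involves a few steps (see the paper for details), but for simplicity we can see that a majority answer (67) is already emerging, and it would essentially become the final answer.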
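In its simplest form, the aggregation step is a majority vote over the final answers of the sampled reasoning paths. The sketch below applies that vote to the three outputs above. It is a simplification and not the paper's full procedure; the helper names and the assumption that every completion ends with "The answer is N." are choices made for this illustration.

```python
import re
from collections import Counter
from typing import List, Optional

ANSWER_PATTERN = re.compile(r"The answer is\s+(-?\d+(?:\.\d+)?)")


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final numeric answer out of one sampled reasoning path."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def majority_answer(completions: List[str]) -> Optional[str]:
    """Return the most frequent extracted answer, or None if nothing parsed."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = [
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "When the narrator was 6, his sister was half his age, which is 3. Now that "
    "the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.",
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70/2 = 35. The answer is 35.",
]
print(majority_answer(samples))  # 67
```

In practice the samples would come from calling the model several times with a non-zero sampling temperature, so that the reasoning paths differ from one another.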
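Generated Knowledge Prompting
LLMs continue to be improved, and one popular technique is to incorporate knowledge or information that helps the model make more accurate predictions.
Using a similar idea, can the model also be used to generate knowledge before making a prediction? That is what Liu et al. (2022) attempt: generating knowledge to be used as part of the prompt. In particular, how helpful is this for tasks such as commonsense reasoning?
Let's try a simple prompt:
Prompt: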
Part of golf is trying to get a higher point total than others. Yes or No?
Output:
Yes.
This type of mistake reveals the limitations of LLMs on tasks that require more knowledge about the world. How do we improve this with knowledge generation?
First, we generate a few "knowledges":
Prompt:
Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge:
Knowledge 1:
The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Knowledge 2:
Golf is a precision club-and-ball sport in which competing players (or golfers) use many types of clubs to hit balls into a series of holes on a course using the fewest number of strokes. The goal is to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.
We are using the prompt provided in the paper by Liu et al. (2022).
The next step is to integrate the knowledge and get a prediction. I reformatted the question into QA format to guide the answer format.
Prompt:
Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Explain and Answer:
Answer 1 (confidence very high):
No, the objective of golf is not to get a higher point total than others. Rather, the objective is to play a set of holes in the least number of strokes. The total number of strokes is used to determine the winner of the game, not the total number of points.
Prompt:
Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: Golf is a precision club-and-ball sport in which competing players (or golfers) use many types of clubs to hit balls into a series of holes on a course using the fewest number of strokes. The goal is to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.
Explain and Answer:
Answer 2 (confidence is a lot lower):
Yes, part of golf is trying to get a higher point total than others. Each player tries to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.
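Some really interesting things happened with this example. In the first answer the model was very confident, but in the second it was much less so. I simplified the process for demonstration purposes, and there are a few more details to consider when arriving at the final answer. Check out the paper for more.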
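The two stages walked through above (generate knowledge, then answer with that knowledge in the prompt) can be sketched as plain prompt-building functions. All function names here are hypothetical, the confidence scores at the end are made-up placeholders standing in for whatever confidence signal the model provides, and picking the highest-scoring answer is a simplification of the selection described in the paper.

```python
from typing import List, Tuple


def build_knowledge_prompt(demos: List[Tuple[str, str]], statement: str) -> str:
    """Stage 1: few-shot prompt asking the model to write knowledge for a statement."""
    lines = []
    for demo_input, demo_knowledge in demos:
        lines.append(f"Input: {demo_input}")
        lines.append(f"Knowledge: {demo_knowledge}")
    lines.append(f"Input: {statement}")
    lines.append("Knowledge:")  # left open for the model to complete
    return "\n".join(lines)


def build_answer_prompt(question: str, knowledge: str) -> str:
    """Stage 2: put one generated knowledge statement in front of the question."""
    return f"Question: {question}\nKnowledge: {knowledge}\nExplain and Answer:"


def pick_most_confident(scored_answers: List[Tuple[str, float]]) -> str:
    """Keep the answer with the highest confidence score (scores are assumed inputs)."""
    return max(scored_answers, key=lambda pair: pair[1])[0]


demos = [("A rock is the same size as a pebble.",
          "A pebble is a clast of rock with a particle size of 4 to 64 millimetres.")]
statement = "Part of golf is trying to get a higher point total than others."
knowledge_prompt = build_knowledge_prompt(demos, statement)
answer_prompt = build_answer_prompt(
    statement + " Yes or No?",
    "The objective of golf is to play a set of holes in the least number of strokes.",
)
# Placeholder scores for illustration only; they are not measured values.
final_answer = pick_most_confident([("No", 0.9), ("Yes", 0.4)])
print(final_answer)  # No
```

One answer prompt is built per generated knowledge statement, so two knowledge statements yield two candidate answers to choose between.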

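Automatic Prompt Engineer (APE)
Zhou et al. (2022) propose automatic prompt engineer (APE), a framework for automatic instruction generation and selection. The instruction generation problem is framed as a black-box optimization problem, using LLMs to generate and search over candidate solutions.
The first step involves a large language model (as an inference model) that is given output demonstrations to generate instruction candidates for a task. These candidate solutions guide the search procedure. The instructions are executed using a target model, and then the most appropriate instruction is selected based on computed evaluation scores.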
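The selection half of that procedure can be sketched as a scoring loop over candidate instructions. This is a toy illustration under stated assumptions, not the framework from the paper: the candidates are hard-coded instead of being proposed by an inference model, exact-match accuracy stands in for the evaluation score, and `toy_target_model` is a stand-in function so that the example runs without any model.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input, expected output)


def select_instruction(
    candidates: List[str],
    demos: List[Demo],
    run_target_model: Callable[[str, str], str],
) -> Tuple[str, float]:
    """Score each candidate instruction by exact-match accuracy and keep the best."""
    if not demos:
        raise ValueError("at least one demonstration is needed for scoring")
    best, best_score = "", -1.0
    for instruction in candidates:
        hits = sum(run_target_model(instruction, x) == y for x, y in demos)
        score = hits / len(demos)
        if score > best_score:
            best, best_score = instruction, score
    return best, best_score


def toy_target_model(instruction: str, text: str) -> str:
    """Stand-in for a real target model, so that the sketch runs offline."""
    return text.upper() if "uppercase" in instruction else text


demos = [("cat", "CAT"), ("dog", "DOG")]
candidates = ["Repeat the input.", "Write the input in uppercase."]
best, score = select_instruction(candidates, demos, toy_target_model)
print(best, score)  # Write the input in uppercase. 1.0
```

Passing the model in as a callable keeps the search loop independent of any particular provider, which fits the black-box framing of the problem.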
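APE discovers a better zero-shot CoT prompt than the human-engineered "Let's think step by step" prompt (Kojima et al., 2022).
The prompt "Let's work this out in a step by step way to be sure we have the right answer." elicits chain-of-thought reasoning and improves performance on the MultiArith and GSM8K benchmarks.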

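This paper touches on an important topic related to prompt engineering: the idea of automatically optimizing prompts. This guide does not go deep into the topic, though several key papers cover it for interested readers.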
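Notice:
This document was translated by the 灏天文库 team. Although we strive for accuracy, the translation may contain errors or inaccuracies. The original document in its original language is authoritative. We accept no responsibility for any misunderstanding or mistranslation arising from the use of this translation.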