Natural Language Processing from Beginner to Application - LangChain: Prompts - [Example Selectors]

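When we have a large number of examples, we may need to choose which of them to include in the prompt. An ExampleSelector is the class responsible for making that choice. Its base interface is defined as follows: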

from abc import ABC, abstractmethod
from typing import Dict, List


class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""

    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""

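It only needs to expose a select_examples method, which takes the input variables and returns a list of examples. How those examples are chosen is up to each concrete implementation.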

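Custom Example Selector

A custom example selector picks a fixed number of examples from a given list of examples. An ExampleSelector must implement two methods:

- An add_example method, which takes an example and adds it to the ExampleSelector
- A select_examples method, which takes the input variables (the user input) and returns the list of examples to use in the few-shot prompt

Let's implement a simple custom ExampleSelector that just picks two examples at random.

Implementing a custom example selector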

from langchain.prompts.example_selector.base import BaseExampleSelector
from typing import Dict, List
import numpy as np

class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples
    
    def add_example(self, example: Dict[str, str]) -> None:
        """Add new example to store for a key."""
        self.examples.append(example)

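    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
        # np.random.choice returns a NumPy array, so convert it to a plain list
        # to match the declared return type.
        return np.random.choice(self.examples, size=2, replace=False).tolist()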

Using the custom example selector

examples = [
    {"foo": "1"},
    {"foo": "2"},
    {"foo": "3"}
]

# Initialize the example selector
example_selector = CustomExampleSelector(examples)

# Select examples (the choice is random, so the result varies between runs)
example_selector.select_examples({"foo": "foo"})
# -> e.g. [{'foo': '2'}, {'foo': '3'}]

# Add a new example to the set of examples
example_selector.add_example({"foo": "4"})
example_selector.examples
# -> [{'foo': '1'}, {'foo': '2'}, {'foo': '3'}, {'foo': '4'}]

# Select examples again
example_selector.select_examples({"foo": "foo"})
# -> e.g. [{'foo': '1'}, {'foo': '4'}]
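
Because the selector above draws at random, its result changes from run to run, which can make prompts hard to reproduce in tests. The following is a minimal sketch of one way to get repeatable selections, using only the standard library. The class name SeededRandomSelector and its k and seed parameters are hypothetical and not part of LangChain, and the class deliberately does not subclass BaseExampleSelector so that the snippet runs without LangChain installed.

```python
import random
from typing import Dict, List, Optional


class SeededRandomSelector:
    """Hypothetical stand-in for CustomExampleSelector with a reproducible RNG."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 2,
                 seed: Optional[int] = None):
        self.examples = list(examples)
        self.k = k
        # A private RNG keeps the global random state untouched.
        self._rng = random.Random(seed)

    def add_example(self, example: Dict[str, str]) -> None:
        """Add a new example to the pool."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        """Pick up to k distinct examples; the inputs are ignored, as in the article."""
        return self._rng.sample(self.examples, min(self.k, len(self.examples)))


pool = [{"foo": "1"}, {"foo": "2"}, {"foo": "3"}]
first = SeededRandomSelector(pool, seed=0).select_examples({"foo": "foo"})
second = SeededRandomSelector(pool, seed=0).select_examples({"foo": "foo"})
print(first == second)  # True: the same seed gives the same selection
```

Capping the sample size at the pool size avoids the ValueError that random.sample raises when asked for more items than exist.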

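Length-Based Example Selector

The length-based example selector chooses examples according to their length. It is useful when we are concerned that the constructed prompt may exceed the length of the context window. For longer inputs it selects fewer examples to include, and for shorter inputs it selects more.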

from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

example_selector = LengthBasedExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples, 
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt, 
    # This is the maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # This is the function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)
# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))

Output:

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:

With a longer input:

# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

Output:

Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:

We can also add a new example:

# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))

Output:

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output: small

Input: enthusiastic
Output:
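
The behaviour seen in this section can be sketched in plain Python to make the word budget explicit. This is a rough approximation under stated assumptions, not LangChain's actual implementation: the helper names text_length, format_example and select_by_length are hypothetical, and length is counted by splitting on spaces and newlines, as in the commented-out default shown earlier.

```python
import re
from typing import Dict, List


def text_length(text: str) -> int:
    """Count words by splitting on newlines and spaces."""
    return len(re.split("\n| ", text))


def format_example(example: Dict[str, str]) -> str:
    """Render one example the same way the article's example_prompt does."""
    return f"Input: {example['input']}\nOutput: {example['output']}"


def select_by_length(examples: List[Dict[str, str]], query: str,
                     max_length: int = 25) -> List[Dict[str, str]]:
    """Take examples in order until the budget left after the query runs out."""
    remaining = max_length - text_length(query)
    selected = []
    for example in examples:
        if remaining <= 0:
            break
        cost = text_length(format_example(example))
        if cost > remaining:
            break
        selected.append(example)
        remaining -= cost
    return selected


antonyms = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
long_query = ("big and huge and massive and large and gigantic and tall "
              "and much much much much much bigger than everything else")

print(len(select_by_length(antonyms, "big")))       # 5
print(len(select_by_length(antonyms, long_query)))  # 1
```

Each formatted example costs 4 words. The one-word query leaves a budget of 24, enough for all five examples, while the 21-word query leaves only 4, enough for a single example.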

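Maximal Marginal Relevance Example Selector

The maximal marginal relevance (MMR) example selector chooses examples based on both their similarity to the input and their diversity. It does this by finding the examples whose embeddings have the greatest cosine similarity with the input, then adding them iteratively while penalizing them for their closeness to the examples already selected.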

from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector, SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings, which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class used to store the embeddings and run the similarity search.
    FAISS,
    # This is the number of examples to produce.
    k=2
)
mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of a list of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)
# The input is a feeling, so the happy/sad example should be selected first.
print(mmr_prompt.format(adjective="worried"))

Output:

Give the antonym of every input

Input: happy
Output: sad

Input: windy
Output: calm

Input: worried
Output:

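We can compare this with selecting purely on similarity:

# Use SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings, which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class used to store the embeddings and run the similarity search.
    FAISS,
    # This is the number of examples to produce.
    k=2
)
similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of a list of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))

Output: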

Give the antonym of every input

Input: happy
Output: sad

Input: sunny
Output: gloomy

Input: worried
Output:
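
The difference between the two selectors in this section can be illustrated without an embedding model. The sketch below uses a common textbook formulation of MMR (relevance to the query minus redundancy with the items already picked, balanced by a weight) and is not claimed to match LangChain's internals. The 2-D vectors are made up for illustration, and the names cosine, top_k_similar, mmr_select and lambda_mult are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_similar(query, candidates, k: int = 2) -> List[int]:
    """Indices of the k candidates most similar to the query."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


def mmr_select(query, candidates, k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Greedy MMR: trade relevance to the query against closeness to earlier picks."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up vectors: candidates 0 and 1 are near-duplicates, candidate 2 is different.
query = (1.0, 0.0)
candidates = [(1.0, 0.1), (1.0, 0.15), (0.6, -0.8)]

print(top_k_similar(query, candidates))  # [0, 1]
print(mmr_select(query, candidates))     # [0, 2]
```

Pure similarity returns the two near-duplicates, while MMR keeps the most relevant candidate and then prefers the one that adds something new, which mirrors the happy/windy versus happy/sunny contrast in the outputs above.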

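N-Gram Overlap Example Selector

NGramOverlapExampleSelector selects and orders examples by their n-gram overlap score with the input. The n-gram overlap score is a float between 0.0 and 1.0. The selector lets us set a threshold score: examples whose n-gram overlap score is less than or equal to the threshold are excluded. By default the threshold is -1.0, so no examples are excluded and they are only reordered. Setting the threshold to 0.0 excludes examples that have no n-gram overlap with the input.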

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# These are examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

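example_selector = NGramOverlapExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples, 
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt, 
    # This is the threshold score at which the selector stops.
    # It defaults to -1.0.
    threshold=-1.0,
    # For a negative threshold:
    # the selector sorts examples by ngram overlap score and excludes none.
    # For a threshold greater than 1.0:
    # the selector excludes all examples and returns an empty list.
    # For a threshold equal to 0.0:
    # the selector sorts examples by ngram overlap score
    # and excludes those with no ngram overlap with the input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:", 
    input_variables=["sentence"],
)

# An example input with a large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Output: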

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:

We can also add examples to the NGramOverlapExampleSelector:

new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))

Output:

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:

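We can also set a threshold that decides which examples are excluded:

# For example, setting the threshold to 0.0
# excludes examples with no ngram overlap with the input.
# Since "My dog barks." has no ngram overlap with "Spot can run fast.",
# it is excluded.
example_selector.threshold=0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))

Output: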

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can run fast.
Output:

We can also set a small nonzero threshold:

example_selector.threshold=0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output:

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can play fetch.
Output:

Let's also try a threshold greater than 1.0:

example_selector.threshold=1.0+1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output:

Give the Spanish translation of every input

Input: Spot can play fetch.
Output:
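
The threshold rules used throughout this section (sort by score, drop anything scoring at or below the threshold) can be sketched in a few lines. Note the swap: the score below is a simple unigram overlap fraction chosen for illustration, not the scoring that NGramOverlapExampleSelector uses, so actual scores and some orderings will differ from the library. The function names unigram_overlap and select_by_overlap are hypothetical.

```python
import re
from typing import Dict, List


def unigram_overlap(example_text: str, query: str) -> float:
    """Fraction of the example's words that also occur in the query (0.0 to 1.0)."""
    example_words = re.findall(r"[a-z']+", example_text.lower())
    query_words = set(re.findall(r"[a-z']+", query.lower()))
    if not example_words:
        return 0.0
    return sum(word in query_words for word in example_words) / len(example_words)


def select_by_overlap(examples: List[Dict[str, str]], query: str,
                      threshold: float = -1.0) -> List[Dict[str, str]]:
    """Keep examples scoring strictly above the threshold, best score first."""
    scored = [(unigram_overlap(ex["input"], query), ex) for ex in examples]
    kept = [(score, ex) for score, ex in scored if score > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in kept]


translations = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]
sentence = "Spot can run fast."

for threshold in (-1.0, 0.0, 1.0 + 1e-9):
    picked = select_by_overlap(translations, sentence, threshold)
    print(threshold, [ex["input"] for ex in picked])
```

With the default of -1.0 every example survives and is only reordered, 0.0 drops "My dog barks." because it shares no words with the sentence, and any threshold above 1.0 leaves nothing.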

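Semantic Similarity Example Selector

The semantic similarity example selector chooses examples based on their similarity to the input. It does this by finding the examples whose embeddings have the greatest cosine similarity with the input: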

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

Below are a number of examples for a made-up task of creating antonyms:

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

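With these examples, we can create a semantic similarity example selector:

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings, which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class used to store the embeddings and run the similarity search.
    Chroma,
    # This is the number of examples to produce.
    k=1
)

similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of a list of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every word",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

With this example selector, we can choose examples by their similarity to the input and apply them to the antonym task. Chroma logs the following when the vector store is initialized: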

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.

The input worried is a feeling, so the happy/sad example should be selected:

print(similar_prompt.format(adjective="worried"))

Output:

Give the antonym of every word

Input: happy
Output: sad

Input: worried
Output:

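The input fat is a measurement, so the tall/short example would be the expected choice. In the output recorded here, however, the selector returned the happy/sad example:

print(similar_prompt.format(adjective="fat"))

Output:

Give the antonym of every word

Input: happy
Output: sad

Input: fat
Output: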

We can also add new examples to the SemanticSimilarityExampleSelector:

similar_prompt.example_selector.add_example({"input": "enthusiastic", "output": "apathetic"})
print(similar_prompt.format(adjective="joyful"))

Output:

Give the antonym of every word

Input: happy
Output: sad

Input: joyful
Output:
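
To show what "greatest cosine similarity" means in practice, here is a minimal sketch of top-k selection over stored vectors. It is a toy stand-in rather than the library's implementation: there is no embedding model or vector store, the 2-D vectors are made up for illustration, and the class name ToySimilaritySelector and its add_example signature (which takes the vector explicitly) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class ToySimilaritySelector:
    """Keeps (vector, example) pairs and returns the k nearest to a query vector."""

    def __init__(self, k: int = 1):
        self.k = k
        self._store: List[Tuple[Sequence[float], Dict[str, str]]] = []

    def add_example(self, example: Dict[str, str], vector: Sequence[float]) -> None:
        """Store an example together with its (here hand-written) vector."""
        self._store.append((vector, example))

    def select_examples(self, query_vector: Sequence[float]) -> List[Dict[str, str]]:
        """Rank stored examples by cosine similarity and keep the top k."""
        ranked = sorted(self._store,
                        key=lambda item: cosine(query_vector, item[0]),
                        reverse=True)
        return [example for _, example in ranked[: self.k]]


selector = ToySimilaritySelector(k=1)
# Made-up vectors: first axis loosely "feeling", second axis loosely "size".
selector.add_example({"input": "happy", "output": "sad"}, (0.9, 0.1))
selector.add_example({"input": "tall", "output": "short"}, (0.1, 0.9))

print(selector.select_examples((0.8, 0.3)))  # [{'input': 'happy', 'output': 'sad'}]
```

In a real setup the vectors come from an embedding model, so which example is nearest depends on that model, which is one reason an actual result can differ from the intuitive choice.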
