【LangChain】检索器之上下文压缩

这篇具有很好参考价值的文章主要介绍了【LangChain】检索器之上下文压缩。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

LangChain学习文档

  • 【LangChain】检索器(Retrievers)
  • 【LangChain】检索器之MultiQueryRetriever
  • 【LangChain】检索器之上下文压缩

概要

检索的一项挑战是,通常我们不知道:当数据引入系统时,文档存储系统会面临哪些特定查询。

这意味着与查询最相关的信息可能被隐藏在包含大量不相关文本的文档中。

通过我们的应用程序传递完整的文件可能会导致更昂贵的llm通话和更差的响应。

上下文压缩旨在解决这个问题。

这个想法很简单:我们可以使用给定查询的上下文来压缩它们,以便只返回相关信息,而不是立即按原样返回检索到的文档。

这里的“压缩”既指压缩单个文档的内容,也指批量过滤文档。

要使用上下文压缩检索器,我们需要:

  • 基础检索器
  • 文档压缩器

上下文压缩检索器将查询传递给基础检索器,获取初始文档并将它们传递给文档压缩器。文档压缩器获取文档列表并通过减少文档内容或完全删除文档来缩短它。

【LangChain】检索器之上下文压缩,LangChain,AI,langchain,检索器

内容

# 打印文档的辅助功能

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

使用普通向量存储检索器

让我们首先初始化一个简单的向量存储检索器并存储 2023 年国情咨文演讲(以块的形式)。我们可以看到,给定一个示例问题,我们的检索器返回一两个相关文档和一些不相关的文档。甚至相关文档中也有很多不相关的信息。

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS
# 加载文档
documents = TextLoader('../../../state_of_the_union.txt').load()
# 拆分器
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# 拆分文档
texts = text_splitter.split_documents(documents)
# 构建索引,并构建检索器
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
# 运行
docs = retriever.get_relevant_documents("What did the president say about Ketanji Brown Jackson")
# 美化打印
pretty_print_docs(docs)

结果:

    Document 1:
    
    Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 
    
    Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
    
    One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 
    
    And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
    ----------------------------------------------------------------------------------------------------
    Document 2:
    
    A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 
    
    And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 
    
    We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  
    
    We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  
    
    We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
    
    We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
    ----------------------------------------------------------------------------------------------------
    Document 3:
    
    And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. 
    
    As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 
    
    While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. 
    
    And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. 
    
    So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  
    
    First, beat the opioid epidemic.
    ----------------------------------------------------------------------------------------------------
    Document 4:
    
    Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. 
    
    And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.  
    
    That ends on my watch. 
    
    Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. 
    
    We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. 
    
    Let’s pass the Paycheck Fairness Act and paid leave.  
    
    Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 
    
    Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.

使用 LLMChainExtractor 添加上下文压缩(Adding contextual compression with an LLMChainExtractor)

现在让我们用 ContextualCompressionRetriever 包装我们的基本检索器。我们将添加一个 LLMChainExtractor,它将迭代最初返回的文档,并从每个文档中仅提取与查询相关的内容。

from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
# 构建大模型
llm = OpenAI(temperature=0)
# 从大模型中构建LLMChainExtractor
compressor = LLMChainExtractor.from_llm(llm)
# 构建压缩检索器
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)
# 运行
compressed_docs = compression_retriever.get_relevant_documents("What did the president say about Ketanji Jackson Brown")
# 美化打印
pretty_print_docs(compressed_docs)

结果:

    Document 1:
    
    "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 
    
    And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence."
    ----------------------------------------------------------------------------------------------------
    Document 2:
    
    "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

更多内置压缩机:过滤器(More built-in compressors: filters)

LLMChainFilter

LLMChainFilter 是稍微简单但更强大的压缩器,它使用 LLM Chain来决定过滤掉最初检索到的文档中的哪些文档以及返回哪些文档,而无需操作文档内容。

from langchain.retrievers.document_compressors import LLMChainFilter

# 构建LLMChainFilter
_filter = LLMChainFilter.from_llm(llm)
# 构建上下文压缩检索器
compression_retriever = ContextualCompressionRetriever(base_compressor=_filter, base_retriever=retriever)
# 运行
compressed_docs = compression_retriever.get_relevant_documents("What did the president say about Ketanji Jackson Brown")
# 美化打印
pretty_print_docs(compressed_docs)
    Document 1:
    
    Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 
    
    Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
    
    One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 
    
    And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

EmbeddingsFilter

对每个检索到的文档进行额外的 LLM 调用既昂贵又缓慢。 EmbeddingsFilter 通过嵌入文档和查询并仅返回那些与查询具有足够相似嵌入的文档来提供更便宜且更快的选项。

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.document_compressors import EmbeddingsFilter
# 构建嵌入
embeddings = OpenAIEmbeddings()
# 构建EmbeddingsFilter
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
# 构建上下文压缩检索器
compression_retriever = ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)
# 运行
compressed_docs = compression_retriever.get_relevant_documents("What did the president say about Ketanji Jackson Brown")
# 美化打印
pretty_print_docs(compressed_docs)

结果:

    Document 1:
    
    Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 
    
    Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
    
    One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 
    
    And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
    ----------------------------------------------------------------------------------------------------
    Document 2:
    
    A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 
    
    And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 
    
    We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  
    
    We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  
    
    We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
    
    We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
    ----------------------------------------------------------------------------------------------------
    Document 3:
    
    And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. 
    
    As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 
    
    While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. 
    
    And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. 
    
    So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  
    
    First, beat the opioid epidemic.

将压缩器和文档转换器串在一起(Stringing compressors and document transformers together)

使用 DocumentCompressorPipeline 我们还可以轻松地按顺序组合多个压缩器。除了压缩器之外,我们还可以将 BaseDocumentTransformers 添加到管道中,它不执行任何上下文压缩,而只是对一组文档执行一些转换。

例如,TextSplitters 可以用作文档转换器,将文档分割成更小的部分,而 EmbeddingsRedundantFilter 可以用于根据文档之间嵌入的相似性来过滤掉冗余文档。

下面我们创建一个压缩器管道,首先将文档分割成更小的块,然后删除冗余文档,然后根据与查询的相关性进行过滤。

from langchain.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.text_splitter import CharacterTextSplitter
# 构建拆分器
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
# 构建EmbeddingsRedundantFilter
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
# 构建嵌入过滤器:EmbeddingsFilter
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
# 构建文档管道
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)
# 构建上下文检索器
compression_retriever = ContextualCompressionRetriever(base_compressor=pipeline_compressor, base_retriever=retriever)
# 运行
compressed_docs = compression_retriever.get_relevant_documents("What did the president say about Ketanji Jackson Brown")
# 美化打印
pretty_print_docs(compressed_docs)

结果:

    Document 1:
    
    One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 
    
    And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson
    ----------------------------------------------------------------------------------------------------
    Document 2:
    
    As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 
    
    While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year
    ----------------------------------------------------------------------------------------------------
    Document 3:
    
    A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder

总结

我们在进行文档搜索的时候,正相关的文档是少部分,大部分都是不相关的文档。
我们可以使用上下文压缩检索器,只返回正相关的那部分文档。

主要步骤:

  1. 构建一个普通检索器:retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
  2. 构建一个上下文压缩检索器:ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)

特别是第二步骤:构建上下文压缩器的第一个参数,有很多花样:
① LLMChainExtractor 提取,精炼
② LLMChainFilter 普通过滤
③ EmbeddingsFilter 嵌入过滤
④ DocumentCompressorPipeline 文档管道,可以将多个过滤器组合在一起。

参考地址:

https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/contextual_compression/文章来源地址https://www.toymoban.com/news/detail-611903.html

到了这里,关于【LangChain】检索器之上下文压缩的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • 【python】flask执行上下文context,请求上下文和应用上下文原理解析

    ✨✨ 欢迎大家来到景天科技苑✨✨ 🎈🎈 养成好习惯,先赞后看哦~🎈🎈 🏆 作者简介:景天科技苑 🏆《头衔》:大厂架构师,华为云开发者社区专家博主,阿里云开发者社区专家博主,CSDN新星创作者,掘金优秀博主,51CTO博客专家等。 🏆《博客》:Python全栈,前后端开

    2024年03月26日
    浏览(65)
  • 超长上下文处理:基于Transformer上下文处理常见方法梳理

    原文链接:芝士AI吃鱼 目前已经采用多种方法来增加Transformer的上下文长度,主要侧重于缓解注意力计算的二次复杂度。 例如,Transformer-XL通过缓存先前的上下文,并允许随着层数的增加线性扩展上下文。Longformer采用了一种注意力机制,使得token稀疏地关注远距离的token,从而

    2024年02月13日
    浏览(52)
  • 无限上下文,多级内存管理!突破ChatGPT等大语言模型上下文限制

    目前,ChatGPT、Llama 2、文心一言等主流大语言模型,因技术架构的问题上下文输入一直受到限制,即便是Claude 最多只支持10万token输入,这对于解读上百页报告、书籍、论文来说非常不方便。 为了解决这一难题,加州伯克利分校受操作系统的内存管理机制启发,提出了MemGPT。

    2024年02月06日
    浏览(64)
  • 从零开始理解Linux中断架构(7)--- Linux执行上下文之中断上下文

            当前运行的loop是一条执行流,中断程序运行开启了另外一条执行流,从上一节得知这是三种跳转的第三类,这个是一个大跳转。对中断程序的基本要求就是 中断执行完毕后要恢复到原来执行的程序 ,除了时间流逝外,原来运行的程序应该毫无感知。        

    2024年02月11日
    浏览(67)
  • 〖大前端 - 基础入门三大核心之JS篇(51)〗- 面向对象之认识上下文与上下文规则

    说明:该文属于 大前端全栈架构白宝书专栏, 目前阶段免费 , 如需要项目实战或者是体系化资源,文末名片加V! 作者:哈哥撩编程,十余年工作经验, 从事过全栈研发、产品经理等工作,目前在公司担任研发部门CTO。 荣誉: 2022年度博客之星Top4、2023年度超级个体得主、谷

    2024年02月05日
    浏览(61)
  • 微软和OpenAI联手推出了GitHub Copilot这一AI编程工具,可根据开发者的输入和上下文,生成高质量的代码片段和建议

    只需要写写注释,就能生成能够运行的代码?对于程序员群体来说,这绝对是一个提高生产力的超级工具,令人难以置信。实际上,早在2021年6月,微软和OpenAI联手推出了GitHub Copilot这一AI编程工具。它能够根据开发者的输入和上下文,生成高质量的代码片段和建议。这个工具

    2024年02月09日
    浏览(71)
  • 执行上下文

    通过var定义(声明)的变量--在定义语句之前就可以访问到 值为undefined 通过function声明的函数--在之前就可以直接调用 值为函数定义(对象) 全局代码 函数(局部)代码 在执行全局代码前将window确定为全局执行上下文 对全局数据进行预处理 var定义的全局变量--undefined--添加

    2023年04月20日
    浏览(58)
  • Servlet 上下文参数

    2024年02月05日
    浏览(58)
  • 上下文切换性能篇

    现代操作系统都是多任务的分时操作系统,也就是说同时响应多个用户交互或同时支持多个任务处理,因为 CPU 的速度很快而用户交互的频率相比会低得多。所以例如在 Linux 中,可以支持远大于 CPU 数量的任务同时执行,对于单个 CPU 来说,其实任务并不是在同时执行,而是操

    2024年02月15日
    浏览(55)
  • CPU上下文切换

    CPU 上下文切换,就是先把前一个任务的 CPU 上下文(也就是 CPU 寄存器和程序计数器)保存起来,然后加载新任务的上下文到这些寄存器和程序计数器,最后再跳转到程序计数器所指的新位置,运行新任务。 CPU 的上下文切换就可以分为几个不同的场景,也就是进程上下文切换、

    2024年02月14日
    浏览(37)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包