LangChain Integration with FAISS

1. FAISS

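Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

See the official Faiss documentation for details.

This notebook shows how to use functionality related to the FAISS vector database.

Example code: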

# !pip install faiss-gpu  # for machines with a CUDA-capable GPU
# OR
# !pip install faiss-cpu
import os
import getpass

os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

# Uncomment the following line if you need to initialize FAISS without AVX2 optimization
# os.environ['FAISS_NO_AVX2'] = '1'
# from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

Example code:

# Load the speech, then split it into chunks of about 1000 characters with no overlap
loader = TextLoader("./state_of_the_union_en.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# embeddings = OpenAIEmbeddings()
embeddings = CohereEmbeddings()

Example code:

db = FAISS.from_documents(docs, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

Output:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

2. Similarity Search with score

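FAISS has some methods of its own. One of them is similarity_search_with_score, which returns not only the documents but also the distance score between the query and each document. The returned score is an L2 distance, so a lower score is better.

Example code: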

docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]

Output:

(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),
 7172.888)

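For reference, the score shown in the LangChain documentation (https://python.langchain.com/docs/integrations/vectorstores/faiss) is 0.36913747. The gap is most likely due to the embedding model: this article uses CohereEmbeddings while the documentation example uses OpenAIEmbeddings, and L2 distances are only comparable between vectors produced by the same model. The documentation's output is: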

    (Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),
     0.36913747)
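To make the "lower is better" rule concrete, here is a minimal sketch that ranks a few toy vectors by their distance to a query. The vectors and the squared_l2 and rank helpers are made up for illustration; they are not LangChain or Faiss APIs. The sketch uses the squared form of the distance, which is what Faiss's flat L2 index reports. It also shows that rescaling every vector changes the scores but not the ranking, which is one reason scores from different embedding models cannot be compared directly.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank(corpus, query):
    """Return (name, distance) pairs sorted from closest to farthest."""
    scored = [(name, squared_l2(vec, query)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings", invented for this sketch.
corpus = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.9, 0.1],
}
query = [1.0, 0.0]

ranking = rank(corpus, query)
for name, score in ranking:
    print(name, round(score, 4))

# Scaling every vector by 100 inflates the scores but keeps the same order.
scaled_corpus = {name: [100 * x for x in vec] for name, vec in corpus.items()}
scaled_query = [100 * x for x in query]
scaled_ranking = rank(scaled_corpus, scaled_query)
for name, score in scaled_ranking:
    print(name, round(score, 1))
```

The closest document has the smallest score in both runs, even though the absolute numbers differ by several orders of magnitude.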

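You can also use similarity_search_by_vector to search for documents similar to a given embedding vector. It accepts an embedding vector as its argument instead of a string.

Example code:

embedding_vector = embeddings.embed_query(query)
# Returns a list of documents only (no scores)
docs = db.similarity_search_by_vector(embedding_vector)
docs

Output: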

[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),
 Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': './state_of_the_union_en.txt'}),
 Document(page_content='And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  \n\nFirst, beat the opioid epidemic.', metadata={'source': './state_of_the_union_en.txt'}),
 Document(page_content='Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.  \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave.  \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.', metadata={'source': './state_of_the_union_en.txt'})]

3. Saving and loading

You can also save and load a FAISS index. This is useful because you do not have to recreate the index every time you use it.

Example code:

db.save_local("faiss_index")
new_db = FAISS.load_local("faiss_index", embeddings)
docs = new_db.similarity_search(query)
docs[0]

Output:

Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'})
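A common way to take advantage of saving and loading is a "load if present, otherwise build and save" helper. The sketch below is one possible shape for it. The load_or_build function and the JSON-backed toy store are hypothetical stand-ins so that the example runs without FAISS or an API key; they are not part of LangChain.

```python
import json
import os
import tempfile


def load_or_build(folder, build, save, load):
    """Load a store from `folder` if one was saved there, else build and save it.

    Returns (store, source), where source is "loaded" or "built".
    """
    if os.path.isdir(folder) and os.listdir(folder):
        return load(folder), "loaded"
    store = build()
    os.makedirs(folder, exist_ok=True)
    save(store, folder)
    return store, "built"


# Stand-in store: a plain dict persisted as JSON (hypothetical, for illustration).
def build_toy():
    return {"foo": [0.1, 0.2]}


def save_toy(store, folder):
    with open(os.path.join(folder, "store.json"), "w", encoding="utf-8") as f:
        json.dump(store, f)


def load_toy(folder):
    with open(os.path.join(folder, "store.json"), encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "toy_index")
    first_store, first_source = load_or_build(folder, build_toy, save_toy, load_toy)
    second_store, second_source = load_or_build(folder, build_toy, save_toy, load_toy)

print(first_source, second_source)  # built loaded
```

With the objects from this article, the three callbacks would presumably be lambda: FAISS.from_documents(docs, embeddings), lambda db, path: db.save_local(path), and lambda path: FAISS.load_local(path, embeddings). Depending on your LangChain version, load_local may also require allow_dangerous_deserialization=True, since part of the saved index is a pickle file; check the documentation for the release you use.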

4. Merging

You can also merge two FAISS vector stores.

Example code:

db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)
db1.docstore._dict

Output:

{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}

Example code:

db2.docstore._dict

Output:

{'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}

Example code:

# Merge db2 into db1 in place, then inspect the combined docstore
db1.merge_from(db2)
db1.docstore._dict

Output:

{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={}),
 '8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
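Conceptually, a merge has to combine three things: the vectors, the docstore, and the mapping from vector position to document id. The following toy model illustrates that bookkeeping. ToyStore and its attribute names are invented for this sketch and do not mirror LangChain's actual implementation.

```python
class ToyStore:
    """A toy vector store: vectors by position, plus an id -> text docstore."""

    def __init__(self):
        self.vectors = []      # position -> vector
        self.index_to_id = {}  # position -> document id
        self.docstore = {}     # document id -> text

    def add(self, doc_id, vector, text):
        """Append one entry; reject ids that are already present."""
        if doc_id in self.docstore:
            raise ValueError(f"duplicate document id: {doc_id}")
        self.index_to_id[len(self.vectors)] = doc_id
        self.vectors.append(vector)
        self.docstore[doc_id] = text

    def merge_from(self, other):
        """Append every entry of `other`, renumbering positions after our own."""
        for position, vector in enumerate(other.vectors):
            doc_id = other.index_to_id[position]
            self.add(doc_id, vector, other.docstore[doc_id])


store1 = ToyStore()
store1.add("id-foo", [1.0, 0.0], "foo")
store2 = ToyStore()
store2.add("id-bar", [0.0, 1.0], "bar")

store1.merge_from(store2)
print(store1.docstore)     # {'id-foo': 'foo', 'id-bar': 'bar'}
print(store1.index_to_id)  # {0: 'id-foo', 1: 'id-bar'}
```

As in the real output above, the merged store keeps the original document ids; only the positions of the incoming vectors change.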

5. Similarity Search with filtering

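The FAISS vector store also supports filtering. Because FAISS itself does not natively support filtering, it has to be done manually: more than k results are fetched first and then filtered. You can filter documents based on metadata. You can also set the fetch_k parameter when calling any search method to control how many documents are fetched before filtering. Here is a small example:

Example code: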

from langchain.schema import Document

list_of_documents = [
    Document(page_content="foo", metadata=dict(page=1)),
    Document(page_content="bar", metadata=dict(page=1)),
    Document(page_content="foo", metadata=dict(page=2)),
    Document(page_content="barbar", metadata=dict(page=2)),
    Document(page_content="foo", metadata=dict(page=3)),
    Document(page_content="bar burr", metadata=dict(page=3)),
    Document(page_content="foo", metadata=dict(page=4)),
    Document(page_content="bar bruh", metadata=dict(page=4)),
]
db = FAISS.from_documents(list_of_documents, embeddings)
results_with_scores = db.similarity_search_with_score("foo")
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")

Output:

Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 2}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 3}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 4}, Score: 0.018019594252109528

Now we make the same query call, but filter for page = 1 only:

results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")

Output:

Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: bar, Metadata: {'page': 1}, Score: 10266.8544921875

The same can be done with max_marginal_relevance_search.

Example code:

results = db.max_marginal_relevance_search("foo", filter=dict(page=1))
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")

Output:

Content: foo, Metadata: {'page': 1}
Content: bar, Metadata: {'page': 1}

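Here is an example of how to set the fetch_k parameter when calling similarity_search. Usually you want fetch_k to be much larger than k (fetch_k >> k), because fetch_k is the number of documents fetched before filtering. If fetch_k is set too low, too few documents may survive the filter to fill k results.

Example code: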

results = db.similarity_search("foo", filter=dict(page=1), k=1, fetch_k=4)
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")

Output:

Content: foo, Metadata: {'page': 1}
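The fetch-then-filter behavior described in this section can be sketched in plain Python. The filtered_search function, its default values, and the toy records below are assumptions made for illustration, not LangChain's implementation. The sketch shows why a small fetch_k can return nothing even though a matching document exists.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(records, query_vec, metadata_filter, k=4, fetch_k=20):
    """Fetch the fetch_k nearest records, keep those matching the filter, return at most k."""
    by_distance = sorted(records, key=lambda r: squared_l2(r["vector"], query_vec))
    candidates = by_distance[:fetch_k]
    matches = [
        r
        for r in candidates
        if all(r["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    return matches[:k]


# Toy records: the nearest page-1 document is only the fourth-closest overall.
records = [
    {"content": "foo", "metadata": {"page": 2}, "vector": [0.0, 0.0]},
    {"content": "foo", "metadata": {"page": 3}, "vector": [0.1, 0.0]},
    {"content": "foo", "metadata": {"page": 4}, "vector": [0.2, 0.0]},
    {"content": "foo", "metadata": {"page": 1}, "vector": [0.3, 0.0]},
    {"content": "bar", "metadata": {"page": 1}, "vector": [5.0, 5.0]},
]
query_vec = [0.0, 0.0]

# fetch_k=2 only looks at pages 2 and 3, so the filter leaves nothing.
too_small = filtered_search(records, query_vec, {"page": 1}, k=1, fetch_k=2)
# fetch_k=4 reaches the page-1 "foo" document.
large_enough = filtered_search(records, query_vec, {"page": 1}, k=1, fetch_k=4)

print(len(too_small), [r["content"] for r in large_enough])  # 0 ['foo']
```

Because the filter runs after the nearest-neighbor fetch, raising fetch_k trades a little extra work for a better chance of filling all k slots.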
