Natural Language Processing from Basics to Applications: LangChain Models (Models) - [Large Language Models (LLMs): Caching the Results of LLM Calls]



from langchain.llms import OpenAI

In-Memory Cache

import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s

Output:

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
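
The speed-up above comes from an exact-match lookup: the second call never reaches the model. The following is a minimal sketch of that idea in plain Python, not LangChain's actual implementation. `TinyInMemoryCache`, `cached_call`, and `fake_model` are hypothetical names introduced only for illustration, and the sketch assumes the cache key is the pair of prompt and LLM configuration string.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TinyInMemoryCache:
    """Hypothetical exact-match cache keyed on (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Returns None on a miss
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._store[(prompt, llm_string)] = response


def cached_call(
    cache: TinyInMemoryCache,
    llm_string: str,
    model: Callable[[str], str],
    prompt: str,
) -> str:
    """Return the cached response if present, otherwise call the model and store it."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    response = model(prompt)
    cache.update(prompt, llm_string, response)
    return response


calls: List[str] = []  # records every prompt that actually reaches the stand-in model


def fake_model(prompt: str) -> str:
    # Stand-in for a slow LLM call; each real call yields a new answer
    calls.append(prompt)
    return f"joke #{len(calls)}"


cache = TinyInMemoryCache()
first = cached_call(cache, "fake-llm", fake_model, "Tell me a joke")
second = cached_call(cache, "fake-llm", fake_model, "Tell me a joke")
```

In this sketch the second call returns the stored response and `calls` holds a single entry, which mirrors the drop in wall time shown above. Because the store is a plain dictionary, it disappears when the process exits.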

SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
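
To illustrate what a database-backed cache adds over the in-memory one, here is a rough sketch using only the standard-library `sqlite3` module. It is not how `SQLiteCache` is implemented; the class name `TinySQLiteCache`, the table name, and the column layout are all assumptions made for this example. It uses an in-memory database so that it leaves no file behind; passing a file path instead would keep entries across processes.

```python
import sqlite3
from typing import Optional


class TinySQLiteCache:
    """Hypothetical exact-match cache stored in a SQLite table."""

    def __init__(self, database_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(database_path)
        # One row per (prompt, llm) pair; the schema is an illustrative assumption
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT NOT NULL, llm TEXT NOT NULL, response TEXT NOT NULL, "
            "PRIMARY KEY (prompt, llm))"
        )

    def lookup(self, prompt: str, llm: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm: str, response: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO llm_cache (prompt, llm, response) VALUES (?, ?, ?)",
            (prompt, llm, response),
        )
        self._conn.commit()


sqlite_cache = TinySQLiteCache()
miss = sqlite_cache.lookup("Tell me a joke", "fake-llm")   # nothing stored yet
sqlite_cache.update("Tell me a joke", "fake-llm", "cached joke")
hit = sqlite_cache.lookup("Tell me a joke", "fake-llm")    # now served from the table
```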

Redis Cache

We can also use Redis to cache prompts and responses:

# (Make sure your local Redis instance is running before you run this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Semantic Cache

We can also use Redis to cache prompts and responses and decide whether a lookup is a hit based on semantic similarity:

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

Timing the second run:

%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
llm("Tell me one joke")

Timing output:

CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

GPTCache

We can use GPTCache for exact-match caching or for caching results based on semantic similarity. Let's start with an exact-match example:

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )

langchain.llm_cache = GPTCache(init_gptcache)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
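
In the initialization function above, the data directory name is built from a SHA-256 hash of the `llm` string. A plausible reading, which is our assumption rather than something the source states, is that this gives each distinct LLM configuration its own cache directory with a fixed-length, filesystem-safe name. The short sketch below reuses the article's `get_hashed_name` helper; the two configuration strings are made up for illustration.

```python
import hashlib


def get_hashed_name(name: str) -> str:
    # Same helper as in the article: SHA-256 hex digest of the LLM string
    return hashlib.sha256(name.encode()).hexdigest()


# Hypothetical LLM configuration strings, used only for illustration
dir_a = f"map_cache_{get_hashed_name('model=a, temperature=0.0')}"
dir_a_again = f"map_cache_{get_hashed_name('model=a, temperature=0.0')}"
dir_b = f"map_cache_{get_hashed_name('model=b, temperature=0.0')}"
```

The same configuration string always maps to the same directory name, while a different configuration maps to a different one, so cached answers from one model setup are not served to another.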

Now let's look at an example of similarity caching:

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

langchain.llm_cache = GPTCache(init_gptcache)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second run:

%%time
# This is an exact match, so it is found in the cache
llm("Tell me a joke")

Timing output:

CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the third run:

%%time
# This is not an exact match, but it is semantically within distance, so it hits!
llm("Tell me joke")

Timing output:

CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
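
Similarity-based hits like the ones above can be pictured as a nearest-neighbor search over prompt embeddings with a cutoff. The sketch below is a toy version under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model such as `OpenAIEmbeddings`, the 0.7 cosine threshold is arbitrary, and `TinySemanticCache` is a hypothetical class, not the logic used by `RedisSemanticCache` or GPTCache.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinySemanticCache:
    """Hypothetical cache that returns the closest stored prompt's response."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold  # arbitrary cutoff chosen for this sketch
        self._entries: List[Tuple[Dict[str, float], str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Only count it as a hit when the best match is similar enough
        return best_response if best_score >= self.threshold else None

    def update(self, prompt: str, response: str) -> None:
        self._entries.append((toy_embed(prompt), response))


semantic_cache = TinySemanticCache()
semantic_cache.update("Tell me a joke", "atoms joke")
near = semantic_cache.lookup("Tell me one joke")               # shares 3 of 4 words
far = semantic_cache.lookup("What is the capital of France?")  # shares no words
```

With this toy embedding, "Tell me one joke" scores 0.75 against the stored prompt and is treated as a hit, while the unrelated question scores 0 and misses. A real embedding model would capture meaning rather than word overlap, and the right threshold depends on that model.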

SQLAlchemy Cache

We can use SQLAlchemyCache to cache results in any SQL database supported by SQLAlchemy:

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

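Custom SQLAlchemy Schemas

We can define our own declarative table class and pass it to SQLAlchemyCache to customize the schema used for caching. For example, to support high-speed full-text prompt indexing in Postgres, we can use: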

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence('cache_id'), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True))
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

We can also turn off caching for a specific LLM. In the example below, global caching is still enabled, but we turn it off for one particular LLM:

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)

Timing the first run:

%%time
llm("Tell me a joke")

Timing output:

CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
llm("Tell me a joke")

Timing output:

CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms

Output:

'\n\nTwo guys stole a calendar. They got six months each.'
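
The two runs above take similar time and return different jokes because the LLM with `cache=False` skips the global cache entirely. The sketch below models that switch in plain Python. `GLOBAL_CACHE`, `TinyLLM`, and `fake_model` are hypothetical stand-ins introduced for this example, and the precedence rule (a per-instance `False` overrides an enabled global cache) is the behavior described in the text, not a copy of LangChain's code.

```python
import itertools
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a process-wide cache such as langchain.llm_cache
GLOBAL_CACHE: Optional[Dict[str, str]] = {}


class TinyLLM:
    """Toy LLM wrapper with a per-instance cache switch (illustrative only)."""

    def __init__(self, model: Callable[[str], str], cache: Optional[bool] = None) -> None:
        self.model = model
        self.cache = cache  # False opts this instance out of the global cache

    def __call__(self, prompt: str) -> str:
        use_cache = GLOBAL_CACHE is not None and self.cache is not False
        if use_cache and prompt in GLOBAL_CACHE:
            return GLOBAL_CACHE[prompt]
        response = self.model(prompt)
        if use_cache:
            GLOBAL_CACHE[prompt] = response
        return response


counter = itertools.count(1)


def fake_model(prompt: str) -> str:
    # Returns a different answer on every real call, like a sampling LLM
    return f"joke #{next(counter)}"


cached_llm = TinyLLM(fake_model)
uncached_llm = TinyLLM(fake_model, cache=False)

cached_results = [cached_llm("Tell me a joke") for _ in range(2)]
uncached_results = [uncached_llm("Tell me a joke") for _ in range(2)]
```

Here `cached_results` contains the same answer twice, while `uncached_results` contains two fresh answers even though the prompt is already in the global cache.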

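Optional Caching in Chains

We can also turn off caching for particular nodes in a chain. Note that, because of certain interfaces, it is often easier to construct the chain first and then edit the LLM afterwards. As an example, we load a summarizer map-reduce chain. We cache the results of the map step, but we do not freeze the result of the combine step: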

from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)

# Split the document and keep only the first three chunks
text_splitter = CharacterTextSplitter()
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]

# The map step uses the cached llm; the reduce (combine) step uses the uncached one
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

Timing the first run:

%%time
chain.run(docs)

Timing output:

CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s

Output:

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

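When we run it again, it runs substantially faster, but the final answer is different. This is because the map steps are cached while the reduce step is not.

Timing the second run:

%%time
chain.run(docs)

Timing output:

CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s

Output: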

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
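
The behavior above, a faster second run with a different final summary, follows from caching only the map step. The sketch below reproduces that pattern with toy functions. `map_step`, `reduce_step`, and `run_chain` are hypothetical names, and the strings they return merely stand in for LLM outputs; nothing here calls LangChain.

```python
import itertools
from typing import Dict, List

map_cache: Dict[str, str] = {}      # cache used by the map step only
map_calls: List[str] = []           # documents that actually reached the map "model"
reduce_counter = itertools.count(1)


def map_step(doc: str) -> str:
    """Summarize one chunk, reusing the cached summary when available."""
    if doc in map_cache:
        return map_cache[doc]
    map_calls.append(doc)
    summary = f"summary({doc})"
    map_cache[doc] = summary
    return summary


def reduce_step(summaries: List[str]) -> str:
    """Combine the summaries; deliberately uncached, so each run differs."""
    return f"final v{next(reduce_counter)}: " + " | ".join(summaries)


def run_chain(docs: List[str]) -> str:
    return reduce_step([map_step(doc) for doc in docs])


docs = ["part 1", "part 2", "part 3"]
first_run = run_chain(docs)
second_run = run_chain(docs)
```

After both runs, `map_calls` still has only three entries because the second run reuses every map result, yet `first_run` and `second_run` differ because the reduce step is executed again each time.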

Finally, remember to remove the cache database files:

!rm .langchain.db sqlite.db
