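Overview
Answer questions from a designated knowledge source. This approach is well suited to specialized domains within a company.
The example below uses the file 2023_GPT4All_Technical_Report.pdf as the knowledge source for answering questions.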
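In detail:
- Load the PDF file and read its contents.
- Split the contents into chunks and pass them to OpenAI embeddings, which builds the mapping between each piece of knowledge's "door number" (its vector) and its "room" (the text itself).
- Use FAISS (short for Facebook AI Similarity Search) to search with the question and retrieve the most relevant chunks.
- Pass the question and the retrieved chunks to OpenAI, which polishes them into the final answer.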
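The chunk, embed, and search steps above can be sketched without any API calls. This is a minimal illustration under stated assumptions, not what LangChain or FAISS do internally: `split_with_overlap`, `toy_embed`, `cosine`, and `search` are hypothetical helpers, word counts stand in for OpenAI embeddings, a brute-force scan stands in for the FAISS index, and the sample chunks are made up for the demonstration.

```python
import math
from collections import Counter


def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slide a chunk_size window over text, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks, standing in for pieces of a real document.
sample_chunks = [
    "the report lists the authors on the first page",
    "training cost is discussed in the section on cost",
    "the model was evaluated on common benchmarks",
]
best = search("what did training cost", sample_chunks, k=1)
print(best[0])
```

With real embeddings the vectors are dense and the ranking reflects meaning rather than shared words, but the lookup has the same shape: embed the question, then return the nearest stored chunks.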
Prerequisites
pip install langchain
pip install openai
pip install PyPDF2
pip install faiss-cpu
pip install tiktoken
Code
from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import os
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # replace with your own key; never publish a real one
# location of the pdf file/files.
reader = PdfReader('/Users/yutao/Downloads/2023_GPT4All_Technical_Report.pdf')
# read data from the file and put them into a variable called raw_text
raw_text = ''
for page in reader.pages:
    text = page.extract_text()
    if text:
        raw_text += text
# raw_text
# raw_text[:100]
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
print(len(texts))
# print(texts[0])
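# Create the OpenAI embeddings client (vectors are computed when the index is built below)
embeddings = OpenAIEmbeddings()
# FAISS is short for Facebook AI Similarity Search,
# an index structure for efficient search over embedding vectors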
# https://huggingface.co/learn/nlp-course/chapter5/6?fw=pt#using-faiss-for-efficient-similarity-search
docsearch = FAISS.from_texts(texts, embeddings)
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
chain = load_qa_chain(OpenAI(), chain_type="stuff")
query = "who are the authors of the article?"
docs = docsearch.similarity_search(query)
# Pass the search results and the question to OpenAI to polish into an answer
aa = chain.run(input_documents=docs, question=query)
print("---------")
# print(docs)
print(aa)
# Note: embeddings map the tokenized text into a vector space, where relevance can be computed.
query = "What was the cost of training the GPT4all model?"
docs = docsearch.similarity_search(query)
aa = chain.run(input_documents=docs, question=query)
print(aa)
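The "stuff" chain type used above places all retrieved chunks into a single prompt together with the question. A rough sketch of that idea follows; `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template, and the chunk strings are placeholders rather than text from the report.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every retrieved chunk into one prompt ("stuffing")."""
    # Hypothetical wording; the real chain uses its own template.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for the results of similarity_search.
retrieved = ["first retrieved chunk", "second retrieved chunk"]
prompt = build_stuff_prompt(retrieved, "who are the authors of the article?")
print(prompt)
```

Because every chunk goes into one prompt, this approach only works while the combined text fits in the model's context window, which is one reason the number and size of retrieved chunks matter.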
Reference:
https://colab.research.google.com/drive/181BSOH6KF_1o2lFG8DQ6eJd2MZyiSBNt?usp=sharing#scrollTo=2VXlucKiW7bX