基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人

这篇具有很好参考价值的文章主要介绍了基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

参考:

本项目 https://github.com/PromtEngineer/localGPT

模型 https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML

云端知识库项目:基于GPT-4和LangChain构建云端定制化PDF知识库AI聊天机器人_Entropy-Go的博客-CSDN博客 

1. 摘要

        相比OpenAI的LLM ChatGPT模型必须网络连接并通过API key云端调用模型,担心数据隐私安全。基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人,是将训练好的LLM大语言模型本地化部署,在没有网络连接的情况下对你的文件提问。100%私有化本地化部署,任何时候都不会有数据离开您的运行环境。你可以在没有网络连接的情况下获取文件和提问!        

        介绍一款尖端应用,使用户能够在没有互联网连接的情况下利用语言模型的功能。这款先进工具作为一个不可或缺的资源,帮助用户在超越传统语言模型工具(如ChatGPT)的限制之外获取信息。

        这个应用的一个关键优势在于数据控制的保留。当处理需要保持在组织内部或具有最高机密性的个人文件时,这个功能尤为重要,消除了通过第三方渠道传输信息的需求。

        将个人文件无缝集成到系统中非常简单,确保用户体验流畅。无论是文本、PDF、CSV还是Excel文件,用户都可以方便地提供所需查询的信息。该应用程序快速处理这些文档,有效地创建了一个全面的数据库供模型利用,实现准确而深入的回答。

        这种方法的一个显著优势在于其高效的资源利用。与替代方法中资源密集型的重新训练过程不同,这个应用程序中的文档摄取要求更少的计算资源。这种效率优化可以实现简化的用户体验,节省时间和计算资源。

        体验这个技术奇迹的无与伦比的能力,使用户能够在离线状态下充分发挥语言模型的潜力。迎接信息获取的新时代,提高生产力,拓展可能性。拥抱这个强大的工具,释放您的数据的真正潜力。

2. 准备工作

2.1 Meta's Llama 2 7b Chat GGML

These files are GGML format model files for Meta's Llama 2 7b Chat.

GGML files are for CPU + GPU inference using llama.cpp and libraries and UIs which support this format

2.2 安装Conda

CentOS 上快速安装包管理工具Conda_Entropy-Go的博客-CSDN博客

2.3 升级gcc

CentOS gcc介绍及快速升级_Entropy-Go的博客-CSDN博客

3. 克隆或下载项目localGPT

git clone https://github.com/PromtEngineer/localGPT.git

4. 安装依赖包

4.1 Conda安装并激活

conda create -n localGPT
conda activate localGPT

4.2 安装依赖包

如果Conda环境变量正常设置,直接pip install

pip install -r requirements.txt

否则会使用系统自带的python,可以使用Conda的安装的绝对路径执行,后续都必须使用Conda的python

whereis conda
conda: /root/miniconda3/bin/conda /root/miniconda3/condabin/conda
/root/miniconda3/bin/pip install -r requirements.txt

安装时如遇下面问题,参考2.3 gcc升级,建议升级至gcc 11

ERROR: Could not build wheels for llama-cpp-python, hnswlib, lxml, which is required to install pyproject.toml-based project

5. 添加文档为知识库

5.1 文档目录以及模板文档

可以替换成需要的文档

~localGPT/SOURCE_DOCUMENTS/constitution.pdf

注入前验证下help,如前面提到,建议直接使用Conda绝对路径的python

/root/miniconda3/bin/python ingest.py --help
Usage: ingest.py [OPTIONS]

Options:
  --device_type [cpu|cuda|ipu|xpu|mkldnn|opengl|opencl|ideep|hip|ve|fpga|ort|xla|lazy|vulkan|mps|meta|hpu|mtia]
                                  Device to run on. (Default is cuda)
  --help                          Show this message and exit.

5.2 开始注入文档

默认使用cuda/GPU

/root/miniconda3/bin/python ingest.py

可以指定CPU

/root/miniconda3/bin/python ingest.py --device_type cpu

首次注入时,会下载对应的矢量数据DB,矢量数据DB会存放到  /root/localGPT/DB

首次注入过程

/root/miniconda3/bin/python ingest.py
2023-08-18 09:36:55,389 - INFO - ingest.py:122 - Loading documents from /root/localGPT/SOURCE_DOCUMENTS
all files: ['constitution.pdf']
2023-08-18 09:36:55,398 - INFO - ingest.py:34 - Loading document batch
2023-08-18 09:36:56,818 - INFO - ingest.py:131 - Loaded 1 documents from /root/localGPT/SOURCE_DOCUMENTS
2023-08-18 09:36:56,818 - INFO - ingest.py:132 - Split into 72 chunks of text
2023-08-18 09:36:57,994 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
Downloading (…)c7233/.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 4.13MB/s]
Downloading (…)_Pooling/config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 915kB/s]
Downloading (…)/2_Dense/config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 380kB/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████| 3.15M/3.15M [00:01<00:00, 2.99MB/s]
Downloading (…)9fb15c7233/README.md: 100%|████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 359kB/s]
Downloading (…)b15c7233/config.json: 100%|███████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 5.70MB/s]
Downloading (…)ce_transformers.json: 100%|████████████████████████████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 485kB/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [03:15<00:00, 6.86MB/s]
Downloading (…)nce_bert_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 109kB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 8.96MB/s]
Downloading spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 3.46MB/s]
Downloading (…)c7233/tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 3.01MB/s]
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 9.75MB/s]
Downloading (…)15c7233/modules.json: 100%|███████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 1.92MB/s]
load INSTRUCTOR_Transformer
2023-08-18 09:40:26,658 - INFO - instantiator.py:21 - Created a temporary directory at /tmp/tmp47gnnhwi
2023-08-18 09:40:26,658 - INFO - instantiator.py:76 - Writing /tmp/tmp47gnnhwi/_remote_module_non_scriptable.py
max_seq_length  512
2023-08-18 09:40:30,076 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 09:40:30,248 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 09:40:30,252 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 09:40:30,257 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 09:40:30,295 - INFO - duckdb.py:454 - No existing DB found in /root/localGPT/DB, skipping load
2023-08-18 09:40:30,295 - INFO - duckdb.py:466 - No existing DB found in /root/localGPT/DB, skipping load
2023-08-18 09:40:32,800 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: /root/localGPT/DB
2023-08-18 09:40:32,813 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: /root/localGPT/DB

项目文件列表

ls
ACKNOWLEDGEMENT.md  CONTRIBUTING.md  ingest.py   localGPT_UI.py  README.md            run_localGPT.py
constants.py        DB               LICENSE     __pycache__     requirements.txt     SOURCE_DOCUMENTS
constitution.pdf    Dockerfile       localGPTUI  pyproject.toml  run_localGPT_API.py 

6. 运行知识库AI聊天机器人

现在可以和你的本地化知识库开始对话聊天了!

6.1 命令行方式运行提问

 首次运行时,会下载对应的默认模型 ~/localGPT/constants.py 

# model link: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML

MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"

MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"

模型会下载到 /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML

直接运行

/root/miniconda3/bin/python run_localGPT.py

对话输入

支持英文,中文需要加utf-8进行处理

Enter a query:

对话记录

/root/miniconda3/bin/python run_localGPT.py
2023-08-18 09:43:02,433 - INFO - run_localGPT.py:180 - Running on: cuda
2023-08-18 09:43:02,433 - INFO - run_localGPT.py:181 - Display Source Documents set to: False
2023-08-18 09:43:02,676 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-08-18 09:43:05,301 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 09:43:05,317 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 09:43:05,328 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 09:43:05,336 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 09:43:05,402 - INFO - duckdb.py:460 - loaded in 72 embeddings
2023-08-18 09:43:05,405 - INFO - duckdb.py:472 - loaded in 1 collections
2023-08-18 09:43:05,406 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:45 - Loading Model: TheBloke/Llama-2-7B-Chat-GGML, on: cuda
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:46 - This action can take a few minutes!
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:50 - Using Llamacpp for GGML quantized models
Downloading (…)chat.ggmlv3.q4_0.bin: 100%|███████████████████████████████████████████████████████████████████████████| 3.79G/3.79G [09:53<00:00, 6.39MB/s]
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

Enter a query:

或者添加参数--show_sources,回答时显示引用章节信息

/root/miniconda3/bin/python run_localGPT.py --show_sources

对话记录:

/root/miniconda3/bin/python run_localGPT.py --show_sources
2023-08-18 10:03:55,466 - INFO - run_localGPT.py:180 - Running on: cuda
2023-08-18 10:03:55,466 - INFO - run_localGPT.py:181 - Display Source Documents set to: True
2023-08-18 10:03:55,708 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-08-18 10:03:58,302 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 10:03:58,307 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 10:03:58,312 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 10:03:58,318 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 10:03:58,372 - INFO - duckdb.py:460 - loaded in 72 embeddings
2023-08-18 10:03:58,373 - INFO - duckdb.py:472 - loaded in 1 collections
2023-08-18 10:03:58,373 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:45 - Loading Model: TheBloke/Llama-2-7B-Chat-GGML, on: cuda
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:46 - This action can take a few minutes!
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:50 - Using Llamacpp for GGML quantized models
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

Enter a query: how many times could president act, and how many years as max?

llama_print_timings:        load time = 19737.32 ms
llama_print_timings:      sample time =   101.14 ms /   169 runs   (    0.60 ms per token,  1671.02 tokens per second)
llama_print_timings: prompt eval time = 19736.91 ms /   925 tokens (   21.34 ms per token,    46.87 tokens per second)
llama_print_timings:        eval time = 36669.35 ms /   168 runs   (  218.27 ms per token,     4.58 tokens per second)
llama_print_timings:       total time = 56849.80 ms


> Question:
how many times could president act, and how many years as max?

> Answer:
 The answer to this question can be found in Amendment XXII and Amendment XXIII of the US Constitution. According to these amendments, a person cannot be elected President more than twice, and no person can hold the office of President for more than two years of a term to which someone else was elected President. However, if the President is unable to discharge their powers and duties due to incapacity, the Vice President will continue to act as President until Congress determines the issue.
In summary, a person can be elected President at most twice, and they cannot hold the office for more than two years of a term to which someone else was elected President. If the President becomes unable to discharge their powers and duties, the Vice President will continue to act as President until Congress makes a determination.
----------------------------------SOURCE DOCUMENTS---------------------------

> /root/localGPT/SOURCE_DOCUMENTS/constitution.pdf:
Amendment  XXII.

Amendment  XXIII.

Passed by Congress March 21, 1947. Ratified February 27,

Passed by Congress June 16, 1960. Ratified March 29, 1961.

951.

SECTION 1

...

SECTION 2

....

----------------------------------SOURCE DOCUMENTS---------------------------

Enter a query: exit

6.2 Web UI方式运行提问

6.2.1 启动服务器端API

可以使用Web UI方式运行,启动服务器端API在5110端口上进行监听服务

http://127.0.0.1:5110

/root/miniconda3/bin/python run_localGPT_API.py

如果执行过程遇到下面问题,还是代码中的python没有使用Conda PATH下面的python导致的。

/root/miniconda3/bin/python run_localGPT_API.py
load INSTRUCTOR_Transformer
max_seq_length  512
The directory does not exist
run_langest_commands ['python', 'ingest.py']
Traceback (most recent call last):
  File "/root/localGPT/run_localGPT_API.py", line 56, in <module>
    raise FileNotFoundError(
FileNotFoundError: No files were found inside SOURCE_DOCUMENTS, please put a starter file inside before starting the API!

可以修改~/localGPT/run_localGPT_API.py中的python为Conda下的路径

run_langest_commands = ["python", "ingest.py"]

修改为

run_langest_commands = ["/root/miniconda3/bin/python", "ingest.py"]

运行过程

看到 INFO:werkzeug:  表示启动成功,窗口可以保留座位debug用途

/root/miniconda3/bin/python run_localGPT_API.py
load INSTRUCTOR_Transformer
max_seq_length  512
WARNING:chromadb:Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
 * Serving Flask app 'run_localGPT_API'
 * Debug mode: on
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5110
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with watchdog (inotify)

6.2.2 启动服务器端UI

重新打开一个新的命令行终端,运行~/localGPT/localGPTUI/localGPTUI.py,启动服务器端UI在5111端口上进行监听服务

http://127.0.0.1:5111

/root/miniconda3/bin/python localGPTUI.py

如需局域网访问,修改localGPTUI.py,127.0.0.1 -> 0.0.0.0

parser.add_argument("--host", type=str, default="0.0.0.0",
                        help="Host to run the UI on. Defaults to 127.0.0.1. "
                             "Set to 0.0.0.0 to make the UI externally "
                             "accessible from other devices.")

运行记录

/root/miniconda3/bin/python localGPTUI.py 
 * Serving Flask app 'localGPTUI'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5111
 * Running on http://IP:5111

端口使用情况

netstat -nltp | grep 511
tcp        0      0 127.0.0.1:5110          0.0.0.0:*               LISTEN      57479/python
tcp        0      0 0.0.0.0:5111            0.0.0.0:*               LISTEN      21718/python

6.2.3 浏览器访问Web UI

本机: http://127.0.0.1:5111 

局域网: http://IP:5111

基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人,随笔,langchain,人工智能,llama

网页端可以进行自由对话,支持中文输入。

使用截图

基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人,随笔,langchain,人工智能,llama

6.3 更换本地文档为知识库

6.3.1 命令行方式

直接将文档添加到 ~/localGPT/SOURCE_DOCUMENTS/

会自动触发更新适量数据库,等更新好之后,就可以正常进行提问对话。

6.3.2 Web UI方式

上传文件

1. 要上传文档以供应用程序摄取作为其新的知识库,请单击upload按钮。

2. 选择要用作新知识库的文档进行对话。

3.然后,系统会提示您选择将文档添加到知识库,用您刚刚选择的文档重置知识库,或者取消上传。

4. 当文档被输入到矢量数据库中作为新的知识库时,会有很短的等待时间。 

基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人,随笔,langchain,人工智能,llama

正在注入中文文档到知识库

正在生成回复

结果返回

基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人,随笔,langchain,人工智能,llama

7.常见问题Troubleshooting

7.1 中文文档注入

修改run_localGPT_API.py

max_ctx_size = 4096

修改ingest.py

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)

7.2 网页打开后,问题无回复,response.status_code = 504, 304

如果环境使用了代理,在运行服务器端UI前,先去掉代理后再运行

unset http_proxy
unset https_proxy
unset ftp_proxy
/root/miniconda3/bin/python localGPTUI.py

7.3 locaGPT如何工作的

Selecting the right local models and the power of LangChain you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance.

  • ingest.py uses LangChain tools to parse the document and create embeddings locally using InstructorEmbeddings. It then stores the result in a local vector database using Chroma vector store.
  • run_localGPT.py uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
  • You can replace this local LLM with any other LLM from the HuggingFace. Make sure whatever LLM you select is in the HF format.

7.4 怎么选择不同的LLM大语言模型

The following will provide instructions on how you can select a different LLM model to create your response:

  1. Open up constants.py in the editor of your choice.

  2. Change the MODEL_ID and MODEL_BASENAME. If you are using a quantized model (GGMLGPTQ), you will need to provide MODEL_BASENAME. For unquatized models, set MODEL_BASENAME to NONE

  3. There are a number of example models from HuggingFace that have already been tested to be run with the original trained model (ending with HF or have a .bin in its "Files and versions"), and quantized models (ending with GPTQ or have a .no-act-order or .safetensors in its "Files and versions").

  4. For models that end with HF or have a .bin inside its "Files and versions" on its HuggingFace page.

    • Make sure you have a model_id selected. For example -> MODEL_ID = "TheBloke/guanaco-7B-HF"
    • If you go to its HuggingFace repo and go to "Files and versions" you will notice model files that end with a .bin extension.
    • Any model files that contain .bin extensions will be run with the following code where the # load the LLM for generating Natural Language responses comment is found.
    • MODEL_ID = "TheBloke/guanaco-7B-HF"
  5. For models that contain GPTQ in its name and or have a .no-act-order or .safetensors extension inside its "Files and versions on its HuggingFace page.

    • Make sure you have a model_id selected. For example -> model_id = "TheBloke/wizardLM-7B-GPTQ"

    • You will also need its model basename file selected. For example -> model_basename = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"

    • If you go to its HuggingFace repo and go to "Files and versions" you will notice a model file that ends with a .safetensors extension.

    • Any model files that contain no-act-order or .safetensors extensions will be run with the following code where the # load the LLM for generating Natural Language responses comment is found.

    • MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"

      MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"

  6. Comment out all other instances of MODEL_ID="other model names"MODEL_BASENAME=other base model names, and llm = load_model(args*)

7.5 更多问题参考

Issues · PromtEngineer/localGPT · GitHub文章来源地址https://www.toymoban.com/news/detail-712783.html

到了这里,关于基于Llama2和LangChain构建本地化定制化知识库AI聊天机器人的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • AIGC生成式代码——Code Llama 简介、部署、测试、应用、本地化

            本文介绍了CodeLlama的 简介、本地化部署、测试和应用实战方案 ,帮助学习大语言模型的同学们更好地应用CodeLlama。我们详细讲解了如何将CodeLlama部署到实际应用场景中,并通过实例演示了如何使用CodeLlama进行代码生成和优化。最后,总结了CodeLlama的应用实战经验

    2024年02月05日
    浏览(86)
  • LangChain 本地化方案 - 使用 ChatYuan-large-v2 作为 LLM 大语言模型

    ChatYuan-large-v2 是一个开源的支持中英双语的功能型对话语言大模型,与其他 LLM 不同的是模型十分轻量化,并且在轻量化的同时效果相对还不错,仅仅通过 0.7B 参数量就可以实现 10B 模型的基础效果,正是其如此的轻量级,使其可以在普通显卡、 CPU 、甚至手机上进行推理,而

    2024年02月16日
    浏览(52)
  • 【个人笔记本】本地化部署详细流程 LLaMA中文模型:Chinese-LLaMA-Alpaca-2

    不推荐小白,环境配置比较复杂 下载原始模型:Chinese-LLaMA-Alpaca-2 linux部署llamacpp环境 使用llamacpp将Chinese-LLaMA-Alpaca-2模型转换为gguf模型 windows部署Text generation web UI 环境 使用Text generation web UI 加载模型并进行对话 笔记本环境: 操作系统:win11 CPU:AMD R7535HS GPU:笔记本4060显卡

    2024年02月08日
    浏览(59)
  • LLMs之Vicuna:在Linux服务器系统上实Vicuna-7B本地化部署(基于facebookresearch的GitHub)进行模型权重合并(llama-7b模型与delta模型权重)、模型部

    LLMs之Vicuna:在Linux服务器系统上实Vicuna-7B本地化部署(基于facebookresearch的GitHub)进行模型权重合并(llama-7b模型与delta模型权重)、模型部署且实现模型推理全流程步骤的图文教程(非常详细) 导读 :因为Vicuna的训练成本很低,据说只需要$300左右,所以,还是有必要尝试本地化部署

    2024年02月06日
    浏览(74)
  • 中文大语言模型 Llama-2 7B(或13B) 本地化部署 (国内云服务器、GPU单卡16GB、中文模型、WEB页面TextUI、简单入门)

            本文目的是让大家先熟悉模型的部署,简单入门;所以只需要很小的算力,单台服务器 单GPU显卡(显存不低于12GB),操作系统需要安装 Ubuntu 18.04。         准备一台服务器 单张英伟达GPU显卡(显存不低于12GB),操作系统需要安装 Ubuntu 18.04 (具体安装过程忽略)

    2024年02月08日
    浏览(48)
  • 基于GitHub代码库训练模型本地化AI代码自动补全 - Tabby Windows10

    参考: https://github.com/TabbyML/tabby 已经有好几款类似强劲的代码补全工具,如GitHub Copilot,Codeium等,为什么还要选择Tabby? Tabby除了和其他工具一样支持联网直接使用之外, 还支持本地化部署 。 即对内部代码安全性要求很高时,可以采取Tabby项目模型的本地化部署,不用担心本

    2024年02月02日
    浏览(124)
  • 基于GitHub代码库训练模型本地化AI代码自动补全 - Tabby Linux Debian/CentOS

    参考: https://github.com/TabbyML/tabby Docker | Tabby Linux Debian上快速安装Docker并运行_Entropy-Go的博客-CSDN博客 Tabby - 本地化AI代码自动补全 - Windows10_Entropy-Go的博客-CSDN博客 已经有好几款类似强劲的代码补全工具,如GitHub Copilot,Codeium等,为什么还要选择Tabby? Tabby除了和其他工具一样支

    2024年02月05日
    浏览(49)
  • 详解dedecms织梦远程图片本地化https链接图片无法本地化怎么解决

    最近有朋友遇到发布文章时候文章里面带https的站外图片无法本地化,以下是解决办法: 找到  dede//inc/inc_archives_functions.php文件里面GetCurContent($body)这个函数,里面 这一段改为: 第二步: 这一段改为: 搞定,这样发文章就可以把https的远程图片也本地化了 以上就是本文的全

    2024年02月02日
    浏览(43)
  • Remix本地化,加载本地合约文件,本地链接Remix

    智能合约IDE,在线的比较卡,而且切换网络面临文件丢失的风险,选择本地搭建Solidity本地编辑环境,Remix-IDE + Remixd组合,加载本地合约代码。这里用到两个工具: Remix IDE(本地IDE)+ Remixd (链接) Remix IDE 项目源码:https://github.com/ethereum/remix-project 介绍: Remix IDE是一个本地部署运

    2024年02月13日
    浏览(62)
  • Excalidraw本地化部署

    1 - Excalidraw介绍 Excalidraw是一个开源、小巧易用的手写风格的框图画板软件。 ​excalidraw官网地址:https://excalidraw.com/​ 2 - Excalidraw本地化安装(git方式) 2-1安装部署 在terminal中,输入: 安装完成后,在terminal中,进入项目文件 2-2 安装依赖环境 - nodeJS NodeJS下载地址: nodejs下载

    2024年02月14日
    浏览(51)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包