【Meta-AI】LLaMA GPT Test


Update (2023-4-28):

A GitHub user has merged and quantized the 7B and 13B weights, and the Chinese-Alpaca project makes deployment much simpler:

GitHub - ymcui/Chinese-LLaMA-Alpaca: Chinese LLaMA & Alpaca LLMs + local CPU/GPU deployment


GitHub repository:

GitHub - facebookresearch/llama: Inference code for LLaMA models

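Three days after LLaMA was released, the startup Nebuly AI open-sourced a training method for an RLHF version of LLaMA (ChatLLaMA). Its training process is similar to ChatGPT's, and the project lets you build ChatGPT-style services on top of a pre-trained LLaMA model. Some people on Zhihu have already started training it on Chinese data, and the bert4torch route makes it fairly quick and easy to try.

First, install the package from inside the cloned llama repository: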

cd llama

pip install -r requirements.txt

pip install -e .


Next, find the application link on the GitHub page, fill in your email address, and wait for the approval email:


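To download the model, edit download.sh before running it: set MODEL_SIZE so that only 7B is downloaded, and replace PRESIGNED_URL with your download link:
PRESIGNED_URL=""             # copy this from the approval email
MODEL_SIZE="7B"              # download only the smallest model
TARGET_FOLDER="./"           # download directory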

Reading the download script shows that it simply fetches the following files:

wget ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"

wget ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"

wget ${PRESIGNED_URL/'*'/"${i}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${i}/consolidated.${s}.pth"

wget ${PRESIGNED_URL/'*'/"${i}/params.json"} -O ${TARGET_FOLDER}"/${i}/params.json"

wget ${PRESIGNED_URL/'*'/"${i}/checklist.chk"} -O ${TARGET_FOLDER}"/${i}/checklist.chk"
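To make the loop variables `${i}` (model size) and `${s}` (shard index) concrete, here is a small Python sketch that lists the same (URL, local path) pairs. It assumes the shard counts 7B: 1, 13B: 2, 30B: 4, 65B: 8 (the model-parallel sizes in the LLaMA README), and that `${PRESIGNED_URL/'*'/name}` replaces the first literal `*` in the URL. `N_SHARDS`, `fill` and `download_plan` are hypothetical helper names for illustration, not part of download.sh.

```python
import posixpath

# Shard count per model size. Assumption: mirrors the shard table in download.sh,
# which matches the model-parallel sizes in the LLaMA README (7B: 1, 13B: 2, 30B: 4, 65B: 8).
N_SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}


def fill(presigned_url: str, name: str) -> str:
    """Python equivalent of bash ${PRESIGNED_URL/'*'/name}: replace the first '*' only."""
    return presigned_url.replace("*", name, 1)


def download_plan(presigned_url: str, model_size: str = "7B", target_folder: str = "./") -> list:
    """Return (url, local_path) pairs in the order the wget lines above fetch them.

    model_size is comma-separated, like MODEL_SIZE in download.sh (e.g. "7B,13B").
    """
    names = ["tokenizer.model", "tokenizer_checklist.chk"]
    for size in model_size.split(","):
        # consolidated.00.pth, consolidated.01.pth, ... one file per model-parallel shard
        names += [f"{size}/consolidated.{s:02d}.pth" for s in range(N_SHARDS[size])]
        names += [f"{size}/params.json", f"{size}/checklist.chk"]
    return [(fill(presigned_url, n), posixpath.join(target_folder, n)) for n in names]


# Placeholder URL for illustration; the real PRESIGNED_URL comes from the approval email.
for url, path in download_plan("https://example.com/*?sig=PLACEHOLDER"):
    print(f"{path} <- {url}")
```

For 7B this yields five files: the two tokenizer files plus `consolidated.00.pth`, `params.json` and `checklist.chk` under `7B/`.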

Then run the script to download the model:

bash download.sh
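The download also fetches `checklist.chk` files, which let you confirm that the multi-GB shards arrived intact. The following is a minimal Python sketch of that check. It assumes the files use the standard `md5sum` output format (`<hex digest>  <filename>` per line, the format `md5sum -c` reads). `verify_checklist` is a hypothetical helper, not something shipped with the llama repo.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large .pth shards are never fully loaded into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checklist(folder: str, checklist: str = "checklist.chk") -> dict:
    """Check every '<md5>  <filename>' line of a checklist; returns {filename: ok}."""
    root = Path(folder)
    results = {}
    for line in (root / checklist).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        target = root / name
        # A missing file counts as a failure rather than raising an error.
        results[name] = target.exists() and md5_of(target) == expected.lower()
    return results
```

Usage, assuming the default layout: `verify_checklist("./7B")` checks the 7B shard, and `verify_checklist("./", "tokenizer_checklist.chk")` checks the tokenizer.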

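If the download fails, you can follow Chinese-language tutorials and download via pyllama instead:

What do you make of the LLaMA model leak? - Zhihu

A budget ChatGPT alternative: LLaMA (download links included, good news for hobbyists and freeloaders!) - Zhihu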

For inference, just replace the two paths (checkpoint directory and tokenizer file):

torchrun --nproc_per_node 1 --nnodes 1 example.py --ckpt_dir ./weight/7B --tokenizer_path ./weight/tokenizer.model
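`--nproc_per_node` is not arbitrary. The official example.py expects one process per checkpoint shard: 1 for 7B, and 2, 4 and 8 for 13B, 30B and 65B according to the LLaMA README. The Python sketch below derives the value by counting `consolidated.*.pth` files in the checkpoint directory. `torchrun_command` is a hypothetical helper for illustration.

```python
from pathlib import Path


def torchrun_command(ckpt_dir: str, tokenizer_path: str, script: str = "example.py") -> list:
    """Build the torchrun argv with --nproc_per_node equal to the number of checkpoint shards."""
    shards = sorted(Path(ckpt_dir).glob("consolidated.*.pth"))
    if not shards:
        raise FileNotFoundError(f"no consolidated.*.pth found in {ckpt_dir}")
    return [
        "torchrun", "--nproc_per_node", str(len(shards)), "--nnodes", "1",
        script, "--ckpt_dir", ckpt_dir, "--tokenizer_path", tokenizer_path,
    ]
```

For example, `subprocess.run(torchrun_command("./weight/7B", "./weight/tokenizer.model"), check=True)` launches the same command as above once the 7B shard is in place.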

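The official source code uses torchrun, which is aimed at parallel (multi-process) inference. Below is a demonstration of running inference on llama-7b with bert4torch instead (the .pth weights must first be converted to a .bin file). First, install the latest bert4torch: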

pip install git+https://www.github.com/Tongjilibo/bert4torch.git
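The VRAM figures quoted in the script below (about 27 GB in fp32, about 14 GB in fp16) follow from parameter count × bytes per parameter. This Python sketch recomputes the count from the 7B hyperparameters (hidden size 4096, 32 layers, FFN size 11008, vocabulary 32000, the same values that appear in bert4torch_config.json). It assumes the standard LLaMA layout: no biases, a SwiGLU feed-forward with three projections, RMSNorm, and untied input/output embeddings. It estimates weights only. Activations and the KV cache come on top, which is why the real fp16 footprint is a little above 13.5 GB.

```python
# LLaMA-7B hyperparameters (also found in bert4torch_config.json).
HIDDEN = 4096
LAYERS = 32
FFN = 11008
VOCAB = 32000


def llama_param_count(hidden=HIDDEN, layers=LAYERS, ffn=FFN, vocab=VOCAB) -> int:
    attention = 4 * hidden * hidden   # q, k, v, o projections, no biases
    mlp = 3 * hidden * ffn            # SwiGLU: gate, up and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden   # input embedding + separate output head
    return layers * per_layer + embeddings + hidden  # + final RMSNorm


def weight_gb(n_params: int, bytes_per_param: int) -> float:
    """Weight memory in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


n = llama_param_count()
print(f"{n:,} parameters")              # 6,738,415,616 parameters
print(f"fp32: {weight_gb(n, 4):.1f} GB")  # fp32: 27.0 GB
print(f"fp16: {weight_gb(n, 2):.1f} GB")  # fp16: 13.5 GB
```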

The tutorial code is as follows:

#! -*- coding: utf-8 -*-
# Basic test: inference with the llama 7B model
# Before use, install the latest bert4torch and convert the weights: https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py


# 0. Install latest bert4torch: `pip install git+https://www.github.com/Tongjilibo/bert4torch.git` or git clone
# 1. Download weights: [Github](https://github.com/facebookresearch/llama) | [huggingface](https://huggingface.co/decapoda-research/llama-7b-hf) | [torrent](https://pan.baidu.com/s/1yBaYZK5LHIbJyCCbtFLW3A?pwd=phhd); my setup uses the third option
# 2. Convert weights: https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py
# 3. Inference script: https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py
# 4. VRAM required on a single GPU: fp32 ~27 GB, fp16 ~14 GB

import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import SpTokenizer
from bert4torch.snippets import AutoRegressiveDecoder

config_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin'
spm_path = 'F:/Projects/pretrain_ckpt/llama/tokenizer.model'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = SpTokenizer(spm_path, token_start='<s>', token_end=None, keep_accents=True)

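model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='llama')  # build the model and load the weights
model = (model.half() if device == 'cuda' else model).to(device)  # fp16 on GPU; many fp16 ops are not implemented on CPU, so keep fp32 there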

class ArticleCompletion(AutoRegressiveDecoder):
    @AutoRegressiveDecoder.wraps(default_rtype='logits')
    def predict(self, inputs, output_ids, states):
        token_ids = torch.cat([inputs[0], output_ids], 1)
        logits = model.predict([token_ids])
        return logits[:, -1, :]

    def generate(self, text, n=1, topp=0.95):
        token_ids, _ = tokenizer.encode(text)
        results = self.random_sample([token_ids], n, topp=topp)  # random (top-p) sampling
        return [text + tokenizer.decode(ids.cpu().numpy()) for ids in results]

article_completion = ArticleCompletion(
    start_id=None,
    end_id=2,  # id of the </s> token
    maxlen=256,
    minlen=20,
    device=device
)

for text in [u'I believe the meaning of life is ']:
    print(article_completion.generate(text))
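`generate()` calls bert4torch's `random_sample` with `topp=0.95`, which means top-p (nucleus) sampling. At each step only the smallest set of most-likely tokens whose probabilities sum to at least 0.95 is eligible. The pure-Python sketch below illustrates that idea on a single logits vector. It is not bert4torch's implementation, and `top_p_sample` is a hypothetical name.

```python
import math
import random


def top_p_sample(logits, top_p=0.95, rng=random):
    """Nucleus sampling over a 1-D list of logits; returns the sampled token index."""
    # Softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-probable tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices normalises the weights itself, so this samples from the renormalised nucleus.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With logits `[5.0, 1.0, 0.5, -2.0]`, the top token already holds about 97% of the probability mass. So `top_p=0.95` always returns index 0, while a larger `top_p` lets the next tokens in.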

Weight conversion script:

bert4torch/convert_llama_facebook.py at master · Tongjilibo/bert4torch · GitHub
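Most values in the bert4torch config can be read straight from Meta's params.json. The exception is `intermediate_size`, which Meta's LLaMA code derives as 2/3 of 4 × dim, rounded up to a multiple of `multiple_of`. For 7B that gives int(2 × 16384 / 3) = 10922, rounded up to 43 × 256 = 11008. The Python sketch below shows the mapping. Several points are assumptions rather than the conversion script's actual code: the 7B params.json contents used in the example, taking `vocab_size` from the tokenizer (32000) because params.json is assumed to store -1, and covering only the config keys visible in this post. `to_bert4torch_config` is a hypothetical name.

```python
import json


def ffn_hidden_size(dim: int, multiple_of: int) -> int:
    """FFN width as in Meta's LLaMA code: 2/3 of 4*dim, rounded up to a multiple of multiple_of."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def to_bert4torch_config(params: dict, vocab_size: int = 32000) -> dict:
    """Map params.json keys onto the bert4torch_config.json keys shown in this post.

    Assumption: vocab_size comes from the tokenizer, not from params.json.
    """
    return {
        "hidden_size": params["dim"],
        "intermediate_size": ffn_hidden_size(params["dim"], params["multiple_of"]),
        "multiple_of": params["multiple_of"],
        "num_attention_heads": params["n_heads"],
        "num_hidden_layers": params["n_layers"],
        "norm_eps": params["norm_eps"],
        "hidden_act": "silu",
        "vocab_size": vocab_size,
    }


# Assumed contents of 7B/params.json.
params_7b = {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
print(json.dumps(to_bert4torch_config(params_7b), indent=2))
```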

The bottom of the conversion script contains the bert4torch_config.json configuration:

{
"hidden_size": 4096,
"intermediate_size": 11008,
"multiple_of": 256,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"norm_eps": 1e-06,
"hidden_act": "silu",
"vocab_size": 32000,