Testing the Feasibility of Deploying Large Language Models on Embedded Devices: TinyLlama-1.1B-Chat-v1.0

This test runs the TinyLlama-1.1B-Chat-v1.0 model with different inference parameters and observes how inference time changes with them.
Local environment:

Processor: Intel® Core™ i5-8400 CPU @ 2.80GHz 2.80 GHz
Installed RAM: 16.0 GB (15.9 GB usable)
Integrated GPU: Intel® UHD Graphics 630
Discrete GPU: NVIDIA GeForce GTX 1050

The test mainly varies the arguments of this call:

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
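Varying these arguments one at a time and timing each call can be organized as a small parameter sweep. The following is a minimal sketch, not the script used in this test. The helper names `time_call` and `sweep` are hypothetical, and `fake_generate` is a stand-in that only builds a dummy string so the harness runs without downloading the model.

```python
import time
from itertools import product


def time_call(fn, **kwargs):
    """Run fn(**kwargs) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(**kwargs)
    return result, time.perf_counter() - start


def sweep(fn, grid):
    """Time fn once for every combination of the parameter values in grid."""
    names = list(grid)
    rows = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        _, elapsed = time_call(fn, **params)
        rows.append((params, elapsed))
    return rows


def fake_generate(max_new_tokens, do_sample, top_k):
    """Hypothetical stand-in for the model call; it does no real inference."""
    return "x" * max_new_tokens


# Parameter values taken from the calls tried in this test.
grid = {"max_new_tokens": [12, 32, 256], "do_sample": [True, False], "top_k": [30, 50]}
results = sweep(fake_generate, grid)
for params, elapsed in results:
    print(params, f"{elapsed:.6f} s")
```

To time the real model, `fake_generate` could be swapped for something like `lambda **kw: pipe(prompt, **kw)`, assuming `pipe` and `prompt` are built as in the script in this article. A single run per configuration is noisy, so repeating each configuration a few times would give steadier numbers.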

Source code (mirror): https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0

'''
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
In testing, TinyLlama 1.1B performs well, much better than even the quantized Qwen 1.8B.
'''

# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import os
from datetime import datetime
import torch

os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
from transformers import pipeline

'''
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://hf-mirror.com/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    # {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    {"role": "user", "content": "你叫什么名字?"},  # Chinese for "What is your name?"
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
'''

# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
def load_pipeline():
    pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16,
                    device_map="auto")
    return pipe

def generate_text(content):
    """
    Generate a reply to the given user message and print the elapsed time.
    Relies on the module-level `pipe` created in the main block.
    """
    messages = [
        {
            # The chat template only recognizes the standard role names;
            # a message with any other role is left out of the prompt.
            "role": "system",
            "content": "This is a friendly chatbot...",
        },
        # {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
        {"role": "user", "content": content},
    ]
    prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    datetime1 = datetime.now()
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    print(outputs[0]["generated_text"])
    datetime2 = datetime.now()
    time12_interval = datetime2 - datetime1
    print("Elapsed time", time12_interval)
    # Extra runs from the parameter experiments, disabled by default; set to True to repeat them.
    if False:
        outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime3 = datetime.now()
        time23_interval = datetime3 - datetime2
        print("Elapsed time 2", time23_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=50)
        print(outputs[0]["generated_text"])
        datetime4 = datetime.now()
        time34_interval = datetime4 - datetime3
        print("Elapsed time 3", time34_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=30, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime5 = datetime.now()
        time45_interval = datetime5 - datetime4
        print("Elapsed time 4", time45_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=30)
        print(outputs[0]["generated_text"])
        datetime6 = datetime.now()
        time56_interval = datetime6 - datetime5
        print("Elapsed time 5", time56_interval)
        outputs = pipe(prompt, max_new_tokens=12, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime7 = datetime.now()
        time67_interval = datetime7 - datetime6
        print("Elapsed time 6", time67_interval)

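    '''
    Conclusions: changing top_p does not noticeably reduce inference time. For the same question
    asked in Chinese and in English, the Chinese version takes about twice as long.
    Setting do_sample to False barely reduces inference time.
    Only max_new_tokens reduces inference time significantly, but the relationship is not linear.
    For example, with max_new_tokens=256 inference takes 2 minutes;
    with max_new_tokens=32 it only drops to about 1 minute.
    So it is better to set max_new_tokens larger and get a more complete answer.
    '''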

    return outputs

if __name__ == "__main__":
    # Entry point: load the model once, then answer prompts in a loop.
    # `pipe` is a module-level name here, so generate_text() can use it directly.
    pipe = load_pipeline()

    # print('load pipe ok')

    while True:
        prompt = input("Enter a prompt (or type 'exit' to quit): ")
        if prompt.lower() == 'exit':
            break
        try:
            generated_text = generate_text(prompt)
            print("Generated text:")
            print(generated_text[0]["generated_text"])
        except Exception as e:
            print("An error occurred:", e)
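The conclusions recorded in the script note that inference time does not scale linearly with max_new_tokens: about 2 minutes at 256 and about 1 minute at 32. One way to read those two figures is as a fixed cost plus a per-token cost. The sketch below fits that affine model to the two quoted numbers. It rests on assumptions the test does not confirm: that the cost really is affine, that the two rounded timings are representative, and that both runs generated the full token budget (generation can stop earlier at an end-of-sequence token). `fit_affine` is a hypothetical helper name.

```python
def fit_affine(n1, t1, n2, t2):
    """Solve t = overhead + per_token * n from two (tokens, seconds) measurements."""
    per_token = (t1 - t2) / (n1 - n2)
    overhead = t1 - per_token * n1
    return overhead, per_token


# Figures quoted in the conclusions: 256 tokens in about 120 s, 32 tokens in about 60 s.
overhead, per_token = fit_affine(256, 120.0, 32, 60.0)
print(f"fixed overhead ~ {overhead:.1f} s, per-token cost ~ {per_token:.3f} s")
# prints: fixed overhead ~ 51.4 s, per-token cost ~ 0.268 s
```

Under these assumptions, most of the 32-token run would be fixed cost, which matches the observation that shrinking max_new_tokens saves less time than expected. With only two rough data points this is an illustration, not a measurement.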

Enter a prompt (or type 'exit' to quit): 如何开门？
<|user|>
如何开门？</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:

1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.

2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.

3. Release the latch: Once the door is open, release the latch by pulling it backward.

4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.

5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.

6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.

Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
Elapsed time 0:04:23.561065
Generated text:
<|user|>
如何开门？</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:

1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.

2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.

3. Release the latch: Once the door is open, release the latch by pulling it backward.

4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.

5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.

6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.

Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
Enter a prompt (or type 'exit' to quit): 
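In the run above, the prompt 如何开门？ means "How do I open the door?". The printed `generated_text` repeats the formatted prompt before the model's answer, and the answer stops mid-sentence, which is what hitting the max_new_tokens limit looks like. If only the reply is wanted, it can be cut out after the `<|assistant|>` tag. The following is a minimal sketch with a hypothetical helper name, `extract_reply`, run here on a shortened copy of the output.

```python
ASSISTANT_TAG = "<|assistant|>"


def extract_reply(generated_text):
    """Return the text after the last assistant tag, or the whole text if the tag is absent."""
    _, found, reply = generated_text.rpartition(ASSISTANT_TAG)
    return reply.strip() if found else generated_text.strip()


# Shortened copy of the output shown above.
sample = "<|user|>\n如何开门？</s>\n<|assistant|>\nCertainly! Opening a door is a simple process"
print(extract_reply(sample))
# prints: Certainly! Opening a door is a simple process
```

The text-generation pipeline in transformers also accepts `return_full_text=False`, which returns only the newly generated text and avoids the string handling altogether.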