AI Large Model Exploration Series - Application Part 16: GLM Large Models - ChatGLM3 API Development in Practice

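I. Calling the ChatGLM3-6B Model API

1. Import the required libraries

2. Load the tokenizer

3. Load the pretrained model

4. Instantiate the model

5. Call the model and get the result

II. OpenAI-Style API Calls

1. Starting the OpenAI API service

2. Testing the response with curl

3. Testing the response with a Python POST request

4. Using the chat-style interface provided by GLM

5. Embeddings

III. Function Call: A Simple Weather Query Example

1. Import dependencies and define the model client

2. Define the tool

3. Test the call

IV. Function Call in Practice: Weather Query with an External API

1. Weather API demo

2. Asking the model about the weather directly

3. Calling the weather API through Function Call

4. Testing the function call

Summary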

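Preface

This chapter explores the APIs that ChatGLM3 provides, covering both GLM's native API style and the OpenAI-compatible style. Through concrete steps and worked examples, it shows how to use Function Call, the core tool mechanism, to extend the model's capabilities, and how to apply that in real applications. By the end, readers should be able to use ChatGLM3-6B flexibly to build solutions for changing business needs.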

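I. Calling the ChatGLM3-6B Model API

1. Import the required libraries

Import the Python libraries needed to interact with the GLM model. Here that is the transformers library: AutoTokenizer loads the tokenizer, and AutoModel loads the base model.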

from transformers import AutoTokenizer, AutoModel

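2. Load the tokenizer

The tokenizer converts text into the sequence of numeric token ids that the model understands. Because it has to download the pretrained vocabulary from a remote server, the first run can take a while.

Use AutoTokenizer.from_pretrained to load the pretrained tokenizer "THUDM/chatglm3-6b". trust_remote_code=True allows the custom code shipped in the model repository to run, so only enable it for repositories you trust.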

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True)
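To make the text-to-ids idea concrete, here is a minimal sketch of a toy tokenizer. It is not ChatGLM3's real tokenizer, which uses a learned subword vocabulary; the `ToyTokenizer` class and its tiny character-level vocabulary are hypothetical and only illustrate the encode/decode round trip.

```python
from typing import List


class ToyTokenizer:
    """Hypothetical character-level tokenizer, for illustration only."""

    def __init__(self, corpus: str):
        # Id 0 is reserved for unknown characters; known characters
        # get ids starting at 1, assigned in sorted order.
        self.unk_id = 0
        chars = sorted(set(corpus))
        self.char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, text: str) -> List[int]:
        """Map each character to its id (unknown characters map to 0)."""
        return [self.char_to_id.get(ch, self.unk_id) for ch in text]

    def decode(self, ids: List[int]) -> str:
        """Map ids back to characters ('?' for unknown ids)."""
        return "".join(self.id_to_char.get(i, "?") for i in ids)


toy = ToyTokenizer("hello world")
ids = toy.encode("hello")
print(ids)
print(toy.decode(ids))
```

A real tokenizer follows the same contract (text in, list of integers out, and back), only with a much larger vocabulary and subword units instead of single characters.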

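3. Load the pretrained model

Use AutoModel.from_pretrained to load the pretrained model "THUDM/chatglm3-6b". trust_remote_code=True allows the custom code in the model repository to run, and device='cuda' places the model on a CUDA device so that inference uses the GPU.

Depending on the available GPU memory, you may need to load the model at a different precision. A GPU with more than 13 GB of memory can load the unquantized FP16 model directly; with less memory, use one of the quantized options below.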
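The memory figure can be sanity-checked with simple arithmetic. The sketch below estimates the memory needed for the weights alone at each precision and picks the highest precision that fits. The parameter count (about 6.2 billion) and the 1.5 GiB headroom are assumptions, and the helper names are hypothetical; real usage is higher once activations and the KV cache are included.

```python
GIB = 1024 ** 3

# Assumption: roughly 6.2 billion parameters; adjust for the actual checkpoint.
ASSUMED_PARAM_COUNT = 6_200_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_weight_gib(param_count: int, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[precision] / GIB


def pick_precision(free_gib: float, headroom_gib: float = 1.5) -> str:
    """Return the highest precision whose weights fit, leaving some headroom."""
    for precision in ("fp16", "int8", "int4"):
        needed = estimate_weight_gib(ASSUMED_PARAM_COUNT, precision) + headroom_gib
        if needed <= free_gib:
            return precision
    raise ValueError("not enough GPU memory even for int4 under these assumptions")


for name in BYTES_PER_PARAM:
    print(name, round(estimate_weight_gib(ASSUMED_PARAM_COUNT, name), 2), "GiB")
print(pick_precision(16.0), pick_precision(8.0), pick_precision(5.0))
```

Under these assumptions the FP16 weights come to roughly 11.5 GiB, which is consistent with the 13 GB guideline once some working memory is added.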

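1) Unquantized (FP16 by default)

model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True,device='cuda')

At the time of writing, prefer the unquantized load: the maintainers changed the weight-related files, so quantized loading currently raises an error, and the issue has not been fixed yet.

2) Quantized loading (INT8)

model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True).quantize(8).cuda()

3) Quantized loading (INT4)

model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True).quantize(4).cuda()

4) Multiple GPUs

If you have several GPUs but none of them has enough memory to hold the whole model, you can split the model across them. First install accelerate:

pip install accelerate

Then load the model as follows:

from utils import load_model_on_gpus
# With 2 GPUs
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)

Note: by default the code above uses the open-source pretrained model straight from the Hugging Face model hub. You can also download all the model files for "THUDM/chatglm3-6b" from Hugging Face to a local directory and replace the model name in the code with that local path.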

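4. Instantiate the model

from_pretrained has already created the model object. The remaining step is to switch it to evaluation mode, which disables training-only behavior such as dropout: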

model = model.eval()

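5. Call the model and get the result

Once the model is ready, we can give it input and read its output. This is the step where inference actually happens, and it is the core of any API service built on the model.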

response, history = model.chat(tokenizer, "你好", history=[])

# Print the response text
print(response)

# Print the conversation history
print(history)
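The returned history is what carries context between turns: passing it back into the next model.chat call lets the model see the earlier exchange. The sketch below shows only that calling pattern. `EchoModel` is a hypothetical stub standing in for the real model so that the example runs without a GPU, and the role/content dictionary layout is an assumption about how ChatGLM3 stores its history.

```python
class EchoModel:
    """Hypothetical stand-in that mimics the (response, history) return shape."""

    def chat(self, tokenizer, query, history=None):
        # Copy the list so the caller's history object is not mutated.
        history = list(history or [])
        response = f"(turn {len(history) // 2 + 1}) you said: {query}"
        history.append({"role": "user", "content": query})
        history.append({"role": "assistant", "content": response})
        return response, history


stub_model = EchoModel()
demo_history = []
for question in ["你好", "What can you do?"]:
    # Feed the history returned by the previous turn into the next one.
    demo_response, demo_history = stub_model.chat(None, question, history=demo_history)
    print(demo_response)

print(len(demo_history))  # two messages are stored per turn
```

With the real model, the same loop replaces `stub_model` with `model` and `None` with `tokenizer`.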

II. OpenAI-Style API Calls

Install the dependency:

pip install openai

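1. Starting the OpenAI API service

With OpenAI-style calls you do not load the tokenizer and model yourself. Instead, you first start an OpenAI-compatible API service. The service exposes OpenAI-style endpoints and internally translates each request into GLM's own calling convention. The server script is api_server.py in the ChatGLM3/openai_api_demo directory.

Start the API service:

python api_server.py

Note: api_server.py pulls the model from Hugging Face, loads it, and runs it on the local server's GPU.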

2. Testing the response with curl

With curl, we can send an HTTP request straight from the terminal to check how the API responds.

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好，给我讲一个故事，大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"

Execution result: [screenshot in the original article]
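Hand-escaping JSON inside a shell string is error-prone. One alternative, sketched here, is to build the same request body as a Python dictionary and let json.dumps handle the quoting. The helper name `build_chat_payload` is hypothetical, and the default field values mirror the curl example.

```python
import json


def build_chat_payload(user_text, system_text=None, max_tokens=100,
                       temperature=0.8, top_p=0.8, stream=False):
    """Return the JSON body for a /v1/chat/completions request as a string."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    body = {
        "model": "chatglm3-6b",
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return json.dumps(body, ensure_ascii=False)


payload = build_chat_payload("你好，给我讲一个故事，大概100字")
print(payload)
```

The resulting string can be saved to a file and sent with curl's `-d @payload.json` form, which avoids shell escaping altogether.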

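3. Testing the response with a Python POST request

Besides curl, we can write a Python script that sends the data to the API with a POST request and reads the result.

Import the dependency and wrap the POST request:

import requests

base_url = "http://127.0.0.1:8000"  # Address of the API service started locally above


def create_chat_completion(model, messages):
    data = {
        "model": model,        # Model name
        "messages": messages,  # Conversation history
        "max_tokens": 100,     # Maximum number of tokens to generate
        "temperature": 0.8,    # Sampling temperature
        "top_p": 0.8,          # Nucleus sampling threshold
    }
    response = requests.post(f"{base_url}/v1/chat/completions", json=data)
    response.raise_for_status()  # Fail early on HTTP errors
    decoded_line = response.json()
    # Default to an empty dict for "message" so the chained .get() cannot fail on a str
    content = decoded_line.get("choices", [{}])[0].get("message", {}).get("content", "")
    return content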

Calling the function:

chat_messages = [
    {
        "role": "system",
        "content": "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.",
    },
    {
        "role": "user",
        # Prompt in English: "Hello, tell me a story of about 100 characters"
        "content": "你好，给我讲一个故事，大概100字"
    }
]
content = create_chat_completion("chatglm3-6b", chat_messages)
print(content)

Output:

从前有个美丽的小村庄，村子里的居民过着和谐的生活。村子里有一位聪明的老爷爷，他总是能给出最好的建议。有一天，村里发生了一件大事，一个恶龙来到了村子里，偷走了人们的宝藏。

村子里的人都很害怕，不知道该如何解决这个问题。但是，老爷爷却很镇定，他决定带领村民们一起打倒恶龙。经过艰苦的努力，他们最终成功地击败了恶龙，夺回了失去的宝藏。
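The extraction step in create_chat_completion assumes an OpenAI-style response body. The sketch below runs an equivalent lookup against a hand-written sample payload (illustrative, not captured server output) and also shows what happens when the choices field is missing. The helper name `extract_content` is hypothetical.

```python
def extract_content(payload: dict) -> str:
    """Return the first choice's message content, or "" if it is absent."""
    choices = payload.get("choices") or [{}]
    message = choices[0].get("message") or {}
    return message.get("content") or ""


# Hand-written sample in the OpenAI chat-completions shape (assumed, not real output).
sample = {
    "model": "chatglm3-6b",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Once upon a time..."}}
    ],
}

print(extract_content(sample))
print(repr(extract_content({"error": "something went wrong"})))
```

Returning an empty string for malformed bodies keeps the caller simple; a stricter variant could raise an exception instead so that server errors are not silently swallowed.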

4. Using the chat-style interface provided by GLM

The model can also be used conversationally through the OpenAI client, so we can talk to it much as we would with a person.

from openai import OpenAI

base_url = "http://127.0.0.1:8000/v1/"
client = OpenAI(api_key="EMPTY", base_url=base_url)


def simple_chat(use_stream=True):
    messages = [
        {
            "role": "system",
            "content": "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's "
                       "instructions carefully. Respond using markdown.",
        },
        {
            "role": "user",
            # Prompt in English: "Hello, please tell me a short story in vivid language"
            "content": "你好，请你用生动的话语给我讲一个小故事吧"
        }
    ]

    # The client raises an exception on HTTP errors, so no status check is needed here.
    response = client.chat.completions.create(
        model="chatglm3-6b",   # Model name
        messages=messages,     # Conversation history
        stream=use_stream,     # True: return an iterator of text chunks; False: return the full result at once
        max_tokens=256,        # Maximum number of tokens to generate
        temperature=0.8,       # Sampling temperature
        presence_penalty=1.1,  # Penalizes tokens that have already appeared; higher values reduce repetition
        top_p=0.8,             # Nucleus sampling threshold
    )
    if use_stream:
        for chunk in response:
            # delta.content can be None on some chunks, so fall back to ""
            print(chunk.choices[0].delta.content or "", end="", flush=True)
        print()
    else:
        content = response.choices[0].message.content
        print(content)


if __name__ == "__main__":
    simple_chat(use_stream=False)
    # simple_chat(use_stream=True)

Output:

从前，在一个遥远的国度里，有一个美丽的小村庄。村子里的人们过着和谐的生活，每天都充满欢声笑语。在这个小村庄里，有一位聪明、善良的少年，名叫小明。

有一天，村子里来了一个神秘的旅行者。他带着一颗魔法石，说这颗石子可以实现一个人最真诚的愿望。村民们为了寻求幸福和美好，都纷纷向旅行者许愿。可是， after 每个人完成愿望后，旅行者却离开了村子，没有兑换他的承诺。

于是，小明决定站出来，为了村子的和平与幸福，他追踪旅行者，希望能把魔法石带回来，让每个人都有机会实现自己的梦想。在旅途中，他经历了种种困难和挑战，但都凭借着自己的勇敢和智慧克服了它们。最终，他找到了旅行者，成功地将魔法石带回村子。

回到村子后，小明将魔法石交给村长。村长将这颗神奇的石头放在了村子的中心广场上，希望它能给村子带来幸福和繁荣。从此，村子变得更加美好，人们的生活也变得越来越幸福。而小明，则成为了这个故事中最勇敢、善良的英雄，被村民们传颂千古。
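In streaming mode each chunk carries only a small text fragment, and in the OpenAI streaming format a chunk's delta.content can be None on some chunks. A minimal sketch of joining the fragments into one string follows. It uses hypothetical SimpleNamespace objects in place of real chunks so that it runs without a server; `make_chunk` and `collect_stream` are illustrative names.

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a fake chunk exposing chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks):
    """Concatenate streamed fragments, skipping empty or missing ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        fragment = chunk.choices[0].delta.content
        if fragment:
            parts.append(fragment)
    return "".join(parts)


fake_stream = [
    make_chunk(None),
    make_chunk("从前，"),
    make_chunk("有一个"),
    make_chunk("小村庄。"),
    make_chunk(None),
]
print(collect_stream(fake_stream))
```

Collecting into a list and joining once at the end avoids repeated string concatenation and gives the caller the full reply for logging or for appending to the conversation history.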

5. Embeddings

The demo API server also serves an embedding model, requested here under the name bge-large-zh-1.5.

from openai import OpenAI

base_url = "http://127.0.0.1:8000/v1/"
client = OpenAI(api_key="EMPTY", base_url=base_url)


def embedding():
    response = client.embeddings.create(
        model="bge-large-zh-1.5",
        input=["你好，给我讲一个故事，大概100字"],
    )
    embeddings = response.data[0].embedding
    print("Embedding complete, dimension:", len(embeddings))


if __name__ == "__main__":
    embedding()

Output:

Embedding complete, dimension: 1024
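Embedding vectors are usually compared with each other rather than inspected directly. A minimal sketch of cosine similarity in plain Python follows; the three short vectors are made-up stand-ins for the 1024-dimensional vectors the endpoint returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors, for illustration only.
vec_story = [0.2, 0.8, 0.1]
vec_tale = [0.25, 0.75, 0.05]
vec_weather = [0.9, 0.05, 0.4]

print(cosine_similarity(vec_story, vec_tale))
print(cosine_similarity(vec_story, vec_weather))
```

Texts with similar meaning should produce vectors with a similarity close to 1, which is the basis for semantic search and retrieval on top of an embedding endpoint.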

III. Function Call: A Simple Weather Query Example

1. Import dependencies and define the model client

from openai import OpenAI

base_url = "http://127.0.0.1:8000/v1/"
client = OpenAI(api_key="EMPTY", base_url=base_url)

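2. Define the tool

Define the tool description (a schema) that tells the model which function is available and which parameters it takes. Here the tool is a weather query.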

tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

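3. Test the call

Finally, send a request with the tool definitions attached and inspect the model's response to confirm that it selects the tool.

messages = [{"role": "user", "content": "What's the weather like in BeiJing?"}]
response = client.chat.completions.create(
    model="chatglm3-6b",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
content = response.choices[0].message.content
content  # In a notebook, the value of the last expression is displayed

Output: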

"get_current_weather\n ```python\ntool_call(location='Beijing')\n```"

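In the output above, the tool call arrives as plain text: the function name on the first line, followed by a fenced snippet containing a tool_call(...) expression. To act on it, the text has to be turned into a function name and keyword arguments. The sketch below does that with the standard ast module without executing anything. `parse_tool_call` is a hypothetical helper, and it assumes the exact text layout shown above.

```python
import ast
import re

# Built programmatically so this example's own code fence stays intact.
FENCE = "`" * 3


def parse_tool_call(content):
    """Split the raw text into (function_name, keyword_arguments)."""
    name = content.strip().splitlines()[0].strip()
    match = re.search(r"tool_call\((.*)\)", content, re.DOTALL)
    if match is None:
        raise ValueError("no tool_call(...) expression found")
    # Parse the call as Python syntax; nothing is executed.
    call = ast.parse(f"tool_call({match.group(1)})", mode="eval").body
    # literal_eval only accepts literals, so arbitrary expressions are rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return name, kwargs


raw = f"get_current_weather\n {FENCE}python\ntool_call(location='Beijing')\n{FENCE}"
print(parse_tool_call(raw))
```

Parsing with ast instead of eval matters here because the text comes from a model: only literal argument values are accepted, so the output cannot trigger arbitrary code.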
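IV. Function Call in Practice: Weather Query with an External API

In real applications, the GLM model often has to be combined with external APIs. For example, real-time weather information usually comes from a dedicated weather API.

1. Weather API demo

Apply for an API key in advance:

open_weather_key = "c6748580ef08742adf7bd05d79dd8e5d"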

import json
import requests

def get_weather(loc):
    """
    Query the current weather.
    :param loc: required, string; the name of the city to query.
    Chinese cities must be given by their English name; for example, to query Beijing's weather, pass 'Beijing'.
    :return: the result of the OpenWeather current-weather query (request URL: https://api.openweathermap.org/data/2.5/weather).
    The parsed JSON object is returned serialized as a string and contains all the key weather information.
    """
    # Step 1. Build the request
    url = "https://api.openweathermap.org/data/2.5/weather"

    # Step 2. Set the query parameters
    params = {
        "q": loc,
        "appid": open_weather_key,    # API key
        "units": "metric",            # Celsius instead of Fahrenheit
        "lang": "zh_cn"               # Output language: Simplified Chinese
    }

    # Step 3. Send the GET request
    response = requests.get(url, params=params)

    # Step 4. Parse the response
    data = response.json()
    return json.dumps(data)

# Call the function and display the result
data = get_weather("BeiJing")
data

Result:

'{"coord": {"lon": 116.3972, "lat": 39.9075}, "weather": [{"id": 801, "main": "Clouds", "description": "\\u6674\\uff0c\\u5c11\\u4e91", "icon": "02d"}], "base": "stations", "main": {"temp": 15.94, "feels_like": 14.09, "temp_min": 15.94, "temp_max": 15.94, "pressure": 1011, "humidity": 19, "sea_level": 1011, "grnd_level": 1005}, "visibility": 10000, "wind": {"speed": 4.98, "deg": 299, "gust": 8.94}, "clouds": {"all": 12}, "dt": 1700633608, "sys": {"type": 1, "id": 9609, "country": "CN", "sunrise": 1700607993, "sunset": 1700643269}, "timezone": 28800, "id": 1816670, "name": "Beijing", "cod": 200}'
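The raw response contains many fields, but an application typically needs only a few. The sketch below pulls out the main ones from a trimmed copy of the response above. `summarize_weather` and its output field names are hypothetical, and treating any cod value other than 200 as a failure is an assumption.

```python
import json

# Trimmed copy of the response shown above.
sample_json = (
    '{"weather": [{"main": "Clouds", "description": "\\u6674\\uff0c\\u5c11\\u4e91"}], '
    '"main": {"temp": 15.94, "humidity": 19}, "wind": {"speed": 4.98}, '
    '"name": "Beijing", "cod": 200}'
)


def summarize_weather(raw):
    """Return a small dict of key fields, or None if the query failed."""
    data = json.loads(raw)
    if data.get("cod") != 200:  # assumption: anything but 200 means failure
        return None
    weather = (data.get("weather") or [{}])[0]
    main = data.get("main", {})
    return {
        "city": data.get("name"),
        "condition": weather.get("main"),
        "description": weather.get("description"),
        "temp_c": main.get("temp"),  # Celsius, because units=metric was requested
        "humidity_pct": main.get("humidity"),
        "wind_speed": data.get("wind", {}).get("speed"),
    }


print(summarize_weather(sample_json))
```

Note that json.loads also decodes the \uXXXX escapes in the description field back into readable Chinese text.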

2. Asking the model about the weather directly

Ask the GLM model about the weather directly and see what it returns.

# Load the model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True).quantize(8).cuda()
model = model.eval()

# Test a model call (prompt in English: "Hello, how is the weather in Beijing?")
response, history = model.chat(tokenizer, "你好，北京天气怎么样？", history=[])
print(response)

Result:

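你好！作为一个人工智能助手，我无法实时获取北京的最新天气信息。但你可以查看天气预报或使用天气应用来了解当前的北京天气。

(In English: "Hello! As an AI assistant, I cannot retrieve the latest weather information for Beijing in real time. You can check a weather forecast or use a weather app to find out the current weather in Beijing.")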

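3. Calling the weather API through Function Call

def run_conv_glm(query, tokenizer, history, model, functions_list=None, functions=None, return_function_call=True):
    """
    Chat wrapper that can automatically execute external function calls.
    :param query: required, the user's question
    :param tokenizer: required, the tokenizer loaded for the model
    :param history: required, the conversation history (a list of messages)
    :param model: required, the chat model (chatglm3-6b here)
    :param functions_list: optional, defaults to None; a list of the external Python functions
    :param functions: optional, the tool descriptions passed to the model in the system message
    :param return_function_call: currently unused
    :return: the final response and the updated history
    """

    # Without external functions, run an ordinary chat turn
    if functions_list is None:
        response, history = model.chat(tokenizer, query, history=history)
        final_response = response

    # With external functions, let the model pick one and use its result in the answer
    else:
        # Build the system message that describes the available tools
        system_info = {
            "role": "system",
            "content": "Answer the following questions as best as you can. You have access to the following tools:",
            "tools": functions,
        }

        # Map function names to function objects
        available_functions = {func.__name__: func for func in functions_list}
        history = [system_info]

        # First call: ask the model which function to call and with which arguments
        response, history = model.chat(tokenizer, query, history=history)

        # A plain-text answer means the model did not request a tool call
        if not isinstance(response, dict):
            return response, history
        function_call = response

        # Look up the function by name
        function_name = function_call["name"]
        function_to_call = available_functions[function_name]

        # Run the function with the arguments chosen by the model
        function_args = function_call["parameters"]
        function_response = function_to_call(**function_args)

        # Second call: feed the function result back to the model.
        # role="observation" marks the input as a tool result rather than user input.
        # Available roles: user, system, assistant, observation
        print(function_response)
        history = []  # the history is reset here, so the second call sees only the tool result and the query
        history.append(
            {
                "role": "observation",  # observation role
                "name": function_name,
                # The model rewrites the raw tool result into a more readable answer;
                # without this step the caller would get the weather API response as-is.
                "content": function_response,
            }
        )
        response, history = model.chat(tokenizer, query, history=history)
        final_response = response

    return final_response, history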
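The lookup-and-call step in the middle of run_conv_glm is where model output meets real code, so it is worth isolating. The sketch below shows that step with basic error handling for an unknown tool name or wrong argument names. `dispatch_tool_call` and `fake_get_weather` are hypothetical; the stub avoids network access, and the name/parameters dictionary shape follows the function above.

```python
import json


def fake_get_weather(loc):
    """Hypothetical stand-in for get_weather that needs no network access."""
    return json.dumps({"name": loc, "main": {"temp": 15.94}})


def dispatch_tool_call(function_call, available_functions):
    """Look up and run the requested tool; always return a string for the model."""
    name = function_call.get("name")
    func = available_functions.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return func(**function_call.get("parameters", {}))
    except TypeError as exc:  # e.g. the model produced a wrong argument name
        return json.dumps({"error": str(exc)})


tools_by_name = {"get_weather": fake_get_weather}
print(dispatch_tool_call({"name": "get_weather", "parameters": {"loc": "Beijing"}}, tools_by_name))
print(dispatch_tool_call({"name": "get_stock", "parameters": {}}, tools_by_name))
```

Returning the error as a JSON string instead of raising lets the error itself be passed back to the model as the observation, so it can apologize or retry rather than crash the conversation.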

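4. Testing the function call

Finally, we test the function call end to end to check that the whole flow works and returns an accurate answer.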
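The test code in this section passes a variable named weather_api_spec as the tool description, but the article never shows its definition. The following is a hypothetical reconstruction, assuming ChatGLM3's tool format of a list of dictionaries with name, description, and parameters; the wording of the descriptions is illustrative. The small `check_spec_matches` helper (also hypothetical) verifies that each described tool has a Python function with the same name.

```python
# Hypothetical definition: the original article does not show weather_api_spec.
weather_api_spec = [
    {
        "name": "get_weather",
        "description": "Query the current weather for a city via the OpenWeather API. "
                       "Chinese cities must be given by their English name, e.g. 'Beijing'.",
        "parameters": {
            "type": "object",
            "properties": {
                "loc": {
                    "type": "string",
                    "description": "City name in English, e.g. 'Beijing'",
                }
            },
            "required": ["loc"],
        },
    }
]


def check_spec_matches(spec, functions_list):
    """Return the names of described tools that have no same-named function."""
    names = {func.__name__ for func in functions_list}
    return [tool["name"] for tool in spec if tool["name"] not in names]


# With no functions registered, the tool is reported as missing.
print(check_spec_matches(weather_api_spec, []))
```

The tool name and the parameter name have to match the Python function exactly (get_weather and loc), because run_conv_glm looks the function up by name and passes the model's arguments as keyword arguments.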

query = "请帮我查询一下北京的天气"
history=[]
functions_list = [get_weather]
functions=weather_api_spec

response,history = run_conv_glm(query=query,functions=functions,model=model,functions_list=functions_list,history=history,tokenizer=tokenizer)
print(response)

Result:

{"coord": {"lon": 116.3972, "lat": 39.9075}, "weather": [{"id": 801, "main": "Clouds", "description": "\u6674\uff0c\u5c11\u4e91", "icon": "02d"}], "base": "stations", "main": {"temp": 15.94, "feels_like": 14.09, "temp_min": 15.94, "temp_max": 15.94, "pressure": 1011, "humidity": 19, "sea_level": 1011, "grnd_level": 1005}, "visibility": 10000, "wind": {"speed": 4.98, "deg": 299, "gust": 8.94}, "clouds": {"all": 12}, "dt": 1700634535, "sys": {"type": 1, "id": 9609, "country": "CN", "sunrise": 1700607993, "sunset": 1700643269}, "timezone": 28800, "id": 1816670, "name": "Beijing", "cod": 200}

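北京现在的天气是：Clouds，温度为15.94℃，湿度为19%。

(In English: "The current weather in Beijing is: Clouds, with a temperature of 15.94℃ and a humidity of 19%.")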