Troubleshooting a Qwen (Tongyi Qianwen) runtime error: FlashAttention only supports Ampere GPUs or newer

When calling the Qwen/Qwen-1_8B-Chat model through langchain, the following error appeared during a conversation:

ERROR: object of type 'NoneType' has no len()
Traceback (most recent call last):
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/base.py", line 385, in acall
    raise e
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/base.py", line 379, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/llm.py", line 275, in _acall
    response = await self.agenerate([inputs], run_manager=run_manager)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/llm.py", line 142, in agenerate
    return await self.llm.agenerate_prompt(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 506, in agenerate_prompt
    return await self.agenerate(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 466, in agenerate
    raise exceptions[0]
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 569, in _agenerate_with_cache
    return await self._agenerate(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", line 519, in _agenerate
    return await agenerate_from_stream(stream_iter)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 85, in agenerate_from_stream
    async for chunk in stream:
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", line 490, in _astream
    if len(chunk["choices"]) == 0:
TypeError: object of type 'NoneType' has no len()

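This was puzzling: every other LLM ran normally, and only Qwen failed.
I searched through a lot of material, but the suggestions contradicted one another and none of them solved the problem.
So I read the error message carefully. The last frame of the traceback points to line 490 of "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", so I opened the source around that line: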

if not isinstance(chunk, dict):
   chunk = chunk.dict()
if len(chunk["choices"]) == 0:
   continue
choice = chunk["choices"][0]

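The TypeError is raised by len(chunk["choices"]), so this chunk must not contain any choices.
To see what the chunk actually contains, I printed it by changing the code in this file to: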

if not isinstance(chunk, dict):
   chunk = chunk.dict()
print(f'chunk:{chunk}')
if len(chunk["choices"]) == 0:
   continue
choice = chunk["choices"][0]

Running it again produced this chunk output:

chunk:{'id': None, 'choices': None, 'created': None, 'model': None, 'object': None, 'system_fingerprint': None, 'text': '**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\n\n(FlashAttention only supports Ampere GPUs or newer.)', 'error_code': 50001}

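At last the real error message is visible: "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE." followed by "(FlashAttention only supports Ampere GPUs or newer.)".
The "network error" wording is misleading; the part in parentheses shows that the real problem lies with flash-attention.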
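The check that exposed the problem can be sketched as a small helper, so the backend's message is surfaced instead of the misleading TypeError. This is only an illustrative sketch: the function name `backend_error_from_chunk` is hypothetical and not part of langchain, and the sample chunk below is abbreviated to the keys that matter.

```python
from typing import Optional


def backend_error_from_chunk(chunk: dict) -> Optional[str]:
    """Return the backend's error text if the chunk has no choices, else None.

    Hypothetical helper for debugging; it is not a langchain API.
    """
    if chunk.get("choices") is None:
        # In the failing chunk the message is in "text" and the code in "error_code".
        text = chunk.get("text") or "unknown backend error"
        code = chunk.get("error_code")
        return f"[error_code={code}] {text}"
    return None


# Abbreviated version of the chunk printed above (only the relevant keys).
chunk = {
    "choices": None,
    "text": "(FlashAttention only supports Ampere GPUs or newer.)",
    "error_code": 50001,
}
print(backend_error_from_chunk(chunk))
```

Calling a helper like this before `len(chunk["choices"])` would report the serving layer's own message rather than failing on `None`.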
I went back to the Qwen installation instructions on huggingface:

Dependency
To run Qwen-1.8B-Chat, make sure the requirements above are met, then run the following pip command to install the dependencies:
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed

In addition, installing the flash-attention library is recommended (flash attention 2 is now supported) for higher efficiency and lower GPU memory usage.
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# The installs below are optional and may be slow.
# pip install csrc/layer_norm
# pip install csrc/rotary

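I had installed flash-attention exactly as the documentation describes, so the installation itself should not be the problem.
An issue in the QwenLM repository suggests uninstalling flash-attn: https://github.com/QwenLM/Qwen/issues/438
A discussion in the huggingface community (https://huggingface.co/Qwen/Qwen-7B-Chat/discussions/37) then explains the problem: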

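flash attention is an optional component for accelerating model training and inference, and it only applies to Nvidia GPUs of the Turing, Ampere, Ada, and Hopper architectures (such as H100, A100, RTX 3090, T4, RTX 2080). You can run inference with the model normally without installing flash attention.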

Checking this against my own GPU made everything clear: my GPU is not supported by flash attention!
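One way to perform this kind of check programmatically is to compare the GPU's CUDA compute capability with the Ampere threshold named in the error message. The sketch below assumes PyTorch is installed for the local lookup; the helper names are hypothetical. Ampere begins at compute capability 8.0, while Turing cards such as the T4 and RTX 2080 report 7.5.

```python
from typing import Optional, Tuple

# Ampere starts at compute capability 8.0 (A100 is 8.0, RTX 3090 is 8.6).
AMPERE_MIN_CAPABILITY = (8, 0)


def is_ampere_or_newer(capability: Tuple[int, int]) -> bool:
    """Compare a (major, minor) compute capability with the Ampere minimum."""
    return tuple(capability) >= AMPERE_MIN_CAPABILITY


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the first CUDA device's capability, or None if it cannot be read."""
    try:
        import torch  # assumption: PyTorch is available in this environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_gpu_capability()
    if cap is None:
        print("No CUDA device (or no PyTorch) found; cannot check.")
    else:
        print(f"capability={cap}, Ampere or newer: {is_ampere_or_newer(cap)}")
```

If the check reports a capability below 8.0, the error in this article is the expected outcome whenever the model tries to use flash-attention on that card.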
So the solution is to uninstall it:

pip uninstall flash-attn
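After uninstalling, it may be worth confirming that the package is really gone from the environment that serves the model. A minimal sketch, assuming the library's import name is `flash_attn` (the function name here is hypothetical):

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Report whether the flash_attn module can still be found by Python."""
    return importlib.util.find_spec("flash_attn") is not None


print("flash_attn importable:", flash_attn_installed())
```

Run this with the same Python interpreter that launches the model service (here, the `chatchat` conda environment), then restart the service so the model is reloaded without flash-attention.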
