Launching the Mixtral 8x7B Large Language Model in 4bit/8bit


0. Background

My personal machine cannot realistically run the Mixtral 8x7B large language model in float16, so I load the weights quantized to 4bit or 8bit instead.
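To make the motivation concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at each precision. It assumes a total of about 46.7 billion parameters for Mixtral 8x7B (the figure from Mistral AI's announcement, not from this article) and ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
# Rough estimate of weight storage only; activations, KV cache, and the
# extra scale/zero-point data kept by quantizers are NOT included.
TOTAL_PARAMS = 46.7e9  # assumed total parameter count for Mixtral 8x7B


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, bits in [("float16", 16), ("8bit", 8), ("4bit", 4)]:
    print(f"{name:>8}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, float16 needs roughly four times the weight memory of 4bit, which is why the full-precision model does not fit on a typical personal machine.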

In my tests, inference was noticeably faster in 4bit, while inference in 8bit was still very slow.
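A speed comparison like this can be made repeatable by measuring generated tokens per second. The sketch below is only an illustration: `tokens_per_second` and `fake_generate` are hypothetical helper names, and `fake_generate` is a stand-in that should be replaced by a real call into the loaded model that returns the generated token ids.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[], Sequence[int]], runs: int = 3) -> float:
    """Call `generate` `runs` times and return the average tokens produced per second."""
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += len(generate())
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def fake_generate() -> list:
    """Hypothetical stand-in for a model call: pretends to emit 32 tokens."""
    time.sleep(0.01)
    return list(range(32))


print(f"{tokens_per_second(fake_generate):.1f} tokens/s")
```

Running the same prompt and the same maximum output length under the 4bit and 8bit settings keeps the two numbers comparable.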

The inference framework used is FastChat.

1. Modify the code

vi fastchat/model/model_adapter.py

Before the change:

class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer

After the change:

class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
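        # Bypass the base loader so the quantization flag can be passed directly.
        # model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        # AutoTokenizer and AutoModelForCausalLM are assumed to be imported
        # already at the top of model_adapter.py.
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        if "mixtral" in model_path.lower():
            # Quantized loading relies on the bitsandbytes package.
            # Enable exactly one of load_in_8bit / load_in_4bit.
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                # attn_implementation="flash_attention_2",
                # load_in_8bit=True,
                load_in_4bit=True,
                **from_pretrained_kwargs,
            )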
        else:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                **from_pretrained_kwargs,
            )
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
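Switching between 4bit and 8bit in the code above means editing the source and commenting lines in and out. One possible alternative, sketched here under my own assumptions, is to pick the flag from an environment variable. `quantization_kwargs` and the `MIXTRAL_QUANT` variable are hypothetical names introduced for illustration; they are not part of FastChat or transformers. Only the `load_in_4bit` and `load_in_8bit` keywords come from the adapter above.

```python
import os


def quantization_kwargs(mode: str) -> dict:
    """Map a mode string to the quantization keyword used in from_pretrained."""
    mode = mode.strip().lower()
    if mode == "4bit":
        return {"load_in_4bit": True}
    if mode == "8bit":
        return {"load_in_8bit": True}
    if mode in ("", "none"):
        return {}  # no quantization flag: load at the default precision
    raise ValueError(f"unsupported quantization mode: {mode!r}")


# MIXTRAL_QUANT is a hypothetical environment variable, not a FastChat option.
quant_kwargs = quantization_kwargs(os.environ.get("MIXTRAL_QUANT", "4bit"))
print(quant_kwargs)
```

Inside `load_model`, the hard-coded `load_in_4bit=True` line could then be replaced by `**quant_kwargs`, so the precision can be changed at launch time without touching the file again.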

That's it!
