[Speech Recognition] Practical Implementation: An Offline Intelligent Voice Assistant

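Reference: Implementing a local offline intelligent voice-control terminal with Python and deep learning (speech recognition, NLP), with chat and home-control features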

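V3S-based voice assistant (3): porting pocketsphinx keyword wake-up
V3S-based voice assistant (2): porting pyaudio to the development board

V3S-based voice assistant (1): compiling and installing python3 (this build fixes zlib and readline, so pip is usable)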

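The overall development approach is to get each feature working on your own computer first and then port it to an embedded Linux system. The porting mainly comes down to cross-compiling every piece of software involved.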

I. Sleep and Voice Wake-Up

Here I use pocketsphinx as an example of implementing voice wake-up:

1. Environment setup

pip install pocketsphinx 
pip install pyaudio

If installing this way fails, download the required whl files from the site below and install them offline.

https://www.lfd.uci.edu/~gohlke/pythonlibs/

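On that site, find the whl files for pocketsphinx and pyaudio, taking care to pick the ones that match your operating system and Python version. Download them into the project folder and, in the PyCharm terminal, run:

pip install <full file name of the whl>

to install them locally offline.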

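For example, this is the offline install of pocketsphinx on Windows with Python 3.7. Make sure the file path is correct, otherwise pip reports that it cannot find the file: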

pip install pocketsphinx-0.1.15-cp37-cp37m-win_amd64.whl
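
Picking a wheel built for a different Python version or platform is a common reason for a failed offline install. As a minimal sketch (the helper names parse_wheel_name and interpreter_tag are my own and not part of pip, and the check assumes the standard CPython interpreter), the tags in a wheel's file name can be compared with the running interpreter before installing:

```python
import sys

def parse_wheel_name(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }

def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp37' for Python 3.7."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)

info = parse_wheel_name("pocketsphinx-0.1.15-cp37-cp37m-win_amd64.whl")
print(info["python"], info["platform"])   # cp37 win_amd64
print("matches this interpreter:", info["python"] == interpreter_tag())
```

If the last line prints False, the wheel was built for a different Python version than the one you are running.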

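2. How to customize and train your own wake word

Create a file named keyword.txt containing the wake word you want plus words that sound similar to it (the more similar-sounding words, the higher the sensitivity). For example, my terminal is named COCO, so my keyword.txt contains: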

gogo
coco
yoyo
bobo
lolo
momo
nono
hoho

Open the website: Sphinx Knowledge Base Tool VERSION 3

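Select keyword.txt and upload it to get the corresponding archive. Download it into the project folder and extract it; the extracted files include the language model (.lm) and the dictionary (.dic) used in the test code below.

The numbers in the file names are generated randomly by the site, so it is normal for yours to be different.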

Test code:

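import os
from pocketsphinx import LiveSpeech, get_model_path

# Folder holding the .lm and .dic files generated by the website; adjust to your own path
KEYWORD_DIR = os.path.join('Sphinx_keyword', 'keyword_COCO')
WAKE_WORDS = {"GOGO", "COCO", "YOYO", "BOBO", "LOLO", "MOMO", "NONO", "HOHO"}

def wakeup_co():
    """Block until one of the wake words is heard, then return True."""
    model_path = get_model_path()
    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'en-us'),
        lm=os.path.join(KEYWORD_DIR, '5995.lm'),
        dic=os.path.join(KEYWORD_DIR, '5995.dic')
    )
    for phrase in speech:
        # print("phrase:", phrase)
        # print(phrase.segments(detailed=True))
        if str(phrase) in WAKE_WORDS:
            print('我是COCO')  # "I am COCO"
            return True  # stop listening so the caller can move on

if __name__ == '__main__':
    wakeup_co()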

Here the Sphinx_keyword\keyword_COCO folder is where I store the files, and 5995 is the file name the site generated for me; change both to match your own folder and files.
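
Because the site picks the numeric file name at random, it can be convenient to look the two files up instead of hard-coding the number. The following is a minimal sketch under the assumption that the extracted folder contains exactly one .lm and one .dic file; find_keyword_files is a hypothetical helper name, not part of pocketsphinx:

```python
import glob
import os

def find_keyword_files(keyword_dir):
    """Return (lm_path, dic_path) for the single .lm/.dic pair in keyword_dir."""
    lm_files = sorted(glob.glob(os.path.join(keyword_dir, "*.lm")))
    dic_files = sorted(glob.glob(os.path.join(keyword_dir, "*.dic")))
    if len(lm_files) != 1 or len(dic_files) != 1:
        raise FileNotFoundError(
            "expected exactly one .lm and one .dic file in %r" % keyword_dir)
    return lm_files[0], dic_files[0]

# Example (assumes the folder layout used in this article):
# lm_path, dic_path = find_keyword_files(os.path.join("Sphinx_keyword", "keyword_COCO"))
```

The two returned paths can then be passed as the lm and dic arguments of LiveSpeech.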

Test result:

Allocating 32 buffers of 2500 samples each
我是COCO
 
Process finished with exit code 0

To use a Chinese word as the wake word, first download the Chinese model files:

Download CMU Sphinx from SourceForge.net

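Download the archive into the project folder and extract it to get the cmusphinx-zh-cn-5.2 folder.

As with the English wake word, create a keyword.txt file. I trained a wake word named "佩佩" (Peipei):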

贝贝
佩佩
蕾蕾
内内
嘿嘿
忒忒
得得
贼贼
尅尅

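Open the website: Sphinx Knowledge Base Tool VERSION 3

Upload keyword.txt to get an archive, then download it into the project folder and extract it.

The file ending in .dic has to be edited: after each Chinese word, add its pinyin with tone numbers in the format shown here, with exactly one space between items. After editing, it looks like this: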

佩佩 p ei4 p ei3
内内 n ei4 n ei3
嘿嘿 h ei4 h ei3
尅尅 k ei4 k ei3
得得 d ei4 d ei3
忒忒 t ei4 t ei3
蕾蕾 l ei4 l ei3
贝贝 b ei4 b ei3
贼贼 z ei4 z ei3
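
Typing these lines by hand is error-prone, so here is a minimal sketch that builds them from toned pinyin syllables. It assumes every entry follows the "initial, then final with tone number" pattern shown above; split_syllable and dic_line are hypothetical helper names, and the phones it produces should still be checked against the dictionary shipped with the Chinese model, since I have not verified the model's full phone set.

```python
# Initials of Mandarin pinyin; two-letter initials come first so "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(syllable):
    """Split a toned pinyin syllable such as 'pei4' into 'p ei4'."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial + " " + syllable[len(initial):]
    return syllable  # no initial (e.g. 'ai4'): keep the syllable as one unit

def dic_line(word, syllables):
    """Build one dictionary line: the word followed by space-separated phones."""
    return word + " " + " ".join(split_syllable(s) for s in syllables)

print(dic_line("佩佩", ["pei4", "pei3"]))   # 佩佩 p ei4 p ei3
```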

Test code:

import os
from pocketsphinx import LiveSpeech

# Extracted Chinese model folder and the folder with the generated .lm/.dic files
model_path = os.path.join('Sphinx_keyword', 'cmusphinx-zh-cn-5.2')
keyword_dir = os.path.join('Sphinx_keyword', 'keyword_PeiPei')
wake_words = ["贝贝", "佩佩", "蕾蕾", "内内", "嘿嘿", "忒忒", "得得", "贼贼", "尅尅"]

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'zh_cn.cd_cont_5000'),
    lm=os.path.join(keyword_dir, '0738.lm'),
    dic=os.path.join(keyword_dir, '0738.dic')
)
for phrase in speech:
    print("phrase:", phrase)
    print(phrase.segments(detailed=True))
    if str(phrase) in wake_words:
        print("我是佩佩")  # "I am Peipei"

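Here model_path must point to the extracted Chinese folder cmusphinx-zh-cn-5.2, so that hmm resolves to the zh_cn.cd_cont_5000 folder inside it.

The Sphinx_keyword\keyword_PeiPei folder and the 0738 file name must be changed to match your own path and the files you downloaded and extracted.

Test result: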

Allocating 32 buffers of 2500 samples each
phrase: 尅尅
[('<s>', 0, 4359242, 4359325), ('<sil>', -1331, 4359326, 4359479), ('<sil>', -1331, 4359480, 4359801), ('<sil>', -1331, 4359802, 4359940), ('尅尅', 0, 4359941, 4360080)]
我是佩佩
phrase: 得得
[('<s>', 0, 8516377, 8516528), ('<sil>', -5375, 8516529, 8516683), ('<sil>', -2035, 8516684, 8516764), ('得得', 0, 8516765, 8516968), ('</s>', 0, 8516969, 8516980)]
我是佩佩
phrase: 嘿嘿
[('<s>', 0, 10674834, 10675304), ('嘿嘿', -3628, 10675305, 10675382), ('</s>', 0, 10675383, 10675385)]
我是佩佩

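You only need one of the two wake words, English or Chinese. The English version is fairly responsive, while the Chinese one can be sluggish, so I still recommend training and using an English wake word: it is quick to set up and responsive.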

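3. Deployment: running on an ARM-Linux board via cross-compilation

The pocketsphinx and pyaudio packages installed with pip above must be cross-compiled with the cross-compiler for your ARM board before they can run on it.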

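II. Learning to Listen

Here I start from a pretrained speechbrain model (speechbrain is built on PyTorch) and train a Chinese speech recognition system that converts spoken Chinese into text.

1. Basic environment setup (pip install whatever turns out to be missing later)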

pip install speechbrain
pip install SoundFile
pip install sox
pip install SpeechRecognition

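2. Capture speech and save it as a wav file

This is implemented with the speech_recognition package.

It listens until there is no more speech input, then stops automatically and saves the audio.

Test code: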

import speech_recognition as sr   # SpeechRecognition package; Microphone requires pyaudio

def rec(rate=16000):
    """Record from the system microphone at 16000 Hz and save the audio to recording.wav."""
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=rate) as source:
        sayword = 'coco在听呢'  # "COCO is listening"
        print(sayword)  # prompt the user to start speaking
        audio = r.listen(source)  # returns once the speaker goes quiet

    # Save the captured audio as recording.wav in the current directory
    with open("recording.wav", "wb") as f:
        f.write(audio.get_wav_data())
    print('您说的我已经收到了')  # "I got what you said"
    return 1

rec()

Test result: open the project folder and double-click recording.wav to hear what you said.
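
Besides listening to the file, the recording can be checked programmatically, for example to confirm that it really was saved at 16000 Hz. This is a minimal sketch using only the standard-library wave module; wav_info is a hypothetical helper name:

```python
import wave

def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) of a PCM wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    return channels, rate, duration

# Example: channels, rate, seconds = wav_info("recording.wav")
```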

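3. Read the recorded Chinese speech from the wav file and convert it to text

Here are the official website and the GitHub repository, so you can try it yourself: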

SpeechBrain: A PyTorch Speech Toolkit

GitHub - speechbrain/speechbrain: A PyTorch-based Speech Toolkit

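I chose Mandarin. Once the pretrained model is downloaded, the website also has tutorials, so you can try training and using it yourself.

Here I provide my source code and model directly; once the environment is set up they can be used as they are.

SpeechBrain(中文语音识别).zip (Chinese speech recognition), CSDN download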

Test result:

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
start...
有奶会有的面包括有的一切都会有的调解里数和语音可以达到更好的效果
 
Process finished with exit code 0
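
The light-control loop later in this article calls a listen() function that turns the recorded wav file into text, but its code is only available in the download above. As a rough sketch of what such a function could look like, the following uses SpeechBrain's EncoderDecoderASR interface. The publicly available speechbrain/asr-transformer-aishell Mandarin model is my assumption here, not necessarily the model used in this article, and load_asr_model is a hypothetical helper name.

```python
def load_asr_model():
    """Load a pretrained Mandarin ASR model (downloaded on first use)."""
    try:
        from speechbrain.inference.ASR import EncoderDecoderASR   # SpeechBrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderDecoderASR      # older releases
    return EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-transformer-aishell",        # assumed model choice
        savedir="pretrained_models/asr-transformer-aishell",
    )

def listen(wav_path="recording.wav", asr_model=None):
    """Transcribe a wav file to text; pass asr_model to reuse a loaded model."""
    if asr_model is None:
        asr_model = load_asr_model()
    return asr_model.transcribe_file(wav_path)
```

Loading the model is slow, so in a long-running assistant it is better to call load_asr_model() once at start-up and pass the result to every listen() call.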

III. Learning to Chat

Here I use chatterbot together with a third-party corpus to train a highly customized chatbot dialogue system.

1. Environment setup:

pip install chatterbot
pip install chatterbot_corpus

Possible error:

OSError: [E053] Could not read config.cfg from C:\Users\pc\AppData\Local\Programs\Python\Python38\Lib\site-packages\en_core_web_md\en_core_web_md-2.2.5\config.cfg

Fix:

pip uninstall spacy
pip install -i https://pypi.douban.com/simple spacy==2.2.2

2. Try training on the official Chinese dataset and using it

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
import logging
 
'''
This is an example showing how to train a chat bot using the
ChatterBot Corpus of conversation dialog.
'''
 
# Enable info level logging
# logging.basicConfig(level=logging.INFO)
chatbot = ChatBot('Example Bot')
 
# Start by training our bot with the ChatterBot corpus data
trainer = ChatterBotCorpusTrainer(chatbot)
 
def train():
    trainer.train(
        'chatterbot.corpus.chinese'
    )
 
def chat(word = ''):
    word = chatbot.get_response(word)
    return word
 
def test1():
    train()
    while 1:
        print(chatbot.get_response(input(">")))
 
test1()

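If this runs without errors, you can move on to the next step and train on your own dataset for a highly customized bot.

3. I provide a dataset:

corpus.txt for training your own chatbot, CSDN download

The file has one utterance per line, with a line containing only === between conversations, which is the layout the loading code below expects.

After downloading, create a folder named corpus in your project folder and put the downloaded corpus.txt in it.

Training code: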

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer


# Build the ChatBot and specify its adapters
my_bot = ChatBot(
    'COCO',
    storage_adapter='chatterbot.storage.SQLStorageAdapter',
    logic_adapters=[
        {
            'import_path': 'chatterbot.logic.BestMatch',
            'threshold': 0.65,  # intended confidence threshold: below it, give the default response
            'default_response': 'coco没听懂'  # "coco did not understand"
        }
    ]
)


def train_myword():
    corpus = []
    print('开始加载语料！')  # "Loading the corpus"
    # Load the corpus, skipping the '===' separator lines
    with open("./corpus/corpus.txt", 'r', encoding='utf-8') as file:
        for line in file:
            if line == '===\n':
                continue
            corpus.append(line.strip('\n'))
    print('语料加载完毕')  # "Corpus loaded"
    print('》'*30)

    trainer = ListTrainer(my_bot)
    print('开始训练！')  # "Training started"
    trainer.train(corpus[:10000])  # train on the first 10000 lines only
    print('训练完毕！')  # "Training finished"

def chat1():
    # Interactive chat loop in the terminal
    while True:
        print(my_bot.get_response(input("user:")))

def chat_my(word=''):
    # Return the bot's reply to a single sentence
    word = my_bot.get_response(word)
    return word

def test1():
    train_myword()
    chat1()

test1()

Training output:

开始加载语料！
语料加载完毕
》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》
开始训练！
List Trainer: [####################] 100%
训练完毕！

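Here I only train on the first 10,000 entries of the corpus. I recommend not training on too many: biting off more than you can chew does not help, and even if training completes, replies become very slow or the bot may fail to respond at all, which badly hurts the user experience.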
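
One detail of the loader is worth noting: it drops the === lines and trains on one long list, so the last line of one conversation is treated as the prompt for the first line of the next. A possible refinement, sketched here under the assumption that a line containing only === separates conversations in corpus.txt, is to split the corpus first; split_conversations is a hypothetical helper name:

```python
def split_conversations(lines, separator="==="):
    """Group corpus lines into separate conversations at each separator line."""
    conversations, current = [], []
    for raw in lines:
        line = raw.strip()
        if line == separator:
            if current:
                conversations.append(current)
            current = []
        elif line:
            current.append(line)
    if current:
        conversations.append(current)
    return conversations

sample = ["===\n", "你好\n", "你好呀\n", "===\n", "你是谁\n", "我是coco\n"]
print(split_conversations(sample))
# [['你好', '你好呀'], ['你是谁', '我是coco']]
```

Each inner list can then be passed to trainer.train() on its own, so that replies are only learned within a conversation.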

4. Math and time queries:

# -*- coding: utf-8 -*-
from chatterbot import ChatBot

# A bot that only uses the math and time logic adapters
bot = ChatBot(
    "Math & Time Bot",
    logic_adapters=[
        "chatterbot.logic.MathematicalEvaluation",
        "chatterbot.logic.TimeLogicAdapter"
    ]
)


def chat_math_time(text=''):
    """Return the bot's answer to a math or time question."""
    response = bot.get_response(text)
    return response

print(chat_math_time('what is 1 + 1'))
print(chat_math_time('现在几点了'))  # "What time is it now?"

Result:

1 + 1 = 2
The current time is 05:32 PM

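Process finished with exit code 0

Because this module only supports English, when the input is Chinese speech the numbers have to be extracted from the recognized text. The code is as follows: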

def Split_num_letters(astr):
    """Return every run of consecutive digits in astr as a list of strings."""
    nums = []
    astr = astr + '无'  # sentinel non-digit so that a trailing number is flushed
    num1 = ''
    for i in range(len(astr) - 1):
        cur, nxt = astr[i].isdigit(), astr[i + 1].isdigit()
        if cur and not nxt:
            # the current number has ended
            nums.append(num1)
            num1 = ''
        elif nxt:
            # the next character is a digit: add it to the number being built
            num1 = num1 + astr[i + 1]
    if astr[0].isdigit():
        # the loop only ever adds astr[i + 1], so restore a leading digit
        nums[0] = astr[0] + nums[0]
    print(nums)
    return nums


Split_num_letters('你知道120乘20等于多少吗')  # "Do you know what 120 times 20 is?"
Split_num_letters('120乘20等于多少吗')       # "What is 120 times 20?"

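Result (used together with other functions, this makes simple voice-driven arithmetic possible):

['120', '20']
['120', '20']
 
Process finished with exit code 0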
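
To show how the extracted numbers might be handed to the English-only math adapter, here is a minimal sketch that rewrites a recognized Chinese sentence as an English query. It assumes the recognizer outputs Arabic digits and that the sentence contains a single two-operand expression; to_math_query and the OPERATORS table are hypothetical names, and I have only seen the 'what is 1 + 1' form confirmed by the output above.

```python
import re

# Spoken Chinese operators mapped to the symbols an English math query uses.
OPERATORS = {"加": "+", "减": "-", "乘以": "*", "乘": "*", "除以": "/"}

def to_math_query(text):
    """Turn e.g. '你知道120乘20等于多少吗' into 'what is 120 * 20', or return None."""
    match = re.search(r"(\d+)\s*(加|减|乘以|乘|除以)\s*(\d+)", text)
    if match is None:
        return None
    left, op, right = match.groups()
    return "what is {} {} {}".format(left, OPERATORS[op], right)

print(to_math_query("你知道120乘20等于多少吗"))   # what is 120 * 20
```

When the function returns None, the sentence contains no arithmetic and can be passed to the chat bot instead.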

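IV. Learning to Control Smart Appliances

Basic principle: load the code onto the Arduino and wire up the circuit. Python then sends a given signal to the Arduino when a given condition is met, and the Arduino performs the corresponding action when it receives that signal.

1. Environment setup: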

pip install pyserial

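2. Controlling the Arduino from Python

See my other article for reference:

Python-Arduino communication (Windows and Linux), Leonard2021's blog on CSDN

With Python and the Arduino talking to each other in this way, the Arduino can control many appliances, such as lights, servos and fans. A rotating servo can open and close a door, or switch the breakers of large appliances on and off, and so on.

The Arduino's many sensor modules can also feed data such as air humidity and temperature to the voice system, so that it can control the relevant appliances better and provide integrated smart-home control.

There is plenty of room for imagination and growth. Here I only use voice to switch the Arduino's built-in LED on and off; other devices can be controlled by applying the same principle.

a. Python code: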

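import serial  # serial communication library (pyserial)
import time

# wakeup_co(), rec() and listen() are the functions from the earlier sections;
# save them in modules of your own and import them here.

def try2():
    ser = serial.Serial("COM3", 9600, timeout=1)  # change the port name to match your system
    time.sleep(2)  # many Arduino boards reset when the port opens; give it time to start
    while True:
        wakeup_co()            # wait for the wake word
        rec()                  # record speech into a wav file
        listenword = listen()  # convert the speech in the wav file to Chinese text

        c = ''  # reset for every utterance so an old command is not sent again
        if '灯' in listenword and '开' in listenword:    # "light" and "on"
            c = '1'
        elif '灯' in listenword and '关' in listenword:  # "light" and "off"
            c = '0'
        if c:
            ser.write(c.encode('utf-8'))

try2()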

b. Arduino code:

void setup(){
  Serial.begin(9600);
  pinMode(13,OUTPUT); // set pin 13 as an output
  //digitalWrite(13,HIGH);
}
char var;
void loop(){
  while(Serial.available()>0)
  {
      var=Serial.read();
      if(var == '0'){
        digitalWrite(13,LOW);
      }
      if(var== '1'){
        digitalWrite(13,HIGH);
      }
  }
}
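
The protocol between the two programs is a single byte: '1' switches the LED on and '0' switches it off. As a small sketch, the keyword matching on the Python side can be pulled into a function of its own so that it can be tested without a serial port or a board attached; light_command is a hypothetical helper name:

```python
def light_command(text):
    """Map recognized text to the byte the Arduino sketch expects, or None."""
    if "灯" not in text:     # "灯" means "light"
        return None
    if "开" in text:         # "开" means "on"
        return b"1"          # the sketch drives pin 13 HIGH on '1'
    if "关" in text:         # "关" means "off"
        return b"0"          # the sketch drives pin 13 LOW on '0'
    return None

print(light_command("请把灯打开"))   # b'1'
print(light_command("关灯"))         # b'0'
print(light_command("现在几点了"))   # None
```

In the control loop, the result would be written with ser.write() only when it is not None, so unrelated speech sends nothing to the board.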

V. Learning to Speak

pyttsx3 is used for text-to-speech, so the smart terminal learns to "speak".

1. Environment setup:

pip install pyttsx3

2. Code

import pyttsx3

def speakout(workText):
    # Initialize the speech engine
    engine = pyttsx3.init()
    # Slow the speech rate down a little
    rate = engine.getProperty('rate')
    engine.setProperty('rate', rate - 50)
    # Speak the text
    engine.say(workText)
    engine.runAndWait()

speakout('你好')  # "Hello"
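
With every stage in place, one round of the assistant is wake, record, transcribe, reply, speak. The following is only a sketch of how the pieces from this article could be chained; run_once is a hypothetical name, and the stages are passed in as plain functions so that the flow can be tried with stand-ins before the real microphone and models are connected.

```python
def run_once(wake, record, transcribe, reply, speak):
    """Run one wake -> record -> transcribe -> reply -> speak cycle."""
    wake()                      # blocks until the wake word is heard
    record()                    # writes the user's speech to a wav file
    text = transcribe()         # wav file -> text
    answer = str(reply(text))   # e.g. a ChatterBot response
    speak(answer)               # text -> speech
    return text, answer
```

With the functions defined earlier this could be called as run_once(wakeup_co, rec, listen, chat_my, speakout) inside a while True loop, assuming wakeup_co returns once the wake word has been heard.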