Unity 工具之 Azure 微软连续语音识别ASR的简单整理

1年前作者：仙魁XAN分类：Toy博客阅读(37)违法举报

这篇具有很好参考价值的文章主要介绍了Unity 工具之 Azure 微软连续语音识别ASR的简单整理。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

Unity 工具之 Azure 微软连续语音识别ASR的简单整理

目录

Unity 工具之 Azure 微软连续语音识别ASR的简单整理

一、简单介绍

二、实现原理

三、注意实现

四、实现步骤

五、关键脚本

一、简单介绍

Unity 工具类，自己整理的一些游戏开发可能用到的模块，单独独立使用，方便游戏开发。

本节介绍，这里在使用微软的Azure 进行语音合成的两个方法的做简单整理，这里简单说明，如果你有更好的方法，欢迎留言交流。

官网注册：

面向学生的 Azure - 免费帐户额度 | Microsoft Azure

官网技术文档网址：

技术文档 | Microsoft Learn

官网的TTS：

语音转文本快速入门 - 语音服务 - Azure AI services | Microsoft Learn

Azure Unity SDK 包官网：

安装语音 SDK - Azure Cognitive Services | Microsoft Learn

SDK具体链接：

https://aka.ms/csspeech/unitypackage
Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

二、实现原理

1、官网申请得到语音识别对应的 SPEECH_KEY 和 SPEECH_REGION

2、因为语音识别需要用到麦克风，移动端需要申请麦克风权限

3、开启语音识别，监听语音识别对应事件，即可获取到识别结果

三、注意实现

1、注意如果有卡顿什么的，注意主子线程切换，可能可以适当解决你的卡顿现象

2、注意电脑端（例如windows）运行可以不申请麦克风权限，但是移动端（例如Android）运行要申请麦克风权限，不然无法开启识别成功，可能会报错：Exception with an error code: 0x15

<uses-permission android:name="android.permission.RECORD_AUDIO"/>

System.ApplicationException: Exception with an error code: 0x15 at Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail (System.IntPtr hr) [0x00000] in <00000000000000000000000000000000>:0 at Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition () [0x00000] in <00000000000000000000000000000000>:0 at Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction (System.Action recoImplAction) [0x00000] in <00000000000000000000000000000000>:0 at System.Threading.Tasks.Task.Execute () [0x00000] in <00000000000000000000000000000000>:0 at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <00000000000000000000000000000000>:0 at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task& currentTaskSlot) [0x00000] in

四、实现步骤

1、下载好SDK 导入

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

2、简单的搭建场景

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

3、编写对应脚本，测试语音识别功能

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

4、把测试脚本添加到场景中，并赋值

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

5、如果移动端，例如 Android 端，勾选如下，添加麦克风权限

<uses-permission android:name="android.permission.RECORD_AUDIO"/>

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

5、运行，点击对应按钮，开始识别，Console 中可以看到识别结果

Unity 工具之 Azure 微软连续语音识别ASR的简单整理,Unity,microsoft,azure,语音识别,ASR,连续语音识别

五、关键脚本

1、TestSpeechRecognitionHandler


using UnityEngine;
using UnityEngine.Android;
using UnityEngine.UI;

public class TestSpeechRecognitionHandler : MonoBehaviour
{
    #region Data
    
    /// <summary>
    /// 按钮，文本
    /// </summary>
    public Button QuitButton;
    public Button ASRButton;
    public Button StopASRButton;
    public Text ASRText;

    /// <summary>
    /// m_SpeechAndKeywordRecognitionHandler
    /// </summary>
    SpeechRecognitionHandler m_SpeechAndKeywordRecognitionHandler;
    #endregion
    #region Liefecycle function

    /// <summary>
    /// Start
    /// </summary>
    void Start()
    {
        QuitButton.onClick.AddListener(OnClickQuitButton);
        ASRButton.onClick.AddListener(OnClickASRButton);
        StopASRButton.onClick.AddListener(OnClickStopASRButton);

        // 请求麦克风权限
        RequestMicrophonePermission();
    }

    /// <summary>
    /// 应用退出
    /// </summary>
    async void OnApplicationQuit() {
        await m_SpeechAndKeywordRecognitionHandler.StopContinuousRecognizer();
    }

    #endregion

    #region Private function
    /// <summary>
    /// RequestMicrophonePermission
    /// </summary>
    void RequestMicrophonePermission()
    {
        // 检查当前平台是否为 Android
        if (Application.platform == RuntimePlatform.Android)
        {
            // 检查是否已经授予麦克风权限
            if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
            {
                // 如果没有权限，请求用户授权
                Permission.RequestUserPermission(Permission.Microphone);
            }
        }
        else
        {
            // 在其他平台上，可以执行其他平台特定的逻辑
            Debug.LogWarning("Microphone permission is not needed on this platform.");
        }

        SpeechInitialized();
    }

    /// <summary>
    /// SpeechInitialized
    /// </summary>
    private void SpeechInitialized() {
        ASRText.text = "";
        m_SpeechAndKeywordRecognitionHandler = new SpeechRecognitionHandler();
        m_SpeechAndKeywordRecognitionHandler.onRecognizingAction = (str) => { Debug.Log("onRecognizingAction: " + str); };
        m_SpeechAndKeywordRecognitionHandler.onRecognizedSpeechAction = (str) => { Loom.QueueOnMainThread(() => ASRText.text += str);  Debug.Log("onRecognizedSpeechAction: " + str); };
        m_SpeechAndKeywordRecognitionHandler.onErrorAction = (str) => { Debug.Log("onErrorAction: " + str); };
        m_SpeechAndKeywordRecognitionHandler.Initialized();
    }
    /// <summary>
    /// OnClickQuitButton
    /// </summary>
    private void OnClickQuitButton() {
#if UNITY_EDITOR
        UnityEditor.EditorApplication.isPlaying = false;

#else
                Application.Quit();
#endif 
    }
    /// <summary>
    /// OnClickASRButton
    /// </summary>
    private void OnClickASRButton() {
        m_SpeechAndKeywordRecognitionHandler.StartContinuousRecognizer();
    }

    /// <summary>
    /// OnClickStopASRButton
    /// </summary>
    private async void OnClickStopASRButton()
    {
        await m_SpeechAndKeywordRecognitionHandler.StopContinuousRecognizer();
    }

    #endregion
}

2、SpeechRecognitionHandler

using UnityEngine;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System;
using Task = System.Threading.Tasks.Task;

/// <summary>
/// 语音识别转文本和关键词识别
/// </summary>
public class SpeechRecognitionHandler
{
    #region Data

    /// <summary>
    /// 
    /// </summary>
    const string TAG = "[SpeechAndKeywordRecognitionHandler] ";

    /// <summary>
    /// 识别配置
    /// </summary>
    private SpeechConfig m_SpeechConfig;

    /// <summary>
    /// 音频配置
    /// </summary>
    private AudioConfig m_AudioConfig;

    /// <summary>
    /// 语音识别
    /// </summary>
    private SpeechRecognizer m_SpeechRecognizer;

    /// <summary>
    /// LLM 大模型配置
    /// </summary>
    private ASRConfig m_ASRConfig;

    /// <summary>
    /// 识别的事件
    /// </summary>
    public Action<string> onRecognizingAction;
    public Action<string> onRecognizedSpeechAction;
    public Action<string> onErrorAction;
    public Action<string> onSessionStoppedAction;


    #endregion
    #region Public function

    /// <summary>
    /// 初始化
    /// </summary>
    /// <returns></returns>
    public async void Initialized()
    {
        m_ASRConfig = new ASRConfig();

        Debug.Log(TAG + "m_LLMConfig.AZURE_SPEECH_RECOGNITION_LANGUAGE " + m_ASRConfig.AZURE_SPEECH_RECOGNITION_LANGUAGE);
        Debug.Log(TAG + "m_LLMConfig.AZURE_SPEECH_REGION " + m_ASRConfig.AZURE_SPEECH_REGION);

        m_SpeechConfig = SpeechConfig.FromSubscription(m_ASRConfig.AZURE_SPEECH_KEY, m_ASRConfig.AZURE_SPEECH_REGION);
        m_SpeechConfig.SpeechRecognitionLanguage = m_ASRConfig.AZURE_SPEECH_RECOGNITION_LANGUAGE;


        m_AudioConfig = AudioConfig.FromDefaultMicrophoneInput();

        Debug.Log(TAG + " Initialized 2 ====");

        // 根据自己需要处理（不需要也行）
        await Task.Delay(100);
    }

    #endregion

    #region Private function

    /// <summary>
    /// 设置识别回调事件
    /// </summary>
    private void SetRecoginzeCallback()
    {
        Debug.Log(TAG + " SetRecoginzeCallback == ");
        if (m_SpeechRecognizer != null)
        {
            m_SpeechRecognizer.Recognizing += OnRecognizing;

            m_SpeechRecognizer.Recognized += OnRecognized;

            m_SpeechRecognizer.Canceled += OnCanceled;

            m_SpeechRecognizer.SessionStopped += OnSessionStopped;

            Debug.Log(TAG+" SetRecoginzeCallback OK ");
        }
    }

    #endregion

    #region Callback
    /// <summary>
    /// 正在识别
    /// </summary>
    /// <param name="s"></param>
    /// <param name="e"></param>
    private void OnRecognizing(object s, SpeechRecognitionEventArgs e)
    {
        Debug.Log(TAG + "RecognizingSpeech:" + e.Result.Text + " :[e.Result.Reason]:" + e.Result.Reason);
        if (e.Result.Reason == ResultReason.RecognizingSpeech )
        {
            Debug.Log(TAG + " Trigger onRecognizingAction is null :" + onRecognizingAction == null);

            onRecognizingAction?.Invoke(e.Result.Text);
        }
    }

    /// <summary>
    /// 识别结束
    /// </summary>
    /// <param name="s"></param>
    /// <param name="e"></param>
    private void OnRecognized(object s, SpeechRecognitionEventArgs e)
    {
        Debug.Log(TAG + "RecognizedSpeech:" + e.Result.Text + " :[e.Result.Reason]:" + e.Result.Reason);
        if (e.Result.Reason == ResultReason.RecognizedSpeech )
        {
            bool tmp = onRecognizedSpeechAction == null;
            Debug.Log(TAG + " Trigger onRecognizedSpeechAction is null :" + tmp);

            onRecognizedSpeechAction?.Invoke(e.Result.Text);
        }
    }

    /// <summary>
    /// 识别取消
    /// </summary>
    /// <param name="s"></param>
    /// <param name="e"></param>
    private void OnCanceled(object s, SpeechRecognitionCanceledEventArgs e)
    {
        Debug.LogFormat(TAG+"Canceled: Reason={0}", e.Reason );

        if (e.Reason == CancellationReason.Error)
        {
            onErrorAction?.Invoke(e.ErrorDetails);
        }
    }

    /// <summary>
    /// 会话结束
    /// </summary>
    /// <param name="s"></param>
    /// <param name="e"></param>
    private void OnSessionStopped(object s, SessionEventArgs e)
    {
        Debug.Log(TAG+"Session stopped event." );
        onSessionStoppedAction?.Invoke("Session stopped event.");
    }

    
    #endregion


    #region 连续语音识别转文本

    /// <summary>
    /// 开启连续语音识别转文本
    /// </summary>
    public void StartContinuousRecognizer()
    {
        Debug.LogWarning(TAG + "StartContinuousRecognizer");

        try
        {
            // 转到异步中（根据自己需要处理）
            Loom.RunAsync(async () => {
                try
                {
                    if (m_SpeechRecognizer != null)
                    {
                        m_SpeechRecognizer.Dispose();
                        m_SpeechRecognizer = null;
                    }

                    if (m_SpeechRecognizer == null)
                    {
                        m_SpeechRecognizer = new SpeechRecognizer(m_SpeechConfig, m_AudioConfig);

                        SetRecoginzeCallback();
                    }

                    await m_SpeechRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                    Loom.QueueOnMainThread(() => {
                        Debug.LogWarning(TAG + "StartContinuousRecognizer QueueOnMainThread ok");
                    });
                    Debug.LogWarning(TAG + "StartContinuousRecognizer RunAsync ok");
                }
                catch (Exception e)
                {
                    Loom.QueueOnMainThread(() =>
                    {
                        Debug.LogError(TAG + " StartContinuousRecognizer 0 " + e);
                    });
                }

            });
        }
        catch (Exception e)
        {
            Debug.LogError(TAG + " StartContinuousRecognizer 1 " + e);
        }

    }

    /// <summary>
    /// 结束连续语音识别转文本
    /// </summary>
    public async Task StopContinuousRecognizer()
    {
        try
        {
            if (m_SpeechRecognizer != null)
            {
                await m_SpeechRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
                //m_SpeechRecognizer.Dispose();
                //m_SpeechRecognizer = null;
                Debug.LogWarning(TAG + " StopContinuousRecognizer");
            }
        }
        catch (Exception e)
        {
            Debug.LogError(TAG + " StopContinuousRecognizer Exception : " + e);
        }
    }

    #endregion

}

3、ASRConfig文章来源地址https://www.toymoban.com/news/detail-790103.html



public class ASRConfig 
{
    #region Azure ASR

    /// <summary>
    /// AZURE_SPEECH_KEY
    /// </summary>
    public virtual string AZURE_SPEECH_KEY { get; } = @"You_Key";
    /// <summary>
    /// AZURE_SPEECH_REGION
    /// </summary>
    public virtual string AZURE_SPEECH_REGION { get; } = @"eastasia";
    /// <summary>
    /// AZURE_SPEECH_RECOGNITION_LANGUAGE
    /// </summary>
    public virtual string AZURE_SPEECH_RECOGNITION_LANGUAGE { get; } = @"zh-CN";

    #endregion
}

到了这里，关于Unity 工具之 Azure 微软连续语音识别ASR的简单整理的文章就介绍完了。如果您还想了解更多内容，请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章，希望大家以后多多支持TOY模板网！

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：如若内容造成侵权/违法违规/事实不符，请点击违法举报进行投诉反馈，一经查实，立即删除！

分享到：

领支付宝红包赞助服务器费用

开源(离线)中文语音识别ASR(语音转文本)工具整理
开源(离线)中文语音识别ASR(语音转文本)工具整理 Open AI在2022年9月21日开源了号称其英文语音辨识能力已达到人类水准的Whisper神经网络，且它亦支持其它98种语言的自动语音辨识。 Whisper系统所提供的自动语音辨识（Automatic Speech Recognition，ASR）模型是被训练来运行语音辨识与翻
2024年02月13日
浏览(35)
音频深度学习变得简单：自动语音识别（ASR），它是如何工作的
在过去的几年里，随着Google Home，Amazon Echo，Siri，Cortana等的普及，语音助手已经无处不在。这些是自动语音识别（ASR）最著名的示例。此类应用程序从某种语言的语音音频剪辑开始，并将说出的单词提取为文本。因此，它们也称为语音转文本算法。
2024年02月15日
浏览(7)
azure微软文字转语音工具AzureTools使用
文章目录前言一、 AzureTools 是什么？二、使用步骤 1.安装 2.使用总结最近事情不多，作为开发人员总是停不下来，就想写点啥，研究点啥。当下短视频比较火所以研究了下视频剪辑，发现很多人在用微软的Azure文字转语音功能实现配音，但是使用这个有一个弊端就是转完后
2024年02月04日
浏览(7)
azure微软文字转语音小工具V1.3.2（ai智能配音，目前最好用）的使用说明和下载
概括： azure微软文字转语音小工具是调用azure服务器的程序，可能是目前最好用的配音软件。其配音效果几乎与真人没有区别（非常仔细听可能有些许区别）。拥有12种热门配音声音，4男8女，可调整多种情绪，语音速度，音调自定义，支持几十个国家语言合成，微软接口，目
2024年02月05日
浏览(8)
ASR项目实战-语音识别
本文深入探讨语音识别处理环节。本阶段的重点特性为语音识别、VAD、热词、文本的时间偏移、讲话人的识别等。业界流派众多，比如Kaldi、端到端等，具体选择哪一种，需要综合考虑人员能力、训练数据量和质量、硬件设施、交付周期等，作出相对合理的交付规划。基于
2024年02月04日
浏览(8)
语音识别 - ASR whisper
目录 1. 简单介绍 2. 代码调用 Introducing Whisper https://openai.com/blog/whisper/ OpenAI 的开源自动语音识别神经网络 whisper 安装 Python 调用
2024年02月12日
浏览(7)
ASR 语音识别接口封装和分析
这个文档主要是介绍一下我自己封装了 6 家厂商的短语音识别和实时流语音识别接口的一个包，以及对这些接口的一个对比。分别是，阿里，快商通，百度，腾讯，科大，字节。 zxmfke/asrfactory (github.com) 之前刚好在测试各家的语音识别相关功能，但是每家的返回值都不同，
2024年02月13日
浏览(9)
Python使用PaddleSpeech实现语音识别（ASR）、语音合成（TTS）
目录安装语音识别补全标点语音合成参考 PaddleSpeech是百度飞桨开发的语音工具注意，PaddleSpeech不支持过高版本的Python，因为在高版本的Python中，飞桨不再提供paddle.fluid API。这里面我用的是Python3.7 需要通过3个pip命令安装PaddleSpeech：在使用的时候，urllib3库可能会报错，因
2024年04月25日
浏览(11)
Python使用whisper实现语音识别（ASR）
目录 Whisper的安装 Whisper的基本使用识别结果转简体中文断句 Whisper是OpenAI的一个强大的语音识别库，支持离线的语音识别。在使用之前，需要先安装它的库：使用whisper，还需安装setuptools-rust：但是，whisper安装时，自带的pytorch可能有些bug，因此需要卸载重装：卸载：重装
2024年03月20日
浏览(34)
两分钟克隆你的声音，支持替换电影和视频里面的声音，免费使用支持docker一键部署，集成工具包括声音伴奏分离、自动训练集分割、中文自动语音识别(ASR)和文本标注
两分钟克隆你的声音，支持替换电影和视频里面的声音，免费使用支持docker一键部署，集成工具包括声音伴奏分离、自动训练集分割、中文自动语音识别(ASR)和文本标注。查看我们的介绍视频 demo video 中国地区用户可使用 AutoDL 云端镜像进行体验：https://www.codewithgpu.com/i/RVC-
2024年02月20日
浏览(5)