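General approach:
(1) Use the VideoFileClip class from the moviepy library to read the video file and export its audio track as an audio file;
(2) Use the split_on_silence function from the pydub library to split the audio file into multiple chunks for speech recognition;
(3) Use the Recognizer class from the SpeechRecognition library to recognize the speech and write the results to a text file.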
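To make step (2) more concrete, here is a simplified pure-Python illustration of silence-based splitting. It is not pydub's implementation: the function name split_on_quiet, its parameters, and the list of per-frame loudness values are hypothetical and only show how a minimum silence length and a loudness threshold interact.

```python
def split_on_quiet(levels, min_silence_len, silence_thresh):
    """Split a sequence of loudness values into chunks separated by silence.

    A silence is a run of at least `min_silence_len` consecutive values
    below `silence_thresh`. Shorter quiet runs stay inside the chunk.
    Returns a list of (start, end) index pairs, with `end` exclusive.
    """
    chunks = []
    chunk_start = None  # index where the current chunk began, if any
    quiet_run = 0       # length of the current run of quiet values
    for i, level in enumerate(levels):
        if level < silence_thresh:
            quiet_run += 1
            if chunk_start is not None and quiet_run == min_silence_len:
                # The pause is long enough: close the chunk where it began.
                chunks.append((chunk_start, i - quiet_run + 1))
                chunk_start = None
        else:
            if chunk_start is None:
                chunk_start = i
            quiet_run = 0
    if chunk_start is not None:
        # Close the last chunk, dropping any quiet values at the very end.
        chunks.append((chunk_start, len(levels) - quiet_run))
    return chunks


# Hypothetical loudness values (dBFS-like); anything below -30 is "quiet".
levels = [-10, -12, -50, -11, -50, -50, -50, -9, -8, -50]
print(split_on_quiet(levels, min_silence_len=3, silence_thresh=-30))
# → [(0, 4), (7, 9)]
```

The single quiet value at index 2 is too short to count as a pause, so it stays inside the first chunk; only the three-value pause splits the sequence.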
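Note:
The audio file is split into multiple chunks, speech recognition is run on each chunk, and all the recognition results are finally merged into a single text file.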
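The merging step can be sketched on its own, independent of any audio library. The helper below is a hypothetical illustration (the name merge_transcripts and the input format of duration/text pairs are assumptions): it accumulates chunk durations into a running offset and keeps one line per chunk that produced text.

```python
from datetime import timedelta


def merge_transcripts(chunks):
    """Merge per-chunk results into timestamped lines.

    `chunks` is a list of (duration_seconds, text) pairs in playback order.
    Chunks with empty text are skipped but still advance the offset.
    """
    lines = []
    offset = timedelta(0)
    for duration, text in chunks:
        if text:
            lines.append(f"{offset} {text}")
        offset += timedelta(seconds=duration)
    return lines


# Hypothetical recognition results: the second chunk produced no text.
results = [(2, "first sentence"), (3, ""), (4, "second sentence")]
print("\n".join(merge_transcripts(results)))
# 0:00:00 first sentence
# 0:00:05 second sentence
```

Offsets computed this way count only the exported chunks, so they are approximate whenever the splitter discards part of a long pause.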
Runtime environment:
(1) macOS 13.3.1
(2) PyCharm 2021.1
Full code:
# moviepy 1.x import path; moviepy 2.x uses `from moviepy import VideoFileClip`
from moviepy.editor import VideoFileClip
from pathlib import Path
import os
import datetime
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence
from tqdm import tqdm

# Folder containing the video files
video_folder = './folder'
# Initialize the speech recognizer
r = sr.Recognizer()


def get_large_audio_transcription(path):
    """Split a WAV file on silence and recognize each chunk.

    Returns two parallel lists: the recognized texts and their start times.
    """
    sound = AudioSegment.from_wav(path)
    # A pause counts as silence if it lasts at least 500 ms and is 14 dB
    # quieter than the average loudness; keep up to 500 ms of padding
    chunks = split_on_silence(sound, min_silence_len=500,
                              silence_thresh=sound.dBFS - 14, keep_silence=500)
    # Create a directory to store the audio chunks
    folder_name = "audio-chunks"
    os.makedirs(folder_name, exist_ok=True)
    whole_text = []
    time_lines = []
    # Arbitrary reference date; only the time-of-day part is written out
    start_time = datetime.datetime.fromisoformat('2022-01-01T00:00:00')
    # Process each audio chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # Export the chunk and save it to disk
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
        text = ""
        try:
            # show_all=True returns the raw response instead of a plain string
            result = r.recognize_google(audio_listened, language="zh-CN", show_all=True)
            if result and len(result['alternative']) > 0:
                text = result['alternative'][0]['transcript']
        except sr.UnknownValueError:
            pass
        if text:
            whole_text.append(f"{text.capitalize()}.")
            time_lines.append(start_time)
        # Advance the timeline by the chunk length (approximate offset)
        start_time += datetime.timedelta(seconds=audio_chunk.duration_seconds)
    # Return the text and start time of every recognized chunk
    return whole_text, time_lines


# Iterate over all .mp4 files under the video folder
for video_file in tqdm(Path(video_folder).rglob('*.mp4')):
    # File name without the extension
    file_name = video_file.stem
    print(f'Processing video file: {file_name}')
    # Extract the audio track to a WAV file
    audio_file = f'{file_name}.wav'
    video_clip = VideoFileClip(str(video_file))
    video_clip.audio.write_audiofile(audio_file)
    video_clip.close()
    # Run speech recognition once and write "<start time> <text>" lines
    texts, times = get_large_audio_transcription(audio_file)
    with open(f'{file_name}.txt', 'w', encoding='utf-8') as f:
        for text, start in zip(texts, times):
            f.write(f'{start.time()} {text}\n')
print('All done!')
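The response handling inside the recognition loop can also be isolated into a small helper, which makes it easy to check without calling the recognition service. This is a sketch that assumes the response shape the code above relies on: with show_all=True, a dict with an 'alternative' list of candidates, or an empty value when nothing was recognized. The helper name best_transcript is hypothetical.

```python
def best_transcript(response):
    """Return the first candidate transcript from a raw response, or ""."""
    # Anything that is not a dict (for example an empty list) means no result.
    if not isinstance(response, dict):
        return ""
    alternatives = response.get("alternative") or []
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")


# Hypothetical responses shaped like the ones handled in the code above.
print(best_transcript({"alternative": [{"transcript": "你好", "confidence": 0.9}],
                       "final": True}))  # prints 你好
print(repr(best_transcript([])))         # prints ''
```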
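Result: for each video, the script writes a text file named after the video, with one line per recognized chunk consisting of the chunk's start time followed by the recognized text.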