代码收藏家 Technical Tutorials 2024-07-05

Offline Speech-to-Text in Python

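1. Introduction

Writing, coding, blogging, office work, documentation, and reports all require someone to type on a keyboard. This can lead to health problems such as carpal tunnel syndrome and pain in the hands and fingers. I know this pain very well. This article presents Python code for building your own dictation program that runs offline. Just speak into your headset's microphone, and it converts your words to text and saves them in a text file.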

2. Installation

You will need to install two Python libraries: vosk and pyaudio.

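Vosk is a speech recognition toolkit that provides a streaming API for accurate speech recognition and speaker identification. It supports more than 20 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish, Uzbek, Korean, Breton, and Gujarati, with more to come. It works offline and runs on lightweight devices such as the Raspberry Pi as well as on Android and iOS phones. Models are available for many languages, ranging in size from 40 MB to 16 GB. Most small models allow dynamic vocabulary reconfiguration. Big models are static, and their vocabulary cannot be modified at runtime. The full list of models that work with Vosk is on the models page of the Vosk website.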

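To use the application offline, download the zip file of the appropriate model to your computer and unzip it. The program uses the unzipped folder to create the model and generate text. Alternatively, if you are online, you can pass the language to vosk.Model() and the matching model is downloaded at runtime. (The code below demonstrates both options.)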

Many examples of using Vosk can be found in the vosk-api repository on GitHub.

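PyAudio provides Python bindings for PortAudio v19, the cross-platform audio I/O library. With PyAudio you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple macOS. PyAudio is distributed under the MIT License.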

I ran this code with Anaconda Python 3.12 in a Jupyter Notebook on my Windows 11 PC.

Step 1: Install the libraries

pip install vosk
pip install pyaudio
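
After installing, a quick check that both packages can be found saves some debugging later. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is made up for this illustration and is not part of vosk or pyaudio.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "vosk" and "pyaudio" are the import names of the two libraries installed above.
missing = missing_packages(["vosk", "pyaudio"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("vosk and pyaudio are both available.")
```

If either name is reported as missing, rerun the corresponding pip install command in the same environment that runs the notebook.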

3. Initializing the Model

Step 2: Initialize the model

import vosk
import pyaudio
import json


# The model was downloaded beforehand and unpacked into a local directory.
# Set the path to that directory.
model_path = "vosk-model-en-us-0.42-gigaspeech"
# Initialize the model from the local path
model = vosk.Model(model_path)
# If you have not downloaded a model, pass the "lang" argument to
# vosk.Model() instead and a matching model is downloaded at runtime.
# Here the language is US English:
# model = vosk.Model(lang="en-us")
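
The two ways of loading a model can be combined: use the local folder when it exists, and fall back to the language name otherwise. The sketch below only builds the keyword arguments, so it runs without vosk installed. The helper name model_kwargs is hypothetical, and it assumes that vosk.Model() accepts the keywords model_path and lang.

```python
import os


def model_kwargs(model_dir, lang="en-us"):
    """Choose the keyword arguments to pass to vosk.Model().

    Returns {"model_path": model_dir} when the unpacked model folder exists,
    otherwise {"lang": lang} so that Vosk fetches a model for that language.
    """
    if os.path.isdir(model_dir):
        return {"model_path": model_dir}
    return {"lang": lang}


kwargs = model_kwargs("vosk-model-en-us-0.42-gigaspeech")
print(kwargs)
# Then, with vosk installed:
#   model = vosk.Model(**kwargs)
```

Note that the fallback needs a network connection the first time it runs, so it is not an option for a strictly offline machine.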

4. Creating the Speech Recognizer and Audio Input

Step 3: Create the speech recognizer
Here we create a speech recognizer with a sample rate of 16000 Hz.

# Create a recognizer
rec = vosk.KaldiRecognizer(model, 16000)
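
As noted in the installation section, most small models allow the vocabulary to be reconfigured at runtime. One way to use this, sketched here under assumptions, is to pass a JSON list of allowed phrases as a third argument to KaldiRecognizer. This would require a small model rather than the large gigaspeech model used in this article, whose vocabulary is static. The helper name build_grammar and the example phrases are invented for illustration.

```python
import json


def build_grammar(phrases):
    """Return a JSON list of the allowed phrases, plus "[unk]" for anything else."""
    return json.dumps(list(phrases) + ["[unk]"])


grammar = build_grammar(["terminate", "new line", "full stop"])
print(grammar)
# With vosk installed and a small model loaded into `model`:
#   rec = vosk.KaldiRecognizer(model, 16000, grammar)
```

Restricting the vocabulary suits command-style input. For free dictation, the two-argument recognizer above is the better fit.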

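Step 4: Open the microphone stream
Here we open a stream to capture audio from the microphone, specifying the sample format, number of channels, sample rate, and number of frames per buffer.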

# Open the microphone stream
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                input=True,
                frames_per_buffer=8192)
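
To get a feel for these parameters, it helps to work out how much audio a given number of frames represents. The short calculation below is only a sketch. It assumes 16-bit mono audio as configured above, and the helper name buffer_stats is made up for the example.

```python
SAMPLE_RATE = 16000      # frames per second, as passed to p.open()
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # paInt16 means 16-bit (2-byte) samples


def buffer_stats(frames):
    """Return (duration in seconds, size in bytes) for `frames` audio frames."""
    seconds = frames / SAMPLE_RATE
    size = frames * CHANNELS * BYTES_PER_SAMPLE
    return seconds, size


for frames in (4096, 8192):
    seconds, size = buffer_stats(frames)
    print(f"{frames} frames = {seconds:.3f} s of audio, {size} bytes")
```

So the stream.read(4096) call used in step 6 hands the recognizer roughly a quarter of a second of audio each time, and the 8192-frame buffer holds about half a second.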

5. Text File Output

Step 5: Specify the path of the output text file

# Specify the path for the output text file
output_file_path = "recognized_text.txt"
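
Because the file is opened in "w" mode in the next step, each run overwrites the previous transcript. If you would rather keep every session, one option is a timestamped file name. This is a sketch only, and the helper name timestamped_path is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(base="recognized_text", when=None):
    """Return a path such as recognized_text_20240705_093000.txt."""
    when = when or datetime.now()
    return Path(f"{base}_{when:%Y%m%d_%H%M%S}.txt")


# A fixed time is passed here only to make the example reproducible.
print(timestamped_path(when=datetime(2024, 7, 5, 9, 30, 0)))
```

Assigning output_file_path = timestamped_path() would then give each dictation session its own file.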

Step 6: Listen to the microphone in an infinite loop and write the recognized text to the text file. Stop when the user says "terminate".

# Open the text file in write mode using a 'with' block
with open(output_file_path, "w", encoding="utf-8") as output_file:
    print("Listening for speech. Say 'Terminate' to stop.")
    # Start streaming and recognize speech
    while True:
        # Read 4096 frames (8192 bytes of 16-bit mono audio) per iteration
        data = stream.read(4096)
        # AcceptWaveform() returns True once the end of an utterance is detected
        if rec.AcceptWaveform(data):
            # Parse the JSON result and get the recognized text
            result = json.loads(rec.Result())
            recognized_text = result['text']

            # Write recognized text to the file
            output_file.write(recognized_text + "\n")
            print(recognized_text)

            # Check for the termination keyword
            if "terminate" in recognized_text.lower():
                print("Termination keyword detected. Stopping...")
                break
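
The check above is a substring test, so a word such as "exterminate" would also stop the program. A whole-word comparison avoids this. The sketch below runs without a microphone because it works on result strings of the shape that rec.Result() returns, a JSON object with a "text" field. The helper names extract_text and is_stop_command are invented for this example.

```python
import json


def extract_text(result_json):
    """Return the "text" field of a Vosk result string, or "" if it is absent."""
    return json.loads(result_json).get("text", "")


def is_stop_command(text, keyword="terminate"):
    """Return True when `keyword` appears as a whole word in `text`."""
    return keyword in text.lower().split()


print(is_stop_command(extract_text('{"text": "please terminate"}')))
print(is_stop_command(extract_text('{"text": "they exterminate pests"}')))
```

The first call prints True and the second prints False, whereas the substring test would treat both sentences as a stop command.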

Step 7: Close the stream and terminate the PyAudio object

# Stop and close the stream
stream.stop_stream()
stream.close()

# Terminate the PyAudio object
p.terminate()
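
If the loop is interrupted, for example by stopping the notebook cell, these cleanup lines never run and the microphone can stay open. One possible way to guard against that is a small context manager built on try/finally. This is a sketch rather than part of the original program. The name open_microphone is hypothetical, and it assumes an object with the same open() and terminate() methods as pyaudio.PyAudio.

```python
from contextlib import contextmanager


@contextmanager
def open_microphone(p, **stream_options):
    """Open an input stream on the PyAudio instance `p` and always clean up."""
    stream = p.open(input=True, **stream_options)
    try:
        yield stream
    finally:
        # Runs on normal exit and when an exception is raised in the block
        stream.stop_stream()
        stream.close()
        p.terminate()


# With pyaudio installed, it could be used like this:
#   with open_microphone(pyaudio.PyAudio(), format=pyaudio.paInt16,
#                        channels=1, rate=16000,
#                        frames_per_buffer=8192) as stream:
#       data = stream.read(4096)
```

With this in place, the recognition loop from step 6 would sit inside the with block, and step 7 would no longer need to be written out by hand.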

That's it. Enjoy working hands-free!

Author: 无水先生
