Research and Application of Speech Recognition Technology Based on Deep Learning

2022-12-03 11:11:16

Total thesis length: 19,826 characters

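Abstract

Over the past decade, speech recognition, a technology that draws on acoustics, computer science, probability theory and other disciplines, has gradually become a key channel for dialogue between humans and computers. Today's speech recognition is both practical and fast, and advances in software and hardware, from sensors to neural networks, continue to strengthen its capabilities.

This thesis studies, implements and applies speech recognition. It uses a neural network together with the connectionist temporal classification (CTC) algorithm to process audio, and applies the result to video and other media. The main research content and results are as follows: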

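Survey existing tools, select a suitable programming environment for building the speech recognition neural network and a programming language for writing the artificial intelligence code, and study the dataset to be used and the design of the CTC algorithm.
Implement basic speech recognition with a convolutional neural network and the CTC algorithm, enabling conversion in both directions between speech and text. Speech-to-text relies mostly on the recording function, while text-to-speech is built directly on the acoustic model. The speech recognition function is also used to generate subtitles for video: it recognizes the video's audio track and converts it to text.
Build the convolutional neural network with the TensorFlow framework, implement the application's user interface with the PyQt5 framework on top of the network's speech recognition function, and complete functional testing of the application.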
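
The list above pairs a convolutional network with a temporal algorithm for producing text from audio frames. Assuming that algorithm is connectionist temporal classification (CTC), which this excerpt does not spell out, the decoding step can be sketched in plain Python as follows. The blank symbol `-`, the function names and the toy alphabet are illustrative assumptions and are not taken from the thesis code.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol; real systems reserve a class index for it


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Take the most probable symbol in every frame, then collapse the result."""
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(best, blank)


if __name__ == "__main__":
    print(ctc_collapse("cc-aa-t"))       # repeated frames merge into one letter
    print(ctc_collapse("hh-e-ll-lo-"))   # the blank keeps the double "l" apart
```

The blank matters for words with doubled letters: without a blank between the two runs of `l`, the repeats would merge and "hello" would lose a letter. Greedy decoding is the simplest option; beam search is a common alternative when a language model is available.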

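The thesis shows that a speech recognition system can be built successfully with a deep-learning convolutional neural network. The corresponding system tests indicate that recognition accuracy can reach 80%.

Keywords: speech recognition, convolutional neural network, connectionist temporal classification (CTC), TensorFlow framework, PyQt5 framework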

Research and application of speech recognition technology based on deep learning

Abstract

In the past ten years, speech recognition, a technology that draws on acoustics, computer science, probability theory and other disciplines, has gradually become a key channel for human-computer dialogue. Today, speech recognition is both practical and fast. From sensors to neural networks, software and hardware are constantly being improved to strengthen its capabilities.

This paper focuses on the study, implementation and application of speech recognition. A neural network and the connectionist temporal classification (CTC) algorithm are used to process audio, and the result is applied to video and other media. The main research contents and achievements are as follows:

Investigate the existing tools, select a suitable programming environment for building the speech recognition neural network and a programming language for writing the artificial intelligence code, and study the dataset to be used and the design of the CTC algorithm.
A convolutional neural network and the CTC algorithm are used to realize the basic function of speech recognition and the bidirectional conversion between speech and text. Speech-to-text is mostly based on the recording function, while text-to-speech is based directly on the acoustic model. The speech recognition function is used to generate video subtitles: the audio track of a video is recognized and converted into text.
The convolutional neural network is built with the TensorFlow framework, the user interface of the application is implemented with the PyQt5 framework on top of the network's speech recognition function, and functional testing of the application is completed.

The paper presents a speech recognition system built with a deep-learning convolutional neural network. The corresponding system tests show that the accuracy of speech recognition can reach 80%.

Keywords: speech recognition, convolutional neural network, connectionist temporal classification (CTC), TensorFlow framework, PyQt5 framework
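
This excerpt does not say how the 80% accuracy figure was measured. As an illustration only, one common measure for transcripts is character-level accuracy, defined as one minus the edit distance between the reference text and the recognized text, divided by the reference length. The sketch below assumes that definition; the function names are hypothetical and it is not the thesis's own test procedure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using two DP rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference symbol
                            curr[j - 1] + 1,      # insert a hypothesis symbol
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ref, hyp):
    """Character accuracy in [0, 1]; assumed metric, clipped at zero."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))
```

For example, recognizing "hallo" for the reference "hello" is one substitution in five characters, which this measure scores as roughly 80%. Word-level accuracy works the same way if both transcripts are passed as lists of words instead of strings.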

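List of Figures

Figure 2.2 Feature extraction workflow

Figure 2.3 Example of a convolutional neural network

Figure 2.4 The "cat" sequence

Figure 2.5 The "hello" sequence

Figure 3.1 Environment setup flowchart

Figure 3.2 Audio waveform

Figure 3.3 Activating the virtual environment

Figure 3.4 Adding libraries and packages

Figure 3.5 Configuring Anaconda in PyCharm

Figure 3.6 Training set contents

Figure 3.7 Text contents

Figure 3.8 Pronunciation dictionary

Figure 3.9 Feature extraction structure diagram

Figure 3.10 Algorithm flowchart

Figure 3.11 Model training code

Figure 3.12 Audio file recognition code

Figure 4.1 Main system functions

Figure 4.2 System use case diagram

Figure 4.3 PyQt5 tools

Figure 4.4 Speech-to-text interface

Figure 4.5 Speech recognition example

Figure 4.6 Text-to-speech interface

Figure 4.7 Text playback example

Figure 4.8 Video recognition interface

Figure 4.9 Video recognition example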

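Introduction

Research Purpose and Significance

From the Stone Age through the age of steam to today's information age, the tools of each era have kept changing, from stone implements to iron implements to today's intelligent tools. As these tools evolved, people came to understand more clearly that the usability of tools drives the progress of an era. Ever since the first computer was built, people have had a dream: to have machines understand human language and then present the information people are looking for. With the flourishing of modern science and technology, speech recognition technology came into being.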

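Speech recognition is a technology that uses machine learning to help computers convert human speech signals into text or corresponding commands. It is a strongly interdisciplinary field and has become a core technology for human-computer interaction in modern science and technology. Combining speech recognition with speech synthesis lets people issue commands to all kinds of machines in natural language, which reduces dependence on the keyboard and greatly improves convenience. [1]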

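History of Speech Recognition

The world's earliest speech recognizer was invented in 1920. This first speech recognition device was not a technical tool: "Radio Rex" was a toy dog built around a speech recognition mechanism. In the hundred years since, speech recognition technology has improved steadily.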
