Research on a Sound Event Recognition Method Based on Gaussian Mixture Models

2022-12-03 11:11:05

Total thesis length: 18,153 characters

Abstract (Chinese)

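With the rapid development of the Internet in China, the Internet has become part of society's infrastructure, and machines will interact with the outside world more and more frequently. Sound recognition therefore plays an increasingly important role in everyday life. Sound is also a good source of information in its own right: as an information carrier it is easy to capture and subject to few constraints.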

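Bell Labs began research on sound recognition in the 1950s. After the groundwork laid in the 1960s and the rapid development of the 1970s, speech recognition achieved important breakthroughs in the 1980s, when two key techniques, acoustic modeling based on the hidden Markov model (HMM) and language modeling based on n-grams, saw their first applications in sound recognition. The HMM in particular moved sound recognition from simple template matching to probabilistic statistical modeling. Since then, as a wide range of feature extraction methods and classification algorithms have been proposed, sound recognition technology has advanced considerably.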

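At its current stage, however, sound event recognition still suffers from low recognition accuracy. This thesis extracts features of sound events in both the time domain and the frequency domain and uses a Gaussian mixture model (GMM) to recognize the events. By extracting and analyzing the relevant parameters, it improves the method to some extent and raises the accuracy of sound event recognition.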

Keywords: sound event recognition; Gaussian mixture model; feature extraction

Abstract

With the rapid development of the Internet in China, the Internet has become part of society's infrastructure, and interaction between machines and the outside world will become more and more frequent. Sound recognition technology therefore plays an increasingly important role in people's lives. At the same time, sound as an information carrier is easy to collect and subject to few constraints, which makes it a good source of information.

Bell Labs began to study sound recognition in the 1950s. After the accumulation of basic technology in the 1960s and rapid development in the 1970s, speech recognition made an important breakthrough in the 1980s, when two key technologies, acoustic modeling based on the hidden Markov model (HMM) and language modeling based on n-grams, were first applied to sound recognition. The application of the HMM in particular shifted sound recognition from simple template matching to probabilistic statistical modeling. Since then, with the introduction of various feature extraction methods and classification algorithms, sound recognition technology has made great progress.

However, sound event recognition at its current stage still suffers from low recognition accuracy. In this thesis, features of sound events are extracted in both the time domain and the frequency domain, and a Gaussian mixture model (GMM) is used to recognize the sound events. By extracting and analyzing the relevant parameters, the method is improved to a certain extent and the accuracy of sound event recognition is increased.

Keywords: sound event recognition; GMM; feature extraction

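Contents

Abstract (Chinese)

Abstract

Chapter 1 Introduction

1.1 Research background

1.2 Basic concepts of sound recognition

1.3 Research progress in sound recognition technology

1.3.1 Development of sound recognition technology

1.3.2 Current state of research on sound recognition technology

1.4 Main work of this thesis

Chapter 2 Basic Principles of Sound Recognition

2.1 Digital processing of sound

2.1.1 Sampling of sound signals

2.1.2 Quantization of sound signals

2.1.3 Encoding of sound signals

2.2 Preprocessing of sound signals

2.3 Feature extraction and pattern recognition for sound signals

2.4 Chapter summary

Chapter 3 Feature Extraction from Sound Signals

3.1 Time-domain features of sound signals

3.1.1 Short-time energy

3.1.2 Pitch frequency

3.1.3 Short-time average zero-crossing rate

3.2 Frequency-domain features of sound signals

3.2.1 LPCC cepstral features

3.2.2 Mel-frequency cepstral features (MFCC) [17,18]

3.3 Chapter summary

Chapter 4 GMM-Based Sound Recognition

4.1 Basic concepts of the Gaussian mixture model

4.2 A sound recognition system based on the Gaussian mixture model

4.2.1 Parameter estimation for the GMM

4.2.2 Principles of the GMM

4.3 Chapter summary

Chapter 5 GMM-Based Sound Recognition Experiments

5.1 Experimental conditions

5.1.1 Hardware and software

5.1.2 Experimental audio database

5.1.3 Parameter and threshold settings

5.2 Experimental results and analysis

5.2.1 Classification of noise, speech, and music

5.2.2 Effect of feature dimensionality on classification accuracy

5.3 Chapter summary

Chapter 6 Summary and Outlook

6.1 Summary

6.2 Outlook for future research

Acknowledgments

References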

Chapter 1 Introduction

1.1 Research background

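Sound is one of the most important carriers of information, and it is also the most primitive, universal, and direct way in which humans communicate. Most of the information people exchange with the outside world travels by sound. Compared with carriers such as images and video, audio is also compact and easy to capture and process. People have therefore long been eager to explore and study sound, and as digitization and informatization have advanced, sound has naturally received even more attention as an information carrier. The technology that draws the most attention is sound recognition, whose role is to classify and process the information contained in a sound signal so that downstream systems can use it.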

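Work on sound recognition began in the 1950s, when Bell Labs in the United States built the first speech recognition system. Although it could recognize only the ten English digits, it was still a historic advance. In the 1960s, computer technology flourished worldwide and made many computation-heavy techniques practical, and speech recognition benefited as well. In the 1970s, linear prediction was developed, and dynamic time warping (DTW) gradually became practical. Vector quantization (VQ) and hidden Markov model (HMM) theory also advanced the implementation of speech recognition systems based on LPCC and DTW, which at the time could perform speaker-dependent recognition of isolated speech.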
