Research on a voicebox-Based Speaker-Dependent Isolated-Word Speech Recognition System

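This thesis studies a speaker-dependent isolated-word speech recognition system. The main work consists of preprocessing the speech signal (pre-emphasis, framing, windowing and endpoint detection), extracting feature parameters, and performing template matching to find the best-matching template and output the recognition result.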
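As a rough illustration of the preprocessing chain described above, here is a minimal Python sketch. It is not the thesis's Matlab/voicebox code. The pre-emphasis coefficient 0.97, the 256-sample frames with an 80-sample hop, the Hamming window, and the single threshold of 10% of the peak frame energy are all assumptions chosen for illustration. The classic double-threshold endpoint detector also uses the short-time zero-crossing rate, which is omitted here for brevity.

```python
import math

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_len=256, hop=80):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    if len(x) < frame_len:
        return []
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(n_points):
    """Hamming window w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1)), for N >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1)) for n in range(n_points)]

def apply_window(frames):
    """Multiply every frame sample-by-sample with a Hamming window of the frame length."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[s * wn for s, wn in zip(frame, w)] for frame in frames]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def detect_endpoints(frames, energy_ratio=0.1):
    """Simplified single-threshold endpoint detection on short-time energy.

    Returns (first, last) indices of frames whose energy exceeds
    energy_ratio * (maximum frame energy), or None if the input is silent.
    """
    energies = [short_time_energy(f) for f in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = energy_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1])

if __name__ == "__main__":
    # Synthetic test signal at an assumed 8 kHz: silence, a 440 Hz tone, silence.
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(1600)]
    signal = [0.0] * 800 + tone + [0.0] * 800
    windowed = apply_window(frame_signal(pre_emphasis(signal)))
    print("frames:", len(windowed), "speech frames:", detect_endpoints(windowed))
```

The detected frame range would then be passed on to feature extraction; frames outside it are treated as silence and discarded.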
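The thesis first introduces the basic principles of speech recognition and the structure of a speech recognition system, then discusses in detail the extraction of MFCC (Mel-frequency cepstral coefficient) parameters and the DTW (dynamic time warping) algorithm, and finally reports recognition experiments on speaker-dependent isolated words. DTW has low hardware requirements and computes quickly, which makes it well suited to speaker-dependent isolated-word recognition when the speech library is small.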
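To make DTW template matching concrete, the following is a minimal Python sketch, not the thesis's Matlab implementation. It assumes each utterance has already been reduced to a sequence of feature vectors (for example MFCC frames), uses the Euclidean distance between frames, and allows the three basic moves (horizontal, vertical, diagonal) without slope constraints or path weights; the local constraints used in the thesis may differ. The template labels in the example are hypothetical.

```python
import math

def euclidean(a, b):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(test, ref):
    """Accumulated cost of the best warping path between two feature sequences.

    D[i][j] = d(test[i-1], ref[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]),
    with D[0][0] = 0 and all other boundary cells set to infinity.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(test, templates):
    """Return the label of the stored template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))

if __name__ == "__main__":
    # Hypothetical one-dimensional "feature" templates for two words.
    templates = {
        "one": [[0.0], [1.0], [2.0], [1.0], [0.0]],
        "two": [[2.0], [2.0], [0.0], [0.0], [2.0]],
    }
    # A time-stretched utterance of "one": DTW absorbs the different length.
    utterance = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
    print(recognize(utterance, templates))
```

Because the test utterance and the templates may have different numbers of frames, DTW aligns them non-linearly before summing distances; the quadratic table is affordable here because a speaker-dependent system compares against only a small template library.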
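The recognition system is developed in Matlab 2010b, using the voicebox toolbox to perform operations on sound files. The system also provides a GUI (graphical user interface), which makes it more intuitive to use.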
Keywords   Speaker-dependent isolated-word speech recognition   MFCC   DTW
Graduation Project Report (Thesis): English Abstract
Title    A voicebox-Based Speaker-Dependent Isolated-Word Speech Recognition System
Abstract
In this thesis, we study a speaker-dependent isolated-word speech recognition system. The system preprocesses the speech signal (pre-emphasis, frame blocking, windowing and endpoint detection), extracts feature parameters, performs template matching, and finally finds the best match and outputs the result. The recognition results are then analyzed statistically to illustrate the performance of the system.
This thesis first introduces the principles of speech recognition and the structure of a speech recognition system. It then discusses MFCC extraction and the DTW (dynamic time warping) algorithm in detail. Finally, it conducts recognition experiments on speaker-dependent isolated words and reports the results. DTW has low hardware requirements and a high computing speed, which makes it suitable for speaker-dependent isolated-word recognition with a relatively small speech library.
The recognition system is built with Matlab 2010b and the voicebox toolbox, which handles operations on sound files. Additionally, the system provides a GUI (graphical user interface) for more intuitive use.
Keywords   Speaker-dependent   Isolated-word   Speech Recognition   MFCC   DTW
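Contents

1 Introduction
1.1 Historical Background of Speech Recognition
1.2 Previous Research, Current Status and Development
1.3 Main Contents of This Thesis
2 Characteristics of Speech Signals
2.1 Short-Time Energy
2.2 Short-Time Average Zero-Crossing Rate
2.3 Short-Time Autocorrelation Function
3 Framework of the Speech Recognition System
3.1 Overview of the Speech Recognition System
3.2 Pre-emphasis
3.3 Framing and Windowing
3.4 Endpoint Detection
4 MFCC-Based Speech Recognition
4.1 Feature Parameter Extraction: MFCC
4.2 Pattern Matching: DTW
5 System Implementation and Result Analysis
5.1 System Overview
5.2 Experimental Results
Conclusion
Acknowledgements
References
Appendix A Selected Source Code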
1 Introduction
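Speech recognition, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input [1], such as key presses, binary codes or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or verify who is speaking rather than what is said. Applications of speech recognition include voice dialing, voice navigation, indoor device control, spoken-document retrieval and simple dictation-based data entry. Combined with other natural language processing technologies such as machine translation and speech synthesis, speech recognition enables more complex applications, for example speech-to-speech translation. The fields involved in speech recognition include signal processing, pattern recognition, probability and information theory, the mechanisms of speech production and hearing, and artificial intelligence.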