Speaker identification using multi-modal i-vector approach for varying length speech in voice interactive systems

Institution:	1. Department of Electronics and Communication Engineering, Visvesvaraya National Institute of Technology, South Ambazari Road, Nagpur 40010, India; 2. Department of Electronics and Communication Engineering, National Institute of Technology Campus Warangal, Telangana 506004, India; 3. Department of Instrumentation and Applied Physics, Indian Institute of Science, C V Raman Ave, Bengaluru 560012, India; 1. Department of EIE, Dr. Mahalingam College of Engineering and Technology, Pollachi, Coimbatore, India; 2. Department of EEE, Dr. Mahalingam College of Engineering and Technology, Pollachi, Coimbatore, India; 1. Department of Mathematics, Shaanxi University of Science & Technology, Xi’an 710021, China; 2. Department of Mathematics, Shanghai Maritime University, Shanghai 201306, China; 3. Department of Mathematics, University of New Mexico, Gallup, NM 87301, USA; 4. Department of Mathematics, Obafemi Awolowo University, Ile Ife 220005, Nigeria; 1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; 2. National Institute of Telecommunications (Inatel), Santa Rita do Sapucaí, MG, Brazil; 3. Instituto de Telecommunicações, Portugal; 4. University of Fortaleza (UNIFOR), Fortaleza, CE, Brazil; 5. School of Computer Science and Engineering, Beihang University, Beijing 100191, China; 1. Department of Information Systems, Faculty of Commerce & Business Administration, Helwan University, Cairo, Egypt; 2. Department of Computer Science, Faculty of Computers and Informatics, Sharqiyah, Cairo, Egypt

Abstract:	Advances in smart-device interfaces have led to voice interactive systems. A further step in this direction is to enable the devices to recognize the speaker. This is a challenging task because the interaction involves short-duration speech utterances. Traditional Gaussian mixture model (GMM) based systems have achieved satisfactory results for speaker recognition only when the speech is sufficiently long. The current state-of-the-art method uses an i-vector based approach built on a GMM-based universal background model (GMM-UBM). It prepares an i-vector speaker model from a speaker’s enrollment data and uses it to recognize any new test speech. In this work, we propose a multi-model i-vector system for short speech lengths. We use the open database THUYG-20 for the analysis and development of a short-speech speaker verification and identification system. By using an optimum set of mel-frequency cepstrum coefficient (MFCC) based features, we achieve an equal error rate (EER) of 3.21%, compared with the previous benchmark EER of 4.01% on the THUYG-20 database. Experiments are conducted for speech lengths as short as 0.25 s, and the results are presented. The proposed method improves on the current i-vector based approach for shorter speech lengths, with an improvement of around 28% even for 0.25 s speech samples. We also prepared our own database of 2500 English speech recordings, consisting of actual short speech commands used in voice interactive systems, and tested the proposed approach on it.
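The abstract reports results as equal error rates. As a minimal sketch of what that metric measures, the Python below sweeps a decision threshold over verification scores and returns the operating point where the false-acceptance and false-rejection rates are closest. The function names (`equal_error_rate`, `relative_reduction`) and the toy scores are illustrative assumptions, not the authors' scoring pipeline or data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) for two lists of verification scores.

    A trial is accepted when its score is >= threshold.
    FAR = fraction of impostor trials accepted,
    FRR = fraction of target trials rejected.
    The EER is approximated at the threshold where |FAR - FRR| is smallest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_reduction(baseline, new):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - new) / baseline


# Toy scores, invented purely for illustration.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")

# Relative EER reduction between the two THUYG-20 figures quoted above.
print(f"{relative_reduction(4.01, 3.21):.1%}")
```

The last line only relates the two benchmark figures (4.01% and 3.21%) to each other; it says nothing about the separate improvement reported for 0.25 s samples.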

Keywords:	Gaussian mixture models; i-Vectors; Mel-frequency cepstrum coefficients; Speaker verification; Speaker identification; Short speech; Voice interactive systems
This article is indexed in ScienceDirect and other databases.
