A study into automatic speaker verification with aspects of deep learning
[摘要] Advancements in automatic speaker verification (ASV) can be considered to be primarily limited to improvements in modelling and classification techniques, capable of capturing ever larger amounts of speech data. This thesis begins by presenting a fairly extensive review of developments in ASV, up to the current state-of-the-art with i-vectors and PLDA. A series of practical tuning experiments then follows. It is found somewhat surprisingly, that even the training of the total variability matrix required for i-vector extraction, is potentially susceptible to unwanted variabilities. The thesis then explores the use of deep learning in ASV. A literature review is first made, with two training methodologies appearing evident: indirectly using a deep neural network trained for automatic speech recognition, and directly with speaker related output classes. The review finds that interest in direct training appears to be increasing, underpinned with the intent to discover new robust 'speaker embedding' representations. Last a preliminary experiment is presented, investigating the use of a deep convolutional network for speaker identification. The small set of results show that the network successfully identifies two test speakers, out of 84 possible speakers enrolled. It is hoped that subsequent research might lead to new robust speaker representations or features.
[发布日期] [发布机构] University:University of Birmingham;Department:School of Engineering, Department of Electronic, Electrical and Computer Engineering
[效力级别] [学科分类]
[关键词] T Technology;TK Electrical engineering. Electronics Nuclear engineering [时效性]