Speech recognition of South African English accents

[Abstract] ENGLISH ABSTRACT: Several accents of English are spoken in South Africa. Automatic speech recognition (ASR) systems should therefore be able to process the different accents of South African English (SAE). In South Africa, however, system development is hampered by the limited availability of speech resources. In this thesis we consider different acoustic modelling approaches and system configurations in order to determine which strategies take best advantage of a limited corpus of the five accents of SAE for the purpose of ASR. Three acoustic modelling approaches are considered: (i) accent-specific modelling, in which accents are modelled separately; (ii) accent-independent modelling, in which acoustic training data is pooled across accents; and (iii) multi-accent modelling, which allows selective data sharing between accents. For the latter approach, selective sharing is enabled by extending the decision-tree state clustering process normally used to construct tied-state hidden Markov models (HMMs) by allowing accent-based questions.

In a first set of experiments, we investigate phone and word recognition performance achieved by the three modelling approaches in a configuration where the accent of each test utterance is assumed to be known. Each utterance is therefore presented only to the matching model set. We show that, in terms of best recognition performance, the decision of whether to separate or to pool training data depends on the particular accents in question. Multi-accent acoustic modelling, however, allows this decision to be made automatically in a data-driven manner. When modelling the five accents of SAE, multi-accent models yield a statistically significant improvement of 1.25% absolute in word recognition accuracy over accent-specific and accent-independent models.

In a second set of experiments, we consider the practical scenario where the accent of each test utterance is assumed to be unknown. Each utterance is presented simultaneously to a bank of recognisers, one for each accent, running in parallel. In this setup, accent identification is performed implicitly during the speech recognition process. A system employing multi-accent acoustic models in this parallel configuration is shown to achieve slightly improved performance relative to the configuration in which the accents are known. This demonstrates that accent identification errors made during the parallel recognition process do not affect recognition performance. Furthermore, the parallel approach is also shown to outperform an accent-independent system obtained by pooling acoustic and language model training data.

In a final set of experiments, we consider the unsupervised reclassification of training set accent labels. Accent labels are assigned by human annotators based on a speaker's mother-tongue or ethnicity. These might not be optimal for modelling purposes. By classifying the accent of each utterance in the training set by using first-pass acoustic models and then retraining the models, reclassified acoustic models are obtained. We show that the proposed relabelling procedure does not lead to any improvements and that training on the originally labelled data remains the best approach.
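As an illustration of the parallel configuration described in the abstract, the following is a minimal sketch in which each accent's recogniser returns a transcript with a score, and the highest-scoring hypothesis is kept, so that the accent is identified as a by-product. The selection rule (highest log-likelihood), the recogniser interface, the function name `recognise_parallel`, and the accent labels are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: a recogniser maps an utterance to
# (transcript, log-likelihood score). Not taken from the thesis.
Recogniser = Callable[[object], Tuple[str, float]]


def recognise_parallel(
    utterance: object, recognisers: Dict[str, Recogniser]
) -> Tuple[str, str, float]:
    """Present the utterance to every accent's recogniser and keep the
    best-scoring hypothesis. The winning accent is the implicit
    accent identification result."""
    if not recognisers:
        raise ValueError("at least one recogniser is required")
    best_accent, best_text, best_score = "", "", float("-inf")
    for accent, recognise in recognisers.items():
        text, score = recognise(utterance)
        if score > best_score:
            best_accent, best_text, best_score = accent, text, score
    return best_accent, best_text, best_score


def make_toy_recogniser(text: str, score: float) -> Recogniser:
    """Stand-in for a real decoder: always returns a fixed hypothesis."""
    return lambda utterance: (text, score)


# Toy bank with placeholder accent labels and made-up scores.
bank = {
    "accent_a": make_toy_recogniser("hello world", -120.5),
    "accent_b": make_toy_recogniser("hollow word", -131.0),
}
accent, transcript, score = recognise_parallel("dummy utterance", bank)
```

In this sketch a wrong accent choice only matters if it also changes the transcript, which is consistent with the abstract's observation that accent identification errors need not hurt recognition performance.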

[Publication date] [Publishing institution] Stellenbosch University

