Automatic phoneme recognition of South African English
[Abstract] ENGLISH ABSTRACT: Automatic speech recognition applications have been developed for many languages in other countries, but not much research has been conducted on developing Human Language Technology (HLT) for S.A. languages. Research has been performed on informally gathered speech data, but until now a speech corpus that could be used to develop HLT for S.A. languages did not exist. With the development of the African Speech Technology Speech Corpora, it has now become possible to develop commercial applications of HLT.

The two main objectives of this work are the accurate modelling of phonemes, suitable for the purposes of large-vocabulary continuous speech recognition (LVCSR), and the evaluation of the untried S.A. English speech corpus. Three different aspects of phoneme modelling were investigated by performing isolated phoneme recognition on the NTIMIT speech corpus. The three aspects were signal processing, statistical modelling of HMM state distributions and context-dependent phoneme modelling. Research has shown that the use of phonetic context when modelling phonemes forms an integral part of most modern LVCSR systems. To facilitate the context-dependent phoneme modelling, a method of constructing robust and accurate models using decision tree-based state clustering techniques is described. The strength of this method is the ability to construct accurate models of contexts that did not occur in the training data. The method incorporates linguistic knowledge about the phonetic context, in conjunction with the training data, to decide which phoneme contexts are similar and should share model parameters.

As LVCSR typically consists of continuous recognition of spoken words, the context-dependent and context-independent phoneme models that were created for the isolated recognition experiments are evaluated by performing continuous phoneme recognition. The phoneme recognition experiments are performed, without the aid of a grammar or language model, on the S.A. English corpus. As the S.A. English corpus is newly created, no previous research exists to which the continuous recognition results can be compared. Therefore, it was necessary to create comparable baseline results by performing continuous phoneme recognition on the NTIMIT corpus. It was found that acceptable recognition accuracy was obtained on both the NTIMIT and S.A. English corpora. Furthermore, the results on S.A. English were 2 to 6% better than the results on NTIMIT, indicating that the S.A. English corpus is of a high enough quality that it can be used for the development of HLT.
[Publication date] [Publishing institution] Stellenbosch University