Language modelling for code-switched automatic speech recognition in five South African languages

[Abstract] ENGLISH ABSTRACT: Code-switching refers to natural, spontaneous language alternation by multilingual speakers during a conversation or utterance, and is prevalent in everyday conversations by multilingual South Africans. Automatic speech recognition systems are generally highly optimised for monolingual input and performance deteriorates when presented with mixed-language speech. This thesis addresses the automatic recognition of speech containing code-switching between English and four South African Bantu languages, focussing specifically on the language modelling of English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. Due to the severe scarcity of code-switched speech data in South African languages, it was necessary to first develop a representative corpus. This new and unique 35-hour corpus contains segmented and transcribed code-switched speech from conversations in South African soap operas, which exhibit spontaneous utterances with regular code-switching in the target languages. Insertional, alternational, and intraword intrasentential code-switching are all represented in the data, as are some other special characteristics of fast, spontaneous Bantu speech such as postlexical deletion. The distribution of language switches is extremely sparse, however. In this thesis, a number of data-driven modelling approaches were investigated and applied to address the sparsity by augmenting the training data with synthetically generated data. Postlexical deletion was successfully modelled statistically with joint-sequence models, and these models were used to generate synthetic pronunciations which were demonstrated to lead to improved automatic speech recognition performance. Two new code-switched language modelling approaches were proposed to address data sparsity.
First, parallel language-dependent language modelling (PLDLM), which consists of two monolingual language models with explicit language transitions, was demonstrated to outperform a conventional language-independent language model in terms of recognition word error rate. Second, language models in which word embeddings were used to synthesise probable unseen code-switched bigrams were considered. It was possible to achieve a reduction of up to 31% in language model perplexity across a language switch boundary by including such synthesised code-switch bigrams. Although smaller, improvements in the recognition word error rate were also observed.

[Publication date] [Publishing institution] Stellenbosch University
