Phoneme-based topic spotting on the Switchboard corpus
[Abstract] The field of topic spotting in conversational speech deals with the problem of identifying interesting conversations or speech extracts contained within large volumes of speech data. Typical applications where the technology can be found include the surveillance and screening of messages before referring to human operators. Closely related methods can also be used for data-mining of multimedia databases, literature searches, language identification, call routing and message prioritisation.

The first topic spotting systems used words as the most basic units. However, because of the poor performance of speech recognisers, a large amount of topic-specific hand-transcribed training data is needed. It is for this reason that researchers started concentrating on methods using phonemes instead, because the errors then occur on smaller, and therefore less important, units. Phoneme-based methods consequently make it feasible to use computer-generated transcriptions as training data.

Building on word-based methods, a number of phoneme-based systems have emerged. The two most promising ones are the Euclidean Nearest Wrong Neighbours (ENWN) algorithm and the newly developed Stochastic Method for the Automatic Recognition of Topics (SMART). Previous experiments on the Oregon Graduate Institute of Science and Technology's Multi-Language Telephone Speech Corpus suggested that SMART yields a large improvement over ENWN, which outperformed competing phoneme-based systems in evaluations. However, the small amount of data available for these experiments meant that more rigorous testing was required.

In this research, the algorithms were therefore re-implemented to run on the much larger Switchboard Corpus. Subsequently, a substantial improvement of SMART over ENWN was observed, confirming the result that was previously obtained. In addition to this, an investigation was conducted into the improvement of SMART. This resulted in a new counting strategy with a corresponding improvement in performance.
[Publisher] Stellenbosch University