Bridging the Gap between Speech Production and Speech Recognition.
[Abstract] Although stochastic models of speech signals (e.g. hidden Markov models, trigrams, etc.) have led to impressive improvements in speech recognition accuracy, it has been noted that these models have little relationship to speech production (Lee, 1989) and their recognition performance on some important tasks is far from perfect. However, there have been recent attempts to bridge the gap between speech production and speech recognition using models that are stochastic and yet make more reasonable assumptions about the mechanisms underlying speech production (Bakis, 1991; Deng, 1998; Hogden, 1998; Picone et al., 1999). One of these models, Multiple Observable, Maximum Likelihood Continuity Mapping (MO-MALCOM), is described in this paper. There are theoretical and experimental reasons to believe that MO-MALCOM learns an invertible stochastic mapping between articulator positions and speech acoustics. Furthermore, MO-MALCOM can be combined with standard speech recognition algorithms to create a speech recognition model based on a stochastic production model. Results of using MO-MALCOM speech recognition on data derived from the Switchboard corpus will be discussed. (Jelinek, 1997). A nice feature of HMMs is that maximum likelihood techniques allow the model parameters to be automatically determined from training data. The automatic parameter estimation and the stochastic nature of HMMs are presumably the features that allow them to cope with the amazing amount of variability in speech.
[Publication date] [Publisher] Technical Information Center Oak Ridge Tennessee
[Validity level] [Subject classification] Engineering and technology (general)
[Keywords] Speech;Signal processing;World models;Production;Speech synthesizers [Timeliness]