Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction
[Abstract] Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the features shared between modalities and ignore the dependency among the feature frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) that performs unsupervised extraction of spatio-temporal multimodal features from Tibetan audio-visual speech data and builds an accurate audio-visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data recorded in real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.
[Publication date]  [Publishing institution] 
[Validity level]  [Subject classification] Automation Engineering
[Keywords] Audio-visual speech recognition; Deep Dynamic Bayesian Network; unsupervised feature learning; Tibetan speech recognition [Timeliness] 