Developing phoneme-based lip-reading sentences system for silent speech recognition
[Abstract] Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as the classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition model is a Transformer, which uses multi-headed attention. A Recurrent Neural Network serves as the language model. The proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, it achieves a word error rate that is 10% lower on average under varying illumination ratios.
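The abstract reports its result as a word error rate (WER) but does not say how the figure was computed. As a minimal sketch, assuming the conventional definition (word-level edit distance between the reference and the predicted sentence, divided by the number of reference words), the metric could be calculated as follows. The function name `word_error_rate` and whitespace tokenisation are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    the paper's own text normalisation is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words, giving a WER of 2/6. Under this definition the value can exceed 1 when the hypothesis contains many inserted words.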
[Subject classification] Mathematics (general)
[Keywords] deep learning; deep neural networks; lip-reading; phoneme-based lip-reading; spatial-temporal convolution; transformers