Eye-speech affect detection for automatic speech recognition
[Abstract] Human-computer interaction (HCI) is becoming increasingly natural. Machines are now able to recognise faces, to understand individual speech and to converse like a human would. However, they are still far from exhibiting humanlike intelligence. Affect plays an important role in interaction, so understanding and responding to it are necessary steps towards more natural HCI. This thesis reports the development and evaluation of affect detection systems suitable for use in real-life HCI applications (e.g. speech-enabled interfaces such as Alexa) using speech and eye movement modalities. A corpus of spontaneous affective responses in these modalities was collected within an interactive virtual gaming environment designed to elicit different affective states along the arousal and valence dimensions. A support vector machine was employed as a classifier to detect the elicited affective states from both modalities. Several eye movement features, namely pupillary response, fixation, saccade and blinking, were assessed for use in affect detection, and new pupillary response features based on the Hilbert transform were proposed. Acoustic and lexical characteristics of speech were also investigated. The detection results suggest that eye movement outperforms speech: pupillary response features based on the Hilbert transform yield the best performance on the arousal dimension, whereas saccade and fixation features perform better on the valence dimension. The improvement obtained by combining information from the eye movement and speech modalities suggests that the two carry complementary information for affect detection and that both warrant incorporation where feasible. An automatic speech recognition (ASR) application integrating affective information from both modalities for affect-robust recognition was investigated. The best-performing system used affective information from eye movements, significantly reducing word error rates compared with the speech modality alone. This work highlights the potential of eye movements as an additional modality to speech to enhance the accuracy of affect detection and to facilitate the development of robust affect-aware speech-enabled interfaces.
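To make the feature-extraction and classification approach described in the abstract concrete, the sketch below derives simple pupillary-response features from the analytic signal given by the Hilbert transform and feeds them to a support vector machine. It is a minimal illustration, not the thesis's actual pipeline: the helper name hilbert_pupil_features, the chosen summary statistics, the 60 Hz sampling rate and the synthetic pupil traces are all assumptions made for the example.

# Minimal sketch (assumed details, not the thesis pipeline): Hilbert-transform-based
# pupillary-response features classified with an SVM.
import numpy as np
from scipy.signal import hilbert
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def hilbert_pupil_features(pupil_diameter, fs=60.0):
    """Summarise a pupil-diameter trace via its analytic signal."""
    x = pupil_diameter - np.mean(pupil_diameter)     # remove baseline diameter
    analytic = hilbert(x)                            # analytic signal via Hilbert transform
    amplitude = np.abs(analytic)                     # instantaneous amplitude (envelope)
    phase = np.unwrap(np.angle(analytic))            # instantaneous phase
    inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency in Hz
    # Illustrative summary statistics used as the feature vector for one trial
    return np.array([amplitude.mean(), amplitude.std(),
                     inst_freq.mean(), inst_freq.std()])

# Synthetic example: 40 ten-second pupil traces at 60 Hz with binary arousal labels.
rng = np.random.default_rng(0)
fs, n_trials, n_samples = 60.0, 40, 600
y = rng.integers(0, 2, n_trials)
t = np.arange(n_samples) / fs
X = np.stack([
    3.0 + 0.2 * (1 + label) * np.sin(2 * np.pi * 0.3 * t)
    + 0.05 * rng.standard_normal(n_samples)
    for label in y
])
features = np.array([hilbert_pupil_features(trace, fs) for trace in X])

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, features, y, cv=5).mean())

In practice the feature vector would be computed per stimulus window and combined with fixation, saccade, blink and speech features before classification, as described in the abstract.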
[Publication Date] [Publishing Institution] University: University of Birmingham; Department: School of Engineering, Department of Electronic, Electrical and Systems Engineering
[Keywords] Q Science; Q Science (General)