Modeling temporal coordination in speech production using an artificial central pattern generator neural network
[Abstract] Researchers have made great strides in developing formal and biologically plausible models of the speech production and planning system. Models based on oscillators have been successful in simulating gestural dynamics and, more recently, some suprasegmental timing patterns. This dissertation explores how different levels of phonological organization (the gesture, syllable, stress group, and phrase) are coordinated in speech production by measuring the effects of that coordination on acoustic and behavioral measures of speech production. It does so by modeling those coordination patterns with an artificial neural network (ANN) model that incorporates oscillators, inspired by central pattern generators (CPGs), a type of neural circuit that underlies other animal behaviors. Included in this thesis are a description of the model, named the Neural Oscillator Model of Speech Timing and Rhythm (NOMSTR), and three empirical studies designed to test NOMSTR’s usefulness as a tool for modeling interactions between levels of phonological structure and for simulating the effects of those interactions on speech timing. Chapter 4 describes a study of NOMSTR’s ability to model the metrical structures of French and English utterances, which compares the syllable durations and the locations of accents and phrase boundaries in spontaneous utterances (and in repetitions of those utterances) with those in simulations of the utterances. Chapter 5 describes a study of the interactions between levels of prosodic structure in French and English, which measures the effects of accent-boundary proximity on syllable duration and simulates those effects with the model.
Chapter 6, comprising the second part of the thesis, describes the background of studying speech errors as errors in speech timing and of using oscillator-driven and ANN models to simulate speech errors; it also describes an investigation of how syllable structure (and super-syllabic structure) affects the distribution of speech errors, using data from a speech error production study and an extension to NOMSTR. A major finding of Study 1 is that an artificial neural network model based on oscillators can be useful for simulating prosodic timing, even given the variability of timing in natural speech: NOMSTR was very successful in generating simulations of spontaneous utterances with a variety of different timing structures, in both French and English. This suggests that, despite non-rhythmic influences on spontaneous speech such as syllable content and idiosyncratic duration changes, much of the temporal structure of spontaneous speech can be modeled by a system whose timing is rhythmic and regular, if not isochronous. Study 2 showed that, in English, accented syllables near a phrase boundary undergo more lengthening than adjacent unaccented syllables. This interaction had been described previously (Turk & Shattuck-Hufnagel 2007) but had not been explained by existing models of prosodic timing. NOMSTR simulated this effect by modeling a prosodic structure in which accents and phrase boundaries inhibit the syllable (thereby lengthening it), while phrase boundaries excite accents, giving nearby accents greater power to lengthen the syllable. In French, the lengthening of syllables with Initial Accents, or syllables immediately following the end of an accentual phrase, was found to be reduced in the presence of an upcoming Intonational Phrase (IP) boundary.
Because Intonational Phrase boundaries coincide with Accentual Phrase (AP) boundaries in French, NOMSTR provided two possible ways to simulate this effect: through AP boundary crowding, or through an inhibitory connection between the IP and AP nodes. Study 3 found that principles of the Articulatory Phonology model of suprasegmental structure can be integrated into a model of serial speech production to simulate aspects of speech error behavior that other models have been unable to explain, such as C/V error asymmetry and error dependence. This dissertation demonstrates that an artificial neural network incorporating oscillators can be a powerful tool for modeling the interactions between elements of phonological structure and for simulating speech timing patterns, and additionally that a common underlying model architecture can be useful for simulating multiple languages and multiple levels of structure.
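The English accent-boundary mechanism summarized in this abstract (accents and phrase boundaries inhibit the syllable, and phrase boundaries also excite accents) can be illustrated with a toy calculation. The sketch below is hypothetical and is not NOMSTR's implementation: it assumes one phase oscillator per syllable whose rate is divided by its summed inhibitory input, so that its period (the syllable duration) grows with inhibition. The names `ToyWeights` and `syllable_duration`, and every weight value, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyWeights:
    """Hypothetical connection weights; not values taken from NOMSTR."""
    base_rate_hz: float = 5.0          # free-running syllable oscillator rate
    accent_to_syllable: float = 0.4    # inhibitory connection
    boundary_to_syllable: float = 0.3  # inhibitory connection
    boundary_to_accent: float = 0.5    # excitatory connection


def syllable_duration(accented: bool, boundary: float,
                      w: ToyWeights = ToyWeights()) -> float:
    """Period (s) of a syllable oscillator slowed by inhibitory input.

    `boundary` is a hypothetical phrase-boundary activation in [0, 1]
    (1 = syllable adjacent to the boundary, 0 = far from it).
    """
    # The boundary node excites the accent node, so an accent near a
    # boundary is more active than the same accent far from one.
    accent = (1.0 + w.boundary_to_accent * boundary) if accented else 0.0
    # Summed inhibition arriving at the syllable node.
    inhibition = w.accent_to_syllable * accent + w.boundary_to_syllable * boundary
    # Inhibition divides the oscillator's rate, lengthening its period.
    rate = w.base_rate_hz / (1.0 + inhibition)
    return 1.0 / rate


if __name__ == "__main__":
    for accented in (False, True):
        far = syllable_duration(accented, boundary=0.0)
        near = syllable_duration(accented, boundary=1.0)
        label = "accented" if accented else "unaccented"
        print(f"{label:>10}: far={far:.3f}s near={near:.3f}s "
              f"boundary lengthening={near - far:.3f}s")
```

With these made-up weights, boundary proximity lengthens the unaccented syllable by 60 ms and the accented syllable by 100 ms. The boundary-to-accent excitation is what produces the larger effect on accented syllables: setting `boundary_to_accent` to 0 makes both boundary effects 60 ms, because the two inhibitory inputs then simply add.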
[Publication date] [Publishing institution]
[Level of authority] Prosody [Subject classification]
[Keywords] Speech Timing; Prosody; Rhythm; Speech Errors; Artificial Neural Network; Central Pattern Generator; English; French [Timeliness]