The Speech Unit(e)s project focuses on the speech unification process that associates the auditory, visual and motor streams in the human brain. It takes an interdisciplinary approach combining cognitive psychology, neuroscience, phonetics (both descriptive and developmental) and computational modeling. The framework is provided by the Perception-for-Action-Control Theory (PACT) developed by the PI (Schwartz et al., 2012).
PACT is a perceptuo-motor theory of speech communication that connects, in a principled way, perceptual shaping and procedural motor knowledge in multisensory speech processing. The communication unit in PACT is neither a sound nor a gesture but a perceptually shaped gesture, that is, a perceptuo-motor unit. It is characterised both by articulatory coherence, provided by its gestural nature, and by perceptual value, which it needs in order to be functional. PACT considers two roles for the perceptuo-motor link in speech perception: online unification of the sensory and motor streams through audio-visuo-motor binding, and offline joint emergence of the perceptual and motor repertoires during speech development.
Objectives of the PhD position
In the debates between auditory and motor theories of speech perception, and in their modern revival concerning the role of the dorsal route (Hickok & Poeppel, 2004, 2007), there has been little real reflection on the functional role that a perceptuo-motor coupling could play in speech perception. The "dorsal route" is assumed to be useful in "adverse conditions", e.g. in noise or with a foreign language (Callan et al., 2004; Zekveld et al., 2006). However, no theoretical explanation has actually been proposed for why motor processes might be effective in such adverse conditions.
We have recently developed a computational framework that makes it possible to compare the predictions of auditory, motor and perceptuo-motor theories in various kinds of situations (Moulin-Frier et al., 2012). Casting these theories into a single, unified mathematical framework is an efficient way to compare the theories and their properties systematically. Bayesian modeling is a mathematical framework that allows precisely such comparisons. The key point is that the same tool, namely probabilities, can be used both to define the models and to compare them (see e.g. Myung & Pitt, 2009).
The generic model we developed is called COSMO, which stands for "Communicating about Objects using Sensory-Motor Operations". The COSMO acronym also names the five variables on which the basic structure of the model is built. In COSMO, communication (C) is a success when an object OS in the speaker's mind is transferred, via sensory and motor means S and M, to the listener's mind, where it is correctly recovered as OL. COSMO assumes that a communicating agent, being both a speaker and a listener, internalizes the communication situation in an internal model.
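The following is a minimal, hypothetical Python sketch of how the five COSMO variables could support auditory, motor and perceptuo-motor perception in one Bayesian model. It assumes that the joint distribution factorizes as P(OS) P(M | OS) P(S | M) P(OL | S) P(C | OS, OL), with C true exactly when OS = OL; readers should consult Moulin-Frier et al. (2012) for the actual model. The object, motor and sensory values and every probability table below are made-up illustrative assumptions. Auditory perception reads the listener branch P(OL | S), motor perception inverts the speaker branch by summing over M, and perceptuo-motor perception conditions on successful communication (C true), which multiplies the two.

```python
# Hypothetical discrete domains and probability tables (illustrative only).
OBJECTS = ["o1", "o2"]
MOTOR = ["m1", "m2", "m3"]

P_OS = {"o1": 0.5, "o2": 0.5}                      # prior over objects
P_M_GIVEN_OS = {                                    # speaker's motor repertoire
    "o1": {"m1": 0.7, "m2": 0.2, "m3": 0.1},
    "o2": {"m1": 0.1, "m2": 0.2, "m3": 0.7},
}
P_S_GIVEN_M = {                                     # articulatory-to-sensory mapping
    "m1": {"s1": 0.8, "s2": 0.15, "s3": 0.05},
    "m2": {"s1": 0.2, "s2": 0.6, "s3": 0.2},
    "m3": {"s1": 0.05, "s2": 0.15, "s3": 0.8},
}
P_OL_GIVEN_S = {                                    # listener's auditory categorization
    "s1": {"o1": 0.9, "o2": 0.1},
    "s2": {"o1": 0.5, "o2": 0.5},
    "s3": {"o1": 0.1, "o2": 0.9},
}


def normalize(d: dict[str, float]) -> dict[str, float]:
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}


def auditory_perception(s: str) -> dict[str, float]:
    """P(OL | S = s): direct sensory-to-object mapping."""
    return dict(P_OL_GIVEN_S[s])


def motor_perception(s: str) -> dict[str, float]:
    """P(OS | S = s), proportional to P(OS) * sum_M P(M | OS) P(s | M)."""
    return normalize({
        o: P_OS[o] * sum(P_M_GIVEN_OS[o][m] * P_S_GIVEN_M[m][s] for m in MOTOR)
        for o in OBJECTS
    })


def perceptuo_motor_perception(s: str) -> dict[str, float]:
    """P(O | S = s, C = true): conditioning on OS = OL = O fuses both branches."""
    aud, mot = auditory_perception(s), motor_perception(s)
    return normalize({o: aud[o] * mot[o] for o in OBJECTS})
```

With these toy tables and sensory input `"s1"`, auditory perception gives `o1` a probability of 0.9, motor perception about 0.80, and their perceptuo-motor fusion about 0.97. The design choice here, conditioning on C rather than adding a separate fusion rule, is what lets a single probabilistic model express all three theories.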
The PhD project aims to develop COSMO in two major directions.
(1) Joint acquisition of perceptual and motor repertoires in a syllabic framework. So far, experiments with COSMO have mainly involved simple stimuli, e.g. in abstract one-dimensional sensory-motor spaces, or with restricted vowel samples. We will explore strategies for automatically learning to produce and perceive complex sequences such as plosive-vowel (CV) sequences, which display systematic coarticulation phenomena. Various kinds of exploration and learning mechanisms can be drawn from cognitive and developmental robotics (Moulin-Frier & Oudeyer, 2012). Validation tests will draw on real data, e.g. locus equations for plosive acoustics (Sussman et al., 1998), robustness to perturbations in production (Lindblom et al., 1979; Savariaux et al., 1995), or the coupling of perceptual and motor idiosyncrasies.
(2) Comparison of auditory, motor and perceptuo-motor theories for speech processing in various conditions. Once these perception and production components are in place in COSMO, we will compare auditory, motor and perceptuo-motor theories of speech perception in challenging conditions, such as noise, speaker normalization, or foreign accent. We will test the model's ability to develop a perceptuo-motor phonology from auditory and motor experience, e.g. to acquire a category such as "plosive place of articulation" by discovering perceptuo-motor links during learning. We will also test COSMO on natural CV stimuli, using natural multi-speaker corpora of CV sequences for learning and for perceptual tests.
The work will be carried out within a multidisciplinary team with expertise in speech communication, cognitive theories and Bayesian modeling (Jean-Luc Schwartz at GIPSA-Lab Grenoble, Julien Diard at LPNC Grenoble, Pierre Bessière at ISIR Paris), in collaboration with Pierre-Yves Oudeyer at INRIA Bordeaux.
The PhD position is open from September 2014, or slightly later if necessary.
Candidates should hold a Master's degree, have some knowledge of speech and cognitive modeling, and be able to program and develop computational models.
They must send a CV together with a letter explaining why they are interested in the project. They should also provide the names and email addresses of two referees who can recommend their application.
Applications should be sent before June 15th to Jean-Luc Schwartz (Jean-Luc.Schwartz@gipsa-lab.grenoble-inp.fr). Interviews will be held with preselected candidates, and a decision will be made in the following weeks.