In-depth Study Explanations
The following sections describe some of our recent and ongoing research projects.
Stream Segregation
One of the most important tasks facing an infant is learning his or her native language. In the course of only a few years, infants must learn the phonotactic and prosodic patterns in their language, how to segment the fluent speech stream into its component words, and the meanings of thousands of individual words.
Clearly, infants could not succeed at such a task without sufficient exposure to their language. Yet simple exposure to language is not enough, because much of the language input may not be ideal for language learning. For example, a considerable amount of the speech that infants hear undoubtedly occurs in the context of other types of noise. Imagine an infant being spoken to by her primary caregiver. In order to learn from that caregiver's speech, the infant must be able to separate it from the sounds of the vacuum cleaner, the dog, and the cars going past outside. Some of this background noise may even take the form of speech: for example, there may be sounds from the TV down the hall, or from siblings in the adjacent room.
In a survey of the parents visiting our lab, two-thirds reported that when they are speaking to their infant, other members of the household are frequently talking at the same time. Only one parent (of 48) reported that this almost never occurred. This suggests that infants are in such multi-talker environments quite often. How do they succeed at learning language in these situations?
We have been investigating the cues that infants use to succeed at this task. In our first set of studies, Peter Jusczyk and I (1996) examined 7.5-month-old infants' ability to attend to the speech of a female talker while a male talker spoke simultaneously in the background. We used a method based on the work of Jusczyk and Aslin (1995), in which infants were first familiarized with two target words and then played passages that contained either those target words or novel words. Typically, infants listened longer to passages containing the familiarized target words. We altered this method slightly by presenting two talkers speaking simultaneously during the familiarization phase. One voice, that of the target speaker, repeated two isolated words in an infant-directed manner, as in the original Jusczyk and Aslin study. The other voice read a fluent speech distractor passage.
Following this dual presentation, infants heard test passages in the target voice that contained either the words she had said during familiarization (the target words) or novel words. Infants listened significantly longer to the passages containing the familiarized words, suggesting that they were able to selectively attend to one speech stream despite the simultaneous presentation of a second, competing stream.
However, the infants could separate the two voices only under very specific conditions. Identifying these conditions is an important first step in understanding how infants learn language in noisy environments.
First, infants only succeeded at learning from the target speech stream when the target voice was more intense than the background speech. When the two voices were of equal amplitude, the infants did not show a preference for the familiarized words. This suggests that a difference in amplitude between the competing speech streams is an important cue that infants rely on for speech segregation.
A second cue that infants use for speech segregation is the perceptual difference between the voices. In the original study, we specifically chose two voices that adult listeners rated as dissimilar, in order to maximize the likelihood that infants would succeed at attending to the target voice. In more recent research, we have found that infants perform much more poorly when presented with two female voices speaking simultaneously.
A third cue that infants rely on is familiarity with a voice. Brittan Barker and I (2000) presented infants with two female voices speaking simultaneously. Half of the infants heard their own mother as the target voice, while the other infants heard an unknown mother's voice. Those infants listening to their own mother succeeded at separating the two voices, while the infants listening to an unknown voice did not.
George Hollich, Peter Jusczyk, and I are currently investigating infants' use of visual information from a talker's face. This visual information has been shown to be very important for adult stream segregation, and we expect that infants will also be able to benefit from this cue.
More recently, we have begun exploring younger children's ability to separate streams of speech, focusing on words they typically learn very early in life: their own names. In this study, infants hear either their own name or another child's name while noise is presented in the background. If they can manage to separate the streams, they should listen longer to their own name than to other children's names.
Barker, B. A. & Newman, R. S. (2000). The cocktail party effect in infants: Following one's mother's voice. In S. C. Howell et al. (Eds.), BUCLD 24 Proceedings (pp. 92-103). Somerville, MA: Cascadilla Press.
Newman, R. S. & Jusczyk, P. W. (1996). The cocktail party effect in infants. Perception & Psychophysics, 58(8), 1145-1156.
Jusczyk, P. W. & Aslin, R. N. (1995). Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29.
Amount of Exposure
One of the most vexing questions in the field of language development is how much exposure infants need in order to begin acquiring a language. Although many language acquisition theories depend upon the infant being exposed to large amounts of language, what constitutes "large amounts" remains unclear. This issue has proven difficult to examine directly, as there is no way of controlling the language input a child receives. Peter Jusczyk and I realized that we could address this issue by controlling exposure to a second language. To the extent that the amount of language exposure required to begin learning a second language is related to the amount of exposure required to learn a first language, this methodology provides a way of addressing an otherwise intractable problem.
One of the first steps to learning a language is learning where the boundaries are between words. This is not an obvious part of the speech signal; when presented with speech in a foreign language, adults often claim to have difficulty separating the speech into individual words. Infants are in a similar situation, yet they clearly need to do so in order to learn their first words. We have selected this stage of word learning as the focus of our work.
In our first pilot study, we presented infants with a videotape in Mandarin Chinese and tested whether they would later be able to learn individual words from Mandarin fluent speech. The infants did not succeed at the task, even though they succeeded at a comparable task with English speech. It is unclear whether the infants simply did not receive sufficient exposure, or whether the problem lies in the fact that videotapes are not interactive. Many researchers believe that infants are incapable of any language learning unless they are exposed to the language in an interactive setting. If so, infants would fail to benefit from the many occasions on which adults speak to one another in the infants' presence. This would greatly reduce infants' opportunities for learning language, and would have a serious impact on theories of language acquisition that depend upon exposure to large amounts of speech.
We began testing this question by presenting infants with videotaped language exposure in English. Rebecca Ribar and I designed a videotape that includes a number of novel words, repeated frequently across several stories. Infants watched this videotape in their homes 10 times over a 2-week period and then visited the lab, where we tested their recognition of these key words (this method is based on that of Jusczyk and Hohne, 1997). We found that infants were successful at learning these novel English words. Infants who did not see this videotape, or who saw it only 5 times, did not show this learning.
In collaboration with Jane Tsay, we recently translated our videotapes into Chinese, and we are now testing infants' ability to begin learning a foreign language, using the same methodology.
Plurals
A great deal of research has focused on children's acquisition of words, and recent research has begun focusing on syntax acquisition as well. But there remains little research on the acquisition of inflectional morphology, such as the plural marker. Like many words, the plural morpheme marks a particular occurrence in the world; however, it may well be less salient than most words, as it is not a lexical item in its own right and is generally unstressed even when it forms its own syllable. Although researchers have investigated the production of inflectional morphemes (for example, see Marcus, 1995), little research has directly examined infants' developing comprehension of these items. Yet children frequently comprehend aspects of language far earlier than they can produce them (see Gerken & McIntosh, 1993).
The present research is a developmental study of infants' comprehension of the plural marker, for both known and novel objects. We present infants at a variety of ages with two videos in a preferential looking procedure. In a typical version, one video screen shows a single object, while the other shows a pair of objects. In the baseline condition, the videos are presented without any concomitant speech. In the singular condition, infants hear a voice saying, "Find the goish" (for a novel object) or "Find the couch" (for a familiar object). In the plural condition, infants hear the same voice saying, "Find the goishes" or "Find the couches". We are looking for an increase in infants' looking time to the appropriate video in the named conditions, compared with their looking time to that video in the baseline condition.
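The baseline-versus-named comparison can be expressed as a simple difference score. The following is a minimal sketch that assumes looking times are recorded in seconds for each of the two screens. It also assumes the measure of interest is the proportion of looking directed to the appropriate video. The function names and the input values are hypothetical illustrations, not the lab's analysis code or data.

```python
def proportion_to_target(target_secs: float, other_secs: float) -> float:
    """Proportion of total looking time spent on the target video."""
    total = target_secs + other_secs
    if total == 0:
        raise ValueError("no looking time recorded")
    return target_secs / total


def naming_effect(baseline: tuple[float, float], named: tuple[float, float]) -> float:
    """Increase in the proportion of looking to the target video when it is
    named, relative to the silent baseline. Each tuple is (target_secs, other_secs)."""
    return proportion_to_target(*named) - proportion_to_target(*baseline)


# Hypothetical trial: the two-object video is the target ("Find the goishes").
effect = naming_effect(baseline=(3.0, 3.0), named=(4.5, 1.5))
print(f"{effect:.2f}")  # 0.25
```

A positive score means the infant looked more at the appropriate video after hearing it named. Using proportions rather than raw seconds keeps trials of different total length comparable.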
Our results to date suggest that infants go through a series of stages in learning the plural marker. In particular, infants show comprehension of the plural marker at an earlier age for familiar objects than for unfamiliar objects, even when those familiar objects are unlikely to occur in plural form outside the laboratory. Thus, infants seem initially to understand the plural marker only in known contexts, and only later are able to generalize this knowledge.
Marcus, G. F. (1995). Children's overregularization of English plurals: A quantitative analysis. Journal of Child Language, 22, 447-459.
Gerken, L. A. & McIntosh, B. J. (1993). Interplay of function morphemes and prosody in early language. Developmental Psychology, 29(3), 448-457.
Phoneme Restoration
This study examines the "phoneme restoration effect" in infants. In this auditory illusion, a sound is removed from a word, and replaced with noise. So, for example, Warren and Obusek (1971) removed the "s" from the word "legislatures" and replaced it with a cough. Adults used their knowledge of possible words to "fill in the gap" -- they "heard" the full word. More surprisingly, they could not even tell that a sound was missing. That is, they were not simply guessing the word from what part of it they heard -- they actually had the illusion that the word was fully there.
This may seem like an odd task, but when we listen to people speak, it is not unusual for a sudden noise in the environment to mask part of what a person says; phoneme restoration is simply a laboratory analog of this common event. When this occurs in the real world, listeners manage to fill in the gaps on the basis of context.
But in order to figure out what a person meant to say, a listener needs a fair degree of knowledge about which sound sequences make up real words in the language. The listeners in the Warren and Obusek study heard "legislatures" rather than "legiplatures" because they knew that only the former is a real word in the language.
We are examining the development of this interaction between pre-existing knowledge and perception. Toddlers are shown videotapes of two known objects, such as a cat and a dog. They hear these items labeled either in a clear condition, or in a noisy one. We are comparing their preference for the appropriate object across these conditions.
Warren, R. M. & Obusek, C. J. (1971). Speech perception and phonemic restorations. Perception & Psychophysics, 9 (3-B), 358-362.
Lexical Factors in Naming
As speakers, we are constantly faced with the task of accessing words for spontaneous usage: any time we speak, we need to be able to find the appropriate words to express our intended thoughts. Although typically automatic, occasionally this skill is disrupted, and we are left searching for the word we wanted to use. We may even substitute a different word entirely, only afterwards realizing that we misspoke.
This research, in collaboration with Diane German at National Louis University, examines aspects of words that might make them easier or harder to name. We have been testing both typically developing children and adults, and individuals who have been diagnosed as having word-finding difficulties. We have found better naming performance for words that are high in frequency, learned earlier in life, and have few lexical neighbors. Moreover, these lexical factors also influence the types of errors that children make, and the types of words they tend to substitute. These results are important theoretically, in that they support models of word recognition in which access paths are strengthened with successful use. They also have clinical implications: they suggest that clinicians might be able to predict the types of word-finding errors children are likely to make, and thus carry out a strategic, research-based word-finding intervention that matches retrieval strategies to the specific target words to be accessed.
Newman, R. S. & German, D. J. (2002). Effects of lexical factors on lexical access among typical language-learning children and children with word-finding difficulties. Language & Speech, 45(3), 285-317.
Neighborhood Effects
How do we recognize words? This turns out to be a very complicated question, involving a number of different factors. One such factor is lexical neighborhood. Some words, like "cat", are similar to many other words in the language (words like cut, kit, bat, cap, etc.). Other words, like "orange", are similar to very few words at all. These similar words are referred to as "neighbors", and they can influence the perception of a word in multiple ways.
Jim Sawusch, Paul Luce and I originally began investigating whether neighborhoods could influence phoneme perception. In a series of studies (Newman, Sawusch, & Luce, 1997, 1999) we presented listeners with nonwords ranging along two series. For example, one series ranged from "beysh" to "peysh", and the other from "beyth" to "peyth". Two of these endpoints ("peysh" and "beyth") have more neighbors than the other two. We found that listeners were more likely to label ambiguous members of each series as whichever endpoint had the higher neighborhood value. These neighborhood effects also influence the time required to decide whether an item is a real word in the language or a nonsense word. These results suggest that multiple possible words are activated while a single item is being recognized, and that this massive activation can influence how the item is heard.
More recently, we have been focusing on determining the best ways to calculate neighborhood values. More specifically, we have shown that neighborhood effects are not driven exclusively by those items which match the target item word-initially. Even neighbors like "bat", that diverge from the target word "cat" very early in processing, will still exert an influence on perception of the target word.
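To make the difference between a full neighborhood count and a word-initial count concrete, here is a minimal sketch. It assumes the common one-phoneme-edit definition of a neighbor (a single substitution, addition, or deletion), which this page does not spell out. It also assumes a toy lexicon in a hypothetical one-character-per-phoneme code, so "kat" stands for "cat". It is an illustration, not the calculation used in these studies.

```python
def is_neighbor(a: str, b: str) -> bool:
    """True if b differs from a by exactly one substitution, addition, or
    deletion. Each character stands in for one phoneme."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Neighbor if deleting one segment from the longer form yields the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbors(target: str, lexicon: list[str], onset_only: bool = False) -> list[str]:
    """All neighbors of target; with onset_only=True, keep only neighbors that
    share the target's first segment (a word-initial, cohort-style count)."""
    found = [w for w in lexicon if is_neighbor(target, w)]
    if onset_only:
        found = [w for w in found if w[:1] == target[:1]]
    return found


# Toy lexicon in a hypothetical one-character-per-phoneme code.
lexicon = ["kat", "kut", "kit", "bat", "kap", "at", "skat", "dog"]
print(neighbors("kat", lexicon))                   # ['kut', 'kit', 'bat', 'kap', 'at', 'skat']
print(neighbors("kat", lexicon, onset_only=True))  # ['kut', 'kit', 'kap']
```

The word-initial count drops "bat", "at", and "skat". The finding described above suggests that such early-diverging neighbors still influence perception, so the fuller count is the more appropriate one.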
In addition, Prahlad Gupta, Larissa Samuelson, and I are beginning to investigate the effect of neighborhoods on the ability to learn new words, in both children and adults. There may be conflicting contributions of neighborhoods at different levels of processing. A word with more neighbors may be easier to learn to pronounce (as its sound pattern is common) but more difficult to learn a meaning for (as there will be more words with which it can be confused).
Newman, R. S., Sawusch, J. R. & Luce, P. A. (1999). Underspecification and phoneme frequency in speech perception. In M. Broe & J. Pierrehumbert (Eds.), Papers in Laboratory Phonology V: Language Acquisition and the Lexicon, pp. 298-311. Cambridge, UK: Cambridge University Press.
Newman, R. S., Sawusch, J. R. & Luce, P. A. (1997). Lexical neighborhood effects in phonetic processing. Journal of Experimental Psychology: Human Perception and Performance, 23(3), 873-889.
Rate Normalization Effects
One of the fundamental issues in speech perception research involves the apparent lack of invariance between the acoustic signal and the listener's perception. Listeners somehow manage to perceive messages correctly despite the variability in the acoustic signal caused by differences in talkers, dialect, and speaking rate. One such source of variability is the rate at which a person speaks: people do not talk at a constant rate, and certain phonemes change substantially in duration as speaking rate changes. This poses a problem for the listener, because some phonemic contrasts are cued, in whole or in part, by duration. For instance, the /b/-/w/ manner contrast can be cued by differences in duration alone, with shorter initial transitions heard as more "b-like" and longer transitions as more "w-like". However, when we listen to someone who talks very quickly, we still hear /w/ phonemes: they do not all sound like stops. Conversely, when we listen to someone who speaks very slowly, their intended /b/s do not all sound like /w/s. We must therefore somehow compensate for differences in speaking rate, both among talkers and among different tokens from any particular talker.
In our first study, Jim Sawusch and I demonstrated that speaking rate normalization occurs over a relatively short temporal window, regardless of the type of phoneme affected or the legality of the sequence. In more recent work, we have shown that it also occurs across talkers. For example, in one such study we created a series ranging from "bee" to "wee". Part-way through the vowel, the voice changed from that of a high-pitched female talker to that of a low-pitched male talker. Despite this radical change in talker, listeners used the duration of the vowel following the voice change as a cue to the speaking rate of the initial portion. A longer vowel, indicating a slower speaking rate, made the initial transition seem relatively short in comparison, even though the two portions were spoken by different talkers. These results suggest that rate normalization is a fairly early, automatic stage of language processing. Current work is examining the conditions under which these cross-voice effects occur.
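The direction of the rate effect can be illustrated with a toy relative-duration rule. In this rule, the onset is judged by the transition's length relative to the following vowel rather than by its absolute length. This is a minimal sketch under stated assumptions only. The ratio rule, the 0.15 criterion, and the millisecond values are hypothetical, and real listeners are not claimed to apply a single fixed ratio.

```python
def classify_onset(transition_ms: float, vowel_ms: float, criterion: float = 0.15) -> str:
    """Label the onset /b/ or /w/ from transition duration relative to the
    following vowel. A toy stand-in for speaking-rate normalization; the
    criterion value is hypothetical."""
    if vowel_ms <= 0:
        raise ValueError("vowel duration must be positive")
    ratio = transition_ms / vowel_ms
    return "w" if ratio > criterion else "b"


# The same 40 ms transition, followed by a short vs. a long vowel.
print(classify_onset(40, 200))  # w  (ratio 0.20: fast rate, transition counts as long)
print(classify_onset(40, 400))  # b  (ratio 0.10: slow rate, transition counts as short)
```

An identical transition is labeled "w" before a short vowel and "b" before a long one. This mirrors the pattern described above, in which a longer vowel makes the initial transition seem relatively short.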
Newman, R. S. & Sawusch, J. R. (1996). Perceptual normalization for speaking rate: Effects of temporal distance. Perception & Psychophysics, 58(4), 540-560.
Sawusch, J. R. & Newman, R. S. (2000). Perceptual normalization for speaking rate II: Effects of signal discontinuities. Perception & Psychophysics, 62.
Phonotactic Effects
We appear to have a great deal of implicit knowledge about our native language. One form of such knowledge is phonotactic probability: the likelihood that particular phonemes occur together. Some pairs of phonemes are much more frequent than others; for example, there are many words that end in "nt", but very few that end in "mt". These probabilities can influence our perception of the language. In a recent pair of studies, we presented listeners with multisyllabic nonwords. Some of these nonwords contained real words embedded within them; for example, "minteyth" contains the real word "mint". Listeners were asked to decide whether or not there was an embedded word in each nonword item.
There were four types of embedded words in this study. Half of the embedded words contained an entire consonant cluster at the end (for example, "mint" in "minteyth"), while the other words contained only half of a consonant cluster (for example, "fin" in "finteyth"). In addition, half of the clusters were common in the language (such as "nt") while the other half were uncommon (such as "mt").
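The common/uncommon distinction rests on how often a cluster occurs in the lexicon. A minimal sketch of that kind of count is shown below. It makes two simplifying assumptions: spelling stands in for phonemes, and the lexicon is a tiny hypothetical word list. The resulting numbers are illustrative only and are not the counts used to select the study's stimuli.

```python
from collections import Counter

VOWELS = set("aeiou")


def final_clusters(lexicon: list[str]) -> Counter:
    """Count word-final two-consonant sequences (e.g. 'nt' in 'mint')."""
    counts = Counter()
    for word in lexicon:
        tail = word[-2:]
        if len(tail) == 2 and not (set(tail) & VOWELS):
            counts[tail] += 1
    return counts


def cluster_probability(cluster: str, counts: Counter) -> float:
    """Relative frequency of a cluster among all word-final clusters."""
    total = sum(counts.values())
    return counts[cluster] / total if total else 0.0


# Hypothetical toy lexicon, spelled orthographically.
lexicon = ["mint", "tent", "want", "hint", "went", "dreamt", "fin", "mist"]
counts = final_clusters(lexicon)
print(counts["nt"], counts["mt"])                   # 5 1
print(round(cluster_probability("nt", counts), 2))  # 0.71
```

A real analysis would use phonemic transcriptions and a large frequency-weighted corpus. The logic is the same, however: "nt" ends many words and "mt" very few.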
We predicted that common clusters might cohere better than rare ones. If so, then for common clusters the longer words (such as "mint") should be easier to hear than the shorter words (such as "fin"), because the two sounds in the cluster would remain together, making it difficult to hear "fin". In contrast, the shorter words should be easier to hear in the rare-cluster condition, since the two phonemes in the cluster would tend to break apart. We did find a significant effect of phonotactic probability on listeners' ability to detect embedded words. However, the effect went in exactly the opposite direction from what we had predicted. We are currently exploring the reasons for this surprising result.
Effects of Word Boundary Cues on Lexical Access
Spoken language is marked by ambiguity at every level of analysis. From concept, clause, and lemma to phonological form, the spoken message may offer the listener multiple possible interpretations. For example, segmenting the continuously varying speech stream into individual words often results in momentary ambiguities regarding the appropriate placement of word boundaries. This segmentation problem arises primarily because the signal fails to provide consistent and reliable acoustic indicators of the beginnings and endings of words.
How, then, does the listener cope? One approach to the problem of word boundary ambiguity in spoken language perception has been to assume that multiple interpretations of the speech signal are entertained during the temporal course of processing. Perceptual choice among the possible interpretations is accomplished through satisfying a set of constraints that draw on acoustic, phonetic, lexical, syntactic, semantic, and contextual information. In short, most current models of spoken word recognition assume that multiple items consistent with the speech input are initially activated in memory, only to be decided among through a process of constraint satisfaction.
However, there are several versions of this approach, which differ in the limits they place on multiple activation. Some models, such as TRACE, Shortlist, and PARSYN, claim that form-based lexical representations consistent with the stimulus input may be activated at any point in the speech stream. In contrast, a number of other models limit activation to specific points in the signal. For example, the metrical segmentation strategy (MSS) holds that representations are activated only at the onset of stressed syllables. These two classes of models therefore make different predictions about the activation of words that do not begin at a word boundary.
Along with Paul Luce, I have been examining the degree to which word boundary information in two-word sequences may limit lexical hypotheses during recognition. Using a cross-modal priming technique, we presented spoken two-word utterances, such as NOTE-RAIL, and investigated the activation of possible lexical items spanning the word boundary (e.g., TRAIL). Our evidence suggests that these words are activated during the course of processing.
We have also found evidence supporting the idea of post-selection inhibition. According to this notion, one of the primary advantages of multiple activation is that it enables the recognition system to re-analyze ambiguous input if later information suggests that the wrong interpretation was initially selected. But if the processing system simply re-sampled the lexical hypothesis space a second time, it would most likely converge again on its initial dominant hypothesis. That would happen unless the relative activation levels of the candidates had changed, or unless the information that forced the re-analysis were integrated with the lexical information. Otherwise, the same incorrect item would remain highly active at the point when the error is noticed. Our findings support the proposal that lexical entries are inhibited after selection. This post-selection inhibition removes the already-selected item from the pool of candidates, preventing it from interfering with later recognition attempts once the initial interpretation has proven inadequate.
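The re-analysis argument can be traced with a toy candidate pool. The following minimal sketch assumes that selection simply picks the most active candidate. The activation values for TRAIL and RAIL (from the NOTE-RAIL example) and the size of the inhibition are hypothetical, and this is not a model proposed in the work described here.

```python
def select(activations: dict[str, float]) -> str:
    """Pick the most active lexical candidate."""
    return max(activations, key=activations.get)


def reanalyze(activations: dict[str, float], rejected: str, inhibit: bool,
              inhibition: float = 0.5) -> str:
    """Re-select after the first choice has proven inadequate. With
    inhibit=True, the previously selected candidate's activation is reduced
    by a hypothetical fixed amount before re-selection."""
    updated = dict(activations)
    if inhibit:
        updated[rejected] -= inhibition
    return select(updated)


# Hypothetical activations for candidates spanning "note rail".
acts = {"trail": 0.8, "rail": 0.7}
first = select(acts)
print(first)                                  # trail
print(reanalyze(acts, first, inhibit=False))  # trail (same wrong choice)
print(reanalyze(acts, first, inhibit=True))   # rail
```

Without inhibition, re-sampling returns the same dominant but incorrect candidate. Suppressing the already-selected item lets the next-best candidate win.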
Stream Segregation - Familiar Voices
Frequently, we have to listen to one voice while many people are talking simultaneously (for example, at a party). This is known as streaming, and it is a primary focus of the lab's infant research. Brittan Barker and I recently completed a paper showing that infants perform better at a streaming task when their mom serves as the target voice (the voice they are supposed to attend to) than when some other mom serves as the target (see Stream Segregation above). In the course of this study, we were surprised to find that no one had examined the role of voice familiarity in adult streaming. That is the purpose of the current study. We are using the voice of one of the Elementary Psychology professors as our target voice, and an unfamiliar voice as the background voice. Students are asked to "shadow" the target voice (to repeat back what that voice says), and we record the subjects' speech onto cassette tape. We then examine the accuracy of their shadowing, looking for errors in which the students either miss words or mispronounce them. We are comparing the results of students taking Elementary Psychology from this professor (who thus presumably know his voice) with those of students taking Elementary Psychology from a different professor. We are also comparing students who are explicitly told whom they will be listening to with those who are not told but still know the talker. The results so far suggest that explicitly knowing who is talking is more important than simply being familiar with the talker.
Stream Segregation - "Motherese"
"Motherese" refers to the special way of talking that adults typically use with infants. Generally, motherese involves speaking in a higher pitch, with more pitch and volume variability, more whispering, speaking more slowly, and so on. Although speaking in motherese to infants appears to be universal, no one has yet found a real "purpose" for it, beyond the fact that infants seem to prefer listening to it (and thus attend to it more). But many of these characteristics of motherese are the same cues that make it easier to separate different streams of non-speech sounds: for example, it is easier to separate two musical streams if they differ in pitch or in loudness. Thus one possibility is that motherese makes the caretaker's voice more distinct from the background, making it easier for infants to attend to it. In the present study, adult listeners were asked to shadow a particular talker, as in the familiar voice study above. This talker spoke in either an adult-directed or an infant-directed manner. The results suggest that motherese does aid in separating different streams of speech and selectively attending to one of them.
Word-Finding in Menopausal Women
This research is being performed in collaboration with Diane German at National Louis University. Recently, there has been a great deal of interest in memory in menopausal women. The hormone changes that occur at menopause frequently cause memory difficulties, which are alleviated by taking estrogen. Although a great deal of research has examined this, all of the studies to date have used very simple memory tests, in which the individual is given a list of words, asked to memorize them, and later tested on recall. This task is not very representative of the memory tasks that people ordinarily face. It also fails to distinguish between problems in storing newly learned information and difficulties in retrieving already-known information.
The present study investigates one particular form of memory: accessing words. We have all occasionally been unable to retrieve a word from memory while feeling that it is "right on the tip of our tongue". There have been anecdotal reports that women in perimenopause have particular difficulty finding words, but this has never been studied. We are comparing the word-finding performance of women currently on hormone replacement therapy with that of women experiencing the symptoms of perimenopause. So far, it seems that hormone replacement therapy reduces word-finding problems but does not increase the overall speed of word-finding.
Finding Word Boundaries in a Foreign Language
Remember the last time you heard someone speaking in a foreign language? People often report that foreign languages sound as if they have no pauses in them, so that they cannot tell where the word boundaries are. In fact, fluent speech in any language only rarely has breaks between words. Unlike the spaces in printed text, there are no pauses to indicate where one word ends and another begins. The cues to word boundaries are more subtle than that, and they have to be learned for each specific language (which is why they do not seem to transfer to unknown languages).
This is the general dogma, and it suggests that listeners will not be able to locate word boundaries in foreign languages at all, but this has not actually been well tested. We have been investigating this topic and have found that listeners can detect word boundaries in a foreign language much better than would be expected by chance. We are currently investigating the cues listeners use to identify these word boundaries. One possible cue is the intonation (or "melody") of the speech; another possibility is that adults use certain sounds (particularly stop consonants) as signals of word boundaries.
The Influence of Talker Variability on Speech Perception
One of the primary issues in the field of speech perception has been the apparent "lack of invariance" between the acoustic information in a signal and the listener's phonemic perception. This variability can be caused by a variety of factors, including dialect, social group, speaking rate, emotional state, gender, vocal tract length, articulatory habits, and phonetic context. Regardless of the cause, the effect is that the same intended phoneme can be produced with a wide range of acoustic values, and that two different intended phonemes can occasionally have similar or identical acoustic values. Although the existence of this variability in production is not in question, the degree to which listeners have to account for it in everyday listening situations is unclear. There have been few studies that have examined the extent of this variability, even for individual phonemes in a laboratory setting. While different phonemes may occasionally overlap on some acoustic values, it is not at all clear how common an event this is. Without this information, it is difficult to determine the degree to which individuals need to adjust perception for the individual talker or utterance. This line of research is designed to examine the degree of this variability among a set of talkers in their speech production, and the influence that this variability has on listeners' perceptions.