Journal of the Acoustical Society of America

Mar 2009

Volume 125, Issue 3, pp. EL85-1845

Does harmonicity explain children’s cue weighting of fricative-vowel syllables?

Susan Nittrouer and Joanna H. Lowenstein

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1679-1692 (2009); (14 pages) | Cited 5 times

When labeling syllable-initial fricatives, children have been found to weight formant transitions more and fricative-noise spectra less than adults, prompting the suggestion that children attend more to the slow vocal-tract movements that create syllabic structure than to the rapid gestures more closely aligned with individual phonetic segments. That explanation fits well with linguistic theories, but an alternative explanation emerges from auditory science: Perhaps children attend to formant transitions because they are found in voiced signal portions, and so formants share a common harmonic structure. This work tested that hypothesis by using two kinds of stimuli lacking harmonicity: sine-wave and whispered speech. Adults and children under 7 years of age were asked to label fricative-vowel syllables in each of those conditions, as well as natural speech. Results showed that children did not change their weighting strategies from those used with natural speech when listening to sine-wave stimuli, but weighted formant transitions less when listening to whispered stimuli. These findings showed that it is not the harmonicity principle that explains children’s preference for formant transitions in phonetic decisions. It is further suggested that children are unable to recover formant structure when those formants are not spectrally prominent and/or are noisy.
PACS:
43.71.An Models and theories of speech perception
43.71.Ft Development of speech perception

Vowel devoicing and the perception of spoken Japanese words

Anne Cutler, Takashi Otake, and James M. McQueen

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1693-1703 (2009); (11 pages) | Cited 3 times

Three experiments, in which Japanese listeners detected Japanese words embedded in nonsense sequences, examined the perceptual consequences of vowel devoicing in that language. Since vowelless sequences disrupt speech segmentation [Norris et al. (1997). Cognit. Psychol. 34, 191–243], devoicing is potentially problematic for perception. Words in initial position in nonsense sequences were detected more easily when followed by a sequence containing a vowel than by a vowelless segment (with or without further context), and vowelless segments that were potential devoicing environments were no easier than those not allowing devoicing. Thus asa, “morning,” was easier in asau or asazu than in all of asap, asapdo, asaf, or asafte, despite the fact that the /f/ in the latter two is a possible realization of fu, with devoiced [u]. Japanese listeners thus do not treat devoicing contexts as if they always contain vowels. Words in final position in nonsense sequences, however, produced a different pattern: here, preceding vowelless contexts allowing devoicing impeded word detection less strongly (so, sake was detected less accurately, but not less rapidly, in nyaksake, possibly arising from nyakusake, than in nyagusake). This is consistent with listeners treating consonant sequences as potential realizations of parts of existing lexical candidates wherever possible.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech

Temporal integration in vowel perception

Andrew B. Wallace and Sheila E. Blumstein

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1704-1711 (2009); (8 pages) | Cited 3 times

Psychoacoustic research suggests that multiple auditory channels process incoming sounds over temporal windows of different durations, resulting in multiple auditory representations being available to higher-level processes. The current experiments investigate the size of the temporal window used in vowel quality perception using an acoustic priming paradigm with nonspeech and speech primes of varying duration. In experiment 1, identification of vowel targets was facilitated by acoustically matched nonspeech primes. The magnitude of this effect was greatest for the shortest (25 and 50 ms) primes, remained level at medium (100 and 150 ms) duration primes, and declined significantly at longer prime durations, suggesting that the auditory stages of vowel quality perception integrate sensory input over a relatively short temporal window. In experiment 2, the same vowel targets were primed by speech stimuli, consisting of vowels using the same duration values as those in experiment 1. A different pattern of results emerged with the greatest priming effects found for primes of around 150 ms and less priming at the shorter and longer durations, indicating that longer-scale temporal processes operate at higher levels of analysis.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Rt Sensory mechanisms in speech perception
43.66.Lj Perceptual effects of sound

Multisyllabic nonwords: More than a string of syllables

Lisa M. D. Archibald, Susan E. Gathercole, and Marc F. Joanisse

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1712-1722 (2009); (11 pages) | Cited 2 times

Nonword repetition is closely associated with the learning of the phonological form of novel words. Several factors influence nonword repetition performance such as short-term memory, phonotactic probability, lexical knowledge, and prosodic factors. The present study examined the influence of list duration, coarticulation, and prosody on nonword repetition by comparing naturally articulated multisyllabic nonwords to multisyllabic nonwords formed by concatenating syllables produced in isolation and serial lists (experiment 1), to multisyllabic forms that incorporated either valid or invalid coarticulatory information (experiment 2), and to multisyllabic forms either with or without common English within-word stress patterns (experiment 3). Results revealed superior recall for naturally articulated nonwords compared to lists of matched duration or sequences with invalid coarticulatory cues. Within-word stress patterns also conveyed a repetition advantage. The findings clearly establish that the coarticulatory and prosodic cues of naturally articulated multisyllabic forms support retention.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Sy Spoken language processing by humans

Modeling the effect of channel number and interaction on consonant recognition in a cochlear implant peak-picking strategy

Carl Verschuur

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1723-1736 (2009); (14 pages) | Cited 1 time

Difficulties in speech recognition experienced by cochlear implant users may be attributed both to information loss caused by signal processing and to information loss associated with the interface between the electrode array and auditory nervous system, including cross-channel interaction. The objective of the work reported here was to attempt to partial out the relative contribution of these different factors to consonant recognition. This was achieved by comparing patterns of consonant feature recognition as a function of channel number and presence/absence of background noise in users of the Nucleus 24 device with normal hearing subjects listening to acoustic models that mimicked processing of that device. Additionally, in the acoustic model experiment, a simulation of cross-channel spread of excitation, or “channel interaction,” was varied. Results showed that acoustic model experiments were highly correlated with patterns of performance in better-performing cochlear implant users. Deficits to consonant recognition in this subgroup could be attributed to cochlear implant processing, whereas channel interaction played a much smaller role in determining performance errors. The study also showed that large changes to channel number in the Advanced Combination Encoder signal processing strategy led to no substantial changes in performance.
PACS:
43.71.Ky Speech perception by the hearing impaired
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech

The neural processing of masked speech: Evidence for different mechanisms in the left and right temporal lobes

Sophie K. Scott, Stuart Rosen, C. Philip Beaman, Josh P. Davis, and Richard J. S. Wise

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1737-1743 (2009); (7 pages) | Cited 8 times

It has been previously demonstrated that extensive activation in the dorsolateral temporal lobes is associated with masking a speech target with a speech masker, consistent with the hypothesis that competition for central auditory processes is an important factor in informational masking. Here, masking from speech and two additional maskers derived from the original speech were investigated. One of these is spectrally rotated speech, which is unintelligible and has a similar (inverted) spectrotemporal profile to speech. The authors also controlled for the possibility of “glimpsing” of the target signal during modulated masking sounds by using speech-modulated noise as a masker in a baseline condition. Functional imaging results reveal that masking speech with speech leads to bilateral superior temporal gyrus (STG) activation relative to a speech-in-noise baseline, while masking speech with spectrally rotated speech leads solely to right STG activation relative to the baseline. This result is discussed in terms of hemispheric asymmetries for speech perception, and interpreted as showing that masking effects can arise through two parallel neural systems, in the left and right temporal lobes. This has implications for the competition for resources caused by speech and rotated speech maskers, and may illuminate some of the mechanisms involved in informational masking.
PACS:
43.71.Rt Sensory mechanisms in speech perception
43.71.Qr Neurophysiology of speech perception

Multisensory integration enhances phonemic restoration

Antoine J. Shahin and Lee M. Miller

J. Acoust. Soc. Am. Volume 125, Issue 3, pp. 1744-1750 (2009); (7 pages) | Cited 3 times

Phonemic restoration occurs when speech is perceived to be continuous through noisy interruptions, even when the speech signal is artificially removed from the interrupted epochs. This temporal filling-in illusion helps maintain robust comprehension in adverse environments and illustrates how contextual knowledge through the auditory modality (e.g., lexical) can improve perception. This study investigated how one important form of context, visual speech, affects phonemic restoration. The hypothesis was that audio-visual integration of speech should improve phonemic restoration, allowing the perceived continuity to span longer temporal gaps. Subjects listened to tri-syllabic words with a portion of each word replaced by white noise while watching lip-movement that was either congruent, temporally reversed (incongruent), or static. For each word, subjects judged whether the utterance sounded continuous or interrupted, where a “continuous” response indicated an illusory percept. Results showed that illusory filling-in of longer white noise durations (longer missing segments) occurred when the mouth movement was congruent with the spoken word compared to the other conditions, with no differences occurring between the static and incongruent conditions. Thus, phonemic restoration is enhanced when applying contextual knowledge through multisensory integration.
PACS:
43.71.Sy Spoken language processing by humans
43.71.Rt Sensory mechanisms in speech perception