Journal of the Acoustical Society of America

Jul 2011

Volume 130, Issue 1, pp. EL1-641

The mutual roles of temporal glimpsing and vocal characteristics in cocktail-party listening

Martin D. Vestergaard, Nicholas R. C. Fyson, and Roy D. Patterson

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 429-439 (2011); (11 pages) | Cited 3 times

At a cocktail party, listeners must attend selectively to a target speaker and segregate their speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114−1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech, and this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.
PACS:
43.71.An Models and theories of speech perception
43.71.Bp Perception of voice and talker characteristics
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Ba Models and theories of auditory processes

Laminar cortical dynamics of conscious speech perception: Neural model of phonemic restoration using subsequent context in noise

Stephen Grossberg and Sohrob Kazerounian

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 440-460 (2011); (21 pages) | Cited 3 times

How are laminar circuits of neocortex organized to generate conscious speech and language percepts? How does the brain restore information that is occluded by noise, or absent from an acoustic signal, by integrating contextual information over many milliseconds to disambiguate noise-occluded acoustical signals? How are speech and language heard in the correct temporal order, despite the influence of contexts that may occur many milliseconds before or after each perceived word? A neural model describes key mechanisms in forming conscious speech percepts, and quantitatively simulates a critical example of contextual disambiguation of speech and language; namely, phonemic restoration. Here, a phoneme deleted from a speech stream is perceptually restored when it is replaced by broadband noise, even when the disambiguating context occurs after the phoneme was presented. The model describes how the laminar circuits within a hierarchy of cortical processing stages may interact to generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks. Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is heard depends upon future context.
PACS:
43.71.An Models and theories of speech perception
43.71.Rt Sensory mechanisms in speech perception
43.66.Ba Models and theories of auditory processes
43.71.Sy Spoken language processing by humans

Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design

Tyler K. Perrachione, Jiyeon Lee, Louisa Y. Y. Ha, and Patrick C. M. Wong

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 461-472 (2011); (12 pages) | Cited 1 time

Studies evaluating phonological contrast learning typically investigate either the predictiveness of specific pretraining aptitude measures or the efficacy of different instructional paradigms. However, little research considers how these factors interact—whether different students learn better from different types of instruction—and what the psychological basis for any interaction might be. The present study demonstrates that successfully learning a foreign-language phonological contrast for pitch depends on an interaction between individual differences in perceptual abilities and the design of the training paradigm. Training from stimuli with high acoustic-phonetic variability is generally thought to improve learning; however, we found high-variability training enhanced learning only for individuals with strong perceptual abilities. Learners with weaker perceptual abilities were actually impaired by high-variability training relative to a low-variability condition. A second experiment assessing variations on the high-variability training design determined that the property of this learning environment most detrimental to perceptually weak learners is the amount of trial-by-trial variability. Learners’ perceptual limitations can thus override the benefits of high-variability training where trial-by-trial variability in other irrelevant acoustic-phonetic features obfuscates access to the target feature. These results demonstrate the importance of considering individual differences in pretraining aptitudes when evaluating the efficacy of any speech training paradigm.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Hg Pitch
43.71.Ft Development of speech perception
43.71.Hw Cross-language perception of speech

Effects of spectral smearing and temporal fine-structure distortion on the fluctuating-masker benefit for speech at a fixed signal-to-noise ratio

Joshua G. W. Bernstein and Douglas S. Brungart

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 473-488 (2011); (16 pages) | Cited 12 times

Normal-hearing listeners receive less benefit from momentary dips in the level of a fluctuating masker for speech processed to degrade spectral detail or temporal fine structure (TFS) than for unprocessed speech. This has been interpreted as evidence that the magnitude of the fluctuating-masker benefit (FMB) reflects the ability to resolve spectral detail and TFS. However, the FMB for degraded speech is typically measured at a higher signal-to-noise ratio (SNR) to yield performance similar to normal speech for the baseline (stationary-noise) condition. Because the FMB decreases with increasing SNR, this SNR difference might account for the reduction in FMB for degraded speech. In this study, the FMB for unprocessed and processed (TFS-removed or spectrally smeared) speech was measured in a paradigm that adjusts word-set size, rather than SNR, to equate stationary-noise performance across processing conditions. Compared at the same SNR and percent-correct level (but with different set sizes), processed and unprocessed stimuli yielded a similar FMB for four different fluctuating maskers (speech-modulated noise, one opposite-gender interfering talker, two same-gender interfering talkers, and 16-Hz interrupted noise). These results suggest that, for these maskers, spectral or TFS distortions do not directly impair the ability to benefit from momentary dips in masker level.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.An Models and theories of speech perception
43.66.Dc Masking
43.71.Ky Speech perception by the hearing impaired

Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: Effect of interruption at the syllabic-rate and periodic-rate of speech

Daniel Fogerty

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 489-500 (2011); (12 pages) | Cited 1 time

Listeners often only have fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in high-frequencies.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
43.66.Ba Models and theories of auditory processes

Benefit of temporal fine structure to speech perception in noise measured with controlled temporal envelopes

Joanne M. Eaves, A. Quentin Summerfield, and Pádraig T. Kitterick

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 501-507 (2011); (7 pages) | Cited 2 times

Previous studies have assessed the importance of temporal fine structure (TFS) for speech perception in noise by comparing the performance of normal-hearing listeners in two conditions. In one condition, the stimuli have useful information in both their temporal envelopes and their TFS. In the other condition, stimuli are vocoded and contain useful information only in their temporal envelopes. However, these studies have confounded differences in TFS with differences in the temporal envelope. The present study manipulated the analytic signal of stimuli to preserve the temporal envelope between conditions with different TFS. The inclusion of informative TFS improved speech-reception thresholds for sentences presented in steady and modulated noise, demonstrating that there are significant benefits of including informative TFS even when the temporal envelope is controlled. It is likely that the results of previous studies largely reflect the benefits of TFS, rather than uncontrolled effects of changes in the temporal envelope.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.Rt Sensory mechanisms in speech perception

Detection thresholds for gaps, overlaps, and no-gap-no-overlaps

Mattias Heldner

J. Acoust. Soc. Am. Volume 130, Issue 1, pp. 508-513 (2011); (6 pages) | Cited 1 time

Detection thresholds for gaps and overlaps, that is acoustic and perceived silences and stretches of overlapping speech in speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
PACS:
43.71.Sy Spoken language processing by humans