Journal of the Acoustical Society of America

Dec 2008

Volume 124, Issue 6, pp. 3351-EL365


Speaker normalization using cortical strip maps: A neural model for steady-state vowel categorization

Heather Ames and Stephen Grossberg

J. Acoust. Soc. Am. Volume 124, Issue 6, pp. 3918-3936 (2008); (19 pages) | Cited 1 time

Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H. L., J. Acoust. Soc. Am. 24, 175–184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
PACS:
43.71.An Models and theories of speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.72.Bs Neural networks for speech recognition

The concept of signal-to-noise ratio in the modulation domain and speech intelligibility

Finn Dubbelboer and Tammo Houtgast

J. Acoust. Soc. Am. Volume 124, Issue 6, pp. 3937-3946 (2008); (10 pages) | Cited 6 times

A new concept is proposed that relates to intelligibility of speech in noise. The concept combines traditional estimations of signal-to-noise ratios (S/N) with elements from the modulation transfer function model, which results in the definition of the signal-to-noise ratio in the modulation domain: the (S/N)mod. It is argued that this (S/N)mod, quantifying the strength of speech modulations relative to a floor of spurious modulations arising from the speech-noise interaction, is the key factor in relation to speech intelligibility. It is shown that, by using a specific test signal, the strength of these spurious modulations can be measured, allowing an estimation of the (S/N)mod for various conditions of additive noise, noise suppression, and amplitude compression. By relating these results to intelligibility data for these same conditions, the relevance of the (S/N)mod as the key factor underlying speech intelligibility is clearly illustrated. For instance, it is shown that the commonly observed limited effect of noise suppression on speech intelligibility is correctly “predicted” by the (S/N)mod, whereas traditional measures such as the speech transmission index, considering only the changes in the speech modulations, fall short in this respect. It is argued that (S/N)mod may provide a relevant tool in the design of successful noise-suppression systems.
PACS:
43.71.An Models and theories of speech perception
43.72.Dv Speech-noise interaction
43.66.Mk Temporal and sequential aspects of hearing; auditory grouping in relation to music

The contribution of obstruent consonants and acoustic landmarks to speech recognition in noise

Ning Li and Philipos C. Loizou

J. Acoust. Soc. Am. Volume 124, Issue 6, pp. 3947-3958 (2008); (12 pages) | Cited 11 times

The obstruent consonants (e.g., stops) are more susceptible to noise than vowels, raising the question whether the degradation of speech intelligibility in noise can be attributed, at least partially, to the loss of information carried by obstruent consonants. Experiment 1 assesses the contribution of obstruent consonants to speech recognition in noise by presenting sentences containing clean obstruent consonants but noise-corrupted voiced sounds (e.g., vowels). Results indicated substantial (threefold) improvement in speech recognition, particularly at low signal-to-noise ratio levels (−5 dB). Experiment 2 assessed the importance of providing partial information, within a frequency region, of the obstruent-consonant spectra while leaving the remaining spectral region unaltered (i.e., noise corrupted). Access to the low-frequency (0–1000 Hz) region of the clean obstruent-consonant spectra was found to be sufficient to realize significant improvements in performance and that was attributed to improvement in transmission of voicing information. The outcomes from the two experiments suggest that much of the improvement in performance must be due to the enhanced access to acoustic landmarks, evident in spectral discontinuities signaling the onsets of obstruent consonants. These landmarks, often blurred in noisy conditions, are critically important for understanding speech in noise for better determination of the syllable structure and word boundaries.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Gv Measures of speech perception (intelligibility and quality)

The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners

Maria V. Kondaurova and Alexander L. Francis

J. Acoust. Soc. Am. Volume 124, Issue 6, pp. 3959-3971 (2008); (13 pages) | Cited 2 times

Two studies explored the role of native language use of an acoustic cue, vowel duration, in both native and non-native contexts in order to test the hypothesis that non-native listeners’ reliance on vowel duration instead of vowel quality to distinguish the English tense/lax vowel contrast could be explained by the role of duration as a cue in native phonological contrasts. In the first experiment, native Russian, Spanish, and American English listeners identified stimuli from a beat/bit continuum varying in nine perceptually equal spectral and duration steps. English listeners relied predominantly on spectrum, but showed some reliance on duration. Russian and Spanish speakers relied entirely on duration. In the second experiment, three tests examined listeners’ use of vowel duration in native contrasts. Duration was equally important for the perception of lexical stress for all three groups. However, English listeners relied more on duration as a cue to postvocalic consonant voicing than did native Spanish or Russian listeners, and Spanish listeners relied on duration more than did Russian listeners. Results suggest that, although allophonic experience may contribute to cross-language perceptual patterns, other factors such as the application of statistical learning mechanisms and the influence of language-independent psychoacoustic proclivities cannot be ruled out.
PACS:
43.71.Hw Cross-language perception of speech
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.An Models and theories of speech perception

Recognition of spectrally degraded phonemes by younger, middle-aged, and older normal-hearing listeners

Kara C. Schvartz, Monita Chatterjee, and Sandra Gordon-Salant

J. Acoust. Soc. Am. Volume 124, Issue 6, pp. 3972-3988 (2008); (17 pages) | Cited 2 times

The effects of spectral degradation on vowel and consonant recognition abilities were measured in young, middle-aged, and older normal-hearing (NH) listeners. Noise-band vocoding techniques were used to manipulate the number of spectral channels and frequency-to-place alignment, thereby simulating cochlear implant (CI) processing. A brief cognitive test battery was also administered. The performance of younger NH listeners exceeded that of the middle-aged and older listeners, when stimuli were severely distorted (spectrally shifted); the older listeners performed only slightly worse than the middle-aged listeners. Significant intragroup variability was present in the middle-aged and older groups. A hierarchical multiple-regression analysis including data from all three age groups suggested that age was the primary factor related to shifted vowel recognition performance, but verbal memory abilities also contributed significantly to performance. A second regression analysis (within the middle-aged and older groups alone) revealed that verbal memory and speed of processing abilities were better predictors of performance than age alone. The overall results from the current investigation suggested that both chronological age and cognitive capacities contributed to the ability to recognize spectrally degraded phonemes. Such findings have important implications for the counseling and rehabilitation of adult CI recipients.
PACS:
43.71.Lz Speech perception by the aging
43.66.Ts Auditory prostheses, hearing aids
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Sr Deafness, audiometry, aging effects