Journal of the Acoustical Society of America

Aug 2005

Volume 118, Issue 2, pp. 555-1220

Speaker recognition with temporal cues in acoustic and electric hearing

Michael Vongphoe and Fan-Gang Zeng

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1055-1061 (2005); (7 pages) | Cited 20 times

Online Publication Date: 04 Aug 2005

Natural spoken language processing includes not only speech recognition but also identification of the speaker’s gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
PACS codes:
43.71.Bp Perception of voice and talker characteristics
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.72.Fx Talker identification and adaptation algorithms
43.66.Fe Discrimination: intensity and frequency

Evaluating models of vowel perception

Michelle R. Molis

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1062-1071 (2005); (10 pages) | Cited 6 times

Online Publication Date: 04 Aug 2005

There is a long-standing debate concerning the efficacy of formant-based versus whole spectrum models of vowel perception. Categorization data for a set of synthetic steady-state vowels were used to evaluate both types of models. The models tested included various combinations of formant frequencies and amplitudes, principal components derived from excitation patterns, and perceptually scaled LPC cepstral coefficients. The stimuli were 54 five-formant synthesized vowels that had a common F1 frequency and varied orthogonally in F2 and F3 frequency. Twelve speakers of American English categorized the stimuli as the vowels ∕ɪ∕, ∕ʊ∕, or ∕ɝ∕. Results indicate that formant frequencies provided the best account of the data only if nonlinear terms, in the form of squares and cross products of the formant values, were also included in the analysis. The excitation pattern principal components also produced reasonably accurate fits to the data. Although a wish to use the lowest-dimensional representation would dictate that formant frequencies are the most appropriate vowel description, the relative success of richer, more flexible, and more neurophysiologically plausible whole spectrum representations suggests that they may be preferred for understanding human vowel perception.
PACS codes:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.An Models and theories of speech perception
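
The abstract above reports that formant frequencies fit the categorization data best only when squares and cross products of the formant values were included. A minimal sketch of that kind of feature expansion follows, assuming only the two varying formants (F2 and F3) are used as predictors. The function name `quadratic_formant_terms` and the example values are hypothetical, and the abstract does not say which statistical model was fitted to these terms, so none is shown.

```python
def quadratic_formant_terms(f2, f3):
    """Expand two formant values into linear, squared, and cross-product terms.

    The units (Hz, Bark, ...) are left to the caller; the abstract does not
    specify the scale used in the analysis.
    """
    return [
        f2,        # linear F2 term
        f3,        # linear F3 term
        f2 * f2,   # squared F2 term
        f3 * f3,   # squared F3 term
        f2 * f3,   # F2 x F3 cross product
    ]


if __name__ == "__main__":
    # Hypothetical stimulus with F2 = 1500 and F3 = 2500 (arbitrary units).
    print(quadratic_formant_terms(1500.0, 2500.0))
```

A purely linear formant model would use only the first two entries; the remaining three are the nonlinear terms the abstract refers to.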

Age-related differences in weighting and masking of two cues to word-final stop voicing in noise

Susan Nittrouer

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1072-1088 (2005); (17 pages) | Cited 5 times

Online Publication Date: 04 Aug 2005

Because laboratory studies are conducted in optimal listening conditions, often with highly stylized stimuli that attenuate or eliminate some naturally occurring cues, results may have constrained applicability to the “real world.” Such studies show that English-speaking adults weight vocalic duration greatly and formant offsets slightly in voicing decisions for word-final obstruents. Using more natural stimuli, Nittrouer [J. Acoust. Soc. Am. 115, 1777–1790 (2004)] found different results, raising questions about what would happen if experimental conditions were even more like the real world. In this study noise was used to simulate the real world. Edited natural words with voiced and voiceless final stops were presented in quiet and noise to adults and children (4 to 8 years) for labeling. Hypotheses tested were (1) Adults (and perhaps older children) would weight vocalic duration more in noise than in quiet; (2) Previously reported age-related differences in cue weighting might not be found in this real-world simulation; and (3) Children would experience greater masking than adults. Results showed: (1) no increase for any age listeners in the weighting of vocalic duration in noise; (2) age-related differences in the weighting of cues in both quiet and noise; and (3) masking effects for all listeners, but more so for children than adults.
PACS codes:
43.71.Ft Development of speech perception
43.71.An Models and theories of speech perception

Decline of speech understanding and auditory thresholds in the elderly

Pierre L. Divenyi, Philip B. Stark, and Kara M. Haupt

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1089-1100 (2005); (12 pages) | Cited 14 times

Online Publication Date: 04 Aug 2005

A group of 29 elderly subjects between 60.0 and 83.7 years of age at the beginning of the study, and whose hearing loss was not greater than moderate, was tested twice, an average of 5.27 years apart. The tests measured pure-tone thresholds, word recognition in quiet, and understanding of speech with various types of distortion (low-pass filtering, time compression) or interference (single speaker, babble noise, reverberation). Performance declined consistently and significantly between the two testing phases. In addition, the variability of speech understanding measures increased significantly between testing phases, though the variability of audiometric measurements did not. A right-ear superiority was observed but this lateral asymmetry did not increase between testing phases. Comparison of the elderly subjects with a group of young subjects with normal hearing shows that the decline of speech understanding measures accelerated significantly relative to the decline in audiometric measures in the seventh to ninth decades of life. On the assumption that speech understanding depends linearly on age and audiometric variables, there is evidence that this linear relationship changes with age, suggesting that not only the accuracy but also the nature of speech understanding evolves with age.
PACS codes:
43.71.Lz Speech perception by the aging
43.71.Qr Neurophysiology of speech perception
43.71.Ky Speech perception by the hearing impaired
43.71.An Models and theories of speech perception

Spectral peak resolution and speech recognition in quiet: Normal hearing, hearing impaired, and cochlear implant listeners

Belinda A. Henry, Christopher W. Turner, and Amy Behrens

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1111-1121 (2005); (11 pages) | Cited 41 times

Online Publication Date: 04 Aug 2005

Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
PACS codes:
43.71.Ky Speech perception by the hearing impaired
43.66.Ts Auditory prostheses, hearing aids
43.66.Sr Deafness, audiometry, aging effects
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
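
The ripple-discrimination task described in the abstract above compares two noise spectra whose log-spaced peaks and valleys are interchanged. The sketch below illustrates one way such a pair of spectral envelopes could be described. It assumes a sinusoidal envelope in dB along a log2 frequency axis, a 30 dB peak-to-valley depth, and a 100 Hz reference frequency; none of these values comes from the abstract, and the function name `ripple_gain_db` is hypothetical.

```python
import math


def ripple_gain_db(freq_hz, ripples_per_octave, depth_db=30.0,
                   f_low_hz=100.0, inverted=False):
    """Gain in dB of a log-spaced spectral ripple at freq_hz.

    The envelope is sinusoidal on a log2 frequency axis (assumed shape).
    Setting inverted=True shifts the phase by pi, so every peak becomes
    a valley and every valley becomes a peak.
    """
    octaves_above_ref = math.log2(freq_hz / f_low_hz)
    phase = math.pi if inverted else 0.0
    angle = 2.0 * math.pi * ripples_per_octave * octaves_above_ref + phase
    return (depth_db / 2.0) * math.sin(angle)


if __name__ == "__main__":
    # Compare standard and inverted envelopes at a few frequencies
    # for a ripple density of 2 ripples/octave.
    for freq in (125.0, 250.0, 500.0, 1000.0, 2000.0):
        standard = ripple_gain_db(freq, 2.0)
        flipped = ripple_gain_db(freq, 2.0, inverted=True)
        print(f"{freq:7.1f} Hz  standard {standard:+6.2f} dB  inverted {flipped:+6.2f} dB")
```

As `ripples_per_octave` increases, peaks and valleys sit closer together in frequency, which is why the highest density at which a listener can still tell the two envelopes apart serves as a measure of spectral peak resolution.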

Using auditory-visual speech to probe the basis of noise-impaired consonant–vowel perception in dyslexia and auditory neuropathy

Joshua Ramirez and Virginia Mann

J. Acoust. Soc. Am. Volume 118, Issue 2, pp. 1122-1133 (2005); (12 pages) | Cited 5 times

Online Publication Date: 04 Aug 2005

Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant–vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
PACS codes:
43.71.-k Speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Ky Speech perception by the hearing impaired
43.71.Rt Sensory mechanisms in speech perception