Journal of the Acoustical Society of America

Nov 2009

Volume 126, Issue 5, pp. EL107-2839

Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model

Tim Jürgens and Thomas Brand

J. Acoust. Soc. Am. Volume 126, Issue 5, pp. 2635-2648 (2009); (14 pages) | Cited 5 times

Online Publication Date: 05 Nov 2009

Abstract:
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. “Microscopic” has a twofold meaning for this model. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703–1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model’s a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training and testing of the recognizer. The best model performance is obtained with distance measures that focus mainly on small perceptual distances and neglect outliers.
PACS:
43.71.An Models and theories of speech perception
43.66.Ba Models and theories of auditory processes
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.72.Dv Speech-noise interaction
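The Jürgens and Brand abstract above mentions a simple dynamic-time-warp speech recognizer. As a rough illustration only, the sketch below shows a textbook dynamic time warping distance and a nearest-template classifier. It is not the authors' model: the scalar feature sequences, the absolute-difference local distance, and the names `dtw_distance` and `recognize` are all assumptions made for this example.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Textbook dynamic-time-warp distance between two sequences.

    `dist` is the local distance between two frames; the absolute
    difference used here is an illustrative assumption, not the
    perceptual distance measure studied in the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat a frame of b
                cost[i][j - 1],      # repeat a frame of a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def recognize(test, templates):
    """Return the label of the template closest to `test` under DTW."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))


# Hypothetical one-dimensional "feature tracks" for two made-up labels.
templates = {"a": [1, 2, 3], "b": [3, 2, 1]}
print(recognize([1, 2, 3, 3], templates))
```

In a closed-set test of the kind the abstract describes, the template dictionary would hold one entry per response alternative; the data here are invented purely to exercise the functions.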

Perceptual learning of time-compressed and natural fast speech

Patti Adank and Esther Janse

J. Acoust. Soc. Am. Volume 126, Issue 5, pp. 2649-2659 (2009); (11 pages) | Cited 2 times

Online Publication Date: 05 Nov 2009

Abstract:
Speakers vary their speech rate considerably during a conversation, and listeners are able to quickly adapt to these variations in speech rate. Adaptation to fast speech rates is usually measured using artificially time-compressed speech. This study examined adaptation to two types of fast speech: artificially time-compressed speech and natural fast speech. Listeners performed a speeded sentence verification task on three series of sentences: normal-speed sentences, time-compressed sentences, and natural fast sentences. Listeners were divided into two groups to evaluate the possibility of transfer of learning between the time-compressed and natural fast conditions. The first group verified the natural fast before the time-compressed sentences, while the second verified the time-compressed before the natural fast sentences. The results showed transfer of learning when the time-compressed sentences preceded the natural fast sentences, but not when natural fast sentences preceded the time-compressed sentences. In addition, listeners showed adaptation to the natural fast sentences, but performance for this type of fast speech did not improve to the level of time-compressed sentences. The results are discussed in the framework of theories on perceptual learning.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Bp Perception of voice and talker characteristics
43.71.Gv Measures of speech perception (intelligibility and quality)

Perceptual adaptation and intelligibility of multiple talkers for two types of degraded speech

Tessa Bent, Adam Buchwald, and David B. Pisoni

J. Acoust. Soc. Am. Volume 126, Issue 5, pp. 2660-2669 (2009); (10 pages) | Cited 5 times

Online Publication Date: 05 Nov 2009

Abstract:
Talker intelligibility and perceptual adaptation under cochlear implant (CI)-simulation and speech in multi-talker babble were compared. The stimuli consisted of 100 sentences produced by 20 native English talkers. The sentences were processed to simulate listening with an eight-channel CI or were mixed with multi-talker babble. Stimuli were presented to 400 listeners in a sentence transcription task (200 listeners in each condition). Perceptual adaptation was measured for each talker by comparing intelligibility in the first 20 sentences of the experiment to intelligibility in the last 20 sentences. Perceptual adaptation patterns were also compared across the two degradation conditions by comparing performance in blocks of ten sentences. The most intelligible talkers under CI-simulation also tended to be the most intelligible talkers in multi-talker babble. Furthermore, listeners demonstrated a greater degree of perceptual adaptation in the CI-simulation condition compared to the multi-talker babble condition although the extent of adaptation varied widely across talkers. Listeners reached asymptote later in the experiment in the CI-simulation condition compared with the multi-talker babble condition. Overall, these two forms of degradation did not differ in their effect on talker intelligibility, although they did result in differences in the amount and time-course of perceptual adaptation.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
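The Bent, Buchwald, and Pisoni abstract above describes measuring perceptual adaptation by comparing intelligibility in the first 20 sentences with intelligibility in the last 20. A minimal sketch of one way to express that comparison follows. Taking the simple difference of the two window means is an assumption (the abstract does not state the exact statistic), and the function name and the per-sentence scores are hypothetical.

```python
def adaptation_gain(scores, window=20):
    """Mean score over the last `window` sentences minus the mean over
    the first `window` sentences.

    `scores` holds one intelligibility value per sentence (for example,
    the proportion of words transcribed correctly), in presentation order.
    Using a plain difference of means is an assumption of this sketch.
    """
    if len(scores) < 2 * window:
        raise ValueError("need at least two non-overlapping windows")
    first = sum(scores[:window]) / window
    last = sum(scores[-window:]) / window
    return last - first


# Invented scores for a 100-sentence session, for illustration only.
session = [0.5] * 20 + [0.6] * 60 + [0.8] * 20
print(round(adaptation_gain(session), 3))
```

A positive value indicates higher intelligibility at the end of the session than at the start; the same function could be applied per talker or per condition.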

On the assimilation-discrimination relationship in American English adults’ French vowel learning

Erika S. Levy

J. Acoust. Soc. Am. Volume 126, Issue 5, pp. 2670-2682 (2009); (13 pages) | Cited 5 times

Online Publication Date: 05 Nov 2009

Abstract:
A quantitative “cross-language assimilation overlap” method for testing predictions of the Perceptual Assimilation Model (PAM) was implemented to compare results of a discrimination experiment with the listeners’ previously reported assimilation data. The experiment examined discrimination of Parisian French (PF) front rounded vowels /y/ and /œ/. Three groups of American English listeners differing in their French experience (no experience [NoExp], formal experience [ModExp], and extensive formal-plus-immersion experience [HiExp]) performed discrimination of PF /y-u/, /y-o/, /œ-o/, /œ-u/, /y-i/, /y-ɛ/, /œ-ɛ/, /œ-i/, /y-œ/, /u-i/, and /a-ɛ/. Vowels were in bilabial /rabVp/ and alveolar /radVt/ contexts. More errors were found for PF front vs back rounded vowel pairs (16%) than for PF front unrounded vs rounded pairs (2%). Overall, ModExp listeners did not perform more accurately (11% errors) than NoExp listeners (13% errors). Extensive immersion experience, however, was associated with fewer errors (3%) than formal experience alone, although discrimination of PF /y-u/ remained relatively poor (12% errors) for HiExp listeners. More errors occurred on pairs involving front vs back rounded vowels in alveolar context (20% errors) than in bilabial (11% errors). Significant correlations were revealed between listeners’ assimilation overlap scores and their discrimination errors, suggesting that the PAM may be extended to second-language (L2) vowel learning.
PACS:
43.71.Hw Cross-language perception of speech
43.71.An Models and theories of speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Gv Measures of speech perception (intelligibility and quality)

Consonant recognition loss in hearing impaired listeners

Sandeep A. Phatak, Yang-soo Yoon, David M. Gooler, and Jont B. Allen

J. Acoust. Soc. Am. Volume 126, Issue 5, pp. 2683-2694 (2009); (12 pages) | Cited 5 times

Online Publication Date: 05 Nov 2009

Abstract:
This paper presents a compact graphical method for comparing the performance of individual hearing impaired (HI) listeners with that of an average normal hearing (NH) listener on a consonant-by-consonant basis. This representation, named the consonant loss profile (CLP), characterizes the effect of a listener’s hearing loss on each consonant over a range of performance. The CLP shows that the consonant loss, which is the signal-to-noise ratio (SNR) difference at equal NH and HI scores, is consonant-dependent and varies with the score. This variation in the consonant loss reveals that hearing loss renders some consonants unintelligible, while it reduces noise-robustness of some other consonants. The conventional SNR-loss metric ΔSNR50, defined as the SNR difference at 50% recognition score, is insufficient to capture this variation. The ΔSNR50 value is on average 12 dB lower when measured with sentences using standard clinical procedures than when measured with nonsense syllables. A listener with symmetric hearing loss may not have identical CLPs for both ears. Some consonant confusions by HI listeners are influenced by the high-frequency hearing loss even at a presentation level as high as 85 dB sound pressure level.
PACS:
43.71.Ky Speech perception by the hearing impaired
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
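The Phatak et al. abstract above defines consonant loss as the SNR difference at equal NH and HI scores, and ΔSNR50 as that difference at a 50% recognition score. The sketch below illustrates the definition by linearly interpolating two score-versus-SNR curves. Linear interpolation, the function names, and the example data are assumptions made here; the paper's actual fitting procedure is not given in the abstract.

```python
def snr_at_score(snrs, scores, target=0.5):
    """SNR (dB) at which a score curve first reaches `target`.

    Assumes `snrs` is ascending and `scores` is non-decreasing, and
    interpolates linearly between measured points (an assumption of
    this sketch).
    """
    points = list(zip(snrs, scores))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("target score is not bracketed by the data")


def snr_loss(snrs, nh_scores, hi_scores, target=0.5):
    """SNR difference (HI minus NH) at an equal score; target=0.5
    corresponds to the abstract's definition of delta-SNR50."""
    return snr_at_score(snrs, hi_scores, target) - snr_at_score(snrs, nh_scores, target)


# Invented score curves, for illustration only.
snrs = [-12, -6, 0, 6, 12]
nh = [0.1, 0.4, 0.6, 0.9, 1.0]
hi = [0.0, 0.1, 0.3, 0.5, 0.8]
print(round(snr_loss(snrs, nh, hi), 2))
```

Evaluating `snr_loss` at several target scores, one consonant at a time, would give a curve in the spirit of the consonant loss profile the abstract describes, since the loss is reported to vary with the score.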