Journal of the Acoustical Society of America

Apr 2005

Volume 117, Issue 4, pp. 1675-2625

A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners

Koenraad S. Rhebergen and Niek J. Versfeld

J. Acoust. Soc. Am. Volume 117, Issue 4, pp. 2181-2192 (2005); (12 pages) | Cited 20 times

Online Publication Date: 08 Apr 2005

The SII model in its present form (ANSI S3.5-1997, American National Standards Institute, New York) can accurately describe intelligibility for speech in stationary noise but fails to do so for nonstationary noise maskers. Here, an extension to the SII model is proposed with the aim to predict the speech intelligibility in both stationary and fluctuating noise. The basic principle of the present approach is that both speech and noise signal are partitioned into small time frames. Within each time frame the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. Using speech reception threshold (SRT) data from the literature, the extension to the present SII model can give a good account for SRTs in stationary noise, fluctuating speech noise, interrupted noise, and multiple-talker noise. The predictions for sinusoidally intensity modulated (SIM) noise and real speech or speech-like maskers are better than with the original SII model, but are still not accurate. For the latter type of maskers, informational masking may play a role. © 2005 Acoustical Society of America.
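The frame-wise procedure described in the abstract above (partition speech and noise into short frames, compute an index per frame, then average) can be illustrated with a minimal sketch. This is not the ANSI S3.5-1997 calculation and not the authors' implementation: the per-frame index here is a single-band stand-in that maps the frame SNR, clipped to the range -15 to +15 dB, linearly onto 0 to 1, with no band-importance weighting or masking terms. The function names and the frame length are hypothetical.

```python
import math


def frame_levels_db(signal, frame_len):
    """Split a signal into consecutive non-overlapping frames; return each frame's level in dB."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # Floor the power so that silent frames do not produce log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels


def simplified_frame_index(speech_db, noise_db):
    """Stand-in for the per-frame SII (assumption): clipped SNR mapped linearly to [0, 1]."""
    snr_db = speech_db - noise_db
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)


def averaged_index(speech, noise, frame_len):
    """Average the per-frame index over all frames, as the abstract describes."""
    speech_db = frame_levels_db(speech, frame_len)
    noise_db = frame_levels_db(noise, frame_len)
    values = [simplified_frame_index(s, n) for s, n in zip(speech_db, noise_db)]
    if not values:
        raise ValueError("signals are shorter than one frame")
    return sum(values) / len(values)


# Interrupted noise: present in the first frame, silent in the second.
speech = [1.0] * 8
noise = [1.0] * 4 + [0.0] * 4
result = averaged_index(speech, noise, frame_len=4)
```

In this toy case the first frame has 0 dB SNR and the second frame is noise-free, so the averaged value lies above what the first frame alone would give. That is the intuition behind letting listeners benefit from gaps in a fluctuating masker.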
PACS:
43.71.An Models and theories of speech perception
43.66.Ba Models and theories of auditory processes
43.71.Gv Measures of speech perception (intelligibility and quality)
43.72.Kb Speech communication systems and dialogue systems

Perception of pitch location within a speaker’s F0 range

Douglas N. Honorof and D. H. Whalen

J. Acoust. Soc. Am. Volume 117, Issue 4, pp. 2193-2200 (2005); (8 pages) | Cited 12 times

Online Publication Date: 08 Apr 2005

Fundamental frequency (F0) is used for many purposes in speech, but its linguistic significance is based on its relation to the speaker’s range, not its absolute value. While it may be that listeners can gauge a specific pitch relative to a speaker’s range by recognizing it from experience, whether they can do the same for an unfamiliar voice is an open question. The present experiment explored that question. Twenty native speakers of English (10 male, 10 female) produced the vowel /ɑ/ with a spoken (not sung) voice quality at varying pitches within their own ranges. Listeners then judged, without familiarization or context, where each isolated F0 lay within each speaker’s range. Correlations were high both for the entire range (0.721) and for the range minus the extremes (0.609). Correlations were somewhat higher when the F0s were related to the range of all the speakers, either separated by sex (0.830) or pooled (0.848), but several factors discussed here may help account for this pattern. Regardless, the present data provide strong support for the hypothesis that listeners are able to locate an F0 reliably within a range without external context or prior exposure to a speaker’s voice. © 2005 Acoustical Society of America.
PACS:
43.71.An Models and theories of speech perception
43.71.Bp Perception of voice and talker characteristics

Perception of aperiodicity in pathological voice

Jody Kreiman and Bruce R. Gerratt

J. Acoust. Soc. Am. Volume 117, Issue 4, pp. 2201-2211 (2005); (11 pages) | Cited 10 times

Online Publication Date: 08 Apr 2005

Although jitter, shimmer, and noise acoustically characterize all voice signals, their perceptual importance in naturally produced pathological voices has not been established psychoacoustically. To determine the role of these attributes in the perception of vocal quality, listeners were asked to adjust levels of jitter, shimmer, and the noise-to-signal ratio in a speech synthesizer, so that synthetic voices matched naturally produced tokens. Results showed that, although listeners agreed well in their judgments of the noise-to-signal ratio, they did not agree with one another in their chosen settings for jitter and shimmer. Noise-dependent differences in listeners’ ability to detect changes in amounts of jitter and shimmer implicate both listener insensitivity and inability to isolate jitter and shimmer as separate dimensions in the overall pattern of aperiodicity in a voice as causes of this poor agreement. These results suggest that jitter and shimmer are not useful as independent indices of perceived vocal quality, apart from their acoustic contributions to the overall pattern of spectrally shaped noise in a voice. © 2005 Acoustical Society of America.
PACS:
43.71.Bp Perception of voice and talker characteristics

Consonant recognition and the articulation index

Jont B. Allen

J. Acoust. Soc. Am. Volume 117, Issue 4, pp. 2212-2223 (2005); (12 pages) | Cited 9 times

Online Publication Date: 08 Apr 2005

The purpose of this paper is to provide insight into how speech is processed by the auditory system, by quantifying the nature of nonsense speech sound confusions. (1) The Miller and Nicely [J. Acoust. Soc. Am. 27(2), 338–352 (1955)] confusion matrix (CM) data are analyzed by plotting the CM elements $S_{i,j}(\mathrm{SNR})$ as a function of the signal-to-noise ratio (SNR). This allows for the robust clustering of perceptual feature (event) groups, not robustly defined by a single CM table, where clusters depend on the sound order. (2) The SNR is then re-expressed as an articulation index (AI), and used as the independent variable. The normalized log scores $\log(1 - S_{i,i}(\mathrm{AI}))$ and $\log(S_{i,j}(\mathrm{AI}))$, $j \neq i$, then become linear functions of AI, on log-error versus AI plots. This linear dependence may be interpreted as an extension of the band-independence model of Fletcher. (3) The model formula for the average score for the finite-alphabet case $P_c(\mathrm{AI}, H) = \sum_{i=1}^{N} S_{i,i}/N$ is then modified to include the effect of entropy $H$. Due to the grouping of sounds with increased SNR (and AI), the sound-group entropy $H_g$ plays a key role in this performance measure. (4) A parametric model for the confusions $S_{i,j}(\mathrm{AI}, H_g)$ is then described, which characterizes the confusions between competing sounds within a group. © 2005 Acoustical Society of America.
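The average-score formula in item (3) of the abstract above is the mean of the diagonal of the confusion matrix. A minimal sketch follows, assuming that rows index the spoken sound, columns index the response, and the raw table holds response counts that are first normalized per row. The function names and the example counts are hypothetical and are not taken from the Miller and Nicely data; the entropy modification mentioned in the abstract is not modeled.

```python
def row_normalize(counts):
    """Convert response counts into per-row proportions S[i][j] (rows assumed to be spoken sounds)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each spoken sound needs at least one response")
        normalized.append([c / total for c in row])
    return normalized


def average_score(S):
    """Average correct score: the sum of diagonal entries S[i][i] divided by N."""
    n = len(S)
    return sum(S[i][i] for i in range(n)) / n


# Hypothetical two-sound table: 10 presentations of each sound.
counts = [[8, 2],
          [1, 9]]
S = row_normalize(counts)
p_c = average_score(S)
```

With these made-up counts the diagonal proportions are 0.8 and 0.9, so the average score is their mean.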
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.An Models and theories of speech perception
43.72.Ne Automatic speech recognition systems

Coherence and the speech intelligibility index

James M. Kates and Kathryn H. Arehart

J. Acoust. Soc. Am. Volume 117, Issue 4, pp. 2224-2237 (2005); (14 pages) | Cited 10 times

Online Publication Date: 08 Apr 2005

The speech intelligibility index (SII) (ANSI S3.5-1997) provides a means for estimating speech intelligibility under conditions of additive stationary noise or bandwidth reduction. The SII concept for estimating intelligibility is extended in this paper to include broadband peak-clipping and center-clipping distortion, with the coherence between the input and output signals used to estimate the noise and distortion effects. The speech intelligibility predictions using the new procedure are compared with intelligibility scores obtained from normal-hearing and hearing-impaired subjects for conditions of additive noise and peak-clipping and center-clipping distortion. The most effective procedure divides the speech signal into low-, mid-, and high-level regions, computes the coherence SII separately for the signal segments in each region, and then estimates intelligibility from a weighted combination of the three coherence SII values. © 2005 Acoustical Society of America.
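The three-level procedure in the last sentence of the abstract above has two bookkeeping steps that can be sketched without the coherence calculation itself: sorting signal segments into low-, mid-, and high-level regions, and forming a weighted combination of the per-region values. In the sketch below the level thresholds, the weights, and all function names are hypothetical, and the per-region coherence SII values are taken as given inputs, not computed. The abstract does not state the thresholds or weights that the authors used.

```python
import math


def level_db(samples):
    """Level of a block of samples in dB, with a floor to avoid log10(0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def split_by_level(frames, mid_floor_db=-10.0):
    """Group frames by level relative to the overall signal level.

    Assumed (hypothetical) rule: at or above the overall level is "high",
    down to mid_floor_db below it is "mid", anything lower is "low".
    """
    overall_db = level_db([x for frame in frames for x in frame])
    regions = {"low": [], "mid": [], "high": []}
    for frame in frames:
        relative_db = level_db(frame) - overall_db
        if relative_db >= 0.0:
            regions["high"].append(frame)
        elif relative_db >= mid_floor_db:
            regions["mid"].append(frame)
        else:
            regions["low"].append(frame)
    return regions


def weighted_combination(region_values, weights):
    """Weighted average of per-region index values (weights are placeholders)."""
    total_weight = sum(weights[name] for name in region_values)
    return sum(weights[name] * region_values[name] for name in region_values) / total_weight


frames = [[1.0, 1.0], [0.5, 0.5], [0.01, 0.01]]
regions = split_by_level(frames)

# Placeholder per-region values standing in for coherence SII results.
region_values = {"low": 0.2, "mid": 0.5, "high": 0.8}
weights = {"low": 1.0, "mid": 1.0, "high": 2.0}
combined = weighted_combination(region_values, weights)
```

Passing the per-region values in as plain numbers keeps the sketch independent of any particular coherence estimator.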
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.72.Dv Speech-noise interaction
43.71.Ky Speech perception by the hearing impaired