Journal of the Acoustical Society of America

Aug 2008

Volume 124, Issue 2, pp. 689-EL61

Consonant confusions in white noise

Sandeep A. Phatak, Andrew Lovitt, and Jont B. Allen

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1220-1233 (2008); (14 pages) | Cited 7 times

The classic [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated by using computerized procedures, similar to those of Phatak and Allen (2007) [“Consonant and vowel confusions in speech-weighted noise,” J. Acoust. Soc. Am. 121, 2312–2316]. The consonant scores in white noise can be categorized into three sets: low-error set {/m/, /n/}, average-error set {/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/}, and high-error set {/f/, /θ/, /b/, /v/, /ð/}. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where confusion threshold [i.e., signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.
PACS:
43.71.An Models and theories of speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Dc Masking
43.72.Dv Speech-noise interaction

Cue-specific effects of categorization training on the relative weighting of acoustic cues to consonant voicing in English

Alexander L. Francis, Natalya Kaganovich, and Courtney Driscoll-Huber

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1234-1251 (2008); (18 pages) | Cited 3 times

In English, voiced and voiceless syllable-initial stop consonants differ in both fundamental frequency at the onset of voicing (onset F0) and voice onset time (VOT). Although both correlates, alone, can cue the voicing contrast, listeners weight VOT more heavily when both are available. Such differential weighting may arise from differences in the perceptual distance between voicing categories along the VOT versus onset F0 dimensions, or it may arise from a bias to pay more attention to VOT than to onset F0. The present experiment examines listeners’ use of these two cues when classifying stimuli in which perceptual distance was artificially equated along the two dimensions. Listeners were also trained to categorize stimuli based on one cue at the expense of another. Equating perceptual distance eliminated the expected bias toward VOT before training, but successfully learning to base decisions more on VOT and less on onset F0 was easier than vice versa. Perceptual distance along both dimensions increased for both groups after training, but only VOT-trained listeners showed a decrease in Garner interference. Results lend qualified support to an attentional model of phonetic learning in which learning involves strategic redeployment of selective attention across integral acoustic cues.
PACS:
43.71.An Models and theories of speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Rt Sensory mechanisms in speech perception

Coding of intonational meanings beyond F0: Evidence from utterance-final /t/ aspiration in German

Oliver Niebuhr

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1252-1263 (2008); (12 pages)

An acoustic analysis of a German read-speech corpus showed that utterance-final /t/ aspirations differ systematically depending on the accompanying nuclear accent contour. Two contours were included: Terminal-falling early and late F0 peaks in terms of the Kiel Intonation Model. They correspond to H+L*L−% and L*+HL−% within the autosegmental metrical (AM) model. Aspirations in early-peak contexts were characterized by (a) “short”, (b) “high-intensity” noise with (c) “low” frequency values for the spectral energy maximum above the lower spectral energy boundary. The opposite holds for aspirations accompanying late-peak productions. Starting from the acoustic analysis, a perception experiment was performed using a variant of the semantic differential paradigm. The stimuli were varied in the duration and intensity pattern as well as the spectral energy pattern of the final /t/ aspiration. Results revealed that the different noise patterns found in connection with early and late peak productions were able to change the attitudinal meaning of the stimuli toward the meaning profile of the respective F0 peak category. This suggests that final aspirations can be part of the coding of meanings, so far solely associated with intonation contours. Hence, the traditionally separated segmental and suprasegmental coding levels seem to be more intertwined than previously thought.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.70.Fq Acoustical correlates of phonetic segments and suprasegmental properties: stress, timing, and intonation

Consonant identification in noise by native and non-native listeners: Effects of local context

Anne Cutler, Maria Luisa Garcia Lecumberri, and Martin Cooke

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1264-1268 (2008); (5 pages) | Cited 6 times

Speech recognition in noise is harder in a second language (L2) than in a first language (L1). This could be because noise disrupts speech processing more in L2 than in L1, or because L1 listeners recover better even though the disruption is equivalent. Two similar prior studies produced discrepant results: equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the discrepancy in results.
PACS:
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Hw Cross-language perception of speech
43.71.Sy Spoken language processing by humans

The combined effects of reverberation and nonstationary noise on sentence intelligibility

Erwin L. J. George, Joost M. Festen, and Tammo Houtgast

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1269-1277 (2008); (9 pages) | Cited 9 times

Listening conditions in everyday life typically include a combination of reverberation and nonstationary background noise. It is well known that sentence intelligibility is adversely affected by these factors. To assess their combined effects, an approach is introduced which combines two methods of predicting speech intelligibility, the extended speech intelligibility index (ESII) and the speech transmission index. First, the effects of reverberation on nonstationary noise (i.e., reduction of masker modulations) and on speech modulations are evaluated separately. Subsequently, the ESII is applied to predict the speech reception threshold (SRT) in the masker with reduced modulations. To validate this approach, SRTs were measured for ten normal-hearing listeners, in various combinations of nonstationary noise and artificially created reverberation. After taking the characteristics of the speech corpus into account, results show that the approach accurately predicts SRTs in nonstationary noise and reverberation for normal-hearing listeners. Furthermore, it is shown that, when reverberation is present, the benefit from masker fluctuations may be substantially reduced.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.An Models and theories of speech perception
43.55.Hy Subjective effects in room acoustics, speech in rooms
43.66.Mk Temporal and sequential aspects of hearing; auditory grouping in relation to music

Perception of silent-center syllables by native and non-native English speakers

Catherine L. Rogers and Alexandra S. Lopez

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1278-1293 (2008); (16 pages) | Cited 2 times

The amount of acoustic information that native and non-native listeners need for syllable identification was investigated by comparing the performance of monolingual English speakers and native Spanish speakers with either an earlier or a later age of immersion in an English-speaking environment. Duration-preserved silent-center syllables retaining 10, 20, 30, or 40 ms of the consonant-vowel and vowel-consonant transitions were created for the target vowels /i, ɪ, eɪ, ε, æ/ and /ɑ/, spoken by two males in /bVb/ context. Duration-neutral syllables were created by editing the silent portion to equate the duration of all vowels. Listeners identified the syllables in a six-alternative forced-choice task. The earlier learners identified the whole-word and 40 ms duration-preserved syllables as accurately as the monolingual listeners, but identified the silent-center syllables significantly less accurately overall. Only the monolingual listener group identified syllables significantly more accurately in the duration-preserved than in the duration-neutral condition, suggesting that the non-native listeners were unable to recover from the syllable disruption sufficiently to access the duration cues in the silent-center syllables. This effect was most pronounced for the later learners, who also showed the most vowel confusions and the greatest decrease in performance from the whole word to the 40 ms transition condition.
PACS:
43.71.Hw Cross-language perception of speech
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech

The effect of age on auditory spatial attention in conditions of real and simulated spatial separation

Gurjit Singh, M. Kathleen Pichora-Fuller, and Bruce A. Schneider

J. Acoust. Soc. Am. Volume 124, Issue 2, pp. 1294-1305 (2008); (12 pages) | Cited 1 time

The contributions of auditory and cognitive factors to age-dependent differences in auditory spatial attention were investigated. In conditions of real spatial separation, the target sentence was presented from a central location and competing sentences were presented from left and right locations. In conditions of simulated spatial separation, different apparent spatial locations of the target and competitors were induced using the precedence effect. The identity of the target was cued by a callsign presented either prior to or following each target sentence, and the probability that the target would be presented at the three locations was specified at the beginning of each block. Younger and older adults with normal hearing sensitivity below 4 kHz completed all 16 conditions (2-spatial separation method × 2-callsign conditions × 4-probability conditions). Overall, younger adults performed better than older adults. For both age groups, performance improved with target location certainty, with a priori target cueing, and when location differences were real rather than simulated. For both age groups, the contributions of natural spatial cues were most pronounced when the target occurred at “unlikely” spatial listening locations. This suggests that both age groups benefit similarly from richer acoustical cues and a priori information in difficult listening environments.
PACS:
43.71.Lz Speech perception by the aging
43.66.Pn Binaural hearing
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Qp Localization of sound sources