Journal of the Acoustical Society of America

May 2012

Volume 131, Issue 5, pp. EL355-4232

Syllable structure and integration of voicing and manner of articulation information in labial consonant identification

Noah H. Silbert

J. Acoust. Soc. Am. Volume 131, Issue 5, pp. 4076-4086 (2012); (11 pages) | Cited 2 times

Speech perception requires the integration of information from multiple phonetic and phonological dimensions. A sizable literature exists on the relationships between multiple phonetic dimensions and single phonological dimensions (e.g., spectral and temporal cues to stop consonant voicing). A much smaller body of work addresses relationships between phonological dimensions, and much of this has focused on sequences of phones. However, strong assumptions about the relevant set of acoustic cues and/or the (in)dependence between dimensions limit previous findings in important ways. Recent methodological developments in the general recognition theory framework enable tests of a number of these assumptions and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A hierarchical Bayesian Gaussian general recognition theory model was fit to data from two experiments investigating identification of English labial stop and fricative consonants in onset (syllable initial) and coda (syllable final) position. The results underscore the importance of distinguishing between conceptually distinct processing levels and indicate that, for individual subjects and at the group level, integration of phonological information is partially independent with respect to perception and that patterns of independence and interaction vary with syllable position.
PACS:
43.71.An Models and theories of speech perception
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.Sy Spoken language processing by humans

Objective evaluation of speech signal quality by the prediction of multiple foreground diagnostic acceptability measure attributes

Deep Sen and W. Lu

J. Acoust. Soc. Am. Volume 131, Issue 5, pp. 4087-4103 (2012); (17 pages)

A methodology is described to objectively diagnose the quality of speech signals by predicting the perceptual detectability of a selected set of distortions. The distortions are a statistically selected subset of the broad range of distortions used in diagnostic acceptability measure (DAM) testing. The justification for such a methodology is established from the analysis of a set of speech signals representing a broad set of distortions and their respective DAM scores. At the heart of the ability to isolate and diagnose the perceptibility of the individual distortions is a physiologically motivated cochlear model. The philosophy and methodology are thus distinct from traditional objective measures, which are typically designed to predict mean opinion scores (MOS) using well versed functional psychoacoustic models. Even so, a weighted sum of the objectively predicted distortions is able to predict accurate and robust MOS values, even when the reference speech signals have been subject to the Lombard effect.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.Bp Perception of voice and talker characteristics
43.71.An Models and theories of speech perception

Contributions of cochlea-scaled entropy and consonant-vowel boundaries to prediction of speech intelligibility in noise

Fei Chen and Philipos C. Loizou

J. Acoust. Soc. Am. Volume 131, Issue 5, pp. 4104-4113 (2012); (10 pages) | Cited 2 times

Recent evidence suggests that spectral change, as measured by cochlea-scaled entropy (CSE), predicts speech intelligibility better than the information carried by vowels or consonants in sentences. Motivated by this finding, the present study investigates whether intelligibility indices implemented to include segments marked with significant spectral change better predict speech intelligibility in noise than measures that include all phonetic segments, paying no attention to vowels/consonants or spectral change. The predictive power of two intelligibility measures [normalized covariance measure (NCM), coherence-based speech intelligibility index (CSII)] is investigated using three sentence-segmentation methods: relative root-mean-square (RMS) levels, CSE, and traditional phonetic segmentation of obstruents and sonorants. While the CSE method makes no distinction between spectral changes occurring within vowels/consonants, the RMS-level segmentation method places more emphasis on the vowel-consonant boundaries wherein the spectral change is often most prominent, and perhaps most robust, in the presence of noise. Higher correlation with intelligibility scores was obtained when including sentence segments containing a large number of consonant-vowel boundaries than when including segments with the highest entropy or segments based on obstruent/sonorant classification. These data suggest that, in the context of intelligibility measures, the type of spectral change captured by the measure is important.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech

Effects of spectral smearing on the identification of speech in noise filtered into low- and mid-frequency regions

Agnès C. Léger, Brian C. J. Moore, Dan Gnansia, and Christian Lorenzi

J. Acoust. Soc. Am. Volume 131, Issue 5, pp. 4114-4123 (2012); (10 pages)

Léger et al. [J. Acoust. Soc. Am. 131, 1502–1514 (2012)] reported deficits in the identification of consonants in noise by hearing-impaired listeners using stimuli filtered into low- or mid-frequency regions in which audiometric thresholds were normal or near-normal. The deficits could not be fully explained in terms of reduced audibility or temporal-envelope processing. However, previous studies indicate that the listeners may have had reduced frequency selectivity, with auditory filters broadened by a factor of about 1.3, despite having normal or near-normal audiometric thresholds in the tested regions. The present study aimed to determine whether the speech-perception deficits could be explained by such a small reduction of frequency selectivity. Consonant identification was measured for normal-hearing listeners in quiet and in unmodulated and modulated noises using the same method as Léger et al. The signal-to-noise ratio was set to −3 dB for the masked conditions. Various amounts of reduced frequency selectivity were simulated using a spectral-smearing algorithm. Performance was reduced only for spectral-smearing factors greater than 1.7. For all conditions, identification scores for hearing-impaired listeners could not be explained by a mild reduction of frequency selectivity.
PACS:
43.71.Gv Measures of speech perception (intelligibility and quality)
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.66.Mk Temporal and sequential aspects of hearing; auditory grouping in relation to music

Fast recognition of musical sounds based on timbre

Trevor R. Agus, Clara Suied, Simon J. Thorpe, and Daniel Pressnitzer

J. Acoust. Soc. Am. Volume 131, Issue 5, pp. 4124-4133 (2012); (10 pages) | Cited 1 time

Human listeners seem to have an impressive ability to recognize a wide variety of natural sounds. However, there is surprisingly little quantitative evidence to characterize this fundamental ability. Here the speed and accuracy of musical-sound recognition were measured psychophysically with a rich but acoustically balanced stimulus set. The set comprised recordings of notes from musical instruments and sung vowels. In a first experiment, reaction times were collected for three target categories: voice, percussion, and strings. In a go/no-go task, listeners reacted as quickly as possible to members of a target category while withholding responses to distractors (a diverse set of musical instruments). Results showed near-perfect accuracy and fast reaction times, particularly for voices. In a second experiment, voices were recognized among strings and vice versa. Again, reaction times to voices were faster. In a third experiment, auditory chimeras were created to retain only spectral or temporal features of the voice. Chimeras were recognized accurately, but not as quickly as natural voices. Altogether, the data suggest rapid and accurate neural mechanisms for musical-sound recognition based on selectivity to complex spectro-temporal signatures of sound sources.
PACS:
43.71.Qr Neurophysiology of speech perception
43.66.Jh Timbre, timbre in musical acoustics
43.64.Sj Neural responses to speech
43.72.Qr Auditory synthesis and recognition