Journal of the Acoustical Society of America

May 1990

Volume 87, Issue S1, pp. S1-S164

Session VV. Speech Communication VIII: Acoustic Cues to Consonant Perception
Contributed Papers

Measurement of consonant confusions by human listeners in whispered speech (A)

Doug Martin

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S116-S116 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The acoustic differences between whispered and normally phonated speech are large, yet whispered speech remains highly discriminable. Since replacing a periodic voicing source with an aperiodic excitation does not significantly affect perception, a central processing stage may augment the normal peripheral processing. In the present experiments, the 16 CV syllables used by Miller and Nicely (1955) are whispered by three male and three female talkers. These recordings are used to collect confusion matrices from listeners who discriminate among 64 sample syllables from each talker. Data are also collected with stimuli involving normal and loud speech, in an attempt to determine which acoustic cues, if any, discriminate between the consonants across stimulus conditions. The consonant confusions are obtained using multitalker babble as a masker. Babble is used instead of white noise because it should mask all speech stimuli equally, whereas white noise may mask some stimuli, such as fricatives, more than others. Probabilities of consonant confusions across the voicing dimension for whispered speech will be presented.
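A cross-voicing confusion probability can be computed from a confusion matrix in several ways. The following minimal Python sketch shows one simple measure, the proportion of all responses whose voicing differs from that of the stimulus, using the standard voiced/voiceless split of the 16 Miller and Nicely consonants. The function names and the {(stimulus, response): count} layout are hypothetical, the counts in the example are invented, and the abstract does not describe the author's actual analysis.

```python
# Minimal sketch (hypothetical names): probability of confusing a consonant
# across the voicing dimension, computed from a confusion matrix stored as
# {(stimulus, response): count}. Counts in the example are invented.

VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ"}
VOICED = {"b", "d", "g", "v", "ð", "z", "ʒ", "m", "n"}


def voicing(consonant: str) -> str:
    """Classify one of the 16 Miller and Nicely consonants by voicing."""
    if consonant in VOICELESS:
        return "voiceless"
    if consonant in VOICED:
        return "voiced"
    raise ValueError(f"unknown consonant: {consonant!r}")


def voicing_confusion_probability(confusions: dict[tuple[str, str], int]) -> float:
    """Proportion of all responses whose voicing differs from the stimulus."""
    total = sum(confusions.values())
    if total == 0:
        return 0.0
    crossed = sum(
        count
        for (stimulus, response), count in confusions.items()
        if voicing(stimulus) != voicing(response)
    )
    return crossed / total


if __name__ == "__main__":
    example = {("p", "p"): 40, ("p", "b"): 10, ("b", "b"): 30, ("b", "p"): 20}
    print(voicing_confusion_probability(example))  # 0.3
```

A per-stimulus version (dividing by each row total instead of the grand total) would show which consonants lose voicing most often when the periodic source is removed.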

When is a stop aspirated? (A)

Leigh Lisker

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S116-S116 (1990); (1 page)

Online Publication Date: 13 Aug 2005

A simple answer to the question is this: a stop is aspirated when a linguist/phonetician judges it to be so. This property, often referred to in descriptions of the stop consonants of English and many other languages, has been defined in various physical terms: aerodynamically, articulatorily, and acoustically. Linguists seem generally to agree that in English aspiration is a predictable, i.e., context‐determined, and hence nondistinctive attribute of /p t k/ in certain contexts. In some other languages, e.g., Hindi, aspiration is a distinctive feature of certain stop phonemes, so that [t] contrasts with [tʰ]. Hindi‐English bilinguals, however, when speaking English, regularly use Hindi /t/ and not /tʰ/ where a “predictably” aspirated English /t/ would be expected. For native English speakers, including linguists, this choice seems erroneous. Consistent with this choice by Hindi speakers is the general opinion of many Hindi‐speaking linguists that, contrary to the opinion of American linguists, English /p t k/ are not aspirated. Labeling tests with edited, naturally produced nonsense CV syllables yielded data showing that Indian and American linguists differ considerably in where they locate the inaspirate‐aspirate boundary along the VOT dimension.
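One common way to locate a category boundary from labeling data is to find where the labeling function crosses 50%. The minimal Python sketch below does this by linear interpolation between adjacent VOT steps; it is an illustration of that generic technique, not the procedure reported in the study, and the function name and labeling proportions are hypothetical.

```python
# Minimal sketch: locate a category boundary along VOT as the point where the
# proportion of "aspirated" labels crosses 50%, by linear interpolation
# between adjacent continuum steps. The data below are invented.


def crossover_boundary(vot_ms: list[float], p_aspirated: list[float]) -> float:
    """Return the VOT (ms) at which p_aspirated first rises through 0.5.

    Assumes vot_ms is sorted in increasing order and aligned with p_aspirated.
    """
    points = list(zip(vot_ms, p_aspirated))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:  # guarantees y1 > y0, so no division by zero
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("labeling function never rises through 0.5")


if __name__ == "__main__":
    vot = [0, 10, 20, 30, 40, 50, 60]
    group_a = [0.0, 0.05, 0.2, 0.6, 0.9, 1.0, 1.0]  # hypothetical listener group
    group_b = [0.0, 0.0, 0.05, 0.1, 0.4, 0.8, 1.0]  # hypothetical listener group
    print(crossover_boundary(vot, group_a), crossover_boundary(vot, group_b))
```

Comparing the two returned values gives the difference in boundary location between listener groups, which is the kind of contrast the abstract reports between Indian and American linguists.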

Medial voicing distinctions in English trochees (A)

Arthur S. Abramson, Leigh Lisker, and Laura Koenig

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S116-S116 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Phonetic studies of distinctive consonantal voicing in English have focused on utterance‐initial and final positions. Much less attention has been given to medial stops following stressed and preceding unstressed vowels, as in pairs like tugging/tucking. The high frequency of occurrence of unaspirated voiceless stops in this context is commonly overlooked because of the usual emphasis on the aspiration of /p t k/ in stressed syllables. The voiced stops in this context are likely to have glottal pulsing through much or all of the closure, unlike their utterance‐initial counterparts, which may not show glottal pulsing until the moment of release. For this study, a number of such word pairs were recorded in carrier sentences at a normal rate by three native speakers of American English. Three acoustic features were measured: voice timing, closure duration, and amplitude of the release burst. Values of the first two features were clearly different for the two categories; for the third feature, burst amplitude, the two categories differed much less clearly. To generate sets of stimuli for perception testing, the three features were covaried incrementally by waveform editing. Labeling data collected from a jury of native American English listeners will be presented to show the relative efficacy of these features for perception.

The perceptual locus of spectral tilt (A)

Kevin H. Richardson and James R. Sawusch

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S116-S116 (1990); (1 page)

Online Publication Date: 13 Aug 2005

In a previous study [Richardson and Sawusch, J. Acoust. Soc. Am. Suppl. 1 85, S136 (1989)] using natural, synthetic, and modified voiced stops, it was shown that spectral tilt does not play a role in human stop identification. In the present study, subjects were presented with three sets of tone analogs that mimicked the voiced stops [b], [d], and [g] combined with the vowels [ɛ], [a], and [u]. All stimuli were based on the speech tokens described by Richardson and Sawusch (1989). The first set (“natural” tones) was based upon high‐quality, synthetic CV tokens. The second set (“stylized” tones) was also based upon the synthetic CV tokens. In this set, each token with the same vowel (e.g., [ba], [da], and [ga]) was given the same set of parameter values as for the vowel portion of the syllable. The third set (“modified” tones) was based upon the stylized tones and contained identical changes in spectral tilt for each stop place of articulation. Both speech and nonspeech subjects were run in a perceptual grouping task in order to investigate the perceptual locus of any effects of the change in spectral tilt. Results will also be discussed in terms of general human auditory‐phonetic coding ability and the nature of the phonetic representation used in speech perception. [Work supported by NIDCD Grant No. DC00219 to SUNY at Buffalo.]

Perceptual identification of ambiguous consonant onsets (A)

Michael P. Karnell and Karen L. Landahl

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S116-S117 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

Theories of acoustic invariance propose that stop consonant place identification can be made independently of full consonant‐transition information. Consonant and vowel identification is possible within a few milliseconds of stop release, although the distribution of acoustic energy in the consonant onset spectrum is frequently inconsistent with the distribution for the following vowel. The presence of oral physiologic impairment may be expected to influence whatever consonant‐vowel interaction contributes to syllable perception. The purpose of this study was to examine consonant/vowel identification for two glossectomee speakers and one normal control. For the most impaired speaker, removal of poorly identified vowel information enhanced consonant identification. For the normal speaker and the less impaired speaker, full CVs were well identified. Removal of vowel information diminished correct consonant identification for the less impaired glossectomee speaker but, for the normal speaker, affected only those consonants whose spectra were consistent with the following vowel. The data show that when onset spectra are ambiguous, the vowel spectrum weights the onset spectrum to influence identification of the intended consonant place of articulation.

Are F2‐ and F3‐onset frequencies equal cues for place of articulation? (A)

Xiao‐Feng Li, Richard E. Pastore, and Jennifer Scheer

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S117-S117 (1990); (1 page)

Online Publication Date: 13 Aug 2005

This study reevaluated F2‐ and F3‐onset frequencies as cues for place of articulation. Seven levels of F2‐onset frequency, ranging from 600 to 1800 Hz, were factorially combined with eight levels of F3‐onset frequency, ranging from 1400 to 4200 Hz, to create 56 synthetic syllables. Subjects were asked to classify the initial phoneme of each syllable as /ba/, /da/, /ga/, or /others/. A canonical correlation analysis was conducted to recover the relationship between the percentage of responses in each category and the values of F2‐ and F3‐onset frequencies. The canonical correlation structure shows that F2‐onset frequency has a high negative correlation with the /ba/ category and a positive correlation with the /da/ category, while both F2‐ and F3‐onset frequencies are moderately correlated with the /ga/ category. This result suggests that F2‐onset frequency is an adequate cue to distinguish /ba/ from /da/, whereas F2‐ and F3‐onset frequencies interact to cue the distinction between /da/ and /ga/. This result contrasts with claims that either F2‐ or F3‐onset frequency alone is a strong cue for all three place categories.
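The factorial design can be made concrete in a few lines of Python. This sketch assumes equal frequency steps within each range, which the abstract does not state; with that assumption, the F2 levels fall on 200‐Hz steps and the F3 levels on 400‐Hz steps, and crossing them yields the 7 × 8 = 56 onset combinations.

```python
from itertools import product

# Assumption: equal steps within each range (the abstract gives only the
# endpoints and the number of levels). 7 x 8 = 56 (F2, F3) onset pairs.
F2_ONSETS_HZ = [600 + 200 * i for i in range(7)]   # 600, 800, ..., 1800
F3_ONSETS_HZ = [1400 + 400 * i for i in range(8)]  # 1400, 1800, ..., 4200

# Each stimulus is one (F2 onset, F3 onset) pair.
stimuli = list(product(F2_ONSETS_HZ, F3_ONSETS_HZ))
print(len(stimuli))  # 56
```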

Rate of transition as a cue for place perception in normal and hearing‐impaired listeners (A)

Renée A. E. Zakia

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S117-S117 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Recent research exploring the role of rate of transition in the identification of place of articulation of stop consonants revealed that rate of transition influences place perception when formant frequencies are ambiguous between [da] and [ga] [Zakia and Kingston, J. Acoust. Soc. Am. Suppl. 1 85, S135 (1989)]. In the experiments reported here, variations in F1, F2, and F3 transitions between values appropriate for [dɛ] and values appropriate for [gɛ] were combined with variations in rate of transition, in 5‐ms steps between 20 and 60 ms. Stimuli were presented to two populations of young listeners, one with normal hearing and the other with mild hearing impairment of postlingual onset, in one‐ and two‐interval forced‐choice tasks requiring identification of place of articulation. It is predicted that normal‐hearing listeners will identify stimuli with longer transition durations as [gɛ] and stimuli with shorter transition durations as [dɛ] at “intermediate” formant patterns (i.e., those which specify neither [dɛ] nor [gɛ]). Performance of hearing‐impaired listeners on the same task is predicted to be poorer overall, because such listeners have a reduced ability to identify place of articulation from transition information alone [Q. Summerfield et al., Speech Commun. 4, 213–229 (1985)]. Furthermore, if the poorer temporal resolution characteristic of hearing impairment distorts the distribution of spectral energy over time, it is predicted that rate of transition will neither aid hearing‐impaired subjects in identifying place of articulation nor bias their judgment when formant frequencies are ambiguous between [dɛ] and [gɛ]. [Work supported by the International Center for Hearing and Speech Research.]

Age‐related differences in formant‐transition effects within syllables and across syllable boundaries (A)

Susan Nittrouer

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S117-S117 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Previous work [S. Nittrouer and M. Studdert‐Kennedy, J. Speech Hear. Res. 30, 319–329 (1987)] demonstrated that young children based phonemic judgments of syllable‐initial fricatives on the formant transitions of the following vowel to a greater extent than did older children or adults. In this study, three sets of stimuli were developed to investigate further the effects of vocalic formant transitions on consonant judgments. Two sets consisted of monosyllables and were used to measure transition effects on a preceding consonantal segment. One set consisted of disyllables and was used to measure these effects on a following consonantal segment across a syllable boundary. All three sets were presented to adults and to children 5 and 7 yr of age, who were asked to identify the stimuli. Results for the monosyllables showed that the effects of the vocalic formant transitions on consonant judgments declined with increasing age, while results for the disyllables showed that the effects increased with increasing age. These findings suggest that children at roughly 5 yr of age are most sensitive to whole‐syllable forms, as represented here by within‐syllable formant transitions. Then, as the segment gains independent status for the child, formant transitions come to be treated similarly within syllables and across syllable boundaries.

Asymmetries in vowel‐fricative and fricative‐vowel information (A)

D. H. Whalen

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S117-S118 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

Vocalic formant transitions have been shown to affect identification of both initial and final fricatives. However, Soli and Mann [J. Acoust. Soc. Am. Suppl. 1 73, S53 (1983)] found that transitions did not contribute as much to final as to initial fricative identification. The present study extends that work by manipulating the digitized waveforms of spoken words (“sack,” “shack,” “Cass,” and “cash”). Fricative continua were created by combining the natural /s/ and /ʃ/ fricative noises in varying proportions. The vocalic segments, including initial or final transitions, were played forward or backward, so that both types of transitions occurred in both positions. In each of these conditions, the individual pitch periods were either in their original direction or reversed (to control for the odd voice quality of reversed speech). Preliminary results with adult listeners show that both types of transitions are less influential with final fricatives than with initial fricatives. This seems to indicate that different perceptual strategies are used for different parts of the syllable. [Work supported by NIH Grant No. HD‐01994.]
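The two waveform manipulations described above can be sketched in Python on plain lists of samples. This is a minimal illustration under stated assumptions, not the author's editing procedure: the mixing weights are assumed to be equally spaced (the abstract says only "varying proportions"), the two noises are assumed to be time‐aligned, and the pitch‐period onsets are assumed to be supplied by some separate period‐marking step. All function names are hypothetical.

```python
# Minimal sketch (hypothetical names): two waveform manipulations described
# in the abstract, applied to plain lists of samples.


def mix_fricatives(s_noise: list[float], sh_noise: list[float], weight_s: float) -> list[float]:
    """Sample-by-sample weighted sum weight_s * /s/ + (1 - weight_s) * /ʃ/.

    Assumes the two noises are time-aligned; the output is truncated to the
    shorter of the two.
    """
    if not 0.0 <= weight_s <= 1.0:
        raise ValueError("weight_s must lie in [0, 1]")
    return [weight_s * a + (1.0 - weight_s) * b for a, b in zip(s_noise, sh_noise)]


def make_continuum(s_noise: list[float], sh_noise: list[float], n_steps: int) -> list[list[float]]:
    """Equally spaced mixing weights (an assumption) from pure /ʃ/ (0) to pure /s/ (1)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    return [mix_fricatives(s_noise, sh_noise, k / (n_steps - 1)) for k in range(n_steps)]


def reverse_pitch_periods(samples: list[float], onsets: list[int]) -> list[float]:
    """Reverse each pitch period in time while keeping the periods in order.

    `onsets` are sorted sample indices where periods begin; samples before the
    first onset are copied unchanged.
    """
    edges = list(onsets) + [len(samples)]
    out = list(samples[: edges[0]])
    for start, end in zip(edges, edges[1:]):
        out.extend(reversed(samples[start:end]))
    return out


if __name__ == "__main__":
    vowel = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(vowel[::-1])  # whole segment played backward: [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
    print(reverse_pitch_periods(vowel, [0, 3]))  # [3.0, 2.0, 1.0, 6.0, 5.0, 4.0]
```

Reversing the whole segment changes the direction of the formant transitions, whereas reversing each period separately changes only the within‐period waveform shape, which is why the two can be crossed as independent factors.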

Compensation for talker variability and vowel variability in the perception of fricatives (A)

Keith Johnson

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S118-S118 (1990); (1 page)

Online Publication Date: 13 Aug 2005

In the perception of a fricative continuum from [s] to [ʃ], hearers are influenced both by rounding of a following vowel and by the gender of the talker. The experiment reported in this paper replicates and extends this finding. Subjects identified the initial fricative in CV syllables composed of synthetic fricatives from an [s]‐[ʃ] continuum and natural productions of either [a] or [u]. A male and a female talker produced the vowels. Vowel rounding produced a shift in the [s]‐[ʃ] boundary both when syllables containing different vowels were randomly intermixed and when the syllables were blocked by vowel. However, the influence of talker gender was present only when the two talkers' productions were intermixed. When syllables were blocked by talker, no boundary shift occurred. Reaction times showed the same pattern for both sets of stimuli: they were longer when syllables with vowel or speaker differences were randomly intermixed. The boundary shift data suggest that the perceptual differences between talkers are enhanced when tokens are intermixed, and, more generally, that talker characteristics have a less robust effect on perceptual processing than do the properties of contextual segments. [Work supported by NIH Training Grant No. NS‐07134‐11 to Indiana University.]
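A boundary shift like the one reported here is the difference between two estimated category boundaries. The minimal Python sketch below uses one common estimation approach, an ordinary least‐squares line fitted to logit‐transformed identification proportions, with the boundary taken at the 50% point. The abstract does not say how boundaries were estimated, so this is an illustrative substitute; the function name, the clipping constant, and all proportions are hypothetical.

```python
import math

# Minimal sketch: estimate an [s]-[ʃ] category boundary by fitting a straight
# line to logit-transformed identification proportions (ordinary least squares)
# and taking the 50% point. This is one common approach, not necessarily the
# one used in the study; all data below are invented.


def logit_boundary(steps: list[float], p_sh: list[float], eps: float = 0.01) -> float:
    """Continuum step at which the fitted proportion of [ʃ] responses is 0.5.

    Proportions are clipped to [eps, 1 - eps] so that 0 and 1 have finite logits.
    """
    xs = [float(x) for x in steps]
    ys = []
    for p in p_sh:
        p = min(max(p, eps), 1.0 - eps)
        ys.append(math.log(p / (1.0 - p)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope  # logit(0.5) = 0


if __name__ == "__main__":
    steps = [1, 2, 3, 4, 5, 6, 7]
    before_a = [0.02, 0.05, 0.2, 0.5, 0.8, 0.95, 0.98]  # hypothetical, [a] context
    before_u = [0.01, 0.02, 0.05, 0.2, 0.5, 0.8, 0.95]  # hypothetical, [u] context
    shift = logit_boundary(steps, before_u) - logit_boundary(steps, before_a)
    print(f"boundary shift: {shift:.2f} steps")
```

A maximum‐likelihood logistic fit on the raw response counts would be more principled; the logit‐transform regression is used here only because it needs nothing beyond the standard library.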

The spectrogram as an aid in studying speech intelligibility at low S/N ratios (A)

R. Plomp and J. H. M. van Beck

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S118-S118 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Current research [M. ter Keurs, R. Plomp, and J. M. Festen, Proc. Eur. Conf. Speech Commun. Technol. Paris 1, 251–253 (1989)] has shown that resynthesizing the speech signal from the output of a short‐time fast Fourier transform (FFT) (16‐ms time segments, with overlapping additions, phase relations preserved) results in speech almost indiscriminable from the input signal for spectral‐envelope smearing over bandwidths up to 1/8 oct. This means that the spectrotemporal energy distribution of a short sentence (2 s, 60–8000 Hz) is faithfully represented by a spectrogram consisting of about 7000 points. This representation can help in understanding the effects of interfering noise, or of a competing voice, by eliminating all points below a certain critical S/N ratio. Speech signals resynthesized without energy in the spectrotemporal elements corresponding to the eliminated points can be used to study the relative contribution of weaker components to speech perception. Examples will be given.
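The figure of about 7000 points can be checked with simple arithmetic. The abstract does not say how it was derived; the following Python sketch assumes one point per 16‐ms segment (125 segments in 2 s) and one per 1/8‐octave band between 60 and 8000 Hz (about 56.5 bands), which gives roughly 7,060 points. It also sketches the elimination step, assuming that "eliminating" a point means setting its speech energy to zero wherever the local S/N falls below a criterion. Function names and the grid layout are hypothetical.

```python
import math

# Minimal sketch under stated assumptions: (1) count one spectrogram point per
# 16-ms segment and per 1/8-octave band; (2) "eliminate" a point by setting its
# speech energy to zero when the local S/N falls below a criterion.


def spectrogram_points(duration_s: float = 2.0, segment_s: float = 0.016,
                       f_lo_hz: float = 60.0, f_hi_hz: float = 8000.0,
                       bands_per_octave: int = 8) -> float:
    """Approximate number of time-frequency points (frames x bands)."""
    n_frames = duration_s / segment_s                           # 2 / 0.016 = 125
    n_bands = bands_per_octave * math.log2(f_hi_hz / f_lo_hz)  # about 56.5
    return n_frames * n_bands


def eliminate_below_snr(speech_pow: list[list[float]], noise_pow: list[list[float]],
                        critical_snr_db: float) -> list[list[float]]:
    """Keep speech power only where 10*log10(speech/noise) >= critical_snr_db.

    Inputs are [frame][band] grids of linear power; noise values must be > 0.
    """
    kept = []
    for s_row, n_row in zip(speech_pow, noise_pow):
        kept.append([
            s if s > 0.0 and 10.0 * math.log10(s / n) >= critical_snr_db else 0.0
            for s, n in zip(s_row, n_row)
        ])
    return kept


if __name__ == "__main__":
    print(round(spectrogram_points()))
    print(eliminate_below_snr([[10.0, 1.0]], [[1.0, 1.0]], critical_snr_db=5.0))  # [[10.0, 0.0]]
```

Counting with an 8‐ms hop instead of non‐overlapping 16‐ms segments would roughly double the total, so the assumed accounting matters when comparing with the quoted figure.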