Journal of the Acoustical Society of America


Apr 1991

Volume 89, Issue 4B, pp. 1851-2015

Session 8SP: Speech Communication: Consonant and Vowel Perception
Contributed Papers

Thresholds for formant‐frequency discrimination of vowels in consonantal context (A)

Diane Kewley‐Port and Charles S. Watson

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1996-1996 (1991); (1 page)

Online Publication Date: 14 Aug 2005

The discrimination thresholds for shifts in formant frequency were shown to be in the range of 1%–2% in a recent report to this society [Kewley‐Port, J. Acoust. Soc. Am. Suppl. 1 87, S159 (1990)]. Thresholds for F1 and F2 obtained from well‐trained subjects listening to vowels under minimal stimulus uncertainty were a factor of 3 lower than earlier estimates. The present experiment extends that study to examine the effects of placing a vowel in a consonantal context. The vowel /ɪ/ was synthesized in CVC syllables for the consonants /b/, /d/, /g/, /z/, /m/, and /l/. For F1 = 450 Hz, the threshold, ΔF, was the same for isolated /ɪ/ as the ΔF averaged over all CVC contexts, about 12 Hz. For F2 = 2300 Hz, ΔF was significantly larger (45 Hz) for the vowel in the average CVC context than in isolation (25 Hz). Thresholds for individual CVCs were significantly different from the threshold for isolated /ɪ/ in about one‐half of the cases examined. These differences are discussed in terms of the extent of the formant transitions and the durations of the steady‐state vowel formants. [Research supported by NIH and AFOSR.]

Are articulations integrated in the perception of vowel height? (A)

John Kingston

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1996-1996 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Using the Garner paradigm [W. Garner (1974)], Kingston [Phonetica (in press)] demonstrated that the acoustic effects of differences in velum height (the frequency separation of the nasal pole and zero = nasalization) and rate of vocal fold vibration (fundamental frequency) which covary with tongue height in vowels are integrated perceptually with the acoustic effect of that articulation (first formant frequency), perhaps because they exaggerate the perceptual value of the latter articulation. The failure to separate perceptually the acoustic effects of these three articulations challenges the claim of direct realists [e.g., C. Fowler, J. Phonet. 14, 3–28 (1986)] that articulatory gestures are the objects of speech perception, but in only a limited way, since the stimuli were brief and simple enough that they may not have allowed listeners to attribute these acoustic effects appropriately to their articulatory sources. Experiments are currently in progress to test whether similar perceptual integration occurs even when other aspects of the stimuli would allow the acoustic effects to be attributed to coarticulation [R. Krakow et al., J. Acoust. Soc. Am. 83, 1146–1158 (1988)], e.g., is nasalization integrated with first formant frequency in nasal as well as oral consonant contexts? Integration will also be tested more rigorously than in the earlier work, using precepts of signal detection theory.

Assessing the role of F0 in vowel perception via linear logistic modeling (A)

T. M. Nearey and J. E. Andruski

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1996-1996 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Logistic models provide powerful tools in evaluation of stimulus‐response relationships in speech perception [T. Nearey, J. Phonet. 18, 347–373 (1990)]. Excellent fits result when such models are applied to vowel perception data. These models allow insight into the possible normalizing role of F0 in vowel perception. If (as in many current perceptual accounts) the role of F0 is restricted to an additive formant normalization factor (for some nonlinear transform of the frequency axis), then optimized logistic models using formant frequencies and the fundamental as predictors should show certain specific patterns of correlation among certain estimated parameters across vowel categories. Preliminary results from the analysis of data in these laboratories indicate that this is indeed the case. Furthermore, F0 normalization appears to occur nearly independently in “head” and “tail” sections of “hybrid” syllables, where, e.g., the ending portion of a syllable from a male speaker is spliced (following a period of silence) to the beginning portion from a female. [Work supported by SSHRC.]

An approach to the classification of American English diphthongs (A)

Michael Gottfried

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1996-1997 (1991); (2 pages)

Online Publication Date: 14 Aug 2005

Six diphthongs of American English (/au, aɪ, eɪ, o, ɔɪ, ju/) were produced by four midwestern American speakers (two male, two female) at two tempos (slow, fast) with differing stress (stressed, unstressed) in two contexts ([b_d], [h_d]). Using a plot of the fundamental frequency and the first three formants derived from linear‐prediction‐coding (LPC) analysis, the onset and offset of each production were determined. The pattern of formants and fundamental frequency at the onset and offset of diphthongs was used to establish a set of parameters that can classify intended productions of the American English diphthongs in varying stress and tempo conditions with an average accuracy of 93%. Results are also presented for diphthong, target‐syllable, and sentence durations. The classification results are discussed with respect to hypotheses concerning the perception of diphthongs.

Auditory‐perceptual interpretation of vowels and diphthongs: A progress report (A)

James D. Miller

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1997-1997 (1991); (1 page)

Online Publication Date: 14 Aug 2005

The auditory‐perceptual interpretation of vowels [Miller, J. Acoust. Soc. Am. 85, 2114–2134 (1989)] will be reviewed briefly and a similar approach to diphthongs, based on spectral glides, will be presented. The relations between acoustic descriptions in the auditory‐perceptual space and articulatory descriptions will be outlined. Also, issues relating to the roles of formants versus spectral patterns and the precise meaning of target zones will be presented. The distinction between target zones for vowels produced as steady states as opposed to target zones for vowels produced as spectral glides will be emphasized. Preliminary criteria that may serve to distinguish these will be mentioned. Finally, recent data, which appear to be consistent with the auditory‐perceptual approach, will be presented.

“Correction” in the perception of filtered vowels (A)

Elizabeth E. Shriberg and John J. Ohala

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1997-1997 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Two studies examined the effect of cues to channel characteristics on listeners' perception of low‐pass‐filtered (1000‐Hz) vowels. In experiment 1, 85‐ms steady‐state portions of 11 English vowels excised from digitized natural speech were preceded by a sentence. Following an unfiltered sentence, filtered front vowels were largely perceived as back vowels; however, following a filtered sentence, the effect was reduced, and front vowels were “corrected” at high rates (χ² = 108.45, p < 0.001). In experiment 2, the sentence was eliminated and a “masker” was added to the filter‐reject region of the stimuli; again, a striking increase in front‐to‐back confusions occurred when vowels were filtered, and a decrease in these errors occurred when high‐frequency noise was added to the filtered vowels (χ² = 63.74, p < 0.001). In both experiments, a small but stable increase in back‐to‐front errors in conditions containing cues to filtering was also observed. Results suggest that in these conditions, but not in those lacking cues, listeners determined which filtered vowels were actual front vowels.

Different discrimination strategies for vowels and consonants (A)

Bert Schouten and Arjan van Hessen

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1997-1997 (1991); (1 page)

Online Publication Date: 14 Aug 2005

In speech perception research, one is often interested in the relationship between phoneme identification and discrimination of stimuli drawn from the same stimulus continuum. If discrimination performance is completely predictable from identification, perception is often said to be completely categorical: In both tasks, subjects (can) only use phoneme labels. How should one compare a one‐interval identification task with a two‐interval forced‐choice discrimination task in which subjects have to determine the order of the stimuli? Using standard SDT assumptions about optimal processing [D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics (New York, 1974)], it was found that identification and discrimination of natural vowels were nearly equivalent, but with natural stop consonants identification d′ was, paradoxically, twice as high as discrimination d′. It was concluded that subjects probably use a different strategy for the discrimination of natural stops: They do not subtract the traces of the two stimuli, but the estimated distances between the stimuli and the phoneme prototypes. Such an assumption yields d′ values that are twice as high as the standard values.
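The comparison described in this abstract rests on expressing both tasks as d′ values. The following is a minimal sketch of the two textbook conversions under equal-variance Gaussian assumptions, not the authors' analysis. The function names are hypothetical, and the 2IFC formula assumes an unbiased observer who differences the two intervals.

```python
from math import sqrt
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z-transform" of a proportion).
_z = NormalDist().inv_cdf


def dprime_identification(p_label_stim_a: float, p_label_stim_b: float) -> float:
    """d' between two stimuli in a one-interval identification task.

    Each argument is the proportion of trials on which the same phoneme
    label was given to that stimulus; both must lie strictly between 0 and 1.
    """
    return _z(p_label_stim_a) - _z(p_label_stim_b)


def dprime_2ifc(proportion_correct: float) -> float:
    """d' from proportion correct in a two-interval forced-choice task.

    Assumes the observer subtracts the two traces, which gives
    d' = sqrt(2) * z(PC); PC must lie strictly between 0 and 1.
    """
    return sqrt(2.0) * _z(proportion_correct)
```

If perception were fully categorical in the sense used above, the two functions would return similar values for the same stimulus pair; the abstract reports that this held for vowels but not for stops.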

The nonlinear dynamics of categorical perception (A)

Betty Tuller, J. A. Scott Kelso, Pamela Case, and Mingzhou Ding

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1997-1997 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Much research on speech perception over the years has focused on uncovering examples of the nonlinear relationship between acoustics and perception (so‐called “categorical perception”). However, little is known concerning the dynamics of this phenomenon. In a variation of the classical categorical perception paradigm, the present experiment explored gradual increases or decreases in a single acoustic parameter. The resulting patterns of perceptual change showed rich dynamics, including hysteresis, “anticipation,” a single boundary, and the progression from hysteresis to anticipation over multiple trials. A dynamical system that could account for these perceptual patterns was investigated by specifying a potential function that corresponds to the layout of phonetic (attractor) states, and how that layout alters as the acoustic parameter changes. The model reproduces the observed features of the experimental data, and makes further predictions about perception, currently being tested. [Work supported by NIDCD and NIMH.]

The role of multiple acoustic properties in specifying the internal structure of phonetic categories (A)

Philip Hodgson and Joanne L. Miller

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1997-1998 (1991); (2 pages)

Online Publication Date: 14 Aug 2005

The extent to which a phonetic segment may be specified by more than one acoustic property was investigated using the trading relation between vowel duration and closure duration, two properties known to play a role in specifying the voicing contrast for intervocalic bilabial stops. Specifically asked was whether a change in preceding vowel duration results in a comprehensive remapping between closure duration and phonetic category, or whether trading effects are confined to the category boundary. Two series of disyllables were created ranging from /aba/ through /apa/ to */apa/ (an exaggerated /p/) having initial vowel durations of 153 and 250 ms, respectively. Closure duration in each series varied from 20 to 400 ms. A preliminary experiment revealed a standard trading relation, in that the /b/‐/p/ category boundary was located at a longer closure duration for the stimuli with a long, compared to a short, preceding vowel. In the main experiment, listeners were asked to judge each disyllable in each series for the goodness of its consonant as a member of the /p/ category. For both series the /p/ category was perceived as having internal structure, with a limited range of stimuli being judged as the best exemplars. Furthermore, the range of best exemplars for the long vowel series was displaced relative to that for the short vowel series toward longer values of closure duration. These findings indicate that the acoustic properties in question trade against each other not only at the phonetic category boundary but also within the category. This results in a comprehensive remapping of phonetic category structure similar to that observed in earlier research for changes in speaking rate. [Work supported by NIH.]

Influence of a syllable's form on the perceived internal structure of voicing categories (A)

Lydia E. Volaitis and Joanne L. Miller

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1998-1998 (1991); (1 page)

Online Publication Date: 14 Aug 2005

The role of syllable structure on voice‐onset time (VOT) was examined by comparing VOT values in consonant‐vowel (CV) and consonant‐vowel‐consonant (CVC) syllables, across a range of speaking rates. In a production study, when CV and CVC syllables were equated for overall duration, VOT values were found to be consistently shorter for the CVC than the CV syllables. Furthermore, when syllables were equated for CV duration, VOT values for CV and CVC syllables tended to be equal, suggesting that speakers were producing VOT values with regard to the syllable's CV duration. In a subsequent perception study, listeners adjusted for these changes in VOT by altering three aspects of category structure in relation to the syllable's CV duration, and not to its overall duration: the location of the voiced‐voiceless category boundary, the upper limit of the voiceless category, and the range of “good” exemplars that lies between those two boundaries. These findings support the notion that listeners perceptually restructure their phonetic categories so as to accommodate changes in VOT that occur in production as a result of the syllable's phonological context, as well as its speaking rate. [Work supported by NIH.]

Independence of scene analysis and the speech module (A)

D. H. Whalen and Alvin M. Liberman

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1998-1998 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Even the first example of the duplex effect [Rand, J. Acoust. Soc. Am. 55, 678–680 (1974)] gives evidence that speech perception can bring together portions of the speech signal that scene analysis says are separate: Although a formant transition on one ear sounds like a nonspeech “chirp,” its speech information is used by the syllable on the other ear. The present work explores the competition between these two organizations of the signal by extending previous work [Whalen and Liberman, Science 237, 169–171 (1987)] in which F3 formant sinusoids provided speech information both below and above the intensity level at which the nonspeech aspect of the tone was just perceptible. As an extension, alternative “scenes” were made plausible for the transitional tones by (1) extending them 50 ms before the syllable; (2) extending them 50 ms after the syllable; (3) creating two harmonics based on the tone; and (4) creating four harmonics based on half the frequency of the tone. Identification of the speech sounds was always above chance, indicating the success of the speech module. However, the harmonic series (below the detection level) and the precursor tone (below and above the detection level) reduced the accuracy of the speech judgments, suggesting competition with scene analysis. Scene analysis and speech perception, though independent, are often in competition for the same signal, and speech takes precedence in a wide array of circumstances. [Work supported by NIH Grant HD‐01994.]

Effects of lexical status on perceptual organization in duplex perception (A)

Lynne C. Nygaard

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1998-1998 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Duplex perception occurs when a synthetic syllable is split so the third‐formant transition is presented to one ear and the rest of the syllable (the base) is presented to the other ear. Listeners report hearing two distinct percepts — a complete syllable in the ear with the base and a nonspeech chirp in the ear with the transition. A modification of this duplex phenomenon can be created by presenting a third‐formant transition in isolation to one ear and the same transition electronically mixed with the base to the other ear. In this case, the transition information fuses to form a chirp percept in the center of the head and the syllable in the other ear becomes clearer than the one produced with the standard duplex procedure. Nygaard [J. Acoust. Soc. Am. Suppl. 1 87, S71 (1990)] found that when the spectral composition or onset frequency of the isolated transition was varied relative to the complete syllable base, both phonetic integration and nonphonetic fusion remained remarkably intact even with large differences in spectral composition between components. In a series of experiments, the lexical status of the syllable base was varied to determine the effect of lexical information on perceptual organization of acoustic components that differ in spectral composition. It was found that the lexical status of the eventual phonetic percept influenced the phonetic integration of acoustic components into syllable percepts, but had no effect on fusion of the third‐formant transitions to create a centered chirp percept. These results suggest that lexical information contributes to the perceptual grouping of acoustic components into phonetic percepts.

An investigation of locus equations as a source of relational invariance for stop place categorization (A)

Harvey M. Sussman

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1998-1998 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Locus equations were investigated as a potential higher‐order metric capable of illustrating relational invariance for place of articulation in voiced initial stop consonants independently of vowel context. Locus equations are straight‐line regression fits to data points formed by plotting onsets of F2 transitions along the y axis and their corresponding midvowel nuclei along the x axis. Twenty subjects, 10 male and 10 female, produced /bVt/, /dVt/, and /gVt/ tokens for 10 vowel contexts. Each CVC token was repeated in a carrier phrase five times yielding 150 tokens per subject. Formant measures were obtained using the MacSpeech Lab II speech analysis system. Locus equation scatterplots revealed extremely tight clustering of points around the regression line that were consistent across speakers and gender. Derived slope and y‐intercept parameters were significantly different across stop place categories. The relative value of F2 onset as it linearly changes in relation to the coarticulatorily produced vowel reflects an acoustic correlate of relational invariance for stop place. A discriminant analysis using F2 onset and vowel as predictors showed 82%, 78%, and 67% classification rates for labial, alveolar, and velar place. Using derived slope and y‐intercept values as predictors led to 100% classification into stop place categories. A neurobiologically oriented perspective on the invariance issue is explored and a brain‐based recognition algorithm for stop place integrating burst and F2 cues is offered. [Work supported by NSF.]
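Because a locus equation is defined above as a straight‐line regression of F2 onset on F2 at the vowel midpoint, the fit itself can be sketched in a few lines. This is an illustrative ordinary least‐squares sketch only, not the analysis software used in the study. The function name and the sample frequencies are hypothetical, and the sample data are constructed to lie exactly on a line.

```python
from statistics import fmean


def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of F2 onset (y axis) on midvowel F2 (x axis).

    Returns (slope, y_intercept) of the line
    F2_onset = slope * F2_vowel + y_intercept.
    """
    if len(f2_vowel_hz) != len(f2_onset_hz) or len(f2_vowel_hz) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = fmean(f2_vowel_hz)
    mean_y = fmean(f2_onset_hz)
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    if sxx == 0:
        raise ValueError("midvowel F2 values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    y_intercept = mean_y - slope * mean_x
    return slope, y_intercept


# Hypothetical tokens for one stop across five vowel contexts (values in Hz),
# generated from a known line so the fit can be checked by eye.
vowel_hz = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
onset_hz = [0.5 * v + 900.0 for v in vowel_hz]
slope, y_intercept = fit_locus_equation(vowel_hz, onset_hz)
```

Fitting one such line per stop and per speaker yields the slope and y‐intercept pairs that the abstract reports using as predictors of stop place.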

The role of transition duration in the perception of speech and nonspeech analogs of place of articulation (A)

Renée A. E. Zakia and Richard N. Aslin

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1999-1999 (1991); (1 page)

Online Publication Date: 14 Aug 2005

A previous report [R. Zakia, J. Acoust. Soc. Am. Suppl. 1 87, S117 (1990)] demonstrated that at a formant frequency pattern ambiguous between alveolar and velar, subjects identify synthetic speech stimuli with longer transition durations as velars and stimuli with shorter transition durations as alveolars. These results suggest the operation of either (1) an articulatory property characteristic of velars (longer transition durations distinguish velars because of their slower articulatory release), or (2) differential perceptual sensitivity to transition duration for spectral patterns characteristic of velars. To evaluate these possibilities, nonspeech analogs of a formant pattern ambiguous between an alveolar and a velar were generated with transition durations ranging between 20 and 50 ms in 5‐ms steps. In contrast to the previous identification task, the nonspeech stimuli were presented in a same‐different discrimination task. For comparison purposes, this same‐different task was also conducted with the speech stimuli. Transition duration was an effective cue to discrimination of both speech and nonspeech stimuli, suggesting that the link to articulatory mechanisms is not essential. However, discrimination performance for the speech stimuli was generally poorer than for the nonspeech stimuli, suggesting either that within‐category judgments are more difficult to make for broadband signals or that perception of the speech signals as phonetic segments interferes with auditory discrimination.

The duration and perception of English epenthetic and underlying stops (A)

Sook‐hyang Lee

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 1999-1999 (1991); (1 page)

Online Publication Date: 14 Aug 2005

In American English, an intrusive stop occurs before the fricative in words such as tense and false, making them very much like words with underlying stops, such as tents and faults. Ohala (1975) treats the inserted stop as an artifact of universal physiological or aerodynamic constraints. But this approach cannot account for the fact that South African English speakers do not insert the stop between sonorant and fricative clusters (Fourakis and Port, 1986). Another approach posits a language‐ or dialect‐specific phonological rule which inserts a phonological segment (Zwicky, 1972). Fourakis and Port (1986) argue against this approach on the grounds that in some pairs the intrusive stop is significantly shorter than the underlying one (although the difference is always very small). This paper presents perception data and duration measurements supporting Zwicky's approach. Phrases with intrusive and underlying stops (intense and in tents, respectively) in citation forms produced by three speakers of midwestern dialects were presented over earphones in random order for subjects to identify. Identification was very poor, just at chance level. Also, duration measurements of the silence gap between the /n/ and /s/ in these words show no significant difference, contrary to Fourakis and Port's findings. Moreover, token judgments in the perception experiment show very poor correlation with the durations except for one speaker, implying that whatever duration differences there are might not be a crucial cue that listeners exploit for labeling the words with epenthetic and underlying stops.