Journal of the Acoustical Society of America

May 1984

Volume 75, Issue S1, pp. S1-S93

Session SS. Speech Communication IX: Vowel Perception
Contributed Papers

Perception of vowel quality in the presence of other sound sources (A)

Christopher J. Darwin

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S85-S85 (1984); (1 page)

Online Publication Date: 12 Aug 2005

If the energy of a harmonic (500 Hz) near the first formant of a vowel is increased, the vowel quality changes. But if the tone corresponding to this increase in energy starts or stops at a different time from the original vowel, then listeners can perceive the original vowel quality [Darwin, Attention and Performance, Vol. X (Erlbaum, 1983)] even though the spectrum present during the vowel is inappropriate. Listeners apparently partition the total sound present into separate percepts on the basis of onset‐ and offset‐time differences. The present experiments show that the same partitioning still occurs when the “original” vowel has reduced energy at 500 Hz. Now adding extra energy to it restores a normal vowel spectrum. Listeners report a vowel quality corresponding to the “original” depleted vowel when the additional energy starts or stops at a different time from the rest of the vowel, even though the spectrum present during the vowel is appropriate to a vowel with a normal spectral envelope. The partitioning of sound sources observed here cannot be explained by a tendency to perceive vowels with conventional spectral envelopes. [Work supported by SERC.]

Multidimensional scaling and perceptual features: Reflections of stimulus processing or long‐term memory prototypes? (A)

Robert Allen Fox

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S85-S85 (1984); (1 page)

Online Publication Date: 12 Aug 2005

Many researchers have obtained information on phonetic perception by using various multidimensional scaling (MDS) procedures to discover underlying perceptual dimensions. Such dimensions are usually characterized in terms of the acoustic characteristics of the stimuli and have been interpreted as representing factors used to identify vowel (or consonant) quality in phonetic perception. However, several studies suggest that such dimensions may reflect properties of the vowel's long‐term memory prototype rather than the actual acoustic nature of the stimuli. For example, MDS studies commonly find no features reflecting dynamic acoustic information [e.g., R. Fox, Lang. Speech 26, 21–60 (1983)] and one study [B. Rakerd, J. Acoust. Soc. Am. Suppl. 1 73, S54 (1983)] found that similar perceptual features were extracted when Ss either heard or imagined (remembered) stimulus tokens. To investigate this issue, an experiment was designed to determine the extent to which slight acoustic variation in a subset of the synthetic vowels presented to Ss for scaling would produce differences in perceptual distance estimates. Results suggest that Ss are sensitive to relatively small acoustic differences while making dyadic comparisons even when there is no concomitant phonemic quality variation or specific instructions to expect such variations. It will be argued that such results suggest that MDS procedures do tap lower‐level perceptual processes and do not merely reflect long‐term memory prototypes.

The influence of postvocalic consonants on the duration of Spanish vowels (A)

Maria Ignacia Massone and Ana Maria Borzone de Manrique

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S85-S85 (1984); (1 page)

Online Publication Date: 12 Aug 2005

The present work was undertaken to provide further evidence about the role played by physiological and phonological factors in vowel lengthening before voiced consonants. Previous experiments compared different consonantal contexts and syllabic types cross‐linguistically but, with respect to Spanish, disregarded neutralization in final syllable position. Two male adults recorded nonsense bisyllabic words and two other speakers recorded meaningful frequent words where the voiced/voiceless opposition was spontaneously neutralized. Results showed that in open and closed syllables of nonsense utterances (average ratios: 1.4 and 1.23, respectively) vowels are shorter before voiceless consonants than before voiced. In meaningful words, vowels were longer before voiced realizations (20% lengthening). However, higher values were obtained for nonsense utterances, and we can assume that this difference indicates some phonological effect. The lengthening value observed in meaningful words seems to represent the physiological effect more closely.

Vowel information is integrated across intervening nonlinguistic sounds (A)

D. H. Whalen and Arthur G. Samuel

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S85-S86 (1984); (2 pages)

Online Publication Date: 12 Aug 2005

When the fricative noise of a fricative‐vowel syllable is replaced by a noise from a different vocalic context, listeners experience delays in identifying both the fricative and the vowel (D. H. Whalen, Perception & Psychophysics, 1984). Listeners (unconsciously) detect a mismatch between the vowel information in the fricative noise and in the vocalic segment. In the current experiment, noises and vowels were again cross‐spliced, but, in addition, the first 60 ms of the vocalic segment either had a nonlinguistic noise added to it or was replaced by that noise. The fricative noise and the majority of the vocalic segment were left intact, and both were quite identifiable. Mismatches of vowel information caused delays for all stimuli, both originals and ones with the noise. Additionally, syllables with a portion replaced by noise took longer to identify than those that had the noise added to them. The results indicate that listeners integrate all relevant information even across a nonlinguistic noise. Similarly, having the signal present along with the noise delayed identifications less than replacing the signal completely. [Work supported by NIH.]

On reconciling monophthongal vowel percepts and continuously varying F patterns (A)

Leigh Lisker

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S86-S86 (1984); (1 page)

Online Publication Date: 12 Aug 2005

When a sequence of pictures is presented in rapid succession the illusion of continuous movement can be created. A continuously varying acoustic signal may, contrariwise, be perceived as a sequence of “still” sounds. Not only is speech perceived as discrete sounds in sequence, but speakers will oblige, especially in the case of stressed vowels, by “citing” them in the form of steady state phonations judged to match auditorily the vowels in their natural contexts. These steady state imitations are adequately characterized by just two numbers, the frequencies of the two lowest vocal‐tract resonances. Acoustic analyses of a number of tokens of the English nonsense forms [bεb dεd gεg bæb dæd gæg] produced by a single talker indicate that, if each token is represented by a single pair of formant frequencies, there is a pattern of variation rather different from the cross‐talker variations reported by Peterson and Barney in 1951. Moreover, the variation patterns are different within syllable types, for the same vowel across contexts, for the same contexts across vowels, and for the two formants. Nor is it simple to apply the target‐plus‐undershoot model to explain the patterns of variation observed. [The support of the NICHHD is gratefully acknowledged.]

Vowel recognition in the absence of formant cues: Dynamic contributions to perception (A)

H. Timothy Bunnell

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S86-S86 (1984); (1 page)

Online Publication Date: 12 Aug 2005

The primary acoustic cue to vowel identity has typically been related to the center frequencies of the first two or three vowel formants. In experiments using specially degraded stimuli, results consistent with this assumption were obtained for stationary synthetic vowels, but not for vowels presented in syllabic contexts (e.g., /VwV/ and /ərVd/). For these experiments, vowel stimuli (both stationary and in context) were synthesized with high‐frequency square waves substituted for the standard synthesizer voicing source (Klatt, 1980 cascade synthesizer). Spectra associated with these stimuli have peaks, due to the source harmonics, located at odd multiples of the fundamental frequency. In identification studies using the vowels [ /i/, /ɪ/, /ε/, /æ/, /a/ ], listeners tended to identify stationary vowels on the basis of the frequencies of their largest‐magnitude spectral peaks. Since such peaks were invariably coincident with source harmonics, identification errors were frequent. Errors were, however, restricted almost entirely to the vowels [/ɪ/, /ε/, /æ/]. By contrast, vowels in syllabic context were reliably more resistant to identification errors, despite nearly identical acoustic structure throughout the medial portion of the vowel. Results are interpreted to suggest that listeners are well attuned to the contrast between source and vocal tract contributions to the signal. In order to separate these two contributions, however, listeners appeared to need information about changes over time in at least one function (the vocal tract function for these stimuli). [Work supported by NINCDS and the University of Maryland.]
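
The statement that a square-wave source puts spectral peaks at odd multiples of the fundamental can be illustrated with a minimal sketch. It assumes an ideal 50% duty-cycle square wave, whose Fourier series contains only odd harmonics with relative amplitude 4/(πn). The 300 Hz fundamental and 3000 Hz upper limit are hypothetical values chosen for illustration; the abstract does not state the frequencies used, and this is not the authors' synthesis procedure.

```python
import math

def square_wave_harmonics(f0_hz, max_hz):
    """List (frequency, relative amplitude) for an ideal square wave.

    Only odd harmonics n = 1, 3, 5, ... are present; each has
    amplitude 4 / (pi * n) relative to the wave's peak value.
    """
    harmonics = []
    n = 1
    while n * f0_hz <= max_hz:
        harmonics.append((n * f0_hz, 4 / (math.pi * n)))
        n += 2  # skip even harmonics, which have zero amplitude
    return harmonics

# Hypothetical source: 300 Hz fundamental, components listed up to 3 kHz.
harmonics = square_wave_harmonics(300, 3000)
for freq, amp in harmonics:
    print(f"{freq:5d} Hz  relative amplitude {amp:.3f}")
```

With so few, widely spaced source components, a formant peak can only show up in the output spectrum at or near one of these frequencies, which is consistent with the abstract's remark that the largest spectral peaks coincided with source harmonics.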

Dynamic specification of vowels in CVC syllables spoken in sentence context (A)

Winifred Strange

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S86-S86 (1984); (1 page)

Online Publication Date: 12 Aug 2005

In previous studies of citation‐form /b/‐V‐/b/ syllables, vowel identification accuracy of “silent‐center” syllables (in which steady states were attenuated to silence) was not significantly worse than for unmodified control syllables [W. Strange et al., J. Acoust. Soc. Am. 74, 695–705 (1983)]. New studies were performed in which vowels were produced in /b/‐/b/, /d/‐/d/, and /d/‐/t/ syllables, spoken in a carrier sentence, “Say the word ‐‐‐‐ again.” Silent‐center syllables, in which all but the first three and last four pitch periods were attenuated to silence, were identified with 85% accuracy, as compared to 97% for control syllables. However, vowel identification was still much more accurate than when either the initial transitions or final transitions were presented alone (54% and 37% accuracy, respectively). Performance on silent‐center syllables was not significantly better when subjects were presented the stimuli blocked according to consonantal context (87%). Neutralizing syllable duration differences decreased accuracy somewhat (78%), especially for short vowels. Additional identification studies are underway in which portions of syllables containing different consonants were interchanged, in order to test hypotheses about trajectories specified by initial and final transitions. [Work supported by NIMH.]

Vowel perception in reverberation (A)

Anna K. Nabelek and Tomasz Letowski

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S86-S86 (1984); (1 page)

Online Publication Date: 12 Aug 2005

Because identification of vowels was not affected by reverberation, possible changes in perception were assessed in a paired comparison paradigm. Fifteen English vowels and diphthongs recorded with and without reverberation (T = 1.2 s) were paired with each other. Ten normal‐hearing subjects made similarity judgments using a scale from 1 to 7. Response matrices for the two conditions were analyzed using multidimensional scaling procedures. A three‐dimensional solution was sought for the data on the basis of the experimental conditions and results of previous studies. Three‐ and two‐way analyses were performed using an ALSCAL subroutine. Both with and without reverberation, the first two dimensions were identified as back‐front and low‐high, confirming the results of previous studies. The third dimension was identified as long‐short, while others interpreted it as tenseness, openness, or left it unidentified. In both three‐ and two‐way analyses, the shifts in stimulus configuration between test and retest and between reverberant and nonreverberant solutions were of the same order. Therefore, we concluded that reverberation did not contribute significantly to the perceptual distances among vowels and diphthongs. [Supported by NIH.]

The “center of gravity” and perceived vowel height (A)

Patrice Speeter Beddor and Sarah Hawkins

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S86-S87 (1984); (2 pages)

Online Publication Date: 12 Aug 2005

In oral vowels, perceived height is determined by the “center of gravity” of the spectral prominence in the vicinity of F1 rather than by F1 peak frequency [Chistovich and Lublinskaya, Hear. Res. 1, 185–95 (1979)]. The present study of nasal vowels assessed the generality of the center of gravity effect. Five nasal vowels, /ɪ̄ ē ǣ ā ō/, were synthesized. For each nasal vowel, a continuum of corresponding oral vowels was synthesized by manipulating the frequency of F1. Each continuum included one stimulus whose F1 frequency matched that of the nasal vowel and one whose centroid (a measure of center of gravity) matched. The five vowel sets consisted of oral‐nasal vowel pairs; 20 listeners selected the “best‐match” pair for each set. Subjects chose the F1 match for /ɪ̄/ only; for nonhigh vowels, choices fell between F1 and centroid matches, but significantly closer to the centroid. Apparently center of gravity influences perception of nasal vowel height, but the centroid as a measure of this needs refinement. Whether the centroid is an appropriate measure of perceived oral vowel height, or whether another metric applicable to both oral and nasal vowels can be found, is currently under investigation. [Work supported by NIH.]
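
The difference between an F1 peak match and a centroid match can be made concrete with a small sketch. It assumes the centroid is the amplitude-weighted mean frequency of the spectral components within a chosen band, which is one common definition; the abstract does not give the exact formula, band limits, or amplitude scale the authors used. The spectrum values below are hypothetical and chosen only for illustration.

```python
def spectral_centroid(freqs_hz, amplitudes, lo_hz, hi_hz):
    """Amplitude-weighted mean frequency of components in [lo_hz, hi_hz]."""
    pairs = [(f, a) for f, a in zip(freqs_hz, amplitudes) if lo_hz <= f <= hi_hz]
    total = sum(a for _, a in pairs)
    if total == 0:
        raise ValueError("no energy in the requested band")
    return sum(f * a for f, a in pairs) / total

# Hypothetical low-frequency spectrum (100 Hz fundamental, linear amplitudes).
# The main peak is at 300 Hz; a second prominence near 600 Hz stands in for
# extra low-frequency energy such as a nasal resonance.
freqs = [100, 200, 300, 400, 500, 600, 700, 800]
amps = [0.2, 0.5, 1.0, 0.6, 0.3, 0.6, 0.4, 0.1]

peak_hz = max(zip(amps, freqs))[1]  # frequency of the strongest component
centroid_hz = spectral_centroid(freqs, amps, 100, 800)
print(f"peak: {peak_hz} Hz, centroid: {centroid_hz:.1f} Hz")
```

In this made-up spectrum the extra prominence pulls the centroid well above the peak frequency, so an oral vowel matched on peak frequency and one matched on centroid would have different F1 values. That is the kind of contrast the continua in the abstract were built to test.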

The role of nasalization in the perception of synthesized speech (A)

John C. Thomas, Jonas N. A. Nartey, Mary Beth Rosson, and Judy Klavans

J. Acoust. Soc. Am. Volume 75, Issue S1, pp. S87-S87 (1984); (1 page)

Online Publication Date: 12 Aug 2005

Phonetically naive listeners have so far reported that synthesized speech sounds “metallic” or “as if the speaker had a cold.” One explanation is that the simulation of nasal coupling is not close enough to natural speech. We report on research on diphone‐synthesized utterances (Dixon and Maxey, 1968). In order to simulate a better set of nasalized vowels in synthesized speech, a new set of diphones was created to replace the original ones in those cases in which there is a nasal consonant in the immediate vicinity. These new diphones were the result of hand‐painting single nasal formants in addition to the regular formant frequencies found in the original diphones. Utterances made of both the new and old sets of diphones were played to phonetically naive listeners. Results on naturalness judgments and on intelligibility will be presented.