Journal of the Acoustical Society of America

Nov 1981

Volume 70, Issue S1, pp. S1-S109

Session RR. Speech Communication VI: Perception of Vowels; Theories of Perception
Contributed Papers

Identifying vowels in CVC syllables: Effects of inserting silence and noise (A)

Randy L. Diehl and Ellen M. Parker

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S96-S96 (1981); (1 page)

Online Publication Date: 12 Aug 2005

Listeners were asked to identify natural vowels in /d_d/ context under various deletion conditions. Deletion intervals ranged from 60% to 90% of the syllable duration (measured from initial burst to final closure onset) and were centered about the midpoint. The deleted syllable portions were replaced with either silence or broadband noise. In one experiment, the three syllable types used were /did/, /ded/, and /dud/; in the other experiment, the three syllable types were /dɪd/, /dɛd/, and /dʌd/. To ensure that identification performance was based solely on spectral information, tokens were selected such that, within an experiment, average syllable duration was approximately equal across types. Identification performance in the 60% and 70% deletion conditions was not substantially worse than for full syllables [cf. W. Strange, J. J. Jenkins, and T. R. Edman, J. Acoust. Soc. Am. Suppl. 1 61, S39 (1977)]. Even the 90% deletion conditions yielded performance well above chance, indicating that significant vowel information is contained in the first and last 10 or 15 ms of the syllable. Results will be discussed in terms of residual acoustic cues.
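
The arithmetic behind a centered deletion interval can be sketched as follows. This is only an illustration, not the authors' procedure: the function names and the 250 ms syllable duration are assumptions, since the abstract does not report token durations. With a 90% deletion centered on the midpoint, 5% of the syllable remains at each edge, which for a syllable of roughly 200 to 300 ms gives the 10 to 15 ms of residual signal mentioned in the abstract.

```python
def deletion_window(syllable_ms, deleted_fraction):
    """Return (start_ms, end_ms) of a deletion centered on the syllable midpoint.

    syllable_ms is measured from initial burst to final closure onset.
    Both names are hypothetical; they are not taken from the abstract.
    """
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")
    deleted_ms = syllable_ms * deleted_fraction
    midpoint = syllable_ms / 2.0
    return midpoint - deleted_ms / 2.0, midpoint + deleted_ms / 2.0


def retained_edge_ms(syllable_ms, deleted_fraction):
    """Duration left intact at each end of the syllable."""
    start, _ = deletion_window(syllable_ms, deleted_fraction)
    return start


# Assumed example duration: 250 ms (not reported in the abstract).
for fraction in (0.6, 0.7, 0.8, 0.9):
    start, end = deletion_window(250.0, fraction)
    print(f"{fraction:.0%} deleted: replace {start:.1f}-{end:.1f} ms, "
          f"{retained_edge_ms(250.0, fraction):.1f} ms kept per edge")
```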

Fundamental frequency and vowel perception (A)

John H. Ryalls and Philip Lieberman

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S96-S96 (1981); (1 page)

Online Publication Date: 12 Aug 2005

The set of nine nondiphthong vowels of American English was synthesized using the averaged male values from Peterson and Barney (1952). These vowels were produced in three conditions of the fundamental frequency: (1) average 135 Hz, (2) low 100 Hz, and (3) high 250 Hz. In forced‐choice testing, subjects identified vowels in the average and low conditions of the f0 with greater accuracy than the high f0 vowels. A second experiment was conducted using the female formant frequency values from Peterson and Barney and the same conditions of f0. Subjects still performed better on the low and average conditions of the f0 for these vowels. These results suggest that the human formant frequency extractor is aided by the more detailed transfer function of a lower fundamental. The pattern of vowel errors was not the same across f0 conditions nor was it the same across male or female formant values. The point vowels [i], [u], and to a lesser degree [a] were the most consistently identified across conditions of the fundamental.

The role of spectral distribution and decay time in the auditory perception of duration in speech and nonspeech stimuli (A)

Vincent J. van Heuven and Marcel P. R. van den Broecke

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S96-S96 (1981); (1 page)

Online Publication Date: 12 Aug 2005

In two experiments subjects reproduced the steady state duration (varying between 100 and 450 ms) of various signal types (sine, white noise, sawtooth, synthesized vowel /a/) with abrupt (10 ms) or gradual (50 ms) decay portions. Results indicate that (1) jnd ranges from 20% for short reference durations to less than 10% for longer reference values, (2) jnd is smaller for gradually decaying signals, (3) the duration of abruptly terminating signals is perceived as somewhat longer than that of signals with smooth decays, (4) narrow‐band signals are perceived as shorter than wideband signals, and (5) the time order error is a function of both spectral distribution and signal decay characteristic. Implications of this study are that (1) duration discrimination for speechlike signals is more accurate than has hitherto been assumed, and (2) the temporal organization of the acoustic cues to the fortis‐lenis contrast in postvocalic obstruents is optimally suited to counteract forward masking effects of the vowel on the following noise burst.

Recognition of cross‐modal correspondences between the visual and auditory products of articulation in early infancy: Report on a method and preliminary data (A)

Patricia K. Kuhl and Andrew N. Meltzoff

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S96-S97 (1981); (2 pages)

Online Publication Date: 12 Aug 2005

We have developed an experimental procedure for studying the recognition of cross‐modal correspondence between the visual and auditory products of articulation in early infancy. In a series of experiments, infants were presented with filmed displays of two faces, one producing the articulatory movements corresponding to the vowel [a] and the other producing the movements corresponding to the vowel [i]. One of the sound tracks (either [a] or [i]) was played in synchrony with the faces. The infants' visual fixations to the two films were scored by an observer who could not hear the sound track presented to the infant nor see the faces. We hypothesized that an infant's visual preference for one of the two faces would be influenced by the auditory signal that was presented. The temporal synchrony of the visual and auditory stimuli could be adjusted, allowing us to separate the infant's ability to recognize (i) the structural correspondence between a visual and an auditory event from (ii) the temporal synchrony between a visual and an auditory event. Preliminary data on infants under six months of age will be presented. [Work supported by NSF.]

Effects of glottal waveform on the perception of talker sex (A)

Thomas D. Carrell

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S97-S97 (1981); (1 page)

Online Publication Date: 12 Aug 2005

Glottal volume velocities produced by male and female talkers have been shown to differ from one another in a systematic fashion [R. B. Monsen and A. M. Engebretson, J. Acoust. Soc. Am. 62, 981–993 (1977)] due, primarily, to the relative sizes of the vocal folds. The present study examined the perceptual importance of these differences with an identification task. Subjects were presented with continua of synthetic vowel stimuli, each ranging from male to female in terms of their formant characteristics (based on Fant's k factor) [G. A. Fant, Speech Sounds and Features, 84–93 (1973)]. Half of the stimuli were synthesized with a relatively symmetric source waveform, similar to that of the glottal volume velocity of a female, and the other half were synthesized with a more asymmetric source waveform, similar to that of a male. The subjects' task was simply to indicate for each stimulus whether it sounded as if it were produced by a male or a female talker. The perceptual crossover from male to female along the k‐factor continuum occurred at lower values for the stimuli synthesized with femalelike voicing and at higher values for stimuli with malelike voicing. These results demonstrate that glottal source information plays an important role in the perception of the talker's sex. Moreover, the findings open the possibility that the glottal waveform may be an important factor in vowel normalization as well. [Supported by NINCDS grant NS‐12179.]

Assessment of individual phonetic discrimination performance using factor analysis (A)

Kurt P. Kitselman and Pierre L. Divenyi

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S97-S97 (1981); (1 page)

Online Publication Date: 12 Aug 2005

The purpose of this study was to determine the usefulness of Q‐type factor analysis [F. N. Kerlinger, Foundations of Behavioral Research (Holt, Rinehart, & Winston, New York, 1973)] for obtaining an objective measurement of individual phonetic discrimination performance. Five conditions of phonetic contrasts were presented to ten aphasic and ten control subjects in a standard AX discrimination paradigm. The first analysis of group differences utilized the average discrimination profiles, which were compared between groups within each stimulus condition using the rank correlation procedure applied under similar conditions by Liberman et al. [J. Exp. Psychol. 61, 379–388 (1961)]. The weakness of this method is that individual discrimination performances are ignored. This problem may be alleviated through the use of Q‐type factor analysis. The results of this analysis consisted of a loading value on the principal factor for each subject within each stimulus condition. Only the subjects with high loading exhibited discrimination profiles with the expected peaks of discrimination. Thus the loading value was interpreted as a measure of phonetic discrimination performance. Analysis of variance using the loading values produced the same results obtained in the earlier analysis, but factor analysis offered the advantage of producing a single, objective measure of discrimination performance for individual subjects, a measure that can then be compared with other behavioral measures.

An interactive activation model of speech perception (A)

Jeffrey L. Elman and James L. McClelland

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S97-S97 (1981); (1 page)

Online Publication Date: 12 Aug 2005

We describe a model of speech perception [based on the Interactive Activation Model of Visual Word Perception (cf. McClelland and Rumelhart, in press; Rumelhart and McClelland, in press)] in which excitatory and inhibitory interactions among nodes for phonetic features, phonemes, and words are used to account for aspects of the interaction of bottom‐up and top‐down processes in perception of speech. Results from a working computer simulation of this model are presented. Input to the program consists of specifications of distinctive features of speech as they unfold in time. Features, phonemes, and words consistent with the input are activated, missing specifications may be filled in, and slight errors may be corrected so that the “percept” formed by the simulation exhibits such phenomena as phonemic restoration and related perceptual effects. [Work supported by NSF.]
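
The general idea of excitatory and inhibitory interaction between levels can be illustrated with a toy simulation. The sketch below is not the authors' program: it skips the feature level and the unfolding of input over time, and the three-word lexicon, the parameter values, and the names `simulate`, `update`, and `rectify` are all assumptions made for illustration. Position-specific phoneme nodes excite the words that contain them, word nodes feed activation back to their phonemes, and nodes within a level inhibit one another. Bottom-up evidence is supplied for the first and last phoneme only, so any activation of the middle phoneme comes from top-down feedback, loosely analogous to the filling in of missing specifications described in the abstract.

```python
# Hypothetical toy lexicon and parameters, chosen only for illustration.
LEXICON = ["bat", "big", "dog"]
A_MIN, A_MAX, REST = -0.2, 1.0, 0.0
DECAY, EXCITE, INHIBIT = 0.1, 0.1, 0.1


def rectify(x):
    """Only nodes with positive activation send signals to other nodes."""
    return max(x, 0.0)


def update(act, net):
    """Move activation toward A_MAX (net > 0) or A_MIN (net <= 0), then decay."""
    if net > 0:
        act_new = act + net * (A_MAX - act)
    else:
        act_new = act + net * (act - A_MIN)
    act_new -= DECAY * (act - REST)
    return min(A_MAX, max(A_MIN, act_new))


def simulate(evidence, steps=50):
    """Run synchronous updates; evidence maps (position, phoneme) to input strength."""
    phonemes = {(i, w[i]): REST for w in LEXICON for i in range(len(w))}
    words = {w: REST for w in LEXICON}
    for _ in range(steps):
        new_ph = {}
        for (i, ph), act in phonemes.items():
            # Top-down excitation from words containing this phoneme here.
            top_down = sum(rectify(words[w]) for w in LEXICON if w[i] == ph)
            # Lateral inhibition from other phonemes at the same position.
            rivals = sum(rectify(a) for (j, q), a in phonemes.items()
                         if j == i and q != ph)
            net = evidence.get((i, ph), 0.0) + EXCITE * top_down - INHIBIT * rivals
            new_ph[(i, ph)] = update(act, net)
        new_w = {}
        for w, act in words.items():
            # Bottom-up excitation from the word's own phonemes.
            bottom_up = sum(rectify(phonemes[(i, w[i])]) for i in range(len(w)))
            rivals = sum(rectify(a) for v, a in words.items() if v != w)
            new_w[w] = update(act, EXCITE * bottom_up - INHIBIT * rivals)
        phonemes, words = new_ph, new_w
    return phonemes, words


# Input "b ? t": the middle phoneme receives no bottom-up evidence.
evidence = {(0, "b"): 0.3, (2, "t"): 0.3}
phonemes, words = simulate(evidence)
best_word = max(words, key=words.get)
restored = max("aio", key=lambda ph: phonemes[(1, ph)])
print(best_word, restored)
```

In this toy setting the word node with the most bottom-up support wins the competition and passes activation down to its unheard middle phoneme.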

Confidence ratings in discrimination of prosodic features of unintelligible sentences (A)

C. L. Farrar

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S97-S97 (1981); (1 page)

Online Publication Date: 12 Aug 2005

Three types of sentence materials were presented to test perception of prosodic features. The sentences differed in grammatical phrase structure, major constituent boundary (MCB), or contrastive stress. Subjects made forced‐choice responses to the prosodic features and assigned a confidence rating estimating the probability of being correct. Sentences were presented under three experimental conditions: audition alone (AUD), vision alone (VIS), and audiovisual (AV). The acoustic signal was low‐pass noise, amplitude modulated by the sentence materials. Irrespective of correctness, highest ratings were assigned to responses under the AV condition, next highest to VIS, and lowest to AUD conditions. The relationship of accuracy to rating assignment varied with sentence material and experimental condition. For the test of grammatical phrase structure (5 AFC) low ratings were assigned to correct responses under AUD conditions, suggesting that these were correct guesses. In locating the MCB (16 AFC), ratings assigned to correct responses were not greatly different from ratings assigned to error responses. If subjects use ratings as instructed, the probability of obtaining a correct response would correspond to the ranges of certainty for each rating category, independent of experimental condition. Under AUD and VIS conditions, subjects over‐ or underestimated the probability of being correct. Under AV conditions, subjects were most successful in estimating accuracy with confidence ratings. [Work done while author was at University of Virginia.]

Some acoustical correlates of speech‐rate perception (A)

Ronald N. Bond and Stanley Feldstein

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S97-S98 (1981); (2 pages)

Online Publication Date: 12 Aug 2005

A previous study by the authors found that vocal frequency and intensity both influenced the perception of speech rate, and proposed three alternative explanations of the results. They argued that the findings were a function of: a methodological limitation; the expectation that if one speech characteristic changes, the others change in a similar direction; or experience with producing and hearing covariation among pitch, loudness, and speech rate in ordinary speech. The present study tested the hypothesis that such covariation does occur in the perception of speech, and assessed the plausibility of the three explanations. Three levels of frequency and intensity were factorially varied within each of three different speech rates, using a 20‐s speech segment played backwards to produce the stimuli. Each stimulus was compared with a standard stimulus in terms of four scales (perceived speech rate, pitch, loudness, and duration) by 21 males and 40 females. ANOVAs indicated that frequency and intensity positively influenced the perception of speech rate, pitch, and loudness, and that frequency negatively affected the perception of duration. The results suggest the third explanation above as the most viable.

Speeded classification of natural and synthetic speech in a lexical decision task (A)

D. B. Pisoni

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S98-S98 (1981); (1 page)

Online Publication Date: 12 Aug 2005

How fast can human observers recognize isolated words? How is the process of word recognition affected by the quality of the initial acoustic‐phonetic input in the speech signal? In this paper we report the results of a lexical decision experiment which examined the response times for classification of natural and synthetic word and nonword stimuli. The results showed an overall increase in response time of 145 ms for synthetic stimuli compared to natural speech. In addition, words were recognized 140 ms faster than matched nonword controls. However, there was no interaction between signal type (natural versus synthetic) and classification response (word versus nonword). These results suggest that differences in perception between natural and synthetic speech lie at early stages of perceptual analysis in which the initial phonetic or segmental representation of the input signal is developed rather than at later stages of lexical access and search where these representations are examined or compared prior to execution of the observer's classification response. The findings will be discussed in terms of the extensive use of synthetic speech stimuli in perceptual experiments and its application in voice response systems used in applied settings. [Supported by NINCDS grant NS‐12179.]

Capacity demands in short‐term memory for synthetic and natural word lists (A)

T. C. Feustel, P. A. Luce, and D. B. Pisoni

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S98-S98 (1981); (1 page)

Online Publication Date: 12 Aug 2005

Two experiments comparing recall for synthetic and natural lists of monosyllabic words were conducted to determine the locus of previously demonstrated perceptual difficulties for synthetic speech produced by rule [D. B. Pisoni and S. Hunnicutt, ICASSP, 572–575 (1980)]. If the perception of synthetic speech demands increased processing capacity in early encoding stages, recall differences between synthetic and natural speech should arise when short‐term memory is differentially stressed. In the first experiment three presentation rates (1, 2, and 5 s per word) were used to manipulate the demands placed on short‐term memory. Although recall was consistently poorer for the synthetic lists at all presentation rates, the decrement for synthetic stimuli did not increase with faster rates. A similar pattern of results was obtained in the second experiment in which strings of digits of varying length (0, 3, and 6 digits) were presented visually for retention prior to each spoken word list. However, the recall of the digits was considerably worse for the 6‐digit list relative to the 3‐digit list when the following word lists were synthetic. These results indicate that at least some of the difficulties observed in the perception and comprehension of continuous synthetic speech are due to increased processing demands in short‐term memory. [Supported by grants from NIMH and NINCDS.]

Misarticulating children's perception of the VOT contrast (A)

Ray Daniloff, Paul Hoffman, Peter Alfonso, and Gordon Schuckers

J. Acoust. Soc. Am. Volume 70, Issue S1, pp. S98-S98 (1981); (1 page)

Online Publication Date: 12 Aug 2005

Seven multiple‐phoneme misarticulating children (ages 5.6 to 6.6) and seven age‐matched normally articulating children labeled a seven‐step VOT continuum (/pi/ to /bi/) via a picture pointing task. The normal children identified the first three stimuli as /b/ at greater than chance level (p < 0.01), stimulus 4 was identified at chance level (54% /b/) and stimuli 5–7 were identified as /p/ (p < 0.01). The misarticulating children also identified stimuli 1 through 3 as /b/ at greater than chance level (p < 0.05). However, their phoneme boundary included both stimuli 4 and 5 (55% and 40% /b/). Stimuli 6 and 7 were identified as /p/ (75% and 74% /p/) at greater than chance level. Results suggest that multiply misarticulating children may misperceive the VOT contrast even when voicing is not among their production errors.