Journal of the Acoustical Society of America

May 1990

Volume 87, Issue S1, pp. S1-S164

Session MMM. Speech Communication XI: Vowel and Formant Perception and Psychoacoustics
Contributed Papers

Effect of attentional demands and auditory memory degradation on vowel discrimination (A)

Robert Allen Fox

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S158-S158 (1990); (1 page)

Online Publication Date: 13 Aug 2005

This study explores the effects of attentional demands in a vowel discrimination test when auditory memory has been degraded by irrelevant intervening nonspeech stimuli. The experiment utilized a same‐different (AX) discrimination task with both within‐ and between‐category pairs from an [i‐I‐ɛ] continuum. The interstimulus interval (ISI) was 2000 ms in duration and was filled either with silence or with 1, 2, or 3 short noise bursts. Listeners were required to make pairwise discriminations under two different attend conditions: (1) they ignored any possible noise burst in the ISI and made only same/different judgments, or (2) they made the same/different judgments and then indicated how many noise bursts occurred. Neither the noise condition nor the attend condition produced a significant difference in the within‐category discriminations. Increasing the number of noise bursts affected the listeners' ability to make between‐category discriminations, but there were no significant attend condition effects. Possible differences between younger and older adults will be addressed in terms of short‐term memory availability and/or attentional effects in speech discrimination. [Supported by Grant 1 R01 AG08353‐01 from the National Institute on Aging.]

Psychoacoustic evidence for a contextual effect model (A)

Masato Akagi

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S158-S159 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

In previous work toward speech recognition [M. Akagi, J. Acoust. Soc. Am. Suppl. 1 85, S86 (1989)], a model was developed that predicted target formants in reduced vowels based on the interaction between spectral peak pairs. To substantiate this model, two psychoacoustic experiments were carried out that measured the amount of phoneme boundary shift with (1) a single‐formant stimulus as a preceding anchor and (2) a vowel as a preceding anchor. The results of the first experiment were compared with the spectral peak interaction obtained from real speech data using the model. This comparison showed that the perceptual boundary shift with a single‐formant anchor is similar to the spectral peak interaction analyzed by the model. Thus the neutralization recovery model is formulated as the sum of the contextual effects resulting from interaction between spectral peaks. Additionally, the comparison of these results with those of the second experiment showed that the phoneme boundary shift with a vowel anchor can be postulated as the sum of the shift with the single‐formant anchor and a feedback factor from the perceived preceding anchor.

The role of the critical band in learning vowel categorizations (A)

Raymond S. Weitzman

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S159-S159 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The role of the critical band in learning to categorize steady‐state, synthesized vowels was examined in a series of concept formation experiments. Five groups of subjects were given the task of learning to assign sets of vowels, whose only difference was in the frequency of F2, to one of two possible categories, and were then tested to see how well they learned them. The sets of vowels in each experiment differed from each other on average by 0.4, 0.6, 0.8, 1.0, or 1.2 Bark. In the learning phase of the experiment, it was found that the error rate in categorizing vowels that differed by 1.0 and 1.2 Bark was significantly lower than the error rate in categorizing vowels that differed by 0.4, 0.6, and 0.8 Bark. Furthermore, the test phase of the experiment showed that over 60% of the subjects learned the two categories when they differed by 1.0 and 1.2 Bark, while less than 23% learned the two categories when they differed by 0.4, 0.6, and 0.8 Bark. The implications of these findings will be discussed.
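To give a sense of what these Bark separations mean in hertz, the sketch below converts a reference F2 to Bark, adds each separation, and converts back. It assumes Traunmüller's approximation of the Bark scale and a reference F2 of 1500 Hz; the abstract states neither the conversion formula nor the F2 values used, so both are illustrative assumptions.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def shift_f2(f2_hz: float, delta_bark: float) -> float:
    """Return the frequency lying delta_bark above f2_hz."""
    return bark_to_hz(hz_to_bark(f2_hz) + delta_bark)


if __name__ == "__main__":
    base_f2 = 1500.0  # hypothetical reference F2, not taken from the abstract
    for step in (0.4, 0.6, 0.8, 1.0, 1.2):
        print(f"{step:.1f} Bark above {base_f2:.0f} Hz -> {shift_f2(base_f2, step):.0f} Hz")
```

Because the Bark scale is compressive, the same Bark separation corresponds to a larger separation in hertz at higher F2 values.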

Thresholds for formant‐frequency discrimination in isolated vowels (A)

Diane Kewley‐Port

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S159-S159 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The threshold of discrimination (or DL) of a shift in formant frequency is thought to be in the range of 3%‐6%, based on the oft‐replicated results of Flanagan [J. Acoust. Soc. Am. 27, 613‐617 (1955)]. This range may not, however, represent the limits of resolution in the auditory system because, for most experiments, subjects were not trained and several pairs of stimuli were presented within trial blocks. In the present experiments, thresholds were obtained from well‐trained subjects listening to vowels under minimal stimulus uncertainty using an adaptive‐tracking paradigm. Thresholds were determined for increments and decrements in F1 and F2 for ten synthetic, steady‐state vowels modeling a female talker. Results indicated that the DL for both increments and decrements in frequency of the formants increased linearly as a function of frequency, with the exception of 50% higher thresholds where a harmonic fell exactly on the formant frequency. About 80% of the ΔF/F ratios were below 0.02 for formant frequencies greater than 600 Hz. Thus the DL for formant‐frequency discrimination appears to be in the range of 1%‐2%, or a factor of 3 lower than previous estimates. [Research supported by NIH and AFOSR.]
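The ΔF/F ratios quoted here translate into absolute shifts by simple multiplication. The sketch below compares the earlier 3%‐6% range with the 1%‐2% range at a few formant frequencies; the frequencies are hypothetical values chosen for illustration and are not the stimuli of the study.

```python
def dl_hz(formant_hz: float, weber_fraction: float) -> float:
    """Just-noticeable shift in Hz for a given Delta-F/F ratio."""
    return formant_hz * weber_fraction


if __name__ == "__main__":
    # Hypothetical formant frequencies, all above the 600-Hz mark in the abstract.
    for f in (800.0, 1500.0, 2500.0):
        older = (dl_hz(f, 0.03), dl_hz(f, 0.06))
        newer = (dl_hz(f, 0.01), dl_hz(f, 0.02))
        print(f"F = {f:.0f} Hz: 3%-6% -> {older[0]:.0f}-{older[1]:.0f} Hz, "
              f"1%-2% -> {newer[0]:.0f}-{newer[1]:.0f} Hz")
```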

Difference limens for synthetic vowel spectra (A)

John W. Hawks

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S159-S159 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Difference limens (DLs) along synthetic vowel continua were measured as distances in the auditory‐perceptual space proposed by J. D. Miller [J. Acoust. Soc. Am. 85, 2114–2134 (1989)]. The continua represent straight lines that lie parallel with one of the axes of the space (x′, y′, and z′). Groups of six continua share a common center value, or reference point. Movement along these continua results in distinct patterns of frequency change for F1, F2, and F3 relative to the reference point formant values. These patterns vary with the direction and axis of movement. Reference points were selected from the interiors of ten vowel zones and seven boundary areas between vowel zones. An adaptive up‐down procedure employing a cued, two‐alternative, forced‐choice (2AFC) task was utilized to estimate the 79.5% correct point along each continuum. Vowel tokens representing the reference points served as the cue stimuli. DLs were estimated twice for each continuum from each of four subjects. In general, the results of this experiment reflect smaller DLs for vowel formant frequencies than have been found in the past and suggest that while an overall average DL for distance in the auditory‐perceptual space may be estimated, DLs vary significantly with the axis of movement. Additionally, no significant difference was found between continua associated with vowel centers and vowel boundaries. These results will be compared with similar results estimating DLs for single‐formant variation in vowels. [Work supported by NIDCD.]
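The abstract does not name the tracking rule behind its adaptive up‐down procedure. As an illustration only, the sketch below implements a transformed three‐down/one‐up staircase, which converges near 79.4% correct and is therefore close to the 79.5% point mentioned; the class name, step size, and starting level are hypothetical.

```python
class ThreeDownOneUp:
    """Transformed up-down staircase (assumed rule, not confirmed by the abstract)."""

    def __init__(self, start: float, step: float, floor: float = 0.0):
        self.level = start      # current stimulus distance from the reference
        self.step = step        # fixed step size (hypothetical)
        self.floor = floor      # smallest allowed distance
        self.correct_run = 0    # consecutive correct responses at this level

    def update(self, correct: bool) -> float:
        """Record one trial and return the level for the next trial."""
        if correct:
            self.correct_run += 1
            if self.correct_run == 3:
                # Three correct in a row: make the task harder.
                self.level = max(self.floor, self.level - self.step)
                self.correct_run = 0
        else:
            # Any error: make the task easier and restart the run count.
            self.level += self.step
            self.correct_run = 0
        return self.level


def target_proportion_correct(n_down: int = 3) -> float:
    """Proportion correct at which an n-down/1-up track is in equilibrium."""
    return 0.5 ** (1.0 / n_down)


if __name__ == "__main__":
    track = ThreeDownOneUp(start=10.0, step=1.0)
    responses = [True, True, True, False, True, True, False]
    print([track.update(r) for r in responses])
    print(f"equilibrium: {target_proportion_correct():.3f}")
```

In practice the threshold estimate is usually taken as the mean of the levels at the last several reversals of the track; that step is omitted here.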

Vowel perception: Spectral shape versus formants (A)

Amir J. Jagharghi and Stephen A. Zahorian

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S159-S159 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Traditional theories of vowel perception favor formants over global spectral shape as the primary perceptual cues to vowel identity. At previous ASA meetings, results of speaker‐independent automatic recognition experiments for vowels were reported that contrasted global spectral shape versus formants [A. J. Jagharghi and S. A. Zahorian, J. Acoust. Soc. Am. Suppl. 1 81, S18 (1987); S. A. Zahorian and A. J. Jagharghi, J. Acoust. Soc. Am. Suppl. 1 82, S37 (1987)]. These results indicate that automatic recognition rates based on global spectral shape are generally slightly superior to recognition rates based on formants. In the present study, vowel perception is investigated using synthesized tokens that contain conflicting cues to vowel identity based on overall spectral shape versus formants. Two distinct but close vowels are selected. The spectral shape of the first vowel is modified to match, to the extent possible, the spectral shape of the second vowel without any change in the formant frequencies for F1, F2, and F3. Thus the modified vowel has the same formants as the first vowel, but its spectral shape matches that of the second vowel. Listening experiments indicate that, for most conditions, the modified vowel segments are perceived according to spectral shape cues rather than formant cues. The details of the experimental procedures and the results of the listening experiments will be presented at the meeting. [Work supported by NSF.]

Spectral envelope distortion and vowel perception: Evidence for a central, auditory form of perceptual compensation (A)

Anthony J. Watkins

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S159-S160 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

The response of an /I/‐/E/ spectral difference filter (SDF) is the spectral envelope of /I/ minus the spectral envelope of /E/. An “itch” to “etch” continuum was constructed by interpolating between the spectral envelopes of the vowels of appropriate VCs. These test sounds were preceded by a carrier sound filtered by an SDF and presented for identification. Perceptual compensation produces a phoneme boundary difference between the /I/‐/E/ SDF and its inverse. Carriers were the phrase “the next word is” spoken by the same (male) speaker as the test sounds, signal correlated noise derived from this phrase, the same phrase spoken by a female speaker, male and female versions played backward, and a repeated end‐point vowel. The carrier and test were presented to the same ear, to different ears, and from different directions (by varying interaural time delay). The gap between carrier and test was either 160 ms or zero. The pattern of results indicates that the compensation observed is unlike peripheral phenomena, such as adaptation, and unlike speech‐specific effects, such as vowel normalization.

The role of peak amplitude in vowel recognition (A)

L. Garrison‐Shaffer, D. L. Dutton, and J. R. Sawusch

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S160-S160 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Many theories of speech perception rely on the loci of spectral peaks as at least one factor upon which pattern recognition is based. However, when a peak is lower in amplitude than its neighbors, it may not be used in phonetic recognition. In the first experiment, a [u]‐[i] series was constructed by manipulating the amplitude of a spectral peak (851 Hz). Subjects readily identified an item from the series with a low‐amplitude, 851‐Hz spectral peak as an [i]. It would appear that this peak, at a low amplitude, is not used at a phonetic level of processing. Further experiments test the perceptual locus of the use (or nonuse) of this low‐amplitude peak information. Selective adaptation experiments were run in which the adaptors, including the [i] from the first experiment, varied in spectral overlap with a [u]‐[u] test series in order to determine the degree to which the low‐amplitude, 851‐Hz peak is utilized in processing. The results will be discussed in terms of how peaks are analyzed at different levels of processing and how this relates to various theories of speech perception. [Work supported by NIDCD DC00219.]

The roles of three possible cues to dynamic vowel identity (A)

Dawn L. Dutton

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S160-S160 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Recent research [Nearey and Assmann (1986); Strange (1989)] suggests that the changes in vowels over time are perceptually relevant to their identification. Possible “cues” for dynamic vowels include the frequency locus of spectral peaks over time, duration, and the extent, rate, and direction of peak movement over time. The possible roles of duration and of the extent and rate of peak movement over time in cuing the [ʌ]‐[a] distinction were examined using synthetic [ʌ]‐[a] series. For each of three continua, two of the cues were varied while the third was held constant. Pilot testing showed that the continua were ambiguous on the basis of frequency alone and that the direction of spectral peak movement over time was nondistinctive. The ability of subjects to use the two vowel categories consistently was the basis on which the efficacy of the cues was judged. Duration was found to reliably cue vowel identity, while no evidence was found for an effect of either the rate or the extent of spectral peak movement. Results will be discussed with respect to potential acoustic “cues” for dynamic vowels. [Work supported by NIDCD Grant DC 00219 to SUNY at Buffalo.]

Vowel nasalization: An acoustic and perceptual study of natural and synthetic vowels (A)

H. R. Gilbert

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S160-S160 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The purpose of the present study was to determine the acoustic cues used by speakers to nasalize vowels. These cues were then used to synthesize nasal vowels. The synthetic vowels, along with their naturally produced counterparts, were presented to a group of judges to rate vowel identity, naturalness, and the presence or absence of nasality. Speech samples were obtained from ten speakers who produced the vowels /i, u, æ, a, ɔ, ʌ/ in a variety of non‐nasal and nasal contexts. DFT analyses were performed on the utterances. Differences between the non‐nasal and nasal vowels were studied by comparing pole frequency, amplitude, and bandwidth values, zero frequency values, F0 values, first‐harmonic amplitude in relation to first‐formant amplitude (H1‐A1) values, spectral tilt, and open‐quotient values. The 1988 version of the Klatt synthesizer was used to synthesize nasality in vowels based upon the cues identified. Accuracy of the judges' perceptions of the [+ nasal] feature in the synthesized vowels was dependent on the presence of a nasal pole‐zero pair in the vowel spectrum. Frequency of the nasal pole‐zero pair varied with the specific vowel being synthesized.

On the perceptual vowel space of Modern Greek (A)

Marios S. Fourakis and John W. Hawks

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S160-S160 (1990); (1 page)

Online Publication Date: 13 Aug 2005

In this paper, 7703 vowel tokens were synthesized using the Klatt synthesis program. The values of F0, F1, F2, and F3 of each token were such that the token represented a unique point in a plane of the auditory perceptual space proposed by Miller [J. Acoust. Soc. Am. 85, 2114–2134 (1989)]. Each token was 400 ms long and had a rise‐fall pitch contour. The tokens were identified by one native speaker of Modern Greek (MSF) as one of the five Greek vowels [i, e, a, o, u], or not as a vowel of Greek. In the latter case, the subject was then asked to indicate the closest Greek vowel to which the token could be assigned. Additionally, all tokens were rated for clarity on a scale from 1 (poor) to 5 (excellent). The results suggest that the vowels of Greek may be represented as nonoverlapping subspaces in the auditory perceptual space. The subspaces corresponding to the three point vowels of Greek ([i, a, u]) exhibit extensive overlap with the subspaces occupied by the same vowels in English, as determined from a similar experiment with a native speaker of American English. The subspaces corresponding to the Greek midvowels ([e, o]), on the other hand, show partial overlap with the spaces for American English [ɛ, ɔ]. The subspaces corresponding to the American English lax vowels [I, æ, ʌ, ʊ] are not used in Modern Greek. These results will be compared to subspaces obtained from production data for American English [Miller (1989)] and for Greek [Jongman et al., Language and Speech (in press)]. [Work supported by NIDCD.]

Mapping the organization of vowel sequences into words (A)

Magdalene H. Chalikia and Richard M. Warren

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S160-S161 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

Previous research [Warren et al., Percept. Psychophys. (in press)] has shown that when listeners hear recycled sequences of steady‐state vowels, they do not perceive them as a succession of vowels. Illusory phonemes are introduced and real or nonsense words are heard. Often the sequence of vowels is split perceptually into two simultaneous words differing in both quality and phonemic content. The present study employed sequences of eight 80‐ms vowels, and mapped the perceptual phonemes to acoustic phonemes by terminating the repeated sequence at various positions and determining the last sound heard in the perceived word for each position. When two simultaneous words were heard, they both were mapped. Relations between the acoustic phonemes and the perceived phonemes will be described, and implications concerning the perceptual organization of speech will be discussed. [Work supported by NIH and AFOSR.]

Can principal components be identified as distinctive features? (A)

N. Nguyen‐Trong, S. Santi, and C. Cave

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S161-S161 (1990); (1 page)

Online Publication Date: 13 Aug 2005

It is often considered that principal components analysis (PCA) makes it possible to define distinctive features of vowels. In this case, the vowels are represented as points in an acoustic space. Applied to the vocalic system of French, a PCA gives three first factors that are usually associated with the three main distinctive features. The psychological reality of such factors has been less investigated, despite the importance of the question. The PCA is based on the notion of variance, stating that the descriptive value of a factor is proportional to the initial variance that it “explains.” Can the perceived information be represented as linear combinations of the original acoustic variables that maximize the explained variance? An identification test from 60 realizations of the French oral vowels was carried out. Each vowel was synthesized and transformed through five different orthogonal projections. The corresponding confusion matrices confirm the phonetic interpretation of the axes. However, a divergence between explained variance and correct identification percentage is noticed, which is explained through a hypothesis that contradicts the acoustic dispersion theory.
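The notion of variance “explained” by a factor can be made concrete with a small sketch. The example below handles only two acoustic variables, for which the eigenvalues of the covariance matrix have a closed form; the data points are invented for illustration and the function name is hypothetical. The study itself used more variables and real vowel measurements, which the abstract does not detail.

```python
import math


def explained_variance_2d(points):
    """Share of total variance carried by each principal axis of 2-D data.

    Assumes at least two points and nonzero total variance.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1, lam2 = mean + spread, mean - spread
    total = lam1 + lam2
    return lam1 / total, lam2 / total


if __name__ == "__main__":
    # Invented (variable 1, variable 2) pairs, for illustration only.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
    first, second = explained_variance_2d(data)
    print(f"first axis: {first:.3f}, second axis: {second:.3f}")
```

A factor with a large share of explained variance is not guaranteed to be the one listeners rely on most, which is the divergence the abstract reports.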