Journal of the Acoustical Society of America

Apr 1982

Volume 71, Issue S1, pp. S1-S113

Session LL. Speech Communication VII: Segmental Phonetic Perception
Contributed Papers

The perception of sine‐wave analogs of speech (A)

Eileen C. Schwab

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S74-S74 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Subjects identified various sets of tone stimuli which consisted of one, two, or three sine‐wave components. The stimuli varied in their degree of similarity to the syllables /ba/, /ab/, /da/, and /ad/. Half of the subjects performed an auditory labeling task in which they identified the direction and location of the pitch transition. The other half of the subjects performed a phonetic labeling task in which they identified the tones as the CV syllables upon which they were based. The results showed a strong interaction between stimulus components and labeling performance. When a low‐frequency component (similar to an F1) was present, the phonetic‐label subjects performed best while the auditory‐label subjects performed worst. Also, various masking effects were present for the auditory‐label subjects which were not present for the phonetic‐label subjects. The results indicate that phonetic processing is not necessarily invoked by stimulus structure alone. However, when phonetic processing is engaged, the output of auditory analysis becomes unavailable for further processing. These results suggest that auditory and phonetic modes of processing may be mutually exclusive. Furthermore, the findings provide another demonstration of the reality of a distinct speech mode of perception. [Work supported by NIMH.]

Auditory perception experiments with sine‐wave analogs to the voice‐onset time dimension (A)

James Hillenbrand

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S74-S75 (1982); (2 pages)

Online Publication Date: 12 Aug 2005

Previous research has shown that the voice‐onset time dimension can be modeled using nonspeech stimuli. Labeling data from these experiments are in close agreement with the VOT crossover point for labial stops. The VOT boundary, however, varies with place of articulation; this effect appears to be attributable to differences in the duration of the first‐formant transition. Results from the present study demonstrate that this effect can also be found with stimuli consisting of a midfrequency sine wave (analogous to F2) and a low‐frequency sine wave (analogous to F1). The two frequency components were separated in relative onset time by 0 to 50 ms in 10‐ms steps and were synthesized with frequency sweeps similar to formant transitions found in speech. Labeling results showed that, like synthetic speech, the boundary location for relative onset time is influenced by the duration of the lower frequency tone sweep. These findings, and the results of discrimination tests with these stimuli, suggest an auditory rather than phonetic explanation for the shift in VOT crossover with place of articulation.

Perceptual and acoustic evidence for tradeoffs in anticipatory coarticulation effects (A)

James G. Martin and H. Timothy Bunnell

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S75-S75 (1982); (1 page)

Online Publication Date: 12 Aug 2005

The stimuli were /stri, stru/ pairs produced by four talkers. A part of /s/, /t/, or /r/ was exchanged between the pairs so that any coarticulatory information within the exchanged interval incorrectly predicted the remainder of the sequence. Recognition time (RT) was observed to final vowel targets /i/ or /u/. Results showed generally slower RT (interference) in exchanged compared to intact (as spoken) sequences. Within a given pair, however, when there was large interference from exchange of /t/ there was little interference from exchange of /r/, and vice versa. The tradeoff in perceptual effects between /t/ and /r/ was mirrored in acoustic differences, e.g., rate of F2 rise in /r/ toward /i/. Apparently the effects of anticipatory coarticulation are not necessarily always manifested to the greatest degree in the nearest segment. Articulation appears to be a more or less closed system of mutual constraints among segments in a sequence such that the various tokens of a given sequence may be seen as members of a set or class of utterances. Discussion concerns some theoretical implications of this notion for production and perception. [Work supported by NIH.]

Perceptual assessment of coarticulation in sequences of two stop consonants (A)

Bruno H. Repp

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S75-S75 (1982); (1 page)

Online Publication Date: 12 Aug 2005

This study investigated whether any perceptually useful coarticulatory information is carried by the release bursts and formant transitions of two successive, nonhomorganic stop consonants. The VC or CV portions of natural VCCV utterances were replaced with matched synthetic stimuli from a VC or CV continuum spanning the three places of stop articulation. When the VC and CV portions in the resulting hybrid VCCV stimuli were separated by a fixed silent interval, the context in which the natural portion had been produced had no influence on listeners' identification of the synthetic portion, suggesting that VC and CV formant transitions and CV release bursts contained no perceptually salient coarticulatory cues. However, when a natural VC portion was separated from a synthetic CV portion by the original closure interval, which included a brief release burst of the first stop, there was a sizeable effect of the original CV context on the perception of the second stop consonant. Thus the release burst of a syllable‐final stop contains significant coarticulatory information about a following, nonhomorganic stop. The data also revealed contrast effects and other perceptual interactions between two successive stop consonants. This report will include results of an acoustic analysis of the stimuli. [Research supported by NICHD and BRS.]

The influence of vocalic context on the /s/ − /ʃ/ distinction in initial and final position (A)

Sigfrid D. Soli and Virginia A. Mann

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S75-S75 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Using natural FV and VF productions, where F = /s/, /ʃ/ and V = /a/, /u/, a series of hybrid stimuli were made by replacing the natural friction with synthesized noises from a nine‐member /s/ − /ʃ/ continuum. Listeners were asked to label the fricatives in these stimuli as /s/ or /ʃ/. The identity of both the vowel and the (excised) original fricative influenced labeling of the synthesized noises. More /s/ responses were given in the /u/ context than in the /a/ context for both the FV and VF stimuli. However, the influence of the original fricative was greater for the FV than the VF stimuli. FV stimuli originally beginning with /s/ received more /s/ responses than VF stimuli originally ending with /s/. The acoustic consequences of articulatory timing in FV and VF production can account for these results. Equivalent anticipatory and perseveratory lip rounding in utterances with /u/ should produce symmetrical vowel context effects in perception, while the asymmetrical perceptual influence of original fricative identity can be attributed to differences in the timing of both assimilatory tongue movement and the alternation of periodic and friction noise sources in FV and VF productions. [Work supported by NICHD and BRS.]

Development of perceptual adjustment for the coarticulatory effects of rounded vowels on preceding fricatives (A)

V. A. Mann, M. F. Dorman, D. Strawhun, and H. M. Sharlin

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S75-S75 (1982); (1 page)

Online Publication Date: 12 Aug 2005

When synthetic fricative noises from an [ʃ] − [s] continuum are followed by [a] and [u], listeners perceive more instances of [s] in the context of [u] [V. A. Mann and B. H. Repp, Perc. Psychophys. 28, 213–228 (1980)]. This perceptual context effect presumably reflects adjustment for the coarticulatory effects of rounded vowels on preceding fricatives. We have begun to examine the development of this effect, asking children to label stimuli from “save”‐“shave” and “sue”‐“shoe” continua, constructed by following noises from an [ʃ] and [s] continuum with periodic portions excerpted from natural tokens of “shave” and “shoe.” The subjects include normal adults, 5‐, 7‐, and 9‐year‐old children who correctly produce both [ʃ] and [s], and 7‐year‐olds who misarticulate these fricatives. Thus far we have found that the magnitude of the vocalic context effect is typically greater for adults, yet children at all ages show a significant effect whose magnitude is not a function of age. These findings, to be supplemented with those obtained from the misarticulating children, may clarify the developmental basis of listeners' tacit knowledge of coarticulation. [Work supported by NICHD and BRS.]

On the use of the sameness measure in scaling analysis of confusion matrices (A)

Moshe Yuchtman and Robert C. Bilger

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S75-S76 (1982); (2 pages)

Online Publication Date: 12 Aug 2005

A measure of sameness, S, is often used as a metric in scaling procedures applied to confusion matrices [L. Goldstein, UCLA‐WPP 39, 1–35 (1977)]. The sameness between any two sounds in a matrix is defined as the ratio between the frequency of incorrect responses (Fij + Fji) and the frequency of correct responses (Fii + Fjj). Assuming equal a‐priori stimulus probabilities we can transform Fii and Fji to proportions of hits, P(H), and false alarms, P(FA), respectively. For a constant S, P(H) is related to P(FA) by a linear function with a slope of 1 in ROC space. This function implies that for a given signal discriminability, P(H) and P(FA) can vary only to the extent that a constant proportion of correct responses is maintained. Furthermore, it implies a decision axis in which the two sounds are each represented by a rectangular distribution. The literature of psychophysics suggests that perceptual processes are not well approximated by such distributions. In an attempt to further study this issue, confusion‐matrix data reported by Miller and Nicely [J. Acoust. Soc. Am. 27, 338–352 (1955)] and by Wang and Bilger [J. Acoust. Soc. Am. 54, 1248–1266 (1973)] were reduced to individual 2 × 2 matrices. In addition to S, the index of sensitivity d′, amount of transmitted information Uxy, and phi‐squared were calculated. The last three measures, which are not based on rectangular distributions, were linearly related to each other and were nonlinearly related to S. The results of scaling analyses employing these measures will be discussed in reference to the role of speech features in speech perception.
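
The relation between S and the ROC can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes a 2 × 2 matrix with an equal number of presentations of each sound, and it assumes the indexing convention that `f_xy` counts presentations of sound x labeled as sound y; the function name and the example counts are hypothetical. With D = P(H) − P(FA), the definition gives S = (1 − D)/(1 + D), so a constant S fixes D and P(H) = P(FA) + D is a line of slope 1. The sketch also computes d′ under the usual equal-variance Gaussian assumption.

```python
from statistics import NormalDist

def two_by_two_measures(f_ii, f_ij, f_ji, f_jj):
    """Return (S, P(H), P(FA), d') for one 2 x 2 confusion matrix.

    Assumed convention: f_xy = number of presentations of sound x
    that were labeled as sound y.
    """
    # Sameness: incorrect responses over correct responses.
    sameness = (f_ij + f_ji) / (f_ii + f_jj)
    # Hits: sound i labeled i. False alarms: sound j labeled i.
    p_hit = f_ii / (f_ii + f_ij)
    p_fa = f_ji / (f_ji + f_jj)
    # d' from the equal-variance Gaussian model (an assumption here).
    z = NormalDist().inv_cdf
    d_prime = z(p_hit) - z(p_fa)
    return sameness, p_hit, p_fa, d_prime

# Two hypothetical matrices with the same P(H) - P(FA) = 0.7.
a = two_by_two_measures(80, 20, 10, 90)
b = two_by_two_measures(95, 5, 25, 75)
print(a)
print(b)
```

Both hypothetical matrices yield S = 30/170, yet their d′ values differ, which is one way to see that S and d′ are not related by a single linear function.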

Discrimination and identification of voicing and place contrasts in aphasic patients (A)

Grace H. Yeni‐Komshian and Linda Lafontaine

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S76-S76 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Aphasic patients and controls were given discrimination and identification tasks using synthesized stop‐initial CV syllables. Three different seven‐step continua were used as stimuli: voicing (bee‐pea); place (bee‐dee); and vowel (bee‐bow). All subjects were given both tasks for each continuum. The aphasic patients were divided into a good and a moderate comprehension group on the basis of their scores on an auditory language comprehension test. In comparison to the controls, aphasics with moderate comprehension had difficulty perceiving voicing and place contrasts, while aphasics with good comprehension generally had difficulty with place contrasts only. There were no group differences on the vowel tasks. The majority of the subjects either discriminated and identified the stimuli, did not discriminate or identify the stimuli, or discriminated but did not identify the stimuli. A small number of subjects could not discriminate but could identify the stimuli, mainly on the voicing continuum.

Context effects in Japanese perception of English /r/ and /l/ (A)

Patricia Dissosway‐Huff and Robert Port

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S76-S76 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Perception of English /r/ and /l/ is a well‐known difficulty for Japanese learning English. An identification test for minimal pairs read by two American speakers was administered to 32 Japanese students of English, first just after arrival in the United States, and then nine weeks later at the completion of an intensive English program emphasizing oral skills. Little improvement was observed after oral training. The perception of /r/ and /l/ as singletons and in consonant clusters exhibited quite opposite trends. In clusters, /l/ was perceived more accurately than /r/ (66% versus 52%), while for singletons, /l/ was somewhat worse than /r/ (64% versus 70%). Singleton consonants in word‐final position were more accurately perceived than initial singletons (77% versus 57%) while for clusters, the finals were slightly worse than initials (56% versus 62%). Thus both the /r‐l/ effect and word‐position effect interact with the singleton‐cluster factor but not with each other. These findings will be discussed in relation to the phonotactics of both Japanese and English and in terms of their acoustic correlates. [Supported by NIH, HD12511.]
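
The two interactions reported above can be made concrete with a small difference-of-differences calculation on the percentages given in the abstract. This is only a descriptive sketch, not a significance test and not the authors' analysis; the table layout and the function name `interaction_contrast` are hypothetical.

```python
# Percent correct, copied from the abstract.
by_segment = {
    "cluster": {"l": 66, "r": 52},
    "singleton": {"l": 64, "r": 70},
}
by_position = {
    "singleton": {"final": 77, "initial": 57},
    "cluster": {"final": 56, "initial": 62},
}

def interaction_contrast(table, row_a, row_b, col_x, col_y):
    """Difference of differences: (x - y in row_a) minus (x - y in row_b)."""
    diff_a = table[row_a][col_x] - table[row_a][col_y]
    diff_b = table[row_b][col_x] - table[row_b][col_y]
    return diff_a - diff_b

# /l/ advantage in clusters versus singletons.
segment_contrast = interaction_contrast(by_segment, "cluster", "singleton", "l", "r")
# Final-position advantage in singletons versus clusters.
position_contrast = interaction_contrast(by_position, "singleton", "cluster", "final", "initial")
print(segment_contrast, position_contrast)
```

In both tables the simple difference changes sign between singletons and clusters, which is the crossover pattern the abstract describes as "quite opposite trends".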

A cross‐language study of categorical perception for semi‐vowel and liquid glide contrasts (A)

Catherine T. Best, Kristine S. MacKain, and Winifred Strange

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S76-S76 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Recently we replicated categorical perception of /r/ and /l/ in Americans and its absence in native Japanese, whose language lacks that contrast; Japanese with extensive conversational English experience (experienced), however, showed greater categoricity [MacKain, Best, and Strange, Applied Psycholinguistics (in press)]. We retested 18 of the same subjects (nine Americans, nine Japanese) on perception of a prevocalic /w/‐/j/ continuum; these glides contrast phonemically in both languages. The phonetic and acoustic characteristics of /w/‐/j/ place them between prototypical vowels and consonants, suggesting perception should be less categorical than for a clearer consonantal contrast such as /r/‐/l/. Results upheld predictions derived from the language differences: Japanese and Americans showed equally categorical /w/‐/j/ perception, but the Japanese were less categorical on /r/‐/l/ than on /w/‐/j/. Experienced Japanese showed this pattern also, but perceived /r/‐/l/ more categorically than the less experienced subgroup. Although for Americans, /w/‐/j/ identifications appeared somewhat more continuous than for /r/‐/l/, the difference was nonsignificant. Prevocalic semi‐vowels may thus be perceived as categorically as natively contrasting liquids. [Supported by NINCDS, NICHD, and NIH.]

Individual differences in the perception of isolated vowels and vowels in a consonantal context (A)

Brad Rakerd and Robert R. Verbrugge

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S76-S76 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Previous studies comparing the perception of isolated vowels with that of vowels produced in some consonantal context have generally focused on similarities and differences in the average performance of subjects in the two conditions. In this investigation, we wanted to look carefully at individual differences in vowel perception both within and between these conditions. We did so with the aid of a nonmetric individual differences scaling procedure developed by Takane et al. [Psychometrika 42, 7–67 (1977)]. The variance common to all subjects (in the two conditions combined) was modeled in a single multidimensional space, and the individual differences were represented as weights (or saliences) that each subject attached to the dimensions of the space. The data were similarity judgments collected (with the method of triadic comparisons) for a set of isolated vowels and for a set of vowels in context recorded by the same talker. The perceptual space had three dimensions. The first of these corresponded closely to the articulatory dimension of tongue advancement, the second to tongue height, and the third to tenseness. It was consistently observed that subjects who rated vowels in context attached a substantial weight to all three perceptual dimensions. This pattern was also exhibited by several subjects who rated isolated vowels, but others in the isolated vowels condition heavily weighted one of the three dimensions to the near exclusion of the other two, and still others weighted two dimensions substantially but attached little weight to the third. Potential accounts of this condition difference will be discussed. [Work supported by NIH grants HD01994 and RR05596.]

Acoustic and perceptual correlates of nasal vowels (A)

Kenneth N. Stevens and Sarah Hawkins

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S76-S77 (1982); (2 pages)

Online Publication Date: 12 Aug 2005

Various versions of different nasal vowels have been synthesized by systematically modifying the all‐pole transfer function corresponding to nonnasal vowels. These modifications were achieved by manipulating the frequencies and bandwidths of the first formant and of an added pole‐zero pair over ranges that are consistent with acoustic theory and analysis of nasal vowels, including the effect of the sinuses. The stimuli were evaluated or identified by listeners (principally Gujarati speakers) whose language included a nasal‐nonnasal opposition. Nasal judgments were obtained when the pole‐zero pair was in the vicinity of 400 Hz, but for some vowels (especially high vowels) nasal responses were also obtained when the additional resonance was at a higher frequency. It is concluded tentatively that the nasal‐nonnasal distinction in language is based on the fact that the auditory system responds distinctively when the spectrum in the vicinity of the first formant is flattened to yield a less prominent low‐frequency spectral peak. This modification is achieved by the introduction of additional low‐frequency resonances and by increased bandwidth for some of the low‐frequency peaks. [Supported in part by a grant from NINCDS.]

Interactive effects of sensation level, filtering, and consonant environment on the perception of vowels (A)

S. Williamson and A. E. Carney

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S77-S77 (1982); (1 page)

Online Publication Date: 12 Aug 2005

This experiment assesses the interaction of three factors on vowel perception in normal listeners: filtering, sensation level, and consonant environment. These factors were chosen to provide a more comprehensive analog to vowel perception in the hearing impaired. Sixty subjects were divided into six groups, each group receiving a different experimental condition. Subjects could receive one of three filtering conditions: no filtering, low‐pass, or high‐pass filtering, in combination with one of two vowel‐environment conditions: isolated or pVp condition. All subjects were presented with stimuli at 10, 20, and 30 dB SL. The vowel stimuli were /i, ɪ, ɛ, æ, ɑ, ɔ, , u, ʌ/. All stimuli were produced by a single female speaker in both isolated and pVp conditions. Articulation functions were plotted for each experimental condition. Results indicated a low error rate for subjects in all conditions, even for the lowest sensation levels. Differences between articulation functions were observed between experimental conditions. These data will be discussed with regard to the applicability of such simulations to speech perception and auditory training for the hearing impaired.

Perception of French vowels by native French and American English listeners (A)

Terry L. Gottfried

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S77-S77 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Studies of American English vowel identification have found that vowels are better identified in consonantal context than in isolation. Strange, Edman, and Jenkins [J. Exp. Psych.: Hum. Percept. Perform. 5, 643–656 (1979)] attribute this finding to dynamic acoustic information carried by formant transitions and vowel duration. One might expect different results for French, because phonological constraints and phonetic characteristics of French vowels are different from those of English. Four native Parisian French speakers produced 11 stressed, oral French vowels in four syllabic contexts: /t/‐vowel‐/t/ (TVT), vowel‐/t/ (VT), /t/‐vowel (TV), and isolated vowel (♯V♯). Native French speakers identified vowels in TV and ♯V♯ contexts more accurately than vowels in VT and TVT contexts. In a categorial ABX discrimination task, non‐native French speakers and nonspeakers of French also perceived isolated French vowels better than vowels in TVT context, arguing against phonological interpretations of the differences between languages. Acoustic factors accounting for differences in French and English vowel perception are discussed. [Supported by NIMH, NICHHD, NSF, and Graduate School of University of Minnesota.]

Visual speech synthesis for speech perception experiments (A)

N. M. Brooke

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S77-S77 (1982); (1 page)

Online Publication Date: 12 Aug 2005

Analytical investigations of speech perception in the audio‐visual domain require a visual stimulus that is plausibly lifelike, controllable and well‐specified. A computer package has been developed to produce real‐time animated graphics which simulate the front‐facial topography and articulatory movements of the lips and jaw during VCV speech utterances. It is highly modular and can simulate a wide range of facial features, shapes, and movements. It is currently driven by streams of time‐varying positional data obtained from experimental measurements of human speakers enunciating VCV utterances. The measurements of a series of point coordinates are made from sequential single frames of a videotape recording using a microprocessor‐linked data‐logging device. Corrections are made for the effects of global head and body movements. This is the lowest level of control in a hierarchy whose higher levels could include algorithms for generating the articulatory trajectories by rule from phonetic transcriptions. Although the development of the synthesizer is still at an early stage, the acceptability of its display suggests great potential for use in analytical investigations for which the graphics will eventually be synchronized with an audio‐speech synthesizer. [Work supported by MRC.]

Detection and resolution of audio‐visual conflict in the perception of vowels (A)

Quentin Summerfield, Matthew McGrath, and John Foster

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S77-S77 (1982); (1 page)

Online Publication Date: 12 Aug 2005

To determine whether optical cues can influence the identification of acoustically defined vowels, three series of /bVd/ syllables were synthesized with the medial vowel ranging from /u/ to /a/, /a/ to /i/, and /i/ back to /u/. The members of each series were synchronized with video‐recordings of the face of a (natural) talker uttering the end‐point syllables of that series, and were presented to adults with normal hearing and vision (a) to obtain ratings of audio‐visual compatibility, and (b) for forced choice identification. Perceived compatibility decreased essentially monotonically across each series as the acoustical syllable became less like that normally appropriate for the syllable displayed optically. Nonetheless, systematic changes in patterns of identification occurred between audio‐visual and audio‐alone presentations, with audio‐visual vowels identified as more like the vowel presented visually. These results demonstrate that analogous effects obtained previously with consonants are not restricted to a particular phonetic class, and are compatible with theoretical arguments [Summerfield, Phonetica 36, 314–331 (1979)] that perceptual integration of information in the two modalities occurs prior to phonetic categorization.

Bimodal speech perception in early infancy (A)

Patricia K. Kuhl and Andrew N. Meltzoff

J. Acoust. Soc. Am. Volume 71, Issue S1, pp. S77-S78 (1982); (2 pages)

Online Publication Date: 12 Aug 2005

We previously reported preliminary data on infants' abilities to detect the cross‐modal correspondences between the visual and auditory concomitants of speech [Kuhl and Meltzoff, J. Acoust. Soc. Am. Suppl. 1 70, S96 (1981)]. We have now completed testing on 32 4.5‐ to 5‐month‐old infants. Each was shown a filmed display of two faces, one producing the articulatory movements corresponding to the vowel [a] and the other producing the articulatory movements corresponding to the vowel [i]. One of the sound tracks (either [a] or [i]) was played in synchrony with the faces. The infants' visual fixations to the faces were scored by an observer who could not hear the sound track presented to the infant nor see the faces. Results demonstrated that the infants looked longer at the face matching the sound being presented than at the nonmatching face (p < 0.01). We hypothesized that the recognition of these cross‐modal equivalences was based on the structural correspondence between a particular articulatory movement and a particular vowel sound, rather than on any temporal correspondence between a particular face‐sound pair. This hypothesis predicts that if the crucial spectral information is removed from the vowels while the temporal information is preserved, performance should drop to chance. Experiment II tested this hypothesis using pure‐tone stimuli. Performance fell to chance. [Supported by NSF.]