Journal of the Acoustical Society of America

May 1990

Volume 87, Issue S1, pp. S1-S164

Session XX. Speech Communication X: Cross Modal and Auditory Speech Perception
Contributed Papers

Lipreading with vibrotactile vocoders (A)

Lynne E. Bernstein, Marilyn E. Demorest, Michael P. O'Connell, and David C. Coulter

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S124-S125 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

A training study was conducted to compare aided lipreading of normal‐hearing and deaf adults, each assigned to one of three vibrotactile vocoders. Vocoders were (1) the Queen's University/Central Institute for the Deaf (QU/CID) vocoder, with one‐third octave filter spacing and logarithmic output scaling; (2) the QU/CID vocoder with linear output scaling; and (3) the GU (Gallaudet University) vocoder, designed for greater resolution than the others in the F2 region, with linear output scaling. Subjects received stimuli in baseline (no vocoder) and treatment (vocoder) conditions. In addition, two subjects served as visual‐only controls. Stimuli were provided by a live talker and two talkers prerecorded on laser videodisc (Bernstein and Eberhardt, 1986). Preliminary analysis of the results suggests that (1) the QU/CID linear vocoder was most effective, followed by the GU vocoder with linear output; and (2) regardless of experimental condition, normal‐hearing subjects' lipreading improved over the approximately 65‐h experiment. Results with deaf adults, along with results of the visual control subjects, suggest that careful control of visual learning is needed in experiments involving aided lipreading. Results will be compared with a previous study that involved similar procedures and several different transformations of fundamental frequency for a single vibrotactile channel. [Research supported by NIH.]

The influence of orthographic information on the identification of an auditory speech event (A)

Jody K. Layer, Richard E. Pasture, and Ellen Rettberg

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S125-S125 (1990); (1 page)

Online Publication Date: 13 Aug 2005

It has been shown that the identification of an auditory or visual speech event can be influenced by information from the other modality when the information is perceived to arise from the same event. Employing a selective attention task, Logan et al. (1990) demonstrated an influence of an auditory speech event on the identification of orthographic characters over a range of stimulus onset differences. The current research investigates the influence of orthographic information on the identification of the initial phoneme of an auditory speech event. The stimuli were nonword CVC syllables. The stimuli were edited natural speech and orthographic representations of these stimuli plus a set that was neutral with respect to the auditory set. A range of stimulus onset differences was employed. The results show that when the auditory and visual information agree, there is faster responding for the identification of the initial auditory phoneme. When the information is discrepant, responding is slowed. These results do not support a qualitative change in perception with differing information in the two modalities. Implications for the nature of cross‐modal integration and speech event processing will be discussed.

Exploring the basis of the “McGurk effect”: Can perceivers combine information from a female face and a male voice? (A)

Kerry P. Green, Erica B. Stevens, Patricia K. Kuhl, and Andrew M. Meltzoff

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S125-S125 (1990); (1 page)

Online Publication Date: 13 Aug 2005

In the “McGurk” effect, observers typically report the illusory syllable /da/ when they hear the auditory syllable /ba/ presented in synchrony with a video display of a talker saying /ga/. In such experiments, there is usually congruence between the two modalities in that the same talker produces both the auditory and the visual signals. In the experiments reported here, the effect of reducing the congruence between the two modalities on the magnitude of the McGurk effect was examined. This was accomplished by dubbing a male talker's voice onto a video tape containing a female talker's face, and a female talker's voice onto a video tape containing a male talker's face. These “cross‐dubbed” video tapes were compared to normal video tapes in which the male talker's voice was dubbed onto a male talker's face, and the female talker's voice was dubbed onto a female talker's face. The results show that even though there was clear incompatibility in the talker characteristics between the auditory and visual signals for the cross‐dubbed stimuli, there was little difference in the magnitude of the effect compared to the normal stimuli. These results indicate that the mechanism for integrating speech information from the two modalities is not sensitive to certain incompatibilities, even when they are perceptually apparent. [Work supported by NIH.]

Cross‐modal semantic priming of neighbors of multisyllabic words (A)

Paul A. Luce and Michael S. Cluff

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S125-S125 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The present experiment was designed to test the principle of delayed commitment in spoken word recognition by using a cross‐modal priming paradigm. Subjects were asked to make a lexical decision on visually presented targets that were preceded by auditory primes. These auditory primes consisted of spondees (words containing two individual lexical items) that had second syllables with at least two meanings. For example, the second syllable of “baseball” could refer to either a round object for throwing or to a formal dance. Targets were words that were either related to the alternate meaning of the prime's second syllable or were unrelated. For example, after hearing “logjam,” subjects were presented with either “JELLY,” which is related to an alternate meaning of “jam,” or “BOMB,” which is unrelated to either meaning of “jam.” Evidence of priming in the related condition suggests that multiple candidates for recognition remain activated until well after the word's isolation point, contrary to the predictions of cohort theory. However, consistent with the neighborhood activation model, these results demonstrate that word recognition operates by a principle of delayed commitment.

Effects of visual word stimuli on speech perception (A)

Tadahisa Kondo and Kazuhiko Kakehi

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S125-S125 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The effects of visual word stimuli on speech perception in Japanese are investigated. An experiment is constructed to discriminate effects mediated by phonological and semantic codes of written words. Subjects are simultaneously presented with a visual word stimulus and an auditory word stimulus embedded in a short sentence. The recognition rates for a syllable in a spoken word, where the syllable is either replaced with white noise or has white noise added to it, are measured under conditions in which various types of visual word stimuli are presented. Four types of visual word stimuli are used: (1) matching words, (2) nonword strings with the same pronunciation, (3) words associated in meaning, and (4) words unrelated to the auditory word stimuli. Additionally, non‐character stimuli are used as a control condition. The mean syllable recognition rates for the five conditions rank, from highest to lowest, (1), (2), (3), control, and (4). The rates for (1) and (2) are significantly different from the rate for the control condition. Consequently, both phonological and semantic codes of visual words are important factors affecting speech perception.

Similarity neighborhoods of spoken two syllable words: Retroactive effects on multiple activation (A)

Michael S. Cluff and Paul A. Luce

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S125-S126 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

This research examined the recognition of two‐syllable spoken words and the means by which the auditory word recognition system deals with ambiguous stimulus information. The perceptual identification of two‐syllable words composed of two monosyllabic words (spondees) was examined. Individual syllables within a spondee were characterized as either “easy” or “hard” depending on the neighborhood characteristics of the syllable. An “easy” syllable was defined as a high‐frequency word in a sparse neighborhood of low‐frequency words, whereas a “hard” syllable was a low‐frequency word in a high‐density, high‐frequency neighborhood. Neighborhood structure was found to have a strong effect on identification. In particular, identification performance for spondees with a hard‐easy syllable pattern was higher than for spondees with an easy‐hard syllable pattern, indicating a primarily retroactive pattern of influence in spoken word recognition. These results strongly suggest that spoken word recognition involves multiple activation and delayed commitment, thus ensuring accurate and efficient recognition.

Effects of speech intelligibility upon performance (A)

Georges R. Garinther, Leslie J. Peters, and Leslie A. Whitaker

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S126-S126 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The assessment of the ability of a talker to communicate with other persons, under given conditions, is normally accomplished by means of speech intelligibility testing. The resultant scores, however, do not provide any quantification of how well the communicating individuals performed the task at hand. An experiment was conducted in which professional military crews operated a tank simulator at five levels of speech intelligibility ranging from very good to extremely poor. Measures such as time to drive to an engagement location, number of targets hit, time to perform the entire mission, etc., were obtained. The results of this study will serve as a first step for establishing more realistic acoustical limits for military systems, guide the design of improved communication systems, and assist operations analysts in better defining war gaming parameters.

Hemispheric differences in spoken word recognition (A)

Edward T. Auer and A. Luce

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S126-S126 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Previous research on visual word recognition has demonstrated differences in the manner that the left and right hemispheres process lexically ambiguous words [Burgess and Simpson, Brain Language 33, 86–103 (1988)]. The present study evaluated these findings for spoken word recognition. Lexically ambiguous primes (e.g., BANK) were presented binaurally. A target word was then presented monaurally to the left or right ear for a speeded lexical decision response (i.e., WORD‐NON‐WORD). Target words were either (1) related to the dominant meaning of the ambiguous prime (BANK‐MONEY), (2) related to the subordinate meaning of the prime (BANK‐RIVER), or (3) unrelated to the prime. Priming of dominant meanings produced more facilitation than priming of subordinate meanings regardless of ear of presentation. In addition, a right‐ear advantage was observed for words but not for non‐words. Implications of these results for models of spoken word recognition will be discussed.

Effects of masker fluctuations, number of maskers, and reverberation on binaural speech perception in normal and impaired hearing (A)

Adelbert W. Bronkhorst and Reinier Plomp

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S126-S126 (1990); (1 page)

Online Publication Date: 13 Aug 2005

Binaural speech‐reception thresholds (SRT) in noise for both normal‐hearing and hearing‐impaired listeners were determined (1) using earphone presentation of stimuli simulating a free‐field environment with between one and six noise sources, and (2) in a reverberant environment with the listener in either the direct or indirect field of the speech and/or noise source. The masking noise was either steady state or modulated like running speech. It was found that normal‐hearing listeners benefit considerably from both masker fluctuations and interaural level differences due to head shadow: Both cues caused an advantage of up to 7 dB in terms of SRT. Much smaller gains were attained by most hearing‐impaired listeners. The gains also diminished when reverberation was present, or when the number of maskers was increased. Results further showed that in almost all conditions where interaural time or phase differences were present, the release from masking experienced by both normal‐hearing and hearing‐impaired listeners was on the order of 2–3 dB.

STI approach for predicting the effect of fluctuating interference on speech intelligibility (A)

Adelbert W. Bronkhorst and Tammo Houtgast

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S126-S126 (1990); (1 page)

Online Publication Date: 13 Aug 2005

The modulation transfer function (MTF) and the derived speech transmission index (STI) have been successfully applied to predict the effects of interfering noise and time‐domain distortions on speech intelligibility. Results of experiments using fluctuating interference (e.g., chopped noise) show, however, that the STI can grossly underestimate performance in such conditions. It was found that STI predictions for fluctuating interference can be considerably improved by modifying the calculation scheme. The normal approach is based on calculation of the modulation reduction $m$ according to the simple relation $m = 1/(1 + I_N/I_S)$, where the signal and noise intensities $I_S$ and $I_N$ are averaged over a certain time interval. In the modified approach, $m$ instead of noise intensity is averaged, and, additionally, a low‐pass filter is applied to the intensity envelope function to account for the limited auditory temporal resolution. Predictions based on this method are generally quite good, except for an underestimation of the masking efficiency of slowly fluctuating signals (with modulation frequencies below 4 Hz).
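
The difference between the two calculation schemes described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the low‐pass filter, the averaging interval, or any stimulus values, so the first‐order smoother, its coefficient `alpha`, the function names, and the chopped‐noise envelope below are all assumptions chosen only for illustration.

```python
def modulation_reduction(i_signal, i_noise):
    """m = 1 / (1 + I_N / I_S) for one pair of intensities."""
    return 1.0 / (1.0 + i_noise / i_signal)


def one_pole_lowpass(samples, alpha):
    """First-order smoother (assumed stand-in for the unspecified low-pass filter).

    alpha = 1.0 means no smoothing; smaller values smooth more.
    """
    smoothed = []
    state = samples[0]
    for x in samples:
        state = alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def m_normal(i_signal, noise_envelope):
    """Normal scheme: average the noise intensity first, then compute m once."""
    mean_noise = sum(noise_envelope) / len(noise_envelope)
    return modulation_reduction(i_signal, mean_noise)


def m_modified(i_signal, noise_envelope, alpha=0.5):
    """Modified scheme: low-pass the envelope, compute m per sample, average m."""
    smoothed = one_pole_lowpass(noise_envelope, alpha)
    values = [modulation_reduction(i_signal, n) for n in smoothed]
    return sum(values) / len(values)


# Hypothetical chopped noise: on for four samples, off for four, repeated.
chopped = [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0] * 50
i_s = 1.0  # hypothetical constant signal intensity

normal = m_normal(i_s, chopped)
modified_raw = m_modified(i_s, chopped, alpha=1.0)
modified_smooth = m_modified(i_s, chopped, alpha=0.5)

print(normal, modified_raw, modified_smooth)
```

In this toy case, averaging $m$ gives a larger value than computing $m$ from the averaged noise intensity, which is in line with the abstract's remark that the normal scheme can underestimate performance in fluctuating noise. Smoothing the envelope pulls the modified value back toward the normal one, so the assumed filter coefficient controls how much benefit from the noise gaps is predicted.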

Visual perception of anticipatory rounding gestures (A)

Pierre Escudier, Christian Benoît, and Tahar Lallouache

J. Acoust. Soc. Am. Volume 87, Issue S1, pp. S126-S127 (1990); (2 pages)

Online Publication Date: 13 Aug 2005

The mechanisms of anticipation for the rounding gesture have been repeatedly investigated in previous works (see, e.g., the controversy between Lubker and Gay concerning the extent of anticipation in Swedish versus American English, or the support found in the French language for Henke's “look ahead” model as exemplified by data from Benguerel's famous “sinistre structure”). Concerning the visual perception of such an anticipation, McGurk [in The Cognitive Representation of Speech, edited by T. Myers et al. (North‐Holland, Amsterdam, 1981), p. 336] has briefly mentioned an experiment using reaction times in CV identification. He demonstrated that listeners do take visual information about anticipation into account, and identify CV syllables on the basis of lip movement information prior to their being perceived auditorily. In the present experiment, this same result is found for French, with a different experimental protocol, taking into account simultaneous acoustic and articulatory measurements. Here, /zV1zV2/ trajectories (V1, V2 = /i/ or /y/) were used, and auditory identification data obtained from gated signals were compared with results of visual identification for front face video images taken every 20 ms along the V1 → V2 trajectory. The following results were found: (1) Lip area transitions clearly show the asymmetry of vowel‐to‐vowel gestures. The transition from /y/ to /i/ begins at the acoustic onset of the consonant, while the transition from /i/ to /y/ can begin very early in the /i/; (2) this anticipation of the rounding gesture is clearly perceived visually by the subjects, who are able to identify the /y/ vowel before the end of the /i/; and (3) visual detection of the rounding gesture thus comes prior to its auditory detection, which seems, in fact, disturbed by the acoustic mixture of the vocalic gesture (/i/ → /y/ or /y/ → /i/) and the consonantal gesture (/z/). Implications for the timing and perception of the vowel‐to‐vowel gesture are discussed.