Journal of the Acoustical Society of America

Apr 1991

Volume 89, Issue 4B, pp. 1851-2015

Session 9SP: Speech Communication: Word and Sentence Effects
Contributed Papers

Underwater audiological testing of a diver communicator (A)

Richard W. Ranlet

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2009-2009 (1991); (1 page)

Online Publication Date: 14 Aug 2005

An underwater communicator was developed to evaluate the suitability of direct audio voice transmission as a practical means to enable close‐range communications between divers using open‐circuit SCUBA. A communicator that transmits speech directly into the water was designed and human engineered to yield a practical working prototype suitable for normal open‐water dive conditions. Intelligibility testing of the device entailed underwater audiological testing that attempted to match in‐air standards and practices. Measurements of hearing thresholds, speech reception thresholds, and speech discrimination/recognition were performed using accepted ASHA practices. Intelligibility of sentences deemed relevant to the diving environment was also tested. In‐water intelligibility testing levels were correlated with accepted in‐air standards using measured hearing and speech reception thresholds as common reference points between the two media. Binaural intelligibility tests were performed in a pool using Auditec and Campbell PB50 word lists. Based on averaged results from two underwater listeners, intelligibility for the finished communicator measured 82% correct responses (5‐m distance between transmitting and receiving divers). Similar tests for sentence intelligibility yielded near 100% correct responses for the same conditions.

The intelligibility of words, sentences, and continuous discourse using the articulation index (A)

Rory DePaolis

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2010 (1991); (1 page)

Online Publication Date: 14 Aug 2005

The relation between the intelligibility of speech stimuli with widely varying message redundancy was measured. Using the original articulation index (AI) methodology [N. R. French and J. C. Steinberg, J. Acoust. Soc. Am. 19, 90–119 (1947)], frequency importance functions and transfer functions (AI versus percent correct) were determined for one speaker speaking 616 PB‐50 words, 200 meaningful SPIN sentences, and 44 7th‐grade reading level continuous discourse (CD) passages. Thirty subjects were instructed to write down each word and to estimate the percentage of each sentence and CD passage that they heard correctly. The stimuli were degraded with 4 noise and 11 filtered conditions. The results demonstrate a trend toward less significance of high‐frequency cues as message redundancy increases. There is also evidence to recommend the use of speech type specific frequency importance functions when calculating the AI.

A probability model for distributions of speech intelligibility data (A)

Caldwell P. Smith

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2010 (1991); (1 page)

Online Publication Date: 14 Aug 2005

It was determined that the compound Poisson probability distribution as described by William Feller in An Introduction to Probability Theory and Its Applications, 2nd ed. [(Wiley, New York, 1975), pp. 270–273] is a valid model for distributions of speech intelligibility scores from diagnostic rhyme tests. This was established from details of scores from 110 multispeaker tests of a variety of speech processing conditions. Probability models were constructed by first converting feature scores to integers representing frequencies of errors in listener responses, and calculating means and variances of those distributions. Variance of a compound Poisson distribution is equal to the mean divided by p, and in this corpus of data the value of p tended to remain relatively fixed at an average value of 0.129, with the consequence that distributions were essentially defined by mean values and dispersions a linear function of means. In these measures, variance averaged 7.75 times the mean, with the average value of this coefficient varying over a limited range with different speech processing conditions: for LPC processors, the average was 8.37; for wideband processors, 7.12; for processors in tandem, 6.75; and for speech in Gaussian noise, 8.06. Partitioned into separate data sets for voiced and unvoiced feature scores, the same trends were observed, but with coefficients approximately 15% larger with voiced data, and approximately 20% smaller with unvoiced data.
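
As a quick arithmetic check on the figures in the abstract above, the following minimal Python sketch applies the relation as stated there (variance = mean / p) to the reported values. The function names are hypothetical and serve only to illustrate the arithmetic; they are not taken from the paper, and the sketch does not reproduce the author's model fitting.

```python
def variance_to_mean_ratio(p: float) -> float:
    """Ratio variance/mean implied by the stated relation variance = mean / p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def implied_p(ratio: float) -> float:
    """Value of p implied by an observed variance-to-mean ratio."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1 for p in (0, 1]")
    return 1.0 / ratio


# Average value of p reported in the abstract.
overall_ratio = variance_to_mean_ratio(0.129)

# Variance-to-mean coefficients reported per processing condition.
reported_ratios = {
    "LPC processors": 8.37,
    "wideband processors": 7.12,
    "processors in tandem": 6.75,
    "speech in Gaussian noise": 8.06,
}
implied = {name: implied_p(r) for name, r in reported_ratios.items()}

if __name__ == "__main__":
    print(f"p = 0.129 gives variance/mean = {overall_ratio:.2f}")
    for name, p in implied.items():
        print(f"{name}: implied p = {p:.3f}")
```

With p = 0.129 the ratio rounds to 7.75, which agrees with the average coefficient quoted in the abstract.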

Multimode database and its preliminary results (A)

Karen Wallace, Josephine Horna, Yukiko Monzen, and Noriko Umeda

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2010 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Data from four speaking modes—word‐list reading, text reading, conversation, and reading of sentences occurring in the conversation—were collected from American English speakers (five male, five female). Each speaker spoke for approximately 5 h. Smaller amounts of similar data are being collected for Japanese and Spanish. The principal objective is to better understand speech activities and their underlying rules by capturing them in speaking modes that exhibit very different ranges of variation. The reading of sentences spoken in conversation is particularly useful, because it allows us to directly compare acoustic attributes of linguistic forms in formal versus spontaneous speech. Several research projects that contrast conversation and more formal modes (two of which will be presented at this meeting) are being carried out. These include formant frequencies of unstressed vowels [Wallace]; acoustic properties of /ɚ/s [Horna]; prosodic attributes of stressed and unstressed vowels; perception and acoustic correlates of degrees of emphasis. Ideas for future research, including a cross‐language study of contractions used in conversation, will be proposed.

Time course of hemispheric differences in spoken word recognition (A)

Edward T. Auer and Paul A. Luce

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2010 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Previous research has demonstrated differences between hemispheric processing of auditorily and visually presented lexically ambiguous words. The present study further examined the time course of hemispheric processing of lexically ambiguous spoken words. Lexically ambiguous primes (e.g., BANK) were presented binaurally. Fifty or 500 ms later, a target word was then presented monaurally to the left or right ear for a speeded lexical decision response (i.e., WORD‐NONWORD). Target words were either (1) related to the dominant meaning of the ambiguous prime (BANK‐MONEY), (2) related to the subordinate meaning of the prime (BANK‐RIVER), or (3) unrelated to the prime. No significant differential effects of facilitation were found in any condition. In addition, only the fastest subjects in the 50‐ms ISI experiment showed a significant right‐ear advantage. Implications of these results will be discussed in terms of modality specificity and temporal versus spatial stimulus array.

Speeded comparisons of spoken words (A)

Paul A. Luce

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2010 (1991); (1 page)

Online Publication Date: 14 Aug 2005

This research evaluated the claim that spoken words are recognized as soon as they diverge from all other words in memory. Pairs of consonant‐vowel‐consonant (CVC) words were presented for a speeded SAME‐DIFFERENT response. Subjects were instructed to respond SAME or DIFFERENT as soon as they could determine if the second word of the pair was the same as or different from the first word. Reaction times were measured from the onset of the second word. The overlap between the two members of the pairs of spoken words was systematically manipulated to determine if reaction times to say DIFFERENT were a simple function of the point at which the two words differed or whether overlapping segments after the divergence point would affect reaction times. The results are consistent with the notion that the temporal window for mapping information onto representations of spoken words in memory spans more than a feature or segment and that spoken words are not necessarily recognized as soon as they diverge from other words in memory.
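
The two quantities manipulated in the abstract above, the point at which two words diverge and the amount of overlap after that point, can be sketched in Python as follows. This is an illustration under stated assumptions: words are represented as lists of segment symbols, the function names are hypothetical, and the example items are not stimuli from the study.

```python
from typing import Optional, Sequence


def divergence_point(first: Sequence[str], second: Sequence[str]) -> Optional[int]:
    """Return the index of the first segment at which two words differ.

    Returns None when the two segment sequences are identical.
    """
    for index, (a, b) in enumerate(zip(first, second)):
        if a != b:
            return index
    if len(first) != len(second):
        # One word is a prefix of the other; they diverge where the shorter ends.
        return min(len(first), len(second))
    return None


def overlap_after_divergence(first: Sequence[str], second: Sequence[str]) -> int:
    """Count matching segments at positions after the divergence point."""
    point = divergence_point(first, second)
    if point is None:
        return 0
    tail_a = first[point + 1:]
    tail_b = second[point + 1:]
    return sum(1 for a, b in zip(tail_a, tail_b) if a == b)


# Hypothetical CVC items written as segment lists (illustrative only).
bat, bit, cat, bad = ["b", "a", "t"], ["b", "i", "t"], ["k", "a", "t"], ["b", "a", "d"]
```

For example, the hypothetical pair `bat`/`cat` diverges at the first segment but shares the two segments that follow, whereas `bat`/`bad` diverges only at the final segment and has no overlap afterward.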

Some lexical effects in a generalized phoneme monitoring task (A)

Scott E. Lively and David B. Pisoni

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2010-2011 (1991); (2 pages)

Online Publication Date: 14 Aug 2005

A long‐standing concern in the spoken word recognition literature has been whether use of the lexicon is necessary to complete a phoneme monitoring task. Eimas et al. [J. Mem. Lang. 29(2), 160–180 (1990)], for example, found no lexical effects in a phoneme monitoring task until monitoring responses were accompanied by lexical decisions or noun‐verb categorizations. Frauenfelder and Segui [Mem. Cog. 17(2), 134–140 (1989)], in contrast, found facilitatory priming effects in a monitoring task when the position of the target phoneme varied randomly from trial to trial. The current study adopts Frauenfelder and Segui's generalized phoneme monitoring paradigm in a speeded phoneme classification task. Subjects participating in a blocked condition demonstrated word frequency and density effects only for blocks of trials in which targets occurred word finally. No lexical effects were observed for responses to word initial targets. Subjects in a mixed condition showed frequency and density effects for word final targets and frequency effects for word initial targets. In terms of Cutler's race model, the data indicate that subjects adopt a postlexical response strategy when targets occur late in the stimulus word or when attention cannot be consistently focused on a particular target position. [Work supported by PHS RO1 DC011‐15.]

The role of lexical status in the segmentation of fluent speech (A)

Anne S. Henly and Howard C. Nusbaum

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2011-2011 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Theories of word recognition propose that listeners use lexical status to segment one word from another in fluent speech. Thus words must be recognized one at a time, in the order in which they were produced. This leads directly to the following predictions: (1) Words should be easier to identify following a word than following a nonword. (2) The lexical status of a syllable following a word should not affect the identification accuracy of that word. Subjects in the present experiment were asked to identify monosyllabic and trisyllabic target words presented in noise. Target words were presented with preceding word and nonword context syllables, as well as following word and nonword context syllables. Although the results confirm that listeners are able to use lexical status to facilitate segmentation, they also strongly suggest that listeners' use of lexical status is quite unlike the segmentation strategies proposed by most models of word recognition. [Research supported by NIDCD.]

Listening to the sound of sentences (A)

Howard C. Nusbaum, Kevin J. Broihier, and Judith C. Goodman

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2011-2011 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Although intonation conveys a great deal of information relevant to understanding sentences, it is unknown how listeners actually use this information. How do listeners integrate information in the intonation of a sentence with information derived from a linguistic analysis of the words in the sentence? Syntactic information from the order of words in a sentence may be processed independently from syntactic information perceived from intonation. On the other hand, different sources of syntactic information may be treated as equivalent and integral. Subjects were instructed to judge whether the intonation of a sentence was declarative or interrogative. Statements and questions were produced in two forms: with a declarative intonation and with an interrogative intonation. In one condition, syntactic structure was constant for all trials and intonation varied. In a second condition, syntactic structure varied across trials, as did intonation. The results indicate that listeners are unable to ignore the syntactic structure of a sentence in judging the intonation of the sentence. Listeners treat different types of syntactic information as integral. These results suggest that perception of intonation is a direct and integral part of sentence understanding. [Research supported by NIDCD.]

Some effects of text coherence on the comprehension of natural and synthetic speech (A)

James V. Ralston, Scott E. Lively, and David B. Pisoni

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2011-2011 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Subjects listened to naturally and synthetically produced (Votrax Type‐n‐Talk) passages of varying levels of difficulty in a sentence‐by‐sentence listening time task. Listeners controlled the intersentence interval while listening to passages presented in their normal sentence order or in a random sentence order. Subjects listening to Votrax speech had significantly longer intersentence response times in both the normally ordered and the randomly ordered sentence conditions. Furthermore, in a recognition test given after each passage, subjects' performance varied as a function of speech type, passage difficulty, and recognition question type. Subjects listening to synthetic speech responded more accurately to word recognition questions than to proposition recognition questions. Listeners who heard natural speech, in contrast, demonstrated better proposition recognition performance. The results indicate that listeners who heard synthetic speech attended more closely to the acoustic‐phonetic input than to the propositions of the passages. The results are discussed in terms of a limited capacity attentional mechanism. [Work supported by NSF IRI 86‐17847.]

Context effects in the perception of personal information in the speech signal (A)

John Mullennix, Keith Johnson, and Meral Topcu

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2011-2011 (1991); (1 page)

Online Publication Date: 14 Aug 2005

The speech signal contains linguistic, personal, and social information. Many studies have demonstrated that the perception of linguistic information is subject to context effects. This paper is a report of a study concerning context effects in the perception of personal information. When listeners were asked to identify the speaker of synthetic stimuli (the vowel /i/) in terms of male/female attributes, their responses were most affected by F0 and formant values with only a small effect of glottal waveform shape. The results of a perceptual anchoring study will be reported, in which listeners were asked again to identify the stimuli on the basis of speaker attributes, but with one endpoint of the synthetic continuum presented more often than any of the other stimuli. The results of this experiment will be discussed in terms of the hypothesis that listeners' perceptions of personal information in the speech signal are influenced by context. [Work supported by NIH.]

On the perceptual differentiation of spontaneous and prepared speech (A)

Robert E. Remez, Stefanie M. Berus, Jennifer S. Nutter, Jessica M. Dang, Lila Davachi, and Philip E. Rubin

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2011-2012 (1991); (2 pages)

Online Publication Date: 14 Aug 2005

Naive listeners are readily able to differentiate spontaneously produced speech from speech produced from text. The prior studies have employed lexically, syntactically, and thematically identical pairs of natural sentences extracted from brief fluent monologs (< 40 s in duration), finding relatively high levels of performance in tests of perceptual differentiation. To determine which attributes of the speech signal contribute to the perceptual differentiation of spontaneous and prepared speech, the present study manipulated several likely acoustic parameters employing techniques of speech synthesis. One condition reduced the frequency variation of the synthetic copies of the utterances to a monotone. A second condition removed the segmental attributes (consonants and vowels) from the sentence pairs by low‐pass filtering of the synthetic signals, leaving metrical and fundamental frequency variation intact. The final condition neutralized both the segmental and phonatory attributes, leaving only metrical properties available to perceivers by which to differentiate the sentence pairs. Although systematic perceptual effects were anticipated, in fact these acoustic conditions modulated the differentiability of individual sentence pairs in different ways. Evidence of this kind indicates that perceptual analysis of spontaneity takes place at the level of the sentence, and comparisons across the set of conditions prove that no single acoustic emblem of the speech signal conveys spontaneity to the listener.

Further investigation of the semantic and pragmatic effects on speech production (A)

Jan Charles‐Luce and Basiliki Papadimitriou

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2012-2012 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Previous studies have shown that preceding semantic information affects the production of target words, for example blocking phonological rules or modifying duration. In addition, there is evidence that speakers modify their production of words during the speaker/listener exchange, for example in formal versus informal speaking situations. In the present study, two experimental environments were established to address the effects of semantic versus pragmatic contexts directly. In both environments, a subject was visually presented with a prime and then a target. The prime was either semantically related or unrelated to the target. The subject said the prime and target words aloud. In one experimental environment, a second person was present in the room to whom the subject was to communicate. In the other environment, a subject performed the task alone. Dependent variables were duration of, fundamental frequency of, and reaction time to onset of pronunciation of the target word. The results will be discussed in the frameworks of interactive activation and pragmatic compensation.

Analysis of hesitations in spontaneous speech (A)

D. O'Shaughnessy

J. Acoust. Soc. Am. Volume 89, Issue 4B, pp. 2012-2012 (1991); (1 page)

Online Publication Date: 14 Aug 2005

Spontaneous speech differs from read speech in several ways, especially in hesitation phenomena. This paper reports results on hesitation pauses (filled and unfilled) and restarts. For comparison purposes, the acoustic correlates of (unintended) hesitation pauses are compared to those for intentional pauses. A distinction is made between grammatical pauses (at major syntactic boundaries) and ungrammatical ones. Such pause types cannot be separated based on silence or prepausal duration, but rather in the pitch of the prepausal word. Ungrammatical pauses tended to have few F0 continuation rises, whereas virtually all grammatical pauses were accompanied by a prior F0 rise of at least 10 Hz. While silent pauses are easy to locate in speech recognition applications, filled pauses (e.g., “err,” “umm”) resemble words in continuous speech. Filled pauses at major syntactic boundaries were about 300–450 ms, whereas those within syntactic units were shorter. Filled pauses had falling or flat and low F0 patterns. Ones at syntactic boundaries tended to start higher in F0 and then fall, whereas filled pauses internal to a syntactic unit had lower F0 patterns. Concerning restarts in spontaneous speech, when a word was completely repeated, it had virtually the same prosodics in both its instances. When a word was changed in the restart, its second instance was more stressed. [Work supported by Canadian government.]
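
The prepausal F0 cue described in the abstract above could be turned into a simple screening rule along the following lines. This is a toy sketch, not the author's procedure: the 10 Hz threshold comes from the abstract, while the function names, the two-point F0 measurement, and the example values are assumptions made for illustration.

```python
# Threshold taken from the abstract: grammatical pauses were preceded
# by an F0 rise of at least 10 Hz.
F0_RISE_THRESHOLD_HZ = 10.0


def prepausal_f0_rise(f0_start_hz: float, f0_end_hz: float) -> float:
    """F0 change across the prepausal word (assumed two-point measurement)."""
    return f0_end_hz - f0_start_hz


def looks_grammatical(rise_hz: float, threshold_hz: float = F0_RISE_THRESHOLD_HZ) -> bool:
    """True when the prepausal rise meets the threshold reported for grammatical pauses."""
    return rise_hz >= threshold_hz


# Hypothetical measurements in Hz (not data from the study).
rise_a = prepausal_f0_rise(110.0, 125.0)  # rising contour before the pause
rise_b = prepausal_f0_rise(118.0, 115.0)  # slightly falling contour before the pause
```

Because the abstract reports tendencies rather than a strict rule, a threshold test like this can only flag pauses whose prepausal pitch is consistent with a grammatical boundary; it is not a reliable classifier on its own.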