Journal of the Acoustical Society of America

Nov 1976

Volume 60, Issue S1, pp. S1-S125

Session M. Speech Communication II: High Order Units - Production and Perception
Contributed Papers

Distribution of timing events in speech production (A)

G. Weismer and D. R. Ingrisano

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S26-S26 (1976); (1 page)

Online Publication Date: 11 Aug 2005

The timing of articulatory gestures in speech production was studied in three speakers. Each speaker recorded a list of 400 sentences comprising a randomized arrangement of 4 sentences by 5 emphatic stress conditions, in which each sentence‐emphatic stress combination was repeated 20 times. One sentence was designated as the target utterance, and the remaining three sentences were included as dummy utterances. Subjects were trained to place emphatic stress on a given content word in each utterance or to produce sentences with normal lexical stress patterns. Spectrograms of each target utterance‐emphatic stress combination were prepared and analyzed according to a time normalization procedure in which the time of occurrence of predetermined events in the articulatory sequence was referred to the normalized utterance duration. The hypothesis tested in this investigation, as predicted by Martin's [Psych. Rev. 79, 487–509 (1972)] theory of timing in speech production, was that the repetition variance associated with the normalized “times‐of‐articulation” would be conditioned by the location of emphatic stress within utterances. Preliminary results tend to support this prediction. [Supported by NIMH Grant #R03‐MH 28705‐01.]
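The time normalization described in this abstract can be illustrated with a minimal sketch. It assumes that normalization means dividing each event time by the total utterance duration, and that "repetition variance" is the sample variance of each normalized event time across repetitions; the abstract does not spell out either detail. The function names and the measurements are hypothetical, not taken from the study.

```python
from statistics import variance


def normalize_event_times(event_times_ms, utterance_duration_ms):
    """Express each event time as a fraction of the utterance duration."""
    if utterance_duration_ms <= 0:
        raise ValueError("utterance duration must be positive")
    return [t / utterance_duration_ms for t in event_times_ms]


def repetition_variance(repetitions):
    """Sample variance of each normalized event time across repetitions.

    `repetitions` is a list of (event_times_ms, utterance_duration_ms)
    pairs, one pair per repetition, each with the same number of events.
    """
    normalized = [normalize_event_times(ev, dur) for ev, dur in repetitions]
    # zip(*normalized) groups the k-th event of every repetition together.
    return [variance(column) for column in zip(*normalized)]


# Hypothetical measurements (assumed values, not data from the study):
# two articulatory events per utterance, three repetitions.
reps = [([100, 500], 1000), ([120, 600], 1200), ([150, 500], 1000)]
print(repetition_variance(reps))
```

Under this reading, a repetition spoken more slowly but with the same relative timing contributes no variance, which is the point of normalizing before comparing stress conditions.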

Syllable timing as a function of position‐in‐utterance in infant babbling (A)

D. Kimbrough Oller and Bruce Smith

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S26-S26 (1976); (1 page)

Online Publication Date: 11 Aug 2005

For several years now it has been known that a number of languages show regular variations of syllable duration as a function of position‐in‐utterance. The most striking finding has been that final syllable vowels (in English, for instance) are up to 125 msec longer than comparable nonfinal syllable vowels. The present study attempts to determine whether or not the same sorts of patterns of syllable duration obtain in infant utterances. A corpus of reduplicated CVCV utterances of several infants was tape‐recorded and analyzed spectrographically. The durations of the infant consonants and vowels were compared with the durations of adult consonants and vowels as pronounced by adult speakers who were asked to read sequences phonetically like the infant utterances. The results show only a slight tendency toward final syllable lengthening in the infant utterances, a tendency which is demonstrably less than that which occurred in the comparable adult utterances. [Work supported by NIH‐NICHD contract HD‐3‐2793 and Grant 5R01 HD 09906‐02.]

Temporal patterns of syllables (A)

Linda R. Shockey and Ignatius G. Mattingly

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S26-S27 (1976); (2 pages)

Online Publication Date: 11 Aug 2005

Comparisons of the temporal patterns of pairs of syllables differing only as to presence versus absence of a particular consonant suggest that the temporal effect of the consonantal articulation is not necessarily either additive or simply related to the duration of the acoustic segment conventionally associated with the consonant. In a study of such temporal effects, various monosyllables were embedded in turn in a short carrier sentence, and each version of the sentence was read 30 times at a carefully controlled tempo by one speaker. Durations of segments, syllables, and breath‐groups were measured from oscillograms, and minimal pairs, e.g., say versus slay, compared. It appears that a final stop or final nasal lengthens both syllable and breath group by an amount roughly equal to the duration of the corresponding acoustic segment; an initial stop, liquid, or aspirate induces no appreciable lengthening, other segments in the syllable being sharply reduced; and initial /s/ lengthens the syllable and breath group by an amount approaching the duration of the fricative segment, the durations of the other acoustic segments being slightly reduced. It is hoped that these results will contribute to more adequate synthesis by rule. [Work supported by NIH and VA.]

Use of nonsense‐syllable mimicry in the study of prosodic phenomena (A)

Mark Y. Liberman and Lynn A. Streeter

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

The technique of nonsense‐syllable mimicry of natural utterances [used by Lindblom and Rapp, Publ. No. 21, Institute of Linguistics, University of Stockholm (1973) (unpublished)] has many advantages in the study of prosodic phenomena, especially duration. In analytic studies, the elimination of segmental effects as a factor makes data collection much more efficient, and requires only one segmentation criterion. In perceptual studies, the technique eliminates lexical information without unnatural distortions of the signal. In a series of validation experiments, we have found that: (a) the patterns of duration obtained by using this technique were stable and reproducible within and across speakers; (b) mimicry of different natural models with identical stress patterns and constituent structures produced indistinguishable nonsense‐syllable duration patterns; (c) the duration patterns obtained correspond closely to results of work on natural speech by ourselves and others.

Syntactic boundaries in the timing of trochaic speech (A)

William E. Cooper, Steven G. Lapointe, and Jeanne M. Paccia

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

The duration of a stressed syllable is shortened when it is immediately followed by an unstressed syllable. Previous work showed that this effect operates across word boundaries but is diminished in magnitude by the presence of an intervening syntactic boundary. In this study, the durations of key segments within stressed syllables were measured in sentence pairs containing a matched phonetic environment. The results for ten speakers showed that the shortening effect was blocked in the presence of a number of major syntactic boundaries, including the boundaries between a noun phrase (NP) and a prepositional phrase (PP), between two PP's, and between an NP or PP and a separate clause. Lesser syntactic boundaries, including the boundaries between a verb and NP direct object and between an NP direct object and NP indirect object, did not block the shortening rule. The magnitude of the blocking effect did not depend on the transformational history or internal structure of the syntactic constituents as much as on the boundary type.

Syntactic determinants of word duration in speech (A)

Richard Goldhor

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

A preliminary identification of the primary syntactic determinants of duration in English declarative sentences has been made. A hundred declarative sentences were generated from a small set of words, and the durational variations of those words which appeared in more than six sentences were studied. The three syntactic determinants which were found to most strongly and consistently affect word duration are: (1) the length of the phrase in which a word appears; (2) the position of the phrase in its dominating clause; and (3) the number and type of clauses in the sentence. Additionally, these determinants were found to affect phrase‐final words differently than phrase‐nonfinal words. These findings have been incorporated into a simple prosodic durational algorithm suitable for use in a text‐to‐speech system.

Reading speed as a clue to text structure (A)

M. O. Harris

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

Text consists of structured content. Do readers reflect that structure in their production? A system is under design to map the content pattern of a given text, sentence by sentence, thus locating major and lesser topic boundaries. The system depends on word frequency and distribution in that text, and on the recognition of synonyms and other forms of reference, which establish links between text segments. Acoustic tools help to identify significant variations in reading performance, and these can then be interpreted in terms of the text structure. Material for the present paper consists primarily of four passages, each read by three speakers; about an hour's reading in all. The passages are structurally independent excerpts from much larger works. All of the sentences were randomized together and read as one list, and then each passage was read in original form. A study based on duration measurements indicates clear agreement among speakers as to the locations where reading speed changes occur: e.g., in sentences of transition between major topics, and at points of salient semantic shift. However, the nature of the change (increase or decrease in speed) is highly speaker dependent.

Perceptual phonetics of coarticulation (A)

J. G. Martin, R. H. Meltzer, and C. B. Mills

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

In experiments reported earlier at ASA meetings (November, 1975) and elsewhere, reaction time (RT) of subjects monitoring stop‐consonant phoneme targets in tape‐recorded sentences was observed. RT was compared when the target (a) was carried by the normal, intact sentence version or (b) was temporally displaced by experimental intervention, that is, separated from prior sentence context by addition of 200 msec to the normal pre‐stop‐consonant silent interval. Faster RT to temporally displaced than to normal targets was interpreted in terms of coarticulatory cues to target existing in the speech interval preceding the intervention which were used to anticipate the target across the intervention interval. In further analysis, data were separated on the basis of four classes of pretarget phonetic context: stop, fricative, sonorant, and vowel. All classes produced coarticulatory effects (relatively faster RT to displaced compared to normal targets), some more than others. Additional analysis indicated similar effects in the normal sentence versions also. Discussion concerns the mapping of perceptual results onto acoustic and articulatory data. [Work supported by NIMH, ARIBSS.]

Factors affecting the detection of mispronunciations (A)

R. A. Cole and J. A. Jakimik

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S27 (1976); (1 page)

Online Publication Date: 11 Aug 2005

In a series of experiments, subjects were presented with sentences or short stories, and required to press a response key whenever they detected a mispronounced word. Mispronunciations were produced by changing one phonetic segment in a word to produce a nonword. Detection of mispronunciations was affected by factors at several linguistic levels: (a) the type of phonetic change, (b) the position of the changed segment in the word, (c) the stress of the syllable (e.g., “contain” versus “contour”, where /k/ was pronounced /g/ in each word), (d) the syntactic structure of the sentence, and (e) whether the word containing the mispronunciation occurred in a previous sentence. The results will be summarized and discussed.

Word‐letter phenomenon with speech stimuli: a word‐segment effect (A)

Joseph Rogers, Anne B. Thistle, and Brooke Neilson

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S27-S28 (1976); (2 pages)

Online Publication Date: 11 Aug 2005

Single speech segments excised from natural speech were significantly better identified when embedded in words than when embedded in incomplete words or presented alone. This held true, however, only when a pre‐ and post‐exposure mask designed to contain a randomly ordered selection of the features of speech segments was employed. Under no‐mask or white‐noise‐mask conditions, segments showed a slight advantage over words. These findings parallel those previously reported for visual stimuli: single letters in word context are significantly better identified than single letters alone, but only with random letter‐feature masking. Without such masking there is a slight advantage for letters alone. This “word‐letter phenomenon” has been offered as evidence for hierarchical processing in vision. Likewise, a word‐segment effect suggests at least some form of hierarchical processing in speech perception. Further, the parallel results of both visual and auditory studies indicate that auditory and visual language processing channels are either remarkably similar or converge at a relatively early stage. [Work supported by a UCSD Academic Senate Grant to James L. McClelland.]

Test of speech intelligibility in noise using sentences with controlled word predictability (A)

D. N. Kalikow, K. N. Stevens, and L. L. Elliott

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S28 (1976); (1 page)

Online Publication Date: 11 Aug 2005

This paper describes a test of everyday speech reception, in which a listener's utilization of the linguistic‐situational information of speech is assessed, and is compared with the utilization of acoustic‐phonetic information. The test items are sentences which are presented in babble‐type noise, and the listener response is the final word in the sentence (the key word) which is always a monosyllabic noun. Two types of sentences are used: high‐predictability items for which the key word is somewhat predictable from the context, and low‐predictability items for which the final word cannot be predicted from the context. Both types are included in several 50‐item forms of the test, which are balanced for intelligibility, key‐word familiarity and predictability, phonetic content, and length. Performance of normally‐hearing listeners for various signal‐to‐noise ratios shows significantly different functions for low‐ and high‐predictability items. The potential application of this test to assessment of speech reception in the hearing impaired is discussed. [Work supported by NINCDS.]

Performance of children from 11 to 17 years of age on a sentence test of speech intelligibility in noise (A)

L. L. Elliott

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S28 (1976); (1 page)

Online Publication Date: 11 Aug 2005

Ninety‐six normally hearing children (24 each at ages 11, 13, 15, and 17 years) were individually administered the Speech Perception in Noise (SPIN) Test (see preceding paper by Kalikow, Stevens, and Elliott) at signal‐to‐noise (S/N) levels of 5, 0, and −5 dB in counterbalanced order and in quiet. Their task was to repeat the last word of each sentence. Average performance of the 17‐year‐old Ss was very similar to performance of adults on all conditions. An age effect appeared in the results for the high‐predictability sentences at the 0‐dB S/N level, with younger Ss showing poorer performance. This age effect was specific to the listening‐in‐noise condition, since the 11‐year‐old Ss demonstrated perfect or near perfect scores when later tested in quiet. Implications of these findings will be discussed and the results of testing some children with learning disabilities will also be presented. [Supported by a grant from B.E.H.]

Hearing “words” without words: speech prosody and word perception (A)

Lloyd H. Nakatani and Judith A. Schaffer

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S28 (1976); (1 page)

Online Publication Date: 11 Aug 2005

The role of prosody in word perception was studied by asking listeners to locate word boundaries in speech with natural prosody but without phonotactic or semantic cues. To obtain such speech, seven speakers read sentences (e.g., “The noisy dog never stopped barking”) with “ma” substituted for each syllable of the three‐syllable adjective‐noun phrase (e.g., “The mama ma never stopped barking”). These nonsense phrases with all possible stress patterns were excised and played to 38 naive listeners who judged whether they heard each phrase as “ma mama” or “mama ma.” The stress pattern determined how well the phrases were parsed into words. When the stress pattern made the word division unambiguous (01‐1, 1‐12, 1‐10 where 0, 2, 1 denote no stress, midstress, and high stress), listeners used the pattern to parse the phrases easily; striking individual differences were observed. When the stress pattern left the word division ambiguous (11‐1, 1‐11, 12‐1, 1‐21, 10‐1, 1‐01), parsing was harder but still better than chance. So not only global patterns, but also detailed information about prosody, are used in word perception.

Word juncture perception as a function of word‐initial phonemes (A)

Kathleen D. Dukes and Lloyd H. Nakatani

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S28 (1976); (1 page)

Online Publication Date: 11 Aug 2005

A previous study showed that allophonic variations in word‐initial phonemes were the primary cues for word juncture perception. This study examined juncture perception as a function of the word‐initial phonemes occurring singly and in clusters. Pairs of two‐word phrases contrasting in juncture location (e.g., “he praises” and “heap raises”) and representing a wide variety of word‐initial vowels, consonants and consonant clusters were recorded in sentence contexts by four talkers. The phrases were excised from the sentences and played to listeners who indicated which phrase they heard. Juncture perception was good for word‐initial voiceless stops, vowels, affricates, and nasals, and fair for fricatives and voiced stops. Juncture perception was poor for liquids and semivowels, but good when they occurred in a consonant cluster. Juncture perception for clusters beginning with /s/ was better than for /s/ alone. Juncture perception was poor when the same consonant occurred before and after the juncture (e.g., “heap praises”); this suggests that segmental duration is not an important cue for juncture perception.

Perceptual determinants of phrase boundary placement (A)

Lynn A. Streeter

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S28 (1976); (1 page)

Online Publication Date: 11 Aug 2005

In speech production many acoustic cues are correlated with the presence of a major syntactic boundary. Here the relative contribution of three acoustic cues (fundamental frequency contour, duration, and amplitude) to the perceptual segmentation process was evaluated. The stimuli were ambiguous algebraic expressions, such as “A times E plus 0,” in which the middle term E could be a member of either the first or the second phrase. Two levels (appropriate and neutral) of each of the three cues were varied in a complete factorial design. The results indicated that while all three cues significantly influenced the placement of a phrase boundary, duration was more potent than either amplitude or the fundamental frequency contour.
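As an illustration of the complete factorial design mentioned in this abstract, the sketch below enumerates every combination of two levels across three cues, which gives 2 × 2 × 2 = 8 stimulus conditions. The cue and level labels are assumed names chosen for readability; the abstract does not say how the conditions were coded or how many stimuli were built per condition.

```python
from itertools import product

# Assumed labels for the three cues and two levels named in the abstract.
CUES = ("f0_contour", "duration", "amplitude")
LEVELS = ("appropriate", "neutral")


def factorial_conditions(cues=CUES, levels=LEVELS):
    """Return every combination of levels across cues (a full factorial)."""
    return [dict(zip(cues, combo)) for combo in product(levels, repeat=len(cues))]


conditions = factorial_conditions()
print(len(conditions))  # 8 conditions for 3 cues at 2 levels each
for condition in conditions:
    print(condition)
```

Crossing all levels is what allows the contribution of each cue to be compared with the others, since every cue appears at both levels in combination with every setting of the remaining two.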

How is the memory search of a two‐clause complex sentence in immediate memory carried out? (A)

Leonard Shedletsky

J. Acoust. Soc. Am. Volume 60, Issue S1, pp. S28-S29 (1976); (2 pages)

Online Publication Date: 11 Aug 2005

An item recognition task was performed with 32 native English‐speaking, adult, right‐handed subjects who listened to eight two‐clause complex sentences presented to the left ear, each immediately followed by a probe word presented to the right ear. The subjects indicated whether or not the probe word occurred in the sentence, and their recognition latency was measured. An analysis of variance was performed on recognition latency as a function of the three independent variables: (a) the serial position of the target word, early or late, within (b) a main or subordinate clause, in (c) initial or final clause position. The findings of this experiment were: (a) a word in the final clause is recognized significantly faster than a word in the initial clause; (b) for subordinate clauses, subjects take longer to respond to a target word occurring late in the clause than to a target word occurring early in the clause; for main clauses, subjects take longer to respond to a target word occurring early in the clause than to a target word occurring late in the clause. Present storage models of sentence processing and memory search models are inadequate to account for all the data. A combined storage‐search account was proposed. A serial self‐terminating model of clause accessing, with final clause search occurring prior to initial clause search, fit the data better than a simultaneous search of both clauses. Clauses are searched either in a primary or a secondary buffer, depending on clause type (main or subordinate) and clause position (initial or final) in the sentence. To explain the difference in mode of search between main and subordinate clauses, it was suggested that main clauses exhibit a property of primacy over subordinate clauses.