Journal of the Acoustical Society of America

Jun 1977

Volume 61, Issue S1, pp. S1-S96

Session TT. Speech Communication VII: Speech Modeling and Timing
Contributed Papers

Relationships among formant frequency measures of vowels in an “imitation” dialect (A)

George Papçun

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S89-S89 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Twenty trained phoneticians of widely varying physical characteristics imitated vowels from Japanese, Spanish, and English under conditions designed to allow them to achieve maximally accurate imitation. In a previous report [George Papçun and Richard Harshman, “How do different speakers say the same vowels?” J. Acoust. Soc. Am. 59, S71(A) (1976)], discriminant analysis used with a subset of these data showed the necessity of using nonlinear terms in describing the overall phonetic space. In this report, regression and similar analyses are applied to vowels individually to examine the issue of which combinations of formant frequencies may be construed as representing one and the same vowel. [Work supported by USPHS and NSF.]

User adaption in operational speaker verification (A)

George R. Doddington and Barbara M. Hydrick

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S89-S89 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Texas Instruments controls entry to its corporate computer facility by speaker verification. During two and one half years of operation, 0.4 million verifications have been performed for 300 users. Current performance is 99.9% user acceptance with 99% impostor rejection. Initial user acceptance is not as good (99%), but steady‐state performance is achieved after 50–100 verifications. Careful user orientation is important, and user adaptation and voice consistency continue to improve over a long period of time. The variance of input speech data for those 36 users (24 men, 12 women) who had been verified at least three thousand times was plotted versus session number. Average variance was 108 at 250 sessions, 105 at 500, 101 at 1000, 96 at 2000, and 93 at 3000 sessions. How do you explain this continued improvement?
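
One way to look at the variance figures quoted in this abstract is to fit them against the logarithm of the session number. The sketch below is an illustration only and not the authors' analysis: the choice of a log2 fit and the name `slope_per_doubling` are assumptions, and the five data points are the only values taken from the abstract.

```python
import math

# (session number, average variance) pairs quoted in the abstract.
points = [(250, 108), (500, 105), (1000, 101), (2000, 96), (3000, 93)]

def slope_per_doubling(data):
    """Least-squares slope of variance against log2(session number).

    Assumption for illustration: a straight line in log2(sessions)
    is a reasonable first description of the trend.
    """
    xs = [math.log2(sessions) for sessions, _ in data]
    ys = [variance for _, variance in data]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

if __name__ == "__main__":
    # Change in average variance per doubling of the number of sessions.
    print(round(slope_per_doubling(points), 2))
```

A negative slope here means the average variance keeps falling by roughly that amount each time the number of sessions doubles, which is one way to express the "continued improvement" the abstract asks about.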

Tone space and universals of tone systems (A)

Jean‐Marie Hombert

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S89-S89 (1977); (1 page)

Online Publication Date: 11 Aug 2005

The purpose of this paper is to present a mathematical model which predicts the most likely tone shapes of a tone system. Given the total number of lexical tones the model will output two numbers for each tone. The first number represents the onset and the second number the offset of the tone on a five‐point scale. This model is based on the principle of maximal perceptual distance and minimal articulatory difficulty. The parameters measuring perceptual distance and articulatory difficulties were obtained from consideration of previously reported data by other researchers and two recent experiments. The effect of using various measurements of perceptual distance (Euclidean, city block, etc.) and various relative weights for articulatory vs auditory factors is also investigated. The predictions made by the model were satisfactorily matched with actual data from more than one hundred tone languages. The notion of “most likely tone shapes” based on a trade‐off between maximum perceptual distance and minimum articulatory difficulty is very useful for understanding why and how tone systems evolve. [Work supported by NSF.]
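
The general idea of trading perceptual distance against articulatory difficulty can be sketched in a few lines. This is a toy illustration under stated assumptions, not the author's model: the cost function `effort` (size of the pitch movement), the default `weight`, the scoring rule, and all function names are hypothetical. Only the five‐point scale, the (onset, offset) representation, and the Euclidean and city‐block distances come from the abstract.

```python
from itertools import combinations

SCALE = range(1, 6)  # five-point pitch scale: 1 = lowest, 5 = highest
TONES = [(onset, offset) for onset in SCALE for offset in SCALE]

def euclidean(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def city_block(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def effort(tone):
    # Hypothetical articulatory cost: the size of the pitch movement,
    # so level tones cost 0 and a full rise or fall costs 4.
    return abs(tone[1] - tone[0])

def best_system(n, distance=euclidean, weight=0.5):
    """Return the n-tone system (n >= 2) with the highest toy score.

    Score = smallest pairwise distance between tones
            minus weight * total articulatory effort.
    """
    def score(system):
        spread = min(distance(a, b) for a, b in combinations(system, 2))
        return spread - weight * sum(effort(t) for t in system)
    return max(combinations(TONES, n), key=score)

if __name__ == "__main__":
    for n in (2, 3):
        print(n, best_system(n))
```

Passing `distance=city_block` or a different `weight` shows how the preferred system shifts as the distance measure and the relative weighting of articulatory and auditory factors change.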

Nasalization as a style marker in Persian (A)

M. Kahn, A. Ordubadi, and J. Bernstein

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Nasal output in informal (conversation) and formal (read, paradigms) Persian was sampled from a fairly homogeneous population of Iranian speakers. Measurement was accomplished by recording the output of a small accelerometer placed on the nose. The peaks on the accelerometer output were then compared to the raw speech wave. Overall degree and quantity of nasalization was measured across speakers and compared with similar measurements obtained for hearing adult speakers of English, from Stevens et al. [Speech Hear. Res. 19, 2 (1976)]. Then nasalization was contrasted within each speaker's repertoire of styles. Styles were objectively labeled by appearance of particular lexical items and phonological substitutions. Particular attention was paid to the substitution of postvocalic nasalization for a non‐nasal alveolar stop, e.g., the morpheme −i:d→−ĩ:. Oddly enough, this phonological change has seemingly evolved in the absence of any original motivating nasal consonant.

Palatalization of /l/ in Italian and Spanish (A)

Hector Javkin

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

The palatalization of /l/ in Italian and Spanish presents an important case of the interaction of universal phonetic and language specific factors. In both languages, short, non‐post‐occlusive /l/ did not palatalize. Italian palatalized post‐occlusive /l/ to /j/. Spanish palatalized geminate /l/ to /j/. These changes appear to be due to the fact that post‐occlusive and geminate /l/ both have a higher tongue position than short, non‐post‐occlusive /l/: geminate /l/ due to a continuation of the tongue‐raising gesture after the /l/ contact is reached; post‐occlusive /l/ because the mandible is in a higher position following the occlusive. The higher tongue position alters the acoustic characteristics of the /l/ release, making it perceptually similar to a palatal glide. Both types of /l/ thus contain the phonetic conditions for a sound change. Why Italian and Spanish chose to palatalize only one of these types of /l/ (and why each made a different choice) is a question that the phonetician cannot answer. [Work supported by NSF.]

Stress correlates in Montreal French (A)

Malcah Yaeger and William Kemp

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

We have impressionistically determined the meaningful stress categories for spontaneous Montreal French. These categories are distinguished by variations of F0, amplitude, and syllable duration. Sentential stress is realized phonetically as reduced fundamental and amplitude, but increased duration of the final syllable. In contrast, comma stress is realized as raised amplitude and either (a) raised F0 with no durational increment (“emphatic stress”), or (b) a fall‐rise contour with some durational increment (“list intonation”). In all cases we find [similarly to Klatt, J. Phonet. 3, 159 (1975); Kloker, J. Acoust. Soc. Am. 57, S33 (A) (1975)] that the word preceding the stressed word is lengthened. We analyzed the interaction of stress with syntax and with variable phonological realization of certain segments. Whereas the durational and pitch correlates of stress appear fairly constant across all eight of our speakers analyzed in depth to date, the influence of stress on consonant cluster retention and on rate and trajectory of diphthongization of monophthongs is determined by social factors. [Research funded by Killam Grant: Conseil des Arts.]

“I don't WANT to see her grease gun fittings”—Do post‐nuclear accents occur in English? (A)

Ralph Vanderslice

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Encoder subjects heard, for each item, a disambiguating context such as either “Mac has some cover stories,” or “Max is covering some stories,” followed by the cue “Let's go and see them/him.” Subjects' task was to respond emphatically and negatively in the form “I don't WANT to go and see Mac's/Max cover stories.” These responses were recorded on tape and played to decoder subjects, whose task was to repeat the target phrases in citation form, i.e., with disambiguating accentuation. According to received prosodemic theory, decoders should reliably recover the structures given to encoders, whose responses would, ex hypothesi, carry the same “stress” distinctions, merely realized at weaker “levels.” In fact, however, the patterns given to encoders and those elicited from decoders failed to show significant (above chance) agreement. This finding is interpreted as support for the theory of “binary” suprasegmental features [R. Vanderslice and P. Ladefoged, Language 48, 819–838 (1972)], which claims that the feature ACCENT does not occur postnuclearly (i.e., the intonation contour, in this case CADENCE, begins with the last accented syllable of the sense group), and thus predicts exactly the loss of prosodemic contrast observed in these cases. The tune in question, with leftward “shift” of the sentential accent to the emphatic word WANT, must not be confounded with one also having EMPHASIS (and thus typically highest pitch) on WANT, yet with unshifted sentence accent on either cover or stories. The latter, disambiguating, tune is superficially similar but systematically distinct. It should also be noted that the “binary” prosodic features can, mutatis mutandis, be viewed as singulary elements (morphons) in a Cognitive‐Stratificational account of English suprasegmentals.

The declination effect (A)

J. Breckenridge

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

The perception of the declination effect, which is the nearly universal tendency of intonation to drift downward in pitch, was investigated. The perception of this effect is interesting because it exemplifies the general problem of how the moving pitches used in intonation are processed and remembered. The stimuli for the experiments were nonsense sentences of the form “ma MA ma ma MA ma.” The pitch on one stressed syllable was varied by small increments and the resynthesized stimuli were randomized before presentation. Subjects judged which peak was higher in pitch. A second peak with the same pitch as the first generally sounded higher. This demonstrated that the declination effect is psychologically real. The crossover point, or point of subjective equality, for wide pitch range stimuli reflected the large declination effect found in animated speech, whereas a negligible effect was found for narrow pitch range stimuli. A 3.6‐dB increase in amplitude on the second stressed syllable shifted the crossover point downward to a surprising degree. Raising the pitch on the unstressed syllables made the second stressed syllable sound lower. [Work carried out at Bell Laboratories.]

Modeling of durational patterns in reiterant speech (A)

M. Y. Liberman

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S90-S90 (1977); (1 page)

Online Publication Date: 11 Aug 2005

In a paper presented at the 1976 ASA fall meeting, Liberman and Streeter [J. Acoust. Soc. Am. 60, S27 (A) (1976)] reported that reiterant speech (in which a nonsense syllable is substituted for each of the original syllables of a target phrase) shows stable durational patterns, influenced by the stress and constituent structure of the target phrase, and relatively unaffected by its segmental makeup. In the present paper, a model of durational patterns in reiterant speech is described, and used to argue that English stress patterns maintain their phonetic reality even when they are not differentiated by pitch.

Segment and word durational correlates of syntactic boundaries in Italian (A)

Marina A. Nespor and George D. Allen

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S91 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Major syntactic boundaries are marked in many languages by pauses, but the acoustic phonetic correlates of these pauses differ from language to language. The majority of languages, including English and Italian, signal pauses by lengthening the prepausal segment(s), syllable(s), or word(s). In Italian, however, certain syntactic boundaries satisfying a “left branch” condition on the structural description of the sentence are marked by events following the boundary. In northern Italian dialects, the word immediately following the boundary is shortened; in central, southern, and insular dialects, the initial consonant of the second word following the boundary is lengthened. These three phenomena, prepausal lengthening, immediate postpausal shortening, and lengthening of the initial consonant of the second word after the pause, are not incompatible, and may in fact be manifestations of one rhythmic principle. Examples of the post‐boundary phenomena, their syntactic conditions, and their phonetic correlates will be discussed. [Supported in part by NSF Grant No. BNS74‐02513.]

On S duration: a case for surface and underlying representation (A)

D. Ingrisano and G. Weismer

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S91 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Attempts to detail minimum/maximum intrinsic durational characteristics of phonetic segments and thereby deduce underlying phonemic correspondence are complicated by several factors (stress, number of syllables, syntax, nature of carrier phrase, orthography, etc.). The objectives of this study were twofold: (a) to determine minimum/maximum durational characteristics of [s] while systematically permuting lexical and sentential stress, morpheme and syllable location, and syntax, and (b) to examine the hypothesis that an underlying double [s] may account for longer [s] durations observed in the first member of stimulus pairs such as misstep vs. mistake [Klatt, J. Speech Hear. Res. 17, 51–63 (1974)]. In order to control orthographic effects, subjects participated in two experimental conditions which consisted of Delayed Imitation and Reading of target phonemes embedded in sentential material. The responses of 20 subjects were tape recorded, and duration data were derived from oscilloscopic displays under consistent segmentation criteria. Results are discussed relative to intrinsic durational characteristics of [s], the influence of phonemic representation upon the temporal structure of [s], and the influence of orthographic effects on the study of durational phenomena.

Temporal control of medial stop consonant clusters in English (A)

John R. Westbury

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S91 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Measurements were made of closure durations of the eighteen possible voiced‐voiceless and voiceless‐voiced combinations of /p/, /t/, /k/ and /b/, /d/, /g/ produced medially in isolated nonsense disyllables by three English‐speaking males. Preliminary results show that closure durations depend on both voicing and place of articulation of first and second members of such clusters. Mean closure durations were significantly longer (by approximately 20 msec) for voiceless‐voiced than voiced‐voiceless clusters. Clusters with first‐member velars were longer than those with first‐member labials and alveolars, while clusters with second‐member velars were shorter than those with other second members. Sequences of stops requiring the greatest changes in points of articulation (labial → velar, or velar → labial) were longer than sequences requiring smaller changes in the same direction (labial → alveolar, alveolar → velar, or velar → alveolar, alveolar → labial). Closure durations generally varied inversely with durations of surrounding vowels. Clusters were longest, for example, in the frame /pI__It/, somewhat shorter in /pI__at/, shorter still in /pa__It/, and shortest in /pa__at/. But durations for like clusters were no shorter in the frame /pI__Id/ than in /pI__It/. These latter findings suggest first that closure durations for medial stop clusters are governed extrinsically by a tendency toward isochrony. However, whatever principles lead toward isochrony appear to operate before the rule which results in greater length of final‐syllable vowels before voiced stops in English.

Some effects of speaking rate on timing in speech production (A)

G. Weismer and D. Ingrisano

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S91 (1977); (1 page)

Online Publication Date: 11 Aug 2005

At a previous meeting of the Society, we reported certain effects on speech timing due to emphatic‐stress placement within a five‐syllable utterance [Weismer and Ingrisano, J. Acoust. Soc. Am. 60, S26(A) (1976)]. The present investigation explores the stability of these effects across two speaking rates for five subjects. Each of the subjects recorded a list of 400 sentences at both speaking rates (conversational and ‘very fast’); the lists contained a randomized arrangement of 4 sentences × 5 emphatic‐stress conditions in which each sentence‐emphatic‐stress combination was repeated twenty times. The time‐normalization analysis procedure described in our previous report was used to allow direct comparison of articulatory timing patterns associated with identical phoneme sequences spoken at two distinct rates. Results are discussed relative to the notion of reorganization of articulatory commands with large changes in speaking rate [Gay and Hirose, Phonetica 27, 44–56 (1973)] or the lack thereof [Lindblom, J. Acoust. Soc. Am. 35, 1773–1781 (1963)]. [Supported by NIMH Grant #R03‐MH 28705‐01.]
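
The list design in this abstract (4 sentences × 5 emphatic‐stress conditions × 20 repetitions = 400 items, randomized) can be sketched as follows. This is an illustration only: the integer labels standing in for sentences and stress conditions, the fixed `seed`, and the function name `build_list` are assumptions, since the abstract does not give the actual materials or the randomization procedure.

```python
import random
from collections import Counter
from itertools import product

def build_list(n_sentences=4, n_conditions=5, repetitions=20, seed=0):
    """Return a shuffled list of (sentence, stress_condition) pairs.

    Sentences and conditions are placeholder integers; every
    combination appears exactly `repetitions` times.
    """
    combos = list(product(range(n_sentences), range(n_conditions)))
    items = combos * repetitions
    random.Random(seed).shuffle(items)  # seeded so the order is reproducible
    return items

if __name__ == "__main__":
    items = build_list()
    counts = Counter(items)
    print(len(items), len(counts), set(counts.values()))
```

Under these assumptions one list would be generated per speaking rate, so each subject would read 400 items at the conversational rate and 400 at the ‘very fast’ rate.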

An interaction between auditory and oral sensory feedback in speech regulation (A)

M. A. Crary, D. Fucci, and J. A. Warren

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S91 (1977); (1 page)

Online Publication Date: 11 Aug 2005

Investigation into the relative contribution of auditory and oral sensory feedback in speech regulation has received much attention in the last twenty years. Many studies have disrupted one or both of the primary feedback modalities in speech and reported the resulting effects on production. Recently, data have been reported which suggest an interaction between auditory and oral sensory feedback in speech regulation. To investigate further the interaction between the auditory and oral sensory feedback modalities during speech production, lingual vibrotactile thresholds were obtained from Ss in the following conditions: (1) before and after speech production with normal auditory feedback, (2) before and after speech production under exposure to auditory masking, and (3) before and after exposure to auditory masking without performing speech tasks. In addition, duration measurements were obtained for selected speech sounds to investigate temporal changes in the articulatory patterns of Ss in the various conditions. Lingual sensory decreases and temporal reorganization were observed only in Ss speaking under auditory masking. These data suggest a balanced interaction between auditory and oral sensory feedback modalities which, when disturbed, results in nonphonemic changes in speech production.

Speech production measures of speech perception: replications and extensions (A)

R. J. Porter, Jr. and F. X. Castellanos

J. Acoust. Soc. Am. Volume 61, Issue S1, pp. S91-S92 (1977); (2 pages)

Online Publication Date: 11 Aug 2005

In the early 1960's, Chistovitch and her colleagues used rapid shadowing (vocal reaction time) and manual reaction time to explore the perceptual processing of speech. Two intriguing aspects of their data were the relatively fast vocal reaction times and the subjects' tendency to begin articulatory responses before all of the acoustic cues for consonants were presented. These results, among others, support the classic notion of an intimate link between perceptual analysis of speech signals and the mechanisms of control in speech production. The study to be reported replicates and extends some of Chistovitch's paradigms using modifications designed to eliminate possible sources of confounding. Results tend to support the conclusions of the Russian researchers and point to additional ways in which the modified procedure might be used in exploring phonetic decoding. [Supported by NIH and the Sloan Foundation.]