Journal of the Acoustical Society of America


Nov 1988

Volume 84, Issue S1, pp. S2-S224

Session KK. Speech Communication VII: Production, Part C (Poster Session)
Contributed Papers

A study on the formant analysis of Korean monophthongs and their resonance effects in vocal tract (A)

Hyun Jae Shin and Suk Wang Yoon

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S112-S112 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Twelve Korean monophthongs pronounced by five male vocal musicians with five fundamental frequencies were studied by formant analysis. Fundamental frequencies and their harmonics were considered as the parameters of analysis. This study shows that the first and the second formants are characterized by the resonance of the cavities of pharynx and mouth, respectively. The lip‐rounding effect decreases the second formant frequency. The phonemes of /a, ɑ/, /e, ɛ/, and /ə, ʌ/ were not distinguished well in this formant analysis for Korean.

Acoustic correlates of apical and laminal articulations (A)

Sarah N. Dart

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S112-S112 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Several indigenous languages of the western United States have a phonemic contrast between segments articulated with the apex of the tongue up (apical) and those articulated with the apex down behind the lower teeth and the contact on the palate made instead with the blade of the tongue (laminal). This difference in apical and laminal articulation is also seen in individual variation in languages without such a phonemic distinction. In the present study, palatograms and linguagrams with synchronous audio recordings were taken of a number of speakers of French, American English, and several different Native American languages, illustrating apical and laminal articulations in languages both with and without the pertinent phonemic distinction. The segments studied include /t,d,n,l,s,z/. This paper reports on the acoustic correlates of the articulatory differences found within each language as well as some cross‐language comparisons. [Work supported by NSF Grant BNS 8704361.]

Coarticulation effects in Japanese velar stop consonants: Observations with dynamic velography (A)

Noriko Suzuki and Ken‐ichi Michi

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S112-S112 (1988); (1 page)

Online Publication Date: 13 Aug 2005

The positioning of the tongue body is strongly influenced by coarticulation effects. The tongue body contact position for /k/ and /g/ varies especially widely, moving in the direction demanded by the surrounding phonemes. Coarticulation effects of post‐consonantal vowels on the place of tongue body contact for Japanese velar stop consonants were observed by dynamic velography. The dynamic velography system, developed by the authors, is a 36‐electrode electropalatographic technique for observing lingual contact with the soft palate. The resulting velograms can be compared with spectrograms made from simultaneous recordings of speech. Three Japanese male subjects produced VCV utterances with stress on the second syllable. The velogram patterns for [aka], [ako], [aga], [ago] were similar, with contact located at the posterior part of the soft palate. Contact for [ake], [aki], [age], [agi] was located more anteriorly. Thus, as suggested by previous findings, the tongue contact position for Japanese /k/ and /g/ is strongly influenced by the vowel that follows.

Effects of variation in speaking rate, loudness, and vocalic context on linguapalatal contact patterns in Hindi sibilants (A)

R. Prakash Dixit and James E. Flege

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S112-S112 (1988); (1 page)

Online Publication Date: 13 Aug 2005

A 96‐channel electropalatograph was used to monitor normal, fast, and loud production of sibilants in the nonsense words /bisíb/, /basáb/, /busúb/ and /bišíb/, /bašáb/, /bušúb/ spoken in a carrier phrase by a native speaker of Hindi. Results showed the following. (1) Groove width (GW, in number of uncontacted sensors×2 mm) was considerably narrower for /s/ (4.64 mm) than /š/ (9.26 mm). (2) GW for /s/ decreased from normal to fast to loud speech; just the opposite was true for /š/. (3) For both /s/ and /š/, GW was narrower in the context of /a/ than /i/ or /u/; in the latter two, it was narrower in the /u/ context. (4) Anterior‐posterior location (APL) of the groove (in number of contacted rows) occurred generally at the second and third rows for /s/ and the fourth and fifth rows for /š/. Consequently, the center of the groove for /s/ was 4 mm anterior to that for /š/ (i.e., the difference in the number of rows×2 mm). (5) APL of the groove for /s/ was more anterior in loud than normal or fast speech; in the latter two, it was more anterior in normal speech. (6) Groove APL for /s/ was more anterior in the context of /a/ than /i/ or /u/; in the latter two, it was more anterior in the context of /i/. (7) For /š/, groove APL was more posterior in fast than normal or loud speech; in the latter two, it was more posterior in loud speech. (8) Groove APL for /š/ shifted more posteriorly from /i/ to /a/ to /u/ context. [Work supported by NIH Grant NS20572.]
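
The two measures defined in this abstract (groove width as uncontacted sensors×2 mm, and groove-center separation as row difference×2 mm) can be illustrated with a minimal sketch. The function names, the 0/1 row encoding, and the choice of the longest contiguous uncontacted run as "the groove" are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
def groove_width_mm(row, spacing_mm=2.0):
    """Width of the groove in one electrode row.

    `row` is a hypothetical list of contact flags (1 = contacted,
    0 = uncontacted). The groove is assumed to be the longest
    contiguous run of uncontacted sensors.
    """
    longest = run = 0
    for contacted in row:
        run = 0 if contacted else run + 1
        longest = max(longest, run)
    return longest * spacing_mm


def groove_center_offset_mm(rows_a, rows_b, spacing_mm=2.0):
    """Distance between groove centers, given the row numbers each groove spans."""
    center_a = sum(rows_a) / len(rows_a)
    center_b = sum(rows_b) / len(rows_b)
    return (center_b - center_a) * spacing_mm
```

With grooves at rows 2 and 3 versus rows 4 and 5, the centers differ by two rows, which at 2 mm per row matches the 4 mm separation reported in result (4).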

Acoustic differences correlated with derivational history (A)

M. Peet and M. Withgott

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

In a study of American English palatals, acoustic characteristics of allophones with different derivational histories were compared. In examples such as “toss sheepskins” versus “josh Sheila,” palatalization was found to be complete, as observed in a spectral analysis, and no rearticulation was found. However, the š derived from a sequence of two š's was observed to exhibit longer duration values than the s from an sš sequence, in keeping with the underlying, intrinsic segment duration values (cf. also, Zue and Shattuck‐Hufnagel, 1980). The results suggest that a production/perception theory should model the underlying segment along with the assimilation process. The data were collected from a corpus of 60 sentences each read by eight adult male speakers. [Work supported, in part, by DARPA‐ISTO.]

Aerodynamic constraints on language history: The ease of bilabial trills (A)

Ian Maddieson

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Bilabial trills, transcribed [B], are markedly rare as speech sounds in the world's languages, even though Catford (1977) claims that they are the easiest type of trill to produce. With one exception (Liangshang Yi), all languages known to have bilabial trills developed them in a highly restricted environment consisting (historically) of a sequence of a voiced bilabial nasal, a voiced bilabial stop, and a high rounded vowel, e.g., [mbu] → [mBu]. In many cases, the trill remains allophonic (e.g., Na?ahai, Windua) but some languages have undergone a restructuring resulting in phonemic bilabial trills (e.g., Kurti, Atchin). The limitation to this environment suggests that aerodynamic conditions normally only satisfied in speech by this sequence are required for labial trilling to develop. The essentials would seem to be: (1) transglottal airflow without intraoral pressure buildup, permitted by the nasal escape in [m] and not significantly affected by the very brief closure typically found for a homorganic post‐nasal stop; (2) subsequent maintenance of bilabial closure without nasal flow, providing for an oral release; and (3) a following vowel with a target position for the lips of narrow aperture. Since this target lip position requires only a small movement from closure, articulator movement is slow [Kent and Moll (1972)]; hence, there is a period of time during which the lips remain close enough together for Bernoulli forces, in the absence of increased intraoral pressure, to reclose them and initiate trilling. A simplified quantitative model of this process will be presented, together with speculations on why bilabial trills do not otherwise occur linguistically. [Research supported by NSF Grant BNS 87‐20098.]

An acoustic analysis of English liquids uttered by the Japanese and native speakers of English (A)

Hirotake Nakashima, Yukihiro Nakayama, and Charles McHugh

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

This research deals with the production of English liquids through an acoustic analysis of speech samples uttered by Japanese college students, advanced Japanese learners of English (college teachers of English), and native speakers of English 13–18 years of age who live in Japan. The speech samples are 64 English words with liquids in initial position. These speech samples were analyzed by the autocorrelation method of linear prediction to estimate the formant frequencies every 10 ms using an analysis window of 20‐ms length. The second and third formant frequencies were extracted for each of the liquids. The results showed that: (1) There is a significant difference in the formant values of each liquid between males and females; (2) the formant values of each liquid for advanced Japanese learners of English coincide with those for native speakers of English; and (3) although the values of F2 and F3 of /r/ and /l/ uttered by Japanese students overlapped with each other on the F2‐F3 plane, these separated clearly after pronunciation training.
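
The analysis setup named in this abstract (autocorrelation method of linear prediction, 20‐ms window, 10‐ms step) can be sketched as follows. This is an illustrative outline only: the sample rate, predictor order, function names, and the absence of a tapering window are assumptions not given in the abstract, and the final step of converting predictor coefficients to formant frequencies (typically by finding the roots of the prediction polynomial) is omitted.

```python
def split_frames(signal, sample_rate, win_ms=20.0, hop_ms=10.0):
    """Cut a signal into 20-ms frames taken every 10 ms (values from the abstract)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (coeffs, err), where coeffs[k-1] multiplies x[n-k] in the
    predictor x[n] ~ sum_k coeffs[k-1] * x[n-k], and err is the
    residual prediction error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

Each frame returned by `split_frames` would be passed through `autocorrelation` and `levinson_durbin` in turn, giving one set of predictor coefficients per 10‐ms step.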

A comparison of evaluations by American and Japanese listeners of English spoken by Japanese speakers (A)

Hiroshi Suzuki and Ghen Ohyama

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

By means of the PARCOR analysis and synthesis technique, various combinations of the three prosodic features, i.e., the duration of each sound, pitch change, and the intensity change of an English sentence uttered by Japanese speakers in typically Japanese fashion were replaced with the same combinations of the prosodic features of the same English sentence read by an American. A group of Americans and a group of Japanese listened to the recording of the modified utterances and judged their English acceptability level, or “Englishness.” For the Americans, pitch seems to play a more important role in such judgments than duration and intensity, while, for the Japanese, pitch is far more important than duration, and intensity seems least important.

Evaluation of English pronunciation based on the static and dynamic spectral characteristics of words spoken by Japanese (A)

Hiroshi Hamada and Ryohei Nakatsu

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

To develop an English pronunciation training system, a method is proposed for evaluating the pronunciation ability of Japanese speakers. The quality of English pronunciation is assumed to be determined by the static characteristics of phonetic spectra, the dynamic structure of spectrum sequences, and prosodic characteristics of utterances. Since it is difficult to evaluate these factors directly, evaluation is achieved by comparing English words uttered by a Japanese speaker with those uttered by a native speaker using speech recognition techniques. The static characteristics of phonemes are evaluated by measuring the stability of mapping functions that adapt phonetic spectra of Japanese speakers to those of native speakers. The mapping functions are obtained by speaker adaptation through vector quantization. Evaluation values for the dynamic spectral structure are defined by the DTW matching distance between words spoken by Japanese and those spoken by native speakers. Although prosodic characteristics are not considered, preliminary experiments show that the evaluation results obtained by the proposed method have a good correspondence with human judgments of pronunciation quality.
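
The DTW matching distance mentioned in this abstract can be illustrated with a textbook dynamic time warping routine. This is a generic sketch, not the authors' system: the Euclidean local cost, the unconstrained step pattern, the lack of path-length normalization, and the function name are all assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element of seq_a and seq_b is a tuple of numbers (for example,
    one spectral frame). Local cost is the Euclidean distance; allowed
    steps are match, insertion, and deletion.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A learner's word and a native speaker's word would each be represented as a sequence of spectral frames; a smaller accumulated cost indicates that the two spectral sequences can be aligned more closely in time.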

Acoustic measurements of induced slips of the tongue in children and adults (A)

Bruce L. Smith

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S113-S113 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Although many studies of slips of the tongue have been conducted with adults, very few investigations have considered children's slips of the tongue. Furthermore, because spontaneous slips of the tongue typically are not captured with tape recordings, very few acoustic studies of slips of the tongue have been conducted with any subjects. The present study utilized an elicitation technique to induce slips of the tongue in a group of 5‐year‐old children and a group of adults. Subjects repeated short tongue‐twister phrases (e.g., Swiss wristwatch shop), as well as control phrases that were easier to produce (e.g., Swiss chocolate store). The types of errors that subjects made (substitutions, dysfluencies, etc.) were computed, and acoustic measurements of certain segments were made. One observation made from the acoustic data was that segment durations produced by both the children and the adults were 30%–40% longer for the tongue twisters versus the control phrases, even when productions were correct in both conditions. Additional findings and their implications concerning speech production will be presented.

Infants' vocalizations in mother‐infant interaction (A)

Yoko Shimura, Satoshi Imaizumi, Tamiko Ichijima, and Kozue Saito

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S114 (1988); (1 page)

Online Publication Date: 13 Aug 2005

The vocal behavior of young infants was investigated to clarify the process of speech acquisition. Acoustic and picture analyses were carried out for audio and video recordings of 26 young infants (aged 4 to 19 months) being addressed by their mothers, unfamiliar adults, or voices presented through a loudspeaker. The results were the following. (1) Mothers tended to use a wider range of F0 than that observed in their speech toward adults. (2) The F0 upper limits in the mothers' speech were closely related to the F0 of their infants' voices. (3) Young infants produced various voice qualities, which could be characterized by the richness of their subharmonics, the richness of noise, or a sudden change in F0. (4) Not only the qualities of the infants' voices, but also their facial expressions and looking behavior seemed to change according to how and by whom they were being addressed. [Work supported by Toyota Foundation.]

Phonologically motivated substitutions in a 20–22 month old's imitations of intervocalic alveolar stops (A)

Catherine T. Best and Deborah Wilkenfeld

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S114 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Systematic deviations between early word productions and their adult targets reflect not only children's articulatory limitations, but also their phonological development. However, research in child phonology has overlooked an important source of information by excluding from analysis any direct imitations of adult words. Consistent with observations about syntactic and cognitive development, spoken imitations should be systematically modified by the child's phonological system. Imitation studies should thus reveal more about the sophistication of the early phonological system by minimizing demands on memory and lexical access, by allowing presentation of unfamiliar or nonsense words, and by permitting comparison of the child's productions with the actual phonetic and acoustic properties of the adult targets. The present research found remarkable phonological sophistication in a toddler's production of intervocalic stops, based on phonetic and acoustic analyses of her imitations of a set of phonetically reduced adult targets at 20–22 months of age. The adult targets were disyllabic words and nonwords containing /d/ or /t/ preceding 〈‐er〉, 〈‐le〉, or 〈‐en〉, in which the stops were realized as the restricted phonetic variants [r], [dn], or [ˀ], (e.g., respectively, 〈wider〉 or 〈whiter〉; 〈widen〉; 〈whiten〉). Although alveolar stops are not normally produced in this phonetic environment, the child consistently substituted them for the more restricted variants found in the adult targets. The child's phonology also distinguished /t/ and /d/. The findings indicate a rule like that in the adult grammar which determines how these phonemes will be realized in highly specified phonetic contexts and, consequently, how phonetically disparate forms are related in an abstract category. 
The child's failure to exactly imitate the target utterances and the systematicity of her deviations further argue that the targets were, in fact, filtered through the child's phonology before she produced them. [Work supported by NIH.]

Two‐stage adult acquisition of intonational contours (A)

Eric Keller

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S114 (1988); (1 page)

Online Publication Date: 13 Aug 2005

The precise process of the acquisition of new speech motor patterns in adults has not been examined extensively. In particular, it is unclear how and in which order control over an unfamiliar motor pattern is acquired. The acquisition by two French‐language speakers of unfamiliar (Chineselike) intonational contours overlaid on the second syllable of /aCa/ nonsense stimuli was studied. Stimuli were recorded from a native speaker of Beijing Chinese. Thirteen hundred F0 measurements of responses obtained in nine training sessions, spread over 3 weeks, were evaluated for three types of learning: (1) accuracy (coefficients of variation over multiple attempts), (2) proximity to target frequency (Hz), and (3) contour fidelity (similarity between target and response contours, Hz). Results showed two acquisitional phases: A rapid, major improvement on measures (1) and (2), evident over the rest of the training period. There was a significant positive linear correlation between measures (1) and (2). These data support notions of both rapid learning and slow fixation and are thus in support of the concept of stored speech motor control patterns. This contradicts current motor theories that advocate new trajectory calculation for every movement, even in the case of well‐established motor patterns. [Work supported by NSERC, Canada.]

What do mimics do when they imitate a voice? (A)

George Papcun

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S114 (1988); (1 page)

Online Publication Date: 13 Aug 2005

Imitations by both professional and amateur mimics were studied to determine what similarities are achieved between the imitated voice and the imitation thereof. A wide variety of characteristics was approximated, including the following: mean F0, F1, and F2; frequency contours of F0, F1, and F2; degree of nasalization (but not frequency of nasal formants); speech rate and dynamics, including timing, attack, and release characteristics. Contours of F0, F1, and F2 were often matched accurately, even when their absolute frequencies differed considerably from those of the original. Specific images of words and phrases were used, as well as general phonetic characteristics. Imitators tended to concentrate on imitating unusual characteristics of a voice, rather than attempting to imitate all characteristics equally. This observation may be formalized as a model according to which the importance of a parameter is nonlinearly related to the extent to which it diverges from its mean population value. Professional mimics exaggerated the distinctive characteristics of voices and thus may be considered caricaturists rather than mimics, per se.

Relations between formant slopes and speech intelligibility in neurologically impaired talkers (A)

R. D. Kent, G. Weismer, J. F. Kent, R. E. Martin, and J. C. Rosenbek

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S114 (1988); (1 page)

Online Publication Date: 13 Aug 2005

The relation between F1 and F2 slopes measured from monosyllabic words and speech intelligibility scores for these words was studied for 25 men with amyotrophic lateral sclerosis. The formant slopes were estimated from digitized formant tracks for each test word. The intelligibility scores were computed as the percentage of words correctly identified for each talker by a listener panel of ten young women. The intelligibility scores for the group of 25 talkers ranged from 41% to 99%. The correlation coefficient computed between the F2 slopes and the intelligibility scores was 0.76. The F1 slope was not as predictive of intelligibility, perhaps because the dysarthric talkers compensated for impaired tongue control with large jaw movements (and, hence, large F1 shifts). It is concluded that F2 slope, a dynamic measure of acoustic structure, bears a moderately high correlation with intelligibility in these subjects. Examples of formant patterns are shown for different degrees of speech impairment. [Work supported, in part, by NINCDS.]
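
The correlation coefficient reported in this abstract is presumably a Pearson product-moment correlation between per-talker F2 slopes and intelligibility scores; the abstract does not name the statistic, so that reading is an assumption. A minimal sketch of the computation, with a hypothetical function name and no data from the study:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold one F2 slope per talker and `ys` the matching intelligibility percentage, giving a single coefficient for the group.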

Articulatory dynamics of deaf speakers during plosive production: Aerodynamic and kinematic evidence (A)

James J. Mahshie and Pradeep Yadev

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S114-S115 (1988); (2 pages)

Online Publication Date: 13 Aug 2005

The production of speech requires precise movement and timing of the articulators to accurately valve the egressive airstream. While it is evident that deaf speakers often experience difficulty controlling such valves for speech production, the nature of such valving difficulties remains elusive. Previous aerodynamic studies have been primarily descriptive, characterizing steady‐state aspects of speech production. The present study described and compared dynamic aspects of deaf and normal‐hearing speakers' speech production using aerodynamic and kinematic measures. Five profoundly deaf and two normal‐hearing young adults produced multiple tokens of voiced and voiceless plosive segments in varied vowel contexts, while simultaneous measures were obtained of oral and nasal airflow, oral pressure, and electroglottograph and acoustic signals. Conductances of oral and nasal vocal tract constrictions were used to describe and compare the magnitude and relative timing of articulatory gestures for deaf and hearing subjects. Computer simulation of the supraglottal air pressure and airflow waveforms was used to further investigate how these waveform variations may be related to articulatory maneuvers. Implications for speech training will be discussed. [Research supported by the Whitaker Foundation.]

Quantitative evaluation of hypernasality in cleft palate patients (A)

Ryuta Kataoka, Koji Takahashi, Yukari Yamashita, Satoko Imai, Ken‐ichi Michi, Kaoru Okabe, Hareo Hamada, and Tanetoshi Miura

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S115-S115 (1988); (1 page)

Online Publication Date: 13 Aug 2005

To quantitatively evaluate hypernasality in cleft palate patients, the Japanese vowel /i/ pronounced by six cleft palate patients and four normal children (controls) of similar ages was analyzed acoustically by cepstrum analysis. Spectrum envelopes obtained by the cepstrum method were evaluated every 1/3 octave to obtain the mean level in each band. Ten listeners evaluated a speech sample from each subject for degree of nasality on an equal interval scale ranging from 0 (no nasality) to 4 (strongest nasality). Two factors were obtained from the factor analysis of the judged scores. The first factor, which accounted for 77% of the total variance, was the consensus perception of nasality. The second factor, which accounted for 9%, was the difference among the individual listeners. The levels in two 1/3 octave bands were highly correlated with the first factor. The central frequencies of these two bands were 1 and 5 kHz.

Acoustic‐phonetic analysis of normal, loud, and Lombard speech in simulated cockpit conditions (A)

Bill J. Stanton, George D. Allen, and Leah H. Jamieson

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S115-S115 (1988); (1 page)

Online Publication Date: 13 Aug 2005

It has long been recognized that raising one's voice causes perceptible changes in the speech signal beyond mere increase in overall intensity. This paper reports the results of an extensive analysis of speech produced in a simulated fighter cockpit environment, with helmet and oxygen mask in place. Eight talkers each produced 56 utterances under three conditions: (1) normal, (2) loud (nominally 10 dB above normal), and (3) Lombard (evoked by 90 dB of pink noise played through a headset). A total of 17 671 phonemes were hand marked for analysis using 18 acoustic features (ten frequency bands, spectral COG, low‐ and high‐frequency spectral tilt, F0, F1–3, and duration). For most speakers, both loud and Lombard conditions showed the following shifts in comparison with the normal condition: (1) For vowels and sonorant consonants, lower (0–500 Hz) and higher (5–8 kHz) frequency bands lost energy relative to the mid (1–4 kHz) frequencies; (2) also for vowels and sonorants, F0, F1, and spectral COG all rose; and (3) for fricatives, affricates, and voiceless stops, lower (0–3 kHz) frequencies lost energy relative to higher (4–8 kHz) frequencies. There was much variation in these effects among talkers. [Work supported by Air Force Institute of Technology.]

Fricative consonants: Comparisons between human and mechanical‐model production (A)

Christine H. Shadle

J. Acoust. Soc. Am. Volume 84, Issue S1, pp. S115-S115 (1988); (1 page)

Online Publication Date: 13 Aug 2005

In an earlier paper [C. Shadle, J. Acoust. Soc. Am. Suppl. 1 82, S15 (1987)], the derivation of source parameters for fricative consonants from mechanical models of the vocal tract was discussed. Source location and source spectrum at a range of airflows were measured for the fricatives /ʃ, ç, x/, and the separation of these fricatives into two distinct acoustic types was proposed. This paper presents further analysis of these data. A collapsing of the source spectra into a single curve for each fricative, parametrized by flow rate, is proposed. These collapsed source characteristics were used in a frequency‐domain model of the vocal tract [P. Badin and G. Fant, STL‐QPSR 2–3, 53–108 (1984)] to predict the farfield sound generated. These predicted spectra were compared to the sound produced by the mechanical models and by Fant's subject on which the models were based. Some discrepancies between experiment and theory are apparent; these are due to the lack of higher modes in the computer model, some slight anatomical inaccuracies in the mechanical model in the region of the constriction, and probably also to spatial distribution of the source for /ç,x/. Aside from these discrepancies, the comparison demonstrates the validity and usefulness of the mechanical‐model data.