Journal of the Acoustical Society of America

Nov 2006

Volume 120, Issue 5, pp. A38-EL61

Aerodynamically and acoustically driven modes of vibration in a physical model of the vocal folds

Zhaoyan Zhang, Juergen Neubauer, and David A. Berry

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2841-2849 (2006); (9 pages) | Cited 9 times

In a single-layered, isotropic, physical model of the vocal folds, distinct phonation types were identified based on the medial surface dynamics of the vocal fold. For acoustically driven phonation, a single, in-phase, x-10 like eigenmode captured the essential dynamics, and coupled with one of the acoustic resonances of the subglottal tract. Thus, the fundamental frequency appeared to be determined primarily by a subglottal acoustic resonance. In contrast, aerodynamically driven phonation did not naturally appear in the single-layered model, but was facilitated by the introduction of a vertical constraint. For this phonation type, fundamental frequency was relatively independent of the acoustic resonances, and two eigenmodes were required to capture the essential dynamics of the vocal fold, including an out-of-phase x-11 like eigenmode and an in-phase x-10 like eigenmode, as described in earlier theoretical work. The two eigenmodes entrained to the same frequency, and were decoupled from subglottal acoustic resonances. With this independence from the acoustic resonances, vocal fold dynamics appeared to be determined primarily by near-field, fluid-structure interactions.
PACS:
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics
43.70.Bk Models and theories of speech production

Perception of synthetic vowel exemplars of 4 year old children and estimation of their corresponding vocal tract shapes

Richard S. McGowan

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2850-2858 (2006); (9 pages) | Cited 1 time

Formant scalings for vowel exemplars of American 4 year olds who were imitating adult production were used along with published data of American adult male vowel production to synthesize /ɑ, æ, u, i/. Other vowel exemplars were also synthesized. Adult listeners were asked to categorize these synthetic vowels in a forced choice task. With some exceptions, the formant frequencies preferred for the vowels /ɑ, æ, u, i/ were close to the published data. In order to gain insight on children’s articulation during imitation of vowels /ɑ, æ, u, i/, a five-tube model was used in an algorithm to infer vocal tract shape from the first three formant frequencies of the adult productions, the formant frequencies derived for 4 year olds by scaling, and formant frequencies for 4 year olds derived based on the listening experiments. It was found that the rear tube length for the children, in proportionate terms, was nearly always greater than that of the adult. The rear tube length was proportionately twice as long in children compared to adults for the vowel /u/. Tongue root flexibility and the oblique angle between the pharynx and mouth may be more important than pharynx length in determining formant scalings for 4 year old children.
PACS:
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics
43.71.Gv Measures of speech perception (intelligibility and quality)
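
The dependence of formant frequencies on tube length, which underlies formant scaling, can be illustrated with a far simpler model than the five-tube model named in the abstract. The sketch below is not McGowan's algorithm. It assumes a single uniform tube, closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1)c / (4L). The speed of sound and both tract lengths are illustrative assumptions, not values taken from the paper.

```python
def uniform_tube_formants(length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Both lengths are hypothetical round numbers chosen for illustration only.
ADULT_LENGTH_CM = 17.5
CHILD_LENGTH_CM = 12.0

adult_formants = uniform_tube_formants(ADULT_LENGTH_CM)
child_formants = uniform_tube_formants(CHILD_LENGTH_CM)

# Ratio of child to adult formant frequency, one value per formant.
scale_factors = [c / a for c, a in zip(child_formants, adult_formants)]
```

In this uniform model every formant scales by the same factor, the ratio of adult to child tube length. A multi-section model is needed to represent differences in the proportions of the sections, such as the rear tube length discussed in the abstract.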

Modeling coupled aerodynamics and vocal fold dynamics using immersed boundary methods

Comer Duncan, Guangnian Zhai, and Ronald Scherer

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2859-2871 (2006); (13 pages) | Cited 7 times

The penalty immersed boundary (PIB) method, originally introduced by Peskin (1972) to model the function of the mammalian heart, is tested as a fluid-structure interaction model of the closely coupled dynamics of the vocal folds and aerodynamics in phonation. Two-dimensional vocal folds are simulated with material properties chosen to result in self-oscillation and volume flows in physiological frequency ranges. Properties of the glottal flow field, including vorticity, are studied in conjunction with the dynamic vocal fold motion. The results of using the PIB method to model self-oscillating vocal folds for the case of 8 cm H2O as the transglottal pressure gradient are described. The volume flow at 8 cm H2O, the transglottal pressure, and vortex dynamics associated with the self-oscillating model are shown. Volume flow is also given for 2, 4, and 12 cm H2O, illustrating the robustness of the model to a range of transglottal pressures. The results indicate that the PIB method applied to modeling phonation has good potential for the study of the interdependence of aerodynamics and vocal fold motion.
PACS:
43.70.Bk Models and theories of speech production

Interarticulator programming: Effects of closure duration on lip and tongue coordination in Japanese

Anders Löfqvist

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2872-2883 (2006); (12 pages) | Cited 3 times

This paper examines the coordination of lip and tongue movements in sequences of vowel-bilabial consonant-vowel where the duration of the oral closure for the consonant is varied for linguistic purposes. Native speakers of Japanese served as subjects. The linguistic material consisted of Japanese word pairs that only differed in the duration of the labial consonant, which was either long or short. Recordings were made of lip and tongue movements using a magnetometer system. Results show a robust difference in closure duration between the long and short consonants. The tongue movement from the first to the second vowel had a longer duration in the long than in the short consonants, and its average speed was slower in the long consonant. The size of the tongue movement path between the vowels did not consistently differ between the long and short consonants. The tongue movement almost always started before the oral closure for the consonant, while the onset of the lip movement towards oral closure mostly started before that of the tongue movement. The offset of the tongue movement occurred after the release of the closure, but there was no clear pattern for the long and short consonants.
PACS:
43.70.Bk Models and theories of speech production
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics

Is fundamental frequency a cue to aspiration in initial stops?

Alexander L. Francis, Valter Ciocca, Virginia Ka Man Wong, and Jess Ka Lam Chan

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2884-2895 (2006); (12 pages) | Cited 3 times

One production and one perception experiment were conducted to investigate the interaction of consonant voicing and fundamental frequency at the onset of voicing (onset f0) in Cantonese, a tonal language. Consonantal voicing in English can affect onset f0 up to 100 ms after voicing onset, but existing research provides inconclusive information regarding the effects of voicing on f0 in tonal languages where f0 variability is constrained by the demands of the lexical tone system. Previous research on consonantal effects on onset f0 provides two contrasting theories: These effects may be automatic, resulting from physiological constraints inherent to the speech production mechanism or they may be controlled, produced as part of a process of cue enhancement for the perception of laryngeal contrasts. Results of experiment 1 showed that consonant aspiration affects onset f0 in Cantonese only within the first 10 ms following voicing onset, comparable to results for other tonal languages. Experiment 2 showed that Cantonese listeners can use differences in onset f0 to cue perception of the voicing contrast, but the minimum extent of f0 perturbation necessary for this is greater than is found in Cantonese production, and comparable to that observed in acoustic studies of nontonal languages. These results suggest that consonantal effects on onset f0 are at least partially controlled by talkers, but that their role in the perception of voicing/aspiration may be a consequence of language independent properties of audition rather than listeners’ experience with the phonological contrasts of a specific language.
PACS:
43.70.Bk Models and theories of speech production
43.70.Fq Acoustical correlates of phonetic segments and suprasegmental properties: stress, timing, and intonation
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.71.An Models and theories of speech perception
43.70.Kv Cross-linguistic speech production and acoustics

On first rahmonic amplitude in the analysis of synthesized aperiodic voice signals

Peter J. Murphy

J. Acoust. Soc. Am. Volume 120, Issue 5, pp. 2896-2907 (2006); (12 pages) | Cited 2 times

Rahmonics comprise the prominent peaks in the cepstrum of voiced speech; their locations correspond to the fundamental period and its multiples. The amplitude of the first rahmonic, R1, has previously been used to indicate voice quality. Although a correspondence between R1 and the richness of the harmonic spectrum for voiced speech is well recognized, a formal description has remained absent. A theoretical description of rahmonic analysis of voiced speech containing aspiration noise is provided, leading to a characterization of R1. The theory suggests that R1 is directly proportional to the geometric mean harmonics-to-noise ratio (gmHNR), where the gmHNR is defined as the mean of the individual spectral (i.e. at specific frequency locations) harmonics-to-noise ratios in dB. This hypothesis is validated using synthetically generated voice signals. R1 is shown to be directly proportional to gmHNR (measured directly from the dB spectrum). It is shown that R1 (estimated from speech) is directly proportional to R1 taken from the glottal signal. R1 and gmHNR (measured spectrally) underestimate the actual gmHNR when (averaged) noise levels exceed harmonic levels. Limiting the number of harmonics in the analysis window overcomes this problem and also alleviates the (temporal) window length/f0 dependence of R1 when estimated period synchronously.
PACS:
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
43.70.Dn Disordered speech
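
The relationship described in the abstract, in which the first rahmonic amplitude R1 grows with the harmonics-to-noise ratio, can be illustrated with a small numerical sketch. This is not the author's analysis procedure. It assumes a synthetic signal of 20 cosine harmonics with 1/m amplitudes plus Gaussian noise, an analysis frame of exactly eight periods with no windowing, and a known period, so that R1 is simply read from the real cepstrum at that quefrency. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def log_magnitude_spectrum(signal, floor=1e-12):
    """Natural-log magnitude of the DFT, computed naively in O(N^2)."""
    n = len(signal)
    # Precompute the N roots of unity so each DFT term is a table lookup.
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    spectrum = []
    for k in range(n):
        x_k = sum(signal[t] * twiddle[(k * t) % n] for t in range(n))
        # Clamp at a small floor so log(0) can never occur.
        spectrum.append(math.log(max(abs(x_k), floor)))
    return spectrum


def cepstrum_at(log_spectrum, quefrency):
    """Real cepstrum (inverse DFT of the log spectrum) at one quefrency.

    The log spectrum of a real signal is symmetric, so the inverse DFT
    reduces to a cosine sum.
    """
    n = len(log_spectrum)
    total = sum(
        value * math.cos(2.0 * math.pi * k * quefrency / n)
        for k, value in enumerate(log_spectrum)
    )
    return total / n


def synth_voice(noise_std, n=512, period=64, harmonics=20, seed=0):
    """Harmonic series with 1/m amplitudes plus seeded Gaussian noise."""
    rng = random.Random(seed)
    return [
        sum(math.cos(2.0 * math.pi * m * t / period) / m
            for m in range(1, harmonics + 1))
        + rng.gauss(0.0, noise_std)
        for t in range(n)
    ]


def first_rahmonic(noise_std, period=64):
    """R1: cepstral amplitude at the (assumed known) fundamental period."""
    signal = synth_voice(noise_std, period=period)
    return cepstrum_at(log_magnitude_spectrum(signal), period)


r1_low_noise = first_rahmonic(noise_std=0.001)
r1_high_noise = first_rahmonic(noise_std=0.1)
```

Raising the noise level by a factor of 100 lowers the harmonics-to-noise ratio at every harmonic, and `r1_high_noise` comes out smaller than `r1_low_noise`, in line with the qualitative trend the abstract reports. A practical implementation would use an FFT and a tapered window instead of the naive DFT shown here.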