Journal of the Acoustical Society of America

Jan 2003

Volume 113, Issue 1, pp. 1-660

Analysis of the three-dimensional tongue shape using a three-index factor analysis model

Yanli Zheng, Mark Hasegawa-Johnson, and Shamala Pizza

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 478-486 (2003); (9 pages) | Cited 3 times

Online Publication Date: 08 Jan 2003

Three-dimensional tongue shape during vowel production is analyzed using the three-mode PARAFAC (parallel factors) model. Three-dimensional MRI images of five speakers (9 vowels) are analyzed. Sixty-five virtual fleshpoints (13 segments along the rostral–caudal dimension and 5 segments along the right–left direction) are chosen based on the interpolated tongue shape images. Methods used to adjust the alignment of MRI images, to set up the fleshpoints, and to measure the position of the fleshpoints are presented. PARAFAC analysis of this 3D coordinate data results in a stable two-factor solution that explains about 70% of the variance. © 2003 Acoustical Society of America.
PACS:
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics
43.70.Bk Models and theories of speech production

Flow visualization and pressure distributions in a model of the glottis with a symmetric and oblique divergent angle of 10 degrees

Daoud Shinwari, Ronald C. Scherer, Kenneth J. DeWitt, and Abdollah A. Afjeh

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 487-497 (2003); (11 pages) | Cited 17 times

Online Publication Date: 08 Jan 2003

Modeling the human larynx can provide insights into the nature of the flow and pressures within the glottis. In this study, the intraglottal pressures and glottal jet flow were studied for a divergent glottis that was symmetric for one case and oblique for another. A Plexiglas model of the larynx (7.5 times life size) with interchangeable vocal folds was used. Each vocal fold had at least 11 pressure taps. The minimal glottal diameter was held constant at 0.04 cm. The glottis had an included divergent angle of 10 degrees. In one case the glottis was symmetric. In the other case, the glottis had an obliquity of 15 degrees. For each geometry, transglottal pressure drops of 3, 5, 10, and 15 cm H2O were used. Pressure distribution results, suggesting significantly different cross-channel pressures at glottal entry for the oblique case, replicate the data in another study by Scherer et al. [J. Acoust. Soc. Am. 109, 1616–1630 (2001b)]. Flow visualization using a LASER sheet and seeded airflow indicated separated flow inside the glottis. Separation points did not appear to change with flow for the symmetric glottis, but for the oblique glottis moved upstream on the divergent glottal wall as flow rate increased. The outgoing glottal jet was skewed off-axis for both the symmetric and oblique cases. The laser sheet showed asymmetric circulating regions in the downstream region. The length of the laminar core of the glottal jet was less than approximately 0.6 cm, and decreased in length as flow increased. The results suggest that the glottal obliquity studied here creates significantly different driving forces on the two sides of the glottis (especially at the entrance to the glottis), and that the skewed glottal jet characteristics need to be taken into consideration for modeling and aeroacoustic purposes. © 2003 Acoustical Society of America.
PACS:
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics
43.70.Bk Models and theories of speech production

The synergy between speech production and perception

Powen Ru, Taishih Chi, and Shihab Shamma

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 498-515 (2003); (18 pages) | Cited 1 time

Online Publication Date: 08 Jan 2003

Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few “articulatory” parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a “multiscale” model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed. © 2003 Acoustical Society of America.
PACS:
43.70.Bk Models and theories of speech production

Effects of prosodic boundary on /aC/ sequences: Acoustic results

Marija Tabain

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 516-531 (2003); (16 pages) | Cited 3 times

Online Publication Date: 08 Jan 2003

This study presents various acoustic measures used to examine the sequence /a # C/, where “#” represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s ʃ/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative, are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker’s utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant “locus” for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary. © 2003 Acoustical Society of America.
PACS:
43.70.Fq Acoustical correlates of phonetic segments and suprasegmental properties: stress, timing, and intonation

Learning to produce speech with an altered vocal tract: The role of auditory feedback

Jeffery A. Jones and K. G. Munhall

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 532-543 (2003); (12 pages) | Cited 6 times

Online Publication Date: 08 Jan 2003

Modifying the vocal tract alters a speaker’s previously learned acoustic–articulatory relationship. This study investigated the contribution of auditory feedback to the process of adapting to vocal-tract modifications. Subjects said the word /tɑs/ while wearing a dental prosthesis that extended the length of their maxillary incisor teeth. The prosthesis affected /s/ productions and the subjects were asked to learn to produce “normal” /s/’s. They alternately received normal auditory feedback and noise that masked their natural feedback during productions. Acoustic analysis of the speakers’ /s/ productions showed that the distribution of energy across the spectra moved toward that of normal, unperturbed production with increased experience with the prosthesis. However, the acoustic analysis did not show any significant differences in learning dependent on auditory feedback. By contrast, when naive listeners were asked to rate the quality of the speakers’ utterances, productions made when auditory feedback was available were evaluated to be closer to the subjects’ normal productions than when feedback was masked. The perceptual analysis showed that speakers were able to use auditory information to partially compensate for the vocal-tract modification. Furthermore, utterances produced during the masked conditions also improved over a session, demonstrating that the compensatory articulations were learned and available after auditory feedback was removed. © 2003 Acoustical Society of America.
PACS:
43.70.Fq Acoustical correlates of phonetic segments and suprasegmental properties: stress, timing, and intonation
43.70.Aj Anatomy and physiology of the vocal tract, speech aerodynamics, auditory kinetics
43.70.Dn Disordered speech

Individual talker differences in voice-onset-time

J. Sean Allen, Joanne L. Miller, and David DeSteno

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 544-552 (2003); (9 pages) | Cited 7 times

Online Publication Date: 08 Jan 2003

Individual talkers differ in the acoustic properties of their speech, and at least some of these differences are in acoustic properties relevant for phonetic perception. Recent findings from studies of speech perception have shown that listeners can exploit such differences to facilitate both the recognition of talkers’ voices and the recognition of words spoken by familiar talkers. These findings motivate the current study, whose aim is to examine individual talker variation in a particular phonetically-relevant acoustic property, voice-onset-time (VOT). VOT is a temporal property that robustly specifies voicing in stop consonants. From the broad literature involving VOT, it appears that individual talkers differ from one another in their VOT productions. The current study confirmed this finding for eight talkers producing monosyllabic words beginning with voiceless stop consonants. Moreover, when differences in VOT due to variability in speaking rate across the talkers were factored out using hierarchical linear modeling, individual talkers still differed from one another in VOT, though these differences were attenuated. These findings provide evidence that VOT varies systematically from talker to talker and may therefore be one phonetically-relevant acoustic property underlying listeners’ capacity to benefit from talker-specific experience. © 2003 Acoustical Society of America.
PACS:
43.70.Fq Acoustical correlates of phonetic segments and suprasegmental properties: stress, timing, and intonation
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech

Spectral models of additive and modulation noise in speech and phonatory excitation signals

Jean Schoentgen

J. Acoust. Soc. Am. Volume 113, Issue 1, pp. 553-562 (2003); (10 pages) | Cited 1 time

Online Publication Date: 08 Jan 2003

The article presents spectral models of additive and modulation noise in speech. The purpose is to learn about the causes of noise in the spectra of normal and disordered voices and to gauge whether the spectral properties of the perturbations of the phonatory excitation signal can be inferred from the spectral properties of the speech signal. The approach to modeling consists of deducing the Fourier series of the perturbed speech, assuming that the Fourier series of the noise and of the clean monocycle-periodic excitation are known. The models explain published data, take into account the effects of supraglottal tremor, demonstrate the modulation distortion owing to vocal tract filtering, establish conditions under which noise cues of different speech signals may be compared, and predict the impossibility of inferring the spectral properties of the frequency modulating noise from the spectral properties of the frequency modulation noise (e.g., phonatory jitter and frequency tremor). The general conclusion is that only phonatory frequency modulation noise is spectrally relevant. Other types of noise in speech are either epiphenomenal, or their spectral effects are masked by the spectral effects of frequency modulation noise. © 2003 Acoustical Society of America.
PACS:
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
43.70.Bk Models and theories of speech production