Journal of the Acoustical Society of America


Oct 2007

Volume 122, Issue 4, pp. 1845-EL141


Static features in real-time recognition of isolated vowels at high pitch

Aníbal J. S. Ferreira

J. Acoust. Soc. Am. Volume 122, Issue 4, pp. 2389-2404 (2007); (16 pages) | Cited 1 time


Abstract
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled, since only frequencies at integer multiples of F0 are significant. This degrades the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features is estimated, and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCC) are used as a reference, and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features, but slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.
PACS
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
43.71.Es Vowel and consonant perception; perception of words, sentences, and fluent speech
43.70.Mn Relations between speech production and perception
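
The abstract above mentions classification tests based on the Mahalanobis distance and Gaussian models. As a rough illustration only, the following sketch fits one Gaussian per vowel class and assigns a feature vector to the class with the smallest squared Mahalanobis distance. It is not the paper's implementation: the two-dimensional feature vectors, the class labels, and the function names `fit_gaussian`, `mahalanobis_sq`, and `classify` are all assumptions made for the example, and the training data is synthetic.

```python
from statistics import mean


def fit_gaussian(samples):
    """Fit a 2-D Gaussian: return (mean vector, inverse covariance matrix)."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    # Unbiased sample covariance entries.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    # Closed-form inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis_sq(point, model):
    """Squared Mahalanobis distance from a point to a fitted class model."""
    (mx, my), inv = model
    dx, dy = point[0] - mx, point[1] - my
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


def classify(point, models):
    """Pick the class label whose model is closest in Mahalanobis distance."""
    return min(models, key=lambda label: mahalanobis_sq(point, models[label]))


# Synthetic feature vectors (made up for illustration, not from the paper).
training = {
    "a": [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3)],
    "i": [(3.0, 0.5), (3.2, 0.4), (2.9, 0.7), (3.1, 0.6)],
}
models = {label: fit_gaussian(samples) for label, samples in training.items()}
print(classify((1.0, 2.0), models))
```

A real system would use higher-dimensional features (PSC, linear prediction, or MFCC vectors) and a general matrix inverse; the closed-form 2x2 inverse here only keeps the sketch free of dependencies.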

Compensatory responses to loudness-shifted voice feedback during production of Mandarin speech

Hanjun Liu, Qianru Zhang, Yi Xu, and Charles R. Larson

J. Acoust. Soc. Am. Volume 122, Issue 4, pp. 2405-2412 (2007); (8 pages) | Cited 2 times


Abstract
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F0 or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or neither syllable, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by ±3 dB for durations of 200 ms. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known to be highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbations just as fast as they reacted to F0 perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F0 and amplitude of speech production.
PACS
43.72.Dv Speech-noise interaction
43.70.Mn Relations between speech production and perception
43.70.Jt Instrumentation and methodology for speech production research
43.72.Ar Speech analysis and analysis techniques; parametric representation of speech
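
The abstract above describes feedback whose loudness is shifted by ±3 dB for 200 ms. A minimal sketch of what such a perturbation amounts to numerically is given below. It assumes the shift is an amplitude (20 log10) level change applied as a constant gain over the perturbation window; the function names `db_to_gain` and `shift_loudness`, the sample rate, and the test signal are hypothetical and do not describe the study's actual apparatus.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def shift_loudness(samples, sample_rate, start_s, shift_db, duration_s=0.2):
    """Scale the samples inside [start_s, start_s + duration_s) by shift_db decibels."""
    gain = db_to_gain(shift_db)
    start = int(round(start_s * sample_rate))
    stop = min(len(samples), start + int(round(duration_s * sample_rate)))
    return [x * gain if start <= i < stop else x for i, x in enumerate(samples)]


# Hypothetical signal: 1 s of constant amplitude at an assumed 1 kHz sample rate.
signal = [1.0] * 1000
louder = shift_loudness(signal, 1000, start_s=0.3, shift_db=+3.0)
quieter = shift_loudness(signal, 1000, start_s=0.3, shift_db=-3.0)
print(round(louder[300], 4), round(quieter[300], 4))
```

Under this convention, +3 dB multiplies the amplitude by about 1.41 and -3 dB by about 0.71, so the reported response magnitude of roughly 0.86 dB corresponds to an amplitude change of about 10%.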