Journal of the Acoustical Society of America


Nov 1989

Volume 86, Issue S1, pp. S1-S125

Session O. Speech Communication III: Fundamental Frequency and Intonation
Contributed Papers

Individual differences in voice quality perception (A)

Jody Kreiman, Bruce R. Gerratt, and Kristin Precoda

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S35-S35 (1989); (1 page)

Online Publication Date: 13 Aug 2005

Sixteen listeners judged the similarity of all possible pairs of 18 pathological voices and, in a separate session, 18 normal voices. To examine individual differences, multidimensional scaling was used to derive a separate perceptual space for each listener/voice set combination. These scaling solutions accounted for an average of 83% of the variance in similarity ratings for pathological voices and 77% for normal voices. Listeners varied substantially in the acoustic characteristics they attended to when judging vocal similarity: although all perceptual spaces included an F0 dimension, no other parameter was used by more than half the listeners for either voice set. Listeners who shared common perceptual dimensions often differed in how they used the same acoustic information. For example, some listeners used F0 as a continuous dimension, while others used it to sort voices into groups (high‐ and low‐pitched groups, pathological and normal groups, etc.); combinations of these strategies also occurred. Implications of these results for models of voice quality perception will be discussed.
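The abstract reports the share of variance in similarity ratings that each scaling solution accounts for. One common definition, assumed here rather than taken from the paper, is the squared Pearson correlation between the observed dissimilarities (similarity ratings reversed in direction) and the inter‐voice distances in the derived space. The function name and data layout below are hypothetical.

```python
import math


def euclidean(p, q):
    # Distance between two voices in the derived perceptual space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def variance_accounted_for(dissim, coords):
    """Squared Pearson correlation between rated dissimilarities and
    distances in a scaling solution (an assumed VAF definition).

    dissim: dict mapping (i, j), i < j, to a rated dissimilarity.
    coords: list of coordinate tuples, one per voice.
    """
    pairs = sorted(dissim)
    x = [dissim[p] for p in pairs]
    y = [euclidean(coords[i], coords[j]) for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Usage: a one-dimensional (F0-like) solution for three hypothetical voices.
vaf = variance_accounted_for({(0, 1): 1.0, (0, 2): 2.0, (1, 2): 2.0},
                             [(0.0,), (1.0,), (3.0,)])
```

With these toy ratings the one‐dimensional solution fits imperfectly, so `vaf` falls below 1; in the study, each listener's solution would be scored this way against that listener's own ratings.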

The perception of the low‐high (LH) tonal sequence (A)

Kazue Hata and Yoko Hasegawa

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S35-S35 (1989); (1 page)

Online Publication Date: 13 Aug 2005

It has been reported that the primary cue for perceiving the HL tonal sequence in Japanese is not the actual location of the F0 peak but rather a falling F0 contour. The F0 fall may be considerably delayed, so that the F0 peak lies within the L‐toned syllable. Furthermore, earlier work found that (1) the later the F0 fall in the L‐toned syllable, the steeper the required fall rate, and (2) the fall must begin within the first two‐thirds of the vowel in the L‐toned syllable. The present experiment investigates whether a similar lack of synchronization between F0 change and syllable boundary can be found in the perception of the LH sequence. Synthesized nonsense words /mamama/ were prepared so that both the onset of the F0 rise and the F0 peak occur at various locations, while the overall F0 contour (level‐rise‐peak‐slight fall) is maintained. The stimuli were presented to native speakers of Japanese to determine the boundary between the categorical perception of LHH and LLH. The results show that the LH sequence is more constrained than the HL sequence in terms of the temporal alignment of F0 change with the syllable boundary.
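The LHH/LLH boundary is the point on the stimulus continuum where listeners' identification switches category. A common way to locate it, assumed here and not necessarily the authors' procedure, is linear interpolation to the 50% point of the identification function. The stimulus parameterization (F0‐rise onset in ms relative to the syllable boundary) and the response proportions in the usage line are hypothetical.

```python
def category_boundary(positions, prop_lhh, criterion=0.5):
    """Stimulus position where the proportion of LHH responses crosses
    `criterion`, by linear interpolation between adjacent stimulus steps.

    positions: stimulus values in ascending order (hypothetical units).
    prop_lhh: proportion of LHH responses at each position.
    Returns None if the identification function never crosses criterion.
    """
    pts = list(zip(positions, prop_lhh))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # A sign change (or touch) of y - criterion brackets the boundary.
        if (y0 - criterion) * (y1 - criterion) <= 0 and y0 != y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


# Usage with made-up identification data: later rise onsets yield fewer LHH responses.
boundary_ms = category_boundary([-40, -20, 0, 20, 40], [0.95, 0.9, 0.7, 0.3, 0.05])
```

A fitted logistic or probit function is a common alternative when responses are noisy; interpolation keeps the sketch dependency‐free.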

The frequency scale of intonation (A)

Dik J. Hermes and Joost C. van Gestel

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S35-S35 (1989); (1 page)

Online Publication Date: 13 Aug 2005

Results will be presented showing that accent‐lending pitch movements are perceived on a critical‐band scale. A sentence was resynthesized in two versions differing in pitch and in formant frequencies: the lower‐pitched version sounded like a male voice, the higher one like a female voice. One syllable was made prominent by means of a pitch movement. In different conditions, the pitch contours of the two versions ran parallel on one of three frequency scales: a logarithmic scale (semitones), a critical‐band scale, or a linear scale (Hz). In 2AFC experiments, subjects indicated in which version the accented syllable was more prominent. Only when the excursions of the pitch movements were equal on the critical‐band scale were the choices random. When the excursions were equal in semitones, subjects perceived the accent in the higher version as more prominent; when they were equal in Hz, the accent in the lower version was perceived as more prominent. These results allow a perceptually more realistic measurement of the prominence of accented syllables. [Work supported by Instituut voor Doven, St‐Michielsgestel, The Netherlands.]
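The three scales can be compared numerically. The abstract does not say which critical‐band formula was used, so this sketch assumes the ERB‐rate scale of Glasberg and Moore (1990), E = 21.4 log10(1 + 0.00437 f); the base F0 values for the "male" and "female" versions are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    # ERB-rate scale (Glasberg & Moore, 1990), assumed as the critical-band scale.
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def semitones(f_from, f_to):
    # Interval on the logarithmic (semitone) scale.
    return 12.0 * math.log2(f_to / f_from)


def peak_for_equal_erb(base_hz, excursion_erb):
    """F0 peak reached by a pitch movement of `excursion_erb` ERBs above base_hz."""
    return erb_rate_to_hz(hz_to_erb_rate(base_hz) + excursion_erb)


# Usage: the same 1-ERB excursion from two hypothetical base F0s.
for base in (100.0, 200.0):
    peak = peak_for_equal_erb(base, 1.0)
    print(f"base {base:.0f} Hz: +{peak - base:.1f} Hz, {semitones(base, peak):.2f} st")
```

Under this assumed scale, an equal ERB excursion spans more Hz but fewer semitones in the higher version, which is the pattern the reported 2AFC results imply: equal semitones favor the higher version, equal Hz favor the lower one.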

Fundamental frequency and perceived prominence of accented syllables (A)

J. Terken and R. Collier

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S35-S36 (1989); (2 pages)

Online Publication Date: 13 Aug 2005

In natural speech, accented words may differ in their degree of perceived prominence. At the acoustic level, two aspects of fundamental frequency (F0) variation may be responsible for these differences: the magnitude of F0 changes and the relative frequencies of F0 maxima. Two experiments, run with the same group of subjects, addressed the question of which aspect of F0 better predicts perceived prominence. Both experiments used reiterant speech with synthesized F0 contours. The speech materials consisted of ‘mamamamamamama’ utterances with F0 maxima on the second and penultimate syllables (“P1” and “P2,” respectively). In the first experiment, subjects adjusted the frequency of P2 so that it was judged to have the same pitch as P1, for different rates of baseline declination. In the second experiment, subjects adjusted P2 so that it was judged to have the same prominence as P1, again for different declination rates. The results to be presented are relevant for refining the theory of pitch accentuation. For instance, if perceived prominence is predictable from F0 maxima alone, both experiments should give the same results.
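The two candidate predictors make different predictions for the prominence‐matching task, and the gap between them grows with the declination rate. Assuming, for illustration only, a baseline that declines linearly in Hz, the "F0 maxima" account predicts that P2 matches P1's peak, while the "magnitude of F0 change" account predicts that P2 rises as far above the (lower) baseline as P1 does. All names and numbers are hypothetical.

```python
def baseline(t_s, b0_hz, rate_hz_per_s):
    # Hypothetical linear declination line (Hz), falling at a constant rate.
    return b0_hz - rate_hz_per_s * t_s


def predicted_p2(p1_hz, t1_s, t2_s, b0_hz, rate_hz_per_s):
    """Two candidate P2 settings judged equally prominent to P1.

    equal_max: P2 peak equals the P1 peak (prominence follows F0 maxima).
    equal_excursion: P2 lies as far above the declining baseline at t2 as
    P1 lies above it at t1 (prominence follows the size of the F0 change).
    """
    excursion = p1_hz - baseline(t1_s, b0_hz, rate_hz_per_s)
    return {
        "equal_max": p1_hz,
        "equal_excursion": baseline(t2_s, b0_hz, rate_hz_per_s) + excursion,
    }


# Usage: P1 at 150 Hz (0.2 s), P2 at 1.2 s, baseline from 110 Hz falling 10 Hz/s.
pred = predicted_p2(150.0, 0.2, 1.2, 110.0, 10.0)
```

With zero declination the two predictions coincide, so only the conditions with a sloping baseline can separate the hypotheses.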

Continuative intonation in Mandarin (A)

J. S. Mirza

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S36-S36 (1989); (1 page)

Online Publication Date: 13 Aug 2005

The linguistic phenomenon of continuation, which manifests itself as a modification of the pitch contour before a conjunction, was investigated for Mandarin, a tone language with four lexical tones. Continuative intonation was examined in the portions of the intonation contours preceding conjunctions in nine sentences, each spoken by five speakers. The slopes of the intonation contours across the words immediately before the conjunction were measured. It is reported here that the effect of continuation on Mandarin tones depends strongly on the type of conjunction used. Continuation mostly increases the slope values of tones 2, 3, and 4, while the slope value of tone 1 is lowered. The decrease in the slope of tone 1, however, should be interpreted with caution: its slope in isolation is close to zero, so percent changes relative to it can be unreliable. On average, continuation affects tone 2 most (+26%), then tone 3 (+21%), and then tone 4 (4%). On average, the conjunction “and” has the greatest effect on the tones (+25%), followed by “but” (23%) and “and then” (18%).
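The caveat about tone 1 follows from how a percent change is computed: dividing by a slope near zero inflates the result. A minimal sketch, assuming slopes are least‐squares fits of F0 against time and that percent change is taken relative to the magnitude of the isolation slope (both assumptions; the abstract does not give the formulas):

```python
def ls_slope(times_s, f0_hz):
    """Least-squares slope (Hz/s) of an F0 contour segment."""
    n = len(times_s)
    mt = sum(times_s) / n
    mf = sum(f0_hz) / n
    num = sum((t - mt) * (f - mf) for t, f in zip(times_s, f0_hz))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den


def percent_change(isolation_slope, continuation_slope):
    # Positive = slope value increased; dividing by the magnitude keeps that
    # sign convention for falling tones. Unstable when the isolation slope is
    # near zero, as the abstract notes for tone 1.
    return 100.0 * (continuation_slope - isolation_slope) / abs(isolation_slope)


# Usage: a 0.5 Hz/s absolute change reads as +100% against a near-flat baseline.
tiny_change = percent_change(0.5, 1.0)
```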

Analysis and synthesis of six voice qualities (A)

T. V. Ananthapadmanabha and Jo Estill

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S36-S36 (1989); (1 page)

Online Publication Date: 13 Aug 2005

This preliminary study reports on the acoustic analysis and synthesis of six selected voice qualities: speech, falsetto, low‐larynx, twang, belting, and opera. The purpose of this investigation is to test the ability of the source‐filter model to synthesize this wide range of voice qualities. Five vowels, /i/, /e/, /a/, /o/, and /u/, were produced in the six voice qualities by a trained female speaker and recorded in a treated sound booth; SPL and F0 were allowed to vary freely, with a normal decay at the end of each token. The recorded signals were digitized directly onto a computer. Inverse filtering of the vowels was performed using linear prediction. Voice source parameters were extracted from the inverse‐filtered signal, and the spectral envelopes of the vowels were also obtained. The six voice qualities were synthesized and compared with the original recordings. Informal listening tests indicated little difference between the original and synthesized tokens, thereby confirming the analysis procedure. To study the relative contributions of source and filter components, vowels were synthesized using the vocal tract transfer function appropriate for each quality but with an arbitrary voice source. The voice source was varied in two ways: (i) keeping the amplitude natural but the pulse shape fixed, and (ii) keeping both the amplitude and the shape fixed. The voice qualities were still distinguishable. Perceptually, these data seem to indicate that the spectral envelope is sufficient to synthesize these voice qualities. Physiologically, it would appear that glottal pulse shape is not significant. However, the spectral envelope may still contain components of the source such as subglottal coupling, spectral slope, and/or the effect of the laryngeal resonator on the vocal cavity transfer function. A demonstration tape will be presented.
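Inverse filtering by linear prediction estimates an all‐pole vocal tract filter and passes the signal through its inverse, leaving an approximation of the voice source. The sketch below uses the autocorrelation method with the Levinson‐Durbin recursion; it is a generic textbook version, not the authors' implementation, and it omits framing, windowing, and pre‐emphasis.

```python
def autocorr(x, max_lag):
    # Biased autocorrelation r[0..max_lag] of one analysis frame.
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients from autocorrelation r.

    Returns a = [1, a1, ..., ap] defining the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # remaining prediction-error power
    return a


def inverse_filter(x, a):
    """Residual e[n] = sum_j a[j] x[n - j] (FIR filtering by A(z))."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


# Usage: the exact autocorrelation of the AR(2) process
# x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise returns its coefficients.
a_ar2 = levinson_durbin([1.0, 0.8125, 0.45625], 2)  # ~[1.0, -1.3, 0.6]
```

In practice `autocorr` would be applied to a windowed vowel frame and `inverse_filter` to the same frame to obtain the source estimate.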

(Semi‐)automatic pitch‐synchronous computation of glottal flow (A)

Jacques Koreman and Ben Cranen

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S36-S36 (1989); (1 page)

Online Publication Date: 13 Aug 2005

Large‐scale investigations of the aerodynamics of voice production are most easily performed using noninvasive measurement methods. Glottal airflow can be estimated by inverse filtering the mouth flow, measured with a pneumotachograph mask. A semi‐automatic pitch‐synchronous inverse filtering method was developed that combines the advantages of interactive inverse filtering and inverse filtering with fixed filter settings: it yields accurate, calibrated estimates of glottal flow during VC and CV transitions that are amenable to parametrization, and it can be used for bulk processing. Using glottal closing and opening moments derived from the electroglottogram (EGG), covariance LPC analysis over the closed‐glottis interval is employed to compute the optimal inverse filter for each period. For male speakers, the method yields quite stable waveforms. These will be parametrized in order to develop rules for a physiologically interpretable, time‐varying synthetic voice source. The control parameters used are F0, duty cycle, top‐top flow amplitude, glottal leak area, and vertical phasing. A number of these parameters can be derived from both the EGG and the glottal flow; comparing the parameter values obtained from the two signals will give an indication of their reliability. The results will be discussed.
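The per‐period step, covariance LPC over the closed‐glottis interval, can be sketched as a least‐squares fit restricted to a sample range. Assumptions: the interval is given as sample indices (in the study it comes from EGG closing and opening moments), no calibration or pre‐emphasis is applied, and the normal equations are solved with plain Gaussian elimination. The function name is hypothetical.

```python
def covariance_lpc(x, start, end, order):
    """Covariance-method LPC over samples x[start:end] (e.g., one
    closed-glottis interval). Requires start >= order so every x[n - i]
    is defined. Returns a = [1, a1, ..., ap] for the inverse filter A(z).
    """
    if start < order:
        raise ValueError("start must be >= order")
    n_range = range(start, end)
    # phi[i][k] = sum over the interval of x[n - i] * x[n - k]
    phi = [[sum(x[n - i] * x[n - k] for n in n_range) for k in range(order + 1)]
           for i in range(order + 1)]
    # Normal equations: sum_k a_k phi[i][k] = -phi[i][0], for i = 1..p
    p = order
    m = [[phi[i][k] for k in range(1, p + 1)] + [-phi[i][0]] for i in range(1, p + 1)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    a = [0.0] * p
    for r in range(p - 1, -1, -1):
        a[r] = (m[r][p] - sum(m[r][c] * a[c] for c in range(r + 1, p))) / m[r][r]
    return [1.0] + a


# Usage: an impulse-excited resonance; after the impulse, the "closed-phase"
# samples obey x[n] = 1.3 x[n-1] - 0.6 x[n-2] exactly, so the fit is exact.
x = [1.0, 1.3]
for n in range(2, 40):
    x.append(1.3 * x[n - 1] - 0.6 * x[n - 2])
a_est = covariance_lpc(x, start=2, end=30, order=2)  # ~[1.0, -1.3, 0.6]
```

Unlike the autocorrelation method, the covariance method does not assume the signal is zero outside the analysis range, which is why it suits short intervals such as the closed phase of a single period.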

Fundamental frequency database with linguistic and phonetic information (A)

Masanobu Abe, Yoshinori Sagisaka, and Hisao Kuwabara

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S36-S36 (1989); (1 page)

Online Publication Date: 13 Aug 2005

An important problem for speech science is the relationship among syntactic information, prosodic information, and the fundamental frequency contour. To facilitate the study of the interaction among these three factors, all three have been coordinated in a continuous speech database. The database has the following specifications. (1) Speech samples consist of 503 phoneme‐balanced Japanese sentences spoken by a male professional announcer [Kuwabara et al., ICASSP '89, 560–563 (1989)]. (2) Phonetic transcriptions at several levels of detail are provided [Takeda et al., Euro. Conf. Speech Technol. 2, 13–16 (1987)]. (3) Fundamental frequency is automatically extracted every 2.5 ms, and extraction errors are corrected by hand. (4) Each sentence is decomposed into its constituent words and morphemes, annotated with lexical information such as inflectional categories, and assigned a tree structure; this information is generated semiautomatically from the input texts. (5) Each utterance is segmented into minor phrases, and each accent position is marked by listening to the utterance. This fundamental frequency database has been used to quantify the factors that control fundamental frequency and to show the effectiveness of this information.
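To make the alignment of F0 with the other annotation tiers concrete, here is a hypothetical record layout for one utterance. Only the 2.5‐ms frame period comes from the abstract; every field and method name is illustrative, and the morpheme tree of item (4) is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FRAME_PERIOD_S = 0.0025  # F0 extracted every 2.5 ms, per the abstract


@dataclass
class Utterance:
    """Hypothetical record; field names are illustrative only."""
    sentence_id: int
    f0_hz: List[Optional[float]]               # one value per frame; None = unvoiced
    phonemes: List[Tuple[str, float, float]]   # (label, start_s, end_s)
    minor_phrases: List[Tuple[float, float]] = field(default_factory=list)
    accent_times_s: List[float] = field(default_factory=list)

    def f0_at(self, t_s: float) -> Optional[float]:
        # Nearest-frame lookup on the 2.5-ms grid.
        i = round(t_s / FRAME_PERIOD_S)
        return self.f0_hz[i] if 0 <= i < len(self.f0_hz) else None

    def phrase_mean_f0(self, k: int) -> Optional[float]:
        # Mean F0 over the voiced frames of minor phrase k.
        start, end = self.minor_phrases[k]
        i0, i1 = round(start / FRAME_PERIOD_S), round(end / FRAME_PERIOD_S)
        voiced = [f for f in self.f0_hz[i0:i1] if f is not None]
        return sum(voiced) / len(voiced) if voiced else None


# Usage with made-up values: five frames, one minor phrase.
demo = Utterance(1, [None, 120.0, 130.0, 140.0, None],
                 [("a", 0.0, 0.0125)], [(0.0, 0.0125)])
```

Keying every tier to time in seconds lets F0 values be retrieved for any phoneme, phrase, or accent mark without storing redundant copies.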

Vocal fundamental frequency: Variation by language, language group, and sex (A)

Carolyn Wardrip‐Fruin

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S36-S37 (1989); (2 pages)

Online Publication Date: 13 Aug 2005

Vocal fundamental frequency was measured for speakers of five languages under three conditions: reading English, reading their native language, and speaking spontaneously in their native language. The samples were recorded in a sound‐treated booth and analyzed by a Visipitch (Kay Elemetrics) frequency analyzer interfaced to an IBM XT computer. Preliminary analysis suggests that mean fundamental frequency was surprisingly similar across languages in each speaking condition, and that it was higher for reading than for spontaneous speaking (as has been found in studies of English). There were, however, significant differences between languages and between sexes in the standard deviation of the fundamental frequency under the various speaking conditions. The results suggest that fundamental frequency is determined primarily by physiological factors, with some linguistic variation.
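The comparison rests on two summary statistics per cell of the design: mean F0 (similar across languages) and its standard deviation (different across languages and sexes). A minimal sketch of that tabulation follows; the tuple layout and the grouping key are assumptions, and no data from the study are reproduced.

```python
from collections import defaultdict
from statistics import mean, stdev


def f0_summary(samples):
    """Mean and standard deviation of F0 for each (language, sex, condition).

    samples: iterable of (language, sex, condition, f0_hz) tuples, e.g. one
    per analysis frame or per speaker (an assumed layout).
    """
    groups = defaultdict(list)
    for lang, sex, cond, f0 in samples:
        groups[(lang, sex, cond)].append(f0)
    # stdev needs at least two values; report 0.0 for single-value cells.
    return {key: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for key, v in groups.items()}


# Usage with placeholder labels and values.
summary = f0_summary([("X", "F", "read", 200.0), ("X", "F", "read", 220.0),
                      ("X", "M", "read", 110.0)])
```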

On the mechanical properties of laryngeal muscles (A)

Fariborz Alipour and Abdolali Najafi

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S37-S37 (1989); (1 page)

Online Publication Date: 13 Aug 2005

Active properties of canine laryngeal muscles were investigated through a series of in vitro experiments. Samples of the medial and lateral thyroarytenoid muscles and of the cricothyroid pars recta and pars oblique were dissected from dog larynges excised a few minutes before death and kept in Krebs solution at 37 ± 1 °C and pH 7.4 ± 0.05. Field stimulation with parallel‐plate platinum electrodes was applied to study the twitch and tetanic responses of these muscles under isometric conditions. The active force of each sample was recorded electronically with a dual servo system (ergometer). Results are reported on the twitch contraction times and half‐relaxation times of the laryngeal muscles. Tetanic responses were normalized and compared, and results are reported on tetanic 90% contraction and 90% relaxation times. The lateral thyroarytenoid muscle was the fastest muscle in the group, with a mean twitch contraction time of 12.3 ms and a mean half‐relaxation time of 11.9 ms; the medial thyroarytenoid muscle had a mean twitch contraction time of 21.2 ms and a mean half‐relaxation time of 18.6 ms. [Work supported by NINCDS Grant No. NS 16320‐07.]
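The two twitch measures can be read off a sampled force trace. This sketch assumes a baseline‐corrected isometric force record, measures contraction time from the stimulus sample to the force peak, and measures half‐relaxation time from the peak to the point where force first falls to 50% of peak (interpolated between samples). These are common definitions, not necessarily the authors' exact ones, and the trace in the usage line is synthetic.

```python
def twitch_times(force, fs_hz, stim_index=0):
    """Return (contraction_time_ms, half_relaxation_time_ms) for one twitch.

    force: baseline-corrected force samples (zero at rest).
    fs_hz: sampling rate; stim_index: sample at which the stimulus occurs.
    """
    peak_i = max(range(stim_index, len(force)), key=lambda i: force[i])
    half = 0.5 * force[peak_i]
    for i in range(peak_i + 1, len(force)):
        if force[i] <= half:
            f0, f1 = force[i - 1], force[i]
            # Linear interpolation of the 50% crossing between samples i-1 and i.
            half_i = i - 1 + (f0 - half) / (f0 - f1)
            break
    else:
        raise ValueError("force never fell to half of peak")
    ms_per_sample = 1000.0 / fs_hz
    return (peak_i - stim_index) * ms_per_sample, (half_i - peak_i) * ms_per_sample


# Usage: a toy twitch sampled at 1 kHz.
ct_ms, hrt_ms = twitch_times([0.0, 0.5, 1.0, 0.75, 0.5, 0.25, 0.0], 1000.0)
```

The tetanic 90% contraction and relaxation times could be computed the same way with 90% thresholds on the normalized response.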

Changes in vocal fold length with nerve stimulation in canine larynges (A)

Jiaqi Jiang and Ingo R. Titze

J. Acoust. Soc. Am. Volume 86, Issue S1, pp. S37-S37 (1989); (1 page)

Online Publication Date: 13 Aug 2005

Vocal fold length change and adductory movement of the glottis were measured in seven canine larynges during bilateral stimulation of the recurrent and superior laryngeal nerves. Marks were placed surgically on the vocal folds for length measurements, and frame‐by‐frame analysis was used to obtain static and dynamic data on vocal fold length. The mean maximum elongation of the membranous vocal folds was 44.7%, with a 120‐ms elongation time and a 140‐ms relaxation time. The mean maximum shortening of the membranous vocal folds was 17.8%, with a 110‐ms contraction time and a 115‐ms relaxation time. When both agonist and antagonist muscles were in maximum contraction, the mean elongation was 24.3%. The length‐time curve approximates an exponential function. Glottal closure occurred in less than 35 ms. The temporal results agree with earlier pitch‐change results. [Work supported by NIH Grant NS16320.]
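Percent elongation and an exponential length‐time curve can be related with a short fit. The abstract says only that the curve "approximates an exponential function", so the form L(t) = Lf − (Lf − L0)·exp(−t/τ) and the log‐linear fit below are assumptions. The 44.7% example matches the reported mean maximum elongation; the rest length and τ are hypothetical.

```python
import math


def percent_length_change(length, rest_length):
    # Positive for elongation, negative for shortening.
    return 100.0 * (length - rest_length) / rest_length


def fit_time_constant(times_s, lengths, rest_length, final_length):
    """Estimate tau for L(t) = Lf - (Lf - L0) * exp(-t / tau).

    Least-squares fit through the origin of ln((Lf - L) / (Lf - L0)) = -t / tau.
    Works for elongation (Lf > L0) or shortening (Lf < L0); samples that
    have reached or passed Lf are skipped.
    """
    num = den = 0.0
    for t, L in zip(times_s, lengths):
        r = (final_length - L) / (final_length - rest_length)
        if r > 0:
            y = math.log(r)
            num += t * y
            den += t * t
    return -den / num


# Usage: synthetic elongation from 10.0 mm to 14.47 mm (44.7%) with tau = 40 ms.
rest, final, tau = 10.0, 14.47, 0.04
ts = [0.01 * k for k in range(1, 11)]
Ls = [final - (final - rest) * math.exp(-t / tau) for t in ts]
tau_est = fit_time_constant(ts, Ls, rest, final)
```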