Journal of the Acoustical Society of America


Nov 1975

Volume 58, Issue S1, pp. S2-S132

Session TT. Speech Communication VII: Speech Perception 2
Contributed Papers

INDSCAL study of the perceptual space of American diphthongs (A)

Dale Terbeek and Robert A. Fox

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S91-S91 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Previous multidimensional‐scaling studies of vowel perception [such as S. Singh and G. Woods, J. Acoust. Soc. Am. 49, 1861–1866 (1971)] have been concerned with monophthongs. In this experiment, the stimuli included three vowels with [i] off‐glides, three with [u] off‐glides, and three with level or [ə]‐like off‐glides. The nine vowels, namely, [ei ai oi u ou au ɪ æ ɔ], were spoken by a native American speaker in the phonetic context [# _ t]. Twenty‐one American listeners made paired‐comparison similarity judgments on a nine‐point rating scale. A different set of 21 listeners performed the same task on the same vowel phonemes recorded without any phonetic context. Both sets of vowel‐by‐vowel dissimilarity matrices were analyzed with INDSCAL [J. D. Carroll and J.‐J. Chang, Psychometrika 35, 283–319 (1970)]. Both stimulus conditions gave the same major results: (1) The most salient property is the presence of rounding anywhere in the vowel versus complete absence of rounding. (2) The next most salient property groups the vowels according to off‐glide. (3) Vowel onset height is the least salient dimension to appear. (4) No backness dimension appeared at all. The results are offered as evidence bearing on proposals about distinctive features. For example, the greater salience of rounding over backness argues against the psychological reality of a phonological rule of English that assigns redundant rounding values on the basis of backness values.
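INDSCAL, as Carroll and Chang formulate it, treats each listener's dissimilarities as weighted Euclidean distances in a stimulus space shared by all listeners: d = sqrt(sum over dimensions t of w_kt * (x_it - x_jt)^2), where the weights w_kt express how salient dimension t is for listener k. The Python sketch below only evaluates that distance for given coordinates and weights; it does not fit the model to dissimilarity data, and the three dimension labels and all numbers in the example are invented for illustration, not taken from the study.

```python
import math


def indscal_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between stimuli i and j for one listener.

    x_i, x_j -- coordinates of the two stimuli in the shared group space
    weights  -- that listener's non-negative salience weight per dimension
    """
    if not len(x_i) == len(x_j) == len(weights):
        raise ValueError("coordinates and weights must have equal length")
    if any(w < 0 for w in weights):
        raise ValueError("INDSCAL weights must be non-negative")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


# Hypothetical 3-D group space: (rounding, off-glide, onset height).
# Coordinates and weights are invented, not values from the study.
rounded = (1.0, 0.0, 0.5)
unrounded = (0.0, 0.0, 0.5)
equal_listener = (1.0, 1.0, 1.0)
rounding_listener = (4.0, 1.0, 1.0)  # weights rounding four times as heavily

print(indscal_distance(rounded, unrounded, equal_listener))     # 1.0
print(indscal_distance(rounded, unrounded, rounding_listener))  # 2.0
```

The same pair of vowels is twice as far apart for the hypothetical listener who weights rounding heavily, which is how INDSCAL lets one group space account for individual differences in dimension salience.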

Influence of F0 pattern on the perception of duration (A)

Ilse Lehiste

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S91-S91 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Thirty listeners were presented with pairs of stimuli of equal duration but differing in F0, and were asked to decide which of the two stimuli was longer. The stimuli consisted of the vowel /a/ synthesized on a Rockland synthesizer. There were three durations: 270, 300, and 330 msec. In one set, one member of the pair had a monotone F0 at 120 Hz; the other member had a rising‐falling F0 curve whose peak ranged in 12 one‐semitone steps from 127 to 240 Hz. In the other set, the monotone frequency was 240 Hz; the changing member of the pair had a falling‐rising F0 curve whose lowest value ranged from 227 to 120 Hz. Level patterns at 120 Hz were also included at each duration. For the level patterns, 68.7% of the listeners judged the first member of the pair to be longer. When the pitch inflection was on the first member of the pair, 71.1% of the listeners judged it to be longer. When the second member of the pair had changing pitch, the proportion of listeners judging the second member to be longer rose from 31.3% for the level pattern to 60.5%.
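The stated endpoints fit a design of twelve one‐semitone steps on an equal‐tempered scale: one semitone above 120 Hz is about 127 Hz, and twelve semitones (one octave) reach 240 Hz; stepping down from 240 Hz gives about 227 Hz, then 120 Hz after twelve steps. The Python sketch below reproduces those endpoints under that assumption; the intermediate frequencies it generates are inferred from the endpoints, not reported in the abstract.

```python
def semitone_steps(base_hz, n_steps, direction=1):
    """Return the frequencies 1..n_steps equal-tempered semitones from base_hz.

    direction=+1 steps upward, -1 steps downward; one semitone is a
    frequency ratio of 2 ** (1/12).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return [base_hz * 2 ** (direction * n / 12) for n in range(1, n_steps + 1)]


# Rising-falling set: F0 peaks stepped upward from the 120-Hz monotone.
rising_peaks = semitone_steps(120.0, 12)
# Falling-rising set: F0 minima stepped downward from the 240-Hz monotone.
falling_minima = semitone_steps(240.0, 12, direction=-1)

print(round(rising_peaks[0]), round(rising_peaks[-1]))      # 127 240
print(round(falling_minima[0]), round(falling_minima[-1]))  # 227 120
```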

Pitch of acoustically altered whispered vowels (A)

R. E. McGlone and W. H. Manning

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S91-S91 (1975); (1 page)

Online Publication Date: 11 Aug 2005

The present authors have hypothesized that the perceived pitch of whispered vowels is related to their second‐formant (F2) frequency [McGlone and Manning, J. Acoust. Soc. Am. 57, S69(A) (1975)]. This hypothesis was based on the judges' rank‐ordering of the vowels from highest to lowest pitch, which corresponded directly to the speakers' F2 frequencies. To test the hypothesis further, F2 was electronically filtered out of the vowels /ɪ, i, u, ɔ, æ/ in whispered hV syllables. The altered signals were played in pairs to a group of judges, who simply indicated which member of each pair had the higher pitch. Vowel quality disappeared from the altered samples, but pitch judgments were still possible, and the pitch order differed considerably from that of the unaltered condition. F2 does appear to influence the pitch perception of vowels.

Stop voicing production and perception: Natural outputs and synthesized inputs (A)

L. Lisker

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S91-S92 (1975); (2 pages)

Online Publication Date: 11 Aug 2005

In recent years the initial stop consonants of English have been subjected to the relentless attention of speech researchers concerned with the basis for their division into the two categories /p, t, k/ and /b, d, g/. The data behind the several hypotheses currently entertained come from two main sources: natural productions by “normal” speakers of the language operating in “normal” fashion, and the responses of persons of like description to synthesized speech stimuli designed to measure the effect of systematic variation of selected acoustic features. The responses required of subjects in tests of synthetic speech can hardly be considered representative of their behavior in response to natural speech; what the testing of synthetic speech demonstrates is the capability of the perceptual system to deal with the features selected for study, not that this capability is exploited in the perception of speech. Two kinds of information relevant to the question of speech cues have not been collected: (1) the extent to which features having potential cue value vary in natural speech by magnitudes matching those tested in synthesis, and (2) the extent to which features claimed to be distinctive can be experimentally manipulated by skilled speakers without significantly reducing intelligibility. Experimental and other data are presented indicating that certain acoustic features affecting the perception of stop voicing are of marginal importance, or less, in the perception of natural speech.

Prosodic signals of sentence structure: Their syntactic distribution (A)

M. O. Harris

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

We have previously reported a technique for the use of perception tests to identify boundary signals in continuous reading. The distribution of boundary locations between words was described generally in terms of simple parsing state changes, or lack of change. Now we are beginning to analyze the classification of boundary placement in considerable syntactic detail. Computer storage of the data has made it possible to search quickly for various sets of information about syntax and perception. The present data base for this continuing study includes the responses of three listeners to three speakers, each reading the same 1870‐word text. Because of the sequential nature of the speech stream in time, boundary signals can be imposed on the words and sentence segments only successively, as they occur. But additional analysis suggests the way in which this sequential system realizes the multidimensional combination of sentence segments. At each boundary, there exists a limited range of possibilities for the succeeding segment, with different probabilities of occurrence or different degrees of functional relatedness attached to each. The least familiar will be most highly marked. The set of possibilities at each kind of boundary may be stated in syntactic terms.

Contextual factors influencing tone discrimination (A)

J. M. Hombert and S. Greenberg

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Studies by Ohala and Ewan (1973) and Sundberg (1973) suggest that asymmetries found in tone languages (e.g., falling tones are more numerous than rising tones) may be related to constraints imposed by laryngeal displacements. These articulatory constraints may be reinforced by limitations on the auditory system's ability to perform accurate frequency discrimination. This study investigated the manner in which the pitch of one linguistic segment can affect the pitch of its neighbor. The stimuli consisted of synthesized V1CV2 sequences onto which were superimposed two fundamental frequency contours: high‐low or low‐high (high = 150 Hz, low = 120 Hz). Subjects were asked in one condition to indicate whether the pitch of a steady‐state vowel following the stimulus was higher or lower than the pitch of V1. In the other condition, judgments were made with respect to V2. The fundamental frequency of the comparison vowels ranged from 10 Hz below to 10 Hz above the F0 of the target vowel. The data reveal the extent to which discrimination of a linguistic segment's tone is adversely affected by a following (higher) or preceding (lower) tone.

Very early expectancy effects in continuous speech perception (A)

J. G. Martin

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Earlier experiments in our laboratory measuring reaction time (RT) to a phoneme target embedded in sentences have shown that RT is slower when the target is temporally displaced by experimental intervention, e.g., by artificially adding 100 or 200 msec of silence to a pre‐stop‐consonant silent interval two syllables earlier in the sentence. The result was interpreted in terms of disrupted timing expectancies. On the other hand, target RT was faster if the artificially extended silence immediately preceded the target, presumably because that intervention in effect gave the listener coarticulatory cues to the target in advance, compared with the normal sentence version. The experiments to be reported show that these opposite RT results, based on disrupted timing versus coarticulatory cue effects, occur when the pretarget intervention is located between the first and second syllables of the sentence. These and other results indicate an interaction between segmental and suprasegmental cue effects and dynamic feed‐forward processing in continuous speech perception. [Work supported by NIMH.]

Perception of speech segments: Parallel processing of sequentially presented syllables (A)

L. A. Streeter, T. K. Landauer, and B. H. Ross

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Artificial words (/pətə/, /pəbə/, /dətə/, and /dəbə/) were presented with the sound level of the first, the second, both, or neither syllable attenuated by 30 dB. Reaction times to identify the words increased if either the first or the second syllable alone was attenuated. However, if the second syllable was attenuated, also attenuating the first caused no additional increase in identification time. These results appear to exclude a serial‐processing model in which identification of the first syllable is completed before identification of the second begins. They are consistent with a process in which the syllables are discriminated independently and in parallel, with the time to identify the word determined by whichever syllable's identification is completed last.
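The two models contrasted in the abstract make different predictions for the four attenuation conditions. In the simplest form of each, a serial model adds the syllable identification times, while a parallel model takes the maximum, because the word is identified only when the slower syllable finishes. The Python sketch below uses invented per‐syllable times (not data from the study) to show why "no additional increase when both are attenuated" favors the parallel account.

```python
def serial_time(syllable_times):
    """Serial model: syllable 2 is started only after syllable 1 is identified."""
    return sum(syllable_times)


def parallel_time(syllable_times):
    """Parallel model: syllables are identified independently and concurrently,
    so the word is identified when the slowest syllable finishes."""
    return max(syllable_times)


# Hypothetical per-syllable identification times in msec (not study data).
CLEAR, ATTENUATED = 300, 380

conditions = {
    "neither": (CLEAR, CLEAR),
    "first": (ATTENUATED, CLEAR),
    "second": (CLEAR, ATTENUATED),
    "both": (ATTENUATED, ATTENUATED),
}

for name, times in conditions.items():
    print(f"{name:8s} serial={serial_time(times)} parallel={parallel_time(times)}")
```

Under the serial model "both" is slower than "second"; under the parallel model the two conditions tie, which is the pattern the abstract reports.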

Effect of two‐dimensional configurations on the identification of associated vowel sounds (A)

J. E. Miller, O. Fujimura, H. W. Campbell, and J. Kruskal

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

A computerized two‐dimensional “sort board” is described that allows acoustic stimuli to be sorted or rank‐ordered. It features light buttons on a CRT display which, when selected, present auditory stimuli; the buttons can be moved around the screen with a light pen. The study of vowel perception to be reported consists of an experiment in which a given configuration of buttons is displayed on the CRT and a sequence of vowel sounds is presented to the subject in random order. On each trial, his task is to point to the button he thinks corresponds to the sound. The correct button then flashes, telling him whether his response was right or wrong. His performance is evaluated in terms of response time and errors. The question under investigation is the significance and effect of a particular spatial arrangement of the stimuli on his ability to remember correctly and respond quickly when making the identifications.

Frigidity or feature detectors—slips of the ear (A)

Catherine P. Browman

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S92 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Mistakes in the perceptual process can reveal some of the mechanisms involved in perception. Naturally occurring misperceptions, or “slips of the ear,” are an important source of such errors. In this study, over 100 misperceptions were collected by the author and others. The errors are categorized by phonemic similarity and by location with respect to possible unit boundaries, such as word and syllable boundaries. Misperceptions involving word boundaries are particularly frequent: for example, “frigidity” was heard when “feature detectors” was spoken.

Perceptual features of aging male speech (A)

D. E. Hartman and J. L. Danhauer

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S92-S93 (1975); (2 pages)

Online Publication Date: 11 Aug 2005

Spontaneous speech samples of 46 male speakers between the ages of 25 and 70 years were played to 40 untrained listeners, who estimated the speakers' ages. Samples on which the untrained listeners agreed were played to 20 trained listeners, who described the perceptual features associated with each perceived age using an a posteriori schema. Results showed characteristic perceptual features for four perceived age decades, which could be classified as features of pitch, rate of speech, quality, and articulation. It was concluded that the features identified should be useful for defining criteria for “normal” aging speech and in traditional speaker recognition research.

“Co‐perception” of speech segments (A)

B. H. Repp

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S93-S93 (1975); (1 page)

Online Publication Date: 11 Aug 2005

A definition of “co‐perception” is proposed, in analogy to the definition of coarticulation: Co‐perception is said to exist whenever the perception of a (phonetically defined) speech segment is influenced by a preceding or following segment. The measure of co‐perception is the reaction time (in “same‐different” judgments, speeded classification, or monitoring) to a certain well‐defined segment, while other segments vary systematically. Three hypotheses to explain co‐perception are discussed: (1) A temporal integration period (or limited buffer memory) in speech perception; (2) genuine functional analogies to coarticulation; and (3) “phonetic coherence” factors, such as the primacy of the CV syllable. These hypotheses may be investigated by appropriate variation of the phonetic composition and the temporal parameters of the speech stimuli used. Some experimental paradigms are discussed, and preliminary data are reported that demonstrate co‐perception of consonants and vowels in CV, VCV, and (perhaps) VC syllables.

Vowel boundaries and waveform patterns (A)

Brian L. Scott

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S93-S93 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Relations between changes in the temporal fine‐structure of synthetic vowel waveforms and perceptual boundaries are examined. Small changes in fundamental frequency (F0) are shown to affect temporal fine‐structure and perceptual boundaries in a correlated manner. Experiments show that ambiguous vowel sounds lying on the /i/‐/ɪ/ continuum can be shifted into either category by manipulating F0, and that the shifts can be predicted from the resulting changes in waveform. In an additional study, a series of pseudovowels identical in F0 and formant center‐frequency values but differing in temporal fine‐structure are shown to bear perceptual relationships to one another similar to those within a normal /i/‐/æ/ vowel series. The results indicate that temporal analysis of the first formant of vowel sounds can aid in delineating vowel boundaries. [Supported by NIH grant NS 03856 to CID.]

Selective adaptation effects on end‐point stimuli (A)

J. R. Sawusch

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S93-S93 (1975); (1 page)

Online Publication Date: 11 Aug 2005

A series of voiced CV syllables which varied along the feature of place was used in a selective adaptation paradigm. The end‐point stimuli from the test series and their voiceless counterparts were used as adaptors. Subjects used a six‐point rating scale to respond to the stimuli, instead of the usual two category identification. The average rating for end‐point stimuli from the same category as the adaptor, as well as boundary stimuli, shifted as a function of adaptation. In all cases, the average rating response shifted toward that of the unadapted category. The average rating for end‐point stimuli in the opposite category from the adaptor remained relatively unchanged. Results indicate that the entire category of the adapting stimulus changes as a result of selective adaptation and that the effect is not confined to stimuli near the phonetic boundary. These results agree with end‐point shifts found recently using a dichotic listening task [J.L. Miller, Percep. Psychophys. (in press)]. Results are interpreted in terms of recent models of the adaptation process. [Research supported by NINCDS Grant NS‐12179.]

First‐formant onset frequency as a cue to stop‐consonant voicing (A)

Quentin Summerfield

J. Acoust. Soc. Am. Volume 58, Issue S1, pp. S93-S93 (1975); (1 page)

Online Publication Date: 11 Aug 2005

Both we and Stevens and Klatt have suggested that a voiced first‐formant (F1) transition following voicing onset in a stop‐vowel syllable is a positive cue to stop voicing. Lisker et al. [J. Acoust. Soc. Am. 57, S50(A) (1975)] pointed out that the salient perceptual cue could instead be the F1 frequency at voicing onset, since low values directly indicate that glottal vibration is synchronous with vocal‐tract closure. Phoneme boundaries were determined with an adaptive PEST algorithm, first on seven /g‐k/ VOT continua with identical F2 and F3 but differing in the frequency of a steady, transitionless F1. Phoneme boundary values in VOT increased monotonically as F1 frequency was reduced from 400 to 200 Hz. Second, phoneme boundaries were determined on seven /g‐k/ VOT continua in which voicing always began with F1 at 250 Hz and was followed, irrespective of VOT, by 0, 6, 12, 18, 24, 30, or 36 msec of linear F1 transition at 5 Hz/msec to a variable steady state. As the F1 transitions lengthened, phoneme boundaries shifted, unexpectedly, toward shorter VOTs. Thus the trading value against the other cue, VOT, varies in the expected direction with onset frequency but not with transition extent, supporting Lisker et al. [Work supported by J.S.R.U., U.K.]
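In the second set of continua, the onset frequency and slope are fixed, so transition duration alone sets the F1 extent and the steady‐state value. The short Python sketch below makes that arithmetic explicit. It assumes the transition rises from the 250‐Hz onset; the abstract gives the slope but not its direction, so the sign is an assumption.

```python
ONSET_F1_HZ = 250        # F1 at voicing onset in every continuum of the second set
SLOPE_HZ_PER_MS = 5      # rate of the linear F1 transition
TRANSITION_MS = (0, 6, 12, 18, 24, 30, 36)


def f1_steady_state(transition_ms, onset_hz=ONSET_F1_HZ, slope=SLOPE_HZ_PER_MS):
    """Steady-state F1 reached after a linear transition of `transition_ms` msec.

    Assumes (hypothetically) that the transition rises from the onset value.
    """
    if transition_ms < 0:
        raise ValueError("transition duration must be non-negative")
    return onset_hz + slope * transition_ms


for ms in TRANSITION_MS:
    print(f"{ms:2d} msec transition -> steady-state F1 = {f1_steady_state(ms)} Hz")
```

Under that assumption the seven continua end at 250, 280, 310, 340, 370, 400, and 430 Hz, so the longest (36‐msec) transition spans 180 Hz while the onset frequency stays at 250 Hz throughout.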