Research Article  |   August 1994
Speaker Race Identification From Acoustic Cues in the Vocal Signal
 
Author Affiliations & Notes
  • Julie H. Walton
    University of Mississippi, University, MS
  • Robert F. Orlikoff
    Memphis State University, Memphis, TN
  • Contact author: Julie H. Walton, PhD, Department of Communicative Disorders, University of Mississippi, University, MS 38677.
Article Information
Journal of Speech, Language, and Hearing Research, August 1994, Vol. 37, 738-745. doi:10.1044/jshr.3704.738
History: Received April 1, 1993; Accepted January 4, 1994

One-second acoustic samples were extracted from the mid-portion of sustained /a/ vowels produced by 50 black and 50 white adult males. Each vowel sample from a black subject was randomly paired with a sample from a white subject. From the tape-recorded samples alone, both expert and naive listeners could determine the race of the speaker with 60% accuracy. The accuracy of race identification was independent of the listener’s own race, sex, or listening experience. An acoustic analysis of the samples revealed that, although within ranges reported by previous studies of normal voices, the black speakers had greater frequency perturbation, significantly greater amplitude perturbation, and a significantly lower harmonics-to-noise ratio than did the white speakers. The listeners were most successful in distinguishing voice pairs when the differences in vocal perturbation and additive noise were greatest and were least successful when such differences were minimal or absent. Because there were no significant differences in the mean fundamental frequency or formant structure of the voice samples, it is likely that the listeners relied on differences in spectral noise to discriminate the black and white speakers.
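The abstract compares three acoustic measures without defining them: frequency perturbation (jitter), amplitude perturbation (shimmer), and the harmonics-to-noise ratio (HNR). As an illustration only, the Python sketch below implements common textbook definitions of each: local jitter and local shimmer as the mean absolute cycle-to-cycle difference relative to the mean, and HNR from the normalized autocorrelation r at one pitch period, HNR = 10 log10(r / (1 - r)). The abstract does not say which algorithms or analysis software the authors used, so these definitions, the function names, and the synthetic 80-sample period are assumptions, not the study's method.

```python
import math
import random
from typing import Sequence


def _mean_abs_relative_diff(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def local_jitter_percent(periods: Sequence[float]) -> float:
    """Frequency perturbation (assumed 'local jitter' definition): cycle-to-cycle period variation."""
    return _mean_abs_relative_diff(periods)


def local_shimmer_percent(peak_amplitudes: Sequence[float]) -> float:
    """Amplitude perturbation (assumed 'local shimmer' definition): cycle-to-cycle peak variation."""
    return _mean_abs_relative_diff(peak_amplitudes)


def hnr_db(signal: Sequence[float], period_samples: int) -> float:
    """Harmonics-to-noise ratio from the normalized autocorrelation r at a lag of one
    pitch period: HNR = 10 * log10(r / (1 - r)). More additive noise pushes r below 1."""
    n = len(signal) - period_samples
    if period_samples <= 0 or n <= 0:
        raise ValueError("period must be positive and shorter than the signal")
    x, y = signal[:n], signal[period_samples:]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    r = num / den
    if r <= 0.0:
        raise ValueError("no positive correlation at this lag; HNR undefined")
    if r >= 1.0:
        return math.inf  # perfectly periodic within rounding: no measurable noise
    return 10.0 * math.log10(r / (1.0 - r))


if __name__ == "__main__":
    period = 80  # hypothetical: samples per glottal cycle in a synthetic signal
    clean = [math.sin(2 * math.pi * i / period) for i in range(800)]
    rng = random.Random(0)
    noisy = [s + rng.gauss(0.0, 0.2) for s in clean]  # add Gaussian "spectral noise"
    print(f"jitter  {local_jitter_percent([9.9, 10.1, 9.9, 10.1]):.2f} %")
    print(f"shimmer {local_shimmer_percent([1.0, 0.9, 1.0, 0.9]):.2f} %")
    print(f"HNR clean {hnr_db(clean, period):.1f} dB, noisy {hnr_db(noisy, period):.1f} dB")
```

Adding noise to the synthetic vowel lowers the computed HNR, which mirrors the direction of the difference the abstract reports; the sketch says nothing about the magnitudes measured in the study.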

Acknowledgments
This paper is based on a doctoral dissertation completed by the first author within the Department of Audiology and Speech Pathology at Memphis State University. This research was supported, in part, by the Center for Research Initiatives and Strategies for the Communicatively Impaired (CRISCI), Memphis State University, and by a grant from the Federal Office of Special Education and Rehabilitation Services, #H029D10070, Preparation of Leadership Personnel for Multicultural Issues in Communication Disorders. The authors wish to thank Joel C. Kahane, Walter H. Manning, and Russell E. Thomas for many helpful suggestions made during the course of this research. We would also like to thank E. Thomas Doherty, Gail B. Kempster, Norman J. Lass, and an anonymous reviewer for their insightful comments and suggestions. An earlier version of this paper was presented at the annual convention of the American Speech-Language-Hearing Association held in San Antonio, TX, November 1992.