Tutorial  |   April 01, 2005
Application of Psychometric Theory to the Measurement of Voice Quality Using Rating Scales
 
Author Affiliations & Notes
  • Rahul Shrivastav
    University of Florida, Gainesville
  • Christine M. Sapienza
    University of Florida, Gainesville
  • Vuday Nandur
    University of Florida, Gainesville
  • Contact author: Rahul Shrivastav, Department of Communication Science and Disorders, University of Florida, Dauer Hall, Room 336, Gainesville, FL 32611. E-mail: rahul@csd.ufl.edu
Article Information
Journal of Speech, Language, and Hearing Research, April 2005, Vol. 48, 323-335. doi:10.1044/1092-4388(2005/022)
History: Received August 25, 2003; Revised March 2, 2004; Accepted August 5, 2004
 

Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the midrange of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have been interpreted to show that listeners perceive voice quality in an idiosyncratic manner. Based on psychometric theory, the present research explored an alternative explanation for the poor interlistener agreement observed in previous research. This approach suggests that poor agreement between listeners may result, in part, from measurement errors related to a variety of factors rather than true differences in the perception of voice quality. In this study, 10 listeners rated breathiness for 27 vowel stimuli using a 5-point rating scale. Each stimulus was presented to the listeners 10 times in random order. Interlistener agreement and reliability were calculated from these ratings. Agreement and reliability were observed to improve when multiple ratings of each stimulus from each listener were averaged and when standardized scores were used instead of absolute ratings. The probability of exact agreement was found to be approximately .9 when using averaged ratings and standardized scores. In contrast, the probability of exact agreement was only .4 when a single rating from each listener was used to measure agreement. These findings support the hypothesis that poor agreement reported in past research partly arises from errors in measurement rather than individual differences in the perception of voice quality.
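The two adjustments named in the abstract, averaging repeated ratings from each listener and standardizing each listener's scores, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not say how exact agreement was computed for averaged or standardized scores, so the pairwise definition of agreement, the rounding of z-scores to whole units, the function names, and the toy ratings below are all assumptions made for illustration.

```python
import itertools
import statistics


def average_ratings(repeated):
    """repeated[listener][stimulus] is a list of repeated ratings.
    Returns one mean rating per stimulus for each listener."""
    return [[statistics.mean(r) for r in listener] for listener in repeated]


def standardize(scores):
    """Convert each listener's scores to z-scores across stimuli,
    removing differences in how listeners use the scale."""
    out = []
    for listener in scores:
        mu = statistics.mean(listener)
        sd = statistics.pstdev(listener)
        if sd == 0:
            # A listener who gives every stimulus the same score
            # carries no ordering information; map to zeros.
            out.append([0.0 for _ in listener])
        else:
            out.append([(x - mu) / sd for x in listener])
    return out


def exact_agreement(scores, binned=False):
    """Proportion of (listener pair, stimulus) comparisons that match.
    With binned=True, scores are first rounded to whole units
    (an assumed binning rule, not taken from the article)."""
    matches = 0
    total = 0
    for a, b in itertools.combinations(scores, 2):
        for x, y in zip(a, b):
            if binned:
                x, y = round(x), round(y)
            matches += (x == y)
            total += 1
    return matches / total


# Hypothetical toy data: two listeners rank three stimuli identically,
# but listener B uses the 5-point scale one step higher than listener A.
listener_a = [1, 2, 3]
listener_b = [2, 3, 4]
raw = [listener_a, listener_b]

print(exact_agreement(raw))                             # 0.0
print(exact_agreement(standardize(raw), binned=True))   # 1.0
```

In this toy case the two listeners never give the same absolute rating, yet they agree perfectly once each listener's ratings are expressed relative to that listener's own mean and spread. That is the kind of measurement artifact, as opposed to a true perceptual difference, that the abstract describes.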

Acknowledgments
The authors would like to thank Gary Kidd, J. D. Harnsberger, and W. S. Brown Jr. for their comments on an earlier version of the manuscript. The authors would also like to thank three anonymous reviewers for their suggestions.