Article  |   June 01, 2011
Comparing Two Methods for Reducing Variability in Voice Quality Measurements
 
Author Affiliations & Notes
  • Jody Kreiman
    University of California, Los Angeles
  • Bruce R. Gerratt
    University of California, Los Angeles
  • Correspondence to Jody Kreiman: jkreiman@ucla.edu
  • Editor: Anne Smith
  • Associate Editor: Robert Hillman
Article Information
Speech, Voice & Prosodic Disorders / Voice Disorders / Speech, Voice & Prosody / Speech
Journal of Speech, Language, and Hearing Research, June 2011, Vol. 54, 803-812. doi:10.1044/1092-4388(2010/10-0083)
History: Received March 30, 2010; Accepted October 15, 2010
 
Web of Science® Times Cited: 4

Purpose
Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability.

Method
Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices. Ratings for each listener were averaged together, mean ratings were z transformed, and the likelihood that 2 listeners would agree exactly in their ratings was calculated as a function of averaging and standardizing condition. Data were also multidimensionally scaled to examine similarities among listeners in perceptual strategy. Results were compared with parallel analyses of existing breathiness ratings of the same voices gathered using a method-of-adjustment task.
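The averaging and standardizing steps described in the Method can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the data layout (one list of repeated ratings per voice, per listener), and in particular the rule that two ratings "agree exactly" when they fall in the same bin of width `step` are all assumptions made for illustration; the abstract does not specify how exact agreement was scored after averaging.

```python
from itertools import combinations
from math import floor
from statistics import mean, pstdev


def average_ratings(ratings, n_averaged):
    """Average the first n_averaged repeated ratings of each voice (one listener)."""
    return [mean(voice[:n_averaged]) for voice in ratings]


def z_transform(values):
    """Standardize one listener's mean ratings to zero mean and unit SD."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def exact_agreement(listener_a, listener_b, step=0.5):
    """Proportion of voices on which two listeners land in the same bin.

    Binning to width `step` is an assumed definition of "exact agreement".
    """
    def to_bin(x):
        return floor(x / step + 0.5)

    matches = sum(to_bin(a) == to_bin(b) for a, b in zip(listener_a, listener_b))
    return matches / len(listener_a)


def mean_pairwise_agreement(data, n_averaged, standardize, step=0.5):
    """Mean exact agreement over all listener pairs for one condition.

    data maps a listener id to a list of voices, each a list of repeated ratings.
    """
    processed = []
    for ratings in data.values():
        means = average_ratings(ratings, n_averaged)
        processed.append(z_transform(means) if standardize else means)
    return mean(exact_agreement(a, b, step) for a, b in combinations(processed, 2))
```

Calling `mean_pairwise_agreement` once per combination of `n_averaged` and `standardize` gives agreement as a function of averaging and standardizing condition. In this sketch, standardizing removes differences in each listener's overall level and spread, but it cannot remove differences in how listeners order the voices.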

Results
Three-way interactions between the mean rating for a voice, standardization condition, and the number of voices averaged together were observed, but no main effect of averaging condition emerged. Multidimensional scaling revealed significant residual differences in perceptual strategy across listeners after averaging and standardizing. Ratings from the method-of-adjustment task showed both high agreement levels and consistent perceptual strategies across listeners, as theoretically predicted.

Conclusion
Averaging multiple ratings and standardizing the mean are inadequate in addressing variations in voice quality perception.

Acknowledgments
This research was supported by National Institute on Deafness and Other Communication Disorders Grant DC01797. We thank Norma Antoñanzas-Barroso for significant programming support.