Article  |   August 01, 2012
Vocal Tract Representation in the Recognition of Cerebral Palsied Speech
 
Author Affiliations & Notes
  • Frank Rudzicz
    University of Toronto and Toronto Rehabilitation Institute, Ontario, Canada
  • Graeme Hirst
    University of Toronto, Ontario, Canada
  • Pascal van Lieshout
    University of Toronto, Ontario, Canada
  • Correspondence to Frank Rudzicz, who is affiliated with both the University of Toronto and Toronto Rehabilitation Institute: frank@cs.toronto.edu
  • Editor: Anne Smith
  • Associate Editor: Wolfram Ziegler
Article Information
Journal of Speech, Language, and Hearing Research, August 2012, Vol. 55, 1190-1207. doi:10.1044/1092-4388(2011/11-0223)
History: Received August 13, 2011; Revised November 3, 2011; Accepted December 6, 2011
 

Purpose: In this study, the authors explored articulatory information as a means of improving the machine recognition of dysarthric speech.

Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011), in which the motion of various points in the vocal tract is measured during speech. In the 1st experiment, the authors established a baseline showing the relatively low performance of traditional automatic speech recognition (ASR) using only acoustic data from dysarthric individuals. In the 2nd experiment, the authors used various measures of entropy (statistical disorder) to determine whether characteristics of dysarthric articulation can reduce uncertainty in features of dysarthric acoustics. These findings led to the 3rd experiment, in which recorded dysarthric articulation was encoded directly into the speech recognition process.
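The entropy analysis in the 2nd experiment can be illustrated with a small sketch. This is not the authors' implementation: the discretization scheme, the toy data, and all variable names below are invented for illustration. The underlying idea is that if knowing an articulatory feature lowers the conditional entropy of an acoustic feature, then articulation carries information usable by a recognizer.

```python
# Hypothetical sketch: how much does knowing a (discretized) articulatory
# feature reduce the entropy of a (discretized) acoustic feature?
# Toy data only; not the TORGO features or the paper's estimator.
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label sequence."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(x, given):
    """H(X | G): entropy of x remaining once each value of `given` is known."""
    h, n = 0.0, len(x)
    for g in np.unique(given):
        mask = given == g
        h += (mask.sum() / n) * entropy(x[mask])
    return h

# Toy discretized features: the acoustic bin is a noisy function of the
# articulatory bin, so articulation should explain part of its entropy.
rng = np.random.default_rng(0)
artic = rng.integers(0, 4, size=5000)                  # articulatory bins
acoust = (artic + rng.integers(0, 2, size=5000)) % 4   # noisy acoustic bins

h_x = entropy(acoust)
h_x_given_a = conditional_entropy(acoust, artic)
reduction = 100 * (h_x - h_x_given_a) / h_x  # % of acoustic entropy removed
print(f"Relative entropy reduction: {reduction:.1f}%")
```

The quantity printed here is the same kind of figure the paper reports (a percentage of acoustic statistical disorder removed by articulatory knowledge), though the paper's 18.3% comes from real articulographic data, not this toy construction.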

Results: The authors found that 18.3% of the statistical disorder in the acoustics of speakers with dysarthria can be removed if articulatory parameters are known. Using articulatory models reduces phoneme recognition errors by up to 6%, relative, for speakers with dysarthria in speaker-dependent systems.
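For readers unfamiliar with the distinction, "relative" error reduction is computed against the baseline error rate, not as a subtraction of percentage points. The baseline phoneme error rate below is invented purely to make the arithmetic concrete; the paper reports only the relative figure.

```python
# Relative vs. absolute error reduction (hypothetical baseline).
baseline_per = 0.40           # invented baseline phoneme error rate
relative_reduction = 0.06     # 6% relative improvement, as reported

new_per = baseline_per * (1 - relative_reduction)
absolute_drop = baseline_per - new_per

print(f"Baseline PER: {baseline_per:.1%}")    # 40.0%
print(f"New PER:      {new_per:.1%}")         # 37.6%
print(f"Absolute drop: {absolute_drop:.1%}")  # 2.4 percentage points
```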

Conclusions: Articulatory knowledge is useful in reducing rates of error in ASR for speakers with dysarthria and in reducing statistical uncertainty of their acoustic signals. These findings may help to guide clinical decisions related to the use of ASR in the future.

Acknowledgments
This work was funded by the University of Toronto, Bell University Labs, the Natural Sciences and Engineering Research Council of Canada (Grant CRDPJ 364360-07), and the Canada Research Chair program. We acknowledge Gerald Penn and Fraser Shein for their academic contributions throughout this process, as well as Aravind Namasivayam and Talya Wolff for their assistance during the collection of the TORGO database. We are grateful to the participants who expended considerable time and effort in contributing to the TORGO database.