Research Article  |   February 01, 2016
An Optimal Set of Flesh Points on Tongue and Lips for Speech-Movement Classification
 
Author Affiliations & Notes
  • Jun Wang
    Speech Disorders & Technology Lab, The University of Texas at Dallas
    Callier Center for Communication Disorders, The University of Texas at Dallas
    University of Texas Southwestern Medical Center, Dallas
  • Ashok Samal
    University of Nebraska–Lincoln
  • Panying Rong
    MGH Institute of Health Professions, Boston, MA
  • Jordan R. Green
    MGH Institute of Health Professions, Boston, MA
  • Disclosure: The authors have declared that no competing interests existed at the time of publication.
  • Correspondence to Jun Wang: wangjun@utdallas.edu
  • Editor: Jody Kreiman
  • Associate Editor: Kate Bunton
Article Information
Journal of Speech, Language, and Hearing Research, February 2016, Vol. 59, 15-26. doi:10.1044/2015_JSLHR-S-14-0112
History: Received April 24, 2014; Revised November 10, 2014; Accepted August 7, 2015
 

Purpose The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements.

Method The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who produced 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. The authors then used a machine-learning classifier (a support-vector machine) to classify the speech stimuli on the basis of the articulatory movements and compared the classification accuracies of the flesh-point combinations to determine an optimal set of sensors.
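
For readers unfamiliar with this type of pipeline, the sketch below illustrates how a support-vector machine can classify speech stimuli from flesh-point trajectories. It is a minimal illustration rather than the authors' implementation: the synthetic data, the fixed-length resampling of each movement, and the radial-basis-function kernel are assumptions made for the example.

```python
# Minimal sketch of SVM classification of articulatory movements.
# Assumptions (not from the paper): synthetic trajectories, fixed-length
# resampling as the feature reduction, and an RBF kernel.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

N_SAMPLES = 200  # recorded productions (hypothetical count)
N_FRAMES = 50    # frames after resampling each movement to a fixed length
N_SENSORS = 6    # T1-T4 on the tongue, plus UL and LL
N_DIMS = 3       # x, y, z position per sensor
N_CLASSES = 8    # e.g., the 8 vowels

# Synthetic stand-in for the EMA data: one fixed-length trajectory per
# production; real features would come from the articulograph recordings.
X_traj = rng.normal(size=(N_SAMPLES, N_FRAMES, N_SENSORS, N_DIMS))
y = rng.integers(0, N_CLASSES, size=N_SAMPLES)

# Flatten each trajectory into a single feature vector per sample.
X = X_traj.reshape(N_SAMPLES, -1)

# Soft-margin SVM with an RBF kernel, scored by 5-fold cross-validation.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```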

Results When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with the full set (including T2: the tongue-body front; and T3: the tongue-body middle).
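
The comparison of flesh-point combinations amounts to scoring the classifier once per sensor subset and ranking the resulting accuracies. The sketch below enumerates every non-empty subset of the 6 sensors; the exhaustive enumeration and the synthetic setup are assumptions for illustration, not the authors' reported procedure.

```python
# Sketch of ranking flesh-point combinations by cross-validated accuracy.
# Assumptions for illustration: synthetic data stands in for the recorded
# movements, and every non-empty sensor subset is scored exhaustively.
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
SENSORS = ["T1", "T2", "T3", "T4", "UL", "LL"]
N_SAMPLES, N_FRAMES, N_DIMS, N_CLASSES = 200, 50, 3, 8

# Synthetic stand-in for the EMA recordings: (samples, frames, sensors, dims).
X_traj = rng.normal(size=(N_SAMPLES, N_FRAMES, len(SENSORS), N_DIMS))
y = rng.integers(0, N_CLASSES, size=N_SAMPLES)

def accuracy_for(subset):
    """Mean cross-validated accuracy using only the listed sensors' channels."""
    idx = [SENSORS.index(name) for name in subset]
    X_sub = X_traj[:, :, idx, :].reshape(N_SAMPLES, -1)
    return cross_val_score(SVC(kernel="rbf"), X_sub, y, cv=5).mean()

# Score every non-empty combination and report the most accurate one.
results = {s: accuracy_for(s)
           for r in range(1, len(SENSORS) + 1)
           for s in combinations(SENSORS, r)}
best = max(results, key=results.get)
print("best sensor set:", best, f"(accuracy: {results[best]:.2f})")
```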

Conclusion The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%–95%) equivalent to that obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.

Acknowledgments
This work was supported in part by the Excellence in Education Fund, The University of Texas at Dallas; the Barkley Trust, Barkley Memorial Center, University of Nebraska–Lincoln; National Institutes of Health Grants R01 DC009890 (principal investigator: Jordan R. Green), R01 DC013547 (principal investigator: Jordan R. Green), and R03 DC013990 (principal investigator: Jun Wang); and the American Speech-Language-Hearing Foundation through a New Century Scholar Research Grant (principal investigator: Jun Wang). About a quarter of the data and analysis was presented at the 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vancouver, Canada) and published in the conference's proceedings (Wang, Green, & Samal, 2013). We would like to thank Tom D. Carrell, Mili Kuruvilla, Lori Synhorst, Cynthia Didion, Rebecca Hoesing, Kayanne Hamling, Katie Lippincott, Kelly Veys, Tony Boney, and Lindsey Macy for their contributions to participant recruitment, data management, data collection, and data processing, and for other support.