Research Article  |   December 01, 2007
Effects of Training on the Acoustic–Phonetic Representation of Synthetic Speech
 
Author Affiliations & Notes
  • Alexander L. Francis
    Purdue University
  • Howard C. Nusbaum
    University of Chicago
  • Kimberly Fenn
    University of Chicago
  • Contact author: Alexander L. Francis, Speech, Language, and Hearing Sciences, Purdue University, Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907. E-mail: francisa@purdue.edu.
Article Information
Journal of Speech, Language, and Hearing Research, December 2007, Vol. 50, 1445-1465. doi:10.1044/1092-4388(2007/100)
History: Received July 5, 2006; Revised November 6, 2006; Accepted April 4, 2007

Purpose To investigate training-related changes in the acoustic–phonetic representation of consonants produced by a text-to-speech (TTS) synthesizer.

Method Forty-eight adult listeners were trained to better recognize words produced by a TTS system. Nine additional untrained participants served as controls. Before and after training, participants were tested on consonant recognition and made pairwise judgments of consonant dissimilarity for subsequent multidimensional scaling (MDS) analysis.

Results Word recognition training significantly improved performance on consonant identification, even though listeners never received specific training on phoneme recognition. Data from the 31 participants who showed clear evidence of learning (improvement ≥ 10 percentage points) were further investigated using MDS and analysis of confusion matrices. Results showed that training altered listeners’ treatment of particular acoustic cues, producing both increased within-class similarity and increased between-class distinctiveness. Some changes were consistent with current models of perceptual learning, but others were not.
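As background on the analysis method named above: MDS takes listeners’ pairwise dissimilarity judgments and recovers a low-dimensional spatial configuration in which inter-point distances approximate the judged dissimilarities. The sketch below is not the authors’ analysis; it is a minimal illustration of classical (Torgerson) MDS in numpy, applied to a toy dissimilarity matrix among four hypothetical stimuli.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) multidimensional scaling.

    dissim: (n, n) symmetric matrix of pairwise dissimilarities.
    Returns an (n, n_dims) coordinate array whose Euclidean
    distances approximate the input dissimilarities.
    """
    n = dissim.shape[0]
    # Double-center the squared dissimilarities.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dissim ** 2) @ J
    # Eigendecompose the centered matrix (B is symmetric).
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the largest (non-negative) eigenvalues as dimensions.
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy example: dissimilarities derived from four known 2-D points
# (standing in for judged dissimilarities among four stimuli).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
coords = classical_mds(D, n_dims=2)
```

Because the toy dissimilarities are exact Euclidean distances, the recovered configuration reproduces them up to rotation and reflection; with real perceptual judgments the fit is only approximate, and the recovered dimensions are then interpreted (as in the study) in terms of acoustic–phonetic features.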

Conclusion Training caused listeners to interpret the acoustic properties of synthetic speech more like those of natural speech, in a manner consistent with a flexible-feature model of perceptual learning. Further research is necessary to refine these conclusions and to investigate their applicability to other training-related changes in intelligibility (e.g., associated with learning to better understand dysarthric speech or foreign accents).

Acknowledgments
Some of the data in this article derive from part of a doctoral dissertation submitted by the first author to the Department of Psychology and the Department of Linguistics at the University of Chicago. Some of these results were presented at the 136th meeting of the Acoustical Society of America in Norfolk, Virginia, on October 15, 1998. This work was supported, in part, by a grant from the Division of Social Sciences at the University of Chicago to the second author and by National Institutes of Health Grant R03 DC006811 (awarded to the first author). We are grateful to Lisa Goffman and Jessica Huber for helpful comments on previous versions of this article.
This article is dedicated to the memory of Nick Ing-Simmons, author of rsynth.