Research Article  |   August 01, 2002
Speech Synthesis Using Damped Sinusoids
 
Author Affiliations & Notes
  • James M. Hillenbrand, PhD
    Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo
  • Robert A. Houde
    Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo
  • Contact author: James M. Hillenbrand, PhD, Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI 49008. E-mail: james.hillenbrand@wmich.edu
Article Information
Journal of Speech, Language, and Hearing Research, August 2002, Vol. 45, 639-650. doi:10.1044/1092-4388(2002/051)
History: Received November 5, 2001; Accepted March 22, 2002

A speech synthesizer was developed that operates by summing exponentially damped sinusoids at frequencies and amplitudes corresponding to peaks derived from the spectrum envelope of the speech signal. The spectrum analysis begins with the calculation of a smoothed Fourier spectrum. A masking threshold is then computed for each frame as the running average of spectral amplitudes over an 800-Hz window. In a rough simulation of lateral suppression, the running average is then subtracted from the smoothed spectrum (with negative spectral values set to zero), producing a masked spectrum. The signal is resynthesized by summing exponentially damped sinusoids at frequencies corresponding to peaks in the masked spectra. If a periodicity measure indicates that a given analysis frame is voiced, the damped sinusoids are pulsed at a rate corresponding to the measured fundamental period. For unvoiced speech, the damped sinusoids are pulsed on and off at random intervals. A perceptual evaluation of speech produced by the damped sinewave synthesizer showed excellent sentence intelligibility, excellent intelligibility for vowels in /hVd/ syllables, and fair intelligibility for consonants in CV nonsense syllables.
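
The processing chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 80-Hz damping bandwidth, the 20-ms pulse length, the relative peak floor, and the mean pulse rate for unvoiced frames are all assumptions chosen for illustration, and the smoothing of the Fourier spectrum and the periodicity measure are left out. Only the 800-Hz running-average window, the zeroing of negative values, the peak-driven damped sinusoids, and the periodic versus random pulsing come from the description above.

```python
import numpy as np

def masked_spectrum(spectrum, bin_hz, window_hz=800.0):
    """Subtract a running-average masking threshold; set negatives to zero."""
    width = int(round(window_hz / bin_hz)) | 1          # force an odd window
    kernel = np.ones(width) / width
    padded = np.pad(spectrum, width // 2, mode="edge")  # assumption: edge padding
    threshold = np.convolve(padded, kernel, mode="valid")
    return np.maximum(spectrum - threshold, 0.0)

def pick_peaks(masked, min_rel=0.01):
    """Indices of local maxima above an assumed relative floor."""
    mid = masked[1:-1]
    is_peak = (mid > masked[:-2]) & (mid >= masked[2:])
    is_peak &= mid > min_rel * masked.max()
    return np.nonzero(is_peak)[0] + 1

def damped_pulse(freqs_hz, amps, sample_rate, pulse_s=0.02, bandwidth_hz=80.0):
    """Sum of sinusoids sharing one exponential decay (bandwidth is assumed)."""
    t = np.arange(int(round(pulse_s * sample_rate))) / sample_rate
    envelope = np.exp(-np.pi * bandwidth_hz * t)
    phases = 2 * np.pi * np.asarray(freqs_hz)[:, None] * t
    tones = np.asarray(amps)[:, None] * np.sin(phases)
    return envelope * tones.sum(axis=0)

def synthesize_frame(freqs_hz, amps, sample_rate, frame_s,
                     f0_hz=None, noise_rate_hz=200.0, rng=None):
    """Overlap-add pulses: periodic when f0_hz is given, random otherwise."""
    n = int(round(frame_s * sample_rate))
    pulse = damped_pulse(freqs_hz, amps, sample_rate)
    out = np.zeros(n + len(pulse))
    if f0_hz is not None:
        # Voiced: one pulse per fundamental period.
        onsets = np.arange(0, n, int(round(sample_rate / f0_hz)))
    else:
        # Unvoiced: pulse onsets at random times (mean rate is assumed).
        rng = np.random.default_rng() if rng is None else rng
        count = max(1, int(round(noise_rate_hz * frame_s)))
        onsets = rng.integers(0, n, size=count)
    for onset in onsets:
        out[onset:onset + len(pulse)] += pulse
    return out[:n]
```

In use, a smoothed magnitude spectrum for one frame would be passed to `masked_spectrum`, the resulting peak indices converted to frequencies by multiplying by the bin spacing, and the peak amplitudes and frequencies handed to `synthesize_frame` together with the measured fundamental frequency (or `None` for an unvoiced frame).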

Acknowledgments
This work was supported by a grant from the National Institutes of Health (2-R01-DC01661) to Western Michigan University. We are grateful to Robert Shannon of House Ear Institute for making the consonant recordings available to us and to Michael Dorman of Arizona State University for providing the HINT sentences. We would also like to thank Michael Clark for helpful comments on an earlier draft.