Research Article  |   April 01, 1994
Speech Enhancement Based on a Sinusoidal Model
 
Author Affiliations & Notes
  • James M. Kates
    Center for Research in Speech and Hearing Sciences City University of New York
  • Contact author: James M. Kates, Center for Research in Speech and Hearing Sciences, City University of New York, Graduate Center, Room 901, 33 West 42nd Street, New York, NY 10036.
Article Information
Journal of Speech, Language, and Hearing Research, April 1994, Vol. 37, 449-464. doi:10.1044/jshr.3702.449
History: Received April 16, 1993; Accepted November 16, 1993
 

Sinusoidal modeling is a new procedure for representing the speech signal. In this approach, the signal is divided into overlapping segments, the Fourier transform is computed for each segment, and a set of desired spectral peaks is identified. The speech is then resynthesized using sinusoids that have the frequency, amplitude, and phase of the selected peaks, with the remaining spectral information being discarded. Using a limited number of sinusoids to reproduce speech in a background of multi-talker speech babble results in a speech signal that has an improved signal-to-noise ratio and enhanced spectral contrast. The more intense spectral components, assumed to be primarily the desired speech, are reproduced, whereas the less intense components, assumed to be primarily background noise, are not. To test the effectiveness of this processing approach as a noise suppression technique, both consonant recognition and perceived speech intelligibility were determined in quiet and in noise for a group of subjects with normal hearing as the number of sinusoids used to represent isolated speech tokens was varied. The results show that reducing the number of sinusoids used to represent the speech causes reduced consonant recognition and perceived intelligibility both in quiet and in noise, and suggest that similar results would be expected for listeners with hearing impairments.
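To make the analysis/resynthesis idea concrete, the following is a minimal sketch of peak-picking sinusoidal modeling in Python with NumPy. The frame length, hop size, window choice, peak-selection rule, and all function and variable names are illustrative assumptions for this sketch, not the parameters or implementation used in the article.

```python
import numpy as np

def sinusoidal_resynthesis(signal, fs, n_sines=16, frame_len=512, hop=256):
    """Resynthesize `signal` from its `n_sines` strongest spectral peaks per frame."""
    window = np.hanning(frame_len)
    output = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    t = np.arange(frame_len) / fs

    for start in range(0, len(signal) - frame_len, hop):
        # Window the segment and compute its Fourier transform.
        frame = signal[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        mag = np.abs(spectrum)

        # Keep bins that are local maxima, then take the n_sines largest;
        # the remaining spectral information is discarded.
        peaks = [k for k in range(1, len(mag) - 1)
                 if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]
        peaks = sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_sines]

        # Rebuild the segment as a sum of sinusoids having the frequency,
        # amplitude, and phase of the selected peaks.
        synth = np.zeros(frame_len)
        for k in peaks:
            freq = k * fs / frame_len
            amp = 2.0 * mag[k] / np.sum(window)
            phase = np.angle(spectrum[k])
            synth += amp * np.cos(2.0 * np.pi * freq * t + phase)

        # Overlap-add the windowed resynthesis back into the output.
        output[start:start + frame_len] += synth * window
        norm[start:start + frame_len] += window ** 2

    return output / np.maximum(norm, 1e-12)
```

Because only the most intense peaks are retained, components dominated by background noise tend to be dropped, which is the basis of the noise-suppression effect described above; varying `n_sines` corresponds to varying the number of sinusoids used to represent the speech tokens.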

Acknowledgments
The assistance of Janet Reath Schoepflin in testing the subjects is greatly appreciated. This work was supported by NIDCD under grant 2P01DC00178.