A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
Science Journal of Circuits, Systems and Signal Processing
Volume 6, Issue 2, April 2017, Pages: 11-17
Received: Sep. 11, 2017; Accepted: Sep. 21, 2017; Published: Oct. 23, 2017
Kazi Mahmudul Hassan, Department of Computer Science & Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
Ekramul Hamid, Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
Khademul Islam Molla, Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
This paper presents an algorithm for voiced/unvoiced classification of noisy speech based on two acoustic features of the speech signal. Short-time energy (STE) and short-time zero-crossing rate (ZCR) are among the most discriminative time-domain features for classifying the voicing activity of speech into voiced and unvoiced segments. In the proposed approach, a narrowband speech signal is processed frame by frame using its spectrogram image, and STE and ZCR are used to classify its voiced and unvoiced parts. In the first stage, each frame of the analyzed spectrogram is divided into three separate sub-bands, and the pattern of their short-time energy ratios is examined. An energy-ratio pattern-matching lookup table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 but fails on the remaining patterns in the lookup table. These remaining patterns are therefore resolved in a second stage, in which the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold is calculated from the short-time average zero-crossing rate of white Gaussian noise (wGn). The accuracy of the proposed method is evaluated on both male and female speech under different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves higher accuracy than conventional methods in the literature.
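The first-stage sub-band analysis might look roughly like the Python sketch below. It computes a power spectrogram (squared magnitude of a Hann-windowed short-time FFT) numerically rather than from an image, splits each frame's frequency bins into three bands, and returns each band's share of the frame energy. The helper names (`spectrogram`, `subband_energy_ratios`), the 8 kHz sampling rate, the frame and hop sizes, and the band edges of 0 to 1 kHz, 1 to 2 kHz, and 2 to 4 kHz are all hypothetical choices for illustration; the paper's actual band split and its energy-ratio lookup table are not reproduced here.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Power spectrogram: |rFFT|^2 of Hann-windowed frames, shape (n_frames, frame_len // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def subband_energy_ratios(x, fs=8000, frame_len=256, hop=128, edges=(0, 1000, 2000, 4000)):
    """Return an (n_frames, 3) array with the fraction of each frame's energy in each band.

    `edges` (Hz) are hypothetical band boundaries, not the paper's values.
    """
    power = spectrogram(x, frame_len, hop)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Include the upper edge only for the last band so the Nyquist bin is counted once.
        upper = freqs <= hi if hi == edges[-1] else freqs < hi
        mask = (freqs >= lo) & upper
        bands.append(power[:, mask].sum(axis=1))
    energies = np.stack(bands, axis=1)
    total = energies.sum(axis=1, keepdims=True)
    # Guard against division by zero on silent frames (they get all-zero ratios).
    return energies / np.maximum(total, np.finfo(float).tiny)
```

A lookup table would then map each frame's three-ratio pattern to a voicing decision. Because the abstract does not list the paper's patterns, none are encoded in this sketch.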
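The two time-domain features and the noise-derived ZCR threshold can be illustrated with the Python sketch below. It uses the standard textbook definitions (short-time energy as the sum of squared samples in a frame, ZCR as the fraction of adjacent sample pairs that change sign, as in Rabiner and Schafer). The function names, frame and hop sizes, the use of the mean ZCR of unit-variance white Gaussian noise as the threshold, and the rule "ZCR below threshold means voiced" are assumptions for illustration, not the paper's exact second-stage procedure.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one per row); leftover tail samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    # Sum of squared samples in each frame (rectangular window assumed).
    return np.sum(frames ** 2, axis=1)

def short_time_zcr(frames):
    # Fraction of adjacent sample pairs whose sign differs, per frame.
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive so they are not crossings by themselves
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def wgn_zcr_threshold(frame_len, hop, n_samples=16000, seed=0):
    # Hypothetical rule: mean short-time ZCR of unit-variance white Gaussian noise.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    return float(np.mean(short_time_zcr(frame_signal(noise, frame_len, hop))))

def classify_by_zcr(x, frame_len=256, hop=128, threshold=None):
    """Label each frame voiced (True) when its ZCR falls below the noise-derived threshold."""
    if threshold is None:
        threshold = wgn_zcr_threshold(frame_len, hop)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    return short_time_zcr(frames) < threshold
```

Independent zero-mean Gaussian samples change sign with probability 1/2, so this threshold comes out near 0.5. A low-frequency voiced-like tone (for example, 200 Hz at 8 kHz, about 0.05 crossings per sample) falls well below it.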
Kazi Mahmudul Hassan, Ekramul Hamid, Khademul Islam Molla, “A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image,” Science Journal of Circuits, Systems and Signal Processing, Vol. 6, No. 2, 2017, pp. 11-17.
References
J. K. Lee and C. D. Yoo, “Wavelet speech enhancement based on voiced/unvoiced decision,” in Proc. 32nd International Congress and Exposition on Noise Control Engineering, Jeju International Convention Center, Seogwipo, Korea, August 25-28, 2003 (Korea Advanced Institute of Science and Technology).
B. Atal and L. Rabiner, “A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition,” IEEE Trans. ASSP, vol. ASSP-24, pp. 201-212, 1976.
S. Ahmadi and A. S. Spanias, “Cepstrum-Based Pitch Detection using a New Statistical V/UV Classification Algorithm,” IEEE Trans. Speech Audio Processing, vol. 7, no. 3, pp. 333-338, 1999.
Y. Qi and B. R. Hunt, “Voiced-Unvoiced-Silence Classifications of Speech using Hybrid Features and a Network Classifier,” IEEE Trans. Speech Audio Processing, vol. 1, no. 2, pp. 250-255, 1993.
L. Siegel, “A Procedure for using Pattern Classification Techniques to obtain a Voiced/Unvoiced Classifier,” IEEE Trans. ASSP, vol. ASSP-27, pp. 83-88, 1979.
T. L. Burrows, “Speech Processing with Linear and Neural Network Models,” Ph.D. thesis, Cambridge University Engineering Department, U.K., 1996.
D. G. Childers, M. Hahn, and J. N. Larar, “Silent and Voiced/Unvoiced/Mixed Excitation (Four-Way) Classification of Speech,” IEEE Trans. ASSP, vol. 37, no. 11, pp. 1771-1774, 1989.
J. K. Shah, A. N. Iyer, B. Y. Smolenski, and R. E. Yantorno, “Robust voiced/unvoiced classification using novel features and Gaussian Mixture model,” Speech Processing Lab., ECE Dept., Temple University, Philadelphia, PA, USA.
J. Marvan, “Voice Activity detection Method and Apparatus for voiced/unvoiced decision and Pitch Estimation in a Noisy speech feature extraction,” U.S. Patent Application Publication 20070198251, Aug. 23, 2007.
L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice Hall, 1978, 512 pp., ISBN-13: 9780132136037.
K. Kafadar, “Gaussian white-noise generation for digital signal synthesis,” IEEE Trans. Instrumentation and Measurement, vol. IM-35, no. 4, Dec. 1986, DOI: 10.1109/TIM.1986.6499122.
I. R. Titze, Principles of Voice Production. Prentice Hall (currently published by NCVS.org), 1994, p. 188, ISBN 978-0-13-717893-3.
R. J. Baken, Clinical Measurement of Speech and Voice. London: Taylor and Francis Ltd., 1987, p. 177, ISBN 1-5659-3869-0.
A. Alkulaibi, J. J. Soraghan, and T. S. Durrani, “Fast HOS based simultaneous voiced/unvoiced detection and pitch estimation using 3-level binary speech signals,” in Proc. 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, pp. 194-197, 1996.
Lobo and P. Loizou, “Voiced/unvoiced speech discrimination in noise using Gabor atomic decomposition,” in Proc. ICASSP, pp. 820-823, 2003.