
Evaluation of Speech Activity for Interview in NIST Speaker Recognition

Nirmal Kumar P., Venkatesh C.


Interview speech in recent NIST Speaker Recognition Evaluations (SREs) has required the development of voice activity detectors (VADs) that can work under extremely low signal-to-noise ratios (SNRs). This paper highlights the characteristics of interview speech files in NIST SREs and examines the difficulties of detecting speech/non-speech segments in these files. To address these difficulties, this paper proposes a VAD that uses noise reduction as a pre-processing step. A strategy to avoid the undesirable effects of impulsive signals and sinusoidal background signals on the VAD is also proposed. The proposed VAD is compared with the VAD in the ETSI-AMR speech coder for removing silence regions from interview speech files. The results show that the proposed VAD is more effective at detecting speech segments under low SNR, leading to a significant performance gain in Common Conditions 1–4 of NIST 2008 SRE.
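As an illustration only, the idea of applying noise reduction before the speech/non-speech decision can be sketched as follows. The sketch assumes power spectral subtraction (named in the keywords) followed by a simple frame-energy threshold. The function names, frame sizes, spectral floor, and 6 dB margin are hypothetical choices, not the authors' settings, and the handling of impulsive and sinusoidal background signals is not reproduced here.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction_vad(x, frame_len=256, hop=128, noise_frames=10,
                             floor=0.01, threshold_db=6.0):
    """Return one boolean per frame, True where speech is assumed present.

    Assumptions (not taken from the paper): the first `noise_frames` frames
    contain noise only, and a fixed dB margin above the residual noise level
    is enough to mark a frame as speech.
    """
    x = np.asarray(x, dtype=float)
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Noise power spectrum estimated from the leading frames.
    noise_psd = power[:noise_frames].mean(axis=0)

    # Power spectral subtraction; the floor keeps every bin non-negative.
    clean = np.maximum(power - noise_psd, floor * noise_psd)

    # Frame energy after noise reduction, in dB (epsilon avoids log of zero).
    energy_db = 10.0 * np.log10(clean.sum(axis=1) + 1e-12)
    noise_db = np.median(energy_db[:noise_frames])
    return energy_db > noise_db + threshold_db
```

Calling `spectral_subtraction_vad(signal)` on a mono waveform stored as a NumPy array yields a frame-level mask; frames marked `False` would be the silence regions dropped before speaker verification. A fixed threshold is the simplest possible decision rule and is used here only to keep the sketch short.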


Keywords: Speech activity detection, far-field microphone, speaker verification, noise reduction, spectral subtraction, NIST speaker recognition evaluations




