Date of Publication: 8th August 2017
Abstract: In recent times, synthetic voices have been used to deceive biometric access systems based on speaker recognition. This paper presents synthetic speech detection as a spoofing countermeasure for automatic speaker verification (ASV) systems. The canonical Mel Frequency Cepstral Coefficients (MFCC) algorithm is used for feature extraction, and a Support Vector Machine (SVM) is used to classify voices as natural or synthetic. Several experiments carried out on the ASVspoof 2015 database show that a nonlinear SVM performs better than a linear SVM.