Author: Naina Teertha
Date of Publication: 7th May 2016
Abstract: In this paper, we investigate the use of a new continuity measure based on maximum signal correlation for optimal selection of units in a concatenative text-to-speech (TTS) synthesis framework. We explore two formulations for calculating the signal correlation: one based on cross-correlation (CC) and one based on the average magnitude difference function (AMDF). We first perform an initial experiment to understand the significance of the approach and then build 5 experimental systems, which are available as a web demo. Evaluations on 30 sentences each for Telugu and Hindi by native users of the languages show that the proposed continuity measure results in more natural-sounding synthesis.
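To make the two formulations concrete, the following is a minimal sketch of how a join between two candidate units could be scored. The abstract does not give the paper's exact definitions, so everything here is an assumption for illustration: the function names `max_cross_correlation` and `min_amdf`, the normalization of the cross-correlation, and the idea of searching lags from 0 to `max_lag` over the tail of the left unit and the head of the right unit.

```python
import math

def max_cross_correlation(left_tail, right_head, max_lag):
    """Hypothetical CC-based continuity score.

    Slides the tail of the left unit against the head of the right unit
    and returns (best_lag, score), where score is the highest normalized
    cross-correlation found for lags 0..max_lag.
    """
    best_lag, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        n = min(len(left_tail) - lag, len(right_head))
        if n <= 0:
            break
        a = left_tail[lag:lag + n]
        b = right_head[:n]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

def min_amdf(left_tail, right_head, max_lag):
    """Hypothetical AMDF-based continuity score.

    Returns (best_lag, value), where value is the lowest average
    magnitude difference found for lags 0..max_lag (lower is smoother).
    """
    best_lag, best_value = 0, float("inf")
    for lag in range(max_lag + 1):
        n = min(len(left_tail) - lag, len(right_head))
        if n <= 0:
            break
        a = left_tail[lag:lag + n]
        b = right_head[:n]
        value = sum(abs(x - y) for x, y in zip(a, b)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value
```

Under these assumptions, a unit selection search would prefer the candidate pair with the highest CC score (or the lowest AMDF value), and the returned lag indicates where the two waveforms line up best at the join.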
References:
[1] S. K. Rallabandi, A. Vadapalli, S. Achanta, and S. V. Gangashetty, “IIIT-H’s entry to Blizzard Challenge 2015,” in Interspeech 2015.
[2] S. Kishore and A. W. Black, “Unit size in unit selection speech synthesis,” in Eighth European Conference on Speech Communication and Technology, 2003.
[3] K. Prahallad, A. R. Toth, and A. W. Black, “Automatic building of synthetic voices from large multi-paragraph speech databases,” in INTERSPEECH, 2007, pp. 2901–2904.
[4] H. A. Murthy, “Methods for improving the quality of syllable based speech synthesis,” 2008.
[5] M. V. Vinodh, A. Bellur, K. B. Narayan, D. M. Thakare, A. Susan, N. M. Suthakar, and H. A. Murthy, “Using polysyllabic units for text to speech synthesis in Indian languages,” in Communications (NCC), 2010 National Conference on, Jan 2010, pp. 1–5.
[6] K. S. Rao and B. Yegnanarayana, “Modeling durations of syllables using neural networks,” Computer Speech & Language, vol. 21, no. 2, pp. 282–295, 2007.
[7] A. Bellur, K. B. Narayan, K. R. Krishnan, and H. A. Murthy, “Prosody modeling for syllable-based concatenative speech synthesis of Hindi and Tamil,” in Communications (NCC), 2011 National Conference on, Jan 2011, pp. 1–5.
[8] H. R. Shiva Kumar, J. K. Ashwini, B. S. R. Rajaram, and A. G. Ramakrishnan, “MILE TTS for Tamil and Kannada for Blizzard Challenge 2013,” in Blizzard Challenge 2013 workshop, Barcelona, Catalonia. CMU, 2013.
[9] V. R. Lakkavalli, P. Arulmozhi, and A. G. Ramakrishnan, “Continuity metric for unit selection based text-to-speech synthesis,” in Signal Processing and Communications (SPCOM), 2010 International Conference on, July 2010, pp. 1–5.
[10] S. K. H. Rajaram, BSR and A. Ramakrishnan, “MILE TTS for Tamil for Blizzard Challenge 2014,” in Signal Processing and Communications (SPCOM), 2010 International Conference on. IEEE, 2010, pp. 1–5.
[11] T. Hirai and S. Tenpaku, “Using 5 ms segments in concatenative speech synthesis,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
[12] V. Peddinti and K. Prahallad, “Significance of vowel epenthesis in Telugu text-to-speech synthesis,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, May 2011, pp. 5348–5351.
[13] N. K. Elluru, A. Vadapalli, R. Elluru, H. Murthy, and K. Prahallad, “Is word-to-phone mapping better than phone-phone mapping for handling English words?” in ACL (2), 2013, pp. 196–200.