Developing an expressive prosody generator for Vietnamese Text-to-speech


Today, human-computer interaction is becoming more natural and increasingly similar to human-human interaction, including in its expressiveness (especially emotions and attitudes). In spoken communication, attitudes, or social affects, are mainly conveyed through prosody. In tonal languages such as Vietnamese, prosody is also used to encode semantic information via tones.

Adding expressivity (emotion, attitude) to human-machine speech communication is now an active topic in the speech processing field. Studying Vietnamese expressive speech and applying it to Text-To-Speech is one of the research directions of MICA Institute.

The work in [1] presented a preliminary attempt to add expressivity to Vietnamese speech synthesis. Based on the concept of prosodic contour superposition, a prosodic model was proposed to encode the attitudinal function of prosody for Vietnamese attitudes.
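As a rough illustration of what contour superposition means in general, the sketch below adds an utterance-level "attitude" contour to syllable-level tone contours in the semitone domain and converts the result back to Hz. This is not the model of [1]: the function names, the contour values, the reference frequency, and the choice of linear interpolation are all assumptions made for the example.

```python
import math

def semitones_to_hz(st, ref_hz):
    """Convert an offset in semitones (relative to ref_hz) to Hz."""
    return ref_hz * 2.0 ** (st / 12.0)

def resample(contour, n):
    """Linearly interpolate a contour (list of floats) to n points."""
    if n == 1:
        return [contour[0]]
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def superpose(tone_contours_st, attitude_contour_st, points_per_syllable, ref_hz):
    """Sum local tone contours and a global attitude contour, in semitones.

    tone_contours_st: one contour per syllable (hypothetical shapes).
    attitude_contour_st: one contour spanning the whole utterance.
    Returns the resulting F0 track in Hz.
    """
    n_total = len(tone_contours_st) * points_per_syllable
    attitude = resample(attitude_contour_st, n_total)
    local = []
    for tone in tone_contours_st:
        local.extend(resample(tone, points_per_syllable))
    return [semitones_to_hz(a + t, ref_hz) for a, t in zip(attitude, local)]

# Hypothetical two-syllable utterance: a level tone, then a rising tone,
# with an attitude contour that raises pitch toward the end.
f0_track = superpose(
    tone_contours_st=[[0.0, 0.0], [0.0, 4.0]],
    attitude_contour_st=[0.0, 2.0],
    points_per_syllable=3,
    ref_hz=200.0,
)
```

Working in semitones rather than Hz keeps the addition perceptually meaningful, since equal semitone offsets correspond to equal pitch ratios. A real generator would also have to handle duration and intensity, which this sketch leaves out.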

The objective of this project is to apply the proposed model to develop an expressive prosody generator and integrate it into the Vietnamese Text-To-Speech system of MICA Institute.



Work description:

This project includes the following principal tasks:

- First, students need to study the theory of:

  • Speech communication and speech processing
  • Prosody and prosody modeling
  • Vietnamese language

- Understand the prosodic model proposed in [1], then apply this model to develop a prosody generator module for expressive speech.

- Integrate this module into MICA’s Text-To-Speech system.

- Evaluate the system with perception tests.



No preliminary work is required; some basic knowledge of speech processing and digital signal processing would be useful for this work.

Programming skills: C++, Praat script


Student profile

- Engineering student in computer science, signal processing, or linguistics

- Vietnamese or French student (or from other countries)

Supervisors/contacts

- Dr. Mac Dang Khoa

- Dr. Tran Do Dat



[1] Mac, D. K., E. Castelli & V. Aubergé (2012). Modeling the prosody of Vietnamese attitudes for expressive speech synthesis. Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2012). Cape Town, South Africa: 114-118.