Title: Using synonyms in machine translation systems from/to Vietnamese


The word ‘translation’ refers to the transformation of text from one language into another. Machine translation is the automatic translation of text by a computer from one natural language into another. Today many software systems are available for translating between natural languages, both commercial and publicly available (the best known include Systran, Babel Fish and Google).


One of the newer approaches in machine translation is the statistical approach. Statistical machine translation generates translations using statistical methods trained on large parallel bilingual corpora of the source and target languages. The translation model is built from alignments of words or word sequences between the two languages. To improve the quality of the training process, additional information can be used.
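To illustrate how a translation model can be estimated from word alignments, here is a minimal sketch of IBM Model 1 trained with expectation-maximization on a toy Vietnamese – English corpus. IBM Model 1 is used only as a well-known example; the text does not say which alignment model the baseline systems rely on. The function name `train_ibm_model1` and the three sentence pairs are assumptions made for illustration, and the NULL source word of the full model is left out for brevity.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (no NULL word)."""
    target_vocab = {t for _, tgt in pairs for t in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over target words.
    t_prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of each source word

        # E-step: distribute each target word over the source words
        # of the same sentence, proportionally to the current t_prob.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    frac = t_prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac

        # M-step: renormalize the expected counts per source word.
        t_prob = defaultdict(
            float, {(t, s): c / total[s] for (t, s), c in count.items()}
        )
    return t_prob


# Toy parallel corpus (illustrative only): Vietnamese source, English target.
toy_corpus = [
    (["nhà", "lớn"], ["big", "house"]),
    (["nhà"], ["house"]),
    (["sách", "lớn"], ["big", "book"]),
]

if __name__ == "__main__":
    model = train_ibm_model1(toy_corpus)
    for (target, source), p in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"t({target} | {source}) = {p:.3f}")
```

With so little data the estimates are only indicative, which is precisely the situation where additional linguistic information may help.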


At the MICA center, we have developed baseline machine translation systems for Vietnamese – French and Vietnamese – English based on the statistical approach. Because the amount of training data for Vietnamese – French (and Vietnamese – English) is limited, adding other linguistic information can be useful. The Vietnamese vocabulary is very rich, with many synonyms and Sino-Vietnamese (Han Viet) words. In this project, we therefore want to investigate the following questions:

- How can synonyms be used in statistical machine translation?

- Can synonym data improve the quality of statistical machine translation for an under-resourced language (especially Vietnamese)?

Work description:


-          Study the state of the art in statistical machine translation

-          Study synonyms and the Sino-Vietnamese vocabulary

-          Propose a method for integrating these additional data into the statistical machine translation system


-          Build a synonym dictionary for Vietnamese and a Sino-Vietnamese dictionary

-          Apply the proposed method to use these data in the Vietnamese – French and Vietnamese – English statistical machine translation systems

-          Evaluate the effectiveness of the proposed method
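As one possible way to picture the integration step, the sketch below maps every word of a synonym set to a single representative before training, so that the counts of rare variants are pooled with those of the common form. This is only an assumed illustration, not the method to be proposed in the project. The synonym sets, the underscore-joined word segmentation, and the helper names `build_canonical_map` and `normalize` are all hypothetical.

```python
from collections import Counter

# Hypothetical synonym sets; "phi_cơ" and "hỏa_xa" are Sino-Vietnamese forms.
# Tokens are assumed to be word-segmented, with syllables joined by "_".
SYNONYM_SETS = [
    {"máy_bay", "phi_cơ"},             # airplane
    {"tàu_hỏa", "xe_lửa", "hỏa_xa"},   # train
]


def build_canonical_map(synonym_sets, corpus_counts):
    """Map each word to the most frequent member of its synonym set.

    Ties are broken alphabetically so the result is deterministic.
    """
    canonical = {}
    for group in synonym_sets:
        representative = max(sorted(group), key=lambda w: corpus_counts[w])
        for word in group:
            canonical[word] = representative
    return canonical


def normalize(tokens, canonical):
    """Replace each token by its representative; other tokens are unchanged."""
    return [canonical.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    # Toy source-side training sentences (illustrative only).
    sentences = [
        ["tôi", "đi", "máy_bay"],
        ["phi_cơ", "đến"],
        ["máy_bay", "lớn"],
    ]
    counts = Counter(tok for sent in sentences for tok in sent)
    canonical = build_canonical_map(SYNONYM_SETS, counts)
    for sent in sentences:
        print(normalize(sent, canonical))
```

Collapsing synonyms discards nuances of register and meaning, so alternatives such as adding synonym-substituted sentence pairs to the training data may also be worth comparing.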


Student prerequisites

This subject is intended for Vietnamese students. Preference will be given to students with a fairly good knowledge of linguistics, text processing, and programming.


Dr. Do Thi-Ngoc-Diep