University of Twente Student Theses


Preprocessing on bilingual data for Statistical Machine Translation

Fournier, Bas (2008) Preprocessing on bilingual data for Statistical Machine Translation.

[img] PDF
Abstract:Machine Translation (MT) is the translation of text from one human language to another by a computer. Computers, like all machines, are excellent at taking over repetitive and mundane tasks from humans. As translating long texts from one language to another qualifies as such a task, Machine Translation is a potentially very economic way of translation. Unfortunately natural languages are not very suitable for processing by a machine. They are ambiguous, illogical and constantly evolving, qualities that are difficult to handle with a machine. This makes the problem of Natural Language Processing, and by extension MT, a difficult one to solve. A theoretical method that can analyze a text in a natural language and decipher its semantic content can store this semantic content in a language-independent representation. From this representation, another text with the same semantic content can be generated in any language for which exists a generation mechanism. Such an MT architecture would provide high quality translations, and be modular; a new language could be added to the pool of inter-translatable languages simply by developing an analysis and generation method for that language. Unfortunately this method does not exist. Some existing MT attempts to approach it to a degree, but as long as semantic analysis remains an unsolved problem in the field of Natural Language Processing there can be no true language independent representation.
Item Type:Essay (Master)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Computer Science MSc (60300)
Link to this item:
Export this item as:BibTeX
HTML Citation
Reference Manager


Repository Staff Only: item control page