University of Twente Student Theses
Preprocessing on bilingual data for Statistical Machine Translation
Fournier, Bas (2008) Preprocessing on bilingual data for Statistical Machine Translation.
Abstract: | Machine Translation (MT) is the translation of text from one human language to another by a computer. Computers, like all machines, are excellent at taking over repetitive and mundane tasks from humans. Because translating long texts from one language to another qualifies as such a task, Machine Translation is potentially a very economical way of translating. Unfortunately, natural languages are not well suited to processing by a machine. They are ambiguous, illogical and constantly evolving, qualities that are difficult for a machine to handle. This makes Natural Language Processing, and by extension MT, a difficult problem to solve. A theoretical method that could analyze a text in a natural language and decipher its semantic content could store that content in a language-independent representation. From this representation, another text with the same semantic content could be generated in any language for which a generation mechanism exists. Such an MT architecture would provide high-quality translations and be modular: a new language could be added to the pool of inter-translatable languages simply by developing an analysis method and a generation method for that language. Unfortunately, this method does not exist. Some existing MT systems attempt to approach it to a degree, but as long as semantic analysis remains an unsolved problem in the field of Natural Language Processing, there can be no true language-independent representation. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/58377 |