Tree-Based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair

Title: Tree-Based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair

Authors: Beck, Daniel; Caseli, Helena

Abstract: Machine Learning paradigms have dominated recent research in Machine Translation. Current state-of-the-art approaches rely only on statistical methods that gather all necessary knowledge from parallel corpora. However, this lack of explicit linguistic knowledge makes them unable to model some linguistic phenomena. In this work, we focus on models that take into account syntactic information from the languages involved in the translation process. We follow a novel approach that preprocesses parallel corpora using syntactic parsers and uses translation models composed of Tree Transducers. We perform experiments with English and Brazilian Portuguese, providing the first known results in syntax-based Statistical Machine Translation for this language pair. These results show that this approach is better able to model phenomena such as long-distance reordering, and they give directions for future improvements in building syntax-based translation models for this pair.

Keywords: Statistical Machine Translation; Tree Transducers; Machine Learning

Pages: 15

DOI: 10.21528/lmln-vol11-no1-art2

Article PDF: vol11-no1-art2.pdf

BibTeX file: vol11-no1-art2.bib