Global view on Machine Translation

Main activities

Our main research interest is the symbiotic integration of human and machine translation (MT).

News

The HLT group at FBK, in conjunction with the ICT International Doctorate School of the University of Trento, is pleased to announce the availability of three fully-funded 3-year PhD positions

The 2014 edition of the MT Marathon, organised by the HLT group at FBK, will start next week

We are glad to acknowledge a financial gift from eBay Inc. to Marcello Federico in order to support research activities by the PhD students working in the HLT-MT research unit. 

Thanks to Ondrej for sending us this nice picture!

Our students José Camargo de Souza, Nick Ruiz and Amin Farajian participated in the Google NLP PhD Summit 2015, held at Google Zurich on 23-26 September 2015.

Marcello Federico presented the MMT project at the TAUS Annual Conference, 12-13 October 2015, San Jose, California.

Two papers were presented at the MT Summit XV (Miami, Florida, October 30 – November 3, 2015).

Marco Trombetti and Marcello Federico presented a tutorial on machine translation and the MateCat tool at the 37th Translating and the Computer Conference, in London. 

José presented the following work at the Second Italian Conference on Computational Linguistics:

Hamed Zamani, José G. C. de Souza, Matteo Negri, Marco Turchi and Daniele Falavigna

"Reference-free and Confidence-independent Binary Quality Estimation for Automatic Speech Recognition"

Sebastian Stüker presents on behalf of Marcello the overview paper:

Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli and Marcello Federico

Report on the 12th IWSLT Evaluation Campaign, IWSLT 2015

12th International Workshop on Spoken Language Translation, Da Nang, Vietnam, 2015

A. Bisazza, M. Federico, "A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena", Computational Linguistics, 2016. (Accepted for publication)

"WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words"    
Luisa Bentivogli, Mauro Cettolo, M. Amin Farajian, Marcello Federico.

"The IWSLT Evaluation Campaign: Challenges, Achievements, Future Directions"
Luisa Bentivogli, Marcello Federico, Sebastian Stüker, Mauro Cettolo, Jan Niehues

"An Unsupervised Method for Automatic Translation Memory Cleaning"
Masoud Jalili Sabet, Matteo Negri, Marco Turchi and Eduard Barbu

"Modern MT: A New Open-Source Machine Translation Platform for the Translation Industry"
U. Germann, E. Barbu, L. Bentivogli, N. Bertoldi, N. Bogoychev, C. Buck, D. Caroselli, L. Carvalho, A. Cattelan, R. Cattoni, M. Cettolo, M. Federico, B. Haddow, D. Madl, L. Mastrostefano, P. Mathur, A. Ruopp, A. Samiotou, V. Sudharshan, M. Trombetti, J. van der Meer

TMOP: A Tool for Unsupervised Translation Memory Cleaning
Masoud Jalili Sabet, Matteo Negri, Marco Turchi, José G. C. de Souza and Marcello Federico

TranscRater: A Tool for Automatic Speech Recognition Quality Estimation
Shahab Jalalvand, Matteo Negri, Marco Turchi, José G. C. de Souza and Daniele Falavigna

Neural versus Phrase-Based Machine Translation Quality: a Case Study
Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo and Marcello Federico

An Arabic-Hebrew parallel corpus of TED talks
Mauro Cettolo

Online Automatic Post-Editing across Domains
Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, Marco Turchi

Creating a Ground Truth Multilingual Dataset of News and Talk-Show Transcriptions through Crowdsourcing
Rachele Sprugnoli, Giovanni Moretti, Luisa Bentivogli, Diego Giuliani

Title of the thesis: "Adaptive Quality Estimation for Machine Translation and Automatic Speech Recognition"
Advisor: Matteo Negri
Co-advisors: Marco Turchi and Marcello Federico

"Continuous Learning from Human Post-edits for Neural Machine Translation"
Marco Turchi, Matteo Negri, Amin Farajian and Marcello Federico

"Linguistically Motivated Vocabulary Reduction for Neural Machine Translation"
Duygu Ataman, Matteo Negri, Marco Turchi and Marcello Federico

"MMT: New Open Source MT for the Translation Industry"
Nicola Bertoldi, Roldano Cattoni, Mauro Cettolo, Amin Farajian, Marcello Federico, Davide Caroselli, Luca Mastrostefano, Andrea Rossi, Marco Trombetti, Ulrich Germann, David Madl

"Guiding Neural Machine Translation Decoding with External Knowledge"
Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia, and Frédéric Blain

"Multi-Domain Neural Machine Translation through Unsupervised Adaptation"
M. Amin Farajian, Marco Turchi, Matteo Negri, Marcello Federico

"MMT: New Machine Translation Technology for CAT Tools"
Luisa Bentivogli, Marcello Federico

“Can Monolingual Embeddings Improve Neural Machine Translation?”
M.A. Di Gangi, M. Federico

“Multilingual Neural Machine Translation for Low Resource Languages”
S.M. Lakew, M.A. Di Gangi, M. Federico

"Assessing the Use of Terminology in Phrase-Based Statistical Machine Translation for Academic Course Catalogues Translation"
R. Scansani, M. Federico, L. Bentivogli

"Improving Zero-Shot Translation of Low-Resource Languages"
Surafel Melaku Lakew, Marcello Federico, Matteo Negri, Marco Turchi, and Quintino Francesco Lotito

"FBK’s Multilingual Neural Machine Translation System for IWSLT 2017"
Surafel Melaku Lakew, Marcello Federico, Matteo Negri, Marco Turchi, and Quintino Francesco Lotito

"Monolingual Embeddings for Low Resourced Neural Machine Translation"
Mattia Antonino Di Gangi and Marcello Federico

Projects

EU-BRIDGE aims to develop automatic transcription and translation technology that will enable innovative multimedia captioning and translation services for audiovisual documents between European and non-European languages. The project will provide streaming technology that can convert speech from lectures, meetings, and telephone conversations into text in another language.

MateCat addresses what is considered the new frontier of Computer Assisted Translation (CAT) technology: how to integrate Machine Translation (MT) effectively and ergonomically within the human translation workflow. While today MT is mainly trained with the objective of creating the most comprehensible output, in MateCat we target MT technology that will minimize the translator’s post-editing effort.

MosesCore aims to encourage the development and usage of open source machine translation tools. It will achieve this by organising:

CRACKER is a Machine Translation (MT) research initiative involving 7 major universities and research institutions (DFKI, FBK, Charles University, University of Edinburgh, University of Sheffield, Athena Research and Innovation Center in Information, Communication and Knowledge Technologies, and ELDA).

QT21 is a major European Machine Translation (MT) research and innovation initiative including 11 universities and research institutions (DFKI, FBK, RWTH, University of Amsterdam, DCU, University of Edinburgh, KIT, CNRS, Charles University, HKUST, and University of Sheffield) and 3 industry partners (TAUS, text&form, TILDE).

The goal of MMT is to deliver a language-independent commercial online translation service based on a new open-source distributed machine translation architecture. MMT does not require any initial training phase: once fed with training data, MMT will be ready to translate. MMT will de facto merge translation memory and machine translation technology into a single product. The quality of translations will increase as soon as new training data are added. MMT manages context automatically, so it will not require building domain-specific systems.

Our unit is glad to acknowledge a financial gift from eBay Inc. to Marcello Federico that will support research in machine translation by his PhD students during 2015. The gift has been used to co-fund the yearly bursaries of the students Prashant Mathur and José Camargo de Souza, as well as their travel expenses for an extended visit to eBay's labs in San Jose, California.

We are glad to acknowledge a gift by Translated Srl to Marcello Federico in order to support PhD students working on open source projects during 2016.  

Technology

A statistical machine translation system

A toolkit featuring algorithms and data structures to store and access very large n-gram language models

An extension of MGIZA++ that allows sentence pairs to be aligned in online mode

A Machine Translation Dataset Annotated with Binary Quality Judgements

Adaptive Quality Estimation tool for Machine Translation

En-Ita corpus with annotated bilingual terms in IT domain

A ready-to-use version for MT research purposes of the multilingual transcriptions of TED talks

English-Italian Word Alignment Gold Standard