Doctoral defence: Elizaveta Yankovskaya “Quality estimation through attention”

On 17 June at 16:15 Elizaveta Yankovskaya will defend her doctoral thesis “Quality estimation through attention” for obtaining the degree of Doctor of Philosophy (in Computer Science).

Supervisor:
Prof. Mark Fišel, University of Tartu

Opponents:
Prof. Dr. Rico Sennrich, University of Zurich (Switzerland)
Dr. Chi-Kiu Lo, National Research Council Canada (Canada)

Summary
Machine translation has become part of everyday life, not only for linguists and professional translators but for almost everyone. Most people who have used machine translation have come across funny and sometimes completely incorrect translations that turn the meaning of a sentence upside down. Thus, alongside a machine translation model, we need a scoring mechanism that informs people about the quality of translations. Of course, professional translators can assess and, if necessary, edit the machine translation output. However, using human annotations to evaluate the translations of online machine translation systems is extremely expensive and impractical. That is why automated systems for measuring translation quality are a crucial part of the machine translation pipeline.

Quality Estimation aims to predict the quality of machine translation output at run-time without using any gold-standard human annotations. In this work, we focused on Quality Estimation methods and explored the distribution of attention, one of the internal parameters of modern neural machine translation systems, as an indicator of translation quality. We first applied it to machine translation models based on recurrent neural networks (RNNs) and analyzed the performance of the proposed methods on unsupervised and supervised tasks. Since transformer-based machine translation models have supplanted RNN-based ones, we adapted our approach to the attention extracted from transformers. We demonstrated that attention-based methods are suitable for both supervised and unsupervised tasks, albeit with some limitations. Because obtaining annotation labels is quite expensive, we also looked at how much annotated data is needed to train a quality estimation model.

The defence will be held via Zoom (Meeting ID: 955 3734 8678, Passcode: ati).
