Machine translation is put to the test at Wikipedia hackathon

On 19–20 April 2017, the Estonian Language Technology 2017 conference will be held in Tallinn. On 18 April, the day before the conference, several workshops will take place. One of these is the machine translation hackathon, which tests how machine translation can contribute to the development of Wikipedia.

The Million+ project, launched on Mother Language Day, aims to increase the volume of the Estonian Wikipedia to one million articles. “How can machine translation contribute to this goal while preserving the quality of both language and content? This is the problem the hackathon sets out to solve,” said Kadri Vare, one of the organisers of the conference.

The hackathon uses the newest neural machine translation models, which produce fluent Estonian translations for post-editing. The participants are translators, who assess the different translation methods, and editors, who give a blind evaluation of post-edited machine translation and human translation.

One of the systems to be used is KaMa (Kasutatav Eesti Masintõlge), a machine translation project of the UT Institute of Computer Science, developed by Mark Fišel, head of the UT Chair of Language Technology. The translation software of Tilde Eesti OÜ, which provides machine translation services on the private market, will also be tested.

A workshop on a Python software library will also take place. Python offers several tools for processing texts in the Estonian language. In addition, a workshop is organised by Estonia’s first language technology start-up, TEXTA, whose toolkit of the same name is designed for exploring and analysing free-text (big) data. During the workshop, TEXTA is used to explore the document register of a ministry in Estonia. “For example, who writes to the ministry most often and on which topics, how much and what kind of personal information can be found in the published documents, and what kind of standard answers are used in official communication,” Vare said.

On 19 April, the programme includes an overview of the current National Programme for Estonian Language Technology and an introduction to the new programme starting next year. Participants can also get acquainted with language technology software and applications. 20 April is devoted to language resources projects at the Institute of the Estonian Language and is held in parallel with the traditional spring conference in applied linguistics.

Everyone interested is welcome! The conference and workshops are free of charge. Registration is required at www.keeletehnoloogia.ee.

Additional information:

Sirli Zupping, Million+ project manager, sirli.zupping@ut.ee
Kadri Vare, programme coordinator of the Center of Estonian Language Resources, kadri.vare@ut.ee

Viivika Eljand-Kärp
Press Officer of the UT
Phone: +372 737 5683
Mobile: +372 5354 0689