In the past few years, the author has contributed chapters to books and material to conferences about ‘post-editing’ [1, 2]. However, that term may give a blinkered view of the assistance MT can provide. Observations show that using speech recognition software alongside MT as a passive reference, to reduce active terminology research and even to inspire new ideas, can greatly increase the number of words translated per hour. A more optimistic view, then, is that more could be done to benefit from translation memory at a sub-sentence level. That optimism is now well founded: a startup called Lilt has cracked the auto-suggest nut, and it appears to be the first commercial system to offer adaptive MT that learns as translators work.
A favourite feature in Lilt is its MT-driven typing assistance: a kind of auto-complete mechanism that tries to finish each sentence. The proposals change depending on where the translator is in the sentence.
It feels more like a typing turbo-boost than the find-and-fix-mistakes, or post-editing, modus operandi imposed by conventional full-sentence MT systems. The ‘deep’ in Lilt’s ‘deep interactive’ MT refers to the fact that suggestions are generated from statistically aligned text on the server, rather than from a few words of type-ahead drawn from a single MT proposal, as in some other CAT tools.
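To give programming-minded readers a feel for the idea, here is a minimal sketch of prefix-driven sentence completion. It is purely illustrative, not Lilt’s implementation: the function name and the simple string-prefix matching over a ranked candidate list are assumptions standing in for the statistical machinery described above.

```python
def suggest_completion(prefix, candidates):
    """Return the remainder of the best-ranked candidate translation
    that starts with the prefix the translator has typed so far,
    or None if no candidate matches."""
    for cand in candidates:  # candidates assumed ranked best-first
        if cand.startswith(prefix):
            return cand[len(prefix):]
    return None

# Two hypothetical candidate translations of "The cat is asleep on the sofa."
candidates = [
    "Le chat dort sur le canapé.",
    "Le chat est endormi sur le canapé.",
]
# The translator has typed "Le chat est", so the first candidate is
# discarded and the suggestion follows the second one.
print(suggest_completion("Le chat est", candidates))
```

As the translator types, the matching candidate, and therefore the proposed completion, changes to follow their wording, which is what distinguishes this from a single static MT proposal.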
But that is not its only innovation. Lilt is also useful for texts that contain formatting tags. Translators who type at a keyboard dislike formatting tags, but those who dictate using speech recognition software hate them with a passion, as they interrupt flow. Lilt solves this problem by using word alignment statistics to guess the correct position of the tags in the target segment after it has been committed. Translators need not concern themselves with tags during translation, though it is a good idea to check that they have been placed correctly during bilingual review in the CAT tool or in the final formatted document.
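The tag-placement idea can be sketched in a few lines. Again, this is a hypothetical illustration, not Lilt’s code: it assumes the aligner supplies (source index, target index) word pairs and that each tag is anchored to the source word it precedes.

```python
def place_tags(target_tokens, alignments, tag_positions):
    """Project formatting tags from source to target.

    alignments: (source_index, target_index) pairs from a word aligner.
    tag_positions: maps each tag to the source token index it precedes.
    """
    align_map = dict(alignments)  # source index -> target index
    placed = []
    for tag, src_idx in tag_positions.items():
        # Fall back to the end of the segment if the word is unaligned.
        placed.append((align_map.get(src_idx, len(target_tokens)), tag))
    result = list(target_tokens)
    # Insert right to left so earlier insertions do not shift later positions.
    for tgt_idx, tag in sorted(placed, reverse=True):
        result.insert(tgt_idx, tag)
    return result

# "the <b>cat</b> sleeps" -> "le <b>chat</b> dort"
print(place_tags(["le", "chat", "dort"],
                 [(0, 0), (1, 1), (2, 2)],
                 {"<b>": 1, "</b>": 2}))
```

Real segments reorder words between languages, which is exactly why placements guessed this way still deserve a check during bilingual review.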
Many readers of Circuit will be glad to hear that Lilt works well for French to English and English to French. Here are a few tips and tricks that improve the odds of getting good results:
1) Upload the largest translation memory you can find that is relevant to the files you are translating. We advise a minimum of 20,000 segments, though you may be lucky translating against the baseline, out-of-the-box model. Because data security is a concern for most translators and agencies, that baseline model is trained on publicly available bilingual data, and the terms and conditions specify that uploaded data is used solely for the benefit of the translator or agency that uploaded it. Unlike Google Translate and the Microsoft Translator Hub, Lilt does not mine linguistic data to improve the quality of its MT for other users, so client non-disclosure agreements are respected.
2) Stick with it for a while. A good trial period lasts a few days, but a few weeks is better. The machine translation adapts as you translate, but adaptation is a two-way street: it takes time to get used to the typing assistance.
3) If you use Dragon NaturallySpeaking, note that because Lilt is browser-based you will need to untick the “Use the dictation box for unsupported applications” checkbox under Tools > Options > Miscellaneous. Also, in English, the spoken command “Press Control Enter” is a convenient way to move from one segment to the next, even if it is probably slightly slower than the equivalent keyboard shortcut. Translators working into other languages supported by Dragon should consult its documentation for the equivalent command.
Machine translation is not a threat to translators as long as the technology is used when it is sensible to do so.
John Moran studied Computer Science, Linguistics and German at Trinity College Dublin. In the 1990s he worked as a lecturer in technical translation and co-founded Transpiral, a technical translation agency based in Dublin, Ireland. Since then he has worked as a German-to-English technical translator, a software engineer and a researcher, specializing in measuring how linguistic technologies impact translator productivity in CAT tools.
1. Moran, J., Lewis, D., & Saam, C. (2014). “Analysis of post-editing data: A productivity field test using an instrumented CAT tool.” In L. Balling, M. Carl, S. O’Brien, M. Simard & L. Specia (eds.), Post-Editing: Processes, Technology and Applications (pp. 99–112). Newcastle upon Tyne: Cambridge Scholars Publishing.
2. Moran, J., Saam, C., & Lewis, D. (2014). “Towards desktop-based CAT tool instrumentation.” In Proceedings of the Third Workshop on Post-Editing Technology and Practice (pp. 99–112). Vancouver: AMTA.