
Dictation and MT: Strange but productive bedfellows

By John Moran

Most translators are interested in translator productivity, but my interest comes from three directions. My day job entails research into translator productivity with ADAPT, a large language technology-focused research centre in Ireland. I also have ties to Transpiral, a small translation agency I founded back in the 90s, so I am concerned with the productivity of other translators. For agency customers we specialize in German<>English technical translation and, more recently, Irish Gaelic for government customers. Finally, I work irregularly as a German-to-English reviewer and translator.

Arising from the research, ADAPT licenses an analysis suite called the iOmegaT Translator Productivity Testbench to large companies like Welocalize and Hewlett Packard, so I get to see productivity data on a larger scale than would be possible in a small agency like Transpiral. Initially, full-sentence MT was our main focus, but it turns out there is more to translator productivity than this. Various forms of interactive MT can be useful as a kind of souped-up auto-complete. Speed gains tend to be undramatic, but translators are presented with fragments of text so the MT is less likely to have a negative impact on writing style.

Dramatic productivity gains

However, in terms of productivity gains and applications across a range of text types, the most interesting computational linguistic technology I am aware of is dictation software. Some older translators were already dictating for manual transcription by typists before dictation software existed, so adopting it was a natural step. Others start to use it because of injuries such as repetitive strain injury, which suggests it may also be an effective preventative measure. However, they continue to use it afterward, as the productivity gains can be dramatic. Economically, unlike with high-utility customized MT, the productivity gain largely favours the individual translator who brings the technology to the table.

Unfortunately, Irish Gaelic is one of a long list of languages with no commercial speech recognition software. Luckily, our core target languages of German and English are recognized very accurately by the market-leading desktop automatic speech recognition application, Dragon NaturallySpeaking (DNS) by Nuance Communications (it's called Dragon Dictate on MacOS). DNS is also available for Dutch, Spanish, French, Italian and Japanese, but we don’t have much data on how accurate it is for those languages. For translators who translate into languages DNS does not support, Kevin Lossner’s blog Translation Tribulations contains information on how to use Nuance’s online speech recognition services via Android and iOS devices for many more languages.

Recently Dragoș Ciobanu, a translation studies researcher at the University of Leeds, published a paper in the online journal Tradumàtica titled “Of Dragons and Speech Recognition Wizards and Apprentices.” He analyzed the results of a questionnaire to which just over 40 translators who dictate responded. The median productivity gain was 35%, but many translators reported that dictation had doubled their productivity. This mirrors our own experience with German and English, particularly for technical content we are familiar with and for easier work that does not require much terminology research or contain many formatting tags.

Though dictation software seems to clash with auto-suggest features in some CAT tools, it can work well alongside full-sentence MT. Even if the word order of the MT output is poor, it can save on online terminology research or concordancing. In contrast to the post-editing use case, MT acts here as a visual terminology reference, ideally displayed somewhere to the side (not in the target segment). If our non-disclosure agreement permits it, I normally use DNS alongside Microsoft Translator Hub, as it can be customized and is strong on German to English IT material (my specialist area). For data security-sensitive customers, we sometimes use DoMT from Precision Language Tools, an inexpensive desktop statistical MT system based on the open-source Moses toolkit.

A few tips

Translators who translate into two languages, e.g. Dutch and English, normally buy the non-English (in this example, Dutch) version of the DNS Premium Edition, as it also contains English for the same price. It is also a good idea to train DNS’s language model on the target side of the project translation memory so that it knows what terminology to expect. I do this by creating a new MemoQ project, adding the TMX file as a source file to translate, exporting it as a two-column table in MS Word and finally deleting the source column.
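
For readers comfortable with scripting, the target side of a translation memory can also be pulled straight out of the TMX file. The following is a minimal Python sketch of that idea, not the MemoQ and Word workflow described above. It assumes a standard TMX layout (tu, tuv and seg elements, with the language in an xml:lang or older lang attribute) and simply drops the content of inline formatting tags. The function names are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the built-in xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segment_text(seg):
    """Return the visible text of a <seg>, skipping the content of inline tags."""
    parts = [seg.text or ""]
    for child in seg:
        # Keep only the text that follows each inline element (bpt, ept, ph, ...).
        parts.append(child.tail or "")
    return "".join(parts).strip()


def target_segments(tmx_source, target_lang):
    """Collect target-language segments from a TMX file path or file object.

    target_lang is matched as a prefix, so "en" also matches "en-US" or "en-GB".
    """
    tree = ET.parse(tmx_source)
    prefix = target_lang.lower()
    segments = []
    for tuv in tree.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        if not lang.startswith(prefix):
            continue
        seg = tuv.find("seg")
        if seg is not None:
            text = segment_text(seg)
            if text:
                segments.append(text)
    return segments


def write_training_text(segments, out_path):
    """Write one segment per line to a plain-text file."""
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(segments) + "\n")
```

Calling target_segments("project.tmx", "en") followed by write_training_text(...) would produce a plain-text file of English segments (the file name here is only a placeholder). Whether a given DNS edition accepts such a file for vocabulary training is something to check in the product itself.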

The Translation Tribulations blog also contains interesting guest posts on how to use the dictation software built into MacOS to dictate in languages like Portuguese, and a piece by Jim Wardell on how to evaluate CAT tool compatibility with DNS. Here, MemoQ, Wordfast Classic and DVX score high, but Trados Studio and Java-based CAT tools like Wordfast Pro and OmegaT prevent translators from using some of DNS’s more sophisticated features, such as correcting misrecognitions in the CAT tool editor to improve accuracy.

There are many other tips and tricks for working with dictation software. Interested translators can read more in the Translators who use Speech Recognition group and my own, somewhat whimsical, Technophile Translators group on Facebook. The sr_for_translators group on Yahoo Groups is also a good source of technical advice.


John Moran graduated with a degree in computer science, linguistics and German in 1997. He is a translation technology consultant and research fellow at ADAPT — a language technology research centre at Trinity College Dublin — where he is working on a Ph.D. in computer science on the topic of measuring the impact of translation technologies on translator productivity in running translation projects.
