Who wouldn’t want a free translation from any language to another, delivered in two shakes of a lamb’s tail? It seems that many of us do, or else the online world wouldn’t be filled with translation tools, each described as the best solution for every translation problem and need. For a professional translator, however, tools like Google Translate and Bing are, if not a nightmare, at least a source of endless debate, differing opinions, and sometimes even a direct cause of lost work. Indeed, why pay a professional translator if you can get the job done for free? There is nothing wrong with this train of thought as such, as long as you are ready to sacrifice quality. Or is that just a myth? Does the use of free translation tools automatically mean decreased quality, even in 2014, when technology develops and the language industry grows online faster than ever? To examine how good (or bad) some popular online translation tools actually are, we revisited our previous study and set ourselves a new challenge.
Two tools, five languages, five genres
To assess the quality of translations produced by machine translation tools, our team of language experts compiled a small specialized corpus (for more detailed information on building corpora, see for example Baker (2013), Koester (2010) and Lindquist (2011)) consisting of 50 English sentences drawn from publicly available sources. To ease comparison and give more insight into the relatively small amount of linguistic data, the corpus was divided into five genre-specific sub-corpora: Twitter, literature, news headlines, food recipes and legal texts. The ten sentences in each of the five genre categories were then translated into five languages – Finnish, Swedish, French, Russian and Dutch – with two online translation tools, Google Translate and Bing. These tools were chosen because they offered the widest range of language pairs; Finnish in particular is rarely available in machine translation, which prevented us from including more tools in the study.
After compiling the corpus and translating the sentences, each translated sentence was ranked on a scale from 0 to 4, where zero marks the worst and four the best possible grade. The criteria for a zero included, for example, several untranslated words, severe grammar mistakes and overall incomprehensibility. To earn a four, the translation had to be close to what a human translator would produce, though some stylistic or contextual problems were allowed. Other grades fell between these two extremes.
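The maxima quoted in the figures below follow directly from this setup: 5 genres × 10 sentences × a top grade of 4 gives 200 points per tool per language, and 40 per genre. The arithmetic can be sketched as follows (the example grades are hypothetical placeholders, not our actual data):

```python
# Scoring arithmetic for the study (example grades are hypothetical).
GENRES = ["Twitter", "literature", "news headlines", "food recipes", "legal texts"]
SENTENCES_PER_GENRE = 10
MAX_GRADE = 4

max_per_genre = SENTENCES_PER_GENRE * MAX_GRADE     # 40 points (Figures 3-7)
max_per_language = len(GENRES) * max_per_genre      # 200 points (Figures 1-2)

def total_score(grades_by_genre):
    """Sum the per-sentence grades (each 0-4) across all genres."""
    return sum(sum(grades) for grades in grades_by_genre.values())

# Hypothetical example: a language graded 2 on every sentence scores 100/200.
example = {genre: [2] * SENTENCES_PER_GENRE for genre in GENRES}
print(max_per_genre, max_per_language, total_score(example))  # 40 200 100
```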
From Twitter to national law: does machine translation stand any chance?
Overall, all languages ranked quite low compared with translations produced by humans. Swedish, with 120 points out of a possible 200, was the clear winner in our quality comparison, followed closely by Dutch, Russian and French. Finnish was left far behind with its 60 points. In the battle between Google Translate and Bing, the former was considered to produce better-quality translations across all five examined languages and genres.
Figure 1. Results organized according to languages (max. points attainable 200 per tool)
Figure 2. Results organized according to genres (max. points attainable 200 per tool)
However, given the small amount of analyzed data, the results cannot be generalized too far: a different set of sentences could have produced different results. Still, the order of the languages is unlikely to be random. The highest-ranked Swedish and the second-ranked Dutch are both Germanic languages, like the source language English, while the group’s backmarker, Finnish, differs drastically from English in syntax and word order. The middle-position holders, French and Russian, are also of non-Germanic origin, which would explain their lower scores compared with the Germanic languages. Nevertheless, given the amount of data available online in French and Russian and their immense popularity among language learners and lovers, it is not surprising that they rank higher than Finnish.
Figure 3. Quality scores of Twitter according to language (max. points attainable 40 per tool)
As Figure 3 demonstrates, Twitter as a genre received the lowest overall quality score. Some variation existed between the two tools: in Finnish and Dutch, Bing produced the better work, while in the remaining three languages Google Translate came out ahead. The low score is likely due to the structurally complicated nature of Twitter posts: a lot of information is condensed into a small number of characters, which leads to abbreviations, hashtags and other special symbols that machine translation tools seem unable to process. Interestingly, a relatively clear pattern emerged between the two tools in the treatment of hashtags: in the Finnish data set, for example, Bing left #teaching untouched, whereas Google Translate changed it into #opetus. The same trend held throughout the corpus.
Figure 4. Quality scores of literary texts according to language (max. points attainable 40 per tool)
Literature scored second-lowest in quality, and to many this probably comes as no surprise. Literary language often contains poetic expressions, complicated structures and other effects that make literary pieces so original. What is interesting in our findings, though, is that this time the Germanic languages Dutch and Swedish scored lower than non-Germanic French and Russian. Overall, Google Translate takes the win in this genre as well, leaving Bing behind in all but one language.
Figure 5. Quality scores of news headlines according to language (max. points attainable 40 per tool)
Like the two genres above, news headlines produced some interesting variation between the languages. Somewhat surprisingly, French scored the lowest, while three other languages – Swedish, Russian and Dutch – produced relatively good results. The elliptical vocabulary and short sentences typical of news discourse seem to have produced ungrammatical output that lowered the ranking for French. Once again, Google Translate proved the better-working tool.
Figure 6. Quality scores of food recipes according to language (max. points attainable 40 per tool)
The sub-corpus of sentences from food recipes was the genre category in which the translation tools produced the best results. The category was again dominated by Google Translate; especially noteworthy is that in Dutch it produced fully comprehensible results considered almost as good as those of a human translator. The good translation quality in this genre could be due to simple sentence structures, such as the frequent use of infinitives in English. However, unlike in some other categories, the lexis proved to be the Achilles’ heel of the tools: they failed to distinguish the senses of polysemic words, such as to serve and icing, which lowered the score especially for Finnish.
Figure 7. Quality scores of legal texts according to language (max. points attainable 40 per tool)
Legal texts provided another triumph for Google Translate: it was considered better than Bing in all five languages. However, the overall scores were relatively low in all languages besides Swedish and French, likely owing to the complicated, field-specific terminology and sentence structures often encountered in legal discourse. Interestingly, several sentences in this genre were given a four, while equally many deserved only a zero or a one. Drawing any definite conclusion from the patterns in this genre is therefore not possible.
Food for thought
As our small study demonstrates, there is both good and bad in machine translation and the tools built for it. There is no doubt that, as they stand, machine translation tools could not replace human translators. This is demonstrated, for example, by their inability to disambiguate polysemic words in context or to handle complicated grammatical features, such as case endings. However, these tools are not entirely bad either: they worked relatively well with short, simple sentences and instructive texts.
While online translation tools might work quite well for getting the gist of an article in a foreign newspaper, or for preparing that mouth-watering cake whose recipe you couldn’t find in your own language, it is probably safest to leave legal and literary translations to human translators. Or what do you think: would you choose the product of a machine over a text with a human touch?
Baker, P. (2013). Using corpora in discourse analysis. London: Bloomsbury.
Koester, A. (2010). Building small specialised corpora. In M. McCarthy & A. O’Keeffe (Eds.), The Routledge handbook of corpus linguistics (pp. 66-79). London: Routledge.
Lindquist, H. (2011). Corpus linguistics and the description of English. Edinburgh: Edinburgh University Press.