

BING TRANSLATOR vs. GOOGLE TRANSLATE – the Right Text for the Right Tool

Who wouldn’t want a free translation from any language into another, delivered in two shakes of a lamb’s tail? It seems many of us do; otherwise the online world wouldn’t be filled with translation tools, each described as the best solution for all translation problems and needs. For a professional translator, however, tools like Google Translate and Bing are, if not a nightmare, at least a source of endless debate, differing opinions and sometimes even lost work. Indeed, why pay a professional translator if you can get the job done for free? There is nothing wrong with this train of thought as such, as long as you are ready to sacrifice quality. Or is that just a myth? Does using free translation tools necessarily mean lower quality, even in 2014, when technology develops and the online language industry grows faster than ever? To examine how good (or bad) some popular online translation tools actually are, we revisited our previous study and set ourselves a new challenge.

 

Two tools, five languages, five genres

To assess the quality of translations produced by machine translation tools, our team of language experts compiled a small specialized corpus (for more detailed information on building corpora, see for example Baker (2013), Koester (2010) and Lindquist (2011)) consisting of 50 English sentences drawn from publicly available sources. To ease comparison and draw more insight from the relatively small amount of linguistic data, the corpus was further divided into five genre-specific sub-corpora: Twitter, literature, news headlines, food recipes and legal texts. The ten sentences in each of the five genre categories were then translated into five languages – Finnish, Swedish, French, Russian and Dutch – with the help of two online translation tools, Google Translate and Bing. These tools were chosen because they offered the widest range of language pairs; Finnish in particular is rarely available in machine translation tools, which prevented us from including more tools in the study.

After compiling the corpus and translating the sentences, each translated sentence was rated on a scale from 0 to 4, where zero marks the worst and four the best possible grade. Criteria for a zero included, for example, several untranslated words, severe grammar mistakes and overall incomprehensibility. To earn a four, the translation had to be close to what a human translator would produce, though some stylistic or contextual problems were allowed. The other marks fell between these two extremes.
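For readers who like to see the bookkeeping spelled out, here is a minimal Python sketch of how ratings on this 0-4 scale could be stored and summed per tool and language. The sample rows are made-up placeholders; only the arithmetic (50 sentences at up to 4 points each, giving a maximum of 200 per tool and language) mirrors the study.

```python
from collections import defaultdict

# Each rating is (tool, language, genre, score), with score on the
# 0-4 scale described above. These rows are illustrative placeholders,
# not our actual data.
ratings = [
    ("Google Translate", "Swedish", "recipes", 3),
    ("Bing", "Swedish", "recipes", 2),
    ("Google Translate", "Finnish", "Twitter", 1),
    # ... in the full study: 50 sentences x 5 languages x 2 tools
]

totals = defaultdict(int)
for tool, language, _genre, score in ratings:
    totals[(tool, language)] += score

# 50 sentences rated 0-4 means at most 50 * 4 = 200 points
# per tool and language, matching the figures below.
MAX_POINTS = 50 * 4
for (tool, language), points in sorted(totals.items()):
    print(f"{tool} / {language}: {points}/{MAX_POINTS}")
```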

 

From Twitter to national law: does machine translation stand any chance?

Overall, all languages ranked quite low compared with translations produced by humans. Swedish, with 120 points out of a possible 200 (50 sentences rated 0 to 4 gives a maximum of 200 per tool), was the clear winner of our quality comparison, followed closely by Dutch, Russian and French. Finnish was left far behind with 60 points. In the battle between Google Translate and Bing, the former produced better-quality translations in all five of the examined languages and genres.

 

Figure 1. Results by language (max. 200 points attainable per tool)

Figure 2. Results by genre (max. 200 points attainable per tool)

However, given the small amount of analyzed data, the results shouldn’t be generalized too far: a different set of sentences could have produced different results. Still, the order of the languages is unlikely to be random. The highest-ranked Swedish and the second-ranked Dutch are both languages of Germanic origin, like the source language English, while the group’s backmarker, Finnish, differs drastically from English in syntax and word order. The mid-table French and Russian are also of non-Germanic origin, which would explain their lower scores compared to the Germanic languages. Nevertheless, given the amount of data available online in these languages and their immense popularity among language learners and lovers, it’s not surprising that they rank higher than Finnish.

Twitter

Figure 3. Quality scores for the Twitter genre by language (max. 40 points attainable per tool)

As Figure 3 demonstrates, Twitter as a genre received the lowest overall quality score. Some variation existed between the two tools: in Finnish and Dutch, Bing produced the better work, while for the remaining three languages Google Translate was preferred. The low scores are probably due to the structurally complicated nature of Twitter posts: a lot of information is condensed into a small number of characters, which results in abbreviations, hashtags and other special symbols that machine translation tools seem unable to process. Interestingly enough, a relatively clear pattern emerged in the two tools’ treatment of hashtags: in the Finnish data set, for example, Bing left #teaching unchanged, whereas Google Translate changed it into #opetus. The same pattern held throughout the corpus.
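If you do need to machine-translate tweets, one pragmatic workaround is to shield hashtags and @-mentions from the engine and restore them afterwards. The sketch below shows the idea; translate() is a no-op stand-in for whichever MT service is used, and the regular expression is our own assumption about what should be protected.

```python
import re

def translate(text: str, target: str) -> str:
    """Placeholder for a call to an MT service (Google, Bing, ...)."""
    return text  # no-op stand-in, for illustration only

def translate_tweet(tweet: str, target: str) -> str:
    # Replace hashtags and @-mentions with opaque tokens the MT
    # engine will leave alone, then put them back after translation.
    tokens = re.findall(r"[#@]\w+", tweet)
    shielded = tweet
    for i, tok in enumerate(tokens):
        shielded = shielded.replace(tok, f"XTOK{i}X", 1)
    translated = translate(shielded, target)
    for i, tok in enumerate(tokens):
        translated = translated.replace(f"XTOK{i}X", tok, 1)
    return translated

print(translate_tweet("Loving the new ideas in #teaching today!", "fi"))
```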

Literature

Figure 4. Quality scores for literary texts by language (max. 40 points attainable per tool)

Literature scored second-lowest in terms of quality, which probably comes as no surprise to many. Literary language often contains poetic expressions, complicated structures and other effects that make literary pieces so original. What is interesting in our findings, though, is that this time the Germanic languages Dutch and Swedish scored lower than the non-Germanic French and Russian. Overall, Google Translate takes the win in this genre as well, leaving Bing behind in all but one language.

News headlines

Figure 5. Quality scores for news headlines by language (max. 40 points attainable per tool)

Like the two aforementioned genres, news headlines produced some interesting variation between the languages. Somewhat surprisingly, French scored the lowest, while the other three big languages – Swedish, Russian and Dutch – produced relatively good results. The elliptical vocabulary and short, compressed sentences typical of news discourse seem to have produced ungrammatical output and lowered the ranking for French. And once again, Google Translate proved the better-working tool.

Food recipes

Figure 6. Quality scores for food recipes by language (max. 40 points attainable per tool)

The sub-corpus of sentences from food recipes was the genre category in which the translation tools produced their best results. It was again dominated by Google Translate, and what is especially noteworthy is that in Dutch the tool produced fully comprehensible results considered almost as good as those of a human translator. The good quality of translations in this genre is probably due to simple sentence structures, such as the frequent use of infinitive structures in English. Unlike in some other categories, however, the lexis proved to be the tools’ Achilles heel: they failed to distinguish the senses of polysemous words, such as to serve and icing, which lowered the score especially for Finnish.

Legal texts

Figure 7. Quality scores for legal texts by language (max. 40 points attainable per tool)

Legal texts provided another triumph for Google Translate: it was considered better than Bing in all five languages. The overall scores, however, were relatively low in all languages besides Swedish and French, likely due to the complicated, field-specific terminology and sentence structures often encountered in legal discourse. Interestingly enough, in this genre several sentences were given a four, while equally many deserved only a zero or a one. Thus, no definite conclusions can be drawn from the patterns in this genre.

 

Food for thought

As our small study demonstrates, there is both good and bad in machine translation and in the tools built for it. There is no doubt that, as they stand now, machine translation tools could not replace human translators. This is demonstrated, for example, by their inability to distinguish contexts and the senses of polysemous words, or to handle complicated grammatical structures such as case endings. However, these tools can’t be called completely useless either: they worked relatively well with short, simple sentences and instructive texts.

While online translation tools might work quite well for getting the gist of an article from a foreign newspaper, or for preparing that mouth-watering cake whose recipe you couldn’t find in your own language, it is probably safest to leave legal and literary translations to human translators. Or what do you think: would you prefer the product of a machine over a text with a human touch?

 

Iiris Koskimies

 

References:

Baker, P. (2013). Using corpora in discourse analysis. London: Bloomsbury.

Koester, A. (2010). Building small specialised corpora. In M. McCarthy & A. O’Keeffe (Eds.), The Routledge handbook of corpus linguistics (pp. 66-79). London: Routledge.

Lindquist, H. (2011). Corpus linguistics and the description of English. Edinburgh: Edinburgh University Press.

 

 

PROMT, SYSTRAN, GOOGLE, BING – Has the age of machine translation finally arrived?

Checking which online machine translation service is best for you

Some claim that learning foreign languages is a waste of time, that translators will soon disappear from the professional market and that technology can get you from language A to language B in no time, for free and without any trouble. Myth or reality? The debate is open; however, it is true that technology can be a helpful tool, and a myriad of online translation services – also known as machine translation systems, or “MT” – can be found on the web. So many, in fact, that it can be challenging to find the right MT for the right text. This is why we have run a cross-comparative analysis of four MT systems – Systran’s Babel Fish, Google Translate, PROMT and Microsoft’s Bing Translator – across five languages.

The set-up: Turn on the machines!
Machine translation technology is a complex science. There are many types of MT, among them statistical, rule-based, example-based and hybrid systems. We decided to focus on the user’s perspective and test the efficiency of the four MT systems, leaving the technical details aside. If you wish to learn more about the different types of MT and how they work, see Hutchins and Somers (1992).
To carry out this experiment, we gathered a corpus of 500 sentences and submitted it to the four MT systems mentioned above. Ten language combinations were put on trial, 50 sentences per combination: English-French, English-German, English-Spanish, English-Italian and English-Portuguese, plus the reverse directions. Each sample of 50 sentences was translated with each of the four systems, and the results were evaluated on a scale from 0 to 3: 0 for untranslated or incomprehensible output, 1 when the meaning had to be guessed, 2 when the gist was correct despite grammatical mistakes, and 3 for a translation that would almost compete with the work of a professional translator. Each MT system could gather up to 120 points per language combination.
Because we wanted the corpus of sentences to be as well balanced as possible, each batch of 50 contained 5 sentences from each of 10 different areas: advertisement, business, financial, gastronomic, legal, literature, medical, religious, slang and Tweets. Each of these poses its own translation difficulties, so we also broke the scores down by area for each language combination, to find out which domains MT handles well and which it does not.
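As a rough illustration of how such scores can be tallied per language combination and per subject area, here is a short Python sketch. The tuple layout and the sample rows are hypothetical placeholders, not our actual data.

```python
from collections import defaultdict

# Each row: (mt_system, combination, area, score), with score on the
# 0-3 scale described above. These rows are illustrative only.
results = [
    ("Google Translate", "English-French", "medical", 3),
    ("Bing Translator", "English-French", "medical", 2),
    ("PROMT", "English-French", "literature", 0),
    # ... 50 sentences (5 per area) per combination in the full study
]

per_combination = defaultdict(int)
per_area = defaultdict(int)
for mt, combo, area, score in results:
    per_combination[(mt, combo)] += score
    per_area[(mt, combo, area)] += score

for (mt, combo), points in sorted(per_combination.items()):
    print(f"{mt}, {combo}: {points} points")
```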

Overall results: Who wins the battle?
Here are the results that we obtained:


Chart 1. Comparison of the 4 MT systems across 10 language combinations
Unit: points (out of a maximum attainable of 120)

The results show interesting trends. Overall, Google Translate seems to provide better translations than the other MT systems, followed by Bing Translator, with Systran and PROMT at the rear. The only instance where Bing scored highest was Spanish-English. Some language combinations, such as French-English, Italian-English and Portuguese-English, produced relatively good results; Google Translate gathered over 80 points out of 120 in these combinations, approaching 75% of a perfect score. The Spanish results, however, were very weak across all four systems. Likewise, German turned out to be more challenging to translate, as both source and target language. Winner: Human translation. Runner-up: Google Translate.

The chart below shows how well each MT system performed across all languages and how many points it gathered out of a total of 1,200. Because some language combinations, as well as some areas of translation, are so challenging, the overall results are mediocre: only about a third of the output is correct for PROMT and Systran, and about half for Bing and Google. There is still significant progress to be made before MT can be used with full reliability and accuracy.


Chart 2. Total points gathered across all languages
Unit: points (out of a maximum attainable of 1,200)

Digging deeper: Results across subject areas
From the graphs below, we can identify some trends. On the “winning” side, medical translation (probably thanks to its straightforward descriptive texts) is handled best by MT, ranking in the top 3 in 7 out of 10 language combinations. Advertisements are generally well translated too – 6 times in the top 3, once in the bottom 3 – largely because ads use simple, memorable sentences. Finally, recipes (gastronomic) appear 5 times in the top 3 and twice in the bottom 3. This area did especially well when English was the source language, as English recipes are written in the imperative mood, an easy form for MT to handle.
Looking at the bad players of the experiment, literary translation is, without much surprise, the worst-case scenario for MT: elaborate syntax, rare words and unusual figures of speech, among other difficulties. As a result, literature made it into the bottom 3 in 9 of the 10 combinations. Slang and Tweets each appeared in the bottom 3 seven times. Slang uses metaphorical speech, which MT usually translates literally, resulting in nonsensical segments. Tweets contain elements the other sentences do not have, such as colloquial abbreviations, URLs and symbols (#, @). These sometimes induced the MT systems to make mistakes, on top of the often poor quality of the language used on Twitter: abbreviations, slang words, etc. See the detailed results in the graphs below.
Regardless of what you need to translate, do not expect perfection from MT. These systems can and do produce intelligible results in most cases, provided the source text is simple, well written and largely unambiguous. Out of the four MT systems tested, Google Translate turned out to be the best choice in most cases. Generally speaking, factual texts (medical, gastronomic) are more MT-friendly than creative writing (literature, Tweets). Plain texts and simple sentences are consequently easier for MT, as they avoid multiple possible translations – this is known as controlled language. Finally, we would advise you against relying solely on MT for important subjects or published texts – that could cause you serious embarrassment!
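To make the idea of controlled language concrete, here is a minimal sketch of a pre-flight check that flags sentences likely to trip up MT before they are submitted. The length threshold and the list of ambiguous words are illustrative assumptions, not rules derived from our study.

```python
# A minimal "controlled language" pre-check before sending text to MT.
# The word-count threshold and the ambiguous-word list below are
# illustrative assumptions, not findings from the experiment.
AMBIGUOUS = {"serve", "icing", "case", "charge"}
MAX_WORDS = 20

def mt_friendly(sentence: str) -> list:
    """Return a list of warnings; an empty list means the sentence
    looks reasonably safe to machine-translate."""
    words = sentence.lower().rstrip(".!?").split()
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append(f"long sentence ({len(words)} words)")
    for w in words:
        if w in AMBIGUOUS:
            warnings.append(f"possibly ambiguous word: '{w}'")
    return warnings

print(mt_friendly("Serve the cake once the icing has set."))
```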


Chart 3. Percentage of successful translation for English-French


Chart 4. Percentage of successful translation for French-English


Chart 5. Percentage of successful translation for English-German


Chart 6. Percentage of successful translation for German-English


Chart 7. Percentage of successful translation for English-Spanish


Chart 8. Percentage of successful translation for Spanish-English


Chart 9. Percentage of successful translation for English-Italian


Chart 10. Percentage of successful translation for Italian-English


Chart 11. Percentage of successful translation for English-Portuguese


Chart 12. Percentage of successful translation for Portuguese-English

John Barré