PROMT, SYSTRAN, GOOGLE, BING – Has the age of machine translation finally arrived?

Checking which online machine translation service is best for you

Some claim that learning foreign languages is a waste of time, that translators will soon disappear from the professional market and that technology can get you from language A to language B in no time, for free and without any trouble. Myth or reality? The debate is open. What is true is that technology can be a helpful tool, and a myriad of online translation tools (also known as machine translation systems, or “MT”) can be found on the web. There are so many that it can be challenging to find the right MT for the right text. This is why we have run a comparative analysis of four MT systems (Systran’s Babel Fish, Google Translate, PROMT and Microsoft’s Bing Translator), pairing English with five other languages.

The set-up: Turn on the machines!
Machine translation technology is a complex science. There are many types of MT, among them statistical, hybrid, rule-based and sentence-based systems. We decided to focus on the user’s perspective and test the efficiency of the four MT systems, leaving the technical part aside. If you wish to learn more about the different types of MT and how they work, please see Hutchins and Somers.
To carry out this experiment, we gathered a corpus of 500 sentences and submitted it to the four MT systems mentioned above. Ten language combinations were put on trial, with 50 sentences per combination: English-French, English-German, English-Spanish, English-Italian and English-Portuguese, plus the same pairs in reverse. Each sample of 50 sentences was translated by each of the four MT systems, and the results were rated on a scale from 0 to 3: 0 for untranslated or unintelligible output, 1 when the meaning had to be guessed, 2 when the gist was correct but there were grammatical mistakes, and 3 for a translation that could almost compete with the work of a professional translator. Each MT could gather up to 120 points per language combination evaluated.
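
The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the study: the function names and the sample ratings are hypothetical, and the attainable maximum is passed in as a parameter instead of being hard-coded.

```python
from typing import Sequence

# Rating scale described in the article (labels paraphrased).
SCALE = {
    0: "untranslated or not understandable",
    1: "meaning has to be guessed",
    2: "gist correct, with grammatical mistakes",
    3: "close to the work of a professional translator",
}


def total_points(ratings: Sequence[int]) -> int:
    """Sum per-sentence ratings after checking each one is on the 0-3 scale."""
    for rating in ratings:
        if rating not in SCALE:
            raise ValueError(f"rating {rating!r} is outside the 0-3 scale")
    return sum(ratings)


def share_of_maximum(points: int, max_points: int) -> float:
    """Express a point total as a percentage of the attainable maximum."""
    return 100.0 * points / max_points


# Hypothetical ratings for six sentences from one system (made-up data).
sample = [3, 2, 2, 0, 1, 3]
points = total_points(sample)
print(points, share_of_maximum(points, max_points=3 * len(sample)))
```

Keeping the maximum as an argument means the same helper can be reused for a single language combination or for the grand total across all of them.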
Because we wanted the corpus to be as well balanced as possible, each batch contained 5 sentences from each of 10 different areas: advertisement, business, financial, gastronomic, legal, literature, medical, religious, slang and Tweet. Each area has its own difficulties when it comes to translation. This is why we broke down the scores by area for each language combination, to find out which domains MT handles well and which it does not.
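
A per-area breakdown of this kind could be computed roughly as follows. This is a sketch under assumptions: the `domain_percentages` helper and the sample ratings are invented for illustration, and each area's score is expressed as points earned divided by the points attainable for the sentences rated in that area.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MAX_RATING = 3  # top of the 0-3 scale used in the article


def domain_percentages(rated: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Map each area to its points as a percentage of the attainable points."""
    points: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for domain, rating in rated:
        points[domain] += rating
        counts[domain] += 1
    return {d: 100.0 * points[d] / (MAX_RATING * counts[d]) for d in points}


# Made-up ratings: five sentences each for two of the ten areas.
rated = [
    ("medical", 3), ("medical", 2), ("medical", 3), ("medical", 3), ("medical", 2),
    ("slang", 1), ("slang", 0), ("slang", 2), ("slang", 1), ("slang", 1),
]
for domain, pct in sorted(domain_percentages(rated).items()):
    print(f"{domain}: {pct:.1f}%")
```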

Overall results: Who wins the battle?
Here are the results that we obtained:

Chart 1. Comparison between 4 MT across 10 language combinations
Unit: points (out of a maximum attainable of 120)

The results show interesting trends. Overall, Google Translate seems to provide better translations than the other MT systems, followed by Bing Translator, with Systran and PROMT at the end. The only combination in which Bing scored highest is Spanish-English. Some language combinations, such as French-English, Italian-English and Portuguese-English, produced relatively good results: Google Translate gathered over 80 points out of 120 in these combinations – close to 75% of perfect translation. However, results for Spanish were very weak across all four systems. Likewise, German turned out to be more challenging to translate, both as source and as target language. Winner: Human translation. Runner-up: Google Translate.

The chart below shows how well each MT performed across all languages and how many points it gathered out of a total of 1,200. Because some language combinations, as well as some areas of translation, are so challenging, the overall results are only average: about a third of the text produced is correct for PROMT and Systran, and about half of it for Bing and Google. Significant progress is still needed before MT can be used with perfect reliability and accuracy.

Chart 2. Total of points gathered across all languages
Unit: points (out of a maximum attainable of 1,200)

Digging deeper: Results across subject areas
From the graphs below, we can identify some trends. On the “winning” side, medical translation is what MT handles best (probably because the texts are straightforward and descriptive), appearing in the top 3 in 7 out of 10 language combinations. Advertisements are generally well translated too: 6 times in the top 3 and once in the bottom 3. This is generally because ads use simple sentences that are easy to remember. Finally, recipes (gastronomic) appear 5 times in the top 3 and twice in the bottom 3. This area did especially well when English was the source language, as English recipes are written in the imperative mood, which is easy for MT to handle.
Looking at the poor performers of the experiment, literary translation is, without much surprise, the worst-case scenario for MT, with its elaborate syntax, rare words and unusual figures of speech, among other things. As a result, literature landed in the bottom 3 in 9 of the combinations. Slang and Tweets each landed in the bottom 3 seven times. Slang uses metaphorical speech, which MT usually translates literally, resulting in a nonsensical segment. Tweets contain elements that the other sentences do not have, such as colloquial abbreviations, URLs or symbols (#, @). These sometimes led the MT systems to make mistakes, on top of the often poor quality of the language used on Twitter: abbreviations, slang words, etc. See the detailed results on the graphs below.
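
The top-3 and bottom-3 counts quoted above amount to a simple tally over the ten language combinations. The sketch below shows one way to compute such a tally. It is hypothetical: the `tally_top_bottom` name, the data layout and the sample scores are assumptions, and ties are broken by the order in which areas are listed, which may differ from how the study handled them.

```python
from collections import Counter
from typing import Dict, Tuple


def tally_top_bottom(
    scores_by_pair: Dict[str, Dict[str, float]], k: int = 3
) -> Tuple[Counter, Counter]:
    """Count how often each area ranks in the top k and bottom k per language pair."""
    top: Counter = Counter()
    bottom: Counter = Counter()
    for scores in scores_by_pair.values():
        # Highest score first; sorted() is stable, so ties keep listing order.
        ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
        top.update(ranked[:k])
        bottom.update(ranked[-k:])
    return top, bottom


# Made-up percentages for two language pairs and six areas.
scores_by_pair = {
    "pair-1": {"medical": 90, "ads": 80, "recipes": 70,
               "legal": 60, "slang": 50, "literature": 40},
    "pair-2": {"medical": 85, "ads": 45, "recipes": 75,
               "legal": 65, "slang": 30, "literature": 20},
}
top, bottom = tally_top_bottom(scores_by_pair)
print(top.most_common())
print(bottom.most_common())
```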
Regardless of what you need to translate, do not expect perfection from MT. It can and does produce intelligible results in most cases, provided that the source text is basic, well written and as unambiguous as possible. Of the four MT systems tested, Google Translate turned out to be the best one to use in most cases. Texts that are broadly factual (medical, gastronomic) are more MT-friendly than creative writing (literature, Tweets). Plain texts and simple sentences are consequently easier for MT, as they avoid multiple possible translations; writing this way is known as controlled language. Finally, we advise you against relying solely on MT for important subjects or published texts, as that could cause you serious embarrassment!

Chart 3. Percentage of successful translation for English-French

Chart 4. Percentage of successful translation for French-English

Chart 5. Percentage of successful translation for English-German

Chart 6. Percentage of successful translation for German-English

Chart 7. Percentage of successful translation for English-Spanish

Chart 8. Percentage of successful translation for Spanish-English

Chart 9. Percentage of successful translation for English-Italian

Chart 10. Percentage of successful translation for Italian-English

Chart 11. Percentage of successful translation for English-Portuguese

Chart 12. Percentage of successful translation for Portuguese-English

John Barré


{ 19 comments }

  1. Can you tell us any more about how the sentences were rated? You outline a scale and general guidelines, but were there any more specific instructions? Who actually did the ratings? (Full demographic info, if possible?) I think your study is very interesting – just missing a few of the details.

  2. @Anon: As we outlined in the article, we had a scale of 0 to 3 points for each sentence. The ratings were done in-house by our content team of native speakers. Hope that helps.

  3. Hi,

    We have recently released which is a system combination of Statistical and Example Based. If you have corpus in a domain and language you can build a bespoke engine in three clicks and I am sure you would see impressive results. There is a free 30 day trial so have a play around – I’d love to see a comparison between these broad domain engines and one built specific to requirements.

  4. @Gavin: Our analysis was based on free-of-charge machine translation tools which provide direct translation results. I am sure that there are a lot of systems just like yours out there helping translators with translations. However, that would be a completely different analysis. All the best to your new venture!

  5. Just to double check, are you aware that a client can take you to court if you translate using online translation tools?


    Aurora Humaran

  6. No, for smaller, more complex and exotic languages with complex grammar, like the 3 Baltic languages and Russian (LV and LT have 7 cases, ET 14, Russian 6, not to mention many other grammar rules), the age of MT (usable in any sense for PE) is still in the distant future. Thank God for that.

  7. As a professional translator I find this study quite interesting, but I should point out that just being a native speaker is not a good standard of quality for a judge of these translations, unless the judge is formally trained in their own language, such as a linguist or a formally trained translator. When I began studying, I thought I was great at Spanish. After all, I am a native speaker. However, college proved me wrong: there was so much I didn’t know and so much I did wrong. My training developed my knowledge and my ear for correct Spanish. In my experience, linguists in particular are great at detecting and justifying correct and/or natural Spanish.

    I don’t mean to be overly critical or to offend anyone, I’m just pointing out something that you may not know.

  8. @Aurora, thanks for bringing this up. I had never heard of it before and it certainly is a good thing to know.

    @Uldis, most likely. Even for so-called easy languages like Spanish or English, which have simple grammar and little inflection, or only very regular inflection, MT proved to be a failure. While it would be interesting to appraise the quality of MT for Slavic languages, the scope is too limited for now, as only a few MT systems offer these language combinations.

    @Paul you are quite right, being a native speaker of the language is a requirement, yet not the only one in order to be able to carry out an accurate evaluation. We are fortunate enough to have a team of various native speakers of all the languages tested in the article, all of us having language-related university degrees. Therefore, we believe the results are relevant and reliable.

  9. I find this area of research to be very interesting! I think this was a great study! It is great for a study of this nature to represent with figures, graphs and numbers the extent to which MT has grown. I’m curious as to why the ‘religious category’ seems to score relatively high across all language combinations with MT…

  10. @Paul – Excellent comments on translator qualifications.

    In general, I was intrigued by the study, but 5 sentences per subject area hardly seems like a valid sample. A lot would depend upon the specific choice of sentences and the accidental appearance of individual terms.

    Another issue (mentioned by @Aurora) is the privacy/confidentiality concerns with publicly available MT engines. This is of little or no concern for a lot of translation activities, but when these engines are used for proprietary content (including internal emails, for instance) there can be significant legal ramifications. And when we are talking about using MT to improve productivity of a professional translator, that can be a real sticking point.

    And finally, I think there is a sort of false conundrum involved in this study. Repeated studies of MT quality have confirmed that careful customization for a given domain yields much better results than a broad-based general purpose engine. So while the study may point to an engine that is better “in general” it says very little about which engine (or approach) is best for a specific problem. And again, if the goal is productivity enhancement, investment in a customized engine that maximizes that benefit is likely to be cost effective.

  11. I don’t understand the math. There were 50 sentences and each sentence could score a maximum of 3, which means 50 * 3 = 150, not 120. Or did I misunderstand?
    Same thing with the score across all languages: 500 * 3 = 1,500, not 1,200.
