Out of all the hundreds of languages and dialects out there, which ones are the most searched for? Who wants to learn what? We decided to carry out a study to find out. Using Google Trends as a measuring tool, we checked, in a selection of languages, which languages are most searched for around the world. The methodology is quite simple. We looked at data from 2011 to 2016 in the following countries: Continue reading →
With over 10 million weekly visits, the bab.la community seems to be quite the bunch of language lovers! We were interested in learning more about the linguistic backgrounds and opinions of the people who interact with our site on a daily basis, and decided that a survey would be just the thing to find out more about the bab.la community. We hoped to identify key statistics so that we could share them and expand our resources in the areas where users have either difficulties or a particular interest. With these findings we can try to serve our user base better, and you can see how other bab.la users resemble or differ from you when it comes to languages.
The participation was spectacular, almost exceeding the limit of total responses allowed! We are grateful for all of your participation and support with this research. If you are curious about the outcome too, here are the results! Continue reading →
bab.la’s array of languages and dictionaries keeps growing!
So what has happened so far?
We launched www.babla.co.th – bab.la is now translated into Thai, and we are hoping to get the Thai-English dictionary out there very soon!
We have a brand new Polish-Russian dictionary.
The Greek version of bab.la is online and the first version of the English-Greek dictionary is available.
There are 12 million native speakers of Greek and roughly 2 million learners of Greek. We welcome you all to bab.la! We hope our dictionary will help you as much as possible with learning English (or Greek)!
Have you tried out our Greek cuisine quiz yet?
I recently got my hands on a new children’s book for German and English learners called Bosley’s New Friends. This dual-language book is very useful for young learners of German and English because it has a very easy-to-read format, is well illustrated and uses good vocabulary for children.
The book also highlights important words to attract attention, which aids the reader’s memorization. The story itself is a light-hearted and pleasant read that really supports the idea of learning languages in order to communicate. Bosley the bear essentially symbolises a reader who doesn’t know a language spoken in his environment and must learn it to communicate with others.
The story follows Bosley, a bear in a forest who has a hard time communicating or playing with the other animals. He wanders all around without managing to make any new friends, until he comes back home and finds out from his family that the other animals are speaking another language.
So Bosley’s dad takes him out to teach him the languages the animals speak. During this teaching section the reader is taught new words in both languages, while in the story Bosley learns the ways other animals communicate.
This way the young reader can see himself in Bosley, the forest as his new environment and the animals as the other people in that environment. The reader’s subconscious takes this underlying hint as a prompt to learn languages in order to befriend and play with other kids.
In conclusion, this book offers a nice and easy read for a younger audience learning either German or English. It teaches the reader a lot of new words, supported by nice illustrations, while explaining that learning languages is the key to communicating in a new environment.
The book can be purchased online and is available in many language combinations.
Image: Tim Johnson, www.thelanguagebear.com
Other studies of movie titles have shown that translators can go to great lengths to culturally adapt their translations to fit their audience, sometimes to the very extreme (cf. Brew 2008, 50FMTT 2011 and Mahan 2012). Although the final title is ultimately the producers’ choice, this article investigates translations of Disney© movie titles from English into the target languages German, French, Spanish, Russian and Swedish. The target languages were selected based on their number of speakers, but access to native speakers who could evaluate the titles was also taken into consideration. Both a quantitative and a qualitative analysis of the subject will be provided. Continue reading →
Who wouldn’t want a free translation from any language to another, delivered in two shakes of a lamb’s tail? It seems that many of us do; otherwise the online world wouldn’t be filled with various translation tools, each described as the best solution for all translation problems and needs. However, for a professional translator, tools like Google Translate and Bing are, if not a nightmare, at least a source of endless debates, differing opinions, and sometimes even a direct cause of a decreased workflow. Indeed, why should you pay for a professional translator if you can get the job done for free? Well, there is nothing wrong with this train of thought as such, as long as you are ready to sacrifice quality. Or is that just a myth? Does the use of free translation tools immediately mean decreased quality, even in the year 2014, when technologies develop and the language industry grows online faster than ever? To examine how good (or bad) some popular online translation tools actually are, we revisited our previous study and set ourselves a new challenge.
Two tools, five languages, five genres
To assess the quality of translations produced by machine translation tools, our team of language experts compiled a small specialized corpus of 50 English sentences drawn from publicly available sources (for more detailed information on building corpora, see for example Baker (2013), Koester (2010) and Lindquist (2011)). To ease the comparison and give more insight into the relatively small amount of linguistic data, the corpus was further divided into five genre-specific sub-corpora: Twitter, literature, news headlines, food recipes and legal texts. The ten sentences in each of the five genre categories were then translated into five languages – Finnish, Swedish, French, Russian and Dutch – with the help of two online translation tools, Google Translate and Bing. These tools were chosen because they offered the widest combination of languages available for translation; Finnish in particular is rarely offered in machine translation, which prevented us from using more tools in the study.
After the corpus had been compiled and the sentences translated, each translated sentence was graded on a scale from 0 to 4, where zero is the worst and four the best possible grade. The criteria for a zero included, for example, several untranslated words, severe grammar mistakes and overall incomprehensibility. To earn a four, the translation had to be close to what a human translator would produce, although some stylistic or contextual problems were allowed. Other grades fell in between these two extremes.
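As a rough illustration of how the grading scheme adds up, the arithmetic can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names, the tool label "ToolA" and the grades themselves are hypothetical placeholders, not the study's data or the way the scores were actually tallied.

```python
from collections import defaultdict

# Corpus layout as described in the text: five genres, ten sentences each.
GENRES = ["Twitter", "literature", "news headlines", "food recipes", "legal texts"]
SENTENCES_PER_GENRE = 10
MAX_GRADE = 4  # each sentence is graded from 0 (worst) to 4 (best)


def max_points_per_genre():
    """Maximum score for one tool, one language and one genre."""
    return SENTENCES_PER_GENRE * MAX_GRADE


def max_points_per_language():
    """Maximum score for one tool and one language across all genres."""
    return len(GENRES) * max_points_per_genre()


def totals_per_language(grades, tool):
    """Sum the sentence grades of one tool, grouped by target language.

    `grades` maps (tool, language, genre) to a list of sentence grades.
    """
    totals = defaultdict(int)
    for (graded_tool, language, _genre), sentence_grades in grades.items():
        if graded_tool != tool:
            continue
        if any(not 0 <= grade <= MAX_GRADE for grade in sentence_grades):
            raise ValueError("grades must lie between 0 and 4")
        totals[language] += sum(sentence_grades)
    return dict(totals)


# Placeholder grades (NOT the study's data): every sentence gets a 2.
placeholder_grades = {
    ("ToolA", "Swedish", genre): [2] * SENTENCES_PER_GENRE for genre in GENRES
}

print(max_points_per_genre(), max_points_per_language())  # 40 200
print(totals_per_language(placeholder_grades, "ToolA"))   # {'Swedish': 100}
```

The two maxima match the limits quoted in the figure captions: 40 points per tool for one genre in one language, and 200 points per tool for one language across all five genres.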
From Twitter to national law: does machine translation stand any chance?
Overall, all languages ranked quite low when compared with translations produced by humans. Swedish, with 120 points out of a possible 200, was the clear winner in our quality comparison, followed closely by Dutch, Russian and French. Finnish was left far behind with its 60 points. In the battle between Google Translate and Bing, the former was judged to produce better translations overall in all five of the examined languages and genres.
Figure 1. Results organized according to languages (max. points attainable 200 per tool)
Figure 2. Results organized according to genres (max. points attainable 200 per tool)
However, given the small amount of analyzed data, the results can’t be generalized too far: other sentences could have produced the opposite results. Yet it is unlikely that the order of the languages is random. The highest-ranked Swedish and the second-ranked Dutch are both Germanic languages, like the source language, English, while the backmarker of the group, Finnish, differs drastically from English in syntax and word order. French and Russian, which hold the middle positions, are also of non-Germanic origin, which would explain why they scored lower than the Germanic languages. Nevertheless, given the amount of data available online in these languages and their immense popularity among language learners and lovers, it’s not surprising that they rank higher than Finnish.
Figure 3. Quality scores of Twitter according to language (max. points attainable 40 per tool)
As Figure 3 demonstrates, Twitter was the genre with the lowest overall quality score. There was some variation between the two tools: in Finnish and Dutch, Bing was rated as producing better translations, while in the remaining three languages Google Translate scored higher. The low score could be due to the structurally complicated nature of Twitter posts. A lot of information is condensed into a small number of characters, which results in the use of abbreviations, hashtags and other special symbols. Machine translation tools seem to be unable to process these features. Interestingly enough, a relatively clear pattern emerged in how the two tools treated hashtags: for example, in the Finnish data set Bing left #teaching unchanged, whereas Google Translate changed it into #opetus. The same trend occurred throughout the corpus.
Figure 4. Quality scores of literary texts according to language (max. points attainable 40 per tool)
Literature scored second-lowest in terms of quality, and to many this probably comes as no surprise. Literary language often contains poetic expressions, complicated structures and other effects that make literary pieces so original. What is interesting in our findings, though, is that this time the Germanic languages Dutch and Swedish scored lower than the non-Germanic French and Russian. Overall, Google Translate takes the win in this genre as well, leaving Bing behind in all but one language.
Figure 5. Quality scores of news headlines according to language (max. points attainable 40 per tool)
As with the two aforementioned genres, news headlines produced some interesting variation between the languages. Somewhat surprisingly, French scored the lowest, while the other three large languages – Swedish, Russian and Dutch – produced some relatively good results. It seems that the incomplete vocabulary and short sentences typical of news discourse, which result in ungrammaticality, lowered the ranking for French. And once again, Google Translate takes the credit for being the better-working tool.
Figure 6. Quality scores of food recipes according to language (max. points attainable 40 per tool)
The sub-corpus consisting of sentences from food recipes was the genre category in which the translation tools produced the best results. This genre was again dominated by Google Translate; especially noteworthy is that in Dutch it produced fully comprehensible results that were considered almost as good as those produced by a human translator. The good quality of translations in this genre could be due to simple sentence structures, such as the frequent use of infinitive structures in English. However, unlike in some other categories, the lexis proved to be the Achilles heel of the tools: they failed to distinguish between the senses of polysemic words, such as to serve and icing, which lowered the score especially for Finnish.
Figure 7. Quality scores of legal texts according to language (max. points attainable 40 per tool)
Legal texts provided another triumph for Google Translate: it was considered better than Bing in all five languages. However, the scores were relatively low in all languages except Swedish and French. The low scores are likely to be due to the complicated, field-specific terminology and sentence structures often encountered in legal discourse. Interestingly enough, in this genre several sentences were given a four, while equally many deserved only a zero or a one. Thus, it is not possible to draw any definite conclusion from the patterns in this genre.
Food for thought
As our small study demonstrates, there is both good and bad in machine translation and in the tools developed for this purpose. There is no doubt that, as they are now, machine translation tools could not replace human translators. This is demonstrated, for example, by the tools’ inability to tell different contexts and the senses of polysemic words apart, or to handle complicated grammatical structures, such as case endings. However, it can’t be said that these tools are completely bad either: they worked relatively well with short and simple sentences and with instructive texts.
While online translation tools might work quite well for getting the gist of an article in a foreign newspaper, or for preparing that mouth-watering cake for which you couldn’t find a recipe in your own language, it is probably safest to leave legal and literary translations to human translators. Or what do you think: would you prefer the product of a machine to a text with a human touch?
Baker, P. (2013). Using corpora in discourse analysis. London: Bloomsbury.
Koester, A. (2010). Building small specialised corpora. In M. McCarthy & A. O’Keeffe (Eds.), The Routledge handbook of corpus linguistics (pp. 66-79). London: Routledge.
Lindquist, H. (2011). Corpus linguistics and the description of English. Edinburgh: Edinburgh University Press.