Thursday, July 3, 2014

The Economist on Machine Translation

There's a blog I follow called Hanzi Smatter, where readers send in pictures of tattoos in Japanese and Chinese, and the author explains their meanings. Spoiler: most of the time they are either gibberish or mean something other than what the inked subject originally thought they meant.

Anyway, the blog was mentioned last month in an Economist article about machine translation. Since I do a bit of translation myself and am considering it as a possible career, this is an area of great interest to me. The question is whether, with the rise of computer translation engines like Google Translate, humans will someday be crowded out of this job market by computers. The Economist promised a followup article on that topic; I couldn't find it at first, but here it is.

MT has come a long way, I'll grant. But Google Translate still churns out a lot of awkward and unnatural text, if it can even get to the base meaning of something. In translations where the source and target languages are similar, like English paired with Spanish or German, I imagine improvement may be quick and dramatic. I'm skeptical that leaps and bounds will be made for languages like Japanese, Thai, Chinese, or Korean paired with English. Here's a comment from the article that I think states this belief well (criticism of the publication itself aside):

Good to see MT discussed in the Economist.

Sad to see the same tropes get dragged out all over again. Bad sign translations? Really? Only seen that one in every single online article ever written on MT in the last decade.

One would hope the Economist (of all journals) would write something far more insightful than the 4,757th article full of gasping glee over "magic" technology or how much money Dell saved by using MT. (Ever wonder why Dell is the only TAUS company that keeps getting quoted over and over? Not that many "success stories," it turns out.)

OK, let's review a few dirty little secrets. They can be very revealing.

Dirty little secret #1. There is almost zero money in commercial-market MT. It's a barren desert of "consultants" and shockingly little cash flow. Every company that dips its toes into the water realizes that it cannot survive on hype and no cash. All the VC cash is pouring into platforms for human translation. There is a reason for this.

Dirty little secret #2. Human translation is a $34 billion global market with billion-dollar segments (law, finance, banking, marketing, media, government) where MT -- the perfect technology for fast, cheap and good enough -- will never work.

Dirty little secret #3. The engine driving MT is human translation. Google works by leveraging existing human translations in its databases and only when nothing is there does it lean on predictive algorithms. Hence, the massively uneven quality. If you subtracted all that human-translated content, Google would be a gibberish-producing laughingstock.

Oh, and Google itself does not use Google Translate for its own materials. Who would be silly enough to do that?

If this article were on Internet privacy, we would have just read the version the NSA wants us to read.

I would hope the Economist would get more serious as a journalistic enterprise in exploring the far more fascinating story (the real one) instead of the one just spoon-fed to it by organizations like TAUS, which have their own agenda.

My own brief followup anecdote:

At work, my colleagues and I find and clip articles from major US publications relating to developments in the telecom industry or telecom public policy. Each day we send the ones we choose to a translator, who sends them back in Japanese. Recently our account representative at Factiva (a publication-aggregation service we subscribe to) told us that a tool within the interface can translate some articles into Japanese. I expressed skepticism, but she said she was confident the translations would be of good quality and told me to check out the tool. I did so, and this is what I found:

Their fancy tool is actually just Google Translate. And thus I lost confidence that our account representative really knows what she is talking about.


  1. Very interesting, and explains a lot of my experience with using Google for Japanese to English (Bing's database seems to come from porn). Intermediate in Japanese, Google is some use to me: if I were worse or better it would be of little use. One thing I get a laugh at is the English translation having random pronouns, verbs or verb-tense, as the Japanese sentence may have had none.

  2. Ἀντισθένης - Google is indeed a good tool, even if Translate is lacking. It's better to have some idea of what something means than be completely stumped if you're unfamiliar with some words or grammar. Also just doing general Google searches for words or phrases can sometimes turn up enlightening results.

    But yeah, you're right - there can be some pretty wacky English translations. Same thing with Thai. =P