Review of Neural Machine Translation

This post reviews recent advances in machine translation, especially neural machine translation (NMT).

Prerequisites: this post assumes some prior knowledge of machine learning, artificial neural networks, CNNs, RNNs (LSTM, GRU), the encoder-decoder architecture, sequence-to-sequence models, etc.

This post is based on the following resources. Feel free to refer to them for more details:

  1. Papers and links appended at the end of this post.
  2. (Dr. Luong, Dr. Cho, and Dr. Manning’s tutorial on NMT at ACL 2016)
  3. (Dr. Luong’s PhD thesis on NMT)
  4. (Dr. Cho’s notes on deep learning for NLP)
  5. (Graham Neubig’s tutorial on NMT)
  6. (Dr. Cho’s tutorial on NMT for Nvidia)
  7. (Notes from Stanford’s CS224n course)
  8. (Dr. Maosong Sun’s slides: Thoughts on Machine Translation)

This post is organized as follows:

  1. History and evolution of machine translation
  2. Neural machine translation: basics
  3. Major challenges and solutions
  4. Potential future directions and my personal thoughts

1. History and evolution of machine translation

Knowing the history of machine translation not only gives us a broader picture of the field, but also reminds us that, despite the recent exciting advances brought by neural machine translation, achieving fully automatic high-quality machine translation is still hard and requires much more work.

Image courtesy of Christopher D. Manning; extracted from Dr. Luong’s thesis.

The idea of using digital computers for machine translation originates from a letter that Warren Weaver[1], an American mathematician and scientist, sent to the cyberneticist Norbert Wiener. In the letter, he raised the idea of using computers to translate human documents. This idea was then formulated as a memorandum, entitled “Translation”, in 1949, which stimulated research in this area for years to come. Specifically, he put forward four proposals.

  1. Word meaning and context – the problem of multiple meanings of a word might be tackled by examining the immediate context, and he speculated that the number of context words required is fairly small.
  2. Machine translation and logic – Weaver hypothesized that translation could be addressed as a problem of formal logic, deducing “conclusions” in the target language from “premises” in the source language.
  3. Machine translation and cryptography – impressed by Shannon’s work on information theory and cryptography during World War II, Weaver thought that cryptographic methods could be applied to machine translation.
  4. Linguistic universals – Weaver hypothesized that linguistic universals underlie all human languages and that this property could be exploited to make translation more straightforward. This led to one of the best-known metaphors in machine translation: “Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with one another, they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But, when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers”.

(In honor of his contributions not only to machine translation but also to many other areas of science, the building housing NYU’s Courant Institute was named after him and is now called Warren Weaver Hall.)

For about a decade after the memorandum, several milestones were reached in MT research, including the first MT conference, held at MIT in 1952; the first public demonstration of an MT system in 1954; the founding of the first journal in the field, Mechanical Translation; and the first MT book in 1955.

After this initial cultivation, the years 1956-1966 saw high expectations and enthusiasm for MT. Since linguistic theory was still immature at the time, research tended to divide into two directions:

  1. empirical statistical methods to discover lexical and grammatical regularities, and
  2. fundamental linguistic research, regarded as the beginning of what is now called “computational linguistics”.
More specifically, three basic approaches arose:

  1. Direct translation from the source language (SL) to the target language (TL) using programmed rules, with minimal linguistic analysis required;
  2. The interlingua model: an abstract universal representation independent of particular languages, where the SL text is translated into the interlingua, which is then translated into the TL;
  3. The transfer approach, “where conversion was through a transfer stage from abstract (i.e. disambiguated) representations of SL texts to equivalent TL representations; in this case, translation comprised three stages: analysis, transfer, and generation”[2].

Much of the research during this period focused on the direct translation approach, deriving dictionary rules from actual texts using statistics, while little attention was devoted to the interlingua approach. The decade also saw some fundamental research in formal linguistics stimulated by MT. Notably, in 1960, Yehoshua Bar-Hillel[3], who worked at MIT and had organized the first MT conference, argued in a survey that fully automatic high-quality translation was unrealistic and unattainable, demonstrating the point with an example involving the word “pen”[2].

In 1966, the Automatic Language Processing Advisory Committee (ALPAC) was set up to examine the prospects and state of machine translation. In its famous report, it concluded that “there is no immediate or predictable prospect of useful machine translation” and that a semantic barrier existed, so computational linguistics should receive more emphasis. The report marked the beginning of a trough in US MT research that lasted more than a decade. On the other hand, computational linguistics has since become an independent research area, no longer under the “umbrella” of MT.

Despite the trough, MT did not come to a complete dead end during the following decade. Some work continued in various locations worldwide, much of it focused on the interlingua approach.

After 1976, there was a revival of MT research, much of it devoted to the transfer approach mentioned above. Some systems also entered practical use and attracted a degree of public attention, such as METEO, METAL, and Systran’s system.

[2] W. John Hutchins, Machine Translation: A Concise History.