They’re all moving to this method not only because they can improve machine translation, but because they can improve it in a much faster and much broader way. “The key thing about neural network models is that they are able to generalize better from the data,” says Microsoft researcher Arul Menezes. “With the previous model, no matter how much data we threw at them, they failed to make basic generalizations. At some point, more data was just not making them any better.”
For machine translation, Google is using a form of deep neural network called an LSTM, short for
long short-term memory. An LSTM can retain information in both the short and the long term—kind of like your own memory.
That allows it to learn in more complex ways. As it analyzes a sentence, it can remember the beginning as it gets to the end. That’s different from Google’s previous translation method, Phrase-Based Machine Translation, which breaks sentences into individual words and phrases. The new method looks at the entire collection of words.
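To make the "memory" idea concrete, here is a toy sketch of a single LSTM cell stepping through a sentence, one word vector at a time. This is an illustration of the general mechanism, not Google's production model; the class name, the random weights, and the vector sizes are all invented for the example. The cell state `c` is what lets information from the first word survive until the last.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: gates decide what to forget, store, and emit."""
    def __init__(self, input_size, hidden_size):
        k = input_size + hidden_size
        # One weight matrix per gate: forget, input, candidate, output.
        self.W_f = rng.normal(0, 0.1, (hidden_size, k))
        self.W_i = rng.normal(0, 0.1, (hidden_size, k))
        self.W_c = rng.normal(0, 0.1, (hidden_size, k))
        self.W_o = rng.normal(0, 0.1, (hidden_size, k))

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        f = sigmoid(self.W_f @ z)        # forget gate: what to drop from memory
        i = sigmoid(self.W_i @ z)        # input gate: what to write into memory
        c_tilde = np.tanh(self.W_c @ z)  # candidate memory content
        c = f * c + i * c_tilde          # cell state: the long-term memory
        o = sigmoid(self.W_o @ z)        # output gate
        h = o * np.tanh(c)               # hidden state: the short-term memory
        return h, c

# Feed a toy "sentence" (five random word embeddings) through the cell.
# The cell state carries context from the first word through to the last.
cell = LSTMCell(input_size=8, hidden_size=16)
h, c = np.zeros(16), np.zeros(16)
sentence = rng.normal(size=(5, 8))
for word in sentence:
    h, c = cell.step(word, h, c)
```

Because the whole sentence flows through one evolving state rather than being chopped into independent phrases, the final `h` can reflect the sentence from end to end.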
Of course, researchers have been trying to get LSTMs to work on translation for years. The trouble was that they couldn’t operate at the pace we have all come to expect from an online service. Google finally got them to work
at speed—fast enough to run a service across the Internet at large. “Without doing lots of engineering work and algorithmic work to improve the models,” says Microsoft researcher Jacob Devlin, “the speed is very much slower than traditional models.”
According to Schuster, Google has achieved this speed partly through changes to the LSTMs themselves. Deep neural networks consist of layer after layer of mathematical calculations—linear algebra—with the results of one layer feeding into the next. One trick Google uses is to start the calculations for the second layer before the first layer is finished—and so on. But Schuster also says that much of the speed is driven by Google’s tensor processing units,
chips the company specifically built for AI. With TPUs, Schuster says, the same sentence that once took ten seconds to translate via this LSTM model now takes 300 milliseconds.
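The layer-pipelining trick Schuster describes can be sketched as a schedule. In the toy model below (the layer and timestep counts are made up for illustration), layer L starts on timestep t as soon as layer L-1 has finished that timestep, instead of waiting for the whole sequence; with enough parallel hardware, everything on one clock tick runs at once.

```python
# Pipelined schedule: each tick runs one "anti-diagonal" wavefront,
# pairing (layer, timestep) cells whose inputs are already available.
num_layers, num_steps = 4, 6
schedule = []
for tick in range(num_layers + num_steps - 1):
    wave = [(layer, tick - layer)
            for layer in range(num_layers)
            if 0 <= tick - layer < num_steps]
    schedule.append(wave)
    print(f"tick {tick}: " + ", ".join(f"L{l}@t{t}" for l, t in wave))

# Strictly sequential cost: num_layers * num_steps ticks (24 here).
# Pipelined cost: num_layers + num_steps - 1 ticks (9 here).
```

The same 24 units of work get done either way; pipelining just overlaps them, which is one reason the model can run fast enough for a live service.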