The world is used to seeing headlines about the latest breakthrough in deep learning, a form of artificial intelligence. The latest achievement of the DeepMind division of Google, however, might be summarized as, "One AI program that does a so-so job at a lot of things."
Gato, as DeepMind's program is called, was unveiled this week as a so-called multimodal program, one that can play video games, chat, write compositions, caption pictures, and control a robotic arm stacking blocks. It is one neural network that can work with multiple kinds of data to perform multiple kinds of tasks.
"With a single set of weights, Gato can engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more," write lead author Scott Reed and colleagues in their paper, "A Generalist Agent," posted on the arXiv preprint server.
The only catch is that Gato is actually not so great at several tasks.
On the one hand, the program does better than a dedicated machine learning program at controlling a Sawyer robotic arm that stacks blocks. On the other hand, many of the image captions it produces are quite poor. Its ability at standard chat dialogue with a human interlocutor is similarly