DeepMind's "Gato" neural network excels at numerous tasks including controlling robotic arms that stack blocks, playing Atari 2600 games, and captioning images. (Image: DeepMind)

The world is used to seeing headlines about the latest breakthrough by deep-learning forms of artificial intelligence. The latest achievement of the DeepMind division of Google, however, might be summarized as, "One AI program that does a so-so job at a lot of things."

Gato, as DeepMind's program is called, was unveiled this week[1] as a so-called multimodal program, one that can play video games, chat, write compositions, caption pictures, and control a robotic arm stacking blocks. It is one neural network that can work with multiple kinds of data to perform multiple kinds of tasks. 

"With a single set of weights, Gato can engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more," write lead author Scott Reed and colleagues in their paper, "A Generalist Agent," posted on the Arxiv preprint server[2]

DeepMind co-founder Demis Hassabis cheered on the team, exclaiming in a tweet[3], "Our most general agent yet!! Fantastic work from the team!" 

Also: A new experiment: Does AI really know cats or dogs -- or anything?[4]

The only catch is that Gato is actually not so great at several of those tasks.

On the one hand, the program does better than a dedicated machine learning program at controlling a robotic Sawyer arm that stacks blocks. On the other hand, it produces captions for images that in many cases are quite poor. Its ability at standard chat dialogue with a human interlocutor is similarly mediocre.
