DeepMind’s new AI can perform over 600 tasks, from playing games to controlling robots – TechCrunch

For some in the AI industry, the ultimate achievement is creating a system with artificial general intelligence (AGI), the ability to understand and learn any task a human can. Long relegated to the domain of science fiction, it has been suggested that AGI would bring about systems capable of reasoning, planning, learning, representing knowledge, and communicating in natural language.

Not every expert is convinced that AGI is a realistic goal, or even a possibility. However, it could be argued that DeepMind, the Alphabet-backed research lab, took a step toward it this week with the release of an AI system called Gato.

Gato is what DeepMind describes as a “general purpose” system, one that can be taught to perform many different kinds of tasks. DeepMind researchers trained Gato to complete 604 of them, including captioning images, engaging in dialogue, stacking blocks with a real robot arm, and playing Atari games.

Jack Hessel, a research scientist at the Allen Institute for AI, points out that a single AI system that can solve many tasks isn’t new. For example, Google recently began using a system in Google Search called the Multitask Unified Model, or MUM, which can handle text, images, and videos to perform tasks ranging from finding interlingual variations in the spelling of a word to relating a search query to an image. But what is potentially newer here, Hessel says, is the diversity of the tasks tackled and the training method.

DeepMind’s Gato Architecture. Image credits: DeepMind

“We’ve seen evidence before that individual models can handle surprisingly diverse sets of inputs,” Hessel told TechCrunch via email. “In my view, the fundamental question in multitask learning is whether the tasks are complementary or not. You could imagine a more boring case where the model implicitly separates the tasks before solving them, e.g., ‘If I detect task A as an input, I will use subnetwork A. If I instead detect task B, I will use a different subnetwork B.’ For that null hypothesis, similar results could be attained by training A and B separately, which would be underwhelming. In contrast, if training A and B jointly improves one (or both!), then things are more exciting.”

Like all AI systems, Gato learned by example, ingesting billions of words, images from real-world and simulated environments, button presses, joint torques, and more in the form of tokens. These tokens served to represent data in a way Gato could understand, enabling the system to, for example, tease out the mechanics of Breakout, or which combination of words in a sentence might make grammatical sense.
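To make the token idea concrete, here is an illustrative toy, not DeepMind’s actual tokenizer: every modality is flattened into one sequence of integer tokens. The vocabulary size, bin count, value range, and the hash-based text tokenizer are all assumptions standing in for the real components.

```python
# Toy sketch of multimodal tokenization: text and continuous control
# signals are mapped into disjoint integer ranges, then concatenated
# into a single flat sequence a model could attend over.

TEXT_VOCAB = 32_000          # assumed subword vocabulary size
NUM_BINS = 1024              # assumed number of discretization bins

def tokenize_text(text):
    """Stand-in for a real subword tokenizer: hash each word to an id."""
    return [hash(w) % TEXT_VOCAB for w in text.split()]

def tokenize_continuous(values, lo=-1.0, hi=1.0):
    """Discretize continuous values (e.g. joint torques) into bins,
    offset past the text vocabulary so the ranges don't collide."""
    tokens = []
    for v in values:
        v = min(max(v, lo), hi)                          # clamp to range
        bin_id = int((v - lo) / (hi - lo) * (NUM_BINS - 1))
        tokens.append(TEXT_VOCAB + bin_id)
    return tokens

# One training example: a caption followed by robot-arm torques,
# serialized into a single flat token sequence.
sequence = tokenize_text("stack the red block") + tokenize_continuous([0.13, -0.52, 0.98])
print(len(sequence))  # 4 text tokens + 3 control tokens = 7
```

The key point is that once everything is an integer in one shared sequence, the same model weights can process a caption and a torque command with no task-specific plumbing.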

Gato doesn’t necessarily do these tasks well. For example, when chatting with a person, the system often responds with a superficial or factually incorrect reply (e.g., “Marseille” in response to “What is the capital of France?”). When captioning pictures, Gato misidentifies people. And the system correctly stacks blocks using a real-world robot only 60% of the time.

But on 450 of the 604 aforementioned tasks, DeepMind claims, Gato performs better than an expert more than half the time.

“If you think we need general [systems], which is a lot of folks in the AI and machine learning area, then [Gato is] a big deal,” Matthew Guzdial, an assistant professor of computing science at the University of Alberta, told TechCrunch via email. “I think people saying it’s a major step toward AGI are overhyping it somewhat, as we’re still nowhere near human intelligence and likely won’t get there soon (in my opinion). I’m personally more in the camp of many small models [and systems] being more useful, but these general models definitely pay off in terms of performance on tasks outside their training data.”

Interestingly, from an architectural standpoint, Gato isn’t dramatically different from many of the AI systems in production today. It shares characteristics with OpenAI’s GPT-3 in the sense that it’s a “Transformer.” Dating back to 2017, the Transformer has become the architecture of choice for complex reasoning tasks, demonstrating an ability to summarize documents, generate music, classify objects in images, and analyze protein sequences.

DeepMind Gato

Various tasks that Gato has learned to perform. Image credits: DeepMind

Perhaps even more remarkably, Gato is an order of magnitude smaller than single-task systems, including GPT-3, in terms of parameter count. Parameters are the parts of the system learned from training data that essentially define the system’s skill at a problem, such as generating text. Gato has just 1.2 billion, while GPT-3 has more than 170 billion.
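As a rough illustration of what a parameter count measures (not Gato’s actual architecture), here is the bookkeeping for a single fully connected layer; the 4096-unit width is an arbitrary assumption chosen for the example.

```python
# Back-of-the-envelope parameter counting for one dense (fully
# connected) layer: a weight for every input-output pair, plus one
# bias per output unit. Headline figures like "1.2 billion" are sums
# of counts like this across all of a model's layers.

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# A single 4096 -> 4096 layer already holds ~16.8M parameters;
# large models stack many such blocks.
print(dense_params(4096, 4096))  # 16781312
```

Seen this way, Gato’s 1.2 billion parameters versus GPT-3’s 170-billion-plus is simply a claim about how many of these learned numbers each system contains.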

DeepMind’s researchers kept Gato deliberately small so the system could control a robot arm in real time. But they hypothesize that, if scaled up, Gato could tackle any “task, behavior, and embodiment of interest.”

Assuming this turns out to be the case, several other hurdles would still have to be overcome for Gato to outperform state-of-the-art single-task systems at specific tasks, such as Gato’s inability to learn continuously. Like most Transformer-based systems, Gato’s knowledge of the world is grounded in its training data and remains static. Ask Gato a date-sensitive question, like the identity of the current U.S. president, and chances are it will respond incorrectly.

The Transformer, and by extension Gato, has another limitation in its context window, i.e., the amount of information the system can “remember” in the context of a given task. Even the best Transformer-based language models can’t write a lengthy essay, much less a book, without forgetting key details and thus losing track of the plot. The forgetting happens in any task, whether writing or controlling a robot, which is why some experts have called it the “Achilles’ heel” of machine learning.
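A minimal sketch of the fixed-context-window idea described above; the window size here is an arbitrary assumption, and real Transformers measure it in tokens, typically thousands of them.

```python
# Illustrative sketch of a fixed context window: a Transformer can
# only attend to the most recent `window` tokens of a sequence, so
# anything earlier is effectively forgotten.

def visible_context(tokens, window=8):
    """Return the slice of the sequence the model can actually 'see'."""
    return tokens[-window:]

story = list(range(20))        # stand-in for 20 tokens of a long text
print(visible_context(story))  # only the last 8 tokens remain visible
```

This is why a long essay or a long robot-control episode is hard: whatever falls off the left edge of the window is gone, no matter how important it was to the task.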

For these and other reasons, Mike Cook, a member of the Knives & Paintbrushes research collective, cautions against assuming Gato is a pathway to general-purpose AI.

“I think the result is open to a bit of misinterpretation. It sounds exciting that an AI is able to do all of these tasks that sound very different, because to us it sounds like writing text is very different from controlling a robot. But in reality this isn’t all that different from GPT-3 understanding the difference between ordinary English text and Python code,” Cook told TechCrunch via email. “Gato receives specific training data for these tasks, just like any other AI of its kind, and it learns how patterns in the data relate to one another, including learning to associate certain kinds of inputs with certain kinds of outputs. That’s not to say it’s easy, but to an outside observer it might sound as if the AI could also make a cup of tea or easily learn another ten or fifty other tasks, and it can’t do that. We know that the current approach to large-scale modeling lets it learn multiple tasks at once. I think it’s a nice piece of work, but I don’t think it’s a major stepping stone on the path to anything.”
