Its name is Gato, and it is perhaps the most impressive all-in-one machine learning system in the world.
According to a DeepMind blog post:
The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
And while it remains to be seen how well it performs once researchers and users outside DeepMind’s labs get their hands on it, Gato seems to be everything GPT-3 aspires to be and more.
Here’s why that makes me sad: GPT-3 is a large language model (LLM) produced by OpenAI, the world’s best-funded artificial general intelligence (AGI) company.
Before we can compare GPT-3 and Gato, however, we need to understand where both OpenAI and DeepMind are coming from as companies.
OpenAI is Elon Musk’s brainchild, has billions in backing from Microsoft, and the US government could hardly care less about regulating or overseeing it.
Considering that OpenAI’s sole stated purpose is to develop and control AGI (an AI capable of doing and learning anything a human can, given the same access), it’s a little scary that all the company has managed to produce is a really fancy LLM.
Don’t get me wrong, GPT-3 is impressive. In fact, it’s arguably just as impressive as DeepMind’s Gato, but that assessment requires some nuance.
OpenAI followed the LLM path on its way to AGI for a simple reason: no one knows how to make AGI work.
Just as it took a while between the discovery of fire and the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won’t happen overnight.
GPT-3 is an example of an artificial intelligence that can at least do something that seems human: generate text.
What DeepMind has done with Gato is, well, pretty much the same. It took something that works very much like an LLM and turned it into an illusionist capable of more than 600 forms of prestidigitation.
As Mike Cook of the Knives and Paintbrushes research collective recently told TechCrunch’s Kyle Wiggers:
It sounds exciting that an AI is able to do all of these tasks that sound so different, because to us, writing text seems very different from controlling a robot.
But in reality this isn’t all that different from GPT-3 understanding the difference between ordinary English text and Python code.
That doesn’t mean it’s easy to do, but to an outside observer it might sound like the AI could also make a cup of tea or easily learn another ten or fifty tasks, and it can’t.
Basically, Gato and GPT-3 are solid AI systems, but neither is capable of general intelligence.
Here is my problem: unless you’re betting on AGI emerging from some random accident – the movie Short Circuit comes to mind – it’s probably time for everyone to reevaluate their AGI timelines.
I wouldn’t say “never,” because that’s one of science’s few forbidden words. But it does make it look like AGI won’t happen in our lifetimes.
DeepMind has been working on AGI for over a decade, and OpenAI since 2015. Neither has been able to solve the first problem on the road to AGI: building an AI that can learn new things without being trained on them.
I believe Gato may be the world’s most advanced multimodal AI system. But I also think DeepMind has taken the same dead-end path toward AGI as OpenAI, and just made it more marketable.
Final thoughts: What DeepMind has done is remarkable and will likely make a lot of money for the company.
If I were the CEO of Alphabet (DeepMind’s parent company), I would either release Gato as a pure product or push DeepMind toward development over research.
Gato may have the potential to be more lucrative in the consumer market than Alexa, Siri, or the Google Assistant (with the right marketing and use cases).
But neither Gato nor GPT-3 is any more viable an entry point to AGI than the virtual assistants mentioned above.
Gato’s multitasking ability is more like a video game console that can store 600 different games than a single game that can be played 600 different ways. This isn’t general AI; it’s a bunch of narrow, pre-trained models neatly bundled together.
That’s not a bad thing, if that’s what you’re looking for. But there is nothing in the accompanying Gato research paper to indicate that this is even a glance in the right direction for AGI, much less a stepping stone toward it.
At some point, the goodwill and capital that companies like DeepMind and OpenAI have generated through their steadfast insistence that AGI is just around the corner will have to show even the slightest dividend.