DeepMind has released a "general purpose" artificial intelligence system that can be trained to perform many different types of tasks. The researchers trained the system, called Gato, to perform 604 tasks, including adding captions to images, engaging in dialogue, stacking blocks with a robotic arm, and playing Atari games.
A long-standing ambition in the artificial intelligence industry is the creation of a system with artificial general intelligence (AGI), that is, the ability to understand and learn any task that a human can perform. AGI research aims to produce systems capable of reasoning, planning, learning, representing knowledge, and communicating in natural language.
When Gato is deployed, a prompt, such as a demonstration, is tokenized to form an initial sequence. The environment then issues the first observation, which is also tokenized and appended to the sequence. The system then selects the action vector autoregressively, one token at a time.
Jack Hessel, a research fellow at the Allen Institute for Artificial Intelligence, notes that a single AI system that can solve multiple problems is not new. For example, Google recently started using the Multitasking Unified Model, or MUM, in its search engine, which can process text, images, and video to perform tasks ranging from finding cross-language variations in the spelling of a word to matching a search query to an image.
After all the tokens that make up the action vector have been selected, the action is decoded and sent to the environment, which executes it and produces a new observation. Then the procedure repeats. The model always sees all previous observations and actions that fit within its context window of 1,024 tokens.
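The loop described above can be sketched in a few lines of Python. This is a toy illustration only: the tokenizer, the next-token stub, and the action length are placeholders invented for the sketch, not DeepMind's actual implementation; only the four-step structure (tokenize prompt, tokenize observation, sample action tokens autoregressively, decode and act) and the 1,024-token window come from the description.

```python
from collections import deque

CONTEXT_WINDOW = 1024      # Gato's context length, per DeepMind
ACTION_LEN = 3             # tokens per action vector (illustrative choice)
VOCAB_SIZE = 32000         # hypothetical token vocabulary

def tokenize(x):
    """Stand-in tokenizer: map any input to a list of integer tokens."""
    return [hash(str(x)) % VOCAB_SIZE]

def model_next_token(context):
    """Stand-in for the trained transformer's next-token prediction."""
    return sum(context) % VOCAB_SIZE

def decode_action(tokens):
    """Stand-in: turn sampled action tokens back into an environment action."""
    return tuple(tokens)

def control_loop(prompt, env_observations):
    context = deque(maxlen=CONTEXT_WINDOW)   # model sees at most 1,024 tokens
    context.extend(tokenize(prompt))         # 1. tokenize the prompt/demo
    actions = []
    for obs in env_observations:
        context.extend(tokenize(obs))        # 2. tokenize each observation
        action_tokens = []
        for _ in range(ACTION_LEN):          # 3. sample the action vector
            tok = model_next_token(context)  #    one token at a time,
            action_tokens.append(tok)
            context.append(tok)              #    feeding tokens back in
        actions.append(decode_action(action_tokens))  # 4. decode, act, repeat
    return actions

print(control_loop("stack the red block", ["obs0", "obs1"]))
```

The `deque(maxlen=...)` makes the fixed context window explicit: once the sequence exceeds 1,024 tokens, the oldest tokens silently fall out of view.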
However, Hessel points to the variety of problems Gato solves and the way it was trained. “We’ve seen evidence before that individual models can handle surprisingly diverse sets of inputs,” he said. “In my opinion, the main question in multi-task learning is whether the tasks complement each other or not. You can imagine a more boring case in which the model implicitly separates the tasks before solving them: ‘If I see task A as input, I will use subnetwork A. If I see task B instead, I will use a different subnetwork B.’ Similar results could be obtained by training A and B separately, which is not impressive. On the contrary, if training A and B together improves one of them (or both!), then things become much more exciting.”
Gato is trained on a large number of datasets, including the experiences of agents in both simulated and real environments. DeepMind reports the number of tasks on which the performance of the pretrained Gato model exceeds a given percentage of the expert score.
Gato learned from billions of words, real-world images, and simulated environments. The system does not always perform well. In dialogue, for example, it often answers superficially or even incorrectly (at one point calling Marseille the capital of France). When captioning photos, Gato misidentifies the gender of the people in them. And the system correctly stacks blocks with a real robot only 60% of the time.
Image captions generated by the pretrained Gato model.
But DeepMind claims that on 450 of the 604 tasks above, Gato performs better than a human expert more than half of the time.
“If you think we need general-purpose systems, and a lot of people in AI and machine learning do, then Gato is a big deal,” says Matthew Guzdial, assistant professor of computer science at the University of Alberta. “I think people who call this an important step toward AGI are exaggerating somewhat, since we still have not reached human-level intelligence and probably will not get there soon. However, these general models definitely have advantages in terms of their performance on tasks beyond their training data.”
Curiously, from an architectural point of view, Gato does not differ much from many modern artificial intelligence systems. It shares characteristics with OpenAI's GPT-3 in that both are transformers. Since its introduction in 2017, the transformer has become the architecture of choice for complex sequence tasks, demonstrating the ability to summarize documents, generate music, classify objects in images, and analyze protein sequences. Perhaps even more remarkable, Gato is orders of magnitude smaller than those single-task systems: Gato has only 1.2 billion parameters, while GPT-3 has 175 billion.
DeepMind researchers deliberately kept Gato small so that it could control the robot arm in real time. But they suggest that, when scaled up, the system will be able to handle any “task, behavior, and embodiment of interest.”
However, there are several hurdles to overcome before Gato can beat advanced single-task systems in specific applications. Like most transformer-based systems, Gato’s knowledge of the world is fixed by its training data and remains unchanged, so the system is not capable of continual learning. Gato is also limited by its “context window,” the amount of information the system can “remember” while working on a given task. Even the best transformer-based language models cannot write a long essay, much less a book, without forgetting key details and thus losing the plot. Forgetting happens in any task, whether it is writing a poem or operating a robot, which is why some experts call it the “Achilles’ heel” of machine learning.
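The forgetting imposed by a fixed context window can be made concrete with a minimal sketch. The window size is Gato's; the truncation function and the 3,000-token history are illustrative assumptions, not DeepMind's code.

```python
CONTEXT_WINDOW = 1024

def visible_context(all_tokens, window=CONTEXT_WINDOW):
    """The model conditions only on the most recent `window` tokens."""
    return all_tokens[-window:]

# A long interaction: 3,000 tokens of observations and actions so far.
history = list(range(3000))

ctx = visible_context(history)
print(len(ctx))   # 1024 -- the window never grows past this
print(ctx[0])     # 1976 -- everything before token 1976 is out of view
```

However long the interaction runs, the model can condition only on the last 1,024 tokens; anything earlier is irretrievably out of view unless it is repeated inside the window.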
Mike Cook, a member of the Knives & Paintbrushes research collective, cautions against assuming that Gato is a path to truly general-purpose AI: “It sounds exciting that AI can do all of these things, but it’s really not too different from GPT-3 understanding the difference between plain English text and Python code. Gato receives specialized training data for these tasks, like any other AI of this type, and learns how the patterns in the data relate to one another, including learning to associate certain kinds of inputs with certain outputs. That is not easy, but it doesn’t mean the AI could also make a cup of tea or easily learn ten or fifty other tasks. We already know that modern approaches to large-scale modeling allow a system to learn several tasks at the same time. I think it’s nice work, but it doesn’t strike me as a major stepping stone.”
How Gato helps mathematicians
In November 2021, DeepMind showed how its artificial intelligence systems help mathematicians find the patterns needed to develop theorems. The collaboration between researchers and AI has already produced a new conjecture in representation theory and a proven theorem about the structure of knots in topology.
In December, DeepMind introduced the Player of Games artificial intelligence system, which can play poker, chess, go and other games.
The company said in February that its AlphaCode system writes computer programs "as well as the average programmer."