Amongst the many domains of Artificial Intelligence (AI) research currently being advanced, one has become critical to the broader success of the field while still in its infancy: multi-agent reinforcement learning.
Multi-agent reinforcement learning is the study of multiple artificial intelligence agents cohabiting an environment, often collaborating toward some end goal. In its focus on collaboration it draws inspiration from social structures in the animal kingdom, and it also leans heavily on game theory.
Multi-agent systems are not just a research method; they can be used to model many complex problems in today's society, such as urban and air traffic control (Agogino & Tumer, 2012), multi-robot coordination (Ahmadi & Stone, 2006; Claes, Hennes, Tuyls, & Meeussen, 2012), distributed sensing (Mihaylov, Tuyls, & Nowé, 2014), and energy distribution (Pipattanasomporn, Feroze, & Rahman, 2009).
Despite all of these useful applications, there are more critical reasons for the research community to focus on multi-agent reinforcement learning:
- the role of sociability in increasing a species' intelligence
- the safe development of artificial intelligence
Social Life and Brain Size
Sociability - spending time with family, colleagues and strangers - is cognitively demanding. The neurological functions that support living in societies - empathy, communication, comprehension of hierarchies - all demand significant cognitive resources. We often take this for granted because humans do it so effortlessly, but the ability to empathise with a peer and recognise that they are in danger involves staggeringly complex neurological function. To comprehend that your friend is about to be hit by a car as they step onto the road, and to yell out to warn them, is a truly impressive cognitive collaboration of empathy, causal understanding, communication and other functions. Nor is this unique to human society: once you set aside our own sense of superiority, the communication systems of other primates and of birds are admirable in their intricacy. Many primates and other species warn each other of predatory threats and announce new food sources, reflecting the benefit of community.
Similarly, group hierarchies are an intellectually demanding approach to social living that many species adopt. Humans, other primates, and birds have developed group hierarchies to help navigate the rich ecosystem of social living (though it should be noted that not all social species tend toward hierarchies). A hierarchy is best understood as a ranking system that manages unequal access to limited resources, from food and water to more abstract human concepts such as money and prestige. It means we don't have to fight over the food every time we sit down to eat, and it underpins systems such as airline seating, where money maintains a hierarchy between first class, business, premium economy and economy.
This ushers in a key point - social cooperation and hierarchies are extremely challenging, and this is reflected in the brain. The British anthropologist Robin Dunbar has shown that across various taxa (e.g. birds, ungulates or primates), the larger the average social group in a species, the larger its brain relative to total body size, and the larger its neocortex relative to total brain size. Dunbar's influential 'social brain hypothesis' posits that increasing social complexity and the evolutionary expansion of the neocortex are linked.
This link also appears within species. Among some primates, group size can vary tenfold depending on the richness of the ecosystem. This was modelled in a fascinating neuroimaging study in which captive macaque monkeys were housed in groups of different sizes: the bigger the group, the greater the thickening of the prefrontal cortex and of the superior temporal gyrus (a cortical region involved in Theory of Mind), and the tighter the coupling of activity between the two.

This variety of richness in social ecosystems produces a multi-agent spectrum across the animal kingdom: tigers are entirely individualistic, interacting with other tigers only to mate or fight (sometimes both), while ants and bees are entirely motivated by the collective, giving their lives for the group. Our own species sits somewhere in the middle - we have both individual motivations and social motivations, and in different circumstances these competing motivations take priority.
What does this have to do with Artificial Intelligence?
If we're working toward advancing artificial intelligence, it is reasonable to posit that the sociability so fundamental to our own intelligence will be just as important to that advance.
The work Remi AI has been undertaking in this domain has focused primarily on two areas: artificial intelligence control of business operations (e.g. inventory management and fleet optimisation) and language development between different agent systems.
Many major challenges have been identified throughout our work, though the largest at this early stage is associating reward with actions.
To illustrate the difficulty of this problem: when designing a single agent to play Space Invaders, it is relatively simple to create a reward and punishment scheme - the reward is tied to the increase in game points, and the punishment to the death of the agent's spacecraft.
With multi-agent systems, however, it can be difficult to clearly associate a reward with a particular action by a particular agent. For example, in a transport simulation with 10 agents each driving a bus, you could trial rewarding the agents collectively for reducing waiting time across all passengers. It is incredibly difficult for the agents to map their own actions to that collective reduction, as each agent struggles to attribute which action led to the improvement. The problem compounds rapidly as the number of agents grows; when you consider that the city of London runs some 8,000 buses, you begin to appreciate the complexity involved.
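To make the contrast concrete, here is a minimal Python sketch of the two reward settings. It is a hypothetical toy example (the bus fleet, the reward functions and all names are illustrative assumptions, not our production code): the single-agent reward maps cleanly onto the agent's own actions, whereas every bus agent receives the same fleet-wide number regardless of what it individually did.

```python
import random

def single_agent_reward(points_gained: float, ship_destroyed: bool) -> float:
    """Single-agent case (Space Invaders-style): reward is the points scored
    this step, minus a large penalty if the agent's spacecraft is destroyed."""
    return points_gained - (100.0 if ship_destroyed else 0.0)

def global_fleet_reward(passenger_wait_times: list) -> float:
    """Multi-agent case (toy bus fleet): a single shared reward, the negative
    average waiting time across all passengers in the network."""
    return -sum(passenger_wait_times) / len(passenger_wait_times)

if __name__ == "__main__":
    num_buses = 10
    # Each bus agent independently chooses an action for this timestep.
    actions = [random.choice(["hold at stop", "depart now"]) for _ in range(num_buses)]
    # Simulated waiting times for 200 passengers across the whole network.
    wait_times = [random.uniform(1.0, 15.0) for _ in range(200)]
    shared_reward = global_fleet_reward(wait_times)
    for agent_id, action in enumerate(actions):
        # Every agent is credited identically, whatever it actually did:
        # the credit-assignment problem in miniature.
        print(f"bus {agent_id} chose '{action}' and received reward {shared_reward:.2f}")
```

The point of the sketch is the final loop: ten different action choices, one identical reward signal, and no information about which choice actually helped reduce waiting times.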
Multi-agent reinforcement learning and safety
The majority of modern-day artificial intelligence systems are, in cognitive terms, still fairly naive and unintelligent.
The supervised methods that are making waves, such as deep learning, are analogous, in biological terms, to the development of the visual and auditory cortices - present in many of the simplest organisms that evolved millions of years ago. State-of-the-art reinforcement learning systems still learn largely through trial and error (though advances in imitation learning and transfer learning are bringing our agents' learning closer to the way animals learn from their parents and from other tasks, respectively).
Although it is incredibly difficult to obtain consensus around scales of animal intelligence, it is widely held in AI research circles that we still haven't devised an AI that is as generally intelligent as a rat.
But as our research advances and our agents become more intelligent, multi-agent reinforcement learning will become critical, not only for the development of communication, empathy and other fundamental intellectual capabilities, but because it will teach agents how to behave in groups without harming each other. This is an essential research step toward developing safe agents in general.

Can we just program safety into the agents, or does some part of it need to be learned?
Personally, I am not certain.
One of the few areas where there is consensus around definitions of intelligence is this: the less an animal or agent relies on instinct and the more it learns from its environment, the smarter it is.
Instincts are, of course, incredibly useful. Just think of the baby turtle that automatically makes a dash for the water after hatching. Without this instinct it would sit on the beach and watch its fellow newborns be eaten before it learned to rush to the safety of the water.
But instincts, like regulation, can be dumb and restrictive; they can also save lives. Finding the balance between the two is critical.
Humans are perhaps the best example of this - our intelligence even permits us to override our instincts, if we allow pause for thought. One of the reasons the human infancy period is so much longer than that of other species is that our cognition is less instinctive and more learned.
If we're working toward human-level artificial intelligence, it's reasonable to posit that along this research path our AI agents will need lengthy learning periods, just like children. Along the way, they'll need to learn how to behave.
In the long run, I don't expect these lengthy learning periods to remain a requirement: unlike humans, we'll be able to transfer most learned behaviours and knowledge between AI generations. That said, the real world is a messy, ever-changing place, and we cannot assume we won't occasionally have to update an AI agent's behaviour to accommodate new developments.
Finally, it is the opinion of the author that the gap between state-of-the-art multi-agent reinforcement learning and human social capabilities underscores just how far we have to go as a research discipline to achieve advanced artificial intelligence. I also consider it one of the most important research domains, given its relevance to language development, empathy, imagination, ethics and other qualities we associate with being human. Some of humanity's most impressive feats, from the Eiffel Tower to the invention of the Internet, have emerged through collective effort.
If we're to develop safe, advanced AI, making progress in multi-agent reinforcement learning will be a critical step.