Collaborative hunting in artificial agents with deep reinforcement learning

Accepted by XXXX 2022 as an article

Kazushi Tsutsui, Ryoya Tanaka, Kazuya Takeda, Keisuke Fujii

An agent’s policy is represented by a deep neural network. An observation of the environment is given as input to the network, an action is sampled from the network’s output, and the agent receives a reward and the subsequent observation. The agent learns to select actions that maximize cumulative future reward. In this study, each agent learned its policy network independently; that is, each agent treated the other agents as part of the environment. The illustration shows a case with three predators.

The videos show examples of predator (dark blue, blue, and light blue) and prey (red) interactions under each experimental condition. The experimental conditions were defined by the number of predators (one, two, or three), relative mobility (fast, equal, or slow), and reward sharing (individual or shared), based on ecological findings.
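For illustration, here is a minimal sketch of the interaction loop described above, assuming a Gym-style reset/step interface. `RandomAgent` and `DummyEnv` are illustrative stand-ins, not classes from this repository, and a real agent would learn its policy network rather than act at random.

```python
# Minimal sketch of the interaction loop described above.
# RandomAgent and DummyEnv are illustrative stand-ins, not this
# repository's actual classes.
import numpy as np

class RandomAgent:
    """Stand-in for an agent whose policy network is learned independently."""
    def act(self, observation):
        # Sample an action (here uniformly at random; a trained policy
        # network would condition on the observation).
        return np.random.randint(4)

    def update(self, obs, action, reward, next_obs, done):
        # A learning agent would update its policy network here to
        # maximize cumulative future reward; omitted in this sketch.
        pass

class DummyEnv:
    """Stand-in environment returning one observation per agent."""
    def __init__(self, n_agents=3):
        self.n_agents = n_agents
        self.t = 0

    def reset(self):
        self.t = 0
        return [np.zeros(4) for _ in range(self.n_agents)]

    def step(self, actions):
        self.t += 1
        obs = [np.zeros(4) for _ in range(self.n_agents)]
        rewards = [0.0] * self.n_agents
        done = self.t >= 10
        return obs, rewards, done

env = DummyEnv(n_agents=3)  # e.g., three predators
agents = [RandomAgent() for _ in range(env.n_agents)]

observations = env.reset()
done = False
while not done:
    # Each agent selects its action from its own observation; the other
    # agents are, from its point of view, part of the environment.
    actions = [agent.act(o) for agent, o in zip(agents, observations)]
    next_observations, rewards, done = env.step(actions)
    for agent, o, a, r, no in zip(agents, observations, actions, rewards,
                                  next_observations):
        agent.update(o, a, r, no, done)
    observations = next_observations
```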
- This repository was tested with Python 3.6 and 3.7.
- To set up the environment, please run the following command:

```
pip install -r requirements.txt
```
- To run the training code, please move to the directory corresponding to the number of predators (`c1ae` = one predator, `c2ae` = two predators, or `c3ae` = three predators).
- Then, please run the Python file, specifying the predators' movement speed (`3.6` = fast, `3.0` = equal, or `2.4` = slow) and whether the reward is shared (`indiv` = individual or `share` = shared), as follows:

```
python c2ae.py 2.4 share
```
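As a hypothetical illustration of this calling convention, a training script such as `c2ae.py` might read its two command-line arguments as shown below; the actual scripts may parse them differently.

```python
# Hypothetical sketch of how a training script might read its two
# command-line arguments; the actual scripts may differ.
import sys

speed = float(sys.argv[1])  # predator speed: 3.6 (fast), 3.0 (equal), or 2.4 (slow)
sharing = sys.argv[2]       # 'indiv' (individual reward) or 'share' (shared reward)

if sharing not in ("indiv", "share"):
    raise SystemExit("reward-sharing mode must be 'indiv' or 'share'")

print(f"training predators at speed {speed} with '{sharing}' rewards")
```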
- The output files (network weights) are saved in the `model` directory.
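As an illustration of reusing these files, the snippet below assumes the weights were saved as PyTorch checkpoints; the file name is hypothetical, and if the repository uses another framework, its corresponding load function should be substituted.

```python
# Hypothetical example of restoring saved weights from the model directory.
# The checkpoint file name and the use of PyTorch are assumptions.
import torch

state_dict = torch.load("model/example_weights.pth", map_location="cpu")
# policy_net.load_state_dict(state_dict)  # restore into a matching network
```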
- The data and models are available in the following figshare repository. They can be used with the notebooks in the `notebooks` directory to replicate the figures in the article.

https://doi.org/10.6084/m9.figshare.21184069.v3
Kazushi Tsutsui (@TsutsuiKazushi)
E-mail: k.tsutsui6<at>gmail.com