This project aims to develop a framework for 3D lidar-based multiagent exploration planning with the Dreamer-v3 RL model. The goal is a scalable and robust centralised planner that generates waypoints for all agents so that together they explore a 3D environment optimally.
By framing multiagent exploration as an optimization problem, in which the distance or time required to fully explore an unknown 3D environment is minimized, we can use reinforcement learning to learn the optimal policy. Specifically, we use the Dreamer-v3 model developed by Google DeepMind to learn optimal waypoint generation in an efficient lidar-based simulation of our own design.
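One way to write this objective down, in notation that is assumed here for illustration and not taken from the project: let $E_t$ be the set of voxels observed by any agent up to step $t$, $V$ the set of voxels that can be observed at all, $\rho$ the target coverage, and $\pi$ the centralised waypoint policy. The planner then looks for the policy with the smallest expected time to reach that coverage:

```latex
\pi^{*} = \arg\min_{\pi} \; \mathbb{E}_{\pi}\!\left[ T_{\rho} \right],
\qquad
T_{\rho} = \min \left\{ t \ge 0 : \frac{|E_t|}{|V|} \ge \rho \right\}
```

If distance is the quantity of interest, $T_{\rho}$ can be replaced by the total path length travelled by the agents up to the same stopping point.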
The GIF shows how 3 agents learn to explore a 3D environment efficiently. Although the environment looks 2D, it is actually 64x64x2 voxels: we reduced the z dimension to iterate quickly on framework development, but the framework is fully adapted to handle 3D environments of varying size.
The current version of the framework uses Dreamer-v3 [1] (we use a PyTorch implementation) as the policy model. We train the model in a customised gym environment that simulates the multiagent exploration process; the action produced by the policy model is a set of waypoints, one for each agent. During the rollout of an action, all agents navigate to their generated waypoints using classical algorithms such as A*, and the exploration of new voxels is rendered with Bresenham's algorithm, which keeps the simulation environment lightweight.
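As a rough illustration of how Bresenham-style ray casting can reveal voxels cheaply, here is a minimal sketch. It is not the project's simulator: the function names `bresenham_3d` and `cast_ray`, the boolean `occupancy` and `explored` grids, and the rule that a ray stops at the first occupied voxel are all assumptions made for this example.

```python
import numpy as np


def bresenham_3d(start, end):
    """Return the integer voxels on the line from start to end, both inclusive."""
    p = list(start)
    d = [abs(e - s) for s, e in zip(start, end)]
    step = [1 if e > s else -1 for s, e in zip(start, end)]
    main = max(range(3), key=lambda i: d[i])      # driving axis: largest extent
    others = [i for i in range(3) if i != main]
    err = [2 * d[i] - d[main] for i in others]    # one error term per minor axis
    voxels = [tuple(p)]
    for _ in range(d[main]):
        p[main] += step[main]
        for k, i in enumerate(others):
            if err[k] >= 0:                       # time to step along this minor axis
                p[i] += step[i]
                err[k] -= 2 * d[main]
            err[k] += 2 * d[i]
        voxels.append(tuple(p))
    return voxels


def cast_ray(occupancy, explored, origin, target):
    """Mark voxels along one ray as explored, stopping at the first obstacle."""
    for v in bresenham_3d(origin, target):
        if any(c < 0 or c >= n for c, n in zip(v, occupancy.shape)):
            break                                 # the ray left the map
        explored[v] = True
        if occupancy[v]:
            break                                 # the obstacle hides everything behind it


# Hypothetical 8x8x2 map with a single obstacle on the ray's path.
occupancy = np.zeros((8, 8, 2), dtype=bool)
occupancy[4, 0, 0] = True
explored = np.zeros_like(occupancy)
cast_ray(occupancy, explored, (0, 0, 0), (7, 0, 0))
```

In this toy map the ray reveals the voxels up to and including the obstacle at x = 4 and leaves the voxels behind it unexplored. A full lidar sweep would repeat `cast_ray` over many target voxels around each agent.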
Our observation space is the 3D voxel space around the agents, stacked together into a single 3D tensor of uint8 values. We also observe the states of all agents, such as their current positions. We modified the Dreamer-v3 encoder module to handle 3D inputs efficiently by implementing 3D sparse convolution [2]. Vector inputs are handled by the MLP encoder. The framework can also be extended to multimodal exploration with a combination of camera and lidar, by feeding camera inputs through the original Dreamer CNN.
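To make the shape of such an observation concrete, the sketch below builds one from a voxel map and a list of agent positions. Everything in it is an assumption for illustration: the names `local_crop` and `build_observation`, the dictionary keys, the crop size, stacking the per-agent crops along a leading axis, and padding cells outside the map with 0. The project's actual layout may differ.

```python
import numpy as np


def local_crop(voxels, center, half):
    """Cut a cube of side 2*half+1 around center; cells outside the map become 0."""
    padded = np.pad(voxels, half, mode="constant", constant_values=0)
    x, y, z = (c + half for c in center)          # shift indices into the padded grid
    return padded[x - half:x + half + 1,
                  y - half:y + half + 1,
                  z - half:z + half + 1]


def build_observation(voxels, agent_positions, half=2):
    """Stack one local crop per agent and flatten the agent positions."""
    crops = [local_crop(voxels, pos, half) for pos in agent_positions]
    return {
        # uint8 voxel tensor, one crop per agent: (agents, D, D, D)
        "voxels": np.stack(crops, axis=0).astype(np.uint8),
        # vector input of the kind an MLP encoder would consume: (agents * 3,)
        "positions": np.asarray(agent_positions, dtype=np.float32).ravel(),
    }


# Hypothetical 64x64x2 map with three agents, one of them in a corner.
voxels = np.zeros((64, 64, 2), dtype=np.uint8)
voxels[0, 0, 0] = 1
obs = build_observation(voxels, [(0, 0, 0), (10, 10, 1), (63, 63, 1)])
```

With `half=2`, each crop is a 5x5x5 cube centred on its agent, so `obs["voxels"]` has shape `(3, 5, 5, 5)` and `obs["positions"]` has 9 entries.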
Install dreamer-v3 dependencies:
pip install -r dreamerv3-torch/requirements.txt
Install remaining dependencies:
pip install -r requirements.txt
- First, prepare the map dataset by running:
python generate_training_maps.py \
    --num_maps 1 \
    --map_size 64 \
    --map_scale 1 \
    --output_dir maps
- To start training run:
source train.sh path_to_your_logdir
You can resume training by adding the --resume flag:
source train.sh logdir_path --resume
- Monitor results:
tensorboard --logdir ./logdir
Example evaluation return from tensorboard:
- To evaluate the model you can run:
python benchmark.py \
    --model_path path_to_your_model \
    --exploration_percentage 0.95
This script evaluates the model and compares its performance with a classical benchmark.
- exploration_percentage: the explored fraction of the environment at which exploration is considered finished (0.95 in the command above)
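A stopping rule driven by `exploration_percentage` could look roughly like the sketch below. This is an assumption about how such a threshold is typically applied, not a description of `benchmark.py`: the function name `exploration_done` and the `explored` and `explorable` boolean masks are hypothetical.

```python
import numpy as np


def exploration_done(explored, explorable, exploration_percentage=0.95):
    """Return True once the explored share of explorable voxels reaches the threshold."""
    total = int(explorable.sum())
    if total == 0:
        return True                               # nothing to explore
    fraction = int((explored & explorable).sum()) / total
    return fraction >= exploration_percentage


# Hypothetical 10x10x1 map in which every voxel can be explored.
explorable = np.ones((10, 10, 1), dtype=bool)
explored = np.zeros_like(explorable)

explored[:9] = True                               # 90 of 100 voxels seen
done_at_90 = exploration_done(explored, explorable, 0.95)

explored.flat[:96] = True                         # 96 of 100 voxels seen
done_at_96 = exploration_done(explored, explorable, 0.95)
```

Here `done_at_90` is `False` and `done_at_96` is `True`. Counting only explorable voxels in the denominator keeps voxels that no ray can reach from making the threshold unreachable.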
- To perform qualitative evaluation you can run:
python demonstration.py \
    --model_path path_to_your_model
This script prepares a visualisation of the exploration as a GIF (as in Figure 1).