Our datasets can be downloaded from THIS LINK. The link provides multi-behavior datasets with labels, which serve as the ground truth for evaluating clustering results. It also includes the policies, trained with stable-baselines3, that were used to generate the multi-behavior datasets.
It should be noted that all datasets include observations, actions, rewards, terminals, and labels, making them suitable for training policies as well.
The locomotion and robotic hand manipulation datasets were created by us, while the trifinger datasets are based on an open-source project.
Our project can be installed by cloning the repository as follows:
git clone https://github.com/wq13552463699/Behaviour-aware-clustering-for-offline-policy-learning.git
Alternatively, you can download this GitHub project as a ZIP archive and unzip it locally.
Then you can install the required libraries by running:
pip install -r requirements.txt
You can run the experiments with:
python main.py --exp-name <name> --raw-dataset-path <local path of multi-behaviour dataset> --save-path <local path>
This command includes only a subset of the hyperparameters required to execute the experiments. You can find the remaining hyperparameters in the main.py file.
Once the clustering process converges and terminates, a file named estimated_traj_labels.pkl is created in the specified save path. This file contains the clustering results as discrete labels, which can then be compared with the ground truth labels for evaluation.
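As a rough illustration of that comparison, the sketch below computes the adjusted Rand index between estimated and ground truth labels, using only the Python standard library. It is not part of this repository. It assumes that each pickle file holds a flat sequence of one integer label per trajectory, which may not match the actual file layout, and the names load_labels and adjusted_rand_index are hypothetical helpers. Cluster indices are arbitrary, so a permutation-invariant score such as this one is used rather than direct label equality.

```python
import pickle
from collections import Counter
from math import comb


def load_labels(path):
    """Load a pickled label sequence (assumed flat, one label per trajectory)."""
    with open(path, "rb") as f:
        return [int(x) for x in pickle.load(f)]


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings; invariant to label permutation."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)

    # Contingency counts: trajectories per (true label, predicted label) pair.
    pair_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of trajectory pairs grouped together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    expected = sum_true * sum_pred / total if total else 0.0
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

With the estimated labels loaded via load_labels("<local path>/estimated_traj_labels.pkl") and the ground truth labels taken from the dataset, adjusted_rand_index(true_labels, estimated_labels) gives 1.0 for a perfect clustering (up to relabeling) and values near 0 for a random assignment.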
Our pretrained results can be accessed by following THIS LINK, which contains the tuned hyperparameters, the clustering results, and the trained neural network models.