
some issues about logo #2

Open
Marioooooooooooooo opened this issue Oct 18, 2023 · 0 comments
Dear Desik Rengarajan,

I recently studied the LOGO algorithm you published at ICLR 2022. The approach of guiding policy learning with demonstration data achieves remarkable results on MuJoCo tasks, with theoretical guarantees. Nice work! I have some questions about how the behavioral data is collected and about the theoretical derivation, and I hope you can answer them.

Following your instructions, I collected behavioral data as follows, taking Hopper-v2 as an example:
(1) The default number of training iterations is 1500. I trained TRPO in a dense-reward setting for 1000 iterations (i.e., both the training and test environments use dense rewards).
(2) I used the TRPO model trained for 1000 iterations to collect about 10 episodes (about 3000 rows of data) as the behavioral data; a rough sketch of my collection loop is shown below.
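For reference, this is a minimal sketch of my collection loop, assuming the classic gym step API used by Hopper-v2. The random-action stub is only a stand-in for my trained TRPO policy, and the output file name is just an example:

```python
import gym
import pickle

env = gym.make("Hopper-v2")   # dense-reward environment
num_episodes = 10             # roughly 3000 transitions in total for Hopper

def select_action(obs):
    # Placeholder: in my run this calls the TRPO policy trained for
    # 1000 iterations; a random action is used here only so the sketch
    # runs standalone.
    return env.action_space.sample()

trajectories = []
for _ in range(num_episodes):
    obs, done, episode = env.reset(), False, []
    while not done:
        action = select_action(obs)
        next_obs, reward, done, _ = env.step(action)
        episode.append((obs, action, reward, next_obs, done))
        obs = next_obs
    trajectories.append(episode)

# Save the collected episodes; the exact file format should match whatever
# the LOGO training script expects for its behavioral data.
with open("hopper_behavior_data.pkl", "wb") as f:
    pickle.dump(trajectories, f)
```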

Apart from the behavioral data being different from the default data shipped with the LOGO code, all other training settings are identical to the LOGO code. However, my training results are very poor (see the picture below). I suspect the problem lies in how I collected the behavioral data. Could you elaborate on how you constructed your behavioral data, and could you open-source the corresponding data-collection script?

[image: training results]

Question 2:
[image: question posted as a screenshot]

Question 3:
[image: question posted as a screenshot]
