This project investigates the Sim-to-Real Transfer problem in robotics, focusing on Reinforcement Learning (RL) in the Hopper-v0 environment. The key challenge addressed is the reality gap: policies trained in simulation often degrade in the real world because of discrepancies between simulated and real dynamics.
We explore Proximal Policy Optimization (PPO) as the RL algorithm and employ Domain Randomization (DR) to improve the generalization of policies trained in a simulated environment with altered dynamics. Our results demonstrate the impact of various randomization strategies on transfer performance.
- Proximal Policy Optimization (PPO) for policy training.
- Reality gap modeling by modifying torso mass in simulation.
- Domain Randomization (DR) to enhance robustness:
  - Uniform Domain Randomization (UDR): link masses sampled from a uniform distribution.
  - Gaussian Domain Randomization (GDR): link masses sampled from a Gaussian distribution.
- Comparative analysis of randomization strategies to identify optimal transfer techniques.
- Extended study on Walker2D-v4 to evaluate scalability to more complex robots.
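The two randomization schemes above can be sketched as simple sampling routines. A minimal sketch with NumPy; the nominal masses below are illustrative placeholders, not the actual Hopper parameters:

```python
import numpy as np

def sample_udr(nominal, half_range, rng):
    """Uniform DR: each mass ~ U(nominal - half_range, nominal + half_range)."""
    return rng.uniform(nominal - half_range, nominal + half_range)

def sample_gdr(nominal, std, rng):
    """Gaussian DR: each mass ~ N(nominal, std^2), clipped to stay positive.

    Samples concentrate near the nominal values while still
    exposing the agent to occasional large deviations.
    """
    return np.clip(rng.normal(nominal, std), 1e-3, None)

rng = np.random.default_rng(0)
nominal = np.array([3.9, 2.7, 5.1])  # illustrative thigh/leg/foot masses
udr_masses = sample_udr(nominal, half_range=0.5 * nominal, rng=rng)
gdr_masses = sample_gdr(nominal, std=0.25 * nominal, rng=rng)
```

A fresh mass vector would typically be drawn at every episode reset, so the policy never trains against a single fixed dynamics model.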
This project uses OpenAI Gym environments, in particular Hopper-v0 and Walker2D-v4.
- Environment Creation: Instantiate either the source (modified torso mass) or target (default torso mass) Hopper environment.
- PPO Training: Train policies in the source environment with and without domain randomization.
- Evaluation:
  - Evaluate trained policies in both source and target environments.
  - Compare source-to-source (S2S) and source-to-target (S2T) performance.
- Extended Analysis:
  - Tune individual mass parameters to analyze their effect on transferability.
  - Compare uniform vs. Gaussian domain randomization.
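The S2S/S2T comparison amounts to running the same trained policy in both environments and comparing average returns. A minimal sketch of that evaluation loop, assuming an old-style Gym env (`step` returns a 4-tuple) and a policy given as a plain callable; `pi`, `source_env`, and `target_env` in the usage comment are placeholders:

```python
import numpy as np

def evaluate(policy, env, n_episodes=50):
    """Average undiscounted return of `policy` over `n_episodes`."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# S2S: evaluate in the training (source) env; S2T: same policy in the target env.
# transfer_gap = evaluate(pi, source_env) - evaluate(pi, target_env)
```

A small transfer gap (S2S close to S2T) indicates a policy that generalizes across the dynamics mismatch.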
- Baseline PPO training shows a significant performance drop when transferring from the source (lower torso mass) to the target (correct torso mass).
- UDR improves transfer performance, but the level of improvement depends on the randomization range.
- GDR provides better transfer results than UDR, as it samples more frequently near nominal values while still exposing the agent to variations.
- Selective mass tuning enhances transferability, especially for the thigh mass, which plays a critical role in stability.
- Walker2D-v4 tests confirm that domain randomization is beneficial but requires careful tuning for complex robots.
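The "careful tuning" noted above mostly concerns the width of the sampling range. In practice, DR is commonly implemented as a wrapper that resamples the randomized parameters at every reset, with the range width exposed as a single knob. A hedged sketch: the `get_masses()`/`set_masses()` hooks are hypothetical and would need to be adapted to the simulator's real API (for MuJoCo envs, something like `env.sim.model.body_mass`):

```python
import numpy as np

class UDRWrapper:
    """Resamples link masses uniformly around their nominal values at each reset.

    Assumes the wrapped env exposes hypothetical get_masses()/set_masses(m)
    hooks; replace them with the simulator's actual mass accessors.
    """

    def __init__(self, env, half_range_frac=0.5, seed=None):
        self.env = env
        self.nominal = np.asarray(env.get_masses(), dtype=float)
        self.half = half_range_frac * self.nominal  # tuning knob for DR width
        self.rng = np.random.default_rng(seed)

    def reset(self, **kwargs):
        masses = self.rng.uniform(self.nominal - self.half,
                                  self.nominal + self.half)
        self.env.set_masses(masses)
        return self.env.reset(**kwargs)

    def step(self, action):
        return self.env.step(action)
```

Sweeping `half_range_frac` (e.g. 0.1 to 1.0) is one way to study how the randomization range trades off S2S performance against S2T robustness.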
- Implement adaptive domain randomization that dynamically adjusts sampling ranges.
- Explore correlated randomization for physically interdependent parameters.
- Extend experiments to real-world robots for validation beyond simulation.
This project was developed for a robot learning course at Politecnico di Torino. Special thanks to instructors and peers for their support.