alexscavo/Sim-to-Real-Transfer-Project-RL

Sim to real transfer final project for the Robot Learning course
Sim-to-Real Transfer for Hopper Control via Reinforcement Learning and Domain Randomization

Overview

This project investigates the Sim-to-Real Transfer problem in robotics, focusing on Reinforcement Learning (RL) for the Hopper-v0 environment. The key challenge addressed is the reality gap, where policies trained in simulation often fail to perform optimally in the real world due to discrepancies in dynamics.

We use Proximal Policy Optimization (PPO) as the RL algorithm and apply Domain Randomization (DR) to improve the generalization of policies trained in a source simulation whose dynamics differ from the target environment. Our results show how different randomization strategies affect transfer performance.

Key Features

  • Proximal Policy Optimization (PPO) for policy training.
  • Reality gap modeling by modifying torso mass in simulation.
  • Domain Randomization (DR) to enhance robustness.
    • Uniform Domain Randomization (UDR): Varying link masses using a uniform distribution.
    • Gaussian Domain Randomization (GDR): Sampling link masses from a Gaussian distribution.
  • Comparative analysis of randomization strategies to identify optimal transfer techniques.
  • Extended study on Walker2D-v4 to evaluate scalability to more complex robots.
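The two randomization schemes listed above can be sketched as plain sampling routines. This is a minimal NumPy sketch; the nominal link masses and the range/spread parameters are illustrative assumptions, not values taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nominal Hopper link masses (torso, thigh, leg, foot)
NOMINAL = np.array([3.53, 3.93, 2.71, 5.09])

def sample_udr(nominal, scale=0.5, rng=rng):
    """UDR: draw each mass uniformly from [m*(1-scale), m*(1+scale)]."""
    return rng.uniform(nominal * (1 - scale), nominal * (1 + scale))

def sample_gdr(nominal, rel_std=0.2, rng=rng):
    """GDR: draw each mass from N(m, (rel_std*m)^2), clipped to stay positive."""
    return np.clip(rng.normal(nominal, rel_std * nominal), 1e-3, None)
```

At each training episode one of these samplers would produce a fresh mass vector to write into the simulator before reset.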

Environment Setup

This project uses OpenAI Gym environments, specifically Hopper-v0 and Walker2D-v4.

Training Pipeline

  1. Environment Creation: Instantiate either the source (modified torso mass) or target (default torso mass) Hopper environment.
  2. PPO Training: Train policies in the source environment with and without domain randomization.
  3. Evaluation:
    • Evaluate trained policies in both source and target environments.
    • Compare source-to-source (S2S) and source-to-target (S2T) performances.
  4. Extended Analysis:
    • Tune individual mass parameters to analyze their effect on transferability.
    • Compare uniform vs. Gaussian domain randomization.
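Step 2 of the pipeline (training under domain randomization) hinges on resampling the dynamics at every episode. One way to sketch this is a wrapper that rewrites the simulator's link masses on each reset; the attribute name `model.body_mass` follows the MuJoCo convention, and the class, body indices, and parameters here are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

class MassRandomizer:
    """Resample selected link masses uniformly at every episode reset (UDR).

    Duck-typed: works with any env exposing `model.body_mass` as an
    indexable array (e.g. a MuJoCo-based Hopper)."""

    def __init__(self, env, body_ids, scale=0.5, seed=None):
        self.env = env
        self.body_ids = list(body_ids)
        self.scale = scale
        self.rng = np.random.default_rng(seed)
        # Remember the nominal masses so ranges stay centered on them
        self.nominal = np.array([env.model.body_mass[i] for i in self.body_ids])

    def reset(self, **kwargs):
        low = self.nominal * (1 - self.scale)
        high = self.nominal * (1 + self.scale)
        for i, m in zip(self.body_ids, self.rng.uniform(low, high)):
            self.env.model.body_mass[i] = m
        return self.env.reset(**kwargs)
```

A PPO learner (e.g. from a library such as Stable-Baselines3) would then interact with the wrapped environment exactly as with the unwrapped one, seeing freshly randomized masses each episode.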

Results Summary

  • Baseline PPO training shows a significant performance drop when transferring from the source (lower torso mass) to the target (default torso mass) environment.
  • UDR improves transfer performance, but the level of improvement depends on the randomization range.
  • GDR provides better transfer results than UDR, as it samples more frequently near nominal values while still exposing the agent to variations.
  • Selective mass tuning enhances transferability, especially for the thigh mass, which plays a critical role in stability.
  • Walker2D-v4 tests confirm that domain randomization is beneficial but requires careful tuning for complex robots.
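The claim above that GDR samples more frequently near the nominal value, while UDR spreads probability evenly over its range, can be checked numerically. The parameters are illustrative assumptions (a ±50% uniform range versus a Gaussian whose standard deviation is 25% of nominal), not the settings used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3.53                                  # hypothetical nominal mass
n = 100_000
udr = rng.uniform(0.5 * m, 1.5 * m, n)    # UDR over a +/-50% range
gdr = rng.normal(m, 0.25 * m, n)          # GDR with sigma = 25% of nominal

def frac_near_nominal(samples, tol=0.1):
    """Fraction of samples within +/-tol (relative) of the nominal mass."""
    return np.mean(np.abs(samples - m) < tol * m)

# Under these settings, UDR puts ~20% of its samples within +/-10% of
# nominal, while GDR puts ~31% there, despite both covering wide ranges.
print(frac_near_nominal(udr), frac_near_nominal(gdr))
```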

Future Work

  • Implement adaptive domain randomization that dynamically adjusts sampling ranges.
  • Explore correlated randomization for physically interdependent parameters.
  • Extend experiments to real-world robots for validation beyond simulation.

Acknowledgments

This project was developed for a robot learning course at Politecnico di Torino. Special thanks to instructors and peers for their support.
