Yanxiao Zhao, Yangge Qian, Tianyi Wang, Jingyang Shan, Xiaolin Qin
Learning curves comparing the sample efficiency of TD3, SnapshotRL+TD3, and S3RL+TD3 on six MuJoCo environments. For details, see the W&B Report.
Learning curves comparing the sample efficiency of SAC, SnapshotRL+SAC, and S3RL+SAC on six MuJoCo environments. For details, see the W&B Report.
Learning curves comparing the sample efficiency of PPO, SnapshotRL+PPO, and S3RL+PPO on six MuJoCo environments. For details, see the W&B Report.
Ablation study results showing how each key component affects the sample efficiency of S3RL+TD3 on six MuJoCo environments. For details, see the W&B Report.
@article{zhao2024snapshot,
title={Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency},
author={Zhao, Yanxiao and Qian, Yangge and Wang, Tianyi and Shan, Jingyang and Qin, Xiaolin},
journal={arXiv preprint arXiv:2403.00673},
year={2024}
}