I am currently working on world models for robotics and am taking a gap year to pursue Ph.D. opportunities. If you find my background a good fit for your research group, please feel free to contact me at longxhe@gmail.com.

My research primarily focuses on reinforcement learning, generative models, and optimization theory. Currently, I am interested in:

  1. GenAI for RL: Leveraging diffusion models to enhance RL performance.
  2. Robust RL: Designing policies resilient to corrupted data or adversarial environments.
  3. RL+X: Unlocking the potential of RL in other fields or using X to improve RL.

Specifically, I am interested in developing practically efficient algorithms with theoretical justification for fundamental machine learning problems.

I received my master’s degree from Tsinghua University in June 2025, where I was advised by Prof. Xueqian Wang (王学谦) in the Artificial Intelligence Program at Tsinghua Shenzhen International Graduate School. I also work closely with Prof. Li Shen (沈力).

🔥 News

  • 2025.09: 🎉 RPEX is accepted at NeurIPS 2025.
  • 2024.05: The AlignIQL preprint is available on arXiv.
  • 2024.04: The DiffCPS preprint is available on arXiv.

📝 Publications

Reinforcement Learning

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption, NeurIPS 2025, Longxiang He, Deheng Ye, Junbo Tan, Xueqian Wang, Li Shen

  • TLDR: We propose RPEX, an offline-to-online method that improves the performance of offline pre-trained RL policies under a wide range of data corruptions.

AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization, arXiv 2024, Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

  • TLDR: We introduce AlignIQL, a new method for extracting the policy from the IQL-style value function, and explain when IQL can use weighted regression for policy extraction.

DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning, arXiv 2024, Longxiang He, Li Shen, Linrui Zhang, Junbo Tan, Xueqian Wang

  • TLDR: DiffCPS integrates diffusion-based policies into Advantage-Weighted Regression (AWR) via a primal-dual framework, offers insights into the advantages of employing diffusion models in offline decision-making, and elucidates the relationship between AWR and TD3+BC.
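For background (my generic summary of AWR, not the paper's exact formulation): advantage-weighted regression extracts a policy by behavior cloning on the dataset $\mathcal{D}$, weighted by exponentiated advantages, with $\lambda$ a temperature hyperparameter:

$$\max_{\pi}\ \mathbb{E}_{(s,a)\sim\mathcal{D}}\left[\exp\!\left(\frac{A(s,a)}{\lambda}\right)\log \pi(a\mid s)\right]$$

Per the TLDR above, DiffCPS parameterizes $\pi$ with a diffusion model and handles the resulting constrained policy search via a primal-dual framework.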

FOSP: Fine-tuning Offline Safe Policy through World Models, ICLR 2025, Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang


🦉 Blogs