About me

I am seeking a Ph.D. position for Fall 2026. If you find my background suitable for your research group, please feel free to contact me at longxhe@gmail.com. My research primarily focuses on reinforcement learning, generative models, and optimization theory. Currently, I am interested in:

GenAI and RL for Robotics: Integrating foundation models and diffusion- or flow-based generative models with reinforcement learning to enhance embodied agents.
Theory of Offline RL and Reliable AI Systems: Developing policies that are robust to corrupted data and adversarial conditions.
RL+X: Advancing reinforcement learning by applying it to other domains or leveraging external tools and methodologies (X) to improve RL performance.

Specifically, I am interested in developing practically efficient algorithms with theoretical justification for fundamental machine learning problems. I received my master’s degree from Tsinghua University in June 2025, where I was advised by Prof. Xueqian Wang (王学谦) in the Artificial Intelligence Program at Tsinghua Shenzhen International Graduate School. I also work closely with Prof. Li Shen (沈力).

News

2025.09: 🎉 RPEX was accepted to NeurIPS 2025.
2024.05: AlignIQL is available as a preprint on arXiv.
2024.04: DiffCPS is available as a preprint on arXiv.

Publications

Reinforcement Learning

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption, NeurIPS 2025. [Code]

Longxiang He, Li Shen, Junbo Tan, Xueqian Wang.

AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization, Preprint 2024. [Code]

Longxiang He, Li Shen, Junbo Tan, Xueqian Wang.

DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning, Preprint 2024. [Code]

Longxiang He, Li Shen, Linrui Zhang, Junbo Tan, Xueqian Wang.

FOSP: Fine-tuning Offline Safe Policy through World Models, ICLR 2025.

Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang.

Blogs

2022.10: 🎉 Transformer Attention Layer gradient: The full derivation of the Transformer attention gradient. We also compare our derived gradient against the one computed by PyTorch to verify its correctness.
2022.08: 🎉 CNN Stochastic Gradient Descent: The full derivation of the CNN gradient.

Teaching

Teaching assistant at Tsinghua University

Machine Learning course taught by Professor Xuegong Zhang (Winter 2024)