Machine Learning Research and Engineering (Home)
- Reinforcement Learning, Large Language Models, Large Scale Machine Learning and Deep Learning
- Recently working on LLM + RL (e.g. Reasoning, RLHF, Virtual Assistant Agent) and applications.
- New Paper on RL + Reasoning: Training Large Language Models to Reason via EM Policy Gradient
- Invited Reviewer: NIPS 2018/2019/2020/2023/2025, ICML 2019/2020/2021/2024/2025, UAI 2019-2025
- Tianbing Xu, Qiang Liu, Robust Policy Gradient (Work in Progress)
- Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng. Learning to Explore via Meta-Policy Gradient (ICML 2018, Talk, from 49 min, Video1, Video2)
- Tianbing Xu, Qiang Liu, Jian Peng. Stochastic Variance Reduction for Policy Gradient Estimation (arXiv, GitHub, Ant, Walker, Swimmer, Hopper, Cheetah)
- Tianbing Xu, Variational Inference for Policy Gradient (arXiv)
- Notes on RL Paper Reading (notes)
- Efficient LR Machine Learning End-to-End Distributed Training on Spark (GitHub)
- Tianbing Xu, Jianfeng Gao, Lin Xiao, Amelia C. Regan, Online Classification Using a Voted RDA Method, 28th AAAI Conference on Artificial Intelligence, 2014 (AAAI 14, Oral)
- Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quiñonero Candela. Practical Lessons from Predicting Clicks on Ads at Facebook, The 8th International Workshop on Data Mining for Online Advertising, co-located with KDD 2014 (ADKDD@KDD 14)
- Tianbing Xu, Zhongfei Zhang, Philip Yu, and Bo Long. Generative Models for Evolutionary Clustering, ACM Transactions on Knowledge Discovery from Data (TKDD 12).
- Tianbing Xu, Alex Ihler. Multicore Gibbs Sampling for Unstructured, Dense Graphs, The Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011)