Publications

I am broadly interested in both the theoretical limits and empirical applications of reinforcement learning and online learning (e.g., multi-armed bandits), with a current emphasis on theoretically guaranteed algorithm design for LLMs. Feel free to reach out if you share similar interests!

Preprints

Yuqi Huang, Yunlong Hou and Vincent Y. F. Tan.
TL;DR: Fixed-budget best arm identification (BAI) in the Bayesian setting, with the option of abstaining at the end of the sampling process.
Yunlong Hou, Zixin Zhong and Vincent Y. F. Tan.
TL;DR: Regret minimization with a free exploration period at the start.
Yunlong Hou, Fengzhuo Zhang, Yuan Cheng, Jiachun Pan, Xingyao Li and Zhuoran Yang.
The first Foundations of Deep Generative Models workshop, ICML 2026; 2026 INFORMS Annual Meeting.
TL;DR: We revisit the generalization ability of SFT and RL training from a data perspective.
Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Tang, Aixin Sun and Zhuoran Yang.
The first Foundations of Deep Generative Models workshop, ICML 2026
TL;DR: The slash pattern in attention is caused by RoPE.

Journals

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
IEEE Transactions on Information Theory (IEEE TIT), Volume 69, Issue 4, April 2023, doi: 10.1109/TIT.2022.3222231.
TL;DR: Best arm identification with a risk constraint (e.g., on variance).

Conferences

Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan and Zhuoran Yang.
International Conference on Machine Learning (ICML), 2025
TL;DR: A training-free approach to adaptively select the draft hyperparameter to improve inference efficiency.
Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
Conference on Neural Information Processing Systems (NeurIPS), 2024
TL;DR: Best arm identification in a piecewise-stationary environment, where the best arm is the one with the best average performance.
Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
International Conference on Machine Learning (ICML), 2023
TL;DR: Regret minimization with an anytime risk constraint.