Publications

I am broadly interested in both the theoretical limits and empirical applications of reinforcement learning and online learning (e.g., multi-armed bandits), with a current emphasis on designing algorithms with theoretical guarantees for LLMs. Feel free to reach out if you share similar interests!

Preprints

Bayesian Best-Arm Identification with Abstention: A Polynomial-to-Exponential Phase Transition

Yuqi Huang, Yunlong Hou and Vincent Y. F. Tan.

TL;DR: Fixed-budget best-arm identification (BAI) in the Bayesian setting, where the learner has the option to abstain at the end of the sampling process.

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

Yunlong Hou, Zixin Zhong and Vincent Y. F. Tan.

TL;DR: Regret minimization with a free exploration period at the beginning.

Rethinking "RL Generalizes, SFT Memorizes": The Role of SFT Data

Yunlong Hou, Fengzhuo Zhang, Yuan Cheng, Jiachun Pan, Xingyao Li and Zhuoran Yang.
The first Foundations of Deep Generative Models workshop, ICML 2026; 2026 INFORMS Annual Meeting.

TL;DR: We revisit the generalization ability of SFT and RL training from a data perspective.

Demystifying the Slash Pattern in Attention: The Role of RoPE

Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Tang, Aixin Sun and Zhuoran Yang.
The first Foundations of Deep Generative Models workshop, ICML 2026

TL;DR: The slash pattern in attention is caused by RoPE.

Journals

Almost Optimal Variance-Constrained Best Arm Identification

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
IEEE Transactions on Information Theory (IEEE TIT), Volume 69, Issue 4, April 2023, doi: 10.1109/TIT.2022.3222231.

TL;DR: Best arm identification under a risk constraint (e.g., on the variance).

Conferences

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan and Zhuoran Yang.
International Conference on Machine Learning (ICML), 2025

TL;DR: A training-free approach that adaptively selects the drafting hyperparameters of speculative decoding to improve inference efficiency.

Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
Conference on Neural Information Processing Systems (NeurIPS), 2024

TL;DR: Best arm identification in a piecewise-stationary environment, where the best arm is the one with the highest average performance.

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
International Conference on Machine Learning (ICML), 2023

TL;DR: Regret minimization under an anytime risk constraint.