Publications

I am broadly interested in both the theoretical limits and empirical applications of reinforcement learning and online learning (e.g., multi-armed bandits), with a current emphasis on theoretically guaranteed algorithm design for Large Language Models. Feel free to reach out if you share similar interests!

Preprints

Demystifying the Slash Pattern in Attention: The Role of RoPE

Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Pang, Aixin Sun, Zhuoran Yang.

TL;DR: The slash pattern in attention is caused by RoPE.

Journals

Almost Optimal Variance-Constrained Best Arm Identification

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
IEEE Transactions on Information Theory (IEEE TIT), Volume 69, Issue 4, April 2023, doi: 10.1109/TIT.2022.3222231.

TL;DR: Best arm identification under a risk constraint (e.g., a variance constraint).

Conferences

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan, and Zhuoran Yang.
International Conference on Machine Learning (ICML), 2025

TL;DR: A training-free approach that adaptively selects the draft hyperparameter to improve inference efficiency.

Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
Conference on Neural Information Processing Systems (NeurIPS), 2024

TL;DR: Best arm identification in a piecewise-stationary environment, where the best arm is the one with the best average performance.

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Yunlong Hou, Vincent Y. F. Tan and Zixin Zhong.
International Conference on Machine Learning (ICML), 2023

TL;DR: Regret minimization under an anytime risk constraint.