Power Up Speculative Decoding in Reinforcement Learning

Document Summary

TL;DR: We introduce speculative decoding into the RL sampling (rollout) process, which significantly speeds up sampling at appropriate batch sizes. In addition, the draft model is also updated during training.
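To make the core idea concrete, here is a minimal sketch of greedy speculative decoding. It is not the authors' implementation. The toy `draft` and `target` functions are hypothetical stand-ins for a small draft model and the policy model, and each "model" simply returns its greedy next token. A cheap draft model proposes `k` tokens, the target model verifies them, and the longest agreeing prefix is kept along with one token from the target. RL rollouts are usually sampled with temperature, which requires the probabilistic accept/reject rule of speculative sampling. That rule is not shown here; this greedy variant only illustrates the propose-and-verify loop.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps a token prefix to the
# token it would pick greedily next.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken,
                     k: int) -> Tuple[List[int], int]:
    """One round of greedy speculative decoding.

    Returns the tokens appended this round and how many draft tokens were accepted.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real system does
    #    this in one batched forward pass over prefix + proposal.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's own token and end the round.
            accepted.append(expected)
            return accepted, len(accepted) - 1
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the target's next prediction is a free bonus token.
    accepted.append(target(ctx))
    return accepted, k


def speculative_generate(prompt: List[int], draft: NextToken, target: NextToken,
                         k: int, max_new: int) -> Tuple[List[int], int]:
    """Generate max_new tokens; also return the number of verification rounds."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        new_tokens, _ = speculative_step(out, draft, target, k)
        out.extend(new_tokens)
        rounds += 1
    return out[: len(prompt) + max_new], rounds


def greedy_generate(prompt: List[int], target: NextToken, max_new: int) -> List[int]:
    """Plain autoregressive decoding with the target: one call per token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy models (hypothetical): the target counts upward mod 10; the draft agrees
# except that after a 7 it wrongly guesses 0.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_generate([0], draft, target, k=4, max_new=12)
    print(tokens, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 3
    assert tokens == greedy_generate([0], target, 12)
```

In the greedy setting, the output matches plain target decoding exactly. Here it takes 3 verification rounds instead of 12 sequential target steps. Every draft mistake ends a round early, so a draft model that drifts away from the policy lowers the speedup. That offers one plausible reason to keep updating the draft model as the policy changes during training.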