Power Up Speculative Decoding in Reinforcement Learning


Document Summary

TL;DR: We introduce speculative decoding into the RL sampling (rollout) process, which significantly speeds up sampling at appropriate batch sizes. The draft model is also updated during training.
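For readers unfamiliar with the technique, here is a minimal sketch of one draft-then-verify step using the standard speculative sampling acceptance rule. The summary does not say which verification rule the authors use, so the rule, the function name `speculative_step`, and the toy distributions are all assumptions made for illustration.

```python
import random

def speculative_step(p_target, q_draft, rng):
    """One draft-then-verify step over a toy vocabulary (illustrative only).

    p_target, q_draft: hypothetical next-token probabilities from the
    target (policy) model and the draft model, as plain lists.
    Returns (token, accepted).
    """
    vocab = list(range(len(p_target)))
    # The cheap draft model proposes a token.
    x = rng.choices(vocab, weights=q_draft)[0]
    # The target model accepts it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # On rejection, resample from the renormalized residual max(0, p - q).
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    return rng.choices(vocab, weights=residual)[0], False

rng = random.Random(0)
p = [0.6, 0.3, 0.1]  # toy target distribution (assumed)
q = [0.2, 0.5, 0.3]  # toy draft distribution (assumed)
token, accepted = speculative_step(p, q, rng)
```

Under this rule the emitted tokens follow the target distribution exactly, so rollouts are distributed as if sampled from the policy itself. The acceptance rate rises as the draft distribution moves closer to the target.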

