SpeculativeDecoding


文档摘要

Speculative Decoding Speculative decoding is an optimization technique that accelerates inference. While traditional decoding is bottlenecked by memory access when generating one token at a time, speculative decoding leverages a smaller "draft" model to predict next K tokens in one round, achieving up to K times speedup with similar memory usage.


发布者: 作者: 转发
评论区 (0)
U