SpeculativeDecoding

文档摘要

Speculative Decoding Speculative decoding is an optimization technique that accelerates inference. While traditional decoding is bottlenecked by memory access when generating one token at a time, speculative decoding leverages a smaller "draft" model to predict next K tokens in one round, achieving up to K times speedup with similar memory usage.