Training
AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding
AdaPLD is a training-free method for adaptive retrieval and draft construction in speculative decoding. It combines lexical reuse with semantic similarity to improve both precision and recall, and it constructs branched hypotheses to handle uncertainty about the continuation. Together these reduce the number of target-model forward passes and yield up to 3.10× decoding speedup across various benchmarks. For practitioners, this makes generation more efficient in model-free settings, which matters most in real-time applications.
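To make the idea of lexical reuse with branched hypotheses concrete, here is a minimal sketch of n-gram lookup drafting. It is not AdaPLD's actual algorithm: the function name `propose_drafts` and its parameters (`ngram`, `draft_len`, `max_branches`) are hypothetical, and the semantic-similarity and adaptive components described above are left out. It only shows how earlier context can supply several candidate continuations for the target model to verify.

```python
from typing import List, Sequence


def propose_drafts(
    tokens: Sequence[int],
    ngram: int = 2,
    draft_len: int = 3,
    max_branches: int = 2,
) -> List[List[int]]:
    """Propose draft continuations by reusing earlier context (illustrative only).

    Finds earlier occurrences of the last `ngram` tokens and returns the
    tokens that followed each occurrence, most recent match first.
    """
    if len(tokens) < ngram:
        return []

    suffix = list(tokens[-ngram:])
    suffix_start = len(tokens) - ngram  # position of the suffix itself
    drafts: List[List[int]] = []

    # Scan earlier positions from newest to oldest, skipping the suffix itself.
    for start in range(suffix_start - 1, -1, -1):
        if list(tokens[start:start + ngram]) != suffix:
            continue
        continuation = list(tokens[start + ngram:start + ngram + draft_len])
        # Keep each distinct continuation as one branch (hypothesis).
        if continuation and continuation not in drafts:
            drafts.append(continuation)
            if len(drafts) == max_branches:
                break
    return drafts


if __name__ == "__main__":
    context = [1, 2, 3, 4, 1, 2, 5, 6, 1, 2]
    print(propose_drafts(context))
```

In a full speculative decoding loop, the target model would score the branches in a single forward pass and accept the longest prefix that matches its own predictions. That verification step is omitted here.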
speculative decoding, reuse, AdaPLD