Products
The Eagle(3) has landed (for Qwen)
The latest llama.cpp release adds support for EAGLE-3 speculative decoding, enabled with the `--spec-type draft-eagle3` flag. The method requires a draft model alongside the main model; users can try it with the Qwen 3.6 27B model and its corresponding draft model. Tensor parallelism is not yet supported in this mode, which may limit performance and affect VRAM usage. For practitioners, the release adds a decoding strategy that can speed up inference by letting a small draft model propose tokens that the main model then verifies, at the cost of these current resource-management limitations.
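To illustrate why a draft model is needed, here is a minimal sketch of the general draft-and-verify idea behind speculative decoding, using greedy acceptance. It is not EAGLE-3's actual drafting scheme and not llama.cpp's implementation; `speculative_generate`, `greedy_generate`, and the toy models are hypothetical names invented for this example.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_generate(target: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target(tokens))
    return tokens


def speculative_generate(
    target: NextToken, draft: NextToken, prompt: List[Token], n: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding. Returns (tokens, verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + n
    rounds = 0
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position. A real engine would do
        #    this in one batched forward pass; here it is simulated per token.
        rounds += 1
        accepted: List[Token] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess != expected:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(guess)
        else:
            # Every guess matched, so the same pass also yields one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], rounds


# Toy stand-ins (assumptions for illustration only): the target counts upward
# modulo 10, and the draft agrees except when the last token is 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_generate(toy_target, toy_draft, [3], 12, k=4)
    print(out, rounds)
```

Because every accepted token is checked against the target, the output matches plain greedy decoding with the target alone; the potential saving comes from needing fewer target passes when the draft guesses well.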
qwen, eagle, release