Inference · Reddit r/LocalLLaMA · 13 d ago

Top-N-Sigma: Remove unconditional softmax+sort by TimNN · Pull Request #22645 · ggml-org/llama.cpp

Pull Request #22645 in the ggml-org/llama.cpp repository changes the Top-N-Sigma sampler so that it no longer runs an unconditional softmax and sort at the end of sampling. The reported result is a throughput increase from approximately 30 tokens per second (t/s) to 45 t/s on a MacBook Pro M3 Max, which is about 10 ms less per token (roughly 33 ms down to 22 ms at those rates). The change matters most when Top-N-Sigma is used together with other samplers, since the extra softmax and sort were paid on every token regardless of whether a later stage needed them.
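To make the idea concrete, here is a minimal Python sketch of a Top-N-Sigma style filter. It is not the llama.cpp code: the function names are hypothetical, and the use of the population standard deviation is an assumption. It only illustrates that the filter itself needs a linear pass over the logits (max, mean, standard deviation, threshold), so a sort or softmax can be deferred until something actually needs probabilities.

```python
import math

def top_n_sigma_filter(logits, n=1.0):
    """Keep indices whose logit is within n standard deviations of the max.

    Hypothetical helper for illustration; assumes population std deviation.
    """
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max_logit - n * sigma
    # One linear scan: no sort and no softmax is required to pick survivors.
    return [i for i, x in enumerate(logits) if x >= threshold]

def softmax_over(logits, kept):
    """Compute probabilities only for the surviving tokens, only when needed."""
    m = max(logits[i] for i in kept)
    exps = {i: math.exp(logits[i] - m) for i in kept}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

logits = [5.0, 4.9, 1.0, 0.0, -2.0]
kept = top_n_sigma_filter(logits, n=1.0)
probs = softmax_over(logits, kept)
```

In this sketch the softmax runs over the survivors only, and only when a caller asks for it, which is the general shape of the saving the PR describes.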

Top-N-Sigma · sampling · optimization
