Inference · Reddit r/LocalLLaMA · 12 d ago

GLM 5.2 on Mac Studio Speedup PR

A pull request by the oMLX creator speeds up GLM 5.2 on Mac Studio, with prefill speeds exceeding 100 tokens per second (t/s) while accommodating larger contexts. With the update, 4-bit quantization can be used with context sizes over 100,000 tokens. The details of the improvements are in the pull request.

