Reddit r/LocalLLaMA · 10 d ago

My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config

The poster shares a Docker deployment configuration for serving the FP8 build of GLM-5.2 with SGLang on an HGX H200 system, reporting a 262k-token context window and a throughput of 70 tokens per second. Key settings include a 32 GB shared memory size for the container, tensor parallelism (TP) across the GPUs, and a static memory fraction of 0.83, chosen to avoid out-of-memory errors. The configuration is a useful reference point for practitioners tuning memory headroom and throughput when serving large language models on this hardware.
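As a rough illustration of how the settings in this summary fit together, here is a minimal Python sketch that assembles a `docker run` command for SGLang's `launch_server` entry point. It is not the poster's actual config. The assumptions are: the image tag `lmsysorg/sglang:latest`, the model path (a hypothetical placeholder), a TP degree of 8 (typical for an 8-GPU HGX node, but not stated in the summary), port 30000, and reading "262k" as 262144 tokens. Only the 32 GB shared memory size and the 0.83 static memory fraction come from the summary. The second helper shows the arithmetic behind the static fraction, using the H200's 141 GB of HBM per GPU.

```python
import shlex


def build_docker_cmd(
    model_path,
    tp_size=8,                       # assumption: one TP rank per GPU on an 8-GPU node
    shm_size="32g",                  # from the summary: 32 GB shared memory
    mem_fraction_static=0.83,        # from the summary: static memory fraction
    context_length=262144,           # assumption: "262k" read as 262144 tokens
    port=30000,                      # assumption: arbitrary serving port
    image="lmsysorg/sglang:latest",  # assumption: public SGLang image tag
):
    """Return the docker command as an argument list (nothing is executed)."""
    return [
        "docker", "run", "--gpus", "all",
        "--shm-size", shm_size,
        "-p", f"{port}:{port}",
        image,
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp_size),
        "--mem-fraction-static", str(mem_fraction_static),
        "--context-length", str(context_length),
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


def static_memory_split_gb(gpu_memory_gb=141.0, mem_fraction_static=0.83):
    """Split per-GPU memory into the static pool (weights + KV cache) and the rest."""
    static_gb = gpu_memory_gb * mem_fraction_static
    return static_gb, gpu_memory_gb - static_gb


# Hypothetical placeholder; substitute the real local path or repository id.
cmd = build_docker_cmd("path/to/GLM-5.2-FP8")
print(shlex.join(cmd))

static_gb, headroom_gb = static_memory_split_gb()
print(f"static pool: {static_gb:.2f} GB, headroom: {headroom_gb:.2f} GB per GPU")
```

With these numbers, about 117 GB per GPU goes to weights and KV cache, leaving roughly 24 GB for activations and other runtime allocations. Lowering the fraction trades KV cache capacity for more headroom against out-of-memory errors.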

glm-5.2 · docker · sglang
