My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config
The author shares a Docker deployment configuration for serving GLM-5.2 (FP8) with SGLang on an HGX H200 system, reaching a 262k-token context window and a throughput of 70 tokens per second. The key settings are a 32 GB shared-memory size for the container, tensor parallelism (TP) across the GPUs, and a static memory fraction of 0.83 to avoid out-of-memory errors. The configuration is a useful reference for practitioners who need to balance memory headroom against throughput when deploying large language models on this hardware.
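As a rough illustration of how the settings described above might fit together, the following Python sketch assembles a `docker run` command line for SGLang's `sglang.launch_server` entry point. Only the 32 GB shared-memory size, the 0.83 static memory fraction, and the 262k context come from the post. Everything else is an assumption for illustration: the `lmsysorg/sglang:latest` image, the `/models/GLM-5.2-FP8` path (a placeholder), port 30000, a TP degree of 8 (the post does not state one), and reading "262k" as 262144 tokens. The helper name `build_sglang_docker_cmd` is hypothetical, and this is not the author's actual configuration.

```python
import shlex


def build_sglang_docker_cmd(
    model_path,
    tp_size,
    *,
    image="lmsysorg/sglang:latest",  # assumed image; the post does not name one
    shm_size="32g",                  # from the post: 32 GB shared memory
    mem_fraction_static=0.83,        # from the post: avoids out-of-memory errors
    context_length=262144,           # assumption: "262k" read as 256 * 1024
    port=30000,                      # assumed port
):
    """Return the argv list for a `docker run` that launches the SGLang server."""
    docker_args = [
        "docker", "run",
        "--gpus", "all",
        "--shm-size", shm_size,
        "-p", f"{port}:{port}",
        image,
    ]
    server_args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp_size),
        "--mem-fraction-static", str(mem_fraction_static),
        "--context-length", str(context_length),
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
    return docker_args + server_args


# Placeholder model path and an assumed TP degree of 8.
cmd = build_sglang_docker_cmd("/models/GLM-5.2-FP8", tp_size=8)
print(shlex.join(cmd))
```

Keeping the values as keyword arguments makes it easy to try a lower memory fraction or a shorter context if the server still runs out of memory at startup.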
glm-5.2 docker sglang