Training · Reddit r/LocalLLaMA — 12 d ago

Is it possible to run a giant model like GLM 5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16-channel memory should hit 409GB/s per node.

The thread asks whether the 467GB Unsloth 4-bit quantization of GLM 5.2 can run on a cluster of four Dell C6525 servers, each with dual AMD EPYC 7702 processors and 512GB of DDR4 RAM, for 2TB of RAM in total and a theoretical peak memory bandwidth of 409.6 GB/s per node across 16 memory channels. With no GPUs available, the poster is weighing two uses of the cluster: raising token throughput, or pooling memory across nodes to fit larger models. The case matters to practitioners because CPU-only token generation is typically limited by memory bandwidth rather than compute, so the per-node bandwidth figure largely determines how fast a model of this size can run.
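As a rough check on those numbers, the sketch below reproduces the 409.6 GB/s figure and derives a bandwidth-only ceiling on generation speed. It assumes DDR4-3200 with 8 bytes per transfer per channel (the post states only the channel count and the resulting bandwidth), and it assumes the worst case in which every byte of the 467GB file is read once per generated token. The function names are illustrative, not from the thread. A mixture-of-experts model reads only its active weights per token, so its ceiling would be higher; that fraction is not given in the post and is left out here.

```python
# Back-of-the-envelope bandwidth arithmetic for one node.
# Assumptions (not stated in the post): DDR4-3200, 8 bytes per transfer
# per channel, and decimal gigabytes throughout.

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


def max_tokens_per_s(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s if generation is purely bandwidth-bound."""
    return bandwidth_gbs / gb_read_per_token


MODEL_GB = 467   # Unsloth 4-bit file size, from the post
NODE_RAM_GB = 512  # RAM per node, from the post

node_bw = peak_bandwidth_gbs(channels=16, mega_transfers_per_s=3200)

# Worst case: all weights are read for every token (dense-style access).
dense_ceiling = max_tokens_per_s(node_bw, MODEL_GB)

# Memory left on a single node for KV cache, OS, and runtime overhead.
headroom_gb = NODE_RAM_GB - MODEL_GB

print(f"peak bandwidth per node: {node_bw:.1f} GB/s")
print(f"dense-read ceiling:      {dense_ceiling:.2f} tokens/s")
print(f"single-node headroom:    {headroom_gb} GB")
```

Under these assumptions the weights fit on one node with limited headroom, and the ceiling is a theoretical maximum: sustained bandwidth on a dual-socket system is usually below peak, particularly when threads read memory attached to the other socket.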

hardware · model · cluster · relevance 0.00 · engagement 0.00
