ai-digest.dev
Training · arXiv cs.AI · 7 d ago

Gefen: Optimized Stochastic Optimizer

Gefen is a new memory-efficient optimizer proposed as a drop-in replacement for AdamW. It reportedly cuts optimizer-state memory by roughly 8x while matching AdamW's performance. It uses two techniques. First, it shares second-moment estimates across blocks of parameters. Second, it quantizes the first moment with a learned codebook. Together these save about 6.5 GiB per billion parameters, which is what an 8x cut to AdamW's usual fp32 state implies: two 4-byte buffers per parameter come to about 7.45 GiB per billion, and removing seven eighths of that frees about 6.5 GiB. The freed memory allows larger microbatches and higher training throughput. Gefen should therefore be most useful to practitioners who train large models or use large batches, including in distributed setups such as PyTorch FSDP and DDP.
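To make the two ideas concrete, here is a rough, pure-Python sketch written under our own assumptions. It is not the authors' implementation. The toy keeps one second-moment scalar per block of parameters. It stores the first moment as 8-bit indices into a codebook, with one absmax scale per block. Several choices are hypothetical: the block size (128 for the memory estimate, 4 in the toy run), the 8-bit index width, the per-block absmax scale, and the fixed uniform codebook that stands in for Gefen's learned one. All function names are illustrative too. The memory helpers also show the arithmetic behind the 6.5 GiB figure.

```python
import math

GIB = 2 ** 30


def adamw_state_gib(n_params):
    # AdamW keeps two fp32 buffers (m and v): 8 bytes per parameter
    return 2 * 4 * n_params / GIB


def toy_state_gib(n_params, block_size=128, index_bits=8):
    # Hypothetical layout, not the paper's: one codebook index per parameter
    # for m, plus one fp32 shared v and one fp32 m-scale per block
    n_blocks = math.ceil(n_params / block_size)
    return (n_params * index_bits / 8 + n_blocks * 2 * 4) / GIB


def make_codebook(index_bits=8):
    # Fixed uniform grid on [-1, 1]; an odd level count keeps 0.0 exact.
    # This stands in for Gefen's learned codebook, which is not modeled here.
    levels = 2 ** index_bits - 1
    return [-1 + 2 * k / (levels - 1) for k in range(levels)]


def quantize(values, codebook):
    # Per-block absmax scale, then the nearest codebook entry for each value
    scale = max(abs(x) for x in values) or 1.0
    idx = [min(range(len(codebook)), key=lambda k: abs(codebook[k] - x / scale))
           for x in values]
    return idx, scale


def dequantize(idx, scale, codebook):
    return [codebook[i] * scale for i in idx]


def init_state(n_params, block_size, codebook):
    sizes = [min(block_size, n_params - s) for s in range(0, n_params, block_size)]
    return {"codebook": codebook,
            "m": [quantize([0.0] * n, codebook) for n in sizes],
            "v": [0.0] * len(sizes)}


def toy_step(params, grads, state, t, block_size, lr=1e-3,
             b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW-style step (t counts from 1), updating params in place."""
    cb = state["codebook"]
    for b, start in enumerate(range(0, len(params), block_size)):
        g = grads[start:start + block_size]
        idx, scale = state["m"][b]
        # Dequantize m, apply the usual EMA, then store it re-quantized
        m = [b1 * mi + (1 - b1) * gi
             for mi, gi in zip(dequantize(idx, scale, cb), g)]
        state["m"][b] = quantize(m, cb)
        # One second-moment scalar per block: EMA of the block's mean g^2
        state["v"][b] = b2 * state["v"][b] + (1 - b2) * sum(x * x for x in g) / len(g)
        v_hat = state["v"][b] / (1 - b2 ** t)
        denom = (1 - b1 ** t) * (math.sqrt(v_hat) + eps)
        for j, mj in enumerate(m):
            p = params[start + j]
            # Bias-corrected Adam step plus decoupled (AdamW) weight decay
            params[start + j] = p - lr * (mj / denom + wd * p)


if __name__ == "__main__":
    print(f"AdamW state: {adamw_state_gib(10**9):.2f} GiB per 1e9 params")
    print(f"Toy layout:  {toy_state_gib(10**9):.2f} GiB per 1e9 params")
    params = [1.0, -2.0, 0.5, 3.0, -1.5, 0.25, 2.0, -0.75]
    state = init_state(len(params), 4, make_codebook())
    for t in range(1, 51):
        grads = list(params)  # gradient of sum(p**2) / 2 is p itself
        toy_step(params, grads, state, t, block_size=4, lr=0.1)
    print("loss after 50 steps:", sum(p * p for p in params) / 2)
```

With these assumed settings, the file prints about 7.45 GiB per billion parameters for AdamW and about 0.99 GiB for the toy layout. That is a saving of roughly 6.46 GiB, close to the reported figure. The real ratio depends on Gefen's actual block size and codebook width. The per-block absmax scale is a swapped-in technique borrowed from block-wise 8-bit optimizers. We use it so that small EMA updates are not rounded back to zero by a fixed codebook.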
