Research
Characterizing Software Aging in GPU-Based LLM Serving Systems
This paper presents a methodology for investigating software aging in GPU-based large language model (LLM) serving systems, in contrast to traditional CPU-centric approaches. In a 216-hour empirical study across six deployments, the authors observed significant memory aging, with leak rates that varied with the serving runtime and the deployment configuration. The work establishes a reproducible framework for further study of software aging and rejuvenation in LLM serving, and it highlights the need for practitioners to account for these dynamics in system design and maintenance.
software aging, LLM serving, methodology