Research
When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis
The paper introduces a new metric called "fragility" for analyzing LLM pre-training, which measures the activation-noise level at which probe accuracy collapses, providing insights beyond standard linear probing. Unlike accuracy, which saturates early in training, fragility reveals ongoing structural changes in model representations, such as the emergence of moralized representations and a robustness gradient that develops over time. This metric is crucial for practitioners as it highlights the importance of data curation and the evolving nature of representation in LLMs, enabling more nuanced assessments of model performance.
llmpre-trainingaccuracyfragility