FOGO: Forgetting-aware Orthogonalization Optimizer
The paper introduces FOGO, an optimizer designed to reduce both short-term and long-term forgetting during training. It orthogonalizes momentum updates and keeps a compact codebook memory of past gradient directions; when the current update conflicts with a stored direction, FOGO applies a lightweight orthogonal correction to resolve the conflict. The authors report improved convergence and knowledge retention across several tasks, including class-imbalanced classification and continual learning scenarios with models such as LLaVA-7B and GPT-2. For practitioners, the appeal is that forgetting is addressed inside the optimizer itself, which may help in continual learning settings where standard training tends to lose previously learned knowledge.
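
To make the idea of an orthogonal correction concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the function name `orthogonal_correction`, the negative-dot-product test for a "conflict", and the plain array used as a codebook are all assumptions chosen for illustration. The summary above does not specify how FOGO builds its codebook or detects conflicts.

```python
import numpy as np

def orthogonal_correction(update, codebook, eps=1e-12):
    """Hypothetical helper: project out of `update` any component that
    points against a stored direction (assumed conflict rule: dot < 0)."""
    corrected = update.astype(float).copy()  # leave the caller's array untouched
    for direction in codebook:
        norm_sq = float(direction @ direction)
        if norm_sq < eps:
            continue  # skip degenerate (near-zero) stored directions
        dot = float(corrected @ direction)
        if dot < 0.0:
            # Subtract the projection onto `direction`, so the corrected
            # update becomes orthogonal to this stored direction.
            corrected -= (dot / norm_sq) * direction
    return corrected

# Toy codebook with two stored directions (assumed orthogonal for clarity).
codebook = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
update = np.array([-0.5, 2.0])  # opposes the first stored direction

corrected = orthogonal_correction(update, codebook)
print(corrected)  # [0. 2.]
```

In this toy case the component opposing the first stored direction is removed while the agreeing component is kept. With non-orthogonal stored directions, a single sequential pass like this one does not guarantee that every conflict is removed, so a real implementation would need a more careful scheme.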