Spectral gradient descent (SpecGD) orthogonalizes matrix parameter updates and has inspired practical optimizers such as Muon. These optimizers often perform well in large language model training, but their dynamics remain poorly understood, especially in factorized parameterizations where the product matrix does not receive orthogonalized updates. We study such dynamics through matrix factorization (MF), where the orthogonalization is applied separately to each factor's update. We analyze spectral gradient flow (SpecGF), a continuous-time analog of SpecGD, in the low-rank MF setting and prove “equal-rate” dynamics: all singular values grow at equal rates up to small deviations. Consequently, smaller singular values attain their target values earlier than larger ones, in contrast to the largest-first stepwise learning observed in standard gradient flow. Moreover, SpecGF converges to global minima from almost all initializations, provided the factor norms remain bounded; with regularization, we obtain global convergence. Empirically, we observe that LoRA fine-tuning with orthogonalization-based optimizers, including Muon, exhibits near-uniform growth in the product of LoRA adapters, consistent with the mechanism predicted by our MF analysis.
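As a rough illustration of the setting described above, the following NumPy sketch runs a discrete SpecGD iteration on a toy matrix factorization problem, orthogonalizing each factor's gradient separately. It is not the authors' code: the squared-error loss, the helper names (`orthogonalize`, `make_target`, `specgd_mf`), the dimensions, the step size, and the initialization scale are all assumptions chosen for illustration, and orthogonalization is done with an exact SVD rather than the iterative approximation that practical optimizers such as Muon use.

```python
import numpy as np


def orthogonalize(G):
    """Replace every singular value of G by 1: G = U diag(s) Vt -> U @ Vt."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def make_target(m=6, n=5, singular_values=(3.0, 2.0, 1.0), seed=1):
    """Hypothetical low-rank target with prescribed singular values."""
    rng = np.random.default_rng(seed)
    k = len(singular_values)
    U, _ = np.linalg.qr(rng.standard_normal((m, k)))  # orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return U @ np.diag(singular_values) @ V.T


def specgd_mf(M, rank, lr=0.01, steps=400, init_scale=0.01, seed=0):
    """SpecGD on the assumed loss 0.5 * ||A @ B - M||_F^2.

    The orthogonalization is applied to each factor's gradient separately,
    so the product A @ B itself does not receive an orthogonalized update.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    A = init_scale * rng.standard_normal((m, rank))
    B = init_scale * rng.standard_normal((rank, n))
    losses, spectra = [], []
    for _ in range(steps):
        R = A @ B - M  # residual of the product matrix
        losses.append(0.5 * np.sum(R ** 2))
        # Track the leading singular values of the product over time.
        spectra.append(np.linalg.svd(A @ B, compute_uv=False)[:rank])
        grad_A, grad_B = R @ B.T, A.T @ R
        A = A - lr * orthogonalize(grad_A)
        B = B - lr * orthogonalize(grad_B)
    return A, B, np.array(losses), np.array(spectra)
```

Calling `specgd_mf(make_target(), rank=3)` returns the factors together with the loss history and a `(steps, rank)` array of product singular values; plotting the columns of that array against the step index is one way to inspect how the singular values grow in this toy setting.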

Theoretical
ICML 2026 Workshop