Uniform Spectral Growth under Factor-wise Muon Orthogonalization in Matrix Factorization and LoRA

Changmin Kang*, Jihun Yun*, Baekrok Shin, Yeseul Cho, Chulhee Yun

Abstract

Spectral gradient descent (SpecGD) orthogonalizes matrix parameter updates and has inspired practical optimizers such as Muon. These optimizers often perform well in large language model training, but their dynamics remain poorly understood, especially in factorized parameterizations, where the product matrix itself does not receive orthogonalized updates. We study these dynamics through matrix factorization (MF), where orthogonalization is applied separately to each factor's update. We analyze spectral gradient flow (SpecGF), a continuous-time analog of SpecGD, in the low-rank MF setting and prove “equal-rate” dynamics: all singular values grow at equal rates up to small deviations. Consequently, smaller singular values attain their target values earlier than larger ones, in contrast to the largest-first stepwise learning observed in standard gradient flow. Moreover, SpecGF converges to global minima from almost all initializations, provided the factor norms remain bounded; with regularization, we obtain global convergence. Empirically, we observe that LoRA fine-tuning with orthogonalization-based optimizers, including Muon, exhibits near-uniform growth in the singular values of the product of the LoRA adapters, consistent with the mechanism predicted by our MF analysis.
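To make the factor-wise setting concrete, the following is a minimal NumPy sketch of one discrete SpecGD-style step on a factorization W = AB. It is an illustration under stated assumptions, not the paper's implementation: the squared loss 0.5·||AB − M||_F², the function names, and the use of an exact SVD for orthogonalization are all choices made here (Muon approximates the orthogonalization with a Newton-Schulz iteration and also uses momentum, both omitted).

```python
import numpy as np

def orthogonalize(G):
    """Replace every singular value of G by 1, i.e. return U @ Vt.

    Exact SVD is used for clarity. For a rank-deficient G this also gives
    unit gain to null directions, which is an arbitrary choice.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def factorwise_specgd_step(A, B, M, lr):
    """One step on the assumed loss 0.5 * ||A @ B - M||_F^2.

    Each factor's gradient is orthogonalized separately, so the product
    A @ B does not itself receive an orthogonalized update.
    """
    R = A @ B - M            # residual of the product matrix
    grad_A = R @ B.T         # gradient with respect to A
    grad_B = A.T @ R         # gradient with respect to B
    A_new = A - lr * orthogonalize(grad_A)
    B_new = B - lr * orthogonalize(grad_B)
    return A_new, B_new

def track_product_spectrum(A, B, M, lr, steps):
    """Record the singular values of A @ B before and after each step."""
    history = [np.linalg.svd(A @ B, compute_uv=False)]
    for _ in range(steps):
        A, B = factorwise_specgd_step(A, B, M, lr)
        history.append(np.linalg.svd(A @ B, compute_uv=False))
    return np.array(history)
```

Calling `track_product_spectrum` from a small random initialization with a low-rank target `M` returns one row of singular values per step, which can be plotted to inspect how the spectrum of the product evolves. The discrete step with a fixed learning rate is only a rough stand-in for the continuous-time flow analyzed in the paper.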

Theoretical

ICML 2026 Workshop
