MLX-LoRA-Studio

DoRA (Weight-Decomposed Low-Rank Adaptation)

Adaptation method · train_type: dora

LoRA plus a magnitude–direction decomposition: the combined weight is split into a unit-norm direction and a learnable magnitude, which are tuned independently. Reported to track full fine-tuning more closely than LoRA on instruction-following tasks.

Overview

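DoRA keeps the same low-rank LoRA update but factorises the combined weight into a magnitude vector and a direction matrix so the two can be tuned independently. Its memory and time costs are close to LoRA's but somewhat higher, because the norm of the full adapted weight is recomputed on every forward pass. The saved adapter is also slightly larger, since it stores the magnitude vector alongside A and B.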
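To put "slightly larger" in numbers, the sketch below counts the trainable parameters each method adds to a single linear layer. It assumes LoRA's B ∈ R^{out×r} and A ∈ R^{r×in}, plus one DoRA magnitude entry per output feature (the layout common implementations use; for a square layer the count is the same either way). The 4096×4096 shape and rank 8 are illustrative values, not app defaults, and the function names are hypothetical.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one out×in layer: B (out×r) plus A (r×in)."""
    return rank * (d_out + d_in)


def dora_params(d_out: int, d_in: int, rank: int) -> int:
    """DoRA trains the same B and A plus one magnitude entry per output feature (assumed layout)."""
    return lora_params(d_out, d_in, rank) + d_out


if __name__ == "__main__":
    # Illustrative layer shape and rank (assumptions, not app defaults).
    d_out, d_in, rank = 4096, 4096, 8
    lora = lora_params(d_out, d_in, rank)
    dora = dora_params(d_out, d_in, rank)
    print(lora, dora, f"{(dora - lora) / lora:.2%}")  # 65536 69632 6.25%
```

For that layer DoRA adds 4,096 parameters on top of LoRA's 65,536, about 6%; the relative overhead shrinks as the rank grows, because the magnitude vector does not depend on r.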

Intuition

Objective (math)

Let W₀ ∈ R^{out×in} be the frozen base weight.

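W'        =  W₀  +  ( α / r )  ·  B · A        (B ∈ R^{out×r}, A ∈ R^{r×in}: trainable)
V         =  W'                                (direction; ‖V‖ is held constant in backprop)
m         ∈  R^{out},  initialised to ‖W₀‖     (learnable magnitude, one per output row)
W_dora    =  m  ·  V  /  ‖V‖
y         =  W_dora · x

Here ‖·‖ is the L2 norm of each output row of an out×in matrix (written ‖·‖_c in the DoRA paper), so row i of W_dora is row i of V rescaled to length m_i.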

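The unit-norm rescaling of V is what separates DoRA from LoRA with a magnitude multiplier bolted on: the norm of each weight vector in W_dora is set by m alone, so B · A can only change its direction.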
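The objective can be checked numerically with a small pure-Python sketch (nested lists rather than MLX arrays, so it runs without extra dependencies). It assumes the (α / r) scaling shown above and one magnitude per output row of the out×in weight, which is the layout common DoRA implementations use; the helper names are hypothetical and this is not the app's own code. DoRA treats ‖V‖ as a constant during backpropagation; plain floats carry no gradients, so that detail appears only as a comment.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_norms(w: Matrix) -> list[float]:
    """L2 norm of each output row of an out×in weight matrix."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def dora_weight(w0: Matrix, b: Matrix, a: Matrix, m: list[float],
                alpha: float, rank: int) -> Matrix:
    """W_dora = m · V / ‖V‖ with V = W0 + (alpha / rank) · B · A.

    w0: out×in frozen base weight; b: out×rank; a: rank×in; m: one magnitude per output row.
    """
    scale = alpha / rank
    ba = matmul(b, a)  # out×in low-rank update
    v = [[w0[i][j] + scale * ba[i][j] for j in range(len(w0[0]))]
         for i in range(len(w0))]
    # During training these norms would sit under a stop-gradient;
    # plain floats have no gradients, so that is implicit here.
    norms = row_norms(v)
    return [[m[i] * v[i][j] / norms[i] for j in range(len(v[0]))]
            for i in range(len(v))]


def dora_forward(w0: Matrix, b: Matrix, a: Matrix, m: list[float],
                 alpha: float, rank: int, x: list[float]) -> list[float]:
    """y = W_dora · x for a single input vector x."""
    w = dora_weight(w0, b, a, m, alpha, rank)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


if __name__ == "__main__":
    w0 = [[3.0, 4.0], [1.0, 0.0]]  # toy base weight, out=2, in=2
    m = row_norms(w0)              # magnitude initialised from W0: [5.0, 1.0]
    a = [[0.5, -1.0]]              # rank-1 A (1×in)
    b_zero = [[0.0], [0.0]]        # LoRA-style zero init for B (out×1)
    print(dora_weight(w0, b_zero, a, m, alpha=1.0, rank=1))  # equals w0 at init
    b = [[0.2], [1.0]]             # B after some hypothetical updates
    w = dora_weight(w0, b, a, m, alpha=1.0, rank=1)
    print(row_norms(w))            # still close to [5.0, 1.0]: only direction changed
```

With B zero-initialised, as in LoRA, W_dora equals W₀ at the start of training; once B · A is non-zero the weight rows turn, but their norms stay pinned to m.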

What the settings change

DoRA shares the LoRA settings table (rank, scale, dropout, num_layers, fuse, resume_adapter_file); set train_type: dora to select the DoRA wrapper.

When to use it

DoRA is worth trying when LoRA plateaus on a metric that tracks style or format compliance, since DoRA is reported to match full fine-tuning more closely on instruction-following.

In the app

On the Train tab → Fine-tune section: pick DORA in the segmented Fine-tune picker (LORA / DORA / FULL) → train_type: dora. The same LoRA Settings controls apply to DoRA: Layers (num_layers), Rank, Scale and Dropout, alongside the Quantization picker. fuse is a toggle in Training Settings.

Tips & gotchas
