Sign In

Spectral Basis Adapter

0

Spectral Basis Adapter

Spectral Basis Adapter (SBA)

A Dynamic, Conditional Alternative to LoRA for Large Diffusion Models


Links


Abstract

In this work, I introduce the Spectral Basis Adapter (SBA) — a novel parameter-efficient adaptation mechanism designed and implemented by me as an alternative to static LoRA adapters. SBA replaces fixed low-rank updates with a dynamic, context-conditioned mixture of orthogonal spectral bases, enabling significantly higher expressiveness while preserving training efficiency.

This document presents the full motivation, mathematical formulation, architectural design, and practical integration of SBA into large diffusion models such as Stable Diffusion XL (SDXL).


1. Motivation

1.1 The Limitation of Static Adapters

LoRA models the weight update as:

ΔW = B · A

Where:

  • A ∈ ℝ^{r×d}

  • B ∈ ℝ^{d×r}

  • r ≪ d

Once trained, ΔW is fixed. This implies:

  • Same adaptation at all diffusion timesteps

  • Same behavior for all prompts

  • No context-dependent modulation

This is fundamentally misaligned with diffusion models, where:

  • Early timesteps require global structure

  • Late timesteps require fine detail

  • Text semantics vary drastically across prompts


2. Core Idea of SBA

SBA generalizes LoRA by introducing a spectral basis bank and a conditional gating mechanism.

Instead of a single low-rank update, SBA learns:

  • A shared low-rank projection space

  • Multiple orthogonal transformation bases in that space

  • A context-dependent mixture over those bases


3. Mathematical Formulation

In this section, I present a rigorous mathematical formulation of the Spectral Basis Adapter (SBA), highlighting how it generalizes LoRA while preserving low-rank efficiency.


3.1 Baseline: Linear Transformation

Let a pretrained linear layer be defined as:

W₀ ∈ ℝ^{d_out × d_in}, y₀ = W₀ x

The base weights W₀ are frozen during SBA training.


3.2 LoRA Revisited

LoRA introduces a low-rank update:

ΔW = B A, with rank r ≪ min(d_in, d_out)

Resulting in:

y = W₀ x + B (A x)

This update is static and independent of timestep or conditioning.


3.3 SBA Parameterization

SBA replaces the static low-rank update with a conditional operator.

I define:

  • A ∈ ℝ^{r × d_in} (down-projection)

  • B ∈ ℝ^{d_out × r} (up-projection)

  • A spectral basis bank {S₁, …, S_K}, where S_k ∈ ℝ^{r × r}

The low-rank latent representation is:

h = A x


3.4 Conditioning Variables

Let:

  • t ∈ ℝ^{d_t} be the diffusion timestep embedding

  • c ∈ ℝ^{d_c} be the conditioning (text) embedding

I concatenate them to form the conditioning vector:

z = [t ∥ c] ∈ ℝ^{d_t + d_c}


3.5 Gating Function

A lightweight gating function G determines the contribution of each spectral basis:

α = softmax(G(z)) ∈ ℝ^{K}

Where G is implemented as a single linear layer:

G(z) = W_g z + b_g

This design ensures:

  • Minimal parameter overhead

  • Stable mixed-precision behavior

  • No per-layer MLP explosion


3.6 Spectral Mixing Operator

The effective mixing matrix is computed as:

M(t, c) = Σ_{k=1}^{K} α_k S_k

Key property:

  • Each S_k is initialized as an orthogonal matrix

  • M(t, c) is a convex combination in spectral space


3.7 Nonlinear Spectral Transformation

The latent representation is transformed as:

h' = σ(M(t, c) · h)

Where σ(·) is the SiLU activation.

This introduces nonlinearity inside the low-rank subspace, which LoRA lacks.


3.8 Output Projection

The final SBA contribution is:

Δy = B h'

And the full layer output becomes:

y = W₀ x + Δy


3.9 Gradient Flow Properties

During backpropagation:

  • Gradients flow through A, B, S_k, and G

  • W₀ remains frozen

  • The conditioning gate G learns when and how to adapt

This decouples representation capacity from adaptation dynamics.


4. Architectural Components

This section details the architectural design choices I made to ensure SBA is expressive, stable, and memory-efficient.


4.1 Low-Rank Projection Space

The matrices A and B define a shared low-dimensional subspace. All spectral transformations operate inside this space, which:

  • Bounds computational cost by O(r²)

  • Enables expressive transformations via basis mixing

  • Preserves LoRA-level parameter efficiency


4.2 Orthogonal Spectral Bases

Each spectral basis S_k satisfies:

S_kᵀ S_k ≈ I

Initialization strategy:

  • S₁ = I (identity)

  • S₂…S_K initialized via QR decomposition

This guarantees:

  • Stable initialization

  • Identity-preserving behavior at early training

  • Smooth interpolation between bases


4.3 Conditioning Gate Design

I intentionally avoid deep MLP gates. Instead:

G(z) = Linear(z)

This ensures:

  • O(K·(d_t+d_c)) parameters per layer

  • Linear memory growth

  • Fast convergence

Empirically, this prevents optimizer-state VRAM blow-up observed with MLP-based gates.


4.4 Residual Formulation

SBA is always applied as a residual:

y = W₀ x + Δy

This preserves:

  • Pretrained model behavior

  • Training stability

  • Compatibility with existing checkpoints


4.5 Shared Low-Rank Projections

  • A ∈ ℝ^{r×d}

  • B ∈ ℝ^{d×r}

These are equivalent to LoRA projections but shared across all bases.


4.6 Spectral Basis Bank

A set of K matrices:

{B₁, B₂, ..., B_K}, Bᵢ ∈ ℝ^{r×r}

Initialization:

  • B₁ = Identity matrix

  • B₂…B_K = Random orthogonal matrices (QR decomposition)

This ensures:

  • Stability at initialization

  • Expressive rotational subspaces


4.7 Conditioning Gate

A lightweight linear gate:

G(t, c) = Linear([t || c]) → ℝ^{K}

Followed by softmax to produce mixture coefficients.

Design goals:

  • Extremely low parameter overhead

  • No per-layer MLP explosion

  • Mixed-precision safe


5. SBA vs LoRA

Property LoRA SBA

Rank Fixed Fixed

Weights Static Dynamic

Conditioning None Timestep + Text

Expressivity Linear Spectral / Nonlinear

Parameter Efficiency High High

Temporal Adaptation

Prompt Sensitivity


6. Injection into Diffusion Transformers

SBA is injected by wrapping existing Linear layers without modifying the base model weights.

6.1 Injection Targets

In SDXL UNet:

  • Attention Q, K, V projections

  • Attention output projections

  • Transformer input/output projections

Optional:

  • FFN layers

  • ResNet time projections


6.2 Global Context Passing

SBA uses a lightweight global context:

  • t_emb from timestep embedding

  • c_emb from pooled text embedding

These are captured once per forward pass and reused across all SBA layers.

This avoids:

  • Recomputing embeddings per layer

  • Breaking Diffusers forward signatures


7. Training Characteristics

7.1 Parameter Count

Typical SDXL setup:

  • Rank = 4–8

  • Bases = 4

Results in:

  • 5–12M trainable parameters

  • 0 base UNet parameters updated


7.2 Memory Behavior

Key optimizations:

  • Frozen base weights

  • Linear gate instead of MLP

  • Gradient checkpointing

  • Mixed precision

Result:

  • Fits in ~10GB VRAM for SDXL


7.3 Optimization

Two learning rates:

  • Gate parameters (higher LR)

  • Projection + basis parameters (lower LR)

This stabilizes early training and prevents mode collapse.


8. Why SBA Works

SBA succeeds because it:

  • Decouples capacity from adaptation dynamics

  • Uses orthogonal bases to preserve information flow

  • Learns how to adapt, not just what to adapt

  • Matches the non-stationary nature of diffusion

In effect, SBA turns each Linear layer into a conditional operator, not a static weight matrix.


9. Extensions and Future Work

Potential directions:

  • Basis sparsity regularization

  • Training on an image dataset

  • Frequency-aware timestep gating

  • Cross-layer shared basis banks

  • Rank-adaptive SBA (research)

  • SBA for LLM attention blocks


10. Conclusion

In this work, I designed and implemented the Spectral Basis Adapter (SBA) as a principled, efficient, and expressive alternative to LoRA.

By introducing context-conditioned spectral transformations inside low-rank adapters, I demonstrated how SBA:

  • Increases expressiveness without full fine-tuning

  • Adapts dynamically across diffusion timesteps

  • Responds sensitively to prompt semantics

  • Preserves practical VRAM and training efficiency

SBA moves parameter-efficient fine-tuning away from static weight updates and toward dynamic, operator-level adaptation, narrowing the gap between lightweight adapters and full model retraining.


Author Statement

Author & Architect: YSNRFD

I conceived, designed, and implemented SBA, including its mathematical formulation, architectural design, memory optimizations, and practical integration into SDXL using Diffusers and PyTorch.


Keywords

Spectral Basis Adapter, SBA, LoRA, Diffusion Models, SDXL, Parameter-Efficient Fine-Tuning, Dynamic Adapters, Transformers


Files

Find all files related to SBA in Attachments section

0