Guide: Lora resizing.

TL;DR:

Loras don't have to take up 217mb, 307mb or 1.2gb each on your hard drive. With a simple resize script, you can usually get visually indistinguishable results (90-99% similarity) in just 18-100mb.

This is a reminder that lora resizing exists and can be automated (batch script below).

Resizing is a technique that takes a trained lora, which has the same rank on every layer, and - using certain linear algebra tricks - creates a new lora that mimics the original's behaviour on each layer (like a knock-off version of the lora) at a smaller, per-layer rank. Many layers don't actually need as high a rank as the lora was trained at; a lot of those dimensions are just sitting around pretending to look busy. Cut them out and you can usually get a lora a half to a third of the size without sacrificing the important bits it had learned.
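The "linear algebra trick" is essentially a truncated SVD of each layer's weight delta. Here's a minimal numpy sketch - variable names and shapes are mine for illustration; the real script works on torch tensors loaded from a .safetensors file:

```python
import numpy as np

def resize_lora_layer(down, up, new_rank):
    """Approximate a lora layer's weight delta (up @ down) at a smaller rank.

    down: (rank, in_features), up: (out_features, rank) -- the standard
    lora factorisation. Illustrative only, not the actual script's code.
    """
    delta = up @ down                               # full weight delta
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the new_rank strongest directions, splitting the singular
    # values evenly between the two factors.
    sqrt_s = np.sqrt(S[:new_rank])
    new_up = U[:, :new_rank] * sqrt_s               # (out_features, new_rank)
    new_down = sqrt_s[:, None] * Vt[:new_rank]      # (new_rank, in_features)
    # Fraction of the Frobenius norm kept by the truncation.
    retention = np.sqrt((S[:new_rank] ** 2).sum() / (S ** 2).sum())
    return new_down, new_up, retention
```

SVD truncation is the best low-rank approximation in the Frobenius norm (Eckart-Young), so the discarded singular values tell you exactly how much of the norm you lose - which is presumably where the reported retention figure comes from.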

It works with any base model variant (1.5, XL, hunyuan, etc.) as long as the lora follows a typical structure. Virtually all of civ's loras do. Here's a link to a batch script:

https://civitai.com/models/153182/resize-loras-batch-script

Due to the emergence of additional trainers, the resize script requires a small adjustment to support the various formats. The adjustment is in this PR:

https://github.com/kohya-ss/sd-scripts/pull/2057
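Under the hood, the batch script wraps something like the following call to sd-scripts' resize tool. This is a dry-run sketch - flag names are from networks/resize_lora.py as I remember them, so check `--help` on your copy, and the file paths are placeholders:

```shell
LORA_IN="my_lora.safetensors"            # hypothetical input path
LORA_OUT="my_lora_resized.safetensors"   # hypothetical output path
CMD="python networks/resize_lora.py \
  --model $LORA_IN \
  --save_to $LORA_OUT \
  --save_precision fp16 \
  --device cuda \
  --new_rank 32 \
  --dynamic_method sv_fro \
  --dynamic_param 0.95 \
  --verbose"
echo "$CMD"    # swap the echo for eval \"$CMD\" to actually run it
```

With a dynamic method like sv_fro, new_rank acts as a ceiling while each layer picks its own rank to hit the 0.95 Frobenius target - that per-layer flexibility is where most of the savings come from.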

How well a lora can be compressed, and how long it takes, varies. The closer the reported "Frobenius norm retention" value is to 100%, the more similar the resized version will be to the original (barring small details). Loras extracted from checkpoints are not compressible without severe loss of detail, because extraction is essentially a resize already; you'll usually see ~70% retention and almost no size reduction no matter the setting, but luckily there are few of those on civ. Non-lora models (loha) may be uncompressible, since the resizing script doesn't support their structure.

Typically, the numbers I've seen are:

1.5: <50mb with 95% retention. I don't use any setting below high. Takes about a minute to compress.

XL: ~120mb / 95% retention for high; 50-100mb / 92% for med; 20-50mb / 87-92% for low (I discard models that fall below this threshold). Takes about 3-5 minutes.

Flux: 30-80mb / 95-97% retention for med; ~20-50mb / ~92% for low. It's very slow and seems to be the most taxing of the lot, 20-25 minutes. I barely use high.

Hunyuan: The current generation is very poorly optimised; every single lora I've compressed on high came out <50mb at 98-99% retention and confirmed visually identical. Takes about 10 minutes. (Food for thought: if specific layers aren't getting trained properly across many loras, we might get faster and better training by simply omitting them entirely or using other layers. I'm seeing way too many rank 2/3 mlp.fc1 / linear1 layers which are 6-7x the size of other layers, for example - that's a whole lot of dead weight.)

(03/25) Wan: Very difficult to compress - it takes 4.5gb of vram and an hour by itself (40 attention blocks with at least 5k weights per layer!). The weights are significantly denser than hunyuan's (probably due to having less of a basis to rely on), reaching ~96% retention / 80-150mb after compression. I've only run a few tests, but overall resizing this model is not recommended. (Unless you have a spare gpu.)

(12/25) Z image turbo: Comparable with flux, slightly slower, and people aren't really experimenting with rank yet. 30-50mb / 95-98% retention for either med or high, curiously. Takes 25-30 minutes.

Compressing loras has a fairly small vram footprint - I've been running these concurrently with heavy gens, via a small monitor script, without any issues for a year or two.
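The monitor idea is simple: check free vram before kicking off the next resize job. A sketch using nvidia-smi's standard query flags - the 6gb headroom threshold is just a placeholder, and my own script differs:

```python
import subprocess

def parse_free_mb(smi_output):
    """Parse nvidia-smi query output: one plain MiB number per gpu line."""
    return [int(line.strip()) for line in smi_output.strip().splitlines()]

def query_free_mb():
    """Ask the driver for free memory per gpu (needs an nvidia gpu)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_mb(out)

def enough_headroom(free_mb, needed_mb=6000):
    """True if every gpu has at least needed_mb MiB free (placeholder value)."""
    return all(f >= needed_mb for f in free_mb)
```

Wrap `enough_headroom(query_free_mb())` in a sleep loop and only launch the resize when it returns True, and the resizes will politely stay out of your gens' way.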

Edit: Added wan.

Edit: Added zit.
