Guide: Lora resizing.

TL;DR:

Loras don't have to take up 217mb, 307mb or 1.2gb each on your hard drive. With a simple resize script, you can usually get visually indistinguishable results (90-99% similarity) in just 18-100mb.

This is a reminder that lora resizing exists and can be automated (batch script below).

Resizing is a technique that takes a trained lora, which has the same rank on every layer, and - using certain linear algebra tricks - creates a new lora that mimics the original's behaviour on each layer (like a knock-off version of the lora) at a smaller, per-layer rank. Many layers don't actually need as high a rank as the lora was trained at; a lot of those dimensions are just sitting around pretending to look busy. Cut them out and you can usually get a lora a half to a third of the size without sacrificing the important bits it had learned.
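The "linear algebra trick" is essentially a truncated SVD of each layer's weight delta. Here's a minimal numpy sketch - variable names and shapes are mine for illustration; the real script works on torch tensors loaded from a .safetensors file:

```python
import numpy as np

def resize_lora_layer(down, up, new_rank):
    """Approximate a lora layer's weight delta (up @ down) at a smaller rank.

    down: (rank, in_features), up: (out_features, rank) -- the standard
    lora factorisation. Illustrative only, not the actual script's code.
    """
    delta = up @ down                               # full weight delta
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the new_rank strongest directions, splitting the singular
    # values evenly between the two factors.
    sqrt_s = np.sqrt(S[:new_rank])
    new_up = U[:, :new_rank] * sqrt_s               # (out_features, new_rank)
    new_down = sqrt_s[:, None] * Vt[:new_rank]      # (new_rank, in_features)
    # Fraction of the Frobenius norm kept by the truncation.
    retention = np.sqrt((S[:new_rank] ** 2).sum() / (S ** 2).sum())
    return new_down, new_up, retention
```

SVD truncation is the best low-rank approximation in the Frobenius norm (Eckart-Young), so the discarded singular values tell you exactly how much of the norm you lose - which is presumably where the reported retention figure comes from.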

It works with any base model variant (1.5, XL, hunyuan, etc.) as long as the lora follows a typical structure. Virtually all of civ's loras do. Here's a link to a batch script:

https://civitai.com/models/153182/resize-loras-batch-script

Due to the emergence of additional trainers, the resize script requires a small adjustment to support the various formats. The adjustment is in this PR:

https://github.com/kohya-ss/sd-scripts/pull/2057
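Under the hood, the batch script wraps something like the following call to sd-scripts' resize tool. This is a dry-run sketch - flag names are from networks/resize_lora.py as I remember them, so check `--help` on your copy, and the file paths are placeholders:

```shell
LORA_IN="my_lora.safetensors"            # hypothetical input path
LORA_OUT="my_lora_resized.safetensors"   # hypothetical output path
CMD="python networks/resize_lora.py \
  --model $LORA_IN \
  --save_to $LORA_OUT \
  --save_precision fp16 \
  --device cuda \
  --new_rank 32 \
  --dynamic_method sv_fro \
  --dynamic_param 0.95 \
  --verbose"
echo "$CMD"    # swap the echo for eval \"$CMD\" to actually run it
```

With a dynamic method like sv_fro, new_rank acts as a ceiling while each layer picks its own rank to hit the 0.95 Frobenius target - that per-layer flexibility is where most of the savings come from.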

How well a lora can be compressed, and how long it takes, varies. The closer the reported "Frobenius norm retention" value is to 100%, the more similar the resized version will be to the original (barring small details). Loras extracted from checkpoints are not compressible without severe loss of detail, because extraction is essentially a resize already; you'll usually see ~70% retention and almost no size reduction no matter the setting, but luckily there are few of those on civ. Non-lora models (loha) may be uncompressible, since the resizing script doesn't support their structure.

Typically, the numbers I've seen are:

1.5: <50mb with 95% retention. I don't use any setting below high. Takes about a minute to compress.

XL: ~120mb / 95% retention for high; 50-100mb / 92% for med; 20-50mb / 87-92% for low (I discard models that fall below this threshold). Takes about 3-5 minutes.

Flux: 30-80mb / 95-97% retention for med; ~20-50mb / ~92% for low. It's very slow and seems to be the most taxing of the lot, 20-25 minutes. I barely use high.

Hunyuan: The current generation is very poorly optimised; every single lora I've compressed on high came out <50mb at 98-99% retention and confirmed visually identical. Takes about 10 minutes. (Food for thought: if specific layers aren't getting trained properly across many loras, we might get faster and better training by simply omitting them entirely or using other layers. I'm seeing way too many rank 2/3 mlp.fc1 / linear1 layers which are 6-7x the size of other layers, for example - that's a whole lot of dead weight.)

(03/25) Wan: Very difficult to compress - it takes 4.5gb of vram and an hour by itself (40 attention blocks with at least 5k weights per layer!). The weights are significantly denser than hunyuan's (probably due to having less of a basis to rely on), reaching ~96% retention / 80-150mb after compression. I've only run a few tests, but overall resizing this model is not recommended. (Unless you have a spare gpu.)

(12/25) Z image turbo: Comparable with flux, slightly slower, and people aren't really experimenting with rank yet. 30-50mb / 95-98% retention for either med or high, curiously. Takes 25-30 minutes.

Compressing loras has a fairly small vram footprint - I've been running these concurrently with heavy gens, via a small monitor script, without any issues for a year or two.
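The monitor idea is simple: check free vram before kicking off the next resize job. A sketch using nvidia-smi's standard query flags - the 6gb headroom threshold is just a placeholder, and my own script differs:

```python
import subprocess

def parse_free_mb(smi_output):
    """Parse nvidia-smi query output: one plain MiB number per gpu line."""
    return [int(line.strip()) for line in smi_output.strip().splitlines()]

def query_free_mb():
    """Ask the driver for free memory per gpu (needs an nvidia gpu)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_mb(out)

def enough_headroom(free_mb, needed_mb=6000):
    """True if every gpu has at least needed_mb MiB free (placeholder value)."""
    return all(f >= needed_mb for f in free_mb)
```

Wrap `enough_headroom(query_free_mb())` in a sleep loop and only launch the resize when it returns True, and the resizes will politely stay out of your gens' way.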

Edit: Added wan.

Edit: Added zit.
