HelloWorld Stable Cascade Early Beta

Creator: LEOSAM
Type: Checkpoint Trained
Base Model: Stable Cascade
Format: SafeTensor
Published: Feb 14, 2024
Updated: Mar 7, 2024
Hash (AutoV2): 64EAEDB103

License: This Stability AI Model is licensed under the Stability AI Non-Commercial Research Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.

This model is an early beta of HelloWorld SC. Using the official training scripts provided by Stability AI, I fine-tuned the Stable Cascade stage_c_lite model (the 1B version).

Please note that the current v0.1 is the earliest test release; its main purpose was to familiarize myself with the new training process. Image generation tests show that the model's overall performance is not yet stable: close-up scenes come out well, but full-scene compositions, such as full-body character shots, show noticeable degradation in image quality.

Here are some core details from my training that may be useful to other model authors interested in Stable Cascade:
The v0.1 version used a total of 740 realistic training images, covering themes such as portraits, science fiction, and Pallas's cats. All images were tagged with our open-source GPT4V tagger. Training ran on a single RTX 6000 Ada (48 GB VRAM) and took 3.5 hours in total.
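For readers unfamiliar with vision-model tagging, the workflow amounts to sending each training image to the model with a captioning prompt and saving the reply as a sidecar caption file. The snippet below is only a generic illustration of that loop using the OpenAI API, not the actual code of our tagger; the prompt, paths, and model name are placeholder assumptions.

import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_image(path: Path) -> str:
    # Send the image inline as base64 together with a captioning prompt.
    b64 = base64.b64encode(path.read_bytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder; use whichever vision model you have access to
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image as a concise training caption."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=150,
    )
    return resp.choices[0].message.content

# Write one .txt caption next to each training image.
for img in Path("train_images").glob("*.jpg"):
    img.with_suffix(".txt").write_text(caption_image(img))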

The config file parameters are as follows:

lr: 1.5e-6                 # learning rate
batch_size: 6              # images per optimizer step
image_size: 1024           # base training resolution
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16, 9/21]  # aspect-ratio buckets
grad_accum_steps: 1        # no gradient accumulation
updates: 12500             # total optimizer steps
backup_every: 2500         # full backup interval (steps)
save_every: 500            # checkpoint interval (steps)
warmup_updates: 1          # effectively no LR warmup
use_fsdp: False            # single-GPU run, FSDP disabled
adaptive_loss_weight: True # adaptive loss weighting from the official trainer
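As a sanity check on these numbers (my own arithmetic, not from the training logs), the run works out to roughly a hundred passes over the 740-image set:

# Back-of-the-envelope pass count implied by the config above.
updates = 12_500        # optimizer steps
batch_size = 6          # images per step
grad_accum_steps = 1
dataset_size = 740      # realistic training images

images_seen = updates * batch_size * grad_accum_steps  # 75,000
epochs = images_seen / dataset_size                    # ~101 passes over the data
print(f"{images_seen} images seen, about {epochs:.0f} epochs")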

The above parameters occupy about 45 GB of VRAM during training. The official training scripts appear to target high-VRAM cards like the A100 and include little VRAM optimization, so I suggest that model authors with 24 GB of VRAM or less wait for kohya-ss's updates. With the official scripts, fine-tuning stage_c_lite.safetensors still requires 30 GB of VRAM even with the batch size set to 1.
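If you do want to attempt a run on less VRAM before better tooling lands, one standard trick is gradient accumulation: use micro-batches of 1 and step the optimizer every 6 batches, so the effective batch size still matches the config above. A minimal PyTorch sketch of the pattern (generic, not the official trainer; the loss here is a stand-in for the actual diffusion objective):

import torch
import torch.nn.functional as F

def train_with_accumulation(model, loader, optimizer, accum_steps=6):
    # Micro-batches of 1 image; gradients accumulate across accum_steps
    # batches, so peak activation memory is that of a single image while
    # the effective batch size stays at 6.
    model.train()
    optimizer.zero_grad()
    for i, (x, target) in enumerate(loader):
        loss = F.mse_loss(model(x), target)  # stand-in for the diffusion loss
        (loss / accum_steps).backward()      # scale so accumulated grads average, not sum
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()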

(Update: a community member has optimized the official script's GPU memory usage. Fine-tuning the 1B stage C model now requires only about 10 GB of GPU memory, which puts it within reach of consumer cards.)
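The update doesn't say which optimizations were applied, so take the following as informed guesswork: the usual levers for savings of that size are mixed precision and activation (gradient) checkpointing. A toy PyTorch sketch of both, under those assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in network; the real 1B stage C model is far larger.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-6)

x = torch.randn(4, 1024, device="cuda", requires_grad=True)
target = torch.randn(4, 1024, device="cuda")

# bf16 autocast roughly halves activation memory on Ampere/Ada GPUs;
# checkpoint_sequential keeps only segment-boundary activations and
# recomputes the rest during backward, trading compute for memory.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = checkpoint_sequential(model, 4, x)
    loss = F.mse_loss(out, target)
loss.backward()
optimizer.step()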

Future plans:
Once kohya-ss's trainer is updated, I plan to use the complete HelloWorld 6.0 training set to fine-tune SDXL as well as the 3.6B and 1B versions of Stable Cascade.

I have high hopes for Stable Cascade: I hope this generation can fix some of the shortcomings SDXL has shown in widespread use and attract more players from SD1.5 into the new generation of the SD model ecosystem.

Special thanks:

I would like to thank Fok, creator of the Ronghua model, for his immense help in getting the SC training script running. He is also optimizing and testing his own SC model, and I look forward to hearing good news.

