Chroma is a fantastic and highly versatile model capable of producing photo-like results, but it can require careful prompting. This finetune aims to improve reliability in realistic/photo-based styles while preserving Chroma's broad concept knowledge (subjects, objects, scenes, etc.). Chroma can probably do anything this model can, but UnCanny aims to be more forgiving to prompt.
Personally, I'd recommend downloading the non-flash model; then you can experiment with steps, CFG, and flash-LoRA ranks to suit your needs. The flash version has a rank-128 LoRA baked in. Some example images were made using a flash or low-step LoRA - see settings below. GGUFs are available on HuggingFace.
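If you prefer to script the download, something like the sketch below works with the huggingface_hub library. The repo id and filename are placeholders, not the real ones - substitute whatever is listed on the HuggingFace page:

```python
# Minimal sketch for fetching a GGUF file with huggingface_hub.
# "your-name/UnCanny-GGUF" and the filename are HYPOTHETICAL placeholders;
# use the actual repo id and file shown on the HuggingFace page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-name/UnCanny-GGUF",  # placeholder repo id
    filename="uncanny-q8_0.gguf",      # placeholder filename
    local_dir="ComfyUI/models/unet",   # a typical ComfyUI-GGUF location
)
print(f"Downloaded to {path}")
```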
Example Generation Notes
Prompting: For photos, simply describing what you want to see in natural sentences seems to work well. Tags seem to push the model towards art/anime; natural language pushes it towards photos. For photos, use terms like photo or photography, but avoid photorealistic - that term seems better suited to photorealistic art.
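To make the tags-vs-sentences distinction concrete, here is an illustrative prompt in the photo-leaning style described above (my own wording, not taken from the example images):

```python
# Illustrative only - a natural-sentence prompt aimed at photo output.
photo_prompt = (
    "A photo of an elderly fisherman mending a net on a weathered pier at dawn, "
    "overcast sky, soft diffuse light, shallow depth of field."
)

# Tag-style prompting ("1girl, masterpiece, best quality, ...") tends to
# pull the model towards art/anime instead.
```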
Example settings (not necessarily optimal; summarized as plain data in the sketch after this list):
Workflow: Chroma template workflow in ComfyUI
Steps (flash LoRA): 15 works well with the rank-128 LoRA; depends on the flash-LoRA rank.
Steps (base): ~30-35
CFG (flash LoRA): 1 works well with the rank-128 LoRA; depends on the flash-LoRA rank.
CFG (base): ~3.5
Sampler: res_2m
Scheduler: bong_tangent
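For quick reference, the two profiles above written down as plain data. This is just a restatement of the list, not a workflow file; note that res_2m and bong_tangent come from a custom sampler pack (RES4LYF, if I recall correctly) rather than stock ComfyUI:

```python
# The two example profiles above, as plain data for reference only.
# (Not a ComfyUI workflow - just the knobs you'd set on the sampler node.)
PROFILES = {
    "flash_lora_rank128": {"steps": 15, "cfg": 1.0},
    "base": {"steps": 32, "cfg": 3.5},  # steps anywhere in ~30-35
}
SAMPLER = "res_2m"          # from a custom sampler pack, not stock ComfyUI
SCHEDULER = "bong_tangent"  # likewise
```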
Support
Have too much money? Want to support further training?
https://ko-fi.com/dawncreates
Training Details
The model was trained locally on a medium-sized collection of openly licensed images and my own photos, using Chroma-HD as the base. Each epoch included images at 3-5 different resolutions, though only a subset of the dataset was used per epoch. Apart from the extra resolutions, OneTrainer's default config for 24 GB Chroma finetuning was used. The dataset consists almost exclusively of SFW images of people and landscapes, so to retain Chroma-HD's original conceptual understanding, selected layers were merged back towards the base model at various ratios. All the juice and concepts come from Chroma itself; my model just nudges it towards realism. So get to work, Chroma finetuners - it has so much potential!
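The selective merge-back is conceptually simple: for chosen layers, linearly interpolate the finetuned weights back towards the base model's. A minimal sketch of that idea, assuming safetensors checkpoints; the key prefixes and ratios below are made up for illustration and are not the ones actually used:

```python
# Minimal sketch of merging selected layers back towards the base model.
# Key prefixes and ratios are ILLUSTRATIVE, not the ones actually used.
import torch
from safetensors.torch import load_file, save_file

base = load_file("chroma-hd.safetensors")
tuned = load_file("uncanny.safetensors")

# ratio = how much of the BASE model to restore for keys with this prefix
MERGE_RATIOS = {
    "double_blocks.0.": 0.5,  # hypothetical layer prefix
    "txt_in.": 1.0,           # fully restore this layer (hypothetical)
}

merged = {}
for key, w_tuned in tuned.items():
    ratio = next((r for p, r in MERGE_RATIOS.items() if key.startswith(p)), 0.0)
    w_base = base.get(key)
    if w_base is not None and ratio > 0.0:
        merged[key] = torch.lerp(w_tuned, w_base, ratio)  # lerp towards base
    else:
        merged[key] = w_tuned

save_file(merged, "uncanny-merged.safetensors")
```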
I aim to continue finetuning and experimenting, but the current version has some juice.
All images were captioned using JoyCaption: https://github.com/fpgaminer/joycaption
The model was trained using OneTrainer: https://github.com/Nerogar/OneTrainer
NOTE: The original v1 had some bugged layer names - this is now fixed (as of the evening of October 31st). Having the wrong version shouldn't affect generation in ComfyUI, but it might affect things like training and quantization.
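If you grabbed an early download and want to check which version you have, inspecting the state-dict keys is enough. A sketch, assuming a safetensors file - the specific bugged key names aren't documented here, so this just dumps the keys for manual comparison against the fixed file:

```python
# Sketch: list a checkpoint's layer names to compare against a known-good copy.
from safetensors import safe_open

with safe_open("uncanny.safetensors", framework="pt") as f:
    for key in sorted(f.keys()):
        print(key)
```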
