
Yolkhead's Albums

Type: Checkpoint Trained
Reviews: 72
Published: Jan 17, 2025
Base Model: SDXL 1.0
Training: 1,470 steps, 10 epochs
Usage Tips: Clip Skip 1
Hash (AutoV2): 67D81520C6
sirrece

Free resources for AI and ML: patreon.com/yolkhead

Everything on the Patreon is 100% free thanks to some awesome member support, so if you find any of this helpful, they def had a big part in that.



All generated images should still include the metadata, so just drag them into Forge's image inspector. There is never any inpainting, and everything should be directly reproducible. I often don't even use hires.fix for high-resolution generations; instead I just push CFG really high and generate at the target resolution natively, as a form of bragging that no one really notices but that I thoroughly enjoy.
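If you'd rather pull the settings out programmatically than use Forge's inspector, here is a minimal sketch that reads the embedded parameters from a PNG with Pillow. It assumes the image was saved by Forge/A1111, which stores everything under a "parameters" text chunk, and the file name is just a placeholder.

# Read the generation parameters embedded in a PNG's text chunks
# (the same metadata Forge's image inspector displays).
from PIL import Image

img = Image.open("example_output.png")       # placeholder file name
params = img.info.get("parameters")          # prompt, negative prompt, CFG, seed, sampler, ...
print(params)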



VERSIONS:

CRUSH (sdxl)

Presave the album here: https://artists.landr.com/055905712428

generation_guide--------------------

Works best with sticky negative prompt method: https://www.patreon.com/posts/sticky-negatives-119624467

I recommend starting at CFG 10-20, DPM 2a or DPM 3M SDE, the UNIFORM scheduler, and a step count of 15-60 (15 really is sufficient in many cases; sometimes you can go as low as 11). Note that this does not contain any hyper module or anything like that; it's just that high CFG without artifacting broadly means accurate embeddings, which in turn means lower step counts and/or higher resolution are possible.
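For anyone generating through diffusers instead of Forge, here is a rough sketch of those settings. The checkpoint path and prompt are placeholders, and the scheduler mapping is my own assumption: DPMSolverMultistepScheduler with sde-dpmsolver++ and order 3 is the closest thing I know of to "DPM 3M SDE" on a uniform sigma schedule (KDPM2AncestralDiscreteScheduler would be the DPM 2a analogue).

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

# Load the checkpoint from a local .safetensors file (placeholder path).
pipe = StableDiffusionXLPipeline.from_single_file(
    "crush_sdxl.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# Approximate "DPM 3M SDE" + UNIFORM: 3rd-order SDE solver, no Karras sigmas.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    solver_order=3,
    use_karras_sigmas=False,
)

image = pipe(
    prompt="a photograph of a rain-soaked city street at dusk",   # placeholder prompt
    negative_prompt="lowres, blurry",
    guidance_scale=14.0,          # inside the suggested CFG 10-20 range
    num_inference_steps=20,       # 15-60 per the guide; 15 is often enough
).images[0]
image.save("crush_test.png")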

I haven't tested many LoRAs with it, but from the ones I have tested, LoRAs need around 1/10th of their normal weight.
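In diffusers terms (assuming a recent version with the PEFT backend, and reusing the pipe from the sketch above), that would look roughly like this; the LoRA file name and exact weight are placeholders:

# Load a LoRA at roughly 1/10th of the weight you would normally use.
pipe.load_lora_weights("some_style_lora.safetensors", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.08])   # vs. the ~0.8 you might use on other checkpoints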

/end_generation_guide---------------

This one is interesting in that it is the most generalized I've used to date. It is a simple merge, using a three-tier process to build up, ultimately, to an 8-model merge with 50%/50% weights at each stage of the process.
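The structure is easier to see in code. This is only an illustrative sketch of that 8 -> 4 -> 2 -> 1 pairwise averaging, with placeholder file names; a naive state-dict average like this glosses over details (mismatched keys, VAE/text-encoder handling) that a real merge tool takes care of.

from safetensors.torch import load_file, save_file

def merge_pair(sd_a, sd_b, alpha=0.5):
    # alpha*A + (1-alpha)*B for every tensor the two checkpoints share
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a if k in sd_b}

paths = [f"model_{i}.safetensors" for i in range(8)]   # the 8 blind-tested checkpoints
tier = [load_file(p) for p in paths]

while len(tier) > 1:                                   # three tiers: 8 -> 4 -> 2 -> 1
    tier = [merge_pair(tier[i], tier[i + 1]) for i in range(0, len(tier), 2)]

save_file(tier[0], "crush_merged.safetensors")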


No, I do not know what is in it. That was part of the whole thing: I did blind image testing to determine which models I would be putting into it, and I altered the names accordingly to ensure my own preconceptions wouldn't bias me in one direction or another.

Essentially, the weights in this model are going to be much closer to the "average" of user preference, since the pool of models I tested are those highly preferred by users, and of those, I culled based on further individual preference optimization. As such, the model's weights are not particularly strong, and a high CFG is normal to get the signal strength you want.

This means if you train on top of it, you need to set your learning rate MUCH lower. I will get into why in a video sometime, but trust me, you'll see a much better loss curve at learning rates several orders of magnitude lower than normal. This is related to the preservation of the original signals within the model's weights via relativism, which due to the mean reversion are very "delicate," but that's for another time when I can get to it on the Patreon.
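Purely as an illustration of the scale involved (the exact numbers are my assumption, not tested settings): if a typical SDXL fine-tune sits somewhere around 1e-5, "several orders of magnitude lower" would land you in roughly this territory.

import torch
from diffusers import UNet2DConditionModel

# Load the merged checkpoint's UNet from a diffusers-format folder (placeholder path).
unet = UNet2DConditionModel.from_pretrained("crush_sdxl_diffusers", subfolder="unet")

# ~1e-8 instead of the ~1e-5 you might use for an ordinary SDXL fine-tune.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-8)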

I recommend DPM 2a, DPM 3M SDE, or DDPM for this one. DPM 2a in particular benefits at high CFG, since the extra added noise is easily taken care of if you have a strong and accurate vector, and its addition basically adds useful material for correcting mistakes while also helping avoid overfitting.

PINK CONCRETE (flux)

Listen to the album here: https://open.spotify.com/album/6mb2KnxcVOIKZBzEiq2Mdg?si=EIlFSDTfSfaFJglMPttk4g

Music video for pink concrete: https://www.instagram.com/reel/DD4Ah0LObCe

I highly recommend using zer0int's finetunes of CLIP-L in conjunction with this, and really any, Flux finetune, as the performance uplift is frankly spectacular.

They can be downloaded here: https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors

And here: https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors

There is a difference between the two of them, although which is better seems to be uncertain: it's worth keeping both, as you may find yourself having trouble with a particular prompt and then find that switching which CLIP you are using suddenly fixes the issue.
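If you run Flux through diffusers rather than ComfyUI/Forge (where you would simply point the CLIP-L loader at the downloaded file), a swap could look roughly like the sketch below. The base repo, dtype, and the assumption that the "TE-only-HF-format" file loads cleanly into transformers' CLIPTextModel are mine, so treat it as a starting point.

import torch
from safetensors.torch import load_file
from transformers import CLIPTextModel
from diffusers import FluxPipeline

# Start from the stock CLIP-L used by Flux, then overwrite its weights.
text_encoder = CLIPTextModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16
)
state_dict = load_file("ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors")
missing, unexpected = text_encoder.load_state_dict(state_dict, strict=False)
print("missing:", len(missing), "unexpected:", len(unexpected))   # sanity-check the key match

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # or the PINK CONCRETE checkpoint itself
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")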

This one was built off of a process I've used in the past in SDXL fine-tuning, albeit more sophisticated here in that I needed to produce much higher-quality images for my dataset in order to avoid damaging the model's UNet in unintended ways. In general, the higher quality the model, the more care the training dataset requires, since any "decrease" in quality can subjectively harm aspects of its original composition.

This is an overall uplift. It doesn't do NSFW the way some of the Flux finetunes do, but to be fair, no Flux finetune at the moment can touch SDXL on that front, so it's a moot point. My primary concern with this model was to undo a lot of the safety training on base Flux to improve UNet quality and overall adherence as a starting point for future finetuning (and it seems to have worked better than anticipated).