<p>I've been doing a few tests with Flux training using <a target="_blank" rel="ugc" href="https://civitai.com/models/train">CivitAI's onsite training tool</a>. (<a target="_blank" rel="ugc" href="https://education.civitai.com/using-civitai-the-on-site-lora-trainer/">Documentation</a>)</p><p>I wanted to share the results of a set of experiments with captions for a World Morph style model I created.</p><p></p><h3 id="part-2-of-this-article-zxds7kksx"><a target="_blank" rel="ugc" href="https://civitai.com/articles/7146">Part 2 of this article</a></h3><h3 id="similar-article-but-with-a-character-f9vib9n8b"><a target="_blank" rel="ugc" href="https://civitai.com/articles/6868">Similar article but with a character</a></h3><p></p><h1 id="wooly-style-flux-lora-3sslkttmp"><a target="_blank" rel="ugc" href="https://civitai.com/models/664199">Wooly Style Flux LoRA</a></h1><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/fe3f36fd-2e16-4b87-ac5d-b79988ad2f35/width=525/fe3f36fd-2e16-4b87-ac5d-b79988ad2f35.jpeg" />I've uploaded each variant as a separate version of the model, each with its own images on the side. In this article, I will share a bit more of the process behind each version and show some comparison pictures.</p><hr /><h1 id="training-settings-erjxaafpy">Training Settings</h1><p>I went with the recommended training settings from the <a target="_blank" rel="ugc" href="https://education.civitai.com/using-civitai-the-on-site-lora-trainer/">Documentation.</a></p><p>Specifically, I adjusted the repeats to reach ~1000 steps of training.</p><p>I did, however, also go for a resolution of 1024. This seems to have worked fine for me, but so has 512 in my earlier trainings.</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/6928696e-97b0-4371-8899-e9419f91f324/width=525/6928696e-97b0-4371-8899-e9419f91f324.jpeg" /></p><pre><code>{
  "engine": "kohya",
  "unetLR": 0.0005,
  "clipSkip": 1,
  "loraType": "lora",
  "keepTokens": 0,
  "networkDim": 2,
  "numRepeats": 6,
  "resolution": 1024,
  "lrScheduler": "cosine_with_restarts",
  "minSnrGamma": 5,
  "noiseOffset": 0.1,
  "targetSteps": 1088,
  "enableBucket": true,
  "networkAlpha": 16,
  "optimizerType": "AdamW8Bit",
  "textEncoderLR": 0,
  "maxTrainEpochs": 5,
  "shuffleCaption": false,
  "trainBatchSize": 4,
  "flipAugmentation": true,
  "lrSchedulerNumCycles": 3
}</code></pre><p></p><hr /><h1 id="version-1-no-captions-6r1wymele"><a target="_blank" rel="ugc" href="https://civitai.com/models/664199?modelVersionId=743307">Version 1 - No Captions</a></h1><p><strong>Does not use any trigger word in the captions.</strong></p><p>This version is trained exactly as it sounds: on uploaded images without captions. The CivitAI training tool warns you when you do this, since it may not work well for all models.</p><p>I think that this method of training works best for styles, like art styles and World Morphs. Think of models that you usually want to apply to the entire image rather than to just a specific part of it.</p><p>Since there are no captions, there are no specific trigger words for the model. Instead, I describe what I want using natural language. So I went with <code>Made out of wool</code> as my "trigger word" for it. And to be clear, this is not a trigger word as in something I trained it on, but just what I prompted it with to get the effect out of the model.</p><p>This definitely brings out what the model learned, even though it never saw that combination of tokens during training.</p><p>Here are some comparisons without and with the model: without on the left, with on the right.</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/db4b5255-e146-42bd-aec8-add94922e96c/width=525/db4b5255-e146-42bd-aec8-add94922e96c.jpeg" />We can see that for some images, we get a natural "felt" look without the LoRA, but with it, we certainly find our training data. Some things that are not natural to "woolify" may not get the treatment at all. 
Many of the comparison images show no effect without the LoRA.</p><p>Note the "<strong>woolkswagen</strong>" :D</p><p></p><hr /><h1 id="version-2-single-word-captions-9n98j06tc"><a target="_blank" rel="ugc" href="https://civitai.com/models/664199?modelVersionId=743334">Version 2 - Single Word Captions</a></h1><p><strong>Uses trigger word "w00lyw0rld".</strong></p><p>This version of the model is trained using my normal World Morph style. Each caption is the trigger word followed by a single simple word, essentially the subject or concept of that image. For more information about this style of training, <a target="_blank" rel="ugc" href="https://civitai.com/articles/3326/lora-training-dataset-creation-bing">read this guide</a>.</p><p>I think this worked well, as it usually does. It activates fine with the trigger word.</p><p>I can see a slight degradation in understanding and effect when the prompt is very long and complex.</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/c0e80a0c-23ec-4dbf-9a4f-c532847908eb/width=525/c0e80a0c-23ec-4dbf-9a4f-c532847908eb.jpeg" />The split is clearer here than when training without a trigger word: we get either no effect or a strong effect.</p><p>This method of captioning is of course useful because it lets you isolate the effect and make the model apply your LoRA to the right parts of the image. For example, do you want the car or the road to be made out of wool?</p><p></p><hr /><h1 id="version-3-wd14-captions-lu4o70iy1"><a target="_blank" rel="ugc" href="https://civitai.com/models/664199?modelVersionId=743347">Version 3 - WD14 Captions</a></h1><p><strong>Uses trigger word "w00lyw0rld".</strong></p><p>This version of the model was trained using a trigger word and WD14 captions. These captions can be generated by the CivitAI training tool. When you are at the step of uploading images, you can generate captions in this style there. 
You can also do it using Kohya and other trainers.</p><p>I have also released a tool to help you do it quickly: just place the images you wish to caption in a folder and run a script. Check out <a target="_blank" rel="ugc" href="https://github.com/MNeMoNiCuZ/joytag-batch">JoyTag-Batch Github here</a>.</p><p>I found that the effect of using this style of captioning was very strong. In many cases when I compared all 4 trained models, I preferred the WD14-captioned one. It seems to convert more of the whole image into the desired style more often.</p><p>Of course you have to consider that each training brings with it a lot of randomness, so maybe this was just a lucky epoch.</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/447aeb2b-63aa-4ae3-ac88-d9bf60ab7862/width=525/447aeb2b-63aa-4ae3-ac88-d9bf60ab7862.jpeg" />As with the single word caption, this one has a trigger word, so unsurprisingly there's no effect without it, which isolates the effect to when we want it. Overall the results are very impressive.</p><p></p><hr /><h1 id="version-4-joycaption-captions-(complex-captions)-fvva41ewr"><a target="_blank" rel="ugc" href="https://civitai.com/models/664199?modelVersionId=743615">Version 4 - JoyCaption Captions (Complex Captions)</a></h1><p><strong>Does not use any trigger word in the captions.</strong></p><p>This version uses long and complex captions to describe the training images in very high detail. The captions are generated using the <a target="_blank" rel="ugc" href="https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha">JoyCaption tool</a>. I also created a script that runs this on all images in an /input/ folder, for quick and easy tagging without needing to open Kohya or anything else. 
Here's the <a target="_blank" rel="ugc" href="https://github.com/MNeMoNiCuZ/joy-caption-batch/">Joy-Caption-Batch Github page</a>.</p><p>Captions are very long (perhaps overly so), and I trained on them unedited, without adding any trigger word. Here's a caption example:</p><pre><code>The image is a photograph of a whimsical, hand-knitted toy airplane against a backdrop of a clear blue sky dotted with fluffy white clouds. The toy airplane, crafted in a chunky knit style, features a predominantly cream-colored body with red accents. The nose of the plane is red, while the tail is yellow with a red tip. The wings and fuselage are adorned with red stripes, and the windows are represented by small, round blue circles. The texture of the knit is evident, with a slightly rough and bumpy surface typical of hand-knitted items. The toy airplane is positioned in the center of the frame, floating in mid-air, giving a sense of flight and movement. The background is slightly blurred, emphasizing the toy airplane as the central focus. The overall style of the image is playful and nostalgic, reminiscent of vintage children's toys. The photograph captures the toy airplane in a way that highlights its detailed craftsmanship and the soft, cozy texture of the knit.</code></pre><p>This is likely too much detail, and it's a bit repetitive. But it is how the model returns the captions currently, so it's what I went with.</p><p>I feel like a slightly more restrained and focused captioning model would likely produce better results. You could always fairly easily cut the caption off after a few sentences, or write a script that trims it to the nearest complete sentence near a certain token count.</p><p>To activate the model, I once again used <code>Made out of wool</code> as my "trigger word" for it. 
And to be clear, this is not a trigger word as in something I trained it on, but just what I prompted it with to get the effect out of the model.</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/bda420f9-2b86-43fb-823c-b0c65c48107a/width=525/bda420f9-2b86-43fb-823c-b0c65c48107a.jpeg" />Similar to the NoCaption training, the prompt has an effect here even without the LoRA loaded, since Flux is so capable of understanding natural language.</p><p>In general, I find this model to be slightly weaker than the others, but it still has a good effect and I would not be unhappy with the results if this were the only model I had trained.</p><p>For additional results, please read <a target="_blank" rel="ugc" href="https://civitai.com/articles/6723?highlight=455686">this thread</a> about it in my JoyCaption tool article. Your results may vary. You will need to do some testing of your own.</p><p></p><hr /><h1 id="the-dataset-ik3cq11ha">The Dataset</h1><p>The dataset for this model was generated entirely using Flux. I'm using my "One-click-dataset" workflow for ComfyUI with a new version designed for Flux use (IPAdapter removed). This works very well, as long as you caption decently and your concept is not over-trained in the model. 
For example, trying to get a Spider-Man-style model without getting Spider-Man's face everywhere is very hard!</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/e1d432ea-6556-4f82-b5c6-c0d3a4d4f273/width=525/e1d432ea-6556-4f82-b5c6-c0d3a4d4f273.jpeg" />The individual datasets are available to download on each model's page.</p><p></p><hr /><h1 id="comparisons-3zlkyxp04">Comparisons</h1><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/055173f6-830e-45a3-a186-37ce9c84a0e6/width=525/055173f6-830e-45a3-a186-37ce9c84a0e6.jpeg" />Full-sized version on <a target="_blank" rel="ugc" href="https://imgur.com/a/AOt5XPh">Imgur</a>.</p><p></p><hr /><h1 id="sdxl-version-u7s3uetll"><a target="_blank" rel="ugc" href="https://civitai.com/models/664159">SDXL Version</a></h1><p>I trained 9 different versions of this dataset in SDXL. None of them is as good as any of the Flux versions. For some reason, this model has a very hard time bringing out the effect in SDXL. I had to increase the weight of the model to 1.5 to get the desired effect. I need some different training settings...</p><p><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/ca26c980-7432-48bc-91be-dd428bfc326b/width=525/ca26c980-7432-48bc-91be-dd428bfc326b.jpeg" /><img src="https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/001ad002-7f34-4af5-88c8-7fb084ad902e/width=525/001ad002-7f34-4af5-88c8-7fb084ad902e.jpeg" />For SDXL, I used the Single Word training method.</p><p></p><hr /><h1 id="trigger-word-vs.-no-trigger-word-35jc463mu">Trigger Word vs. no Trigger Word?</h1><p>I tried training both with and without trigger words for these experiments. My conclusion is that both methods work equally well. 
It's all about how you want to use the model in the end.</p><p>If you use a trigger word, you can more precisely activate the learned data, such as applying the effect to a particular part of the image.</p><p>If you don't use a trigger word, you may need to figure out how to trigger and activate the learned data based on the content of your training data. In my examples here, I did this by prompting with "made out of wool", even though this phrase was never trained into the model.</p><p></p><hr /><h1 id="final-conclusions-tbd-6dnddqp06">Final Conclusions? [TBD]</h1><p>TBD. More versions incoming with 4 new captioning tools and methods.</p><p>What do you think? Please share relevant knowledge below. &lt;3</p><p></p><p><a target="_blank" rel="ugc" href="https://civitai.com/articles/7146">Part 2 with an actual final "conclusion" can be found here.</a></p><p></p><hr /><h1 id="but-what-about-the-buzz-unt6afvmy"><span style="color:rgb(253, 126, 20)">But what about the buzz?</span></h1><p>Yeah, it took some buzz to train this model. If you feel like you have too much, feel free to drop some for this article or unlock one of the models. Thanks for reading!</p>
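<p>A note on the Training Settings section: the relationship between repeats and the ~1000 step target can be sketched with a common rule of thumb, where steps are roughly images × repeats × epochs ÷ batch size. This is only an estimate under that assumption; the trainer's own count may differ slightly because of bucketing and per-epoch rounding. The function names are made up for illustration, and the dataset size of 145 images is hypothetical (the article does not state it), picked only because it reproduces the <code>targetSteps</code> value of 1088 from the settings above.</p>

```python
import math


def estimate_total_steps(num_images, num_repeats, max_train_epochs, train_batch_size):
    """Rule-of-thumb step count: total samples seen, divided into batches."""
    samples = num_images * num_repeats * max_train_epochs
    return math.ceil(samples / train_batch_size)


def repeats_for_target(num_images, target_steps, max_train_epochs, train_batch_size):
    """Smallest whole number of repeats whose estimated step count reaches target_steps."""
    samples_needed = target_steps * train_batch_size
    return max(1, math.ceil(samples_needed / (num_images * max_train_epochs)))


# Hypothetical dataset size; the article does not say how many images were used.
NUM_IMAGES = 145

# Values below mirror numRepeats, maxTrainEpochs and trainBatchSize from the settings JSON.
print(estimate_total_steps(NUM_IMAGES, num_repeats=6, max_train_epochs=5, train_batch_size=4))
print(repeats_for_target(NUM_IMAGES, target_steps=1000, max_train_epochs=5, train_batch_size=4))
```

<p>With a different dataset size, the second function gives a starting point for the repeats value; it is still worth checking the step count that the training tool itself reports.</p>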
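<p>For the JoyCaption version, I mentioned that a script could trim a long caption to a complete sentence near a certain token count. Here is a minimal sketch of that idea, not a tested part of my tools. It makes two simplifying assumptions: tokens are approximated by whitespace-separated words rather than a real tokenizer, and the cut is made at the last sentence that still fits within the budget rather than at the nearest boundary. The function name <code>trim_caption</code> and the default of 75 are made up for illustration.</p>

```python
import re


def trim_caption(caption, max_tokens=75):
    """Keep whole sentences from the start of the caption while they fit the budget.

    Tokens are approximated by whitespace-separated words (an assumption).
    The first sentence is always kept, even if it alone exceeds the budget.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    kept = []
    used = 0
    for sentence in sentences:
        count = len(sentence.split())
        if kept and used + count > max_tokens:
            break
        kept.append(sentence)
        used += count
    return " ".join(kept)


# Shortened excerpt of the example caption from the article.
example = (
    "The image is a photograph of a whimsical, hand-knitted toy airplane against "
    "a backdrop of a clear blue sky dotted with fluffy white clouds. "
    "The toy airplane, crafted in a chunky knit style, features a predominantly "
    "cream-colored body with red accents. "
    "The nose of the plane is red, while the tail is yellow with a red tip."
)
print(trim_caption(example, max_tokens=45))
```

<p>Swapping the word count for a real tokenizer would make the budget more accurate, but the sentence-boundary logic would stay the same.</p>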

Flux Style Captioning Differences - Training Diary
