Update 6 - 10/10/2024 5:03 am;
The training cycles have both reached the 35-epoch point, and testing will begin once I get ComfyUI set up with the X/Y image grid plot.
Update 5 - 10/9/2024 6:57 am;
The network volume ran out of space and the training can't resume from it. I'll need to retrain the 1D version since it stalled at epoch 21, considerably short of 35, so a full retrain to epoch 35 is required to match this particular version.
https://civitai.com/models/803213?modelVersionId=937952
Have some fun, test it for yourself. The training settings are below.
I am currently preparing my hat with BBQ sauce in the metaphorical slow cooker so it's easier to metaphorically eat (yet another reminder why I shouldn't judge a book by its cover). The D2 turned out shockingly good, but I won't be fully convinced until the D1-og training reaches the same epoch point for a true comparison.
Update 4 - 12:39 pm;
I'm about ready to eat my hat. The models are highly divergent and the outcome is leaning heavily in D2's favor.
Update 3 - 9:28 am;
Tragically, I had to restart the training completely. I had accidentally omitted some important images, which means both trainings have been fully restarted from this point and the step count has gone up to about 15k instead of 14k.
Update 2 - 8:21 am;
During testing of epoch 6 I noticed a serious divergence. The standard Flux1D model fails to consistently produce poses across multiple subjects simultaneously, while the Flux1D2 model is producing full matching lineups. More training is required for both.
Update 1 - 7:29 am;
The first 5 of 79 epochs are in and the results show promise for both models. I'd say they are both contenders, and they are definitely diverging slightly from one another. More information as time progresses.
Disabling and enabling CLIP on the LoRA loader has shown some minor effect, but nothing too divergent yet. Both models introduced the all-fours pose on the same epoch, which correlates with my original findings.
Currently running 2 trainings;
RunPod-hosted Kohya_SS Flux - SD3.Flux branch
Dataset:
Simulacrum - V12 - Tags Only; 1400~ images 1024x1024
Due to the dataset being heavily modified since Consistency v32, I need to do a mirrored, identical retraining for Flux1D to ensure the solidity of the experiment.
I've rebranded the dataset to its correct versioning within the large-scale booru training sequence, dubbed Simulacrum-v1.7, due to its close relation to stage 2 and the implications of the outcomes.
8 dims
16 alpha
cosine with restarts - 2 cycles
Flux1D
Seed 420
T5_xxl_fp16
Flux LoRA - UNet LR 0.0001 / TE LR 0.000005
4x 4090s
ADAMW
ETA; 33 hours~ (starting 5:53 am 10/8/2024 gmt-7)
CFG Guidance 3.5
Flux1D2
Seed 420
T5_xxl_fp16
Flux LoRA - UNet LR 0.0001 / TE LR 0.000005
4x 4090s
ADAMW
ETA; 33 hours~ (starting 5:53 am 10/8/2024 gmt-7)
CFG Guidance 1
The guidelines suggest that CFG 1 is the way to train this.
Estimated costs; $140~
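For reference, here's a rough sketch of what the two launches look like as sd-scripts SD3/Flux-branch invocations. The paths, checkpoint filenames, and exact flag names are assumptions on my part and can differ between branch versions, so treat it as an outline rather than a copy-paste command:

    import subprocess

    # Shared settings for both runs; all paths below are hypothetical placeholders.
    COMMON = [
        "accelerate", "launch", "--num_processes", "4",  # 4x 4090s
        "flux_train_network.py",
        "--clip_l", "/workspace/models/clip_l.safetensors",
        "--t5xxl", "/workspace/models/t5xxl_fp16.safetensors",
        "--ae", "/workspace/models/ae.safetensors",
        "--dataset_config", "/workspace/simulacrum_v1_7.toml",
        "--network_module", "networks.lora_flux",
        "--network_dim", "8",
        "--network_alpha", "16",
        "--unet_lr", "0.0001",
        "--text_encoder_lr", "0.000005",
        "--optimizer_type", "AdamW",
        "--lr_scheduler", "cosine_with_restarts",
        "--lr_scheduler_num_cycles", "2",
        "--seed", "420",
        "--max_train_epochs", "79",
    ]

    # The only differences between the two runs: base checkpoint and guidance scale.
    RUNS = {
        "flux1d_simulacrum": ["--pretrained_model_name_or_path", "/workspace/models/flux1-dev.safetensors",
                              "--guidance_scale", "3.5"],
        "flux1d2_simulacrum": ["--pretrained_model_name_or_path", "/workspace/models/flux1d2.safetensors",
                               "--guidance_scale", "1"],
    }

    for name, extra in RUNS.items():
        subprocess.run(COMMON + extra + ["--output_name", name], check=True)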
Let's see how good this boast of better, more cohesive LoRAs really is, or if I'm wasting money. The standard Flux diffusers code and documentation suggests that training the TE is a viable option when training LoRAs, so I'm giving it a go for this run. We'll see how it goes.
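For anyone wondering what "training the TE" actually means here, a minimal sketch assuming peft and transformers: the CLIP text encoder gets its own LoRA adapter whose weights receive gradient updates at the lower TE LR (5e-6) while the transformer's adapter trains at the main UNet LR (1e-4). The target module names are the standard CLIP attention projections; how the training script wires this up internally is its own business.

    from transformers import CLIPTextModel
    from peft import LoraConfig, get_peft_model

    # Load the CLIP text encoder that ships with the Flux checkpoint.
    text_encoder = CLIPTextModel.from_pretrained(
        "black-forest-labs/FLUX.1-dev", subfolder="text_encoder"
    )

    # Attach a small LoRA adapter to the attention projections so the TE
    # actually gets updated during the run (same dim/alpha as the main LoRA).
    te_lora_cfg = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    )
    text_encoder = get_peft_model(text_encoder, te_lora_cfg)
    text_encoder.print_trainable_parameters()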
The final output will consist of 100 images from each of the two models compared side-by-side using 10 generated captions, each caption meant to define a gradient of increasing complexity while introducing the various booru tags that Simulacrum defines for superimposition and stylistic intent.
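The actual grid will come out of ComfyUI's X/Y plot, but the loop itself is simple. A minimal sketch with diffusers' FluxPipeline, assuming two hypothetical LoRA output files, both loaded onto the same base here purely for illustration, and 10 seeds per caption to reach 100 images per model:

    import torch
    from diffusers import FluxPipeline

    # To be filled with the 10 generated captions, ordered from simple prompts
    # to dense booru-tag stacks.
    captions = []

    # Hypothetical filenames for the two trained LoRAs.
    loras = {
        "flux1d": "flux1d_simulacrum.safetensors",
        "flux1d2": "flux1d2_simulacrum.safetensors",
    }

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    for name, lora_path in loras.items():
        pipe.load_lora_weights(lora_path)
        for c_idx, caption in enumerate(captions):
            for i in range(10):  # 10 seeds per caption -> 100 images per model
                image = pipe(
                    caption,
                    generator=torch.Generator("cuda").manual_seed(420 + i),
                ).images[0]
                image.save(f"{name}_cap{c_idx:02d}_seed{i:02d}.png")
        pipe.unload_lora_weights()

Keeping the seeds identical across the two models means each pair of images lines up directly in the side-by-side grid.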
The outcome determines my chosen training rival for Schnell, which so far appears to have the best and most consistent results from booru-tag training, with less guided complexity due to Schnell's distillation being based on timestep rather than direct guidance.
This experiment will also determine my opinion on direct utility with tag-specific datasets, which currently leans against the D2 model, but I'm a scientist; my opinion will change or fixate depending solely on the outcome of this neutral experiment. There may be another experiment without TE training if neither model responds, or if the TE training does too much damage to the model's context.



