A better-captioned attempt at the mst3c:FLUX model. It is also trained on concepts rather than characters, which may make the style more versatile. Training results show that it tends to add speech bubbles when the prompt does not specify any text in the image.
May be over- or under-trained.
Start at a strength of 0.58 and increase from there.
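If you want to step through strengths systematically rather than by feel, here is a minimal sketch of a sweep that starts at 0.58 and goes up. It assumes the 0.58 figure is the model's strength/weight setting in your UI or pipeline. The 0.07 step, the 1.00 ceiling, and the name `strength_sweep` are illustrative choices, not values from the training run.

```python
from decimal import Decimal


def strength_sweep(start="0.58", stop="1.00", step="0.07"):
    """List strengths from `start` upward, never exceeding `stop`.

    Only `start` (0.58) comes from the notes above; `stop` and `step`
    are assumed defaults. Decimal avoids float drift while stepping.
    """
    step_value = Decimal(step)
    if step_value <= 0:
        raise ValueError("step must be positive")

    values = []
    current = Decimal(start)
    limit = Decimal(stop)
    while current <= limit:
        values.append(float(current))
        current += step_value
    return values


if __name__ == "__main__":
    # Render the same prompt and seed at each strength and compare results.
    for strength in strength_sweep():
        print(f"try strength {strength:.2f}")
```

Keeping the prompt and seed fixed across the sweep makes it easier to tell whether a change comes from the strength or from sampling noise.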