I ran some tests on the official Hugging Face Space for SD3 made by SAI, and I noticed that the settings hinted at by Emad and SAI people are totally misleading.
First of all, the suggested settings are 28 steps and a CFG of 5, but most of the time they lead to body horror and poor prompt following.
In my tests, even getting a subject to hold a simple item like a mayo jar is totally impossible for SD3 at the base settings, while you might have better luck (in my experience) with higher CFG and fewer steps.
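If you want to repeat this kind of comparison yourself, here is a rough sketch of a steps/CFG sweep with a fixed seed, so that only the two settings change between images. The `generate` callable, the `fake_generate` stand-in, the prompt, and the value ranges are all placeholders I made up for illustration; swap in whatever actually produces your images.

```python
from itertools import product

BASELINE = (28, 5.0)  # the suggested steps and CFG mentioned above


def settings_grid(steps_options, cfg_options):
    """Return every (steps, cfg) pair, with the baseline first if it is present."""
    grid = list(product(steps_options, cfg_options))
    # Stable sort: the baseline pair gets key False and moves to the front.
    grid.sort(key=lambda pair: pair != BASELINE)
    return grid


def run_sweep(generate, prompt, grid, seed=0):
    """Call generate() once per setting, keeping prompt and seed fixed."""
    results = {}
    for steps, cfg in grid:
        results[(steps, cfg)] = generate(prompt=prompt, steps=steps, cfg=cfg, seed=seed)
    return results


def fake_generate(prompt, steps, cfg, seed):
    """Hypothetical stand-in for a real image generator; returns a label only."""
    return f"{prompt} | steps={steps} cfg={cfg} seed={seed}"


# Assumed ranges: lower steps and higher CFG than the baseline.
grid = settings_grid(steps_options=[16, 20, 28], cfg_options=[5.0, 7.0, 9.0])
results = run_sweep(fake_generate, "a person holding a mayo jar", grid)
```

Keeping the seed fixed matters here: otherwise you cannot tell whether a better hand came from the settings or just from a luckier noise sample.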
This behaviour on SD3's part might NOT mean undertraining but rather the opposite: SD3 might be overfitted in its UNet/DiT, which could (I hope) be fixed by training only the text encoders.
Only time will tell.