See Training Data for the tags list.
It seems that SDXL learns the concept faster than SD1: my last bottomheavy model trained much longer with a similar config, 16 epochs here vs. 160 epochs on SD1 (though this model is probably still undertrained).
A lower network_dim also seems to work fine; I will probably try going even lower than 16 later.
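For concreteness, here is a minimal sketch of an SDXL LoRA run at network_dim 16, assuming kohya-ss sd-scripts (sdxl_train_network.py). The paths, alpha, and other hyperparameters are placeholders I picked for illustration, not the exact config used for this model:

```python
# Minimal sketch, assuming kohya-ss sd-scripts is installed and on the path.
# Everything except network_dim=16 and max_train_epochs=16 is a placeholder.
import subprocess

subprocess.run([
    "python", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "sd_xl_base_1.0.safetensors",
    "--train_data_dir", "./train_data",      # placeholder dataset path
    "--output_dir", "./output",
    "--network_module", "networks.lora",
    "--network_dim", "16",                   # lower dim than typical SD1 LoRAs
    "--network_alpha", "8",                  # assumed alpha, not stated in these notes
    "--max_train_epochs", "16",
    "--resolution", "1024,1024",
    "--train_batch_size", "1",               # placeholder
    "--mixed_precision", "bf16",
], check=True)
```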
Training per iteration is about 2-3x slower on SDXL than it was for SD1, but the speed at which SDXL picks up the concept makes the total training time somewhat competitive.
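Rough back-of-envelope on what that trade-off implies, assuming both runs used a comparable number of steps per epoch (an assumption, since the notes don't give step counts):

```python
# If SDXL needs ~10x fewer epochs but each iteration is ~2-3x slower,
# the total wall-clock time can still come out well ahead.
sd1_epochs, sdxl_epochs = 160, 16
slowdown = 2.5                    # midpoint of the ~2-3x per-iteration slowdown
relative_time = (sdxl_epochs * slowdown) / sd1_epochs
print(f"SDXL total time ~= {relative_time:.0%} of the SD1 run")  # ~= 25%
```

The caveat from above still applies: if 16 epochs turns out to be undertrained, the real ratio would be less favorable.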
Didn't necessarily need text encoder training to get decent results. On SD1 I would have had to train much, much longer without text encoder training; here the concept was learned pretty well without it.
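In kohya-ss sd-scripts this corresponds to a single flag (the flag itself is real; that this exact trainer was used for this model is my assumption):

```python
# Assuming kohya-ss sd-scripts: appending this to the argument list in the
# sketch above trains only the UNet LoRA weights, leaving the text encoder untouched.
unet_only_args = ["--network_train_unet_only"]
```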