I see a lot of people bagging SD3, and yes, it's shit at making people in leotards lying on the grass. But so are SDXL and SD 1.5.
Anyway, have you tried the CLIPTextEncodeSD3 node? Here's an example of how it works.
Text L:
This prompt focuses on the specific details and elements related to appearance.
e.g.
one large leather bound book front left of screen, wooden table, candle, background black framed metal windows, sandstone block walls, statue knight it shining armour in the background to the right
Text G:
This prompt describes the overall scene, lighting conditions, and background environment. It sets the context.
e.g.
The photo is taken at night, indoors in a castle
T5xxl:
artstation, ((Photorealistic))
one large leather bound book front left of screen, wooden table, candle, background black framed metal windows, sandstone block walls, statue knight it shining armour in the background to the right, The photo is taken at night, indoors in a castle
Ultra realistic HD 8K Quality,
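
If you'd rather not copy and paste between the three boxes, here's a rough Python sketch of how the fields above fit together: Text L gets the details, Text G gets the scene, and T5xxl gets the style tags, then both prompts joined, then the quality tags. The helper name build_sd3_prompts and the dictionary keys clip_l, clip_g and t5xxl are my own labels for illustration (I'm assuming they match the node's input names); this is not code from ComfyUI itself.

```python
def build_sd3_prompts(details, scene,
                      style_prefix="artstation, ((Photorealistic))",
                      quality_suffix="Ultra realistic HD 8K Quality,"):
    """Hypothetical helper: split one idea into the three SD3 text fields."""
    # T5xxl gets everything: style tags, then details + scene, then quality tags.
    t5xxl = "\n".join([style_prefix, f"{details}, {scene}", quality_suffix])
    return {
        "clip_l": details,  # Text L: specific objects and their appearance
        "clip_g": scene,    # Text G: overall scene, lighting, context
        "t5xxl": t5xxl,     # T5xxl: the full combined prompt
    }


details = ("one large leather bound book front left of screen, wooden table, "
           "candle, background black framed metal windows, sandstone block walls, "
           "statue knight it shining armour in the background to the right")
scene = "The photo is taken at night, indoors in a castle"

prompts = build_sd3_prompts(details, scene)
for name, text in prompts.items():
    print(f"{name}:\n{text}\n")
```

Paste each printed block into the matching box on the node. Change style_prefix and quality_suffix to taste.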
Oh, also, the secret sauce is to put artstation in all your positive prompts ;)
Here is my example link:
https://civitai.com/posts/3503273?returnUrl=%2Fmodels%2F497255%2Fstable-diffusion-3-sd3
Enjoy