What am I doing wrong?
There can be many reasons: a different height/width, a different Clip Skip value, or a different upscaler with a different denoising strength.
The SD version and the checkpoint might be different (older), or the original might have used an additional LoRA.
Finally, sometimes the example images are simply wrong.
It still looks like you are getting a close result, so you don't need to replicate the example perfectly.
The results on model posts are often upscaled and denoised to make them look better.
Also, did you copy the negative prompts?
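A quick way to check the settings mentioned above is to put the example's parameters and your own side by side and list what differs. The snippet below is only a minimal sketch: the key names, model names, and values are made up for illustration and are not read from any real metadata format.

```python
# Hypothetical parameter sets; every key and value here is illustrative only.
reference = {
    "width": 512,
    "height": 768,
    "clip_skip": 2,
    "checkpoint": "example_model_v2",
    "lora": "example_style_lora",
    "upscaler": "example_upscaler",
    "denoising_strength": 0.4,
    "negative_prompt": "blurry, low quality",
    "seed": 12345,
}
mine = {
    "width": 512,
    "height": 768,
    "clip_skip": 1,
    "checkpoint": "example_model_v2",
    "negative_prompt": "",
    "seed": 12345,
}


def diff_settings(reference, mine):
    """Return {key: (reference_value, my_value)} for every setting that differs.

    A key missing on one side is reported with None for that side.
    """
    keys = sorted(set(reference) | set(mine))
    return {
        key: (reference.get(key), mine.get(key))
        for key in keys
        if reference.get(key) != mine.get(key)
    }


differences = diff_settings(reference, mine)
for name, (ref_value, my_value) in differences.items():
    print(f"{name}: example={ref_value!r}, mine={my_value!r}")
```

Anything that shows up in the output (here Clip Skip, the LoRA, the upscaler, the denoising strength, and the negative prompt) is a candidate explanation for the mismatch.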
And the images may still not come out identical, even with the same seed, because of differences between GPUs.
If you want exactly the same picture, you need to use image-to-image, not text-to-image.
Pictures in Stable Diffusion are generated from an initial field of random noise, and the same seed can produce different noise on different graphics cards.
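To illustrate the point about noise, here is a toy sketch in plain Python. It is an analogy, not Stable Diffusion's actual code: the two functions stand in for two different random number generator implementations (for example, one per kind of hardware). The same seed reproduces the same noise on the same generator, but a different generator gives different noise from that same seed.

```python
import math
import random


def noise_mersenne(seed, n):
    """Gaussian noise from Python's built-in Mersenne Twister generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def noise_lcg(seed, n):
    """Gaussian noise from a toy linear congruential generator plus Box-Muller.

    This stands in for "a different implementation"; it is not what any GPU uses.
    """
    state = seed

    def uniform():
        nonlocal state
        state = (1664525 * state + 1013904223) % 2**32
        # Map to the open interval (0, 1) so that log() below never sees 0.
        return (state + 1) / (2**32 + 1)

    samples = []
    for _ in range(n):
        u1, u2 = uniform(), uniform()
        samples.append(math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2))
    return samples


seed = 12345
first_run = noise_mersenne(seed, 4)
second_run = noise_mersenne(seed, 4)
other_generator = noise_lcg(seed, 4)

print("same generator, same seed, identical noise:", first_run == second_run)
print("different generator, same seed, identical noise:", first_run == other_generator)
```

Since the starting noise decides the final image, a seed only pins down the result when the noise is generated the same way on both machines.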
In the settings, ENSD (eta noise seed delta) is a key parameter that affects the noise seed, and it is not shown in your screenshots.
In the settings, Clip Skip is a key factor that affects the picture, and it is not visible in the picture you showed.
What is poor about your results? Please specify what you are trying to achieve.