NOTE: This model has it's own VAE, which is baked into the model. For best results, please ensure that the selected VAE in automatic1111 is set to "Automatic". If you've never poked around in the VAE settings, this will be the default.
NextPhoto is the result of a whole lot of training, data curation, and block merging. The model is designed exclusively for the generation of photo-realistic photos, and as such it cannot generate non-photo images (even if prompted to do so). For more details about version 3.0, check out the "About this version".
All sample images were generated using ESRGAN_4x upscaling model at 2x upscaling, with 0.45 denoising strength. I'm not gonna upload a 32 bit model, as the v3 model was trained using 16bit precision, so it would literally just be a waste of space.
(highly recommended) The negative prompt is quite important for the photorealism, but you don't really have to change it ever to get great results. I'd recommend the following negative prompt as a base: (worst quality:0.8), cartoon, halftone print, burlap,(cinematic:1.2), (verybadimagenegative_v1.3:0.3), (surreal:0.8), (modernism:0.8), (art deco:0.8), (art nouveau:0.8)
This prompt uses the verybadimagenegative_v1.3 textual embedding. You'll need
Place the downloaded file into the "embeddings" folder of the SD WebUI root directory, then restart stable diffusion.
Positive Prompts: You don't need to think about the positive a whole ton - the model works quite well with simple positive prompts.
A well-lit photograph of woman at the train station
A perfect well-lit medium photograph of an old married couple sitting on their porch
A poorly lit photograph of a man walking on the trail at night
For more examples of positive prompts you can look at the sample photos for the model.
Upscaling: This model works will still generate photorealistic images without upscaling, but upscaling is strongly recommended for photorealism. You'll need to use the ESRGAN_4x upscaling model (not R-ESRGAN) in the hires fix section for decent results. Set the weight anywhere from 0.3 to 0.5 for best results, and the upscale amount to 2. I normally set my weight to 0.5 or 0.45.
Sampler: I use DPM++ 2M Karas, and generally don't stray from it. While the other samplers can still produce good results, DPM++ 2M Karas is the most consistent in my experience with this model.
For further improvements:
Reduce your CFG scale: The default classifier free guidance scale scale of 7 works good, but occasionally this can be too high. Reduce the CFG scale until you like the results - I generally bottom out at 4.0, as anything lower than that and the negative prompt starts getting ignored. Increasing the CFG scale past 7 or 8 will result in more "dramatized" photos (not in a good way), but will also result in the model listening more to the prompts, so balance as needed. High CFG scales can work well for specific situations, but lower CFG scales work great quite consistently.
Avoid excess LORA and Textual Inversion use: As v2 and v3 of this model are custom trained and not purely block merged, any LORAs or Textual Inversions may not work as well as they do in other models. Based on my experience, you can still get good results with them, but I'd recommend treading lightly - I'd recommend an additive approach where you add LORAs or inversions selectively when needed.