NextPhoto

Name: NextPhoto
Rating: 5 (2863 reviews)
Author: bigbeanboiler

Updated: Oct 5, 2024

style

photorealistic photo easy

Download (1.99 GB)

Verified: 2 years ago

SafeTensor

Details

Type	Checkpoint Trained
Stats	14,211 85,170 20
Reviews	Very Positive (280)
Published	Aug 2, 2023
Base Model	SD 1.5
Training	Steps: 10,000 Epochs: 400
Usage Tips	Clip Skip: 1
Trigger Words	photo photograph
Hash	AutoV2 1C1F913F3B

About this version

Version 3.0 is the result of hundreds of hours of training, tweaking, block merging, and refining. The training data consisted of over 1000 hand-curated images with hand-written captions, carefully selected to be high quality and representative. The set was drawn from photos I took myself, a sample of 200 photos selected using Laion5B KNN searching, and a hand-curated collection from a variety of sources, all of which were paid for.

Version 3.0 also includes a brand new VAE, trained with a custom loss metric I developed that focuses on spectral similarity using a wavelet loss. It also uses LPIPS perceptual similarity to enhance very fine details. The new VAE has improved realism over the standard vae-ft-mse-840000-ema-pruned, though there are occasional artifacts in the form of orange highlights. These are uncommon, and can easily be resolved through variance, a slight prompt change, a new seed, or image2image.

The final results show the following improvements:

- Significantly improved realism
- Significantly improved skin texture
- Better lighting
- More natural colors (v2.0 suffered a lot from color shift)
- Less over-fitting than v2.0
- Better subject integration when using the new VAE (fewer dark halos)

License:

CreativeML Open RAIL-M Addendum

NOTE: This model has its own VAE, which is baked into the model. For best results, please ensure that the selected VAE in AUTOMATIC1111 is set to "Automatic". If you've never poked around in the VAE settings, this will be the default.

NextPhoto is the result of a whole lot of training, data curation, and block merging. The model is designed exclusively for generating photorealistic photos, and as such it cannot generate non-photo images (even if prompted to do so). For more details about version 3.0, check out the "About this version" section.

All sample images were generated using the ESRGAN_4x upscaling model at 2x upscaling, with 0.45 denoising strength. I'm not gonna upload a 32-bit model, as the v3 model was trained using 16-bit precision, so it would literally just be a waste of space.

Usage Guide

Negative Prompt (highly recommended): The negative prompt is quite important for photorealism, but you don't really ever have to change it to get great results. I'd recommend the following negative prompt as a base: (worst quality:0.8), cartoon, halftone print, burlap, (cinematic:1.2), (verybadimagenegative_v1.3:0.3), (surreal:0.8), (modernism:0.8), (art deco:0.8), (art nouveau:0.8)
- This prompt uses the verybadimagenegative_v1.3 textual inversion embedding. You'll need to download it:
  - https://civitai.com/models/11772/verybadimagenegative
- Place the downloaded file into the "embeddings" folder of the SD WebUI root directory, then restart Stable Diffusion.
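If you want to tweak individual weights without retyping the whole string, the negative prompt above can be assembled from a list of terms. This is only a small sketch: the helper names `format_term` and `build_negative_prompt` are made up for illustration, and the `(term:weight)` form is the WebUI's emphasis syntax as used in the prompt above.

```python
# Sketch: build the recommended negative prompt from (term, weight) pairs.
# Helper names are hypothetical; a weight of None means "no explicit weight".

NEGATIVE_TERMS = [
    ("worst quality", 0.8),
    ("cartoon", None),
    ("halftone print", None),
    ("burlap", None),
    ("cinematic", 1.2),
    ("verybadimagenegative_v1.3", 0.3),  # requires the embedding file
    ("surreal", 0.8),
    ("modernism", 0.8),
    ("art deco", 0.8),
    ("art nouveau", 0.8),
]


def format_term(term, weight=None):
    """Return 'term' or '(term:weight)' in WebUI emphasis syntax."""
    if weight is None:
        return term
    return f"({term}:{weight:g})"


def build_negative_prompt(terms):
    """Join all formatted terms into one comma-separated prompt string."""
    return ", ".join(format_term(term, weight) for term, weight in terms)


negative_prompt = build_negative_prompt(NEGATIVE_TERMS)
print(negative_prompt)
```

Changing a single weight in `NEGATIVE_TERMS` and rebuilding the string makes it easy to compare variations against the same seed.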
Positive Prompts: You don't need to think about the positive prompt a whole ton - the model works quite well with simple positive prompts.
- Examples:
  - A well-lit photograph of woman at the train station
  - A perfect well-lit medium photograph of an old married couple sitting on their porch
  - A poorly lit photograph of a man walking on the trail at night
- For more examples of positive prompts you can look at the sample photos for the model.
Upscaling: This model will still generate photorealistic images without upscaling, but upscaling is strongly recommended for photorealism. You'll need to use the ESRGAN_4x upscaling model (not R-ESRGAN) in the hires fix section for decent results. Set the denoising strength anywhere from 0.3 to 0.5 for best results, and the upscale amount to 2. I normally set my denoising strength to 0.5 or 0.45.
Sampler: I use DPM++ 2M Karras, and generally don't stray from it. While the other samplers can still produce good results, DPM++ 2M Karras is the most consistent in my experience with this model.
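For anyone driving the WebUI through its HTTP API instead of the browser, the upscaling and sampler settings above could be collected into a request body roughly like this. Treat it as a sketch under assumptions: the field names follow the `/sdapi/v1/txt2img` endpoint as I understand it (check the `/docs` page of your own install), the 512x512 base size is my assumption for an SD 1.5 model, and newer WebUI versions may expect the sampler and the Karras scheduler as separate fields. The function names are hypothetical, and nothing is sent over the network here.

```python
# Sketch: collect the recommended settings into a txt2img request body.
# Field names are assumed to match the AUTOMATIC1111 WebUI API; verify them
# against your install before use. The payload is built, not sent.
import json


def build_txt2img_payload(prompt, negative_prompt,
                          width=512, height=512,
                          denoising_strength=0.45):
    """Return a dict with hires fix, upscaler, and sampler settings."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,                 # assumed SD 1.5 base size
        "height": height,
        "sampler_name": "DPM++ 2M Karras",
        "cfg_scale": 7,                 # default; lower toward 4 if needed
        "enable_hr": True,              # hires fix
        "hr_upscaler": "ESRGAN_4x",     # not R-ESRGAN
        "hr_scale": 2,                  # upscale amount
        "denoising_strength": denoising_strength,  # 0.3 to 0.5
    }


def final_size(payload):
    """Output resolution after the hires fix pass."""
    scale = payload["hr_scale"]
    return int(payload["width"] * scale), int(payload["height"] * scale)


payload = build_txt2img_payload(
    "A well-lit photograph of woman at the train station",
    "(worst quality:0.8), cartoon, halftone print",
)
print(json.dumps(payload, indent=2))
print(final_size(payload))
```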
For further improvements:
- Reduce your CFG scale: The default classifier-free guidance (CFG) scale of 7 works well, but occasionally it can be too high. Reduce the CFG scale until you like the results - I generally bottom out at 4.0, as anything lower than that and the negative prompt starts getting ignored. Increasing the CFG scale past 7 or 8 will result in more "dramatized" photos (not in a good way), but the model will also follow the prompts more closely, so balance as needed. High CFG scales can work well for specific situations, but lower CFG scales work great quite consistently.
- Avoid excess LoRA and Textual Inversion use: As v2 and v3 of this model are custom trained and not purely block merged, LoRAs and Textual Inversions may not work as well as they do in other models. In my experience you can still get good results with them, but I'd recommend treading lightly: take an additive approach, adding LoRAs or inversions selectively when needed.
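To see why the negative prompt fades at low CFG scales, it helps to look at the standard classifier-free guidance formula, `guided = neg + scale * (cond - neg)`, where the negative prompt's prediction stands in for the unconditional one. The sketch below uses tiny made-up vectors instead of real noise predictions, and `cfg_combine` is a hypothetical name, so it only illustrates the arithmetic.

```python
# Sketch: classifier-free guidance on toy vectors (made-up numbers).
# cond = prediction for the positive prompt, neg = prediction for the
# negative prompt. The negative term enters with weight (1 - scale).

def cfg_combine(cond, neg, scale):
    """Return neg + scale * (cond - neg), element by element."""
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


cond = [1.0, 2.0, 3.0]
neg = [0.5, 0.0, -1.0]

for scale in (1.0, 4.0, 7.0):
    print(scale, cfg_combine(cond, neg, scale))
```

At a scale of 1 the result equals the positive prediction alone, so the negative prompt has no effect at all. Its influence grows as the scale rises, which fits the observation that the negative prompt starts getting ignored at low scales.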