
DPO (Direct Preference Optimization) LoRA for XL and 1.5 - OpenRail++

Type: LoRA
Format: SafeTensor
Published: Dec 24, 2023
Updated: Dec 25, 2023
Base Model: SDXL 1.0
Hash: AutoV2 C100EC5708
Creator: enfugue

What is DPO?

DPO stands for Direct Preference Optimization, a method for fine-tuning a diffusion model directly on human preference data: pairs of images where people chose one over the other. Meihua Dang et al. trained Stable Diffusion 1.5 and Stable Diffusion XL with this method on the Pick-a-Pic v2 dataset, which can be found at https://huggingface.co/datasets/yuvalkirstain/pickapic_v2, and describe the approach in their paper at https://huggingface.co/papers/2311.12908.
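The DPO-tuned base models themselves can be tried directly with the diffusers library. A minimal sketch for SDXL, assuming the repository follows the usual diffusers layout with a unet subfolder (prompt and file names are just examples):

import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel

# Load the DPO-tuned SDXL UNet published by Meihua Dang et al.
unet = UNet2DConditionModel.from_pretrained(
    "mhdang/dpo-sdxl-text2image-v1", subfolder="unet", torch_dtype=torch.float16
)

# Swap it into the standard SDXL base pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")

pipe("a portrait of a red fox in the snow, detailed fur").images[0].save("fox.png")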

What does it Do?

The DPO-trained models have been observed to produce higher-quality images than their untuned counterparts, most notably in how closely they adhere to the prompt. These LoRA let you bring that improved prompt adherence to other fine-tuned Stable Diffusion models.
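For example, with the diffusers library the SDXL LoRA can be layered onto any SDXL-based checkpoint. A rough sketch, where the directory and file names are placeholders for wherever you saved the downloaded LoRA:

import torch
from diffusers import StableDiffusionXLPipeline

# Any SDXL-based pipeline; a fine-tuned SDXL checkpoint works the same way
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Apply the DPO LoRA (directory and file name are placeholders)
pipe.load_lora_weights("path/to/lora", weight_name="sdxl-dpo-lora.safetensors")

# The LoRA strength can be scaled down if the effect is too strong
image = pipe(
    "a close-up photo of a hummingbird drinking from a flower",
    cross_attention_kwargs={"scale": 0.8},
).images[0]
image.save("hummingbird.png")

The same pattern applies to the 1.5 LoRA using StableDiffusionPipeline instead.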

Who Trained This?

These LoRA are based on the work of Meihua Dang (https://huggingface.co/mhdang), published at https://huggingface.co/mhdang/dpo-sdxl-text2image-v1 and https://huggingface.co/mhdang/dpo-sd1.5-text2image-v1 and licensed under OpenRail++.

How were these LoRA Made?

They were created with Kohya SS by extracting the difference between each DPO-tuned checkpoint and its base model, using OpenRail++-licensed checkpoints hosted on CivitAI and HuggingFace (a rough sketch of the extraction step follows the links below).

1.5: https://civitai.com/models/240850/sd15-direct-preference-optimization-dpo extracted from https://huggingface.co/fp16-guy/Stable-Diffusion-v1-5_fp16_cleaned/blob/main/sd_1.5.safetensors.

XL: https://civitai.com/models/238319/sd-xl-dpo-finetune-direct-preference-optimization extracted from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors.
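For reference, a rough sketch of the kind of extraction step involved, here wrapped in Python and using the extract_lora_from_models.py script from kohya-ss/sd-scripts; the script path, flag names, rank, and file names below are assumptions and may differ between versions:

import subprocess

# Extract a LoRA as a low-rank approximation of the difference between the
# DPO-tuned checkpoint and the base model it was tuned from.
# All file names are placeholders for locally downloaded checkpoints.
subprocess.run([
    "python", "networks/extract_lora_from_models.py",
    "--model_org", "sd_1.5.safetensors",                  # original base model
    "--model_tuned", "sd15-dpo-checkpoint.safetensors",   # DPO-tuned checkpoint
    "--save_to", "sd15-dpo-lora.safetensors",             # output LoRA
    "--dim", "64",                                        # LoRA rank (assumed)
    "--save_precision", "fp16",
], check=True)

The SDXL LoRA would be produced the same way from the SDXL pair of checkpoints listed above.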

These LoRA are also hosted on HuggingFace at https://huggingface.co/benjamin-paine/sd-dpo-offsets/.